Censored Likelihood Multiple Imputation in R

Loading lodi and an example dataset

For convenience, we have included an example dataset called toy_data, which can be loaded by running data("toy_data"). Let’s look at the first 10 entries of the example dataset.

library(lodi)

data("toy_data")

head(toy_data, n = 10)
#>       id case_cntrl      poll smoking gender batch1  lod
#> 1  13707          1  3.588607       0      1      0 0.65
#> 2  18641          1        NA       0      0      0 0.65
#> 3  27407          1  2.619124       1      0      0 0.65
#> 4  45462          1  7.203193       0      1      1 0.80
#> 5  50357          1  7.336160       1      1      1 0.80
#> 6  59168          1        NA       0      0      0 0.65
#> 7  61477          1  5.136974       0      1      0 0.65
#> 8  76585          1 11.794483       1      1      0 0.65
#> 9  80681          1  1.280289       0      0      1 0.80
#> 10 84391          1  5.480510       1      1      0 0.65

id is the study ID and is unimportant for the purposes of this example. case_cntrl takes values 0 or 1, where 1 indicates that the subject has the disease of interest and 0 indicates that the subject is a healthy control. poll is the environmental exposure of interest, where NA indicates that the concentration is below the limit of detection (LOD). smoking and gender are covariates that we will include in the imputation model. lod is the limit of detection for each individual’s batch. Finally, batch1 takes two values: 1 if the subject’s biosample was assayed in batch 1 and 0 if it was assayed in batch 2.
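As a quick check of the coding described above, the sketch below rebuilds the ten printed rows by hand (so it runs without the package) and tabulates non-detects and LOD values by batch. The object names first10, below_lod, nondetects_by_batch, and lod_by_batch are illustrative only and are not part of lodi.

```r
# First 10 rows of toy_data, copied from the printed output above.
first10 <- data.frame(
  id         = c(13707, 18641, 27407, 45462, 50357,
                 59168, 61477, 76585, 80681, 84391),
  case_cntrl = rep(1, 10),
  poll       = c(3.588607, NA, 2.619124, 7.203193, 7.336160,
                 NA, 5.136974, 11.794483, 1.280289, 5.480510),
  smoking    = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 1),
  gender     = c(1, 0, 0, 1, 1, 0, 1, 1, 0, 1),
  batch1     = c(0, 0, 0, 1, 1, 0, 0, 0, 1, 0),
  lod        = c(0.65, 0.65, 0.65, 0.80, 0.80, 0.65, 0.65, 0.65, 0.80, 0.65)
)

# Non-detects are coded as NA in poll.
below_lod <- is.na(first10$poll)

# Number of non-detects in each batch (batch1 = 0 means batch 2).
nondetects_by_batch <- tapply(below_lod, first10$batch1, sum)
print(nondetects_by_batch)

# Each batch should carry exactly one LOD value.
lod_by_batch <- tapply(first10$lod, first10$batch1, unique)
print(lod_by_batch)
```

Running the same two tapply calls on the full toy_data gives the per-batch non-detect counts for the whole example.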

Implementing Censored Likelihood Multiple Imputation

The function that performs censored likelihood multiple imputation is the clmi function. For more details see help(clmi).

clmi.out <- clmi(formula = log(poll) ~ case_cntrl + smoking + gender,
                   df = toy_data, lod = lod, seed = 12345, n.imps = 5)

The main input to clmi is an R formula. The left-hand side of the formula must be the exposure, and the right-hand side must be the outcome followed by the covariates you want to include in the imputation model. The order of variables on the right-hand side matters: the outcome must come first. You can apply a transformation to the exposure by applying a univariate function to it, as done above with log. The lod argument refers to the name of the lod variable in your data.frame.
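To give a feel for what a censored-likelihood imputation model does, here is a minimal base R sketch of the general idea. It is not lodi's implementation. It assumes simulated data, a single LOD, and a normal model for the log exposure, and every name in it (case_status, log_poll_obs, neg_loglik, and so on) is hypothetical. Values below the LOD contribute a normal CDF term to the likelihood, and imputations are then drawn from the fitted normal distribution truncated above at log(LOD). For brevity the sketch draws a single imputation and ignores uncertainty in the fitted parameters, which a full multiple imputation procedure would need to account for.

```r
set.seed(1)

# Simulated data (illustrative only): binary outcome and log-scale exposure.
n <- 200
case_status <- rbinom(n, 1, 0.5)
log_poll_true <- 0.5 + 0.8 * case_status + rnorm(n)

# Censor everything below a single, assumed limit of detection.
log_lod <- log(0.65)
censored <- log_poll_true < log_lod
log_poll_obs <- ifelse(censored, NA, log_poll_true)

# Negative censored log-likelihood: density for detects, CDF for non-detects.
neg_loglik <- function(par) {
  mu <- par[1] + par[2] * case_status
  sigma <- exp(par[3])  # log-parameterised so sigma stays positive
  ll_detect <- dnorm(log_poll_obs[!censored], mu[!censored], sigma, log = TRUE)
  ll_censor <- pnorm(log_lod, mu[censored], sigma, log.p = TRUE)
  -(sum(ll_detect) + sum(ll_censor))
}

fit <- optim(c(0, 0, 0), neg_loglik, method = "BFGS")
beta_hat <- fit$par[1:2]
sigma_hat <- exp(fit$par[3])

# Draw from the fitted normal truncated above at log_lod (inverse-CDF method).
mu_cens <- beta_hat[1] + beta_hat[2] * case_status[censored]
p_upper <- pnorm(log_lod, mu_cens, sigma_hat)
u <- runif(sum(censored), min = 0, max = p_upper)

log_poll_imputed <- log_poll_obs
log_poll_imputed[censored] <- qnorm(u, mu_cens, sigma_hat)
```

Including the outcome as a predictor, as the clmi formula requires, keeps the imputed exposure values consistent with the exposure-outcome association that the later outcome model estimates.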

The imputed datasets can be extracted as a list using $imputed.dfs:

extract.imputed.dfs <- clmi.out$imputed.dfs

Fit and pool outcome models

The pool.clmi function takes the output generated by the clmi function, fits outcome models on each of the imputed datasets, and pools inference across outcome models using Rubin’s rules. For details see help(pool.clmi).

results <- pool.clmi(formula = case_cntrl ~ poll_transform_imputed + smoking +
                                 gender, clmi.out = clmi.out, type = logistic)

In pool.clmi, the formula must have the outcome variable on the left-hand side, and the first variable on the right-hand side must be the imputed exposure variable. clmi outputs the exposure variable as ((your-exposure))_transform_imputed. In this example, our exposure is poll, so the name of the imputed variable is poll_transform_imputed.

  • Note: There are two valid options for the type argument. If you have binary outcome data (as in the current example), use type = logistic so that the models fit on the imputed datasets are logistic regression models. If you have continuous outcome data, use type = linear so that the models fit on the imputed datasets are linear regression models.
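The pooling step can be illustrated with a small base R sketch of Rubin's rules for a single coefficient. The function rubin_pool and the input numbers are hypothetical and are not taken from lodi or from toy_data. The degrees of freedom use Rubin's classical formula, and pool.clmi may apply a different adjustment.

```r
# Pool one coefficient across m imputed-data fits using Rubin's rules.
rubin_pool <- function(est, se) {
  m <- length(est)
  q_bar   <- mean(est)    # pooled point estimate
  within  <- mean(se^2)   # average within-imputation variance
  between <- var(est)     # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  # Rubin's classical degrees of freedom
  df <- (m - 1) * (1 + within / ((1 + 1 / m) * between))^2
  c(est = q_bar, se = sqrt(total), df = df)
}

# Hypothetical estimates and standard errors from m = 5 imputed datasets.
est <- c(0.30, 0.40, 0.35, 0.45, 0.25)
se  <- rep(0.20, 5)

pooled <- rubin_pool(est, se)
print(pooled)
```

The pooled standard error is larger than any single-fit standard error because it adds the between-imputation variability that reflects uncertainty about the values below the LOD.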

To display the pooled results use $output:

results$output
#>                               est        se       df   p.values      LCL.95
#> (Intercept)            -0.6021156 0.3338019 93.75401 0.07447254 -1.26490980
#> poll_transform_imputed  0.3619278 0.2192230 86.31785 0.10238131 -0.07385026
#> smoking                -0.3245100 0.5245765 93.01341 0.53768319 -1.36621287
#> gender                  0.8611192 0.4904714 93.65164 0.08240975 -0.11277055
#>                            UCL.95
#> (Intercept)            0.06067861
#> poll_transform_imputed 0.79770585
#> smoking                0.71719295
#> gender                 1.83500898
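Because the outcome model here is a logistic regression, the estimates above are on the log-odds scale. A short sketch of converting the exposure row to an odds ratio follows. The numbers are copied from the printed output, and the variable names are illustrative. Since the exposure was log-transformed in clmi, the odds ratio refers to a one-unit increase in log(poll).

```r
# Exposure row of results$output, copied from the printed table above.
est <- 0.3619278
lcl <- -0.07385026
ucl <- 0.79770585

# Exponentiate the log-odds estimate and its 95% confidence limits.
odds_ratio <- exp(c(OR = est, LCL.95 = lcl, UCL.95 = ucl))
print(round(odds_ratio, 3))
```

The interval spans 1, which agrees with the p-value of about 0.10 reported for poll_transform_imputed.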

If you want to look at the individual regressions fit on each imputed dataset, use $regression.summaries:

results$regression.summaries

Reference

Boss J, Mukherjee B, Ferguson KK, et al. Estimating outcome-exposure associations when exposure biomarker detection limits vary across batches. Epidemiology. 2019;30(5):746-755. doi:10.1097/EDE.0000000000001052

Contact information

If you would like to report a bug in the code, ask questions, or send requests or suggestions, e-mail Jonathan Boss at [email protected].