## `lodi` and an example dataset

For convenience we have included an example dataset called `toy_data`, which can be loaded by running `data("toy_data")`. Let's look at the first 10 entries of the example dataset.
```r
library(lodi)
data("toy_data")
head(toy_data, n = 10)
#>       id case_cntrl      poll smoking gender batch1  lod
#> 1  13707          1  3.588607       0      1      0 0.65
#> 2  18641          1        NA       0      0      0 0.65
#> 3  27407          1  2.619124       1      0      0 0.65
#> 4  45462          1  7.203193       0      1      1 0.80
#> 5  50357          1  7.336160       1      1      1 0.80
#> 6  59168          1        NA       0      0      0 0.65
#> 7  61477          1  5.136974       0      1      0 0.65
#> 8  76585          1 11.794483       1      1      0 0.65
#> 9  80681          1  1.280289       0      0      1 0.80
#> 10 84391          1  5.480510       1      1      0 0.65
```
`id` corresponds to the study ID and is unimportant for the purposes of this example. `case_cntrl` takes values 0 or 1, where 1 indicates that the subject has the disease of interest and 0 indicates that the subject is a healthy control. `poll` is the environmental exposure of interest, where `NA` indicates that the concentration is below the limit of detection (LOD). `smoking` and `gender` are covariates that we will include in the imputation model. `lod` corresponds to the limit of detection for each individual's batch. Finally, `batch1` takes two values: 1 if the subject's biosample was assayed in batch 1, and 0 if the subject's biosample was assayed in batch 2.
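As a quick sanity check on the coding described above, the sketch below rebuilds only the ten rows printed earlier in base R (so it runs without `lodi`). It confirms that each batch has a single LOD, counts the values coded `NA` in each batch, and checks that every observed concentration lies above its batch's LOD. The object name `toy_head` is hypothetical, and the same summaries on the full `toy_data` will give different counts.

```r
# The ten rows printed by head(toy_data, n = 10), typed in by hand
toy_head <- data.frame(
  id = c(13707, 18641, 27407, 45462, 50357, 59168, 61477, 76585, 80681, 84391),
  case_cntrl = rep(1, 10),
  poll = c(3.588607, NA, 2.619124, 7.203193, 7.336160,
           NA, 5.136974, 11.794483, 1.280289, 5.480510),
  smoking = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 1),
  gender = c(1, 0, 0, 1, 1, 0, 1, 1, 0, 1),
  batch1 = c(0, 0, 0, 1, 1, 0, 0, 0, 1, 0),
  lod = c(0.65, 0.65, 0.65, 0.80, 0.80, 0.65, 0.65, 0.65, 0.80, 0.65)
)

# Distinct LOD value(s) within each batch (batch1 = 0 is batch 2)
lod_by_batch <- tapply(toy_head$lod, toy_head$batch1, unique)
lod_by_batch

# Number of concentrations coded NA (below the LOD) in each batch
na_by_batch <- tapply(is.na(toy_head$poll), toy_head$batch1, sum)
na_by_batch

# Every observed (non-NA) concentration should exceed its own batch's LOD
all(toy_head$poll > toy_head$lod, na.rm = TRUE)
```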
The function that performs censored likelihood multiple imputation is the `clmi` function. For more details, see `help(clmi)`.
```r
clmi.out <- clmi(formula = log(poll) ~ case_cntrl + smoking + gender,
                 df = toy_data, lod = lod, seed = 12345, n.imps = 5)
```
The main input to `clmi` is an R formula. The left-hand side of the formula must be the exposure, and the right-hand side must be the outcome followed by the covariates you want to include in the imputation model. The order of variables on the right-hand side matters: the outcome must come first. You can transform the exposure by applying a univariate function to it, as done above with `log()`. The `lod` argument refers to the name of the LOD variable in your data frame.
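To make the imputation model concrete, here is a minimal base-R sketch of censored likelihood multiple imputation under stated assumptions: the log exposure is normal given the outcome and covariates, `NA` means "below that subject's own LOD", and parameter uncertainty is propagated with a nonparametric bootstrap before each draw. The helper names (`censored_negloglik`, `fit_censored`, `impute_once`) are hypothetical; this is not `lodi`'s internal code, and the package's estimation details may differ.

```r
# Minimal sketch of censored-likelihood multiple imputation (not lodi's code).
# Model: log(exposure) | outcome, covariates ~ Normal(X %*% beta, sigma^2),
# with values below a subject-specific log(LOD) treated as left-censored.

# Negative log-likelihood: normal density for observed values, normal CDF
# at log(LOD) for censored values. sigma is log-parameterized to stay positive.
censored_negloglik <- function(par, y, X, cens, log_lod) {
  p <- ncol(X)
  mu <- drop(X %*% par[1:p])
  sigma <- exp(par[p + 1])
  ll_obs <- dnorm(y[!cens], mu[!cens], sigma, log = TRUE)
  ll_cens <- pnorm(log_lod[cens], mu[cens], sigma, log.p = TRUE)
  -(sum(ll_obs) + sum(ll_cens))
}

# Maximum likelihood fit, starting from least squares on the observed values
fit_censored <- function(y, X, cens, log_lod) {
  start <- c(lm.fit(X[!cens, , drop = FALSE], y[!cens])$coefficients, 0)
  optim(start, censored_negloglik, y = y, X = X, cens = cens,
        log_lod = log_lod, method = "BFGS")$par
}

# One imputation: refit on a bootstrap resample (to reflect parameter
# uncertainty), then draw each censored value from the fitted normal
# truncated to (-Inf, log_lod] using the inverse-CDF method.
impute_once <- function(y, X, cens, log_lod) {
  n <- length(y)
  idx <- sample.int(n, n, replace = TRUE)
  par <- fit_censored(y[idx], X[idx, , drop = FALSE], cens[idx], log_lod[idx])
  p <- ncol(X)
  mu <- drop(X %*% par[1:p])
  sigma <- exp(par[p + 1])
  upper <- pnorm(log_lod[cens], mu[cens], sigma)  # P(value <= LOD) per subject
  u <- runif(sum(cens), min = 0, max = upper)
  y_imp <- y
  y_imp[cens] <- qnorm(u, mu[cens], sigma)
  y_imp
}

# Usage on simulated data with two batch-specific LODs (values borrowed from toy_data)
set.seed(1)
n <- 200
case <- rbinom(n, 1, 0.5)
batch1 <- rbinom(n, 1, 0.5)
lod <- ifelse(batch1 == 1, 0.80, 0.65)
logx <- 0.5 + 0.3 * case + rnorm(n, sd = 0.8)
cens <- logx < log(lod)
y <- ifelse(cens, NA, logx)              # NA = below LOD, as in toy_data
X <- cbind(intercept = 1, case = case)   # intercept + outcome; covariates would be extra columns
imps <- replicate(5, impute_once(y, X, cens, log(lod)))  # n x 5 matrix, one column per imputation
```

Observed values are carried through unchanged, and every imputed value falls at or below its own subject's log LOD, which is how batch-specific detection limits enter the procedure.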
The imputed datasets can be extracted as a list using `clmi.out$imputed.dfs`.
The `pool.clmi` function takes the output generated by the `clmi` function, fits outcome models on each of the imputed datasets, and pools inference across outcome models using Rubin's rules. For details, see `help(pool.clmi)`.
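`pool.clmi` performs both of these steps for you. The base-R sketch below only illustrates them: fit one outcome model per imputed dataset, then combine each coefficient with Rubin's rules (pooled estimate = mean of the M estimates; total variance = mean within-imputation variance + (1 + 1/M) times the between-imputation variance). The helpers `fit_outcome`, `pool_rubin`, and `fit_and_pool` are hypothetical, take `type` as a quoted string rather than the bare `logistic` used by `pool.clmi`, and use Rubin's (1987) classic degrees-of-freedom formula, which may not be the one `lodi` reports.

```r
# Fit one outcome model and return coefficient estimates and standard errors
fit_outcome <- function(df, formula, type = c("logistic", "linear")) {
  type <- match.arg(type)
  fit <- if (type == "logistic") {
    glm(formula, data = df, family = binomial())
  } else {
    lm(formula, data = df)
  }
  cf <- summary(fit)$coefficients
  list(est = cf[, 1], se = cf[, 2])
}

# Rubin's rules; est and se are M x p matrices (rows = imputations)
pool_rubin <- function(est, se) {
  M <- nrow(est)
  qbar <- colMeans(est)                 # pooled point estimates
  W <- colMeans(se^2)                   # within-imputation variance
  B <- apply(est, 2, var)               # between-imputation variance
  total <- W + (1 + 1 / M) * B          # total variance
  r <- (1 + 1 / M) * B / W              # relative increase in variance
  dof <- (M - 1) * (1 + 1 / r)^2        # Rubin (1987) degrees of freedom
  se_p <- sqrt(total)
  t_crit <- qt(0.975, dof)
  data.frame(est = qbar, se = se_p, df = dof,
             LCL.95 = qbar - t_crit * se_p, UCL.95 = qbar + t_crit * se_p)
}

# Fit the same outcome model on every imputed data frame, then pool
fit_and_pool <- function(dfs, formula, type = "logistic") {
  fits <- lapply(dfs, fit_outcome, formula = formula, type = type)
  est <- do.call(rbind, lapply(fits, `[[`, "est"))
  se <- do.call(rbind, lapply(fits, `[[`, "se"))
  pool_rubin(est, se)
}

# Worked check: estimates 1, 2, 3 with SE 1 each give est = 2,
# total variance = 1 + (4/3) * 1 = 7/3, and df = 2 * (1 + 3/4)^2 = 6.125
pooled <- pool_rubin(matrix(c(1, 2, 3), ncol = 1), matrix(c(1, 1, 1), ncol = 1))
pooled
```

When all imputed datasets are identical, the between-imputation variance is zero and the pooled results reduce to those of a single model, which is a handy way to test such a function.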
```r
results <- pool.clmi(formula = case_cntrl ~ poll_transform_imputed + smoking +
                       gender, clmi.out = clmi.out, type = logistic)
```
In `pool.clmi`, `formula` contains the outcome variable on the left-hand side, and the first variable on the right-hand side should be the imputed exposure variable. `clmi` outputs the exposure variable as `((your-exposure))_transform_imputed`. In this example our exposure is `poll`, so the name of the imputed variable is `poll_transform_imputed`.
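The naming convention can also be applied programmatically. This base-R sketch assumes only the `_transform_imputed` suffix described above: it pulls the exposure name out of the `clmi` formula, keeps the outcome (the first right-hand-side term) and the covariates, and builds the matching `pool.clmi` formula. The variable names are illustrative and are not part of `lodi`.

```r
# Illustrative only: derive the pool.clmi formula from the clmi formula
exposure_formula <- log(poll) ~ case_cntrl + smoking + gender

exposure_name <- all.vars(exposure_formula[[2]])      # variable inside log(): "poll"
rhs <- attr(terms(exposure_formula), "term.labels")   # right-hand-side terms, in order
outcome <- rhs[1]                                     # the outcome comes first
imputed_name <- paste0(exposure_name, "_transform_imputed")

# Outcome on the left; imputed exposure first on the right, then the covariates
outcome_formula <- reformulate(c(imputed_name, rhs[-1]), response = outcome)
outcome_formula
```

The result, `case_cntrl ~ poll_transform_imputed + smoking + gender`, matches the formula passed to `pool.clmi` above.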
Note the `type` argument. If you have binary outcome data (as in the current example), use `type = logistic` so that the models fit on the imputed datasets are logistic regression models. If you have continuous outcome data, use `type = linear` so that the models fit on the imputed datasets are linear regression models.

To display the pooled results, use `$output`:
```r
results$output
#>                               est        se       df   p.values      LCL.95
#> (Intercept)            -0.6021156 0.3338019 93.75401 0.07447254 -1.26490980
#> poll_transform_imputed  0.3619278 0.2192230 86.31785 0.10238131 -0.07385026
#> smoking                -0.3245100 0.5245765 93.01341 0.53768319 -1.36621287
#> gender                  0.8611192 0.4904714 93.65164 0.08240975 -0.11277055
#>                            UCL.95
#> (Intercept)            0.06067861
#> poll_transform_imputed 0.79770585
#> smoking                0.71719295
#> gender                 1.83500898
```
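The interval and p-value columns appear to be consistent with a t reference distribution on the pooled degrees of freedom: `est ± qt(0.975, df) * se` and `2 * pt(-abs(est / se), df)`. The base-R check below reproduces them for the `poll_transform_imputed` row, using numbers copied from the output above. Because the outcome model is logistic, exponentiating gives an odds ratio per one-unit increase in the transformed exposure; that the transformation here is `log(poll)` is an assumption carried over from the `clmi` call.

```r
# Values copied from the poll_transform_imputed row of results$output
est <- 0.3619278
se <- 0.2192230
dof <- 86.31785

t_crit <- qt(0.975, dof)                         # t critical value on pooled df
ci <- c(LCL.95 = est - t_crit * se, UCL.95 = est + t_crit * se)
p_value <- 2 * pt(-abs(est / se), dof)           # two-sided t test
ci
p_value

# Odds ratio (per unit of log(poll)) with its 95% interval
exp(c(OR = est, ci))
```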
If you want to look at the individual regressions fit on each imputed dataset, use `$regression.summaries`.
Boss J, Mukherjee B, Ferguson KK, et al. Estimating outcome-exposure associations when exposure biomarker detection limits vary across batches. Epidemiology. 2019;30(5):746-755. doi:10.1097/EDE.0000000000001052
## Contact information
If you would like to report a bug in the code, ask questions, or send requests or suggestions, e-mail Jonathan Boss at [email protected].