---
title: "Censored Likelihood Multiple Imputation in R"
author: "Jonathan Boss"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Censored Likelihood Multiple Imputation in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Loading `lodi` and an example dataset

For convenience we have included a example dataset called `toy_data`, which can
be loaded by running `data("toy_data")`. Let's look at the first 10 entries
of the example dataset.

```{r}
library(lodi)

data("toy_data")

head(toy_data, n = 10)
```

`id` corresponds to the study ID and is unimportant for the purposes of this
example. `case_cntrl` takes values 0 or 1, where 1 indicates that
the subject has the disease of interest and 0 indicates that the subject is a
healthy control. `poll` is the environmental exposure of interest, where `NA`
indicates that the concentration is below the limit of detection (LOD).
`smoking` and `gender` are covariates that we will include in the imputation
model. `lod` corresponds to the limit of detection for each
individual's batch. Finally, `batch1` takes two values; 1 if the subject's
biosample was assayed in batch 1 and 0 if the subject's biosample was assayed
in batch 2.

## Implementing Censored Likelihood Multiple Imputation

The function that performs censored likelihood multiple imputation is the
`clmi` function. For more details see `help(clmi)`.

```{r}
clmi.out <- clmi(formula = log(poll) ~ case_cntrl + smoking + gender,
                   df = toy_data, lod = lod, seed = 12345, n.imps = 5)
```

The main input to `clmi` is a R formula. The left hand side of the formula must
be the exposure, and the right hand side must be the outcome followed by the
covariates you want to include in the imputation model. The order of variables
on the right hand side matters. You can apply a transformation to the exposure
by applying a univariate function to it, as done above. The `lod` argument
refers to the name of the lod variable in your data.frame.

The imputed datasets can be extracted as a list using `$imputed.dfs`:

```{r, eval=FALSE}
extract.imputed.dfs <- clmi.out$imputed.dfs
```
## Fit and pool outcomes models

The `pool.clmi` function takes the output generated by the `clmi` function, fits
outcome models on each of the imputed datasets, and pools inference across
outcome models using Rubin's rules. For details see `help(pool.clmi)`.

```{r}
results <- pool.clmi(formula = case_cntrl ~ poll_transform_imputed + smoking +
                                 gender, clmi.out = clmi.out, type = logistic)
```
In `pool.clmi`, `formula` contains the outcome variable on the left hand side
and the first variable on the right hand side should be the imputed exposure
variable. `clmi` outputs the exposure variable as
`((your-exposure))_transform_imputed`. In this example, our exposure is `poll`,
so the name of the imputed variable is `poll_transform_imputed`.

- Note: There are two valid options for the `type` argument. If you have binary
  outcome data (as in the current example) use `type = logistic` so that the
  model fit on the imputed datasets are logistic regression models. If you have
  continuous outcome data use `regression.type = linear` so that models fit on
  the imputed datasets are linear regression models.

To display the pooled results use `$output`:

```{r}
results$output
```

If you want to look at the individual regressions fit on each imputed dataset
use `$regression.summaries`

```{r, eval=FALSE}
results$regression.summaries
```

## Reference
Boss J, Mukherjee B, Ferguson KK, et al. Estimating outcome-exposure
associations when exposure biomarker detection limits vary across batches.
*Epidemiology*. 2019;30(5):746-755. [10.1097/EDE.0000000000001052](https://doi.org/10.1097/EDE.0000000000001052)

##Contact information

If you would like to report a bug in the code, ask questions, or send
requests/suggestions e-mail Jonathan Boss at `bossjona@umich.edu`.