Commit 652abbd

make vignette available via utils::browseVignettes() command
1 parent 6873651 commit 652abbd

19 files changed: 4004 additions, 3 deletions

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -34,5 +34,4 @@ DoubleML.Rproj
 # documentation files
 docs/
 docs
-doc
 inst/doc

doc/Introduction_to_DoubleML.R

Lines changed: 404 additions & 0 deletions
Large diffs are not rendered by default.

vignettes/Introduction_to_DoubleML/Introduction_to_DoubleML.Rmd renamed to doc/Introduction_to_DoubleML.Rmd

File renamed without changes.

doc/Introduction_to_DoubleML.html

Lines changed: 1529 additions & 0 deletions
Large diffs are not rendered by default.

doc/getstarted.R

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+## ----setup, include=FALSE-----------------------------------------------------
+knitr::opts_chunk$set(echo = TRUE)
+knitr::opts_chunk$set(eval = TRUE)
+
+## ---- eval = FALSE------------------------------------------------------------
+# remotes::install_github("DoubleML/doubleml-for-r")
+
+## ---- message=FALSE, warning=FALSE--------------------------------------------
+library(DoubleML)
+
+## -----------------------------------------------------------------------------
+library(DoubleML)
+
+# Load bonus data
+df_bonus = fetch_bonus(return_type="data.table")
+head(df_bonus)
+
+# Simulate data
+set.seed(3141)
+n_obs = 500
+n_vars = 100
+theta = 3
+X = matrix(rnorm(n_obs*n_vars), nrow=n_obs, ncol=n_vars)
+d = X[,1:3]%*%c(5,5,5) + rnorm(n_obs)
+y = theta*d + X[, 1:3]%*%c(5,5,5) + rnorm(n_obs)
+
+## -----------------------------------------------------------------------------
+# Specify the data and variables for the causal model
+dml_data_bonus = DoubleMLData$new(df_bonus,
+                                  y_col = "inuidur1",
+                                  d_cols = "tg",
+                                  x_cols = c("female", "black", "othrace", "dep1", "dep2",
+                                             "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54",
+                                             "durable", "lusd", "husd"))
+print(dml_data_bonus)
+
+# matrix interface to DoubleMLData
+dml_data_sim = double_ml_data_from_matrix(X=X, y=y, d=d)
+dml_data_sim
+
+## -----------------------------------------------------------------------------
+library(mlr3)
+library(mlr3learners)
+# suppress messages from mlr3 package during fitting
+lgr::get_logger("mlr3")$set_threshold("warn")
+
+learner = lrn("regr.ranger", num.trees=500, mtry=floor(sqrt(n_vars)), max.depth=5, min.node.size=2)
+ml_g_bonus = learner$clone()
+ml_m_bonus = learner$clone()
+
+learner = lrn("regr.glmnet", lambda = sqrt(log(n_vars)/(n_obs)))
+ml_g_sim = learner$clone()
+ml_m_sim = learner$clone()
+
+## -----------------------------------------------------------------------------
+set.seed(3141)
+obj_dml_plr_bonus = DoubleMLPLR$new(dml_data_bonus, ml_g=ml_g_bonus, ml_m=ml_m_bonus)
+obj_dml_plr_bonus$fit()
+print(obj_dml_plr_bonus)
+
+obj_dml_plr_sim = DoubleMLPLR$new(dml_data_sim, ml_g=ml_g_sim, ml_m=ml_m_sim)
+obj_dml_plr_sim$fit()
+print(obj_dml_plr_sim)
+

doc/getstarted.Rmd

Lines changed: 138 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,138 @@
+---
+title: "Getting Started with DoubleML"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Getting Started with DoubleML}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+knitr::opts_chunk$set(eval = TRUE)
+```
+
+The purpose of the following case studies is to demonstrate the core functionality of `DoubleML`.
+
+
+## Installation
+
+The **DoubleML** package for R can be installed from GitHub with the following command (this requires the [`remotes` package](https://remotes.r-lib.org/index.html) to be installed first).
+
+```{r, eval = FALSE}
+remotes::install_github("DoubleML/doubleml-for-r")
+```
+
+Load the package once the installation has completed.
+
+```{r, message=FALSE, warning=FALSE}
+library(DoubleML)
+```
+
+The Python package `DoubleML` is available via the GitHub repository. For more information, please visit our user guide.
+
+## Data
+
+For our case study we download the Bonus data set from the Pennsylvania Reemployment Bonus experiment, and as a second example we simulate data from a partially linear regression model.
+
+```{r}
+library(DoubleML)
+
+# Load bonus data
+df_bonus = fetch_bonus(return_type="data.table")
+head(df_bonus)
+
+# Simulate data
+set.seed(3141)
+n_obs = 500
+n_vars = 100
+theta = 3
+X = matrix(rnorm(n_obs*n_vars), nrow=n_obs, ncol=n_vars)
+d = X[,1:3]%*%c(5,5,5) + rnorm(n_obs)
+y = theta*d + X[, 1:3]%*%c(5,5,5) + rnorm(n_obs)
+```
+
+
+## The causal model
+
+\begin{align*}
+Y = D \theta_0 + g_0(X) + \zeta, & &\mathbb{E}(\zeta | D,X) = 0, \\
+D = m_0(X) + V, & &\mathbb{E}(V | X) = 0,
+\end{align*}
+where $Y$ is the outcome variable and $D$ is the policy variable of interest.
+The high-dimensional vector $X = (X_1, \ldots, X_p)$ consists of other confounding covariates,
+and $\zeta$ and $V$ are stochastic errors.
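As an informal illustration that is not part of the committed vignette, the following base R sketch simulates from a low-dimensional, linear special case of the model above and recovers $\theta_0$ by partialling $X$ out of both $Y$ and $D$. The linear forms chosen for $g_0$ and $m_0$, the five covariates and the use of `lm()` are assumptions made for brevity; `DoubleML` itself relies on machine learners and cross-fitting instead.

```r
set.seed(3141)
n_obs <- 500
theta_0 <- 3

# Assumed (illustrative) nuisance functions: both linear in the first three covariates
X <- matrix(rnorm(n_obs * 5), nrow = n_obs)
g_0 <- as.numeric(X[, 1:3] %*% c(5, 5, 5))
m_0 <- as.numeric(X[, 1:3] %*% c(5, 5, 5))

# D = m_0(X) + V and Y = D * theta_0 + g_0(X) + zeta
d <- m_0 + rnorm(n_obs)
y <- theta_0 * d + g_0 + rnorm(n_obs)

# Partial X out of Y and D, then regress residual on residual (no intercept)
res_y <- resid(lm(y ~ X))
res_d <- resid(lm(d ~ X))
theta_hat <- unname(coef(lm(res_y ~ res_d - 1)))
print(theta_hat)
```

With these settings the estimate should land close to the true value of 3; the exact number depends on the random draws.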
+
+## The data-backend `DoubleMLData`
+
+`DoubleML` provides interfaces to objects of class [`data.table`](https://rdatatable.gitlab.io/data.table/) as well as the base R classes `data.frame` and `matrix`. Details on the data-backend and the interfaces can be found in the user guide. The `DoubleMLData` class serves as the data-backend and can be initialized from a data frame by specifying the column `y_col="inuidur1"` serving as the outcome variable $Y$, the column(s) `d_cols = "tg"` serving as the treatment variable $D$, and the columns `x_cols=c("female", "black", "othrace", "dep1", "dep2", "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54", "durable", "lusd", "husd")` specifying the confounders. Alternatively, a matrix interface can be used, as shown below for the simulated data.
+
+
+```{r}
+# Specify the data and variables for the causal model
+dml_data_bonus = DoubleMLData$new(df_bonus,
+                                  y_col = "inuidur1",
+                                  d_cols = "tg",
+                                  x_cols = c("female", "black", "othrace", "dep1", "dep2",
+                                             "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54",
+                                             "durable", "lusd", "husd"))
+print(dml_data_bonus)
+
+# matrix interface to DoubleMLData
+dml_data_sim = double_ml_data_from_matrix(X=X, y=y, d=d)
+dml_data_sim
+```
+
+
+## Learners to estimate the nuisance models
+
+To estimate our partially linear regression (PLR) model with the double machine learning algorithm, we first have to specify machine learners to estimate $m_0$ and $g_0$. For the bonus data we use a random forest regression model, and for our simulated data from a sparse partially linear model we use a Lasso regression model. The implementation of `DoubleML` is based on the meta-package [mlr3](https://mlr3.mlr-org.com/) for R. For details on the specification of learners and their hyperparameters, we refer to the user guide section "Learners, hyperparameters and hyperparameter tuning".
+
+```{r}
+library(mlr3)
+library(mlr3learners)
+# suppress messages from mlr3 package during fitting
+lgr::get_logger("mlr3")$set_threshold("warn")
+
+learner = lrn("regr.ranger", num.trees=500, mtry=floor(sqrt(n_vars)), max.depth=5, min.node.size=2)
+ml_g_bonus = learner$clone()
+ml_m_bonus = learner$clone()
+
+learner = lrn("regr.glmnet", lambda = sqrt(log(n_vars)/(n_obs)))
+ml_g_sim = learner$clone()
+ml_m_sim = learner$clone()
+```
+
+
+## Cross-fitting, DML algorithms and Neyman-orthogonal score functions
+
+When initializing the object for PLR models, `DoubleMLPLR`, we can further set parameters specifying the resampling:
+
+* The number of folds used for cross-fitting `n_folds` (defaults to `n_folds = 5`) as well as
+* the number of repetitions when applying repeated cross-fitting `n_rep` (defaults to `n_rep = 1`).
+
+Additionally, one can choose between the algorithms `"dml1"` and `"dml2"` via `dml_procedure` (defaults to `"dml2"`). Depending on the causal model, one can further choose between different Neyman-orthogonal score / moment functions. For the PLR model the default score is `"partialling out"`.
+
+The user guide provides details in the sections "Sample-splitting, cross-fitting and repeated cross-fitting", "Double machine learning algorithms" and "Score functions".
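To make the roles of `n_folds` and of the two algorithms more concrete, here is a minimal base R sketch of cross-fitting for the partialling-out score. It is an illustration under stated assumptions, not `DoubleML`'s internal code: `lm()` stands in for the mlr3 learners, the data are a small linear simulation, and the names `fold_id`, `u_hat`, `v_hat`, `theta_dml1` and `theta_dml2` are hypothetical. Repeated cross-fitting (`n_rep`) is not shown.

```r
set.seed(3141)
n_obs <- 500
n_folds <- 5

# Small simulated PLR data set (assumed linear nuisance functions, true theta = 3)
X <- matrix(rnorm(n_obs * 5), nrow = n_obs)
d <- as.numeric(X[, 1:3] %*% c(5, 5, 5) + rnorm(n_obs))
y <- as.numeric(3 * d + X[, 1:3] %*% c(5, 5, 5) + rnorm(n_obs))
df <- data.frame(y = y, d = d, X)

# Randomly assign each observation to one of n_folds folds
fold_id <- sample(rep(seq_len(n_folds), length.out = n_obs))

# Cross-fitting: fit nuisance models on the other folds, predict on the held-out fold
u_hat <- numeric(n_obs)  # residuals Y - E[Y | X]
v_hat <- numeric(n_obs)  # residuals D - E[D | X]
for (k in seq_len(n_folds)) {
  test <- fold_id == k
  fit_y <- lm(y ~ . - d, data = df[!test, ])
  fit_d <- lm(d ~ . - y, data = df[!test, ])
  u_hat[test] <- df$y[test] - predict(fit_y, newdata = df[test, ])
  v_hat[test] <- df$d[test] - predict(fit_d, newdata = df[test, ])
}

# "dml2": solve the score equation once on the pooled out-of-fold residuals
theta_dml2 <- sum(v_hat * u_hat) / sum(v_hat^2)

# "dml1": solve the score equation per fold, then average the fold estimates
theta_dml1 <- mean(sapply(seq_len(n_folds), function(k) {
  idx <- fold_id == k
  sum(v_hat[idx] * u_hat[idx]) / sum(v_hat[idx]^2)
}))

print(c(dml1 = theta_dml1, dml2 = theta_dml2))
```

In this simple setting the two estimates should be very close to each other; they can differ more noticeably when folds are small or the learners are unstable.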
+
+
+## Estimate double/debiased machine learning models
+
+We now initialize `DoubleMLPLR` objects for our examples using default parameters. The models are estimated by calling the `fit()` method, and we can, for example, inspect the estimated treatment effect using the `summary()` method. A more detailed result summary can be obtained via the `print()` method. Besides the `fit()` method, `DoubleML` model classes also provide functionality to perform statistical inference, such as `bootstrap()`, `confint()` and `p_adjust()`; for details, see the user guide section "Variance estimation, confidence intervals and bootstrap standard errors".
+
+```{r}
+set.seed(3141)
+obj_dml_plr_bonus = DoubleMLPLR$new(dml_data_bonus, ml_g=ml_g_bonus, ml_m=ml_m_bonus)
+obj_dml_plr_bonus$fit()
+print(obj_dml_plr_bonus)
+
+obj_dml_plr_sim = DoubleMLPLR$new(dml_data_sim, ml_g=ml_g_sim, ml_m=ml_m_sim)
+obj_dml_plr_sim$fit()
+print(obj_dml_plr_sim)
+```
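For intuition about what the inference methods report, the following base R sketch computes a standard error, a 95% confidence interval and a p-value from the partialling-out score by hand. It is a simplified illustration and not the package's implementation: `lm.fit()` replaces the mlr3 learners, the data are a small linear simulation, and the helper `cross_fit_resid()` is hypothetical.

```r
set.seed(3141)
n_obs <- 500

# Small simulated PLR data set (assumed linear nuisance functions, true theta = 3)
X <- matrix(rnorm(n_obs * 5), nrow = n_obs)
d <- as.numeric(X[, 1:3] %*% c(5, 5, 5) + rnorm(n_obs))
y <- as.numeric(3 * d + X[, 1:3] %*% c(5, 5, 5) + rnorm(n_obs))

# Hypothetical helper: out-of-fold residuals of `target` on X via least squares
cross_fit_resid <- function(target, X, fold_id) {
  res <- numeric(length(target))
  for (k in unique(fold_id)) {
    test <- fold_id == k
    fit <- lm.fit(cbind(1, X[!test, , drop = FALSE]), target[!test])
    pred <- as.numeric(cbind(1, X[test, , drop = FALSE]) %*% fit$coefficients)
    res[test] <- target[test] - pred
  }
  res
}

fold_id <- sample(rep(1:5, length.out = n_obs))
u_hat <- cross_fit_resid(y, X, fold_id)
v_hat <- cross_fit_resid(d, X, fold_id)

# Linear score psi = psi_a * theta + psi_b for the partialling-out approach
psi_a <- -v_hat^2
psi_b <- v_hat * u_hat
theta_hat <- -mean(psi_b) / mean(psi_a)

# Plug-in variance estimate based on the score evaluated at theta_hat
psi <- psi_a * theta_hat + psi_b
se <- sqrt(mean(psi^2) / mean(psi_a)^2 / n_obs)

# Normal-approximation confidence interval and two-sided p-value
ci <- theta_hat + c(-1, 1) * qnorm(0.975) * se
p_value <- 2 * pnorm(-abs(theta_hat / se))

print(c(estimate = theta_hat, se = se, lower = ci[1], upper = ci[2], p = p_value))
```

The multiplier bootstrap offered by `bootstrap()` builds on the same score values but is not reproduced here.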
+
+
+
+
+
+
