DoubleML
diff --git a/‎.gitignore‎
Lines changed: 0 additions & 1 deletion b/‎.gitignore‎
Lines changed: 0 additions & 1 deletion
diff --git a/‎doc/Introduction_to_DoubleML.R‎
Lines changed: 404 additions & 0 deletions b/‎doc/Introduction_to_DoubleML.R‎
Lines changed: 404 additions & 0 deletions
diff --git a/‎…to_DoubleML/Introduction_to_DoubleML.Rmd‎ ‎doc/Introduction_to_DoubleML.Rmd‎vignettes/Introduction_to_DoubleML/Introduction_to_DoubleML.Rmd renamed to doc/Introduction_to_DoubleML.Rmd b/‎…to_DoubleML/Introduction_to_DoubleML.Rmd‎ ‎doc/Introduction_to_DoubleML.Rmd‎vignettes/Introduction_to_DoubleML/Introduction_to_DoubleML.Rmd renamed to doc/Introduction_to_DoubleML.Rmd
diff --git a/‎doc/Introduction_to_DoubleML.html‎
Lines changed: 1529 additions & 0 deletions b/‎doc/Introduction_to_DoubleML.html‎
Lines changed: 1529 additions & 0 deletions
diff --git a/‎doc/getstarted.R‎
Lines changed: 64 additions & 0 deletions b/‎doc/getstarted.R‎
Lines changed: 64 additions & 0 deletions
diff --git a/‎doc/getstarted.Rmd‎
Lines changed: 138 additions & 0 deletions b/‎doc/getstarted.Rmd‎
Lines changed: 138 additions & 0 deletions
@@ -34,5 +34,4 @@ DoubleML.Rproj
 # documentation files
 docs/
 docs
-doc
 inst/doc
@@ -0,0 +1,64 @@
+## ----setup, include=FALSE-----------------------------------------------------
+knitr::opts_chunk$set(echo = TRUE)
+knitr::opts_chunk$set(eval = TRUE)
+
+## ---- eval = FALSE------------------------------------------------------------
+#  remotes::install_github("DoubleML/doubleml-for-r")
+
+## ---- message=FALSE, warning=FALSE--------------------------------------------
+library(DoubleML)
+
+## -----------------------------------------------------------------------------
+library(DoubleML)
+
+# Load bonus data
+df_bonus = fetch_bonus(return_type="data.table")
+head(df_bonus)
+
+# Simulate data
+set.seed(3141)
+n_obs = 500
+n_vars = 100
+theta = 3
+X = matrix(rnorm(n_obs*n_vars), nrow=n_obs, ncol=n_vars)
+d = X[,1:3]%*%c(5,5,5) + rnorm(n_obs)
+y = theta*d + X[, 1:3]%*%c(5,5,5) + rnorm(n_obs)
+
+## -----------------------------------------------------------------------------
+# Specify the data and variables for the causal model
+dml_data_bonus = DoubleMLData$new(df_bonus,
+                             y_col = "inuidur1",
+                             d_cols = "tg",
+                             x_cols = c("female", "black", "othrace", "dep1", "dep2",
+                                        "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54",
+                                          "durable", "lusd", "husd"))
+print(dml_data_bonus)
+
+# matrix interface to DoubleMLData
+dml_data_sim = double_ml_data_from_matrix(X=X, y=y, d=d)
+dml_data_sim
+
+## -----------------------------------------------------------------------------
+library(mlr3)
+library(mlr3learners)
+# surpress messages from mlr3 package during fitting
+lgr::get_logger("mlr3")$set_threshold("warn")
+
+learner = lrn("regr.ranger", num.trees=500, mtry=floor(sqrt(n_vars)), max.depth=5, min.node.size=2)
+ml_g_bonus = learner$clone()
+ml_m_bonus = learner$clone()
+
+learner = lrn("regr.glmnet", lambda = sqrt(log(n_vars)/(n_obs)))
+ml_g_sim = learner$clone()
+ml_m_sim = learner$clone()
+
+## -----------------------------------------------------------------------------
+set.seed(3141)
+obj_dml_plr_bonus = DoubleMLPLR$new(dml_data_bonus, ml_g=ml_g_bonus, ml_m=ml_m_bonus)
+obj_dml_plr_bonus$fit()
+print(obj_dml_plr_bonus)
+
+obj_dml_plr_sim = DoubleMLPLR$new(dml_data_sim, ml_g=ml_g_sim, ml_m=ml_m_sim)
+obj_dml_plr_sim$fit()
+print(obj_dml_plr_sim)
+
@@ -0,0 +1,138 @@
+---
+title: "Getting Started with DoubleML"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Getting Started with DoubleML}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+knitr::opts_chunk$set(eval = TRUE)
+```
+
+The purpose of the following case-studies is to demonstrate the core functionalities of `DoubleML`.
+
+
+## Installation 
+
+The **DoubleML** package for R can be downloaded using (requires previous installation of the [`remotes` package](https://remotes.r-lib.org/index.html)).
+
+```{r, eval = FALSE}
+remotes::install_github("DoubleML/doubleml-for-r")
+```
+
+Load the package after completed installation. 
+
+```{r, message=FALSE, warning=FALSE}
+library(DoubleML)
+```
+
+The python package `DoubleML` is available via the github repository. For more information, please visit our user guide. 
+
+## Data
+
+For our case study we download the Bonus data set from the Pennsylvania Reemployment Bonus experiment and as a second example we simulate data from a partially linear regression model.
+
+```{r}
+library(DoubleML)
+
+# Load bonus data
+df_bonus = fetch_bonus(return_type="data.table")
+head(df_bonus)
+
+# Simulate data
+set.seed(3141)
+n_obs = 500
+n_vars = 100
+theta = 3
+X = matrix(rnorm(n_obs*n_vars), nrow=n_obs, ncol=n_vars)
+d = X[,1:3]%*%c(5,5,5) + rnorm(n_obs)
+y = theta*d + X[, 1:3]%*%c(5,5,5) + rnorm(n_obs)
+```
+
+
+## The causal model
+
+\begin{align*}
+Y = D \theta_0 + g_0(X) + \zeta, & &\mathbb{E}(\zeta | D,X) = 0, \\
+D = m_0(X) + V, & &\mathbb{E}(V | X) = 0,
+\end{align*}
+where $Y$ is the outcome variable and $D$ is the policy variable of interest.
+The high-dimensional vector $X = (X_1, \ldots, X_p)$ consists of other confounding covariates,
+and $\zeta$ and $V$ are stochastic errors.
+
+## The data-backend `DoubleMLData`
+
+`DoubleML` provides interfaces to objects of class [`data.table`](https://rdatatable.gitlab.io/data.table/) as well as R base classes `data.frame` and `matrix`. Details on the data-backend and the interfaces can be found in the user guide. The `DoubleMLData` class serves as data-backend and can be initialized from a dataframe by specifying the column `y_col="inuidur1"` serving as outcome variable $Y$, the column(s) `d_cols = "tg"` serving as treatment variable $D$ and the columns `x_cols=c("female", "black", "othrace", "dep1", "dep2", "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54", "durable", "lusd", "husd")` specifying the confounders. Alternatively a matrix interface can be used as shown below for the simulated data.
+
+
+```{r}
+# Specify the data and variables for the causal model
+dml_data_bonus = DoubleMLData$new(df_bonus,
+                             y_col = "inuidur1",
+                             d_cols = "tg",
+                             x_cols = c("female", "black", "othrace", "dep1", "dep2",
+                                        "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54",
+                                          "durable", "lusd", "husd"))
+print(dml_data_bonus)
+
+# matrix interface to DoubleMLData
+dml_data_sim = double_ml_data_from_matrix(X=X, y=y, d=d)
+dml_data_sim
+```
+
+
+## Learners to estimate the nuisance models
+
+To estimate our partially linear regression (PLR) model with the double machine learning algorithm, we first have to specify machine learners to estimate $m_0$ and $g_0$. For the bonus data we use a random forest regression model and for our simulated data from a sparse partially linear model we use a Lasso regression model. The implementation of `DoubleML` is based on the meta-packages [mlr3](https://mlr3.mlr-org.com/) for R. For details on the specification of learners and their hyperparameters we refer to the user guide Learners, hyperparameters and hyperparameter tuning.
+
+```{r}
+library(mlr3)
+library(mlr3learners)
+# surpress messages from mlr3 package during fitting
+lgr::get_logger("mlr3")$set_threshold("warn")
+
+learner = lrn("regr.ranger", num.trees=500, mtry=floor(sqrt(n_vars)), max.depth=5, min.node.size=2)
+ml_g_bonus = learner$clone()
+ml_m_bonus = learner$clone()
+
+learner = lrn("regr.glmnet", lambda = sqrt(log(n_vars)/(n_obs)))
+ml_g_sim = learner$clone()
+ml_m_sim = learner$clone()
+```
+
+
+## Cross-fitting, DML algorithms and Neyman-orthogonal score functions
+
+When initializing the object for PLR models `DoubleMLPLR`, we can further set parameters specifying the resampling: 
+
+* The number of folds used for cross-fitting `n_folds` (defaults to `n_folds = 5`) as well as 
+* the number of repetitions when applying repeated cross-fitting `n_rep` (defaults to `n_rep = 1`). 
+
+Additionally, one can choose between the algorithms `"dml1"` and `"dml2"` via `dml_procedure` (defaults to `"dml2"`). Depending on the causal model, one can further choose between different Neyman-orthogonal score / moment functions. For the PLR model the default score is `"partialling out"`.
+
+The user guide provides details about the Sample-splitting, cross-fitting and repeated cross-fitting, the Double machine learning algorithms and the Score functions
+
+
+## Estimate double/debiased machine learning models
+
+We now initialize `DoubleMLPLR` objects for our examples using default parameters. The models are estimated by calling the `fit()` method and we can for example inspect the estimated treatment effect using the `summary()` method. A more detailed result summary can be obtained via the `print()` method. Besides the `fit()` method `DoubleML` model classes also provide functionalities to perform statistical inference like `bootstrap()`, `confint()` and `p_adjust()`, for details see the user guide Variance estimation, confidence intervals and boostrap standard errors.
+
+```{r}
+set.seed(3141)
+obj_dml_plr_bonus = DoubleMLPLR$new(dml_data_bonus, ml_g=ml_g_bonus, ml_m=ml_m_bonus)
+obj_dml_plr_bonus$fit()
+print(obj_dml_plr_bonus)
+
+obj_dml_plr_sim = DoubleMLPLR$new(dml_data_sim, ml_g=ml_g_sim, ml_m=ml_m_sim)
+obj_dml_plr_sim$fit()
+print(obj_dml_plr_sim)
+```
+
+
+
+
+
+