Title: | Assign Treatments, Power Calculations, Balances, Impact Evaluation of Experiments |
---|---|
Description: | Assists in the whole process of designing and evaluating Randomized Control Trials. Robust treatment assignment by strata/blocks, that handles misfits; Power calculations of the minimum detectable treatment effect or minimum populations; Balance tables of T-test of covariates; Balance Regression: (treatment ~ all x variables) with F-test of null model; Impact_evaluation: Impact evaluation regressions. This function gives you the option to include control_vars, fixed effect variables, cluster variables (for robust SE), multiple endogenous variables and multiple heterogeneous variables (to test treatment effect heterogeneity) summary_statistics: Function that creates a summary statistics table with statistics rank observations in n groups: Creates a factor variable with n groups. Each group has a min and max label attach to each category. Athey, Susan, and Guido W. Imbens (2017) <arXiv:1607.00698>. |
Authors: | Isidoro Garcia-Urquieta [aut, cre] |
Maintainer: | Isidoro Garcia-Urquieta <[email protected]> |
License: | GPL-2 |
Version: | 1.2 |
Built: | 2025-02-15 04:10:17 UTC |
Source: | https://github.com/isidorogu/rct |
balance_regression() Runs a LPM of treatment status against all covariates (treatment~X'B).
balance_regression(data, treatment)
balance_regression(data, treatment)
data |
A data.frame, tibble or data.table |
treatment |
a string with treatment status column |
This functions runs a Linear Probability model of each treatment group & control on all the columns in data. For instance, if treatment column has values of (0,1,2), balance_regression will run two models: 1) LPM(treatment(0,1)~X'b) and 2) LPM(treatment(0,2)~X'b). The value are the regression tables and details of the F_test of these models.
A list: regression_tables = regression output of treatment against all covariates, F_test = table with the F tests of each regression
data <-data.frame(x1 = rnorm(n = 100, mean = 100, sd = 15), x2= rnorm(n = 100, mean = 65), treat = rep(c(0,1,2,3,4), each = 20)) balance_regression(data = data, treatment = "treat")
data <-data.frame(x1 = rnorm(n = 100, mean = 100, sd = 15), x2= rnorm(n = 100, mean = 65), treat = rep(c(0,1,2,3,4), each = 20)) balance_regression(data = data, treatment = "treat")
Creates balance table for the X variables across treatment status
balance_table(data, treatment)
balance_table(data, treatment)
data |
A data.frame, tibble or data.table |
treatment |
a string with treatment status column |
balance_table() performs t.test(X~treatment) for each X column in data. Every value of treatment i.e 1,2,3,...N is compared against control value (0) or the first value of the treatment column. For instance, If treatment column has values of (0,1,2,3), balance_table will return: the mean value of each treatment (for all X's), and the p_values of the t.test of (1,2,3) against treatment = 0.
A tibble with Mean_value of each treatment status and p_values
data <-data.frame(x1 = rnorm(n = 100, mean = 100, sd = 15), x2=rnorm(n = 100, mean = 65), treatment = rep(c(0,1,2,3,4), each = 20)) balance_table(data, "treatment")
data <-data.frame(x1 = rnorm(n = 100, mean = 100, sd = 15), x2=rnorm(n = 100, mean = 65), treatment = rep(c(0,1,2,3,4), each = 20)) balance_table(data, "treatment")
Impact Evaluation of Treatment Effects
impact_eval( data, endogenous_vars, treatment, heterogenous_vars, fixed_effect_vars = NULL, control_vars, cluster_var )
impact_eval( data, endogenous_vars, treatment, heterogenous_vars, fixed_effect_vars = NULL, control_vars, cluster_var )
data |
A data.frame, tibble or data.table |
endogenous_vars |
Vector of Y's on which treatment effects will be evaluated |
treatment |
Variable indicating the treatment status |
heterogenous_vars |
Vector of variables for which you wish to assess treatment distributions/heterogeneities. |
fixed_effect_vars |
Vector of variables to add as fixed effects. Default is without fixed effects |
control_vars |
Vector of variables to control for in the evaluation. Default is without controls |
cluster_var |
String of cluster variable the standard errors. Default is without clustered std errors |
This function carries out the evaluation of treatment effects on endogenous variables. It automatically runs the regressions of all the endogenous_vars supplied & all the combinations of endogenous_vars and heterogenous_vars. Additionally, the function has the option of include fixed_effects, controls and cluster variables for clustered std errors.
impact_eval() returns a list of regression tables. The names of the list are the same as the endogenous variables. for heterogeneities the names are endogenous_var_heterogenous_var
data <- data.frame(y_1 = rnorm(n = 100, mean = 100, sd = 15), y_2 = rnorm(n = 100, mean = 8, sd = 2), treat = rep(c(0,1,2,3), each = 25), heterogenous_var1 = rep(c("X_Q1", "X_Q2", "X_Q3", "X_Q4"), times = 25), cluster_var1 = rep(c(1:5), times = 20), fixed_effect_var1 = rep(c(1,2), times = 50), control_var1 = rnorm(n = 100, mean = 20, sd = 1)) evaluation<-impact_eval(data = data, endogenous_vars = c("y_1", "y_2"), treatment = "treat", heterogenous_vars = c("heterogenous_var1"), cluster_var = "cluster_var1", fixed_effect_vars = c("fixed_effect_var1"), control_vars = c("control_var1"))
data <- data.frame(y_1 = rnorm(n = 100, mean = 100, sd = 15), y_2 = rnorm(n = 100, mean = 8, sd = 2), treat = rep(c(0,1,2,3), each = 25), heterogenous_var1 = rep(c("X_Q1", "X_Q2", "X_Q3", "X_Q4"), times = 25), cluster_var1 = rep(c(1:5), times = 20), fixed_effect_var1 = rep(c(1,2), times = 50), control_var1 = rnorm(n = 100, mean = 20, sd = 1)) evaluation<-impact_eval(data = data, endogenous_vars = c("y_1", "y_2"), treatment = "treat", heterogenous_vars = c("heterogenous_var1"), cluster_var = "cluster_var1", fixed_effect_vars = c("fixed_effect_var1"), control_vars = c("control_var1"))
N_min() computes the minimum population needed to detect difference between control group and each treatment, given a target minimum detectable effect
N_min( outcome_var, tau_min, power = 0.8, significance = 0.05, share_control, n_groups = 2 )
N_min( outcome_var, tau_min, power = 0.8, significance = 0.05, share_control, n_groups = 2 )
outcome_var |
the variable for which you wish to test the impact of treatment |
tau_min |
the target detectable effect (in outcome_var units) |
power |
The level of power of the test (1 - Pr(Reject H_0 | H_0 True) ). Default is 0.8 |
significance |
The level of significance of the test Pr(Reject H_0 | H_0 False). Default is 0.05 |
share_control |
The share of observations in N assigned to control. This argument allows for sequences (i.e. seq(0,1,0.1)) |
n_groups |
Number of groups (control + # treatment groups) |
This function calculates the minimum experiment's population needed in order to detect at least a difference of tau_min statistically significantly. This is between any two given groups (e.g. control vs each treatment), given the outcome variable, power and significance
A tibble with the share_control and N observations in control group (N_control), the share and N of each treatment c(share_ti, N_ti), total share of treatment rows and N treated (share_treat, N_treat), N, the minimum detectable difference between control and all treatments together (tau_min_global), the minimum detectable difference between control and each treatment (tau_min_each_treat)
data <- data.frame(y_1 = rbinom(n = 100, size = 1, prob = 0.3), y_2 = rnorm(n = 100, mean = 8, sd = 2)) N_min(data$y_1, tau_min = 0.01, share_control = seq(0,1,0.1), n_groups = 3)
data <- data.frame(y_1 = rbinom(n = 100, size = 1, prob = 0.3), y_2 = rnorm(n = 100, mean = 8, sd = 2)) N_min(data$y_1, tau_min = 0.01, share_control = seq(0,1,0.1), n_groups = 3)
ntile_label() ranks observations in n groups, with labels
ntile_label(var, n, digits = 0)
ntile_label(var, n, digits = 0)
var |
The variable wished to be ntile_label |
n |
rank the variable in n groups |
digits |
How many digits to include in the label |
n_tile_label is very similar to ntile from dplyr. But n_tile_label creates the n groups and then labels them. For each group i, the value of the ntile_label is [min(i) - max(i)].
A ordered factor vector of each n group. The value has the form of [min(n_i) - max(n_i)]
data <- data.frame(y_1 = rbinom(n = 100, size = 1, prob = 0.3), y_2 = rnorm(n = 100, mean = 8, sd = 2)) data$y_1_2 <- ntile_label(data$y_1, n = 2, digits = 0) data$y_2_4 <- ntile_label(data$y_2, n = 4, digits = 1)
data <- data.frame(y_1 = rbinom(n = 100, size = 1, prob = 0.3), y_2 = rnorm(n = 100, mean = 8, sd = 2)) data$y_1_2 <- ntile_label(data$y_1, n = 2, digits = 0) data$y_2_4 <- ntile_label(data$y_2, n = 4, digits = 1)
RCT provides three important group of functions: a) functions for pre-processing the design of the RCT b) Functions for assigning treatment status and checking for balances c) Function for evaluating the impact of the RCT
RCT helps you focus on the statistics of the randomized control trials, rather than the heavy programming lifting. RCT helps you in the whole process of designing and evaluating a RCT. 1. Clean and summarise the data in which you want to randomly assign treatment 2. Decide the share of observations that will go to control group 3. Decide which variables to use for strata building 4. Robust Random Assignment by strata/blocks 5 Impact evaluation of all y's and heterogeneities To lean more about RCT, start with the vignette: browseVignettes(package = "RCT")
treatment_assign: Robust treatment assign by strata/blocks
impact_eval: Automatized impact evaluation with heterogeneous treatment effects
balance_table: Balance tables for any length of covariates
balance_regression: LPM of treatment status against covariates with F-test
tau_min: Computation of the minimum detectable effect between control & treatment units
tau_min_probability: Computation of the minimum detectable effect between control & treatment units for dichotomous y-vars
summary_statistics: Summary statistics of all numeric columns in your data
ntile_label: Rank and divide observations in n groups, with label
Isidoro Garcia Urquieta, [email protected]
Athey, Susan, and Guido W. Imbens (2017) "The Econometrics Randomized Experiments". Handbook of economic field experiments. https://arxiv.org/abs/1607.00698
Useful links: https://github.com/isidorogu/RCT Report bugs at https://github.com/isidorogu/RCT/issues
summary_statistics() Creates summary statistics table of all numeric variables in data
summary_statistics( data, probs = c(0, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 1), na.rm = T )
summary_statistics( data, probs = c(0, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 1), na.rm = T )
data |
A data.frame, tibble or data.table |
probs |
The quantiles to compute. Default is c(0, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 1) |
na.rm |
whether to exclude NA's from calculations |
This function computes the selected quantiles, mean and N values of all the numeric columns of data.
A tibble with the Mean, N (not NA) and probs selects for each numeric column
data <-data.frame(x = c(1:5), y = c(100, 200, 300, 410, 540), z = rep("c", 5)) summary_statistics(data)
data <-data.frame(x = c(1:5), y = c(100, 200, 300, 410, 540), z = rep("c", 5)) summary_statistics(data)
tau_min() computes the minimum detectable difference between control group and each treatment
tau_min( outcome_var, N, power = 0.8, significance = 0.05, share_control, n_groups = 2 )
tau_min( outcome_var, N, power = 0.8, significance = 0.05, share_control, n_groups = 2 )
outcome_var |
the variable for which you wish to test the impact of treatment |
N |
number of observations in the RCT, usually nrow(data) |
power |
The level of power of the test (1 - Pr(Reject H_0 | H_0 True) ). Default is 0.8 |
significance |
The level of significance of the test Pr(Reject H_0 | H_0 False). Default is 0.05 |
share_control |
The share of observations in N assigned to control. This argument allows for sequences (i.e. seq(0,1,0.1)) |
n_groups |
Number of groups (control + # treatment groups) |
This function calculates the minimum difference that could show significant E[Y(1)-Y(0)] = tau, between any two given groups (e.g. control vs each treatment), given the population size (N), the outcome variable, power and significance
A tibble with the share_control and N observations in control group (N_control), the share and N of each treatment c(share_ti, N_ti), total share of treatment rows and N treated (share_treat, N_treat), N, the minimum detectable difference between control and all treatments together (tau_min_global), the minimum detectable difference between control and each treatment (tau_min_each_treat)
data <- data.frame(y_1 = rbinom(n = 100, size = 1, prob = 0.3), y_2 = rnorm(n = 100, mean = 8, sd = 2)) tau_min(data$y_1, N = nrow(data), share_control = seq(0,1,0.1), n_groups = 3)
data <- data.frame(y_1 = rbinom(n = 100, size = 1, prob = 0.3), y_2 = rnorm(n = 100, mean = 8, sd = 2)) tau_min(data$y_1, N = nrow(data), share_control = seq(0,1,0.1), n_groups = 3)
tau_min_probability() computes the minimum detectable difference between control group and each treatment for a dichotomous variable
tau_min_probability( prior, N, power = 0.8, significance = 0.05, share_control, n_groups = 2 )
tau_min_probability( prior, N, power = 0.8, significance = 0.05, share_control, n_groups = 2 )
prior |
Pr(Y=1). |
N |
number of observations in the RCT, usually nrow(data) |
power |
The level of power of the test (1 - Pr(Reject H_0 | H_0 True) ). Default is 0.8 |
significance |
The level of significance of the test Pr(Reject H_0 | H_0 False). Default is 0.05 |
share_control |
The share of observations in N assigned to control. This argument allows for sequences (i.e. seq(0,1,0.1)) |
n_groups |
Number of groups (control + # treatment groups) |
This function calculates the minimum difference that could show significant Pr[Y(1)-Y(0)] = tau, between any two given groups (e.g. control vs each treatment), given the population size (N), the outcome variable, power and significance
A tibble with the share_control and N observations in control group (N_control), the share and N of each treatment c(share_ti, N_ti), total share of treatment rows and N treated (share_treat, N_treat), N, the minimum detectable difference between control and all treatments together (tau_min_global), the minimum detectable difference between control and each treatment (tau_min_each_treat)
tau_min_probability(0.4, N = 1000, share_control = seq(0,1,0.1), n_groups = 3)
tau_min_probability(0.4, N = 1000, share_control = seq(0,1,0.1), n_groups = 3)
treatment_assign() carries out robust treatment assignment by strata/blocks
treatment_assign( data, share_control, n_t = 2, strata_varlist, missfits = c("global", "NA", "strata"), seed = 1990, share_ti = rep(1/n_t - share_control/n_t, times = n_t), key )
treatment_assign( data, share_control, n_t = 2, strata_varlist, missfits = c("global", "NA", "strata"), seed = 1990, share_ti = rep(1/n_t - share_control/n_t, times = n_t), key )
data |
A data.frame, tibble or data.table |
share_control |
share of the observations assigned to control group |
n_t |
Number of treatments groups |
strata_varlist |
vector of categorical variables to form the strata/blocks for random assignment. Should be in the form of vars(var1, var2, ...) |
missfits |
How to handle the misfits. Default is "global". See Carril (2016) for details. |
seed |
A number used to set.seed(). |
share_ti |
The share of each treatment group. If NULL (Default), each treatment group will have equal share. |
key |
The key identifier column of data. |
This function creates a variable that indicates the treatment status. The random assignment is made by strata/blocks. It can handle equal or unequal treatment shares. Finally, it has three methods available to handle misfits (same as randtreat in STATA): "global": assigning the observations that couldn't be randomly assigned globally, "strata": assigning the observations that couldn't be randomly assigned by strata, "NA": set the the treat observations that couldn't be randomly assigned to NA.
A list: "data" = the data with key, treat, strata, misfit column., "summary_strata" = A summary tibble with the membership of each strata and its size.
data<-data.frame(key = c(1:1000), ing_quartile = rep(c("Q1", "Q2", "Q3", "Q4"), each = 250), age_quartile = rep(c("Q1", "Q2", "Q3", "Q4"), times = 250)) assigment<-treatment_assign(data = data, share_control = 0.1, n_t = 3, strata_varlist = dplyr::vars(ing_quartile, age_quartile), missfits = "strata", seed = 1990, key = "key") table(data$treat, useNA = "ifany") prop.table(table(data$treat, useNA = "ifany"))
data<-data.frame(key = c(1:1000), ing_quartile = rep(c("Q1", "Q2", "Q3", "Q4"), each = 250), age_quartile = rep(c("Q1", "Q2", "Q3", "Q4"), times = 250)) assigment<-treatment_assign(data = data, share_control = 0.1, n_t = 3, strata_varlist = dplyr::vars(ing_quartile, age_quartile), missfits = "strata", seed = 1990, key = "key") table(data$treat, useNA = "ifany") prop.table(table(data$treat, useNA = "ifany"))