*===================================================================* * School of Economics * * Universidad Nacional de La Plata * *-------------------------------------------------------------------* * Testing intrument validity for LATE identification * * from Huber & Mellace (2011) * * Description: bootstrap test for the validity of instrumental * * variables in just identified treatment effect models with * * endogeneity * * Guillermo Cruces Florencia Pinto Dario Tortarolo * * gcruces@cedlas.org fpinto@cedlas.org dtortarolo@cedlas.org * * Date: 19-07-2012 * *===================================================================* capture program drop ivtest program define ivtest, rclass version 8.0 syntax varlist [if] [in], Reps(real) Sec(real) [SAving(passthru)] [replace] [Dots] tokenize `varlist' local y `1' local d `2' local z `3' * Example: * ivtest workedforpay morethan2 twins2 [if exp], reps(1000) dots saving(bs1000) replace * INPUT: * y: outcome * d: binary treatment * z: binary instrument * reps: n_boot1 (number of bootstrap replications for the first bootstrap) * sec: n_boot2 (number of bootstrap replications for the second bootstrap) * binary: indicator whether y is binary * OUTPUT: * fullr.pval: p-value under full recentering * partialr.pval: p-value under partial recentering * stdif.treated: Standardized difference between nearest bound and point estimate for treated (positive values indicate violation of the null) * stdif.nontreated: Standardized difference between nearest bound and point estimate for nontreated (positive values indicate violation of the null) *************************************** /*( IV TEST BASED ON BENNETT (2009) )*/ ************************************* quietly { summ `y' local n = r(N) local sd_y = r(sd) local sqrt_n = sqrt(`n') * Create and indicator whether y is binary (inlist is a function that returns 1 if `y' is coded as 0 or 1; otherwise, returns 0) * Ojo, la variable explicada no puede tener missings tempvar binary gen `binary' = inlist(`y',0,1) summ `d' if `z'==1 local p1_1 = r(mean) local p0_1 = 1 - r(mean) summ `d' if `z'==0 local p1_0 = r(mean) local p0_0 = 1 - r(mean) } * Assign names to local macro that may be used as temporary variable names tempvar z1 z0 y1 y0 y11 y01 y10 y00 unif00 unif11 y11temp y00temp y11min y11max y00min y00max gen `z1' = `z' if `d'==1 gen `z0' = `z' if `d'==0 gen `y1' = `y' if `d'==1 gen `y0' = `y' if `d'==0 gen `y11' = `y1' if `z1'==1 gen `y01' = `y1' if `z1'==0 gen `y10' = `y0' if `z0'==1 gen `y00' = `y0' if `z0'==0 * To generate random variates over the interval [a,b), use a+(b-a)*runiform() gen `unif00' = -0.1+(0.1+0.1)*runiform() if `y00'!=. gen `unif11' = -0.1+(0.1+0.1)*runiform() if `y11'!=. if `binary'==1 { gen `y11temp' = `y11' + `unif11' gen `y00temp' = `y00' + `unif00' } else { gen `y11temp' = `y11' gen `y00temp' = `y00' } * Create local containing an specific quantile to be computed from a variable local prop1 = ( `p1_0' / `p1_1' ) * 100 /* proportion of always takers in eq (1) */ local prop2 = (1 - (`p1_0' / `p1_1')) * 100 /* proportion of compliers in eq (1) */ local prop3 = ( `p0_1' / `p0_0' ) * 100 /* proportion of never takers in eq (3) */ local prop4 = (1 - (`p0_1' / `p0_0')) * 100 /* proportion of compliers in eq (3) */ di in ye "Prop 1 = " `prop1' di in ye "Prop 2 = " `prop2' di in ye "Prop 3 = " `prop3' di in ye "Prop 4 = " `prop4' * Compute two quantiles for each variable -y11temp and y00temp- _pctile `y11temp', percentiles(`prop1') local quant1 = r(r1) _pctile `y11temp', percentiles(`prop2') local quant2 = r(r1) _pctile `y00temp', percentiles(`prop3') local quant3 = r(r1) _pctile `y00temp', percentiles(`prop4') local quant4 = r(r1) * Create qth conditional quantile of the outcome ==> in the paper correspond to y(q), y(1-q), y(r) and y(1-r) gen `y11min' = `y11' if ( `y11temp' <= `quant1' ) gen `y11max' = `y11' if ( `y11temp' >= `quant2' ) gen `y00min' = `y00' if ( `y00temp' <= `quant3' ) gen `y00max' = `y00' if ( `y00temp' >= `quant4' ) * Compute the normalized differences foreach var of varlist `y11' `y01' `y11min' `y11max' `y00' `y10' `y00min' `y00max' { sum `var' local mean_`var' = r(mean) } if `mean_`y11'' > `mean_`y01'' & `mean_`y11''!=. { local mean_y11_y01_1 = 1 } else { local mean_y11_y01_1 = 0 } if `mean_`y11'' < `mean_`y01'' & `mean_`y01''!=. { local mean_y11_y01_2 = 1 } else { local mean_y11_y01_2 = 0 } if `mean_`y00'' > `mean_`y10'' & `mean_`y00''!=. { local mean_y00_y10_1 = 1 } else { local mean_y00_y10_1 = 0 } if `mean_`y00'' < `mean_`y10'' & `mean_`y10''!=. { local mean_y00_y10_2 = 1 } else { local mean_y00_y10_2 = 0 } local normdif_treated = ( (`mean_y11_y01_1'*(`mean_`y11min'' - `mean_`y01'')) - (`mean_y11_y01_2'*(`mean_`y11max'' - `mean_`y01'')) ) / `sd_y' local normdif_nontreated = ( (`mean_y00_y10_1'*(`mean_`y00min'' - `mean_`y10'')) - (`mean_y00_y10_2'*(`mean_`y00max'' - `mean_`y10'')) ) / `sd_y' * El segundo termino de cada producto corresponde a la primer y tercera linea de la Ho del test. /*( The minimum p-value type test proposed by Bennett(2009) demands 2 individual bootstraps, where the second resamples from the distribution of the first bootstrap )*/ * Firts Bootstrap /* First bootstrap Draw B1 bootstrap samples of size n from the original sample Los siguientes son 4 vectores que seran rellenados de las estimaciones de los parametros poblacionales "Theta" Tn1b debera contener las estimaciones de "Theta 1", i.e. la primer fila de la hipotesis nula, de las b = 1,...,B muestras (cada una con n observaciones). A su vez, B = n_boot1 = `reps' number of bootstrap replications. Ver pagina 11 paper. Tn1b = c() Tn2b = c() Tn3b = c() Tn4b = c() */ * Creamos 4 vectores vacios. Con el correr del bootstrap se iran llenando de las estimaciones de los parametros poblacionales THETA * rowsof(M) returns the number of rows of matrix M. forvalues i=1(1)4 { matrix Tn`i'b = (.) } local n_Tn1b = rowsof(Tn1b) tempvar zb db yb z1b z0b y1b y0b y11b y01b y10b y00b unifb00 unifb11 y11tempb y00tempb while `n_Tn1b'<=`reps' { preserve di in red "Bootstrap draw No `n_Tn1b'" * Draw a random sample of size n (size of variable y) with replacement bsample `n' gen `yb'=`y' gen `zb'=`z' gen `db'=`d' summ `yb' local obsb = r(N) summ `db' local obs1b = r(sum) local obs0b = r(N) - r(sum) summ `db' if `zb'==1 local p1_1b = r(mean) local p0_1b = 1 - r(mean) summ `db' if `zb'==0 local p1_0b = r(mean) local p0_0b = 1 - r(mean) * Create local containing an specific quantile local prop1b = ( `p1_0b' / `p1_1b' ) * 100 /* proportion of always takers in eq (1) */ local prop2b = (1 - (`p1_0b' / `p1_1b') ) * 100 /* proportion of compliers in eq (1) */ local prop3b = ( `p0_1b' / `p0_0b' ) * 100 /* proportion of never takers in eq (3) */ local prop4b = (1 - (`p0_1b' / `p0_0b') ) * 100 /* proportion of compliers in eq (3) */ if `prop1b'>0 & `prop1b'<=100 & `prop2b'>0 & `prop2b'<=100 & `prop3b'>0 & `prop3b'<=100 & `prop4b'>0 & `prop4b'<=100 { gen `z1b' = `zb' if `db'==1 gen `z0b' = `zb' if `db'==0 gen `y1b' = `yb' if `db'==1 gen `y0b' = `yb' if `db'==0 gen `y11b' = `y1b' if `z1b'==1 gen `y01b' = `y1b' if `z1b'==0 gen `y10b' = `y0b' if `z0b'==1 gen `y00b' = `y0b' if `z0b'==0 gen `unifb00' = -0.1+(0.1+0.1)*runiform() if `y00b'!=. gen `unifb11' = -0.1+(0.1+0.1)*runiform() if `y11b'!=. if `binary'==1 { gen `y11tempb' = `y11b' + `unifb11' gen `y00tempb' = `y00b' + `unifb00' } else { gen `y11tempb' = `y11b' gen `y00tempb' = `y00b' } _pctile `y11tempb', percentiles(`prop1b') local quant1b = r(r1) _pctile `y11tempb', percentiles(`prop2b') local quant2b = r(r1) _pctile `y00tempb', percentiles(`prop3b') local quant3b = r(r1) _pctile `y00tempb', percentiles(`prop4b') local quant4b = r(r1) * Create qth conditional quantile of the outcome ==> in the paper correspond to y(q) y(1-q) y(r) and y(1-r) * Estimacion de los extremos de la desigualdad (6) y (7) sum `y11b' if ( `y11tempb' <= `quant1b' ) local mean_y11minb = r(mean) sum `y11b' if ( `y11tempb' >= `quant2b' ) local mean_y11maxb = r(mean) sum `y00b' if ( `y00tempb' <= `quant3b' ) local mean_y00minb = r(mean) sum `y00b' if ( `y00tempb' >= `quant4b' ) local mean_y00maxb = r(mean) * Estimacion del término central de las expresiones (6) y (7) del paper summ `y01b' local mean_y01b = r(mean) summ `y10b' local mean_y10b = r(mean) * Rellenamos las matrices de arriba (this would be the end of step 2 of the algorithm) matrix Tn1b = (Tn1b \ `mean_y11minb'-`mean_y01b') matrix Tn2b = (Tn2b \ `mean_y01b'-`mean_y11maxb') matrix Tn3b = (Tn3b \ `mean_y00minb'-`mean_y10b') matrix Tn4b = (Tn4b \ `mean_y10b'-`mean_y00maxb') local n_Tn1b = rowsof(Tn1b) } restore } * Elimina la primer fila missing de cada matriz. matrix Tn1b = Tn1b[2..`n_Tn1b', 1] matrix Tn2b = Tn2b[2..`n_Tn1b', 1] matrix Tn3b = Tn3b[2..`n_Tn1b', 1] matrix Tn4b = Tn4b[2..`n_Tn1b', 1] * Variances: en este paso R obtiene la varianza del vector-columna (de dimension Bx1) que contiene a los parámetros estimados, para luego usarlos en el cálculo de la secuencia delta_n. * (algo así como calcular la varianza de una variable en stata) tempvar Tn1b1 Tn2b1 Tn3b1 Tn4b1 forvalues i=1(1)4 { svmat Tn`i'b qui sum Tn`i'b1 local varTn`i'b = r(Var) drop Tn`i'b1 } * Create objects for partial and full recentering * crea matrices de 1x`n_boot2' y las rellena con ceros matrix FCbsample1 = J(1,`sec', 0) matrix FCbsample2 = J(1,`sec', 0) matrix FCbsample3 = J(1,`sec', 0) matrix FCbsample4 = J(1,`sec', 0) matrix PCbsample1 = J(1,`sec', 0) matrix PCbsample2 = J(1,`sec', 0) matrix PCbsample3 = J(1,`sec', 0) matrix PCbsample4 = J(1,`sec', 0) matrix FCp_vals1 = J(`sec', 1, 0) matrix FCp_vals2 = J(`sec', 1, 0) matrix FCp_vals3 = J(`sec', 1, 0) matrix FCp_vals4 = J(`sec', 1, 0) matrix PCp_vals1 = J(`sec', 1, 0) matrix PCp_vals2 = J(`sec', 1, 0) matrix PCp_vals3 = J(`sec', 1, 0) matrix PCp_vals4 = J(`sec', 1, 0) * Fix value of sequence delta_n (footnote 7 in the paper) local delta_n = sqrt( 2 * log(log(`n')) / `n' ) * Compute test statistics * Step 1 of the algorithm: estimate the vector of parameters Theta in the original sample local Tn1 = `mean_`y11min'' - `mean_`y01'' local Tn2 = `mean_`y01'' - `mean_`y11max'' local Tn3 = `mean_`y00min'' - `mean_`y10'' local Tn4 = `mean_`y10'' - `mean_`y00max'' * Compute recentering terms forvalues i=1(1)4 { if `Tn`i'' >= (-`delta_n'*sqrt(`varTn`i'b')) { local Tn`i'_1 = 1 } else { local Tn`i'_1 = 0 } if `Tn`i'' < (-`delta_n'*sqrt(`varTn`i'b')) { local Tn`i'_2 = 1 } else { local Tn`i'_2 = 0 } } * Esto es para el vector recentrado parcialmente, que al vector de estimadores de cada muestra bootstrap le resta el máximo entre el vector estimado de la data inicial y una secuencia delta (Paso 3)). Es decir, las siguientes 4 lineas serian el computo de max(Theta, - delta_n) local maxVec1 = (`Tn1_1' * `Tn1') + (`Tn1_2' * (-`delta_n'*sqrt(`varTn1b'))) local maxVec2 = (`Tn2_1' * `Tn2') + (`Tn2_2' * (-`delta_n'*sqrt(`varTn2b'))) local maxVec3 = (`Tn3_1' * `Tn3') + (`Tn3_2' * (-`delta_n'*sqrt(`varTn3b'))) local maxVec4 = (`Tn4_1' * `Tn4') + (`Tn4_2' * (-`delta_n'*sqrt(`varTn4b'))) * Corregimos las matrices y los vectores por la raiz de n (ecuación (11)) local Tn1 = `sqrt_n'*`Tn1' local Tn2 = `sqrt_n'*`Tn2' local Tn3 = `sqrt_n'*`Tn3' local Tn4 = `sqrt_n'*`Tn4' matrix Tn1b = `sqrt_n'*Tn1b matrix Tn2b = `sqrt_n'*Tn2b matrix Tn3b = `sqrt_n'*Tn3b matrix Tn4b = `sqrt_n'*Tn4b * Generate bootstrap samples (using first bootstrap) - full recentering - Step 3 /* rep(x,times) replicates the values in x. times is an integer vector giving the (non-negative) number of times to repeat each element if of length(x), or to repeat the whole vector if of length 1. If times consists of a single integer, the result consists of the whole input repeated this many times. If times is a vector of the same length as x (after replication by each), the result consists of x[1] repeated times[1] times, x[2] repeated times[2] times and so on. */ matrix rep_Tn1 = J(`reps',1,`Tn1') matrix rep_Tn2 = J(`reps',1,`Tn2') matrix rep_Tn3 = J(`reps',1,`Tn3') matrix rep_Tn4 = J(`reps',1,`Tn4') matrix FCBootstrap1 = Tn1b - rep_Tn1 matrix FCBootstrap2 = Tn2b - rep_Tn2 matrix FCBootstrap3 = Tn3b - rep_Tn3 matrix FCBootstrap4 = Tn4b - rep_Tn4 * Generate bootstrap samples (using first bootstrap)- partial recentering - Step 3 matrix rep_maxVec1 = J(`reps',1,`sqrt_n'*`maxVec1') matrix rep_maxVec2 = J(`reps',1,`sqrt_n'*`maxVec2') matrix rep_maxVec3 = J(`reps',1,`sqrt_n'*`maxVec3') matrix rep_maxVec4 = J(`reps',1,`sqrt_n'*`maxVec4') matrix PCBootstrap1 = Tn1b - rep_maxVec1 matrix PCBootstrap2 = Tn2b - rep_maxVec2 matrix PCBootstrap3 = Tn3b - rep_maxVec3 matrix PCBootstrap4 = Tn4b - rep_maxVec4 * Evaluate the test statistics at the empirical distributions to compute the p-values - Step 4 * Equation 11: Share of bootstrap replications in which the recentered parameters are larger than the estimates in the original sample /* p_val_1 = mean(FCBootstrap1>Tn1) p_val_2 = mean(FCBootstrap2>Tn2) p_val_3 = mean(FCBootstrap3>Tn3) p_val_4 = mean(FCBootstrap4>Tn4) */ forvalues i=1(1)4 { * Convertimos las matrices en variables svmat PCBootstrap`i' svmat FCBootstrap`i' gen aux`i' = . replace aux`i' = 1 if FCBootstrap`i'1>`Tn`i'' & FCBootstrap`i'1!=. replace aux`i' = 0 if FCBootstrap`i'1<=`Tn`i'' qui sum aux`i' local p_val_`i' = r(mean) drop aux`i' } * Compute the minimum p-value based on the original sample - Step 5 local p_val = min(`p_val_1', `p_val_2', `p_val_3', `p_val_4') * Second bootstrap - draw observations from first bootstrap (Draw B2 values from the distributions of "Theta_f_b" and "Theta_p_b") /* bsample2 <- sample(n_boot1, n_boot2, replace=T) */ preserve keep if PCBootstrap11!=. keep PCBootstrap11 PCBootstrap21 PCBootstrap31 PCBootstrap41 FCBootstrap11 FCBootstrap21 FCBootstrap31 FCBootstrap41 bsample `sec' * Reconvertimos las variables en matrices. forvalues i=1(1)4 { mkmat PCBootstrap`i'1 mkmat FCBootstrap`i'1 } * Step 6 * Draw samples of recentered test statistics out of first bootstrap - full recentering matrix FCbsample1 = FCBootstrap11[1..`sec',1] matrix FCbsample2 = FCBootstrap21[1..`sec',1] matrix FCbsample3 = FCBootstrap31[1..`sec',1] matrix FCbsample4 = FCBootstrap41[1..`sec',1] * Draw samples of recentered test statistics out of first bootstrap - partial recentering matrix PCbsample1 = PCBootstrap11[1..`sec',1] matrix PCbsample2 = PCBootstrap21[1..`sec',1] matrix PCbsample3 = PCBootstrap31[1..`sec',1] matrix PCbsample4 = PCBootstrap41[1..`sec',1] * Step 7 in paper: in each bootstrap sample, compute the minimum P-values of minP.f and minP.p /* FCp_vals1[1:n_boot2,1] = 1-FC_ECDF1[[1]](FCbsample1[1,1:n_boot2]) can be replaced by a vector consisting of (see eq. 14 in the paper) mean(FCBootstrap1>FCbsample1[1,1]), mean(FCBootstrap1>FCbsample1[1,2]), ..., mean(FCBootstrap1>FCbsample1[1,n_boot2]) */ * Empirical distributions of fully recentered p-values forvalues i=1(1)4 { matrix FCp_vals`i' = J(`sec',1,.) svmat FCBootstrap`i'1 forvalues j=1(1)`sec' { gen aux`i' =. replace aux`i' =1 if FCBootstrap`i'11 > FCbsample`i'[`j',1] & FCBootstrap`i'11!=. & FCbsample`i'[`j',1]!=. replace aux`i' =0 if FCBootstrap`i'11 <=FCbsample`i'[`j',1] & FCBootstrap`i'11!=. & FCbsample`i'[`j',1]!=. sum aux`i' matrix FCp_vals`i'[`j',1] = r(mean) drop aux`i' } } matrix FCp_vals = FCp_vals1 , FCp_vals2 , FCp_vals3 , FCp_vals4 * Empirical distributions of partially recentered p-values forvalues i=1(1)4 { matrix PCp_vals`i' = J(`sec',1,.) svmat PCBootstrap`i'1 forvalues j=1(1)`sec' { gen aux`i' =. replace aux`i' =1 if FCBootstrap`i'11 > PCbsample`i'[`j',1] & PCBootstrap`i'11!=. & PCbsample`i'[`j',1]!=. replace aux`i' =0 if FCBootstrap`i'11 <=PCbsample`i'[`j',1] & PCBootstrap`i'11!=. & PCbsample`i'[`j',1]!=. sum aux`i' matrix PCp_vals`i'[`j',1] = r(mean) drop aux`i' } } matrix PCp_vals = PCp_vals1 , PCp_vals2 , PCp_vals3 , PCp_vals4 * Compute adjusted p-values - full recentering - Step 8 /* Compute the p-values of the minP.f and minP.p tests by the share of bootstrapped minimum p-values that are smaller than the respective minimum p-value of the original sample 1) edfF_n(p_val) is the same as mean(min_p_vec<=p_val) (see eq 15 in the paper) 2) apply(FCp_vals,1,min) is equal to a vector taking the minimum of FCp_vals in each row, i.e., min(FCp_vals[1,]),min(FCp_vals[2,]),min(FCp_vals[3,]),.... (note that FCp_vals consists of 4 columns which correspond to each constraint and you take the minimum of those constraints in each row) */ * Create a vector containing the minimum of FCp_vals and PCp_vals in each row. rowmin(X): returns the minimum of each row of X mata: st_matrix("min_p_vec_f", rowmin(st_matrix("FCp_vals"))) mata: st_matrix("min_p_vec_p", rowmin(st_matrix("PCp_vals"))) * Transform matrixs into variables *tempvar min_p_vec_f min_p_vec_p svmat min_p_vec_f svmat min_p_vec_p gen aux = . replace aux = 1 if min_p_vec_f1 <= `p_val' replace aux = 0 if min_p_vec_f1 > `p_val' & min_p_vec_f1!=. qui sum aux local FCpval = r(mean) drop aux gen aux = . replace aux = 1 if min_p_vec_p1 <= `p_val' replace aux = 0 if min_p_vec_p1 > `p_val' & min_p_vec_p1!=. qui sum aux local PCpval = r(mean) drop aux restore capture gen f = 1 forvalues i=1(1)4 { drop PCBootstrap`i' drop FCBootstrap`i' } * REPORT OF THE RESULTS ++++++++++++++++++++++++++++++++++++++++++++++++ tempvar indice_nombre indice_valor gen `indice_valor'=. label variable `indice_valor' "Output" replace `indice_valor' = `FCpval' if _n==1 replace `indice_valor' = `PCpval' if _n==2 replace `indice_valor' = `normdif_treated' if _n==3 replace `indice_valor' = `normdif_nontreated' if _n==4 gen `indice_nombre'=. label variable `indice_nombre' "Results of the tests on IV validity" replace `indice_nombre'=_n if `indice_valor'!=. label define `indice_nombre' /* */ 1 "p-value under full recentering" /* */ 2 "p-value under partial recentering" /* */ 3 "Normalized difference for treated" /* */ 4 "Normalized difference for nontreated" label values `indice_nombre' `indice_nombre' tabdisp `indice_nombre', cellvar(`indice_valor') concise format(%6.4f) di _newline return scalar partialr_pval = `PCpval' return scalar fullr_pval = `FCpval' return scalar normdif_treated = `normdif_treated' return scalar normdif_nontreated = `normdif_nontreated' end