Perhaps the most widely used approach to dealing with selection bias is the Heckman (1979) two-stage procedure. The first stage models selection into the program. Usually this takes the form of a single equation explaining program participation or non-participation¹⁰:

$$P = \beta X + U$$

where $P$ is a dummy variable (1 for participants, 0 for non-participants), $X$ is the set of observed factors that may account for participation in the program (e.g., age, sex), and $U$ is a random error term capturing unobserved factors that influence participation. $U$ is assumed to be normally distributed, so this equation is estimated as a probit model.

From the estimated first-stage equation, the inverse Mills ratio is computed for each observation. It is then added to a second-stage outcome equation, which is usually estimated by ordinary least squares:

$$Y = \beta X + \alpha P + \delta M + U$$

where $Y$ is the outcome of interest, $X$ is a vector of observed variables, $P$ is the participation dummy, and $M$ is the inverse Mills ratio. The measure of program impact is $\alpha$, the estimated coefficient on the participation dummy; $\delta$ captures the selection bias. If this equation were estimated by ordinary least squares without the correction term $M$, the estimate of $\alpha$ could be biased. If the model is properly specified, including $M$ removes this bias, and the procedure yields a consistent estimate of program impact.

Another powerful means of controlling for differences between groups is the differences-in-differences method.
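Returning briefly to the Heckman two-stage procedure, the Python sketch below runs both stages on simulated data and compares the result with OLS that omits the correction term. It is a minimal illustration under stated assumptions, not the author's code or any library's implementation. Stage 1 is a probit fitted by Fisher scoring. $M$ takes the usual treatment-effects form: $\phi/\Phi$ for participants and $-\phi/(1-\Phi)$ for non-participants. The coefficients, the seed, and the extra selection variable `z` are all hypothetical. `z` is assumed to affect participation but not the outcome (an exclusion restriction). Without such a variable, the model is identified only through the normality assumption.

```python
import math
import random


def norm_pdf(v):
    """Standard normal density phi(v)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal CDF Phi(v); erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least squares coefficients from the normal equations (W'W) c = W'y."""
    k = len(rows[0])
    wtw = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    wty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(wtw, wty)


def probit(rows, p, max_iter=50, tol=1e-8):
    """Stage 1: probit of the participation dummy p on rows, fitted by Fisher scoring."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, pj in zip(rows, p):
            idx = sum(gi * ri for gi, ri in zip(g, r))
            cdf = min(max(norm_cdf(idx), 1e-12), 1.0 - 1e-12)
            pdf = norm_pdf(idx)
            w = pdf / (cdf * (1.0 - cdf))
            for a in range(k):
                score[a] += w * (pj - cdf) * r[a]  # gradient of the log-likelihood
                for b in range(k):
                    info[a][b] += w * pdf * r[a] * r[b]  # expected information
        step = solve(info, score)
        g = [gi + si for gi, si in zip(g, step)]
        if max(abs(s) for s in step) < tol:
            break
    return g


def inverse_mills(index, participant):
    """Correction term M: phi/Phi for participants, -phi/(1 - Phi) for non-participants."""
    if participant:
        return norm_pdf(index) / norm_cdf(index)
    return -norm_pdf(index) / norm_cdf(-index)


def heckman_two_step(x, z, p, y):
    """Return [constant, beta, alpha, delta] from the stage-2 OLS of y on (1, x, P, M)."""
    sel_rows = [[1.0, xi, zi] for xi, zi in zip(x, z)]
    gamma = probit(sel_rows, p)
    m = [inverse_mills(sum(g * v for g, v in zip(gamma, r)), pj) for r, pj in zip(sel_rows, p)]
    out_rows = [[1.0, xi, float(pj), mi] for xi, pj, mi in zip(x, p, m)]
    return ols(out_rows, y)


# Simulated data (all coefficients are illustrative assumptions).
rng = random.Random(1979)
n = 8000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical variable excluded from the outcome
u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved factor driving participation
p = [1 if 0.5 * xi + zi + ui > 0 else 0 for xi, zi, ui in zip(x, z, u)]
e = [0.5 * ui + math.sqrt(0.75) * rng.gauss(0.0, 1.0) for ui in u]  # correlated with u
y = [1.0 + 0.5 * xi + 2.0 * pj + ei for xi, pj, ei in zip(x, p, e)]  # true alpha = 2

naive = ols([[1.0, xi, float(pj)] for xi, pj in zip(x, p)], y)
coef = heckman_two_step(x, z, p, y)
print(f"OLS without M: alpha = {naive[2]:.3f}")
print(f"Heckman two-step: alpha = {coef[2]:.3f}, delta = {coef[3]:.3f}")
```

In this setup, participants tend to have high values of the unobserved factor, and that factor also raises their outcomes. The naive OLS estimate of $\alpha$ should therefore come out noticeably above the true value of 2. The two-step estimate should land close to 2, with $\delta$ near the assumed covariance of 0.5.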
Longitudinal data on key outcome measures are collected for participants and non-participants both before and after the program. Program impact is then estimated as the change in the outcome for participants minus the change for non-participants. In equation form (Moffitt, 1991):

$$\alpha = E\left[Y^{**}_{it} - Y^{*}_{i,t-1} \mid P_i = 1\right] - E\left[Y^{*}_{it} - Y^{*}_{i,t-1} \mid P_i = 0\right]$$

where $t$ is the post-treatment point, $t-1$ is the pre-treatment point, $Y^{*}_{it} - Y^{*}_{i,t-1}$ is the change in the outcome from $t-1$ to $t$ if treatment is not received, and $Y^{**}_{it} - Y^{*}_{i,t-1}$ is the change in the outcome from $t-1$ to $t$ if treatment is received.

Instrumental variable (IV) methods are widely used when ordinary least squares (OLS) estimates may be biased because one or more explanatory variables are correlated with the random error term in the model. In the evaluation context, this potential bias arises from a possible correlation between the participation dummy and the error term in the outcome equation. The bias can be removed if one or more instrumental variables are available. An instrument must be correlated with participation but uncorrelated with the error term in the outcome equation. It is used in place of the participation variable during estimation, rather than being added to the outcome equation as another regressor.

The basic model of program outcome or impact is:

$$Y = \beta X + \alpha P + U \qquad (1)$$

In matrix notation this can be written as:

$$Y = WC + U \qquad (2)$$

where $W = (X \; P)$ and $C = (\beta' \; \alpha)'$. The least squares estimator of (2) is:

$$c = (W'W)^{-1} W'Y$$

where $(W'W)^{-1}$ is the inverse of the matrix $W'W$. In general this estimator is biased because $W$ and $U$ are correlated, i.e., $E\{W'U\} \neq 0$, where $E\{\cdot\}$ denotes the expectations operator. The IV estimator of (2) is:

$$c^{*} = (Z'W)^{-1} Z'Y$$

where $Z$ is the matrix of instrumental variables and has the same number of columns as $W$. Typically $X$ serves as its own instrument, and an excluded instrument takes the place of $P$. This estimator is consistent, though not unbiased in finite samples, because $Z$ and $U$ are uncorrelated, i.e., $E\{Z'U\} = 0$ when $Z$ is an appropriate instrument.
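The OLS estimator $c = (W'W)^{-1}W'Y$ and the IV estimator $c^{*} = (Z'W)^{-1}Z'Y$ can be compared on simulated data with a short Python sketch. Everything in it is an assumption for illustration. The hypothetical instrument `z` shifts participation but does not enter the outcome, and the same unobserved factor drives both participation and the outcome error. Both estimators are computed by solving the linear systems $(W'W)c = W'Y$ and $(Z'W)c^{*} = Z'Y$ rather than by forming explicit inverses.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cross(a_rows, b_rows):
    """Return A'B, where A (n x k) and B (n x m) are stored as lists of rows."""
    k, m = len(a_rows[0]), len(b_rows[0])
    return [[sum(a[i] * b[j] for a, b in zip(a_rows, b_rows)) for j in range(m)]
            for i in range(k)]


def cross_vec(a_rows, y):
    """Return A'y for a data matrix A stored as rows and an outcome vector y."""
    return [sum(a[i] * yi for a, yi in zip(a_rows, y)) for i in range(len(a_rows[0]))]


def ols_estimator(w, y):
    """c = (W'W)^{-1} W'Y, computed by solving (W'W) c = W'Y."""
    return solve(cross(w, w), cross_vec(w, y))


def iv_estimator(z, w, y):
    """c* = (Z'W)^{-1} Z'Y, computed by solving (Z'W) c* = Z'Y (Z as wide as W)."""
    return solve(cross(z, w), cross_vec(z, y))


# Simulated data (illustrative assumptions): true alpha = 2.
rng = random.Random(1991)
n = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical instrument
u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved factor behind participation
p = [1.0 if zi + ui > 0 else 0.0 for zi, ui in zip(z, u)]
e = [0.5 * ui + math.sqrt(0.75) * rng.gauss(0.0, 1.0) for ui in u]  # makes E{W'U} != 0
y = [1.0 + 0.5 * xi + 2.0 * pj + ei for xi, pj, ei in zip(x, p, e)]

w_rows = [[1.0, xi, pj] for xi, pj in zip(x, p)]  # W = (X P), X including a constant
z_rows = [[1.0, xi, zi] for xi, zi in zip(x, z)]  # Z: X instruments itself, z replaces P
c_ols = ols_estimator(w_rows, y)
c_iv = iv_estimator(z_rows, w_rows, y)
print(f"OLS alpha = {c_ols[2]:.3f}, IV alpha = {c_iv[2]:.3f}")
```

Solving the system gives the same estimator as inverting $Z'W$ and is numerically safer. This setup is exactly identified: $X$ (with a constant) instruments itself and `z` takes the place of $P$. The OLS coefficient on $P$ should be pushed upward by the shared unobserved factor, while the IV coefficient should stay near 2.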
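Returning to the differences-in-differences method described earlier in this section, the core comparison is simple arithmetic. It takes the average change among participants and subtracts the average change among non-participants. The non-participants' change stands in for the change participants would have experienced without treatment. The following is a minimal Python sketch; the function name and the outcome values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
from statistics import fmean


def diff_in_diff(treated, comparison):
    """Differences-in-differences from per-person (pre, post) outcome pairs.

    treated:    (Y_{i,t-1}, Y_it) pairs for program participants
    comparison: (Y_{i,t-1}, Y_it) pairs for non-participants
    """
    change_treated = fmean(post - pre for pre, post in treated)
    change_comparison = fmean(post - pre for pre, post in comparison)
    return change_treated - change_comparison


# Hypothetical panel: outcomes before (t-1) and after (t) the program.
participants = [(10.0, 15.0), (12.0, 17.0)]
non_participants = [(9.0, 11.0), (11.0, 13.0)]
print(diff_in_diff(participants, non_participants))  # prints 3.0 (5.0 - 2.0)
```

Using each person's own change removes any fixed individual differences in outcome levels. The estimate is only as good as the assumption that both groups would have changed by the same amount without the program.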