Government of Canada | Gouvernement du Canada

6.0 Procedures for Addressing Selection Bias


Before describing the various procedures that have been developed and extensively used to deal with potential selection bias, two preliminary observations are in order. First, as the above examples make clear, selection into the program may be based on observable or unobservable factors. For example, in the case of comparing coop and non-coop programs, the qualifications of the coop and non-coop students may be observed (e.g., their high school grades) but the degree of career-orientation of the students may not. Which factors are observed and which are unobserved will depend on the richness of the available data.

Controlling for selection into the program that takes place on the basis of observable factors is straightforward. Thus the richer the available data (and thus the fewer the number of unobservable factors), the smaller will be the magnitude of selection bias due to unobserved factors.

The second observation is that even though there are almost always some unobserved factors that influence selection into the program, this does not necessarily imply that simple comparisons of participants and non-participants will be subject to selection bias. Selection bias arises only when the unobserved factors that influence participation/non-participation in the program also influence the impacts of the program.

To make this point clear, consider an extreme example. Suppose we wish to compare public and private schools in terms of their educational outcomes such as student performance on standardized tests. Suppose that, on average, private schools enroll more students with odd-numbered birthdays than public schools. To the researcher, the attribute "having an odd/even birthdate" is unobserved. This is a case of non-random selection; if students were randomly selected into public and private schools then the proportion of students with odd-numbered birthdays would be approximately equal in the two school types. However, as long as having an odd-numbered birthday does not influence the outcomes of interest (student performance on standardized tests), this non-random selection will not bias a simple comparison of student achievement in public and private schools.

For these reasons, methods for dealing with selection bias focus on the potential problems associated with factors which influence selection into the program which are (i) unobserved by the researcher/evaluator, and (ii) correlated with the program outcomes of interest. Unfortunately, in many evaluation studies there are a number of such factors meeting both these conditions. For this reason, it is strongly advisable in virtually any evaluation study to address potential selection bias.

By their very nature, the factors that might give rise to selection bias are unobserved. In some cases, the methods described below will lead the evaluator to conclude that such potential sources of selection bias are not quantitatively important, and therefore do not lead to bias. This could be because unobserved factors leading to non-random selection into the program are not quantitatively significant in this particular program, or it could be because such factors are quantitatively significant but are not correlated with the program outcomes of interest. In other cases, the methods described below will lead the evaluator to conclude that selection bias is quantitatively important. In these cases, the methods also provide an estimate of the magnitude of the bias so that an estimate of the true impact(s) of the program on outcomes can be derived.

6.1 Two Step Adjustment Procedures

Two step (or two stage) procedures for addressing selection bias were developed by James Heckman and others in the late 1970s, and have become the most commonly used methods. In the first stage, the probability of participation in the program is analyzed. This analysis usually consists of a single equation model in which the dependent variable is an indicator of program participation (a variable that equals one for program participants and zero for non-participants) and the independent variables are various factors that are believed to influence program participation/non-participation. The main purpose of the first stage is to obtain a correction factor (called the "inverse Mills ratio") which is used in the second stage to take account of possible selection bias. As well, the estimates obtained in this first stage may be of interest in themselves in that they provide insight into the importance of the various factors that influence participation/non-participation in the program.
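As a minimal sketch of the correction factor itself, the inverse Mills ratio can be computed directly from the standard normal density and cumulative distribution. The function below is illustrative only; in an actual evaluation its argument would be the fitted probit index from the first-stage participation equation.

```python
# Sketch: the "inverse Mills ratio" lambda(z) = phi(z) / Phi(z), where phi and
# Phi are the standard normal density and cumulative distribution function.
# The argument z stands in for the fitted first-stage probit index.
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(z):
    """lambda(z) = phi(z) / Phi(z): the mean of a standard normal variable
    truncated from below at -z; used as the selection-correction regressor."""
    return normal_pdf(z) / normal_cdf(z)

# At a fitted index of zero (a 50/50 participation probability):
imr_at_zero = inverse_mills_ratio(0.0)   # = sqrt(2/pi), about 0.798
```

The ratio is largest for individuals whose estimated participation probability is low, which is what allows the second stage to adjust for the unusually selected character of such participants.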

The second stage involves estimating program impact using a specified model (equation). The model includes:

  • a "dependent variable," which is the outcome the training program is supposed to affect, say earnings;

  • several "independent" or explanatory variables, which are observed factors presumed to influence the outcome (e.g., age, sex, education);

  • the "selection bias correction" variable (or inverse Mills ratio) obtained in the first stage;

  • an indicator variable for participation/non-participation in the program; and

  • a random error term to account for unobserved forces that may affect the outcome measure.

The model in words:

Earnings = the effect of various observed factors + the effect of selection bias + the effect of the program + random error

(see Appendix C for the mathematical equation and further explanation)

Thus, the model isolates the impact of the program from other potential influences. If the model is properly specified, the addition of the "selection bias correction" variable removes this potential bias, thus giving unbiased estimates of program impact. We return below to the important issue of how to determine whether the model is properly specified.

A useful way to interpret this two step procedure is as follows. It is well known that omitting an important variable from a model will result in biased estimates of the coefficients on the variables included in the model. In the absence of a method for accounting for selection into the program, the estimation of the outcome equation omits an important factor -- the determinants of program participation. The "selection bias correction" term obtained in the first stage provides an estimate of this factor, which is why including this term results in unbiased estimates (providing the model is properly specified).
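The mechanics of the second stage can be sketched on simulated data. For simplicity the first-stage index is taken as known rather than estimated by probit, and all variable names and numbers are hypothetical; the point is only to show that omitting the correction term biases the estimated program impact, while including it removes the bias.

```python
# Illustrative sketch of the second-stage outcome regression with a
# selection-correction term, on simulated data. The first-stage probit index
# is taken as known here (in practice it is estimated); names are hypothetical.
import math
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)                    # observed factor in the outcome equation
z = rng.normal(size=n)                    # "identifying variable": affects participation only
v = rng.normal(size=n)                    # unobserved factor driving participation
u = 0.8 * v + 0.6 * rng.normal(size=n)    # outcome error, correlated with v

s = z                                     # first-stage index (known in this simulation)
d = (s + v > 0).astype(float)             # program participation indicator

y = 1.0 + 2.0 * x + 3.0 * d + u           # true program impact is 3.0

# Selection-correction term: E[v | d, s], the inverse-Mills-ratio form for
# participants and its counterpart for non-participants.
pdf = np.exp(-0.5 * s**2) / math.sqrt(2.0 * math.pi)
cdf = 0.5 * (1.0 + np.vectorize(math.erf)(s / math.sqrt(2.0)))
m = np.where(d == 1, pdf / cdf, -pdf / (1.0 - cdf))

# Naive regression (omits the correction) versus corrected regression.
X_naive = np.column_stack([np.ones(n), x, d])
X_corr = np.column_stack([np.ones(n), x, d, m])
b_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0]
b_corr = np.linalg.lstsq(X_corr, y, rcond=None)[0]
# b_naive[2] overstates the program impact; b_corr[2] is close to the true 3.0.
```

Note the role of the coefficient on the correction term: it estimates the correlation between the participation and outcome errors, which is exactly the quantity that measures the magnitude of the selection bias.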

One final observation relating to these two step procedures is in order. It is important to have one or more variables that influence selection into the program (i.e., that enter the first stage equation) but which do not influence the outcome(s) of the program (i.e., do not enter the second stage equation). Such variable(s) allow the separate identification of participation in the program and the outcomes or impacts of the program. Apart from the importance of having one or more such "identifying variables," the first stage participation equation and the second stage outcome equation may have many variables in common. The importance of these "identifying variables" also arises in the context of the method discussed next.

6.2 Instrumental Variable Methods

Selection bias arises because of the correlation between the indicator variable for participation/ non-participation in the program and the random error term in the outcome equation. The "instrumental variable" (IV) method of solving the selection bias problem, discussed by Heckman and Robb (1985) and Moffitt (1991) among others, centres on finding a variable (or variables) that influences selection into the program but does not influence the outcome of the program (and is thus not correlated with the random error term in the outcome equation). Because the instrumental variable is not correlated with the random error term, it can be used in the estimation without introducing bias. The formula for the IV estimator is given in Appendix C.

The search for IVs entails an in-depth investigation of the selection process. Personal characteristics of individuals would seldom suffice as instrumental variables because they are usually related to the outcome. For instance, level of education likely affects one's employability. Moffitt suggests that variation in the availability of treatment may yield a suitable variable. If a training program is available in one region but not another for reasons unrelated to the program's intended outcome, region is a legitimate instrumental variable. This may be the case if the program is not available for political, bureaucratic or economic reasons.

In order to be a legitimate "instrument," the variable must be related to program participation/ non-participation but unrelated to the outcome(s) of the program. In some situations there may be numerous potential instrumental variables. In these circumstances, how should the analyst choose among these various potential IVs? The answer to this question is as follows. Each IV that is indeed unrelated to the outcome of the program (i.e. is uncorrelated with the random error term in the outcome equation) will yield unbiased estimates of the impact of the program. However, some IVs will yield more precise estimates of the impact of the program. Specifically, the more highly correlated is the IV with program participation/non-participation, the more precise will be the estimates of program impact. Thus the challenge in IV estimation is to find an instrumental variable that is highly correlated with program participation but uncorrelated with the outcome of the program. Unfortunately, it is often difficult to find variables that meet both these requirements, and therefore difficult to find good IVs among the many potential IVs.

As a way of further understanding the principles of IV estimation, consider the case of a program in which participation/non-participation is determined by random assignment. Suppose that each applicant is assigned as a participant P (non-participant NP) if the toss of a fair coin yields Heads (Tails). Then the indicator variable H (which equals unity if Heads and zero if Tails) is an ideal instrumental variable because H is perfectly correlated with the indicator variable for participation and because H is uncorrelated with the outcome of the program. Of course, in practice it is rare to have available such an ideal IV, but this example illustrates the characteristics one searches for when using this method.
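The logic above can be sketched on simulated data using a region-availability instrument in the spirit of Moffitt's example. All names and numbers are hypothetical, and the simple two-group (Wald) form of the IV estimator, cov(z, y) / cov(z, d), is used.

```python
# Sketch of IV estimation on simulated data. The instrument "region" affects
# participation (the program is offered in one region only) but is unrelated
# to the unobserved factor c that drives both participation and the outcome.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

region = rng.integers(0, 2, size=n).astype(float)  # 1 if program offered locally
c = rng.normal(size=n)                             # unobserved factor (e.g. motivation)

# Participation depends on both the instrument and the unobserved factor.
d = (0.8 * region + c > 0.4).astype(float)

# Outcome: true program impact is 2.0, but c also raises the outcome directly.
y = 1.0 + 2.0 * d + c + 0.5 * rng.normal(size=n)

# Naive estimate: simple participant / non-participant comparison (biased up,
# because participants have higher c on average).
naive = y[d == 1].mean() - y[d == 0].mean()

# IV (Wald) estimate: cov(z, y) / cov(z, d), consistent for the true impact.
iv = np.cov(region, y)[0, 1] / np.cov(region, d)[0, 1]
```

The precision point made above is visible in the denominator: the weaker the correlation between the instrument and participation, the smaller cov(z, d) becomes, and the noisier the IV estimate.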

This method thus has some features in common with the "identifying variable(s)" used in the two step method discussed above. In effect, both procedures require similar information. The main differences are: (i) the instrumental variable method is carried out in a single stage and does not therefore involve explicitly modelling the process of participation into the program; and (ii) the IV method produces estimates free of selection bias (if the model is properly specified) but does not provide an estimate of the magnitude of the selection bias, as is provided in the two step method.6

Many examples of "instrumental variables" could be given. Moffitt (1991) gives the example of a government-funded health counseling program which, for reasons that are unrelated to the health needs of the populations in the two areas, funds the program in one area of a city but not in the other area. As a consequence, an indicator variable for the two areas of the city is unrelated to the health needs of the population in the two areas, but will influence the participation in the program. Another example, related to the assessment of public versus private schools discussed above, would be a measure of proximity to a private school for each of the students in the sample. Such proximity would be expected to influence the likelihood of attending a private school, but not the outcome of private schooling on student achievement. Appendix C has more on the IV method including equations.

6.3 Longitudinal Methods

The two step and instrumental variables methods discussed above can be implemented with post-program data alone (that is, cross-sectional data on participants and non-participants). However, if pre-program data on participants and non-participants are available, these data should also be used; incorporating them in these methods will generally lead to more precise and more credible estimates of program impact. Longitudinal data follow the same individuals over two or more periods of time, and the longitudinal methods discussed here require at least one pre-program observation and at least one post-program observation on both program participants and non-participants.

The most common longitudinal estimator of program impact is the "fixed effects" or "difference-in-differences" estimator (sometimes also simply called the "differences" estimator). In the simplest case, in which there is one pre-program observation and one post-program observation, this approach proceeds as follows. First, take the difference between the post-program value of the outcome measure and the pre-program value of the outcome measure for each participant and non-participant. This difference is thus a measure of how much change was observed in the outcome of interest between the period prior to the program and the period following the program. If the outcome of interest is earnings, as would be the case in assessing training programs, this would be the earnings gain or loss for each participant and non-participant. Next, take the difference between the average pre- versus post-program change for participants and the average pre- versus post-program change for non-participants (see Appendix C for equation). In the case of a training program, this is simply the difference between the average earnings gain or loss of participants and the average gain or loss of non-participants.

This simple estimator of program impact will be free of selection bias if selection into the program depends on unobserved person-specific "fixed effects," that is, factors that are specific (or unique) to each individual in the sample, but are constant ("fixed") over the time period of the analysis. For example, in the case of comparing coop and non-coop educational programs, such an unobserved factor could be how career-oriented the individual student is. If, as discussed previously, selection into coop and non-coop programs depends on how career-oriented the student is, and if this factor is not observed by the researcher (as will often be the case), the bias that would arise because of this unobserved factor will be removed by the use of the "difference-in-differences" estimator, provided that the extent of career-orientation of the student is constant over the sample period. Similarly, in the case of training programs, the "fixed effects" assumption is appropriate when unobserved person-specific factors such as ambition, labour force attachment and suitability for training are constant over time (but may vary across individuals). In many cases, such a "fixed effect" assumption will seem reasonable for such unobserved person-specific factors, although the validity of the assumption should be tested as discussed further below.

In summary, the difference-in-differences estimator will yield unbiased estimates of program impact -- even in the presence of potential selection bias -- when the source of potential bias is a correlation between participation/non-participation in the program and an unobserved factor which may differ across individuals but which is constant over time for each individual. If selection into the program takes this form, the simple difference-in-differences estimator is a straightforward way of dealing with selection bias.
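The fixed-effects logic can be sketched on simulated data in which selection into the program depends entirely on an unobserved, time-constant person-specific factor. The numbers are hypothetical; the point is that differencing removes the fixed effect and with it the selection bias.

```python
# Sketch of the difference-in-differences estimator on simulated data where
# selection into the program depends on an unobserved, time-constant
# person-specific "fixed effect" a (e.g. ambition).
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

a = rng.normal(size=n)               # unobserved fixed effect
d = (a > 0).astype(float)            # selection into the program depends on a

pre = a + rng.normal(scale=0.5, size=n)              # pre-program earnings
post = a + 1.5 * d + rng.normal(scale=0.5, size=n)   # true impact is 1.5

# Biased cross-sectional comparison of post-program outcomes: the gap
# reflects both the program impact and the difference in fixed effects.
naive = post[d == 1].mean() - post[d == 0].mean()

# Difference-in-differences: the fixed effect a cancels in post - pre,
# so comparing average changes isolates the program impact.
change = post - pre
did = change[d == 1].mean() - change[d == 0].mean()
```

Because a enters both the pre- and post-program observations identically, it drops out of each individual's change, which is exactly why the estimator is robust to selection on fixed effects but not to selection on factors that vary over time.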

More complicated longitudinal estimators are available for situations in which the assumption of constant or "fixed" person-specific effects is not appropriate. These generally require more than one pre-program and one post-program observation on each individual. For example, Moffitt (1991) discusses a "difference-in-differences in growth rates" estimator which is appropriate when the period-to-period change in the person-specific effect is constant over time. This estimator requires at least two pre-program and two post-program observations. Other types of longitudinal estimators are discussed in the context of training programs by Ashenfelter and Card (1985).

6.4 Specification Tests of Alternative Models

Three general classes of methods of dealing with selection bias have been outlined. Within each of these general classes, there are a number of variants on the basic procedure. The question which naturally arises is: Which of these methods should be employed, and under what circumstances?

In part, the answer to this important question is that the appropriate method should be determined by the researcher/evaluator, according to the circumstances of the program being assessed. A key aspect of being a good evaluator is being able to specify the model, including the methods for dealing with potential selection bias, appropriately. This aspect requires judgment, experience, and the ability to obtain information about the nature of the process by which participants are selected into the program.

Although factors such as judgment and experience are important, in most circumstances they are unlikely to be sufficient to enable the evaluator to know with reasonable certainty which method is most appropriate for dealing with selection bias. For this reason, it is important to employ the variety of specification tests which are available for determining which models or specifications are consistent with the data and which are not. Examples of such specification tests are described in contributions by Heckman and Hotz (1989) and Moffitt (1991). Heckman and Hotz (1989) find, in the case of training programs, that use of a number of specification tests enables them to substantially narrow down the number of possible alternative forms which selection into the program may take, thus reducing substantially the range of possible estimates of program impact.
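One simple test in the spirit of Heckman and Hotz (1989) can be sketched as follows: estimate the program's "impact" on outcomes measured before the program ran. Since the true pre-program impact must be zero, a clearly nonzero estimate indicates that the estimator being tested does not adequately handle selection. The data below are simulated and hypothetical.

```python
# Sketch of a pre-program ("placebo") specification test: a simple
# cross-sectional comparison applied to pre-program earnings should find
# no "impact" if it is free of selection bias.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

a = rng.normal(size=n)            # unobserved person-specific factor
d = (a > 0).astype(float)         # selection into the program depends on a
pre = a + rng.normal(size=n)      # earnings before the program exists

# The true pre-program "impact" is zero by construction. A large estimate
# means the simple comparison fails the specification test.
placebo = pre[d == 1].mean() - pre[d == 0].mean()
```

Here the placebo estimate is far from zero, so this estimator would be rejected for these data; an estimator that passed (for example, one that conditioned on the relevant fixed effects) would be retained as consistent with the data.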

To date, such specification tests to assess the validity of alternative models have not been used as extensively as they should have been. However, in part because of recent contributions to the evaluation literature, their use is increasingly becoming an important aspect of well-executed evaluations. An important side-effect of this trend is that evaluators are increasingly required to devote thought and attention to the process by which selection into the program takes place (and therefore to the appropriate way to take account of potential selection bias), rather than to rely on a mechanical technique such as the Heckman two-step procedure.

6.5 Determinants of Program Participation

A final observation is that the more information that can be obtained about the process by which participants end up in the program and non-participants do not end up in the program, the more credible will be the estimates of program impact. As discussed previously, program participation depends on both observed and unobserved factors, and accounting for the influence of observed factors is much more straightforward and less uncertain than accounting for the role of unobserved factors. Thus acquiring richer data on the determinants of program participation is one of the most effective methods of dealing with potential selection bias.

In any evaluation there will always be some factors that are unobserved but which potentially could result in selection bias. In order to best take account of such possibilities, the greater the availability of rich qualitative information on the program and on participant and non-participant characteristics, the more able the evaluator is to choose the most appropriate method for addressing selection bias.


Footnotes

6 Also, the IV estimation procedure does not require the assumption of normally distributed random error terms in the first stage probit equation used in the two step procedure.

