Discrete Factor Approximations in Simultaneous Equation Models:
Estimating the Impact of a Dummy Endogenous Variable on a
Continuous Outcome
University of North Carolina, Chapel Hill
Keywords: Simultaneous Equation Models, Binary Response Models, Latent Variables, Finite Mixture Distributions, Monte Carlo Studies.
Comments are encouraged. This paper is an extension and elaboration of earlier work I did with David Guilkey. David provided much advice on this version. I would like to thank David Blau, Ron Gallant, James Heckman, Hidehiko Ichimura, Peter Schmidt, and seminar participants at the University of Minnesota and the Research Triangle Econometrics Seminar and the Duke/UNC Labor Economics Seminar for useful advice. The Office of Information Technology at UNC, especially Mike Padrick, Larry Mason, and Jim Gogin, and the computer staff at the Carolina Population Center helped to make the extensive computations in this paper feasible. Timothy Savage provided superb research assistance. Partial funding for this project came from NIH grant R01 HD29551-01-03 and a UNC Junior Faculty Development Award.
(Click here for tables 1 through 9 or here for the appendix tables)
I. Introduction
Empirical researchers examining simultaneous equation models with limited dependent variables face difficult tradeoffs among the precision of estimates, the sensitivity of results to distributional assumptions, and computational feasibility. Researchers often use maximum likelihood estimators based upon joint normality assumptions in these models, for these methods tend to yield relatively small estimated standard errors. Computational limitations and numerical inaccuracies, however, frequently force researchers to examine only low dimensional systems with the maximum likelihood approaches. Many researchers use two-stage estimation procedures. These approaches often relax some of the arbitrary normal distribution assumptions, and they tend to impose fewer computational burdens than maximum likelihood procedures. However, often it is not possible to adapt the two-stage estimators to more complex empirical models. In addition, these less demanding methods typically produce quite inaccurate results. See Heckman(1978), Maddala(1983), and Amemiya(1985) for detailed discussions of these estimation procedures.
Little work appears in the literature on empirical approaches that retain some of the precision of the maximum likelihood approaches, relax stringent distributional assumptions, and are computationally feasible in large systems. This paper addresses these issues directly. It describes a set of simple and numerically stable estimators that can serve as replacements for maximum likelihood estimators imposing joint normality. These estimators model the joint endogeneity of outcomes as arising from common unobserved factors. The multivariate normal distribution, for example, falls within this class.
Like Heckman and Singer's(1984) approach for modeling unobserved explanatory factors in hazard rate analyses, this approach assumes that these unobserved variables can be approximated by a discrete distribution. Extending the discrete factor framework to high dimensions is both straightforward and computationally feasible. This paper uses a set of Monte Carlo experiments to evaluate the performance of the discrete factor estimator and demonstrates the approach by examining the impact of marriage on men's wages.
The paper evaluates the performance of the discrete factor approximations in an econometric model where a continuous outcome depends upon an endogenous dummy variable. Researchers have used such models to examine, for example, the impacts of training programs on wages, the effects of living in a single parent household on children's school performance, union wage effects and, as in the example presented below, the impact of marriage on wages. Heckman(1978) contains a thorough discussion of limited dependent variable models, including the one examined here, under the assumption of joint normality. For excellent reviews of applications of limited dependent variable models see Heckman and MaCurdy (1981,1986), Killingsworth (1983), Maddala(1983), and Killingsworth and Heckman(1986).
Several researchers have examined the consequences of imposing arbitrary normality assumptions in the context of Tobit models. See, for example, Hurd(1979), Arabmazar and Schmidt (1981,1982), Goldberger(1983), and Paarsch(1984). Both Lee(1982,1983) and Heckman and MaCurdy(1986) have proposed some simple expansions that relax the normality assumption in the context of the sample selection model, but we know of no work that provides a simple and general framework for dealing with general specifications of endogenous variables in mixed continuous-discrete distributions without imposing strong distribution assumptions.
More recently, econometricians have developed several semiparametric estimators for discrete choice and sample selection models. For examples, see Cosslett(1983), Powell(1987), Robinson(1988), and Ahn and Powell(1993). This paper does not evaluate these approaches for a variety of reasons. Most of these semiparametric procedures cannot be extended in a straightforward manner to complicated systems with both discrete and continuous endogenous explanatory variables, and few applied researchers use these techniques. More importantly, the little evidence available on the empirical performance of these estimators suggests that they do not perform better in practice than the two-stage estimators we use as benchmarks for the performance of the approximation estimators. See, for example, Newey, Powell, and Walker's (1990) semiparametric evaluation of Mroz's (1987) parametric estimates of female labor supply functions.
Heckman(1981) describes the use of factor models in discrete panel data, and Heckman and Willis(1978) use a parametric assumption on an unobserved factor in a study of women's sequential labor force participation decisions. Versions of the discrete factor models similar to those used here have been applied by Keane, Moffitt, and Runkle(1988) to control for sample selection biases when estimating wage equations; by Gritz(1993) in a study of the impact of job training programs on wages; by Mroz and Weir(1990) in an analysis of the impact of the number of surviving children on a couple's propensity to regulate fertility; by Kochar(1991) in a study of the access to formal and informal credit markets in India; by Blau(1994) in a study of retirement behavior; and by Narendranathan and Elias(1993) in a study of youth unemployment.
The results of the Monte Carlo analysis presented here suggest that discrete factor approximation models may be useful in a wide variety of situations. When the true distribution of the disturbances is joint normal, the discrete factor estimators compare favorably to the normal maximum likelihood estimators in terms of bias and Mean Square Error (MSE). This suggests that there may be little bias or efficiency loss by incorrectly assuming a discrete factor model when normality is true. When the true distribution of the unobservables is not normal, the discrete factor approximations perform better than maximum likelihood estimators (incorrectly) assuming joint normality in most of the cases we examined. The two-stage estimator always works well in terms of bias, but the empirical distributions estimated in this study suggest that the two-stage estimator may often be too inefficient to be useful without large sample sizes.
II. Experimental Design
The process generating the outcomes in this paper is

We use a variety of procedures to estimate the impact of the
dummy variable. The first is OLS applied directly to equation
(1b), which ignores the possible endogeneity of the dummy
variable. The second method applies a probit procedure to
equation (1a). One then substitutes the predicted probability
that {d=1} into (1b), and estimates the transformed equation (1b)
by OLS. This method yields consistent estimates when $\epsilon_1$ is
normally distributed. The third method uses a maximum likelihood
estimator based upon the assumption of joint normality of the
disturbances $\epsilon_1$ and $\epsilon_2$. When the disturbances are normally
distributed, the maximum likelihood estimator will be
asymptotically efficient. But when the disturbances are non-normal, these maximum likelihood methods typically will not yield
consistent estimates. The final group of procedures uses
discrete factor approximations to control for the endogeneity of
the dummy explanatory variable. In most experiments we focus on
procedures with two, three, and four points of support for the
distribution of a discrete unobserved factor.
III. Overview of the Discrete Factor Method
These discrete factor models are, in spirit, identical to the semiparametric methods proposed by Heckman and Singer(1984) to control for unobserved heterogeneity in hazard rate models. Like Heckman and Singer, we derive the likelihood function for an observation's observed outcomes conditional upon the value of the unobserved factors (heterogeneity) and then integrate out over the distribution of the unobserved factors. By choosing a discrete distribution for these factors, the resulting unconditional distribution function falls in the class of mixture distributions. As the number of points of support for the discrete distribution grows large, this approach can approximate a "kernel" distribution for multivariate random variables.
III.1 The Basic Formulation
Suppose that one is interested in estimating a two equation model with homoscedastic error terms generated by the process
$$\epsilon_1 = \rho_1 v + u_1, \qquad \epsilon_2 = \rho_2 v + u_2,$$
where u1, u2, and v are assumed to have mean 0, are mutually
independent, and are independent of the exogenous variables in
the model. One convenient interpretation of this formulation
considers v to be an unobserved variable that has a linear effect
on the outcomes influenced by these two disturbances. This
formulation places no substantive restrictions on the correlation
of $\epsilon_1$ and $\epsilon_2$. The strategy followed in this study assumes that u1
and u2 are, in addition, normally distributed. The discussion
below suggests ways to relax this assumption, but we do not
examine these more complicated estimators in the Monte Carlo
evaluation. Cameron and Taber (1994) provide identification
conditions and consistency proofs for the discrete factor
estimator when one knows the distribution of u1 and u2.
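Conditional upon the value taken by the factor v, the joint
distribution of $\epsilon_1$ and $\epsilon_2$ is given by
$$f(\epsilon_1,\epsilon_2 \mid v) = \frac{1}{\sigma_1}\,\phi\!\left(\frac{\epsilon_1-\rho_1 v}{\sigma_1}\right)\frac{1}{\sigma_2}\,\phi\!\left(\frac{\epsilon_2-\rho_2 v}{\sigma_2}\right),$$
where $\rho_1$ and $\rho_2$ are the loadings on v, $\sigma_1$ and $\sigma_2$ are the
standard deviations of u1 and u2, and $\phi$ is
the standard normal density function. If the cumulative
distribution function of v is F(v), then the unconditional
distribution of $\epsilon_1$ and $\epsilon_2$ is
$$f(\epsilon_1,\epsilon_2) = \int \frac{1}{\sigma_1}\,\phi\!\left(\frac{\epsilon_1-\rho_1 v}{\sigma_1}\right)\frac{1}{\sigma_2}\,\phi\!\left(\frac{\epsilon_2-\rho_2 v}{\sigma_2}\right) dF(v).$$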
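Suppose one assumes that v follows a standard normal
distribution. In this instance, the joint distribution of $\epsilon_1$ and
$\epsilon_2$ simplifies to a bivariate normal distribution with zero means,
variances $(\rho_1^2 + \sigma_1^2)$ and $(\rho_2^2 + \sigma_2^2)$, and covariance $\rho_1\rho_2$,
where $\rho_1$ and $\rho_2$ are the loadings on v in the two equations and
$\sigma_1$ and $\sigma_2$ are the standard deviations of u1 and u2. This
formulation, then, contains the standard bivariate normal
distribution as a special case.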
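The variance and covariance expressions for this special case can be checked by simulation. The following Python sketch is illustrative only: it assumes the linear factor structure in which each disturbance equals a loading times v plus independent normal noise, and the function names and the parameter values used in the usage note (loadings 0.5 and 0.8, noise standard deviations 1.0 and 0.6) are hypothetical choices, not values taken from the experiments in this paper.

```python
import random


def simulate_factor_disturbances(rho1, rho2, sigma1, sigma2, n, seed=0):
    """Draw n pairs (eps1, eps2) with eps_j = rho_j * v + u_j.

    Assumptions (illustrative): v ~ N(0, 1), u_j ~ N(0, sigma_j^2),
    and v, u1, u2 are mutually independent.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)        # common unobserved factor
        u1 = rng.gauss(0.0, sigma1)    # equation-specific noise, equation 1
        u2 = rng.gauss(0.0, sigma2)    # equation-specific noise, equation 2
        draws.append((rho1 * v + u1, rho2 * v + u2))
    return draws


def sample_moments(draws):
    """Return (var(eps1), var(eps2), cov(eps1, eps2)) using divisor n."""
    n = len(draws)
    m1 = sum(e1 for e1, _ in draws) / n
    m2 = sum(e2 for _, e2 in draws) / n
    var1 = sum((e1 - m1) ** 2 for e1, _ in draws) / n
    var2 = sum((e2 - m2) ** 2 for _, e2 in draws) / n
    cov = sum((e1 - m1) * (e2 - m2) for e1, e2 in draws) / n
    return var1, var2, cov
```

With loadings 0.5 and 0.8 and noise standard deviations 1.0 and 0.6, the sample moments from a large number of draws should be close to the implied values 0.25 + 1.00 = 1.25, 0.64 + 0.36 = 1.00, and 0.5 × 0.8 = 0.40.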
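Like Heckman and Singer's(1984) proposal for dealing with unobserved heterogeneity in duration models, this paper assumes that the cumulative distribution of v can be approximated by a step function. In particular, suppose that the distribution of v is given by
$$\Pr(v = \eta_k) = p_k, \qquad k = 1,\dots,K, \qquad p_k \ge 0, \qquad \sum_{k=1}^{K} p_k = 1.$$
The joint distribution of $\epsilon_1$ and $\epsilon_2$ is then
$$f(\epsilon_1,\epsilon_2) = \sum_{k=1}^{K} p_k\,\frac{1}{\sigma_1}\,\phi\!\left(\frac{\epsilon_1-\rho_1 \eta_k}{\sigma_1}\right)\frac{1}{\sigma_2}\,\phi\!\left(\frac{\epsilon_2-\rho_2 \eta_k}{\sigma_2}\right). \tag{5}$$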
Examination of equation (5) reveals that the joint distribution is a weighted sum of products of univariate normal distributions. Everitt and Hand(1981) provides an excellent overview of finite mixture distributions of this type.
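As a rough illustration of this weighted-sum structure, the Python sketch below evaluates a finite mixture of products of univariate normal densities. It is a minimal sketch under stated assumptions: a single common factor with a finite number of support points, independent normal noise in each equation, and hypothetical names (`discrete_factor_density`, `points`, `probs`) that do not come from the paper.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def discrete_factor_density(e1, e2, rho1, rho2, sigma1, sigma2, points, probs):
    """Joint density of (eps1, eps2) under a discrete common factor.

    Assumptions (illustrative): eps_j = rho_j * v + u_j, u_j ~ N(0, sigma_j^2),
    and v takes the value points[k] with probability probs[k].
    """
    total = 0.0
    for eta, p in zip(points, probs):
        # Conditional on v = eta, the two disturbances are independent normals.
        f1 = normal_pdf((e1 - rho1 * eta) / sigma1) / sigma1
        f2 = normal_pdf((e2 - rho2 * eta) / sigma2) / sigma2
        total += p * f1 * f2
    return total
```

With a single point of support at zero the function collapses to the product of two independent normal densities, and for any valid set of probabilities the density integrates to one over the plane.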
The Monte Carlo results discussed in Sections VII and VIII evaluate the use of the approximation in equation (5) under a variety of different distributions for the unobserved factor v, sample sizes, error variances, error correlations, and expected frequencies of the event {di=1}. The discrete factor, quasi-likelihood function for the model considered in this study is

III.2 Some Potential Extensions.
(A) Multivariate Models
Extending this discrete factor framework to higher
dimensions is straightforward. Suppose that the disturbance
vector $\epsilon$ has G elements. Similar to the bivariate case, let
each element of $\epsilon$ be approximated by
$$\epsilon_g = \sum_{j=1}^{J} \rho_{gj} v_j + u_g, \qquad g = 1,\dots,G,$$
where there are J common factors, the ug's are normally
distributed and the elements of (u1,...,uG,v1,...,vJ) are mutually
independent. In this case the joint distribution of $\epsilon_1,\dots,\epsilon_G$ is
given by

where

This formulation permits there to be a different number of points of support for each of the J factors, and it can readily be modified to allow for dependence among the Vj's. In all of these modifications, the joint distribution function is a weighted sum of products of univariate normal distribution functions.
The computational utility of this approach can be
demonstrated with a multinomial probit example. Suppose that
there are a large number of binary outcomes, and that each
outcome is generated by $\epsilon_g$ crossing a threshold. In addition,
suppose that the threshold for outcome g depends upon whether
some of the other outcomes take place. In general, a recursive
structure may be necessary to assure logical consistency in these
types of formulations. Under the assumption of joint normality
of the $\epsilon_g$'s, evaluation of the likelihood function of this model
would require high dimension integrals of the joint normal
distribution function. Both the Gibbs sampling approach (see,
for example, Geweke(1991)) and McFadden's(1989) simulated method
of moments estimators reduce the computational burdens of
evaluating the multivariate integrals, but these approaches
typically require the researcher to assume an arbitrary form for
the multivariate distribution.
If, instead of the joint normality assumption, one were to use a discrete factor approximation as in equation (6), then the evaluation of the likelihood function would require only weighted sums of products of univariate normal integrals. Such integrals can be approximated to a high degree of accuracy, and this type of formulation can exploit features of both parallel and vector processors. This is one class of models where discrete factor assumptions may make it possible for researchers to consider more complex interactions than have previously been feasible.
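The computational point can be made concrete with a small Python sketch. It is illustrative only and rests on simplifying assumptions that are not taken from the paper: a single common factor, unit-variance normal noise in each latent equation, and thresholds that enter through a fixed index for each outcome (so any recursive dependence on other outcomes would have to be built into `indexes` beforehand). All function and argument names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def joint_binary_probability(outcomes, indexes, loadings, points, probs):
    """Probability of a vector of binary outcomes under a discrete factor.

    Assumed model (illustrative): d_g = 1 if indexes[g] + loadings[g] * v + u_g > 0,
    with u_g ~ N(0, 1) independent across g, and v equal to points[k]
    with probability probs[k].
    """
    total = 0.0
    for eta, p in zip(points, probs):
        product = 1.0
        for d, index, rho in zip(outcomes, indexes, loadings):
            prob_one = normal_cdf(index + rho * eta)   # P(d_g = 1 | v = eta)
            product *= prob_one if d == 1 else 1.0 - prob_one
        total += p * product                           # weight by P(v = eta)
    return total
```

However many outcomes there are, each evaluation uses only univariate normal integrals: the cost grows with the number of outcomes times the number of points of support, not with the dimension of a multivariate normal integral.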
(B) Non-Normal Disturbances with Independence
One drawback of the formulation given in equation (5) is that it permits non-normality in both equations only when there is a non-zero correlation of the disturbances; the joint non-normality can only arise when the common factor enters both equations. A possible modification is to allow there to be three factors, one of which can be present in both equations. For example, suppose that

This formulation contains that in equation (5) as a special case,
and it also permits non-normality even when the disturbances $\epsilon_1$
and $\epsilon_2$ are independent. This formulation could result in
computational problems when $\epsilon_1$ and $\epsilon_2$ are independent, for all of
the terms relating to the common factor, VC, would then be
unidentified.
(C) Approximating Arbitrary, Homoscedastic Multivariate Distributions
A general formulation of the discrete factor representation of a bivariate distribution function is given by


This comparison of the discrete factor approximation to
multivariate kernel estimators illustrates the basic difference
between the approach suggested in this paper and standard
nonparametric density estimation. In the standard kernel
estimation approach, the number of points of support is set equal
to the sample size, the probability of each point of support
equals the inverse of the sample size, the locations of the
points of support are set equal to the outcomes $\epsilon_1$ and $\epsilon_2$, and the
bandwidths $\sigma_1$ and $\sigma_2$ are "fixed." Consistency of the estimator
of the density in this instance can be achieved by allowing the
bandwidths to approach zero slowly. When used in real
applications, the asymptotic results provide little guidance for
setting the bandwidths; researchers usually experiment with
various values until the estimates appear "well-behaved." See,
for example, Silverman's (1984) discussion of the choice of
bandwidths for kernel estimators and Park and Marron's (1990)
critique of bandwidth selection procedures.
For the estimation problems considered in this paper, one
often can observe only particular ranges for the random variable.
There is no exact solution for the disturbance vector even when
the true parameter values are known, so a standard kernel
estimation approach is not feasible here. Instead, this paper
suggests that the researcher experiment with the number of points
of support. The observed data determine the amount of smoothing
(the $\sigma_j$'s), the locations of the points of support (the vj's),
and the weight attached to each point of support (the pk's),
conditional upon each chosen number of points of support.
These factor models can also be adapted to allow for random coefficients in the economic model or other sources of heteroscedasticity. In some instances, the random coefficients may be correlated with the disturbances, as would be implied in many dynamic models with random coefficients. Mroz and Weir (1994) provide an example of how a discrete factor approach can model self-selected, random coefficients in longitudinal data models. As with the multinomial probit model discussed above, there have been no Monte Carlo evaluations of any of these extensions to the factor model estimators.
IV. Identification
The first identification issue concerns the location and
scale of the distribution function of V. When each equation
contains an intercept, then one must constrain arbitrarily the
location of the discrete distribution function. In practice it is
often easiest to set one of the points of support to zero. The
scale of the discrete factor is also underdetermined. One can
arbitrarily set one of the factor loadings to a non-zero constant
(e.g., $\rho_1=1$), or one can restrict the range of the points of
support for the discrete distribution function (e.g., $\eta_1=0$, $\eta_2=1$,
and $\eta_k \in (0,1)$ for k>2, where $\eta_k$ denotes the k-th point of support).
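The location and scale normalizations can be illustrated with a short Python sketch. It assumes, purely for illustration, that the factor enters each equation as an intercept plus a loading times the point of support; the function name and arguments are hypothetical. The sketch rescales the points of support to the unit interval and adjusts the intercepts and loadings so that every equation's contribution from the factor is unchanged.

```python
def normalize_support(points, intercepts, loadings):
    """Rescale support points to [0, 1] without changing the model.

    Assumed structure (illustrative): in equation g the factor contributes
    intercepts[g] + loadings[g] * points[k] when v sits at support point k.
    The smallest point is mapped to 0 and the largest to 1; the labels of
    the points are arbitrary, so their order is left untouched.
    """
    low, high = min(points), max(points)
    width = high - low
    if width == 0.0:
        raise ValueError("at least two distinct points of support are required")
    new_points = [(eta - low) / width for eta in points]
    # Absorb the shift into the intercepts and the stretch into the loadings.
    new_intercepts = [a + rho * low for a, rho in zip(intercepts, loadings)]
    new_loadings = [rho * width for rho in loadings]
    return new_points, new_intercepts, new_loadings
```

Because intercept plus loading times point of support is identical before and after the transformation, the likelihood cannot distinguish the two parameterizations, which is why one of them must be fixed arbitrarily.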
Besides the need to eliminate these trivially
underdetermined parameters, there are several substantive
identification issues. Suppose that the true distribution of the
unobserved common factors in equation (3) is standard normal. In
this instance the parameters of the factor model are
underidentified, for there are two parameters defining the
(single value) correlation of $\epsilon_1$ and $\epsilon_2$. The underidentification
is analogous to the identification problem due to rotations in
standard factor analysis models. In part it is also due to the
fact that convolutions of normal random variables remain in the
class of normal distributions. See Anderson and Rubin (1956) for
a discussion of identification in factor models when only the
first two moments of the distribution are of interest.
When the disturbances are non-normal, there may be fewer
identification problems than in the normal case. This is due to
the fact that convolutions of the unobserved factor and the
assumed normally distributed white noise terms (u1 and u2) fall
outside the class of normal distributions. In these non-normal
models the higher order moments are not necessarily determined
completely by the first two moments of the disturbances $\epsilon_1$ and
$\epsilon_2$, as is the case with normal disturbances and normal factors.
Even when all of the error components are normally distributed,
by choosing a finite number of points of support for the
distribution of the unobserved factor one might achieve
identification of all "parameters" in the factor model. If the
number of points of support for the unobserved factor
distribution grows large as the sample size increases, however,
the identification problems may reappear. It is important to
note that this form of underidentification typically has little
substantive importance, for the impacts of the covariates on the
outcomes will usually be identified. The underidentification
will, in general, impact only the estimators of the components of
the distribution of ($\epsilon_1$, $\epsilon_2$).
As in White's (1980) discussion of consistency in misspecified models, it is not clear how one should interpret the parameter estimates obtained from discrete factor approximations. In general the estimator will converge to a particular value in large samples. The relationship between these limiting values and the parameters of interest, however, is a complicated function of all of the parameters of interest, the true underlying joint distribution of the disturbances and the assumed exogenous variables, and the imposed distributional assumptions. The results of this Monte Carlo study, in conjunction with the Monte Carlo results reported in Mroz and Guilkey (1992), suggest that one can have some confidence in placing conventional interpretations on the parameter estimates obtained from these discrete factor approximations. These approximation estimators do appear to work well in a variety of situations. The Monte Carlo results suggest that they can help researchers avoid false inferences due to the imposition of incorrect joint distribution assumptions while providing relatively precise point estimates.
V. Practical Problems in the Estimation of the Factor Models
There appear to be three difficulties in estimating parameters based upon the quasi-maximum likelihood factor models. The first problem is the existence of multiple local optima. Our strategy is to choose a fairly extensive grid for starting values. In practice, we proceed in two stages. We first select a grid of 15 to 75 separate starting values for each maximization problem and find the best set of estimates for each replication (estimation) in each specification of the data generating process in the Monte Carlo study. Next, we take the entire set of estimates for a particular specification of the data generating process and use each of the "final" estimates in the set as an additional starting value. The results reported here are based upon 100 replications for each experiment, so well over 100 different starting values are used in each optimization problem. Our experiences suggest this is usually more than adequate for eliminating non-global optima.
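The logic of searching from a grid of starting values can be sketched in a few lines of Python. This is only a toy illustration: the simple hill climber stands in for whatever optimizer is actually used, the one-dimensional objective with two local maxima is a hypothetical stand-in for the quasi-likelihood, and all names are assumptions introduced here.

```python
def local_maximize(f, x0, step=0.5, tol=1e-8):
    """Crude one-dimensional hill climber: move while the objective improves,
    halve the step when neither neighbor is better, stop when the step is tiny."""
    x, fx = x0, f(x0)
    while step > tol:
        moved = False
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc > fx:
                x, fx, moved = candidate, fc, True
                break
        if not moved:
            step /= 2.0
    return x, fx


def multistart_maximize(f, starts):
    """Run the local search from every starting value and keep the best optimum."""
    results = [local_maximize(f, s) for s in starts]
    return max(results, key=lambda r: r[1])


def toy_objective(x):
    """Hypothetical objective with a local maximum near x = -0.96 and the
    global maximum near x = 1.04."""
    return -(x * x - 1.0) ** 2 + 0.3 * x
```

Starting the local search at -2.0 stops at the inferior optimum on the negative side, while `multistart_maximize(toy_objective, [-2.0, -1.0, 0.0, 1.0, 2.0])` recovers the global maximum. Feeding the best points from one problem back in as starting values for related problems follows the same pattern.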
The second problem arises in the discrete branch of the likelihood function. In some instances the best set of estimates implies that the Prob(d=1) has a point mass. In the two point factor model, for example, we sometimes find that
The third problem arises in higher point of support factor models. In a few instances the estimates imply, for example, that the three points of support are identical to two points of support. This also gives rise to a singular Hessian matrix. Again, one must use this as a pretest to indicate that a simpler factor model fits the data in order to obtain "standard error" estimators. Our evaluation of the standard error estimators incorporates both of these types of pretests.
VI. Data Generating Process
The majority of our experiments focus on what we consider
to be fairly typical scenarios for most micro econometric
studies. First, we use four sample sizes (1,000, 2,000, 3,000,
and 5,000) that roughly capture the range of sample sizes used in
many micro studies. Second, we set the pseudo-R2, defined by
Var(y1* - $\epsilon_1$)/Var(y1*) or Var(y2 - $\epsilon_2$)/Var(y2), to be approximately 0.20
in both equations. Third, we usually set the error correlations
to 0.33. Besides varying the sample sizes, we also undertake
some limited experiments with higher R2 values(0.33 and 0.50) and
error correlations (0.50). Appendix 1 describes the data
generating process in detail.
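The pseudo-R2 used in the design is the share of an outcome's variance that remains after the disturbance is subtracted. A minimal Python sketch of that ratio follows; the function names are hypothetical, and the calculation is only possible in a simulation setting where the disturbances are known.

```python
def variance(values):
    """Population variance (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def pseudo_r2(outcome, disturbance):
    """Var(y - eps) / Var(y), computed from simulated outcomes and the
    disturbances that generated them."""
    systematic = [y - e for y, e in zip(outcome, disturbance)]
    return variance(systematic) / variance(outcome)
```

When the systematic part and the disturbance are uncorrelated, the ratio equals the systematic variance divided by the sum of the systematic and disturbance variances, so a target such as 0.20 can be reached by rescaling either component.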
Most real economic applications of this class of econometric models contain numerous regressors, but it is not feasible to undertake detailed comparisons of the estimators within a high dimensional parameter space. We chose not to use real data to define the exogenous variables in this model, as we felt that the low dimensional parameter spaces we are forced to examine could not be suitably manipulated to approximate accurately "real" situations. We do, however, choose to use a distribution for the exogenous variables that roughly matches the distribution of education in the U.S. population. There is a rapid rise in the distribution function of the exogenous variables to a sharp peak(e.g., 12 years of school), followed by a fast drop(13-15 years), then a moderate peak(16 years), and rapid decline. To achieve this the exogenous variables are drawn from a skewed distribution (a convolution of a chi-square random variable and a compound normal random variable). The marginal distributions of the two exogenous variables, x and z, are identical, and the correlation between the exogenous random variables arises from correlations of the compound normal components. Figure 1 compares the empirical density of the standardized exogenous random variables and a normal distribution.
In most experiments we impose the condition that the exogenous variables have a 0.80 correlation coefficient. This high level of correlation seems appropriate, given that economic theories usually imply that nearly all exogenous variables influence both outcomes. We also undertake a more limited set of experiments where the exogenous variables in the two equations are identical. This corresponds to the case in which the researcher is unwilling to specify exclusion restrictions but is willing to achieve identification through functional form and distributional assumptions.
We consider two different frequencies for the occurrence of
the discrete events (E(d) $\in$ {0.50, 0.75}) and three distributional
classes for the bivariate distribution. In all instances the
correlation of the error terms and the non-normality of the
disturbances are generated through the unobserved factors. For
normal disturbances, the unobserved factors are standard normal
random variables. We use two different methods to generate non-normal factors. The first uses a continuous uniform
distribution, and the second uses a skewed distribution. The
skewed factors come from a mixture of three normal distributions
with unequal means and variances. Figure 2 contains a comparison
of the standardized skewed factor distribution and a standard
normal distribution.
Note that the disturbances considered in this study never fall within the class defined by the discrete factor model in equation (5) when the number of points of support is finite. This means that the discrete factor estimator is always incorrectly specified. We consider the fact that this "incorrectly specified" estimator performs quite well under a variety of distributional assumptions to be one of the most attractive features of the approach.
VII. Monte Carlo Results: Biases and Mean Square Errors
We begin our discussion of the Monte Carlo experiments by
examining the performance of the estimators when the true data
generating process has bivariate normal disturbances. In this
instance normal maximum likelihood is consistent and achieves the
Cramer-Rao lower bound; the two stage estimator is consistent;
and the discrete factor estimators are inconsistent. We examine
four different sample sizes, two frequencies of the E(d), two
different specifications of the correlation of disturbances, and
three different R2's. Still, in the context of normal
disturbances, we examine the consequences of not imposing
exclusion restrictions across equations by allowing the same
exogenous regressor to influence both the continuous and discrete
outcomes. We next examine briefly the performance of the
estimators when the disturbances are symmetric but not normally
distributed. Finally, we examine the performance of the
estimators when there are unobserved skewed factors.
VII.1 Normal Distributions
Table 1 contains summary statistics from the Monte Carlo
experiments for our baseline specification: R2=0.20 in both
equations, error correlation =0.33, and regressor
correlation=0.80. It contains only information about the
parameter
2, the impact of the dummy endogenous variable on the
continuous outcome, the true value of which is 1.00. The left
side of the table uses an expected frequency of the discrete
event of 0.50, and the right side has an expected frequency of
0.75. The four horizontal panels present results for sample
sizes 1,000, 2,000, 3,000, and 5,000. All statistics are based
upon 100 replications of each experiment. A similar format is
used in all tables. To provide an indication of the large sample
bias of the estimators, Appendix Table A.1 contains estimates
from a single replication with a sample size of 100,000 for most
of the data generating processes we examine.
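The summary statistics compared across estimators are the bias and the mean square error over replications. The Python sketch below shows one way to compute them; it is illustrative only, the function name and dictionary keys are hypothetical, and the variance uses the divisor n so that the MSE equals the squared bias plus the variance exactly.

```python
def summarize_replications(estimates, true_value):
    """Bias, variance, and MSE of an estimator across Monte Carlo replications."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    variance = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {"mean": mean, "bias": bias, "variance": variance, "mse": mse}
```

The decomposition explains how an estimator with little bias can still have a large MSE when its sampling variance is large, and how a slightly biased estimator can have a smaller MSE.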
Ignoring the endogeneity of the dummy endogenous variable (OLS, the first row of each panel) yields a significant bias, with the average point estimates being 120 to 130% larger than the true parameter value. The two-stage estimator, normal MLE, and the three and four point of support discrete factor model (DFM) estimators all appear to have little bias. The two point of support DFM estimator does have appreciable bias, but the bias is only half as large as that found with the OLS estimator. Given this bias, we focus mainly on the three and four point of support models in the discussion below.
Not surprisingly, the normal MLE has the smallest MSE of all estimators for all specifications considered in this table. The MSE for the two-stage estimator is appreciably larger than that for the normal MLE. In terms of the MSE, the two-stage estimator appears to perform about the same as the three point of support estimator and slightly better than the four point of support estimator when E(d)=0.50. At sample size 5,000, however, it has a larger MSE than both of these discrete factor estimators. At E(d)=0.75, only for sample size 1,000 does the two-stage estimator have a smaller MSE than either of these two discrete factor estimators. With an unequal split of the endogenous discrete event (E(d)=0.75), the discrete factor models appear to outperform the two stage estimator.
Table 2 contains summary statistics when the R2 in each equation is 0.33 instead of the 0.20 examined above, and Table 3 considers the case where the R2 are 0.50. All other aspects of the data generating processes are the same as in Table 1, including the normally distributed disturbances and the 0.33 error correlation. As expected, all estimators perform better in terms of MSE with these higher R2. In general the comparisons of the estimators are quite similar to those found in Table 1. The two-stage and the three and four point of support estimators show little bias, and the normal MLE typically provides the smallest MSE.
There are, however, two important exceptions. First, in only one of the sixteen specifications examined in Tables 2 and 3 does the two-stage estimator have a smaller MSE than either the three or four point of support discrete factor estimators. This happens despite the fact that the two-stage estimator typically has a smaller bias than either of these two inconsistent estimators. Second, the performance of the three and four point of support estimators relative to the normal MLE appears to improve at higher R2. In several instances the three point of support estimator has a smaller empirical MSE than the efficient MLE, and the four point of support estimator often has MSEs only 0-15% higher than the normal MLE. Tables 2 and 3 indicate that the higher the explanatory power of the model the less one should rely upon inefficient two-stage estimators and, especially at larger sample sizes, the smaller the advantage of the efficient MLE over either the three or four point of support estimators.
Table 4 has a data generating process identical to that used in Table 1, except that there is an error correlation of 0.50 rather than 0.33. One important result of the higher error correlation is the increase in the bias of the discrete factor estimators. Except at the smallest sample size, the bias is still relatively small for the four point of support model. We also examined discrete factor approximations with five, six, and seven points of support for some of the specifications in Table 4. The bias decreased appreciably as we added more points of support.
Again, the two-stage estimator performs quite poorly in terms of MSE. The ratio of its MSE to that of the normal MLE ranges from 1.64 to 3.69, and in all four specifications with E(d)=0.75 the ratio is above 2.30. Compared to the three and four point of support estimators, the two-stage estimator also appears deficient in terms of MSE, despite the fact that its bias is much smaller than the bias of the three point of support estimator. In only one of eight cases does the two-stage estimator have a smaller MSE than either the three or the four point of support estimators (16 comparisons, two for each of eight cases). In this one instance, its MSE is only 4% smaller than the MSE for the three point of support model. For the specifications with E(d)=0.50, the four point of support model has only a 16-35% larger MSE than the efficient normal MLE. At sample size 5,000 its MSE is less than 16% larger than that of the normal MLE. The relative performance of the four point of support model does fall appreciably at smaller sample sizes when E(d)=0.75.
Table 5 uses the same data generating process used in Table 1 with one important exception, namely that exactly the same exogenous regressor is used in both the discrete and continuous outcome equations. Even though all estimators are identified through the linearity and distributional assumptions, the performance of every estimator in terms of MSE deteriorates dramatically in this instance. This is especially true for the two-stage estimator despite the fact that it typically has the smallest bias of any of the six estimators. In fact, the naive OLS estimator appears to have the smallest MSE on average. Table A.2 contains estimates for specifications with skewed error distributions and no exclusion restrictions. With no exclusion restrictions and skewed disturbances, each of the discrete factor models usually has an MSE at least 50% smaller than those found for the OLS, two-stage, and normal MLE estimators. With weak instruments it appears that the discrete factor models outperform the other estimators attempting to control for endogeneity, unless one knows the class of the joint error distribution and uses the appropriate maximum likelihood estimator.
VII.2 Non-Normal Disturbances
Table 6 uses the same data generating process as in Table 1 except that the unobserved factor giving rise to the error correlation follows a uniform distribution. This specification provides a more platykurtic error distribution while retaining symmetry. In this instance, each of the six estimators we examine is asymptotically biased, but of the estimators that attempt to control for endogeneity, only the two point of support estimator appears to have much bias. The major difference from Table 1 is that the bias of the two point of support estimator falls considerably in each of the eight specifications. This bias reduction results in the two point of support estimator having the smallest MSE in half of the eight specifications. Each of the other four estimators that attempt to control for the non-zero error correlation has little bias, and the bias of the OLS estimator is approximately the same as it is for normal distributions.
The normal MLE still performs quite well with these symmetric, non-normal disturbances. The relative performance of the two-stage estimator and the four point of support estimator also appears to be about the same. The two-stage estimator has smaller MSE's than the four point estimator for E(d)=0.50 at small sample sizes. At larger sample sizes or for E(d)=0.75 the four point of support estimator tends to have smaller MSE's. In seven of the eight specifications, the MSE increases as one uses a discrete factor model with more points of support, though at larger sample sizes the efficiency loss with additional points of support is usually small. This suggests that it may be useful to consider more complex models with larger sample sizes.
Table 7 summarizes the Monte Carlo experiments for skewed distributions at sample size 3,000. The first panel in Table 7 corresponds to the specification in Table 1, except that there are skewed disturbances. Panels 2 through 4 similarly correspond to Tables 2 through 4, respectively. Results for all four sample sizes are in Appendix Tables A.3-A.6.
The baseline case examined in the first panel indicates substantively different performances by the estimators in the presence of skewed disturbances. The most noteworthy change is the considerable bias in the normal based MLE when E(d)=0.75. Even when the normal based MLE appears to have little bias (e.g., when E(d)=0.50, with low correlation), its MSE is larger than that of the four point of support estimator. As with symmetric disturbances, the two-stage estimator always has little empirical bias, but its MSE is often 50% to 200% larger than the MSE of either the three or four point of support models.
Panels 2 and 3 in Table 7 investigate the consequences of higher R² on the estimators. The bias of the normal based MLE does diminish considerably with more explanatory power, but it is still fairly large when E(d)=0.75. The three and four point of support estimators still appear to be superior to the two-stage estimator and the normal MLE in terms of MSE.
The final panel of Table 7 examines the consequences of higher error correlations with skewed distributions, and the evidence clearly points out the superior performance of the three and four point of support estimators. Normal MLE is severely biased when E(d)=0.50. When E(d)=0.75, it yields large, incorrectly signed estimates that, with absolute t-statistics exceeding 10, a researcher would conclude are quite significant. The MSE of the two-stage estimator is three to six times that of the four point of support estimator. At smaller sample sizes it can have an MSE more than 15 times larger than the four point of support model (see Appendix Table A.5).
At larger sample sizes with skewed distributions the relative performance of the two-stage estimator does improve substantially. But even at sample size 5,000 its best relative MSE is 36% larger than that of the four point model (range 36% - 285% larger MSE for n=5,000 with skewed distributions). This improvement in relative performance appears to be due to the fact that the discrete factor models we examined are somewhat biased even at four points of support. In a few instances we examined discrete factor models with five, six, and seven points of support, and typically the bias and the MSE declined as we added additional points of support.
In the presence of skewed disturbances neither the normal based MLE nor the two-stage estimator performs well when compared to the discrete factor model with four points of support. Normal based MLE can be extremely biased, with the bias depending upon the correlation of disturbances, the R², and E(d). As in all the symmetric disturbance experiments, the two-stage estimator performs well in terms of bias with skewed distributions. It is, however, fairly imprecise when compared to the discrete factor models.
VIII. Monte Carlo Results: Choosing the Number of Points of Support
Little research has been done on selecting the number of points of support for discrete factor distributions in finite samples, and we use our Monte Carlo experiments to help shed light on this issue. Based upon a Mean Square Error metric, we find that one should use a fairly liberal criterion for adding additional points of support. It appears, for example, that both the Schwarz and the Akaike Information Criterion (AIC) lead one to choose too few points of support, especially in small samples. We also evaluate the size of the confidence intervals for these estimators in the presence of pretests for the selection of the model.
The primary approach we consider for selecting the number of support points is based on an examination of the increase in the value of the quasi-likelihood function when one adds an additional point of support. This is a strict upwards-testing approach, but it seems to correspond to the approach used by many empirical researchers when deciding whether to use more complicated empirical models. We start with a one point of support model, which corresponds to an OLS estimation of (1b) and an independent probit estimation of (1a). We compare its likelihood function to that of a discrete factor model with two points of support. The two point of support model adds three parameters (two factor loadings and one discrete probability), and we use a likelihood ratio "Chi-Square" test with 3 degrees of freedom at significance level α to determine the rejection or acceptance of the model with one point of support. If we accept the simpler model at significance level α, we do not consider more complicated models.
If we reject the one point of support model, we perform a Chi-Square test for whether one should reject or accept the two point of support model when compared to a three point of support model. This test has 2 degrees of freedom (one additional point of support and one additional discrete probability). We carry out the test at the same significance level used to test for the "significance" of adding a second point of support. If we reject this two point of support model in favor of the three point of support model, we then consider whether to accept the three point of support model or the four point of support model. We use the same approach and significance level α to choose between these two models.
Due to computational constraints we examine discrete factor models with more than four points of support in the Monte Carlo experiments in only a few instances. In all cases presented here, unless noted otherwise, a rejection of the three point of support model means that we use the four point model as the preferred point estimate without further upwards testing. We use the Monte Carlo experiments to examine the choice of significance level for the bias, mean square error, and performance of the estimated standard errors for the estimator based upon these pre-test criteria.
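The upwards-testing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the experiments: the function names, the argument layout, and the example log-likelihood values are hypothetical, and the Chi-Square tail probabilities are computed from closed forms for 1, 2, and 3 degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a Chi-Square variate (df = 1, 2, or 3 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1, 2, 3 are implemented in this sketch")

def select_points_of_support(loglik, alpha, max_points=4):
    """Strict upwards testing on maximized log quasi-likelihood values.

    loglik[k - 1] is the value for the model with k points of support
    (hypothetical input layout). Returns the selected number of points.
    """
    k = 1
    while k < max_points:
        # Moving from one to two points adds three parameters;
        # each later point adds two (a location and a probability).
        added = 3 if k == 1 else 2
        lr = 2.0 * (loglik[k] - loglik[k - 1])
        if chi2_sf(lr, added) > alpha:
            break  # accept the simpler model and stop testing upwards
        k += 1
    return k

# Hypothetical log quasi-likelihood values for 1, 2, 3, and 4 points.
example = [-1000.0, -990.0, -988.5, -985.0]
print(select_points_of_support(example, alpha=0.05))  # 2
print(select_points_of_support(example, alpha=0.25))  # 4
```

In this made-up example a 5% rule stops at two points of support because the gain from a third point has a nominal p-value of about 0.22, while a 25% rule continues to the four point maximum.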
Under the null hypothesis that the smaller number of points of support is the true model, the Hessian matrix for the alternative model (i.e., additional points of support) is singular. Consequently, the likelihood ratio test statistic does not follow an asymptotic Chi-Square distribution under the null. We examine the performance of this "invalid" test statistic because it is simple to calculate and because it is quite similar to the Akaike and Schwarz test statistics. Also, it appears to work quite well.
We did consider a second approach for choosing the number of points of support. This approach examines whether the coefficient on the dummy endogenous variable changes "significantly" as one adds additional points of support. The intuition behind this test is whether allowing for a more complex discrete approximation to an underlying continuous distribution function has an appreciable impact on the parameter of interest. To do this we construct the joint covariance matrix for all parameters in the one, two, three, and four point of support models (see Mroz, 1987) and carry out "t-tests" of no significant change in this coefficient when adding each additional point of support. Again, we use a strict upwards testing criterion for selecting the number of points of support and use the same significance level at each step of the upwards testing procedure. This approach performed comparably to the likelihood ratio test in the Monte Carlo experiments. In the empirical example, however, this approach performed much worse than the likelihood value approach. We do not recommend its use unless one allows for non-normal disturbances without endogeneity.
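A minimal sketch of such a "t-test" follows, assuming that the joint covariance matrix supplies the variance of each estimate and the covariance between the estimates from the two models. The function name and all numbers are hypothetical, and the statistic is simply the standard one for a difference of two correlated estimates; the text does not spell out the exact formula used in the experiments.

```python
import math

def change_t_stat(b_small, b_large, var_small, var_large, cov):
    """t-statistic for no change in a coefficient between two fitted models.

    b_small, b_large: estimates with fewer and with more points of support.
    var_small, var_large, cov: entries of their joint covariance matrix.
    """
    var_diff = var_small + var_large - 2.0 * cov
    if var_diff <= 0:
        raise ValueError("estimated variance of the difference is not positive")
    return (b_large - b_small) / math.sqrt(var_diff)

# Hypothetical values: the estimate moves from 1.20 to 1.05.
t = change_t_stat(1.20, 1.05, var_small=0.010, var_large=0.012, cov=0.008)
print(round(t, 2))
```

Because the two estimates come from the same data, ignoring the covariance term would typically overstate the variance of the difference and make the test too conservative.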
We also use similar upwards testing criteria for evaluating whether one should use the OLS estimate of equation (1b) instead of either the two stage estimator or the normal based maximum likelihood estimator. For the test of OLS versus the two stage estimator, we use a Durbin-Wu-Hausman test of whether the predicted probability significantly enters an OLS regression after controlling for the impact of the dummy variable. The standard errors we use control for the pre-estimation bias. For deciding whether to use OLS or the normal based maximum likelihood estimator we use a standard likelihood ratio test of the null hypothesis that the correlation coefficient equals zero. The Monte Carlo experiments help us to evaluate the performance of various significance levels as a metric for determining the types of controls one should use, if any, to control for endogeneity.
We also consider four specifications of the data generating process where the error correlation is zero when we present results based upon data driven criteria for selecting the number of points of support and for deciding whether to control for endogeneity (E(d)=0.50 and 0.75 with normal errors, and E(d)=0.50 and 0.75 with skewed errors in the probit). In these four cases endogeneity is not a problem. Appendix Tables A.7 and A.8 contain details on the Monte Carlo results with these data generating processes. In general, the discrete factor models have higher MSE than the other estimators when there are no endogeneity problems. This is not surprising, as the discrete factor models add at least three new parameters to control for endogeneity while the two stage and normal MLE each add only one new parameter.
As discussed in Section IV, in some instances the calculated Hessian matrix for the discrete factor model can be singular. When we encounter situations where this happens, we impose constraints on the parameters determining the discrete factor distribution so that the singularity disappears. All standard error estimators used in this study use this pre-test when constructing covariance matrices for the discrete factor models.
VIII.1 Biases and Mean Squared Errors
To evaluate the performance of these simple rules for selecting the number of points of support, Figures 3a - 3d present the average point estimates for each estimation procedure for particular sample sizes and frequencies of the event {d=1}. Each point on the graph is an average of 1000 estimates, 100 from each of ten different specifications of the data generating process. These ten specifications are those in Tables 1, 2, 3, and 4, and from Appendix Tables A.3-A.8. To conserve space, we only present these graphs for sample sizes 1000 and 5000.
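For concreteness, the bias and mean square error summaries used throughout can be computed from a set of Monte Carlo estimates as in the sketch below. The function name and the three sample estimates are hypothetical; the true value of 1 is the one used in all of the data generating processes.

```python
def bias_and_mse(estimates, true_value=1.0):
    """Empirical bias and mean square error of a list of point estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    mse = sum((b - true_value) ** 2 for b in estimates) / n
    return mean - true_value, mse

# Three made-up replications; an actual graph point averages 1000 estimates.
bias, mse = bias_and_mse([0.9, 1.1, 1.3])
print(round(bias, 3), round(mse, 4))  # 0.1 0.0367
```

The MSE equals the squared bias plus the variance of the estimates, which is why an estimator with little bias can still have a large MSE when it is imprecise.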
The horizontal axis on each graph measures the "significance level" α used to decide whether to accept a simpler model according to the upwards testing criteria discussed above. At α=100, for example, the maximum likelihood approach always allows the error correlation to be different from zero; the two stage approach always uses the predicted value of the P{d=1} instead of the actual value of the dummy variable; and the discrete factor models always use four points of support. At α=0 one would always accept the simplest model for each approach; this is the OLS estimate for each of the approaches we consider. For interior points, the level of α determines: (1) whether the two stage approach uses the predicted values of the dummy variable instead of just a simple OLS estimation with the actual value of the dummy variable, (2) whether the maximum likelihood approach permits the error correlation to be different from zero, and (3) the number of points of support to use for the discrete factor models. To retain more detail in the graphs we do not include the point α=0, but the average value corresponding to the OLS estimation is reported in each graph's lower title. The solid horizontal line at value 1 indicates the true value of the parameter in all data generating processes.
Figures 3a and 3b use E(d)=0.50, and Figures 3c and 3d use E(d)=0.75. Figures 3a and 3c have sample sizes of 1000, while Figures 3b and 3d have sample sizes of 5000. Note that the vertical scale is different for each of the four graphs. Looking first at E(d)=0.50, we see that the mean estimates for the most complex models for each approach (at α=100) are quite close to the true value. There is a slight indication of negative bias for the normal maximum likelihood estimator, and this is due nearly entirely to the data generating processes with skewed disturbances. At E(d)=0.75 (Figures 3c and 3d), we see much the same behavior, except that the normal maximum likelihood estimator exhibits a substantial negative bias.
These figures provide key insights about the significance level one should use for deciding whether to control for endogeneity of the dummy explanatory variable. They also provide guidance for deciding how many points of support to use with the discrete factor model. For small sample sizes, typical significance levels of 5 or 10 percent for deciding whether endogeneity is an important concern would yield fairly large biases for all estimation approaches. While not displayed directly in these graphs, this is even true for the normal maximum likelihood estimator when all disturbances are truly normally distributed.
These figures do suggest that one should use a fairly liberal criterion for deciding whether to use more complex estimation procedures to control for endogeneity. At small sample sizes, a 25% test to decide whether to control for endogeneity would eliminate most of the biases in the two stage procedure; this is also the case for the normal maximum likelihood estimator when the disturbances are joint normal. For the discrete factor estimator it appears that one should use at least a 25% significance level for deciding whether to add additional points of support to the discrete factor distribution. In none of the examples we consider did there appear to be much evidence that one should consider significance levels higher than 50%.
Note that the AIC would suggest using too small a significance level for deciding whether to consider more complex models. For the normal maximum likelihood approach the AIC value would imply a likelihood ratio "test" significance level of about 16%. For the discrete factor models the AIC would imply a significance level of 11.2% for deciding whether one should control for any form of endogeneity, and a significance level of 13.5% when deciding whether to add additional points of support. Note that the Schwarz criterion would yield considerably more bias in the point estimates than the AIC, as it is more conservative than the AIC for deciding when to consider more complex models.
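These implied significance levels can be reproduced with a short calculation. The sketch below is illustrative only and assumes the usual forms of the criteria: the AIC prefers the larger model when twice the gain in log likelihood exceeds twice the number of added parameters, and the Schwarz criterion replaces the factor 2 with the log of the sample size. The function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a Chi-Square variate (df = 1, 2, or 3 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1, 2, 3 are implemented in this sketch")

def aic_implied_level(added_params):
    """Likelihood ratio 'test' level that reproduces the AIC choice."""
    return chi2_sf(2.0 * added_params, added_params)

def schwarz_implied_level(added_params, n_obs):
    """Likelihood ratio 'test' level that reproduces the Schwarz choice."""
    return chi2_sf(added_params * math.log(n_obs), added_params)

print(round(aic_implied_level(1), 3))  # 0.157: normal MLE, one added parameter
print(round(aic_implied_level(3), 3))  # 0.112: first added point of support
print(round(aic_implied_level(2), 3))  # 0.135: each further point of support
print(schwarz_implied_level(1, 1000) < aic_implied_level(1))  # True
```

At the sample sizes considered here the log of the sample size is well above 2, so the Schwarz criterion always corresponds to a smaller implied significance level than the AIC.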
Figures 4a-4d display the empirical mean square errors of the estimation approaches. These figures follow the same format as Figures 3a-3d; the horizontal axes here also indicate the "significance levels" used to decide whether one should use a more complex estimation approach. Figures 4c and 4d do not include the MSE from the normal maximum likelihood estimator because it is so large. The graph titles display the smallest MSE across all α levels for normal MLE as well as the MSE for the OLS estimator. At all levels greater than 5% there is not a single instance where either the normal likelihood approach or the two stage approach has a smaller MSE than the discrete factor model. This statement is also true for the sample sizes 2,000 and 3,000.
With several of the data generating processes it appeared that discrete factor models with more than four points of support were needed to fit the data well. In a couple of instances we allowed for up to seven points of support for the discrete factor models. In those instances we followed the same approach as that used in Figures 3 and 4 to select models. There was a fairly large decline in both the bias and the MSE for the discrete factor models when we expanded the maximum number of points of support from four to seven. Consider the model with E(d)=0.75, skewed disturbances, and 5,000 observations (as in Table A.6, bottom panel). When we allowed for a maximum of seven points of support instead of a maximum of four, at an upwards testing level of 0.25 the mean point estimate fell from 1.22 to 1.10, and the MSE fell by nearly 40%, from 0.056 to 0.036. At an α level of 0.50 the mean point estimate was 1.015 with an MSE of 0.025, compared to a point estimate of 1.22 and an MSE of 0.054 when there is a maximum of four points of support. This suggests that our assessments in Figures 3 and 4, based on a maximum of four points of support, understate the advantages of the discrete factor model over either the two stage or the normal maximum likelihood approach.
The evidence from Figures 3 and 4 presents a very strong case for using the discrete factor models in this setting. Provided that one uses at least a 25% "significance level" for deciding whether to add additional points of support, there is little bias in the discrete factor models. Even in the worst case the bias is less than 15%, and this falls off quickly as the sample size increases. The mean square error graphs also indicate that using an upwards testing α-level of 25 to 50% for selecting all models will tend to yield the smallest mean square errors. The performance of the discrete factor models in terms of mean square error is quite remarkable. This approach appears to dominate the only slightly biased two stage estimator at all but the smallest significance levels for choosing more complex models.
VIII.2 Size Tests
To examine the performance of the standard error estimators, we compare the number of rejections of the null hypothesis that the coefficient on the dummy endogenous variable equals 1 in favor of the alternative that it differs from 1. To carry out these hypothesis tests we incorporate the upwards testing approach for deciding whether to control for endogeneity and, in the case of the discrete factor model, whether to add additional points of support. Based on the implications from Figures 3 and 4, we use a 25% significance level for deciding whether to use the simpler model. At smaller significance levels there are large biases for all of the estimation procedures, resulting in excessive rejections of the null hypothesis. The test statistic we use for evaluating this hypothesis is a t-test, where the standard error estimator is adjusted for possible model misspecifications.
Figures 5(a)-5(d) contain graphs of the fraction of rejections by estimation approach against the requested size of the test, for sample sizes 1,000 and 5,000 and E(d)=0.50 and E(d)=0.75. Each graph is based on 100 replications of the ten different data generating processes listed above. At small sample sizes, the null hypothesis is rejected too frequently for each of the estimation methods. This is especially true at E(d)=0.75. At larger sample sizes, the two stage estimator's empirical size matches the theoretical values quite closely, but both the normal maximum likelihood and the discrete factor model exhibit a tendency to overreject the null hypothesis.
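The fraction of rejections plotted in such graphs can be computed as in the following sketch. It assumes a two-sided test with standard normal critical values, which is an assumption made here for illustration; the function name and the four sample replications are hypothetical.

```python
from statistics import NormalDist

def empirical_size(estimates, std_errors, nominal_size, null_value=1.0):
    """Share of replications in which a two-sided t-test rejects the null."""
    # Assumption: normal critical values approximate the t-test at these n.
    critical = NormalDist().inv_cdf(1.0 - nominal_size / 2.0)
    rejections = sum(
        abs((b - null_value) / se) > critical
        for b, se in zip(estimates, std_errors)
    )
    return rejections / len(estimates)

# Four made-up replications with a common estimated standard error of 0.1.
print(empirical_size([1.0, 1.3, 0.7, 1.05], [0.1] * 4, nominal_size=0.05))  # 0.5
```

A well-behaved standard error estimator attached to an unbiased point estimate would give a rejection fraction close to the nominal size; bias in the point estimate pushes the fraction above it.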
The poor size performance of the maximum likelihood and discrete factor estimators is due mostly to the bias in these two estimators. If we restrict the size comparisons to data generating processes with normal disturbances, the normal maximum likelihood estimator has much better size properties. In those instances where the discrete factor model with 4 points of support has little bias, its empirical size matches the theoretical size much more closely. Because of the number of pretests used for choosing the number of points of support, its empirical size is still a bit too large in these instances.
The poor size properties of the discrete factor models can be mitigated by allowing for more than four points of support. As above, we took the data generating process with the poorest performing discrete factor model at sample size 5000 and examined estimations that allowed for up to seven points of support. This is the data generating process used in Table A.6 with 5,000 observations. Table 8 contains the empirical sizes for the four point of support estimator, the seven point of support estimator, and estimators using the upwards testing approach with maximums of four and seven points of support. Only 100 replications are used to construct these size tests.
The empirical size of the discrete factor model's standard error estimator improves dramatically by allowing additional points of support. This is not surprising, given how often the four point maximum was a potentially binding constraint. In 72 out of 100 experiments the 25% upwards testing criterion was restricted from examining more than four points of support; for 92 cases out of 100 the 50% upwards testing criterion was restricted to at most a four point of support model. However, even when allowing for up to seven points of support the upwards testing approach yields a fairly large positive bias. This results in excessive rejections of the null hypothesis. Estimators using an upwards testing criterion to select the number of points of support falsely reject much more frequently. While the size performance of the best upwards testing estimator appears quite poor, it is important to recognize that this poor performance is due mostly to pretest bias.
To further place this poor size performance into context, the number of rejections in the last row of Table 8 is almost identical to the number of rejections for the two step estimator when one has a sample size of 1,000 for the same data generating procedure and uses a standard 5% test for deciding whether to control for endogeneity. In models that rely upon asymptotic expansions, it can be the case that the size properties of tests improve by allowing for data driven selections of the number of terms to include in the expansions. Eastwood and Gallant's (1991) Monte Carlo experiments show that the bias reductions obtained by using random rules (e.g., upwards testing) for choosing the number of terms in the expansion instead of fixed rules (e.g., determined by sample size only) often improve the size properties of the estimators.
IX. The Impact of Marriage on Wages
This example uses wage and demographic data for men aged 25-33 from the 1990 interview of the NLSY in conjunction with "attitudinal" data from the 1979 NLSY interview on marriage plans and views on the role of women and men in the family. We examine how marital status appears to impact the men's average hourly earnings for 1989, and we use the attitudinal data as instruments for predicting marital status. We focus on the White cross section cohort. Appendix 2 presents the sample selection criteria, the means of the data, and some estimation results. Overall, 927 of the 1678 young working men in this sample are married. The wage analysis drops 23 men with average hourly earnings below $2.00 or above $200.00. These observations are used in the marriage analysis.
Table 9 presents estimates of the "marriage effect" on log wages from the discrete factor model, normal maximum likelihood, and the two-stage procedure. The upwards testing criterion based on the value of the quasi-likelihood function for the discrete factor model suggests that one should use six points of support.
The point estimate from the six point discrete factor model is essentially 0, with a standard error of 0.048. There is a slight indication that the exclusion restrictions are not valid (p-value=.07) for this model, but relaxing these restrictions barely changes the point estimate (-.003) and the standard error (.043). Note that if one had used a standard 5% likelihood ratio test for adding points of support, one would have chosen a model with three points of support, and the estimated effect of marriage on wages would have been nearly 13%.
Normal maximum likelihood yields a point estimate of .02 with a much larger standard error (0.12) when the exclusion restrictions are imposed. When the exclusion restrictions are relaxed, the point estimate jumps dramatically to .63 with a standard error of .09. Most labor economists would agree that this is an absurd estimate. The two stage estimator behaves quite similarly to the normal MLE, except that the standard error increases to 2.79 for the model without the exclusion restrictions. The two stage estimator is almost completely uninformative here. If one had relied on a likelihood ratio test from the normal maximum likelihood estimator, most likely one would have concluded that endogeneity was not an important issue. In the specification with exclusion restrictions, the p-value for the likelihood ratio test is about 50%. Similarly, no test statistic from the two-stage estimator provided any indication of marriage being endogenous in the wage equation.
This example suggests that the likelihood function value approach for choosing the number of points of support for the discrete factor approximation is superior to an approach examining the point estimates for significant changes as one adds additional points of support. In this example, departures from normality of the log wage disturbances appear to be the most important problems with the goodness of fit for the model assuming independence (DFM 1). This is true even though the wage data have been trimmed to remove outliers. In fact, if one had relied upon a test of the significance of the change in the coefficient on the dummy endogenous variable to decide whether to add additional points of support, one would have concluded that endogeneity of marital status was not a problem.
The first few points of support appear to fill out the wage distribution to capture departures from normality. The two point of support model, for example, adds a point of support with weight 0.5% about 4 standard deviations away from the point of support with weight 99.5%. The three point of support model adds two points of support with combined weight of 2% about 2.5 standard deviations on either side of the point of support having 97.9% of the weight. Only with five or more points of support does the estimated correlation of the wage and marriage disturbances move appreciably from 0. The large increases in the value of the likelihood function that result from such low-weight mass points suggest that it is important to consider departures from normality when examining real data. The approach described in equation (7) provides a simple way to address this issue, but it has not been implemented in this example.
This real world example appears to mimic the Monte Carlo results with non-normal disturbances. The normal maximum likelihood estimates are somewhat unstable, and the two stage procedure is quite imprecise. The discrete factor model yields stable point estimates with relatively small standard errors.
It is useful to compare these estimates to those obtained by Korenman and Neumark (1991) in their examination of the impact of marriage on young men's wages. Korenman and Neumark use a "fixed effect" approach with longitudinal data to control for the endogeneity of marriage. They find that the estimated marriage effect falls from 12% when one ignores endogeneity to 6% when using the fixed effects model.
Their estimated effect is still much larger than the 0% effect found here. While it is not possible in this paper to evaluate precisely why their estimated marriage effects are so different from those found here, the differences may be due to the restrictive nature of the fixed effect estimator they use. In particular, Korenman and Neumark's fixed effect/first difference model assumes that there are no unobservable variables influencing both the change in marital status and changes in wages. This assumption rules out, for example, divorces resulting from temporary declines in wages as well as marriages that occur because one leaves school and starts to earn higher wages. Such assumptions seem quite unrealistic, and they might be leading to the larger marriage effects Korenman and Neumark find after "controlling" for the endogeneity of marriage through fixed effects models. In this instance, it could be the case that cross sectional data with careful controls for endogeneity yield better estimates than longitudinal data with simple fixed effect estimators.
VIII. Conclusions
These Monte Carlo results indicate that discrete factor approximations can provide reliable estimates in simple models with both continuous and discrete endogenous variables. The computational simplicity of the discrete factor approximations and their ability to yield interpretable estimates in a wide variety of circumstances suggest that these estimators may be important tools for empirical researchers. They appear to work as well in the simultaneous equation framework considered here as they work in hazard model analyses (see Heckman and Singer (1984) or Mroz and Weir (1990)). The approach appears to allow researchers to relax arbitrary distributional assumptions while retaining much of the efficiency of maximum likelihood estimators.
The Monte Carlo results we present, however, are fairly limited. We examine only one econometric model, relatively high error correlations, models with only one or two explanatory variables, and homoscedastic disturbances. The utility of the approach in a wider range of settings needs to be explored further.
Other Monte Carlo studies have examined the performance of the discrete factor estimators. Cameron and Taber (1994), for example, present Monte Carlo evidence on the performance of the discrete factor approach in a different context: endogenous selection on whether one observes a discrete outcome. Their focus is on the ability of the discrete factor models to control for selection biases in longitudinal data, rather than on the ability of these methods to control for endogeneity. There are only exogenous variables in their model. Their conclusions about the performance of this approach are much stronger than ours. Mroz and Guilkey (1992) examined selection models and continuous endogenous determinants of discrete events. While their Monte Carlo experiments were not as detailed as those presented here, they also found the discrete factor models to have small bias and mean square error.
We do little to address the question of whether one should control for endogeneity when it is not a problem. Although there is no evidence that any of the estimators we consider have appreciable biases, the efficiency losses from controlling for endogeneity when it is not present can be enormous. At sample size 2,000 with E(d)=0.50 and normal disturbances, for example, the OLS estimate has a MSE of only 0.01. If one uses a 10 percent test as evidence of endogeneity, the MSE increases by about a factor of 10 for all the approaches we considered. Using a 25% test yields MSEs about 15 times larger than that obtained from OLS. Translating these MSEs to standard errors of estimates implies that correctly imposing independence (OLS) would yield a standard error of the estimated effect of 0.10. This would rise to 0.32 with a 10% test, and to 0.39 with a 25% test. Needlessly controlling for endogeneity clearly yields a significant cost in terms of precision of the estimated effect.
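The standard errors quoted in this paragraph follow from taking square roots of the reported MSEs, given that none of the estimators shows appreciable bias. A minimal sketch of that arithmetic, using only the figures stated above (the variable names are illustrative):

```python
import math

# MSE of OLS at sample size 2,000, E(d) = 0.50, normal disturbances (from the text)
mse_ols = 0.01

# Approximate MSE inflation factors reported in the text
inflation = {
    "OLS (independence imposed)": 1,
    "10% test": 10,
    "25% test": 15,
}

# With negligible bias, MSE is approximately the variance, so se ~ sqrt(MSE)
std_errors = {name: math.sqrt(mse_ols * k) for name, k in inflation.items()}

for name, se in std_errors.items():
    print(f"{name}: standard error of about {se:.2f}")
```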
If, however, one does suspect that endogeneity is an important issue in models with continuous and discrete variables, then the results from this paper are quite useful. If one only cares about the bias of the estimates, two stage procedures appear to perform the best. An unbiased estimator, however, can easily yield estimates further away from the truth than an estimator with appreciable bias, and this can happen more frequently than not. If a researcher is confident of the true error distribution, nothing surpasses maximum likelihood. The cost of imposing incorrect assumptions to address endogeneity issues, however, can be greater than the cost of ignoring the endogeneity problems. If one desires to retain much of the efficiency of maximum likelihood estimators while guarding against biases caused by imposing incorrect distributional assumptions, then one should seriously consider using discrete factor approximations.
Disturbances
The disturbances in this Monte Carlo study are generated by
where u1 and u2 are independent draws from a standard normal distribution, and v is a mean 0 variance 1 random variable that is independent of both u1 and u2. The common factor in the disturbances, v, usually comes from a mixture of three normal random variables. Its distribution function is given by

The µ's and the other component parameters (indexed by k) are adjusted such that E(v) = 0 and Var(v) = 1. Note that a mixture with all of its weight on a single component will generate a standard normal distribution function.
The skewed factor distribution used in this study is obtained by setting

Figure 2 contains plots of the empirical distribution function for this choice of parameters for the common factor distribution and a comparison to a standard normal density. For the uniform factors, we use a mean zero, variance one continuous uniform distribution to generate the common factor.
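The normalization of the common factor to mean 0 and variance 1 can be sketched as follows. This is only an illustration: the mixing weights, component means, and component standard deviations below are hypothetical placeholders, not the values used in the study, and the function names are assumptions introduced here.

```python
import math
import random

def normalize_mixture(weights, means, sds):
    """Shift and rescale a normal mixture so it has mean 0 and variance 1."""
    m = sum(w * mu for w, mu in zip(weights, means))
    second = sum(w * (sd ** 2 + mu ** 2) for w, mu, sd in zip(weights, means, sds))
    scale = math.sqrt(second - m ** 2)
    return [(mu - m) / scale for mu in means], [sd / scale for sd in sds]

def draw_factor(rng, weights, means, sds):
    """Draw one v: choose a component by its weight, then draw from that normal."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])

# Hypothetical three-component mixture (NOT the paper's parameter values)
weights = [0.6, 0.3, 0.1]
means, sds = normalize_mixture(weights, [-0.5, 0.5, 3.0], [0.7, 0.7, 1.5])

# Analytic moments of the normalized mixture
mix_mean = sum(w * mu for w, mu in zip(weights, means))
mix_var = sum(w * (sd ** 2 + mu ** 2) for w, mu, sd in zip(weights, means, sds)) - mix_mean ** 2

rng = random.Random(0)
draws = [draw_factor(rng, weights, means, sds) for _ in range(5)]
print(mix_mean, mix_var, draws)
```

Normalizing the component parameters once, rather than rescaling each draw, keeps the factor a proper three-component normal mixture while guaranteeing the stated moments.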
The four parameters of the disturbance-generating equations (two pairs, each subscripted 1 and 2) are chosen such that



in all specifications.
Parameters
Let d and y2 in equation (1) be the observed outcomes in the model examined in this study. Here the continuous outcome depends upon an endogenous dummy shift. The quasi R2 for the y2 equation in (1) is defined as

Explanatory Variables
Ruud (1986) demonstrates that certain linearity conditions on the expectation of the explanatory variables alone can lead to consistent estimation of some functions of the parameters of interest when one arbitrarily assumes a joint normal distribution for the disturbances in mixed continuous discrete models. Joint normality of the regressors would satisfy these conditions. To rule out such possibilities, we generate the explanatory variables from a compound normal distribution contaminated by a normalized χ² random variable. We permit the two "exogenous" variables to be correlated, and we impose the restriction that the marginal distributions of these "exogenous" variables are identical. These marginal distributions do not depend upon the degree of correlation between the exogenous variables, provided that the joint distribution function, as defined by the following procedure, exists.
Let X and Z denote the explanatory variables. The data generating process for these two variables comes from the convolution of a normalized χ² variate and a compound normal random variable:

C1 and C2 are independent standardized χ² random variables

W1 and W2 are correlated compound normal random variables, and they are generated according to the following rules.
Let the vectors V1 and V2 be bivariate normal random variables, where

Throughout this study we use the following parameterization to generate the explanatory variables X and Z
11 = 1, µ1 = -2
22 = 4
q = .5
K = 2 (i.e., a Chi-square with 2 degrees of freedom)
r = .2
The exogenous variables X and Z have the same distribution, and they are normalized to have mean zero and variance one. Figure 1 plots the distribution function for the exogenous variables.
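The standardized χ² components C1 and C2 can be sketched as below. This assumes that "standardized" means centered at the χ² mean (K) and scaled by its standard deviation (the square root of 2K). The compound normal components W1 and W2 are not reproduced here, and the function name and seed are illustrative.

```python
import math
import random

K = 2  # degrees of freedom, as in the parameterization above

def standardized_chi2(rng, k=K):
    """One chi-square(k) draw, centered and scaled to mean 0 and variance 1."""
    # A chi-square with k degrees of freedom is Gamma(shape=k/2, scale=2),
    # which has mean k and variance 2k.
    raw = rng.gammavariate(k / 2.0, 2.0)
    return (raw - k) / math.sqrt(2.0 * k)

rng = random.Random(12345)
draws = [standardized_chi2(rng) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((c - sample_mean) ** 2 for c in draws) / (len(draws) - 1)
print(f"sample mean = {sample_mean:.4f}, sample variance = {sample_var:.4f}")
```

With K = 2 the draws are strongly right-skewed, which is what makes this component useful for moving the regressors away from joint normality.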
Sample Selection Criteria:
Cross Sections 1 & 2 in NLSY (White men)
Total observations                                               2439
Number of observations dropped, in order of deletion
 1) unknown occupation                                            411
 2) in army                                                         2
 3) in school                                                     131
 4) not coded as in labor force (ESR)                              91
 5) non-interview or valid skip on CPS measure of
    average hourly wage                                            66
 6) unknown education                                               4
 7) valid skip or non-interview on local unemployment rate         26
 8) valid skip or non-interview on urban/rural residence            1
 9) missing marital status                                          1
10) missing job tenure                                             27
11) missing health limitations                                      1
Total deleted                                                     761
Remaining sample                                                 1678
Those with CPS wage <$2.00 or >$200.00/hr                          23
Wage sample                                                      1655
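The sample counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers listed in the table. The dictionary keys are shortened labels introduced for illustration.

```python
total_observations = 2439

# Observations dropped at each step, in order of deletion (from the table)
deletions = {
    "unknown occupation": 411,
    "in army": 2,
    "in school": 131,
    "not in labor force (ESR)": 91,
    "no CPS hourly wage": 66,
    "unknown education": 4,
    "no local unemployment rate": 26,
    "no urban/rural residence": 1,
    "missing marital status": 1,
    "missing job tenure": 27,
    "missing health limitations": 1,
}

total_deleted = sum(deletions.values())
remaining_sample = total_observations - total_deleted

# Wage outliers (below $2.00 or above $200.00 per hour) are dropped last
wage_sample = remaining_sample - 23

print(total_deleted, remaining_sample, wage_sample)
```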
Summary statistics
Variable          |   Obs       Mean  Std. Dev.        Min        Max
------------------+--------------------------------------------------
currently married |  1678   .5524434   .4973903          0          1
logwage           |  1678   2.334773   .6977119   -4.60517   7.090077
"Trimmed"         |  1655   2.358087   .4817482    .751416   5.092461
age               |  1678   28.93862   2.241352         25         33
(age/10)^2        |  1678   8.424642   1.306761       6.25      10.89
educ              |  1678   13.14839   2.387361          5         20
(ed/10)^2         |  1678   1.785763   .6596152        .25          4
age·educ          |  1678   3.807479   .7676286       1.25        6.6
# of children wanted in 1978
kidswant          |  1678   2.308105   1.265503          0         12
kidswant unknown  |  1678   .0256257   .1580631          0          1
traditional family roles
tradfamr          |  1678   .5262217   .4994608          0          1
expect to marry within 5 years in 1978
msoon             |  1678   .3533969   .4781671          0          1
msoon unknown     |  1678   .0995232   .2994525          0          1
age expect to marry [ <20, 20-24, 25-29, 30+, unknown; never excl. ]
emlt20            |  1678   .0268176   .1615983          0          1
em2024            |  1678   .4338498   .4957526          0          1
em2529            |  1678   .3730632   .4837629          0          1
em30p             |  1678    .079261   .2702263          0          1
emu               |  1678   .0637664   .2444092          0          1
Probit Estimates Number of obs = 1678
chi2(15) = 117.30
Prob > chi2 = 0.0000
Log Likelihood = -1095.2053 Pseudo R2 = 0.0508
------------------------------------------------------------------------------
cmarr | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
age | .0858466 .387242 0.222 0.825 -.6731338 .8448269
age2 | -.2217953 .6637703 -0.334 0.738 -1.522761 1.079171
educ | -.173512 .1891488 -0.917 0.359 -.5442369 .1972129
educ2 | -.3789734 .3951314 -0.959 0.338 -1.153417 .39547
ageed | .9090156 .6057084 1.501 0.133 -.2781511 2.096182
kidswant | -.0773628 .0274517 -2.818 0.005 -.1311671 -.0235586
kidswu | -.2335656 .2077477 -1.124 0.261 -.6407437 .1736125
tradfamr | .0955492 .0640704 1.491 0.136 -.0300266 .2211249
msoon | .2674536 .0892743 2.996 0.003 .0924792 .4424279
msoonu | .4587366 .1621475 2.829 0.005 .1409334 .7765399
emlt20 | .5557342 .2965235 1.874 0.061 -.0254411 1.13691
em2024 | .5333174 .2241575 2.379 0.017 .0939768 .9726581
em2529 | .352462 .2186334 1.612 0.107 -.0760515 .7809755
em30p | .2352825 .2364911 0.995 0.320 -.2282315 .6987965
emu | .4374516 .2932701 1.492 0.136 -.1373472 1.01225
_cons | -1.408929 5.856403 -0.241 0.810 -12.88727 10.06941
------------------------------------------------------------------------------
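The pseudo R2 in the probit header can be recovered from the other two header statistics. The sketch below assumes that chi2(15) is the likelihood ratio statistic against a constant-only model and that the pseudo R2 is one minus the ratio of the fitted to the constant-only log likelihood. Both are assumptions about how the output was produced, not statements taken from the text.

```python
# Values reported in the probit output header
log_lik = -1095.2053
lr_chi2 = 117.30

# Assumed relation: LR chi2 = 2 * (log_lik - log_lik_null)
log_lik_null = log_lik - lr_chi2 / 2.0

# Assumed definition: pseudo R2 = 1 - log_lik / log_lik_null
pseudo_r2 = 1.0 - log_lik / log_lik_null

print(f"constant-only log likelihood: {log_lik_null:.4f}")
print(f"pseudo R2: {pseudo_r2:.4f}")
```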
. test
> kidswant kidswu tradfamr
> msoon msoonu
> emlt20 em2024 em2529 em30p emu
> ==0
chi2( 10) = 53.26
Prob > chi2 = 0.0000
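The reported "Prob > chi2 = 0.0000" for the joint test of the ten excluded variables can be checked without a statistics library, because the chi-square survival function has a closed form when the degrees of freedom are even. A minimal sketch (the function name is illustrative):

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with even df.

    For df = 2m, P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires positive, even df")
    half = x / 2.0
    term = 1.0   # j = 0 term of the sum
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! from the previous term
        total += term
    return math.exp(-half) * total

# Joint test of the ten excluded variables: chi2(10) = 53.26
p_value = chi2_sf_even_df(53.26, 10)
print(f"p-value = {p_value:.3g}")
```

The p-value is far below 0.0001, consistent with the four-decimal figure printed in the output.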
OLS wage regression imposing exclusion restrictions.
Source | SS df MS Number of obs = 1655
---------+------------------------------ F( 6, 1648) = 53.91
Model | 62.9816611 6 10.4969435 Prob > F = 0.0000
Residual | 320.880823 1648 .194709238 R-squared = 0.1641
---------+------------------------------ Adj R-squared = 0.1610
Total | 383.862485 1654 .232081309 Root MSE = .44126
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
cmarr | .1060622 .0222558 4.766 0.000 .0624095 .1497148
age | .0153748 .1319505 0.117 0.907 -.2434336 .2741831
age2 | -.0248268 .2236989 -0.111 0.912 -.4635909 .4139373
educ | .0502236 .0643461 0.781 0.435 -.0759852 .1764324
educ2 | -.1066927 .1339753 -0.796 0.426 -.3694723 .156087
ageed | .1831362 .2025048 0.904 0.366 -.2140576 .5803301
_cons | .8966725 2.009013 0.446 0.655 -3.043815 4.83716
------------------------------------------------------------------------------
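The summary statistics in the header of this regression follow from the sums of squares and degrees of freedom by standard OLS identities. A short sketch using the reported values (the variable names are illustrative):

```python
import math

# Sums of squares and degrees of freedom from the regression header
ss_model, df_model = 62.9816611, 6
ss_resid, df_resid = 320.880823, 1648

ss_total = ss_model + ss_resid
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

f_stat = ms_model / ms_resid                      # F(6, 1648)
r_squared = ss_model / ss_total
adj_r_squared = 1.0 - (1.0 - r_squared) * (df_model + df_resid) / df_resid
root_mse = math.sqrt(ms_resid)

print(f"F = {f_stat:.2f}, R-squared = {r_squared:.4f}, "
      f"Adj R-squared = {adj_r_squared:.4f}, Root MSE = {root_mse:.5f}")
```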
Second stage OLS regression using predicted marital status from probit.
Source | SS df MS Number of obs = 1655
---------+------------------------------ F( 6, 1648) = 49.46
Model | 58.5727897 6 9.76213161 Prob > F = 0.0000
Residual | 325.289695 1648 .197384524 R-squared = 0.1526
---------+------------------------------ Adj R-squared = 0.1495
Total | 383.862485 1654 .232081309 Root MSE = .44428
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
pmarr | .0323493 .1253236 0.258 0.796 -.213461 .2781597
age | .0210875 .1330871 0.158 0.874 -.2399501 .2821251
age2 | -.0330511 .2255605 -0.147 0.884 -.4754665 .4093644
educ | .047695 .0651635 0.732 0.464 -.0801169 .1755069
educ2 | -.1179224 .1356532 -0.869 0.385 -.3839931 .1481483
ageed | .1988506 .2063959 0.963 0.335 -.2059752 .6036763
_cons | .8352219 2.023691 0.413 0.680 -3.134055 4.804499
------------------------------------------------------------------------------
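OLS regressions without imposing exclusion restrictions (first with observed marital status, cmarr; then with predicted marital status from the probit, pmarr)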
Source | SS df MS Number of obs = 1655
---------+------------------------------ F( 16, 1638) = 21.29
Model | 66.0916726 16 4.13072954 Prob > F = 0.0000
Residual | 317.770812 1638 .193999275 R-squared = 0.1722
---------+------------------------------ Adj R-squared = 0.1641
Total | 383.862485 1654 .232081309 Root MSE = .44045
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
cmarr | .1083769 .0225856 4.798 0.000 .0640772 .1526765
age | -.0363344 .1330175 -0.273 0.785 -.2972366 .2245678
age2 | .0920415 .2269451 0.406 0.685 -.3530916 .5371745
educ | .0624841 .0650886 0.960 0.337 -.0651815 .1901497
educ2 | -.0555548 .1350153 -0.411 0.681 -.3203757 .209266
ageed | .08221 .2068036 0.398 0.691 -.3234173 .4878373
kidswant | .0108928 .0094552 1.152 0.249 -.0076527 .0294383
kidswu | -.1123383 .0724686 -1.550 0.121 -.2544792 .0298026
tradfamr | -.0227265 .0221737 -1.025 0.306 -.0662183 .0207652
msoon | .0131649 .0311337 0.423 0.672 -.0479012 .074231
msoonu | -.0422679 .0551466 -0.766 0.444 -.1504332 .0658973
emlt20 | .0335269 .1032856 0.325 0.746 -.1690588 .2361127
em2024 | .055538 .0791391 0.702 0.483 -.0996865 .2107624
em2529 | .0551726 .0773436 0.713 0.476 -.0965302 .2068754
em30p | .0364417 .0835504 0.436 0.663 -.1274352 .2003187
emu | -.018971 .1010414 -0.188 0.851 -.217155 .179213
_cons | 1.48087 2.019545 0.733 0.463 -2.480292 5.442032
------------------------------------------------------------------------------
Source | SS df MS Number of obs = 1655
---------+------------------------------ F( 16, 1638) = 19.58
Model | 61.6318245 16 3.85198903 Prob > F = 0.0000
Residual | 322.23066 1638 .196722015 R-squared = 0.1606
---------+------------------------------ Adj R-squared = 0.1524
Total | 383.862485 1654 .232081309 Root MSE = .44353
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
pmarr | .6288386 3.309861 0.190 0.849 -5.863167 7.120845
age | -.0576732 .1946963 -0.296 0.767 -.4395531 .3242068
age2 | .144464 .408141 0.354 0.723 -.6560692 .9449972
educ | .0989321 .2312275 0.428 0.669 -.3546006 .5524648
educ2 | .0159662 .4907274 0.033 0.974 -.946553 .9784854
ageed | -.1017288 1.168164 -0.087 0.931 -2.392981 2.189524
kidswant | .0260079 .0970085 0.268 0.789 -.1642659 .2162816
kidswu | -.0680534 .2974283 -0.229 0.819 -.6514331 .5153263
tradfamr | -.041243 .1213657 -0.340 0.734 -.2792914 .1968054
msoon | -.0412366 .3469587 -0.119 0.905 -.7217659 .6392928
msoonu | -.1331389 .5786455 -0.230 0.818 -1.268102 1.001824
emlt20 | -.0759555 .7084654 -0.107 0.915 -1.465549 1.313638
em2024 | -.0488251 .6755105 -0.072 0.942 -1.37378 1.27613
em2529 | -.0138478 .4518125 -0.031 0.976 -.9000388 .8723432
em30p | -.0095669 .3101193 -0.031 0.975 -.617839 .5987053
emu | -.1012738 .5398936 -0.188 0.851 -1.160228 .9576807
_cons | 1.54769 2.113395 0.732 0.464 -2.59755 5.692931
Six Point of Support Model with Exclusion Restrictions Imposed
log likelihood: -2050.23781543049972
name estimate std err t-stat
Marriage "Probit"
consprob -.810056E+00 .594568E+01 -.136243E+00
age .126258E+00 .393788E+00 .320624E+00
age2 -.302344E+00 .675563E+00 -.447543E+00
educ -.184793E+00 .192056E+00 -.962183E+00
educ2 -.402649E+00 .400869E+00 -.100444E+01
ageed .975862E+00 .615062E+00 .158661E+01
kidswant -.813971E-01 .278529E-01 -.292239E+01
kidswu -.195354E+00 .211818E+00 -.922274E+00
tradfamr .106713E+00 .648320E-01 .164599E+01
msoon .263024E+00 .901635E-01 .291719E+01
msoonu .467607E+00 .164004E+00 .285119E+01
emlt20 .557630E+00 .298319E+00 .186924E+01
em2024 .527880E+00 .225944E+00 .233633E+01
em2529 .348290E+00 .220420E+00 .158012E+01
em30p .232671E+00 .238403E+00 .975957E+00
emu .455103E+00 .296246E+00 .153623E+01
Log(wage) "Regression"
conslwag .220381E+01 .186459E+01 .118193E+01
cmarr .452250E-02 .423772E-01 .106720E+00
age .882706E-01 .123841E+00 .712771E+00
age2 -.159894E+00 .211250E+00 -.756898E+00
educ .294886E-01 .569697E-01 .517620E+00
educ2 -.129426E+00 .118802E+00 -.108943E+01
ageed .277169E+00 .180929E+00 .153193E+01
sigma .259782E+00 .147016E-01 .176704E+02
probrho -.186763E+01 .738784E+00 -.252798E+01
contrho -.369695E+01 .238730E+00 -.154859E+02
prcof2 -.754086E-01 .264781E-01 -.284796E+01
prcof3 .147483E+00 .608543E-01 .242354E+01
prcof4 -.745873E+00 .132947E+00 -.561031E+01
prcof5 .414619E+01 .199546E+00 .207782E+02
prcof6 .107719E+01 .114423E+00 .941413E+01
supcof2 -.102733E+01 .358057E+00 -.286916E+01
supcof3 -.162662E+00 .154333E+00 -.105397E+01
supcof4 .368555E+00 .126970E+00 .290270E+01
supcof5 .113419E+01 .166642E+00 .680611E+01
k: 1 support: .0000000 pweight: .0012067
k: 2 support: .2636029 pweight: .0046096
k: 3 support: .4594240 pweight: .1127383
k: 4 support: .5911099 pweight: .6523969
k: 5 support: .7562690 pweight: .2234337
k: 6 support: 1.0000000 pweight: .0056148
hetero mean: 0.613238773661688996
hetero var: 0.941486038428253000E-02 hetero sd: 0.970302034640891098E-01
correlation matrix
discrete: 1.000000 .102627
continuous: .102627 1.000000
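The "hetero mean", "hetero var", and "hetero sd" lines are the moments of the estimated six-point discrete distribution. The sketch below recomputes them from the printed support points and probability weights. Small discrepancies are expected because the printed values are rounded to seven decimals.

```python
import math

# Support points and probability weights as printed for the six-point model
supports = [0.0, 0.2636029, 0.4594240, 0.5911099, 0.7562690, 1.0]
weights = [0.0012067, 0.0046096, 0.1127383, 0.6523969, 0.2234337, 0.0056148]

# Moments of a discrete distribution: E[s] and E[(s - E[s])^2]
hetero_mean = sum(p * s for p, s in zip(weights, supports))
hetero_var = sum(p * (s - hetero_mean) ** 2 for p, s in zip(weights, supports))
hetero_sd = math.sqrt(hetero_var)

print(f"sum of weights = {sum(weights):.7f}")
print(f"mean = {hetero_mean:.7f}, var = {hetero_var:.7f}, sd = {hetero_sd:.7f}")
```

The two interior mass points with weights 0.65 and 0.22 carry most of the distribution, while the endpoints at 0 and 1 each receive well under 1 percent of the weight.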
Six Point of Support Model without Exclusion Restrictions Imposed
log likelihood: -2041.19809206370337
name estimate std err t-stat
consprob -.790552E+00 .595756E+01 -.132697E+00
age .122184E+00 .394641E+00 .309608E+00
age2 -.285726E+00 .676734E+00 -.422212E+00
educ -.178199E+00 .192267E+00 -.926827E+00
educ2 -.402010E+00 .401546E+00 -.100116E+01
ageed .949956E+00 .615241E+00 .154404E+01
kidswant -.797748E-01 .280079E-01 -.284830E+01
kidswu -.221593E+00 .213289E+00 -.103893E+01
tradfamr .106811E+00 .652185E-01 .163774E+01
msoon .261440E+00 .907911E-01 .287957E+01
msoonu .450052E+00 .164902E+00 .272922E+01
emlt20 .592101E+00 .300381E+00 .197117E+01
em2024 .564634E+00 .227769E+00 .247898E+01
em2529 .382769E+00 .222107E+00 .172336E+01
em30p .252865E+00 .239859E+00 .105422E+01
emu .475527E+00 .298238E+00 .159446E+01
conslwag .237777E+01 .188449E+01 .126176E+01
cmarr -.296655E-02 .433844E-01 -.683783E-01
age .544600E-01 .125640E+00 .433460E+00
age2 -.724368E-01 .215648E+00 -.335902E+00
educ .452740E-01 .565562E-01 .800514E+00
educ2 -.105410E+00 .118385E+00 -.890402E+00
ageed .190389E+00 .181903E+00 .104665E+01
kidswant .629455E-02 .904924E-02 .695590E+00
kidswu -.971542E-01 .688796E-01 -.141049E+01
tradfamr -.177277E-02 .201495E-01 -.879810E-01
msoon -.786345E-03 .289598E-01 -.271530E-01
msoonu -.554660E-01 .507629E-01 -.109265E+01
emlt20 .112273E+00 .932978E-01 .120339E+01
em2024 .122687E+00 .729740E-01 .168125E+01
em2529 .115605E+00 .712517E-01 .162249E+01
em30p .708609E-01 .758613E-01 .934084E+00
emu .557587E-01 .906693E-01 .614968E+00
sigma .258756E+00 .135135E-01 .191479E+02
probrho -.196630E+01 .728849E+00 -.269782E+01
contrho -.362461E+01 .251086E+00 -.144357E+02
prcof2 -.815992E-01 .298748E-01 -.273137E+01
prcof3 .140578E+00 .503797E-01 .279037E+01
prcof4 .720175E+00 .125231E+00 .575077E+01
prcof5 .255168E+01 .146856E+00 .173754E+02
prcof6 .119825E+01 .692427E-01 .173050E+02
supcof2 -.117876E+01 .397184E+00 -.296779E+01
supcof3 -.224523E+00 .161624E+00 -.138917E+01
supcof4 .356524E+00 .132969E+00 .268127E+01
supcof5 .114378E+01 .175268E+00 .652591E+01
k: 1 support: .0000000 pweight: .0012120
k: 2 support: .2352758 pweight: .0035934
k: 3 support: .4441040 pweight: .0904507
k: 4 support: .5881988 pweight: .6669649
k: 5 support: .7580317 pweight: .2316757
k: 6 support: 1.0000000 pweight: .0061032
hetero mean: 0.615043682890612731
hetero var: 0.974158473377843837E-02 hetero sd: 0.986994667350256649E-01
correlation matrix
discrete: 1.000000 .109598
continuous: .109598 1.000000
References
Amemiya, T., 1985, Advanced Econometrics, Cambridge: Harvard University Press.
Anderson, T. and H. Rubin, 1956, "Statistical Inference in Factor Analysis," in
Proceedings of the Third Berkeley Symposium on Mathematical Statistics and
Probability, J.Neyman, ed., Berkeley: University of California, Vol. V, pp. 111-150.
Arabmazar, A. and P. Schmidt, 1981, "Further Evidence on the Robustness of the Tobit
Estimator to Heteroskedasticity," Journal of Econometrics, 17, pp. 253-58.
Arabmazar, A. and P. Schmidt, 1982, "An Investigation of the Robustness of the Tobit Estimator to
Non-Normality," Econometrica, 50, pp. 1055-63.
Blau, D., 1994, "Labor Force Dynamics of Older Men," Econometrica, 62(1), pp. 117-56.
Cameron, S. and C. Taber, 1994, "Evaluation and Identification of Semiparametric Maximum
Likelihood Models of Dynamic Discrete Choice," Mimeo, University of Chicago, November.
Cosslett, S.J., 1983, "Distribution-free Maximum Likelihood Estimator of the Binary Choice
Model," Econometrica, 51, pp. 765-82.
David, P.A., and T.A. Mroz, 1989a, "Evidence of Fertility Regulation Among Rural French
Villagers, 1749-1789: A Sequential Econometric Model of Birth-Spacing Behavior
(Part 1)," European Journal of Population, Vol. 5, No. 1, pp. 1-26.
David, P.A., and T.A. Mroz, 1989b, "Evidence of Fertility Regulation Among Rural French
Villagers, 1749-1789: A Sequential Econometric Model of Birth-Spacing Behavior
(Part 2)," European Journal of Population, Vol. 5, No. 2, pp. 173-206.
Eastwood, B. J., and A. R. Gallant, 1991, "Adaptive Rules for Seminonparametric Estimators
that Achieve Asymptotic Normality," Econometric Theory, Vol. 7, No. 3, pp. 307-40.
Everitt, B.S. and D. J. Hand, 1981, Finite Mixture Distributions, London: Chapman
and Hall.
Follmann, D. and D. Lambert, 1989, "Generalizing Logistic Regression by Nonparametric Mixing,"
Journal of the American Statistical Association, Vol. 84, pp. 295-300.
Geweke, J., 1991, "Efficient Simulation from the Multivariate Normal and Student-t
Distributions Subject to Linear Constraints," forthcoming, Computing Science and
Statistics: Proceedings of the Twenty-Third Symposium on the Interface.
Goldberger, A., 1983, "Abnormal Selection Bias," in S. Karlin, T. Amemiya, and L. Goodman,
eds., Studies in Econometrics, Time Series and Multivariate Statistics, New York:
Academic Press.
Gritz, R. M., 1993, "The Impact of Training on the Frequency and Duration of Employment."
Journal of Econometrics, Vol. 57, pp. 21-51.
Heckman, J., 1978, "Dummy Endogenous Variables in a Simultaneous Equation System,"
Econometrica, Vol. 46, pp. 931-960.
Heckman, J., 1981, "Statistical Models for Discrete Panel Data," in C. Manski and D. McFadden,
eds., Structural Analysis of Discrete Data with Econometric Applications,
Cambridge: The MIT Press.
Heckman, J. and T. MaCurdy, 1981, "New Methods for Estimating Labor Supply Functions: A
Survey," in R. Ehrenberg, ed., Research in Labor Economics. London: JAI Press, 4.
Heckman, J. and T. MaCurdy, 1986, "Labor Econometrics," in Z. Griliches and M. Intriligator,
eds., Handbook of Econometrics, Vol. 3, New York: North-Holland, pp. 1917-1977.
Heckman, J. and B. Singer, 1984, "A Method for Minimizing the Impact of Distributional
Assumptions in Econometric Models for Duration Data," Econometrica, Vol. 52,
pp. 271-320.
Heckman, J. and J. Walker, 1990, "The Relationship between Wage and Income and the Timing and
Spacing of Births: Evidence from Swedish Longitudinal Data," Econometrica,
Vol. 58, pp. 1411-1441.
Heckman, J. and R. Willis, 1977, "A Beta Logistic Model for Analysis of Sequential Labor Force
Participation by Married Women," Journal of Political Economy, Vol. 85, pp. 27-58.
Hurd, M., 1979, "Estimation in Truncated Samples When There is Heteroskedasticity," Journal
of Econometrics, 11, pp. 247-58.
Keane, M., R. Moffitt, and D. Runkle, 1988, "Real Wages over the Business Cycle: Estimating the
Impact of Heterogeneity with Micro Data," Journal of Political Economy, 96,
No.6, pp. 1232-1266.
Killingsworth, M., 1983, Labor Supply, Cambridge: Cambridge University Press.
Killingsworth, M. and J. Heckman, 1986, "Female Labor Supply: A Survey," in O. Ashenfelter and
R. Layard, eds., Handbook of Labor Economics, Vol. 1, New York: North-Holland, pp.
3-204.
Kochar, A., 1991, An Empirical Investigation of Rationing Constraints in Rural Credit
Markets in India, Ph.D. dissertation, Department of Economics, University of Chicago.
Korenman, S. and D. Neumark, 1991, "Does Marriage Really Make Men More Productive?", Journal
of Human Resources, Vol. 26, No. 2, pp. 282-307.
Lee, L-F, 1982, "Some Approaches to the Correction of Selectivity Bias," Review of Economic
Studies, Vol. 49, pp. 355-72.
Lee, L-F, 1983, "Generalized Econometric Models with Selectivity," Econometrica, Vol.
51, pp. 507-12.
Maddala, G., 1983, Limited-Dependent and Qualitative Variables in Econometrics,
Cambridge: Cambridge University Press.
McFadden, D., 1989, "A Method of Simulated Moments for Estimation of Discrete Response Models
without Numerical Integration," Econometrica, Vol. 57, pp. 995-1026.
Mroz, T., 1987, "The Sensitivity of an Empirical Model of Married Women's Hours of Work to
Economic and Statistical Assumptions," Econometrica, Vol. 55, pp. 765-799.
Mroz, T. and D. Guilkey, 1992, "Discrete Factor Approximations for Use in Simultaneous
Equation Models with Both Continuous and Discrete Endogenous Variables," mimeo,
Department of Economics, University of North Carolina, Chapel Hill.
Mroz, T. and D. Weir, 1990, "Structural Change in Life Cycle Fertility During the Fertility
Transition: France Before and After the Revolution of 1789," Population Studies,
Vol. 44, pp. 61-87.
Mroz, T. and D. Weir, 1994, "Random Parameters and Approximations to Stochastic Dynamic
Optimization Models with an Application to Age at Marriage and Life Cycle Fertility
Control in France Under the Ancien Regime," Mimeo, UNC, Chapel Hill.
Narendranathan, W. and P. Elias, 1993, "Influences of Past History on the Incidence of Youth
Unemployment: Empirical Findings for the UK," Oxford Bulletin of Economics and Statistics,
Vol. 55, No. 2, pp. 161-85.
Newey, W., Powell, J., and Walker, J., 1990, "Semiparametric Estimation of Selection Models:
Some Empirical Results," American Economic Review, 80, No. 2, pp. 324-28.
Paarsch, H., 1984, "A Monte Carlo Comparison of Estimators for Censored Regression Models,"
Journal of Econometrics, 24, pp. 197-213.
Park, B. U., and Marron, J. S., 1990, "Comparisons of Data-Driven Bandwidth Selections,"
Journal of the American Statistical Association, Vol. 85, pp. 66-72.
Powell, J., 1987, "Semiparametric Estimation of Bivariate Latent Variable Models," Working
Paper No. 8704, SSRI, University of Wisconsin-Madison, July.
Rivers, D. and Q. Vuong, 1988, "Limited Information Estimation and Exogeneity Tests for
Simultaneous Probit Models," Journal of Econometrics, 39, pp. 347-66.
Robinson, P.M., 1988, "Root-N-Consistent Semiparametric Regression," Econometrica, 56,
pp. 931-54.
Ruud, P., 1986, "Consistent Estimation of Limited Dependent Variable Models Despite
Misspecification of Distribution," Journal of Econometrics, 32, pp. 157-187.
White, H., 1982, "Maximum Likelihood Estimation of Misspecified Models," Econometrica,
Vol. 50, pp. 1-26.