Tom Mroz

Discrete Factor Approximations in Simultaneous Equation Models:
Estimating the Impact of a Dummy Endogenous Variable on a
Continuous Outcome

Thomas A. Mroz

Department of Economics
University of North Carolina, Chapel Hill

and

The Carolina Population Center



October 1997




Keywords: Simultaneous Equation Models, Binary Response Models, Latent Variables, Finite Mixture Distributions, Monte Carlo Studies.



Comments are encouraged. This paper is an extension and elaboration of earlier work I did with David Guilkey. David provided much advice on this version. I would like to thank David Blau, Ron Gallant, James Heckman, Hidehiko Ichimura, Peter Schmidt, and seminar participants at the University of Minnesota and the Research Triangle Econometrics Seminar and the Duke/UNC Labor Economics Seminar for useful advice. The Office of Information Technology at UNC, especially Mike Padrick, Larry Mason, and Jim Gogin, and the computer staff at the Carolina Population Center helped to make the extensive computations in this paper feasible. Timothy Savage provided superb research assistance. Partial funding for this project came from NIH grant R01 HD29551-01-03 and a UNC Junior Faculty Development Award.




(Tables 1 through 9 and the appendix tables are provided separately.)

I. Introduction

Empirical researchers examining simultaneous equation models with limited dependent variables face difficult tradeoffs among the precision of estimates, the sensitivity of results to distributional assumptions, and computational feasibility. Researchers often use maximum likelihood estimators based upon joint normality assumptions in these models, for these methods tend to yield relatively small estimated standard errors. Computational limitations and numerical inaccuracies, however, frequently force researchers to examine only low dimensional systems with the maximum likelihood approaches. Many researchers instead use two-stage estimation procedures. These approaches often relax some of the arbitrary normality assumptions, and they tend to impose fewer computational burdens than maximum likelihood procedures. However, it is often not possible to adapt the two-stage estimators to more complex empirical models. In addition, these less demanding methods typically produce quite inaccurate results. See Heckman(1978), Maddala(1983), and Amemiya(1985) for detailed discussions of these estimation procedures.

Little work appears in the literature on empirical approaches that retain some of the precision of the maximum likelihood approaches, relax stringent distributional assumptions, and are computationally feasible in large systems. This paper addresses these issues directly. It describes a set of simple and numerically stable estimators that can serve as replacements for maximum likelihood estimators imposing joint normality. These estimators model the joint endogeneity of outcomes as arising from common unobserved factors. The multivariate normal distribution, for example, falls within this class.

Like Heckman and Singer's(1984) approach for modeling unobserved explanatory factors in hazard rate analyses, this approach assumes that these unobserved variables can be approximated by a discrete distribution. Extending the discrete factor framework to high dimensions is both straightforward and computationally feasible. This paper uses a set of Monte Carlo experiments to evaluate the performance of the discrete factor estimator and demonstrates the approach by examining the impact of marriage on men's wages.

The paper evaluates the performance of the discrete factor approximations in an econometric model where a continuous outcome depends upon an endogenous dummy variable. Researchers have used such models to examine, for example, the impacts of training programs on wages, the effects of living in a single parent household on children's school performance, union wage effects and, as in the example presented below, the impact of marriage on wages. Heckman(1978) contains a thorough discussion of limited dependent variable models, including the one examined here, under the assumption of joint normality. For excellent reviews of applications of limited dependent variable models see Heckman and MaCurdy (1981,1986), Killingsworth (1983), Maddala(1983), and Killingsworth and Heckman(1986).

Several researchers have examined the consequences of imposing arbitrary normality assumptions in the context of Tobit models. See, for example, Hurd(1979), Arabmazar and Schmidt (1981,1982), Goldberger(1983), and Paarsch(1984). Both Lee(1982,1983) and Heckman and MaCurdy(1986) have proposed some simple expansions that relax the normality assumption in the context of the sample selection model, but we know of no work that provides a simple framework for handling general specifications of endogenous variables in mixed continuous-discrete distributions without imposing strong distributional assumptions.

More recently, econometricians have developed several semiparametric estimators for discrete choice and sample selection models. For examples, see Cosslett(1983), Powell(1987), Robinson(1988), and Ahn and Powell(1993). This paper does not evaluate these approaches for a variety of reasons. Most of these semiparametric procedures cannot be extended in a straightforward manner to complicated systems with both discrete and continuous endogenous explanatory variables, and few applied researchers use these techniques. More importantly, the little evidence available on the empirical performance of these estimators suggests that they do not perform better in practice than the two-stage estimators we use as benchmarks for the performance of the approximation estimators. See, for example, Newey, Powell, and Walker's (1990) semiparametric evaluation of Mroz's (1987) parametric estimates of female labor supply functions.

Heckman(1981) describes the use of factor models in discrete panel data, and Heckman and Willis(1978) use a parametric assumption on an unobserved factor in a study of women's sequential labor force participation decisions. Versions of the discrete factor models similar to those used here have been applied by Keane, Moffitt, and Runkle(1988) to control for sample selection biases when estimating wage equations; by Gritz(1993) in a study of the impact of job training programs on wages; by Mroz and Weir(1990) in an analysis of the impact of the number of surviving children on a couple's propensity to regulate fertility; by Kochar(1991) in a study of the access to formal and informal credit markets in India; by Blau(1994) in a study of retirement behavior; and by Narendranathan and Elias(1993) in a study of youth unemployment.

The results of the Monte Carlo analysis presented here suggest that discrete factor approximation models may be useful in a wide variety of situations. When the true distribution of the disturbances is joint normal, the discrete factor estimators compare favorably to the normal maximum likelihood estimators in terms of bias and Mean Square Error (MSE). This suggests that there may be little bias or efficiency loss from incorrectly assuming a discrete factor model when normality is true. When the true distribution of the unobservables is not normal, the discrete factor approximations perform better than maximum likelihood estimators (incorrectly) assuming joint normality in most of the cases we examined. The two-stage estimator always works well in terms of bias, but the empirical distributions estimated in this study suggest that the two-stage estimator may often be too inefficient to be useful without large sample sizes.





II. Experimental Design

The process generating the outcomes in this paper is

where y1* is a latent variable determining whether the discrete outcome d takes place, and the exogenous variables x and z are independent of the error terms ε1 and ε2. Only the outcomes d and y2 are observed. The primary parameter of interest is the coefficient on d in equation (1b), which measures the impact of the endogenous dummy variable on the continuous outcome y2. We set this parameter to 1 in the data generating process. Appendix 1 contains the exact specifications of the exogenous variables, the remaining parameters, and the disturbances used in the data generating process.
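To make the setup concrete, the following Python sketch simulates one system of this form. The functional forms, the coefficients (0.5 on x and on z), the normal common factor v, and the names `simulate` and `naive_difference` are illustrative assumptions, not the Appendix 1 design. The regressors x and z are built from a shared draw so that their correlation is 0.8, the value used in most experiments.

```python
import random

def simulate(n, alpha=1.0, rho1=1.0, rho2=1.0, seed=0):
    """Simulate (d, y2, x, z) from a hypothetical version of system (1a)-(1b).

    Illustrative specification only (not the Appendix 1 design):
      y1* = 0.5 * x + eps1,   d = 1 if y1* > 0, else 0
      y2  = 0.5 * z + alpha * d + eps2
      eps_i = rho_i * v + u_i, with v, u1, u2 independent N(0, 1).
    """
    rng = random.Random(seed)
    d, y2, x, z = [], [], [], []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)           # shared draw: corr(x, z) = 1 / 1.25 = 0.8
        xi = common + 0.5 * rng.gauss(0.0, 1.0)
        zi = common + 0.5 * rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)                # unobserved common factor
        eps1 = rho1 * v + rng.gauss(0.0, 1.0)
        eps2 = rho2 * v + rng.gauss(0.0, 1.0)
        di = 1 if 0.5 * xi + eps1 > 0 else 0   # the latent y1* itself is never observed
        d.append(di)
        y2.append(0.5 * zi + alpha * di + eps2)
        x.append(xi)
        z.append(zi)
    return d, y2, x, z

def naive_difference(d, y2):
    """Difference in mean y2 between d = 1 and d = 0 (ignores endogeneity)."""
    on = [y for di, y in zip(d, y2) if di == 1]
    off = [y for di, y in zip(d, y2) if di == 0]
    return sum(on) / len(on) - sum(off) / len(off)
```

With positive loadings, units with d = 1 tend to have larger values of ε2 (and of z), so `naive_difference(*simulate(20000)[:2])` lands well above the true effect of 1.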

We use a variety of procedures to estimate the impact of the dummy variable. The first is OLS applied directly to equation (1b), which ignores the possible endogeneity of the dummy variable. The second method applies a probit procedure to equation (1a), substitutes the predicted probability that {d=1} for d in equation (1b), and estimates the transformed equation (1b) by OLS. This two-stage method yields consistent estimates when ε1 is normally distributed. The third method uses a maximum likelihood estimator based upon the assumption of joint normality of the disturbances ε1 and ε2. When the disturbances are normally distributed, this maximum likelihood estimator is asymptotically efficient. When the disturbances are non-normal, however, it typically will not yield consistent estimates. The final group of procedures uses discrete factor approximations to control for the endogeneity of the dummy explanatory variable. In most experiments we focus on procedures with two, three, and four points of support for the distribution of a discrete unobserved factor.
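The two-stage procedure can be sketched in plain Python as follows. The sketch assumes, hypothetically, that equation (1a) contains an intercept and x and that equation (1b) contains an intercept and z. The probit is fit by Newton-Raphson, which is reliable here because the probit log-likelihood is globally concave. The names `probit`, `ols`, and `two_stage` are illustrative, not code from the paper.

```python
import math

def norm_cdf(t):
    """Standard normal CDF (the erfc form is accurate in the lower tail)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def probit_score_info(d, x, b0, b1):
    """Score and information matrix of the probit log-likelihood for d on (1, x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for di, xi in zip(d, x):
        t = b0 + b1 * xi
        q = 2 * di - 1                             # +1 if d = 1, -1 if d = 0
        lam = q * norm_pdf(t) / norm_cdf(q * t)    # d log-likelihood / dt
        w = lam * (lam + t)                        # -(d^2 log-likelihood / dt^2) > 0
        g0 += lam
        g1 += lam * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def probit(d, x, iters=50, tol=1e-10):
    """Newton-Raphson from zero starting values."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = probit_score_info(d, x, b0, b1)
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det        # (information)^{-1} times score
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

def ols(rows, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [arj - f * acj for arj, acj in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def two_stage(d, y2, x, z):
    """Probit of d on (1, x); then OLS of y2 on (1, z, fitted Prob(d = 1))."""
    b0, b1 = probit(d, x)
    phat = [norm_cdf(b0 + b1 * xi) for xi in x]
    coef = ols([[1.0, zi, ph] for zi, ph in zip(z, phat)], y2)
    return coef[2]                                 # estimated effect of d on y2
```

The sketch returns only the point estimate. Conventional second-stage OLS standard errors would have to be adjusted for the estimated first-stage probability.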




III. Overview of the Discrete Factor Method

These discrete factor models are, in spirit, identical to the semiparametric methods proposed by Heckman and Singer(1984) to control for unobserved heterogeneity in hazard rate models. Like Heckman and Singer, we derive the likelihood function for an observation's observed outcomes conditional upon the value of the unobserved factors (heterogeneity) and then integrate over the distribution of the unobserved factors. Choosing a discrete distribution for these factors places the resulting unconditional distribution function in the class of mixture distributions. As the number of points of support for the discrete distribution grows large, this approach can approximate a "kernel" distribution for multivariate random variables.


III.1 The Basic Formulation

Suppose that one is interested in estimating a two equation model with homoscedastic error terms generated by the process

where u1, u2, and v are assumed to have mean 0, are mutually independent, and are independent of the exogenous variables in the model. One convenient interpretation of this formulation considers v to be an unobserved variable that has a linear effect on the outcomes influenced by these two disturbances. This formulation places no substantive restrictions on the correlation of ε1 and ε2. The strategy followed in this study assumes that u1 and u2 are, in addition, normally distributed. The discussion below suggests ways to relax this assumption, but we do not examine these more complicated estimators in the Monte Carlo evaluation. Cameron and Taber (1994) provide identification conditions and consistency proofs for the discrete factor estimator when one knows the distribution of u1 and u2.

Conditional upon the value taken by the factor v, the joint distribution of ε1 and ε2 is given by

where σ1 and σ2 are the standard deviations of u1 and u2 and φ is the standard normal density function. If the cumulative distribution function of v is F(v), then the unconditional distribution of ε1 and ε2 is

Suppose one assumes that v follows a standard normal distribution. In this instance, the joint distribution of ε1 and ε2 simplifies to a bivariate normal distribution with zero means, variances equal to σ1² and σ2² plus the squares of the respective factor loadings, and covariance equal to the product of the two factor loadings. This formulation, then, contains the standard bivariate normal distribution as a special case.
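Write ρ1 and ρ2 for the factor loadings (symbols assumed here). The moments behind this statement then follow directly from the mutual independence of v, u1, and u2:

```latex
\begin{aligned}
\varepsilon_1 &= \rho_1 v + u_1, \qquad \varepsilon_2 = \rho_2 v + u_2,\\
\operatorname{Var}(\varepsilon_i) &= \rho_i^2 \operatorname{Var}(v) + \sigma_i^2 = \rho_i^2 + \sigma_i^2, \qquad i = 1, 2,\\
\operatorname{Cov}(\varepsilon_1, \varepsilon_2) &= \rho_1 \rho_2 \operatorname{Var}(v) = \rho_1 \rho_2,\\
\operatorname{Corr}(\varepsilon_1, \varepsilon_2) &= \frac{\rho_1 \rho_2}{\sqrt{(\rho_1^2 + \sigma_1^2)(\rho_2^2 + \sigma_2^2)}}.
\end{aligned}
```

For example, ρ1 = ρ2 = √0.5 with σ1 = σ2 = 1 gives a correlation of 0.5/1.5 = 1/3. The variance and covariance expressions require only Var(v) = 1. Normality of v is what makes the joint distribution itself bivariate normal.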

Like Heckman and Singer's(1984) proposal for dealing with unobserved heterogeneity in duration models, this paper assumes that the cumulative distribution of v can be approximated by a step function. In particular, suppose that the distribution of v is given by

The integral given in equation (3) reduces to

Examination of equation (5) reveals that the joint distribution is a weighted sum of products of univariate normal distributions. Everitt and Hand(1981) provide an excellent overview of finite mixture distributions of this type.

The Monte Carlo results discussed in Sections VII and VIII evaluate the use of the approximation in equation (5) under a variety of different distributions for the unobserved factor v, sample sizes, error variances, error correlations, and expected frequencies of the event {di=1}. The discrete factor quasi-likelihood function for the model considered in this study is

where N is the sample size, φ(·) is the standard normal density function, K is the number of points of support chosen for the discrete factor distribution, and pk is the "probability" that the unobserved factor takes on its kth "value" (point of support). In practice, one estimates simultaneously the coefficients of both equations (including the scale and factor-loading parameters), the K points of support, and the probabilities (p1,...,pK), subject to some trivial normalizations.
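The following is a sketch of one observation's contribution to a quasi-likelihood of this kind, under an assumed parameterization. The unit variance of u1 is the usual probit normalization. `rho1` and `rho2` are the loadings, `eta` and `p` are the support points and their probabilities, and the linear indexes are passed in precomputed rather than built from coefficient vectors. Conditional on each support point, the discrete and continuous outcomes are independent, so each term is a probit probability times a normal density.

```python
import math

def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def obs_likelihood(d, y2, index1, index2, alpha, rho1, rho2, sigma2, eta, p):
    """Likelihood contribution of one observation under a K-point discrete factor.

    Hypothetical parameterization: index1 is the exogenous index of (1a)
    (u1 has unit variance); index2 is the exogenous part of (1b); the factor
    takes value eta[k] with probability p[k] and has loadings rho1 and rho2.
    """
    total = 0.0
    for eta_k, p_k in zip(eta, p):
        t = index1 + rho1 * eta_k
        prob_d = norm_cdf(t) if d == 1 else norm_cdf(-t)   # Prob(d | v = eta_k)
        resid = y2 - index2 - alpha * d - rho2 * eta_k      # value of u2 given v = eta_k
        total += p_k * prob_d * norm_pdf(resid / sigma2) / sigma2
    return total

def log_likelihood(obs, alpha, rho1, rho2, sigma2, eta, p):
    """Sum of log contributions; obs is a list of (d, y2, index1, index2)."""
    return sum(math.log(obs_likelihood(d, y2, i1, i2, alpha, rho1, rho2, sigma2, eta, p))
               for d, y2, i1, i2 in obs)
```

In an actual estimation, `index1` and `index2` would be functions of the coefficient vectors. The sum of logs would then be maximized over all parameters jointly, subject to the normalizations discussed in Section IV.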


III.2 Some Potential Extensions

(A) Multivariate Models

Extending this discrete factor framework to higher dimensions is straightforward. Suppose that the disturbance vector ε has G elements. As in the bivariate case, let each element of ε be approximated by

where there are J common factors, the ug's are normally distributed, and the elements of (u1,...,uG,v1,...,vJ) are mutually independent. In this case the joint distribution of ε1,...,εG is given by

where

This formulation permits there to be a different number of points of support for each of the J factors, and it can readily be modified to allow for dependence among the Vj's. In all of these modifications, the joint distribution function is a weighted sum of products of univariate normal distribution functions.

The computational utility of this approach can be demonstrated with a multinomial probit example. Suppose that there are a large number of binary outcomes, and that each outcome is generated by εg crossing a threshold. In addition, suppose that the threshold for outcome g depends upon whether some of the other outcomes take place. In general, a recursive structure may be necessary to assure logical consistency in these types of formulations. Under the assumption of joint normality of the εg's, evaluation of the likelihood function of this model would require high-dimensional integrals of the joint normal distribution function. Both the Gibbs sampling approach (see, for example, Geweke(1991)) and McFadden's(1989) simulated method of moments estimators reduce the computational burdens of evaluating the multivariate integrals, but these approaches typically require the researcher to assume an arbitrary form for the multivariate distribution.

If, instead of the joint normality assumption, one were to use a discrete factor approximation as in equation (6), then the evaluation of the likelihood function would require only weighted sums of products of univariate normal integrals. Such integrals can be approximated to a high degree of accuracy, and this type of formulation can exploit features of both parallel and vector processors. This is one class of models where discrete factor assumptions may make it possible for researchers to consider more complex interactions than have previously been feasible.
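The point about univariate integrals can be seen in a small sketch for G binary outcomes that share one discrete factor. The sketch assumes unit-variance normal u_g, a single factor (J = 1), and no recursive dependence of one outcome's threshold on the other outcomes. `joint_prob` and `outcome_distribution` are hypothetical names. Conditional on each support point the outcomes are independent, so each term is a product of univariate normal CDFs, and the cost of one pattern's probability grows only linearly in G.

```python
import itertools
import math

def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def joint_prob(outcomes, indices, loadings, eta, p):
    """Prob(d_1, ..., d_G) with d_g = 1 if indices[g] + loadings[g] * v + u_g > 0.

    Assumes u_g ~ N(0, 1), mutually independent and independent of v, and a
    single discrete factor v taking value eta[k] with probability p[k].
    """
    total = 0.0
    for eta_k, p_k in zip(eta, p):
        prod = 1.0                                   # outcomes are independent given v
        for d_g, idx_g, lam_g in zip(outcomes, indices, loadings):
            t = idx_g + lam_g * eta_k
            prod *= norm_cdf(t) if d_g == 1 else norm_cdf(-t)
        total += p_k * prod
    return total

def outcome_distribution(indices, loadings, eta, p):
    """Probabilities of all 2**G outcome patterns; they sum to one."""
    return {o: joint_prob(o, indices, loadings, eta, p)
            for o in itertools.product((0, 1), repeat=len(indices))}
```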


(B) Non-Normal Disturbances with Independence

One drawback of the formulation given in equation (5) is that it permits non-normality in both equations only when there is a non-zero correlation of the disturbances; the joint non-normality can only arise when the common factor enters both equations. A possible modification is to allow there to be three factors, one of which can be present in both equations. For example, suppose that

This formulation contains that in equation (5) as a special case, and it also permits non-normality even when the disturbances ε1 and ε2 are independent. This formulation could result in computational problems when ε1 and ε2 are independent, for all of the terms relating to the common factor, VC, would then be unidentified.
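One way to write the density implied by such a three-factor formulation is shown below. It assumes ε1 = ρ1 VC + V1 + u1 and ε2 = ρ2 VC + V2 + u2, with mutually independent discrete factors VC, V1, and V2 that have support points η^C_k, η^1_l, η^2_m and probabilities p^C_k, p^1_l, p^2_m. This notation is chosen here for illustration.

```latex
f(\varepsilon_1, \varepsilon_2) = \sum_{k=1}^{K_C} \sum_{l=1}^{K_1} \sum_{m=1}^{K_2}
  p^{C}_{k}\, p^{1}_{l}\, p^{2}_{m}\,
  \frac{1}{\sigma_1}\phi\!\left(\frac{\varepsilon_1 - \rho_1 \eta^{C}_{k} - \eta^{1}_{l}}{\sigma_1}\right)
  \frac{1}{\sigma_2}\phi\!\left(\frac{\varepsilon_2 - \rho_2 \eta^{C}_{k} - \eta^{2}_{m}}{\sigma_2}\right)
```

When ρ1 = ρ2 = 0, the support points and probabilities of VC drop out of the density entirely, which is the identification problem just described. When V1 and V2 each have a single point of support at zero, the expression collapses to the single-factor mixture of equation (5).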


(C) Approximating Arbitrary, Homoscedastic Multivariate Distributions

A general formulation of the discrete factor representation of a bivariate distribution function is given by

The joint distribution of the disturbances is given by

where

In this instance, the dependence of the disturbances ε1 and ε2 is captured through the implicit dependence of the factors V1 and V2. If σ1 and σ2 approach zero and the number of points of support, K, grows large, then this representation corresponds to a bivariate kernel for arbitrarily dependent bivariate random variables. A discrete factor approach can thus approximate almost any pattern of dependence between the disturbances. The form of dependence examined in this paper, namely where the dependence is generated by common linear factors, is just one convenient case that can be captured by discrete factor approximations.

This comparison of the discrete factor approximation to multivariate kernel estimators illustrates the basic difference between the approach suggested in this paper and standard nonparametric density estimation. In the standard kernel estimation approach, the number of points of support is set equal to the sample size, the probability of each point of support equals the inverse of the sample size, the locations of the points of support are set equal to the observed outcomes of ε1 and ε2, and the bandwidths σ1 and σ2 are "fixed." Consistency of the estimator of the density in this instance can be achieved by allowing the bandwidths to approach zero slowly. In real applications, the asymptotic results provide little guidance for setting the bandwidths, and researchers usually experiment with various values until the estimates appear "well-behaved." See, for example, Silverman's (1984) discussion of the choice of bandwidths for kernel estimators and Park and Marron's (1990) critique of bandwidth selection procedures.
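The correspondence can be checked numerically. In this sketch, where the function names are hypothetical, a Gaussian product-kernel density estimate is computed as a bivariate discrete factor mixture with one support point per observation, equal weights 1/N, and fixed bandwidths. In the discrete factor approach, by contrast, the locations, weights, and smoothing parameters would all be estimated.

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def discrete_factor_density(e1, e2, v1, v2, p, s1, s2):
    """Bivariate mixture: sum_k p_k * phi((e1-v1_k)/s1)/s1 * phi((e2-v2_k)/s2)/s2."""
    return sum(pk * norm_pdf((e1 - a) / s1) / s1 * norm_pdf((e2 - b) / s2) / s2
               for a, b, pk in zip(v1, v2, p))

def kernel_density(e1, e2, data, h1, h2):
    """Gaussian product-kernel estimate as a special case of the mixture:
    one support point per observation, weight 1/N, fixed bandwidths h1, h2."""
    n = len(data)
    v1 = [a for a, _ in data]
    v2 = [b for _, b in data]
    return discrete_factor_density(e1, e2, v1, v2, [1.0 / n] * n, h1, h2)
```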

For the estimation problems considered in this paper, one often can observe only particular ranges for the random variable. There is no exact solution for the disturbance vector even when the true parameter values are known, so a standard kernel estimation approach is not feasible here. Instead, this paper suggests that the researcher experiment with the number of points of support. The observed data determine the amount of smoothing (the σj's), the locations of the points of support (the vj's), and the weight attached to each point of support (the pk's), conditional upon each chosen number of points of support.

These factor models can also be adapted to allow for random coefficients in the economic model or other sources of heteroscedasticity. In some instances, the random coefficients may be correlated with the disturbances, as would be implied in many dynamic models with random coefficients. Mroz and Weir (1994) provide an example of how a discrete factor approach can model self-selected, random coefficients in longitudinal data models. As with the multinomial probit discussed above, there have been no Monte Carlo evaluations of any of these extensions to the factor model estimators.




IV. Identification

The first identification issue concerns the location and scale of the distribution function of V. When each equation contains an intercept, one must arbitrarily constrain the location of the discrete distribution function. In practice it is often easiest to set one of the points of support to zero. The scale of the discrete factor is also underdetermined. One can arbitrarily set one of the factor loadings to a non-zero constant (e.g., set the loading in the first equation equal to 1), or one can restrict the range of the points of support for the discrete distribution function (e.g., fix the first point of support at 0 and the second at 1, and restrict each remaining point of support to lie in (0,1)).
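The location and scale normalization can be illustrated with a short sketch, in which the names are hypothetical. Each equation g is assumed to contain the factor as an intercept plus a loading times v. The sketch moves the smallest point of support to 0 and the largest to 1, absorbing the shift into the intercepts and the scale into the loadings. This leaves every equation's conditional mean unchanged and places the remaining points in [0, 1], in the spirit of the second normalization above.

```python
def normalize_support(eta, intercepts, loadings):
    """Re-express a discrete factor so that its support runs from 0 to 1.

    For each equation g the factor enters as intercepts[g] + loadings[g] * v.
    Shifting v by its smallest point and rescaling by its range, while moving
    the shift into the intercepts and the scale into the loadings, leaves every
    intercepts[g] + loadings[g] * eta[k] unchanged. Requires two distinct points.
    """
    lo, hi = min(eta), max(eta)
    width = hi - lo
    new_eta = [(e - lo) / width for e in eta]
    new_int = [c + lam * lo for c, lam in zip(intercepts, loadings)]
    new_load = [lam * width for lam in loadings]
    return new_eta, new_int, new_load
```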

Besides the need to eliminate these trivially underdetermined parameters, there are several substantive identification issues. Suppose that the true distribution of the unobserved common factor in equation (3) is standard normal. In this instance the parameters of the factor model are underidentified, for two parameters, the factor loadings, jointly determine the single correlation of ε1 and ε2. The underidentification is analogous to the identification problem due to rotations in standard factor analysis models. In part it is also due to the fact that convolutions of normal random variables remain in the class of normal distributions. See Anderson and Rubin (1956) for a discussion of identification in factor models when only the first two moments of the distribution are of interest.
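The following numerical illustration of this rotation-type problem assumes the parameterization ε_i = ρ_i v + u_i with v standard normal; the names are hypothetical. Scaling one loading up and the other down by the same factor, then adjusting the white-noise variances, leaves all three second moments unchanged. Because a zero-mean bivariate normal distribution is determined by these moments, the two parameter sets imply exactly the same distribution of (ε1, ε2).

```python
import math

def implied_moments(rho1, rho2, sigma1, sigma2):
    """(Var eps1, Var eps2, Cov) when eps_i = rho_i * v + u_i and v ~ N(0, 1)."""
    return (rho1 ** 2 + sigma1 ** 2, rho2 ** 2 + sigma2 ** 2, rho1 * rho2)

def rotate(rho1, rho2, sigma1, sigma2, c):
    """Scale the loadings by c and 1/c, then adjust the u variances so that all
    three implied moments are unchanged (the new variances must stay positive)."""
    var1, var2, _ = implied_moments(rho1, rho2, sigma1, sigma2)
    r1, r2 = c * rho1, rho2 / c
    return r1, r2, math.sqrt(var1 - r1 ** 2), math.sqrt(var2 - r2 ** 2)
```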

When the disturbances are non-normal, there may be fewer identification problems than in the normal case. This is due to the fact that convolutions of the unobserved factor and the assumed normally distributed white noise terms (u1 and u2) fall outside the class of normal distributions. In these non-normal models the higher order moments are not necessarily determined completely by the first two moments of the disturbances ε1 and ε2, as is the case with normal disturbances and normal factors. Even when all of the error components are normally distributed, choosing a finite number of points of support for the distribution of the unobserved factor might achieve identification of all "parameters" in the factor model. If the number of points of support for the unobserved factor distribution grows large as the sample size increases, however, the identification problems may reappear. It is important to note that this form of underidentification typically has little substantive importance, for the impacts of the covariates on the outcomes will usually be identified. In general, the underidentification affects only the estimators of the components of the distribution of (ε1,ε2).

As in White's (1980) discussion of consistency in misspecified models, it is not clear how one should interpret the parameter estimates obtained from discrete factor approximations. In general the estimator will converge to a particular value in large samples. The relationship between these limiting values and the parameters of interest, however, is a complicated function of all of the parameters of interest, the true underlying joint distribution of the disturbances and the assumed exogenous variables, and the imposed distributional assumptions. The results of this Monte Carlo study, in conjunction with the Monte Carlo results reported in Mroz and Guilkey (1992), suggest that one can have some confidence in placing conventional interpretations on the parameter estimates obtained from these discrete factor approximations. These approximation estimators do appear to work well in a variety of situations. The Monte Carlo results suggest that they can help researchers avoid false inferences due to the imposition of incorrect joint distribution assumptions while providing relatively precise point estimates.


V. Practical Problems in the Estimation of the Factor Models

There appear to be three difficulties in estimating parameters based upon the quasi-maximum likelihood factor models. The first problem is the existence of multiple local optima. Our strategy is to choose a fairly extensive grid of starting values, and in practice we proceed in two stages. We first select a grid of 15 to 75 separate starting values for each maximization problem and find the best set of estimates for each replication (estimation) in each specification of the data generating process in the Monte Carlo study. Next, we take the entire set of estimates for a particular specification of the data generating process and add each of these "final" estimates to the grid as additional starting values. The results reported here are based upon 100 replications for each experiment, so well over 100 different starting values are used in each optimization problem. Our experience suggests this is usually more than adequate for eliminating non-global optima.
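The multistart strategy can be illustrated on a toy one-dimensional criterion with two local minima. The objective, the gradient-descent local optimizer, and the grid below are illustrative stand-ins for the quasi-likelihood and the optimizer actually used. The optional `extra_starts` plays the role of the final estimates from other replications that are added in the second stage.

```python
def objective(x):
    """Toy criterion with a local minimum near 0.96 and the global one near -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def local_search(x0, step=0.01, iters=2000):
    """Plain gradient descent: converges to whichever minimum is nearby."""
    x = x0
    for _ in range(iters):
        x -= step * gradient(x)
    return x

def multistart(starts, extra_starts=()):
    """Run local searches from a grid (plus optional extra starting values,
    e.g. final estimates from other replications) and keep the best result."""
    candidates = [local_search(s) for s in list(starts) + list(extra_starts)]
    return min(candidates, key=objective)
```

A single start at 1.0 stops at the inferior local minimum near 0.96. The grid `[-2.0, -1.0, 0.0, 1.0, 2.0]` also reaches the global minimum near -1.04.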

The second problem arises in the discrete branch of the likelihood function. In some instances the best set of estimates implies that Prob(d=1) has a point mass. In the two point factor model, for example, we sometimes find that

This arises because the loading on the common factor in the discrete outcome equation is large. This does not present any major difficulties in the statistical model, but it should make one hesitant to place a substantive interpretation on estimates of the distribution of the unobserved factor. The main drawback of obtaining such estimates is that the Hessian matrix is singular. In order to obtain "standard errors" of the estimates, one must use such an occurrence as a pretest and formulate a simplified factor structure that builds in the point mass feature. With larger sample sizes, this problem occurred infrequently. In many applications, however, an extreme point mass can be quite meaningful. David and Mroz(1989a, 1989b) and Heckman and Walker(1990) use such formulations explicitly to model sterility in fertility models.

The third problem arises in factor models with more points of support. In a few instances the estimates imply, for example, that two of the three points of support coincide, so that the three point model is effectively a two point model. This also gives rise to a singular Hessian matrix. Again, one must use this as a pretest indicating that a simpler factor model fits the data in order to obtain "standard error" estimators. Our evaluation of the standard error estimators incorporates both of these types of pretests.
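A simple version of this pretest might merge support points that are numerically indistinguishable before refitting a smaller model. The tolerance and the probability-weighted rule for the merged location below are illustrative assumptions, not the procedure used in the paper.

```python
def merge_support(eta, p, tol=1e-4):
    """Collapse support points closer than tol; probabilities are added and the
    merged location is the probability-weighted average of the merged points."""
    merged = []
    for e, w in sorted(zip(eta, p)):
        if merged and abs(e - merged[-1][0]) < tol:
            e0, w0 = merged[-1]
            total = w0 + w
            loc = (e0 * w0 + e * w) / total if total > 0 else e0
            merged[-1] = (loc, total)
        else:
            merged.append((e, w))
    return [m[0] for m in merged], [m[1] for m in merged]
```

If the merged list is shorter than the original, the fit can be repeated with the smaller number of points of support to obtain a non-singular Hessian.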


VI. Data Generating Process

The majority of our experiments focus on what we consider to be fairly typical scenarios for most microeconometric studies. First, we use four sample sizes (1,000, 2,000, 3,000, and 5,000) that roughly capture the range of sample sizes used in many micro studies. Second, we set the pseudo-R2, defined by Var(y1* − ε1)/Var(y1*) or Var(y2 − ε2)/Var(y2), to be approximately 0.20 in both equations. Third, we usually set the error correlations to 0.33. Besides varying the sample sizes, we also undertake some limited experiments with higher R2 values (0.33 and 0.50) and a higher error correlation (0.50). Appendix 1 describes the data generating process in detail.
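Setting the pseudo-R2 amounts to choosing error variances relative to the variance of the systematic part. The following is a minimal sketch with hypothetical names. The closed form assumes that the systematic part is uncorrelated with the error. That holds for the latent index in (1a) but not exactly for (1b), where the dummy d is correlated with ε2, so there the error variance would need to be tuned using simulated values, for example with `pseudo_r2`.

```python
def error_variance_for(signal_var, target_r2):
    """Error variance making Var(signal) / (Var(signal) + Var(error)) equal to
    target_r2, assuming the error is uncorrelated with the systematic part."""
    return signal_var * (1.0 - target_r2) / target_r2

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def pseudo_r2(signal, error):
    """Var(y - eps) / Var(y) with y = signal + eps."""
    y = [s + e for s, e in zip(signal, error)]
    return variance(signal) / variance(y)
```

For a pseudo-R2 of 0.20, the error variance must be four times the variance of the systematic part.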

Most real economic applications of this class of econometric models contain numerous regressors, but it is not feasible to undertake detailed comparisons of the estimators within a high dimensional parameter space. We chose not to use real data to define the exogenous variables in this model, as we felt that the low dimensional parameter spaces we are forced to examine could not be suitably manipulated to approximate accurately "real" situations. We do, however, choose to use a distribution for the exogenous variables that roughly matches the distribution of education in the U.S. population. There is a rapid rise in the distribution function of the exogenous variables to a sharp peak (e.g., 12 years of school), followed by a fast drop (13-15 years), then a moderate peak (16 years), and a rapid decline. To achieve this, the exogenous variables are drawn from a skewed distribution (a convolution of a chi-square random variable and a compound normal random variable). The marginal distributions of the two exogenous variables, x and z, are identical, and the correlation between the exogenous random variables arises from correlations of the compound normal components. Figure 1 compares the empirical density of the standardized exogenous random variables and a normal distribution.

In most experiments we impose the condition that the exogenous variables have a 0.80 correlation coefficient. This high level of correlation seems appropriate, given that economic theories usually imply that nearly all exogenous variables influence both outcomes. We also undertake a more limited set of experiments where the exogenous variables in the two equations are identical. This corresponds to the case in which the researcher is unwilling to specify exclusion restrictions but is willing to achieve identification through functional form and distributional assumptions.

We consider two different frequencies for the occurrence of the discrete events (E(d) ∈ {0.50, 0.75}) and three distributional classes for the bivariate distribution. In all instances the correlation of the error terms and the non-normality of the disturbances are generated through the unobserved factors. For normal disturbances, the unobserved factors are standard normal random variables. We use two different methods to generate non-normal factors. The first uses a continuous uniform distribution, and the second uses a skewed distribution. The skewed factors come from a mixture of three normal distributions with unequal means and variances. Figure 2 contains a comparison of the standardized skewed factor distribution and a standard normal distribution.
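The following sketch shows how standardized versions of the three factor classes could be drawn. The weights, means, and standard deviations of the skewed three-component mixture are hypothetical values chosen only to produce right skewness; the paper's values are in Appendix 1. Each distribution is rescaled to mean 0 and variance 1, and the uniform uses the interval (−√3, √3), whose variance is 1.

```python
import math
import random

def mixture_moments(w, m, s):
    """Mean and variance of a finite mixture of normals."""
    mean = sum(wi * mi for wi, mi in zip(w, m))
    second = sum(wi * (si * si + mi * mi) for wi, mi, si in zip(w, m, s))
    return mean, second - mean * mean

def standardize_mixture(w, m, s):
    """Shift and scale the components so the mixture has mean 0 and variance 1."""
    mean, var = mixture_moments(w, m, s)
    sd = math.sqrt(var)
    return w, [(mi - mean) / sd for mi in m], [si / sd for si in s]

# Hypothetical skewed mixture (weights, means, sds chosen for illustration only).
SKEWED = standardize_mixture([0.6, 0.3, 0.1], [0.0, 1.5, 4.0], [1.0, 0.8, 1.5])

def draw_factor(kind, rng):
    """One standardized factor draw: 'normal', 'uniform', or 'skewed'."""
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind == "uniform":
        half = math.sqrt(3.0)                  # U(-sqrt 3, sqrt 3) has variance 1
        return rng.uniform(-half, half)
    w, m, s = SKEWED
    k = rng.choices(range(len(w)), weights=w)[0]   # pick a mixture component
    return rng.gauss(m[k], s[k])
```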

Note that the disturbances considered in this study never fall within the class defined by the discrete factor model in equation (5) when the number of points of support is finite. This means that the discrete factor estimator is always incorrectly specified. We consider the fact that this "incorrectly specified" estimator performs quite well under a variety of distributional assumptions to be one of the most attractive features of the approach.


VII. Monte Carlo Results: Biases and Mean Square Errors

We begin our discussion of the Monte Carlo experiments by examining the performance of the estimators when the true data generating process has bivariate normal disturbances. In this instance normal maximum likelihood is consistent and achieves the Cramer-Rao lower bound; the two-stage estimator is consistent; and the discrete factor estimators are inconsistent. We examine four different sample sizes, two values of E(d), two different specifications of the correlation of disturbances, and three different R2's. Also in the context of normal disturbances, we examine the consequences of not imposing exclusion restrictions across equations by allowing the same exogenous regressor to influence both the continuous and discrete outcomes. We next examine briefly the performance of the estimators when the disturbances are symmetric but not normally distributed. Finally, we examine the performance of the estimators when there are unobserved skewed factors.

VII.1 Normal Distributions

Table 1 contains summary statistics from the Monte Carlo experiments for our baseline specification: R2 = 0.20 in both equations, error correlation = 0.33, and regressor correlation = 0.80. It contains information only about the coefficient on the dummy endogenous variable, that is, its impact on the continuous outcome, whose true value is 1.00. The left side of the table uses an expected frequency of the discrete event of 0.50, and the right side has an expected frequency of 0.75. The four horizontal panels present results for sample sizes 1,000, 2,000, 3,000, and 5,000. All statistics are based upon 100 replications of each experiment. A similar format is used in all tables. To provide an indication of the large sample bias of the estimators, Appendix Table A.1 contains estimates from a single replication with a sample size of 100,000 for most of the data generating processes we examine.

Ignoring the endogeneity of the dummy endogenous variable (OLS, the first row of each panel) yields a significant bias, with the average point estimates being 120 to 130% larger than the true parameter value. The two-stage estimator, normal MLE, and the three and four point of support discrete factor model (DFM) estimators all appear to have little bias. The two point of support DFM estimator does have appreciable bias, but the bias is only half as large as that found with the OLS estimator. Given this bias, we focus mainly on the three and four point of support models in the discussion below.

Not surprisingly, the normal MLE has the smallest MSE of all estimators for all specifications considered in this table. The MSE for the two-stage estimator is appreciably larger than that for the normal MLE. In terms of the MSE, the two-stage estimator appears to perform about the same as the three point of support estimator and slightly better than the four point of support estimator when E(d)=0.50. At sample size 5,000, however, it has a larger MSE than both of these discrete factor estimators. At E(d)=0.75, only for sample size 1,000 does the two-stage estimator have a smaller MSE than either of these two discrete factor estimators. With an unequal split of the endogenous discrete event (E(d)=0.75), the discrete factor models appear to outperform the two stage estimator.
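The MSE comparisons above combine bias and sampling variability across the 100 replications. As a minimal sketch (an illustration, not the code used for these experiments), such summaries can be computed from a list of replication estimates of the dummy-variable coefficient as follows; the function names and the sample values are hypothetical.

```python
from statistics import fmean

def mc_summary(estimates, true_value=1.0):
    """Mean, bias, variance and empirical MSE of one estimator across replications."""
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    # Variance uses divisor R (number of replications), so MSE = bias**2 + variance exactly.
    variance = fmean((b - mean_est) ** 2 for b in estimates)
    mse = fmean((b - true_value) ** 2 for b in estimates)
    return {"mean": mean_est, "bias": bias, "variance": variance, "mse": mse}

def relative_mse(estimates, benchmark, true_value=1.0):
    """Ratio of an estimator's empirical MSE to that of a benchmark (e.g. normal MLE)."""
    return mc_summary(estimates, true_value)["mse"] / mc_summary(benchmark, true_value)["mse"]

# Hypothetical replication results, for illustration only.
two_stage = [0.6, 1.5, 1.1, 0.7, 1.3]
normal_mle = [0.9, 1.1, 1.0, 0.95, 1.05]
print(mc_summary(two_stage))
print(relative_mse(two_stage, normal_mle))
```

A ratio above 1 means the estimator is less accurate than the benchmark in the MSE sense, which is how statements such as "1.64 to 3.69 times the MSE of the normal MLE" below can be read.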

Table 2 contains summary statistics when the R2 in each equation is 0.33 instead of the 0.20 examined above, and Table 3 considers the case where the R2 are 0.50. All other aspects of the data generating processes are the same as in Table 1, including the normally distributed disturbances and the 0.33 error correlation. As expected, all estimators perform better in terms of MSE with these higher R2. In general the comparisons of the estimators are quite similar to those found in Table 1. The two-stage and the three and four point of support estimators show little bias, and the normal MLE typically provides the smallest MSE.

There are, however, two important exceptions. First, in only one of the sixteen specifications examined in Tables 2 and 3 does the two-stage estimator have a smaller MSE than either the three or the four point of support discrete factor estimator. This happens even though the two-stage estimator typically has a smaller bias than either of these two inconsistent estimators. Second, the performance of the three and four point of support estimators relative to the normal MLE appears to improve at higher R2. In several instances the three point of support estimator has a smaller empirical MSE than the efficient MLE, and the four point of support estimator often has MSEs only 0-15% higher than the normal MLE. Tables 2 and 3 indicate that the higher the explanatory power of the model, the less one should rely upon the inefficient two-stage estimator and, especially at larger sample sizes, the smaller the advantage of the efficient MLE over either the three or four point of support estimator.

Table 4 has a data generating process identical to that used in Table 1, except that there is an error correlation of 0.50 rather than 0.33. One important result of the higher error correlation is the increase in the bias of the discrete factor estimators. Except at the smallest sample size, the bias is still relatively small for the four point of support model. We also examined discrete factor approximations with five, six, and seven points of support for some of the specifications in Table 4. The bias decreased appreciably as we added more points of support.

Again, the two-stage estimator performs quite poorly in terms of MSE. The ratio of its MSE to that of the normal MLE ranges from 1.64 to 3.69, and in all four specifications with E(d)=0.75 the ratio is above 2.30. The two-stage estimator also appears deficient in terms of MSE when compared to the three and four point of support estimators, even though its bias is much smaller than that of the three point of support estimator. In only one of eight cases does the two-stage estimator have a smaller MSE than either the three or the four point of support estimator (16 comparisons, two for each of eight cases), and in that one instance its MSE is only 4% smaller than the MSE for the three point of support model. For the specifications with E(d)=0.50, the four point of support model has only a 16-35% larger MSE than the efficient normal MLE. At sample size 5,000 its MSE is less than 16% larger than the normal MLE. The relative performance of the four point of support model does fall appreciably at smaller sample sizes when E(d)=0.75.

Table 5 uses the same data generating process used in Table 1 with one important exception, namely that exactly the same exogenous regressor is used in both the discrete and continuous outcome equations. Even though all estimators are identified through the linearity and distributional assumptions, the performance of every estimator in terms of MSE deteriorates dramatically in this instance. This is especially true for the two-stage estimator despite the fact that it typically has the smallest bias of any of the six estimators. In fact, the naive OLS estimator appears to have the smallest MSE on average. Table A.2 contains estimates for specifications with skewed error distributions and no exclusion restrictions. With no exclusion restrictions and skewed disturbances, each of the discrete factor models usually has an MSE at least 50% smaller than those found for the OLS, two-stage, and normal MLE estimators. With weak instruments it appears that the discrete factor models outperform the other estimators attempting to control for endogeneity, unless one knows the class of the joint error distribution and uses the appropriate maximum likelihood estimator.

VII.2 Non-Normal Disturbances

Table 6 uses the same data generating process as in Table 1 except that the unobserved factor giving rise to the error correlation follows a uniform distribution. This specification yields a more platykurtic error distribution while retaining symmetry. In this instance each of the six estimators we examine is asymptotically biased, but among the estimators that attempt to control for endogeneity, only the two point of support estimator appears to have much bias. The major difference from Table 1 is that the bias of the two point of support estimator falls considerably in each of the eight specifications. This bias reduction results in the two point of support estimator having the smallest MSE in half of the eight specifications. Each of the other four estimators that attempt to control for the non-zero error correlation has little bias, and the bias of the OLS estimator is approximately the same as it is with normal distributions.

The normal MLE still performs quite well with these symmetric, non-normal disturbances. The relative performance of the two-stage estimator and the four point of support estimator also appears to be about the same as in Table 1. The two-stage estimator has smaller MSEs than the four point of support estimator for E(d)=0.50 at small sample sizes, while at larger sample sizes or for E(d)=0.75 the four point of support estimator tends to have smaller MSEs. In seven of the eight specifications, the MSE increases as one uses a discrete factor model with more points of support, though at larger sample sizes the efficiency loss from additional points of support is usually small. This suggests that it may be worthwhile to consider more complex models with larger sample sizes.

Table 7 summarizes the Monte Carlo experiments for skewed distributions at sample size 3,000. The first panel in Table 7 corresponds to the specification in Table 1, except that there are skewed disturbances. Panels 2 through 4 similarly correspond to Tables 2 through 4, respectively. Results for all four sample sizes are in Appendix Tables A.3-A.6.

The baseline case examined in the first panel indicates substantively different performance by the estimators in the presence of skewed disturbances. The most noteworthy change is the considerable bias in the normal based MLE when E(d)=0.75. Even when the normal based MLE appears to have little bias (e.g., when E(d)=0.50 with low correlation), its MSE is larger than that of the four point of support estimator. As with symmetric disturbances, the two-stage estimator always has little empirical bias, but its MSE is often 50% to 200% larger than the MSE of either the three or four point of support model.

Panels 2 and 3 in Table 7 investigate the consequences of higher R2 on the estimators. The bias of the normal based MLE does diminish considerably with more explanatory power, but it is still fairly large when E(d)=0.75. The three and four point of support estimators still appear to be superior to the two-stage estimator and the normal MLE in terms of MSE's.

The final panel of Table 7 examines the consequences of higher error correlations with skewed distributions, and the evidence clearly points to the superior performance of the three and four point of support estimators. Normal MLE is severely biased when E(d)=0.50. When E(d)=0.75, it produces large, incorrectly signed estimates that, with absolute t-statistics exceeding 10, a researcher would conclude are highly significant. The MSE of the two-stage estimator is three to six times that of the four point of support estimator. At smaller sample sizes its MSE can be more than 15 times larger than that of the four point of support model (see Appendix Table A.5).

At larger sample sizes with skewed distributions the relative performance of the two-stage estimator does improve substantially. But even at sample size 5,000 its best relative MSE is 36% larger than that of the four point model (range 36% - 285% larger MSE for n=5,000 with skewed distributions). This improvement in relative performance appears to be due to the fact that the discrete factor models we examined are somewhat biased even at four points of support. In a few instances we examined discrete factor models with five, six, and seven points of support, and typically the bias and the MSE declined as we added additional points of support.

In the presence of skewed disturbances neither the normal based MLE nor the two-stage estimator performs well when compared to the discrete factor model with four points of support. Normal based MLE can be extremely biased, with the bias depending upon the correlation of disturbances, the R2, and E(d). As in all the symmetric disturbance experiments, the two-stage estimator performs well in terms of bias with skewed distributions. It is, however, fairly imprecise when compared to the discrete factor models.




VIII. Monte Carlo Results: Choosing the Number of Points of Support

Little research has been done on selecting the number of points of support for discrete factor distributions in finite samples, and we use our Monte Carlo experiments to help shed light on this issue. Based upon a mean square error metric, we find that one should use a fairly liberal criterion for adding additional points of support. It appears, for example, that both the Schwarz criterion and the Akaike Information Criterion (AIC) lead one to choose too few points of support, especially in small samples. We also evaluate the empirical size of tests based on these estimators in the presence of pretests for the selection of the model.

The primary approach we consider for selecting the number of support points is based on the increase in the value of the quasi-likelihood function when one adds a point of support. This is a strict upwards-testing approach, but it seems to correspond to the approach many empirical researchers use when deciding whether to adopt more complicated empirical models. We start with a one point of support model, which corresponds to OLS estimation of (1b) and independent probit estimation of (1a). We compare its likelihood function value to that of a discrete factor model with two points of support. The two point of support model adds three parameters (two factor loadings and one discrete probability), and we use a likelihood ratio "Chi-Square" test with 3 degrees of freedom at a chosen significance level to decide whether to reject or accept the model with one point of support. If we accept the simpler model at that significance level, we do not consider more complicated models.

If we reject the one point of support model, we perform a Chi-Square test of whether one should accept the two point of support model or reject it in favor of a three point of support model. This test has 2 degrees of freedom (the location of the additional point of support and one additional discrete probability). We carry out the test at the same significance level used to test for the "significance" of adding a second point of support. If we reject the two point of support model in favor of the three point of support model, we then consider whether to accept the three point of support model or to move to the four point of support model, using the same approach and significance level.

Due to computational constraints, we examine discrete factor models with more than four points of support in only a few instances in the Monte Carlo experiments. In all cases presented here, unless noted otherwise, a rejection of the three point of support model means that we use the four point model as the preferred point estimate without further upwards testing. We use the Monte Carlo experiments to examine how the choice of significance level affects the bias, the mean square error, and the performance of the estimated standard errors for estimators based upon these pretest criteria.
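The upwards testing rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the experiments: it assumes the maximized quasi-log-likelihoods of the one, two, three, and four (or more) point of support models are already available, it uses a closed-form chi-square tail probability, and it treats the likelihood ratio statistic as chi-square even though, as discussed next, that reference distribution is not valid under the null. The helper names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k=0}^{df/2 - 1} h**k / k!
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{(df-1)/2} h**(k - 1/2) / Gamma(k + 1/2)
    tail = sum(h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail

def select_points_of_support(loglik, alpha):
    """Strict upwards LR testing.

    loglik[j] is the maximized (quasi-)log-likelihood of the model with j + 1
    points of support. Going from one to two points adds 3 parameters; each later
    step adds 2 (a location and a probability). Testing stops at the first
    accepted model, or at the largest model supplied (the cap).
    """
    points = 1
    for j in range(len(loglik) - 1):
        df = 3 if j == 0 else 2
        lr = max(2.0 * (loglik[j + 1] - loglik[j]), 0.0)
        if chi2_sf(lr, df) >= alpha:
            break  # simpler model accepted at this level: stop testing upward
        points = j + 2
    return points

# Hypothetical log-likelihoods for one to four points of support.
loglik = [-1000.0, -995.0, -994.0, -993.9]
for alpha in (0.05, 0.25, 0.50):
    print(alpha, select_points_of_support(loglik, alpha))
```

With these made-up log-likelihoods, a 5% or 25% rule stops at two points of support, while a 50% rule moves on to three, which illustrates how a more liberal significance level favors richer discrete factor distributions.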

Under the null hypothesis that the smaller number of points of support is the true model, the Hessian matrix for the alternative model (i.e., the model with an additional point of support) is singular, because some parameters of the added point of support (for example, its location when its probability is zero) are not identified. Consequently, the likelihood ratio test statistic does not follow an asymptotic Chi-Square distribution under the null. We nevertheless examine the performance of this "invalid" test statistic because it is simple to calculate and because it is closely related to the Akaike and Schwarz criteria. Also, it appears to work quite well.

We did consider a second approach for choosing the number of points of support. This approach examines whether the coefficient on the dummy endogenous variable changes "significantly" as one adds additional points of support. The intuition behind this test is to ask whether allowing for a more complex discrete approximation to an underlying continuous distribution function has an appreciable impact on the parameter of interest. To do this we construct the joint covariance matrix for all parameters in the one, two, three, and four point of support models (see Mroz, 1987) and carry out "t-tests" of no significant change in this coefficient when adding each additional point of support. Again, we use a strict upwards testing criterion for selecting the number of points of support and use the same significance level at each step of the upwards testing procedure. This approach performed comparably to the likelihood ratio test in the Monte Carlo experiments. In the empirical example, however, it performed much worse than the likelihood value approach. We do not recommend its use unless the baseline model without endogeneity already allows for non-normal disturbances.
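A minimal sketch of the coefficient-change "t-test" for one upward step follows, assuming the two estimates of the dummy-variable coefficient, their variances, and their covariance have already been taken from a joint covariance matrix of the kind described above. The function names and the numbers in the example are hypothetical, and the use of a two-sided normal critical value is a simplifying assumption.

```python
import math
from statistics import NormalDist

def coefficient_change_t(b_fewer, b_more, var_fewer, var_more, cov):
    """t-statistic for no change in the dummy-variable coefficient when one point of
    support is added. cov is the covariance of the two estimates, read off a joint
    covariance matrix for both models (assumed to be available)."""
    var_diff = var_fewer + var_more - 2.0 * cov
    if var_diff <= 0.0:
        raise ValueError("estimated variance of the difference is not positive")
    return (b_more - b_fewer) / math.sqrt(var_diff)

def add_point(b_fewer, b_more, var_fewer, var_more, cov, alpha):
    """Upward rule: move to the richer model if the change is significant at level alpha."""
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return abs(coefficient_change_t(b_fewer, b_more, var_fewer, var_more, cov)) > crit

# Hypothetical estimates and (co)variances, for illustration only.
t = coefficient_change_t(1.20, 1.05, 0.010, 0.016, 0.010)
print(t, add_point(1.20, 1.05, 0.010, 0.016, 0.010, alpha=0.25))
```

The covariance term matters: estimates from nested models are typically positively correlated, so ignoring it would overstate the variance of the difference and make the test too reluctant to add points of support.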

We also use similar upwards testing criteria for evaluating whether one should use the OLS estimate of equation (1b) instead of either the two stage estimator or the normal based maximum likelihood estimator. For the test of OLS versus the two stage estimator, we use a Durbin-Wu-Hausman test of whether the predicted probability significantly enters an OLS regression after controlling for the impact of the dummy variable. The standard errors we use account for the first-stage estimation of the predicted probability. For deciding whether to use OLS or the normal based maximum likelihood estimator, we use a standard likelihood ratio test of the null hypothesis that the correlation coefficient equals zero. The Monte Carlo experiments help us to evaluate the performance of various significance levels as a rule for deciding which controls for endogeneity, if any, one should use.
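The Durbin-Wu-Hausman comparison of OLS and the two stage estimator can be sketched as an augmented regression. Two assumptions are worth flagging: the predicted probability `p_hat` comes from some first-stage model estimated elsewhere (a probit in the text), and the standard error here is the conventional OLS one, so it does not include the first-stage correction used in the paper. Including `p_hat` alongside `d` is equivalent to including the first-stage residual `d - p_hat`. The function name is hypothetical.

```python
import numpy as np

def dwh_t_stat(y, X, d, p_hat):
    """Augmented-regression endogeneity check: OLS of y on X, d and the first-stage
    predicted probability p_hat; returns the t-statistic on p_hat.
    NOTE: conventional OLS standard errors are used, so unlike the test described
    in the text this sketch does not correct for p_hat being estimated."""
    Z = np.column_stack([X, d, p_hat])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    n, k = Z.shape
    sigma2 = resid @ resid / (n - k)          # residual variance with df correction
    cov = sigma2 * np.linalg.inv(Z.T @ Z)     # conventional OLS covariance matrix
    return coef[-1] / np.sqrt(cov[-1, -1])
```

For example, with arrays `y`, `X` (including a constant column), `d`, and `p_hat`, `dwh_t_stat(y, X, d, p_hat)` returns a t-statistic that would be compared with a normal critical value at the chosen upwards testing level; a significant value points toward the two stage estimator rather than OLS.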

We also consider four specifications of the data generating process where the error correlation is zero when we present results based upon data driven criteria for selecting the number of points of support and for deciding whether to control for endogeneity (E(d)=0.50 and 0.75 with normal errors, and E(d)=0.50 and 0.75 with skewed errors in the probit). In these four cases endogeneity is not a problem. Appendix Tables A.7 and A.8 contain details on the Monte Carlo results with these data generating processes. In general, the discrete factor models have higher MSE than the other estimators when there are no endogeneity problems. This is not surprising, as the discrete factor models add at least three new parameters to control for endogeneity while the two stage and normal MLE each add only one new parameter.

As discussed in Section IV, in some instances the calculated Hessian matrix for the discrete factor model can be singular. When we encounter situations where this happens, we impose constraints on the parameters determining the discrete factor distribution so that the singularity disappears. All standard error estimators used in this study use this pre-test when constructing covariance matrices for the discrete factor models.


VIII.1 Biases and Mean Squared Errors

To evaluate the performance of these simple rules for selecting the number of points of support, Figures 3a-3d present the average point estimates for each estimation procedure for particular sample sizes and frequencies of the event {d=1}. Each point on the graph is an average of 1000 estimates, 100 from each of ten different specifications of the data generating process. These ten specifications are those in Tables 1, 2, 3, and 4 and in Appendix Tables A.3-A.8. To conserve space, we only present these graphs for sample sizes 1000 and 5000.

The horizontal axis on each graph measures the "significance level" used to decide whether to accept a simpler model according to the upwards testing criteria discussed above. At a significance level of 100%, for example, the maximum likelihood approach always allows the error correlation to be different from zero; the two stage approach always uses the predicted value of P{d=1} instead of the actual value of the dummy variable; and the discrete factor models always use four points of support. At a significance level of 0, one would always accept the simplest model for each approach; this is the OLS estimate for each of the approaches we consider. For interior points, the significance level determines: (1) whether the two stage approach uses the predicted values of the dummy variable instead of a simple OLS estimation with the actual value of the dummy variable, (2) whether the maximum likelihood approach permits the error correlation to be different from zero, and (3) the number of points of support to use for the discrete factor models. To retain more detail in the graphs we do not plot the point at a significance level of 0, but the average value corresponding to the OLS estimation is reported in each graph's lower title. The solid horizontal line at 1 indicates the true value of the parameter in all data generating processes.
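To make the construction of Figures 3 and 4 concrete, here is a hedged sketch of the pretest estimator for a single choice between a simple and a complex model (OLS versus two stage, or OLS versus normal MLE). It assumes that, for each replication, one has the OLS estimate, the complex-model estimate, and the p-value of the test for endogeneity; all names are hypothetical. For the discrete factor models, the binary choice would be replaced by an upwards selection over the number of points of support.

```python
from statistics import fmean

def pretest_curve(ols_est, complex_est, p_values, alphas, true_value=1.0):
    """For each significance level alpha, use the complex-model estimate in a replication
    when its endogeneity test has p-value below alpha, and the OLS estimate otherwise.
    Returns (alpha, mean estimate, empirical MSE) for each alpha."""
    curve = []
    for alpha in alphas:
        chosen = [c if p < alpha else o for o, c, p in zip(ols_est, complex_est, p_values)]
        mse = fmean((b - true_value) ** 2 for b in chosen)
        curve.append((alpha, fmean(chosen), mse))
    return curve

# Hypothetical results for three replications, for illustration only.
print(pretest_curve([2.2, 2.3, 2.1], [1.0, 1.2, 0.9], [0.01, 0.30, 0.60], [0.05, 0.25, 0.50, 1.0]))
```

Plotting the second element of each tuple against alpha gives a curve like those in Figures 3a-3d, and plotting the third gives a curve like those in Figures 4a-4d.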

Figures 3a and 3b use E(d)=0.50, and Figures 3c and 3d use E(d)=0.75. Figures 3a and 3c have sample sizes of 1000, while Figures 3b and 3d have sample sizes of 5000. Note that the vertical scale is different for each of the four graphs. Looking first at E(d)=0.50, we see that the mean estimates for the most complex model of each approach (at a significance level of 100%) are quite close to the true value. There is a slight indication of negative bias for the normal maximum likelihood estimator, and this is due nearly entirely to the data generating processes with skewed disturbances. At E(d)=0.75 (Figures 3c and 3d), we see much the same behavior, except that the normal maximum likelihood estimator exhibits a substantial negative bias.

These figures provide key insights about the significance level one should use for deciding whether to control for endogeneity of the dummy explanatory variable. They also provide guidance for deciding how many points of support to use with the discrete factor model. For small sample sizes, typical significance levels of 5 or 10 percent for deciding whether endogeneity is an important concern would yield fairly large biases for all estimation approaches. While not displayed directly in these graphs, this is even true for the normal maximum likelihood estimator when all disturbances are truly normally distributed.

These figures do suggest that one should use a fairly liberal criterion for deciding whether to use more complex estimation procedures to control for endogeneity. At small sample sizes, a 25% test to decide whether to control for endogeneity would eliminate most of the bias in the two stage procedure; this is also the case for the normal maximum likelihood estimator when the disturbances are jointly normal. For the discrete factor estimator it appears that one should use at least a 25% significance level for deciding whether to add additional points of support to the discrete factor distribution. In none of the examples we consider is there much evidence that one should consider significance levels higher than 50%.

Note that the AIC implies too small a significance level for deciding whether to consider more complex models. For the normal maximum likelihood approach, the AIC corresponds to a likelihood ratio "test" significance level of about 16%. For the discrete factor models, the AIC corresponds to a significance level of 11.2% for deciding whether to control for any form of endogeneity, and a significance level of 13.5% for deciding whether to add additional points of support. The Schwarz criterion would yield considerably more bias in the point estimates than the AIC, as it is more conservative than the AIC about moving to more complex models.
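The AIC-implied significance levels quoted above follow from comparing each criterion's penalty with the likelihood ratio statistic: the AIC prefers a model with k extra parameters when 2(log L1 - log L0) > 2k, and the Schwarz criterion prefers it when that statistic exceeds k ln(n). A short Python check, treating the statistic as chi-square with k degrees of freedom (the same approximation used in the upwards tests); the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for integer df >= 1 (closed form)."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    tail = sum(h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail

def aic_implied_level(k):
    """LR-test level that mimics AIC for k added parameters (threshold 2k)."""
    return chi2_sf(2.0 * k, k)

def schwarz_implied_level(k, n):
    """LR-test level that mimics the Schwarz criterion (threshold k * ln(n))."""
    return chi2_sf(k * math.log(n), k)

cases = [("error correlation in normal MLE", 1),
         ("one vs. two points of support", 3),
         ("each additional point of support", 2)]
for label, k in cases:
    print(label, round(aic_implied_level(k), 3), round(schwarz_implied_level(k, 1000), 4))
```

The AIC column reproduces the roughly 16%, 11.2%, and 13.5% levels cited in the text; at n = 1,000 the Schwarz-implied levels are all below 1%, which is why that criterion is even more conservative.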

Figures 4a-4d display the empirical mean square errors of the estimation approaches. These figures follow the same format as Figures 3a-3d; the horizontal axes again indicate the "significance levels" used to decide whether one should use a more complex estimation approach. Figures 4c and 4d do not include the MSE of the normal maximum likelihood estimator because it is so large. The graph titles display the smallest MSE across all significance levels for the normal MLE as well as the MSE of the OLS estimator. At all significance levels greater than 5%, there is not a single instance where either the normal likelihood approach or the two stage approach has a smaller MSE than the discrete factor model. This statement is also true for sample sizes 2,000 and 3,000.

With several of the data generating processes it appeared that discrete factor models with more than four points of support were needed to fit the data well. In a couple of instances we allowed for up to seven points of support, following the same model selection approach as in Figures 3 and 4. There was a fairly large decline in both the bias and the MSE of the discrete factor models when we expanded the maximum number of points of support from four to seven. Consider the model with E(d)=0.75, skewed disturbances, and 5,000 observations (as in Appendix Table A.6, bottom panel). When we allowed for a maximum of seven points of support instead of four, at an upwards testing level of 0.25 the mean point estimate fell from 1.22 to 1.10, and the MSE fell by about 36%, from 0.056 to 0.036. At a level of 0.50 the mean point estimate was 1.015 with an MSE of 0.025, compared to a point estimate of 1.22 and an MSE of 0.054 with a maximum of four points of support. This suggests that our assessments in Figures 3 and 4, based on a maximum of four points of support, understate the advantages of the discrete factor model over both the two stage and the normal maximum likelihood approaches.

The evidence from Figures 3 and 4 presents a very strong case for using the discrete factor models in this setting. Provided that one uses at least a 25% "significance level" for deciding whether to add additional points of support, there is little bias in the discrete factor models. Even in the worst case the bias is less than 15%, and it falls off quickly as the sample size increases. The mean square error graphs also indicate that using an upwards testing significance level of 25 to 50% for selecting all models will tend to yield the smallest mean square errors. The performance of the discrete factor models in terms of mean square error is quite remarkable: this approach appears to dominate the only slightly biased two stage estimator at all but the smallest significance levels for choosing more complex models.


VIII.2 Size Tests

To examine the performance of the standard error estimators, we compare the number of rejections of the null hypothesis that the coefficient on the dummy endogenous variable equals 1 against the two-sided alternative that it differs from 1. To carry out these hypothesis tests we incorporate the upwards testing approach for deciding whether to control for endogeneity and, in the case of the discrete factor model, whether to add additional points of support. Based on the implications from Figures 3 and 4, we use a 25% significance level for deciding whether to use the simpler model. At smaller significance levels there are large biases for all of the estimation procedures, resulting in excessive rejections of the null hypothesis. The test statistic we use for evaluating this null hypothesis is a t-test, where the standard error estimator is adjusted for possible model misspecification.

Figures 5a-5d contain graphs of the fraction of rejections by estimation approach against the requested size of the test, for sample sizes 1,000 and 5,000 and E(d)=0.50 and E(d)=0.75. Each graph is based on 100 replications of each of the ten data generating processes listed above. At small sample sizes, the null hypothesis is rejected too frequently for each of the estimation methods, especially when E(d)=0.75. At larger sample sizes, the two stage estimator's empirical size matches the theoretical values quite closely, but both the normal maximum likelihood and the discrete factor estimators tend to overreject the null hypothesis.
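The empirical sizes plotted in these figures are rejection frequencies across replications. A minimal sketch, assuming each replication supplies a point estimate and an estimated standard error for the dummy-variable coefficient and that two-sided normal critical values are used; the function name and inputs are hypothetical.

```python
from statistics import NormalDist

def empirical_size(estimates, std_errors, nominal_sizes, null_value=1.0):
    """Fraction of replications rejecting H0: coefficient = null_value with a
    two-sided t-test, for each requested nominal size."""
    t_stats = [(b - null_value) / se for b, se in zip(estimates, std_errors)]
    sizes = {}
    for size in nominal_sizes:
        crit = NormalDist().inv_cdf(1.0 - size / 2.0)  # two-sided normal critical value
        sizes[size] = sum(abs(t) > crit for t in t_stats) / len(t_stats)
    return sizes

# Hypothetical replication output, for illustration only.
print(empirical_size([1.0, 1.3, 0.5, 1.1], [0.1] * 4, [0.05, 0.10, 0.50]))
```

A well-behaved test produces values close to the nominal sizes; values above them indicate the overrejection discussed in the text.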

The poor size performance of the maximum likelihood and discrete factor estimators is due mostly to the bias in these two estimators. If we restrict the size comparisons to data generating processes with normal disturbances, the normal maximum likelihood estimator has much better size properties. In those instances where the discrete factor model with 4 points of support has little bias, its empirical size matches the theoretical size much more closely. Because of the number of pretests used for choosing the number of points of support, its empirical size is still a bit too large in these instances.
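A stylized calculation shows why bias alone can produce excessive rejections. Suppose, as an assumption for illustration only, that an estimator is normally distributed with bias b and standard deviation s, and that s is estimated without error. A two-sided test at nominal size α then rejects a true null with probability Φ(-c - b/s) + 1 - Φ(c - b/s), where c is the 1 - α/2 quantile of the standard normal distribution. Because s shrinks with the sample size while an asymptotic bias does not, b/s, and hence the overrejection, grows with the sample size. The function name below is hypothetical.

```python
from statistics import NormalDist

def rejection_rate_with_bias(bias_over_sd, nominal_size=0.05):
    """Probability that a two-sided z-test rejects a TRUE null when the estimator is
    normal with bias equal to bias_over_sd standard deviations (stylized assumption)."""
    z = NormalDist()
    c = z.inv_cdf(1.0 - nominal_size / 2.0)  # two-sided critical value
    return z.cdf(-c - bias_over_sd) + 1.0 - z.cdf(c - bias_over_sd)

for ratio in (0.0, 0.5, 1.0, 2.0):
    print(ratio, round(rejection_rate_with_bias(ratio), 3))
```

For instance, a bias of half a standard deviation raises the rejection rate of a nominal 5% test to about 7.9%.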

The poor size properties of the discrete factor models can be mitigated by allowing for more than four points of support. As above, we took the data generating process with the poorest performing discrete factor model at sample size 5000 and examined estimations that allowed for up to seven points of support. This is the data generating process used in Table A.6 with 5,000 observations. Table 8 contains the empirical sizes for the four point of support estimator, the seven point of support estimator, and estimators using the upwards testing approach with maximums of four and seven points of support. Only 100 replications are used to construct these size tests.

The empirical size of the discrete factor model's standard error estimator improves dramatically by allowing additional points of support. This is not surprising, given how often the four point maximum was a potentially binding constraint. In 72 out of 100 experiments the 25% upwards testing criterion was restricted from examining more than four points of support; for 92 cases out of 100 the 50% upwards testing criterion was restricted to at most a four point of support model. However, even when allowing for up to seven points of support the upwards testing approach yields a fairly large positive bias. This results in excessive rejections of the null hypothesis. Estimators using an upwards testing criterion to select the number of points of support falsely reject much more frequently. While the size performance of the best upwards testing estimator appears quite poor, it is important to recognize that this poor performance is due mostly to pretest bias.

To further place this poor size performance into context, the number of rejections in the last row of Table 8 is almost identical to the number of rejections for the two stage estimator with a sample size of 1,000 for the same data generating process when one uses a standard 5% test for deciding whether to control for endogeneity. In models that rely upon asymptotic expansions, the size properties of tests can improve when one allows data driven selection of the number of terms to include in the expansion. Eastwood and Gallant's (1991) Monte Carlo experiments show that the bias reductions obtained by using random rules (e.g., upwards testing) instead of fixed rules (e.g., determined by sample size only) for choosing the number of terms in the expansion often improve the size properties of the estimators.




IX. The Impact of Marriage on Wages

This example uses wage and demographic data for men aged 25-33 from the 1990 interview of the NLSY in conjunction with "attitudinal" data from the 1979 NLSY interview on marriage plans and views on the role of women and men in the family. We examine how marital status appears to impact the men's average hourly earnings for 1989, and we use the attitudinal data as instruments for predicting marital status. We focus on the White cross section cohort. Appendix 2 presents the sample selection criteria, the means of the data, and some estimation results. Overall, 927 of the 1678 young working men in this sample are married. The wage analysis drops 23 men with average hourly earnings below $2.00 or above $200.00. These observations are used in the marriage analysis.

Table 9 presents estimates of the "marriage effect" on log wages from the discrete factor model, normal maximum likelihood, and the two-stage procedure. The upwards testing criterion based on the value of the quasi-likelihood function for the discrete factor model suggests that one should use six points of support.

The point estimate from the six point discrete factor model is essentially 0, with a standard error of 0.048. There is a slight indication that the exclusion restrictions are not valid (p-value=.07) for this model, but relaxing these restrictions barely changes the point estimate (-.003) and the standard error (.043). Note that if one had used a standard 5% likelihood ratio test for adding points of support, one would have chosen a model with three points of support, and the estimated effect of marriage on wages would have been nearly 13%.

Normal maximum likelihood yields a point estimate of .02 with a much larger standard error (0.12) when the exclusion restrictions are imposed. When the exclusion restrictions are relaxed, the point estimate jumps dramatically to .63 with a standard error of .09. Most labor economists would agree that this is an absurd estimate. The two stage estimator behaves quite similarly to the normal MLE, except that the standard error increases to 2.79 for the model without the exclusion restrictions. The two stage estimator is almost completely uninformative here. If one had relied on a likelihood ratio test from the normal maximum likelihood estimator, most likely one would have concluded that endogeneity was not an important issue. In the specification with exclusion restrictions, the p-value for the likelihood ratio test is about 50%. Similarly, no test statistic from the two-stage estimator provided any indication of marriage being endogenous in the wage equation.

This example suggests that the likelihood function value approach for choosing the number of points of support for the discrete factor approximation is superior to an approach examining the point estimates for significant changes as one adds additional points of support. In this example, departures from normality of the log wage disturbances appear to be the most important problems with the goodness of fit for the model assuming independence (DFM 1). This is true even though the wage data have been trimmed to remove outliers. In fact, if one had relied upon a test of the significance of the change in the coefficient on the dummy endogenous variable to decide whether to add additional points of support, one would have concluded that endogeneity of marital status was not a problem.

The first few points of support appear to fill out the wage distribution to capture departures from normality. The two point of support model, for example, adds a point of support with weight 0.5% about 4 standard deviations away from the point of support with weight 99.5%. The three point of support model adds two points of support with combined weight of 2% about 2.5 standard deviations on either side of the point of support having 97.9% of the weight. Only with five or more points of support does the estimated correlation of the wage and marriage disturbances move appreciably from 0. The large increases in the value of the likelihood function that result from such low-weight mass points suggest that it is important to consider departures from normality when examining real data. The approach described in equation (7) provides a simple way to address this issue, but it has not been implemented in this example.
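The role of the low-weight mass points is easiest to see in the likelihood itself. The Python sketch below computes one observation's likelihood contribution in a marriage probit and a log wage regression that share a discrete factor. The parameterization is an assumption of this sketch and not necessarily the exact one used in the paper:

- the latent marriage index is x'β + λd·η + u1;
- log wage is γd + x'α + λy·η + σu2;
- u1 and u2 are independent standard normals;
- η takes the value ηk with probability pk.

All function and argument names are hypothetical. In each term, the wage piece is a normal density shifted by λy·ηk. A support point with small pk that lies far from the others therefore adds a thin tail to the wage distribution. That mechanism is consistent with the first few points raising the likelihood while leaving the estimated correlation near 0.

```python
import math
from typing import Sequence


def norm_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def dfm_loglik_obs(d: int, y: float, probit_index: float, wage_index: float,
                   gamma: float, load_d: float, load_y: float, sigma: float,
                   support: Sequence[float], weights: Sequence[float]) -> float:
    """Log likelihood of one (d, y) pair under a discrete factor approximation.

    probit_index = x'beta and wage_index = x'alpha exclude the factor term.
    The common factor eta equals support[k] with probability weights[k].
    """
    sign = 1.0 if d == 1 else -1.0
    total = 0.0
    for eta, p in zip(support, weights):
        # Probit piece: P(d | x, eta) with a unit-variance normal error.
        prob_d = norm_cdf(sign * (probit_index + load_d * eta))
        # Wage piece: normal density of the wage residual given d and eta.
        resid = y - gamma * d - wage_index - load_y * eta
        dens_y = norm_pdf(resid / sigma) / sigma
        total += p * prob_d * dens_y
    return math.log(total)
```

A sample log likelihood is the sum of these contributions over observations. In a full estimator, the support points, weights, loadings, and regression coefficients would be chosen jointly to maximize it.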

This real world example appears to mimic the Monte Carlo results with non-normal disturbances. The normal maximum likelihood estimates are somewhat unstable, and the two stage procedure is quite imprecise. The discrete factor model yields stable point estimates with relatively small standard errors.

It is useful to compare these estimates to those obtained by Korenman and Neumark (1991) in their examination of the impact of marriage on young men's wages. Korenman and Neumark use a "fixed effect" approach with longitudinal data to control for the endogeneity of marriage. They find that the estimated marriage effect falls from 12% when one ignores endogeneity to 6% when using the fixed effects model.

Their estimated effect is still much larger than the 0% effect found here. While it is not possible in this paper to determine precisely why their estimated marriage effects differ so much from those found here, the differences may be due to the restrictive nature of the fixed effect estimator they use. In particular, Korenman and Neumark's fixed effect/first difference model assumes that there are no unobservable variables influencing both the change in marital status and changes in wages. This assumption rules out, for example, divorces resulting from temporary declines in wages as well as marriages that occur because one leaves school and starts to earn higher wages. Such assumptions seem quite unrealistic, and they might account for the larger marriage effects Korenman and Neumark find after "controlling" for the endogeneity of marriage through fixed effects models. In this instance, cross sectional data with careful controls for endogeneity may yield better estimates than longitudinal data with simple fixed effect estimators.


VIII. Conclusions

These Monte Carlo results indicate that discrete factor approximations can provide reliable estimates in simple models with both continuous and discrete endogenous variables. The computational simplicity of the discrete factor approximations and their ability to yield interpretable estimates in a wide variety of circumstances suggest that these estimators may be important tools for empirical researchers. They appear to work as well in the simultaneous equation framework considered here as they do in hazard model analyses (see Heckman and Singer(1984) or Mroz and Weir(1990)). The approach appears to allow researchers to relax arbitrary distributional assumptions while retaining much of the efficiency of maximum likelihood estimators.

The Monte Carlo results we present, however, are fairly limited. We examine only one econometric model, relatively high error correlations, models with only one or two explanatory variables, and homoscedastic disturbances. The utility of the approach in a wider range of settings needs to be explored further.

Other Monte Carlo studies have examined the performance of the discrete factor estimators. Cameron and Taber(1994), for example, present Monte Carlo evidence on the performance of the discrete factor approach in a different context: endogenous selection on whether one observes a discrete outcome. Their focus is on the ability of the discrete factor models to control for selection biases in longitudinal data, rather than on the ability of these methods to control for endogeneity. There are only exogenous variables in their model. Their conclusions about the performance of this approach are much stronger than ours. Mroz and Guilkey (1992) examined selection models and continuous endogenous determinants of discrete events. While their Monte Carlo experiments were not as detailed as those presented here, they also found the discrete factor models to have small bias and mean square error.

We do little to address the question of whether one should control for endogeneity when it is not a problem. Although there is no evidence that any of the estimators we consider have appreciable biases, the efficiency losses from controlling for endogeneity when it is not present can be enormous. At sample size 2,000 with E(d)=0.50 and normal disturbances, for example, the OLS estimate has an MSE of only 0.01. If one controls for endogeneity whenever a 10% test rejects exogeneity, the MSE increases by about a factor of 10 for all the approaches we considered. Using a 25% test yields MSEs about 15 times larger than that obtained from OLS. Because the biases are negligible, these MSEs translate directly into standard errors. Correctly imposing independence (OLS) would yield a standard error of the estimated effect of 0.10. This would rise to 0.32 with a 10% test and to 0.39 with a 25% test. Needlessly controlling for endogeneity clearly carries a substantial cost in the precision of the estimated effect.
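The step from MSE to standard errors in the previous paragraph is a square root. Since none of the estimators shows appreciable bias here, the root MSE approximates the sampling standard deviation. The short Python check below uses the base MSE of 0.01 and the approximate inflation factors of 10 and 15 quoted above. `root_mse` is a hypothetical helper name.

```python
import math


def root_mse(base_mse: float, inflation_factor: float) -> float:
    """Root MSE after the base MSE is multiplied by an inflation factor."""
    return math.sqrt(base_mse * inflation_factor)


BASE_MSE = 0.01  # OLS, n = 2,000, E(d) = 0.50, normal disturbances
for label, factor in [("OLS, independence imposed", 1),
                      ("pretest at 10%", 10),
                      ("pretest at 25%", 15)]:
    print(f"{label:26s} root MSE = {root_mse(BASE_MSE, factor):.2f}")
```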

If, however, one does suspect that endogeneity is an important issue in models with continuous and discrete variables, then the results from this paper are quite useful. If one only cares about the bias of the estimates, two stage procedures appear to perform the best. An unbiased estimator, however, can easily yield estimates further away from the truth than an estimator with appreciable bias, and this can happen more frequently than not. If a researcher is confident of the true error distribution, nothing surpasses maximum likelihood. The cost of imposing incorrect assumptions to address endogeneity issues, however, can be greater than the cost of ignoring the endogeneity problems. If one desires to retain much of the efficiency of maximum likelihood estimators while guarding against biases caused by imposing incorrect distributional assumptions, then one should seriously consider using discrete factor approximations.





(Click here for tables 1 through 9 or here for the appendix tables)

Appendix 1

The Data Generating Process for the Disturbances, Parameters, and Explanatory Variables


Disturbances

The disturbances in this Monte Carlo study are generated by

where u1 and u2 are independent draws from a standard normal distribution, and v is a mean 0 variance 1 random variable that is independent of both u1 and u2. The common factor in the disturbances, v, usually comes from a mixture of three normal random variables. Its distribution function is given by

The component means and scale parameters are adjusted such that E(v) = 0 and Var(v) = 1. Note that placing all of the mixture weight on a single component will generate a standard normal distribution function.

The skewed factor distribution used in this study is obtained by setting

Figure 2 contains plots of the empirical distribution function of the common factor for this choice of parameters, along with a comparison to a standard normal density. For the uniform factors, we use a mean zero, variance one continuous uniform distribution to generate the common factor.
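In this appendix, the common factor v is either a three-component normal mixture or a uniform distribution. For the mixture, the component means and scales are adjusted so that E(v) = 0 and Var(v) = 1. For the uniform, the distribution has mean zero and variance one. The Python sketch below shows one way to impose that normalization. The mixture weights, means, and standard deviations in it are hypothetical placeholders, because the parameter values used for the paper's skewed factor are not reproduced in this text. The function names are also illustrative.

```python
import math
import random


def standardize_mixture(weights, means, sds):
    """Shift and scale a normal mixture so it has mean 0 and variance 1.

    Weights are assumed to sum to one; returns new (means, sds).
    """
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, sds))
    sd = math.sqrt(second - mean * mean)
    return [(m - mean) / sd for m in means], [s / sd for s in sds]


def draw_mixture(rng, weights, means, sds):
    """Pick a component with probability equal to its weight, then draw from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])


def draw_uniform_factor(rng):
    """Mean-0, variance-1 continuous uniform: U(-sqrt(3), sqrt(3))."""
    half_width = math.sqrt(3.0)
    return rng.uniform(-half_width, half_width)


# Hypothetical right-skewed mixture (not the paper's actual parameter values).
w = [0.7, 0.2, 0.1]
mu, sd = standardize_mixture(w, [0.0, 1.5, 4.0], [1.0, 1.0, 1.5])
rng = random.Random(0)
draws = [draw_mixture(rng, w, mu, sd) for _ in range(100_000)]
```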

The parameters,1, 2, 1, and 2 are chosen such that

R may vary from experiment to experiment, and 2 = 1. There is an identification problem in specifying 1 and 2 from a restriction only on the covariance between the disturbances. We arbitrarily solve this by setting

in all specifications.


Parameters

Let d and y2 in equation (1) be the observed outcomes in the model examined in this study. Here the continuous outcome depends upon an endogenous dummy shift. The quasi R2 for the y2 equation in (1) is defined as

A choice of this quasi R2 value, along with the restrictions 1 = 0 and 2 = 1, determines the value of 1. We use numerical simulations to help solve for the value of 1. The intercept 0 is chosen by numerical simulation such that a pre-specified proportion of y1's > 0 (i.e., d's =1) is obtained. 0 is always arbitrarily set to zero. In all situations we estimate parameters corresponding to 0, 1, 0, 1, and 2.
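The intercept calibration described above can be done by bisection, because the share of d = 1 is monotone in the intercept. The Python sketch below assumes a latent index y1 = b0 + b1·x + e, using placeholder standard normal draws for both x and e. In the actual design, e would be the composite disturbance built from u1 and v. The names `share_positive` and `calibrate_intercept` are hypothetical. Holding the simulated draws fixed across bisection steps (common random numbers) makes the simulated share a deterministic step function of b0, so the search converges.

```python
import random


def share_positive(b0, b1, x, e):
    """Fraction of simulated latent outcomes y1 = b0 + b1*x + e that exceed 0."""
    return sum(1 for xi, ei in zip(x, e) if b0 + b1 * xi + ei > 0) / len(x)


def calibrate_intercept(target, b1, x, e, lo=-20.0, hi=20.0, tol=1e-8):
    """Bisection for b0 so that the simulated share of d = 1 reaches `target`.

    The share is nondecreasing in b0; the draws x and e stay fixed throughout.
    Returns the upper end of the final bracket, whose share is >= target.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if share_positive(mid, b1, x, e) < target:
            lo = mid
        else:
            hi = mid
    return hi


rng = random.Random(42)
n = 20_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # placeholder regressor
e = [rng.gauss(0.0, 1.0) for _ in range(n)]  # placeholder latent disturbance
b0 = calibrate_intercept(0.25, b1=1.0, x=x, e=e)
print(round(b0, 3), share_positive(b0, 1.0, x, e))
```

With these placeholder draws, x + e is N(0, 2), so b0 should land near √2·Φ⁻¹(0.25) ≈ -0.95.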

Explanatory Variables

Ruud (1984) demonstrates that certain linearity conditions on the expectation of the explanatory variables alone can lead to consistent estimation of some functions of the parameters of interest when one arbitrarily assumes a joint normal distribution for the disturbances in mixed continuous discrete models. Joint normality of the regressors would satisfy these conditions. To rule out such possibilities, we generate the explanatory variables from a compound normal distribution contaminated by a normalized χ2 random variable. We permit the two "exogenous" variables to be correlated, and we impose the restriction that the marginal distributions of these "exogenous" variables are identical. These marginal distributions do not depend upon the degree of correlation between the exogenous variables, provided that the joint distribution function, as defined by the following procedure, exists.

Let X and Z denote the explanatory variables. The data generating process for these two variables comes from the convolution of a normalized χ2 variate and a compound normal random variable:

C1 and C2 are independent standardized χ2 random variables.

W1 and W2 are correlated compound normal random variables, and they are generated according to the following rules.

Let the vectors V1 and V2 be bivariate normal random variables, where

and define the bivariate random variable W as W' = [W1 W2]. With probability q, W = V1, and with probability (1-q), W = V2. The means µ1 and µ2 are normalized such that E(W) = 0, and the variances σ11 and σ22 are chosen such that V(W1) = V(W2) = 1. The covariances σ121 and σ122 are chosen such that corr(v11, v21) = corr(v12, v22) and the correlation of X and Z equals a specified value in the data generating process. Note that it may not be possible to generate negative correlations or small positive correlations between X and Z from this framework.

Throughout this study we use the following parameterization to generate the explanatory variables X and Z


σ11 = 1, µ1 = -2
σ22 = 4
q = .5
K = 2 (i.e., a chi-square with 2 degrees of freedom)
r = .2

The exogenous variables x and z have the same distribution, and they are normalized to have mean zero and variance one. Figure 1 plots the distribution function for the exogenous variables.
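The two building blocks of the explanatory variables can be sketched in Python using the parameter values listed above. The sketch imposes E(W) = 0, which with q = .5 and µ1 = -2 forces µ2 = 2. Two pieces are assumptions of this sketch rather than the paper's stated procedure. First, σ11 and σ22 are treated as component variances, and W is then rescaled to unit variance. Second, the convolution is taken to be a variance-preserving sum with a hypothetical weight `weight_on_chi2`, because the exact combining formula and the role of r are not reproduced in the text. The sketch covers only the common marginal distribution and does not model the correlation between X and Z. Function names are illustrative.

```python
import math
import random

# Values listed above; treating sigma11 and sigma22 as component variances
# that are rescaled afterwards is an assumption of this sketch.
Q, MU1, VAR1, VAR2, K = 0.5, -2.0, 1.0, 4.0, 2
MU2 = -Q * MU1 / (1.0 - Q)  # forces E(W) = q*mu1 + (1-q)*mu2 = 0
VAR_W = Q * (VAR1 + MU1**2) + (1 - Q) * (VAR2 + MU2**2)  # E(W^2), since E(W) = 0


def draw_w(rng):
    """Compound normal: component 1 with probability Q, else component 2; unit variance."""
    if rng.random() < Q:
        w = rng.gauss(MU1, math.sqrt(VAR1))
    else:
        w = rng.gauss(MU2, math.sqrt(VAR2))
    return w / math.sqrt(VAR_W)


def draw_c(rng):
    """Chi-square(K) standardized to mean 0, variance 1: (C - K) / sqrt(2K)."""
    c = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(K))
    return (c - K) / math.sqrt(2 * K)


def draw_x(rng, weight_on_chi2=0.5):
    """Hypothetical convolution: independent sum that keeps unit variance."""
    a = math.sqrt(weight_on_chi2)
    b = math.sqrt(1.0 - weight_on_chi2)
    return b * draw_w(rng) + a * draw_c(rng)


rng = random.Random(1)
xs = [draw_x(rng) for _ in range(100_000)]
```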




Appendix 2


Sample Description for the Impact of Marriage on Wages


Sample Selection Criteria:

Cross Sections 1 & 2 in NLSY (White men); total observations: 2439


Number of observations dropped, in order of deletion

1)  unknown occupation                                       411
2)  in army                                                    2
3)  in school                                                131
4)  not coded as in labor force (ESR)                         91
5)  non-interview or valid skip on
     CPS measure of average hourly wage                       66
6)  unknown education                                          4
7)  valid skip or non-interview on
     local unemployment rate                                  26
8)  valid skip or non-interview on
     urban/rural residence                                     1
9)  missing marital status                                     1
10)  missing job tenure                                       27
11) missing health limitations                                 1

Total deleted                                                761

Remaining sample                                            1678

Those with CPS wage <$2.00/hr or >$200.00/hr                  23
Wage sample                                                 1655
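As a short Python tally confirms, the deletion counts above are internally consistent. The dictionary labels are abbreviations of the criteria listed in the table.

```python
# Deletions in the order listed above.
deletions = {
    "unknown occupation": 411,
    "in army": 2,
    "in school": 131,
    "not coded as in labor force (ESR)": 91,
    "no CPS average hourly wage": 66,
    "unknown education": 4,
    "no local unemployment rate": 26,
    "no urban/rural residence": 1,
    "missing marital status": 1,
    "missing job tenure": 27,
    "missing health limitations": 1,
}
start = 2439
total_deleted = sum(deletions.values())
remaining = start - total_deleted
wage_sample = remaining - 23  # CPS wage < $2.00/hr or > $200.00/hr
print(total_deleted, remaining, wage_sample)  # 761 1678 1655
```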


Summary statistics


Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
currently|
 married |    1678    .5524434   .4973903          0          1  
logwage  |    1678    2.334773   .6977119   -4.60517   7.090077  
"Trimmed"|    1655    2.358087   .4817482    .751416   5.092461  
     age |    1678    28.93862   2.241352         25         33  
(age/10)²|    1678    8.424642   1.306761       6.25      10.89
    educ |    1678    13.14839   2.387361          5         20
(ed/10)² |    1678    1.785763   .6596152        .25          4
 age·educ|    1678    3.807479   .7676286       1.25        6.6

# of children wanted in 1978
kidswant |    1678    2.308105   1.265503          0         12
kidswant |    1678    .0256257   .1580631          0          1 
unknown  |

traditional family roles
tradfamr |    1678    .5262217   .4994608          0          1

expect to marry within 5 years in 1978
msoon    |    1678    .3533969   .4781671          0          1
msoon    |    1678    .0995232   .2994525          0          1
unknown  |

age expect to marry [ <20, 20-24, 25-29, 30+, unknown; never excl. ]
  emlt20 |    1678    .0268176   .1615983          0          1  
  em2024 |    1678    .4338498   .4957526          0          1  
  em2529 |    1678    .3730632   .4837629          0          1  
   em30p |    1678     .079261   .2702263          0          1  
     emu |    1678    .0637664   .2444092          0          1  



Probit Estimates                                        Number of obs =   1678
                                                        chi2(15)      = 117.30
                                                        Prob > chi2   = 0.0000
Log Likelihood = -1095.2053                             Pseudo R2     = 0.0508

------------------------------------------------------------------------------
   cmarr |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |   .0858466    .387242      0.222   0.825      -.6731338    .8448269
    age2 |  -.2217953   .6637703     -0.334   0.738      -1.522761    1.079171
    educ |   -.173512   .1891488     -0.917   0.359      -.5442369    .1972129
   educ2 |  -.3789734   .3951314     -0.959   0.338      -1.153417      .39547
   ageed |   .9090156   .6057084      1.501   0.133      -.2781511    2.096182
kidswant |  -.0773628   .0274517     -2.818   0.005      -.1311671   -.0235586
  kidswu |  -.2335656   .2077477     -1.124   0.261      -.6407437    .1736125
tradfamr |   .0955492   .0640704      1.491   0.136      -.0300266    .2211249
   msoon |   .2674536   .0892743      2.996   0.003       .0924792    .4424279
  msoonu |   .4587366   .1621475      2.829   0.005       .1409334    .7765399
  emlt20 |   .5557342   .2965235      1.874   0.061      -.0254411     1.13691
  em2024 |   .5333174   .2241575      2.379   0.017       .0939768    .9726581
  em2529 |    .352462   .2186334      1.612   0.107      -.0760515    .7809755
   em30p |   .2352825   .2364911      0.995   0.320      -.2282315    .6987965
     emu |   .4374516   .2932701      1.492   0.136      -.1373472     1.01225
   _cons |  -1.408929   5.856403     -0.241   0.810      -12.88727    10.06941
------------------------------------------------------------------------------

. test  

> kidswant kidswu tradfamr 
> msoon msoonu 
> emlt20 em2024 em2529 em30p emu
>  ==0
           chi2( 10) =   53.26
         Prob > chi2 =    0.0000



OLS wage regression imposing exclusion restrictions.

  Source |       SS       df       MS                  Number of obs =    1655
---------+------------------------------               F(  6,  1648) =   53.91
   Model |  62.9816611     6  10.4969435               Prob > F      =  0.0000
Residual |  320.880823  1648  .194709238               R-squared     =  0.1641
---------+------------------------------               Adj R-squared =  0.1610
   Total |  383.862485  1654  .232081309               Root MSE      =  .44126

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   cmarr |   .1060622   .0222558      4.766   0.000       .0624095    .1497148
     age |   .0153748   .1319505      0.117   0.907      -.2434336    .2741831
    age2 |  -.0248268   .2236989     -0.111   0.912      -.4635909    .4139373
    educ |   .0502236   .0643461      0.781   0.435      -.0759852    .1764324
   educ2 |  -.1066927   .1339753     -0.796   0.426      -.3694723     .156087
   ageed |   .1831362   .2025048      0.904   0.366      -.2140576    .5803301
   _cons |   .8966725   2.009013      0.446   0.655      -3.043815     4.83716
------------------------------------------------------------------------------



Second stage OLS regression using predicted marital status from probit.

  Source |       SS       df       MS                  Number of obs =    1655
---------+------------------------------               F(  6,  1648) =   49.46
   Model |  58.5727897     6  9.76213161               Prob > F      =  0.0000
Residual |  325.289695  1648  .197384524               R-squared     =  0.1526
---------+------------------------------               Adj R-squared =  0.1495
   Total |  383.862485  1654  .232081309               Root MSE      =  .44428

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   pmarr |   .0323493   .1253236      0.258   0.796       -.213461    .2781597
     age |   .0210875   .1330871      0.158   0.874      -.2399501    .2821251
    age2 |  -.0330511   .2255605     -0.147   0.884      -.4754665    .4093644
    educ |    .047695   .0651635      0.732   0.464      -.0801169    .1755069
   educ2 |  -.1179224   .1356532     -0.869   0.385      -.3839931    .1481483
   ageed |   .1988506   .2063959      0.963   0.335      -.2059752    .6036763
   _cons |   .8352219   2.023691      0.413   0.680      -3.134055    4.804499
------------------------------------------------------------------------------



OLS wage regression without imposing exclusion restrictions.

  Source |       SS       df       MS                  Number of obs =    1655
---------+------------------------------               F( 16,  1638) =   21.29
   Model |  66.0916726    16  4.13072954               Prob > F      =  0.0000
Residual |  317.770812  1638  .193999275               R-squared     =  0.1722
---------+------------------------------               Adj R-squared =  0.1641
   Total |  383.862485  1654  .232081309               Root MSE      =  .44045

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   cmarr |   .1083769   .0225856      4.798   0.000       .0640772    .1526765
     age |  -.0363344   .1330175     -0.273   0.785      -.2972366    .2245678
    age2 |   .0920415   .2269451      0.406   0.685      -.3530916    .5371745
    educ |   .0624841   .0650886      0.960   0.337      -.0651815    .1901497
   educ2 |  -.0555548   .1350153     -0.411   0.681      -.3203757     .209266
   ageed |     .08221   .2068036      0.398   0.691      -.3234173    .4878373
kidswant |   .0108928   .0094552      1.152   0.249      -.0076527    .0294383
  kidswu |  -.1123383   .0724686     -1.550   0.121      -.2544792    .0298026
tradfamr |  -.0227265   .0221737     -1.025   0.306      -.0662183    .0207652
   msoon |   .0131649   .0311337      0.423   0.672      -.0479012     .074231
  msoonu |  -.0422679   .0551466     -0.766   0.444      -.1504332    .0658973
  emlt20 |   .0335269   .1032856      0.325   0.746      -.1690588    .2361127
  em2024 |    .055538   .0791391      0.702   0.483      -.0996865    .2107624
  em2529 |   .0551726   .0773436      0.713   0.476      -.0965302    .2068754
   em30p |   .0364417   .0835504      0.436   0.663      -.1274352    .2003187
     emu |   -.018971   .1010414     -0.188   0.851       -.217155     .179213
   _cons |    1.48087   2.019545      0.733   0.463      -2.480292    5.442032
------------------------------------------------------------------------------



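Second stage OLS regression using predicted marital status from probit, without imposing exclusion restrictions.

  Source |       SS       df       MS                  Number of obs =    1655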
---------+------------------------------               F( 16,  1638) =   19.58
   Model |  61.6318245    16  3.85198903               Prob > F      =  0.0000
Residual |   322.23066  1638  .196722015               R-squared     =  0.1606
---------+------------------------------               Adj R-squared =  0.1524
   Total |  383.862485  1654  .232081309               Root MSE      =  .44353

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   pmarr |   .6288386   3.309861      0.190   0.849      -5.863167    7.120845
     age |  -.0576732   .1946963     -0.296   0.767      -.4395531    .3242068
    age2 |    .144464    .408141      0.354   0.723      -.6560692    .9449972
    educ |   .0989321   .2312275      0.428   0.669      -.3546006    .5524648
   educ2 |   .0159662   .4907274      0.033   0.974       -.946553    .9784854
   ageed |  -.1017288   1.168164     -0.087   0.931      -2.392981    2.189524
kidswant |   .0260079   .0970085      0.268   0.789      -.1642659    .2162816
  kidswu |  -.0680534   .2974283     -0.229   0.819      -.6514331    .5153263
tradfamr |   -.041243   .1213657     -0.340   0.734      -.2792914    .1968054
   msoon |  -.0412366   .3469587     -0.119   0.905      -.7217659    .6392928
  msoonu |  -.1331389   .5786455     -0.230   0.818      -1.268102    1.001824
  emlt20 |  -.0759555   .7084654     -0.107   0.915      -1.465549    1.313638
  em2024 |  -.0488251   .6755105     -0.072   0.942       -1.37378     1.27613
  em2529 |  -.0138478   .4518125     -0.031   0.976      -.9000388    .8723432
   em30p |  -.0095669   .3101193     -0.031   0.975       -.617839    .5987053
     emu |  -.1012738   .5398936     -0.188   0.851      -1.160228    .9576807
   _cons |    1.54769   2.113395      0.732   0.464       -2.59755    5.692931
------------------------------------------------------------------------------



Six Point of Support Model with Exclusion Restrictions Imposed


    log likelihood: -2050.23781543049972

  name       estimate          std err          t-stat       
Marriage "Probit"
 consprob    -.810056E+00      .594568E+01     -.136243E+00  
 age          .126258E+00      .393788E+00      .320624E+00  
 age2        -.302344E+00      .675563E+00     -.447543E+00  
 educ        -.184793E+00      .192056E+00     -.962183E+00  
 educ2       -.402649E+00      .400869E+00     -.100444E+01  
 ageed        .975862E+00      .615062E+00      .158661E+01  
 kidswant    -.813971E-01      .278529E-01     -.292239E+01  
 kidswu      -.195354E+00      .211818E+00     -.922274E+00  
 tradfamr     .106713E+00      .648320E-01      .164599E+01  
 msoon        .263024E+00      .901635E-01      .291719E+01  
 msoonu       .467607E+00      .164004E+00      .285119E+01  
 emlt20       .557630E+00      .298319E+00      .186924E+01  
 em2024       .527880E+00      .225944E+00      .233633E+01  
 em2529       .348290E+00      .220420E+00      .158012E+01  
 em30p        .232671E+00      .238403E+00      .975957E+00  
 emu          .455103E+00      .296246E+00      .153623E+01  
Log(wage) "Regression"
 conslwag     .220381E+01      .186459E+01      .118193E+01  
 cmarr        .452250E-02      .423772E-01      .106720E+00  
 age          .882706E-01      .123841E+00      .712771E+00  
 age2        -.159894E+00      .211250E+00     -.756898E+00  
 educ         .294886E-01      .569697E-01      .517620E+00  
 educ2       -.129426E+00      .118802E+00     -.108943E+01  
 ageed        .277169E+00      .180929E+00      .153193E+01  
 sigma        .259782E+00      .147016E-01      .176704E+02  
 probrho     -.186763E+01      .738784E+00     -.252798E+01  
 contrho     -.369695E+01      .238730E+00     -.154859E+02  
 prcof2      -.754086E-01      .264781E-01     -.284796E+01  
 prcof3       .147483E+00      .608543E-01      .242354E+01  
 prcof4      -.745873E+00      .132947E+00     -.561031E+01  
 prcof5       .414619E+01      .199546E+00      .207782E+02  
 prcof6       .107719E+01      .114423E+00      .941413E+01  
 supcof2     -.102733E+01      .358057E+00     -.286916E+01  
 supcof3     -.162662E+00      .154333E+00     -.105397E+01  
 supcof4      .368555E+00      .126970E+00      .290270E+01  
 supcof5      .113419E+01      .166642E+00      .680611E+01  

 k:  1  support:    .0000000  pweight:    .0012067   
 k:  2  support:    .2636029  pweight:    .0046096   
 k:  3  support:    .4594240  pweight:    .1127383   
 k:  4  support:    .5911099  pweight:    .6523969   
 k:  5  support:    .7562690  pweight:    .2234337   
 k:  6  support:   1.0000000  pweight:    .0056148   

   hetero mean: 0.613238773661688996
    hetero var: 0.941486038428253000E-02   hetero sd: 0.970302034640891098E-01
     correlation matrix 
          discrete:      1.000000         .102627
     continuous:          .102627        1.000000
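The "hetero mean", "hetero var", and "hetero sd" lines appear to be the mean, variance, and standard deviation of the discrete factor distribution. These moments are implied by the printed support points and weights. The support is normalized to the interval [0, 1], with the end points fixed at 0 and 1 in both printouts. The Python check below recomputes the moments from the printed values for the model with exclusion restrictions imposed. This reading of the output is confirmed only in the sense that the numbers agree to the printed precision.

```python
import math

# Printed support points and weights, six point model with exclusion restrictions imposed.
support = [0.0000000, 0.2636029, 0.4594240, 0.5911099, 0.7562690, 1.0000000]
pweight = [0.0012067, 0.0046096, 0.1127383, 0.6523969, 0.2234337, 0.0056148]

# Moments of the discrete factor distribution.
mean = sum(p * s for p, s in zip(pweight, support))
var = sum(p * s * s for p, s in zip(pweight, support)) - mean ** 2
sd = math.sqrt(var)
print(f"hetero mean {mean:.6f}  hetero var {var:.6e}  hetero sd {sd:.6f}")
```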



Six Point of Support Model without Exclusion Restrictions Imposed
        log likelihood: -2041.19809206370337
   name       estimate          std err          t-stat      
Marriage "Probit"
 consprob    -.790552E+00      .595756E+01     -.132697E+00 
 age          .122184E+00      .394641E+00      .309608E+00 
 age2        -.285726E+00      .676734E+00     -.422212E+00 
 educ        -.178199E+00      .192267E+00     -.926827E+00 
 educ2       -.402010E+00      .401546E+00     -.100116E+01 
 ageed        .949956E+00      .615241E+00      .154404E+01 
 kidswant    -.797748E-01      .280079E-01     -.284830E+01 
 kidswu      -.221593E+00      .213289E+00     -.103893E+01 
 tradfamr     .106811E+00      .652185E-01      .163774E+01 
 msoon        .261440E+00      .907911E-01      .287957E+01 
 msoonu       .450052E+00      .164902E+00      .272922E+01 
 emlt20       .592101E+00      .300381E+00      .197117E+01 
 em2024       .564634E+00      .227769E+00      .247898E+01 
 em2529       .382769E+00      .222107E+00      .172336E+01 
 em30p        .252865E+00      .239859E+00      .105422E+01 
 emu          .475527E+00      .298238E+00      .159446E+01 
Log(wage) "Regression"
 conslwag     .237777E+01      .188449E+01      .126176E+01 
 cmarr       -.296655E-02      .433844E-01     -.683783E-01 
 age          .544600E-01      .125640E+00      .433460E+00 
 age2        -.724368E-01      .215648E+00     -.335902E+00 
 educ         .452740E-01      .565562E-01      .800514E+00 
 educ2       -.105410E+00      .118385E+00     -.890402E+00 
 ageed        .190389E+00      .181903E+00      .104665E+01 
 kidswant     .629455E-02      .904924E-02      .695590E+00 
 kidswu      -.971542E-01      .688796E-01     -.141049E+01 
 tradfamr    -.177277E-02      .201495E-01     -.879810E-01 
 msoon       -.786345E-03      .289598E-01     -.271530E-01 
 msoonu      -.554660E-01      .507629E-01     -.109265E+01 
 emlt20       .112273E+00      .932978E-01      .120339E+01 
 em2024       .122687E+00      .729740E-01      .168125E+01 
 em2529       .115605E+00      .712517E-01      .162249E+01 
 em30p        .708609E-01      .758613E-01      .934084E+00 
 emu          .557587E-01      .906693E-01      .614968E+00 
 sigma        .258756E+00      .135135E-01      .191479E+02 
 probrho     -.196630E+01      .728849E+00     -.269782E+01 
 contrho     -.362461E+01      .251086E+00     -.144357E+02 
 prcof2      -.815992E-01      .298748E-01     -.273137E+01 
 prcof3       .140578E+00      .503797E-01      .279037E+01 
 prcof4       .720175E+00      .125231E+00      .575077E+01 
 prcof5       .255168E+01      .146856E+00      .173754E+02 
 prcof6       .119825E+01      .692427E-01      .173050E+02 
 supcof2     -.117876E+01      .397184E+00     -.296779E+01 
 supcof3     -.224523E+00      .161624E+00     -.138917E+01 
 supcof4      .356524E+00      .132969E+00      .268127E+01 
 supcof5      .114378E+01      .175268E+00      .652591E+01 
    k:  1  support:    .0000000  pweight:    .0012120  
    k:  2  support:    .2352758  pweight:    .0035934  
    k:  3  support:    .4441040  pweight:    .0904507  
    k:  4  support:    .5881988  pweight:    .6669649  
    k:  5  support:    .7580317  pweight:    .2316757  
    k:  6  support:   1.0000000  pweight:    .0061032  
 
   hetero mean: 0.615043682890612731
    hetero var: 0.974158473377843837E-02   hetero sd: 0.986994667350256649E-01
   correlation matrix 
       discrete:      1.000000         .109598
     continuous:       .109598        1.000000



References
Amemiya, T., 1985, Advanced Econometrics, Cambridge: Harvard University Press.
Anderson, T. and H. Rubin, 1956, "Statistical Inference in Factor Analysis," in Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, ed., Berkeley: University of California, Vol. V, pp. 111-150.
Arabmazar, A. and P. Schmidt, 1981, "Further Evidence on the Robustness of the Tobit Estimator to Heteroskedasticity," Journal of Econometrics, Vol. 17, pp. 253-58.
Arabmazar, A. and P. Schmidt, 1982, "An Investigation of the Robustness of the Tobit Estimator to Non-Normality," Econometrica, Vol. 50, pp. 1055-63.
Blau, D., 1994, "Labor Force Dynamics of Older Men," Econometrica, Vol. 62, No. 1, pp. 117-56.
Cameron, S. and C. Taber, 1994, "Evaluation and Identification of Semiparametric Maximum Likelihood Models of Dynamic Discrete Choice," Mimeo, University of Chicago, November.
Cosslett, S.J., 1983, "Distribution-free Maximum Likelihood Estimator of the Binary Choice Model," Econometrica, Vol. 51, pp. 765-82.
David, P.A. and T.A. Mroz, 1989a, "Evidence of Fertility Regulation Among Rural French Villagers, 1749-1789: A Sequential Econometric Model of Birth-Spacing Behavior (Part 1)," European Journal of Population, Vol. 5, No. 1, pp. 1-26.
David, P.A. and T.A. Mroz, 1989b, "Evidence of Fertility Regulation Among Rural French Villagers, 1749-1789: A Sequential Econometric Model of Birth-Spacing Behavior (Part 2)," European Journal of Population, Vol. 5, No. 2, pp. 173-206.
Eastwood, B.J. and A.R. Gallant, 1991, "Adaptive Rules for Seminonparametric Estimators that Achieve Asymptotic Normality," Econometric Theory, Vol. 7, No. 3, pp. 307-40.
Everitt, B.S. and D.J. Hand, 1981, Finite Mixture Distributions, London: Chapman and Hall.
Follmann, D. and D. Lambert, 1989, "Generalizing Logistic Regression by Nonparametric Mixing," Journal of the American Statistical Association, Vol. 84, pp. 295-300.
Geweke, J., 1991, "Efficient Simulation from the Multivariate Normal and Student-t Distributions Subject to Linear Constraints," forthcoming, Computing Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface.
Goldberger, A., 1983, "Abnormal Selection Bias," in S. Karlin, T. Amemiya, and L. Goodman, eds., Studies in Econometrics, Time Series and Multivariate Statistics, New York: Academic Press.
Gritz, R.M., 1993, "The Impact of Training on the Frequency and Duration of Employment," Journal of Econometrics, Vol. 57, pp. 21-51.
Heckman, J., 1978, "Dummy Endogenous Variables in a Simultaneous Equation System," Econometrica, Vol. 46, pp. 931-960.
Heckman, J., 1981, "Statistical Models for Discrete Panel Data," in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, Cambridge: The MIT Press.

Heckman, J. and T. MaCurdy, 1981, "New Methods for Estimating Labor Supply Functions: A Survey," in R. Ehrenberg, ed., Research in Labor Economics, Vol. 4, London: JAI Press.

Heckman, J. and T. MaCurdy, 1986, "Labor Econometrics," in Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, Vol. 3, New York: North-Holland, pp. 1917-1977.

Heckman, J. and B. Singer, 1984, "A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data," Econometrica, Vol. 52, pp. 271-320.

Heckman, J. and J. Walker, 1990, "The Relationship between Wage and Income and the Timing and Spacing of Births: Evidence from Swedish Longitudinal Data," Econometrica, Vol. 58, pp. 1411-1441.

Heckman, J. and R. Willis, 1977, "A Beta Logistic Model for Analysis of Sequential Labor Force Participation by Married Women," Journal of Political Economy, Vol. 85, pp. 27-58.

Hurd, M., 1979, "Estimation in Truncated Samples When There is Heteroskedasticity," Journal of Econometrics, Vol. 11, pp. 247-58.

Keane, M., R. Moffitt, and D. Runkle, 1988, "Real Wages over the Business Cycle: Estimating the Impact of Heterogeneity with Micro Data," Journal of Political Economy, Vol. 96, No. 6, pp. 1232-1266.

Killingsworth, M., 1983, Labor Supply, Cambridge: Cambridge University Press.

Killingsworth, M. and J. Heckman, 1986, "Female Labor Supply: A Survey," in O. Ashenfelter and R. Layard, eds., Handbook of Labor Economics, Vol. 1, New York: North-Holland, pp. 3-204.

Kochar, A., 1991, An Empirical Investigation of Rationing Constraints in Rural Credit Markets in India, Ph.D. dissertation, Department of Economics, University of Chicago.

Korenman, S. and D. Neumark, 1991, "Does Marriage Really Make Men More Productive?" Journal of Human Resources, Vol. 26, No. 2, pp. 282-307.

Lee, L-F, 1982, "Some Approaches to the Correction of Selectivity Bias," Review of Economic Studies, Vol. 49, pp. 355-72.

Lee, L-F, 1983, "Generalized Econometric Models with Selectivity," Econometrica, Vol. 51, pp. 507-12.

Maddala, G., 1983, Limited-Dependent and Qualitative Variables in Econometrics, Cambridge: Cambridge University Press.

McFadden, D., 1989, "A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration," Econometrica, Vol. 57, pp. 995-1026.

Mroz, T., 1987, "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica, Vol. 55, pp. 765-799.

Mroz, T. and D. Guilkey, 1992, "Discrete Factor Approximations for Use in Simultaneous Equation Models with Both Continuous and Discrete Endogenous Variables," mimeo, Department of Economics, University of North Carolina, Chapel Hill.

Mroz, T. and D. Weir, 1990, "Structural Change in Life Cycle Fertility During the Fertility Transition: France Before and After the Revolution of 1789," Population Studies, Vol. 44, pp. 61-87.

Mroz, T. and D. Weir, 1994, "Random Parameters and Approximations to Stochastic Dynamic Optimization Models with an Application to Age at Marriage and Life Cycle Fertility Control in France Under the Ancien Regime," Mimeo, UNC, Chapel Hill.

Narendranathan, W. and P. Elias, 1993, "Influences of Past History on the Incidence of Youth Unemployment: Empirical Findings for the UK," Oxford Bulletin of Economics and Statistics, Vol. 55, No. 2, pp. 161-85.

Newey, W., J. Powell, and J. Walker, 1990, "Semiparametric Estimation of Selection Models: Some Empirical Results," American Economic Review, Vol. 80, No. 2, pp. 324-28.

Paarsch, H., 1984, "A Monte Carlo Comparison of Estimators for Censored Regression Models," Journal of Econometrics, Vol. 24, pp. 197-213.

Park, B.U. and J.S. Marron, 1990, "Comparison of Data-Driven Bandwidth Selectors," Journal of the American Statistical Association, Vol. 85, pp. 66-72.

Powell, J., 1987, "Semiparametric Estimation of Bivariate Latent Variable Models," Working Paper No. 8704, SSRI, University of Wisconsin-Madison, July.

Rivers, D. and Q. Vuong, 1988, "Limited Information Estimation and Exogeneity Tests for Simultaneous Probit Models," Journal of Econometrics, Vol. 39, pp. 347-66.

Robinson, P.M., 1988, "Root-N-Consistent Semiparametric Regression," Econometrica, Vol. 56, pp. 931-54.

Ruud, P., 1986, "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Journal of Econometrics, Vol. 32, pp. 157-187.

White, H., 1982, "Maximum Likelihood Estimation of Misspecified Models," Econometrica, Vol. 50, pp. 1-26.
