February 22, 2005
Left-Out / Irrelevant Variables
We compared a model with one regressor (X2) and a model with two regressors (X2 and X3). The issue is whether X3 should be included.

Here, the C's are sample covariances, the V's are sample variances, and the r's are sample correlations.
If X3 is included, the formula for the estimate of b2 differs from the formula used when X3 is not included. With X3 in the regression, the estimate of b2 depends on the covariance between X2 and X3 and on the covariance between X3 and Y, in addition to the covariance between X2 and Y.
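To make the two formulas concrete, here is a minimal sketch that computes b2 both ways from sample variances and covariances and checks the results against least squares. The covariance expressions are the standard textbook ones for one and two regressors with an intercept, and are assumed to match the formulas discussed in class. The function names and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def slope_without_x3(x2, y):
    """b2 when Y is regressed on X2 alone: C(X2,Y) / V(X2)."""
    c2y = np.cov(x2, y)[0, 1]
    v2 = np.var(x2, ddof=1)
    return c2y / v2

def slope_with_x3(x2, x3, y):
    """b2 when X3 is also in the regression (standard two-regressor formula)."""
    cov = np.cov(np.vstack([x2, x3, y]))
    v2, v3 = cov[0, 0], cov[1, 1]
    c23, c2y, c3y = cov[0, 1], cov[0, 2], cov[1, 2]
    return (v3 * c2y - c23 * c3y) / (v2 * v3 - c23 ** 2)

# Simulated data (an assumption for illustration, not data from the class).
rng = np.random.default_rng(0)
n = 200
x3 = rng.normal(size=n)
x2 = 0.6 * x3 + rng.normal(size=n)          # X2 and X3 are correlated
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(size=n)

b2_short = slope_without_x3(x2, y)
b2_long = slope_with_x3(x2, x3, y)

# Cross-check both formulas against least squares with an intercept.
ls_short = np.linalg.lstsq(np.column_stack([np.ones(n), x2]), y, rcond=None)[0]
ls_long = np.linalg.lstsq(np.column_stack([np.ones(n), x2, x3]), y, rcond=None)[0]

print("b2 without X3:", b2_short, "least squares:", ls_short[1])
print("b2 with X3:   ", b2_long, "least squares:", ls_long[1])
```

The two estimates of b2 differ here because the simulated X2 and X3 are correlated and X3 has a nonzero coefficient.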
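Left-Out Variable: X3 is left out when it belongs in the model. The one-regressor formula for b2 is then the wrong formula, and the estimate is biased. The only exception is when the sample covariance between X2 and X3 is zero, in which case the two formulas give the same estimate.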
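The size of the left-out variable effect can be checked numerically. The sketch below relies on a standard in-sample identity: the b2 estimated without X3 equals the b2 estimated with X3 plus the estimated b3 times C(X2,X3)/V(X2). It then builds a version of X3 whose sample covariance with X2 is exactly zero to illustrate the exception. The helper `ols` and the simulated data are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients, intercept first (hypothetical helper)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(1)
n = 500
x3 = rng.normal(size=n)
x2 = 0.8 * x3 + rng.normal(size=n)
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(size=n)

b_short = ols(y, x2)        # [intercept, b2], X3 left out
b_long = ols(y, x2, x3)     # [intercept, b2, b3]

c23 = np.cov(x2, x3)[0, 1]
v2 = np.var(x2, ddof=1)
gap = b_long[2] * c23 / v2  # estimated b3 times C(X2,X3) / V(X2)

print("b2 without X3 minus b2 with X3:", b_short[1] - b_long[1])
print("b3 * C23 / V2:                 ", gap)

# The exception: remove from X3 the part that moves with X2, so that the
# sample covariance between X2 and the new X3 is zero.
x3_orth = x3 - x3.mean() - (c23 / v2) * (x2 - x2.mean())
b_long_orth = ols(y, x2, x3_orth)
print("b2 with orthogonalized X3:", b_long_orth[1], "b2 without X3:", b_short[1])
```

With the orthogonalized X3, including or excluding it leaves the estimate of b2 unchanged, which is the zero-covariance exception.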
Irrelevant Variable: X3 is included when it is not needed, that is, when b3 is zero. The two-regressor formula for b2 does not produce bias, because it is not wrong to estimate b3 when its true value is zero. The formula for the variance of b2 shows, however, that the estimate loses efficiency: the sample correlation between X2 and X3 increases the variance of b2 (unless that sample correlation is zero).
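A small simulation can illustrate both halves of the irrelevant variable result. The sketch below assumes a fixed set of correlated regressors, a true b2 of 2, a true b3 of zero, and normal errors. All of these settings are assumptions made for illustration. It estimates b2 with and without X3 over many simulated samples and compares the means and variances of the two estimates.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 2000

# Fixed design (assumed): X2 and X3 are correlated, but the true b3 is zero.
x3 = rng.normal(size=n)
x2 = 0.8 * x3 + 0.6 * rng.normal(size=n)
r23 = np.corrcoef(x2, x3)[0, 1]

X_short = np.column_stack([np.ones(n), x2])       # X3 excluded
X_long = np.column_stack([np.ones(n), x2, x3])    # irrelevant X3 included

# Each column of Y is one simulated sample: Y = 1 + 2*X2 + 0*X3 + error.
errors = rng.normal(size=(n, reps))
Y = 1.0 + 2.0 * x2[:, None] + errors

# lstsq accepts a matrix of right-hand sides; row 1 holds the b2 estimates.
b2_short = np.linalg.lstsq(X_short, Y, rcond=None)[0][1]
b2_long = np.linalg.lstsq(X_long, Y, rcond=None)[0][1]

print("mean of b2, X3 excluded:", b2_short.mean())
print("mean of b2, X3 included:", b2_long.mean())
print("variance ratio (included / excluded):", b2_long.var() / b2_short.var())
print("1 / (1 - r23^2):", 1.0 / (1.0 - r23 ** 2))
```

Both means should be close to the true value of 2, while the variance is larger when X3 is included. With a fixed design, the theoretical variance ratio is 1 / (1 - r23^2), so the last two printed numbers should be similar.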
Multicollinearity: the degree of linear association among the regressors. In this case, the degree of collinearity is measured by the covariance or correlation between X2 and X3. Both the left-out variable problem and the irrelevant variable problem are more severe when this collinearity is strong.
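To see how both problems scale with collinearity, here is a rough sketch that tabulates the two effects for several values of the correlation r23. It assumes X2 and X3 have equal sample variances, so that C(X2,X3)/V(X2) equals r23, and it uses the standard variance inflation multiplier 1 / (1 - r23^2). The function names and the value b3 = 1.5 are hypothetical.

```python
def left_out_bias(b3, r23):
    """Bias in b2 from omitting X3, assuming V(X2) == V(X3) so C23/V2 == r23."""
    return b3 * r23

def variance_inflation(r23):
    """Factor by which Var(b2) grows when X3 is included: 1 / (1 - r23^2)."""
    if not -1.0 < r23 < 1.0:
        raise ValueError("r23 must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r23 ** 2)

# b3 = 1.5 is an arbitrary illustrative value.
for r23 in (0.0, 0.3, 0.6, 0.9):
    bias = left_out_bias(1.5, r23)
    inflation = variance_inflation(r23)
    print(f"r23={r23:.1f}  left-out bias={bias:.3f}  variance inflation={inflation:.3f}")
```

At r23 = 0 neither problem arises: leaving X3 out causes no bias and putting it in costs no efficiency. Both effects grow as r23 moves toward 1.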
The example we discussed was a regression of income on education, with ability playing the role of X3. If ability is not observed, the X3 term is forced into the error term, and all you can do is reason about the possible effects of the left-out variable bias.
Posted by bparke at February 22, 2005 08:34 PM