STA511: Final Examination
Question 1
A linear regression model contains a constant and two other regressors. The sample consists of $S$ ($< \infty$) observations. After de-meaning the variables to eliminate the intercept, the model in vector form is given by
$$y = x_2 \beta_2 + x_3 \beta_3 + \epsilon, \qquad \textbf{(True model)}$$
where the regressors $x_2, x_3$ are $S \times 1$ vectors and $\beta_2, \beta_3$ are the (scalar) unknown parameters. The variables $y, x_2, x_3$ are thus in mean-deviation form. The $S \times 1$ error vector $\epsilon$ satisfies the assumptions $E(\epsilon \mid x_2, x_3) = 0$ and $E(\epsilon \epsilon' \mid x_2, x_3) = \sigma^2 I_S$ with $\sigma^2 < \infty$, and the sample correlation coefficient between $x_2$ and $x_3$ equals 0.57. A researcher knows these true properties of the model but does not have the $x_3$ series, and is therefore forced to work with the model
$$y = x_2 \beta_2 + u, \qquad \textbf{(Run model)}$$
(1) Explain carefully what the properties of the error term $u$ in the Run model are. Hence, determine whether or not Ordinary Least Squares (OLS) applied to the Run model provides unbiased estimates of the unknown $\beta_2$. If it is biased, derive the explicit formula for the bias of the OLS estimator. Explicitly compare the bias and inconsistency properties of OLS when only the single regressor $x_2$ is used with those obtained when both regressors $x_2$ and $x_3$ of the full (True) model are used. (6 marks)
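A small Monte Carlo sketch can make the omitted-variable problem concrete. It assumes hypothetical values $\beta_2 = 1$, $\beta_3 = 2$, $\sigma = 1$ and $S = 200$, builds de-meaned regressors whose sample correlation is exactly 0.57, and holds them fixed across replications, so the averages approximate expectations conditional on $x_2, x_3$. It then compares the average OLS estimate of $\beta_2$ from the Run model with the one from the True model. The helper names (`make_regressors`, `simulate`) are illustrative only.

```python
import math
import random


def dot(a, b):
    """Inner product a'b of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def demean(v):
    """Return v in mean-deviation form."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def make_regressors(S, r, rng):
    """De-meaned x2, x3 with sample correlation exactly r and x'x = S for both."""
    x2 = demean([rng.gauss(0, 1) for _ in range(S)])
    e = demean([rng.gauss(0, 1) for _ in range(S)])
    c = dot(e, x2) / dot(x2, x2)
    e = [a - c * b for a, b in zip(e, x2)]        # make e orthogonal to x2
    s2 = math.sqrt(S / dot(x2, x2))
    se = math.sqrt(S / dot(e, e))
    x2 = [s2 * a for a in x2]
    e = [se * a for a in e]
    x3 = [r * a + math.sqrt(1 - r * r) * b for a, b in zip(x2, e)]
    return x2, x3


def simulate(S=200, r=0.57, beta2=1.0, beta3=2.0, sigma=1.0, reps=1000, seed=0):
    """Average bias of OLS for beta2 in the Run model and in the True model."""
    rng = random.Random(seed)
    x2, x3 = make_regressors(S, r, rng)           # held fixed: condition on x2, x3
    a, b, d = dot(x2, x2), dot(x2, x3), dot(x3, x3)
    run_total = true_total = 0.0
    for _ in range(reps):
        y = [beta2 * p + beta3 * q + rng.gauss(0, sigma) for p, q in zip(x2, x3)]
        x2y, x3y = dot(x2, y), dot(x3, y)
        run_total += x2y / a                                  # (x2'x2)^{-1} x2'y
        true_total += (d * x2y - b * x3y) / (a * d - b * b)   # beta2 from 2x2 normal equations
    predicted = beta3 * b / a                                 # beta3 (x2'x3)/(x2'x2)
    return run_total / reps - beta2, true_total / reps - beta2, predicted


run_bias, true_bias, predicted = simulate()
print(f"Run model bias  ~ {run_bias:.3f} (formula: {predicted:.3f})")
print(f"True model bias ~ {true_bias:.3f}")
```

Because both regressors are scaled to the same length, $x_2'x_3 / x_2'x_2$ equals the sample correlation 0.57 in this design, so the formula value is $0.57\beta_3$; the two-regressor estimate should average close to $\beta_2$.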
(2) It is suggested to this researcher that the problem may be overcome through Instrumental Variables (IV) estimation, which introduces an instrument $x_4$ and defines the IV estimator $\hat{\beta}_2^{IV} = (x_4' x_2)^{-1} x_4' y$.
What are the properties of this estimator compared with the OLS estimator in (1)? What requirements must $x_4$ satisfy to be a "good" instrument, in the sense of making the IV estimator perform well? Are these conditions likely to be satisfied? (6 marks)
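The IV formula is a ratio of inner products, so it can be sketched directly. The data-generating process below is a hypothetical design, not part of the question: $x_3$ and $x_4$ are independent standard normals, $x_2 = \pi x_4 + 0.7 x_3 + \text{noise}$, $\beta_2 = 1$ and $\beta_3 = 2$. By construction $x_4$ is correlated with $x_2$ (relevance, governed by the assumed parameter `pi`) but uncorrelated with the Run-model error $u = x_3\beta_3 + \epsilon$ (exogeneity). The function name `iv_vs_ols` is illustrative.

```python
import random


def dot(a, b):
    """Inner product a'b of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def demean(v):
    """Return v in mean-deviation form."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def iv_vs_ols(S=20000, beta2=1.0, beta3=2.0, pi=1.0, seed=1):
    """Return (OLS, IV) estimates of beta2 from the Run model in a hypothetical DGP."""
    rng = random.Random(seed)
    x3 = [rng.gauss(0, 1) for _ in range(S)]
    x4 = [rng.gauss(0, 1) for _ in range(S)]                  # instrument, independent of x3
    x2 = [pi * z + 0.7 * w + rng.gauss(0, 1) for z, w in zip(x4, x3)]
    y = [beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
    x2, x4, y = demean(x2), demean(x4), demean(y)              # mean-deviation form
    ols = dot(x2, y) / dot(x2, x2)                             # (x2'x2)^{-1} x2'y
    iv = dot(x4, y) / dot(x4, x2)                              # (x4'x2)^{-1} x4'y
    return ols, iv


ols, iv = iv_vs_ols()
print(f"OLS: {ols:.3f}  IV: {iv:.3f}")
```

Setting `pi` close to zero mimics a weak instrument: the denominator $x_4'x_2$ is then near zero and the IV estimate becomes erratic across seeds.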
(3) Another suggestion is to obtain an imperfect measure of the unavailable regressor $x_3$, in the form $z_3 \equiv x_3 + v$, where the measurement error $v$ is independent of $x_2$, $x_3$ and $\epsilon$, with finite variance $\sigma_v^2$. One then applies OLS to the regression of $y$ on $x_2$ and $z_3$. What are the bias and inconsistency properties of the resulting estimator of $\beta_2$, compared with those obtained in (1) and (2)? Which estimator would you recommend in this setting? (6 marks)
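The proxy approach can be explored with a similar sketch. It assumes a hypothetical large-sample design in which $x_2$ and $x_3$ have unit variance and correlation $r = 0.57$, $\beta_2 = 1$, $\beta_3 = 2$, and $v \sim N(0, \sigma_v^2)$. For this design only, solving the population normal equations of $y$ on $(x_2, z_3)$ gives a bias of $\beta_3 r \sigma_v^2 / (1 - r^2 + \sigma_v^2)$, which the code compares with the simulated value. The helper names are illustrative.

```python
import math
import random


def dot(a, b):
    """Inner product a'b of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def coef_on_x2(x2, z3, y):
    """Coefficient on x2 from OLS of y on (x2, z3); all means are zero, so no intercept."""
    a, b, d = dot(x2, x2), dot(x2, z3), dot(z3, z3)
    return (d * dot(x2, y) - b * dot(z3, y)) / (a * d - b * b)


def proxy_bias(sigma_v, S=30000, r=0.57, beta2=1.0, beta3=2.0, seed=2):
    """Simulated and large-S bias of the beta2 estimate when z3 = x3 + v replaces x3."""
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(S)]
    x3 = [r * a + math.sqrt(1 - r * r) * rng.gauss(0, 1) for a in x2]   # corr(x2, x3) = r
    y = [beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
    z3 = [b + sigma_v * rng.gauss(0, 1) for b in x3]                    # z3 = x3 + v
    simulated = coef_on_x2(x2, z3, y) - beta2
    s2 = sigma_v ** 2
    large_s = beta3 * r * s2 / (1 - r * r + s2)   # valid for this unit-variance design only
    return simulated, large_s


for sv in (0.0, 0.5, 1.0, 3.0):
    sim, lim = proxy_bias(sv)
    print(f"sigma_v={sv}: simulated bias {sim:.3f}, large-S bias {lim:.3f}")
```

In this design the bias is zero when $\sigma_v = 0$ and rises towards the omitted-variable value $\beta_3 r$ as $\sigma_v$ grows.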
(4) Suppose a team member claims that a deep neural network can remove all the issues discussed above. Bearing in mind that any form of neural network is designed to capture non-linear movements of the target variable, provide your rebuttal. (4 marks) Does your rebuttal change if $S \rightarrow \infty$, i.e. so-called "Big" data? (3 marks)
Question 2
Consider the linear regression model $y = X\beta + \epsilon$, where $y$ and $\epsilon$ are $T \times 1$ vectors and $X$ is a $T \times k$ matrix, with $T < \infty$. The $t$th observation of this model satisfies $y_t = x_t'\beta + \epsilon_t$. It is believed that the error term follows a weakly stationary MA(5) process defined by
$$\epsilon_t = v_t + \gamma_1 v_{t-1} + \dots + \gamma_5 v_{t-5},$$
where the Gauss-Markov error $v_t$ is, conditionally on $X$, independently and identically distributed as $v_t \mid X \sim N(0, \sigma^2)$. Consider the null hypothesis $H_0: \gamma_1 = \dots = \gamma_5 = 0$ against the alternative $H_1: \gamma_j \neq 0$ for at least one $j \in \{1, \dots, 5\}$.
1. Define two test procedures for testing $H_0$ against $H_1$, one based on the Wald principle and one based on the Likelihood Ratio principle. Compare the two procedures in terms of computational difficulty as well as statistical properties. Can you identify any difficulties with the Lagrange Multiplier test? (5 marks)
2. How would your answer to (1) change if you are now told that $x_{2t} = y_{t-2}$? (3 marks)
3. Given $x_{2t} = y_{t-2}$, on further investigation of your data you realize that the error instead follows the AR(1) process $\epsilon_t = \delta_1 \epsilon_{t-1} + v_t$,
where $v_t \mid X \sim \text{iid } N(0, \sigma^2)$. Can you justify this structural change in the autocorrelation? If so, how does your reasoning in (1) and (2) help you build your argument? (4 marks)
4. Given $x_{2t} = y_{t-2}$ and the AR(1) error process, how is your approach affected? (4 marks)
5. Would your answers above change if you believed $v_t \mid X$ were i.i.d. Laplace (double-exponential) with p.d.f. $f(v_t) = \frac{1}{2\sigma} \exp\left(-\frac{|v_t|}{\sigma}\right)$?
If so, how would they change? (3 marks) Discuss the value of a neural network model in this case. (3 marks) Is your logic affected if $T \rightarrow \infty$? (3 marks)
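The sub-questions involving $x_{2t} = y_{t-2}$ turn on what happens when a lagged dependent variable meets serially correlated errors. The sketch below assumes a hypothetical stripped-down model with no other regressors, $y_t = \beta_2 y_{t-2} + \epsilon_t$ with $\epsilon_t = \delta_1 \epsilon_{t-1} + v_t$ and Gaussian $v_t$, and computes the OLS estimate of $\beta_2$ on a long simulated series. With $\delta_1 = 0$ the regressor $y_{t-2}$ is uncorrelated with $\epsilon_t$; with $\delta_1 \neq 0$ it is not, because $\epsilon_t$ and $\epsilon_{t-2}$ share the AR(1) memory. The function name `ols_lag2` is illustrative.

```python
import random


def ols_lag2(beta2, delta1, T=100000, burn=1000, seed=4):
    """OLS of y_t on y_{t-2} (no intercept) when eps_t follows an AR(1)."""
    rng = random.Random(seed)
    y = [0.0, 0.0]                                  # start-up values
    eps_prev = 0.0
    for _ in range(T + burn):
        eps = delta1 * eps_prev + rng.gauss(0, 1)   # eps_t = delta1 eps_{t-1} + v_t
        y.append(beta2 * y[-2] + eps)               # y_t = beta2 y_{t-2} + eps_t
        eps_prev = eps
    y = y[2 + burn:]                                # drop start-up values and burn-in
    num = sum(y[t] * y[t - 2] for t in range(2, len(y)))
    den = sum(y[t - 2] ** 2 for t in range(2, len(y)))
    return num / den


for d in (0.0, 0.5):
    print(f"delta1={d}: OLS estimate of beta2 = {ols_lag2(0.5, d):.3f}")
```

For the assumed values $\beta_2 = 0.5$ and $\delta_1 = 0.5$, working through the stationary covariances gives a probability limit of $2/3$ rather than $0.5$, so the estimate does not settle at $\beta_2$ however long the series.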