STA511: Final Examination

Question 1

A linear regression model contains a constant and two other regressors. The sample consists of $S (< \infty)$ observations, so after de-meaning the model to eliminate the intercept, the model in vector form is given by

$y = x_2 \beta _2 + x _3 \beta _3 + \epsilon, \,\,\,\, \text{(True model)}$

where the regressors $x_2$, $x_3$ are $S \times 1$ vectors, and $\beta_2, \beta_3$ are the (scalar) unknown parameters. The variables $y, x_2, x_3$ are thus in mean-deviation form. The $S \times 1$ error vector $\epsilon$ satisfies the assumptions $E(\epsilon | x_2, x_3) = 0$ and $E(\epsilon \epsilon' | x_2, x_3) = \sigma^2 I_S$ with $\sigma^2 < \infty$. The sample correlation coefficient between $x_2$ and $x_3$ equals 0.57. A researcher knows these true properties of the model but does not have the $x_3$ series, and is therefore forced to work with the model:

$y = x_2 \beta _2 + u, \,\,\,\, \text{(Run model)}$

Explain carefully what the properties of the error term $u$ in the Run model are. Hence, determine whether or not Ordinary Least Squares (OLS) applied to the Run model will provide unbiased estimates of the unknown $\beta_2$. If they are biased, derive the explicit formula for the bias that the OLS estimator will exhibit. You should explicitly compare the bias and inconsistency properties of OLS when only the single regressor $x_2$ is used with those when both regressors $x_2$ and $x_3$ of the full model are used. (6 marks)
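A small Monte Carlo sketch can make the comparison concrete. It holds $x_2$ and $x_3$ fixed with sample correlation exactly 0.57 (conditioning on the regressors, as the assumptions do), redraws $\epsilon$ many times, and averages the OLS estimate of $\beta_2$ from the Run model and from the True model. The parameter values ($\beta_2 = 1$, $\beta_3 = 0.5$, $\sigma = 1$, $S = 200$) and all function names are illustrative assumptions, not part of the question. The Run-model average is compared with the standard omitted-variable expression $\beta_2 + \beta_3 (x_2'x_3)/(x_2'x_2)$.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def make_regressors(S, rho, rng):
    """Mean-deviation regressors x2, x3 with x'x = S and sample correlation exactly rho."""
    a = demean([rng.gauss(0, 1) for _ in range(S)])
    b = demean([rng.gauss(0, 1) for _ in range(S)])
    c = dot(a, b) / dot(a, a)
    b = [bi - c * ai for ai, bi in zip(a, b)]          # make b orthogonal to a
    sa, sb = math.sqrt(S / dot(a, a)), math.sqrt(S / dot(b, b))
    a = [ai * sa for ai in a]
    b = [bi * sb for bi in b]
    x3 = [rho * ai + math.sqrt(1 - rho ** 2) * bi for ai, bi in zip(a, b)]
    return a, x3


def short_ols(y, x2):
    """Run model: OLS of y on x2 alone."""
    return dot(x2, y) / dot(x2, x2)


def long_ols_b2(y, x2, x3):
    """True model: OLS coefficient on x2 from the 2x2 normal equations."""
    a11, a12, a22 = dot(x2, x2), dot(x2, x3), dot(x3, x3)
    c1, c2 = dot(x2, y), dot(x3, y)
    return (a22 * c1 - a12 * c2) / (a11 * a22 - a12 ** 2)


def monte_carlo(beta2=1.0, beta3=0.5, rho=0.57, sigma=1.0, S=200, reps=1000, seed=1):
    rng = random.Random(seed)
    x2, x3 = make_regressors(S, rho, rng)   # held fixed: results are conditional on x2, x3
    short, long_ = 0.0, 0.0
    for _ in range(reps):
        y = [beta2 * a + beta3 * b + rng.gauss(0, sigma) for a, b in zip(x2, x3)]
        short += short_ols(y, x2) / reps
        long_ += long_ols_b2(y, x2, x3) / reps
    # omitted-variable expression: E(b2 | x2, x3) = beta2 + beta3 * (x2'x3) / (x2'x2)
    predicted = beta2 + beta3 * dot(x2, x3) / dot(x2, x2)
    return short, long_, predicted


short_mean, long_mean, predicted = monte_carlo()
print(f"Run model OLS mean:  {short_mean:.3f}  (formula: {predicted:.3f})")
print(f"True model OLS mean: {long_mean:.3f}  (beta2 = 1.0)")
```

Because $x_2'x_2 = S$ and $x_2'x_3 = 0.57S$ by construction, the formula evaluates to $1 + 0.5 \times 0.57 = 1.285$.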
It is suggested to the researcher that the problem may be overcome by Instrumental Variables (IV) estimation, which introduces an instrument $x_4$ and defines the IV estimator $\hat{\beta}_2^{IV} = (x_4' x_2)^{-1} x_4' y$.

What are the properties of this estimator compared to the OLS estimator from the Run model? What are the requirements for $x_4$ to be a "good" instrument, in the sense of making the IV estimator perform well? Are these conditions likely to be satisfied? (6 marks)
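The following sketch is one way to see what exogeneity and relevance of $x_4$ buy. It assumes a hypothetical data-generating process in which $x_2 = x_4 + 0.8x_3 + \text{noise}$ (so $x_4$ is relevant) and $x_4$ is independent of $x_3$ and $\epsilon$ (so it is uncorrelated with $u = x_3\beta_3 + \epsilon$), with $\beta_2 = 1$, $\beta_3 = 0.5$ and a large $S$ to approximate probability limits. A second, invalid instrument $x_4 + 0.5x_3$ is included for contrast. Under these assumptions the Run-model OLS estimate has probability limit $1 + 0.5 \times 0.8/2.64 \approx 1.15$, the invalid IV estimate about $1.65/1.4 \approx 1.18$, and the valid IV estimate $\beta_2 = 1$.

```python
import random


def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def iv_estimate(y, x2, x4):
    """Scalar IV estimator (x4'x2)^{-1} x4'y."""
    return dot(x4, y) / dot(x4, x2)


def simulate_iv(beta2=1.0, beta3=0.5, S=20000, seed=2):
    rng = random.Random(seed)
    x4 = [rng.gauss(0, 1) for _ in range(S)]                        # hypothetical valid instrument
    x3 = [rng.gauss(0, 1) for _ in range(S)]                        # regressor the researcher lacks
    x2 = [a + 0.8 * b + rng.gauss(0, 1) for a, b in zip(x4, x3)]   # relevance: x2 depends on x4
    y = [beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
    x2, x3, x4, y = (demean(v) for v in (x2, x3, x4, y))            # mean-deviation form
    x4_bad = [a + 0.5 * b for a, b in zip(x4, x3)]                  # correlated with x3, hence with u
    return {
        "ols_run": dot(x2, y) / dot(x2, x2),
        "iv_valid": iv_estimate(y, x2, x4),
        "iv_invalid": iv_estimate(y, x2, x4_bad),
    }


print(simulate_iv())
```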

Another suggestion is to obtain an imperfect measure of the unavailable regressor $x_3$, in the form $z_3 \equiv x_3 + v$, where the measurement error $v$ is independent of $x_2$, $x_3$ and $\epsilon$, with finite variance $\sigma_v^2$. One then applies OLS to the regression of $y$ on $x_2$ and $z_3$. What are the bias and inconsistency properties of the resulting estimator of $\beta_2$, compared to those of the OLS estimator from the Run model and of the IV estimator? Which estimator would you recommend in this setting? (6 marks)
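Assuming (beyond the question) that $x_2$ and $x_3$ are normalised to unit variance with correlation $\rho$, solving the population normal equations for the regression of $y$ on $x_2$ and $z_3$ gives the probability limit $\beta_2 + \beta_3 \rho \sigma_v^2 / (1 + \sigma_v^2 - \rho^2)$. It equals $\beta_2$ when $\sigma_v^2 = 0$ and tends to the Run-model limit $\beta_2 + \beta_3\rho$ as $\sigma_v^2 \to \infty$. The sketch below checks this by simulation with illustrative values $\beta_2 = 1$, $\beta_3 = 0.5$, $\rho = 0.57$, $\sigma_v^2 = 1$; it speaks to inconsistency, not to the finite-$S$ bias.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def proxy_plim_b2(beta2, beta3, rho, var_v):
    """Large-sample limit of b2 from y on (x2, z3), unit-variance x2, x3 with correlation rho."""
    return beta2 + beta3 * rho * var_v / (1 + var_v - rho ** 2)


def simulate_proxy(beta2=1.0, beta3=0.5, rho=0.57, var_v=1.0, S=20000, seed=3):
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(S)]
    x3 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x2]
    z3 = [b + rng.gauss(0, math.sqrt(var_v)) for b in x3]          # z3 = x3 + v
    y = [beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
    x2, z3, y = demean(x2), demean(z3), demean(y)
    b_short = dot(x2, y) / dot(x2, x2)                               # Run model: omit x3
    a11, a12, a22 = dot(x2, x2), dot(x2, z3), dot(z3, z3)
    c1, c2 = dot(x2, y), dot(z3, y)
    b_proxy = (a22 * c1 - a12 * c2) / (a11 * a22 - a12 ** 2)        # y on (x2, z3)
    return b_short, b_proxy


b_short, b_proxy = simulate_proxy()
print(f"Run model: {b_short:.3f}  proxy: {b_proxy:.3f}  "
      f"proxy limit: {proxy_plim_b2(1.0, 0.5, 0.57, 1.0):.3f}")
```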
Suppose a team member claims that a deep neural network can remove all of the issues discussed above. Bearing in mind that any form of neural network is designed to capture non-linear movements of the target variable, provide your rebuttal. (4 marks) Does your rebuttal change if $S \rightarrow \infty$ (so-called "Big" data)? (3 marks)
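A flexible learner fitted to $x_2$ alone can at best recover $E(y | x_2)$. Under the extra assumption that $x_2$ and $x_3$ are jointly Gaussian with unit variances and correlation $\rho$, $E(y | x_2) = (\beta_2 + \beta_3\rho)x_2$, so flexibility does not separate $\beta_2$ from the omitted $x_3$ channel. The sketch uses binned means, a simple nonparametric regression, as a stand-in for a neural network (it is not one), with illustrative values $\beta_2 = 1$, $\beta_3 = 0.5$, $\rho = 0.57$. Increasing $S$ only sharpens the estimate of $\beta_2 + \beta_3\rho = 1.285$, not of $\beta_2 = 1$.

```python
import math
import random


def binned_slope(x, y, n_bins=20):
    """Average y within equal-count bins of x, then fit a line through the bin means."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    mx, my = [], []
    for k in range(n_bins):
        chunk = pairs[k * size:(k + 1) * size]
        mx.append(sum(p[0] for p in chunk) / size)
        my.append(sum(p[1] for p in chunk) / size)
    xbar, ybar = sum(mx) / n_bins, sum(my) / n_bins
    num = sum((a - xbar) * (b - ybar) for a, b in zip(mx, my))
    return num / sum((a - xbar) ** 2 for a in mx)


def flexible_fit_slope(S, beta2=1.0, beta3=0.5, rho=0.57, seed=4):
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(S)]
    x3 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x2]
    y = [beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
    return binned_slope(x2, y)    # only x2 is available to the learner


for S in (1000, 50000):
    print(S, round(flexible_fit_slope(S), 3))   # target: beta2 + beta3 * rho = 1.285, not 1.0
```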

Question 2

Consider the linear regression model $y = X\beta + \epsilon$, where $y$ and $\epsilon$ are $T \times 1$ vectors and $X$ is a $T \times k$ matrix, with $T < \infty$. The $t$th observation of this model satisfies $y_t = x_t'\beta + \epsilon_t$. It is believed that the error term follows a weakly stationary MA(5) process defined by:

$\epsilon_t = v_t + \gamma_1 v_{t-1} + \dots + \gamma_5 v_{t-5}$

where the Gauss-Markov error $v_t$ is, conditionally on $X$, independently and identically distributed as $v_t | X \sim N(0, \sigma^2)$. Consider the null hypothesis $H_0: \gamma_1 = \dots = \gamma_5 = 0$ against the alternative $H_1: \gamma_j \neq 0$ for at least one $j \in \{1, \dots, 5\}$.
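For reference when constructing the tests, the sketch below simulates the MA(5) error, assuming the usual form that includes the contemporaneous shock ($\epsilon_t = v_t + \gamma_1 v_{t-1} + \dots + \gamma_5 v_{t-5}$), and compares its sample autocorrelations with the theoretical ones, $\rho_k = \sum_j \theta_j \theta_{j+k} / \sum_j \theta_j^2$ with $\theta_0 = 1$ and $\theta_j = \gamma_j$. The $\gamma$ values and $T$ are illustrative. The theoretical autocorrelations are exactly zero beyond lag 5.

```python
import random


def simulate_ma5(gammas, T, sigma=1.0, seed=5):
    """eps_t = v_t + gamma_1 v_{t-1} + ... + gamma_5 v_{t-5}, v_t iid N(0, sigma^2)."""
    rng = random.Random(seed)
    q = len(gammas)
    v = [rng.gauss(0, sigma) for _ in range(T + q)]
    return [v[t] + sum(g * v[t - j] for j, g in enumerate(gammas, start=1))
            for t in range(q, T + q)]


def sample_acf(x, max_lag):
    T = len(x)
    m = sum(x) / T
    d = [xi - m for xi in x]
    c0 = sum(di * di for di in d)
    return [sum(d[t] * d[t - k] for t in range(k, T)) / c0 for k in range(1, max_lag + 1)]


def ma_acf(gammas, max_lag):
    """Theoretical autocorrelations of the MA(q) process above (zero beyond lag q)."""
    theta = [1.0] + list(gammas)
    c0 = sum(th * th for th in theta)
    return [sum(theta[j] * theta[j + k] for j in range(len(theta) - k)) / c0
            if k < len(theta) else 0.0
            for k in range(1, max_lag + 1)]


gammas = (0.5, 0.4, 0.3, 0.2, 0.1)
eps = simulate_ma5(gammas, T=20000)
for k, (r, th) in enumerate(zip(sample_acf(eps, 8), ma_acf(gammas, 8)), start=1):
    print(f"lag {k}: sample {r:+.3f}   theory {th:+.3f}")
```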

1. Define two test procedures for testing $H_0$ against $H_1$, one based on the Wald principle and one based on the Likelihood Ratio (LR) principle. Compare the two procedures in terms of computational difficulty as well as statistical properties. Can you identify any difficulties with the Lagrange Multiplier (LM) test? (5 marks)
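The Wald and LR procedures both require the unrestricted MA(5) model to be estimated, which means nonlinear maximum likelihood; an LM test needs only the restricted OLS fit. As a point of comparison, here is a minimal pure-Python sketch of the LM test in its Breusch-Godfrey $T R^2$ form: regress the OLS residuals on $X$ and five lagged residuals, and compare $T R^2$ with the $\chi^2_5$ critical value (11.07 at 5%). The same statistic serves against AR(5) and MA(5) alternatives. The design ($X$ = constant plus one Gaussian regressor, $\beta = (1, 2)'$, $T = 500$), the $\gamma$ values and the function names are assumptions for illustration.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ols_residuals(Z, y):
    """Residuals from OLS of y on the columns of Z (Z is a list of rows)."""
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yt for z, yt in zip(Z, y)) for i in range(k)]
    b = solve(ZtZ, Zty)
    return [yt - sum(bi * zi for bi, zi in zip(b, z)) for z, yt in zip(Z, y)]


def lm_serial_correlation(X, y, p=5):
    """Breusch-Godfrey LM statistic T*R^2; X must include a constant column.
    Pre-sample lagged residuals are set to zero."""
    T = len(y)
    e = ols_residuals(X, y)                        # restricted (H0) model: plain OLS
    Z = [X[t] + [e[t - j] if t >= j else 0.0 for j in range(1, p + 1)] for t in range(T)]
    u = ols_residuals(Z, e)                        # auxiliary regression
    ebar = sum(e) / T
    r2 = 1.0 - sum(ut * ut for ut in u) / sum((et - ebar) ** 2 for et in e)
    return T * r2                                  # approximately chi-squared(p) under H0


def simulate(gammas, T=500, seed=6):
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(T + 5)]
    X = [[1.0, rng.gauss(0, 1)] for _ in range(T)]
    eps = [v[t + 5] + sum(g * v[t + 5 - j] for j, g in enumerate(gammas, start=1))
           for t in range(T)]
    y = [1.0 + 2.0 * x[1] + e for x, e in zip(X, eps)]
    return X, y


CRIT_5PCT = 11.07    # 95% quantile of chi-squared with 5 degrees of freedom
for gammas in [(0, 0, 0, 0, 0), (0.5, 0.4, 0.3, 0.2, 0.1)]:
    stat = lm_serial_correlation(*simulate(gammas))
    print(gammas, round(stat, 2), "reject H0" if stat > CRIT_5PCT else "do not reject H0")
```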

2. How would your answer to (1) change if you are now told that $x_{2t} = y_{t-2}$? (3 marks)

3. Given $x_{2t} = y_{t-2}$, further investigation of your data reveals that the error follows the AR(1) process

$\epsilon_t = \delta_1 \epsilon_{t-1} + v_t$

where $v_t | X \sim \text{i.i.d. } N(0, \sigma^2)$. Can you justify this structural change in the autocorrelation? If so, how does your reasoning in (1) and (2) help build your argument? (4 marks)
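To see how the autocorrelation pattern differs from the MA(5) case, the sketch below simulates the AR(1) error with an illustrative $\delta_1 = 0.6$ (any $|\delta_1| < 1$ would do) and compares the sample autocorrelations with the theoretical values $\delta_1^k$. These decay geometrically but never cut off, whereas MA(5) autocorrelations are exactly zero beyond lag 5.

```python
import random


def simulate_ar1(delta, T, sigma=1.0, burn=500, seed=7):
    """eps_t = delta * eps_{t-1} + v_t, v_t iid N(0, sigma^2); burn-in draws discarded."""
    rng = random.Random(seed)
    eps, out = 0.0, []
    for t in range(T + burn):
        eps = delta * eps + rng.gauss(0, sigma)
        if t >= burn:
            out.append(eps)
    return out


def sample_acf(x, max_lag):
    T = len(x)
    m = sum(x) / T
    d = [xi - m for xi in x]
    c0 = sum(di * di for di in d)
    return [sum(d[t] * d[t - k] for t in range(k, T)) / c0 for k in range(1, max_lag + 1)]


delta = 0.6
acf = sample_acf(simulate_ar1(delta, T=20000), 8)
for k, r in enumerate(acf, start=1):
    print(f"lag {k}: sample {r:+.3f}   theory {delta ** k:+.3f}")
```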

4. Given $x_{2t} = y_{t-2}$ and the AR(1) error, how is your approach affected? (4 marks)
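With $x_{2t} = y_{t-2}$ and an AR(1) error, $\epsilon_t = \delta_1^2 \epsilon_{t-2} + \delta_1 v_{t-1} + v_t$, so $\epsilon_t$ is correlated with $y_{t-2}$ (which contains $\epsilon_{t-2}$) whenever $\delta_1 \neq 0$, and OLS is inconsistent. The sketch assumes a hypothetical model $y_t = 1 + 0.5y_{t-2} + \epsilon_t$ with no other regressors. For $\delta_1 = 0$ OLS centres on 0.5; for $\delta_1 = 0.5$ the probability limit is the lag-2 autocorrelation of the implied AR(3) process $y_t = \text{const} + \delta_1 y_{t-1} + 0.5 y_{t-2} - 0.5\delta_1 y_{t-3} + v_t$, which the Yule-Walker equations give as $2/3$.

```python
import random


def simulate_y(delta, beta1=1.0, beta2=0.5, T=50000, burn=500, seed=9):
    """y_t = beta1 + beta2 * y_{t-2} + eps_t with AR(1) error eps_t = delta * eps_{t-1} + v_t."""
    rng = random.Random(seed)
    y, eps = [0.0, 0.0], 0.0
    for _ in range(T + burn):
        eps = delta * eps + rng.gauss(0, 1)
        y.append(beta1 + beta2 * y[-2] + eps)    # y[-2] is y_{t-2} at this point
    return y[2 + burn:]


def ols_slope(x, y):
    """Slope from OLS of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def lagged_dep_ols(delta):
    y = simulate_y(delta)
    return ols_slope(y[:-2], y[2:])               # regress y_t on (1, y_{t-2})


for delta in (0.0, 0.5):
    print(delta, round(lagged_dep_ols(delta), 3))
```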

5. Would your answers above change if you believed $v_t | X$ were i.i.d. Laplace (double-exponential) with p.d.f. given by $f(v_t) = \frac{1}{2\sigma} \exp\left(-\frac{|v_t|}{\sigma}\right)$?

If so, how would they change? (3 marks) Discuss the value of a neural network model in this case. (3 marks) Is your logic affected if $T \rightarrow \infty$? (3 marks)
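Assuming the standard scale form of the Laplace density, $f(v_t) = \exp(-|v_t|/\sigma)/(2\sigma)$, the log-likelihood is $-T\log(2\sigma) - \sum_t |y_t - x_t'\beta|/\sigma$, so maximum likelihood for $\beta$ minimises the sum of absolute residuals (LAD) rather than squared residuals. The sketch checks numerically that this density integrates to one and, in the simplest case where $X$ is a constant column (OLS = sample mean, LAD = sample median), estimates the variance ratio of the two estimators, which is 2 asymptotically. The values of $T$, $\sigma$ and the replication count are illustrative.

```python
import math
import random
import statistics


def laplace_pdf(v, sigma):
    """Laplace density in its scale form: exp(-|v| / sigma) / (2 * sigma)."""
    return math.exp(-abs(v) / sigma) / (2.0 * sigma)


def trapezoid(f, lo, hi, n=100000):
    """Composite trapezoidal rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + sum(f(lo + i * h) for i in range(1, n)) + 0.5 * f(hi))


def mean_vs_median_variance(T=201, reps=1500, sigma=1.0, seed=8):
    """Monte Carlo ratio Var(sample mean) / Var(sample median) under Laplace errors.
    With X a constant column, the mean is OLS and the median is LAD (the Laplace MLE)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        # difference of two independent exponentials with mean sigma is Laplace(0, sigma)
        v = [rng.expovariate(1.0 / sigma) - rng.expovariate(1.0 / sigma) for _ in range(T)]
        means.append(sum(v) / T)
        medians.append(statistics.median(v))
    return statistics.pvariance(means) / statistics.pvariance(medians)


area = trapezoid(lambda v: laplace_pdf(v, 1.5), -60.0, 60.0)
ratio = mean_vs_median_variance()
print(f"integral of density: {area:.6f}; Var(OLS)/Var(LAD): {ratio:.2f} (asymptotically 2)")
```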