mattblackwell / gov2002-book


An Introduction to Statistical Inference and Regression

Home Page: https://mattblackwell.github.io/gov2002-book/

TeX 99.91% CSS 0.09%

gov2002-book's Issues

Mathematical inaccuracy in second last sentence of second paragraph in Chapter 5

The second-to-last sentence is:

If any variable in $\mathbf{X}_i$ is continuous, we must estimate an infinite number of possible values of $\mathbf{x}$.

This should instead be:

If any variable in $\mathbf{X}_i$ is continuous, we must estimate an infinite number of possible values of $\mu(\mathbf{x})$.

since the thing of interest (that we want to estimate) is $\mu(\mathbf{x})$ and not the values $\mathbf{x}$.

Mistake in first equation below Section 5.2.2 in Chapter 5

The first (left) display equation after Section 5.2.2 is:

$m(X_i,X_i^2)=\beta_0+\beta_1 X_i\beta_2 X_i^2$.

This should be replaced with:

$m(X_i,X_i^2)=\beta_0+\beta_1 X_i+\beta_2 X_i^2$.

(added a missing plus sign)

Typo in second paragraph first sentence of Section 6.6

The first sentence of the second paragraph of Section 6.6 starts with:

Let $\mathcal{C}(\mathbb{X})=\{ \mathbb{X}\mathbf{b}\colon \mathbb{b}\in \mathbb{R}^2\}$ be the column space...

This should be:

Let $\mathcal{C}(\mathbb{X})=\{ \mathbb{X}\mathbf{b}\colon \mathbb{b}\in \mathbb{R}^{k+1}\}$ be the column space...

(replaced $2$ with $k+1$ in the dimension of the Euclidean space since, outside of examples, it is assumed that $\mathbb{X}$ is an $n\times(k+1)$ matrix)

Typo in paragraph before "Chi-squared critical values" box

In the paragraph before the "Chi-squared critical values" box, there is a sentence that starts with:

After recentering ad rescaling by the covariance matrix, ...

This should be

After recentering and rescaling by the covariance matrix, ...

(replaced "ad" with "and")

Typo in Theorem 7.3 statement

The middle sentence in Theorem 7.3 is:

and its conditional sampling variance issue

This should be:

and its conditional sampling variance is

(changed "issue" to "is")

Error in third point of "Residual regression approach" box

The third point (3.) in the "Residual regression approach" box is:

Use OLS to regression $\tilde{\mathbf{e}}_2$ on $\tilde{\mathbb{X}}_1$.

This should be:

Use OLS to regress $\tilde{\mathbf{e}}_2$ on $\tilde{\mathbb{X}}_1$.

(replaced "regression" with "regress")

Typo after the $F=\frac{W}{q}$ display equation in Chapter 7

There is a sentence in Chapter 7 which starts with:

which also typically uses the the homoskedastic variance estimator ...

This should be replaced with:

which also typically uses the homoskedastic variance estimator ...

(removed an extra "the")

Mistake/typo in second displayed equation of Example 3.3

The second equation of Example 3.3 is:

$\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \sum_{i=1}^n X_i^2 + n\bar{X}_n$

but should be

$\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \sum_{i=1}^n X_i^2 + n\bar{X}_n^2$

which is consistent with the later derivation.
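
For reference, the expansion behind this identity can be written out step by step. This is a sketch of the standard algebra, not a quote from the book. Note that it yields a minus sign in front of $n\bar{X}_n^2$, so the sign in the displayed equation may be worth checking against the source as well.

$$
\begin{aligned}
\sum_{i=1}^n (X_i - \bar{X}_n)^2
&= \sum_{i=1}^n X_i^2 - 2\bar{X}_n \sum_{i=1}^n X_i + n\bar{X}_n^2 \\
&= \sum_{i=1}^n X_i^2 - 2n\bar{X}_n^2 + n\bar{X}_n^2 \\
&= \sum_{i=1}^n X_i^2 - n\bar{X}_n^2 .
\end{aligned}
$$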

Grammatical typo in third last paragraph before Section 5.4 of Chapter 5

The first sentence of the third last paragraph before Section 5.4 is:

Thus, we can write the CEF with two binary covariates as linear when the linear specification includes and multiplicative interaction between them $(x_1,x_2)$.

This should instead be:

Thus, we can write the CEF with two binary covariates as linear when the linear specification includes a multiplicative interaction between them $(x_1,x_2)$.

(replaced "and" with "a" after "includes")

Typo in statement of Theorem 3.3 https://mattblackwell.github.io/gov2002-book/03_asymptotics.html

Theorem 3.3 states:

Let $X_1,\dots,X_n$ be a an i.i.d. draws ...

which should probably be replaced by

Let $X_1,\dots,X_n$ be iid draws ...

(removing the extra "a" and keeping the iid shorthand consistent with the rest of the book)

Typo in last line of Example 4.3 in Chapter 4

The last line of Example 4.3 ends with:

... and if we want an asymptotically level of 0.05, we can reject when $|T|>1.96$.

This should instead be:

... and if we want an asymptotic level of 0.05, we can reject when $|T|>1.96$.

(replace "asymptotically" with "asymptotic")

Possible grammatical error right before Section 2.4.2 https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

Before Section 2.4.2, there is the sentence:

Maximum likelihood estimators have very nice properties, especially in large samples. Unfortunately, it also requires the correct knowledge of the parametric model, which is often difficult to justify.

The second sentence should probably refer to MLEs in the plural, like so:

Maximum likelihood estimators have very nice properties, especially in large samples. Unfortunately, they also require the correct knowledge of the parametric model, which is often difficult to justify.

Typo in third sentence of Chapter 5

The third sentence of Chapter 5 is:

For example, we may want to know how wait voting poll wait times vary as a function of some socioeconomic features of the precinct, like income and racial composition.

This should instead be:

For example, we may want to know how voting poll wait times vary as a function of some socioeconomic features of the precinct, like income and racial composition.

(removed an extra "wait" before "voting poll")

Slight grammatical error at https://mattblackwell.github.io/gov2002-book/01_intro.html

In point #2 on this page, the sentence:

So many methods are either use regression estimators like ordinary least squares or extend it in some way.

should be modified to

So many methods either use regression estimators like ordinary least squares or extend it in some way.

i.e. there's an extra "are" term.

Missing $\mathbb{X}$ in second display equation in Section 6.9.1

The first part of the second display equation in Section 6.9.1 is:

$\hat{\mathbf{Y}}=\mathbf{P}\mathbf{Y}$

This should be replaced with:

$\hat{\mathbf{Y}}=\mathbf{P}_{\mathbb{X}}\mathbf{Y}$

to keep the notation of the projection matrix consistent with what is used elsewhere in the chapter

Grammatical typo after Figure 6.6

The sentence after Figure 6.6 is:

One measure of influence is called DFBETA$_i$ measures how much $i$ changes the estimated coefficient vector ...

This should be:

One measure of influence, called DFBETA$_i$, measures how much $i$ changes the estimated coefficient vector ...

Capitalization typo in paragraph before Section 6.9.1

A sentence in the paragraph before Section 6.9.1 starts with:

Thus, by definition, This means that when an observation ...

This should be replaced by:

Thus, by definition, this means that when an observation ...

Grammatical error in Warning box before Section 4.9 of Chapter 4

The second sentence of the second paragraph of the Warning box before Section 4.9 is:

Of course, this doesn’t make sense from our definition because the p-values conditions on the null hypothesis—it cannot tell us anything about the probability of that null hypothesis.

This should instead be:

Of course, this doesn’t make sense from our definition because the p-value conditions on the null hypothesis—it cannot tell us anything about the probability of that null hypothesis.

(replace "p-values" with "p-value")

Since-then grammatical error in Section 7.1

One sentence after the Section 7.1 header reads:

Remember that since $\hat{\boldsymbol{\beta}}$ is a vector, then the variance of that estimator will actually be a variance-covariance matrix.

This should be:

Remember that since $\hat{\boldsymbol{\beta}}$ is a vector, the variance of that estimator will actually be a variance-covariance matrix.

(removed the "then")

Typo in sentence before Section 7.4.1

A sentence right before Section 7.4.1 starts with:

Recall the heteroskedastic-consistent variance estimator is ...

This should be:

Recall the heteroskedastic-consistent variance estimator ...

(removed the "is" because right after the display equation the sentence starts with "is")

Missing Period and Publisher

Near the end of the page:

Wooldridge, Jeffrey. Econometric Analysis of Cross Section and Panel Data

should be replaced by

Wooldridge, Jeffrey. Econometric Analysis of Cross Section and Panel Data. The MIT Press.

for consistency with earlier bullet points?

Typo in second last sentence of first paragraph in Section 6.4

The end of the second last sentence of the first paragraph in Section 6.4 is:

... linear independence means that if $\mathbb{X} \mathbf{b} =0$ if and only if $\mathbf{b}$ is a column vector of 0s.

This should be:

... linear independence means that $\mathbb{X} \mathbf{b} =0$ if and only if $\mathbf{b}$ is a column vector of 0s.

(removed the "if" before the "if and only if" statement)

Mistake in first display equation after Section 5.3 of Chapter 5

The first display equation after Section 5.3 of Chapter 5 is (using the LaTeX shortcuts you defined in the source code):

$$ \mu(\X) = \E[Y_{i} \mid \X_{i}] = \argmin_{g(\X_i) \in L_2}\; \E\left[(Y_{i} - f(\X_{i}))^{2}\right], $$

This should instead be:

$$ \mu(\X) = \E[Y_{i} \mid \X_{i}] = \argmin_{g(\X_i) \in L_2}\; \E\left[(Y_{i} - g(\X_{i}))^{2}\right], $$

(replaced "f" in the argmin expression with "g" to match the "g" the argmin is being taken over)

Typo in Definition 4.4 of Chapter 4

The statement of Definition 4.4 is:

The p-value of a test is the probability of observing a test statistic is at least as extreme as the observed test statistic in the direction of the alternative hypothesis.

This should instead be:

The p-value of a test is the probability of observing a test statistic at least as extreme as the observed test statistic in the direction of the alternative hypothesis.

(the "is" before "at least" should be removed)

Typo in last sentence in paragraph after Section 5.3 of Chapter 5

The last sentence in the paragraph after Section 5.3 of Chapter 5 starts with:

In particular, if we label $L_2$ be the set of all functions of the covariates $g()$ that have finite squared expectation,...

This should instead be:

In particular, if we label $L_2$ to be the set of all functions of the covariates $g()$ that have finite squared expectation,...

(added "to" before "be")

Typos after Definition 4.3 in Chapter 4

The paragraph after Definition 4.3 in Chapter 4 is:

A test with a significance level of $\alpha = 0.05$ will have a false positive/type I error rate no larger than 0.05. This level is widespread in the social sciences, though you also will $\alpha = 0.01$ or $\alpha = 0.1$. Frequentists justify this by saying this means that with $\alpha = 0.05$, there will only be 5% of studies that will produce false discoveries.

This should instead be (with differences in brackets []):

A test with a significance level of $\alpha = 0.05$ will have a false positive/type I error rate no larger than 0.05. This level is widespread in the social sciences, though you also will [see] $\alpha = 0.01$ or $\alpha = 0.1$. Frequentists justify this by saying this means that with $\alpha = 0.05$, there will only be [at most] 5% of studies that will produce false discoveries.

(a missing "see" word and an addition of "at most" to make the last statement more precise)
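
To make the "at most 5%" reading concrete, here is a minimal simulation sketch in Python. It is not from the book: the function name `rejection_rate`, the sample size, and the seed are illustrative assumptions, and it uses a z-test with known variance so that the test statistic is exactly standard normal under the null. Note that the 5% refers to studies in which the null hypothesis is actually true.

```python
import random

def rejection_rate(n_sims=4000, n_obs=50, critical=1.96, seed=2002):
    """Share of simulated studies that reject a true null H0: mu = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Data generated under the null: mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        sample_mean = sum(sample) / n_obs
        # z-statistic with known sigma = 1: T = sqrt(n) * sample mean.
        t_stat = (n_obs ** 0.5) * sample_mean
        if abs(t_stat) > critical:
            rejections += 1
    return rejections / n_sims

rate = rejection_rate()
print(f"Share of true nulls rejected: {rate:.3f}")
```

The printed share should land close to 0.05, up to simulation noise, which is the sense in which the level bounds the false positive rate.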

Typos in first two points of "Linear projection assumptions" box

The first two points (1. and 2.) of the "Linear projection assumptions" box are:

$\{(Y_i,\mathbf{X}_i)\}_{i=1}^n$ are iid random vectors.
$\mathbb{E}[Y_{i^2}]<\infty$ (finite outcome variance)

These should be replaced by:

$\{(Y_i,\mathbf{X}_i)\}_{i=1}^n$ are iid random vectors
$\mathbb{E}[Y_{i}^2]<\infty$ (finite outcome variance)

(removed the period in (1.) and fixed the subscript typo in (2.))

Typos

Here are a few typos I've collected while consulting the textbook (the corrections are in bold):

Here: "Implicit in this analysis [...]"
Here: "Remember that the interpretation of confidence [...]"
Here: "@fig-ci-sim shows 100 iterations of these steps. Here we see that, as expected, the large majority of calculated CIs contain the true value. [...] The guarantee of the 95% confidence intervals [...]"
Here: "we view $\mb{Y}$ as an $n$-dimensional vector in $\mathbb{R}^n$."

Thank you @mattblackwell for writing this helpful resource and making it publicly available for free!

Possibly wrong choice of `size` in Example 2.8 https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

In Example 2.8, the random variable $X_i$ initially introduced has support on $\{0,1,2,3,4\}$, which has size 5. However, later on this is modeled as a Binomial with support $\{0,1,2,3\}$ with the choice `size = 4`, which doesn't match the original rv. It would probably be more appropriate to model using `size = 5` for all the subsequent simulations in the example.
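
A quick way to check which `size` matches the support is to count the distinct values a binomial can take. The sketch below is in Python rather than the book's own code, the helper `simulate_binomial` is hypothetical, and it assumes `size` means the number of trials, as in the usual binomial parameterization. Under that assumption a binomial with 4 trials takes the values 0 through 4, so `size = 4` may already match the support $\{0,1,2,3,4\}$.

```python
import random

def simulate_binomial(n_trials, p, n_draws, seed=0):
    """Draw Binomial(n_trials, p) values as sums of Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n_trials))
        for _ in range(n_draws)
    ]

# Assumption: `size = 4` in the example corresponds to n_trials = 4 here.
draws = simulate_binomial(n_trials=4, p=0.5, n_draws=10_000)
observed_support = sorted(set(draws))
print(observed_support)
print(len(observed_support))
```

If the observed support has five elements, the original choice of `size = 4` would be consistent with a variable on $\{0,1,2,3,4\}$, while `size = 5` would add a sixth possible value.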

Trailing sentence in parenthesis before Jacobian matrix in Section 3.9

In the paragraph preceding the large Jacobian matrix display equation, there is a sentence which ends like:

... and be continuously differentiable (we make the function bold since it ).

The remark in the parenthesis should probably be completed.

Pairwise Independence does not imply Mutual Independence error in https://mattblackwell.github.io/gov2002-book/02_estimation.html

In the first paragraph after Section 2.2, it is said that:

They are independent in that the random vectors $X_i$ and $X_j$ are independent for all $i\neq j$,

However, this is pairwise independence, which is strictly weaker than mutual independence (which is required in the iid assumption). I checked this here:

https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables#Definition_for_more_than_two_random_variables
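
A standard counterexample makes the gap concrete: take two independent fair bits $X$ and $Y$ and set $Z = X \oplus Y$. The Python sketch below (illustrative only, not from the book) enumerates the four equally likely outcomes with exact fractions and checks both the pairwise and the three-way factorization.

```python
from fractions import Fraction
from itertools import product

# Sample space: X and Y are independent fair bits and Z = X XOR Y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
prob = Fraction(1, len(outcomes))  # each (x, y) pair has probability 1/4

def p(event):
    """Probability that event(outcome) holds under the uniform measure."""
    return sum(prob for o in outcomes if event(o))

# Pairwise independence: P(A_i = a, A_j = b) = P(A_i = a) * P(A_j = b).
pairwise = all(
    p(lambda o: o[i] == a and o[j] == b)
    == p(lambda o: o[i] == a) * p(lambda o: o[j] == b)
    for i, j in ((0, 1), (0, 2), (1, 2))
    for a, b in product((0, 1), repeat=2)
)

# Mutual independence would also need the three-way factorization.
mutual = all(
    p(lambda o: o == (a, b, c))
    == p(lambda o: o[0] == a) * p(lambda o: o[1] == b) * p(lambda o: o[2] == c)
    for a, b, c in product((0, 1), repeat=3)
)

print(pairwise, mutual)  # True False
```

Every pair factorizes, but the triple $(0,0,1)$ has probability 0 rather than $1/8$, so the three variables are pairwise but not mutually independent.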

Potential mathematical typo in first Warning box https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

In the first warning box, there is the sentence:

Then the statistic is a random variable that has a distribution over the numbers from {2, , 12} that describes our uncertainty over what the sum will be before we roll the dice.

However, the {2, , 12} seems like an odd choice to format the set, and could possibly be replaced with nicer LaTeX formatting, i.e. {2,\dots,12} (formatted in LaTeX)?

Grammatical error in sentence preceding Jacobian matrix in Section 3.9

The sentence right before the Jacobian matrix is:

It will help us use more compact matrix notation if we introduce a Jacobian matrix of all partial derivatives

This should be replaced with

It will help us to use more compact matrix notation if we introduce a Jacobian matrix of all partial derivatives

(adding a "to")

Typo in second sentence of Chapter 5

The second sentence in Chapter 5 is:

In particular, these tools show how the conditional mean of $Y_i$ varies as a function $\mathbf{X}_i$.

This should be:

In particular, these tools show how the conditional mean of $Y_i$ varies as a function of $\mathbf{X}_i$.

(added an "of" near the end)

Colon inconsistency in Example 2.10 https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

Example 2.10 starts as:

Example 2.10 (Sampling variance of the sample mean:)

but this should probably be replaced with

Example 2.10 (Sampling variance of the sample mean)

to be consistent with the other examples before this

Missing period for sentence preceding Section 2.6.3 https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

The sentence right before Section 2.6.3 is:

Given the above derivation, the standard error of the sample mean under iid sampling is $\sigma/\sqrt{n}$

This is missing a period at the end of the sentence, i.e. should be replaced with:

Given the above derivation, the standard error of the sample mean under iid sampling is $\sigma/\sqrt{n}$.

Punctuation typo in Blue box after Section 5.2.3 of Chapter 5

The paragraph before the numbered statements in the Blue box after Section 5.2.3 is:

Without some assumptions on the joint distribution of the data, The following “regularity conditions” will ensure the existence of the BLP:

This should be replaced by:

Without some assumptions on the joint distribution of the data, the following “regularity conditions” will ensure the existence of the BLP:

(changed "T" to "t" in "the")

Typo in sentence before Figure 3.3

The sentence right before Figure 3.3 reads:

... establishes how the standard 95% confidence interval for the sample mean above asymptotically valid.

This should be:

... establishes how the standard 95% confidence interval for the sample mean above is asymptotically valid.

(adding an "is")

Mistake in first sentence after first display equation in Section 5.4 of Chapter 5

The first sentence after the first display equation in Section 5.4 is:

so that the change in the predicted outcome for increasing $X_{i1}$ by one unit isn’t

Following the logical structure of the argument, this should be:

so that the change in the predicted outcome for increasing $X_{i1}$ by one unit is

(replaced "isn't" with "is")

Typo in Chapter 6

The start of a paragraph in Chapter 6 is:

Why does this rank condition matter for the OLS estimator? A key property of full column rank matrices is that $\mathbb{X}$ if of full column rank if and only if $\mathbb{X}'\mathbb{X}$ is non-singular and a matrix is invertible if and only if it is non-singular.

This should be replaced by:

Why does this rank condition matter for the OLS estimator? A key property of full column rank matrices is that $\mathbb{X}$ is of full column rank if and only if $\mathbb{X}'\mathbb{X}$ is non-singular and a matrix is invertible if and only if it is non-singular.

(replaced wrongly used "if" with "is")
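
The equivalence in the corrected sentence can be illustrated with a toy design matrix. This is a minimal sketch under stated assumptions: the helper `gram_determinant` is hypothetical, it handles only two columns, and it uses the determinant of $\mathbb{X}'\mathbb{X}$ as the check for non-singularity.

```python
def gram_determinant(columns):
    """Determinant of X'X for a design matrix given as two columns."""
    c0, c1 = columns
    g00 = sum(a * a for a in c0)
    g01 = sum(a * b for a, b in zip(c0, c1))
    g11 = sum(b * b for b in c1)
    return g00 * g11 - g01 * g01

intercept = [1, 1, 1, 1]

# Linearly independent columns: X'X has a nonzero determinant.
full_rank = gram_determinant([intercept, [0, 1, 2, 3]])

# Second column is 2 * intercept, so the columns are linearly dependent.
collinear = gram_determinant([intercept, [2, 2, 2, 2]])

print(full_rank, collinear)  # 20 0
```

With the collinear design the determinant is zero, so $\mathbb{X}'\mathbb{X}$ cannot be inverted and the usual OLS formula is not defined.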

Typo in first sentence of Footnote #2 in Chapter 4

The first sentence of Footnote #2 in Chapter 4 reads:

Different people and different textbooks describe what to do when do not reject the null hypothesis in different ways.

This should instead be:

Different people and different textbooks describe what to do when we do not reject the null hypothesis in different ways.

Inconsistent notation for indicator random variable in the book

In the proof of Theorem 3.1, the notation $\mathbb{1}$ is used for the indicator random variable. But in the definition of the empirical cdf right before Section 2.5, the notation $\mathbb{I}$ is used for the indicator random variable. Not sure if this is an issue or typo, but just wanted to make a note of it in case it is of interest.

Bold typo in display equation right before Theorem 7.1

The display equation right before Theorem 7.1 is:

$$ \bhat \inprob \beta + \mb{Q}_{\X\X}^{-1}\E[\X_ie_i] = \beta, $$

This should instead be:

$$ \bhat \inprob \boldsymbol{\beta} + \mb{Q}_{\X\X}^{-1}\E[\X_ie_i] = \boldsymbol{\beta}, $$

since beta is a vector and not a scalar.

Mistake in first display equation after Section 5.2.3 of Chapter 5

The first display equation after Section 5.2.3 is:

$\mathbf{x}'\boldsymbol{\beta}=x_1\beta_1+x_2\beta_2+\cdots+X_k\beta_k$.

This should instead be:

$\mathbf{x}'\boldsymbol{\beta}=x_1\beta_1+x_2\beta_2+\cdots+x_k\beta_k$.

(the last x should be lowercase)

Typo after $n\to \infty$ in Definition 3.1 https://mattblackwell.github.io/gov2002-book/03_asymptotics.html

The word after the $n\to \infty$ expression is "of" when it should probably be "or".

Typo in second last paragraph before Section 6.6

The last sentence of the second last paragraph before Section 6.6 is:

Then coefficient on the West dummy will be

This should be:

Then, the coefficient on the West dummy will be

(added a "the")

Typo in sentence after $T$ display equation after Section 4.11 of Chapter 4

The sentence after the $T$ test statistic display equation after the Section 4.11 heading is:

As we discussed in the earlier, an $\alpha = 0.05$ test would reject this null when $|T|>1.96$, or when...

This should instead be:

As we discussed earlier, an $\alpha = 0.05$ test would reject this null when $|T|>1.96$, or when...

(removed the "in the" before "earlier")

Grammatical typo in footnote at https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

The first sentence of the footnote reads:

This approach to inference is often called a model-based approach since we assuming a probability model in the cdf, $F$.

and should instead be:

This approach to inference is often called a model-based approach since we are assuming a probability model in the cdf, $F$.

(a missing "are" term)

Possible inaccuracy in citation of Theorem 3.5 in Example 3.3

In the last paragraph of Example 3.3, it is said that the bias shrinks as a function of the sample size, so as long as the sampling variance shrinks as a function of the sample size (which it does), the estimator is consistent by Theorem 3.5.

However, Theorem 3.5 only says that an estimator is consistent if it is unbiased and its sampling variance shrinks as a function of the sample size, and no mention was made about the slightly stronger version used in Example 3.3.
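
The stronger version the example seems to rely on follows from the usual bias-variance decomposition together with Markov's inequality applied to the squared error. The following is a sketch of that standard argument for a generic estimator $\hat{\theta}_n$ of $\theta$ (the notation is assumed here, not taken from the book). For any $\varepsilon > 0$,

$$
\mathbb{P}\left(|\hat{\theta}_n - \theta| > \varepsilon\right)
\leq \frac{\mathbb{E}\left[(\hat{\theta}_n - \theta)^2\right]}{\varepsilon^2}
= \frac{\text{bias}[\hat{\theta}_n]^2 + \mathbb{V}[\hat{\theta}_n]}{\varepsilon^2}
\to 0,
$$

so the estimator is consistent whenever both the bias and the sampling variance go to zero as $n \to \infty$. Unbiasedness is the special case where the bias term is identically zero.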
