mattblackwell / gov2002-book

An Introduction to Statistical Inference and Regression

Home Page: https://mattblackwell.github.io/gov2002-book/

TeX 99.91% CSS 0.09%

gov2002-book's People

Contributors

mattblackwell, noahdasanaike, zekiakyol

gov2002-book's Issues

Mathematical inaccuracy in second last sentence of second paragraph in Chapter 5

The sentence in question is:

If any variable in $\mathbf{X}_i$ is continuous, we must estimate an infinite number of possible values of $\mathbf{x}$.

This should instead be:

If any variable in $\mathbf{X}_i$ is continuous, we must estimate an infinite number of possible values of $\mu(\mathbf{x})$.

since the thing of interest (that we want to estimate) is $\mu(\mathbf{x})$ and not the values $\mathbf{x}$.

Typo in second paragraph first sentence of Section 6.6

The first sentence of the second paragraph of Section 6.6 starts with:

Let $\mathcal{C}(\mathbb{X})=\{ \mathbb{X}\mathbf{b}\colon \mathbf{b}\in \mathbb{R}^2\}$ be the column space...

This should be:

Let $\mathcal{C}(\mathbb{X})=\{ \mathbb{X}\mathbf{b}\colon \mathbf{b}\in \mathbb{R}^{k+1}\}$ be the column space...

(replaced $2$ with $k+1$ in the dimension of the Euclidean space since, outside of examples, $\mathbb{X}$ is assumed to be an $n\times(k+1)$ matrix)

Typo in paragraph before "Chi-squared critical values" box

In the paragraph before the "Chi-squared critical values" box, there is a sentence that starts with:

After recentering ad rescaling by the covariance matrix, ...

This should be

After recentering and rescaling by the covariance matrix, ...

(replaced "ad" with "and")

Typo in Theorem 7.3 statement

The middle sentence in Theorem 7.3 is:

and its conditional sampling variance issue

This should be:

and its conditional sampling variance is

(changed "issue" to "is")

Error in third point of "Residual regression approach" box

The third point (3.) in the "Residual regression approach" box is:

Use OLS to regression $\tilde{\mathbf{e}}_2$ on $\tilde{\mathbb{X}}_1$.

This should be:

Use OLS to regress $\tilde{\mathbf{e}}_2$ on $\tilde{\mathbb{X}}_1$.

(replaced "regression" with "regress")

Typo after the $F=\frac{W}{q}$ display equation in Chapter 7

There is a sentence in Chapter 7 which starts with:

which also typically uses the the homoskedastic variance estimator ...

This should be replaced with:

which also typically uses the homoskedastic variance estimator ...

(removed an extra "the")

Mistake/typo in second displayed equation of Example 3.3

The second equation of Example 3.3 is:

$\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \sum_{i=1}^n X_i^2 + n\bar{X}_n$

but should be

$\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \sum_{i=1}^n X_i^2 + n\bar{X}_n^2$

which is consistent with the later derivation.
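
As a quick check on the exponent, the square can be expanded directly, using only the fact that $\sum_{i=1}^n X_i = n\bar{X}_n$. Note that this expansion gives a minus sign on the last term, so the sign in the equations quoted above may be worth checking against the book as well.

```latex
\begin{aligned}
\sum_{i=1}^n (X_i - \bar{X}_n)^2
  &= \sum_{i=1}^n X_i^2 - 2\bar{X}_n \sum_{i=1}^n X_i + n\bar{X}_n^2 \\
  &= \sum_{i=1}^n X_i^2 - 2n\bar{X}_n^2 + n\bar{X}_n^2 \\
  &= \sum_{i=1}^n X_i^2 - n\bar{X}_n^2 .
\end{aligned}
```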

Grammatical typo in third last paragraph before Section 5.4 of Chapter 5

The first sentence of the third last paragraph before Section 5.4 is:

Thus, we can write the CEF with two binary covariates as linear when the linear specification includes and multiplicative interaction between them $(x_1,x_2)$.

This should instead be:

Thus, we can write the CEF with two binary covariates as linear when the linear specification includes a multiplicative interaction between them $(x_1,x_2)$.

(replaced "and" with "a" after "includes")

Typo in last line of Example 4.3 in Chapter 4

The last line of Example 4.3 ends with:

... and if we want an asymptotically level of 0.05, we can reject when $|T|>1.96$.

This should instead be:

... and if we want an asymptotic level of 0.05, we can reject when $|T|>1.96$.

(replace "asymptotically" with "asymptotic")
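
For readers who want to see where 1.96 comes from, here is a minimal sketch using Python's standard library. It assumes the test statistic is approximately standard normal under the null, which is what the "asymptotic" in the corrected sentence refers to; the function names are hypothetical and not taken from the book.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_critical_value(alpha: float) -> float:
    """Return c such that P(|Z| > c) = alpha for a standard normal Z."""
    return std_normal.inv_cdf(1 - alpha / 2)


def two_sided_level(c: float) -> float:
    """Return P(|Z| > c) for a standard normal Z."""
    return 2 * (1 - std_normal.cdf(c))


if __name__ == "__main__":
    print(round(two_sided_critical_value(0.05), 2))  # 1.96
    print(round(two_sided_level(1.96), 3))           # 0.05
```

The level is only approximate in finite samples, since the normal distribution is a large-sample approximation to the distribution of $T$.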

Possible grammatical error right before Section 2.4.2 https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

Before Section 2.4.2, there is the sentence:

Maximum likelihood estimators have very nice properties, especially in large samples. Unfortunately, it also requires the correct knowledge of the parametric model, which is often difficult to justify.

The second sentence should probably refer to MLEs in the plural, like so:

Maximum likelihood estimators have very nice properties, especially in large samples. Unfortunately, they also require the correct knowledge of the parametric model, which is often difficult to justify.

Typo in third sentence of Chapter 5

The third sentence of Chapter 5 is:

For example, we may want to know how wait voting poll wait times vary as a function of some socioeconomic features of the precinct, like income and racial composition.

This should instead be:

For example, we may want to know how voting poll wait times vary as a function of some socioeconomic features of the precinct, like income and racial composition.

(removed an extra "wait" before "voting poll")

Missing $\mathbb{X}$ in second display equation in Section 6.9.1

The first part of the second display equation in Section 6.9.1 is:

$\hat{\mathbf{Y}}=\mathbf{P}\mathbf{Y}$

This should be replaced with:

$\hat{\mathbf{Y}}=\mathbf{P}_{\mathbb{X}}\mathbf{Y}$

to keep the notation of the projection matrix consistent with what is used elsewhere in the chapter

Grammatical typo after Figure 6.6

The sentence after Figure 6.6 is:

One measure of influence is called DFBETA$_i$ measures how much $i$ changes the estimated coefficient vector ...

This should be:

One measure of influence, called DFBETA$_i$, measures how much $i$ changes the estimated coefficient vector ...

Capitalization typo in paragraph before Section 6.9.1

A sentence in the paragraph before Section 6.9.1 starts with:

Thus, by definition, This means that when an observation ...

This should be replaced by:

Thus, by definition, this means that when an observation ...

Grammatical error in Warning box before Section 4.9 of Chapter 4

The second sentence of the second paragraph of the Warning box before Section 4.9 is:

Of course, this doesn’t make sense from our definition because the p-values conditions on the null hypothesis—it cannot tell us anything about the probability of that null hypothesis.

This should instead be:

Of course, this doesn’t make sense from our definition because the p-value conditions on the null hypothesis—it cannot tell us anything about the probability of that null hypothesis.

(replace "p-values" with "p-value")

Since-then grammatical error in Section 7.1

One sentence after the Section 7.1 header reads:

Remember that since $\hat{\boldsymbol{\beta}}$ is a vector, then the variance of that estimator will actually be a variance-covariance matrix.

This should be:

Remember that since $\hat{\boldsymbol{\beta}}$ is a vector, the variance of that estimator will actually be a variance-covariance matrix.

(removed the "then")

Typo in sentence before Section 7.4.1

A sentence right before Section 7.4.1 starts with:

Recall the heteroskedastic-consistent variance estimator is ...

This should be:

Recall the heteroskedastic-consistent variance estimator ...

(removed the "is" because right after the display equation the sentence starts with "is")

Typo in second last sentence of first paragraph in Section 6.4

The end of the second last sentence of the first paragraph in Section 6.4 is:

... linear independence means that if $\mathbb{X} \mathbf{b} =0$ if and only if $\mathbf{b}$ is a column vector of 0s.

This should be:

... linear independence means that $\mathbb{X} \mathbf{b} =0$ if and only if $\mathbf{b}$ is a column vector of 0s.

(removed the "if" before the "if and only if" statement)

Mistake in first display equation after Section 5.3 of Chapter 5

The first display equation after Section 5.3 of Chapter 5 is (using the LaTeX shortcuts you defined in the source code):

$$ \mu(\X) = \E[Y_{i} \mid \X_{i}] = \argmin_{g(\X_i) \in L_2}\; \E\left[(Y_{i} - f(\X_{i}))^{2}\right], $$

This should instead be:

$$ \mu(\X) = \E[Y_{i} \mid \X_{i}] = \argmin_{g(\X_i) \in L_2}\; \E\left[(Y_{i} - g(\X_{i}))^{2}\right], $$

(replaced "f" in the argmin expression with "g" to match the "g" the argmin is being taken over)

Typo in Definition 4.4 of Chapter 4

The statement of Definition 4.4 is:

The p-value of a test is the probability of observing a test statistic is at least as extreme as the observed test statistic in the direction of the alternative hypothesis.

This should instead be:

The p-value of a test is the probability of observing a test statistic at least as extreme as the observed test statistic in the direction of the alternative hypothesis.

(the "is" before "at least" should be removed)
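
To make "at least as extreme ... in the direction of the alternative" concrete, here is a small sketch. It assumes, purely for illustration, that the test statistic is standard normal under the null; the function name `p_value` and its `alternative` argument are hypothetical and not part of the book.

```python
from statistics import NormalDist


def p_value(t_obs: float, alternative: str = "two-sided") -> float:
    """P-value for an observed statistic, assuming T ~ N(0, 1) under the null."""
    z = NormalDist()
    if alternative == "greater":
        # Extreme means large values of T.
        return 1 - z.cdf(t_obs)
    if alternative == "less":
        # Extreme means small values of T.
        return z.cdf(t_obs)
    if alternative == "two-sided":
        # Extreme means large |T| in either direction.
        return 2 * (1 - z.cdf(abs(t_obs)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


if __name__ == "__main__":
    for alt in ("greater", "less", "two-sided"):
        print(alt, round(p_value(1.96, alt), 3))
```

The same observed statistic gives different p-values under different alternatives, which is why the definition has to mention the direction.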

Typo in last sentence in paragraph after Section 5.3 of Chapter 5

The last sentence in the paragraph after Section 5.3 of Chapter 5 starts with:

In particular, if we label $L_2$ be the set of all functions of the covariates $g()$ that have finite squared expectation,...

This should instead be:

In particular, if we label $L_2$ to be the set of all functions of the covariates $g()$ that have finite squared expectation,...

(added "to" before "be")

Typos after Definition 4.3 in Chapter 4

The paragraph after Definition 4.3 in Chapter 4 is:

A test with a significance level of $\alpha = 0.05$ will have a false positive/type I error rate no larger than 0.05. This level is widespread in the social sciences, though you also will $\alpha = 0.01$ or $\alpha = 0.1$. Frequentists justify this by saying this means that with $\alpha = 0.05$, there will only be 5% of studies that will produce false discoveries.

This should instead be (with differences in brackets []):

A test with a significance level of $\alpha = 0.05$ will have a false positive/type I error rate no larger than 0.05. This level is widespread in the social sciences, though you also will [see] $\alpha = 0.01$ or $\alpha = 0.1$. Frequentists justify this by saying this means that with $\alpha = 0.05$, there will only be [at most] 5% of studies that will produce false discoveries.

(a missing "see" word and an addition of "at most" to make the last statement more precise)

Typos in first two points of "Linear projection assumptions" box

The first two points (1. and 2.) of the "Linear projection assumptions" box are:

  1. $\{(Y_i,\mathbf{X}_i)\}_{i=1}^n$ are iid random vectors.
  2. $\mathbb{E}[Y_{i^2}]<\infty$ (finite outcome variance)

These should be replaced by:

  1. $\{(Y_i,\mathbf{X}_i)\}_{i=1}^n$ are iid random vectors
  2. $\mathbb{E}[Y_{i}^2]<\infty$ (finite outcome variance)

(removed the period in (1.) and fixed the subscript typo in (2.))

Typos

Here are a few typos I've collected while consulting the textbook (the corrections are in bold):

  1. Here: "Implicit in this analysis [...]"
  2. Here: "Remember that the interpretation of confidence [...]"
  3. Here: "@fig-ci-sim shows 100 iterations of these steps. Here we see that, as expected, the large majority of calculated CIs contain the true value. [...] The guarantee of the 95% confidence intervals [...]"
  4. Here: "we view $\mb{Y}$ as an $n$-dimensional vector in $\mathbb{R}^n$."

Thank you @mattblackwell for writing this helpful resource and making it publicly available for free!

Pairwise Independence does not imply Mutual Independence error in https://mattblackwell.github.io/gov2002-book/02_estimation.html

In the first paragraph after Section 2.2, it is said that:

They are independent in that the random vectors $X_i$ and $X_j$ are independent for all $i\neq j$,

However, this is pairwise independence, which is strictly weaker than mutual independence (which is required in the iid assumption). I checked this here:

https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables#Definition_for_more_than_two_random_variables
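
A standard counterexample, sketched below as an illustration and not taken from the book, shows the gap: let $X$ and $Y$ be independent fair bits and set $Z = X \oplus Y$. Every pair is independent, yet the three variables are not mutually independent. The code enumerates the four equally likely outcomes and checks both properties exactly.

```python
from fractions import Fraction
from itertools import product

# Sample space: X and Y are independent fair bits, Z = X XOR Y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
prob = Fraction(1, len(outcomes))  # each of the 4 outcomes has probability 1/4


def p(event) -> Fraction:
    """Exact probability of an event, given as a predicate on outcomes."""
    return sum((prob for w in outcomes if event(w)), Fraction(0))


def pairwise_independent() -> bool:
    """Check P(A=a, B=b) = P(A=a) P(B=b) for every pair of variables."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = p(lambda w: w[i] == a and w[j] == b)
            if joint != p(lambda w: w[i] == a) * p(lambda w: w[j] == b):
                return False
    return True


def mutually_independent() -> bool:
    """Check that the full joint pmf factors into the three marginals."""
    for a, b, c in product((0, 1), repeat=3):
        joint = p(lambda w: w == (a, b, c))
        marginals = (
            p(lambda w: w[0] == a) * p(lambda w: w[1] == b) * p(lambda w: w[2] == c)
        )
        if joint != marginals:
            return False
    return True


if __name__ == "__main__":
    print(pairwise_independent())   # True
    print(mutually_independent())   # False
```

For instance, $P(X=0, Y=0, Z=0) = 1/4$, while the product of the marginals is $1/8$.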

Potential mathematical typo in first Warning box https://mattblackwell.github.io/gov2002-book/02_estimation.html#fn1

In the first warning box, there is the sentence:

Then the statistic is a random variable that has a distribution over the numbers from {2, , 12} that describes our uncertainty over what the sum will be before we roll the dice.

However, the {2, , 12} seems like an odd choice to format the set, and could possibly be replaced with nicer LaTeX formatting, i.e. {2,\dots,12} (formatted in LaTeX)?
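
For reference, the distribution the sentence describes can be enumerated directly. This is a minimal sketch assuming two fair six-sided dice; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution() -> dict:
    """Exact pmf of the sum of two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}


if __name__ == "__main__":
    for total, prob in dice_sum_distribution().items():
        print(total, prob)
```

The support is exactly the integers from 2 to 12, which is the set that $\{2,\dots,12\}$ would denote.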

Grammatical error in sentence preceding Jacobian matrix in Section 3.9

The sentence right before the Jacobian matrix is:

It will help us use more compact matrix notation if we introduce a Jacobian matrix of all partial derivatives

This should be replaced with

It will help us to use more compact matrix notation if we introduce a Jacobian matrix of all partial derivatives

(adding a "to")

Typo in second sentence of Chapter 5

The second sentence in Chapter 5 is:

In particular, these tools show how the conditional mean of $Y_i$ varies as a function $\mathbf{X}_i$.

This should be:

In particular, these tools show how the conditional mean of $Y_i$ varies as a function of $\mathbf{X}_i$.

(added an "of" near the end)

Punctuation typo in Blue box after Section 5.2.3 of Chapter 5

The paragraph before the numbered statements in the Blue box after Section 5.2.3 is:

Without some assumptions on the joint distribution of the data, The following “regularity conditions” will ensure the existence of the BLP:

This should be replaced by:

Without some assumptions on the joint distribution of the data, the following “regularity conditions” will ensure the existence of the BLP:

(changed "T" to "t" in "the")

Typo in sentence before Figure 3.3

The sentence right before Figure 3.3 reads:

... establishes how the standard 95% confidence interval for the sample mean above asymptotically valid.

This should be:

... establishes how the standard 95% confidence interval for the sample mean above is asymptotically valid.

(adding an "is")

Mistake in first sentence after first display equation in Section 5.4 of Chapter 5

The first sentence after the first display equation in Section 5.4 is:

so that the change in the predicted outcome for increasing $X_{i1}$ by one unit isn’t

Following the logical structure of the argument, this should be:

so that the change in the predicted outcome for increasing $X_{i1}$ by one unit is

(replaced "isn't" with "is")

Typo in Chapter 6

The start of a paragraph in Chapter 6 is:

Why does this rank condition matter for the OLS estimator? A key property of full column rank matrices is that $\mathbb{X}$ if of full column rank if and only if $\mathbb{X}'\mathbb{X}$ is non-singular and a matrix is invertible if and only if it is non-singular.

This should be replaced by:

Why does this rank condition matter for the OLS estimator? A key property of full column rank matrices is that $\mathbb{X}$ is of full column rank if and only if $\mathbb{X}'\mathbb{X}$ is non-singular and a matrix is invertible if and only if it is non-singular.

(replaced wrongly used "if" with "is")
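
The property in the corrected sentence can be checked numerically on small examples. The sketch below uses exact rational arithmetic from the standard library; the helper names and the two example design matrices are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def rank(A) -> int:
    """Rank via Gaussian elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r


def gram_is_nonsingular(X) -> bool:
    """True when X'X has full rank, i.e. is invertible."""
    G = matmul(transpose(X), X)
    return rank(G) == len(G)


# Intercept plus a covariate that varies: full column rank.
X_full = [[1, 0], [1, 1], [1, 2]]
# Intercept plus a constant covariate: columns are linearly dependent.
X_deficient = [[1, 2], [1, 2], [1, 2]]

if __name__ == "__main__":
    for X in (X_full, X_deficient):
        print(rank(X) == len(X[0]), gram_is_nonsingular(X))
```

In both examples the two checks agree: $\mathbb{X}$ has full column rank exactly when $\mathbb{X}'\mathbb{X}$ is non-singular.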

Typo in first sentence of Footnote #2 in Chapter 4

The first sentence of Footnote #2 in Chapter 4 reads:

Different people and different textbooks describe what to do when do not reject the null hypothesis in different ways.

This should instead be:

Different people and different textbooks describe what to do when we do not reject the null hypothesis in different ways.

Inconsistent notation for indicator random variable in the book

In the proof of Theorem 3.1, the notation $\mathbb{1}$ is used for the indicator random variable. But in the definition of the empirical cdf right before Section 2.5, the notation $\mathbb{I}$ is used for the indicator random variable. Not sure if this is an issue or typo, but just wanted to make a note of it in case it is of interest.

Bold typo in display equation right before Theorem 7.1

The display equation right before Theorem 7.1 is:

$$ \bhat \inprob \beta + \mb{Q}_{\X\X}^{-1}\E[\X_ie_i] = \beta, $$

This should instead be:

$$ \bhat \inprob \boldsymbol{\beta} + \mb{Q}_{\X\X}^{-1}\E[\X_ie_i] = \boldsymbol{\beta}, $$

since beta is a vector and not a scalar.

Typo in second last paragraph before Section 6.6

The last sentence of the second last paragraph before Section 6.6 is:

Then coefficient on the West dummy will be

This should be:

Then, the coefficient on the West dummy will be

(added a "the")

Typo in sentence after $T$ display equation after Section 4.11 of Chapter 4

The sentence after the $T$ test statistic display equation after the Section 4.11 heading is:

As we discussed in the earlier, an $\alpha = 0.05$ test would reject this null when $|T|>1.96$, or when...

This should instead be:

As we discussed earlier, an $\alpha = 0.05$ test would reject this null when $|T|>1.96$, or when...

(removed the "in the" before "earlier")

Possible inaccuracy in citation of Theorem 3.5 in Example 3.3

In the last paragraph of Example 3.3, it is said that the bias shrinks as a function of the sample size, so as long as the sampling variance shrinks as a function of the sample size (which it does), the estimator is consistent by Theorem 3.5.

However, Theorem 3.5 only says that an estimator is consistent if it is unbiased and its sampling variance shrinks as a function of the sample size, and no mention was made about the slightly stronger version used in Example 3.3.
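
One standard argument for the stronger version, sketched here as an illustration and not as the book's own proof, combines Markov's inequality applied to the squared error with the bias-variance decomposition of mean squared error. The symbols $\hat{\theta}_n$, $\theta$, and $\varepsilon$ are generic placeholders.

```latex
\begin{aligned}
\mathbb{P}\left(|\hat{\theta}_n - \theta| > \varepsilon\right)
  &\leq \frac{\mathbb{E}\left[(\hat{\theta}_n - \theta)^2\right]}{\varepsilon^2} \\
  &= \frac{\left(\mathbb{E}[\hat{\theta}_n] - \theta\right)^2 + \mathbb{V}[\hat{\theta}_n]}{\varepsilon^2}
  \;\longrightarrow\; 0
  \quad \text{as } n \to \infty,
\end{aligned}
```

provided both the bias and the sampling variance shrink to zero. The unbiased case is the special case where the first term in the numerator is exactly zero for every $n$.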
