Ch.6 Field Flashcards

1
Q

Population and Samples

A
2
Q

Population and Samples

What are some points on samples you have to remember?

A
  • The form of our model is similar for all samples and for the population (the model we get from one sample is similar to the models from all the other samples, and to the model for the population)
  • Parameter estimates vary across samples -> samples don’t have parameter values that match the true population values exactly
  • Spread of scores around the population model is consistent -> the lines representing the limits are parallel to the model itself. In other words, at all values of the IV the spread of scores around the model is assumed to be similar (see image 1)
3
Q

Population and Samples

What is error?

A

The difference between the predicted value of the DV at a certain level of the IV and the observed value of the DV at the same value of the IV:
error = observed - model
- When referring to error in sample models, we use e
- When referring to error in population models, we use ε
!!! e and ε are the same concept (error = observed - model), they’re just used in different circumstances !!!

(See image 2, and notice the difference in hats as well: with a sample model we want to estimate the population parameters (we can’t get the population values from our data), so we add hats. In the population model our data give us the actual numbers in the population, so we don’t estimate parameters and there are no hats. Also note the difference between e and ε as mentioned above)
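The error = observed - model idea can be sketched in a few lines of Python; the model and its parameter values below are invented purely for illustration:

```python
# error = observed - model, for a hypothetical linear model
# yhat = b0 + b1 * x (the parameter values are made up).
b0, b1 = 2.0, 0.5
data = [(1, 2.4), (2, 3.1), (3, 3.4)]  # (x, observed y) pairs

# one error per observation: observed minus the model's prediction
errors = [y - (b0 + b1 * x) for x, y in data]
print(errors)  # roughly [-0.1, 0.1, -0.1]
```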

4
Q

Population and Samples

What are some general notes on error?

A
  • Most errors in prediction will be close to 0 (this follows from the next point: small errors occur most frequently)
  • As the magnitude of errors increases, their frequency decreases.
    ~ !!! The opposite isn’t necessarily the case. Remember it in this direction only !!!
  • The distribution of the errors is also normal, with a mean of 0 and a variance of σ^2
5
Q

Errors vs Residuals

A
6
Q

Errors vs Residuals

What is a Residual

A

Residual = observed value - value predicted by the model = error, BUT SPECIFICALLY FOR A SAMPLE MODEL (simply, a residual is the error for a sample model)
- Since we use a sample model to estimate the population model, the residuals from the sample model are likely a good approximation of the population errors
- If we plot the distribution of all the residuals, it’s normal with a mean of 0
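A minimal simulation of that last point, assuming (purely for illustration) that the errors come from a N(0, 1) population: the mean of a large batch of such draws sits near 0:

```python
import random
import statistics

# Simulated "residuals": draws from a N(0, 1) population of errors
# (the distribution is assumed for illustration only).
random.seed(42)
residuals = [random.gauss(0, 1) for _ in range(10_000)]

# Their sample mean should sit close to the population mean of 0.
print(round(statistics.mean(residuals), 3))
```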

7
Q

Errors vs Residuals

What do we use Residuals for?

A

We use residuals to infer properties of the errors in the population model

8
Q

Errors vs Residuals

What is the equation for Total error?

A

Total error = sum of squared residuals: SSR = Σ(observed_i - model_i)^2 (see image 3)
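As a sketch (assuming total error means the sum of squared residuals, SSR), the computation is just:

```python
# SSR = sum of squared residuals, using invented numbers.
observed = [2.4, 3.1, 3.4]
predicted = [2.5, 3.0, 3.5]

ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
print(ssr)  # three squared residuals of 0.1 each -> about 0.03
```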

9
Q

Errors vs Residuals

What is the ordinary least squares (OLS) regression?

A

It’s a method that uses the principle of least squares to estimate the parameters (b-values) for which the total error is at its minimum (i.e. a method to minimize the total error)
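For simple regression the OLS estimates have a closed form; here is a sketch with invented data (b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)):

```python
# Closed-form OLS for one predictor, with made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)

# slope: covariance of x and y divided by variance of x
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
# intercept: forces the line through (mean x, mean y)
b0 = my - b1 * mx
print(b0, b1)
```

No other choice of b0 and b1 gives a smaller SSR for these data; that is what "least squares" means.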

10
Q

Errors vs Residuals

How do you estimate the variance of the model errors?

A

s^2 = SSR / (N - p), where p is the number of parameters in the model (see image 4)
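A sketch, assuming the usual formula s^2 = SSR / (N - p) with p the number of estimated parameters (here 2, intercept and slope):

```python
# Estimate of the error variance from residuals (invented numbers).
residuals = [-0.1, 0.1, -0.1, 0.2, -0.1]
n, p = len(residuals), 2  # p = number of estimated parameters

ssr = sum(e ** 2 for e in residuals)  # sum of squared residuals
s2 = ssr / (n - p)                    # divide by degrees of freedom
print(s2)
```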

11
Q

Confidence Intervals and Significance Testing

A
12
Q

CI & Significance Testing

General notes on Sampling Distribution

A
  • It’s the distribution of parameter estimates across samples
  • The width reflects variability due to sampling error
    ~ The width is measured by the standard deviation -> in a sampling distribution the sd is called the standard error (SE)
13
Q

CI & Significance Testing

What is the general equation for any test-statistic?

A

effect/error
This also equals: size of parameter / sampling variation in the parameter
- The sampling variation in the parameter reflects how much the estimates differ from sample to sample (it is measured by the standard error)
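As a sketch with invented numbers, the test statistic for a regression parameter is the estimate divided by its standard error:

```python
# t = effect / error = parameter estimate / its standard error.
# Both numbers below are invented for illustration.
b_hat = 1.99   # hypothetical parameter estimate
se_b = 0.25    # hypothetical standard error of that estimate

t = b_hat / se_b
print(t)
```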

14
Q

CI & Significance Testing

What is the Central Limit Theorem?

A

!!! If the model errors are normally distributed, the sampling distribution of b^ is also normal.
Therefore, we can estimate the SE of b^ and construct the CI and the test statistic

15
Q

CI & Significance Testing

What is true about the relationship between sample size and sampling distribution?

A

As sample size increases, the sampling distribution approximates a normal distribution with a mean equal to the population mean and a variance equal to σ^2/n
(specifically, when model errors are normally distributed, the sampling distribution for b^ is normal)
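A quick simulation of this claim, using a deliberately non-normal population (uniform on [0, 1], which has mean 0.5 and variance 1/12):

```python
import random
import statistics

# Sampling distribution of the mean for samples from uniform(0, 1).
random.seed(1)
n = 30
means = [statistics.mean([random.random() for _ in range(n)])
         for _ in range(2000)]

# Mean of the sampling distribution ~ population mean (0.5);
# its variance ~ sigma^2 / n, so variance * n ~ 1/12 (about 0.083).
print(round(statistics.mean(means), 2))
print(round(statistics.variance(means) * n, 3))
```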

16
Q

CI & Significance Testing

Based on the above flashcard, what are the steps for conducting NHST and constructing a CI?

A
  1. When the sampling distribution of b^ is normal, we can use s^2 to estimate the SE of b
  2. The sampling distribution of the SE is a χ^2 distribution with n - p degrees of freedom
  3. Knowing the estimate of SE(b) allows us to construct a CI and a hypothesis test
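Putting the steps together as a sketch (the numbers and the large-sample critical value 1.96 are assumptions for illustration; with small n you would use a t critical value with n - p degrees of freedom):

```python
# 95% CI for a parameter: b_hat +/- critical value * SE(b_hat).
b_hat = 1.99   # hypothetical estimate
se_b = 0.25    # hypothetical standard error
crit = 1.96    # large-sample (normal) 95% critical value

ci = (b_hat - crit * se_b, b_hat + crit * se_b)
print(ci)  # roughly (1.5, 2.48)
```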
17
Q

CI & Significance Testing

What is the Gauss-Markov Theorem?

A

When certain conditions are met, OLS is the best way to estimate the parameters. The conditions that need to be met are:
- Model errors are on average 0
- Homoscedasticity
- Independence of observations
The last two conditions together are called spherical errors. See image 5

18
Q

Bias

A
19
Q

Bias

What is an unbiased estimator?

A

An estimator whose expected value equals the value it is trying to estimate (in other words: on average, the estimate from the sample will match the value in the population)

20
Q

Bias

What is a consistent estimator?

A

An estimator that produces estimates which tend to the population value as the sample size increases
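A small simulation of consistency using the sample mean as the estimator (the population is assumed N(0, 1) for illustration, so the true value is 0):

```python
import random
import statistics

# The sample mean drifts toward the population mean (0) as n grows.
random.seed(7)
for n in (10, 1_000, 100_000):
    sample = [random.gauss(0, 1) for _ in range(n)]
    print(n, round(abs(statistics.mean(sample)), 4))
```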

21
Q

Bias

What is an efficient estimator?

A

An estimator that produces estimates that are “the best” of the available methods of estimation
(best = lowest variance; the estimates are distributed more tightly around the population value)

22
Q

Bias

What is the optimal estimate for any data set?

A

The mean
(If the dataset has an outlier, the mean is pushed towards it, e.g. up and to the right for a large outlier)

23
Q

Bias

What are outliers and why are they problematic?

A

Data points that differ significantly from the rest of the data
- They bias parameter estimates
- They increase SSR a lot
If SSR is affected by outliers, the following happens:
1. SSR is biased
2. SD is biased
3. SE is biased
4. CI and test-statistic are biased
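A tiny illustration of an outlier biasing an estimate (the data are invented): one extreme point drags the mean, while the median barely moves:

```python
import statistics

# One outlier (100) pulls the mean far from the bulk of the data.
clean = [10, 11, 9, 10, 12, 10, 9]
with_outlier = clean + [100]

print(statistics.mean(clean), statistics.mean(with_outlier))
print(statistics.median(clean), statistics.median(with_outlier))
```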

24
Q

Bias

What should we do with outliers?

A

Keep them, unless you know they’re not representative of the population

25
Q

Assumptions

A
26
Q

Assumptions

What are Assumptions?

A

A condition that ensures that what we’re attempting to do works as it should

27
Q

Assumptions

What is the most important assumption?

A

Linearity and additivity: the process we’re trying to describe can be described by a linear model
Even if all other assumptions are met (next flashcard), if this one is violated the model is invalid, because our description of the process is wrong

28
Q

Assumptions

What are the other general assumptions?

A
  • Expected value of errors is 0
  • Spherical errors
    ~ Homoscedasticity
    ~ Independence of errors
  • Assumption of normality
  • (No outliers) (not really considered an assumption by many, but it could still be thought of as one)
    (See image 6, which summarizes some of the things we might want from a model and the assumptions required for them)
29
Q

Assumptions

Notes on Homoscedasticity

A
  • Homoscedasticity applies to population errors, not your sample data. BUT, if the sample residuals exhibit homogeneity of variance, the population errors probably will too
  • If violated, the SE, CI, and significance test associated with a parameter will be inaccurate
    ~ Even so, if we apply the method of least squares we can still get an unbiased estimate of a parameter; the CI and SE remain inaccurate though
    (See image 7 as well for another note)
30
Q

Assumptions

Notes on Independence of Errors

A

If violated, same consequences as if Homoscedasticity was violated

31
Q

Assumptions

Notes on Normality

A

(In general, the least damage if violated)
For the CI, SE and test statistic coming from a parameter to be accurate, the parameter estimate must come from a normal sampling distribution
- If sample residuals are normal -> Population error is normal -> Sampling distribution is normal
- In large samples, because sampling distribution of the parameter will be normal, this assumption can be ignored (it’ll be true either way)

32
Q

What is bootstrapping?

A

A robust method that tests use in case normality is violated.
Lack of normality prevents us from inferring the shape of the sampling distribution unless we have big samples. Bootstrapping gets around this problem by estimating the properties of the sampling distribution empirically from the sample data.
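A minimal bootstrap sketch (the data are invented): resample the sample with replacement many times, and read a CI off the percentiles of the resampled statistics:

```python
import random
import statistics

# Bootstrap the sampling distribution of the mean from one sample.
random.seed(0)
sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2]

boot_means = []
for _ in range(2000):
    resample = random.choices(sample, k=len(sample))  # with replacement
    boot_means.append(statistics.mean(resample))

# Approximate 95% CI: the 2.5th and 97.5th percentiles.
boot_means.sort()
ci = (boot_means[49], boot_means[1949])
print(ci)
```

No normality assumption was needed: the spread of boot_means stands in for the sampling distribution we can’t observe directly.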