Johnny, CH.2 - SPINE Flashcards

1
Q

Why do scientists use statistical models?

A
  1. Models represent real-world processes, letting us predict how those processes operate under certain conditions
    - We can have some, but not complete, confidence in the predictions models make
    - Outcome (data) = model + error. This equation means that the data we observe can be predicted from the model we choose to fit, plus some amount of error
2
Q

What is the relationship between samples and populations when it comes to psychological research?

A

Scientists are usually interested in finding results that apply to an entire population. Because we can’t collect data from every member of a population, we collect data from a sample and use those data to infer things about the population

3
Q

What is one of the most common statistical methods?

A

The Linear Model:
Yi = b0 + b1Xi + ei
(This equation expresses that we want to predict the value of an outcome variable Y from a predictor variable X.)
- b0: intercept of the line (determines the vertical height of the line; represents the overall level of the outcome variable when the predictor variables are 0)
- b1: slope of the line (represents the change in the outcome for a one-unit change in the predictor)
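The linear model above can be fitted by hand with the usual least-squares formulas. A minimal sketch, using made-up data (none of these numbers come from the lecture):

```python
# Fit the linear model Y = b0 + b1*X by ordinary least squares
# (illustrative made-up data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares estimates: slope b1, then intercept b0
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x

# Predicted values and errors (observed - predicted)
predicted = [b0 + b1 * x for x in xs]
errors = [y - p for y, p in zip(ys, predicted)]
```

Note that b0 and b1 computed from this sample are only estimates of the population parameters.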

4
Q

SPINE of Statistics

A
5
Q

What does SPINE stand for?

A

S: Standard Errors
P: Parameters
I: Interval Estimates (CI)
N: Null Hypothesis Significance Testing (NHST)
E: Estimation (of Parameters)

6
Q

Parameters

What are Parameters?

A

Parameters are numerical or other measurable factors that define a system or the conditions under which it works
(Very general, don’t memorize)

7
Q

Parameters

What do parameters represent?

A

Some fundamental truth about the variables in the model.
- !!! Parameters are estimated, not measured !!!
- We can predict values of an outcome variable based on a model

8
Q

Parameters

What are some important things to note on parameters?

A
  • Always use the word “estimate”: when we calculate parameters from sample data, they are only estimates of the true parameter value in the population
  • The model’s variables have no error
  • See Picture 1
9
Q

Parameters

What is error in statistics?

A

A discrepancy between observed values and true values
(See Picture 2 for formula and explanation)
- deviance: outcome - model
- error: observed - predicted

10
Q

Parameters

What is the total error (also called the sum of errors)?

A

(See Picture 3 for explanation and examples)

11
Q

Parameters

How do you estimate the mean error in the population (mean squared errors)?

A
  • total error / degrees of freedom
  • total error: the sum of squared differences between observed and predicted scores
  • degrees of freedom (df): N - 1
    (See Picture 4)
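The calculation above is short enough to sketch directly. Here the “model” is simply the sample mean (made-up scores for illustration):

```python
# Mean squared error = total (squared) error / degrees of freedom.
# The model here is the sample mean, so df = N - 1
# (illustrative made-up scores).
observed = [4.0, 6.0, 8.0, 10.0]
predicted = sum(observed) / len(observed)  # the model: the mean (7.0)

# total error: sum of squared (observed - predicted) differences
total_error = sum((x - predicted) ** 2 for x in observed)
df = len(observed) - 1                     # degrees of freedom: N - 1
mean_squared_error = total_error / df
```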
12
Q

Parameters

What is the fit of a model and how do you estimate it?

A

The fit of a model is how representative of the real world the model is.
- We estimate it using the sum of squared errors and the mean squared error
- As the sum of squared errors decreases, the fit of the model increases

13
Q

Estimation of Parameters

What is the method of least squares?

A

A method for estimating parameters by choosing the values that minimize the sum of squared errors
(I think the method itself is rather unimportant to mention; it hasn’t come up in the slides or exercises either. If you want me to add it, let me know)

14
Q

Estimation of Parameters

What is the maximum likelihood estimation?

A

An estimation method whose goal is to find the parameter values that maximize the likelihood.
- Likelihood refers to how well a sample supports particular values of a parameter in a model (in other words, when calculating the likelihood we are asking how much we can trust the parameters in the model, given the sample data we have observed)
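A minimal sketch of the idea, assuming a normal model with a fixed SD of 1 and made-up data: we try many candidate values of the mean and keep the one that makes the observed sample most likely. (The grid search here is just for illustration; in practice the maximum is found analytically or numerically, and for a normal model the ML estimate of the mean is the sample mean.)

```python
import math

# Maximum likelihood sketch: find the mean mu of a normal model (sigma = 1)
# that maximizes the log-likelihood of the observed sample.
data = [1.0, 2.0, 3.0, 4.0, 5.0]

def log_likelihood(mu, xs, sigma=1.0):
    # log of the product of normal densities = sum of log densities
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

# Crude grid search over candidate values of mu (0.00 .. 6.00)
candidates = [i / 100 for i in range(0, 601)]
best_mu = max(candidates, key=lambda mu: log_likelihood(mu, data))
```

The winning candidate coincides with the sample mean, illustrating that the data provide the most support for that parameter value.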

15
Q

Standard Error

Why is the standard error important?

A

It is important because it shows us how representative our sample is of the population of interest

16
Q

Standard Error

What is Sample Variation?

A

Samples vary because they contain different members of the population

17
Q

Standard Error

What is a sampling distribution?

A

A frequency distribution of sample means
- The average of all sample means equals the value of the population mean
- We use it to tell us how representative a sample is of the population
- Its SD is known as the standard error of the mean (or standard error)

18
Q

Standard Error

What is the Central Limit Theorem?

A

As sample size increases, the sampling distribution becomes normal, with a mean equal to the population mean and a SD equal to σ/√N (See Picture 5)
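The theorem can be checked by simulation. A sketch with made-up settings: many samples are drawn from a uniform (clearly non-normal) population, and the SD of the resulting sample means lands close to σ/√N:

```python
import math
import random
import statistics

# Central Limit Theorem sketch: draw many samples from a non-normal
# (uniform) population; the sample means pile up around the population
# mean with SD close to sigma / sqrt(N).
random.seed(42)                       # fixed seed so the run is reproducible
N = 30                                # size of each sample
sample_means = [statistics.mean(random.uniform(0, 1) for _ in range(N))
                for _ in range(2000)]

pop_mean = 0.5                        # mean of uniform(0, 1)
pop_sd = math.sqrt(1 / 12)            # SD of uniform(0, 1)
predicted_se = pop_sd / math.sqrt(N)  # standard error predicted by the CLT
observed_se = statistics.stdev(sample_means)
```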

19
Q

Interval Estimates (CI)

General Info

A
  • Confidence level (e.g. 95%): we can be 95% confident that the parameter in question lies within this range
  • Boundaries of a CI (See Picture 6)
  • If we have a small sample, we use a t-distribution instead of a normal distribution (See Picture 7 for the boundaries)
    (Picture 8 also contains an example of how to represent a CI visually)
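A minimal sketch of computing the boundaries, assuming the large-sample (normal) case with made-up scores; for a small sample the 1.96 would be replaced by a t critical value:

```python
import math
import statistics

# 95% confidence interval for a mean, normal approximation
# (illustrative made-up scores).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(scores)
se = statistics.stdev(scores) / math.sqrt(len(scores))  # standard error

z = 1.96                          # critical value for 95% under a normal curve
lower, upper = mean - z * se, mean + z * se
```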
20
Q

Null Hypothesis Significance Testing (NHST)

What are the 2 philosophical frameworks leading in NHST?

A
  • Fisher paradigm (father of the p-value): he created the concept of the Ho
  • Neyman-Pearson paradigm: included the Ho and added the concept of the Ha
    !!! IN GENERAL: If the p-value is sufficiently low, reject Ho !!!
21
Q

NHST

What are the differences between Ho and Ha?

A

(This table was in the lecture but Johnny didn’t explain it well; I think just remember its general outline, no need to really understand it)
Ho:
- Skeptical point of view
- No effect
- No preference
- No correlation
- No difference
Ha:
- Refute skepticism
- Effect
- Preference
- Correlation
- Difference

22
Q

NHST

What are the 2 main aspects of Frequentist Probability?

A
  • Objective Probability: The likelihood of an event occurring based on empirical data and concrete measures
  • Relative frequency in the long run
23
Q

NHST

What is the standard error?

A

The variability of the sampling distribution (in other words: if I repeated my experiment over and over, how much variability would there be in its outcome?)
- As variability increases, so does the SE
(See Picture 9 for the formula)

24
Q

NHST

How do we use the SE?

A

To do parameter estimation. It is crucial in constructing confidence intervals, since it determines the boundaries and how wide the CI will be.
- For a 95% CI, as the SE increases, so does the width of the CI, so that we remain 95% sure the CI contains the true parameter. Two 95% CIs can therefore differ in width, with the wider one having the bigger SE
(See picture 10)

25
Q

NHST

What is the general process of NHST?

A

See picture 11

26
Q

NHST

What is the significance level α?

A

A value that determines when to reject Ho
(determines how strict we must be to reject the Ho)

27
Q

NHST

What is the statistical power (1-β)?

A

The probability of rejecting the Ho when the Ho is false (i.e. when the Ha is true)

28
Q

NHST

What do 1-α and β represent respectively?

A
  • 1-α is the probability of correctly retaining Ho (not rejecting it when it is true)
  • β is the probability of not rejecting Ho when it is false (Type II error)
29
Q

NHST

How do you calculate the test statistic?

A

See picture 12

30
Q

NHST

What is true about the usual size of tests in NHST?

A

The size of a test is its significance level α; we usually use small sizes (e.g. 0.05), and larger sizes are uncommon

31
Q

NHST

What are the different types of tests according to the directionality of the hypothesis?

A
  • If a statistical model tests a directional hypothesis -> one-tailed test (e.g. we want to see whether watching a film increases or decreases the likelihood of buying the equivalent book. Ho is that there is no increase or decrease; Ha is that there is an increase, or that there is a decrease. With a directional hypothesis we only look at one of the two options, increase or decrease, so we test in one direction only, not both)
  • If a statistical model tests a non-directional hypothesis -> two-tailed test (in the same example, we now look in both directions, increase and decrease)
32
Q

NHST

What are the two types of errors?

A
  • Type I error
  • Type II error
33
Q

NHST

Type I error

A

When we believe there’s an effect in the population, when in fact there isn’t one (False positive)
- By specifying a we specify how often we are ok with making a Type I erroR
- ~ Given that a usually equals 0.05, probability of making a Type I error is usually 0.05
(In other words, out of 100 data collection process times, 5 times we obtain a statistic making us think there’s an effect when there isn’t one)
- Note: the Type I error rate is fixed by α, so (unlike the Type II error rate) it does not shrink as sample size grows

34
Q

NHST

Type II error

A

When we believe there isn’t an effect in the population, when there is one actually (False Negative)
- Is equal to b-level
- Usually 0.2
- Depends on sample size (Because a larger sample size is a better approximation of the population, as sample size increases, error decreases, so it is easier to find the signal, thus there are lower probabilities of making a Type II error)

35
Q

NHST

What is the relationship between Type I and II errors?

A

As the probability of making a Type I error increases, the probability of making a Type II error decreases, and vice versa
(in terms of α and β: if α decreases, we decrease the probability of a Type I error, but at the same time it becomes harder to find an effect, so a Type II error becomes more likely)

36
Q

NHST

How is the chance of making a Type I error affected by many testings?

A

If the Type I error probability = 0.05, then the probability of no Type I error = 0.95. If we have 3 tests, the probability of no Type I error across these three tests is 0.95 x 0.95 x 0.95 = 0.857, so the probability of at least one Type I error across the 3 tests is 1 - 0.857 = 0.143.
This shows that the probability of a Type I error builds up over many tests. It also gives the following formula:
- experiment-wise (familywise) error rate = chance of making a Type I error anywhere in the testing = 1 - (1 - α)^k, where k is the number of tests (with α = 0.05: 1 - 0.95^k)
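The arithmetic above can be wrapped in a one-line function:

```python
# Familywise (experiment-wise) Type I error rate across k independent tests,
# each run at significance level alpha: 1 - (1 - alpha)**k.
def familywise_error(k, alpha=0.05):
    return 1 - (1 - alpha) ** k
```

For the worked example: familywise_error(3) reproduces the 0.143 from the card.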

37
Q

NHST

How can we combat the buildup of errors?

A

Use the Bonferroni correction:
- Pcrit = α/k, where α is the significance level and k is the number of tests.
This ensures that the familywise Type I error rate across all tests stays at or below α (usually 0.05)
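A quick sketch of the correction for the 3-test example: each test is run at α/k, and the resulting familywise error rate stays under 0.05.

```python
# Bonferroni correction: run each of k tests at alpha / k so that the
# familywise Type I error rate stays at or below alpha.
def bonferroni_pcrit(alpha, k):
    return alpha / k

k = 3
p_crit = bonferroni_pcrit(0.05, k)        # 0.05 / 3 per test
familywise = 1 - (1 - p_crit) ** k        # overall error rate after correction
```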

38
Q

NHST

Other general info regarding α and β

A
  • If we know 3 of the following 4: 1-β, α, effect size, and sample size, we can calculate the fourth (e.g. the power of a test, or the sample size necessary to achieve a given level of power)
  • Sample size affects whether a difference between samples is deemed significant or not (e.g. in large samples even small differences can be significant)
39
Q

NHST

What is the effect size of a test?

A

A quantitative measure of the strength of a phenomenon
- Larger absolute values always indicate a stronger effect
- Important in power analysis, meta-analyses etc.