Discovering Statistics Flashcards

(28 cards)

1
Q

What is validity?

A

The degree to which a theory, model or measure gives a true (accurate) picture of what it claims to represent.

2
Q

What is reliability?

A

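The replicability of results: whether the same results are obtained when the study or measurement is repeated under the same conditions.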

3
Q

What are the characteristics of a normal distribution?

A
Symmetrical
Bell shaped curve
Standard deviation determines the spread (a smaller standard deviation gives a taller, narrower curve)
Unimodal
Continuous
4
Q

What percentage of values fall within ±1.96 standard deviations of the mean in a normal distribution?

A

95%
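
As a quick check of the figure above, the proportion of a normal distribution within ±z standard deviations of the mean can be computed from the error function. This is a minimal sketch; the function name `normal_coverage` is made up for illustration.

```python
import math


def normal_coverage(z: float) -> float:
    """Proportion of a normal distribution lying within +/- z SDs of the mean."""
    return math.erf(z / math.sqrt(2))


# Coverage for a few common cut-offs; 1.96 gives roughly 0.95.
for z in (1.0, 1.96, 2.58):
    print(z, round(normal_coverage(z), 4))
```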

5
Q

What is the standard error?

A

The standard deviation (variability) of the sampling distribution of a statistic, such as the sample mean
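
The definition above can be illustrated with a small simulation: draw many samples, keep each sample mean, and take the standard deviation of those means. The population values and sample size below are hypothetical, and the comparison with sigma / sqrt(n) assumes independent draws from a normal population.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD, N = 100.0, 15.0, 25  # hypothetical population and sample size

# Each entry is the mean of one sample of size N; together they
# approximate the sampling distribution of the mean.
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(5000)
]

empirical_se = statistics.stdev(sample_means)  # SD of the sampling distribution
theoretical_se = POP_SD / N ** 0.5             # sigma / sqrt(n)

print(round(empirical_se, 2), theoretical_se)
```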

6
Q

What are point estimates?

A

Single values computed from a sample and used to estimate the corresponding population parameters.

7
Q

What are examples of point estimates?

A

Measures of central tendency such as mean, median and mode
Measures of dispersion such as range and standard deviation
Relationships such as correlations

8
Q

What are interval estimates?

A

Ranges of values that quantify the uncertainty around point estimates (narrower intervals mean more precision and less uncertainty)

9
Q

What are confidence intervals?

A

A range of values that is likely to include a population value with a certain degree of confidence. E.g. a 95% confidence interval means that if many samples were drawn and an interval computed from each, about 95% of those intervals would include the population mean.
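
The "95% of intervals" reading can be checked with a simulation. This is a sketch under stated assumptions: the population values are hypothetical, and the interval uses the normal cut-off of 1.96 rather than a t critical value, so coverage lands slightly under 95% for this sample size.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD, N, REPS = 50.0, 10.0, 40, 2000  # hypothetical values


def ci95(sample):
    """Approximate 95% confidence interval for the mean: mean +/- 1.96 * SE."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * se, mean + 1.96 * se


hits = 0
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    low, high = ci95(sample)
    if low <= POP_MEAN <= high:
        hits += 1  # this sample's interval captured the population mean

coverage = hits / REPS
print(coverage)
```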

10
Q

What is the t-distribution?

A

A distribution used to construct confidence intervals when the population standard deviation is not known and has to be estimated from the sample. It is centred on 0, symmetrical, and its shape changes with the degrees of freedom (as df approaches infinity, it becomes the normal distribution).
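
To see the shape change with degrees of freedom, the density of the t-distribution can be evaluated directly and compared with the standard normal. This is a sketch using the standard density formula; the helper names `t_pdf` and `normal_pdf` are made up for illustration.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Small df: lower peak and heavier tails. Large df: close to the normal curve.
for df in (1, 5, 30, 1000):
    print(df, round(t_pdf(0.0, df), 4), round(t_pdf(3.0, df), 4))
print("normal", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```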

11
Q

What are the three levels of hypothesis?

A

Conceptual
Operational
Statistical

12
Q

What is the scientific method?

A
Observation
Theory
Hypothesis/predictions 
Test hypothesis 
Interpret data
Reach conclusions + generate more hypotheses
13
Q

What is the linear model?

A

A model that predicts the value of an outcome from one or more predictors

14
Q

What is the equation for the general linear model?

A

Outcome = b0 (intercept) + b1(predictor) + e (error)

15
Q

What is b0 (intercept)?

A

The predicted value of the outcome when the predictor is 0

16
Q

What is b1?

A

The change in the outcome for every unit change in the predictor (slope)
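
The intercept and slope can be estimated by ordinary least squares. The sketch below fits a one-predictor model by hand; the data (hours studied and exam score) are hypothetical and `fit_line` is an illustrative name.

```python
import statistics

# Hypothetical data: hours studied (predictor) and exam score (outcome)
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]


def fit_line(x, y):
    """Least-squares estimates of b0 (intercept) and b1 (slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # change in outcome per unit change in predictor
    b0 = mean_y - b1 * mean_x    # predicted outcome when the predictor is 0
    return b0, b1


b0, b1 = fit_line(x, y)
print(round(b0, 2), round(b1, 2))  # 47.7 4.1
```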

17
Q

What value is used to establish the significance of b1?

A

The t-value: the estimate of b1 divided by its standard error, i.e. how many standard errors the estimate is from 0. The further it is from 0, the stronger the evidence against the null hypothesis that b1 = 0.
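
As a worked illustration, the t-value for the slope in a one-predictor model is b1 divided by its standard error, with n - 2 degrees of freedom. The data are hypothetical and `slope_t_value` is an illustrative name.

```python
import math
import statistics

# Hypothetical data: hours studied (predictor) and exam score (outcome)
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]


def slope_t_value(x, y):
    """Return (b1, standard error of b1, t-value) for a one-predictor model."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares: squared gaps between observed and predicted
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1, se_b1, b1 / se_b1


b1, se_b1, t_value = slope_t_value(x, y)
print(round(b1, 2), round(se_b1, 3), round(t_value, 2))
```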

18
Q

How is model fit evaluated?

A

R² and adjusted R²

R² lies between 0 and 1: near 0 means the model explains little of the variance in the outcome, near 1 means it explains most of it (a good fit)
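
Both fit measures can be computed from sums of squares. This is a sketch for a one-predictor model with hypothetical data; `r_squared` is an illustrative name and `k` is the number of predictors.

```python
import statistics

# Hypothetical data: hours studied (predictor) and exam score (outcome)
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]


def r_squared(x, y, k=1):
    """Return (R squared, adjusted R squared) for a one-predictor model."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - mean_y) ** 2 for yi in y)                      # total
    r2 = 1 - sse / sst
    adjusted = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors
    return r2, adjusted


r2, adj_r2 = r_squared(x, y)
print(round(r2, 4), round(adj_r2, 4))
```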

19
Q

What is the F stat?

A

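The statistic that indicates whether there is a relationship between outcome and predictor. It is the ratio of the variance explained by the model to the variance left unexplained. The further F is above 1, the stronger the evidence of a relationship.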
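
The F statistic can be sketched as the ratio of the model mean square to the residual mean square. The data are hypothetical and `f_statistic` is an illustrative name. With a single predictor, F equals the square of the slope's t-value.

```python
import statistics

# Hypothetical data: hours studied (predictor) and exam score (outcome)
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]


def f_statistic(x, y, k=1):
    """F = (model sum of squares / k) / (residual sum of squares / (n - k - 1))."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssm = sum((b0 + b1 * xi - mean_y) ** 2 for xi in x)
    return (ssm / k) / (sse / (n - k - 1))


f_value = f_statistic(x, y)
print(round(f_value, 2))
```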

20
Q

What are outliers?

A

A value in the data that does not follow the trend of the rest of the data

21
Q

How can outliers be detected in the GLM?

A
Graphs
Standardised residuals (an absolute value greater than about 3 flags a likely outlier)
Cook's distance (a value greater than 1 flags an influential case)
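
The two numerical checks above can be sketched for a one-predictor model. Assumptions: the data are hypothetical with one planted outlier, the standardised residual is taken as the residual divided by the residual standard deviation, and `outlier_diagnostics` is an illustrative name.

```python
import math
import statistics

# Hypothetical data: a perfect line y = 2x + 1 with one planted outlier at the end
x = list(range(1, 11))
y = [2 * xi + 1 for xi in x]
y[-1] += 15


def outlier_diagnostics(x, y):
    """Return (standardised residuals, Cook's distances) for a one-predictor model."""
    n, p = len(x), 2  # p = number of estimated coefficients (b0 and b1)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Leverage: how far each case's predictor value is from the mean
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    std_resid = [e / math.sqrt(mse) for e in resid]
    cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return std_resid, cooks


std_resid, cooks = outlier_diagnostics(x, y)
for xi, r, d in zip(x, std_resid, cooks):
    print(xi, round(r, 2), round(d, 2))
```

In this small example the planted case has a Cook's distance above 1 while its standardised residual stays below 3, so the two rules of thumb do not always agree.
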
22
Q

If outliers are present, what should be done?

A

A robust estimation method should be used instead of ordinary least squares (OLS), as robust methods are more resistant to the influence of outliers

23
Q

What are the assumptions of the linear model?

A

Linearity and additivity
Normally distributed errors
Independent errors
Homoscedastic errors

24
Q

What are the differences between errors and residuals?

A

Errors refer to the differences between observed values and the values predicted by the model in the population; these cannot be observed
Residuals refer to the differences between observed values and the values predicted by the model fitted to the sample

25
Q

What are independent errors?

A

Errors in one prediction that are unrelated to errors in another

26
Q

What are homoscedastic errors?

A

Variance of residuals should be consistent at different levels of the predictor variable.

27
Q

What is heteroscedasticity?

A

The variance of the residuals differs across levels of the predictor variable, often producing a funnel shape in a plot of the residuals

28
Q

What should be done if the assumption of normal distribution is not met?

A

Normality can be ignored as long as the sample size is big enough (at least 30), due to the central limit theorem. If the sample size is small, bootstrapping can be used.
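
Bootstrapping can be sketched as resampling the data with replacement many times and reading an interval off the resulting estimates. The sample is hypothetical, `bootstrap_ci` is an illustrative name, and the percentile interval shown is only one of several bootstrap variants.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

sample = [12, 15, 9, 22, 17, 14, 30, 11, 13, 16]  # hypothetical small sample


def bootstrap_ci(data, reps=5000, level=0.95):
    """Percentile bootstrap confidence interval for the mean."""
    # Resample with replacement, same size as the data, and keep each mean
    means = sorted(
        statistics.fmean(random.choices(data, k=len(data))) for _ in range(reps)
    )
    lower_idx = int((1 - level) / 2 * reps)
    upper_idx = int((1 + level) / 2 * reps) - 1
    return means[lower_idx], means[upper_idx]


low, high = bootstrap_ci(sample)
print(round(low, 2), round(high, 2))
```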