Discovering Statistics Flashcards by Ismail Uddin

What is validity?

The degree to which a theory/model reflects a true/accurate picture.

What is reliability?

The replicability of results

What are the characteristics of a normal distribution?

Symmetrical
Bell shaped curve
Standard deviation determines the spread (a smaller SD gives a narrower, steeper curve)
Unimodal
Continuous

What percentage of values fall within +/- 1.96 standard deviations of the mean in a normal distribution?

95%
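
The 95% figure can be checked numerically. The following is a minimal sketch using the standard normal distribution from Python's standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of values lying within 1.96 standard deviations of the mean.
proportion_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(round(proportion_within, 4))
```

The result is approximately 0.95, which is why 1.96 is the multiplier used for 95% intervals.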

What is the standard error?

The standard deviation (variability) of the sampling distribution
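
For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch, assuming made-up scores and a hypothetical helper name:

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)

# Hypothetical data, for illustration only.
scores = [4.0, 6.0, 8.0, 10.0, 12.0]
se = standard_error_of_mean(scores)
print(se)
```

Larger samples give a smaller standard error, because the sample mean varies less from sample to sample.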

What are point estimates?

Single numbers used to guess corresponding population parameters.

What are examples of point estimates?

Measures of central tendency such as mean, median and mode
Measures of dispersion such as range and standard deviation
Relationships such as correlations

What are interval estimates?

Ranges of values around point estimates that quantify uncertainty (narrower intervals mean a more precise estimate and less uncertainty)

What are confidence intervals?

A range of values that is likely to include a population value with a certain degree of confidence. E.g. a 95% confidence interval means that, across repeated samples, 95% of the intervals constructed this way will contain the population mean.
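
The repeated-sampling meaning of "95%" can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is normal with a made-up mean and SD, and the interval uses the normal multiplier (about 1.96) rather than a t value, so coverage will be close to, but not exactly, 95%.

```python
import math
import random
import statistics
from statistics import NormalDist

def coverage_of_95_ci(true_mean=50.0, true_sd=10.0, n=40, trials=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # Count the interval as a hit if it contains the population mean.
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / trials

coverage = coverage_of_95_ci()
print(coverage)
```

Each simulated sample yields a different interval; the percentage describes how often the procedure captures the population mean, not the probability for any single interval.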

What is the t-distribution?

A probability distribution used to build confidence intervals when the population standard deviation is unknown and must be estimated from the sample. It is centred on 0, symmetrical, and its shape depends on the degrees of freedom (as df approaches infinity, it approaches the normal distribution).

What are the three levels of hypothesis?

Conceptual
Operational
Statistical

What is the scientific method?

Observation
Theory
Hypothesis/predictions 
Test hypothesis 
Interpret data
Reach conclusions + generate more hypotheses

What is the linear model?

A model that predicts the value of an outcome from one or more predictors

What is the equation for the general linear model?

Outcome = b0 (intercept) + b1(predictor) + e (error)

What is b0 (intercept)?

The value of the outcome when the predictor is 0

What is b1?

The change in the outcome for every unit change in the predictor (slope)
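
To make b0 and b1 concrete, here is a minimal sketch of fitting a one-predictor linear model by ordinary least squares. The data (hours and score) and the function name are hypothetical, chosen only for illustration.

```python
def fit_simple_linear_model(x, y):
    """Return (b0, b1) for outcome = b0 + b1 * predictor, fitted by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of the predictor.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx              # slope: change in outcome per unit of predictor
    b0 = mean_y - b1 * mean_x   # intercept: predicted outcome when predictor is 0
    return b0, b1

# Hypothetical data, for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]
b0, b1 = fit_simple_linear_model(hours, score)
print(b0, b1)
```

With these numbers the slope is about 4.1 and the intercept about 47.7, so each extra unit of the predictor is associated with roughly 4.1 more units of the outcome.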

What value is used to establish the significance of b1?

The t value, which measures how many standard errors the estimate of b1 is from 0. The further it is from 0, the stronger the evidence against the null hypothesis.
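
The t value for a slope is the estimate divided by its standard error. The following is a sketch for the one-predictor case, using the same kind of made-up data as elsewhere; the helper name is hypothetical.

```python
import math

def slope_t_value(x, y):
    """Return b1 / SE(b1) for a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (intercept and slope estimated).
    residual_variance = sum(e ** 2 for e in residuals) / (n - 2)
    se_b1 = math.sqrt(residual_variance / sxx)
    return b1 / se_b1

# Hypothetical data, for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]
t_value = slope_t_value(hours, score)
print(t_value)
```

The resulting t value is compared against a t-distribution with n - 2 degrees of freedom to obtain a p-value.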

How is model fit evaluated?

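R2 and adjusted R2

R2 lies between 0 and 1: near 0 means the model explains little of the variance in the outcome, near 1 means a good fit. Adjusted R2 penalises additional predictors and can fall slightly below 0.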
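
Both quantities can be computed from observed and predicted values. A minimal sketch, assuming made-up observed scores and the predictions of a one-predictor model; the function name is hypothetical.

```python
def r_squared(observed, predicted, n_predictors):
    """Return (R2, adjusted R2) from observed and model-predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    r2 = 1 - ss_residual / ss_total
    # Adjusted R2 penalises the number of predictors in the model.
    adjusted = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adjusted

# Hypothetical observed values and model predictions, for illustration only.
observed = [52.0, 55.0, 61.0, 64.0, 68.0]
predicted = [51.8, 55.9, 60.0, 64.1, 68.2]
r2, adjusted = r_squared(observed, predicted, n_predictors=1)
print(r2, adjusted)
```

Adjusted R2 is never larger than R2, and the gap widens as predictors are added without improving the fit.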

What is the F stat?

The statistic that indicates whether there is a relationship between the outcome and the predictor(s). The further F is above 1, the stronger the evidence of a relationship.

What are outliers?

A value in the data that does not follow the trend

How can outliers be detected in the GLM?

Graphs
Standardised residuals (an absolute value greater than 3 suggests an outlier)
Cook's distance (a value greater than 1 suggests an influential case)
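
The two numeric checks can be sketched for a one-predictor model as follows. Assumptions: the data are synthetic with one case deliberately pushed off the line, the standardisation is the simple version (residual divided by the residual standard deviation, ignoring leverage), and the function name is hypothetical.

```python
import math

def outlier_diagnostics(x, y):
    """Return (standardised residuals, Cook's distances) for a one-predictor fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    n_params = 2  # intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - n_params)
    # Simplified standardisation: residual divided by the residual standard deviation.
    standardised = [e / math.sqrt(mse) for e in residuals]
    # Leverage of each case in simple regression.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    cooks = [
        (e ** 2 / (n_params * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return standardised, cooks

# Synthetic data: a near-linear trend with one case pushed far off the line.
x = [float(i) for i in range(1, 31)]
y = [2.0 * xi + 1.0 + ((i * 7) % 5 - 2) * 0.5 for i, xi in enumerate(x)]
y[14] += 15.0

standardised, cooks = outlier_diagnostics(x, y)
flagged = [i for i, z in enumerate(standardised) if abs(z) > 3]
influential = [i for i, d in enumerate(cooks) if d > 1]
print(flagged, influential)
```

A case can have a large standardised residual without a large Cook's distance, because Cook's distance also depends on leverage, i.e. how far the case's predictor value is from the mean.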

If outliers are present, what should be done?

A robust estimation method should be used instead of ordinary least squares (OLS), as robust methods are more resistant to the influence of outliers

What are the assumptions of the linear model?

Linearity and additivity
Normally distributed errors
Independent errors
Homoscedastic errors

What are the differences between errors and residuals?

Errors are the differences between observed values and the values predicted by the population model; they cannot be observed
Residuals are the differences between observed values and the values predicted by the model fitted to the sample

What are independent errors?

Errors in one prediction that are unrelated to errors in another

What are homoscedastic errors?

Variance of residuals should be consistent at different levels of the predictor variable.

What is heteroscedasticity?

The variance of the residuals changes across levels of the predictor, often producing a funnel shape in a plot of residuals

What should be done if the assumption of normal distribution is not met?

Normality can often be ignored as long as the sample size is big enough, due to the central limit theorem (a common rule of thumb is at least 30 observations). If the sample size is small, bootstrapping can be used.
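
Bootstrapping resamples the observed data with replacement to approximate the sampling distribution. The following is a minimal sketch of a percentile bootstrap interval for the mean; the data, the helper name, and the choice of the percentile method are assumptions for illustration, not the only way to bootstrap.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical small, skewed sample, for illustration only.
reaction_times = [310, 295, 402, 288, 350, 720, 305, 330, 298, 415, 301, 340]
lower, upper = bootstrap_mean_ci(reaction_times)
print(lower, upper)
```

Because the interval comes from the resampled means themselves, it does not rely on the sampling distribution being normal.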

Discovering Statistics Flashcards

(28 cards)