(1) Basic Statistical Concepts Flashcards

1
Q

Normalization

A

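transforms data so that it more closely follows a normal distribution (ex: a log transform); the term is also used for rescaling values to a common range such as 0 to 1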

2
Q

Standardization

A

dividing a value by something to remove its effect

Ex: dividing a count by area or population size
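
As a rough illustration of the "divide by population size" idea, the sketch below turns raw counts into rates per 1,000 people. The area names and numbers are made up for the example.

```python
# Hypothetical counts and populations for three areas (made-up numbers).
cases = {"A": 50, "B": 50, "C": 200}
population = {"A": 1_000, "B": 10_000, "C": 40_000}

# Dividing by population removes the effect of population size,
# giving a rate per 1,000 people that can be compared across areas.
rate_per_1000 = {
    area: cases[area] / population[area] * 1_000 for area in cases
}

for area, rate in rate_per_1000.items():
    print(area, round(rate, 2))
```

Area C has the most cases, but once population is divided out its rate matches area B and is far below area A.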

3
Q

QQ/quantile plot

A

Visualization to see if data is normally distributed

negative skew = points curve beneath the line
positive skew = points curve above the line
normal = points fall on the line
(assuming the data quantiles are on the y-axis)
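
A minimal sketch of how the points of a normal QQ plot could be computed, using only the Python standard library. The function name, the sample data, and the plotting positions `(i + 0.5) / n` are assumptions chosen for illustration; other conventions exist.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    ordered = sorted(data)
    n = len(ordered)
    # Normal distribution with the same mean and standard deviation as the data.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n are one common convention (assumed here).
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]  # made-up, positively skewed data
points = normal_qq_points(sample)

for theoretical, observed in points:
    side = "above" if observed > theoretical else "below"
    print(f"{theoretical:6.2f} {observed:3d}  ({side} the line y = x)")
```

With this skewed sample, the smallest and largest observations both sit above the reference line, which is the upward curve associated with positive skew.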

4
Q

r or coefficient of correlation

A

looks at whether 2 variables vary together

5
Q

Range for correlation coefficient and what is positive/negative/0?

A

-1 to 1
positive = both variables go up together
negative = one goes up as the other goes down
0 = no linear association
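
The three cases can be checked with a small hand-written Pearson correlation. This is a sketch with made-up data; `pearson_r` is a name chosen for the example.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_none = pearson_r(x, [2, 1, 3, 1, 2])   # no linear trend

print(round(r_up, 3), round(r_down, 3), round(r_none, 3))
```

The first pair gives r close to 1, the second close to -1, and the third close to 0.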

6
Q

Standard deviation

A

measures how far data values are from the mean
little variation in values means small standard deviation
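
A quick sketch with two made-up data sets that share a mean but differ in spread. `pstdev` is the population standard deviation from Python's `statistics` module; `stdev` would give the sample version.

```python
from statistics import mean, pstdev

tight = [9, 10, 10, 10, 11]   # values close to the mean
spread = [1, 5, 10, 15, 19]   # values far from the mean

print(mean(tight), round(pstdev(tight), 2))
print(mean(spread), round(pstdev(spread), 2))
```

Both lists have a mean of 10, but the second has a much larger standard deviation because its values lie farther from that mean.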

7
Q

Analysis of variance (ANOVA)

A

Parametric test to see if there are significant differences among the means of 3+ groups (groups defined by a categorical variable)
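
The sketch below computes the one-way ANOVA F statistic by hand for three made-up groups. It stops at F; turning F into a p value needs the F distribution, which is not in the Python standard library. The function name is chosen for the example.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up values for 3 groups
f_stat = anova_f(groups)
print(round(f_stat, 2))
```

A large F means the group means differ by more than the variation inside the groups would suggest.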

8
Q

Covariance

A

Measure of how 2 variables vary together; dividing it by the product of the 2 standard deviations gives the correlation coefficient (r)
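
A small sketch of how covariance and r relate, using made-up height and weight values. Covariance changes when the units change; r does not. The function names are chosen for the example.

```python
from statistics import mean, pstdev

def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """r = covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (pstdev(xs) * pstdev(ys))

height_m = [1.5, 1.6, 1.7, 1.8, 1.9]    # made-up data
weight_kg = [50, 58, 63, 72, 80]
height_cm = [h * 100 for h in height_m]  # same heights, different units

print(round(covariance(height_m, weight_kg), 2), round(correlation(height_m, weight_kg), 4))
print(round(covariance(height_cm, weight_kg), 2), round(correlation(height_cm, weight_kg), 4))
```

Switching from metres to centimetres multiplies the covariance by 100 but leaves r unchanged, which is why r is easier to compare across data sets.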

9
Q

Kernel density (3 facts about it)

A
  1. removes statistical noise from data by smoothing it
  2. Uses Gaussian weighting (closer points = more weight)
  3. good for showing generalized densities of points
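
The three facts above can be seen in a one-dimensional Gaussian kernel density estimate. This is a sketch only; the points and the bandwidth of 0.5 are made up, and `gaussian_kde` is a name chosen for the example.

```python
from math import exp, pi, sqrt

def gaussian_kde(points, bandwidth):
    """Return a function that estimates density at x from 1-D points."""
    n = len(points)

    def density(x):
        total = 0.0
        for p in points:
            u = (x - p) / bandwidth
            # Gaussian weight: points closer to x contribute more.
            total += exp(-0.5 * u * u) / sqrt(2 * pi)
        return total / (n * bandwidth)

    return density

points = [1.0, 1.2, 1.3, 5.0]  # made-up locations: a cluster and one outlier
density = gaussian_kde(points, bandwidth=0.5)

for x in (1.2, 3.0, 5.0):
    print(x, round(density(x), 3))
```

The estimate is highest inside the cluster, lower at the isolated point, and near zero in the empty gap between them. A larger bandwidth would smooth the surface more.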
10
Q

p value (3 facts)

A
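  1. doesn’t tell you the size of a difference, only how strong the evidence is that there is one
  2. indicates whether a result is statistically significant (compared with a chosen cutoff such as 0.05)
  3. used to decide whether or not to reject the null hypothesis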
11
Q

How to use a p value in a sentence to explain random chance and null hypothesis (hint: %)

A
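  1. If the null hypothesis were true, there would be a ___% chance of seeing results at least this extreme by random chance
  2. Rejecting the null hypothesis at this level means accepting a ___% risk of falsely rejecting it when it is actually true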
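
One way to see what the percentage refers to is an exact permutation test on two small made-up groups. This sketch swaps in a permutation test for illustration; it is not tied to any particular test in this deck.

```python
from itertools import combinations
from statistics import mean

group_a = [12, 14, 15, 16]  # made-up measurements
group_b = [8, 9, 10, 11]
observed = mean(group_a) - mean(group_b)

pooled = group_a + group_b
n_a = len(group_a)
extreme = 0
total = 0
# Under the null hypothesis the group labels are arbitrary, so every way of
# choosing which 4 values are called "group A" is equally likely.
for idx in combinations(range(len(pooled)), n_a):
    chosen = [pooled[i] for i in idx]
    rest = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = mean(chosen) - mean(rest)
    # Count relabelings at least as extreme as what was actually observed.
    if abs(diff) >= abs(observed):
        extreme += 1
    total += 1

p_value = extreme / total
print(f"{extreme} of {total} relabelings are this extreme: p = {p_value:.3f}")
```

The p value here is the share of equally likely relabelings that produce a difference at least as large as the observed one, assuming the null hypothesis is true.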
12
Q

histogram

A

x-axis = bins (ranges of values)
y-axis = frequency in that bin

way to visualize frequency/distribution of data
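
A minimal text histogram, assuming fixed-width bins. The ages and the bin width of 10 are made up, and `histogram` is a name chosen for the example.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall in each bin; keys are the bins' left edges."""
    counts = {}
    for v in values:
        left_edge = start + ((v - start) // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

ages = [3, 7, 12, 15, 18, 21, 22, 29, 34, 38]  # made-up data
counts = histogram(ages, bin_width=10)

for left_edge, count in counts.items():
    # One '#' per observation: bar length plays the role of the y-axis.
    print(f"{left_edge:2d}-{left_edge + 9:2d} | {'#' * count}")
```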

13
Q

Z score meaning

A

Number of standard deviations away from the mean

14
Q

Z score formula

A

(score - mean) / standard deviation
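
The formula applied to a short list of made-up scores. The sketch assumes the population standard deviation (`pstdev`); use `stdev` instead when the data are a sample.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert each value to its number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD (assumption); stdev() for a sample
    return [(v - mu) / sigma for v in values]

scores = [60, 70, 80, 90, 100]  # made-up test scores
z = z_scores(scores)
print([round(value, 2) for value in z])
```

The middle score equals the mean, so its z score is 0; scores below the mean get negative z scores and scores above it get positive ones.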

15
Q

Coefficient of determination (r-squared)

A

High = good fit
Low = poor fit
How much of the variance in y is described by variance in x
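
A sketch of r-squared for a simple straight-line fit, computed as 1 minus the ratio of unexplained to total variation. The data are made up and the helper names are chosen for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys):
    """Share of the variation in y that the fitted line accounts for."""
    slope, intercept = fit_line(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # made-up data
r2 = r_squared(x, y)
print(round(r2, 2))
```

For these numbers the line accounts for about 60% of the variation in y.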

16
Q

Sentence using coefficient of determination

A

Variable x explains 80% of the variation in variable y

17
Q

Kruskal-Wallis test

A

tests whether more than 2 populations are similar (come from the same distribution)

non-parametric version of ANOVA

18
Q

Central limit theorem

A

The distribution of sample means approaches a normal distribution as sample size increases, even when the underlying population is not normal
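
A small simulation sketch of the idea. It draws repeated samples from a strongly skewed (exponential) population and measures the skewness of the sample means. The population, sample sizes, number of repeats, and seed are all choices made for this example.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

def skewness(values):
    """Population skewness: 0 for a symmetric shape such as the normal curve."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples from a skewed (exponential) population."""
    return [
        mean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

means_small = sample_means(2)
means_large = sample_means(50)
print("skewness of sample means, n = 2: ", round(skewness(means_small), 2))
print("skewness of sample means, n = 50:", round(skewness(means_large), 2))
```

The sample means from the larger samples are much less skewed, which is the move toward a normal shape that the theorem describes.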

19
Q

Mann-Whitney U

A

compares 2 independent samples
non-parametric
scores are ranked from small to large and then ranks of scores are compared
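
The ranking step can be sketched as below. This computes only the U statistics, not a p value, and gives tied scores their average rank. The sample values and the function name are made up for the example.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), using average ranks for tied scores."""
    pooled = sorted(sample_a + sample_b)
    # Map each distinct score to its rank (1 = smallest); ties share the mean rank.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_sum_a = sum(rank_of[v] for v in sample_a)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

u_a, u_b = mann_whitney_u([3, 5, 8], [1, 2, 4, 6])  # made-up scores
print(u_a, u_b)
```

U_a counts the pairs in which a score from the first sample beats a score from the second, so very unequal U values suggest the two samples differ.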

20
Q

Sample mean

A

mean of a sample of the data

21
Q

non-parametric statistics (list tests)

A

does not assume the data follow a Gaussian distribution

Mann-Whitney U, Kruskal-Wallis, Spearman’s Rho

22
Q

Normal (Gaussian) distribution (3 facts)

A
  1. follows a bell curve
  2. uses parametric stats
  3. defined using the mean/standard deviation
23
Q

Normal QQ plot

A

is like a qq/quantile plot but compares the data quantiles against the quantiles of a normal distribution

24
Q

Null hypothesis

A

no significant difference, effect, or relationship in the population

25
Q

Parametric statistics (also list tests)

A

assumes the data follow a Gaussian distribution
2-sample t test, ANOVA, Pearson’s r (correlation)

26
Q

Parsimony

A

Keep it simple and make it clear: prefer the simplest model that adequately explains the data

27
Q

Pearson’s R

A

measures the strength and direction of the linear relationship between 2 variables
Parametric

28
Q

Residual plot

A

plots the residuals from a regression model

If there is an obvious pattern in the residuals, then the model may be a poor fit for the data

29
Q

Residuals

A

vertical distance between an observed point and the best-fit line (observed value minus predicted value)

kind of like error
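
A sketch of computing residuals against a line. The line `2.2 + 0.6 * x` and the data are hypothetical, picked only to keep the arithmetic easy to follow.

```python
# Hypothetical fitted line (assumed for illustration): predicted = 2.2 + 0.6 * x
def predict(x):
    return 2.2 + 0.6 * x

x = [1, 2, 3, 4, 5]
observed = [2, 4, 5, 4, 5]  # made-up observations

# Residual = observed value minus the value the line predicts.
residuals = [y - predict(xi) for xi, y in zip(x, observed)]

for xi, r in zip(x, residuals):
    position = "above" if r > 0 else "below"
    print(f"x={xi}: residual {r:+.1f} (point is {position} the line)")
```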

30
Q

Interpreting residuals (+ and -)

A

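+ = observed value is above the predicted value, so the model is underestimating rates of something

- = observed value is below the predicted value, so the model is overestimating rates of something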
31
Q

Shapiro test

A

null hypothesis = samples come from a normal distribution

32
Q

Spearman’s Rho

A

compares differences between the ranks in 2 data sets

values range from -1 to +1 (same as Pearson’s r)
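
A sketch of the rank-difference calculation, using the shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)). It assumes there are no tied values. The data and function names are made up for the example.

```python
def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from the squared differences between paired ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 200]  # always increasing, but not along a straight line

print(spearman_rho(x, y))
print(spearman_rho(x, y[::-1]))
```

Because only the ranks matter, a relationship that always increases gives a rho of 1 even when it is far from linear, and reversing the order gives -1.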

33
Q

Square of the error

A

the squared difference between an observed value and its expected value; squaring keeps every error positive and weights large errors more heavily
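
A short sketch with made-up observed and expected values, showing the individual squared errors along with their sum and mean.

```python
observed = [3, 5, 2, 8]          # made-up observed values
expected = [2.5, 5.0, 4.0, 7.0]  # made-up expected (predicted) values

# Squaring removes the sign, so errors in opposite directions cannot cancel.
squared_errors = [(o - e) ** 2 for o, e in zip(observed, expected)]
sse = sum(squared_errors)          # sum of squared errors
mse = sse / len(squared_errors)    # mean squared error

print(squared_errors, sse, mse)
```

The third value is off by 2, four times as far as the first value's 0.5, yet its squared error is 16 times larger.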