Stats Flashcards

1
Q

What are bar charts good for?

A

Counts and proportions.

E.g. left/right/ambidextrous.

2
Q

What are box plots good for?

A

Multiple numerical samples in different groups.

E.g. speed of spiders with one and two palps.

3
Q

What are histograms good for?

A

Counts of numerical observations.

E.g. frequency/forest density.

4
Q

What are scatter graphs good for?

A

Showing the relationship between 2 numerical variables.

5
Q

What are maps good for?

A

Showing geographical relationships.

6
Q

What are coxcomb/rose/polar area charts good for?

A

Showing cyclical changes in the frequency of categorical variables.
E.g. causes of death for each month.

7
Q

What is the difference between accuracy and precision?

A
Accuracy = sample values are close to the actual value.
Precision = sample values are tightly grouped and highly repeatable.
8
Q

What are summary statistics?

A
Describe averages (central tendency):
Mean.
Median.
Mode.
Describe data (proportions):
Counts.
Percentages.
Describe variation:
Variance and SD (the distribution around the mean).
SE and CI (the accuracy of the estimate of the mean).
9
Q

What are confounding variables?

A

Unmeasured variables that change in tandem with one or more of the measured variables, giving the appearance of a causal relationship (spurious).

10
Q

What is standard deviation (S)?

A

A measure of spread around the mean.

Square root of variance.

11
Q

What is variance (S^2)?

A

The sum of squared deviations from the mean, divided by n - 1 (the df).

12
Q

How is df calculated?

A

Sample size - number of parameters.

13
Q

What is the standard error (SE)?

A

A measure of how precise the estimate of the mean is.

SE = S / √n.

14
Q

What is the 95% confidence interval (CI) for the mean?

A

The range of values around the estimated mean which is likely to contain the true population mean.

Upper 95% CI = sample mean + 1.96 × SE.
Lower 95% CI = sample mean - 1.96 × SE.
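
A minimal numeric sketch of the SE and CI formulas above (hypothetical values; NumPy and SciPy assumed, not part of the original cards):

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 3.8, 5.0, 4.6, 4.2, 3.9, 4.8, 4.4])  # hypothetical sample

mean = x.mean()
sd = x.std(ddof=1)           # sample SD (n - 1 in the denominator)
se = sd / np.sqrt(len(x))    # SE = S / sqrt(n); same as stats.sem(x)

lower = mean - 1.96 * se     # approximate 95% CI limits
upper = mean + 1.96 * se
print(mean, se, (lower, upper))
```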

15
Q

When do the conversion equations between variance, SD, SE, and CI not apply?

A

When data isn’t normally distributed.

16
Q

Which tests assume normally distributed data?

A
T test.
F test.
ANOVA.
Pearson's correlation.
Linear regression.
17
Q

What are negative and positive skew?

A

Negative skew: the long tail is on the left (the graph leans to the right). Positive skew: the long tail is on the right (the graph leans to the left).

18
Q

Which plots show normal distribution?

A

Histogram and quantile-quantile plots.

19
Q

What does p<0.05 mean?

A

If the null hypothesis were true, data this extreme would occur less than 5% of the time, so we reject the null hypothesis.

20
Q

What is the null hypothesis of the Shapiro-Wilk test?

A

That the data are normally distributed.
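
A rough sketch of running this test (hypothetical data; scipy.stats.shapiro assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=10, scale=2, size=30)   # hypothetical sample

stat, p = stats.shapiro(x)
print(stat, p)   # large p: no evidence against the H0 of normality
```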

21
Q

What is the null hypothesis of the T test?

A

The means of the 2 groups are the same.

22
Q

What are the null hypothesis and assumptions of a one sample T test?

A

H0: the mean of the sample is equal to the population mean.
Assumes:
Data are a random sample.
Data are normally distributed.
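
A minimal sketch with made-up numbers (SciPy assumed), also showing the t statistic computed by hand as (sample mean - population mean) / SE:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0])  # hypothetical sample
mu0 = 5.0                                           # hypothesised population mean

t, p = stats.ttest_1samp(x, popmean=mu0)

# Same t statistic by hand: (sample mean - population mean) / sample SE
t_manual = (x.mean() - mu0) / stats.sem(x)
print(t, t_manual, p)
```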

23
Q

What is the t statistic?

A

(Sample mean - population mean) / sample SE.

24
Q

When are samples independent?

A

When the probability of one is unrelated to the probability of the other.

25
Q

How do you eliminate experimenter bias?

A

Randomly assign experimental units to treatments.

Blind procedures.

26
Q

What is pseudoreplication?

A

Looks like replication but units are not independent.

27
Q

What is blocking?

A

The grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned.

28
Q

What is matching?

A

When every individual in the treatment group is paired with a control individual with the same/similar values for suspected confounding variables, e.g. twin studies.

29
Q

What are the null hypothesis and assumptions of Welch’s T test?

A
H0:
The means of the two groups are equal.
Assumes:
Random samples.
Normally distributed.
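
A minimal sketch (hypothetical groups; SciPy assumed); equal_var=False is what makes scipy.stats.ttest_ind a Welch's test:

```python
import numpy as np
from scipy import stats

a = np.array([12.1, 11.8, 13.0, 12.6, 12.2])          # hypothetical group A
b = np.array([10.9, 11.5, 10.4, 11.1, 10.8, 11.3])    # hypothetical group B

# equal_var=False: Welch's test (no equal-variance assumption)
t, p = stats.ttest_ind(a, b, equal_var=False)
print(t, p)
```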
30
Q

What are type 1 errors?

A

False positives - the proportion of tests where we reject the null hypothesis when it’s true; the rate is set by the significance level (0.05).

31
Q

What is power?

A
The proportion of tests where we reject the null hypothesis when it's false (0.8).
Depends on:
Sample size.
Effect size.
Variation in data.
32
Q

What are type 2 errors?

A

The proportion of tests where we fail to reject the null hypothesis when it’s false (0.2).

33
Q

What are the null hypothesis and assumptions of a paired T test?

A
H0:
The mean difference between pairs is equal to a specified value.
Assumes:
Random sample.
Differences are normally distributed.
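
A minimal sketch (hypothetical paired data; SciPy assumed); the paired test is equivalent to a one-sample test on the differences:

```python
import numpy as np
from scipy import stats

before = np.array([7.2, 6.8, 7.5, 7.0, 6.9, 7.3])   # hypothetical paired data
after = np.array([6.9, 6.5, 7.4, 6.7, 6.6, 7.1])

t, p = stats.ttest_rel(before, after)
# Equivalent to a one-sample t test on the differences against 0:
t2, p2 = stats.ttest_1samp(before - after, popmean=0)
print(t, p, t2, p2)
```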
34
Q

What are non-parametric tests?

A

Used when data isn’t normally distributed.

Fewer assumptions but less power.

35
Q

What is the sign test?

A

Non-parametric alternative to one sample and paired T tests.
Identical to binomial test (hits = positive differences, trials = sample size).
H0:
The median difference between pairs is 0 (probability of success is 0.5).
Assumes:
Random sample.
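
A rough sketch of the binomial-test equivalence (hypothetical pairs; a recent SciPy with scipy.stats.binomtest assumed):

```python
import numpy as np
from scipy import stats

before = np.array([3.1, 2.8, 3.5, 3.0, 2.9, 3.3, 3.2, 2.7])  # hypothetical pairs
after = np.array([2.9, 2.9, 3.1, 2.6, 2.8, 3.0, 3.1, 2.5])

diff = after - before
hits = int((diff > 0).sum())       # positive differences = "successes"
trials = int((diff != 0).sum())    # zero differences are usually dropped

res = stats.binomtest(hits, n=trials, p=0.5)
print(res.pvalue)
```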

36
Q

What is the Mann-Whitney U test/Wilcoxon rank-sum test?

A

Non-parametric alternative to Welch’s T test.
H0:
The distributions of the 2 samples are the same.
Assumes:
Random sample.
Not a test of differences between medians or means.
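
A minimal sketch (hypothetical groups; SciPy assumed):

```python
import numpy as np
from scipy import stats

a = np.array([1.2, 3.4, 2.2, 4.1, 2.8, 3.0])   # hypothetical group A
b = np.array([2.5, 4.8, 3.9, 5.2, 4.4])        # hypothetical group B

u, p = stats.mannwhitneyu(a, b)   # two-sided by default in recent SciPy
print(u, p)
```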

37
Q

What is the Wilcoxon signed rank test?

A

H0:
The distributions of the 2 samples are the same.
Assumes:
Random sample.
The distribution is symmetrical around the median.
Not a test of differences between medians/means.
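
A minimal sketch (hypothetical pairs; SciPy assumed):

```python
import numpy as np
from scipy import stats

before = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.8, 10.1])  # hypothetical pairs
after = np.array([9.7, 9.9, 10.4, 10.0, 9.5, 10.2, 9.8])

w, p = stats.wilcoxon(before, after)   # same as stats.wilcoxon(before - after)
print(w, p)
```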

38
Q

What is ANOVA?

A
Compares the means of 3 or more groups.
H0: there is no difference between the means.
Assumes:
Random samples.
Normally distributed.
Equal variance.
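
A minimal sketch (hypothetical groups; SciPy assumed):

```python
import numpy as np
from scipy import stats

g1 = np.array([5.1, 4.9, 5.4, 5.0])   # hypothetical groups
g2 = np.array([5.8, 6.1, 5.9, 6.3])
g3 = np.array([4.6, 4.4, 4.8, 4.5])

f, p = stats.f_oneway(g1, g2, g3)   # F = MS(groups) / MS(error)
print(f, p)
```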
39
Q

What is the Kruskal-Wallis rank sum test?

A

Non-parametric alternative to ANOVA.
H0: the distributions of the samples are the same.
Assumes:
Random samples.

Can test differences between medians/means if distributions are the same.
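
A minimal sketch (hypothetical groups; SciPy assumed):

```python
import numpy as np
from scipy import stats

g1 = np.array([2.1, 2.6, 2.4, 2.9])   # hypothetical groups
g2 = np.array([3.4, 3.8, 3.1, 3.6])
g3 = np.array([2.8, 2.5, 3.0, 2.7])

h, p = stats.kruskal(g1, g2, g3)
print(h, p)
```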

40
Q

What are planned comparisons?

A

Specific comparisons planned during study design, e.g. comparing treatments to a control group (Dunnett’s test).

41
Q

What are unplanned comparisons?

A

Comparisons not planned during study design because no prior reason to compare any particular groups, e.g. comparing all groups to each other - Tukey-Kramer.

42
Q

Why are multiple tests bad?

A

They inflate the false discovery rate: for 1 test it is 0.05, for 2 tests about 0.1, and so on.
So 1 ANOVA is better than 3 T tests.

43
Q

What is a 2 way ANOVA?

A

Assesses the effects of 2 different categorical independent variables, e.g. does mean weight differ with species and sex?
H0: there is no difference between the means.

44
Q

What is an ANCOVA?

A

Assesses the effects of a categorical independent variable and a numerical independent variable, e.g. does time to fall asleep differ with experimental treatment and age?
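
A rough sketch of fitting the two models from the last two cards (hypothetical data frame and column names; pandas and statsmodels assumed):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data frame with the columns used below.
df = pd.DataFrame({
    "weight": [10.2, 11.1, 14.5, 13.9, 9.8, 10.7, 15.2, 14.1],
    "species": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "sex": ["m", "f", "m", "f", "m", "f", "m", "f"],
})

# Two-way ANOVA: two categorical predictors (and their interaction).
two_way = smf.ols("weight ~ C(species) * C(sex)", data=df).fit()
print(anova_lm(two_way, typ=2))

# An ANCOVA would mix a categorical and a numerical predictor, e.g.:
# ancova = smf.ols("sleep_time ~ C(treatment) + age", data=df2).fit()
```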

45
Q

What are the assumptions of a 2 way ANOVA and ANCOVA?

A

Random samples.
Normal distribution of errors (in a 1-way ANOVA, normally distributed samples imply normally distributed errors; this is not the case for other ANOVAs).
Equal variance in all samples.

46
Q

How is goodness of fit measured?

A

R^2:
ANOVA (how much variation in the dependent variable is attributable to the independent variable)
and regressions (how much change in 1 variable is attributable to the other).
Chi-squared:
Categorical data.

47
Q

How do you check if the data meet the assumptions of the model?

A

Plot residuals (errors) against fitted values (means or predicted values).

48
Q

What are balanced and unbalanced designs?

A

Balanced: all treatments have the same number of samples.

Unbalanced: treatments have different numbers of samples.

49
Q

What is the binomial test?

A
H0: the frequency of successes matches the predicted frequency of successes (e.g. 0.5).
Requires:
Number of successes.
Number of trials.
Predicted probability of success.
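
A minimal sketch (hypothetical counts; a recent SciPy with scipy.stats.binomtest assumed):

```python
from scipy import stats

# Hypothetical example: 14 successes in 20 trials, predicted probability 0.5.
res = stats.binomtest(14, n=20, p=0.5)
print(res.pvalue)
```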
50
Q

What is a Pearson’s chi-squared test?

A

Test for association between 2 categorical variables, e.g. sex and species.
H0: there is no association between the variables.
Assumes:
Random samples.
No expected values less than 1.
No more than 20% of expected values less than 5.
*Also chi squared goodness of fit test (are data distributed between categories as predicted?).
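
A minimal sketch (hypothetical contingency table; SciPy assumed); the returned expected counts can be checked against the assumptions above:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table of counts: rows = sex, columns = species.
table = np.array([[12, 30],
                  [25, 18]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)
print(expected)   # check the expected-count assumptions against these values
```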

51
Q

When would you use Fisher’s exact test?

A

When you have a 2x2 table and don’t meet the assumptions of a chi-squared test.
H0: there is no association between 2 variables, each with 2 categories.
Assumes:
Random samples.
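
A minimal sketch (hypothetical 2x2 table with small counts; SciPy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table where the chi-squared assumptions are not met.
table = np.array([[2, 7],
                  [8, 2]])

odds_ratio, p = stats.fisher_exact(table)
print(odds_ratio, p)
```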

52
Q

What is the chi-squared goodness of fit test?

A

H0: individuals are distributed between categories in a specified proportion.
Need categories and expected proportions.
Assumes:
Same as Pearson’s chi-squared.
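
A minimal sketch (hypothetical counts and proportions; SciPy assumed):

```python
import numpy as np
from scipy import stats

observed = np.array([44, 56, 100])            # hypothetical counts per category
expected_props = np.array([0.25, 0.25, 0.5])  # specified proportions under H0
expected = expected_props * observed.sum()

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)
```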

53
Q

What is a correlation coefficient?

A

Measures the correlation between 2 numerical variables.

54
Q

What is Pearson’s correlation coefficient?

A

H0: there is no correlation between the 2 variables.
Need 2 continuous variables (e.g. Tail length and body length).
Assumes:
Random samples.
Measurements have a bivariate normal distribution.
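
A minimal sketch (hypothetical tail/body lengths; SciPy assumed):

```python
import numpy as np
from scipy import stats

tail = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.8])   # hypothetical tail lengths
body = np.array([20.1, 22.8, 19.5, 24.0, 21.6, 23.1])  # hypothetical body lengths

r, p = stats.pearsonr(tail, body)
print(r, p, r**2)   # r**2 is the R^2 used in regression
```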

55
Q

What is bivariate normal distribution?

A
  1. Relationship between x and y is linear.
  2. The points in a scatterplot are circular or elliptical.
  3. Both x and y are normally distributed.
56
Q

What is Pearson’s product-moment correlation coefficient?

A
H0: there is no correlation between the 2 variables.
Need 2 continuous variables.
Assumes:
Random samples.
Bivariate normal distribution.
57
Q

How is R^2 calculated in a regression analysis?

A

The square of Pearson’s correlation coefficient.

58
Q

What is Spearman’s rank correlation?

A

H0: there is no correlation between the ranks of 2 variables.
Need 2 continuous variables.
Assumes:
Random samples.
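
A minimal sketch (hypothetical data; SciPy assumed):

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 2.5, 3.1, 4.8, 5.0, 6.3])   # hypothetical variables
y = np.array([2.0, 2.2, 3.9, 4.1, 6.5, 6.0])

rho, p = stats.spearmanr(x, y)   # correlation of the ranks
print(rho, p)
```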

59
Q

What is the difference between correlation coefficients and linear models?

A

Correlation coefficients:
1. Can’t make predictions.
2. Observational only.
Linear models:
1. Fits a line.
2. X and y are not interchangeable.
3. It is assumed that x determines y in some way.
4. If we know x, we can make a prediction for y.
5. X doesn’t need to be normally distributed.
6. Experimental and observational.

60
Q

What is a linear regression?

A

H0: the slope of the relationship between the 2 variables is 0.
Need 2 continuous variables.
Assumes:
Random samples.
There is a linear relationship between the variables.
Homoscedasticity (variance in errors constant across x values).
Y values are normally distributed for each x value.
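
A minimal sketch (hypothetical data; SciPy assumed); the reported p-value tests H0: slope = 0:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])   # hypothetical response

res = stats.linregress(x, y)
# res.rvalue**2 is the R^2 (square of Pearson's correlation coefficient).
print(res.slope, res.intercept, res.pvalue, res.rvalue**2)
```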

61
Q

What is regression to the mean?

A

When 2 measurements are taken and 1 is extreme (far from the mean), the other will on average be closer to the mean. This movement towards the mean can look like a response to a treatment - the regression fallacy.

62
Q

What is a random sample?

A

Where each member of the population has an independent and equal chance of being selected.

63
Q

What if samples aren’t independent?

A
  1. Use an average of related samples.
  2. Use a mixed effects model (sketch below) - can only be used for ANOVAs/linear models; it splits variables into fixed effects (effects you’re interested in) and random effects (effects that affect the dependent variable, but you don’t care how).
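
A rough sketch of option 2 (hypothetical data and column names; pandas and statsmodels assumed):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: repeated measurements within each site (non-independent).
df = pd.DataFrame({
    "growth": [2.1, 2.4, 3.0, 3.2, 1.8, 2.0, 2.9, 3.1, 2.3, 2.5, 3.3, 3.4],
    "treatment": ["ctl", "ctl", "trt", "trt"] * 3,
    "site": ["s1"] * 4 + ["s2"] * 4 + ["s3"] * 4,
})

# Fixed effect: treatment (what we care about); random effect: site.
model = smf.mixedlm("growth ~ treatment", data=df, groups=df["site"]).fit()
print(model.summary())
```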
64
Q

What is the Poisson distribution?

A

Common for discrete variables - counting things in time/space.
Clumped dispersion: variance > mean.
Random dispersion: variance = mean.
Regular dispersion: variance < mean.

65
Q

What is the ANOVA F statistic?

A

MS(groups) / MS(error): the between-groups mean square divided by the within-groups (error) mean square.