Final: Ch 11-20 Flashcards

1
Q

Numerical Variables from a Single Sample

When is Ȳ normally distributed?

A

whenever:
- Y is normally distributed, OR
- n is large

2
Q

Numerical Variables from a Single Sample

If Ȳ is normally distributed, what can we convert its distribution to?

A

standard normal distribution

3
Q

Numerical Variables from a Single Sample

What does a standard normal distribution do?

A

gives the probability distribution of the standardized difference between a sample mean and the population mean: Z = (Ȳ - µ) / σ_Ȳ

4
Q

Numerical Variables from a Single Sample

What is used to calculate the confidence interval of the mean?

A

t-distribution

5
Q

What does a one-sample t-test do?

A

compares the mean of a random sample from a normal population with the population mean proposed in the null hypothesis

6
Q

What are the hypotheses for a one-sample t-test?

A

H0: mean of the population is µ0
HA: mean of the population is not µ0

7
Q

What are the degrees of freedom for a one-sample t-test?

A

df = n-1
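
A minimal sketch of the one-sample t-test calculation from the cards above, using only the Python standard library. The data and the null value `mu0 = 37.0` are hypothetical, chosen only for illustration; the P-value lookup in the t-distribution is left out.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of H0: population mean = mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)             # estimated standard error of the mean
    t = (mean - mu0) / se
    return t, n - 1                   # df = n - 1

# Hypothetical measurements; mu0 is the mean proposed by the null hypothesis.
temps = [36.8, 37.1, 36.9, 37.3, 36.7, 37.0, 36.6, 37.2]
t_stat, df = one_sample_t(temps, mu0=37.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t would then be compared to the t-distribution with `df` degrees of freedom.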

8
Q

What are the assumptions of a one-sample t-test? (2)

A
  • variable is normally distributed
  • sample is a random sample

9
Q

Tests that compare means have what type of variables?

A

one categorical and one numerical variable

10
Q

Paired vs. 2-sample t-tests

A

paired comparisons: allow us to account for a lot of extraneous variation

  • e.g. before and after treatment
  • e.g. upstream and downstream of a power plant
  • e.g. identical twins: one with treatment, one without
  • e.g. removing earwigs from both ears of a patient: tweezers in one ear, hot oil in the other

2-sample comparisons: sometimes easier to collect data for

11
Q

What are paired comparisons?

A

data from the two groups are paired

  • each member of pair shares much in common with the other, except for the tested categorical variable
  • there is one-to-one correspondence between the individuals in the two groups
  • in each pair, there is one member that has one treatment/group and another who has another treatment/group
12
Q

What do we use to compare two groups in paired comparisons?

A

use mean of the difference between the two members of each pair

13
Q

What is a paired t-test?

A

one sample t-test on the differences

14
Q

What does a paired t-test do?

A

compares mean of the differences to a value given in null hypothesis

for each pair, calculate the difference
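
The idea that a paired t-test is a one-sample t-test on the differences can be sketched as follows. The `before` and `after` values are hypothetical, and the null value for the mean difference is assumed to be 0.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for a paired t-test of H0: mean difference = 0."""
    if len(before) != len(after):
        raise ValueError("each pair needs exactly two measurements")
    # One difference per pair: the pairs, not the individuals, are the data points.
    diffs = [a - b for b, a in zip(before, after)]
    n_pairs = len(diffs)
    mean_d = statistics.fmean(diffs)
    se_d = statistics.stdev(diffs) / math.sqrt(n_pairs)
    return mean_d / se_d, n_pairs - 1

# Hypothetical measurements on the same five individuals.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 12.0, 12.0, 16.0]
t_paired, df_paired = paired_t(before, after)
print(f"t = {t_paired:.3f}, df = {df_paired}")
```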

15
Q

What is the number of data points in a paired t-test?

A

number of pairs – NOT number of individuals

16
Q

What are the degrees of freedom for a paired t-test?

A

df = number of pairs - 1

17
Q

What are the assumptions of a paired t-test?

A
  • pairs are chosen at random
  • differences (NOT individuals) have normal distribution

18
Q

What does a 2-sample t-test do?

A

compares means of numerical variable between two populations

19
Q

What are the degrees of freedom for a 2-sample t-test?

A
df1 = n1 - 1
df2 = n2 - 1
df = df1 + df2 = n1 + n2 - 2
20
Q

What are the assumptions of a 2-sample t-test? (3)

A
  • both samples are random samples
  • both populations have normal distributions
  • variance of both populations is equal
21
Q

What does Welch’s t-test do?

A

compares means of two groups without requiring the assumption of equal variance

22
Q

What is different about the degrees of freedom for Welch’s t-test compared to other tests?

A

degrees of freedom is not necessarily an integer
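
To see why the degrees of freedom need not be an integer, here is a sketch of Welch's t statistic together with the Welch-Satterthwaite approximation for df. The card does not give the formula, so treat this as one standard way of computing it; the two groups are hypothetical.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return (t, df) for Welch's t-test; df uses the Welch-Satterthwaite approximation."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1    # s1^2 / n1
    v2 = statistics.variance(sample2) / n2    # s2^2 / n2
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with clearly unequal variances.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10, 12]
t_w, df_w = welch_t(group1, group2)
print(f"t = {t_w:.3f}, df = {df_w:.3f}")
```

The computed df falls between the smaller of (n1 - 1, n2 - 1) and n1 + n2 - 2, and is usually not a whole number.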

23
Q

Wrong Way to Make Comparison of Two Groups

A

“Group 1 is significantly different from a constant, but Group 2 is not. Therefore Group 1 and Group 2 are different from each other.”

24
Q

What does Levene’s test do?

A

compares variances of two (or more) groups

use R to calculate

25
Q

What does the F test do?

A

most commonly used test to compare variances

26
Q

Why do we usually use Levene’s test instead of F test?

A

F test is very sensitive to its assumption that both distributions are normal

27
Q

What are the 2 tests that compare variances?

A
  • Levene’s test
  • F test

28
Q

What 2 tests can conduct two-sample comparisons?

A

2-sample t-test or Welch’s t-test

30
Q

What do the 2-sample t-test and Welch’s t-test both assume?

A

normally distributed variables

31
Q

What assumption differs between 2-sample t-test and Welch’s t-test?

A
  • 2-sample t-test assumes equal variance
  • Welch’s t-test does NOT assume equal variance

32
Q

What can you use to compare the means of two groups? (2)

A
  • mean of paired differences
  • mean difference between two groups

33
Q

What are the assumptions of all t-tests? (2)

A
  • random sample(s)
  • populations are normally distributed

(for 2-sample t-test only): populations have equal variances

34
Q

What are methods to detect deviations from normality? (4)

A
  • previous data / theory
  • histograms
  • quantile plots
  • Shapiro-Wilk test
35
Q

What does normal data look like in a quantile plot?

A

points form an approximately straight line

36
Q

What is the Shapiro-Wilk Test used for?

A

to test statistically whether a set of data comes from a normal distribution

37
Q

What do you do when assumptions are not true? (5)

A
  • if sample sizes are large, sometimes parametric tests work OK anyway
  • transformations
  • non-parametric tests
  • permutation tests
  • bootstrapping
38
Q

Why do parametric tests on large samples work relatively well even for non-normal data?

A

means of large samples are approximately normally distributed (central limit theorem)

rule of thumb: if n > ~50, then normal approximations may work

39
Q

What parametric test is ideal when the equal-variance assumption is not true?

A

Welch’s t-test

if sample sizes are equal and large, then even a 10x difference in variance is approximately OK – but Welch’s is still better

40
Q

What are data transformations?

A

changes each data point by some simple mathematical formula

then carry out the test on transformed data

41
Q

When is log transformation useful? (3)

A
  • variable is likely to be the result of multiplication or division of various components
  • frequency distribution of data is skewed right
  • variance seems to increase as mean gets larger (in comparisons across groups)
42
Q

What are some other types of transformations? (3)

A
  • arcsine transformation
  • square-root transformation
  • reciprocal transformation
43
Q

What are characteristics of valid transformations? (3)

A
  • require same transformation be applied to each individual
  • have one-to-one correspondence to original values
  • have monotonic relationship with original values (ie. larger values stay larger)
44
Q

What should you consider when choosing transformations? (3)

A
  • must transform each individual in the same way
  • transformed values must still carry biological meaning
  • you CANNOT keep trying transformations until P < 0.05
45
Q

What do non-parametric (“distribution-free”) methods assume?

A

assume less about underlying distributions

46
Q

What do parametric methods assume?

A

assume a distribution or a parameter

47
Q

What are some non-parametric tests? (3)

A
  • sign test
  • RANKS
  • Mann-Whitney U test
48
Q

What does the sign test do?

A

compares data from one sample to a constant

49
Q

How is a sign test conducted?

A
  • for each data point, record whether individual is above (+) or below (–) hypothesized constant
  • use binomial test to compare result to ½
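
The two steps above can be sketched with an exact binomial calculation. Two choices here are assumptions rather than part of the card: values exactly equal to the constant are dropped, and the two-sided P-value is taken as twice the smaller tail. The sample and the hypothesized constant are hypothetical.

```python
from math import comb

def sign_test(sample, constant):
    """Return (plus, minus, two-sided P) for a sign test against `constant`."""
    plus = sum(1 for y in sample if y > constant)
    minus = sum(1 for y in sample if y < constant)
    n = plus + minus                  # values equal to the constant are dropped (assumption)
    k = min(plus, minus)
    # Binomial tail with p = 1/2: Pr[X <= k], then doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

# Hypothetical sample compared with a hypothesized constant of 4.
plus, minus, p_value = sign_test([3, 5, 7, 8, 9, 10, 12, 14], constant=4)
print(plus, minus, round(p_value, 4))
```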
50
Q

Does sign test have high or low power?

A

has very low power, so it is relatively likely to fail to reject a false null hypothesis

51
Q

What does it mean for a test to have high power?

A

the test has a high probability of rejecting a false null hypothesis

more power → more information → higher ability to reject false null hypothesis

52
Q

What is RANKS?

A

used by most non-parametric methods

rank each data point in all samples from lowest to highest – ie. lowest data point gets rank 1, next lowest gets rank 2, …

53
Q

What does the Mann-Whitney U test do?

A

compares central tendencies of two groups using ranks (equivalent to Wilcoxon rank sum test)

54
Q

How is a Mann-Whitney U Test conducted?

A
  1. rank all individuals from both groups together in order (for example, smallest to largest)
  2. sum the ranks for all individuals in each group → R1 and R2
  3. calculate U1: number of times an individual from population 1 has lower rank than an individual from population 2, out of all pairwise comparisons
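
The three steps above can be sketched as follows. The rank-sum formula U1 = n1·n2 + n1(n1 + 1)/2 - R1 and the use of average ranks for ties are standard conventions assumed here, not stated on the card; the data are hypothetical.

```python
def mann_whitney_u(group1, group2):
    """Return (R1, R2, U1, U2) using ranks from smallest (1) to largest."""
    pooled = sorted(group1 + group2)
    # Assign 1-based ranks; tied values share the average of their ranks.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    r1 = sum(rank_of[y] for y in group1)
    r2 = sum(rank_of[y] for y in group2)
    n1, n2 = len(group1), len(group2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return r1, r2, u1, u2

# Hypothetical data with no ties.
g1 = [1.2, 2.5, 3.1]
g2 = [2.9, 4.0, 5.6, 6.3]
r1, r2, u1, u2 = mann_whitney_u(g1, g2)

# Direct pairwise count: how often a group-1 value ranks below a group-2 value.
pairwise = sum(1 for a in g1 for b in g2 if a < b)
print(r1, r2, u1, u2, pairwise)
```

With no ties, the pairwise count matches U1, which is the definition given in step 3.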
55
Q

What are the assumptions of the Mann-Whitney U Test? (2)

A
  • both samples are random samples
  • both populations have the same shape of distribution – only necessary when using Mann-Whitney to compare means

56
Q

What is a permutation test used for?

A

for hypothesis testing on measures of association – can be done for any test of association between two variables

57
Q

How is a permutation test conducted?

A
  1. variable 1 from an individual is paired with variable 2 data from a randomly chosen individual – this is done for all individuals
  2. estimate is made on randomized data
  3. whole process is repeated numerous times – distribution of randomized estimates is null distribution
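
A minimal sketch of the three steps above, assuming a categorical variable with groups "A" and "B", a numerical response, and the difference between group means as the estimate. The data, the number of permutations, and the random seed are all hypothetical choices.

```python
import random
import statistics

def mean_difference(labels, values):
    """Estimate of association: mean of group A minus mean of group B."""
    a = [v for g, v in zip(labels, values) if g == "A"]
    b = [v for g, v in zip(labels, values) if g == "B"]
    return statistics.fmean(a) - statistics.fmean(b)

def permutation_test(labels, values, n_permutations=5000, seed=1):
    rng = random.Random(seed)
    observed = mean_difference(labels, values)
    shuffled = list(values)
    null_estimates = []
    for _ in range(n_permutations):
        # Step 1: re-pair the variables at random (without replacement,
        # so every value is used exactly once).
        rng.shuffle(shuffled)
        # Step 2: compute the estimate on the randomized data.
        null_estimates.append(mean_difference(labels, shuffled))
    # Step 3: the randomized estimates form the null distribution.
    extreme = sum(1 for d in null_estimates if abs(d) >= abs(observed))
    return observed, extreme / n_permutations

labels = ["A"] * 5 + ["B"] * 5
values = [5.1, 4.8, 6.0, 5.5, 5.9, 3.9, 4.2, 4.6, 3.5, 4.4]
observed, p_value = permutation_test(labels, values)
print(f"observed difference = {observed:.2f}, approximate two-sided P = {p_value:.4f}")
```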
58
Q

What does it mean if permutation tests are done without replacement?

A

all data points are used exactly once in each permuted data set

59
Q

What are the goals of experiments? (2)

A
  • eliminate bias
  • reduce sampling error (increase precision and power)

60
Q

What are some design features that reduce bias? (3)

A
  • controls
  • random assignment to treatments
  • blinding
61
Q

What is a control?

A

group which is identical to the experimental treatment in all respects aside from the treatment itself

62
Q

What is random assignment?

A

individuals are randomly assigned to treatments

63
Q

How does random assignment reduce bias?

A

averages out effects of confounding variables

64
Q

What is blinding?

A

preventing the experimenter (or patient) from knowing which treatment is given to whom

65
Q

How do the results of unblinded studies compare to blinded studies?

A

unblinded studies usually find much larger effects (sometimes 3x higher) – shows the bias that results from lack of blinding

66
Q

How can you reduce sampling error?

A

increase signal to noise ratio

if ‘noise’ is smaller, it is easier to detect a given ‘signal’ – can be achieved with smaller s or larger n

67
Q

What are some design features that reduce the effects of sampling error? (4)

A
  • replication
  • balance
  • blocking
  • extreme treatments
68
Q

What is replication?

A

carry out study on multiple independent objects

69
Q

What is balance?

A

nearly equal sample sizes in each treatment

70
Q

What is blocking?

A

grouping of experimental units – within each group, different experimental treatments are applied to different units

71
Q

How do extreme treatments reduce effects of sampling error?

A

stronger treatments can increase the signal-to-noise ratio

72
Q

How does balance reduce effects of sampling error?

A

increases precision

for a given total sample size (n1 + n2), standard error is smallest when n1 = n2

73
Q

How does blocking reduce effects of sampling error?

A

allows extraneous variation to be accounted for – it is therefore easier to see the signal through the remaining noise

74
Q

Blocking

A

75
Q

What does ANOVA (analysis of variance) do?

A

compares means of more than two groups

asks whether any of two or more means is different from any other – is the variance among groups greater than 0?

76
Q

How does ANOVA compare to a t-test?

A

like t-test, but can compare more than two groups

78
Q

What are the hypotheses for ANOVA?

A

H0: all populations have equal means (variance among groups = 0)
HA: at least one population mean is different

79
Q

What is ANOVA with 2 groups mathematically equivalent to?

A

two-tailed 2-sample t-test

80
Q

In ANOVA, under the null hypothesis, why should the sample mean of each group vary?

A

because of sampling error

81
Q

In ANOVA, what is the standard error?

A

standard deviation of sample means (when true mean is constant)

82
Q

In ANOVA, if null hypothesis is not true, what should variance among groups be?

A

variance among groups should be equal to variance due to sampling error plus real variance among population means

if at least one of the groups has a different population mean, we expect the variance among sample means to be larger than can be explained by the standard error (sampling error) alone

83
Q

ANOVA

What is k?

A

number of groups

84
Q

ANOVA

What is MSgroup?

A

group mean square: variation among the group means, MSgroup = SSgroup / (k - 1)

85
Q

ANOVA

What is MSerror?

A

error mean square: variation within groups, MSerror = SSerror / (N - k), where N is the total sample size

86
Q

What is the test statistic for ANOVA?

A

F = MSgroup / MSerror

87
Q

ANOVA

What should F be if null hypothesis is true?

A

approximately 1

88
Q

ANOVA

What is F if null hypothesis is false?

A

F > 1

(but must take into account sampling error – F calculated from data will often be greater than one even when null is true, therefore we must compare F to null distribution)
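
As a worked illustration of the ANOVA quantities on the preceding cards, the sketch below computes the group mean square, the error mean square, and their ratio F for three small groups. The sums-of-squares formulas are the standard single-factor ANOVA definitions, assumed here because the cards do not spell them out; the data are hypothetical.

```python
import statistics

def anova_f(groups):
    """Return (MSgroup, MSerror, F) for a single-factor ANOVA."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total sample size N
    grand_mean = statistics.fmean([y for g in groups for y in g])
    # Variation of group means around the grand mean, weighted by group size.
    ss_group = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individuals around their own group mean.
    ss_error = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
    ms_group = ss_group / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ms_group, ms_error, ms_group / ms_error

# Hypothetical data for k = 3 groups.
ms_group, ms_error, f_ratio = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ms_group, ms_error, f_ratio)
```

The F ratio would then be compared to the F distribution with (k - 1, N - k) degrees of freedom.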

89
Q

What is an ANOVA table?

A

convenient way to keep track of important calculations

scientific papers often report ANOVA results with ANOVA tables

90
Q

What are the assumptions of ANOVA? (3)

A
  • random samples
  • normal distributions for each population
  • equal variances for all populations
91
Q

What is the Kruskal-Wallis Test?

A

non-parametric test similar to a single factor ANOVA

uses ranks of the data points

92
Q

What is a factor?

A

categorical explanatory variable

93
Q

What is multiple-factor ANOVA?

A

ANOVAs can be generalized to look at more than one categorical variable at a time

  • can ask whether each categorical variable affects a numerical variable
  • can ask whether categorical variables interact in affecting the numerical variable
94
Q

Multiple-factor ANOVA Graphs

A

95
Q

ANOVA

What are fixed effects?

A

treatments are chosen by experimenter – not a random subset of all possible treatments

  • things we care about
  • e.g. specific drug treatments, specific diets, season
96
Q

ANOVA

What are random effects?

A

treatments are a random sample from all possible treatments

  • things that can affect response variable, but we don’t care too much about
  • e.g. family, location
97
Q

ANOVA

What is the difference in statistics for fixed or random effects for single-factor ANOVA?

A

no difference

98
Q

What is 2-factor ANOVA?

A

test multiple hypotheses

ie. no difference based on North and South alone

99
Q

Multiple Comparisons

What is the equation for probability of Type I error in N tests?

A

1 - (1-𝛼)^N (assuming the N tests are independent)

e.g. with 𝛼 = 0.05 and 20 tests, probability of at least one Type I error is ~64%

type I error rate for each test = 𝛼
Pr[not making type I error on 1 test | null is true] = 1-𝛼
Pr[not making type I error on 2 tests | null is true] = (1-𝛼)(1-𝛼) = (1-𝛼)^2
Pr[not making type I error on N tests | null is true] = (1-𝛼)^N
Pr[at least one type I error in N tests] = 1 - (1-𝛼)^N
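
The formula can be checked numerically with a few lines of Python. This sketch assumes independent tests and a per-test significance level of 0.05.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error in n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20):
    print(n, round(familywise_error(0.05, n), 3))
```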

100
Q

Multiple Comparisons

What happens to the probability of type I error every time you do a test?

A

probability increases

  • do too many tests → probability gets too high
  • do more tests → will find something that is statistically significant due to chance
101
Q

What is the Bonferroni Correction for multiple comparisons?

A

uses smaller 𝛼 value

𝛼’ = 𝛼 / (number of tests)

102
Q

What does the Tukey-Kramer test do?

A

compares all group means to all other group means to find which groups are different from which others

103
Q

When are Tukey-Kramer tests done?

A

after finding evidence for differences/variation among means with single-factor ANOVA

104
Q

What are the hypotheses for Tukey-Kramer test?

A

H0: 𝜇1 = 𝜇2
H0: 𝜇1 = 𝜇3
H0: 𝜇2 = 𝜇3

etc.

105
Q

What is the probability of making at least one Type I error in Tukey-Kramer test?

A

probability of making at least one Type 1 error throughout the course of testing all pairs of means is no greater than significance level (𝛼)

106
Q

Tukey-Kramer Graph

A

107
Q

Why do we use Tukey-Kramer instead of a series of two-sample t-tests? (3)

A
  • multiple comparisons would cause t-tests to reject too many true null hypotheses
  • Tukey-Kramer adjusts for the number of tests
  • Tukey-Kramer also uses information about variance within groups from all the data, so it has more power than t-test with Bonferroni correction
108
Q

What is the parameter for correlation?

A

⍴ (rho)

value is between -1 and 1

109
Q

What is the estimate for correlation?

A

correlation coefficient (r): describes relationship between two numerical variables

110
Q

What is the coefficient of determination (r^2)?

A

describes proportion of variation in one variable that can be predicted from the other variable

111
Q

What is covariance in relation to variance?

A

variance is a special case of covariance: the covariance of a variable with itself

112
Q

What are the assumptions of correlation tests? (3)

A
  • random sample
  • X is normally distributed with equal variance for all values of Y
  • Y is normally distributed with equal variance for all values of X
113
Q

Correlation

What does it mean if ⍴ = 0?

A
  • r is normally distributed with mean = 0
  • every time sampling distribution is normal, use t when using estimated standard error
  • if ⍴ ≠ 0, there is asymmetry
114
Q

What is Spearman’s Rank correlation?

A

alternative to Pearson’s correlation that does not make so many assumptions

115
Q

Correlation

What is attenuation?

A

estimated correlation will be closer to zero if X or Y is measured with error

116
Q

What does correlation depend on?

A

the range of values sampled: r tends to be smaller when the range of X is restricted

117
Q

Are species independent data points?

A

NO

118
Q

What is a similarity between correlation and regression?

A

both compare two numerical variables

119
Q

What is a difference between correlation and regression?

A

each ask different questions:

  • correlation – symmetrical
  • regression – asymmetrical
120
Q

What does regression do?

A

predicts Y from X (one variable from another)

121
Q

What does linear regression assume? (3)

A
  • random sample
  • for each value of X, Y is normally distributed with the same variance
  • relationship between X and Y can be described by a line
122
Q

Parameters of Linear Regression – graphs

A

123
Q

What is the equation for the estimated regression line?

A

Ŷ = a + bX (a = estimated intercept, b = estimated slope)

124
Q

What is the least squares regression line?

A

line that minimizes the sum of squared residuals

125
Q

What is a residual?

A

residual = observed Y - predicted Y

for every X value, Ŷ (predicted value of Y, by regression line) is value of Y right on the line
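
To connect the regression cards above, here is a small sketch that fits the least squares line, computes the residuals, and derives r^2 and MSresidual from them. The slope formula b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)^2 is the standard least squares result, assumed here since the cards do not state it; the data are hypothetical.

```python
import statistics

def least_squares(xs, ys):
    """Return (a, b): intercept and slope of the least squares line."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar          # the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = least_squares(xs, ys)

predicted = [a + b * x for x in xs]                          # Y-hat for each X
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]   # observed - predicted
ss_residual = sum(r ** 2 for r in residuals)
ss_total = sum((y - statistics.fmean(ys)) ** 2 for y in ys)
r_squared = 1 - ss_residual / ss_total      # fraction of variation in Y explained
ms_residual = ss_residual / (len(xs) - 2)   # residual df = n - 2
print(f"a = {a:.2f}, b = {b:.2f}, r^2 = {r_squared:.4f}")
```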

126
Q

Regression

What does the coefficient of determination (r^2) do?

A

measures the fraction of the variance in Y that is explained by the regression line

127
Q

Regression

What do you need to be cautious about?

A

unwise to extrapolate beyond range of the data

128
Q

What are the hypotheses for regression?

A

H0: 𝛽 = 0
HA: 𝛽 ≠ 0

129
Q

Regression

What are the degrees of freedom for the residual?

A

df = n - 2

130
Q

What are confidence bands?

A

confidence intervals for predictions of mean Y

131
Q

What are prediction intervals?

A

confidence intervals for predictions of individual Y

132
Q

How can non-linear relationships be ‘fixed’ (turned linear)? (3)

A
  • transformations
  • quadratic regression
  • splines
133
Q

What do residual plots do?

A

help assess assumptions

134
Q

What should the residual plot look like?

A
  • the population mean of Y lies on the line at each X, with residuals scattered around it
  • residuals should be centred on 0, with roughly equal numbers of positive and negative values
  • spread of the residuals should be about the same for every value of X
135
Q

Polynomial Regression

Why should you NOT fit a polynomial with too many terms? (3)

A

(sample size should be at least 7x the number of terms)

  • very unlikely that new X would fall on the line
  • tradeoff between fit and prediction error – would fit better with your particular data set, but would have larger prediction error
136
Q

What does logistic regression do?

A

tests for relationship between numerical variable (as the explanatory variable) and binary variable (as the response variable)

e.g. does the dose of a toxin affect probability of survival?
e.g. does the length of a peacock’s tail affect its probability of getting a mate?

137
Q

What is publication bias?

A

papers are more likely to be published if P < 0.05 – causes bias in science reported in literature

138
Q

What are computer-intensive methods for hypothesis testing?

A
  • simulation
  • randomization

139
Q

What are computer-intensive methods for confidence intervals?

A

bootstrap

140
Q

What is simulation?

A

simulates sampling process on computer many times – generates null distribution from estimates done on simulated data

computer assumes null hypothesis is true

141
Q

What is the equation for likelihood?

A

L(hypothesis A | data) = P[data | hypothesis A]

142
Q

What does likelihood NOT care about?

A

other data sets – ONLY cares about the specific data set we have

143
Q

What does likelihood capture?

A

captures level of surprise

prefer models that make data less surprising, and have higher likelihood

144
Q

Does likelihood consider more than one possible hypothesis?

A

yes

145
Q

What is the law of likelihood of a particular data set?

A

supports one hypothesis better than another if likelihood of that hypothesis is higher than likelihood of the other hypothesis

therefore we try to find the hypothesis with maximum likelihood (least surprising data) – all estimates we have learned so far are also maximum likelihood estimates

146
Q

What are the 2 ways to find the maximum likelihood?

A
  • calculus
  • computer calculations

147
Q

How to Find Maximum Likelihood

Calculus

A

e.g. the maximum value of L(p = x) is found at x = ⅜, by setting the derivative of L (or ln L) with respect to x to zero

note that this is the same value we would have gotten by methods we already learned

148
Q

How to Find Maximum Likelihood

Computer Calculations

A
  1. input likelihood formula to computer
  2. plot value of L for each value of x
  3. find largest L
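
The steps above can be sketched as a simple grid search. The binomial likelihood and the data (3 successes in 8 trials) are assumptions chosen to match the ⅜ example on the previous card, and the grid is evaluated rather than plotted.

```python
from math import comb

def binomial_likelihood(p, successes, trials):
    """L(p | data) = Pr[data | p] for binomial data."""
    return comb(trials, successes) * p ** successes * (1 - p) ** (trials - successes)

successes, trials = 3, 8                     # hypothetical data
grid = [i / 1000 for i in range(1001)]       # candidate values of p from 0 to 1
likelihoods = [binomial_likelihood(p, successes, trials) for p in grid]

# Find the grid value with the largest likelihood.
best_l, best_p = max(zip(likelihoods, grid))
print(f"maximum likelihood estimate: p = {best_p}")
```

A finer grid gives more decimal places; the result matches the usual estimate, the sample proportion.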
149
Q

What does hypothesis testing by likelihood do?

A

compares likelihood of maximum likelihood estimate to null hypothesis

use log-likelihood ratio

150
Q

What is the test statistic for hypothesis testing by likelihood?

A

χ^2 = 2 ln( L[maximum likelihood estimate] / L[null hypothesis value] )

151
Q

What are the degrees of freedom for hypothesis testing by likelihood?

A

df = number of parameters fixed by the null hypothesis
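
A sketch of a log-likelihood ratio test for a single proportion, where the null hypothesis fixes one parameter (so df = 1). The data (3 successes in 8 trials) and the null value p = 0.5 are hypothetical. For df = 1, the upper tail of the χ^2 distribution can be written with the complementary error function, which avoids needing a statistics library.

```python
from math import erfc, log, sqrt

def log_likelihood(p, successes, trials):
    """Binomial log-likelihood; the constant comb() term is omitted because it cancels in the ratio."""
    return successes * log(p) + (trials - successes) * log(1 - p)

successes, trials = 3, 8          # hypothetical data
p_hat = successes / trials        # maximum likelihood estimate
p_null = 0.5                      # value fixed by the null hypothesis

# Test statistic: twice the log of the likelihood ratio.
chi2 = 2 * (log_likelihood(p_hat, successes, trials)
            - log_likelihood(p_null, successes, trials))

# Pr[chi-square with df = 1 exceeds chi2] = erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```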

152
Q

When producing a 95% confidence interval for the difference between the means of two groups, under what circumstances can a violation of the assumption of equal standard deviations be ignored?

A

two-sample t-tests and confidence intervals are robust to violations of equal standard deviations as long as:

  • sample sizes of the two groups are roughly equal
  • standard deviations are within a factor of three of one another
153
Q

What is the justification for including extreme doses well outside the range of exposures encountered by people at risk in a dose-response study on animals of the effects of a hazardous substance? What are the problems with this approach?

A
  • extreme doses increase power, and so enhance the probability of detecting an effect
  • however, effects of a large dose might be very different from effects of a smaller, more realistic dose
  • if an effect is detected, then studies of the effects of more realistic doses would be the next step
154
Q

What does randomization do?

A

removes effects of confounding variables

155
Q

What does blinding do?

A

avoids unconscious bias

156
Q

What happens if a study has a poor control?

A

increases possibility of confounding by unmeasured variables

157
Q

What are planned vs. unplanned comparisons?

A

unplanned comparisons – intended to search for differences among all pairs of means

planned comparisons – must be few and identified as crucial in advance of gathering and analyzing the data

158
Q

The largest pairwise difference between means, that between the “medium” and “isolated” treatments, is statistically significant. How is this possible, given that neither of these two means is significantly different from the means of the other two groups?

A

failure to reject a null hypothesis that the difference between a given pair of means is zero does not imply that the means are equal, because power is not necessarily high, especially when the differences are small

if the means of the “medium” and “isolated” treatments differ from one another, then one or both of them must differ from the means from the other two groups, but we don’t know which

159
Q

What quantity would you use to describe the fraction of the variation in expression levels explained by group differences?

A

R^2

160
Q

Earwig density on an island and the proportion of males with forceps are estimates, so the measurements of both variables include sampling error. In light of this fact, would the true correlation between the two variables tend to be larger, smaller, or the same as the measured correlation?

A

sampling error in the estimates of earwig density and the proportion of males with forceps means that true density and proportion on an island are measured with error

measurement error will tend to decrease the estimated correlation

therefore, the actual correlation is expected to be higher on average than the estimated correlation.

161
Q

How do you analyze assumptions of linear regression in scatter plot?

A
  • residuals are symmetric and don’t show any obvious non-normality
  • variance of the residuals does not appear to change greatly for different values of X
162
Q

What is a least squares regression line?

A

minimizes the sum of squared differences between the predicted Y-values on the regression line for each X and the observed Y-values

163
Q

What are residuals?

A

differences between predicted Y-values on the estimated regression line, and the observed Y-values

164
Q

What does the MSresidual measure?

A

variance of the residuals

165
Q

Linear Regression

What does R^2 measure?

A

fraction of the variation in Y that is explained by X

166
Q

The data set depicted in the graph includes one conspicuous outlier on the far right. If you were advising the forensic scientists who gathered these data, how would you suggest they handle the outlier?

A
  • first, check the data to ensure this individual was not entered incorrectly
  • perform the analysis with and without the outlier included in the data set to determine whether it has an influence on the outcome
  • if it has a big influence, then it is probably wise to leave it out and limit predictions to the range of X-values between 0 and about 200 (and urge them to obtain more data at the higher X-values)
167
Q

What do confidence bands measure?

A

give the confidence interval for the predicted Y for a given X

168
Q

Which bands would provide the most relevant measure of uncertainty?

A

prediction interval, because it measures uncertainty when predicting Y of a single individual

169
Q

What is ANCOVA?

A

(analysis of covariance)

compares many slopes

170
Q

What are the hypotheses of ANCOVA?

A

H0: 𝛽1 = 𝛽2 = 𝛽3 = 𝛽4 = 𝛽5… (multiple null hypotheses)

HA: at least one of the slopes is different from another

171
Q

What is bootstrapping?

A

method for estimation (and confidence intervals)

  • often used for hypothesis testing too
  • often used in evolutionary trees
172
Q

What is the method for bootstrapping?

A
  • for each group, randomly pick with replacement an equal number of data points, from data of that group
  • with this bootstrap dataset, calculate bootstrap replicate estimate
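
A minimal sketch of the resampling steps above for a single group, using the sample mean as the estimate. The data, the number of replicates, the seed, and the percentile method for the 95% confidence interval are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_means(sample, n_replicates=2000, seed=1):
    """Return bootstrap replicate estimates of the mean."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_replicates):
        # Resample WITH replacement, keeping the original sample size.
        resample = rng.choices(sample, k=n)
        replicates.append(statistics.fmean(resample))
    return replicates

# Hypothetical sample.
sample = [4.1, 5.0, 3.8, 6.2, 5.5, 4.7, 5.9, 4.4, 5.2, 4.9]
reps = sorted(bootstrap_means(sample))

boot_se = statistics.stdev(reps)              # bootstrap standard error
lower = reps[int(0.025 * len(reps))]          # 2.5th percentile
upper = reps[int(0.975 * len(reps)) - 1]      # 97.5th percentile
print(f"SE = {boot_se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With two or more groups, each group would be resampled separately in the same way before computing the estimate.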
173
Q

Why are paired samples analyzed differently than separate samples?

A

two individuals in a pair share many things in common with each other but differ from members of other pairs

whatever variation these shared properties cause in the response variable is factored out in the difference between them

by looking at the differences, we potentially avoid much of the error variance in the data

separate samples do not share these properties