Final: Ch 11-20 Flashcards
Numerical Variables from a Single Sample
When is Ȳ normally distributed?
whenever:
- Y is normally distributed, OR
- n is large
Numerical Variables from a Single Sample
If Ȳ is normally distributed, what can we convert its distribution to?
standard normal distribution
Numerical Variables from a Single Sample
What does a standard normal distribution do?
describes the probability distribution of the standardized difference between a sample mean and the population mean
Numerical Variables from a Single Sample
What is used to calculate the confidence interval of the mean?
t-distribution
What does a one-sample t-test do?
compares the mean of a random sample from a normal population, with the population mean proposed in a null hypothesis
What are the hypotheses for a one-sample t-test?
H0: mean of the population is µ0
HA: mean of the population is not µ0
What is the degrees of freedom for a one-sample t-test?
df = n-1
What are the assumptions of a one-sample t-test? (2)
- variable is normally distributed
- sample is a random sample
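A minimal R sketch of a one-sample t-test, using made-up numbers and a hypothetical null value of 98.6; the reported confidence interval is the t-based CI of the mean described above:
```r
# hypothetical measurements; H0: population mean is 98.6
y <- c(98.4, 98.6, 97.8, 98.8, 99.0, 98.2, 98.7, 98.3)
t.test(y, mu = 98.6)            # one-sample t-test, df = n - 1
t.test(y, mu = 98.6)$conf.int   # 95% confidence interval of the mean (uses the t-distribution)
```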
Tests that compare means have what type of variables?
one categorical and one numerical variable
Paired vs. 2-sample t-tests
paired comparisons: allow us to account for a lot of extraneous variation
- ie. before and after treatment
- ie. upstream and downstream of power plant
- ie. identical twins – one with treatment, one without treatment
- ie. how to get earwigs in each ear out – compare tweezers to hot oil
2-sample comparisons: sometimes easier to collect data for
What are paired comparisons?
data from the two groups are paired
- each member of pair shares much in common with the other, except for the tested categorical variable
- there is one-to-one correspondence between the individuals in the two groups
- in each pair, there is one member that has one treatment/group and another who has another treatment/group
What do we use to compare two groups in paired comparisons?
use mean of the difference between the two members of each pair
What is a paired t-test?
one sample t-test on the differences
What does a paired t-test do?
compares mean of the differences to a value given in null hypothesis
for each pair, calculate the difference
What is the number of data points in a paired t-test?
number of pairs – NOT number of individuals
What is the degrees of freedom for a paired t-test?
df = number of pairs - 1
What are the assumptions of a paired t-test?
- pairs are chosen at random
- differences (NOT individuals) have normal distribution
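A minimal R sketch of a paired t-test on made-up before/after data (hypothetical values), showing that it is the same as a one-sample t-test on the differences:
```r
before <- c(12.1, 9.8, 11.5, 10.2, 13.0, 9.4)
after  <- c(11.0, 9.1, 10.9, 9.8, 12.2, 9.0)
d <- after - before                    # the normality assumption applies to these differences
t.test(d, mu = 0)                      # one-sample t-test on the differences
t.test(after, before, paired = TRUE)   # equivalent paired t-test, df = number of pairs - 1
```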
What does a 2-sample t-test do?
compares means of numerical variable between two populations
What is the degrees of freedom for a 2-sample t-test?
df = df1 + df2 = (n1 - 1) + (n2 - 1) = n1 + n2 - 2
What are the assumptions of a 2-sample t-test? (3)
- both samples are random samples
- both populations have normal distributions
- variance of both populations is equal
What does Welch’s t-test do?
compares means of two groups without requiring the assumption of equal variance
What is different about the degrees of freedom for Welch’s t-test compared to other tests?
degrees of freedom is not necessarily an integer
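A minimal R sketch contrasting the 2-sample t-test with Welch's t-test on made-up data (hypothetical values); in R, t.test() performs Welch's test by default:
```r
y1 <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3)
y2 <- c(6.0, 5.8, 6.4, 5.9, 6.2, 5.7)
t.test(y1, y2, var.equal = TRUE)   # 2-sample t-test: assumes equal variances, df = n1 + n2 - 2
t.test(y1, y2)                     # Welch's t-test: no equal-variance assumption,
                                   # df is usually not an integer
```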
Wrong Way to Make Comparison of Two Groups
“Group 1 is significantly different from a constant, but Group 2 is not. Therefore Group 1 and Group 2 are different from each other.”
What does Levene’s test do?
compares variances of two (or more) groups
use R to calculate
What does the F test do?
most commonly used test to compare variances
Why do we usually use Levene’s test instead of F test?
F test is very sensitive to its assumption that both distributions are normal
What are the 2 tests that compare variances?
- Levene’s test
- F test
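A minimal R sketch of both variance tests on the same made-up data; var.test() is the F test in base R, and leveneTest() comes from the car package (an assumption about which package the course uses):
```r
y1 <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3)
y2 <- c(6.0, 5.8, 6.4, 5.9, 6.2, 5.7)
var.test(y1, y2)                        # F test: ratio of the two sample variances
library(car)                            # install.packages("car") if not already installed
y <- c(y1, y2)
group <- factor(rep(c("A", "B"), each = 6))
leveneTest(y ~ group)                   # Levene's test: less sensitive to non-normality
```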
What 2 tests can conduct two-sample comparisons?
2-sample t-test or Welch’s t-test
What does 2-sample t-test and Welch’s t-test both assume?
normally distributed variables
What assumption differs between 2-sample t-test and Welch’s t-test?
- 2- sample t-test assumes equal variance
- Welch’s t-test does NOT assume equal variance
What can you compare the means of two groups using? (2)
- mean of paired differences
- mean difference between two groups
What are the assumptions of all t-tests? (2)
- random sample(s)
- populations are normally distributed
(for 2-sample t-test only): populations have equal variances
What are methods to detect deviations from normality? (4)
- previous data / theory
- histograms
- quantile plots
- Shapiro-Wilk test
What does normal data look like in a quantile plot?
points form an approximately straight line
What is the Shapiro-Wilk Test used for?
to test statistically whether a set of data comes from a normal distribution
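A minimal R sketch of the graphical and statistical normality checks, using stand-in data drawn from a normal distribution:
```r
set.seed(1)
y <- rnorm(40, mean = 10, sd = 2)   # stand-in data
hist(y)                             # histogram: look for strong skew or outliers
qqnorm(y); qqline(y)                # quantile plot: points near a straight line if normal
shapiro.test(y)                     # Shapiro-Wilk test of normality
```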
What do you do when assumptions are not true? (5)
- if sample sizes are large, sometimes parametric tests work OK anyway
- transformations
- non-parametric tests
- permutation tests
- bootstrapping
Why do parametric tests on large samples work relatively well even for non-normal data?
means of large samples are normally distributed
rule of thumb: if n > ~50, then normal approximations may work
What parametric test is ideal when assumptions are not true?
Welch’s t-test
if sample sizes are equal and large, then even a 10x difference in variance is approximately OK – but Welch’s is still better
What are data transformations?
changes each data point by some simple mathematical formula
then carry out the test on transformed data
When is log transformation useful? (3)
- variable is likely to be the result of multiplication or division of various components
- frequency distribution of data is skewed right
- variance seems to increase as mean gets larger (in comparisons across groups)
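A minimal R sketch of a log transformation on made-up right-skewed data (hypothetical values and null value), with the test then carried out on the transformed scale:
```r
y <- c(1.2, 1.8, 2.3, 3.1, 4.8, 7.9, 12.4, 20.6)   # right-skewed, made-up values
logy <- log(y)                                      # log transformation
hist(y); hist(logy)                                 # transformed values are more symmetric
t.test(logy, mu = log(5))                           # carry out the test on transformed data
```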
What are some other types of transformations? (3)
- arcsine transformation
- square-root transformation
- reciprocal transformation
What are characteristics of valid transformations? (3)
- require same transformation be applied to each individual
- have one-to-one correspondence to original values
- have monotonic relationship with original values (ie. larger values stay larger)
What should you consider when choosing transformations? (3)
- must transform each individual in the same way
- transformed values must still carry biological meaning
- you CANNOT keep trying transformations until P < 0.05
What do non-parametric (“distribution-free”) methods assume?
assume less about underlying distributions
What do parametric methods assume?
assume a distribution or a parameter
What are some non-parametric tests? (3)
- sign test
- RANKS
- Mann-Whitney U test
What does the sign test do?
compares data from one sample to a constant
How is a sign test conducted?
- for each data point, record whether individual is above (+) or below (–) hypothesized constant
- use binomial test to compare result to ½
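A minimal R sketch of a sign test on made-up data, with a hypothetical constant of 50:
```r
y <- c(53, 61, 48, 57, 66, 51, 49, 60, 58, 55)
above <- sum(y > 50)            # number of '+' signs
n <- sum(y != 50)               # values exactly equal to the constant are dropped
binom.test(above, n, p = 0.5)   # binomial test comparing the proportion of '+' to 1/2
```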
Does sign test have high or low power?
has very low power – therefore it is likely NOT to reject a false null hypothesis
What does it mean for a test to have high power?
more power → more information → higher ability to reject false null hypothesis
What is RANKS?
used by most non-parametric methods
rank each data point in all samples from lowest to highest – ie. lowest data point gets rank 1, next lowest gets rank 2, …
What does the Mann-Whitney U test do?
compares central tendencies of two groups using ranks (equivalent to Wilcoxon rank sum test)
How is a Mann-Whitney U Test conducted?
- rank all individuals from both groups together in order (for example, smallest to largest)
- sum the ranks for all individuals in each group → R1 and R2
- calculate U1: number of times an individual from population 1 has lower rank than an individual from population 2, out of all pairwise comparisons
What are the assumptions of the Mann-Whitney U Test? (2)
- both samples are random samples
- both populations have the same shape of distribution – only necessary when using Mann-Whitney to compare means
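A minimal R sketch of ranking and the Mann-Whitney U test on made-up data; in R the test is run with wilcox.test() (Wilcoxon rank-sum):
```r
group1 <- c(2.3, 3.1, 1.9, 2.8, 3.5)
group2 <- c(4.0, 3.8, 4.4, 3.2, 4.1)
rank(c(group1, group2))        # pooled ranks: lowest value gets rank 1, next gets rank 2, ...
wilcox.test(group1, group2)    # Mann-Whitney U / Wilcoxon rank-sum test
```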
What is a permutation test used for?
for hypothesis testing on measures of association – can be done for any test of association between two variables
How is a permutation test conducted?
- variable 1 from an individual is paired with variable 2 data from a randomly chosen individual – this is done for all individuals
- estimate is made on randomized data
- whole process is repeated numerous times – distribution of randomized estimates is null distribution
What does it mean if permutation tests are done without replacement?
all data points are used exactly once in each permuted data set
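A minimal R sketch of a permutation test for a difference in group means, with made-up data; group labels are shuffled without replacement to build the null distribution:
```r
set.seed(1)
y <- c(5.1, 4.8, 5.6, 5.0, 6.0, 5.8, 6.4, 5.9)
group <- factor(rep(c("A", "B"), each = 4))
obs <- diff(tapply(y, group, mean))     # observed difference in group means
perm <- replicate(10000, {
  g <- sample(group)                    # shuffle labels without replacement
  diff(tapply(y, g, mean))              # estimate recalculated on permuted data
})
mean(abs(perm) >= abs(obs))             # two-sided permutation P-value
```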
What are the goals of experiments? (2)
- eliminate bias
- reduce sampling error (increase precision and power)
What are some design features that reduce bias? (3)
- controls
- random assignment to treatments
- blinding
What is a control?
group which is identical to the experimental treatment in all respects aside from the treatment itself
What is random assignment?
individuals are randomly assigned to treatments
How does random assignment reduce bias?
averages out effects of confounding variables
What is blinding?
preventing the experimenter (or patient) from knowing which treatment is given to whom
How do the results of unblinded studies compare to blinded studies?
unblinded studies usually find much larger effects (sometimes 3x higher) – shows the bias that results from lack of blinding
How can you reduce sampling error?
increase signal to noise ratio
if ‘noise’ is smaller, it is easier to detect a given ‘signal’ – can be achieved with smaller s or larger n
What are some design features that reduce the effects of sampling error? (4)
- replication
- balance
- blocking
- extreme treatments
What is replication?
carry out the study on multiple independent experimental units
What is balance?
nearly equal sample sizes in each treatment
What is blocking?
grouping of experimental units – within each group, different experimental treatments are applied to different units
How do extreme treatments reduce effects of sampling error?
stronger treatments can increase the signal-to-noise ratio
How does balance reduce effects of sampling error?
increases precision
for a given total sample size (n1 + n2), standard error is smallest when n1 = n2
How does blocking reduce effects of sampling error?
allows extraneous variation to be accounted for – it is therefore easier to see the signal through the remaining noise
What does ANOVA (analysis of variance) do?
compares means of more than two groups
asks whether any of two or more means is different from any other – is the variance among groups greater than 0?
How does ANOVA compare to a t-test?
like t-test, but can compare more than two groups
What are the hypotheses for ANOVA?
H0: all populations have equal means (variance among groups = 0)
HA: at least one population mean is different
What is ANOVA with 2 groups mathematically equivalent to?
two-tailed 2-sample t-test
In ANOVA, under the null hypothesis, why should the sample mean of each group vary?
because of sampling error
In ANOVA, what is the standard error?
standard deviation of sample means (when true mean is constant)
In ANOVA, if null hypothesis is not true, what should variance among groups be?
variance among groups should be equal to variance due to sampling error plus real variance among population means
if at least one of the groups has a different population mean, the variance among sample means will be larger than expected from sampling error (the standard error) alone
ANOVA
What is k?
number of groups
ANOVA
What is MSgroup?
mean squares group: measures the variation among the group sample means
ANOVA
What is MSerror?
mean squares error: measures the pooled variation within groups (sampling error)
What is the test statistic for ANOVA?
F = MSgroup / MSerror
ANOVA
What should F be if null hypothesis is true?
1
ANOVA
What is F if null hypothesis is false?
F > 1
(but must take into account sampling error – F calculated from data will often be greater than one even when null is true, therefore we must compare F to null distribution)
What is an ANOVA table?
convenient way to keep track of important calculations
scientific papers often report ANOVA results with ANOVA tables
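A minimal R sketch of a single-factor ANOVA on made-up data; summary() prints the ANOVA table (df, sums of squares, mean squares, F, P):
```r
d <- data.frame(
  y = c(5.1, 4.8, 5.6, 5.0, 6.0, 5.8, 6.4, 5.9, 5.3, 5.5, 5.2, 5.4),
  group = factor(rep(c("A", "B", "C"), each = 4))
)
fit <- aov(y ~ group, data = d)
summary(fit)    # ANOVA table: MSgroup, MSerror, F = MSgroup / MSerror, and the P-value
```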
What are the assumptions of ANOVA? (3)
- random samples
- normal distributions for each population
- equal variances for all populations
What is the Kruskal-Wallis Test?
non-parametric test similar to a single factor ANOVA
uses ranks of the data points
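A minimal R sketch of the Kruskal-Wallis test on made-up data for three groups:
```r
g1 <- c(5.1, 4.8, 5.6, 5.0)
g2 <- c(6.0, 5.8, 6.4, 5.9)
g3 <- c(5.3, 5.5, 5.2, 5.4)
kruskal.test(list(g1, g2, g3))   # rank-based analogue of single-factor ANOVA
```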
What is a factor?
categorical explanatory variable
What is multiple-factor ANOVA?
ANOVAs can be generalized to look at more than one categorical variable at a time
- can ask whether each categorical variable affects a numerical variable
- can ask whether categorical variables interact in affecting the numerical variable
Multiple-factor ANOVA Graphs
ANOVA
What are fixed effects?
treatments are chosen by experimenter – not a random subset of all possible treatments
- things we care about
- ie. specific drug treatments, specific diets, season
ANOVA
What are random effects?
treatments are a random sample from all possible treatments
- things that can affect response variable, but we don’t care too much about
- ie. family, location
ANOVA
What is the difference in statistics for fixed or random effects for single-factor ANOVA?
no difference
What is 2-factor ANOVA?
tests multiple null hypotheses at once: the effect of each factor plus their interaction
ie. whether there is a difference based on North vs. South alone, based on the other factor alone, or based on the combination of the two
Multiple Comparisons
What is the equation for probability of Type I error in N tests?
1 - (1-𝛼)^N
ie. for 20 tests, probability of at least one Type I error is ~65%
type 1 error rate for each test = 𝛼
Pr[not making type I error | null is true] = 1-𝛼
Pr[not making type I error in N tests | null is true] = (1-𝛼)(1-𝛼)⋯(1-𝛼) = (1-𝛼)^N
Pr[at least one type I error] = 1- (1-𝛼)^N
Multiple Comparisons
What happens to the probability of type I error every time you do a test?
probability increases
- do too many tests → probability gets too high
- do more tests → will find something that is statistically significant due to chance
What is the Bonferroni Correction for multiple comparisons?
uses smaller 𝛼 value
𝛼’ = 𝛼 / (number of tests)
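A minimal R sketch of the multiple-comparison calculations above, plus the equivalent adjustment of P-values with p.adjust() (hypothetical P-values):
```r
alpha <- 0.05
N <- 20
1 - (1 - alpha)^N    # probability of at least one Type I error in 20 tests (about 0.64)
alpha / N            # Bonferroni-corrected significance level, alpha'
p.adjust(c(0.001, 0.02, 0.04), method = "bonferroni")   # or adjust the P-values directly
```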
What does the Tukey Kramer test do?
compares all group means to all other group means to find which groups are different from which others
When are Tukey-Kramer tests done?
after finding evidence for differences/variation among means with single-factor ANOVA
What are the hypotheses for Tukey-Kramer test?
H0: 𝜇1 = 𝜇2
H0: 𝜇1 = 𝜇3
H0: 𝜇2 = 𝜇3
etc.
What is the probability of making at least one Type I error in Tukey-Kramer test?
probability of making at least one Type 1 error throughout the course of testing all pairs of means is no greater than significance level (𝛼)
Tukey-Kramer Graph
Why do we use Tukey-Kramer instead of a series of two-sample t-tests? (3)
- multiple comparisons would cause t-tests to reject too many true null hypotheses
- Tukey-Kramer adjusts for the number of tests
- Tukey-Kramer also uses information about variance within groups from all the data, so it has more power than t-test with Bonferroni correction
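A minimal R sketch of a Tukey-Kramer test on made-up data, run on the fitted single-factor ANOVA:
```r
d <- data.frame(
  y = c(5.1, 4.8, 5.6, 5.0, 6.0, 5.8, 6.4, 5.9, 5.3, 5.5, 5.2, 5.4),
  group = factor(rep(c("A", "B", "C"), each = 4))
)
fit <- aov(y ~ group, data = d)
TukeyHSD(fit)   # all pairwise comparisons of group means, adjusted for multiple comparisons
```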
What is the parameter for correlation?
⍴ (rho)
value is between -1 and 1
What is the estimate for correlation?
correlation coefficient (r): describes relationship between two numerical variables
What is the coefficient of determination (r^2)?
describes proportion of variation in one variable that can be predicted from the other variable
What is covariance in relation to variance?
variance is a special case of covariance – the variance of a variable is its covariance with itself: Cov(X, X) = Var(X)
What are the assumptions of correlation tests? (3)
- random sample
- X is normally distributed with equal variance for all values of Y
- Y is normally distributed with equal variance for all values of X
Correlation
What does it mean if ⍴ = 0?
- sampling distribution of r is approximately normal with mean = 0
- because the sampling distribution is normal and the standard error is estimated, use the t-distribution for the test
- if ⍴ ≠ 0, the sampling distribution of r is asymmetric
What is Spearman’s Rank correlation?
alternative to Pearson’s correlation that does not make so many assumptions
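A minimal R sketch of Pearson and Spearman correlation on made-up data:
```r
x <- c(1.2, 2.3, 3.1, 4.0, 5.2, 6.1, 7.3, 8.0)
y <- c(2.0, 2.8, 3.5, 3.9, 5.5, 6.0, 7.8, 8.4)
cor.test(x, y)                        # Pearson correlation: estimates r, tests H0: rho = 0
cor.test(x, y, method = "spearman")   # Spearman rank correlation: fewer assumptions
cor(x, y)^2                           # coefficient of determination, r^2
```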
Correlation
What is attenuation?
estimated correlation will be lower if X or Y are estimated with error
What does correlation depend on?
the range of values sampled – a restricted range of X or Y lowers the measured correlation
Are species independent data points?
NO
What is a similarity between correlation and regression?
both compare two numerical variables
What is a difference between correlation and regression?
each ask different questions:
- correlation – symmetrical
- regression – asymmetrical
What does regression do?
predicts Y from X (one variable from another)
What does linear regression assume? (3)
- random sample
- Y is normally distributed with equal variance for all values of X
- relationship between X and Y can be described by a line
Parameters of Linear Regression – graphs
What is the equation for the estimated regression line?
Y = a + bX
What is the least squares regression line?
the best-fit line – the one that minimizes the sum of squared residuals
What is a residual?
residual = observed Y - predicted Y
for every X value, Ŷ (predicted value of Y, by regression line) is value of Y right on the line
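A minimal R sketch of least-squares regression on made-up data; summary() also gives the slope test (H0: 𝛽 = 0, df = n − 2) and R² described in the next cards:
```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 3.8, 4.2, 5.5, 5.9, 7.1, 7.8)
fit <- lm(y ~ x)    # least-squares line: Y = a + bX
coef(fit)           # a (intercept) and b (slope)
residuals(fit)      # observed Y minus predicted Y-hat
summary(fit)        # slope test of H0: beta = 0, with df = n - 2, and R^2
```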
Regression
What does the coefficient of determination (r^2) do?
measures the proportion of variance in Y explained by the regression line
Regression
What do you need to be cautious about?
unwise to extrapolate beyond range of the data
What are the hypotheses for regression?
H0: 𝛽 = 0
HA: 𝛽 ≠ 0
Regression
What is the degrees of freedom for residual?
df = n - 2
What are confidence bands?
confidence intervals for predictions of mean Y
What are prediction intervals?
confidence intervals for predictions of individual Y
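A minimal R sketch of confidence bands versus prediction intervals from the same made-up regression:
```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 3.8, 4.2, 5.5, 5.9, 7.1, 7.8)
fit <- lm(y ~ x)
newx <- data.frame(x = c(2.5, 5.5))
predict(fit, newx, interval = "confidence")   # confidence bands: mean Y at each X
predict(fit, newx, interval = "prediction")   # prediction intervals: individual Y at each X
```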
How can non-linear relationships be ‘fixed’ (turned linear)? (3)
- transformations
- quadratic regression
- splines
What do residual plots do?
help assess assumptions
What should the residual plot look like?
- the mean of Y should fall on the line for each X, with residuals scattered around it
- residuals should be centred on 0, with roughly equal numbers of positive and negative values
- spread of the residuals should be about the same for every value of X
Polynomial Regression
Why should you NOT fit a polynomial with too many terms? (3)
(sample size should be at least 7x the number of terms)
- very unlikely that new X would fall on the line
- tradeoff between fit and prediction error – would fit better with your particular data set, but would have larger prediction error
What does logistic regression do?
tests for relationship between numerical variable (as the explanatory variable) and binary variable (as the response variable)
ie. does the dose of a toxin affect probability of survival?
ie. does the length of a peacock’s tail affect its probability of getting a mate?
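A minimal R sketch of logistic regression on made-up dose-survival data:
```r
dose     <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
survived <- c(1, 1, 1, 1, 0, 1, 0, 0, 0, 0)   # 1 = survived, 0 = died (made up)
fit <- glm(survived ~ dose, family = binomial)
summary(fit)                                  # does dose affect the probability of survival?
predict(fit, data.frame(dose = 2.5), type = "response")   # predicted survival probability
```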
What is publication bias?
papers are more likely to be published if P < 0.05 – causes bias in science reported in literature
What are computer-intensive methods for hypothesis testing?
- simulation
- randomization
What are computer-intensive methods for confidence intervals?
bootstrap
What is simulation?
simulates sampling process on computer many times – generates null distribution from estimates done on simulated data
computer assumes null hypothesis is true
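A minimal R sketch of simulation, assuming (hypothetically) a null population that is normal with mean 10 and standard deviation 2, and an observed sample mean of 10.6 from n = 20:
```r
set.seed(1)
obs_mean <- 10.6
null <- replicate(10000, mean(rnorm(20, mean = 10, sd = 2)))   # sampling assuming H0 is true
mean(abs(null - 10) >= abs(obs_mean - 10))                     # two-sided simulation P-value
```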
What is the equation for likelihood?
L[hypothesis A | data] = Pr[data | hypothesis A]
What does likelihood NOT care about?
other data sets – ONLY cares about the specific data set we have
What does likelihood capture?
captures level of surprise
prefer models that make data less surprising, and have higher likelihood
Does likelihood consider more than one possible hypothesis?
yes
What is the law of likelihood?
a particular data set supports one hypothesis better than another if the likelihood of that hypothesis is higher than the likelihood of the other hypothesis
therefore we try to find the hypothesis with maximum likelihood (least surprising data) – all estimates we have learned so far are also maximum likelihood estimates
What are the 2 ways to find the maximum likelihood?
- calculus
- computer calculations
How to Find Maximum Likelihood
Calculus
ie. maximum value of L(p=x) is found when x = ⅜
note that this is the same value we would have gotten by methods we already learned
How to Find Maximum Likelihood
Computer Calculations
- input likelihood formula to computer
- plot value of L for each value of x
- find largest L
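A minimal R sketch of the computer approach, assuming the example above was 3 successes out of 8 trials (which is what gives an estimate of ⅜):
```r
p <- seq(0, 1, by = 0.001)
L <- dbinom(3, size = 8, prob = p)   # likelihood of each candidate value of p
plot(p, L, type = "l")               # plot L for each value of p
p[which.max(L)]                      # maximum likelihood estimate: 0.375 = 3/8
```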
What does hypothesis testing by likelihood do?
compares likelihood of maximum likelihood estimate to null hypothesis
use log-likelihood ratio
What is the test statistic for hypothesis testing by likelihood?
χ² = 2 × (log-likelihood ratio) = 2 × [ln L(maximum likelihood estimate) − ln L(null hypothesis)]
What is the degree of freedom for hypothesis testing by likelihood?
df = number of parameters fixed to specify the null hypothesis
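A minimal R sketch of a likelihood ratio test for the same hypothetical binomial example (3 successes in 8 trials), with H0: p = 0.5:
```r
logL_max  <- dbinom(3, 8, 3/8, log = TRUE)   # log-likelihood at the maximum likelihood estimate
logL_null <- dbinom(3, 8, 0.5, log = TRUE)   # log-likelihood under the null hypothesis
G <- 2 * (logL_max - logL_null)              # chi-squared test statistic
pchisq(G, df = 1, lower.tail = FALSE)        # one parameter fixed by the null, so df = 1
```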
When producing a 95% confidence interval for the difference between the means of two groups, under what circumstances can a violation of the assumption of equal standard deviations be ignored?
two-sample t-tests and confidence intervals are robust to violations of the equal standard deviation assumption as long as:
- sample sizes of the two groups are roughly equal
- neither standard deviation is more than about three times the other
What is the justification for including extreme doses well outside the range of exposures encountered by people at risk in a dose-response study on animals of the effects of a hazardous substance? What are the problems with this approach?
- extreme doses increase power, and so enhance the probability of detecting an effect
- however, effects of a large dose might be very different from effects of a smaller, more realistic dose
- if an effect is detected, then studies of the effects of more realistic doses would be the next step
What does randomization do?
removes effects of confounding variables
What does blinding do?
avoids unconscious bias
What happens if a study has a poor control?
increases possibility of confounding by unmeasured variables
What are planned vs. unplanned comparisons?
unplanned comparisons – intended to search for differences among all pairs of means
planned comparisons – must be few and identified as crucial in advance of gathering and analyzing the data
The largest pairwise difference between means, that between the “medium” and “isolated” treatments, is statistically significant. How is this possible, given that neither of these two means is significantly different from the means of the other two groups?
failure to reject a null hypothesis that the difference between a given pair of means is zero does not imply that the means are equal, because power is not necessarily high, especially when the differences are small
if the means of the “medium” and “isolated” treatments differ from one another, then one or both of them must differ from the means from the other two groups, but we don’t know which
What quantity would you use to describe the fraction of the variation in expression levels explained by group differences?
R^2
Earwig density on an island and the proportion of males with forceps are estimates, so the measurements of both variables include sampling error. In light of this fact, would the true correlation between the two variables tend to be larger, smaller, or the same as the measured correlation?
sampling error in the estimates of earwig density and the proportion of males with forceps means that true density and proportion on an island are measured with error
measurement error will tend to decrease the estimated correlation
therefore, the actual correlation is expected to be higher on average than the estimated correlation.
How do you analyze assumptions of linear regression in scatter plot?
- residuals are symmetric and don’t show any obvious non-normality
- variance of the residuals does not appear to change greatly for different values of X
What is a least squares regression line?
minimizes the sum of squared differences between the predicted Y-values on the regression line for each X and the observed Y-values
What are residuals?
differences between predicted Y-values on the estimated regression line, and the observed Y-values
What does the MSresidual measure?
variance of the residuals
Linear Regression
What does R^2 measure?
fraction of the variation in Y that is explained by X
The data set depicted in the graph includes one conspicuous outlier on the far right. If you were advising the forensic scientists who gathered these data, how would you suggest they handle the outlier?
- first, check the data to ensure this individual was not entered incorrectly
- perform the analysis with and without the outlier included in the data set to determine whether it has an influence on the outcome
- if it has a big influence, then it is probably wise to leave it out and limit predictions to the range of X-values between 0 and about 200 (and urge them to obtain more data at the higher X-values)
What do confidence bands measure?
give the confidence interval for the predicted Y for a given X
Which bands would provide the most relevant measure of uncertainty?
prediction interval, because it measures uncertainty when predicting Y of a single individual
What is ANCOVA?
(analysis of covariance)
compares the slopes of regression lines among two or more groups
What are the hypotheses of ANCOVA?
H0: 𝛽1 = 𝛽2 = 𝛽3 = 𝛽4 = 𝛽5… (all slopes are equal)
HA: at least one of the slopes is different from another
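A minimal R sketch of one way to compare slopes in R, fitting a regression with an interaction term on made-up data (the x:group term tests whether the slopes differ):
```r
d <- data.frame(
  x = rep(1:6, 2),
  y = c(2.0, 2.9, 4.1, 5.0, 6.2, 6.8, 1.8, 2.2, 2.9, 3.1, 3.8, 4.2),
  group = factor(rep(c("A", "B"), each = 6))
)
fit <- lm(y ~ x * group, data = d)
anova(fit)   # the x:group row tests the null hypothesis that the slopes are equal
```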
What is bootstrapping?
method for estimation (and confidence intervals)
- often used for hypothesis testing too
- often used in evolutionary trees
What is the method for bootstrapping?
- for each group, randomly pick with replacement an equal number of data points, from data of that group
- with this bootstrap dataset, calculate bootstrap replicate estimate
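A minimal R sketch of a percentile bootstrap confidence interval for a mean, using made-up data:
```r
set.seed(1)
y <- c(5.1, 4.8, 5.6, 5.0, 6.0, 5.8, 6.4, 5.9, 5.3, 5.5)
boot <- replicate(10000, mean(sample(y, replace = TRUE)))   # resample with replacement
quantile(boot, c(0.025, 0.975))                             # bootstrap 95% confidence interval
```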
Why are paired samples analyzed differently than separate samples?
two individuals in a pair share many things in common with each other but differ from members of other pairs
whatever variation these shared features cause in the response variable is factored out by taking the difference within each pair
by looking at the differences, we potentially avoid much of the error variance in the data
separate samples do not share these properties