Stats Flashcards
What are the two kinds of statistics with respect to their use?
1) Descriptive statistics: Measures of central tendency and variability
2) Inferential statistics: Parameter estimation, defining uncertainty, determining reasons for variation.
Bias
Any systematic deviation between sample estimates and a true value
Inference
Drawing a conclusion from a premise.
Premise
A premise is a statement we assume is true (e.g. data and observations).
The two kinds of variability in a study
1) Variability related to the variables we’re investigating.
2) Variability that is not interesting in the context of what we are investigating (noise variability).
What is the purpose of inferential statistics?
1) To discriminate between interesting variation and noise variation.
2) To determine the probability of observing such variability if a scientific mechanism was not operating.
What is an informal way to think of “statistically significant”?
Statistically significant = unlikely to have occurred by chance.
How does statistical analysis fit into the scientific method?
Statistical analysis allows for an objective assessment of evidence in support of or against a hypothesis.
What is a scientific hypothesis?
A scientific hypothesis is a proposed cause and effect relationship between a process and an observation.
Observation = what; Hypothesis = how
What is a statistical hypothesis?
Simply a statement about whether or not there is a pattern of interest in the data.
What are the two types of statistical hypotheses?
- H0 (null hypothesis) = the predictor variable has no effect on the response variable
- HA (alternative hypothesis) = the predictor variable has an effect on the response variable
What are the two kinds of variables in an experiment?
1) Predictor variable (aka independent variable)
2) Response variable (aka dependent variable)
What is µ0 (“mu naught”) in a one-sample study?
µ0 is the hypothesized population mean, i.e. the value specified by the null hypothesis (e.g. 37 °C for normal body temperature in the drug example below).
What is α?
α is a preset probability criterion we use to reject a null hypothesis. It is the accepted chance of incorrectly rejecting a true null hypothesis.
In testing a hypothesis, what is a sample used for?
In testing a hypothesis, we use a sample to estimate characteristics of an underlying population.
The statement “We calculate the probability that H0 is true, given the data” is wrong.
1) Why is this?
2) What is the correct statement?
1) Population parameters are fixed, so either H0 is true or it is not.
2) The correct statement would be “We calculate the probability of observing the data we gathered, given that H0 is true”.
How are H0 and HA formulated in a one-sample test?
- H0: µ = µ0
- HA: µ ≠ µ0
OR
- H0: µ − µ0 = 0
- HA: µ − µ0 ≠ 0
To test a hypothesis we use a test statistic. Broadly, how is a test statistic calculated?
Test statistic = effect (e.g. x̄ − µ0) / error (the standard error of the estimate)
How is a test statistic used for testing a hypothesis?
1) Either comparing the test statistic to a critical value
or
2) calculating a p-value associated with that test statistic
How is a p-value interpreted?
The p-value can be thought of as the probability of observing the data if H0 were true.
In the example of a z statistic, what is z?
z is the number of standard deviations of the sampling distribution (i.e. standard errors) by which the observed mean differs from the population mean.
What does the central limit theorem state?
The CLT states that the distribution of means from a non-normal population will approximate normality as n increases.
How is population variance calculated?
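Population variance: σ² = Σ(xᵢ − µ)² / N, where µ is the population mean and N is the population size.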
How is sample variance calculated?
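Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1), where x̄ is the sample mean and n is the sample size.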
What is standard error and how is it calculated?
The standar error (aka SE, SEM) is the standard deviation of a statistic (in this case mean) and is calculated as:
Noting that we don’t know σ,
How is SE estimated?
We can estimate SE as:
SE ≈ s/√n
This is because the best estimate of the population standard deviation σ is the sample standard deviation s.
What is the relationship between sample size n and variance in the distribution of sample means?
The variance in the distribution of means will decrease as n increases.
How is the t statistic calculated?
t = (x̄ − µ0) / (s/√n), i.e. the difference between the sample mean and the hypothesized mean, divided by the estimated standard error (as in the worked example below).
1) What is the distribution shape difference between z-distribution and t-distribution?
2) What effect does this have on a critical value?
1) The t-distribution has more area in its tails and a lower, flatter peak than the z-distribution.
2) A t-critical value is therefore more extreme than the corresponding z-critical value.
Note: Remember that for a normal distribution, the percentage of values within an area can be determined from the number of standard deviations from the mean.
★ one sample t-test example
We want to know whether drug A significantly changes the body temperature of healthy human adults 2 hours after taking the drug. Note that normal body temperature is 37 °C. We take our measurements from a sample and find a mean temperature of 38.5 °C and a variance of 3.4. The sample size is 30.
Note: On the final exam we’ll have to calculate variance, which will not be given to us.
SE = sqrt(s²/n) = sqrt(3.4/30) = 0.3367
t = (38.5 − 37)/0.3367 = 4.456
Then we look up the t critical value in a table using:
two tailed, 29 df, and an α of 5%,
and get a value of 2.045.
Because 4.456 > 2.045, we reject the null hypothesis and conclude that drug A significantly changes body temperature.
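A quick way to check this example in R, using only the summary values given above (variable names are my own):

```r
xbar <- 38.5; s2 <- 3.4; n <- 30; mu0 <- 37
se <- sqrt(s2 / n)                  # estimated standard error ≈ 0.3367
t  <- (xbar - mu0) / se             # t ≈ 4.456
qt(0.975, df = n - 1)               # two-tailed critical value at alpha = 0.05 -> 2.045
2 * (1 - pt(abs(t), df = n - 1))    # two-tailed p-value (well below 0.05)
```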
When do we use a t statistic instead of a z statistic?
We can’t use z if we are estimating σ from s.
What is the value of v (degrees of freedom) for a hypothesis about a mean?
v = n - 1
How does the location of the critical value of a one-tailed test differ from the critical value of a two-tailed test?
For a one-tailed test, we put the entire rejection region into one tail of the t-distribution, instead of splitting it between the two tails.
In the following t-distribution graph, you would reject the null hypothesis if the t-value was less than the critical value (shown in red).
What is t thought of as?
Like z, t is the number of standard errors by which the sample mean differs from the hypothesized mean.
What are the types of errors in hypothesis testing?
1) Type I error (α) = rejecting a true H0
2) Type II error (β) = failing to reject a false H0
Note that: When µ ≠ µ0, the critical value defines the boundary between power and type II error.
In a t-distribution, why do we need to know the degrees of freedom?
The degrees of freedom are needed because the distribution shape changes for different degrees of freedom.
Note that t-tables only tell us whether the p-value is greater or less than a specified α.
If instead of using tables, you want to know the p-value, how do you calculate it?
In R, you can use:
1 - pt(4.456, 29)
which gives the upper-tail probability; for the two-tailed drug example, double it: 2 * (1 - pt(4.456, 29)).
In an example similar to the drug A and temperature example, when would you use a one-tailed test?
You would use a one-tailed test if you’re only interested in whether body temperature increases (or only in whether it decreases) as a result of the drug.
In the drug A and temperature example, how would you write the one-tailed statistical hypotheses in the following cases?
1) We want to know whether the drug increases body temperature
2) We want to know whether the drug decreases body temperature
1) We want to know whether the drug increases body temperature
- H0: µ − µ0 ≤ 0
- HA: µ − µ0 > 0
2) We want to know whether the drug decreases body temperature
- H0: µ − µ0 ≥ 0
- HA: µ − µ0 < 0
Note that the = sign is always part of H0 and never part of HA.
What is the formula for a two-sample t-test looking for any diffeence between the two samples?
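t = (x̄1 − x̄2) / s_(x̄1 − x̄2), where s_(x̄1 − x̄2) = sqrt(s²_p/n1 + s²_p/n2) is the standard error of the difference between the two sample means (this matches the two-tailed worked example below).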
What is the formula for a two-sample t-test with a given µ0 different than 0 (i.e. looking for a specific difference between two sample means)?
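t = (x̄1 − x̄2 − µ0) / s_(x̄1 − x̄2), i.e. the hypothesized difference µ0 is subtracted from the observed difference before dividing by the standard error (this matches the one-tailed worked example below).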
In a one-sample t-test we use s to estimate σ.
In a two-sample t-test we do something similar. Assuming the two populations have equal variances, s²1 and s²2 both estimate the same σ², so we combine them into the pooled variance estimator s²_p (see the next card).
How is s²_p calculated?
s²_p = (SS1 + SS2) / (ν1 + ν2) = (SS1 + SS2) / (n1 + n2 − 2), where SS is the sum of squares of each sample and ν = n − 1.
How is the formula for a two-sample one-tailed t-test different from the two-tailed formula?
For a one-tailed test we keep the sign of the difference between the means (we do not take its absolute value) and compare t to a one-tailed critical value.
Using the visual representation of a t-distribution, explain why we always need to accept some level of error.
We need to accept some level of error because the t-distribution approaches the x-axis asymptotically, so no value of t corresponds to a probability of 0%.
What is statistical power?
Statistical power (1 − β) is the probability of correctly rejecting a false H0.
What is the relationship between power and the difference between µ and µ0?
The greater the difference between µ and µ0, the greater the power we have to detect the difference.
What does the probability of a type II error depend on?
The probability of a type II error depends on:
1) what HA is
2) how large an effect we hope to detect
3) sample size
4) how good the experimental design was
When we set an α of 0.05, we often have a β of around 0.2 and a power of around 0.8.
★ Welch’s test example
We want to test for a difference in protein concentration between two pea populations. We determine that the variances are heterogeneous and thus use Welch’s test:
Results:
mean_fert = 24 g protein
SS_fert = 261 g²
n_fert = 30
mean_unfert = 21.8 g protein
SS_unfert = 320 g²
n_unfert = 29
- s²_f = SS_f / (n_f − 1) = 261/29 = 9
- s²_u = SS_u / (n_u − 1) = 320/28 = 11.43

t′ = (x̄_f − x̄_u)/sqrt(s²_f/n_f + s²_u/n_u)
= (24 − 21.8)/sqrt(9/30 + 11.43/29) = 2.6406

Welch’s t′ follows a different distribution, so we need a special formula to calculate the degrees of freedom. First, the variances of the two sample means:

- s²_x̄f = s²_f/n_f = 9/30 = 0.3
- s²_x̄u = s²_u/n_u = 11.43/29 = 0.3941

Then:

v′ = (s²_x̄f + s²_x̄u)²/[(s²_x̄f)²/(n_f − 1) + (s²_x̄u)²/(n_u − 1)]
= (0.3 + 0.3941)²/[(0.3)²/29 + (0.3941)²/28] = 55.69

Now that we know v′ we check the t-table and find t0.05(2),55.69 ≈ 2.004. Because 2.6406 > 2.004, H0 is rejected.
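The same calculation in R, as a sketch from the summary statistics above (with raw data, t.test(x, y) in R runs Welch’s test by default):

```r
SSf <- 261; nf <- 30; SSu <- 320; nu <- 29
s2f <- SSf / (nf - 1); s2u <- SSu / (nu - 1)    # sample variances: 9 and 11.43
vf  <- s2f / nf; vu <- s2u / nu                 # variances of the two sample means
t_prime  <- (24 - 21.8) / sqrt(vf + vu)         # t' ≈ 2.64
df_prime <- (vf + vu)^2 / (vf^2 / (nf - 1) + vu^2 / (nu - 1))  # v' ≈ 55.7
2 * (1 - pt(abs(t_prime), df_prime))            # two-tailed p-value
```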
What increases statistical power?
These elements increase statistical power (see the R sketch after this list):
1) greater difference between µ and µ0
2) larger α
3) larger n
4) smaller σ2
5) one-tailed tests
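A sketch of how these factors can be explored in R with power.t.test(); the numbers below are illustrative, not taken from the cards:

```r
# Power of a two-sample t-test: increasing delta, alpha, or n, or decreasing sd, raises power.
power.t.test(n = 30, delta = 2, sd = 3, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")
power.t.test(n = 30, delta = 2, sd = 3, sig.level = 0.05,
             type = "two.sample", alternative = "one.sided")  # one-tailed: higher power
```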
For a one-tailed Mann-Whitney / Wilcoxon test, you have to decide which is the tail of interest. How does this work?
What are the assumptions of one-sample t-tests?
1) Data are a random sample
2) Each data point is independent from each other
3) Data come from a normally-distributed population
Note: One-sample t-tests are robust against non-normality as long as the data are symmetrical.
How are the statistical hypotheses written for testing whether the means of two populations differ?
H0: µ1 = µ2
HA: µ1 ≠ µ2
OR
H0: µ1 - µ2 = 0
HA: µ1 - µ2 ≠ 0
What are the assumptions for a two-sample t-test?
1) data are random and independent
2) Both samples come from normally-distributed populations
3) Both populations have equal variances
★ two-sample two-tailed t-test example
We want to test for a difference in protein concentration between two pea populations:
Results:
mean_fert = 24 g protein
SS_fert = 261 g²
n_fert = 30
mean_unfert = 21.8 g protein
SS_unfert = 320 g²
n_unfert = 29
- H0: µ1 − µ2 = 0
- HA: µ1 − µ2 ≠ 0
s²_p = (SS_f + SS_u)/(df_f + df_u) = (261 + 320)/(29 + 28) = 10.193 g²
s_(x̄f − x̄u) = sqrt(s²_p/n_f + s²_p/n_u) = sqrt(10.193/30 + 10.193/29) = 0.8314 g
t = (x̄_f − x̄_u)/s_(x̄f − x̄u) = (24 − 21.8)/0.8314 = 2.646
v = 57, t-critical = 2.00.
Because |t| > the critical value, we reject the null hypothesis.
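The same pooled calculation in R, as a sketch from the summary statistics (with raw data, t.test(x, y, var.equal = TRUE) does this directly):

```r
SSf <- 261; nf <- 30; SSu <- 320; nu <- 29
s2p <- (SSf + SSu) / ((nf - 1) + (nu - 1))   # pooled variance ≈ 10.193
se  <- sqrt(s2p / nf + s2p / nu)             # SE of the difference ≈ 0.8314
t   <- (24 - 21.8) / se                      # t ≈ 2.646
2 * (1 - pt(abs(t), df = nf + nu - 2))       # two-tailed p-value, df = 57
```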
For the following one-tailed test hypotheses, based on the relationship between the observed and critical t-values, when do you reject the null hypothesis?
1) HA: µ1 - µ2 < 0
2) HA: µ1 - µ2 > 0
1) HA: µ1 − µ2 < 0
H0 is rejected if t ≤ −tα(1),ν (the critical value lies in the lower tail)
2) HA: µ1 − µ2 > 0
H0 is rejected if t ≥ tα(1),ν
Note that for a two-tailed test,
HA: µ1 − µ2 ≠ 0
we reject H0 if |t| ≥ tα(2),ν
★ two-sample one-tailed t-test example
We want to test the hypothesis that bean protein concentration increases by at least 2 g/100 g beans when bean plants are fertilized. We do the study and get the following results:
mean_fert = 24 g protein
SS_fert = 261 g²
n_fert = 30
df_fert = 29
mean_unfert = 21.8 g protein
SS_unfert = 320 g²
n_unfert = 29
df_unfert = 28
- H0: µ_f − µ_u ≤ 2
- HA: µ_f − µ_u > 2
- s²_p = (SS_f + SS_u)/(df_f + df_u) = (261 + 320)/(29 + 28) = 10.193 g²
- s_(x̄f − x̄u) = sqrt(s²_p/n_f + s²_p/n_u) = sqrt(10.193/30 + 10.193/29) = 0.8314 g
- t = (x̄_f − x̄_u − 2)/s_(x̄f − x̄u) = (24 − 21.8 − 2)/0.8314 = 0.2406
- v = 57, one-tailed t-critical = 1.67.
Because our t-value is less than the critical value, we cannot reject H0; we have no evidence that fertilization increases protein concentration by more than 2 g per 100 g of beans.
What assumption violations is the t-test most sensitive to?
The t-test is quite robust to considerable non-normality, but violations of randomness/independence and of homogeneity of variances are serious.
- For the figure below, in which two-sample t-test would there be higher power?
a) A
b) B
a) A
**b) B**
- Use the figure below to answer the next 3 questions. Which area under the curve(s) represents the probability of correctly not rejecting the null hypothesis?
A) A
B) B
C) C
D) D
E) A + D
F) C + B
A) A
B) B
C) C
D) D
E) A + D
F) C + B
- In the figure above, which area under the curve(s) represents the probability of incorrectly not rejecting the null hypothesis?
A) A
B) B
C) C
D) D
E) A + D
A) A
B) B
C) C
D) D
E) A + D
- In the figure above, if this hypothesis test were performed at a significance level of 0.01, what probability would A represent?
A) 0.05
B) 0.975
C) 0.01
D) 0.0005
E) 0.005
A) 0.05
B) 0.975
C) 0.01
D) 0.0005
E) 0.005
Which factors increase robustness against heterogeneous variances in a t-test?
T-tests are somewhat more robust against variance heterogeneity if:
1) sample sizes are similar
2) sample sizes are above 30
3) the test is two-tailed
How are assumptions of a two-sample t-test tested?
1) Data are random and independent: this cannot be checked statistically; it is ensured through the experimental design.
2) Both samples come from normally-distributed populations: visual inspection and the Shapiro-Wilk test.
3) Both populations have equal variances: visual inspection and the Fligner-Killeen test (see the R sketch below).
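A minimal sketch of these checks in R (x1 and x2 stand for two hypothetical numeric samples):

```r
x1 <- rnorm(30, mean = 24, sd = 3)   # placeholder data
x2 <- rnorm(29, mean = 22, sd = 3)
shapiro.test(x1)                     # Shapiro-Wilk normality test, one per sample
shapiro.test(x2)
fligner.test(list(x1, x2))           # Fligner-Killeen test of homogeneity of variances
```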
One example of a violation of the independence assumption is when samples are paired (repeated measures). How could you get around the assumption of independence with paired data?
Paired data can be combined into a new sample by calculating the differences between pairs; these differences are now independent data points (see the sketch below).
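A sketch in R, where `before` and `after` are hypothetical paired measurements:

```r
before <- c(37.1, 36.8, 37.0, 36.9, 37.2)
after  <- c(38.0, 37.5, 37.9, 37.6, 38.2)
d <- after - before                    # one independent difference per subject
t.test(d, mu = 0)                      # one-sample t-test on the differences
t.test(after, before, paired = TRUE)   # equivalent paired t-test
```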
For two-sample analysis, how do you analyse the data in the following scenarios?
1) Both samples normal and equal variances
2) Both samples normal but unequal variances
3) Both samples non-normal but equal variances
4) Both samples non-normal and unequal variances
1) Both samples normal and equal variances:
two-sample t-test with pooled variance
2) Both samples normal but unequal variances:
Welch’s two-sample t-test (no pooled variance)
3) Both samples non-normal but equal variances:
Mann-Whitney or Wilcoxon rank test
4) Both samples non-normal and unequal variances:
Transformation and re-assessment
Note that for Welch’s test, we use a t′ statistic instead of a t statistic.
Similarly, we use v′, which is calculated differently from the ordinary degrees of freedom.
What are the main characteristics of the Mann-Whitney / Wilcoxon test?
It’s a non-parametric test. Because of this:
1) It does not require estimation of population parameters
2) Hypotheses are not statements about population parameters
However,
3) it assumes that the data are random
How are data treated in a Mann-Whitney / Wilcoxon test?
What is a drawback of this test?
Data are ranked either from high to low or from low to high.
Conversion of the data into ranks causes a loss of information and therefore a loss of power.
For the following samples of germination times, fill in the “Rank A” and “Rank B” columns with the ranks that we would assign to these data in order to do a two-sample Mann-Whitney/Wilcoxon test.
Step 1: assign ranks to all numbers across both samples combined; tied values initially receive consecutive ranks.
Step 2: replace the ranks of the tied values with their average (see the R sketch below).
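R’s rank() uses exactly this tie-handling by default; a sketch with hypothetical germination times:

```r
germ <- c(3, 5, 5, 7, 9, 4, 5, 6, 8)   # samples A and B combined
rank(germ)   # ties.method = "average": the three 5s would occupy ranks 3, 4, 5, so each gets rank 4
```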
What are the two statistics calculated in a Mann-Whitney / Wilcoxon test?
How are they calculated?
u = n1n2 +[n1(n1 + 1)]/2 - R1
u’ = n1n2 - u
★Mann-Whitney / Wilcoxon test example
Height of males: 193, 188, 185, 183, 180, 175, 170
Height of females: 178, 173, 168, 156, 163
Ranks of male heights: 1, 2, 3, 4, 5, 7, 9
Ranks of female heights: 6, 8, 10, 11, 12
n_m = 7, n_f = 5, R_m = 31, R_f = 47
R is the sum of the ranks from each sample
- H0 = Male and female students are the same height
- HA = Male and female students are not the same height
Note that no hypothesis is made about any population parameters.
u = n1n2 + n1(n1 + 1)/2 - R1
= (7)(5) + (7)(8)/2 - 31
= 35 + 28 - 31
= 32
u’ = n1n2 - u = (7)(5) - 32 = 3
Then you compare whichever of u or u′ is larger to the critical value uα(2),n1,n2. If it is greater, reject H0.
This calculation is not done by hand in the exam.
★Mann-Whitney / Wilcoxon test in R
How do you do this in R?
1) make a vector with all the data:
height
2) make a factor giving the sex of each data point:
sex
3) test:
wilcox.test(height~sex)
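Putting the height example above into this form (data from the card; variable names are mine):

```r
height <- c(193, 188, 185, 183, 180, 175, 170,   # males
            178, 173, 168, 156, 163)             # females
sex <- factor(c(rep("m", 7), rep("f", 5)))
wilcox.test(height ~ sex)   # two-sided Mann-Whitney / Wilcoxon rank-sum test
```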
- What is a t-value?
a) A variance
b) A number of standard errors from the mean for a t-distribution with a given number of degrees of freedom
c) A statistic that, without any other information, tells you whether your alternative hypothesis is true
d) A non-parametric test statistic
a) A variance
**b) A number of standard errors from the mean for a t-distribution with a given number of degrees of freedom**
c) A statistic that, without any other information, tells you whether your alternative hypothesis is true
d) A non-parametric test statistic
- On a standard normal distribution, 95% of the observations are contained within how many σ of μ? Choose the best approximation.
a) 1
b) 1.645
c) 2
d) 2.5
e) 3
a) 1
b) 1.645
**c) 2**
d) 2.5
e) 3
- In which of the following situations should we select a Welch’s two-sample t-test as the most appropriate and powerful option for conducting a hypothesis test?
a) Both samples are non-normally distributed, sample variances are equal, and sample distributions are similar
b) One sample is non-normally distributed and variances are unequal
c) One sample is non-normally distributed and variances are not equal
d) Both samples are normally distributed, and variances are equal
e) Both samples are normally distributed and variances are unequal
a) Both samples are non-normally distributed, sample variances are equal, and sample distributions are similar
b) One sample is non-normally distributed and variances are unequal
c) One sample is non-normally distributed and variances are not equal
d) Both samples are normally distributed, and variances are equal
**e) Both samples are normally distributed and variances are unequal**
- Which statement about the following study description is correct?
A herbicide-resistant strain of wheat and a non-herbicide resistant strain of wheat are grown, with 30 plants of each in a greenhouse before they are sprayed with a new herbicide that is going on the market. The researcher wants to test whether the herbicide-resistant strain (which was genetically engineered for resistance to different herbicides than the one being tested in this study) shows better growth and seed set than the control, following the spray.
a) The dependent variables are growth and seed set.
b) In a graph of the seed set results, seed set should be plotted on the x-axis.
c) A one-sample test is appropriate for this situation.
d) A paired-sample test is appropriate for this situation.
**a) The dependent variables are growth and seed set.**
b) In a graph of the seed set results, seed set should be plotted on the x-axis.
c) A one-sample test is appropriate for this situation.
d) A paired-sample test is appropriate for this situation
- Which of the following statements is correct?
a) A statistical hypothesis is a statement about a cause-and-effect relationship between 2 or more variables.
b) A scientific hypothesis is a statement about a cause-and-effect relationship between 2 or more variables.
c) A statistical hypothesis must be proved to accept or reject a scientific hypothesis
d) “Descriptive statistics” refers to testing how much variation in an observed variable is due to a predictor variable, versus how much is due to chance alone.
a) A statistical hypothesis is a statement about a cause-and-effect relationship between 2 or more variables.
**b) A scientific hypothesis is a statement about a cause-and-effect relationship between 2 or more variables.**
c) A statistical hypothesis must be proved to accept or reject a scientific hypothesis
d) “Descriptive statistics” refers to testing how much variation in an observed variable is due to a predictor variable, versus how much is due to chance alone.
- If we are interested in testing a hypothesis about a difference in two means, as the uncertainty (error) of our estimates of the means increases, our chance of detecting a real difference:
a) Decreases
b) Increases
c) Is not affected
**a) Decreases**
b) Increases
c) Is not affected
- Conceptually, why is the standard error of the mean always smaller than the standard deviation of a sample, when both are derived from the same sample data?
a) Standard deviation is a measure of sample variability, whereas standard error of the mean is an estimate of the standard deviation of the distribution of sample means from which that sample is assumed to have come, and distributions of sample means are always narrower than the sample distribution from which they are estimated.
b) Standard deviation is not always smaller than the estimate of standard error derived from the same sample. It is bigger when sample size is large (>30).
c) Because the standard deviation represents the 95% confidence interval, whereas standard error represents one standard deviation of the distribution of sample means.
d) Standard deviation is the width of the distribution of sampling means, whereas standard error is a measure of sample variability, and the distribution of sample means is always more variable than a single sample.
**a) Standard deviation is a measure of sample variability, whereas standard error of the mean is an estimate of the standard deviation of the distribution of sample means from which that sample is assumed to have come, and distributions of sample means are always narrower than the sample distribution from which they are estimated.**
b) Standard deviation is not always smaller than the estimate of standard error derived from the same sample. It is bigger when sample size is large (>30).
c) Because the standard deviation represents the 95% confidence interval, whereas standard error represents one standard deviation of the distribution of sample means.
d) Standard deviation is the width of the distribution of sampling means, whereas standard error is a measure of sample variability, and the distribution of sample means is always more variable than a single sample.
- What is the Central Limit Theorem?
It states the following:
The distribution of sample means taken from a population, whether or not that population is normal, will approximate normality as sample size increases.
- What is a p-value?
The probability of observing the collected data (or data more extreme) if H0 were true.
- For a two-tailed t-test, with α of 0.05, what does the lower critical value tell us?
It tells us the t-value below which there is a 2.5% or lower chance of having gotten a sample t-value that small if the null hypothesis was true.