Multiple Testing: Flashcards
What is a two-sample t-test?
A t-test that compares the means of two samples
What is a one sample t-test?
A t-test that compares the mean of one sample with a specific value, such as a known population mean
Name the function that would be used to compare the treatment and the control in a drug trial, using a two sample t-test:
t.test(output ~ treatment, data = )
- Where output is the measured outcome and treatment is the grouping variable
- Where data is the data frame
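A minimal sketch of the call above on simulated data. The data frame `trial` and its values are made up for illustration; only the column names `output` and `treatment` come from the card.

```r
# Hypothetical drug-trial data: 'output' is the measured outcome,
# 'treatment' is the grouping factor with two levels.
set.seed(1)
trial <- data.frame(
  output    = c(rnorm(10, mean = 5), rnorm(10, mean = 6)),
  treatment = factor(rep(c("control", "drug"), each = 10))
)

# Two-sample t-test; R runs Welch's version unless told otherwise
result <- t.test(output ~ treatment, data = trial)
print(result)
```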
Name the function used to compare the output to the mean in a one sample t-test:
t.test(data$output, mu = )
- Where mu is the population mean to compare against
- Where data is the data frame name and output is the column holding the measurements
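A small sketch of a one-sample test, assuming a made-up data frame `trial` and a hypothesised population mean of 5 (both are illustrative, not from the cards).

```r
# Hypothetical sample of 15 measurements
set.seed(2)
trial <- data.frame(output = rnorm(15, mean = 5.5))

# Compare the sample mean with the hypothesised population mean mu = 5
result <- t.test(trial$output, mu = 5)
print(result)
```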
How do you perform a one or two-tailed t-test in R?
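By adding the argument alternative = "greater" or alternative = "less" to t.test() for a one-tailed test; the default, alternative = "two.sided", gives a two-tailed test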
Name the function used to compare the treatment and the control in a one-tailed two sample t-test:
t.test(output ~ treatment, alternative = "less", data = )
- Where data is the data frame
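A sketch of how the direction of a one-tailed test is read, using made-up data. With the formula interface the difference is taken as first factor level minus second, so the order of the levels matters.

```r
# Hypothetical data; factor levels are sorted alphabetically: "control", "drug"
set.seed(4)
trial <- data.frame(
  output    = c(rnorm(10, mean = 5), rnorm(10, mean = 6)),
  treatment = factor(rep(c("control", "drug"), each = 10))
)
levels(trial$treatment)

# alternative = "less" tests whether mean(control) - mean(drug) is below 0
one_tailed <- t.test(output ~ treatment, alternative = "less", data = trial)

# Default two-tailed test for comparison: same t statistic, different p-value
two_tailed <- t.test(output ~ treatment, data = trial)

c(one_tailed = one_tailed$p.value, two_tailed = two_tailed$p.value)
```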
What is a paired sample t-test also known as?
A dependent sample t-test
When is a paired sample t-test used?
When samples are closely related to each other
What is an example of samples being closely related?
Measuring the same sample or patient twice before and after a certain treatment
What kind of t-test would you use when comparing cells treated with a drug versus with the control?
t-test
What kind of t-test would you use when comparing students’ grades in a BMB module of the different academic years?
t-test
What t-test would you use when comparing students’ grades before and after tutoring?
Paired t-test
What t-test would you use when comparing heights of males in Denmark versus in Thailand?
t-test
What kind of t-test would you use when measuring patients’ blood pressure before and after taking a new drug?
Paired t-test
What t-test would you use to compare the times runners take to finish a marathon before and after nutritional changes?
Paired t-test
How do you carry out a paired t-test in R?
By adding the argument paired = TRUE (or paired = T) to the t.test() function
List the function that would be used to compare the treatment and the control in a two tailed, two sample paired t-test:
t.test(output ~ treatment, paired = TRUE, data = )
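- Where data is the data frame
- Recent versions of R (4.4.0 and later) no longer accept paired together with a formula; there, pass the two sets of measurements as vectors: t.test(before, after, paired = TRUE)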
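A sketch of a paired test using the two-vector form of t.test(), which avoids the formula interface. The blood-pressure values and the names `before` and `after` are invented for illustration. The second call shows that a paired test is the same as a one-sample test on the differences.

```r
# Hypothetical blood pressure of 8 patients before and after a drug
before <- c(142, 150, 138, 160, 155, 147, 151, 149)
after  <- c(135, 148, 130, 152, 150, 146, 143, 141)

# Paired (dependent) t-test: each 'after' value is matched to its 'before' value
paired_result <- t.test(after, before, paired = TRUE)

# Equivalent one-sample t-test on the per-patient differences
diff_result <- t.test(after - before, mu = 0)

c(paired = paired_result$p.value, differences = diff_result$p.value)
```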
What is the Student's t-test?
A hypothesis test used to compare means
Why is it important to check the assumptions of a hypothesis test?
- Hypothesis tests are based on certain assumptions about the data
- If the data do not fit the assumptions, the probability calculations underlying the test are likely to be incorrect
- This increases the chances of a false negative or false positive result
- Conclusions drawn from such results can be wrong
List the assumptions of the t-test:
- The dependent variable must be continuous and the independent variable must be bivariate (have exactly two levels)
- The population (not the sample) must be normally distributed
- The two populations from which the samples are taken must have equal variance
What does it mean that the dependent variable must be continuous?
The dependent variable is the outcome, which needs to be continuous (able to take on any value)
What does it mean that the independent variable must be bivariate?
- The t-test can only compare two groups
- So there can only be two levels for the independent variable
- The underlying data could have more than two levels, but only two at a time are analysed with the t-test
What does it indicate if a sample is normally distributed?
That the population is likely to be normally distributed as well
What does a normal distribution look like?
- Most values are clustered around the mean
- The tails on either side are fairly symmetrical
What is the advantage of a normal quantile-quantile plot (Q-Q plot) over a histogram?
It gives a clearer indication of a normal distribution
What is the disadvantage of a normal quantile-quantile plot (Q-Q plot) over a histogram?
It is slightly more complicated than a histogram
How is a quantile-quantile plot made and how does it give a clearer indication of a normal distribution?
- It compares the quantiles of the data (sample) with the theoretical quantiles from a normal distribution
- Points that fall along a straight line indicate a normal distribution
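A short sketch of both plots in base R, using a simulated sample (the data are made up). qqnorm() plots the sample quantiles against theoretical normal quantiles and qqline() adds the reference line.

```r
# Hypothetical sample drawn from a normal distribution
set.seed(3)
x <- rnorm(50, mean = 10, sd = 2)

# Histogram: values should cluster around the mean with symmetrical tails
hist(x)

# Q-Q plot: sample quantiles against theoretical normal quantiles
qq <- qqnorm(x)
qqline(x)  # points close to this line suggest a normal distribution
```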
What are quantiles also known as?
Percentiles
How can you check that the variances of two populations are the same in R?
By using a function that produces summary statistics for both samples and comparing their variances
What does the describeBy function do?
It summarises the data by group, so the standard deviations of the groups can be compared
Where can the describeBy function be found?
In the psych package
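A base R sketch of the same check, for when the psych package is not installed. The data frame `trial` is made up; the describeBy() call is left as a comment because it needs the extra package.

```r
# Hypothetical two-group data
set.seed(5)
trial <- data.frame(
  output    = c(rnorm(12, mean = 5, sd = 1), rnorm(12, mean = 6, sd = 1)),
  treatment = factor(rep(c("control", "drug"), each = 12))
)

# Variance and standard deviation per group, using base R only
group_var <- tapply(trial$output, trial$treatment, var)
group_sd  <- tapply(trial$output, trial$treatment, sd)
print(rbind(variance = group_var, sd = group_sd))

# With the psych package installed, a fuller per-group summary would be:
# psych::describeBy(trial$output, group = trial$treatment)
```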
What does it mean that R uses Welch's t-test by default?
It doesn't assume that the variances of the two populations are equal
How do you specify that the variances of both populations are equal in R when performing a Student's t-test? What effect does this have?
- Using the argument var.equal = TRUE
- This increases the power of the test a little bit
- In most situations, there is little advantage to doing this
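A sketch comparing the two versions on the same made-up data. The visible difference is in the degrees of freedom: the Student's version uses n1 + n2 - 2, while Welch's version uses an adjusted, usually smaller, value.

```r
# Hypothetical two-group data, 10 observations per group
set.seed(6)
trial <- data.frame(
  output    = c(rnorm(10, mean = 5), rnorm(10, mean = 6)),
  treatment = factor(rep(c("control", "drug"), each = 10))
)

welch   <- t.test(output ~ treatment, data = trial)                    # default
student <- t.test(output ~ treatment, var.equal = TRUE, data = trial)  # equal variances

# Degrees of freedom and p-value of each version side by side
rbind(
  welch   = c(df = unname(welch$parameter),   p = welch$p.value),
  student = c(df = unname(student$parameter), p = student$p.value)
)
```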
What might cause an experiment to have several groups or samples?
- Including a positive and a negative control
- Testing several different treatments, or the same treatment at different concentrations
How would you use the t-test to carry out comparisons in an experiment where there are many groups/samples?
Many t-tests would have to be conducted
List the reasons why performing several t-tests is not efficient:
- It takes a lot of effort and time
- It increases the family-wise error rate (FWER)
What is best to check before performing several t-tests?
That there is a difference between any of the groups at all
What is the family-wise error rate (FWER)?
The probability of getting at least one false positive across a group of tests when the null hypothesis is true in each of them
What is the “family” in the family-wise error rate?
Usually the group of tests carried out on the same data set
What does it mean that the significance level is 5%? How does falsely rejecting the null hypothesis link to this?
- When the null hypothesis is true, we wrongly reject it five times out of a hundred, based only on random variation between samples
- On average, one in every 20 such comparisons reaches the 5% significance level by chance, so the null hypothesis is falsely rejected
Multiple testing increases the probability of what type of error?
Type 1 error (false positive)
If the probability of getting a type 1 error increases during multiple testing, what does this mean about alpha?
The overall (family-wise) type 1 error rate is no longer equal to alpha but increases with the number of tests
What is the probability of getting a false positive in a single test?
Alpha
What is the probability of not getting a false positive in one test?
1-alpha
What is the probability of not getting a false positive in m number of tests?
(1-alpha)^m
What is the probability of getting at least one false positive in m number of tests?
1- (1-alpha)^m
What pattern do you see when you plot the probability of getting at least one false positive against the number of tests?
The probability increases with the number of tests performed
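The formula 1 - (1 - alpha)^m can be tabulated and plotted directly. This sketch assumes alpha = 0.05 and independent tests; the chosen values of m are arbitrary.

```r
alpha <- 0.05
m <- c(1, 3, 10, 20)  # number of tests in the family

# Probability of at least one false positive across m independent tests
fwer <- 1 - (1 - alpha)^m
print(data.frame(tests = m, fwer = round(fwer, 4)))

# The curve rises towards 1 as the number of tests grows
m_range <- 1:100
plot(m_range, 1 - (1 - alpha)^m_range, type = "l",
     xlab = "Number of tests (m)",
     ylab = "P(at least one false positive)")
```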
What test can be used to compare several means/samples together?
F-test
What is the F-test based on?
The comparison of the variance within the samples with the variance between the samples
What is the comparison of the variance within samples with the variance between samples called?
The analysis of variance (ANOVA)
What question does the F-test ask?
Are all the values that have been measured in our samples from the same population, or is at least one group from a different population?
Three samples are taken, each with a different mean measured. Each sample mean lies within one standard deviation of a particular population’s mean. What does this indicate?
That all three samples were likely drawn from the same population
Three samples are taken, each with a different mean measured. Only two of the sample means lie within one standard deviation of a particular population’s mean. What does this indicate?
The third sample is very unlikely to be measured from the same population as the other two samples
How is variance calculated?
- Squaring standard deviation
- Taking the sum of squared differences between each observation and the sample mean, then dividing it by the degrees of freedom
- Mean of the squares minus square of the mean (this divides by n rather than n - 1, so it gives the population variance)
What are the degrees of freedom?
The number of observations minus 1
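A quick check that the three routes to the variance agree, using a made-up sample. The third route divides by n, so it is rescaled by n / (n - 1) to match the sample variance that R's var() reports.

```r
x <- c(2, 4, 4, 5, 7)  # hypothetical sample
n <- length(x)

v_sd      <- sd(x)^2                                 # squared standard deviation
v_squares <- sum((x - mean(x))^2) / (n - 1)          # sum of squares / degrees of freedom
v_moments <- (mean(x^2) - mean(x)^2) * n / (n - 1)   # rescaled population variance

c(sd_squared = v_sd, sum_of_squares = v_squares, moments = v_moments, var = var(x))
```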
What does the variance of a sample show you?
The dispersion of the sample (how spread out it is)
Why can either variance or standard deviation be used in ANOVA?
Both variance and standard deviation show the dispersion of data and are very closely linked
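How do you work out the overall mean of several samples?
- Calculate the mean of each sample
- Add up the means of all the samples
- Divide by the total number of samples
- This works when every sample has the same number of observations; otherwise, take the mean of all observations together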
What is SSw?
The sum of squared differences within the groups
How do you work out SSw?
- Work out the difference between each observation and the mean of its own sample
- Square each difference
- Add up all the squared differences across all groups
What is SSB?
The sum of squared differences between the mean of each group and the overall mean, each multiplied by a weighting factor
What is the weighting factor?
The number of observations in each group (n)
How does the number of observations act as the weighting factor?
It accounts for samples with a different number of replicates
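A sketch of both sums of squares on three made-up groups of unequal size. As a sanity check, SSw and SSB should add up to the total sum of squares around the overall mean.

```r
# Hypothetical groups with different numbers of replicates
groups <- list(a = c(4, 5, 6), b = c(6, 7, 8, 9), c = c(9, 10, 11))

all_obs     <- unlist(groups)
grand_mean  <- mean(all_obs)            # overall mean of all observations
group_means <- sapply(groups, mean)
n_per_group <- sapply(groups, length)   # weighting factor for SSB

# SSw: squared differences between each observation and its own group mean
ss_w <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

# SSB: squared differences between group means and the overall mean, weighted by n
ss_b <- sum(n_per_group * (group_means - grand_mean)^2)

# Total sum of squares around the overall mean
ss_total <- sum((all_obs - grand_mean)^2)

c(SSw = ss_w, SSB = ss_b, total = ss_total)
```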
Why are the degrees of freedom considered to be n-1?
- If we know the sample mean, we can work out the missing value when only one data point from the sample is missing
- This means we don’t need to know all the data points; it is enough to know one fewer than the total number of observations
- This means that the last value is set and not free to be any value
- This concept is called the degrees of freedom
What does N denote?
The total number of observations (sum of all n of each group)
What does n denote?
The number of observations in each group
What does K (or m) denote?
The number of groups or samples
When we know the sample mean and overall mean, what happens to the degrees of freedom?
The degrees of freedom change for the variance between groups and within groups
The degrees of freedom between the groups are equal to what?
They are found in the same way as the degrees of freedom for the standard deviation of a single sample with n observations (n-1), but with the K group means as the observations: K-1
Why are the degrees of freedom between groups equal to K-1?
Because the overall mean is calculated from the K sample means, so only K-1 of them are free to vary
Why are the degrees of freedom within groups N-K?
- All observations and the sample mean of each group are used
- Therefore the degrees of freedom are the total number of observations from all samples (N) minus the number of sample means (equal to the number of groups, K)
What is the F-value (F-statistic) generated by the F-test?
The ratio of the variance between groups to the variance within groups
What is the equation for the F-value?
F = variance between groups / variance within groups = (SSB / (K-1)) / (SSw / (N-K))
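A sketch that computes the F-value by hand from SSB, SSw and their degrees of freedom, then compares it with R's aov(). The data frame `dat` is made up for illustration.

```r
# Hypothetical data: three groups with 3, 4 and 3 observations
dat <- data.frame(
  value = c(4, 5, 6, 6, 7, 8, 9, 9, 10, 11),
  group = factor(rep(c("a", "b", "c"), times = c(3, 4, 3)))
)

N <- nrow(dat)            # total number of observations
K <- nlevels(dat$group)   # number of groups

grand_mean  <- mean(dat$value)
group_means <- tapply(dat$value, dat$group, mean)
n_per_group <- tapply(dat$value, dat$group, length)

ss_b <- sum(n_per_group * (group_means - grand_mean)^2)
ss_w <- sum((dat$value - ave(dat$value, dat$group))^2)  # ave() gives each row its group mean

# F = variance between groups / variance within groups
f_manual <- (ss_b / (K - 1)) / (ss_w / (N - K))
p_manual <- pf(f_manual, df1 = K - 1, df2 = N - K, lower.tail = FALSE)

# The same analysis with R's built-in ANOVA
anova_table <- summary(aov(value ~ group, data = dat))[[1]]
print(anova_table)
c(F = f_manual, p = p_manual)
```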
What is the output of ANOVA?
The F-statistic
Why are the degrees of freedom so important for the output of ANOVA?
- We need to find the p-value for the specific ratio of the variance between groups to the variance within groups to analyse the output of ANOVA
- The F-distribution is strongly dependent on the degrees of freedom of the two variances (between and within groups)
How does R make finding the degrees of freedom easier?
It automatically calculates the degrees of freedom and uses these values to obtain the p-value
What is the p-value for the specific ratio of the variance between groups to variance within groups?
The probability of getting the calculated F ratio or a value more extreme if the null hypothesis is true
What should you do when you perform an ANOVA and report its outcome?
- Include the most important output of the ANOVA test
- Also common to include degrees of freedom for within and between groups, the F-value/F-statistic and the p-value
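A sketch of pulling the reportable values out of an ANOVA in R, using the PlantGrowth data set that ships with R. The wording of the report string is only one possible style, not a required format.

```r
# PlantGrowth: plant weights for a control and two treatment groups
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)[[1]]

df_between <- anova_table[["Df"]][1]       # K - 1
df_within  <- anova_table[["Df"]][2]       # N - K
f_value    <- anova_table[["F value"]][1]
p_value    <- anova_table[["Pr(>F)"]][1]

# One common way to write the result in a report
report <- sprintf("F(%d, %d) = %.2f, p = %.3f",
                  df_between, df_within, f_value, p_value)
print(report)
```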