Multiple Testing: Flashcards

1
Q

What is a two-sample t-test?

A

A t-test that compares the means of two samples (groups), e.g. a treatment group and a control group

2
Q

What is a one sample t-test?

A

A t-test that compares the mean of a single group (sample) to a specific value, such as a known or hypothesised population mean

3
Q

Name the function that would be used to compare the treatment and the control in a drug trial, using a two sample t-test:

A

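t.test(output ~ treatment, data = )
- Where output is the measured outcome (response) variable
- Where treatment is the grouping variable (treatment or control)
- Where data is the name of the data frame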
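
A minimal sketch in R, assuming a hypothetical data frame called trial with a numeric output column and a two-level treatment factor; the values are simulated with rnorm() for illustration only, not real trial data.

```r
# Simulated, hypothetical drug-trial data: 10 control and 10 treated samples
set.seed(1)
trial <- data.frame(
  treatment = factor(rep(c("control", "drug"), each = 10)),
  output    = c(rnorm(10, mean = 5), rnorm(10, mean = 6))
)

# Two-sample t-test: outcome on the left of ~, grouping variable on the right
res <- t.test(output ~ treatment, data = trial)
res$statistic  # t value
res$p.value    # two-sided p-value
res$estimate   # mean of each group
```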

4
Q

Name the function used to compare the output to the mean in a one sample t-test:

A

t.test(x, mu = )
- Where x is the numeric vector of observations (e.g. one column of the data frame, such as data$output)
- Where mu is the population mean to compare against
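
A short sketch of a one-sample t-test, assuming a hypothetical vector of 15 simulated measurements and a hypothesised population mean of 100 (both chosen only for illustration):

```r
# Hypothetical sample of 15 measurements, simulated for illustration
set.seed(2)
measurements <- rnorm(15, mean = 102, sd = 5)

# One-sample t-test against a hypothesised population mean of 100
res <- t.test(measurements, mu = 100)
res$statistic  # t value
res$parameter  # degrees of freedom: n - 1 = 14
res$p.value
```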
5
Q

How do you perform a one or two-tailed t-test in R?

A

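A two-tailed test is the default (alternative = "two.sided"); for a one-tailed test, add the argument alternative = "greater" or alternative = "less" to the t.test() call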

6
Q

Name the function used to compare the treatment and the control in a one-tailed two sample t-test:

A

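t.test(output ~ treatment, alternative = "less", data = )

- Where alternative = "less" tests whether the mean of the first group (first factor level) is smaller than the mean of the second group
- Where data is the data frame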
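
A sketch comparing the one- and two-tailed versions on the same simulated, hypothetical data. With the formula interface the difference tested is first factor level minus second (here control minus drug), so the two one-sided p-values add up to 1 and the smaller of them is half the two-sided p-value:

```r
set.seed(3)
trial <- data.frame(
  treatment = factor(rep(c("control", "drug"), each = 10)),
  output    = c(rnorm(10, mean = 5), rnorm(10, mean = 6))
)

# Difference is control - drug, so "less" asks: is the control mean smaller?
p_less    <- t.test(output ~ treatment, alternative = "less", data = trial)$p.value
p_greater <- t.test(output ~ treatment, alternative = "greater", data = trial)$p.value
p_two     <- t.test(output ~ treatment, data = trial)$p.value  # default: two-sided

c(less = p_less, greater = p_greater, two.sided = p_two)
```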

7
Q

What is a paired sample t-test also known as?

A

A dependent sample t-test

8
Q

When is a paired sample t-test used?

A

When the observations in the two samples are paired, i.e. each value in one sample is closely related to one specific value in the other

9
Q

What is an example of samples being closely related?

A

Measuring the same sample or patient twice before and after a certain treatment

10
Q

What kind of t-test would you use when comparing cells treated with a drug versus with the control?

A

An unpaired (independent) two-sample t-test

11
Q

What kind of t-test would you use when comparing students’ grades in a BMB module across different academic years?

A

An unpaired (independent) two-sample t-test

12
Q

What t-test would you use when comparing students’ grades before and after tutoring?

A

Paired t-test

13
Q

What t-test would you use when comparing heights of males in Denmark versus in Thailand?

A

An unpaired (independent) two-sample t-test

14
Q

What kind of t-test would you use when measuring patients’ blood pressure before and after taking a new drug?

A

Paired t-test

15
Q

What t-test would you use to compare the times runners take to finish a marathon before and after nutritional changes?

A

Paired t-test

16
Q

How do you carry out a paired t-test in R?

A

By adding the argument paired = TRUE (or paired = T) to the t.test() function

17
Q

List the function that would be used to compare the treatment and the control in a two tailed, two sample paired t-test:

A

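t.test(output ~ treatment, paired = TRUE, data = )

- Where data is the data frame
- In R 4.4.0 and later, the formula interface no longer accepts paired = TRUE; give the two paired vectors instead, e.g. t.test(x, y, paired = TRUE)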
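
A sketch of a paired t-test on simulated, hypothetical before/after values for 8 patients. It passes the two vectors rather than a formula (this form works in all R versions), and shows that the paired test gives the same result as a one-sample t-test on the within-patient differences:

```r
# Hypothetical before/after measurements on the same 8 patients (simulated)
set.seed(4)
before <- rnorm(8, mean = 140, sd = 10)
after  <- before - rnorm(8, mean = 5, sd = 4)

# Paired (dependent samples) t-test: both vectors in the same patient order
paired_res <- t.test(after, before, paired = TRUE)

# Equivalent one-sample t-test on the within-patient differences
diff_res <- t.test(after - before, mu = 0)

c(paired = paired_res$p.value, differences = diff_res$p.value)
```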

18
Q

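What is Student’s t-test?

A

A hypothesis test that uses the t-distribution to test whether a mean differs from a given value, or whether two means differ from each other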

19
Q

Why is it important to check the assumptions of a hypothesis test?

A
  • Hypothesis tests are based on certain assumptions about the data
  • If the data do not fit these assumptions, the probability calculations underlying the test are likely to be incorrect
  • This increases the chance of a false negative or false positive result
  • This can lead to wrong conclusions being drawn from the experiment
20
Q

List the assumptions of the t-test:

A
  • The dependent variable must be continuous and the independent variable must be binary (have exactly two levels)
  • The populations (not just the samples) must be normally distributed
  • The two populations from which the samples are taken must have equal variances
21
Q

What does it mean that the dependent variable must be continuous?

A

The dependent variable is the outcome being measured, and it needs to be continuous (able to take any value within a range)

22
Q

What does it mean that the independent variable must be binary (have only two levels)?

A
  • The t-test can only compare two groups
  • So the independent variable can only have two levels
  • The underlying data may have more than two levels, but the t-test only analyses two of them at a time
23
Q

What does it indicate if a sample is normally distributed?

A

That the population it was taken from is probably also normally distributed

24
Q

What does a normal distribution look like?

A
  • It is bell-shaped, with most values clustered around the mean
  • The tails on either side are roughly symmetrical

25
Q

What is the advantage of a normal quantile-quantile plot (Q-Q plot) over a histogram?

A

It gives a clearer indication of whether the data are normally distributed

26
Q

What is the disadvantage of a normal quantile-quantile plot (Q-Q plot) over a histogram?

A

It is slightly more complicated than a histogram

27
Q

How is a quantile-quantile plot made and how does it give a clearer indication of a normal distribution?

A
  • It plots the quantiles of the data (sample) against the theoretical quantiles of a normal distribution
  • If the points lie on (roughly) a straight line, the data are consistent with a normal distribution
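
The Q-Q plot can be drawn in base R with qqnorm() and qqline(); this sketch uses a simulated sample (an assumption, for illustration). qqline() adds a reference line through the first and third quartiles:

```r
# Simulated sample, for illustration only
set.seed(5)
x <- rnorm(50, mean = 10, sd = 2)

# Sample quantiles (y-axis) against theoretical normal quantiles (x-axis)
qqnorm(x)
# Reference line through the first and third quartiles
qqline(x)

# The plotted coordinates can also be returned without drawing
qq <- qqnorm(x, plot.it = FALSE)
cor(qq$x, qq$y)  # close to 1 when the points lie near a straight line
```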
28
Q

What are quantiles also known as?

A

Percentiles (quantiles that divide the data into 100 equal parts)

29
Q

How can you check that the variances of two populations are the same in R?

A

By producing summary statistics for both groups (e.g. with describeBy()) and comparing their variances or standard deviations
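
A base-R sketch of this check, assuming a hypothetical data frame trial with simulated values; describeBy() from the psych package gives a fuller summary, shown as a comment because it needs that package installed:

```r
set.seed(6)
trial <- data.frame(
  treatment = factor(rep(c("control", "drug"), each = 12)),
  output    = c(rnorm(12, mean = 5, sd = 1), rnorm(12, mean = 6, sd = 1))
)

# Sample variance and standard deviation of the outcome in each group
group_var <- tapply(trial$output, trial$treatment, var)
group_sd  <- tapply(trial$output, trial$treatment, sd)
group_var
group_sd

# Ratio of the larger to the smaller variance (near 1 if they are similar)
max(group_var) / min(group_var)

# With the psych package installed:
# psych::describeBy(trial$output, trial$treatment)
```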

30
Q

What does the describeBy function do?

A

It summarises the data for each group (including the standard deviation), so the standard deviations of the groups can be compared

31
Q

Where can the describeBy function be found?

A

In the psych package

32
Q

What does it mean that R uses Welch’s t-test by default?

A

t.test() does not assume that the two population variances are equal (var.equal = FALSE is the default)

33
Q

How do you specify that the variances of both populations are equal in R when carrying out a Student’s t-test? What effect does this have?

A
  • Use the argument var.equal = TRUE
  • If the variances really are equal, this increases the power of the test slightly
  • In most situations, there is little advantage to doing this
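
A sketch comparing the default (Welch) test with Student’s t-test on simulated, hypothetical data. The visible difference is in the degrees of freedom: Student’s test uses n1 + n2 - 2, while Welch’s test uses an adjusted value that is never larger:

```r
set.seed(7)
trial <- data.frame(
  treatment = factor(rep(c("control", "drug"), each = 10)),
  output    = c(rnorm(10, mean = 5), rnorm(10, mean = 6))
)

welch   <- t.test(output ~ treatment, data = trial)                    # default: Welch
student <- t.test(output ~ treatment, data = trial, var.equal = TRUE)  # Student's t-test

# Student's df = 10 + 10 - 2 = 18; Welch's df is adjusted (usually non-integer)
c(welch = unname(welch$parameter), student = unname(student$parameter))
```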
34
Q

What might cause an experiment to have several groups or samples?

A
  • Including positive and negative controls
  • Testing several different independent variables, or the same variable at different concentrations

35
Q

How would you use the t-test to carry out comparisons in an experiment where there are many groups/samples?

A

A separate t-test would be needed for every pair of groups, so many t-tests would have to be carried out
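
The number of separate t-tests grows quickly, because every pair of groups needs its own test: k groups give k(k - 1)/2 pairwise comparisons. A one-line check in R (the range of k is arbitrary):

```r
# Number of groups in the experiment
k <- 2:8

# Pairwise comparisons needed: choose(k, 2) = k(k - 1) / 2
n_tests <- choose(k, 2)
data.frame(groups = k, pairwise_tests = n_tests)
```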

36
Q

List the reasons why carrying out several t-tests is not efficient:

A
  • It takes a lot of effort and time
  • It increases the family-wise error rate (FWER)

37
Q

What is best to check before performing several t-tests?

A

Whether there is any difference between the group means at all (e.g. with an F-test/ANOVA), before comparing pairs of groups

38
Q

What is the family-wise error rate (FWER)?

A

The probability of getting at least one false positive across a group (family) of tests, when the null hypothesis is true for all of them

39
Q

What is the “family” in the family-wise error rate?

A

The group of tests considered together, usually all the tests carried out on the same data set

40
Q

What does it mean that the significance level is 5%? How is falsely rejecting the null hypothesis linked to this?

A
  • When the null hypothesis is true, we will still reject it five times out of a hundred, purely because of random variation between samples
  • So, on average, one in every 20 such comparisons will reach the 5% significance level and falsely reject the null hypothesis
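
A small simulation sketch of this idea: every simulated experiment compares two samples drawn from the same normal population, so the null hypothesis is always true, yet about 5% of the tests (roughly one in 20) still give p < 0.05. The number of simulations and the sample size of 10 are arbitrary choices:

```r
set.seed(8)
n_sims <- 5000

# Each simulated experiment compares two samples of 10 drawn from the SAME
# standard normal population, so the null hypothesis is true every time
p_values <- replicate(n_sims, t.test(rnorm(10), rnorm(10))$p.value)

# Proportion of experiments that (falsely) reach p < 0.05
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```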
41
Q

Multiple testing increases the probability of what type of error?

A

Type 1 error (false positive)

42
Q

If the probability of getting a type 1 error increases during multiple testing, what does this mean about alpha?

A

The type 1 error rate is no longer equal to alpha but increases with the number of tests

43
Q

What is the probability of getting a false positive in a single test?

A

Alpha

44
Q

What is the probability of not getting a false positive in one test?

A

1-alpha

45
Q

What is the probability of not getting a false positive in m tests?

A

(1-alpha)^m, assuming the m tests are independent

46
Q

What is the probability of getting at least one false positive in m tests?

A

1 - (1-alpha)^m
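
A worked sketch of this formula in R, with alpha = 0.05 and a few example values of m; like the formula itself, it assumes the tests are independent:

```r
alpha <- 0.05

# Probability of at least one false positive across m independent tests
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

m <- c(1, 5, 10, 20)
round(fwer(m, alpha), 3)
# [1] 0.050 0.226 0.401 0.642
```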

47
Q

What pattern do you see when you plot the probability of getting at least one false positive against the number of tests?

A

The probability increases with the number of tests performed, rising quickly at first and approaching 1
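
A sketch of the plot described here, for alpha = 0.05 and up to 100 tests (both arbitrary choices); the dashed line marks the per-test alpha for comparison:

```r
m <- 1:100                       # number of tests
p_any_fp <- 1 - (1 - 0.05)^m     # P(at least one false positive), alpha = 0.05

plot(m, p_any_fp, type = "l", ylim = c(0, 1),
     xlab = "Number of tests (m)",
     ylab = "P(at least one false positive)")
abline(h = 0.05, lty = 2)        # the per-test alpha, for comparison
```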

48
Q

What test can be used to compare several means/samples together?

A

F-test

49
Q

What is the F-test based on?

A

The comparison of the variance within the samples with the variance between the samples

50
Q

What is the comparison between the variance within and between samples called?

A

The analysis of variance (ANOVA)

51
Q

What question does the F-test ask?

A

Do all the values measured in our samples come from the same population, or does at least one group come from a different population?

52
Q

Three samples are taken, each with a different mean measured. Each sample mean lies within one standard deviation of a particular population’s mean. What does this indicate?

A

That all three samples probably came from the same population

53
Q

Three samples are taken, each with a different mean measured. Only two of the sample means lie within one standard deviation of a particular population’s mean. What does this indicate?

A

The third sample is very unlikely to be measured from the same population as the other two samples

54
Q

How is variance calculated?

A
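  • By squaring the standard deviation
  • By taking the sum of the squared differences between each observation and the sample mean, then dividing it by the degrees of freedom (n - 1)
  • As the mean of the squares minus the square of the mean (this divides by n rather than n - 1, so it gives the population form of the variance)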
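
A quick check in R that these calculations agree, using a small made-up sample. The first two match R’s var(), which divides by n - 1, while the mean of the squares minus the square of the mean divides by n, so it is smaller by a factor of (n - 1)/n:

```r
x <- c(4.1, 5.3, 6.0, 5.5, 4.8)   # small made-up sample
n <- length(x)

v_df     <- sum((x - mean(x))^2) / (n - 1)  # sum of squared deviations / degrees of freedom
v_sd     <- sd(x)^2                          # square of the standard deviation
v_r      <- var(x)                           # R's built-in sample variance (uses n - 1)
v_moment <- mean(x^2) - mean(x)^2            # mean of squares minus square of mean (divides by n)

c(v_df, v_sd, v_r, v_moment)
```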
55
Q

What are the degrees of freedom (for a single sample)?

A

The number of observations minus 1 (n - 1)

56
Q

What does the variance of a sample show you?

A

The dispersion of the sample (how spread out it is)

57
Q

Why can either the variance or the standard deviation be used in ANOVA?

A

Both show the dispersion of the data and are very closely linked: the variance is the square of the standard deviation

58
Q

How do you work out the overall mean of several samples?

A
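  • Add up all the observations from every sample
  • Divide by the total number of observations (N)
  • If every sample has the same number of observations, this is the same as adding up the sample means and dividing by the number of samples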
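
A sketch with made-up groups of unequal size, showing that simply averaging the group means differs from the overall mean of all observations, and that weighting each group mean by its number of observations recovers the overall mean:

```r
# Made-up groups with unequal sizes
g1 <- c(2, 4, 6)          # n = 3, mean 4
g2 <- c(10, 12)           # n = 2, mean 11
g3 <- c(5, 7, 9, 11)      # n = 4, mean 8

group_means <- c(mean(g1), mean(g2), mean(g3))
group_sizes <- c(length(g1), length(g2), length(g3))

mean(group_means)                          # unweighted average of the means: 23 / 3
mean(c(g1, g2, g3))                        # overall mean of all N = 9 observations: 66 / 9
weighted.mean(group_means, group_sizes)    # means weighted by n give the overall mean
```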
59
Q

What is SSw?

A

The sum of the squared differences between each observation and the mean of its own group (the within-group sum of squares)

60
Q

How do you work out SSw?

A
  • For each group, work out the difference between each observation and that group’s mean
  • Square each difference
  • Add up all the squared differences from all groups
61
Q

What is SSB?

A

The sum of the squared differences between the mean of each group and the overall mean, with each squared difference weighted by the number of observations in that group

62
Q

What is the weighting factor?

A

The number of observations in each group (n)

63
Q

How does the number of observations act as the weighting factor?

A

It accounts for samples having different numbers of replicates: groups with more observations contribute more to SSB
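
A sketch of SSw and SSB in R, using made-up values for three groups with different numbers of replicates; the group sizes act as the weighting factor in SSB. As a check, SSw + SSB equals the total sum of squares around the overall mean:

```r
# Made-up data: three groups with different numbers of replicates
values <- c(2, 4, 6,  10, 12,  5, 7, 9, 11)
group  <- factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C"))

grand_mean  <- mean(values)
group_means <- tapply(values, group, mean)
n           <- tapply(values, group, length)   # weighting factor: observations per group

# SSw: each observation minus its own group mean (indexing by factor uses its codes)
ss_within  <- sum((values - group_means[group])^2)

# SSB: each group mean minus the overall mean, weighted by group size
ss_between <- sum(n * (group_means - grand_mean)^2)

c(SSw = ss_within, SSB = ss_between)   # 30 and 62 for these numbers
```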

64
Q

Why are the degrees of freedom considered to be n-1?

A
  • If we know the sample mean and only one data point from the sample is missing, we can work out the missing value
  • So we don’t need to know all the data points: it is enough to know one less than the total number of observations
  • This means that the last value is fixed and not free to take any value
  • This concept is called the degrees of freedom
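
A tiny worked example of this idea, with made-up numbers: if the mean of 5 observations is known to be 7 and four of the values are known, the fifth value is fixed:

```r
n          <- 5              # number of observations
known_mean <- 7              # sample mean, assumed known
first_four <- c(4, 7, 5, 8)  # four known values (made up)

# The total must be n * mean, so the last value is not free to vary
last_value <- n * known_mean - sum(first_four)
last_value   # 11
```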


65
Q

What does N denote?

A

The total number of observations (sum of all n of each group)

66
Q

What does n denote?

A

The number of observations in each group

67
Q

What does K (or m) denote?

A

The number of groups or samples

68
Q

When we know the sample mean and overall mean, what happens to the degrees of freedom?

A

The degrees of freedom differ for the variance between groups (K - 1) and the variance within groups (N - K)

69
Q

The degrees of freedom between the groups are equal to what?

A

K - 1, in the same way that the standard deviation of a sample with n observations has n - 1 degrees of freedom

70
Q

Why are the degrees of freedom between groups equal to K-1?

A

Because the overall mean is calculated from the K sample means, so once it is known only K - 1 of the sample means are free to vary

71
Q

Why are the degrees of freedom within groups N-K?

A
  • All observations and the sample mean of each group are used
  • Therefore the degrees of freedom are the total number of observations from all samples (N) minus the number of sample means (equal to the number of groups, K)
72
Q

What is the F-value (F-statistic) generated by the F-test?

A

The ratio of the variance between groups to the variance within groups

73
Q

What is the equation for the F-value?

A

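$$F = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{\mathrm{SSB}/(K-1)}{\mathrm{SSw}/(N-K)}$$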
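
A sketch of the F-value calculated by hand in R, using made-up values for K = 3 groups and N = 9 observations; each sum of squares is divided by its degrees of freedom before taking the ratio:

```r
values <- c(2, 4, 6,  10, 12,  5, 7, 9, 11)
group  <- factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C"))

N <- length(values)    # total number of observations
K <- nlevels(group)    # number of groups

group_means <- tapply(values, group, mean)
n           <- tapply(values, group, length)

ss_within  <- sum((values - group_means[group])^2)
ss_between <- sum(n * (group_means - mean(values))^2)

var_between <- ss_between / (K - 1)   # variance (mean square) between groups
var_within  <- ss_within  / (N - K)   # variance (mean square) within groups
F_value     <- var_between / var_within
F_value   # (62 / 2) / (30 / 6) = 6.2 for these numbers
```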

74
Q

What is the output of ANOVA?

A

The F-statistic

75
Q

Why are the degrees of freedom so important for the output of ANOVA?

A
  • We need to find the p-value for the specific ratio of the variance between groups to the variance within groups to analyse the output of ANOVA
  • The F-distribution is strongly dependent on the degrees of freedom of the two variances (between and within groups)
76
Q

How does R make finding the degrees of freedom easier?

A

It automatically calculates the degrees of freedom and uses these values to obtain the p-value
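
A sketch of a one-way ANOVA with base R’s aov(), using made-up three-group data (an assumption for illustration). summary() reports the degrees of freedom, sums of squares, mean squares, F value and p-value without any manual calculation:

```r
values <- c(2, 4, 6,  10, 12,  5, 7, 9, 11)
group  <- factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C"))
dat    <- data.frame(values, group)

fit <- aov(values ~ group, data = dat)
summary(fit)   # Df, Sum Sq, Mean Sq, F value and Pr(>F) for group and Residuals

tab <- summary(fit)[[1]]
tab[["Df"]]    # K - 1 = 2 (group) and N - K = 6 (Residuals)
```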

77
Q

What is the p-value for the specific ratio of the variance between groups to variance within groups?

A

The probability of getting the calculated F ratio, or a more extreme (larger) value, if the null hypothesis is true
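
For reference, this tail probability can be computed directly with pf(). The F ratio of 6.2 and the degrees of freedom (2 and 6) below are hypothetical values chosen for illustration:

```r
F_value    <- 6.2   # hypothetical F ratio
df_between <- 2     # K - 1
df_within  <- 6     # N - K

# Upper-tail probability: P(F >= observed F) if the null hypothesis is true
p_value <- pf(F_value, df1 = df_between, df2 = df_within, lower.tail = FALSE)
p_value
```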

78
Q

What should you do when you perform an ANOVA and report its outcome?

A
  • Include the most important output of the ANOVA test
  • It is also common to include the degrees of freedom between and within groups, the F-value (F-statistic) and the p-value