Ritchie Lectures 3, 4, 5 Flashcards

1
Q

Statistical significance implies that a result is biologically significant.

A

False

2
Q

When testing significance…

a. Always expect the alternative hypothesis to be true
b. Only use Z statistics for published data
c. You are measuring differences between your data and what is expected under the null hypothesis
d. Ensure that the sample size does not exceed N = 20 because the SEM will be too large

A

c. You are measuring differences between your data and what is expected under the null hypothesis

3
Q

In terms of experimental hypotheses…

a. The null hypothesis only applies to data that forms a Poisson distribution
b. The alternative hypothesis states that observed differences reflect chance variation
c. Both the null and alternative hypotheses must be defined to test significance
d. The null hypothesis states that observed differences are real

A

c. Both the null and alternative hypotheses must be defined to test significance

4
Q

What would be considered a large sample size?

a. N ≥ 3
b. N ≥ 20
c. N ≥ 10
d. N ≥ 5

A

b. N ≥ 20

5
Q

The study design can be readily checked by running a test of significance.

A

False

6
Q

A student’s T test requires small degrees of freedom.

A

True

7
Q

The P value does not depend on sample size

A

False
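
One way to see why the answer is False is to hold the observed difference and the SD fixed and change only N. The sketch below assumes a z test with a two-sided P value; the difference (0.5), the SD (2) and both sample sizes are made-up numbers, not values from the lectures.

```python
import math
from statistics import NormalDist

def two_sided_p(difference, sd, n):
    """Two-sided P value for a z test of a mean against its null value."""
    se = sd / math.sqrt(n)   # the standard error shrinks as N grows
    z = difference / se      # test statistic
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: same difference and SD, different N
p_small_n = two_sided_p(difference=0.5, sd=2, n=4)
p_large_n = two_sided_p(difference=0.5, sd=2, n=100)
print(round(p_small_n, 3), round(p_large_n, 3))
```

The larger sample gives the smaller P value even though the difference itself has not changed.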

8
Q

The central limit theorem only applies to random samples

A

True

9
Q

When testing significance, random sampling is assumed

A

True

10
Q

What is the Mean and Standard Deviation of the following data set: 5, 4, 8, 3.

a. Mean= 5, SD= 2
b. Mean= 20, SD= 3.46
c. Mean= 4.5, SD= 2
d. Mean= 5, SD= 1.87

A

d. Mean= 5, SD= 1.87

Mean = (5+4+8+3) / 4 = 5
SD = √[((5-5)^2 + (4-5)^2 + (8-5)^2 + (3-5)^2) / 4]
      = √[(0 + 1 + 9 + 4) / 4]
      = √[14 / 4]
      = √[3.5]
      = 1.87
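
The working for this card divides by N, which is the population form of the SD. A minimal check using Python's standard library is sketched below; the sample (N - 1) form is printed as well, purely for comparison, and is not the answer the card expects.

```python
import statistics

data = [5, 4, 8, 3]

mean = statistics.mean(data)        # (5 + 4 + 8 + 3) / 4
sd_pop = statistics.pstdev(data)    # divides by N, as in the card's working
sd_sample = statistics.stdev(data)  # divides by N - 1, shown for comparison only

print(mean, round(sd_pop, 2), round(sd_sample, 2))
```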
11
Q

How is Standard Error calculated?

a. √SD
b. √SD – 2 X √N
c. SD / √N
d. √(SD + N)

A

c. SD / √N

12
Q

What is the Standard Error of the following data set: 5, 4, 8, 3.

a. 0.94
b. 5
c. 0.05
d. 2

A

a. 0.94

Mean = 5
SD = 1.87
N = 4 (4 samples)
SEM = SD / √N
         = 1.87/√4
         = 0.94
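
The same calculation can be sketched in Python. It assumes the SD is the divide-by-N form that gives 1.87 for this data set.

```python
import math
import statistics

data = [5, 4, 8, 3]

sd = statistics.pstdev(data)      # 1.87 to two decimal places (divide-by-N form)
sem = sd / math.sqrt(len(data))   # SEM = SD / sqrt(N)

print(round(sem, 2))
```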
13
Q

In practice, most sampling is WITH replacement.

A

False

14
Q

What is INCORRECT about the central limit theorem?

a. The sample from a population must follow a binomial distribution for the probability histogram to follow a normal curve
b. N must be relatively small
c. The probability histogram must be put into standard units
d. The area of a probability histogram = “chances”

A

b. N must be relatively small

15
Q

In regards to confidence intervals:

a. Intervals are random and a result of sampling
b. Intervals always contain the unknown population mean
c. Intervals are independent of sampling
d. Intervals are only relevant in epidemiological research where N = 1

A

a. Intervals are random and a result of sampling

16
Q

What does a test statistic measure?

a. The Standard Error of a small population sample
b. The likelihood that the null hypothesis and alternative hypothesis are both wrong
c. The difference between the data and what is expected under the null hypothesis
d. The likelihood that the alternative hypothesis does not correlate with z

A

c. The difference between the data and what is expected under the null hypothesis

17
Q

What is the test statistic (z) given the following information: Mean = 4, Result = 5.2, SE = 0.4

a. 1.0
b. 3.0
c. 4.0
d. 5.2

A

b. 3.0

Test statistic = (observed quantity – expected value) / SE
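
Plugging the card's numbers into this formula can be sketched as follows. The two-sided P value at the end is an extra step that assumes the normal curve applies and that the alternative is two-sided; neither is stated in the question.

```python
from statistics import NormalDist

expected = 4.0   # value expected under the null hypothesis
observed = 5.2   # observed result
se = 0.4         # standard error

z = (observed - expected) / se

# Assumption: normal curve, two-sided alternative
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 1), round(p_two_sided, 4))
```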

18
Q

When calculating the P value, the basis is that the null hypothesis is right.

A

True

19
Q

What is the correct way to calculate the SE of a difference of means from two independent samples? Answer using the following data set as an example: Sample 1 SE, 0.59. Sample 2 SE, 1.02.

a. SE (difference or sum) = [(1.02) + (0.59)]^2 = 2.59
b. SE (difference or sum) = [ √(1.02)^2 + √(0.59)^2 ] = 1.61
c. SE (difference or sum) = √[ (1.02)^2 + (0.59)^2 ] = 1.18
d. SE (difference or sum) = [(1.02) - (0.59)]^2 = 0.18

A

c. SE (difference or sum) = √[ (1.02)^2 + (0.59)^2 ] = 1.18
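
A short Python check of option c is sketched below. It relies on the two samples being independent, as the question states; the function name is illustrative only.

```python
import math

def se_of_difference(se_1, se_2):
    """SE of a difference (or sum) of two independent estimates."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

se_diff = se_of_difference(0.59, 1.02)
print(round(se_diff, 2))
```

Squaring before adding is the key step: the combined SE is always smaller than the plain sum of the two SEs.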

20
Q

A student’s T test…

a. Is the opposite of a z statistic (an expected value over an SE)
b. Only shows clear differences when the degrees of freedom is large
c. Cannot be used on data that follows a normal curve
d. Recognises that the SD has to be estimated from the sample

A

d. Recognises that the SD has to be estimated from the sample

21
Q

What is FALSE about significant results?

a. When presenting results, the researcher does not need to specify which statistical test was used if they use a T test
b. P values of a T test depend on sample size
c. A box model is needed to make sense out of a test of significance
d. Tests of significance assume random sampling

A

a. When presenting results, the researcher does not need to specify which statistical test was used if they use a T test

22
Q

A p value is the probability under a specified statistical model that a statistical summary of the data would be equal to or more extreme than its observed value

A

True

23
Q

What is the 95% confidence interval for a population mean based on a random sample?

a. (2 x SE) + √N and (2 x SE) - √N
b. M – 2SE and M + 2SE
c. M +/- 2SD
d. N +/- (M – 2SE)

A

b. M – 2SE and M + 2SE
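
As a rough illustration, the interval can be computed as below. The mean (5) and SE (0.94) are borrowed from the earlier data set 5, 4, 8, 3 purely as example inputs; the multiplier 2 is the card's rounding of the normal-curve value of about 1.96.

```python
def ci95(mean, se):
    """Approximate 95% confidence interval: M - 2SE to M + 2SE."""
    return mean - 2 * se, mean + 2 * se

# Example inputs borrowed from the earlier cards (assumption, not part of this question)
low, high = ci95(mean=5, se=0.94)
print(round(low, 2), round(high, 2))
```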

24
Q

In terms of proportions…

a. The most variability is seen at the ends of a distribution (99%, 1%)
b. The data is continuous and therefore forms a normal curve
c. Data can be transformed using y = arcsin(√p)
d. There is little variability in the samples that fall in the range of 30-60%

A

c. Data can be transformed using y = arcsin(√p)
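
A minimal sketch of the transformation in option c follows. It assumes p is expressed as a proportion between 0 and 1 (divide a percentage by 100 first) and returns the result in radians; the function name is illustrative only.

```python
import math

def arcsine_sqrt(p):
    """Transform a proportion p (0 to 1) using y = arcsin(sqrt(p))."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))

# Proportions near the ends of the range are stretched apart by the transform
for p in (0.01, 0.25, 0.50, 0.75, 0.99):
    print(p, round(arcsine_sqrt(p), 3))
```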

25
Q

The null hypothesis states that there is no difference between the control and experimental group.

A

True

26
Q

Biological data often produces a normal distribution.

A

False

27
Q

Indicate which of the following statements about P values are true and which are false:

  1. P values do not indicate how compatible the data are with a specified statistical model
  2. P values do not measure the probability that the studied hypothesis is true or the probability that the data were produced by random chance alone
  3. Scientific conclusions and business or policy decisions should always be based on whether a P value passes a specific threshold
  4. Proper inference requires full reporting and transparency
  5. A P value or statistical significance measures the size of an effect and the importance of a result
  6. By itself, a P value is a good measure of evidence regarding a model or hypothesis

A

  1. P values do not indicate how compatible the data are with a specified statistical model
    False. P values can indicate how compatible the data are with a specified statistical model
  2. P values do not measure the probability that the studied hypothesis is true or the probability that the data were produced by random chance alone
    True
  3. Scientific conclusions and business or policy decisions should always be based on whether a P value passes a specific threshold
    False
  4. Proper inference requires full reporting and transparency
    True
  5. A P value or statistical significance measures the size of an effect and the importance of a result
    False
  6. By itself, a P value is a good measure of evidence regarding a model or hypothesis
    False

28
Q

What would be considered a null hypothesis?

a. There is a larger variation of heights in Group A than Group B
b. Group A subjects are taller than Group B
c. There is no difference in height between Group A and Group B
d. Group B subjects are taller than those in Group A when p < 0.05

A

c. There is no difference in height between Group A and Group B

29
Q

When should I use a two-sided test?

a. When the alternative hypothesis is that mean Group A > mean Group B
b. When the alternative hypothesis is that mean Group A = mean Group B
c. When the alternative hypothesis = the null hypothesis
d. When the alternative hypothesis is that mean Group A ≠ mean Group B

A

d. When the alternative hypothesis is that mean Group A ≠ mean Group B

30
Q

When would be the best situation to use a pooled t test?

a. When the SDs of all groups are variable and are quite different
b. When the degrees of freedom is high
c. After transforming the measurement scale so that SDs are relatively equal
d. When the z statistic does not demonstrate an expected value over a SE

A

c. After transforming the measurement scale so that SDs are relatively equal

31
Q

What is NOT an explanation for why biological data rarely forms a normal curve?

a. Most measurements are only positive
b. Percentages are bound by 0 and 100
c. Many scientists are biased and inaccurate at counting
d. Counting involves whole numbers that are positive integers

A

c. Many scientists are biased and inaccurate at counting

32
Q

Which would have a normal distribution?

a. A count of the number of measles cases in Melbourne
b. Proportion of unimelb students with Irish ancestry
c. The height of students in a grade 3 class
d. Percentage of cell death after drug treatment
e. None of the above

A

e. None of the above

Counts are whole numbers that cannot be negative
Height is continuous and positive
Percentages and proportions are bounded (0-100% or 0-1)

33
Q

When is it most suitable to treat continuous positive measurements as approximately normal?

a. When the SD is much smaller than the mean
b. When there are many outliers
c. When y=arcsin(√p) > 1
d. When data is skewed to the left

A

a. When the SD is much smaller than the mean

34
Q

Positive measurements almost always have a right skew distribution when the values vary over several orders of magnitude (long tail to right).

A

True

35
Q

What is INCORRECT about using log scales to transform continuous positive measurements?

a. Multiplicative changes are converted into additive changes
b. Additive changes are generally more relevant than relative (%) changes
c. Continuous positive measurements are often more symmetrical on the log scale than on the original scale
d. If transforming multiple groups, the SD is often stabilised across the groups

A

b. Additive changes are generally more relevant than relative (%) changes

36
Q

With continuous positive measurements in multiple groups, if the means vary over a wide range then the SD will increase with the mean (proportionate relationship).

A

True

37
Q

Why might I want to transform my data to a log scale?

a. I am working with discrete data with negative and positive values
b. I want to decrease the normality pattern in my data
c. I want to use a pooled t test on multiple groups with different SDs
d. It’s fun to do and I love stats4lyf

A

c. I want to use a pooled t test on multiple groups with different SDs
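
The effect of a log transformation on unequal SDs can be sketched with two invented groups whose values differ by a constant factor of 10. The numbers are hypothetical and chosen so that the SD is proportional to the mean.

```python
import math
import statistics

# Hypothetical groups: B is A multiplied by 10, so SD scales with the mean
group_a = [10, 20, 40]
group_b = [100, 200, 400]

sd_raw_a = statistics.stdev(group_a)
sd_raw_b = statistics.stdev(group_b)

sd_log_a = statistics.stdev([math.log10(x) for x in group_a])
sd_log_b = statistics.stdev([math.log10(x) for x in group_b])

print("raw SDs:", round(sd_raw_a, 2), round(sd_raw_b, 2))
print("log10 SDs:", round(sd_log_a, 3), round(sd_log_b, 3))
```

On the raw scale the SDs differ tenfold; on the log scale the multiplicative factor becomes an additive shift, so the two SDs match and a pooled SD becomes reasonable.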

38
Q

In terms of transforming proportions…

a. Normality is not impacted
b. The output is unsuitable for a pooled t test
c. Variability shifts from the 50% region to the 25 and 75% regions
d. The transformation aids in stabilising variance

A

d. The transformation aids in stabilising variance

39
Q

What is correct about log transformations of continuous positive measurements?

a. Transformation destabilises SDs across multiple groups
b. Variability needs to be fairly consistent when looking at multiple groups
c. The output cannot be used for a pooled t test
d. It can only be completed on groups with identical SDs

A

b. Variability needs to be fairly consistent when looking at multiple groups

40
Q

What is NOT a principle that you can ensure is in place in statistical design?

a. Local control
b. Randomisation
c. External validity
d. Replication

A

c. External validity

41
Q

External validity…

a. Cannot be improved using replication
b. Helps to show if findings are relevant to the greater population
c. Does not need to be considered when N > 3
d. Will not be improved by local controls

A

b. Helps to show if findings are relevant to the greater population

42
Q

What is NOT a benefit of using replication in your study design?

a. It will increase precision
b. It allows for estimation
c. It does not require random sampling
d. It aids the external validity of your conclusions

A

c. It does not require random sampling

43
Q

Without replication, statistical inference cannot be performed.

A

True

44
Q

Technical replicates have more relevance in statistics than biological replicates.

A

False

45
Q

Replication is closely related to random sampling.

A

True

46
Q

What does NOT have any influence on replication?

a. The experimental unit used in the experiment
b. The question being investigated in the experiment
c. Experimental error
d. The value of SD/√N

A

d. The value of SD/√N

47
Q

Consider the following experiment: A scientist is investigating protein levels in blood and completes mass spec on 1 control sample and 2 experimental samples. Because the process is time consuming, he always runs the control samples on day 1, experimental group A on day 2 and experimental group B on day 3. Which statement about his experimental design is correct?

a. The study could be improved by implementing randomisation principles and randomly assigning the day of mass spec to each sample each time the experiment is repeated
b. The study is a good example of abiding by replication principles because there are two experimental groups
c. Because of the high quality data obtained by mass spec, the findings of the experiment are likely to have a high degree of external validity
d. The experiment is well balanced and would be able to be statistically analysed because the three groups are examined over a period of three days

A

a. The study could be improved by implementing randomisation principles and randomly assigning the day of mass spec to each sample each time the experiment is repeated
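
One possible way to randomise the run order is sketched below. The group labels, the three-day schedule and the fixed seed are assumptions made for illustration; in practice the seed would not be fixed unless the allocation needs to be reproducible.

```python
import random

samples = ["control", "experimental A", "experimental B"]
days = [1, 2, 3]

def assign_days(samples, days, rng):
    """Randomly assign one mass spec day to each sample."""
    shuffled_days = rng.sample(days, k=len(days))
    return dict(zip(samples, shuffled_days))

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Draw a fresh random allocation each time the experiment is repeated
for repeat in range(1, 4):
    print("repeat", repeat, assign_days(samples, days, rng))
```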

48
Q

What is NOT a key influence of external validity?

a. Random sampling of experimental units from an appropriate population
b. Experimental conditions such as subjects, reagents, equipment and facilities
c. Conducting many experiments in a range of different conditions
d. Carefully assigning subjects to experimental or control groups based on specific criteria

A

d. Carefully assigning subjects to experimental or control groups based on specific criteria

Assignment of subjects to groups must be random

49
Q

What is the purpose of using a power analysis in experimental design?

a. To be able to repeat the experiment in different conditions to gauge which leads to the greatest P value
b. To determine how much replication (N value) is required in an experiment
c. To use statistics to reject an alternative hypothesis and accept the null hypothesis
d. To normalise the SDs across multiple sample groups

A

b. To determine how much replication (N value) is required in an experiment
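
A rough sketch of such a calculation is given below. It uses the standard normal-approximation formula for comparing two group means, which is an assumption here and not necessarily the method used in the lectures; the effect size, SD, significance level and power are hypothetical inputs.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate N per group to detect a mean difference delta
    with a two-sided test (normal approximation, equal SDs assumed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)

# Hypothetical inputs: a smaller difference needs more replication
print(n_per_group(delta=2, sd=2))
print(n_per_group(delta=1, sd=2))
```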