Exam 2 Flashcards

1
Q

What is hypothesis testing?

A

Hypothesis testing compares data to what we would expect to see if a specific null hypothesis were true. If the data are too unusual compared to that expectation, the null hypothesis is rejected.

2
Q

What is a null hypothesis?

A
  • a specific statement about a population parameter made for the purposes of argument
  • a statement that would be interesting to reject
  • The “default” hypothesis, in which the effect of interest is zero
  • No effect, no preference, no correlation, no difference
  • H0
3
Q

What is an alternative hypothesis?

A
  • The alternative hypothesis includes all other feasible values for the population parameter besides the value stated in the null hypothesis
  • Includes possibilities that are biologically interesting
  • Eg there is an effect, preference, correlation, difference
4
Q

What is the language for hypothesis testing?

A
  • The H0 is what is being tested
  • If the data are consistent with H0 then you fail to reject it
  • If the data are inconsistent with H0 then you reject it
  • Rule out the null hypothesis
  • You do not “prove” the HA, you can only “reject” or “fail to reject” the H0
5
Q

What is the test statistic and give an example?

A
  • The test statistic is a number calculated from the data that is used to evaluate how compatible the data are with the result expected under the null hypothesis
  • In a study, 18 toads were sampled and 14 were observed to be right-handed
  • In this case the test statistic is 14 (or, as a proportion, 14/18 = 0.7778)
6
Q

What is the null distribution?

A

the sampling distribution of outcomes for a test statistic under the assumption that the null hypothesis is true

7
Q

What is a p-value?

A
  • The probability of obtaining the data (or data showing as great or greater a difference from the null hypothesis) if the null hypothesis were true
  • In the toad example: sum the probabilities of getting values as extreme as or more extreme than 14
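As a rough sketch of that sum for the toad example, the snippet below builds the null distribution of the number of right-handed toads out of 18 and adds up the probabilities of outcomes at least as extreme as 14. The null proportion of 0.5 and the doubling of the upper tail for a two-sided test are assumptions made for illustration (the card states neither), and the variable names are hypothetical.

```python
from math import comb

n, observed = 18, 14   # toad example from the test-statistic card
p0 = 0.5               # ASSUMED null proportion (not stated in the card)

# Null distribution: Pr[X = k] for k = 0..n if H0 were true
null_dist = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]

# Upper tail: outcomes as extreme as or more extreme than the observed 14
upper_tail = sum(null_dist[observed:])

# ASSUMED two-sided test: double the one-tail probability
p_value = 2 * upper_tail
print(round(p_value, 3))  # 0.031
```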
8
Q

What is a normal p-value in biology?

A

In many areas of biology a P-value < 0.05 is small enough to reject the null hypothesis

9
Q

What is significance level? How is it used?

A
  • alpha
  • a probability used as a criterion for rejecting the null hypothesis
  • If the P-value is less than or equal to alpha then the null hypothesis is rejected
  • If the p-value is greater than alpha then the null hypothesis is not rejected
10
Q

What is a type 1 error?

A

Type 1 error is rejecting a true null hypothesis. The significance level alpha sets the probability of committing a type 1 error

11
Q

What is a type 2 error?

A

failing to reject a false null hypothesis

12
Q

What is the power of a hypothesis test?

A

The power of a test is the probability that a random sample will lead to rejection of a false null hypothesis

13
Q

What is the typical significance level (alpha level)?

A

Typically the alpha level is 0.05 (five percent)

14
Q

How do you interpret a non-significant result from a hypothesis test?

A
  • E.g., P = 0.94: if the null hypothesis were true, there would be a 94% chance of obtaining data at least as extreme as those observed
  • Data are compatible or consistent with the null hypothesis
  • “Fail to reject the null hypothesis”
15
Q

What is a 95% confidence interval?

A

A 95% confidence interval puts bounds on the most plausible values of the population parameter, based on your random sample

16
Q

What two tests almost always give the same answer?

A
  • Almost always, the 95% confidence interval and a hypothesis test give the same answer
17
Q

How do you use a 95% confidence interval to support or reject a null hypothesis?

A
  • If the 95% confidence interval includes the parameter value stated by the null hypothesis, you say your data are consistent with the null hypothesis
  • If the 95% confidence interval doesn’t include the null value, you say your data are inconsistent with the null hypothesis
18
Q

Which is better: hypothesis testing or confidence interval?

A
  • Confidence interval has added benefit of giving actual magnitude
  • A P-value gives only a qualitative sense of magnitude (a smaller P-value means stronger evidence against the null)
  • Generally, a hypothesis test is used more often, but both are good approaches
19
Q

What is a proportion?

A
  • Proportion of observations in a given category
  • P = (num in category / n)
  • Ranges from zero to one
20
Q

What is the binomial distribution?

A

The binomial distribution provides the probability distribution for the number of successes in a fixed number of independent trials when the probability of the success is the same in each trial

21
Q

How do you calculate the probability of (X) successes in a binomial distribution?

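A

Pr[X successes] = (n choose X) × p^X × (1 − p)^(n − X), where n is the number of trials and p is the probability of success in each trial
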
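A minimal sketch of the binomial probability calculation, assuming the standard formula: the number of ways to choose X successes from n trials, times p^X, times (1 − p)^(n − X). The function name and the example numbers are hypothetical.

```python
from math import comb

def binomial_probability(x, n, p):
    """Pr[X = x] successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical example: exactly 3 successes in 5 trials with p = 0.5
print(binomial_probability(3, 5, 0.5))  # 0.3125
```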
22
Q

What is the sampling distribution of a proportion?

A
  • p is the “real” proportion in the population (parameter)
  • p̂ is the estimated proportion from a sample (estimate/statistic)
23
Q

What is the binomial test?

A

The binomial test uses data to test whether a population proportion (p) matches a null expectation (p0) for the population

24
Q

What are the hypotheses in a binomial test?

A
  • H0: the relative frequency of successes in the population is p0
  • HA: the relative frequency of successes in the population is not p0
25
Q

How do you perform a binomial test?

A
  • Use binomial distribution formula to calculate probability of getting >= 10 successes in 25 samples with p = 0.061
  • Sum these probabilities and multiply by 2 for two sided test
  • P value is the probability of the observed data or more extreme data given that the null hypothesis is true
  • If the P-value is less than or equal to the alpha level (0.05), then reject the null hypothesis
  • P = 2 x Pr[number of successes >= 10]
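The steps above can be sketched as follows, using the card's numbers (10 or more successes in 25 trials, p0 = 0.061). The function name is hypothetical, and doubling the upper tail assumes the observed count lies above the null expectation, which is the case here.

```python
from math import comb

def binomial_test_two_sided(successes, n, p0):
    """Two-sided P-value obtained by doubling the upper tail.

    Assumes the observed count lies above the null expectation n * p0.
    """
    upper_tail = sum(
        comb(n, k) * p0**k * (1 - p0)**(n - k)
        for k in range(successes, n + 1)
    )
    return min(1.0, 2 * upper_tail)

alpha = 0.05
p_value = binomial_test_two_sided(10, 25, 0.061)
print("reject H0" if p_value <= alpha else "fail to reject H0")  # reject H0
```

The P-value here is on the order of a few in a million, far below alpha, so the null hypothesis is rejected.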
26
Q

What is the standard error of a proportion?

A
  • Recall that 𝑝̂ is the sample estimate and p is the (true) population proportion
  • The standard error of a proportion tells you the precision (uncertainty) of the estimate (just as the standard error of the mean tells you the precision of an estimated mean)
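A short sketch of the calculation, assuming the usual estimate of the standard error of a proportion, √(p̂(1 − p̂)/n). The function name is hypothetical; the counts reuse the toad example from an earlier card (14 of 18).

```python
from math import sqrt

def standard_error_of_proportion(successes, n):
    """Estimated standard error of p-hat: sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    return sqrt(p_hat * (1 - p_hat) / n)

# Toad numbers from an earlier card: 14 right-handed out of 18
se = standard_error_of_proportion(14, 18)
print(round(se, 3))  # 0.098
```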
27
Q

How do you calculate the confidence interval of a proportion?

A
  • Textbook recommends Agresti-Coull
  • First calculate the intermediate value p′ = (X + 2)/(n + 4), then the interval p′ ± 1.96 × √(p′(1 − p′)/(n + 4))
  • If the interval does not include the null proportion of 0.5, the data are inconsistent with the null
  • Can be confident that the population proportion of females is much higher than 0.5
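A minimal sketch of the Agresti-Coull 95% interval, assuming the common form with intermediate value p′ = (X + 2)/(n + 4) and half-width 1.96 × √(p′(1 − p′)/(n + 4)). The card's own data are not given, so the counts below reuse the toad example (14 of 18) purely for illustration; the function name is hypothetical.

```python
from math import sqrt

def agresti_coull_95(successes, n):
    """Approximate 95% confidence interval for a proportion (Agresti-Coull)."""
    p_prime = (successes + 2) / (n + 4)          # intermediate value
    half_width = 1.96 * sqrt(p_prime * (1 - p_prime) / (n + 4))
    return p_prime - half_width, p_prime + half_width

lower, upper = agresti_coull_95(14, 18)  # illustrative counts, not the card's data
null_p = 0.5
print(lower <= null_p <= upper)  # False
```

Because the interval excludes 0.5 in this illustration, the data would be judged inconsistent with a null proportion of 0.5.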
28
Q

What is a goodness of fit test?

A

method of comparing an observed frequency distribution with the frequency distribution expected under a probability model

29
Q

What are two examples of goodness of fit tests?

A

Chi squared and binomial test

30
Q

Why is the binomial test limited?

A

the data must fit into two mutually exclusive outcomes

31
Q

What is the X^2 Goodness-of-fit test?

A
  • This test compares frequency data to a probability model stated by the null hypothesis
  • More general than the binomial test because it can handle more than two categories
  • Calculations also are easier
32
Q

What does the chi squared statistic measure?

A
  • The x^2 test statistic measures the discrepancy between observed frequencies from the data and the expected frequencies from the null hypothesis
  • the larger the discrepancy the larger the chi squared statistic
33
Q

What is the chi squared goodness of fit test equation?

A

χ² = Σ (Observed_i − Expected_i)² / Expected_i, summed over all categories i

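A minimal sketch of the calculation, assuming the standard goodness-of-fit statistic: the sum over categories of (observed − expected)² / expected. The function name and the counts are hypothetical.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [30, 20]       # HYPOTHETICAL observed frequencies
expected = [25.0, 25.0]   # expected frequencies under a 50:50 null model
print(chi_squared_statistic(observed, expected))  # 2.0
```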
34
Q

How do you know which chi squared statistic value to use for a problem?

A
  • there is a sampling distribution curve for chi-squared values
  • the number of degrees of freedom of a χ² statistic specifies which χ² distribution to use as the null distribution
35
Q

How do you determine if your chi squared value is significant?

A

1: Get the exact P-value from computer software
2: Get the critical value for the test from a statistical table
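Both routes can be sketched for the simplest case of one degree of freedom. This is an illustration under stated assumptions: the identity Pr[χ² > x] = erfc(√(x/2)) holds only for df = 1, the value 3.84 is the usual table critical value for df = 1 at alpha = 0.05, and the test statistic of 5.0 is hypothetical.

```python
from math import erfc, sqrt

def chi2_p_value_df1(chi2_stat):
    """Upper-tail P-value for a chi-squared statistic with df = 1 ONLY."""
    return erfc(sqrt(chi2_stat / 2))

CRITICAL_VALUE_DF1 = 3.84  # table value for df = 1, alpha = 0.05

chi2_stat = 5.0  # HYPOTHETICAL test statistic
p_value = chi2_p_value_df1(chi2_stat)        # route 1: exact P-value
exceeds = chi2_stat > CRITICAL_VALUE_DF1     # route 2: compare to critical value
print(p_value < 0.05, exceeds)  # True True
```

For other degrees of freedom, a statistics package or a table row for the matching df would be needed.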

36
Q

What is a critical value?

A
  • A critical value is the value of a test statistic that marks the boundary of a specific area in the tail or tails of the sampling distribution under H0
  • The chi-squared test has a table of critical values against which the test statistic can be compared
37
Q

What are the requirements for a chi-squared test?

A
  • None of the categories should have an expected frequency less than one
  • No more than 20% of the categories should have an expected frequency less than five
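These two requirements can be checked mechanically once the expected frequencies are known. The sketch below is one way to do that; the function name and the example frequencies are hypothetical.

```python
def chi_squared_assumptions_met(expected):
    """Check both rules of thumb on a list of expected frequencies."""
    none_below_one = all(e >= 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return none_below_one and share_below_five <= 0.20

# HYPOTHETICAL expected frequencies
print(chi_squared_assumptions_met([12.0, 8.5, 6.0, 4.2, 9.3]))  # True
print(chi_squared_assumptions_met([12.0, 0.8, 6.0, 7.1, 9.3]))  # False
```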
38
Q

Which test should be performed when there are two categories?

A
  • If you’re doing it by hand the chi squared test is much easier math
  • If you’re using a computer the binomial test is recommended because the P-value will be more exact
39
Q

What is a contingency analysis?

A
  • Contingency analysis estimates and tests for an association between two or more categorical variables
  • Determines the extent to which one variable is contingent on the other
  • In a contingency analysis the null hypothesis is that there is no contingency between x and y
40
Q

What is a 2x2 contingency table?

A
  • Two categorical variables, each with two groups
  • In a contingency analysis the null hypothesis is that there is no contingency between x and y
  • Under the null hypothesis, the relative proportions are the same in both groups
41
Q

What are three different contingency tests?

A
  • Relative risk
  • Odds ratio
  • X^2 contingency test
42
Q

What is relative risk?

A

Relative risk is the probability of an undesired outcome in the treatment group divided by the probability of the same outcome in a control group

43
Q

What is the equation for relative risk?

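A

RR = p̂1 / p̂2, where p̂1 is the proportion with the outcome in the treatment group and p̂2 is the same proportion in the control group
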
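A minimal sketch of the relative risk calculation, taking it as the proportion with the undesired outcome in the treatment group divided by the same proportion in the control group. The function name and all counts are hypothetical.

```python
def relative_risk(treat_bad, treat_total, control_bad, control_total):
    """Risk of the bad outcome in the treatment group over risk in the control group."""
    risk_treatment = treat_bad / treat_total
    risk_control = control_bad / control_total
    return risk_treatment / risk_control

# HYPOTHETICAL counts: 10 of 100 treated and 20 of 100 controls had the outcome
rr = relative_risk(10, 100, 20, 100)
print(round(rr, 2))  # 0.5
```

A value below 1, as here, would mean the outcome was less likely in the treatment group.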
44
Q

What is relative risk if the variables are not contingent?

A

1

45
Q

What is the convention for relative risk in medical studies?

A
  • Calculate the risk of the bad outcome (“diseased” or “died”)
46
Q

What two values can be calculated to describe the uncertainty of a Relative Risk or Odds Ratio? What are they each measuring?

A
47
Q

What are possible conclusions based on relative risk or an odds ratio?

A
  • Potential outcomes
    = 1: risk is equal in the two groups
    < 1: outcome more likely in group 2 (denominator, placebo/control group)
    > 1: outcome more likely in group 1 (numerator, treatment group)
  • If the confidence interval includes 1: no difference in risk (e.g., of getting cancer between aspirin and placebo groups)
  • Data are consistent with a small beneficial effect a small deleterious effect or no effect at all
48
Q

What is the odds of success?

A
  • The odds of success are the probability of success divided by the probability of failure
  • Success is typically the bad outcome - the odds of the bad outcome happening
49
Q

What is the odds ratio?

A
  • The odds ratio is the odds of success in one group divided by the odds of success in a second group
  • The convention in medical studies is to make the control/placebo group #2 (the denominator)
50
Q

What is the odds ratio calculation shortcut?

A

OR = ad / bc
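The shortcut can be checked against the definition (odds in group 1 divided by odds in group 2). The cell layout below is an assumption, since the card does not show its table: a and c are the treatment group with and without the outcome, b and d are the control group with and without it. All counts are hypothetical.

```python
from math import isclose

# ASSUMED layout of the 2x2 table (HYPOTHETICAL counts):
#                 treatment   control
# outcome             a=10      b=20
# no outcome          c=90      d=80
a, b, c, d = 10, 20, 90, 80

odds_treatment = a / c            # odds of the outcome in group 1
odds_control = b / d              # odds of the outcome in group 2
or_definition = odds_treatment / odds_control
or_shortcut = (a * d) / (b * c)   # OR = ad / bc

print(isclose(or_definition, or_shortcut))  # True
```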

51
Q

How do you set up a contingency table in a medical study?

A
52
Q

Correct definition of a p-value

A

probability of a result at least as extreme as the result obtained if the null is true

53
Q

What is the null distribution in a hypothesis test?

A

In a hypothesis test, the null distribution is the sampling distribution of the test statistic under the assumption that the null hypothesis is true.

54
Q

It would be useful to provide a confidence interval for the proportion of participants improving because it allows you to _____.

A

put bounds on the most plausible values for the true proportion of participants improving

55
Q

What can you conclude from hypothesis testing?

A

In hypothesis testing you can only reject (P-value ≤ alpha) or fail to reject (P-value > alpha) a specific null hypothesis.

56
Q

How do you find the standard deviation of a sampling distribution of a proportion? The sampling distribution of p-hat has a mean of 0.40 and a standard deviation of _____.

A
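A sketch of the calculation, assuming the standard result that the sampling distribution of p̂ has standard deviation √(p(1 − p)/n). The card gives p = 0.40 but not the sample size, so n = 50 below is a hypothetical placeholder and the printed value is not the card's answer.

```python
from math import sqrt

def sd_of_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return sqrt(p * (1 - p) / n)

p = 0.40   # mean of the sampling distribution, from the card
n = 50     # HYPOTHETICAL sample size (not given in the card)
print(round(sd_of_p_hat(p, n), 3))  # 0.069
```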
57
Q

What is the conclusion (words) from an odds ratio confidence interval?

A

Data are consistent with a small beneficial effect, a small deleterious effect or no effect at all

58
Q

When is it appropriate to use odds ratio vs relative risk?

A

Relative risk – more intuitive (ratio of two proportions)
Odds Ratio – can be applied to data from case-control studies

59
Q

What is a case control study?

A
  • a type of observational study in which a sample of individuals with a focal condition (cases) is compared to a sample of subjects lacking the condition (controls)
  • Sample sizes are set by the experimenter, so the proportion of individuals with the condition in the sample does not reflect the proportion in the population
60
Q

What is the chi squared contingency test?

A
  • The chi squared contingency test is the most commonly used test of association between two categorical variables
  • Special case of the chi squared goodness-of-fit test where the null model is independence of the variables
61
Q

What are the assumptions for the chi squared contingency test?

A
  • Same assumptions as chi squared goodness of fit test
  • No cells have expected frequency less than one
  • No more than 20% of cells have expected frequency less than five
62
Q

What are two ways to calculate the expected frequencies in a chi-squared contingency test?

A
  • To get the expected frequencies, use the multiplication rule for independent events: Pr[uninfected and eaten] = Pr[uninfected] × Pr[eaten] = (uninfected # / total sample #) × (eaten # / total sample #) = A
  • A × total sample # = expected frequency
  • Expected frequencies shortcut calculation: Expected[ri, cj] = (row i total)(column j total) / (grand total)
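Both routes can be sketched on a small table. The counts are hypothetical and the function name is illustrative; the code applies the shortcut (row total × column total / grand total) to every cell and then repeats one cell with the probability rule.

```python
def expected_frequencies(table):
    """Expected count for each cell: (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[10, 20],
            [30, 40]]  # HYPOTHETICAL counts
expected = expected_frequencies(observed)
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]

# Probability-rule route for the top-left cell:
# Pr[row 1] x Pr[column 1] x grand total
top_left = (30 / 100) * (40 / 100) * 100
```

Up to floating-point rounding, `top_left` matches `expected[0][0]`, since the two routes are the same arithmetic.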
63
Q

How do you find the dof in a chi-squared contingency test?

A

df = (r - 1)(c - 1)

64
Q

What is the equation to calculate vaccine efficacy? What is this also called?

A
  • Vaccine efficacy measures the proportionate reduction in cases among vaccinated persons
  • Also called vaccine effectiveness or relative risk reduction
  • Calculation: vaccine efficacy = 1 – RR
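A brief sketch of the calculation, with RR taken as the risk among vaccinated persons divided by the risk among unvaccinated persons. The function name and the counts are hypothetical.

```python
def vaccine_efficacy(vacc_cases, vacc_total, unvacc_cases, unvacc_total):
    """Vaccine efficacy = 1 - RR, where RR = risk (vaccinated) / risk (unvaccinated)."""
    rr = (vacc_cases / vacc_total) / (unvacc_cases / unvacc_total)
    return 1 - rr

# HYPOTHETICAL trial: 5 cases per 1000 vaccinated, 50 cases per 1000 unvaccinated
print(round(vaccine_efficacy(5, 1000, 50, 1000), 2))  # 0.9
```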
65
Q

What is the normal distribution?

A

a continuous probability distribution describing a bell-shaped curve. It is a good approximation to the frequency distributions of many biological variables

66
Q

What are the two parameters for the normal distribution?

A

mean and standard deviation

67
Q

What are the properties of a normal distribution?

A
  • Continuous distribution, so probability is measured as area under the curve
  • It is symmetrical around its mean (bell-shaped)
  • Single mode
  • Probability density is highest exactly at the mean
68
Q

Describe the symmetry of normal distributions.

A

For a variable with a normal distribution, about two thirds of individuals are within one standard deviation of the mean, and about 95% are within two standard deviations of the mean
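These two figures can be checked with Python's standard library. The sketch uses a normal distribution with mean 0 and standard deviation 1, which is enough because the proportions do not depend on the particular mean or standard deviation.

```python
from statistics import NormalDist

z = NormalDist()  # mean 0, standard deviation 1

within_one = z.cdf(1) - z.cdf(-1)   # area within one SD of the mean
within_two = z.cdf(2) - z.cdf(-2)   # area within two SDs of the mean
print(round(within_one, 3), round(within_two, 3))  # 0.683 0.954
```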

69
Q

What is the standard normal distribution?

A
  • The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of one
  • Indicated by symbol Z
  • Probabilities for any value of Z can be obtained by a computer function or a statistical table
70
Q

What is z?

A

A standard normal deviate or Z tells us how many standard deviations a particular value is from the mean
Z = (Y − μ)/σ

71
Q

What is sampling distribution?

A

the probability distribution of all values for an estimate that we might obtain when we sample a population

72
Q

If a variable has a normal distribution in the population, what can we assume about the distribution of sample means?

A

it is also normal

73
Q

What is the standard deviation of a sampling distribution?

A

the standard error

74
Q

How do you calculate the probability of a sample mean? Birth weight dataset: Suppose a random sample of 80 babies produces a mean of 3370. What is the probability of getting a mean of 3370 or greater?

A
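A sketch of the usual approach: convert the sample mean to a standard normal deviate using the standard error σ/√n, then take the upper-tail area. The card does not give the population mean and standard deviation, so the values 3339 and 573 below are assumptions made for illustration and the printed result depends entirely on them.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 3339, 573      # ASSUMED population mean and SD (not given in the card)
n, sample_mean = 80, 3370  # from the card

standard_error = sigma / sqrt(n)            # SD of the sampling distribution of the mean
z = (sample_mean - mu) / standard_error     # standard normal deviate of the sample mean
p_greater = 1 - NormalDist().cdf(z)         # Pr[sample mean >= 3370]
print(round(z, 2), round(p_greater, 2))  # 0.48 0.31
```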
75
Q

What is the central limit theorem?

A
  • According to the central limit theorem, the sum or mean of a large number of measurements randomly sampled from a non-normal distribution is approximately normally distributed
  • The central limit theorem allows the use of statistical tests based on the normal distribution even if the variable's distribution in the sampled population is not normal
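A small simulation can illustrate the idea. This is only a sketch: the exponential distribution, the sample size of 50, the number of repeats, and the seed are all arbitrary choices made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    # Exponential(1) is strongly right-skewed, with mean 1 and SD 1
    return mean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

# The sample means should centre near 1, with spread near 1 / sqrt(50) (about 0.14),
# even though individual measurements are far from normally distributed
print(round(mean(means), 2), round(stdev(means), 2))
```

Plotting a histogram of `means` would be the natural next step to see the approximately bell-shaped outcome.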
76
Q

True or False: Reducing the value of the significance level from 0.05 to 0.01 lowers the probability of committing a Type I error.

A

True

77
Q

There are two categorical variables in the dataset, and both have more than two possible outcomes. Which test would be appropriate to run?

A

chi-square contingency test