Week 5 - Sampling and random error & Week 6 - Statistical significance Flashcards

1
Q

What is a sample?

A

A sample is a selected subset of a source population.
Ideally, it should be representative of that source population.

2
Q

What is a source population?

A

The source population is the group of all individuals in whom
we are interested in assessing some parameter(s)

3
Q

What is sampling?

A

The process of selecting a number of individuals from all
individuals found in a source population.
There are many different sampling methods.

4
Q

What is a sampling frame?

A

A list (or database) containing all individuals in a population,
which is used for sampling

5
Q

What are sampling units?

A

The sampling units are the individuals that can potentially be
selected.
Most of the time, sampling units are individual
people, but we can also have larger sampling units (e.g.
families, streets, hospitals, schools)

6
Q

Who can be part of the source population?

A

The source population can be the general population (e.g.
the total population of a country or city), but it can also be a
specific sub-population (e.g. all smokers in a country, all
patients with heart disease, all children with cancer)

7
Q

Describe what the sample should represent for each type of research.

A
  1. In descriptive research (i.e. when we want to investigate the
    prevalence/incidence of a condition in a population), it is
    particularly important that the sample accurately
    represents the specific source population
  2. In analytic research (i.e. when we investigate the association
    between an exposure and an outcome), we can be more general
    regarding the source population, depending on the
    research question of interest
  3. In situations where we investigate a biological effect on
    some disease (e.g. the effect of smoking on the risk of cancer), we
    can be more general in identifying the source population
    (i.e. not necessarily restricted to a specific country/region)
  4. In situations where we investigate social/cultural effects
    (e.g. the effect of social class on the risk of heart disease), we have
    to be more careful and restrict the source population to the
    specific country/region from which the sample was derived
8
Q

What is an estimate?

A

To determine the proportion of a characteristic in a
population, we usually measure it in a sample.
What we measure is therefore an ESTIMATE. This estimate
carries an inherent error (sampling error).
The sample estimate attempts to quantify the
corresponding population parameter.

9
Q

What is statistical inference?

A

* When the sample estimate is used to draw conclusions
(inferences) about the population from which the sample
was taken, this is called STATISTICAL INFERENCE
* Statistical inference, as the name suggests, involves the use
of statistics to determine the degree of uncertainty in the
estimate of interest

10
Q

What is a parameter?

A
  • A parameter is a measurement of a quantity (or association)
    in a population, in which we are interested, e.g.:
  • mean age
  • prevalence of obesity
  • mean difference in blood pressure between men and women
  • Odds Ratio for association between smoking and cancer
11
Q

Give an example of a population parameter and its sample estimate for a given variable.

A

Sample estimate mean = 3.75
Population parameter = 3.72

12
Q

What is sampling variation?

A

The difference (variation) between different sample
estimates derived from the same source population

13
Q

What is sampling error?

A

The difference in magnitude between a sample estimate
and the actual population parameter, caused by measuring a
quantity (or association) in a sample rather than in the
whole source population.
Also called “random error”, because it depends on chance.

14
Q

What happens when you decrease sample size?

A
  1. Sampling variation increases
  2. Sampling error increases
  NB! All principles covered so far apply to all measures of frequency and association (incidence, risk ratio, rate ratio, mean difference, correlation coefficient, regression coefficient), and all are termed ‘estimates’ calculated from a ‘sample’.
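
The effect of sample size on sampling variation and sampling error can be checked with a small simulation. This is only a sketch under stated assumptions: the source population is an invented set of 10,000 normally distributed values, and the helper name `sample_means` is hypothetical.

```python
import random
import statistics

def sample_means(population, n, n_samples, rng):
    """Draw n_samples random samples of size n and return each sample's mean."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical source population (invented values, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)  # the population parameter

results = {}
for n in (10, 100, 1000):
    means = sample_means(population, n, 300, rng)
    # Sampling variation: how much the estimates differ from each other.
    spread = statistics.stdev(means)
    # Sampling error: how far each estimate is from the parameter, on average.
    mean_abs_error = statistics.mean([abs(m - parameter) for m in means])
    results[n] = (spread, mean_abs_error)
    print(f"n={n:5d}  spread of estimates={spread:.3f}  mean |error|={mean_abs_error:.3f}")
```

Both printed columns should shrink as n grows, which is the same statement as the card read in reverse: decreasing the sample size increases both quantities.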
15
Q

What is sampling distribution?

A

The distribution of the estimates calculated from all the samples taken (e.g. all the sample means), plotted on a histogram

16
Q

Sampling distribution for very small sample size

A
17
Q

Sampling distribution for larger sample size

A
18
Q

Sampling distribution for very large sample size

A
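
No answer is shown for the three sample-size cards above. As a rough illustration (an assumption about what such figures typically show, not a reproduction of the slides), the sketch below draws text histograms of sample means for a small, a larger and a very large sample size. The population values and the helper `text_histogram` are invented for this example.

```python
import random
import statistics

def text_histogram(values, lo, hi, bins=10, width=40):
    """Return one text line per bin; bar length is scaled to the fullest bin."""
    counts = [0] * bins
    step = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            # min() guards against a rounding edge case at the top boundary.
            counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts) or 1
    return [f"{lo + i * step:6.1f} | " + "#" * round(width * c / peak)
            for i, c in enumerate(counts)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Hypothetical source population (invented values, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]

spreads = {}
for n in (5, 50, 500):
    means = [statistics.mean(rng.sample(population, n)) for _ in range(400)]
    spreads[n] = statistics.stdev(means)
    print(f"\nSample size n = {n} (spread of sample means = {spreads[n]:.2f})")
    print("\n".join(text_histogram(means, 30, 70)))
```

With these assumptions the histogram should be wide for n = 5 and become progressively narrower around the population mean as n increases.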
19
Q

What is standard error?

A

The standard error describes the uncertainty about how well
the sample estimate represents the population parameter.
It essentially estimates the standard deviation of the
sampling distribution, i.e. the typical error that can occur
whenever we take a sample of a certain size n.

20
Q

What is the standard error formula?

A

Standard error can be estimated from a single (!) sample
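
The formula itself is not shown on this card. As an assumption about what was intended, the standard textbook expressions for the two most common cases are given below, where s is the sample standard deviation, n the sample size and p-hat the sample proportion.

```latex
% Standard error of a sample mean
SE(\bar{x}) = \frac{s}{\sqrt{n}}

% Standard error of a sample proportion
SE(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
```

Both use only quantities available from one sample, which is why the standard error can be estimated without repeated sampling.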

21
Q

What is the 95% Confidence Interval?

A
  • Confidence intervals indicate a range (interval) within which
    we are confident (with some degree of uncertainty) that the
    true population parameter lies
  • The 95% Confidence Interval (95% CI) for a sample estimate is
    calculated as:
  • Lower confidence limit:
    sample estimate - 1.96 × standard error
  • Upper confidence limit:
    sample estimate + 1.96 × standard error
  • Interpretation: We are 95% confident that
    the population parameter is contained within the interval
    sample estimate ± 1.96 × SE
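
The calculation above can be sketched for a sample mean as follows. The measurements are invented for illustration and the function name `mean_ci95` is hypothetical. The sketch uses the 1.96 multiplier from the card; for samples this small a t-distribution multiplier is usually preferred, so treat the output as illustrative only.

```python
import math
import statistics

def mean_ci95(sample):
    """Return (estimate, standard error, lower limit, upper limit) for a sample mean."""
    n = len(sample)
    estimate = statistics.mean(sample)
    # Standard error estimated from this single sample: s / sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(n)
    return estimate, se, estimate - 1.96 * se, estimate + 1.96 * se

# Hypothetical measurements, invented for illustration only.
sample = [3.1, 4.2, 3.8, 3.5, 4.0, 3.9, 3.6, 3.7, 4.1, 3.4]
estimate, se, lower, upper = mean_ci95(sample)
print(f"estimate={estimate:.2f}  SE={se:.3f}  95% CI=({lower:.2f}, {upper:.2f})")
```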
22
Q

What are the two things we assess when we assess associations?

A

The presence of
an association and the magnitude of this association

24
Q

What are the 2 possibilities for any given association?

A
  1. The association does not exist in the population
    (i.e. the two variables are not linked)
  2. The association exists in the population (the two
    variables are linked)
25
Q

What are the two hypotheses about an association called?

A
  • The Null hypothesis (H0) always states that there is no
    association between the two variables in the population
  • The Alternative hypothesis (HA) always states that there
    is an association between the two variables in the
    population
26
Q

Explain the formal process of hypothesis testing.

A
  1. Define the statistical null (H0) and alternative (HA) hypotheses
  2. Start by assuming NO association exists in the population, i.e. start
    with H0
  3. Define what counts as sufficient evidence against H0: the significance
    level
  4. Collect some sample data from the population (evidence)
  5. Does the sample estimate provide sufficient evidence against H0
    (i.e. no association)?
    * Or, alternatively, could the sample estimate be explained by random
    error alone, i.e. is it consistent with the expected sampling variation if no
    association exists in the population?
  6. Calculate the value of the test statistic (using the sample)
  7. Using the test statistic, derive a probability that quantifies the
    evidence against H0: the p-value
  8. Interpret the p-value, often in the context of the significance level
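
The steps above can be walked through in code. The sketch below uses a two-sample z-test for a difference in means, which is just one possible test chosen for illustration; the data are simulated and the names `two_sample_z_test` and `ALPHA` are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def two_sample_z_test(group_a, group_b):
    """Return (mean difference, z statistic, two-sided p-value) under H0: no difference."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z = diff / se                                  # step 6: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # step 7: p-value
    return diff, z, p_value

# Steps 1-2: H0 = no difference in means between the groups; HA = a difference exists.
ALPHA = 0.05  # step 3: significance level

# Step 4: hypothetical sample data (simulated, not real measurements).
rng = random.Random(7)
group_a = [rng.gauss(130, 15) for _ in range(200)]
group_b = [rng.gauss(125, 15) for _ in range(200)]

diff, z, p = two_sample_z_test(group_a, group_b)

# Steps 5 and 8: compare the p-value with the significance level.
decision = "reject H0" if p < ALPHA else "fail to reject H0"
print(f"difference={diff:.2f}  z={z:.2f}  p-value={p:.4f}  decision: {decision}")
```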
27
Q

What is the p-value?

A
  • The probability of obtaining an association as strong as (or
    stronger than) the one observed in our sample, if in fact there is no
    association present in the source population (i.e. H0 is true)
    - The lower the p-value, the smaller the chance that we could have
    obtained an association this strong (or stronger) in our sample if no
    true association existed (in the population)
    - Thus, the lower the p-value, the stronger the case for rejecting H0
    (no association exists in the population) in favour of HA
    (an association exists in the population)
  • Generally, for a given sample size, the stronger the association (and the
    larger the value of the test statistic), the lower the p-value
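
The definition above can be made concrete with a permutation test, which estimates the p-value directly by asking how often random relabelling of the groups (what H0 implies) produces a difference at least as large as the observed one. This is a minimal sketch, not the method used in the course; the data are invented and `permutation_p_value` is a hypothetical helper.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate the two-sided p-value for a difference in means by permutation."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under H0 the group labels are interchangeable
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    # Share of relabellings with a difference as strong as (or stronger than) observed.
    return as_extreme / n_permutations

# Hypothetical measurements for two groups, invented for illustration only.
exposed = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7, 6.4, 6.0]
unexposed = [4.9, 5.2, 5.5, 4.8, 5.6, 5.0, 5.3, 5.1]
p_perm = permutation_p_value(exposed, unexposed)
print(f"permutation p-value = {p_perm:.4f}")
```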
28
Q

What’s the relationship between p-value and association?

A

Inverse: the stronger the association, the lower the p-value (for a given sample size).

29
Q

What is the significance level?

A

It is the pre-specified cut-off that defines what counts as sufficient evidence against H0, i.e. how low the p-value must be.
* Often a significance level of 5% is chosen and therefore a
p-value of <0.05 is used to infer statistical significance
* An estimate with a p-value of <0.05 is deemed statistically
significant

30
Q

Rejecting or not rejecting the Null hypothesis based on the p-value

A
  • The p-value is used as evidence for rejecting or not rejecting
    the Null hypothesis H0 in favour of the alternative HA
  • If the p-value is <0.05 (or whatever the chosen significance
    level was), we reject H0 (no association in the population)
  • If the p-value is ≥0.05 (or whatever the chosen significance
    level was), we cannot reject H0 (no association in the
    population)
31
Q

IMPORTANT TO REMEMBER

A
  • In hypothesis testing:
  • We either have, or do not have, enough evidence to “reject H0”
  • We can only either “reject H0” or “fail to reject H0”
  • We cannot confirm whether HA or H0 is true
32
Q

What can we expect to happen to the p-value if we assume an association exists?

A

Generally, if we assume the presence of an association in
the source population:
• larger sample sizes will give smaller p-values
• estimates of larger magnitude will also give
smaller p-values

33
Q

What 2 factors affect the p-value?

A

(Just like with 95% CI)
1. Sample size
2. Magnitude of association
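
Both factors can be seen by computing the p-value for a fixed mean difference at several sample sizes. This is a sketch assuming a two-sided z-test with two equal-sized groups and a common standard deviation; the numbers and the function name `p_value_for_mean_difference` are hypothetical.

```python
import math
from statistics import NormalDist

def p_value_for_mean_difference(diff, sd, n_per_group):
    """Two-sided z-test p-value for a mean difference between two equal-sized groups."""
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference between two means
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical scenario: common SD of 10, two magnitudes, three sample sizes.
for diff in (2, 5):
    for n in (20, 200, 2000):
        p = p_value_for_mean_difference(diff, 10, n)
        print(f"difference={diff}  n per group={n:5d}  p-value={p:.4f}")
```

Reading down the output, the p-value falls as the sample size grows for the same difference, and falls as the difference grows for the same sample size.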

34
Q

What two methods can be used to decide whether or not to reject the Null hypothesis?

A
  1. p-value
  2. 95% Confidence Interval
35
Q

How can 95% CI be used to reject or not the Null Hypothesis?

A

* Mean difference: If the 95% CI includes 0 then H0
cannot be rejected. This is because 0 (meaning no
difference between the two means) is a plausible value
in the source population.
* Regression coefficient and correlation coefficient: If
the 95% CI includes 0 then H0 cannot be rejected.
This is because 0 (meaning no correlation between
the two variables and a slope of 0) is a plausible value
in the source population.
* Odds Ratio/Risk Ratio/Rate Ratio: If the 95% CI
includes 1 then H0 cannot be rejected. This is
because 1 (meaning equal risks, rates or odds
between the two groups) is a plausible value in the
source population.
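
The rule above amounts to checking whether the interval contains the "no association" value for the measure in question. A minimal sketch, with invented intervals and hypothetical names `NULL_VALUES` and `can_reject_h0`:

```python
# The value that means "no association" for each type of estimate.
NULL_VALUES = {
    "mean difference": 0.0,
    "regression coefficient": 0.0,
    "correlation coefficient": 0.0,
    "odds ratio": 1.0,
    "risk ratio": 1.0,
    "rate ratio": 1.0,
}

def can_reject_h0(lower, upper, measure):
    """Return True if the 95% CI (lower, upper) excludes the null value for this measure."""
    null = NULL_VALUES[measure]
    return not (lower <= null <= upper)

# Hypothetical intervals, invented for illustration only.
examples = [
    ("odds ratio", 1.2, 2.5),        # excludes 1
    ("risk ratio", 0.8, 1.6),        # includes 1
    ("mean difference", -0.4, 1.3),  # includes 0
]
for measure, lower, upper in examples:
    verdict = "reject H0" if can_reject_h0(lower, upper, measure) else "cannot reject H0"
    print(f"{measure}: 95% CI ({lower}, {upper}) -> {verdict}")
```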

36
Q

Statistical Significance Summarised

A
  • In cases where the p-value of an estimate is ≥0.05 or
    when the 95% Confidence Interval includes 0 (mean
    difference, regression coefficient, correlation
    coefficient) or 1 (Odds Ratio/Risk Ratio/Rate Ratio),
    then the estimate is considered not statistically
    significant, thus the study does not provide sufficient
    evidence to reject H0.
  • In cases where the p-value of an estimate is <0.05 or
    when the 95% Confidence Interval does not include 0
    (mean difference, regression coefficient, correlation
    coefficient) or 1 (Odds Ratio/Risk Ratio/Rate Ratio),
    then the estimate is considered statistically significant,
    thus the study provides sufficient evidence to reject H0
    at the 5% significance level.
37
Q

LOOK AT EXAMPLES

A

IN SLIDES