Week 5 - Sampling and random error & Week 6 - Statistical significance Flashcards

Question 1

Q

What is a sample?

Answer

A

A sample is a selected subset of a source population.
Ideally, it should be representative of that source population.

Question 2

Q

What is a source population?

Answer

A

The source population is the group of all
individuals in which we are interested to assess some parameter(s)

Question 3

Q

What is sampling?

Answer

A

The process of selecting a number of
individuals from all individuals found in a source population
Many different sampling methods.

Question 4

Q

What is a sampling frame?

Answer

A

A list (or database) of all individuals in a population,
from which the sample is drawn
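As a rough illustration of how a sampling frame is used, the sketch below draws a simple random sample from a made-up frame. The frame contents, the function name simple_random_sample and the sample size are assumptions for illustration only; simple random sampling is just one of many sampling methods.

```python
import random

# Hypothetical sampling frame: one ID per individual in the source population.
sampling_frame = [f"person_{i:04d}" for i in range(1, 1001)]

def simple_random_sample(frame, n, seed=None):
    """Select n distinct sampling units; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

sample = simple_random_sample(sampling_frame, n=50, seed=1)
print(len(sample), sample[:3])
```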

Question 5

Q

What are sampling units?

Answer

A

The sampling units are the individuals who could potentially
be selected.
Sampling units are most often individual people, but larger
sampling units are also possible (e.g. families, streets,
hospitals, schools, etc.)

Question 6

Q

Who can be part of the source population?

Answer

A

The source population can be the general population (e.g.
the total population of a country or city), but can also be a
specific sub-population (e.g. all smokers of a country, all
patients with heart disease, all children with cancer, etc.)

Question 7

Q

Describe what the sample should represent for each type of research.

Answer

A

1. In descriptive research (e.g. when we want to investigate
prevalence/incidence of a condition in a population), it is
particularly important that the sample accurately
represents the specific source population
2. In analytic research (e.g. when we investigate the association
between exposure and outcome), we can be more general
regarding the source population, depending on the
research question of interest
3. In situations where we investigate a biological effect on
some disease (e.g. effect of smoking on risk of cancer), we
can be more general in identifying the source population
(i.e. not necessarily restricted to a specific country/region)
4. In situations where we investigate social/cultural effects
(e.g. effect of social class on risk of heart disease), we have
to be more careful and restrict the source population to the
specific country/region from which the sample was derived

Question 8

Q

What is an estimate?

Answer

A

In order to determine the proportion of a characteristic in a
population, we usually measure it in a sample
Therefore what we measure is an ESTIMATE. This estimate
carries an inherent error (sampling error)
The sample estimate attempts to quantify the
corresponding population parameter

Question 9

Q

What is statistical inference?

Answer

A

*When the sample estimate is used to draw conclusions
(inferences) about the population from which the sample
was taken, this is called STATISTICAL INFERENCE
*Statistical inference, as the name suggests, involves the use
of statistics to determine the degree of uncertainty in the
estimate of interest

Question 10

Q

What is a parameter?

Answer

A

A parameter is a measurement of a quantity (or association)
in a population that we are interested in, e.g.:
mean age
prevalence of obesity
mean difference in blood pressure between men and women
Odds Ratio for association between smoking and cancer

Question 11

Q

What is an example of a population parameter and a sample estimate for a given variable?

Answer

A

Sample estimate mean = 3.75
Population parameter = 3.72

Question 12

Q

What is sampling variation?

Answer

A

The difference (variation) between different sample
estimates derived from the same source population

Question 13

Q

What is sampling error?

Answer

A

The difference in magnitude between the sample estimates
and the actual population parameter caused by measuring a
quantity (or association) in a sample rather than in the
source population
Also called “random error”, because it depends on chance

Question 14

Q

What happens when you decrease sample size?

Answer

A

Sampling variation increases
Sampling error increases
NB! All principles covered thus far apply to all measures of frequency and association (incidence, risk ratio, rate ratio, mean difference, correlation coefficient, regression coefficient); all are termed ‘estimates’ calculated from a ‘sample’.
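A small simulation can make the effect of sample size concrete. This is only a sketch: the population (10,000 made-up values), the sample sizes 10 and 400, and the helper names are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical source population: 10,000 values with mean near 50 and SD near 10.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)  # the population parameter

def summarise(n, repeats=500):
    """Draw `repeats` samples of size n; return (sampling variation, mean absolute sampling error)."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    variation = statistics.stdev(means)  # spread of the sample estimates
    error = statistics.mean([abs(m - population_mean) for m in means])
    return variation, error

small = summarise(10)    # small samples
large = summarise(400)   # large samples
print("n = 10 :", small)
print("n = 400:", large)
```

With the smaller sample size, both the spread of the estimates and their typical distance from the population mean come out larger.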

Question 15

Q

What is sampling distribution?

Answer

A

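The distribution of the estimates obtained from many repeated samples of the same size drawn from the same source population, e.g. plotted on a histogram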
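One way to picture this is to simulate many samples and print a text histogram of the estimates. The sketch below assumes a made-up true prevalence of 30%, samples of 100 people and 1,000 repeats; none of these numbers come from the course material.

```python
import random
from collections import Counter

rng = random.Random(7)
TRUE_PREVALENCE = 0.30   # assumed population parameter
N = 100                  # size of each sample
REPEATS = 1000           # number of repeated samples

def sample_prevalence(n):
    """Prevalence estimate from one simulated sample of n individuals."""
    cases = sum(rng.random() < TRUE_PREVALENCE for _ in range(n))
    return cases / n

estimates = [sample_prevalence(N) for _ in range(REPEATS)]

# Group the estimates into bins of width 0.02 and print a text histogram.
bins = Counter((round(e * N) // 2 * 2) / N for e in estimates)
for lower in sorted(bins):
    print(f"{lower:.2f} {'#' * (bins[lower] // 5)}")
```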

Question 16

Q

Sampling distribution for very small sample size

Question 17

Q

Sampling distribution for larger sample size

Question 18

Q

Sampling distribution for very large sample size

Question 19

Q

What is standard error?

Answer

A

The standard error describes the uncertainty of how well
the sample estimate represents the population parameter
It essentially estimates the standard deviation of the
sampling distribution, i.e. the average error that can occur
whenever we take a sample of a certain size n

Question 20

Q

What is the standard error formula?

Answer

A

Standard error can be estimated from a single (!) sample
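The formula itself is not reproduced on this card. As a sketch, the usual textbook formulas are SE of a mean = s / sqrt(n) and SE of a proportion = sqrt(p(1 - p) / n); whether the slides used exactly these is an assumption. The function names and the data below are made up for illustration.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SE of a sample mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def standard_error_of_proportion(p, n):
    """SE of a sample proportion p estimated from n individuals."""
    return math.sqrt(p * (1 - p) / n)

ages = [34, 41, 29, 50, 38, 45, 33, 47, 39, 44]  # made-up single sample
print(standard_error_of_mean(ages))
print(standard_error_of_proportion(0.30, 100))
```

Both functions need only one sample, which is the point the card makes.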

Question 21

Q

What is the 95% Confidence Interval?

Answer

A

Confidence intervals indicate a range (interval) within which
we are confident (with some degree of uncertainty) that the
true population parameter lies
The 95% Confidence Interval (95% CI) for a sample estimate is
calculated as:
Lower confidence limit = sample estimate - 1.96 × standard error
Upper confidence limit = sample estimate + 1.96 × standard error
Interpretation: We are 95% confident that
the population parameter is contained within the interval
sample estimate ± 1.96 SE
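The calculation can be sketched in a few lines. The function name confidence_interval_95 and the standard error of 0.10 are assumptions for illustration; 3.75 is borrowed from the sample estimate used on an earlier card.

```python
def confidence_interval_95(estimate, standard_error):
    """Return (lower limit, upper limit) of the 95% CI: estimate +/- 1.96 * SE."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin

# Sample estimate from an earlier card, with a hypothetical standard error.
lower, upper = confidence_interval_95(3.75, 0.10)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```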

Question 22

Q

What are the two things we assess when we assess associations?

Answer

A

The presence of
an association and the magnitude of this association

Question 24

Q

What are the 2 possibilities for any given association?

Answer

A

The association does not exist in the population
(i.e. the two variables are not linked)
The association exists in the population (the two
variables are linked)

Question 25

Q

What are the two hypotheses about an association called?

Answer

A

The Null hypothesis (H0) always states that there is no
association between the two variables in the population
The Alternative hypothesis (HA) always states that there
is an association between the two variables in the
population

Question 26

Q

Explain the formal process of hypothesis testing.

Answer

A

Define the statistical null (H0) and alternative (HA) hypotheses
Start by assuming NO association exists in the population ➔ i.e. start
with H0
Define what counts as sufficient evidence against H0: the significance
level
Collect some sample data from the population (evidence)
Does the sample estimate provide sufficient evidence against H0
(i.e. no association)?
* Or alternatively, could the sample estimate be explained by random
error alone, i.e. is it consistent with the expected sampling variation if no
association exists in the population?
Calculate the value of the test statistic (using the sample)
Using the test statistic, derive a probability that quantifies the evidence
against H0: the p-value
Interpret the p-value, often in the context of the significance level

Question 27

Q

What is the p-value?

Answer

A

The probability of obtaining an association as strong as (or
stronger than) the one observed in our sample, if in fact there is no
association present in the source population (i.e. H0 is true)
- The lower the p-value, the lower the chance that we could have
obtained an association this strong (or stronger) in our sample if no
true association existed in the population
- Thus, the lower the p-value, the stronger the case for rejecting H0
(no association exists in population) in favour of HA (association exists
in population)
Generally, the stronger the association (and the
larger the value of the test statistic), the lower the p-value
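As a sketch of how a test statistic is turned into a p-value, the code below assumes the test statistic follows a standard normal distribution when H0 is true (a z-test). The cards do not name a specific test, so this choice and the function name two_sided_p_value are assumptions.

```python
import math

def two_sided_p_value(z):
    """Probability of a test statistic at least as extreme as z (in either direction) if H0 is true."""
    return math.erfc(abs(z) / math.sqrt(2))

# Larger test statistics (stronger associations) give lower p-values.
for z in (0.5, 1.0, 1.96, 3.0):
    print(f"z = {z:.2f} -> p = {two_sided_p_value(z):.4f}")
```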

Question 28

Q

What’s the relationship between p-value and association?

Answer

A

Inverse relationship: for a given sample size, the stronger the association, the lower the p-value.

Question 29

Q

What is the significance level?

Answer

A

It is the cut-off that defines what counts as sufficient evidence against H0, i.e. how low the p-value must be.
* Often a significance level of 5% is chosen and therefore a
p-value of <0.05 is used to infer statistical significance
* An estimate with a p-value of <0.05 is deemed statistically
significant

Question 30

Q

Rejecting or not the Null Hypothesis based on p-value

Answer

A

The p-value is used as evidence for rejecting or not rejecting
the Null hypothesis H0 in favour of the alternative HA
If the p-value is <0.05 (or whatever the chosen significance
level was), we reject H0 (no association in the population)
If the p-value is ≥0.05 (or whatever the chosen significance
level was), we cannot reject H0 (no association in the
population)
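The decision rule can be written as a tiny function. The name decide and the returned wording are illustrative assumptions; the default 0.05 is the commonly chosen significance level.

```python
def decide(p_value, significance_level=0.05):
    """Apply the decision rule: reject H0 only if the p-value is below the significance level."""
    if p_value < significance_level:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))   # below 0.05
print(decide(0.05))   # not below 0.05
```

Note that the function never returns "accept H0": failing to reject is not the same as confirming that no association exists.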

Question 31

Q

IMPORTANT TO REMEMBER

Answer

A

In hypothesis testing …
We either have, or do not have, enough evidence to “reject H0”
We can only either “reject H0” or “fail to reject H0”
We cannot confirm whether HA or H0 is true

Question 32

Q

What can we expect to happen to the p-value if we assume an association exists?

Answer

A

Generally, if we assume the presence of an association in
the source population:
o large sample sizes will give smaller p-values
o estimates of large magnitude will also give
smaller p-values
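Both effects can be seen in a short calculation. The sketch assumes a z-test for a difference between two group means with a known common standard deviation of 10; the test choice, the numbers and the function names are all assumptions for illustration.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def p_value_for_mean_difference(difference, sd, n_per_group):
    """z-test for a difference between two group means, assuming a known common SD."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return two_sided_p_value(difference / standard_error)

# Vary the sample size and the magnitude of the difference.
for n in (20, 200):
    for difference in (2, 5):
        p = p_value_for_mean_difference(difference, sd=10, n_per_group=n)
        print(f"n per group = {n:3d}, difference = {difference}: p = {p:.4f}")
```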

Question 33

Q

What 2 factors affect the p-value?

Answer

A

(Just like with 95% CI)
1. Sample size
2. Magnitude of association

Question 34

Q

What two tools can be used to decide whether to reject the Null hypothesis?

Answer

A

p-value
95% Confidence Interval

Question 35

Q

How can the 95% CI be used to decide whether to reject the Null Hypothesis?

Answer

A

* Mean difference: If the 95% CI includes 0 then H0
cannot be rejected. This is because 0 (meaning no
difference between the two means) is a plausible value
in the source population
* Regression coefficient and correlation coefficient: If
the 95% CI includes 0 then H0 cannot be rejected.
This is because 0 (meaning no correlation between
the two variables and a slope of 0) is a plausible value
in the source population
* Odds Ratio/Risk Ratio/Rate Ratio: If the 95% CI
includes 1 then H0 cannot be rejected. This is
because 1 (meaning equal risks, rates or odds
between the two groups) is a plausible value in the
source population
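The same rule can be sketched as a lookup of the "no association" value for each measure. The dictionary NULL_VALUES, the function name reject_h0_from_ci and the example intervals are made up for illustration.

```python
# Value that means "no association" for each type of estimate.
NULL_VALUES = {
    "mean difference": 0,
    "regression coefficient": 0,
    "correlation coefficient": 0,
    "odds ratio": 1,
    "risk ratio": 1,
    "rate ratio": 1,
}

def reject_h0_from_ci(measure, lower, upper):
    """Reject H0 only if the 95% CI does not include the null value for this measure."""
    null_value = NULL_VALUES[measure]
    return not (lower <= null_value <= upper)

print(reject_h0_from_ci("odds ratio", 1.2, 2.5))        # CI excludes 1
print(reject_h0_from_ci("mean difference", -0.4, 1.1))  # CI includes 0
```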

Question 36

Q

Statistical Significance Summarised

Answer

A

In cases where the p-value of an estimate is ≥0.05 or
when the 95% Confidence Intervals include 0 (mean
difference, regression coefficient, correlation
coefficient) or 1 (Odds Ratio/Risk Ratio/Rate Ratio),
then the estimate is considered not statistically
significant, thus the study finding is not conclusive.
In cases where the p-value of an estimate is <0.05 or
when the 95% Confidence Intervals do not include 0
(mean difference, regression coefficient, correlation
coefficient) or 1 (Odds Ratio/Risk Ratio/Rate Ratio),
then the estimate is considered statistically significant,
thus the study finding is conclusive

Question 37

Q

LOOK AT EXAMPLES

Answer

A

IN SLIDES