Sampling Error and Bias Flashcards

1
Q

Why is the sampling distribution important?

A

In practice we rarely draw many samples; we estimate the population parameter from a single sample or a small number of samples. Our point estimate is one draw from a theoretical sampling distribution, and the variation of that distribution depends on sample size (larger samples give less variation).

2
Q

What is a sampling distribution?

A

A sampling distribution is a probability distribution of a statistic obtained through a large number of samples drawn from a specific population.

3
Q

What is the central limit theorem?

A

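It tells us that the sampling distribution of a statistic such as the mean will approximate a normal distribution, whatever the shape of the population distribution, provided the sample size is sufficient, the sample is representative and the sampling is random.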
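
A minimal simulation sketch of this idea, assuming a skewed (exponential) population chosen purely for illustration; the function name, seed and sample sizes are hypothetical. It suggests that the means of larger samples vary less and that roughly 95% of them fall within 1.96 standard errors of the population mean, as a normal approximation would predict.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

def sample_means(n, draws=2000):
    """Means of `draws` random samples of size n from a skewed population.

    The population is exponential with mean 1 and sd 1 (an assumption
    chosen for illustration; it is clearly not normal).
    """
    return [
        statistics.mean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(draws)
    ]

small = sample_means(2)    # sampling distribution of the mean, n = 2
large = sample_means(50)   # sampling distribution of the mean, n = 50

# Share of the n = 50 sample means lying within 1.96 standard errors of
# the population mean; close to 0.95 if the normal approximation holds.
se = 1 / math.sqrt(50)
within = sum(abs(m - 1.0) <= 1.96 * se for m in large) / len(large)

print(statistics.stdev(small), statistics.stdev(large), within)
```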

4
Q

What is a confidence interval?

A

Defines a range within which we estimate the true population value will fall, accepting some error (e.g. a 95% level of confidence). Its width is 2 x ME (one margin of error either side of the point estimate).

5
Q

What does a 95% confidence level mean?

A

If the sampling were repeated many times, about 95% of the confidence intervals constructed would contain the true value. We accept a 5% likelihood that our interval does not.

6
Q

What is margin of error?

A

The amount added to and subtracted from the point estimate (e.g. the mean) to construct the confidence interval. For a 95% confidence level, ME = SE x 1.96.
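
A short worked sketch of that construction for a mean, assuming SE = sd / root n and z = 1.96. The function name and the data are hypothetical, and with a sample this small a t-value would normally replace 1.96.

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """Return (mean, SE, ME, (lower, upper)) for a list of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    me = z * se                     # margin of error
    return mean, se, me, (mean - me, mean + me)

# Hypothetical measurements, for illustration only
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
mean, se, me, (lower, upper) = mean_confidence_interval(sample)
print(f"mean={mean:.2f}  SE={se:.3f}  ME={me:.3f}  CI=({lower:.2f}, {upper:.2f})")
```

The interval is symmetric about the mean and its total width is 2 x ME.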

7
Q

What is the standard error?

A

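The standard deviation of the sampling distribution: a measure of how much our sample estimate is likely to differ from the true population value. For a mean, SE = sd / root n.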

8
Q

How would you get a precise estimate, with a narrow confidence interval?

A

Increase sample size

9
Q

When do we use t-scores?

A

When dealing with small samples (<40), in place of z-scores and the normal distribution.

10
Q

What do we have to do when calculating confidence interval for RR and OR?

A

We must log-transform the estimate, construct the confidence interval on the log scale, and then antilog the limits, because RR and OR do not follow a normal distribution.
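
A minimal sketch of this for a risk ratio from a 2x2 table, assuming the usual large-sample standard error of ln(RR). The function name and the counts are hypothetical.

```python
import math

def risk_ratio_ci(a, b, c, d, z=1.96):
    """95% CI for a risk ratio via the log scale.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR), large-sample approximation
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    log_rr = math.log(rr)                     # log-transform the estimate
    lower = math.exp(log_rr - z * se_log_rr)  # antilog the lower limit
    upper = math.exp(log_rr + z * se_log_rr)  # antilog the upper limit
    return rr, lower, upper

# Hypothetical counts, for illustration only
rr, lower, upper = risk_ratio_ci(30, 70, 15, 85)
print(f"RR={rr:.2f}  95% CI=({lower:.2f}, {upper:.2f})")
```

The resulting interval is symmetric on the log scale but not on the original scale: the upper limit lies further from the RR than the lower limit does.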

11
Q

Define sampling frame.

A

The actual list of units in the survey population from which the sample is drawn, compiled after the inclusion and exclusion criteria have been determined.

12
Q

Define sampling fraction.

A

Ratio between sample size and population size.

13
Q

What is systematic error?

A

The sample is not representative of the population because of inaccuracy in the sampling design or the measurement procedures. It is a form of bias. It is predictable and, once identified, can be avoided. Systematic errors will likely not form a normal distribution.

14
Q

What is random error?

A

Not predictable. Caused by natural fluctuations in the sampling or measurement process. When random errors are plotted as a histogram they tend to form an approximately normal distribution.

15
Q

Describe the process of simple random sampling.

A

Identify survey population, create sampling frame, list eligible units, number them, determine sample size needed, randomly draw units (random number generator).
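
The drawing step can be sketched as below, assuming the sampling frame is already available as a list. The frame contents, function name and seed are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement.

    Every unit in the frame has the same probability of selection.
    """
    rng = random.Random(seed)  # seed only to make the example repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame of 500 numbered units
frame = [f"unit-{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, 25, seed=1)
print(sample[:5])
```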

16
Q

What are the advantages of simple random sampling?

A

Simple; sampling error is easily measured; every unit in the frame has an equal probability of being selected.

17
Q

What are limitations of simple random sampling?

A

Requires a list of all units; a list taken from records may not represent the population (e.g. a telephone directory excludes people without a telephone); logistical challenge (time and cost); important minority groups may be missed by chance.

18
Q

Describe systematic sampling.

A

Identify the survey population, create the sampling frame, arrange the units in a sequence (e.g. alphabetically), determine the sample size, divide the sampling population by the sample size to get the sampling interval, choose a random starting point, then draw units at regular intervals.
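
A minimal sketch of the interval and random-start steps, assuming an ordered frame held in a list and a frame size that is a whole multiple of the sample size. The function name and the frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Draw n units at a regular interval after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    rng = random.Random(seed)
    interval = len(frame) // n       # sampling interval
    start = rng.randrange(interval)  # random starting point within the first interval
    return [frame[start + i * interval] for i in range(n)]

# Hypothetical ordered frame of 500 units, sample of 25 (interval = 20)
frame = list(range(1, 501))
sample = systematic_sample(frame, 25, seed=3)
print(sample[:5])
```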

19
Q

Advantages of systematic sampling.

A

Simple, easy to implement, sampling error easily determined, helps ensure representativeness.

20
Q

Limitations of systematic sampling.

A

Needs a complete list that is representative of the target population; patterns in the ordering sequence can increase the probability of some units being selected.

21
Q

Describe cluster sampling.

A

1. List the potential clusters, e.g. all schools in a state.
2. List the units in each cluster.
3. Calculate the systematic sampling interval (cumulative population / number of desired clusters), e.g. say it is 738.
4. Choose a random start number between 1 and 738.
5. Select the remaining clusters by repeatedly adding the interval to the start number.

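These steps can be sketched as below, assuming each cluster's population is known and that a cluster is selected when a target number falls within its cumulative population range. The function name and the school sizes are hypothetical (chosen so the interval works out at 738), and a very large cluster can be selected more than once.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select_clusters(cluster_sizes, n_clusters, seed=None):
    """Systematic selection of clusters along the cumulative population."""
    names = list(cluster_sizes)
    cumulative = list(accumulate(cluster_sizes[name] for name in names))
    interval = cumulative[-1] // n_clusters   # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, interval)          # random start between 1 and the interval
    targets = [start + i * interval for i in range(n_clusters)]
    # A cluster is chosen when a target lands inside its cumulative range
    chosen = [names[bisect_left(cumulative, t)] for t in targets]
    return interval, chosen

# Hypothetical school populations (total 2214, so 3 clusters gives interval 738)
schools = {"A": 300, "B": 520, "C": 180, "D": 410, "E": 264, "F": 540}
interval, chosen = select_clusters(schools, 3, seed=7)
print(interval, chosen)
```
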
22
Q

Advantages of cluster sampling.

A

complete list of units not needed, less travel, within clusters all units have equal probability of being selected

23
Q

Limitations of cluster sampling.

A

Units within a cluster tend to be similar (positive covariance within a cluster), which can introduce bias and increases sampling (standard) error.

24
Q

Describe stratified sampling.

A

Stratify the sampling frame into homogeneous sub-populations (strata), then draw a sample randomly from each stratum.

25
Q

Advantages of stratified sampling.

A

Info on subgroups, increased precision so can have a smaller sample, economical, can have several strata

26
Q

Limitations of stratified sampling.

A

More administrative effort to classify every unit into a category; a participant may fall into several sub-groups; sampling error is harder to measure; sample size at stratum level may be low (high random error and loss of precision).

27
Q

What is the sampling fraction?

A

Used in stratified sampling so that each stratum is sampled in proportion to its size. Sampling fraction = (sample size / population size) x 100.
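
A minimal sketch of applying one sampling fraction to every stratum, assuming each stratum is available as a list of units. The strata, function name and seed are hypothetical.

```python
import random

def proportional_stratified_sample(strata, sampling_fraction_pct, seed=None):
    """Draw the same percentage of units at random from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * sampling_fraction_pct / 100)
        sample[name] = rng.sample(units, n)   # random draw within the stratum
    return sample

# Hypothetical strata: 600 urban and 400 rural units, 10% sampling fraction
strata = {
    "urban": [f"urban-{i}" for i in range(600)],
    "rural": [f"rural-{i}" for i in range(400)],
}
sample = proportional_stratified_sample(strata, 10, seed=5)
total_sampled = sum(len(units) for units in sample.values())
print({name: len(units) for name, units in sample.items()}, total_sampled)
```

Here the overall sampling fraction is 100 / 1000 x 100 = 10%, and each stratum contributes in proportion to its size.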

28
Q

What is multistage sampling?

A

Uses a combination of methods, e.g. 1. identify the primary sampling units (clusters); 2. select sampling units from within each cluster.

29
Q

What is the optimal sampling method if precision and reduction of sampling error were the priorities?

A

stratified random sampling with replacement

30
Q

What is the optimal sampling method if logistics is a consideration or if the study is on an intervention targeted at community level?

A

Cluster sampling

31
Q

What is the objective of an estimation study?

A

Estimate a population parameter (mean or prevalence) from a sample.

32
Q

What is the objective of a comparative study?

A

Compare groups to assess whether there is any statistically or clinically significant difference between them (expressed as a hypothesis test).

33
Q

What do sample size calculations presume?

A

That simple random sampling is used and that only random error is present.

34
Q

What do we need to know for sample size calculations in surveys?

A

Confidence level (z statistic, 1.96 for 95%); the sd (if estimating a mean) or the expected proportion/prevalence (if estimating a single proportion); and the desired precision (e.g. 0.05).
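
A sketch of how these inputs combine, assuming the standard formulas n = z^2 x p x (1 - p) / d^2 for a single proportion and n = (z x sd / d)^2 for a mean, where d is the precision. The function names and input values are hypothetical.

```python
import math

def n_for_proportion(p, precision=0.05, z=1.96):
    """Sample size to estimate a single proportion p to within +/- precision."""
    return math.ceil(z**2 * p * (1 - p) / precision**2)

def n_for_mean(sd, precision, z=1.96):
    """Sample size to estimate a mean to within +/- precision."""
    return math.ceil((z * sd / precision) ** 2)

# Hypothetical inputs: expected prevalence of 50%, precision of 5 percentage points
print(n_for_proportion(0.5))            # 385
# Hypothetical inputs: sd of 10 units, precision of 2 units
print(n_for_mean(sd=10, precision=2))   # 97
```

Both functions round up, since a sample cannot contain a fraction of a unit.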

35
Q

What do we need to know to calculate sample size of a comparative study?

A

Threshold for a significant result (0.05), power (0.8 or 0.9), baseline level for one of the groups (estimated from previous studies), and minimum effect size (minimum RR or OR).

36
Q

What is power?

A

The probability of correctly rejecting the null hypothesis when it is false (1 - beta, where beta is the probability of a type II error).

37
Q

How would you boost the power of a comparative study?

A

Increase the significance threshold (alpha, e.g. from 0.05 to 0.10), increase the effect size, reduce variation, use a one-tailed test.

38
Q

Describe the properties of the normal distribution?

A

Symmetrical shape; deviations away from the centre are equally likely in the + or - direction; the mean and median lie directly in the centre.
For continuous normal distributions the probabilities of all possible outcomes are represented by the area under the curve.

39
Q

How does a standard Normal distribution differ from normal distribution?

A

Standard normal is referenced to a standardised scale where mean=0 and variance=1.

40
Q

What is a z-score?

A

A value on the standard normal distribution, calculated as (x - mean) / sd. It tells us how many sd's an observation lies above or below the mean, which allows comparison of distributions expressed in different units.
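
A small worked sketch of comparing observations measured in different units. The function name and the reference values are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical reference values, for illustration only
height_z = z_score(180, mean=170, sd=10)   # height in cm
weight_z = z_score(80, mean=70, sd=5)      # weight in kg
print(height_z, weight_z)                  # 1.0 2.0
```

On the standardised scale the weight (2 sd above its mean) is more unusual than the height (1 sd above its mean), even though both are 10 units above their means.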

41
Q

The 95% reference range lies within how many sd and what z score?

A

Approximately + and - 2 sd from the mean; more precisely, z-scores between -1.96 and +1.96.
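
These figures can be checked with the `NormalDist` class in Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# z-score that leaves 2.5% in the upper tail (so 95% lies between -z and +z)
z_95 = standard_normal.inv_cdf(0.975)

# Proportion of the distribution lying between -1.96 and +1.96
coverage_196 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Proportion lying within 2 sd of the mean
coverage_2sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(z_95, 2), round(coverage_196, 3), round(coverage_2sd, 3))
```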

42
Q

When do you use a Student's t-test?

A

When sample sizes are small or the sd of the population is not known.

43
Q

How is the t-distribution used and what does its shape look like?

A

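Used in the t-test, to construct a CI for the difference between 2 population means, and in linear regression analysis. It is bell shaped like the normal distribution but with a lower peak and heavier (longer) tails; it approaches the normal distribution as sample size increases.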

44
Q

How does a t-score differ from a z-score?

A

t = (sample mean - population mean) / (sd / root n)

It uses the sd of the sample, not of the population.
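
A minimal worked sketch of this formula, assuming x is the sample mean and "mean" is the hypothesised population mean. The function name and the data are hypothetical.

```python
import math
import statistics

def t_score(values, population_mean):
    """t = (sample mean - population mean) / (sample sd / root n)."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)   # sd of the sample, not the population
    return (sample_mean - population_mean) / (sample_sd / math.sqrt(n))

# Hypothetical sample of 5 measurements tested against a population mean of 5.0
values = [5.1, 4.9, 5.3, 5.2, 5.0]
t = t_score(values, population_mean=5.0)
print(round(t, 3))
```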