Week 10: Survey Analysis Flashcards

1
Q

What is the difference between a census and a sample?

A

Census: Information gathered on every member of the population
Sample: Information gathered on a subset of the population, used to represent the whole and make inferences about it

2
Q

Why use a sample instead of a census?

A
  • cost-effective
  • time-saving
  • reduces workload
  • useful when the population is large and difficult to survey entirely
3
Q

What is a sampling frame?

A

A list of all possible sampling units from which the sample is drawn. Ideally, the frame should match the target population/universe
Example sampling frames: Registered voters; UK residents
Example universes: Voters; UK population

4
Q

What are the characteristics of an ideal sampling frame?

A
  • adequate (covers all population units)
  • accurate (correct listing of all sampling units and organised logically with numerical identifiers)
  • complete (no omissions, duplications or extraneous units)
  • up-to-date
5
Q

What are the main types of sampling methods?

A

Probability sampling: Each unit has a known, non-zero chance of selection
Non-probability sampling: Not all units have a known chance of selection (some may have no chance of being selected at all)

6
Q

What is the difference between probability and non-probability sampling?

A

Probability sampling: Reduces selection bias and allows sampling error to be measured
Non-probability sampling: Easier and cheaper, but prone to selection bias

7
Q

What are common non-probability sampling methods?

A
  • convenience/haphazard sampling (units are selected arbitrarily; representativeness cannot be estimated)
  • purposive (judgement) sampling (units are selected subjectively to obtain a sample that appears to represent the population)
  • volunteer sampling (respondents volunteer and are screened for a set of characteristics needed for the survey, e.g., individuals with a particular disease; carries large selection biases)
  • quota sampling
  • expert sampling
  • snowball sampling
8
Q

List types of probability sampling methods

A
  • simple random sampling (SRS)
  • systematic sampling
  • stratified sampling
  • cluster sampling
  • multi-stage sampling
9
Q

Explain stratified sampling

A

The population is divided into strata (homogeneous, mutually exclusive groups), and independent samples are drawn from each stratum. This ensures better representation of subgroups and allows meaningful subgroup inferences
- assumes that units are more homogeneous within each stratum than across the population as a whole. This makes the design more efficient: smaller samples are needed from each stratum to get precise estimates for that stratum

10
Q

What is cluster sampling and how does it differ from stratified sampling?

A

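Cluster sampling: The population is divided into clusters, a random sample of clusters is selected, and all units in the selected clusters are sampled
Stratified sampling: Every stratum is included, and a sample of units is drawn from within each stratum (rather than whole groups being taken)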

11
Q

What is multi-stage sampling?

A

A combination of cluster and stratified sampling in which sampling is carried out in stages: clusters are selected first, then units are sampled within the selected clusters. In a two-stage design:
- clusters are referred to as primary sampling units (PSUs) and the units within clusters as secondary sampling units (SSUs)
- the sample size needed to obtain a given level of precision is still bigger than for an SRS (a less efficient method)

12
Q

What factors determine sample size?

A
  • acceptable significance level
  • study power
  • expected effect size
  • population variability (standard deviation)
    also study structure (descriptive/comparative); study design; resources and finances; non-response
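For a comparative study, these factors combine in the usual normal-approximation formula for the sample size per group when comparing two means, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ², where σ is the SD and δ the expected effect size. This is a sketch assuming a two-sided test, equal SDs in both groups and SRS; the function names are hypothetical:

```python
import math
from statistics import NormalDist


def n_per_group(alpha: float, power: float, sd: float, effect: float) -> int:
    """Approximate n per group for comparing two means (two-sided test, equal SDs)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the significance level
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect ** 2
    return math.ceil(n)                   # round up to whole participants


def inflate_for_nonresponse(n: int, nonresponse: float) -> int:
    """Scale n up so that the expected number of respondents is still n."""
    return math.ceil(n / (1 - nonresponse))


base = n_per_group(alpha=0.05, power=0.80, sd=1.0, effect=0.5)
print(base)                                   # 63
print(n_per_group(0.05, 0.80, 1.0, 1.0))      # larger effect size -> 16
print(n_per_group(0.05, 0.90, 1.0, 0.5))      # more power -> 85
print(n_per_group(0.05, 0.80, 2.0, 0.5))      # more variability -> 252
print(inflate_for_nonresponse(base, 0.20))    # 20% expected non-response -> 79
```

Each line changes one factor at a time, so the printed values show the direction of each effect listed on this card.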
13
Q

What are common errors associated with sampling?

A
  • coverage/sampling frame error (the sampling frame used does not properly represent the population under study)
  • sampling error (unavoidable; the degree to which a statistic differs from its “true” value because the survey was conducted on only one of many possible samples)
  • non-response error (selected sampling units are not interviewed or do not respond)
14
Q

Why is it important to consider sampling frame carefully?

A

A non-representative sampling frame can lead to sampling frame error (biased sample) which can cause inaccurate predictions

15
Q

What are the advantages of non-probability sampling?

A
  • quick and convenient
  • cost-effective
  • minimal respondent burden
16
Q

What are the disadvantages of non-probability sampling?

A
  • selection bias (the assumptions needed to make inferences about the population are too strong)
  • non-coverage bias
  • difficult to assess the quality of estimates
17
Q

What is the response rate and how is it calculated?

A

Response rate = (no. of respondents) / (total eligible sample)
- the denominator includes non-responders: refusals, language problems, illness, not available
- ineligible people should be removed from the sampling frame (and not included in the response rate)
- a low response rate can affect the study’s representativeness
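The calculation can be sketched as below, assuming ineligible units have already been identified; the function name and figures are hypothetical:

```python
def response_rate(respondents: int, sampled: int, ineligible: int) -> float:
    """Respondents / eligible sample; ineligible units are excluded from the denominator."""
    eligible = sampled - ineligible   # refusals, illness, etc. remain in the denominator
    if eligible <= 0:
        raise ValueError("no eligible units in the sample")
    return respondents / eligible


# Hypothetical: 1,200 units sampled, 100 found ineligible, 770 responded
print(f"{response_rate(770, 1200, 100):.1%}")   # 70.0%
```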

18
Q

Why is it important to account for survey design in data analysis?

A

Ignoring stratification or clustering may lead to underestimation of uncertainty and misleading results

19
Q

What are the initial stages of designing a survey?

A
  • define the population of interest
  • choose sample type
  • determine sample size
20
Q

What is a sampling unit?

A

The units into which the population is divided: a finite number of distinct and identifiable units, e.g.:
- Individuals
- Households
- Institutions (e.g. schools)
- Clusters (e.g. areas of a country; city blocks, etc.)
The choice of sampling unit influences research design, data collection and data analysis

21
Q

What are some issues with sampling frames?

A
  • availability
  • coverage (under-coverage; over-coverage; extraneous)
    e.g., who is missing in household surveys?
22
Q

Difference between parameter and statistic:

A

Parameter: The actual value of the population characteristic (known only in error-free censuses)
Statistic: Estimate of the parameter obtained from the sample (inference)

23
Q

Why do researchers generally prefer probability sampling methods?

A

They are considered to be more accurate and rigorous (they minimise the risk that the sample does not represent the population)
However, there may be circumstances where it is not feasible, practical or theoretically sensible to do random sampling

24
Q

Are non-probability samples useless?

A

No, but it is best to:
- compare your sample characteristics to those based on representative samples and/or censuses
- think carefully about biases the data collected may have and how this can affect results

25
Q

What is probability sampling?

A

Any sampling method utilising some form of random selection
Must set up some process that ensures that every element in the population has a known, non-zero probability of selection
- no guarantee that the sample will be representative, but “sampling error” can be measured
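Because selection probabilities are known, sampling error can be estimated from a single sample. A minimal sketch for a mean under SRS without replacement, using the standard estimator SE = sqrt((1 - n/N) · s² / n), where (1 - n/N) is the finite population correction; the data are made up and the 95% CI uses the normal approximation (1.96):

```python
import math
from statistics import mean, variance


def srs_mean_and_se(sample, population_size):
    """Sample mean and its estimated SE under SRS without replacement."""
    n = len(sample)
    fpc = 1 - n / population_size               # finite population correction
    se = math.sqrt(fpc * variance(sample) / n)  # variance() uses the n - 1 divisor
    return mean(sample), se


# Made-up data: 8 units drawn from a population of N = 80
est, se = srs_mean_and_se([2, 4, 4, 4, 5, 5, 7, 9], population_size=80)
low, high = est - 1.96 * se, est + 1.96 * se    # approximate 95% CI
print(f"mean = {est}, SE = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```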

26
Q

What is SRS?

A

Each member of the population has an equal chance of selection (=n/N)
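A minimal sketch of drawing an SRS with Python's random.Random.sample (sampling without replacement), using a hypothetical frame of 100 units; the simulation at the end checks that each unit's selection probability is about n/N:

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; each unit has inclusion probability n/N."""
    return rng.sample(frame, n)


rng = random.Random(42)                       # fixed seed for reproducibility
frame = [f"unit{i}" for i in range(1, 101)]   # hypothetical frame, N = 100
sample = simple_random_sample(frame, 10, rng)
print(sample)

# Check empirically that unit1 is selected in roughly n/N = 10% of samples
reps = 20_000
hits = sum("unit1" in simple_random_sample(frame, 10, rng) for _ in range(reps))
print(hits / reps)                            # close to 0.10
```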

27
Q

What is systematic random sampling?

A

Same principle as SRS; often used to select large samples from long lists:
- calculate the sampling interval (k = N/n)
- select a random number g (1 ≤ g ≤ k)
- starting from the gth unit, select every kth unit
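The three steps above can be sketched as follows, assuming for simplicity that N is an exact multiple of n (k = N // n). If it is not, this sketch truncates to n units, so units at the end of the list may never be selected. The frame and function name are hypothetical:

```python
import random


def systematic_sample(frame, n, rng):
    """Systematic sample: random start g in 1..k, then every kth unit."""
    N = len(frame)
    k = N // n                      # sampling interval (assumes N is a multiple of n)
    g = rng.randint(1, k)           # random start, 1 <= g <= k (inclusive)
    # The gth unit has list index g - 1; step through the list in jumps of k
    return [frame[i] for i in range(g - 1, N, k)][:n]


frame = list(range(1, 1001))        # hypothetical list of N = 1000 units
sample = systematic_sample(frame, 50, random.Random(7))
print(sample[:5], len(sample))      # interval k = 20
```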

28
Q

Example: We know in the frame there are 700 Whites, 200 Asians, and 100 Blacks. We can stratify the population and randomly select from each stratum. Specify how proportional and non-proportional stratified sampling differ in this case

A

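Proportional stratified sampling: The sample size in each stratum is proportional to the stratum's share of the population, e.g., for a total sample of 100:
- N1 = 700 Whites, sample n1 = 70
- N2 = 200 Asians, sample n2 = 20
- N3 = 100 Blacks, sample n3 = 10
Disproportional stratified sampling: Stratum sample sizes are not proportional to stratum sizes (here the smaller strata are oversampled):
- N1 = 700 Whites, sample n1 = 50
- N2 = 200 Asians, sample n2 = 25
- N3 = 100 Blacks, sample n3 = 25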
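Using the figures in this card with a total sample of 100, the allocations and each stratum's design weight N_h / n_h (the inverse of the selection probability within the stratum) can be computed as below. The function names are hypothetical. Under proportional allocation every unit gets the same weight, whereas the disproportional design needs weights to restore the population proportions in estimates:

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n across strata in proportion to stratum size N_h / N."""
    N = sum(stratum_sizes.values())
    # Simple rounding; for other sizes the rounded values may not sum exactly to n
    return {h: round(n * N_h / N) for h, N_h in stratum_sizes.items()}


def design_weights(stratum_sizes, allocation):
    """Weight for each sampled unit in stratum h: N_h / n_h (inverse selection probability)."""
    return {h: stratum_sizes[h] / allocation[h] for h in stratum_sizes}


strata = {"White": 700, "Asian": 200, "Black": 100}
proportional = proportional_allocation(strata, 100)
disproportional = {"White": 50, "Asian": 25, "Black": 25}

print(proportional)                             # {'White': 70, 'Asian': 20, 'Black': 10}
print(design_weights(strata, proportional))     # every weight is 10.0
print(design_weights(strata, disproportional))  # {'White': 14.0, 'Asian': 8.0, 'Black': 4.0}
```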

29
Q

Describe in detail cluster (random) sampling:

A

Often used for convenience to reduce costs when sample units are too spread out geographically
- population divided into groups
- examples include factories, schools, and geographic areas (postcode, electoral)
- clusters are randomly selected
- all units within selected clusters are included in sample
- better to survey many small clusters (instead of a few large ones)
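A minimal one-stage cluster sampling sketch, assuming the frame is held as a dictionary from hypothetical school IDs to their students. Clusters are chosen by SRS and every unit in a chosen cluster enters the sample, so each student's inclusion probability is m/M (here 5/20):

```python
import random


def cluster_sample(clusters, m, rng):
    """One-stage cluster sample: SRS of m clusters, keeping every unit in each."""
    chosen = rng.sample(sorted(clusters), m)   # random selection of cluster ids
    return {c: clusters[c] for c in chosen}    # all units within chosen clusters


# Hypothetical frame of 20 schools (clusters) with 30 students each
clusters = {f"school{i:02d}": [f"s{i:02d}_{j}" for j in range(30)] for i in range(20)}
sample = cluster_sample(clusters, 5, random.Random(3))
n_students = sum(len(units) for units in sample.values())
print(sorted(sample), n_students)              # 5 schools, 150 students
```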

30
Q

Suppose you wish to find out which sports Grade-11 students are participating in across Canada. It would be too costly and lengthy to survey every Canadian in Grade 11, or even a couple of students from every Grade 11 class in Canada. Describe how you would carry out cluster sampling, two-stage sampling, and stratified cluster sampling designs:

A

Cluster sampling: Randomly select 100 schools from across Canada and survey all Grade 11 students in all 100 clusters
Two-stage sampling design: Get a list of all Grade 11 students from the selected schools, and select a random sample of Grade 11 students from each school (schools: PSU; students: SSU)
Stratified cluster sampling: Stratify schools by size (small/medium/large) and select a random sample of schools from each stratum
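The two-stage design can be sketched in a similar way, with schools as PSUs and students as SSUs. This assumes SRS at both stages with a fixed number b of students per school; the school sizes and names are made up. Each student's weight is the inverse of P(school selected) × P(student selected | school), so students in larger schools receive larger weights:

```python
import random


def two_stage_sample(clusters, m, b, rng):
    """Stage 1: SRS of m PSUs. Stage 2: SRS of b SSUs within each selected PSU.

    Returns (psu, unit, weight) tuples, where weight = 1 / inclusion probability.
    Assumes every cluster has at least b units.
    """
    M = len(clusters)
    sample = []
    for psu in rng.sample(sorted(clusters), m):
        units = clusters[psu]
        p = (m / M) * (b / len(units))          # P(PSU chosen) * P(unit chosen | PSU)
        for unit in rng.sample(units, b):
            sample.append((psu, unit, 1 / p))
    return sample


# Hypothetical: 10 schools of different sizes (30 to 120 Grade 11 students)
schools = {f"school{i}": [f"st{i}_{j}" for j in range(30 + 10 * i)] for i in range(10)}
sample = two_stage_sample(schools, m=4, b=10, rng=random.Random(11))
for psu, unit, w in sample[:3]:
    print(psu, unit, round(w, 1))
```

Selecting schools with probability proportional to size (PPS) is a common alternative that evens out these weights; it is not shown here.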

31
Q

How big should a sample be?

A

If too small, we may fail to detect important effects or be too imprecise to draw meaningful conclusions. If too large, it may be a waste of time and resources

32
Q

What are some misconceptions around sample size?

A
  • it should be a set fraction of the population, e.g., 1% or 5%
  • it should be some set number (e.g., 1500 for a national survey)
33
Q

Describe the difference between type I and type II error

A

Type I error (alpha) occurs if we conclude that the data are not consistent with H0 (although H0 is true). Type II error (beta) occurs if we conclude that the data are consistent with H0, even though H0 is false

34
Q

What is effect size?

A

The difference between the value of the variable in the control group and that in the test/drug group. The larger the effect size, the smaller the sample size needed to detect it

35
Q

When considering effect size, what is meant by absolute/relative difference? Explain in context of group A having a weight loss of 20kg, and group B having a weight loss of 10kg

A

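Absolute effect size = 20kg - 10kg = 10kg
Relative effect size = 10kg / 20kg = 50% (group B lost 50% less weight than group A; taking group B as the reference, group A lost 100% more)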
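The relative difference depends on which group is the reference. A tiny sketch (hypothetical function name) showing both readings of the example:

```python
def effect_sizes(group, reference):
    """Absolute and relative difference of `group` versus `reference`."""
    absolute = group - reference
    relative = absolute / reference   # expressed as a fraction of the reference
    return absolute, relative


print(effect_sizes(10, 20))   # B vs A: (-10, -0.5) -> B lost 50% less than A
print(effect_sizes(20, 10))   # A vs B: (10, 1.0) -> A lost 100% more than B
```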

36
Q

How does sample size depend on anticipated variation in measures under study?

A

A smaller sample is needed if the population is more homogeneous (i.e., it has a smaller variance or SD)

37
Q

What is the calculation of sample size based on and when should it be calculated?

A

The sample size calculation is based on conventions around Type I and II errors and on assumptions about effect size and standard deviation. It is also influenced by practical issues, e.g., administrative issues, costs, and availability of patients/subjects
The sample size has to be calculated before initiating a study and should not be changed during the course of the study

38
Q

What if we use secondary data?

A

You cannot influence the sample size
You can calculate the study power, but post-hoc power is a function of the p-value and contains no additional helpful information. It may be used as a ‘follow-up’ analysis, particularly if a finding is non-significant, but reporting 95% CIs is a better option

39
Q

What implications do sampling methods have for t-tests and chi-square tests?

A

Formulas used in these tests are appropriate for data collected from an SRS.
If samples are stratified/clustered, units within the same PSU tend to be more similar to each other than units in an SRS, so they should not be treated as statistically independent. Correct inference requires taking these design features into account when analysing the data.

40
Q

What can affect the magnitude of the SEs, the value of test statistics and the width of confidence intervals?

A

Stratification and clustering. Conclusions drawn from analyses that do not account for complex survey design features may be misleading. Ignoring these features (especially clustering) typically leads to under-estimation of uncertainty in survey data (p-values too small; CIs too narrow)
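One way to quantify how much clustering inflates uncertainty is Kish's approximate design effect, deff ≈ 1 + (m - 1)ρ, where m is the average cluster size and ρ the intra-cluster correlation. This sketch assumes roughly equal cluster sizes, ignores stratification and weighting, and uses hypothetical values:

```python
import math


def kish_deff(avg_cluster_size, icc):
    """Kish approximation to the design effect from clustering: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


deff = kish_deff(avg_cluster_size=20, icc=0.05)   # hypothetical values
n = 1000
n_effective = n / deff       # SRS-equivalent sample size
se_ratio = math.sqrt(deff)   # design-based SE relative to the naive SRS SE
print(round(deff, 2), round(n_effective, 1), round(se_ratio, 2))   # 1.95 512.8 1.4
```

A design effect above 1 means the design-based SE is larger than the SE an SRS formula would report, which is why ignoring clustering makes p-values too small and CIs too narrow.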

41
Q

What is a primary sampling unit (PSU)?

A

The first unit sampled in the design; e.g., regions in a country may be sampled first, then schools within those regions. The region would be the PSU

42
Q

What is a stratum?

A

Stratification is a method of breaking up the sampling frame into different groups (often by gender, ethnicity, or SES); each group is a stratum. Once these groups have been defined, one samples from each group as if it were independent of all other groups

43
Q

What are weights?

A

The most common survey weight is the probability weight: the inverse of a unit's probability of being included in the sample due to the sampling design. The probability weight may be corrected for errors in the sampling frame, unit non-response, and under/over-representation. “Corrected” weights are also known as sampling weights
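A minimal sketch of how probability weights enter an estimate: each respondent's weight is 1 / (inclusion probability), and the weighted mean sum(w·y) / sum(w) is compared with the unweighted mean. The data are hypothetical, and no corrections for non-response or frame errors are applied:

```python
def weighted_mean(values, weights):
    """Weighted (ratio-type) estimate of a population mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: outcome (1 = yes) and inclusion probability
values = [1, 0, 1, 1, 0]
inclusion_prob = [0.10, 0.10, 0.50, 0.50, 0.25]
weights = [1 / p for p in inclusion_prob]   # probability weights: 10, 10, 2, 2, 4

print(sum(values) / len(values))            # unweighted: 0.6
print(weighted_mean(values, weights))       # weighted: 0.5
```

Units with a low chance of selection stand for more of the population, so they pull the weighted estimate towards their values.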

44
Q

How do we consider sampling design in practice?

A

Strata, PSUs and weights are stored as variables in the dataset, enabling users to account for the complex design

‘svyset’ command (Stata) is used to specify the survey design (PSUs and strata) as well as the weights. This informs the subsequent use of ‘svy’ commands.
‘svy: mean bmi’ accounts for the complex design
‘mean bmi’ assumes SRS
‘subpop()’ option: use it when analysing a subpopulation (rather than dropping observations) to avoid producing underestimated SEs