L4 Statistical techniques and sampling designs Flashcards

1
Q

Descriptive statistics

A

Methods of summarizing the data in an informative way

  • central tendency: mean, median, mode
  • dispersion: range, standard deviation, variance, interquartile range
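
As a minimal sketch of the measures listed above, the snippet below computes them with Python's standard `statistics` module. The `scores` list is made-up illustrative data, not from the course material.

```python
import statistics

# Hypothetical sample: exam scores of nine students (illustrative data only).
scores = [4, 6, 6, 7, 7, 7, 8, 9, 10]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Dispersion
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 in the denominator)
stdev = statistics.stdev(scores)        # square root of the sample variance
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} variance={variance:.2f} stdev={stdev:.2f} iqr={iqr}")
```

Note that `variance` and `stdev` are the sample versions; `statistics.pvariance` and `statistics.pstdev` would be the choice when the data cover the whole population.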
2
Q

Inferential statistics

A
Methods to draw conclusions (or to make inferences, test hypotheses)
• Mean difference test
• Chi-square test
• Analysis of variance (ANOVA)
• Regression analysis
• Logit analysis
3
Q

Four types of scales

A
  • Nominal (qualitative)
  • Ordinal (qualitative)
  • Interval (quantitative)
  • ratio (quantitative)
4
Q

Nominal scale

A

allows classifying data into groups/categories

e.g. gender

5
Q

Ordinal scale

A

Rank-orders the categories in a meaningful way

e.g. education level

6
Q

Interval scale

A

Meaningful differences between values, but no natural zero point –> zero still means something (0 degrees Celsius is a temperature, not "no temperature")

7
Q

Ratio scale

A

Meaningful differences and ratios between values due to a natural zero point –> zero is actually nothing (0 dollars is no money)

8
Q

Choosing between inferential statistics:

IV=nominal/ordinal DV=nominal/ordinal

A

Chi-square test

9
Q

Choosing between inferential statistics:

IV=nominal/ordinal DV=interval/ratio

A

t-test, ANOVA

10
Q

Choosing between inferential statistics:

IV=interval/ratio DV=nominal/ordinal

A

logit analysis

11
Q

Choosing between inferential statistics:

IV=interval/ratio DV=interval/ratio

A

regression analysis

12
Q

When to perform t-test vs ANOVA

A

t-test –> compare two means (two levels of IV)

ANOVA –> compare more than two means (more than two levels of IV)
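
The selection rules on the cards above (scale of IV and DV, then number of IV levels) can be written down as a small lookup. This is only a sketch of the flashcard rules; the function name `choose_test` and its parameters are hypothetical, and real test selection also depends on assumptions the cards do not cover.

```python
CATEGORICAL = {"nominal", "ordinal"}
METRIC = {"interval", "ratio"}


def choose_test(iv_scale, dv_scale, iv_levels=None):
    """Return the test suggested by the flashcard rules (hypothetical helper)."""
    iv, dv = iv_scale.lower(), dv_scale.lower()
    for scale in (iv, dv):
        if scale not in CATEGORICAL | METRIC:
            raise ValueError(f"unknown scale: {scale!r}")
    if iv_levels is not None and iv_levels < 2:
        raise ValueError("an IV needs at least two levels")

    if iv in CATEGORICAL and dv in CATEGORICAL:
        return "chi-square test"
    if iv in CATEGORICAL:  # DV is interval/ratio
        if iv_levels is None:
            return "t-test or ANOVA"
        return "t-test" if iv_levels == 2 else "ANOVA"
    if dv in CATEGORICAL:  # IV is interval/ratio
        return "logit analysis"
    return "regression analysis"


print(choose_test("nominal", "ratio", iv_levels=3))  # ANOVA
print(choose_test("interval", "nominal"))            # logit analysis
```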

13
Q

Rating scales

A
  • Likert scale: strongly disagree ... strongly agree
  • Semantic differential: cold ... warm

Strictly ordinal, but TREATED AS INTERVAL/RATIO so that you can use regression

14
Q

What is a population?

A

Entire group of people, firms, events, or things of interest for which you would like to make inferences

15
Q

What is a sample?

A

A subset of the population of interest

16
Q

What is a subject?

A

A single member of the sample

17
Q

What is low representativeness?

A

= properties of the population are over- or underrepresented in the sample
= high sampling error

18
Q

The sampling process

A
  1. define population
  2. determine sampling frame
  3. determine sampling design
  4. determine sample size
19
Q
  1. define population
A

e.g. TISEM students, Dutch organ donors

20
Q
  2. determine sampling frame
A

“Physical” representation of the target population

- the list or register through which you can reach population members, e.g. Donorregister

21
Q

coverage error

A

sampling frame ≠ population
• Under-coverage: true population members are excluded
• Mis-coverage: non-population members are included
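
A small sketch of the two kinds of coverage error, using set differences. The names in `population` and `sampling_frame` are invented for illustration.

```python
# Hypothetical target population and sampling frame (illustrative names only).
population = {"Ann", "Ben", "Cas", "Dee", "Eva"}
sampling_frame = {"Ben", "Cas", "Dee", "Fay", "Gus"}

# Under-coverage: true population members missing from the frame.
under_coverage = population - sampling_frame

# Mis-coverage: frame entries that do not belong to the population.
mis_coverage = sampling_frame - population

print(sorted(under_coverage))  # ['Ann', 'Eva']
print(sorted(mis_coverage))    # ['Fay', 'Gus']
```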

22
Q

solutions to coverage error

A
  • If small, recognize but ignore
  • If large, redefine the population in terms of the sampling frame

23
Q
  3. determine sampling design
A

probability vs non-probability sampling

24
Q

Probability sampling

A

Each element of the population has a known chance of being selected as a subject

–>Results generalizable to population
BUT more time and resource intensive

25
Q

Nonprobability sampling

A

The elements of the population do not have a known chance of being selected as a subject

–> less time and resource intensive
BUT results not generalizable to population

26
Q

Probability sampling techniques

A
  • Simple random sampling (SRS)
  • Systematic sampling
  • Stratified sampling
  • Cluster sampling
27
Q

Simple random sampling (SRS)

A

Each population element has an equal chance of being chosen
e.g. out of a hat

–> Highest generalizability
BUT can be costly
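
A minimal sketch of simple random sampling with the standard `random` module. The sampling frame `population` (200 made-up student IDs), the sample size, and the seed are all assumptions for illustration.

```python
import random

# Hypothetical sampling frame: 200 student IDs.
population = [f"student_{i:03d}" for i in range(1, 201)]

rng = random.Random(42)  # fixed seed only so the draw is repeatable

# Draw 20 subjects without replacement; every element is equally likely.
sample = rng.sample(population, k=20)

print(sample[:5])
```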

28
Q

Systematic sampling

A

Select random starting point and then pick every nth element

–> simplicity
BUT low generalizability if the list has a periodic pattern that coincides with the sampling interval (every nth element differs systematically from the rest)
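
The procedure above can be sketched as follows. The helper `systematic_sample` is a hypothetical name, and the interval is simply the frame size divided by the desired sample size (rounded down).

```python
import random


def systematic_sample(frame, n, rng):
    """Pick a random start, then every k-th element (hypothetical helper)."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n       # sampling interval
    start = rng.randrange(k)  # random starting point within the first interval
    return frame[start::k][:n]


frame = list(range(200))      # hypothetical ordered sampling frame
sample = systematic_sample(frame, 20, random.Random(1))
print(sample)
```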

29
Q

Stratified sampling

A

Divide the population into meaningful (homogeneous) groups, then apply SRS within each group
e.g. level of income

–> All groups are adequately sampled, allowing for group comparisons
BUT more time-consuming and requires homogeneous subgroups
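
A sketch of stratified sampling under simple assumptions: the population is a list of made-up `(id, income_level)` pairs, and the same number of subjects is drawn by SRS from each stratum. `stratified_sample` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units into strata, then draw an SRS within each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, k=min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }


# Hypothetical population: 90 people spread evenly over three income levels.
levels = ("low", "middle", "high")
people = [(i, levels[i % 3]) for i in range(90)]

sample = stratified_sample(people, lambda p: p[1], 5, random.Random(0))
for level, members in sample.items():
    print(level, [person_id for person_id, _ in members])
```

Drawing a fixed number per stratum is one design choice; drawing in proportion to each stratum's size is another.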

30
Q

Cluster sampling

A

Divide the population into heterogeneous groups, randomly select a number of groups and select each member within these groups
e.g. geographic clusters (areas)

–> Convenient when the population is geographically spread (geographic clusters)
BUT naturally occurring clusters are typically more homogeneous than heterogeneous
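
A sketch of cluster sampling as described above: whole clusters are drawn at random and every member of a chosen cluster is included. The `clusters` dictionary and the helper name `cluster_sample` are assumptions for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick clusters and keep all members of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical geographic clusters with their members.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}

sample = cluster_sample(clusters, 2, random.Random(7))
print(sample)
```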

31
Q

Nonprobability sampling techniques

A
  • Convenience sampling
  • Quota sampling
  • Judgment sampling
  • Snowball sampling
32
Q

Convenience sampling

A

Select subjects who are conveniently available
e.g. passers-by on the street

–> Convenient (inexpensive and fast)
BUT lower generalizability

33
Q

Quota sampling

A

Fix a quota for each subgroup (e.g. its percentage in the population)

–> When minority participation is critical
BUT lower generalizability
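
A sketch of quota sampling under stated assumptions: quotas follow made-up population shares, and respondents are accepted in the order they happen to arrive until each quota is full. Both helper names are hypothetical.

```python
from collections import Counter


def compute_quotas(population_shares, sample_size):
    """Turn population shares into a quota per subgroup."""
    return {g: round(share * sample_size) for g, share in population_shares.items()}


def fill_quotas(respondents, group_of, quotas):
    """Accept respondents in arrival order until each quota is full."""
    remaining = dict(quotas)
    accepted = []
    for respondent in respondents:
        group = group_of(respondent)
        if remaining.get(group, 0) > 0:
            accepted.append(respondent)
            remaining[group] -= 1
    return accepted


# Hypothetical shares and a hypothetical stream of available respondents.
shares = {"under_30": 0.25, "30_to_59": 0.5, "60_plus": 0.25}
quotas = compute_quotas(shares, sample_size=8)
groups = ("under_30", "30_to_59", "60_plus")
respondents = [(i, groups[i % 3]) for i in range(30)]

accepted = fill_quotas(respondents, lambda r: r[1], quotas)
print(quotas)
print(Counter(group for _, group in accepted))
```

Selection within each subgroup is by availability, not by chance, which is why this remains a nonprobability design.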

34
Q

Judgment sampling

A

Select subjects based on their knowledge/professional judgment
e.g. experts

–> Convenient (inexpensive and fast) when a limited number of people have the information you need
BUT lower generalizability

35
Q

Snowball sampling

A

“Do you know people who…”
e.g. people with rare disease

–> For rare characteristics (“experts”)
BUT first participants strongly influence the sample

36
Q

Rules of thumb for sample size

A

• Sample size ≥ 75, < 500
• Multivariate research: ≥ 10 x parameters to be estimated
• Subsamples (e.g., male/female): ≥ 30 per subsample
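
The three rules of thumb above can be checked mechanically. This is only a sketch; `check_sample_size` and its parameter names are hypothetical, and the thresholds are taken directly from the card.

```python
def check_sample_size(n, n_parameters=0, subsample_sizes=None):
    """Return the rules of thumb that are violated (empty list = all met)."""
    issues = []
    if n < 75:
        issues.append("total sample below 75")
    if n >= 500:
        issues.append("total sample of 500 or more")
    if n_parameters and n < 10 * n_parameters:
        issues.append(f"fewer than {10 * n_parameters} observations for {n_parameters} parameters")
    for name, size in (subsample_sizes or {}).items():
        if size < 30:
            issues.append(f"subsample '{name}' below 30")
    return issues


ok = check_sample_size(120, n_parameters=8, subsample_sizes={"male": 70, "female": 50})
too_small = check_sample_size(60, n_parameters=8, subsample_sizes={"male": 40, "female": 20})
print(ok)         # []
print(too_small)  # three violated rules
```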