Data Collection Flashcards

1
Q

what is data collection

A

Data collection is the systematic process of gathering and measuring information from various sources to answer research questions, test hypotheses, and make inferences.

2
Q

what is a sampling unit

A

Sampling unit: an individual object, animal, or person, on which measurements can be made.

3
Q

what is a target population

A

The target population is the entire group of sampling units we want to study or make inferences about.

4
Q

what is a census

A

A census measures every individual in the target population. It is often an official survey conducted by governments to gather demographic data.

5
Q

3 advantages of a census

A

Provides complete and accurate data.

No sampling error since everyone is included.

Useful for policy-making and resource allocation.

6
Q

What are the disadvantages of a census?

A

Expensive and time-consuming.

Difficult to access the entire population.

Data may become outdated by the time analysis is complete.

7
Q

what is a sampling protocol or design

A

The procedure or strategy used to select sampling units from the target population.

8
Q

what is a sample

A

A subset of individuals or sampling units selected from a target population for analysis to estimate parameters or test hypotheses.

9
Q

what is a variable

A

A variable is a characteristic of each sampling unit that is measured (e.g. age, blood group, voting preference), usually denoted by lowercase Roman letters (e.g. 𝑥, 𝑦).

10
Q

what is a parameter

A

A parameter is a numerical summary of a variable for a population, usually represented by Greek letters (e.g. 𝜇 for the true mean).

11
Q

what is a statistic/estimate

A

A statistic (or estimate) is a numerical summary of a variable for a sample, often used to estimate a population parameter (e.g. the sample mean x̄ estimates 𝜇).

12
Q

4 data collection methods

A

Censuses – measuring the entire target population.

Polls and surveys – collecting responses from a sample.

Randomized designed experiments – manipulating variables under controlled conditions.

Observational studies – collecting data without intervention.

13
Q

what is a survey

A

A survey is the process of collecting data from a sample in order to obtain information about the whole population.

14
Q

what is an opinion poll

A

An opinion poll assesses public opinion by questioning a random or representative sample. Often used for election forecasting.

15
Q

Why use a survey instead of a census? (3)

A
  • Cheaper
  • Faster
  • More practical (accessing the entire population may be difficult or impossible)
16
Q

what is sampling error

A

Sampling error is the variation in a statistic from one sample to the next: different samples from the same population give different estimates. It is unavoidable unless a census is taken.

17
Q

Why is random sampling important? (4)

A

Gives each member of the population an equal chance of selection.

Reduces bias.

Allows calculation of sampling error.

Tends to give a more representative sample, especially as the sample size increases.

18
Q

What are accuracy, precision, and bias in statistics?

A

Accurate – Sample statistic is similar to the population parameter.

Precise – Statistic is consistent across multiple samples. A lack of precision may arise from sampling error e.g. where sample sizes are very small.

Biased – implies that the sample statistic tends to differ from the population parameter in a consistent way (there is a systematic error)
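
The three ideas above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (the integers 1 to 1000), the sample sizes, the number of repeats and the "top half only" flawed selection are all made up for illustration and are not part of the original cards.

```python
import random
import statistics

rng = random.Random(42)                # fixed seed so the run is repeatable
population = list(range(1, 1001))      # hypothetical population, true mean 500.5
true_mean = statistics.mean(population)

def sample_means(units, n, repeats=500):
    """Means of `repeats` independent simple random samples of size n."""
    return [statistics.mean(rng.sample(units, n)) for _ in range(repeats)]

small = sample_means(population, 5)              # unbiased, but imprecise (tiny samples)
large = sample_means(population, 100)            # unbiased and more precise
top_half = sample_means(population[500:], 100)   # flawed selection: only large values

for label, means in [("n=5", small), ("n=100", large), ("biased", top_half)]:
    bias = statistics.mean(means) - true_mean    # systematic difference from the parameter
    spread = statistics.stdev(means)             # variation between samples (precision)
    print(f"{label:>7}: average error {bias:8.1f}, spread {spread:6.1f}")
```

The small samples should scatter widely around the true mean (low precision), the larger samples should cluster near it, and the flawed selection should miss it in the same direction every time (bias), however consistent its results are.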

19
Q

what is the goal in sampling

A

To select a sample that reflects the variation in the whole population without sampling the entire population.

20
Q

Why is careful sampling important?

A

Poor data collection can lead to flawed conclusions.

A well-chosen sample allows for accurate and robust decisions.

Uncertainty is inherent in sampling, so methods must minimize errors.

21
Q

3 different sample strategies

A

Simple random sampling

Systematic random sampling

Stratified random sampling

22
Q

what is simple random sampling

A

A method where each individual in the population has an equal chance of being selected.

23
Q

What is the formula for the probability of selection in simple random sampling?

A

The chance, or probability, of being selected in a sample of size 𝑛 from a population of size 𝑁 is:

chance of selection = 𝑛/𝑁

24
Q

What is the probability that a given student is selected in a sample of 20 students from a class of 130?

A

20/130 = 0.1538 = 15.38%

25
Q

Given a University of St Andrews population of 13,484, what is the probability of being selected in a sample of 650?

A

650/13,484 = 0.0482 = 4.82%
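
The two worked answers above can be checked with a few lines of Python. This is a minimal sketch: the function names and the numbered class list are illustrative assumptions, and `random.sample` from the standard library is used to draw the simple random sample without replacement.

```python
import random

def selection_probability(n: int, N: int) -> float:
    """Chance that any one unit is selected in a simple random sample of size n from N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return n / N

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units, each equally likely to be chosen, without replacement."""
    rng = random.Random(seed)
    return rng.sample(units, n)

print(f"{selection_probability(20, 130):.2%}")      # class example: 15.38%
print(f"{selection_probability(650, 13484):.2%}")   # university example: 4.82%

class_list = list(range(1, 131))    # hypothetical student numbers 1..130
print(sorted(simple_random_sample(class_list, 20, seed=1)))
```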

26
Q

what is systematic random sampling

A

A method where a sample is selected at regular intervals after a random start.

27
Q

What are the advantages of systematic sampling?

A

Easier to implement (only one random number needed).

Ensures even distribution of the sample across the population.

Works well for time-based selection (e.g., traffic monitoring).

28
Q

How do you calculate the fixed periodic interval (𝑘) in systematic sampling?

A

𝑘 = 𝑁/𝑛

𝑁 = population size
𝑛 = sample size

29
Q

What is the fixed periodic interval 𝑘 for a sample of 650 from 13,484 individuals?

If the random start is 𝑞=9, what are the first three individuals selected using systematic sampling?

A

𝑘 = 13,484 / 650 = 20.74 ≈ 21
So, every 21st individual is selected after a random start.

𝑞, 𝑞+𝑘, 𝑞+2𝑘
9, 9+21, 9+2(21)
The first three selected individuals are 9, 30, and 51.

30
Q

what is the process of systematic random sampling

A

Suppose there are 𝑁=1,000 sampling units in the target population and we want to take a sample of size 𝑛=20. A systematic sample can be selected as follows:

  1. Calculate 𝑘, the fixed periodic interval. This is the interval between successive samples. 𝑘=𝑁/𝑛 e.g. 𝑘=1000/20=50
  2. Randomly pick a starting number from 1 to 𝑘, inclusive, call it 𝑞. For this example, we want a number from 1 to 50, say 3 was chosen at random, 𝑞=3.
  3. Sample the 𝑞th individual, then the (𝑞+𝑘)th, then the (𝑞+2𝑘)th and so on. Therefore, the sample is 3, 53, 103, 153, …, 903, 953
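
The three steps above can be sketched in Python as follows. The function name and its arguments are illustrative assumptions; the interval is rounded to the nearest whole number, as in the university example, so the number of units actually selected can differ slightly from 𝑛 when 𝑁/𝑛 is not a whole number.

```python
import random

def systematic_sample(N, n, q=None, seed=None):
    """Positions (1-based) selected by systematic sampling from N units."""
    k = round(N / n)                            # step 1: fixed periodic interval
    if q is None:
        q = random.Random(seed).randint(1, k)   # step 2: random start in 1..k inclusive
    # step 3: take the q-th, (q+k)-th, (q+2k)-th, ... unit, at most n of them
    return list(range(q, N + 1, k))[:n]

card_example = systematic_sample(1000, 20, q=3)
print(card_example[:4], "...", card_example[-2:])   # [3, 53, 103, 153] ... [903, 953]

university = systematic_sample(13484, 650, q=9)
print(university[:3])                               # [9, 30, 51]
print(len(university))                              # 642, fewer than 650 because k was rounded up
```
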
31
Q

What is a potential issue with systematic sampling?

A

If the population has periodic variation and the fixed interval k matches the pattern, the sample could be biased.
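
A small made-up example shows how this can happen. The daily visitor counts below are hypothetical (quiet weekdays, busy weekends); with an interval of 𝑘 = 7 every selected day falls on the same weekday, whereas an interval that does not match the weekly cycle lands on all weekdays in turn.

```python
# Hypothetical daily visitor counts over 52 weeks: weekends are busier.
weekly_pattern = [100, 100, 100, 100, 100, 300, 300]   # Mon..Sun
population = weekly_pattern * 52                         # 364 days

def systematic_values(values, k, q):
    """Values of the q-th, (q+k)-th, (q+2k)-th, ... units (1-based positions)."""
    return values[q - 1::k]

def mean(values):
    return sum(values) / len(values)

true_mean = mean(population)                         # about 157.1
aligned = systematic_values(population, k=7, q=6)    # every 7th day, starting on a Saturday
offset = systematic_values(population, k=5, q=6)     # interval that does not match the cycle

print(round(true_mean, 1), mean(aligned), round(mean(offset), 1))
```

With 𝑘 = 7 the sample contains only Saturdays, so its mean is 300, far above the true mean; with 𝑘 = 5 the sample mean is close to the true value.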

32
Q

what is stratified random sampling

A

A method where the population is divided into strata (categories), and random samples are taken from each stratum.

33
Q

3 advantages of stratified random sampling

A

Ensures representation of all groups.

Provides better precision than simple random sampling.

Can be more convenient to organise

34
Q

How do you calculate the proportion of a stratum (𝑝ᵢ)?

A

𝑝ᵢ = 𝑁ᵢ/𝑁

where 𝑁ᵢ is the total number of units in stratum 𝑖 and 𝑁 is the population size.

35
Q

How do you calculate the sample size for a stratum (𝑛ᵢ)?

A

𝑛ᵢ = 𝑛 × 𝑝ᵢ

𝑛 = total sample size
𝑝ᵢ = proportion of stratum 𝑖

36
Q

How many staff members should be sampled from a total sample of 650, if there are 3,250 staff in a 13,484-person university?

A

Calculate proportion of staff: 3,250/13,484 = 0.241

Calculate sample size for staff: 650 × 0.241 = 156.65 ≈ 157

157 staff members should be sampled.
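
The same proportional allocation can be written as a short Python sketch. The function name is illustrative, and the second stratum ("students", taken here as everyone who is not staff) is an assumption added so that the strata cover the whole population; the card itself only gives the staff count.

```python
def proportional_allocation(stratum_sizes, n):
    """Sample size per stratum: n_i = n * N_i / N, rounded to whole units."""
    N = sum(stratum_sizes.values())
    return {name: round(n * N_i / N) for name, N_i in stratum_sizes.items()}

strata = {
    "staff": 3250,
    "students": 13484 - 3250,   # assumption: the rest of the 13,484 people
}
print(proportional_allocation(strata, 650))   # {'staff': 157, 'students': 493}
```

Because each stratum size is rounded separately, the allocated sizes will not always add up to exactly 𝑛; here they do (157 + 493 = 650).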

37
Q

why are sampling errors unavoidable

A

A sample statistic will almost always differ from the true population parameter it is used to estimate.

This difference arises from random variation between samples.

Ideally, random variation is the only source of any difference between the sample and the truth.

38
Q

6 non-sampling errors

A
  • selection bias
  • non-response bias
  • self-selection bias
  • question effects
  • survey format
  • behavioural considerations
39
Q

what is selection bias

A

A non-representative sample due to flawed selection methods.

40
Q

what is non-response bias

A

Bias that arises when selected participants do not respond and non-respondents tend to differ from those who do respond.

41
Q

what is self-selection bias

A

Bias that arises when people choose whether to participate, and those who do may differ from those who don’t.

Self-selection bias can be a problem in more ‘serious’ studies, because much behavioural research on people can only use volunteers, for ethical reasons.

42
Q

what is question bias

A

The way questions are worded affects responses.

43
Q

4 ways to prevent question bias

A

Fixed wording helps reduce bias.

Questions should not lead or prompt respondents.

Logical order prevents confusion.

Keep the questionnaire short: respondents may get tired or bored if there are too many questions.

44
Q

How can survey format affect responses?

A

The way a survey is conducted (postal, online, telephone, in-person) can influence answers.

45
Q

What are some behavioural factors that influence survey responses?

A

Social stigma (e.g., “Have you ever been arrested?”)

Social prestige (e.g., “How much do you earn?”)

Social acceptability (e.g., “How much alcohol do you drink per week?”)

Misunderstanding or misremembering questions

46
Q

How can we improve survey accuracy?

A

Ask participants to keep a daily diary for accurate reporting.

Use neutral wording to avoid social desirability bias.

Pre-test the questionnaire for clarity and consistency.

47
Q

what is an experiment

A

A study where conditions are controlled to test the effect of a treatment.

48
Q

What are some examples of designed experiments?

A

Medical study: Testing if a drug improves survival rates.

Physics study: Measuring how heat affects electrical resistance.

Agricultural study: Checking if fertilizer increases crop yield.

49
Q

What two types of variation exist in experiments?

A

Variation due to treatment (effect of drug, heat, fertilizer, etc.)

Natural variation (differences between individuals, external factors)

50
Q

what is a randomised designed experiment

A

A randomised experiment is where the researcher randomly allocates treatments to sampling units to ensure groups are similar, and any differences in response can be attributed to the treatment.

51
Q

What is the purpose of random allocation in an experiment?

A

Random allocation ensures that treatment groups are similar on average, avoiding bias and ensuring that differences in response can be attributed to the treatment.

52
Q

What is the role of randomisation in an experiment?

A

Randomisation ensures treatment groups are similar, eliminating selection bias and improving the credibility of results.

53
Q

What is replication in a randomised designed experiment?

A

Replication involves repeating the experiment multiple times to:

Assess natural variation.

Increase precision (more replicates = more precise but higher cost).

54
Q

What is blocking in experiments?

A

Blocking involves partitioning sampling units into different strata or groups (e.g., male/female) before random allocation to treatments. This reduces natural variability, improving precision.
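
Blocking followed by random allocation can be sketched as below. This is an illustration only: the function name, the eight subjects and the two treatment labels are hypothetical, and treatments are simply dealt out in turn within each shuffled block so that every block is split as evenly as possible.

```python
import random

def blocked_allocation(units, block_of, treatments=("treatment", "control"), seed=None):
    """Randomly allocate treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:                       # partition units into blocks
        blocks.setdefault(block_of(unit), []).append(unit)
    allocation = {}
    for members in blocks.values():
        rng.shuffle(members)                 # random order within the block
        for i, unit in enumerate(members):   # deal treatments out in turn
            allocation[unit] = treatments[i % len(treatments)]
    return allocation

# Hypothetical subjects as (id, sex) pairs; sex is the blocking factor.
subjects = [("s1", "F"), ("s2", "F"), ("s3", "F"), ("s4", "F"),
            ("s5", "M"), ("s6", "M"), ("s7", "M"), ("s8", "M")]

allocation = blocked_allocation(subjects, block_of=lambda s: s[1], seed=0)
for subject, group in allocation.items():
    print(subject, group)
```

Each block contributes equally to both groups, so a difference between male and female subjects cannot be mistaken for a treatment effect.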

55
Q

what is a placebo

A

A placebo is a substance with no active effect (e.g., sugar pills) given to a control group to avoid psychological effects influencing the results.

56
Q

what is double blinding in clinical trials

A

The doctor doesn’t know which treatment is being given.

The patient doesn’t know which treatment they are receiving.

57
Q

two methods used in clinical trials

A

placebo
double blinding

58
Q

What is the significance of randomised experiments in establishing causation?

A

If a randomised experiment shows a significant effect, it provides strong evidence for causation, i.e., the treatment caused the observed effect.

59
Q

What was the design of Jonas Salk’s polio vaccine trial in 1954?

A

The trial used a stratified randomised design, where children were randomly allocated to the treatment or control group.

Treatment group: Received the vaccine.
Control group: Received a saline solution.
Double-blind: Neither the participants nor the doctors knew which treatment they received.
Results: The vaccine was shown to be effective in reducing polio cases.

60
Q

What are the 5 key components of a randomised designed experiment?

A

Randomisation: Assign subjects randomly to treatment groups.

Replication: Repeat the treatment on multiple subjects to assess natural variation.

Control group: A group that does not receive the active treatment.

Blocking/Stratification: Experimental units may first be divided into groups with similar characteristics (e.g. on the basis of age) before randomisation.

Causality: A significant result can suggest causality between treatment and effect.

61
Q

what is an observational study, and how does it differ from a randomised experiment?

A

In an observational study, the researcher does not control conditions; they simply observe and measure variables.

In a randomised experiment, the researcher manipulates the variables and controls conditions to establish causality.

62
Q

what is a confounding variable

A

A confounding variable is a factor, often unmeasured, that affects both the potential cause and the effect, making it difficult to determine the true relationship between the variables.

63
Q

challenge in observational studies

A

Confounding variables can make it difficult to argue for causality.

64
Q

why are observational studies used as evidence for an effect

A

It may be unethical to carry out a randomised experiment

It may be difficult to carry out a randomised experiment

65
Q

What are the 3 types of observational studies?

A
  • cohort studies
  • case-control studies
  • cross-sectional studies
66
Q

what is a cohort study

A

A cohort is any group of people who are linked in some way. In a cohort study, researchers track the group over time and compare outcomes based on exposure to a certain variable.

67
Q

what is a case-control study

A

Compares individuals with a health issue (cases) to those without it (controls) to study exposure.

68
Q

what is a cross-sectional study

A

Measures variables at a specific point in time to understand prevalence.

69
Q

what is a Retrospective Cohort Study

A

Looks at historical data to assess past exposure and outcomes.

70
Q

what can cohort studies be

A

prospective or retrospective

71
Q

what are prospective cohort studies

A

In a prospective cohort study, none of the subjects have the outcome of interest (e.g. disease) when the study commences; the subjects are followed over a period of time to determine whether the disease develops.

72
Q

what do case control studies tend to be

A

Case-control studies tend to be retrospective and examine previous exposure in relation to the outcome.

73
Q

what is almost impossible to do with an observational study

A

infer causation because it is difficult to exclude the possible effects of confounding variables.

74
Q

advantages of observational studies (5)

A

cheaper

less time-consuming

effects can be investigated that would be unethical to manipulate

they reflect real-world scenarios

they are suited for long-term studies or assessing trends over time.

75
Q

advantages of experiments (5)

A

easier to establish causality. Changes can be more confidently attributed to the treatment.

in a controlled environment the influence of confounding variables is reduced.

often easier to perform again, increasing the ability to compare and confirm results.

minimise bias (with randomisation) to ensure that treatment groups are comparable.

allow the investigation of variables that might not occur naturally.