Data Collection Flashcards

Question 1

Q

What is data collection?

Answer

A

Data collection is the systematic process of gathering and measuring information from various sources to answer research questions, test hypotheses, and make inferences.

Question 2

Q

What is a sampling unit?

Answer

A

Sampling unit: an individual object, animal, or person, on which measurements can be made.

Question 3

Q

What is a target population?

Answer

A

The target population is the entire group of sampling units we want to study or make inferences about.

Question 4

Q

What is a census?

Answer

A

A census measures every individual in the target population. It is often an official survey conducted by governments to gather demographic data.

Question 5

Q

3 advantages of a census

Answer

A

Provides complete and accurate data.

No sampling error since everyone is included.

Useful for policy-making and resource allocation.

Question 6

Q

What are the disadvantages of a census?

Answer

A

Expensive and time-consuming.

Difficult to access the entire population.

Data may become outdated by the time analysis is complete.

Question 7

Q

What is a sampling protocol (or design)?

Answer

A

The procedure or strategy used to select sampling units from the target population.

Question 8

Q

What is a sample?

Answer

A

A subset of individuals or sampling units selected from a target population for analysis to estimate parameters or test hypotheses.

Question 9

Q

What is a variable?

Answer

A

A variable is a characteristic of each sampling unit that is measured (e.g., age, blood group, voting preference), usually denoted by lowercase Roman letters (e.g. 𝑥, 𝑦).

Question 10

Q

What is a parameter?

Answer

A

A parameter is a numerical summary of a variable for a population, usually represented by Greek letters (e.g. 𝜇 for the true mean).

Question 11

Q

What is a statistic (or estimate)?

Answer

A

A statistic (or estimate) is a numerical summary of a variable for a sample, often used to estimate a population parameter (e.g. 𝑥̄ estimates 𝜇).
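
A minimal Python sketch of the difference, using a small made-up population of ages (illustrative values, not real data): the parameter 𝜇 summarises every unit, while the statistic 𝑥̄ summarises a random sample and is used to estimate 𝜇.

```python
import random
import statistics

# Hypothetical population: the ages of 10 sampling units (illustrative values only)
population = [23, 35, 41, 19, 52, 47, 30, 28, 61, 38]

# Parameter: a numerical summary over the whole population (mu, the true mean)
mu = statistics.mean(population)  # 37.4 for this made-up population

# Statistic: the same summary computed on a random sample (x-bar), used to estimate mu
random.seed(1)
sample = random.sample(population, k=4)
x_bar = statistics.mean(sample)

print(f"mu = {mu}, x_bar = {x_bar}")
```

Different random samples give different values of 𝑥̄, while 𝜇 stays fixed.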

Question 12

Q

4 data collection methods

Answer

A

Censuses – measuring the entire target population.

Polls and surveys – collecting responses from a sample.

Randomized designed experiments – manipulating variables under controlled conditions.

Observational studies – collecting data without intervention.

Question 13

Q

What is a survey?

Answer

A

A survey is the process of collecting data from a sample in order to obtain information about the whole population.

Question 14

Q

What is an opinion poll?

Answer

A

An opinion poll assesses public opinion by questioning a random or representative sample. Opinion polls are often used for election forecasting.

Question 15

Q

Why use a survey instead of a census? (3)

Answer

A

Cheaper
Faster
More practical (accessing the entire population may be difficult or impossible)

Question 16

Q

What is sampling error?

Answer

A

Different samples from the same population give different values of a statistic. This variation between samples is called sampling error, and it is unavoidable unless a census is taken.

Question 17

Q

Why is random sampling important? (4)

Answer

A

Gives each member of the population an equal chance of selection.

Reduces bias.

Allows calculation of sampling error.

Larger random samples are more representative of the population.

Question 18

Q

What are accuracy, precision, and bias in statistics?

Answer

A

Accurate – Sample statistic is similar to the population parameter.

Precise – Statistic is consistent across multiple samples. A lack of precision may arise from sampling error, e.g. when sample sizes are very small.

Biased – implies that the sample statistic tends to differ from the population parameter in a consistent way (there is a systematic error)
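
The three ideas can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is artificial (10,000 normally distributed values), and the "biased" method is a deliberately flawed frame that can only select units from the top half of the population. Repeating each sampling method many times shows bias as the average distance of the estimates from 𝜇, and precision as the spread of the estimates.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of 10,000 values; its mean mu is the parameter to estimate
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)
top_half = sorted(population)[len(population) // 2:]

def sample_means(frame, n, reps=500):
    """Means of `reps` simple random samples of size n drawn from `frame`."""
    return [statistics.fmean(random.sample(frame, n)) for _ in range(reps)]

results = {}
for label, frame, n in [("SRS, n = 10", population, 10),
                        ("SRS, n = 200", population, 200),
                        # Flawed selection: only the top half can ever be chosen
                        ("biased, n = 200", top_half, 200)]:
    means = sample_means(frame, n)
    bias = statistics.fmean(means) - mu   # systematic error: average distance from mu
    spread = statistics.stdev(means)      # imprecision: variation between samples
    results[label] = (bias, spread)
    print(f"{label:16s} bias = {bias:+.2f}  spread = {spread:.2f}")
```

Larger simple random samples should show a much smaller spread (better precision) with bias near zero, while the flawed frame stays off target however many samples are taken.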

Question 19

Q

What is the goal in sampling?

Answer

A

To select a sample that reflects the variation in the whole population without sampling the entire population.

Question 20

Q

Why is careful sampling important?

Answer

A

Poor data collection can lead to flawed conclusions.

A well-chosen sample allows for accurate and robust decisions.

Uncertainty is inherent in sampling, so methods must minimize errors.

Question 21

Q

3 different sampling strategies

Answer

A

Simple random sampling

Systematic random sampling

Stratified random sampling

Question 22

Q

What is simple random sampling?

Answer

A

A method where each individual in the population has an equal chance of being selected, and every possible sample of size 𝑛 is equally likely.

Question 23

Q

What is the formula for the probability of selection in simple random sampling?

Answer

A

The chance, or probability, of being selected in a sample of size 𝑛 from a population of size 𝑁 is:

chance of selection = 𝑛/𝑁
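
The formula, and a simple random sample itself, can be sketched in Python with the standard `random` module. The helper names `chance_of_selection` and `simple_random_sample` are illustrative, and units are assumed to be labelled 1 to 𝑁.

```python
import random

def chance_of_selection(n: int, N: int) -> float:
    """Probability that a given unit is included in a simple random sample of size n from N."""
    return n / N

def simple_random_sample(N: int, n: int, seed=None) -> list:
    """Select n distinct unit labels from 1..N without replacement, every unit equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

print(round(chance_of_selection(20, 130), 4))      # 0.1538 (class example)
print(round(chance_of_selection(650, 13_484), 4))  # 0.0482 (university example)
print(simple_random_sample(130, 20, seed=0))
```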

Question 24

Q

What is the probability of a student being selected in a sample of 20 students from a class of 130?

Answer

A

20/130 = 0.1538 = 15.38%

Question 25

Q

Given a University of St Andrews population of 13,484, what is the probability of being selected in a sample of 650?

Answer

A

650/13484 = 0.0482 = 4.82%
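
To check that 𝑛/𝑁 really is the chance of selection, a short simulation (an illustration added here, not part of the original calculation) can draw many simple random samples and count how often one particular individual, hypothetically unit 1, is included. The same function works for the university example, just more slowly.

```python
import random

def estimated_chance(N: int, n: int, unit: int = 1, reps: int = 10_000, seed: int = 0) -> float:
    """Estimate the probability that `unit` is in a simple random sample of size n from 1..N."""
    rng = random.Random(seed)
    hits = sum(unit in rng.sample(range(1, N + 1), n) for _ in range(reps))
    return hits / reps

estimate = estimated_chance(130, 20)
print(f"simulated {estimate:.4f} vs n/N = {20 / 130:.4f}")
```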

Question 26

Q

What is systematic random sampling?

Answer

A

A method where a sample is selected at regular intervals after a random start.

Question 27

Q

What are the advantages of systematic sampling?

Answer

A

Easier to implement (only one random number needed).

Ensures even distribution of the sample across the population.

Works well for time-based selection (e.g., traffic monitoring).

Question 28

Q

How do you calculate the fixed periodic interval (𝑘) in systematic sampling?

Answer

A

𝑘 = 𝑁/𝑛

N = Population size
𝑛 = Sample size

Question 29

Q

What is the fixed periodic interval 𝑘 for a sample of 650 from 13,484 individuals?

If the random start is 𝑞=9, what are the first three individuals selected using systematic sampling?

Answer

A

𝑘 = 13,484 / 650 ≈ 20.74 ≈ 21
So, every 21st individual is selected after a random start.

𝑞, 𝑞+𝑘, 𝑞+2𝑘
= 9, 9+21, 9+2(21)
The first three selected individuals are 9, 30, and 51.

Question 30

Q

What is the process of systematic random sampling?

Answer

A

Suppose there are 𝑁=1,000 sampling units in the target population and we want to take a sample of size 𝑛=20. A systematic sample can be selected as follows:

Calculate 𝑘, the fixed periodic interval. This is the interval between successive samples. 𝑘=𝑁/𝑛 e.g. 𝑘=1000/20=50
Randomly pick a starting number from 1 to 𝑘, inclusive, call it 𝑞. For this example, we want a number from 1 to 50, say 3 was chosen at random, 𝑞=3.
Sample the 𝑞th individual, then the (𝑞+𝑘)th, then the (𝑞+2𝑘)th and so on. Therefore, the sample is 3, 53, 103, 153, …, 903, 953
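
The three steps above can be sketched as a Python function. This assumes units are labelled 1 to 𝑁 and that 𝑘 is rounded to the nearest whole number when 𝑁/𝑛 is not an integer; `systematic_sample` is a hypothetical helper name.

```python
import random
from typing import Optional

def systematic_sample(N: int, n: int, q: Optional[int] = None, seed: Optional[int] = None) -> list:
    """Systematic random sample of unit labels 1..N.

    The interval k = N / n is rounded to the nearest whole number (as in the
    worked example, where 20.74 becomes 21). If no start q is supplied, one is
    drawn at random from 1..k inclusive.
    """
    k = round(N / n)
    if q is None:
        q = random.Random(seed).randint(1, k)  # randint includes both endpoints
    # Take q, q + k, q + 2k, ... up to n units, without running past unit N
    return list(range(q, N + 1, k))[:n]

sample = systematic_sample(1000, 20, q=3)
print(sample[:3], "...", sample[-2:])           # [3, 53, 103] ... [903, 953]
print(systematic_sample(13_484, 650, q=9)[:3])  # [9, 30, 51]
```

With 𝑘 rounded up to 21 in the university example, only 642 labels fit between 1 and 13,484, so the sample falls slightly short of 650. This is a side effect of rounding 𝑘, not of the method itself.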

Question 31

Q

What is a potential issue with systematic sampling?

Answer

A

If the population has periodic variation and the fixed interval 𝑘 matches the period of that pattern, the sample could be biased.
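
A toy illustration with made-up numbers: suppose a shop's sales are recorded daily for 8 weeks and are much higher at weekends. If 𝑘 = 7 happens to match the weekly cycle, every selected day falls on the same weekday, so the estimate depends entirely on the random start and never comes close to the true mean.

```python
import statistics

# Hypothetical daily sales for 8 weeks (56 days): weekdays 100, weekend days 300
daily_sales = [300 if day % 7 in (5, 6) else 100 for day in range(56)]
true_mean = statistics.mean(daily_sales)

k = 7  # the sampling interval happens to equal the weekly period
# Try every possible start (0-based positions 0..k-1) and record the estimate
estimates = [statistics.mean(daily_sales[q::k]) for q in range(k)]

print(f"true mean = {true_mean:.1f}")    # true mean = 157.1
print("estimates by start:", estimates)  # [100, 100, 100, 100, 100, 300, 300]
```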

Question 32

Q

What is stratified random sampling?

Answer

A

A method where the population is divided into strata (categories), and random samples are taken from each stratum.

Question 33

Q

3 advantages of stratified random sampling

Answer

A

Ensures representation of all groups.

Can provide better precision than simple random sampling, particularly when units within each stratum are similar.

Can be more convenient to organise.

Question 34

Q

How do you calculate the proportion of a stratum (𝑝𝑖)?

Answer

A

𝑝𝑖=𝑁𝑖/𝑁

where 𝑁𝑖 is the total number of units in stratum 𝑖 and 𝑁 is the total population size.

Question 35

Q

How do you calculate the sample size for a stratum (𝑛𝑖)?

Answer

A

𝑛𝑖=𝑛×𝑝𝑖

𝑛 = Total sample size
𝑝𝑖 = Proportion of stratum 𝑖

Question 36

Q

How many staff members should be sampled from a total sample of 650, if there are 3,250 staff in a 13,484-person university?

Answer

A

Calculate proportion of staff: 3250/13484 = 0.241

Calculate sample size for staff: 650 × 0.241 = 156.7 ≈ 157

157 staff members should be sampled.
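
Proportional allocation can be sketched in Python as below. It assumes the only other stratum is students, so there are 13,484 − 3,250 = 10,234 of them (the card does not say this), and the helper names are illustrative. Computing 650 × 3,250 / 13,484 ≈ 156.67 directly also rounds to 157.

```python
import random

def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Sample size per stratum: n_i = n * p_i, where p_i = N_i / N, rounded to whole units."""
    N = sum(stratum_sizes.values())
    return {name: round(n * N_i / N) for name, N_i in stratum_sizes.items()}

def stratified_sample(strata: dict[str, list], n: int, seed=None) -> dict[str, list]:
    """Simple random sample within each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    sizes = proportional_allocation({name: len(units) for name, units in strata.items()}, n)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Assumed split of the 13,484-person university into two strata
sizes = {"staff": 3_250, "students": 10_234}
allocation = proportional_allocation(sizes, 650)
print(allocation)  # {'staff': 157, 'students': 493}

# Hypothetical unit labels, just to show the sampling step
strata = {"staff": [f"staff_{i}" for i in range(3_250)],
          "students": [f"student_{i}" for i in range(10_234)]}
sample = stratified_sample(strata, 650, seed=1)
print({name: len(units) for name, units in sample.items()})  # {'staff': 157, 'students': 493}
```

Because each 𝑛𝑖 is rounded separately, the stratum sizes can occasionally sum to one more or one less than 𝑛; here they sum to exactly 650.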

Question 37

Q

Why are sampling errors unavoidable?

Answer

A

There is always some difference between the sample statistic used to estimate a population parameter and the true value of that parameter.

This difference arises from random variation between samples.

We want random variation to be the only source of any difference between the sample and the truth.

Question 38

Q

6 sources of non-sampling error

Answer

A

Selection bias
Non-response bias
Self-selection bias
Question effects
Survey format
Behavioural considerations

Question 39

Q

what is selection bias

Answer

A

A non-representative sample due to flawed selection methods.

Question 40

Q

What is non-response bias?

Answer

A

Occurs when selected participants do not respond: non-respondents may differ systematically from respondents, so the results are biased.
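
A toy simulation with made-up numbers and an assumed response model: people who exercise more are assumed to be more willing to answer a survey about exercise. Even though the sample is selected at random, the average among respondents overstates the population average.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: weekly hours of exercise for 10,000 people (mean about 3)
population = [rng.expovariate(1 / 3) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Step 1: a properly selected simple random sample of 1,000 people
selected = rng.sample(population, 1_000)

# Step 2: assumed response model, the chance of replying rises with hours exercised
respondents = [h for h in selected if rng.random() < min(1.0, 0.2 + 0.1 * h)]

print(f"population mean:  {true_mean:.2f}")
print(f"selected sample:  {statistics.fmean(selected):.2f}")
print(f"respondents only: {statistics.fmean(respondents):.2f} ({len(respondents)} replied)")
```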

Question 41

Q

What is self-selection bias?

Answer

A

Occurs when people choose whether to participate: those who volunteer may differ from those who don’t.

Self-selection bias can be a problem in more ‘serious’ studies, because much behavioural research on people can only use volunteers, for ethical reasons.

Question 42

Q

What is question bias?

Answer

A

The way questions are worded affects responses.

Question 43

Q

4 ways to prevent question bias

Answer

A

Fixed wording helps reduce bias.

Questions should not lead or prompt respondents.

Logical order prevents confusion.

Keep the survey short: respondents may get tired or bored if there are too many questions.

Question 44

Q

How can survey format affect responses?

Answer

A

The way a survey is conducted (postal, online, telephone, in-person) can influence answers.

Question 45

Q

What are some behavioural factors that influence survey responses?

Answer

A

Social stigma (e.g., “Have you ever been arrested?”)

Social prestige (e.g., “How much do you earn?”)

Social acceptability (e.g., “How much alcohol do you drink per week?”)

Misunderstanding or misremembering questions

Question 46

Q

How can we improve survey accuracy?

Answer

A

Ask participants to keep a daily diary for accurate reporting.

Use neutral wording to avoid social desirability bias.

Pre-test the questionnaire for clarity and consistency.

Question 47

Q

What is an experiment?

Answer

A

A study where conditions are controlled to test the effect of a treatment.

Question 48

Q

What are some examples of designed experiments?

Answer

A

Medical study: Testing if a drug improves survival rates.

Physics study: Measuring how heat affects electrical resistance.

Agricultural study: Checking if fertilizer increases crop yield.

Question 49

Q

What two types of variation exist in experiments?

Answer

A

Variation due to treatment (effect of drug, heat, fertilizer, etc.)

Natural variation (differences between individuals, external factors)

Question 50

Q

What is a randomised designed experiment?

Answer

A

A randomised experiment is where the researcher randomly allocates treatments to sampling units to ensure groups are similar, and any differences in response can be attributed to the treatment.

Question 51

Q

What is the purpose of random allocation in an experiment?

Answer

A

Random allocation ensures that treatment groups are similar on average, avoiding bias and ensuring that differences in response can be attributed to the treatment.
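
Random allocation can be sketched in a few lines of Python. This is a minimal illustration assuming 20 hypothetical participants and two equal-sized groups; `randomly_allocate` is an illustrative name.

```python
import random

def randomly_allocate(units: list, seed=None) -> tuple[list, list]:
    """Randomly split units into a treatment group and a control group of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{i:02d}" for i in range(1, 21)]   # 20 hypothetical participants
treatment, control = randomly_allocate(participants, seed=3)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who goes where, characteristics such as age or health should be balanced between the groups on average.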

Question 52

Q

What is the role of randomisation in an experiment?

Answer

A

Randomisation ensures treatment groups are similar, eliminating selection bias and improving the credibility of results.

Question 53

Q

What is replication in a randomised designed experiment?

Answer

A

Replication involves repeating each treatment on multiple sampling units in order to:

Assess natural variation.

Increase precision (more replicates = more precise but higher cost).

Question 54

Q

What is blocking in experiments?

Answer

A

Blocking involves partitioning sampling units into different strata or groups (e.g., male/female) before random allocation to treatments. This reduces natural variability, improving precision.
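
A minimal sketch of a randomised block design, assuming hypothetical units labelled with a single block variable (female/male) and two treatments: units are grouped into blocks first, then treatments are randomly allocated within each block, so every block is split as evenly as possible. `block_randomise` is an illustrative name.

```python
import random
from collections import defaultdict

def block_randomise(units: dict[str, str], treatments=("treatment", "control"), seed=None) -> dict[str, str]:
    """Group units by block, then randomly allocate treatments within each block.

    `units` maps a unit ID to its block label (e.g. "female" or "male").
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit, block in units.items():
        blocks[block].append(unit)
    allocation = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        # Deal treatments out in turn so each block is split as evenly as possible
        for i, unit in enumerate(members):
            allocation[unit] = treatments[i % len(treatments)]
    return allocation

units = {f"U{i}": ("female" if i < 6 else "male") for i in range(10)}  # hypothetical units
allocation = block_randomise(units, seed=5)
print(allocation)
```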

Question 55

Q

What is a placebo?

Answer

A

A placebo is a substance with no active effect (e.g., sugar pills) given to a control group to avoid psychological effects influencing the results.

Question 56

Q

What is double blinding in clinical trials?

Answer

A

The doctor doesn’t know which treatment is being given.

The patient doesn’t know which treatment they are receiving.

Question 57

Q

Two methods used in clinical trials

Answer

A

placebo
double blinding

Question 58

Q

What is the significance of randomised experiments in establishing causation?

Answer

A

If a randomised experiment shows a significant effect, it provides strong evidence for causation, i.e., the treatment caused the observed effect.

Question 59

Q

What was the design of Jonas Salk’s polio vaccine trial in 1954?

Answer

A

The trial used a stratified randomised design, where children were randomly allocated to the treatment or control group.

Treatment group: Received the vaccine.
Control group: Received a saline solution.
Double-blind: Neither the participants nor the doctors knew which treatment each child received.
Results: The vaccine was shown to be effective in reducing polio cases.

Question 60

Q

What are the 5 key components of a randomised designed experiment?

Answer

A

Randomisation: Assign subjects randomly to treatment groups.

Replication: Repeat the treatment on multiple subjects to assess natural variation.

Control group: A group that does not receive the active treatment.

Blocking/Stratification: Group subjects by similar characteristics before randomisation (e.g., age).

Causality: A significant result can suggest causality between treatment and effect.

Question 61

Q

What is an observational study, and how does it differ from a randomised experiment?

Answer

A

Observational study: the researcher does not control conditions; they simply observe and measure variables.

Randomised experiment: the researcher manipulates the variables and controls conditions, which allows causality to be established.

Question 62

Q

What is a confounding variable?

Answer

A

A confounding variable is a factor, often unmeasured, that affects both the potential cause and the effect, making it difficult to determine the true relationship between the variables.

Question 63

Q

Main challenge in observational studies

Answer

A

Confounding variables can make it difficult to argue for causality.

Question 64

Q

Why are observational studies used as evidence for an effect?

Answer

A

It may be unethical to carry out a randomised experiment

It may be difficult to carry out a randomised experiment

Question 65

Q

What are 3 types of observational study?

Answer

A

Cohort studies
Case-control studies
Cross-sectional studies

Question 66

Q

What is a cohort study?

Answer

A

A cohort is any group of people who are linked in some way. Researchers track the group over time and compare outcomes based on exposure to a certain variable.

Question 67

Q

What is a case-control study?

Answer

A

Compares individuals with a health issue (cases) to those without it (controls) to study exposure.

Question 68

Q

What is a cross-sectional study?

Answer

A

Measures variables at a specific point in time to understand prevalence.

Question 69

Q

What is a retrospective study?

Answer

A

Looks at historical data to assess past exposure and outcomes.

Question 70

Q

What are the two types of cohort study?

Answer

A

Prospective or retrospective.

Question 71

Q

What is a prospective cohort study?

Answer

A

None of the subjects have the outcome of interest (e.g. disease) when the study commences; the subjects are followed over a period of time to determine whether the disease develops.

Question 72

Q

Are case-control studies usually prospective or retrospective?

Answer

A

Case-control studies tend to be retrospective and examine previous exposure in relation to the outcome.

Question 73

Q

What is the main limitation of observational studies?

Answer

A

It is hard to infer causation from them, because it is difficult to exclude the possible effects of confounding variables.

Question 74

Q

What are the advantages of observational studies?

Answer

A

Cheaper.

Less time-consuming.

Can investigate effects that would be unethical to manipulate experimentally.

They reflect real-world scenarios.

They are suited to long-term studies or to assessing trends over time.

Question 75

Q

What are the advantages of randomised experiments over observational studies?

Answer

A

Easier to establish causality: changes can be more confidently attributed to the treatment.

In a controlled environment, the influence of confounding variables is reduced.

Often easier to repeat, which makes it easier to compare and confirm results.

Randomisation minimises bias by ensuring that treatment groups are comparable.

They allow the investigation of variables that might not occur naturally.