GEA1000 Chapter 1 Flashcards
PPDAC full form
Problem, Plan, Data, Analysis, Conclusion
Population
The entire group of individuals or objects that we wish to know about
Research question
Seeks to investigate some characteristic of a population
Population of interest
The group about which we want to draw conclusions in a study
Population parameter
Numerical fact about a population
Census
An attempt to reach out to the entire population of interest
Drawbacks of a census
- High cost of conducting
- Takes a long time to complete - some studies are time sensitive
- One may not be able to achieve 100% response rate
Sample
A subset of the population selected for the study
Sampling frame
List from which the sample was obtained
Conditions for generalisability
- The sampling frame must cover the population of interest (be equal to or larger than it)
- There should be no bias when we obtain the sample (selection bias, non-response bias)
Selection bias
- Associated with the researcher’s biased selection of units into the sample
- This can be caused by imperfect sampling frame, which excludes units from being selected
- Can also be caused by non-probability sampling
Non-response bias
- Associated with participants’ non-disclosure or non-participation in the research study
- This results in the exclusion of info from this group
- E.g. inconvenience or unwillingness to disclose sensitive info
- Can occur regardless of whether the sampling method is probabilistic or non-probabilistic
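A small sketch of how non-response can distort an estimate. The values and the rule that units with a value of 90 or more decline to respond are made up purely for illustration.

```python
# Hypothetical population values (e.g. some sensitive quantity).
population = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110]

# Assumption for illustration: units with a value of 90 or more do not respond.
respondents = [x for x in population if x < 90]

population_mean = sum(population) / len(population)
respondent_mean = sum(respondents) / len(respondents)

print(population_mean)   # 65.0
print(respondent_mean)   # 50.0
```

Because the non-respondents differ systematically from the respondents, the respondent mean understates the population mean even though every unit was contacted.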
Probability sampling
A sampling scheme such that the selection process is done via a known randomised mechanism
Every unit in the sampling frame has a known non-zero probability of being selected, but this probability need not be the same for all units
Simple random sampling
- Units are randomly selected from the sampling frame
- Every unit of the sampling frame has equal chance to be selected
- Sampling without replacement
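A minimal sketch of simple random sampling, assuming the sampling frame is a Python list of unit IDs. The function name `simple_random_sample` and the frame of 100 IDs are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each equally likely to be chosen."""
    rng = random.Random(seed)
    # Random.sample selects without replacement, so no unit appears twice.
    return rng.sample(frame, n)

frame = list(range(1, 101))  # hypothetical sampling frame of 100 unit IDs
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```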
Systematic sampling
A method of selecting units from a list by applying a selection interval k and a random starting point r from the first interval
E.g. with an interval of k, a random start r is chosen where 1 <= r <= k, and every kth unit after it is selected
For a sample of n units, the selected positions are
r, r+k, r+2k, ..., r+(n-1)k
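A minimal sketch of systematic sampling, assuming the sampling frame is an ordered Python list. The function name `systematic_sample` and the frame of 100 IDs are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start r in the first interval, then every kth unit."""
    rng = random.Random(seed)
    r = rng.randint(1, k)  # 1 <= r <= k, both ends inclusive
    # 1-based positions r, r+k, r+2k, ... map to 0-based indices r-1, r-1+k, ...
    return frame[r - 1::k]

frame = list(range(1, 101))  # hypothetical ordered frame of 100 unit IDs
sample = systematic_sample(frame, k=10, seed=0)
print(sample)
```

With 100 units and k = 10, the sample always has 10 units, whatever the value of r.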
Pitfalls of systematic sampling
Potentially under-represents parts of the population, e.g. if the list is ordered in a pattern that lines up with the interval k
Stratified sampling
- Sampling frame is divided into groups called strata
- Units within each stratum share similar characteristics, but the strata need not be the same size
- We apply SRS to each stratum to generate the overall sample
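A minimal sketch of stratified sampling. It assumes the same number of units is drawn from every stratum, which is only one possible allocation; the function name, the `stratum_of` classifier and the two strata "A" and "B" are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Split the frame into strata, then apply SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        units = strata[label]
        # Assumption: fixed sample size per stratum, capped at the stratum size.
        sample.extend(rng.sample(units, min(n_per_stratum, len(units))))
    return sample

# Hypothetical frame: stratum "A" has 30 units, stratum "B" has 70 units.
frame = [("A", i) for i in range(30)] + [("B", i) for i in range(70)]
sample = stratified_sample(frame, stratum_of=lambda u: u[0], n_per_stratum=5, seed=2)
print(sample)
```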
Pitfalls of stratified sampling
Requires a sampling frame and criteria for classifying the population into strata
Cluster sampling
- Sampling frame is divided into clusters
- A fixed number of clusters are then selected using SRS
- All the units from the selected clusters are then included in the overall sample
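A minimal sketch of cluster sampling, following the three steps above. The function name, the `cluster_of` classifier and the five "block" clusters are hypothetical.

```python
import random
from collections import defaultdict

def cluster_sample(frame, cluster_of, n_clusters, seed=None):
    """Select whole clusters by SRS and keep every unit in the chosen clusters."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for unit in frame:
        clusters[cluster_of(unit)].append(unit)
    # SRS is applied to the cluster labels, not to individual units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for label in chosen for unit in clusters[label]]

# Hypothetical frame: 5 clusters ("block_0" to "block_4") of 4 units each.
frame = [(f"block_{b}", i) for b in range(5) for i in range(4)]
sample = cluster_sample(frame, cluster_of=lambda u: u[0], n_clusters=2, seed=3)
print(sample)
```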
Advantages of cluster sampling
Less time-consuming and less costly
Clusters are usually naturally defined, so it is easy to classify a unit under a cluster
Disadvantage of cluster sampling
- Depending on which clusters are selected, we may see high variability in the overall sample if there are dissimilar clusters with distinct characteristics
- If the number of clusters sampled is small, there is also a risk that the clusters selected will not be representative of the population
Pitfalls of simple random sampling
Time-consuming; depends on the accessibility of information and of a complete sampling frame
Non-probability sampling
A sampling method in which the selection of units is not done by randomisation
There is no element of chance in determining which units are selected; selection is down to human discretion
Convenience sampling
Non-probability sampling method where a researcher chooses the subjects that are most easily available
- Introduces selection bias
- Introduces non-response bias