Statistics Theory L5 = Statistical Sampling Flashcards
Goals of statistical sampling? (3)
- Gather information via observational study (could also be used in experiments by sampling of experimental units).
- Collect representative data, which allows us to make inferences about the intended statistical population (target population).
- Make reliable inferences (i.e., to avoid bias & get adequate precision).
Which diagram illustrates the relationship between the sample & the population?
The Population-Sample-Direction-of-Inference diagram.
Egs of parameters of interest we might want to estimate? (5)
- Animal density in a nature reserve.
- Average height of students at Wits.
- Average circumference of trees in a plantation.
- The slope between two variables X and Y.
- A measure of uncertainty (SE and 95% CI).
For most of the work we do in the environmental sciences a census is generally not possible, so what do we need to get reliable inferences, avoid bias, etc? (2)
- Probabilistic sample.
- Sampling frame.
Probabilistic sample?
= selection of a sample of units based on some random mechanism.
Probabilistic sample attribute?
It avoids the pitfalls of haphazard, opportunistic or judgement sampling, all of which can be highly biased.
Goal of a Probabilistic sample?
To avoid biased selection of the units (as this leads to biased estimates of parameters).
Eg of a Probabilistic sample?
Surveying the economic status of Wits students.
- To avoid a haphazard sample, get a list of all students from the university registrar & select students from it at random. This list is the sampling frame.
Sampling frame?
= a list of all sample units in a statistical population.
Sampling frame attributes? (2)
- In spatial sampling, one could randomly choose x and y coordinates.
- Every sampling unit has some chance of being selected.
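A minimal sketch of choosing random x and y coordinates for spatial sampling. It assumes a rectangular study area; the function name `random_points` and the area dimensions are hypothetical, chosen only for illustration.

```python
import random

def random_points(n, x_max, y_max, seed=None):
    """Draw n random (x, y) locations inside a rectangular study area.

    Assumes the area spans 0..x_max by 0..y_max (same units, eg metres).
    """
    rng = random.Random(seed)
    return [(rng.uniform(0, x_max), rng.uniform(0, y_max)) for _ in range(n)]

# Hypothetical 1000 m x 500 m reserve; 5 plot locations.
points = random_points(5, x_max=1000.0, y_max=500.0, seed=1)
for x, y in points:
    print(f"x = {x:.1f} m, y = {y:.1f} m")
```

Every location in the rectangle has the same chance of being chosen, which matches the requirement that every sampling unit has some chance of selection.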
Types of sampling designs? (5)
- Simple random sampling.
- Stratified random sampling.
- Systematic sampling.
- Cluster sampling.
- Double sampling.
Simple random sampling attributes? (5)
- We select n units from a population of N.
- Each unit has the same probability of being selected.
- Selection of each unit is independent.
- Sampling without replacement (SWOR) produces more precise estimates.
- Good to use when the attribute of interest is homogeneous.
Details of Simple random sampling? (5)
- N is assumed to be finite.
- Possible to locate & identify each sampling unit & measure variables of interest (measurement error must be much smaller than the sampling error).
- Sampling frame consists of distinct, non-overlapping sample units (this relates to the independent selection of each sampling unit).
- Sampling units (eg, plots) can be different sizes, but this adds variability & complexity to the analysis.
- If possible, sample without replacement, as it produces more precise estimates.
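The points above can be sketched in code. This is an illustration only, not a prescribed method: it draws n units without replacement from a finite population of N, then estimates the mean with an SE and an approximate 95% CI. The finite population correction (1 - n/N) is the standard reason SWOR gives smaller standard errors. The function name `srs_estimate`, the simulated heights and the normal multiplier 1.96 are assumptions.

```python
import math
import random
import statistics

def srs_estimate(population, n, seed=None):
    """Simple random sample without replacement; returns (mean, SE, 95% CI)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)   # SWOR: no unit is chosen twice
    N = len(population)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)     # sample variance (n - 1 divisor)
    fpc = 1 - n / N                      # finite population correction
    se = math.sqrt(fpc * s2 / n)
    ci = (mean - 1.96 * se, mean + 1.96 * se)  # normal approximation (assumed)
    return mean, se, ci

# Made-up population of N = 500 student heights (cm), for illustration only.
pop_rng = random.Random(0)
population = [pop_rng.gauss(170, 10) for _ in range(500)]

mean, se, ci = srs_estimate(population, n=30, seed=1)
print(f"mean = {mean:.1f}, SE = {se:.2f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

If n equals N (a census), the correction is zero and so is the SE, which is consistent with there being no sampling error left.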
N?
= total number of units in a population.
Thing to note about random sampling?
Can sometimes produce a clumped or patchy distribution of sampling units.
Why use Simple random sampling?
Stratified random sampling attributes? (5)
- We designate homogeneous strata from the sampling frame.
- Then we spread the sampling effort between the strata.
- We can treat the strata as domains of study (eg, to compare between them).
- Sample & generate estimates by stratum, but then combine estimates with an overall measure of uncertainty/precision.
- Good option if the variability within the strata < the variability between the strata (provides a more precise estimate).
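A hedged sketch of combining per-stratum estimates into an overall estimate with one measure of precision. It uses the usual stratified-mean formulas (stratum weights W_h = N_h / N, with a finite population correction in each stratum). The function name `stratified_estimate` and the numbers are made up for illustration.

```python
import math
import statistics

def stratified_estimate(strata):
    """strata: list of (N_h, sample_values) pairs, one per stratum.

    Returns the overall stratified mean and its standard error.
    """
    N = sum(N_h for N_h, _ in strata)
    mean = 0.0
    var = 0.0
    for N_h, y in strata:
        W_h = N_h / N                    # stratum weight
        n_h = len(y)
        mean += W_h * statistics.mean(y)
        var += W_h**2 * (1 - n_h / N_h) * statistics.variance(y) / n_h
    return mean, math.sqrt(var)

# Two hypothetical strata: 600 units and 400 units, 4 sampled plots each.
strata = [
    (600, [12.0, 14.0, 13.0, 15.0]),
    (400, [30.0, 28.0, 33.0, 29.0]),
]
overall_mean, overall_se = stratified_estimate(strata)
print(f"stratified mean = {overall_mean:.2f}, SE = {overall_se:.2f}")
```

Because only within-stratum variance enters the SE, homogeneous strata give a precise overall estimate even when the strata differ a lot from each other.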
Stratified random sampling design uses a number of ways to allocate sample units among strata, what are they? (3)
- Proportional to size.
- Proportional to variability.
- Based on economic or logistical considerations.
Goal of Stratified random sampling design?
To improve precision through optimal allocation of sampling effort.
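Two of the allocation rules can be sketched as follows. Proportional-to-size allocation follows directly from the card above. For "proportional to variability" the sketch uses Neyman allocation (n_h proportional to N_h x S_h), which is one common choice and is an assumption here, not necessarily the rule the course intends. Function names and numbers are hypothetical.

```python
def proportional_allocation(n, sizes):
    """Allocate n sample units in proportion to stratum sizes N_h."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]

def neyman_allocation(n, sizes, sds):
    """Allocate n in proportion to N_h * S_h (size times standard deviation)."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: sizes and guessed standard deviations.
sizes = [600, 300, 100]
sds = [2.0, 4.0, 10.0]

by_size = proportional_allocation(100, sizes)
by_variability = neyman_allocation(100, sizes, sds)
print("proportional to size:", [round(a, 1) for a in by_size])
print("Neyman:", [round(a, 1) for a in by_variability])
```

The results are fractional, so in practice they would be rounded to whole units. Note how the small but highly variable third stratum receives more effort under the second rule.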
Why use Stratified random sampling?
Systematic sampling?
= we select sample units at regular intervals after a random start.
Systematic sampling attributes? (4)
- Each transect/plot is a sampling unit.
- Done to reduce bias.
- The mathematics is more complicated, but usually precision is better.
- A potential problem arises if the arrangement of sample units coincides with an unknown cyclic pattern.
Egs of Systematic sampling? (2)
- A plot or transect every 50m.
- We sample every kth person in a list.
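A minimal sketch of the "every kth unit after a random start" idea. The function name `systematic_sample` and the list of 100 numbered units are assumptions for illustration.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every kth unit from the frame after a random start in 0..k-1."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start within the first interval
    return frame[start::k]

# Hypothetical frame of 100 people, numbered 1..100; take every 10th.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=3)
print(sample)
```

The random start is the only random element; once it is chosen, the rest of the sample is fixed, which is why an unknown cyclic pattern with the same period can cause problems.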
Why use Systematic sampling design?
To have a well-spread out sample.
Cluster sampling attributes? (4)
- No sampling frame available for individuals but there is for groups of individuals (eg, vegetation patches rather than individual plants).
- Each sample unit is a collection/cluster of individual elements.
- Good if variation within patches > variation between patches (provides more precise estimates).
- Consists of various stages.
Stages of Cluster sampling? (3)
- 1-stage sampling.
- 2-stage sampling.
- Multi-stage cluster sampling.
1-stage sampling?
= we randomly select clusters & then measure all elements within them.
2-stage sampling?
= involves the random selection of clusters & random sample of elements within the clusters.
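The difference between 1-stage and 2-stage sampling can be sketched as below. The patch and plant labels, the function names and the sample sizes are all hypothetical.

```python
import random

def one_stage(clusters, m, seed=None):
    """Randomly select m clusters and keep ALL elements within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {c: list(clusters[c]) for c in chosen}

def two_stage(clusters, m, n_per_cluster, seed=None):
    """Randomly select m clusters, then a random sample of elements in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical frame of 8 vegetation patches, each holding 20 plants.
clusters = {
    f"patch_{i}": [f"plant_{i}_{j}" for j in range(20)] for i in range(8)
}

stage1 = one_stage(clusters, m=3, seed=1)
stage2 = two_stage(clusters, m=3, n_per_cluster=5, seed=1)
print({c: len(v) for c, v in stage1.items()})
print({c: len(v) for c, v in stage2.items()})
```

Only a list of patches is needed up front; individual plants are listed (or subsampled) only inside the patches that were selected.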
Why use Cluster sampling?
I think we use it (1) if the variation within the patches is greater than the variation between the patches & (2) if there is no sampling frame for individuals but there is one for groups of individuals.
Double sampling?
= where we work with two measurements that are correlated, one that is easy to measure & a second that is harder to measure but is of more interest to the scientist.
Double sampling attributes? (4)
- One set of sample units consists of easy-to-conduct measurements & we can do many of them.
- A second smaller set consists of sample units where we do the easy-to-conduct measurement & we do the more difficult measurement that we’re actually interested in.
- We estimate a relationship between the 2 measurements using something like regression.
- We use the regression & the measurements from the 1st set of sample units to estimate the 2nd quantity of interest.
Eg of Double sampling?
We’re interested in grass biomass & how it varies across the veld, but it’s difficult to measure accurately. Grass height is easier to measure & it’s correlated with biomass.
So, explain the Double sampling eg? (4)
- Step 1: Get the 1st set of samples, which is grass height (n = lots).
- Step 2: Get the 2nd set of samples, which are grass height & clipped dry biomass (n = few).
- Step 3: Estimate the relationship in the 2nd set of samples (regression).
- Step 4: Calculate biomass for samples in 1st set of samples.
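The four steps can be sketched with a simple least-squares line. All numbers are invented for illustration (the small set is deliberately exactly linear so the result is easy to check by hand), and a straight-line relationship between height and biomass is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Step 1: large set, grass height only (cm). Made-up values.
heights_large = [12, 18, 25, 33, 41, 15, 28, 36]

# Step 2: small set, grass height (cm) & clipped dry biomass (g). Made-up values.
heights_small = [10, 20, 30, 40]
biomass_small = [25, 45, 65, 85]

# Step 3: estimate the height-biomass relationship in the small set.
intercept, slope = fit_line(heights_small, biomass_small)

# Step 4: predict biomass for every unit in the large set, then average.
predicted = [intercept + slope * h for h in heights_large]
mean_biomass = sum(predicted) / len(predicted)
print(f"biomass = {intercept:.1f} + {slope:.1f} * height")
print(f"estimated mean biomass = {mean_biomass:.1f} g")
```

A real analysis would also carry the regression uncertainty through to the final estimate; this sketch only shows the point estimate.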
Why use Double sampling?
I think we can use it when our attribute of interest is difficult to measure but is correlated with a variable that is easy to measure.
2 Elements to consider when planning field logistics?
- Plot size.
- Plot shape.
What does the plot size & plot shape depend on? (2)
- The objectives of the study.
- The nature of the data/study system.
Thing to note when deciding which plot size or plot shape to use?
It might require experimenting with different plot sizes & plot shapes to find an optimal size or shape.
Criteria for deciding on plot size & shape? (3)
- Statistically, the best precision given the cost & area of sampling.
- Ecologically, the best efficiency for achieving the objectives of the study.
- Logistically, the greatest ease of implementation in the field.
4 Factors that influence plot shape?
- Detection of individuals.
- Distribution of individuals.
- Edge effects (i.e., knowing where the subject is with respect to the plot boundary).
- Data collection methods.
Types of plot shapes? (3)
- Long, narrow plots.
- Square or circular plots.
- Narrow, rectangular plots.
Long, narrow plots attributes? (4)
- Easy to lay out.
- Have a lot of perimeter with respect to plot area.
- More edge effects.
- Problems with identifying whether animals are inside or outside plots.
Square or circular plots attribute?
Have fewer problems with edge effects.
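A small worked comparison of perimeter relative to area for plots of equal area, to show why long, narrow plots have more edge than square or circular ones. The plot dimensions and function names are hypothetical.

```python
import math

def perimeter_to_area_rectangle(length, width):
    """Perimeter divided by area for a rectangular plot."""
    return 2 * (length + width) / (length * width)

def perimeter_to_area_circle(area):
    """Perimeter divided by area for a circular plot of the given area."""
    radius = math.sqrt(area / math.pi)
    return 2 * math.pi * radius / area

# Three plots, each 100 square metres.
strip = perimeter_to_area_rectangle(100, 1)    # long, narrow plot
square = perimeter_to_area_rectangle(10, 10)   # square plot
circle = perimeter_to_area_circle(100)         # circular plot

print(f"strip: {strip:.2f}, square: {square:.2f}, circle: {circle:.2f} (m per m^2)")
```

For the same area, the strip has roughly five times the boundary of the square, so far more individuals fall near an edge where it is hard to decide whether they are in or out.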
Narrow, rectangular plots attributes? (3)
- Increased detection of study subjects.
- Increase chance of intersecting clumps/clusters.
- Best for clustered populations (better precision in vegetation studies).
Thing to note when you have a fixed budget?
There is a trade-off between the number of plots & plot size.
For instance, what to do if you have a homogeneous population?
Use a few large plots, as they will capture the variability in the population.
What to do if you have a high degree of heterogeneity?
A large number of smaller plots might be necessary.
- If plots are too small, there will be too many zeros, leading to poor precision (high SE).
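A rough illustration of the "too many zeros" problem. It assumes individuals are scattered completely at random (a Poisson model), in which case the chance that a plot contains no individuals is exp(-density x plot area). This model is an assumption made for the sketch; clumped populations would typically give even more empty plots. The density value and function name are hypothetical.

```python
import math

def prob_empty_plot(density, plot_area):
    """Probability of a zero count in one plot, assuming a Poisson pattern."""
    return math.exp(-density * plot_area)

density = 0.05  # hypothetical: 0.05 individuals per square metre

for area in (1, 10, 100):
    p0 = prob_empty_plot(density, area)
    print(f"plot area {area:>3} m^2: P(zero count) = {p0:.3f}")

p_small = prob_empty_plot(density, 1)
p_large = prob_empty_plot(density, 100)
```

Under this assumption nearly all of the smallest plots would be empty, which is the situation that leads to poor precision.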