Collecting and interpreting data Flashcards
2 definitions for outliers
More than 1.5 × IQR below the LQ or above the UQ
or more than 2 standard deviations from the mean
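As a rough illustration of the two rules above, here is a minimal Python sketch. The function names and the `marks` data are made up for the example, and quartile conventions differ between textbooks and software, so borderline values may be classified differently elsewhere.

```python
import statistics

def iqr_outliers(data):
    """Values more than 1.5 * IQR below the LQ or above the UQ."""
    # quantiles(n=4) returns [LQ, median, UQ]; "inclusive" is one of several
    # quartile conventions and is an assumption of this sketch.
    lq, _, uq = statistics.quantiles(data, n=4, method="inclusive")
    iqr = uq - lq
    low, high = lq - 1.5 * iqr, uq + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def sd_outliers(data):
    """Values more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > 2 * sd]

marks = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]  # illustrative data
print(iqr_outliers(marks))  # [45]
print(sd_outliers(marks))   # [45]
```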
How do you display outliers on a box plot?
Mark each outlier with a × symbol beyond the end of the whisker
Categorical data displays
Bar chart, pie chart, dot plot
Discrete ungrouped data displays
Stem and leaf, vertical line chart
Ranked data displays
Stem and leaf, box plot
Discrete grouped data displays
Bar chart
Bivariate data displays
scatter graph, line of best fit
Continuous data displays
Frequency chart, histogram, cumulative frequency curve
Bivariate/ multivariate data
Data where two or more variables are recorded for each item, e.g. the age, weight and height of each student
Sample vs population
Population is the whole set of items of interest, sample is a (hopefully representative) subset of a parent population
Sampling frame
a list or other representation of all items able to be sampled
What is a census and when can it not be used?
Data collected from every member of the population
Cannot be used when testing destroys the item, e.g. checking eggs for double yolks
Simple random sampling
Every possible sample of a given size has an equal chance of being picked, e.g. number each unit and use a random number generator
+ Free from bias, easy to carry out, every unit has an equal chance
- Needs a sampling frame, impractical for large populations
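A minimal Python sketch of simple random sampling, assuming a hypothetical sampling frame of 200 numbered students (the frame, the sample size and the seed are all invented for the example):

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 200 students.
sampling_frame = list(range(1, 201))

rng = random.Random(1)  # fixed seed only so the example is repeatable
sample = rng.sample(sampling_frame, k=20)  # draws 20 units without replacement
print(sorted(sample))
```

`random.sample` never picks the same unit twice, which matches drawing numbered units without putting them back.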
Systematic sampling
Pick units at regular intervals throughout a population
+ simple and quick, good for large populations
- Needs a sampling frame, can be biased if the sampling frame is not randomly ordered
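Systematic sampling can be sketched in Python as below. The function name `systematic_sample`, the frame and the sample size are assumptions for illustration; the random starting point within the first interval is a common convention rather than something these cards state.

```python
import random

def systematic_sample(frame, sample_size, rng=random):
    """Pick every k-th unit from the frame, starting at a random offset."""
    interval = len(frame) // sample_size  # k: gap between chosen units
    start = rng.randrange(interval)       # random start within the first interval
    return frame[start::interval][:sample_size]

frame = list(range(1, 201))  # hypothetical frame of 200 numbered units
picked = systematic_sample(frame, 20, random.Random(0))
print(picked)
```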
Stratified sampling
Split into strata then pick a proportional amount randomly from each group
+ Most representative
- Same drawbacks as simple random sampling, and only works when the population divides into clear strata
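A minimal Python sketch of proportional stratified sampling. The strata (two year groups), their sizes and the function name `stratified_sample` are hypothetical; with less convenient numbers the rounding step can make the total differ slightly from the target.

```python
import random

def stratified_sample(strata, total_sample_size, rng=random):
    """Randomly pick from each stratum in proportion to its size."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional share of the sample, rounded to a whole number of units.
        share = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, share)
    return sample

strata = {
    "Year 12": [f"Y12-{i}" for i in range(120)],
    "Year 13": [f"Y13-{i}" for i in range(80)],
}
chosen = stratified_sample(strata, 20, random.Random(2))
print({name: len(members) for name, members in chosen.items()})
# {'Year 12': 12, 'Year 13': 8}
```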
Quota sampling
Divide the population into groups, then pick units non-randomly until each group's quota is filled
+ No sampling frame required, quick and easy, representative (to an extent)
- Bias may be present, harder with more groups, non-responses not recorded, requires clear groups
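One way to picture quota sampling is a sampler accepting people in the order they turn up until each quota is full. The sketch below assumes that reading; the `arrivals` list, the group labels and the quotas are invented for the example.

```python
def quota_sample(arrivals, quotas):
    """Accept people in arrival order until every group's quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    sample = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas filled, stop sampling
    return sample

# Hypothetical passers-by in the order they were approached.
arrivals = [
    ("P1", "under 18"), ("P2", "adult"), ("P3", "adult"),
    ("P4", "adult"), ("P5", "under 18"), ("P6", "under 18"),
]
quotas = {"under 18": 2, "adult": 2}
print(quota_sample(arrivals, quotas))  # ['P1', 'P2', 'P3', 'P5']
```

P4 is skipped because the adult quota is already full, which shows why the choice of units is not random.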
Opportunity sampling
Find people on the spot when a sample is readily available
+ Easy, cheap
- Unrepresentative, biased, results depend on who does the sampling and how
Cluster sampling
Split the population into clusters, randomly select some of the clusters, then sample only from those clusters
+ Quick and easy, somewhat representative
- Risks being unrepresentative, possible bias, clear clusters required
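A minimal Python sketch of cluster sampling, assuming the groups are six hypothetical classes of 30 students. Only some groups are chosen, and units are then sampled from within those chosen groups; the function name and all sizes are assumptions.

```python
import random

def cluster_sample(clusters, clusters_to_pick, per_cluster, rng=random):
    """Randomly choose some clusters, then sample units within each one."""
    picked = rng.sample(sorted(clusters), clusters_to_pick)
    return {name: rng.sample(clusters[name], per_cluster) for name in picked}

# Hypothetical population: six classes (clusters) of 30 students each.
clusters = {f"Class {c}": [f"{c}{i}" for i in range(30)] for c in "ABCDEF"}
result = cluster_sample(clusters, 2, 5, random.Random(3))
print(result)
```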
Self-selected/volunteer sampling
Participants choose to take part themselves, e.g. by responding to online surveys or adverts
+ Fewer non-responses, less effort spent finding participants
- Biased, unrepresentative, partial answers can add further bias
Standard deviation formula
$s = \sqrt{\dfrac{\sum x^2 - n\bar{x}^2}{n - 1}}$
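As a quick check of the formula above, read as dividing the whole numerator by n - 1, the sketch below computes it directly and compares it with Python's built-in sample standard deviation. The function name `sample_sd` and the `values` data are made up for the example.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: sqrt((sum of x^2 - n * mean^2) / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    sum_of_squares = sum(x * x for x in data)
    return math.sqrt((sum_of_squares - n * mean ** 2) / (n - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
print(sample_sd(values))           # formula from the card
print(statistics.stdev(values))    # library value for comparison
```

For this data the sum of squares is 232 and the mean is 5, so the value under the root is (232 - 200) / 7 = 32/7.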