Statistics - Chapter 1 - Data Collection Flashcards
Population
Whole set of items that are of interest. Information can be obtained from a population
Raw data
Unprocessed information
Data vs information
Data: collection of raw unorganised facts
Information: collection of processed, organised facts placed into context
Census
- Observes/measures entire population
- Pros: should give completely accurate result
- Cons: time consuming, expensive, cannot be used when testing process destroys item, hard to process large quantity of data
“Testing process will destroy…, so a census would destroy all the…”
Sample
Selection of observations taken from a subset of the population, used to find out information about the population as a whole
Pros: less time-consuming and less expensive than a census; fewer people have to respond, so there is less data to process
Cons: data may not be as accurate; sample may not be large enough to give information about small sub-groups of the population
How the size of the sample can affect the validity of any conclusions drawn
- size depends on the required accuracy and the available resources
- the larger the sample, the more accurate it is and the more accurate the predictions, but greater resources are needed
- if the population is very varied, a larger sample is needed than if the population were uniform
- because of natural variation in the population, different samples can lead to different conclusions
“they could take a larger sample, for example… this would give a better estimate of the overall proportion of…”
“full coverage”
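The effect of sample size can be illustrated with a small simulation. This is only a sketch under assumed conditions: the population (10,000 items, 30% with some trait), the sample sizes 25 and 400, and the helper name `estimate_spread` are all hypothetical and not taken from the notes. It repeatedly draws samples and measures how much the estimated proportion varies from sample to sample.

```python
import random
import statistics

def estimate_spread(population, sample_size, repeats=200, seed=0):
    """Standard deviation of the sample proportion over repeated random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        estimates.append(sum(sample) / sample_size)   # proportion of 1s in the sample
    return statistics.stdev(estimates)

# Hypothetical population: 1 = has the trait, 0 = does not (30% have it).
population = [1] * 3000 + [0] * 7000

small = estimate_spread(population, 25)
large = estimate_spread(population, 400)
print(f"spread with n=25:  {small:.3f}")
print(f"spread with n=400: {large:.3f}")
```

The larger sample should show a smaller spread, which is the sense in which it gives a better estimate of the overall proportion, at the cost of collecting more data.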
Sampling units
Individual units of a population, e.g. a university student or a house. Often individually named or numbered to form a list (sampling frame: a list of units from which a sample can be drawn), e.g. list of university students, list of houses in the locality, phone book, map, electoral roll
Random sampling
- every member of the population has an equal chance of selection
- sample should be representative of the population
- removes bias
- types: simple random sampling, systematic sampling, stratified sampling
Simple random sampling
- every sample of same size has an equal chance of being selected
- Pros: no bias; easy and cheap to implement for small populations and small samples; each sampling unit has a known and equal chance of selection
- Cons: sampling frame needed; not suitable when the population or sample size is large (time-consuming, expensive, disruptive)
- Method (requires a sampling frame):
- each member in the frame is allocated a unique number from 1 to the population size
- n of these numbers are chosen at random, where n is the sample size
- the numbers are generated with a random number generator (calculator or computer) or a random number table, or by lottery sampling (members are written on tickets and placed in a hat; the required number of tickets is drawn out)
- go back to the population and select the members corresponding to the generated numbers
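The numbering-and-selection steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the frame of 210 names, the sample size of 30 and the function name `simple_random_sample` are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Pick n members of the sampling frame so every member is equally likely."""
    rng = random.Random(seed)
    # Members are numbered 1..N; choose n distinct numbers at random.
    numbers = rng.sample(range(1, len(frame) + 1), n)
    # Go back to the frame and select the members with those numbers.
    return [frame[k - 1] for k in numbers]

# Hypothetical sampling frame of 210 students.
frame = [f"student_{i}" for i in range(1, 211)]
sample = simple_random_sample(frame, 30, seed=1)
print(sample[:5])
```

`random.sample` draws without replacement, so no member can be selected twice, which matches drawing tickets from a hat without putting them back.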
Random number table
- assign each member a unique identifier with the same number of digits, e.g. 3-digit identifiers
so 000, 001…
- work along the rows of the random number table, reading off 3-digit numbers
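Reading a random number table can be sketched as below. The row of digits is made up for illustration, and the function name `read_random_table` is hypothetical. The sketch also assumes the usual convention that numbers outside the range of identifiers, and repeats, are ignored; the notes do not state this explicitly.

```python
def read_random_table(digits, population_size, n, width=3):
    """Read fixed-width numbers along a row of table digits until n are chosen.

    Identifiers are assumed to run from 000 up to population_size - 1.
    """
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop spaces
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Skip numbers with no matching member and numbers already chosen.
        if number < population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == n:
            break
    return chosen

# Made-up row from a random number table; population of 400, sample of 4.
row = "086 570 125 931 086 042 318"
print(read_random_table(row, population_size=400, n=4))  # [86, 125, 42, 318]
```

Here 570 and 931 are skipped because no member has those identifiers, and the second 086 is skipped as a repeat.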
Systematic sampling
- required elements chosen at regular intervals from an ordered list.
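- Pros: simple and quick to use; suitable for large samples and large populations
- Cons: sampling frame needed; can introduce bias if the frame is not in random order, e.g. a list alternating MFMF; patterns in the sample data might occur when taking every _ person
- Method: allocate each member a number from 1 to the population size
- calculate the interval: population size ÷ sample size
- use a random number generator to select the first person, numbered from 1 to the interval
- “Select every (interval calculated)th person thereafter.”
e.g. with an interval of 5 and the first person chosen at random as number 2, the remaining would be 7, 12, 17, etc.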
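A short sketch of the procedure, assuming the interval is taken as population size divided by sample size (rounded down). The function name `systematic_positions` and the population of 100 are hypothetical; the optional `start` argument exists only so the example with first person 2 and interval 5 can be reproduced.

```python
import random

def systematic_positions(population_size, sample_size, start=None, seed=None):
    """Return the 1-based positions selected by systematic sampling."""
    interval = population_size // sample_size  # assumed interval calculation
    if start is None:
        # Random starting point between 1 and the interval (inclusive).
        start = random.Random(seed).randint(1, interval)
    # Take every interval-th member after the starting point.
    return [start + i * interval for i in range(sample_size)]

# Population of 100, sample of 20, so the interval is 5; first person is number 2.
positions = systematic_positions(100, 20, start=2)
print(positions[:4])  # [2, 7, 12, 17]
```

Because the start is at most the interval, the last position never exceeds the population size.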
Stratified sampling
- population divided into mutually exclusive strata, e.g. F and M, and a random sample is taken from each
- Pros: sample accurately reflects the population structure; guarantees proportional representation of groups within the population
- Cons: population must be clearly classified into distinct strata; selection within each stratum has the same disadvantages as simple random sampling
number sampled in a stratum = (stratum size / population size) × required overall sample size
e.g. layout of working:
cricket: 121/370 × 30 = 9.8 ≈ 10
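The allocation formula can be checked with a few lines of Python. The function names and the F/M strata of 60 and 40 members are hypothetical; only the cricket figures (121 out of 370, overall sample of 30) come from the notes. Rounding each stratum separately can make the total differ slightly from the required sample size, so the sketch does not try to correct for that.

```python
import random

def stratum_allocation(stratum_size, population_size, sample_size):
    """Number to sample from one stratum, rounded to the nearest whole number."""
    return round(stratum_size / population_size * sample_size)

def stratified_sample(strata, sample_size, seed=None):
    """Take a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    return {
        name: rng.sample(
            members,
            stratum_allocation(len(members), population_size, sample_size),
        )
        for name, members in strata.items()
    }

# Worked example from the notes: 121/370 x 30 = 9.8, which rounds to 10.
print(stratum_allocation(121, 370, 30))

# Hypothetical strata: 60 F and 40 M, overall sample of 10.
strata = {
    "F": [f"F{i}" for i in range(60)],
    "M": [f"M{i}" for i in range(40)],
}
picked = stratified_sample(strata, 10, seed=0)
print({name: len(members) for name, members in picked.items()})
```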
Quota sampling
- interviewer selects a sample that reflects the characteristics of the whole population. The population is divided into groups according to a given characteristic. The size of each group determines the proportion of the sample that should have that characteristic. The interviewer meets people, assesses which group they belong to and allocates them to the appropriate quota, continuing until all quotas are filled.
- Pros: allows a small sample to still be representative of the population; no sampling frame required; quick and easy; allows easy comparison between different groups in the population
- Cons: non-random, so can introduce bias; population must be divided into groups (costly, may be inaccurate; increasing the scope of the study increases the number of groups, which adds time and expense); non-responses are not recorded
Maddison has a list of 210 pupils, and wants to find out which musical instrument they prefer listening to amongst the flute, the clarinet, the guitar and the saxophone. To take a sample of size 30, Maddison surveys the first 15 girls and the first 15 boys to arrive at the school.
Non-responses: people who refuse to participate or cannot be reached are not included in the sample, which can affect its representativeness and potentially introduce bias.
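Filling quotas in order of arrival, as in the Maddison example, can be sketched as follows. The arrival order (every third pupil a boy) and the function name `quota_sample` are made up for illustration; only the quotas of 15 girls and 15 boys and the list of 210 pupils come from the example.

```python
def quota_sample(arrivals, quotas):
    """Add people in the order they are met until every quota is filled.

    arrivals: iterable of (name, group) pairs in order of arrival.
    quotas: dict mapping each group to the number required.
    """
    remaining = dict(quotas)  # copy so the caller's quotas are not changed
    sample = []
    for name, group in arrivals:
        if remaining.get(group, 0) > 0:  # this group's quota is not yet full
            sample.append((name, group))
            remaining[group] -= 1
        if not any(remaining.values()):  # all quotas filled: stop
            break
    return sample

# Hypothetical arrival order for 210 pupils: every third pupil is a boy.
arrivals = [(f"pupil_{i}", "boy" if i % 3 == 0 else "girl") for i in range(1, 211)]
sample = quota_sample(arrivals, {"girl": 15, "boy": 15})
print(len(sample))
```

Nothing here is random: who ends up in the sample depends entirely on who arrives first, which is why the method can introduce bias.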
Opportunity/convenience sampling
- taking the sample from people who are available at the time the study is carried out and who fit the criteria
- Pros: easy to carry out, cheap
- Cons: unlikely to provide a representative sample; highly dependent on the individual researcher (time, place)
“sample is likely to be biased towards … who …”
“improvements by interviewing people at different locations and times, and increasing the sample size”
Types of data
Quantitative: associated with numerical observations
Qualitative: associated with non-numerical observations
Continuous variable: can take any value in a given range, e.g. height or time
Discrete variable: can take only specific values in a given range, e.g. number of people cannot be 2.65