U1 - collecting data Flashcards
quantitative data definition
numerical data (numerical observations or measurements)
qualitative data definition
non-numerical data (non-numerical observations)
continuous data definition with examples
how can this data be represented?
can take any value on a continuous numerical scale. when grouped, the classes are written with inequalities (e.g. 150 ≤ h < 160).
e.g.
- height
- weight
- temperature
- length
can be represented by:
- histograms
- cumulative frequency curves
- line graphs
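As a rough illustration of grouping continuous data with inequalities, the sketch below sorts some heights into classes of the form lower <= h < upper. The heights, the class boundaries and the function name `group_heights` are all made up for this example.

```python
def group_heights(heights, boundaries):
    """Count how many heights fall in each class lower <= h < upper."""
    counts = {}
    for lower, upper in zip(boundaries, boundaries[1:]):
        label = f"{lower} <= h < {upper}"
        counts[label] = sum(1 for h in heights if lower <= h < upper)
    return counts

# Made-up heights in cm (illustrative only).
heights = [151.2, 158.9, 160.0, 163.4, 169.9, 170.0, 174.5]
table = group_heights(heights, [150, 160, 170, 180])
for label, frequency in table.items():
    print(label, frequency)
```

Note that 160.0 lands in the second class and 170.0 in the third: the inequalities make it clear which class a boundary value belongs to.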
discrete data definition with examples
how can this data be represented?
can only take particular values. when grouped, the classes are written without inequalities (e.g. 0-4, 5-9).
e.g.
- the number of students in a class
- shoe size
- the number of languages an individual speaks
can be represented by:
- bar charts
- cumulative frequency step polygons
categorical data definition with examples
how can this data be represented?
can be sorted into non-overlapping categories.
e.g.
- race
- sex
- age group
can be represented by:
- frequency tables (ordinary, relative and cumulative frequency tables)
- pie charts
- bar charts
ordinal data definition with examples
how can this data be represented?
like categorical data but can be written in order and given a rating scale
e.g.
- spicy scale (plain, mild, medium, hot, extra hot)
- income level (low income, medium income, high income)
- satisfaction level (extremely dislike, dislike, neutral, like, extremely like)
can be represented by:
- bar charts
- pie charts
- tables
bivariate data definition with examples
how can this data be represented?
involves pairs of related data values. helps you study the correlation between two variables
e.g.
- how temperature affects the state of an ice cream (the two variables are the temperature and the state of the ice cream)
can be represented by:
- scatterplots
multivariate data definition with examples
how can this data be represented?
involves sets of 3 or more related data values. several variables are considered together to explain one outcome
e.g.
- predicting the weather (multiple factors like pollution, humidity, precipitation, etc)
can be represented by:
- radar charts
population definition
everything or everybody that could possibly be involved in an investigation
census definition
a survey of a whole population
sample definition
a smaller number of items from the population
biased sample definition
not representative of everyone in the population
sampling frame definition
a list of people/items that are to be sampled
advantages of primary data (3)
- accurate
- collection method is known (because it's your own)
- you can find answers to specific questions
disadvantages of primary data (2)
- time consuming
- usually expensive
advantages of secondary data (4)
- cheap
- easy
- quick
- data from some organisations can be more reliable than data collected yourself
disadvantages of secondary data (5)
- method of collection is unknown
- data may be out of date
- may contain mistakes
- may come from an unreliable source
- may be difficult to find answers to specific questions
advantages of a census (3)
- unbiased
- accurate
- takes the entire population into account
disadvantages of a census (4)
- time consuming
- expensive
- lots of data to manage
- difficult to ensure the whole population is used - if some are missed, the survey may be biased
advantages of a sample (4)
- cheaper
- quicker
- less data to consider
- easier to get hold of all the required information
disadvantages of a sample (2)
- may be biased
- may not be representative of the entire population: each possible sample will give different results, so the one selected might not accurately reflect the population
impact of a sample size on reliability and replication
the bigger the sample size, the better the estimate of the population parameters
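A small simulation can make this concrete. The sketch below is only an illustration under stated assumptions: the population is 10 000 made-up heights, and `spread_of_sample_means` is a hypothetical helper that measures how much the sample mean changes from one random sample to the next.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats, rng):
    """Standard deviation of the means of many random samples of one size."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical population: 10 000 made-up heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, 10, 500, rng)
large = spread_of_sample_means(population, 400, 500, rng)
print(f"n=10:  sample means vary by about {small:.2f} cm")
print(f"n=400: sample means vary by about {large:.2f} cm")
```

The means of the larger samples should cluster much more tightly around the population mean, which is what "a better estimate" means here.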
what is the Petersen capture-recapture method
a way of estimating the size of a population, usually dealing with wildlife.
Petersen capture-recapture: population size formula
population size = (number in 1st sample x number in 2nd sample) / number in 2nd sample that are marked
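The formula can be sketched as a short function. The name `petersen_estimate` and its parameter names are chosen for this example, and the numbers in the usage line are made up.

```python
def petersen_estimate(first_sample, second_sample, marked_in_second):
    """Estimate population size: (first x second) / marked in second."""
    if marked_in_second <= 0:
        # With no marked individuals recaptured the formula divides by zero.
        raise ValueError("need at least one marked individual in the second sample")
    return first_sample * second_sample / marked_in_second

# Made-up numbers: 50 marked, 40 caught later, 8 of those were marked.
print(petersen_estimate(50, 40, 8))  # 250.0
```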
A fish farmer wants to estimate the size of his fish stocks. He nets 142 fish and marks them with a special ink. The fish are released back into the fish farm. A month later he nets 127 fish and finds that 6 of them are marked.
a) Estimate the size of the fish population at the fish farm.
b) What assumptions are made in obtaining this estimation?
a) Working out:
(142x127) / 6 = 3005.6666667
population size = 3000 (to 2 s.f.)
b) Four possible answers:
- That the population does not change between capture and recapture.
- That the same sampling method is used for the capture and the recapture.
- That the capture and marking do not have an effect on the population.
- That the proportion of marked fish in the recapture matches the proportion in the whole population. This is unlikely to be exactly true because the recapture is a random sample. There could just as easily have been any number from 1 to 10 marked fish.
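To see how much the last assumption matters, the sketch below recomputes the estimate for every marked count from 1 to 10, keeping the question's other figures (142 netted first, 127 netted second). It is only an illustration of sensitivity, not part of the expected exam answer.

```python
first_sample, second_sample = 142, 127  # figures from the fish farm question

# Estimate for each possible number of marked fish in the second sample.
estimates = {
    marked: first_sample * second_sample / marked
    for marked in range(1, 11)
}

for marked, estimate in estimates.items():
    print(f"{marked:2d} marked -> estimate about {estimate:,.0f}")
```

A change of one or two marked fish moves the estimate by hundreds or thousands, which is why the answer is only quoted to 2 s.f.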
To estimate the size of the population of Caribou in a national forest in Canada, 100 Caribou are trapped at different locations through the forest (capture), and tags fitted to their ears. A week later another 100 Caribou are trapped (recapture). It is found that 4 of these have tags on their ears. Estimate the population of Caribou in the forest.
Working out:
(100x100) / 4 = 2500
population size = 2500
why is it important to make sure the sample is as similar to the population as possible?
so that it is representative. otherwise, it may be biased, and conclusions about the population based on your sample may not be correct
how to avoid sampling bias (3)
- select from the correct population and make sure no member of the population is excluded
- select your sample at random - if members are linked in some way, it can cause bias
- make sure all your sample members respond
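Selecting at random from a sampling frame could look like the sketch below. The frame of 200 numbered students and the sample size of 20 are assumptions made for the example.

```python
import random

# Hypothetical sampling frame: a numbered list of 200 students.
sampling_frame = [f"student_{i:03d}" for i in range(1, 201)]

rng = random.Random(42)  # fixed seed only so the example is repeatable
# Every member of the frame is equally likely to be chosen, with no repeats.
sample = rng.sample(sampling_frame, 20)
print(sample)
```

Because the whole frame is used and the choice is left to the random number generator, no member is excluded and no link between members influences who is picked.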