SECTION 11- DATA PRESENTATION AND INTERPRETATION Flashcards
what is meant by the term population
- it’s all the individuals/ set of individuals you’re interested in for a particular investigation
- can be difficult to investigate an entire population, so you could choose a sample
what is simple random sampling
- items in the sample are chosen at random e.g. using a random number generator
- every possible sample has the same probability of being selected
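A minimal sketch of simple random sampling, assuming you already have a full list of the population. The names `simple_random_sample` and `students` are made up for illustration; `random.sample` picks without replacement, so every possible sample of the given size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n distinct members so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)  # seed is only here to make the example repeatable
    return rng.sample(population, n)

# Hypothetical population: a numbered list of 30 students
students = [f"student_{i}" for i in range(1, 31)]
sample = simple_random_sample(students, 5, seed=1)
print(sample)
```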
advantages and disadvantages of simple random sampling
Advantages:
- every member of the population has an equal chance of being selected, so the sample is free from selection bias
Disadvantages:
- won’t always be possible, because you might not have a list of every member of the population
what is cluster sampling
- used when the population is divided into subgroups (clusters) which are each reasonably representative of the entire population
- cluster sampling means taking the sample from a few of these subgroups e.g. if you want to take a sample of Yr11 students, you might take a sample from 2 or 3 different schools
what is opportunity sampling
- when individuals are chosen to be part of a sample as the opportunity arises e.g. interviewing passers-by on the street
what is stratified sampling
- when the parent population is divided into subgroups (or strata) like by age or gender
- stratified sampling ensures that all strata are sampled – subgroups aren’t expected to be a representation of the population
- if the numbers sampled from each stratum are proportional to the size of that stratum – this is proportional stratified sampling
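A rough sketch of how the numbers for a proportional stratified sample could be worked out. The function name and the year-group sizes are invented for the example, and rounding is an assumption: with awkward numbers the rounded values may not add up exactly to the sample size and would need adjusting by hand.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum = sample size x (stratum size / population size)."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

# Hypothetical strata: 400 students split by year group, sample of 40 wanted
year_groups = {"Yr10": 120, "Yr11": 180, "Yr12": 100}
allocation = proportional_allocation(year_groups, 40)
print(allocation)  # {'Yr10': 12, 'Yr11': 18, 'Yr12': 10}
```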
what is quota sampling
- similar to a stratified sample
- but the number of data items in each stratum is specified e.g. a certain no. of males and females may be required
- method used by interviewers and selection of sample members is up to interviewer
what is a self-selected sample
- individuals in the sample have chosen to be in the sample e.g. the respondents to a survey posted publicly on the Internet
what is systematic sampling
- choosing individuals from a list at regular intervals to form a sample e.g. if the parent population was all the Yr11 students in a school, you might obtain an alphabetical list and select every other student on the list
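A small sketch of the "every other student" idea, generalised to every k-th item on an ordered list. Choosing a random starting point is an assumption added here (it is a common way to do it, not something the card states), and the names are invented.

```python
import random

def systematic_sample(ordered_list, k, seed=None):
    """Take every k-th item from an ordered list, starting at a random position in the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed: random start between 0 and k - 1
    return ordered_list[start::k]

# Hypothetical alphabetical list of 10 students; k = 2 means every other student
names = ["Amy", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus", "Hal", "Ivy", "Jon"]
sample = systematic_sample(names, 2, seed=0)
print(sample)
```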
examples of samples prone to bias
Opportunity sampling:
- if you survey passers-by in the street on working days, the sample may include a disproportionate no. of retired people
Self-selected sample:
- when posting a survey on a website, the visitors might not be representative of the population as a whole, or they might already hold a strong opinion about the subject of the survey
equation for combining means
mean = total / number, so to find a total it’s mean x n
- to combine means it’s (total(1) + total(2)) / (n(1) + n(2))
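A quick sketch of combining two means by going back to the totals first. The function name and the numbers are made up for illustration.

```python
def combined_mean(mean1, n1, mean2, n2):
    """Combine two group means by recovering each total (mean x n) first."""
    total1 = mean1 * n1
    total2 = mean2 * n2
    return (total1 + total2) / (n1 + n2)

# Hypothetical groups: 20 values with mean 60 and 30 values with mean 70
result = combined_mean(60, 20, 70, 30)
print(result)  # (1200 + 2100) / 50 = 66.0
```

Note the answer (66) is not just the average of 60 and 70, because the groups are different sizes.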
when is interpolation used
- to calculate the median and quartiles of grouped data
how to work out the median in grouped data when there are gaps
- assume numbers have been rounded to the nearest whole number, so change the class boundaries so there’s no gap e.g. 21-25 becomes 20.5-25.5
- then calculate the cumulative freq. and find the median
calculating the median (in interpolation)
lower class boundary + (how many into the class / class frequency) x class width
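A sketch of the interpolation formula for the median, assuming the class boundaries have already been adjusted so there are no gaps and that the median is taken as the (n/2)th value (the usual convention for grouped data). The function name and the table are invented.

```python
def interpolated_median(classes):
    """classes: list of (lower boundary, upper boundary, frequency), in order, with no gaps."""
    total = sum(freq for _, _, freq in classes)
    target = total / 2  # assumed convention: median is the (n/2)th value
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            how_many_in = target - cumulative  # how far into this class the median lies
            return lower + (how_many_in / freq) * (upper - lower)
        cumulative += freq
    raise ValueError("the table contains no data")

# Hypothetical grouped data (21-25, 26-30, 31-35 after closing the gaps)
table = [(20.5, 25.5, 4), (25.5, 30.5, 10), (30.5, 35.5, 6)]
median = interpolated_median(table)
print(median)  # 25.5 + (6 / 10) x 5 = 28.5
```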
advantages and disadvantage of quota sampling
ADVANTAGES:
- cost effective and is easy to conduct
DISADVANTAGES:
- doesn’t allow random selection of participants, so the sample may not be representative
- can be biased as individuals chosen is up to interviewer (may choose people more co-operative)
how to calculate the variance (σ²)
mean of the squares - square of the mean
how to calculate the standard deviation (σ)
- it’s the square root of the variance
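A short sketch of "mean of the squares minus square of the mean", with the standard deviation as its square root. This divides by n (the σ² version on the card), not n - 1; the data set is made up.

```python
import math

def variance(values):
    """Variance = mean of the squares - square of the mean (dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return mean_of_squares - mean ** 2

def standard_deviation(values):
    """Standard deviation = square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5, mean of squares = 29
print(variance(data))             # 29 - 25 = 4.0
print(standard_deviation(data))   # 2.0
```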
what does the area in a histogram represent
frequency (the area of each bar is proportional to the frequency of its class)
what is bivariate data
- data that has pairs of values for two variables
- it can be represented on scatter diagrams
how to plot on a scatter diagram
- independent variable (something researcher can control) is plotted on the x-axis
- dependent variable (something measured by researcher) is plotted on the y-axis
difference between negative and positive correlation
- negative correlation is when one variable decreases when the other increases
- positive correlation is when one variable increases with the increase of the other variable (both variables are increasing)
what is a causal relationship (between variables)
- when a change in one variable causes a change in the other
what is standard deviation
a measure of how far the values are spread out from the mean
what is uniform distribution
- when all the outcomes are equally likely
what are measures of central tendency (and examples)
- it helps you find the middle/ average of a data set e.g. mean, mode and median
how to calculate the mean from a frequency table
sum of the products of the data values and their frequencies/ the sum of frequencies
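A sketch of the same calculation, i.e. mean = Σfx / Σf. The function name and the table are invented for the example.

```python
def mean_from_frequency_table(table):
    """table: list of (value, frequency) pairs; returns sum of f*x divided by sum of f."""
    total_fx = sum(x * f for x, f in table)
    total_f = sum(f for _, f in table)
    return total_fx / total_f

# Hypothetical table: value 0 occurs 3 times, 1 occurs 5 times, 2 occurs twice
freq_table = [(0, 3), (1, 5), (2, 2)]
mean = mean_from_frequency_table(freq_table)
print(mean)  # (0 + 5 + 4) / 10 = 0.9
```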
why is the mean good and bad for quantitative data
- it uses all values in the data, so it gives a true measure of data, but it’s affected by extreme values
what are percentiles
- they split the data into 100 equal parts e.g. the 10th percentile is 1/10 of the way through the data
what is interpolation
- estimating unknown values that fall between known values
examples of measures of spread
- range
- interquartile range
- interpercentile range
- variance
finding outliers general formula
If it’s …
Greater than: Q₃ + k(Q₃ - Q₁)
Less than: Q₁ - k(Q₃ - Q₁)
(K WILL BE GIVEN IN THE EXAM – exam may have different ways of identifying outliers, so will be told what method to use)
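A sketch of the outlier rule with k left as an input, since the exam gives the value to use. The function names, quartiles and data are made up for illustration.

```python
def outlier_fences(q1, q3, k):
    """Return the (lower, upper) limits: Q1 - k(Q3 - Q1) and Q3 + k(Q3 - Q1)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3, k):
    """Keep only the values that fall outside the two limits."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in values if x < low or x > high]

# Hypothetical data with Q1 = 10, Q3 = 20 and k = 1.5
fences = outlier_fences(10, 20, 1.5)
outliers = find_outliers([3, 12, 18, 40, -6], 10, 20, 1.5)
print(fences)    # (-5.0, 35.0)
print(outliers)  # [40, -6]
```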
what is data cleaning
- the process of identifying and removing anomalies from the data set; drawing box plots can help to spot them
what is an outlier
- a value that falls more than 1.5 x interquartile range above the UQ or more than 1.5 x interquartile range below the LQ (a common definition, using k = 1.5)