Statistics Flashcards
Bivariate data
Data relating to pairs of variables
Variables that are statistically related
Correlated
How do you identify correlation
Scatter graph
What goes on x axis
Explanatory or independent variable
Population
The set of things you are interested in
E.g. all people in the UK
Census
Observes or measures every member of the population
Parameter
A number that describes the entire population
E.g. the mean or standard deviation
Sample
Subset of a population
Used to find out information about the whole population
Statistic
A value calculated from a sample
E.g. the mean or standard deviation of the sample, which can be used to estimate the mean or standard deviation of the population
Sampling unit
An individual unit from the population
E.g. a particular person living in the UK
Sampling frame
A list of all the sampling units in the population
E.g. the electoral register for the UK
Advantage of using a sample over a census
Quicker, fewer people have to respond and less data to process
Less expensive
Advantage of using a census over a sample
Should be a completely accurate result
A sample may not be large enough to give information about small subgroups of the population
Sample disadvantage
Data may not be accurate
Sample might not be large enough to give information about small subgroups
Census disadvantage
Takes a long time and is expensive
Hard to process the large quantities of data
Cannot be used if the testing destroys the items
Advantages of sampling
Quick and not as expensive
Fewer people have to respond
Less data to process
Advantages of a census
Should give a completely accurate result
If you want to know the mean number of sweets in a packet of sweets, why is it not possible to use a census
It would involve destroying all the sweets
Can’t use a census if all the sampling units are being destroyed
3 methods of random sampling
Simple random sampling
Systematic sampling
Stratified sampling
Simple random sampling method
Number all the items in the population
Use a random number generator to select a sample of the desired size
If a number is repeated, ignore it and generate another number
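The steps on this card can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the `seed` parameter and the 20-person population are all assumptions made for the example.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Number the items 1..N, then draw random numbers until enough
    distinct items have been chosen."""
    if sample_size > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    chosen = []  # item numbers already selected (1-based)
    while len(chosen) < sample_size:
        number = rng.randint(1, len(population))
        if number in chosen:
            continue  # repeated number: generate another
        chosen.append(number)
    return [population[number - 1] for number in chosen]

people = [f"person_{i}" for i in range(1, 21)]  # hypothetical population of 20
sample = simple_random_sample(people, 5, seed=1)
print(sample)
```

Python's built-in `random.sample(people, 5)` gives the same kind of result in one call; the loop is written out only to mirror the card.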
Systematic sampling method
Number all items in the population
Let n=population size/sample size
Use a random number generator to pick a number from 1 to n; this is the first item
Then choose every nth item after it
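A short Python sketch of the systematic method. The function name, the `seed` parameter and the population of 100 numbered items are assumptions for illustration; when the population size is not an exact multiple of the sample size, this sketch rounds n down, which is one possible choice.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start from 1..n, then take every nth item."""
    rng = random.Random(seed)
    step = len(population) // sample_size  # n = population size / sample size
    start = rng.randint(1, step)           # first item number, between 1 and n
    numbers = range(start, len(population) + 1, step)
    # Trim in case rounding n down produced one item too many.
    return [population[k - 1] for k in numbers][:sample_size]

population = list(range(1, 101))           # hypothetical items numbered 1..100
sample = systematic_sample(population, 10, seed=3)
print(sample)
```

With 100 items and a sample of 10, n is 10, so the selected item numbers are always 10 apart.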
Stratified sampling method
The population is divided into groups (strata)
Decide how many to sample from each group using…
(Number in group/Number in population)×sample size
Use simple random sampling to select the items from each group
So it is proportional and representative
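The allocation formula and the within-group selection can be sketched as follows. The group names, group sizes and the function `stratified_sample` are hypothetical, and rounding each share to the nearest whole number is an assumption (the total can be off by one when the shares are not whole numbers).

```python
import random

def stratified_sample(groups, sample_size, seed=None):
    """groups: dict mapping group name -> list of items in that group."""
    rng = random.Random(seed)
    population_size = sum(len(items) for items in groups.values())
    sample = {}
    for name, items in groups.items():
        # (number in group / number in population) x sample size
        k = round(len(items) * sample_size / population_size)
        # Simple random sample within the group.
        sample[name] = rng.sample(items, k)
    return sample

groups = {
    "Year 12": [f"Y12-{i}" for i in range(1, 121)],  # 120 students
    "Year 13": [f"Y13-{i}" for i in range(1, 81)],   # 80 students
}
sample = stratified_sample(groups, 50, seed=0)
print({name: len(chosen) for name, chosen in sample.items()})
# {'Year 12': 30, 'Year 13': 20}
```

Here Year 12 makes up 120/200 of the population, so it supplies 30 of the 50 sampled items, and Year 13 supplies the remaining 20.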
2 methods of non random sampling
Opportunity sampling/convenience sampling
Quota sampling
Opportunity sampling method
Sample consists of any items available to be sampled
Items are taken as they become available until the required sample size is reached; any further items are ignored
E.g. who walks into the frozen aisle of a supermarket
Quota sampling method
The population is divided into groups
Decide how many to sample from each group using…
(Number in group/Number in population)×sample size
Sample the first “X” for each group and ignore any further items
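A minimal sketch of filling quotas in the order items are encountered. The arrivals list, the group labels and the quota sizes are made up for the example; in practice the quotas would come from the proportional formula on this card.

```python
def quota_sample(arrivals, quotas):
    """arrivals: (item, group) pairs in the order they are encountered.
    quotas: dict mapping group -> number still required."""
    remaining = dict(quotas)
    sample = []
    for item, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append(item)
            remaining[group] -= 1
        # Otherwise the quota for this group is full, so the item is ignored.
    return sample

arrivals = [("A", "adult"), ("B", "child"), ("C", "adult"),
            ("D", "adult"), ("E", "child"), ("F", "child")]
quotas = {"adult": 2, "child": 1}   # hypothetical quotas
sample = quota_sample(arrivals, quotas)
print(sample)  # ['A', 'B', 'C']
```

D, E and F are ignored because both quotas are already full by the time they arrive, which is why the method is not random.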
Advantages of simple random sampling
Free of bias
Easy and cheap to implement for small populations and small samples
Each sampling unit has a known and equal chance of selection
Disadvantages of simple random sampling
Not suitable for a large population size or sample size
Large samples are expensive and time consuming and disruptive
Need a sampling frame
Advantages of systematic sampling
Simple and quick to use
Suitable for a large sample/population
Disadvantages of systematic sampling
Sampling frame needed
Can introduce bias if the sampling frame isn’t random
Advantages of stratified sampling
Sample accurately reflects the population
Guarantees proportional representation of groups within a population
Disadvantages of stratified sampling
Population must be clearly classified into discrete groups
Selection within each group has the same disadvantages as simple random sampling, e.g. a sampling frame is needed
Advantages of opportunity sampling
Easy to carry out
Inexpensive
Disadvantages of opportunity sampling
Unlikely to provide representative sample
Highly dependent on individual researcher
Not random so may introduce bias
Advantages of quota sampling
Allows a small sample to still be representative of the population
No sampling frame needed
Quick and easy and inexpensive
Allows for easy comparison between different groups in a population
Disadvantages of quota sampling
Not random so may introduce bias
Population must be divided into groups which can be costly or inaccurate
Non-responses are not recorded as such
Increasing scope of study increases number of groups which adds time and expense
Mean
A numerical measure
Given by x̄=Σx/n
What’s a median
The middle value of an ordered data set
Position given by (n+1)/2 for non-grouped data
And n/2 for grouped data
Mode
Most common value
Range
Difference between the highest and lowest data value
Lower Quartile
Q1
Point that is a quarter of the way along an ordered data set
Position given by (n+1)/4 for non-grouped data
And n/4 for grouped data
Upper Quartile
Q3
Point that is three quarters of the way along an ordered data set
Position given by 3(n+1)/4 for non-grouped data
And 3n/4 for grouped data
IQR
Interquartile range
The difference between the upper and lower quartiles
Q3-Q1
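The position formulas for non-grouped data can be tried out on a small data set. The seven values below are hypothetical and were chosen so that every position is a whole number. How to treat a fractional position is an assumption here (this sketch interpolates between the two neighbouring values; some courses round or average instead).

```python
def value_at_position(ordered, position):
    """Return the value at a 1-based position in an ordered list.

    Assumption: a fractional position is handled by interpolating between
    the two neighbouring values. Intended for lists of at least 3 values.
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

data = sorted([8, 3, 14, 5, 21, 7, 12])        # hypothetical data, n = 7
n = len(data)
q1 = value_at_position(data, (n + 1) / 4)      # position 2
median = value_at_position(data, (n + 1) / 2)  # position 4
q3 = value_at_position(data, 3 * (n + 1) / 4)  # position 6
iqr = q3 - q1
print(q1, median, q3, iqr)  # 5 8 14 9
```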
Variance
A measure of spread of data
σ^2=Σ(x-x̄)^2/n
Where x̄ is the mean
Standard deviation
A measure of spread of data
σ=sqrt of variance
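A worked check of the variance and standard deviation formulas, using a small hypothetical data set chosen so the numbers come out whole.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data values
n = len(data)

mean = sum(data) / n                                # Σx/n = 40/8 = 5.0
variance = sum((x - mean) ** 2 for x in data) / n   # Σ(x-x̄)^2/n = 32/8 = 4.0
std_dev = math.sqrt(variance)                       # square root of the variance

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

Note that the squared differences are divided by n, as on the card, not by n-1.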
Can you use your calculator to get the median for linear interpolation
No
Not accurate
How do you use a calculator to get the mean, median, standard deviation, variance and quartiles
Shift + Menu/Setup, 3. Statistics, Frequency on
Menu/Setup, 6. Statistics, 1. 1-Variable
Input values
AC (sets table)
OPTN, 2 (1-Variable calc)
Discrete data
Can only take certain values and can have gaps
E.g. shoe size, money, number of sweets
Median position for grouped data
n/2
Median position for non-grouped data
(n+1)/2
Continuous data
Can take any value in a certain range
E.g. height, time, length
Linear interpolation assumption
Assuming that the data values are evenly distributed within each class
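Under that assumption, a value at a given position can be estimated by moving proportionally through the class that contains it. The sketch below is one way to write this; the class boundaries, the frequencies and the function name `interpolate_grouped` are hypothetical.

```python
def interpolate_grouped(classes, position):
    """classes: (lower_bound, upper_bound, frequency) tuples in order.
    position: e.g. n/2 for the median or n/4 for the lower quartile."""
    cumulative = 0  # frequency in all classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            # Fraction of the way through this class, assuming values
            # are evenly spread within it.
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position is beyond the total frequency")

classes = [(0, 10, 5), (10, 20, 8), (20, 30, 10), (30, 40, 7)]  # hypothetical
n = sum(freq for _, _, freq in classes)      # 30
median = interpolate_grouped(classes, n / 2)  # 15th value
q1 = interpolate_grouped(classes, n / 4)      # 7.5th value
print(median, q1)  # 22.0 13.125
```

For the median, 13 values lie below 20, so the 15th value is 2/10 of the way through the 20 to 30 class, giving 22.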
How do you work out standard deviation
Square root of the variance
Or
Use the formula on page 3/4 of the formula book
What is coding
A way of simplifying statistical calculations
Each data value is coded to make a new set of data values that are easier to work with
Coding formula for mean and standard deviation
Mean: ȳ=(x̄-a)/b
Standard deviation: σy=σx/b
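A quick numerical check of both coding formulas, assuming the coding y=(x-a)/b. The data values and the choices a=1000, b=10 are made up for the example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

x = [1010, 1020, 1030, 1040, 1050]   # hypothetical original data
a, b = 1000, 10                      # hypothetical coding constants
y = [(v - a) / b for v in x]         # coded data: 1.0, 2.0, 3.0, 4.0, 5.0

# Mean: ȳ = (x̄ - a)/b
print(mean(y), (mean(x) - a) / b)
# Standard deviation: σy = σx/b (subtracting a does not change the spread)
print(std_dev(y), std_dev(x) / b)
```

Each printed pair should agree, up to floating-point rounding. To decode, rearrange: x̄ = bȳ + a and σx = bσy.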
What is an outlier
An extreme piece of data which differs significantly from other observed data values
The rule for identifying outliers will be given in the exam
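As an illustration only, the sketch below uses one commonly quoted rule: a value is an outlier if it lies more than k×IQR below Q1 or above Q3, with k=1.5. That rule and the data are assumptions, not something stated on these cards; always use the rule given in the question.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR.
    The multiplier k=1.5 is an assumed, commonly used choice."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [3, 5, 7, 8, 12, 14, 40]    # hypothetical data with Q1 = 5, Q3 = 14
outliers = find_outliers(data, q1=5, q3=14)
print(outliers)  # [40]
```

Here the IQR is 9, so the fences are -8.5 and 27.5, and only 40 falls outside them.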
What does it mean to clean data
Remove outliers
But keep the outliers in unless told otherwise
Mark with an x if you are able to identify them
Advantage of mode
Useful for non numerical data
Not usually affected by outliers or omissions
Always an observed data value
Disadvantage of mode
Does not use all data values
May not be representative if low frequency
May not be representative if in a small population
Advantage of median
Not usually affected by outliers or errors
Disadvantage of median
Not always a data value
Does not use all data values
Advantage of mean
When the data set is large a few extreme values have little effect
Uses all data values
Disadvantage of mean
May not always be a data value
Affected by outliers and errors, particularly in a small population
Advantage of range as a measure of spread
Reflects the full data set
Disadvantage of range
Distorted by outliers
Advantage of using the IQR as a measure of spread
Not distorted by outliers
Disadvantages of using the IQR as a measure of spread
Does not reflect the full data set
Advantage of using the standard deviation as a measure of spread
When the data set is large a few outliers have negligible impact
Disadvantage of using the standard deviation as a measure of spread
When a data set is small a few outliers have a large impact
What is a box plot
Can be drawn to represent important features of data
Shows the five-figure summary: the lowest and highest values, the quartiles and the median
Can display any outliers with an x or *
When can cumulative frequency be used
For grouped data
Can be an alternative way to estimate the median, quartiles or percentiles
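A small sketch of building cumulative frequencies from a grouped frequency table, ready to plot against the upper class boundaries. The boundaries and frequencies are hypothetical.

```python
from itertools import accumulate

upper_bounds = [10, 20, 30, 40]   # hypothetical upper class boundaries
frequencies = [5, 8, 10, 7]       # hypothetical class frequencies

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))
# Points to plot: (upper class boundary, cumulative frequency).
points = list(zip(upper_bounds, cumulative))
print(points)  # [(10, 5), (20, 13), (30, 23), (40, 30)]

n = cumulative[-1]
print(n / 4, n / 2, 3 * n / 4)    # positions to read off for Q1, median, Q3
```

On the resulting curve, the median is estimated by reading across from n/2 on the cumulative frequency axis, and the quartiles from n/4 and 3n/4.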
Do you include outliers in range
Yes
Unless told otherwise