Statistics Flashcards
Three methods of Random Sampling
1- simple random sampling
2- systematic sampling
3- stratified sampling
State what a simple random sample is
A simple random sample of size ‘n’ is one where every possible sample of size ‘n’ has an equal chance of being selected.
- e.g. each person in a group is allocated a unique number and a selection of these numbers is chosen at random
Two methods of choosing the unique numbers when simple random sampling
- generating random numbers using a calculator, computer or random number table
- lottery sampling: names written on IDENTICAL tickets are drawn from a ‘hat’
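The random-number method above can be sketched in Python. This is only an illustration: the population of 100 numbered people, the sample size of 10 and the helper name simple_random_sample are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: people numbered 1 to 100
people = list(range(1, 101))
sample = simple_random_sample(people, 10, seed=1)
print(sample)
```

random.sample draws without replacement, which matches drawing tickets from a hat without putting them back.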
State what systematic sampling is
The required elements are chosen at regular intervals from an ordered list.
- e.g. if you needed a sample of size 20 from a population of 100, you would take every 5th person in that population (100 / 20 = 5). NOTE: the first person should be chosen at RANDOM from the first 5 people.
- e.g. if the 2nd person is chosen first, then the next people sampled would be the 7th, 12th, 17th, etc.
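The example above can be checked with a short Python sketch. The function name systematic_sample and the list of people numbered 1 to 100 are assumptions for illustration; the sketch also assumes the population size divides exactly by the sample size.

```python
import random

def systematic_sample(ordered_list, sample_size, start=None):
    """Take every k-th element, where k = population size / sample size."""
    interval = len(ordered_list) // sample_size
    if start is None:
        # random starting index (0-based) within the first interval
        start = random.randrange(interval)
    return ordered_list[start::interval][:sample_size]

# Hypothetical ordered list: people numbered 1 to 100
population = list(range(1, 101))
chosen = systematic_sample(population, 20, start=1)  # index 1 is the 2nd person
print(chosen[:4])
```

Leaving start as None picks the first person at random, as the note in the card requires.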
State what stratified sampling is
The population is divided into mutually exclusive strata (males and females, age range categories, etc.) and a random sample is taken from each.
What should you remember about each stratum sampled in stratified sampling
The proportion of each stratum sampled must be the same.
- e.g. if there are 150 in a population (100 males and 50 females) and 75 were required to be sampled, then there should be 50 males and 25 females in the sample
State the formula used to calculate the number of people we should sample from each stratum
number sampled in a stratum = (number in stratum / number in population ) x overall required sample size
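As a quick check of the formula, here is a small Python sketch using the 100 males / 50 females example. The function name stratum_sample_sizes is an assumption for illustration, and rounding to the nearest whole person is a choice made here, not part of the formula.

```python
def stratum_sample_sizes(strata_sizes, total_sample):
    """Apply (number in stratum / number in population) x required sample size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size * total_sample / population)
        for name, size in strata_sizes.items()
    }

sizes = stratum_sample_sizes({"males": 100, "females": 50}, 75)
print(sizes)
```

When the results are not whole numbers, the rounded values may need a small adjustment so that they still add up to the required sample size.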
Advantages of simple random sampling (3)
- free of bias
- easy and cheap to implement for small populations and small samples
- each sampling unit has a known and equal chance of selection
Disadvantages of simple random sampling (2)
- not suitable when the population size or sample size is too large
- a sampling frame is needed
Advantages of systematic sampling (2)
- simple and quick to use
- suitable for large samples and large populations
Disadvantages of systematic sampling (2)
- a sampling frame is needed
- it can introduce bias if the sampling frame is not random
Advantages of stratified sampling (2)
- sample accurately reflects the population structure
- guarantees proportional representation of certain groups within a population
Disadvantages of stratified sampling (2)
- population must be clearly classified into distinct strata (strata = groups/categories)
- selection within each stratum suffers from the same disadvantages as simple random sampling (not suitable when population/sample is too large + sampling frame needed)
Two types of non-random sampling
- quota sampling
- opportunity sampling
State what quota sampling is
When an interviewer or researcher selects a sample that reflects the characteristics of the whole population.
How quota sampling works
Population divided into groups according to a given characteristic.
The size of each group determines the proportion of the sample that should have that characteristic.
State what opportunity sampling is
It consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you’re looking for.
- e.g. the first 20 people you meet outside a supermarket on a Monday morning who are carrying shopping bags
Advantages of quota sampling (4)
- allows a small sample to still be representative of the population
- no sampling frame needed
- quick, easy, inexpensive
- allows for easy comparison between different groups within a population
Disadvantages of quota sampling (4)
- non-random sampling can introduce bias
- population must be divided into groups, which can be costly or inaccurate
- increasing scope of study increases number of groups, which adds time and expense
- non-responses are not recorded as such
Advantages of opportunity sampling (2)
- easy to carry out
- inexpensive
Disadvantages of opportunity sampling (2)
- unlikely to provide a representative sample
- highly dependent on individual researcher
What are data/variables with numerical observations called?
QUANTITATIVE data/variables
- e.g. shoe sizes are numbers
What are data/variables with non-numerical observations called?
QUALITATIVE data/variables
- e.g. hair colour, as you can’t give a number to each colour
Give an example of a continuous variable
(any from)
- height
- weight
- time
Give an example of a discrete variable
(any from)
- number of people
- number shown when a die is rolled
Continuous variables can …
… take ANY value in a given range
Discrete variables can …
… take ONLY SPECIFIC values in a given range
Mode is …
the value that occurs most often.
Median is …
the middle value when the values are all put in order.
Equation for median is…
(n + 1)/2 = x
- ‘x’ being the ‘x’th value when the data set is put in order
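A minimal Python sketch of the (n + 1)/2 rule. The function name median and the test data are assumptions for illustration. When (n + 1)/2 ends in .5 (an even number of values), the sketch takes the mean of the two values either side of that position.

```python
def median(data):
    """Find the ((n + 1) / 2)th value of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position in the ordered list
    lower = ordered[int(position) - 1]   # convert to a 0-based index
    if position.is_integer():
        return lower
    upper = ordered[int(position)]       # the next value along
    return (lower + upper) / 2

print(median([7, 3, 9, 1, 5]))   # n = 5, so the 3rd value
print(median([4, 1, 3, 2]))      # n = 4, so halfway between the 2nd and 3rd values
```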
Mean is …
the “average” of the data: the sum of all the data values divided by the number of data values.
Mean can be calculated using the formula:
x̄ = Σx / n
Where:
- ‘x̄’ is called ‘x bar’. This represents the mean
- Σx is the sum of all of the data values
- n is the number of data values
Mean in a frequency table can be calculated using the formula:
x̄ = Σ(xf) / Σf
Where:
- ‘x̄’ is called ‘x bar’. This represents the mean
- Σ(xf) is the sum of the products of the data values (‘x’) and their frequencies (‘f’)
- e.g. (x₁ × f₁) + (x₂ × f₂) + (x₃ × f₃) + … = Σ(xf)
- Σf is the sum of the frequencies
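The formula x̄ = Σ(xf) / Σf can be tried out in Python. The shoe sizes and frequencies below are made-up data, and the function name frequency_table_mean is an assumption for illustration.

```python
def frequency_table_mean(values, frequencies):
    """Mean from a frequency table: sum of x*f divided by sum of f."""
    total_xf = sum(x * f for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_xf / total_f

# Hypothetical table: shoe size x with frequency f
shoe_sizes = [5, 6, 7, 8]
freqs = [2, 5, 2, 1]
table_mean = frequency_table_mean(shoe_sizes, freqs)  # (10 + 30 + 14 + 8) / 10
print(table_mean)
```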
Is the median of a set of data affected by extreme values?
No, as the median depends only on the middle value(s) of the ordered data, so extreme values are not taken into account.
Is the mean of a set of data affected by extreme values?
Yes, as every value in the data set is used when calculating the mean.
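A short Python illustration of this difference, using the standard library statistics module. The two data sets are made up for the example: they are identical except for one extreme value.

```python
from statistics import mean, median

typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]   # same data, but the largest value is extreme

print(median(typical), median(with_outlier))   # the middle value is unchanged
print(mean(typical), mean(with_outlier))       # the mean is pulled upwards
```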
Is mode useful if in a set of data, each value only occurs once?
No, you need at least one value which occurs more often than the others, otherwise there is no value that stands out.
What is it called when a set of data has two modes?
Bimodal
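A small Python sketch for finding the mode(s) of a data set. The function name modes is an assumption for illustration; returning an empty list when every value occurs only once is a choice made here to reflect the point that the mode is not useful in that case.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs most often (two values means bimodal)."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no value stands out
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([1, 2, 2, 3, 3]))   # two modes: bimodal
print(modes([1, 2, 3]))         # no useful mode
```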
What value of x would you use, when given a frequency table with class intervals (e.g. 30 - 31, 32 - 33, …etc.)?
You would take the midpoint of the class interval (for this e.g. 30.5, 32.5, …etc.).
When the mean is calculated from a frequency table, is it always going to be completely accurate?
No, it will be an estimate.
- This is because you’re using the midpoint of each class interval. The true values could be anywhere within that given range.
- e.g. if the interval is 30 - 31 in mm, the midpoint is 30.5mm, which you use to calculate the mean. However, all the values could potentially be 30.1mm, and you cannot tell this from a frequency table. Therefore it is an estimate.
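Estimating the mean from class intervals can be sketched in Python as follows. The class intervals match the 30 - 31, 32 - 33 style used above, but the third class and all the frequencies are made up, and the function name estimated_mean is an assumption for illustration.

```python
def estimated_mean(classes):
    """classes: list of (lower, upper, frequency). Uses the midpoint as x."""
    total_xf = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2   # e.g. 30.5 for the class 30 - 31
        total_xf += midpoint * freq
        total_f += freq
    return total_xf / total_f

# Hypothetical grouped data in mm
table = [(30, 31, 4), (32, 33, 10), (34, 35, 6)]
estimate = estimated_mean(table)   # (122 + 325 + 207) / 20
print(estimate)
```

The result is only an estimate, because the real values inside each class are replaced by the class midpoint.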
Formula used to calculate the LOWER quartile
L.Q = n/4
It will be the (n/4)th value when the data is put in increasing order.
Formula used to calculate the UPPER quartile
U.Q = 3n/4
It will be the (3n/4)th value when the data is put in increasing order.
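A Python sketch of the n/4 and 3n/4 rules. The cards do not say what to do when the position is not a whole number, so this sketch ASSUMES a common textbook convention: if the position is a whole number, take the midpoint of that value and the next one; otherwise round the position up. Check this against your own course notes. The function name quartile is also an assumption.

```python
import math

def quartile(data, k):
    """k = 1 for the lower quartile (n/4), k = 3 for the upper quartile (3n/4)."""
    ordered = sorted(data)
    position = k * len(ordered) / 4      # 1-based position
    if position == int(position):
        # ASSUMED convention: whole number, so average this value and the next
        i = int(position)
        return (ordered[i - 1] + ordered[i]) / 2
    # ASSUMED convention: not a whole number, so round the position up
    return ordered[math.ceil(position) - 1]

ten_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # n/4 = 2.5, 3n/4 = 7.5
eight_values = [2, 4, 6, 8, 10, 12, 14, 16]      # n/4 = 2, 3n/4 = 6
print(quartile(ten_values, 1), quartile(ten_values, 3))
print(quartile(eight_values, 1), quartile(eight_values, 3))
```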
What is a percentile?
A percentile is one of the values that divide an ordered set of data into 100 equal parts.
- e.g. the 10th percentile lies one-tenth of the way through the data set.
Interpolation is when you …
… assume that the data values are evenly distributed within each class. (Go to page 26 of Stats+Mechanics Y1 book for clear example of how to interpolate)
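A rough Python sketch of interpolation in a grouped frequency table. This is not the book's worked example: the class boundaries and frequencies are made up, the function name interpolate_position is an assumption, and the sketch assumes the median of grouped data is taken as the (n/2)th value.

```python
def interpolate_position(classes, position):
    """classes: ordered list of (lower_boundary, upper_boundary, frequency).

    Assumes values are spread evenly within each class and returns the
    estimated data value at the given position.
    """
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            fraction = (position - cumulative) / freq   # how far into the class
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("position is larger than the total frequency")

# Hypothetical grouped data: 20 values in three classes of width 10
grouped = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
n = sum(freq for _, _, freq in grouped)
estimated_median = interpolate_position(grouped, n / 2)
print(estimated_median)
```

Here the 10th value is 5 values into a class of 10, so the estimate lies halfway across the class from 10 to 20.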
How to calculate range from a set of data?
LARGEST - smallest = range
How to calculate interquartile range from a set of data?
UPPER quartile - lower quartile = IQR
What is the interpercentile range?
(Higher given percentile) - (lower given percentile) = interpercentile range
Upper quartile is represented as …
Q_3
Lower quartile is represented as …
Q_1
Median is represented as …
Q_2
Variance is …
… the average of the squared distances from the mean.
Why are the distances squared when calculating the variance?
So that negative deviations (from values below the mean) do not cancel out the positive ones.
Standard deviation is …
… a measure of the amount of variation of a set of values.
Basically, it is how spread out the data is.
Formula for Variance
σ² = (Σx^2 / n) - (Σx / n)^2
Formula for Standard Deviation
σ = √ (σ²)
… which is just square rooting variance so…
σ = √ [ (Σx^2 / n) - (Σx / n)^2 ]
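The two formulas can be checked with a short Python sketch. The data set is made up and the function names variance and standard_deviation are assumptions for illustration.

```python
import math

def variance(data):
    """Variance = (sum of x squared / n) - (sum of x / n) squared."""
    n = len(data)
    mean_of_squares = sum(x ** 2 for x in data) / n
    square_of_mean = (sum(data) / n) ** 2
    return mean_of_squares - square_of_mean

def standard_deviation(data):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(data))

values = [2, 4, 4, 4, 5, 5, 7, 9]    # sum of x = 40, sum of x squared = 232
print(variance(values))              # 232/8 - (40/8)^2 = 29 - 25
print(standard_deviation(values))
```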
Relationship between standard deviation and variance
variance = σ² and standard deviation = σ = √(σ²)
… so to find the standard deviation when you’ve got a value for the variance, all you need to do is SQUARE ROOT it!
Where:
- σ² is the variance and;
- σ is the standard deviation
Formula for Variance (in a frequency table)
σ² = (Σf(x^2) / Σf) - (Σfx / Σf)^2
Formula for Standard Deviation (in a frequency table)
σ = √ (σ²)
… which is just square rooting the formula for variance so…
σ = √[ (Σf(x^2) / Σf) - (Σfx / Σf)^2 ]
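The frequency table versions can be sketched the same way in Python. The table below is made up and the function name frequency_table_variance is an assumption for illustration.

```python
import math

def frequency_table_variance(values, frequencies):
    """Variance = (sum of f*x^2 / sum of f) - (sum of f*x / sum of f) squared."""
    total_f = sum(frequencies)
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    total_fx2 = sum(f * x ** 2 for x, f in zip(values, frequencies))
    return total_fx2 / total_f - (total_fx / total_f) ** 2

# Hypothetical table: x = 1, 2, 3 with frequencies 1, 2, 1
xs = [1, 2, 3]
fs = [1, 2, 1]
table_variance = frequency_table_variance(xs, fs)   # 18/4 - (8/4)^2
table_sd = math.sqrt(table_variance)
print(table_variance, table_sd)
```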