Year 1 Stats Flashcards
Discrete vs continuous data
Discrete is countable - shoe size (binomial distribution)
Continuous is measurable - shoe length (normal distribution / histograms)
Target population
All the members of the population that would ideally take part in your study
Sample
A subset of a target population
Sampling frame
A list or database of the target population
Census
Measures or observes every member of the population
Advantages and Disadvantages of census
- Advantage: completely accurate, as it collects data from everyone
- Disadvantage: expensive/time consuming
- Disadvantage: cannot be used when testing destroys the item
- Disadvantage: hard to process large quantities of data
Steps to simple random
- Have a sampling frame and assign a number to every member of it
- Use random number generator to pick members
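The two steps above can be sketched in Python. This is only an illustration: the frame of 100 students, the function name and the seed are made up for the example.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size distinct members so every possible sample is equally likely."""
    rng = random.Random(seed)
    # Step 1: number every member of the sampling frame (0, 1, 2, ...).
    numbers = range(len(sampling_frame))
    # Step 2: use a random number generator to pick distinct numbers.
    chosen_numbers = rng.sample(numbers, sample_size)
    return [sampling_frame[i] for i in chosen_numbers]

# Hypothetical sampling frame of 100 students.
frame = [f"Student {i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```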
Advantages of using a sample (2)
- Less time consuming/cheaper than census
2. Less data to process
Disadvantages of using a sample (2)
- May be less accurate than a census
2. May not give any information about small sub groups of the population
Advantages of simple random sampling (2)
- Minimises bias
2. representative of whole population
Disadvantages of simple random sampling (2)
- Need sampling frame
2. Time consuming/ expensive
What is simple random sampling
When every possible sample of a given size has the same probability of being selected.
What is stratified sampling
When the population is divided into mutually exclusive strata and a simple random sample is taken from each stratum, with the number taken from each stratum proportional to its size in the population
Advantages of systematic (2)
- Quick and easy to use
2. Assures that the population will be evenly sampled
Disadvantages of systematic (2)
Need sampling frame
There may be missing values in the population
What is systematic sampling
When you choose a starting point at random, then systematically select members at regular intervals (every kth member)
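A rough sketch of systematic sampling in Python. The interval k is taken as population size divided by sample size (rounded down), which is a common choice but an assumption here; the frame of the numbers 1 to 100 is made up.

```python
import random

def systematic_sample(sampling_frame, sample_size, seed=None):
    """Random start, then every kth member of the sampling frame."""
    rng = random.Random(seed)
    k = len(sampling_frame) // sample_size  # interval between chosen members
    start = rng.randrange(k)                # random starting point among the first k
    return sampling_frame[start::k][:sample_size]

# Hypothetical sampling frame: members numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=0)
print(sample)
```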
Advantages of stratified (2)
- Minimises selection bias by making sure no strata are over/under represented
- Frequencies for each group in the sample proportional to each group in the population
Disadvantages of stratified (2)
Need sampling frame
Strata must be clearly defined
What is quota sampling
When the population is split into groups or strata, then an interviewer selects members from each group until a quota is filled. It is non-random and can be biased
Advantages of quota (2)
- Don’t need sampling frame
2. Frequencies for each group in the sample can be proportional to each group in the population
What is opportunity sampling
Taking a sample from the members of the population who are available at the time the study is carried out. It is non-random and can be biased
Advantage of opportunity sampling
Easy to select sample
Formula for stratified
(Number in stratum / number in whole population) * sample size
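A worked example of the stratified formula, as a sketch. The year groups, their sizes and the sample size of 40 are made-up numbers; answers are rounded to the nearest whole person.

```python
def stratum_sample_size(stratum_size, population_size, sample_size):
    """(number in stratum / number in whole population) * sample size, rounded."""
    return round(stratum_size * sample_size / population_size)

# Hypothetical strata: 120 students in Year 12 and 80 in Year 13.
year_groups = {"Year 12": 120, "Year 13": 80}
population = sum(year_groups.values())  # 200

allocation = {
    name: stratum_sample_size(size, population, 40)
    for name, size in year_groups.items()
}
print(allocation)  # {'Year 12': 24, 'Year 13': 16}
```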
Measuring outliers
Any value below LQ - 1.5(IQR)
Any value above UQ + 1.5(IQR)
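The two boundaries can be checked with a short sketch. The quartiles (20 and 30) and the data values are invented for illustration.

```python
def outlier_fences(lq, uq):
    """Return the lower and upper outlier boundaries from the quartiles."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

# Hypothetical quartiles and data.
lower, upper = outlier_fences(lq=20, uq=30)  # IQR = 10
data = [2, 18, 22, 25, 31, 46]
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)  # 5.0 45.0 [2, 46]
```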
In a box plot diagram what does it mean if group A median is larger than group B median
On average group A gets higher results
In a box plot diagram what does it mean if group A IQR is larger than group B IQR
Group A's results are less consistent, as its data is more spread out
Frequency density
Frequency / Class width
In a histogram, frequency is proportional to area: F = kA
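A small sketch of the frequency density calculation for classes of unequal width. The class boundaries and frequencies are made up.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]

# Frequency density = frequency / class width.
densities = [f / (hi - lo) for lo, hi, f in classes]
print(densities)  # [0.5, 2.0, 0.4]
```

Here the narrow 10 to 15 class has the tallest bar even though its frequency is not much larger, because the bar's area (not its height) represents the frequency.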
Independent vs dependent
The independent variable does not rely on the other variable, whilst the dependent variable does. The independent variable goes on the x axis
What an upwards very straight line says about correlation
It’s a strong positive correlation. When one variable increases so does the other.
What is correlation
Describes a linear relationship between two variables
What is bivariate data
Data which has pairs of values for two variables
PMCC
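Product moment correlation coefficient. Measures the strength and direction of the linear correlation between two variables, on a scale from -1 to 1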
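For reference, the PMCC can be computed by hand with the standard formula r = Sxy / sqrt(Sxx * Syy). That formula is not stated on these cards, so treat this as a supplementary sketch; the data sets are invented.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on a straight line.
r_up = pmcc([1, 2, 3, 4], [2, 4, 6, 8])    # perfect positive correlation
r_down = pmcc([1, 2, 3, 4], [8, 6, 4, 2])  # perfect negative correlation
print(r_up, r_down)
```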
What does ‘b’ tell you in the formula y = a + bx
The change in y for each unit change in x
Why extrapolation unreliable
The estimate lies outside the range of the data, where the relationship may no longer hold
mean calculation
Sum x / n, or Sum fx / Sum f for a frequency table
How to find the median point and quartiles from 8 values of discrete data
8+1=9
9/2 = 4.5 so halfway between the 4th and 5th value
To find lower quartile find the median of the lowest half (4 values) of the data
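The (n + 1) / 2 method above, sketched in Python with eight made-up values. The upper quartile is found the same way from the highest half, which is an extension of the card rather than something it states.

```python
def median(sorted_values):
    """Median using the (n + 1) / 2 position rule on sorted data."""
    n = len(sorted_values)
    position = (n + 1) / 2  # 1-based position, e.g. 4.5 for n = 8
    if position.is_integer():
        return sorted_values[int(position) - 1]
    # Halfway between the values either side of the position.
    below = sorted_values[int(position) - 1]
    above = sorted_values[int(position)]
    return (below + above) / 2

# Hypothetical data: 8 discrete values.
data = sorted([3, 5, 7, 8, 10, 12, 15, 20])
med = median(data)     # halfway between the 4th and 5th values
lq = median(data[:4])  # median of the lowest half
uq = median(data[4:])  # median of the highest half (assumed by symmetry)
print(lq, med, uq)     # 6.0 9.0 13.5
```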
If grouped continuous data, how would you find mean
Find the midpoint of each class and enter the midpoints into the calculator with their frequencies, then press 1-Var
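The same estimate can be done without the calculator's 1-Var mode. A sketch with made-up classes:

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

# Use the midpoint of each class to stand in for every value in that class.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_f = sum(f for _, _, f in classes)
estimated_mean = sum_fx / sum_f
print(estimated_mean)  # 16.0
```

The result is only an estimate, because the individual values inside each class are unknown.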
If grouped continuous data, how would you find median
Total frequency / 2, then use linear interpolation within the class that contains that value
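A sketch of linear interpolation for the median of grouped data, assuming values are spread evenly within each class. The classes are made up.

```python
def grouped_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes is a list of (lower boundary, upper boundary, frequency).
    """
    total = sum(f for _, _, f in classes)
    target = total / 2  # position of the median: total frequency / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= target:
            # Go the right fraction of the way through the median class.
            return lo + (target - cumulative) * (hi - lo) / f
        cumulative += f

# Hypothetical grouped data: cumulative frequencies are 4, 14, 20.
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]
estimated_median = grouped_median(classes)
print(estimated_median)  # 16.0
```

The median is the 10th value, which is the 6th of the 10 values in the 10 to 20 class, so the estimate is 10 + (6 / 10) * 10 = 16.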
Rule of thumb for choosing which set to use for linear interpolation
Always go set up unless right on the boundary, then use set down
What is standard deviation
A way of measuring how varied the data is from the mean
What does it mean if group A's standard deviation is higher than group B's
Group A's data points are on average further from the mean, so its results are less consistent
Meaning of Sxx and Sx
Sxx is the sum of the squared deviations from the mean.
Sxx = sum of (x - x(bar))^2.
Sx is the standard deviation of the sample.
Sx = Square root of (Sxx / (n - 1))
Standard deviation from summary statistics
Square root of (Sum of x^2 / n - x(bar)^2)
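A quick check of the summary statistics formula against the direct definition, using a made-up data set. Note that this formula divides by n.

```python
import math

def sd_from_summary(sum_x, sum_x2, n):
    """Standard deviation = sqrt(sum of x^2 / n - mean^2)."""
    mean = sum_x / n
    return math.sqrt(sum_x2 / n - mean ** 2)

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
sum_x = sum(data)                  # 40
sum_x2 = sum(x ** 2 for x in data) # 232

sd = sd_from_summary(sum_x, sum_x2, n)

# Cross-check: sqrt(Sxx / n) computed directly from the deviations.
mean = sum_x / n
sxx = sum((x - mean) ** 2 for x in data)
sd_direct = math.sqrt(sxx / n)
print(sd, sd_direct)  # 2.0 2.0
```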
Boundaries for outliers using standard deviation
Any value below X(bar) - 2sd
Any value above X(bar) + 2sd
What to do if you have a constant k in a discrete random variable distribution
Example: P(X=x) = 3k(4-x)(x^2+1) for x = 0, 1, 2
- Substitute x = 0, 1, 2 into the function
- The probabilities must add to 1
- Solve the resulting equation for k
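Working through this example: the three probabilities are 12k, 18k and 30k, so 60k = 1 and k = 1/60. A short sketch that does the same with exact fractions (the helper name is made up):

```python
from fractions import Fraction

def coefficient_of_k(x):
    """P(X = x) = 3k(4 - x)(x^2 + 1), returned as a multiple of k."""
    return 3 * (4 - x) * (x ** 2 + 1)

values = [0, 1, 2]
total = sum(coefficient_of_k(x) for x in values)  # 12 + 18 + 30 = 60

# The probabilities must add to 1, so total * k = 1.
k = Fraction(1, total)
distribution = {x: coefficient_of_k(x) * k for x in values}
print(k, distribution)
```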
What does a uniform distribution mean
All outcomes have the same probability
What is a probability distribution
Describes the probability of any outcome in a sample space
Random variable
A variable whose value is determined by a random experiment
How to do binomial distribution on calculator for multiple values
Go to Bpd and enter the numtrial and probability values. Then choose List 1 rather than Variable to find the individual values.
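What the calculator's list mode produces can be reproduced with the binomial probability formula P(X = x) = nCx * p^x * (1 - p)^(n - x). A sketch, with n = 10 and p = 0.5 chosen only as an example:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical distribution: X ~ B(10, 0.5).
n, p = 10, 0.5

# Individual probabilities for several values of x at once.
x_values = [2, 3, 4]
probabilities = {x: binomial_pmf(x, n, p) for x in x_values}
print(probabilities)
```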
What is hypothesis testing
Using sample data to decide whether there is enough evidence to reject the null hypothesis
What does reducing the significance level on a hypothesis test mean
Stronger evidence is needed to reject the null hypothesis, as the critical region gets smaller
What is the significance level
The probability of incorrectly rejecting the null hypothesis when it is actually true
What does it mean if PMCC gets closer to 1 or -1
It’s getting closer to perfect positive correlation and perfect negative correlation
The conditions under which it is appropriate to assume a random variable has a binomial distribution
There are n independent trials