Statistics Basics Flashcards
Simple random sampling
Randomly selecting from everybody in the population, so that each member is equally likely to be picked.
Stratified sampling
Dividing the population into different groups (strata) and picking from each in proportion to its share of the overall group. The strata are usually large.
Systematic sampling
Selecting every nth member of an ordered list. The attribute being studied should be randomly distributed along the list, with no repeating pattern.
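The three methods above can be sketched in a few lines of Python. This is only an illustration: the population of 100 numbered members, the stratum names, and the helper function names are all hypothetical, and the stratified version assumes the same fraction is taken from every stratum.

```python
import random

def simple_random_sample(population, k, rng):
    """Pick k members without replacement, each equally likely."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick every step-th member after a random starting offset."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, fraction, rng):
    """Take the same fraction from every stratum (proportional allocation)."""
    chosen = []
    for members in strata.values():
        k = round(len(members) * fraction)
        chosen.extend(rng.sample(members, k))
    return chosen

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(100))  # hypothetical population of 100 members
strata = {  # hypothetical strata of unequal size
    "first_years": list(range(0, 60)),
    "second_years": list(range(60, 100)),
}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 0.1, rng)
```

With a 10% fraction the stratified sample takes 6 members from the stratum of 60 and 4 from the stratum of 40, mirroring the population's make-up.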
Convenience sampling
Based on ease of selection. E.g. people physically closer to you are more likely to be picked than someone in the back row who you can’t even really see.
Cluster random sampling
Dividing the population into different coherent areas (clusters), then randomly selecting whole areas to assess.
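A minimal sketch of cluster sampling, assuming every member of a chosen area is assessed. The area names and members are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then include every member of each."""
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen_names for person in clusters[name]]
    return chosen_names, members

rng = random.Random(1)  # fixed seed so the sketch is repeatable
clusters = {  # hypothetical areas and their members
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}
chosen_names, members = cluster_sample(clusters, 2, rng)
```

The randomness applies to the areas, not to the individuals inside them, which is the key difference from stratified sampling.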
Snowball sampling
Finding people who are suitable for the study and then asking them to refer others they know who would also be suitable for the study.
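What is probability sampling
Sampling in which every individual has a known, non-zero probability of being selected (the same probability in simple random sampling), so the sample tends to be representative of the population.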
For symmetric data we use…
mean and SD
For asymmetric data we use…
median and IQR
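The pairing on these two cards can be sketched with Python's standard `statistics` module. The `summarise` helper and both data sets are hypothetical, and the caller is assumed to have already judged whether the data is symmetric.

```python
import statistics

def summarise(values, symmetric):
    """Return (centre, spread): mean and SD if symmetric, else median and IQR."""
    if symmetric:
        return statistics.mean(values), statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return statistics.median(values), q3 - q1

symmetric_data = [4, 5, 5, 6, 6, 6, 7, 7, 8]  # hypothetical, roughly symmetric
skewed_data = [1, 1, 2, 2, 3, 3, 4, 40]       # hypothetical, one large outlier

mean, sd = summarise(symmetric_data, symmetric=True)
median, iqr = summarise(skewed_data, symmetric=False)
```

In the skewed set the single value of 40 drags the mean well above the median, which is why the median and IQR describe it better.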
When is a z score used
When the values in question do not fall exactly on the reference points of the 68-95-99.7 rule (1, 2 or 3 SDs from the mean).
Steps of a basic z score
- Calculate the z score for each value: (value - mean) / SD.
- Look each z score up in the table to find the corresponding area above that value.
- Use the overlap of the areas to isolate only the desired area.
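The steps above can be sketched as follows, with `statistics.NormalDist` standing in for the printed z table. The mean of 100, SD of 10 and the range 95 to 120 are made-up values.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def area_between(low, high, mean, sd):
    """Proportion of a normal distribution lying between low and high."""
    standard = NormalDist()  # mean 0, SD 1: plays the role of the z table
    z_low, z_high = z_score(low, mean, sd), z_score(high, mean, sd)
    area_above_low = 1 - standard.cdf(z_low)
    area_above_high = 1 - standard.cdf(z_high)
    # Subtracting the two "area above" values leaves only the wanted region.
    return area_above_low - area_above_high

area = area_between(95, 120, mean=100, sd=10)  # hypothetical values
```

Here the z scores are -0.5 and 2, and the area between them is roughly 0.67 of the distribution.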
What is a t distribution
Like the normal distribution, but its shape depends on the degrees of freedom.
Its peak is flatter and its tails are longer than a normal distribution's.
With
- increasing degrees of freedom
- increasing sample size
the t distribution becomes more like the normal distribution.
What are degrees of freedom
The number of data values that are free to vary once a summary value such as the mean is fixed (n - 1 for a single sample mean).
What is the central limit theorem
As n, the size of each sample, increases, the sample means are less likely to be skewed (in a larger sample, outliers are averaged out).
The larger the samples behind the means plotted on the sampling distribution graph, the more that graph will look normally distributed, even if the initial data is skewed.
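A small simulation can illustrate the idea. This sketch assumes a right-skewed (exponential) population, and the sample sizes of 2 and 50 and the 2000 repetitions are arbitrary choices; skewness is measured with a simple moment-based formula where 0 means symmetric.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(sample_size, n_samples, rng):
    """Means of repeated samples drawn from a right-skewed population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
skew_small = skewness(sample_means(sample_size=2, n_samples=2000, rng=rng))
skew_large = skewness(sample_means(sample_size=50, n_samples=2000, rng=rng))
```

The means of the larger samples come out much less skewed than the means of the small ones, even though both are drawn from the same skewed population.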
What is standard error
The standard deviation of the sampling distribution of a statistic. For a sample mean it is estimated as SD / √n.
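For a sample mean, the standard error is commonly estimated as the sample SD divided by the square root of n. A minimal sketch, using a made-up sample:

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, n = 8
se = standard_error_of_mean(sample)
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.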
Why is hypothesis testing used
To analyse whether the results in a sample could be due to chance alone, and so whether they are likely to hold for the total population the sample came from.
WE ONLY TALK ABOUT THE NULL, NOT THE ALTERNATIVE.
What is a type 1 error
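Rejecting the null hypothesis even though it is true.
The observed result arose by chance while the null hypothesis actually holds.
The significance level (e.g. 0.05) is the accepted probability of this error, and the p value is compared against it.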
What is a type 2 error
Keeping (failing to reject) the null hypothesis even though it is false.
What does a p value represent
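The probability of seeing a result at least as extreme as the one observed, assuming the null hypothesis is true.
It is compared with the significance level (usually 0.05), which is the accepted probability of a type 1 error.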
p > 0.05 =
result could plausibly be due to chance,
rejecting H0 would carry too high a risk of a type 1 error,
insufficient evidence to reject H0.
p < 0.05 =
result unlikely to be due to chance alone,
risk of a type 1 error is within the accepted 5%,
statistically significant, so reject H0.
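The decision rule on this card can be sketched as below. This assumes a two-sided test using the normal (z) approximation, and the sample means, null mean and standard error are made-up numbers; a t test would use a t distribution in place of `NormalDist`.

```python
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, standard_error):
    """P(result at least this far from null_mean), assuming H0 is true."""
    z = (sample_mean - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decision(p_value, alpha=0.05):
    """Compare the p value with the significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical results: one far from the null value (z = 3), one close (z = 1).
p_far = two_sided_p_value(sample_mean=103, null_mean=100, standard_error=1.0)
p_near = two_sided_p_value(sample_mean=101, null_mean=100, standard_error=1.0)
```

Here `decision(p_far)` gives "reject H0" and `decision(p_near)` gives "fail to reject H0". Both conclusions are worded in terms of the null hypothesis only.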
What does a 95% confidence interval represent
The interval of values that we are 95% confident contains the true population parameter (the value for the whole population, not just the sample used).
If the confidence interval contains the value stated by the null hypothesis, then we cannot reject the null hypothesis.
How do you find a t multiplier
From the t table, using the degrees of freedom and the confidence level.
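Putting the last two cards together, a 95% confidence interval for a mean can be sketched as mean ± t multiplier × standard error. The data below is hypothetical, and the multiplier 2.262 is the tabulated value for 9 degrees of freedom at 95% confidence, typed in by hand just as it would be read from the table.

```python
import math
import statistics

def confidence_interval(sample, t_multiplier):
    """mean ± t_multiplier × standard error of the mean."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - t_multiplier * se, mean + t_multiplier * se

def can_reject_null(interval, null_value):
    """H0 is rejected only if its value lies outside the interval."""
    low, high = interval
    return not (low <= null_value <= high)

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical data, n = 10
T_MULTIPLIER_DF9_95 = 2.262  # from a t table: 9 degrees of freedom, 95%
interval = confidence_interval(sample, T_MULTIPLIER_DF9_95)
```

For this sample the interval is roughly 12.37 to 14.63, so a null value of 13 cannot be rejected while a null value of 15 can.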