Stats Flashcards
null hypothesis
The assumption that nothing unusual is happening, e.g. the probability of heads when tossing a fair coin is 0.5
alternative hypothesis
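The claim that the probability is not the value given in the null hypothesis, e.g. p ≠ 0.5 if you suspect the coin is biased (or p > 0.5 / p < 0.5 for a one-tailed test)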
significance level
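The probability (e.g. 5%) that decides how extreme a result must be before you reject the null hypothesis. In a two-tailed test it is split between a lower tail and an upper tail, and a result falling in either tail lets you reject the null hypothesis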
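The two tails can be made concrete with the coin example. This is a minimal sketch that assumes a binomial model X ~ B(n, p) for the number of heads and a two-tailed test putting half the significance level in each tail; the function names are made up for illustration. For 20 tosses of a supposedly fair coin at the 5% level it gives the critical region X ≤ 5 or X ≥ 15.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def two_tailed_critical_region(n, p, significance):
    """Return (lower, upper): reject H0 if X <= lower or X >= upper.

    Each tail may hold at most half of the significance level.
    """
    half = significance / 2
    # Lower tail: largest c with P(X <= c) <= half (-1 means no lower tail).
    lower = -1
    for c in range(n + 1):
        if binomial_cdf(c, n, p) <= half:
            lower = c
        else:
            break
    # Upper tail: smallest c with P(X >= c) <= half (n + 1 means no upper tail).
    upper = n + 1
    for c in range(n, -1, -1):
        p_at_least_c = 1 - binomial_cdf(c - 1, n, p) if c > 0 else 1.0
        if p_at_least_c <= half:
            upper = c
        else:
            break
    return lower, upper

print(two_tailed_critical_region(20, 0.5, 0.05))  # (5, 15)
```

Because the binomial distribution is discrete, the probability actually in each tail (about 2.07% here) is usually a little less than half the significance level.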
combination
Order does not matter, e.g. the 3 cards A, B, C are the same selection as B, C, A
permutation
Order does matter, e.g. a safe code, where different orders of the same numbers are different possible codes
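The two counting rules can be written out directly from factorials: nCr = n! / (r!(n - r)!) for combinations and nPr = n! / (n - r)! for permutations. The sketch below checks them against Python's `math.comb` and `math.perm` (available since Python 3.8). Note that nPr assumes no item is used twice; a safe code that allows repeated digits would instead have 10^r possible codes for r digits.

```python
from math import comb, factorial, perm

def n_choose_r(n, r):
    """Combinations: order doesn't matter."""
    return factorial(n) // (factorial(r) * factorial(n - r))

def n_perm_r(n, r):
    """Permutations: order matters (no repeats)."""
    return factorial(n) // factorial(n - r)

# Cross-check the formulas against the standard library.
assert n_choose_r(10, 3) == comb(10, 3) == 120
assert n_perm_r(10, 3) == perm(10, 3) == 720

# The cards A, B, C: one selection of all three, but 6 different orders.
print(comb(3, 3), perm(3, 3))  # 1 6
```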
If you multiply all your data points by a number, what happens to the mean and SD?
Both the mean and the SD are multiplied by that number (the SD by its size, ignoring any minus sign, since an SD cannot be negative)
If you add a number to all data points, what happens to the mean and SD?
The number is added to the mean; the SD does not change, because the spread of the data is the same
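A quick numerical check of the two scaling rules, assuming Python's `statistics` module (its `stdev` uses the n - 1 divisor, like the SD formulas in this deck). The data set and the constants 3 and 10 are arbitrary choices for illustration.

```python
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
k, c = 3, 10  # arbitrary multiplier and shift

scaled = [k * x for x in data]   # every point multiplied by 3
shifted = [x + c for x in data]  # every point increased by 10

print(mean(data), stdev(data))        # original mean and SD
print(mean(scaled), stdev(scaled))    # both 3 times the original
print(mean(shifted), stdev(shifted))  # mean 10 higher, SD unchanged
```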
Sum of squared deviations (Sxx)
Sxx = Σ(x - mean)², which can also be written Σx² - n × mean²
SD
1. √(Sxx / (n - 1))
2. √((Σx² - n × mean²) / (n - 1))
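Both SD formulas give the same answer because Σ(x - mean)² = Σx² - n × mean². A minimal sketch that computes Sxx and checks the two formulas against each other on a small made-up data set (the function names are illustrative):

```python
from math import sqrt

def sxx(data):
    """Sum of squared deviations: sum of (x - mean)^2."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

def sd_from_sxx(data):
    """Formula 1: sqrt(Sxx / (n - 1))."""
    return sqrt(sxx(data) / (len(data) - 1))

def sd_from_sums(data):
    """Formula 2: sqrt((sum of x^2 - n * mean^2) / (n - 1))."""
    n = len(data)
    m = sum(data) / n
    return sqrt((sum(x * x for x in data) - n * m * m) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sxx(data))                              # 32.0
print(sd_from_sxx(data), sd_from_sums(data))  # both sqrt(32/7), about 2.138
```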
Lower boundary for outliers
1. LQ - 1.5 × IQR
2. mean - 2 × SD
Upper boundary for outliers
1. UQ + 1.5 × IQR
2. mean + 2 × SD
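A sketch applying both outlier rules to a made-up data set. It assumes quartiles come from Python's `statistics.quantiles` with its default 'exclusive' method; your course may calculate quartiles differently, which can move the IQR boundaries slightly.

```python
from statistics import mean, quantiles, stdev

def iqr_fences(data):
    """Rule 1: LQ - 1.5 x IQR and UQ + 1.5 x IQR."""
    lq, _, uq = quantiles(data, n=4)  # default 'exclusive' method
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

def sd_fences(data):
    """Rule 2: mean - 2 x SD and mean + 2 x SD."""
    m, s = mean(data), stdev(data)
    return m - 2 * s, m + 2 * s

def outliers(data, fences):
    """Values below the lower boundary or above the upper boundary."""
    low, high = fences
    return [x for x in data if x < low or x > high]

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]
print(iqr_fences(data))                  # (-2.0, 14.0)
print(outliers(data, iqr_fences(data)))  # [30]
print(outliers(data, sd_fences(data)))   # [30]
```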
How do you know if two events A and B are mutually exclusive?
P(A ∪ B) = P(A) + P(B)
P(A ∩ B) = 0
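Both tests can be checked by listing outcomes. This sketch assumes one roll of a fair six-sided die, with events written as sets of outcomes and exact probabilities as `Fraction`s; the events A, B and C are illustrative.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))

def mutually_exclusive(x, y):
    """True if the events cannot both happen: P(X n Y) = 0."""
    return prob(x & y) == 0

A = {1, 2}     # roll 1 or 2
B = {5, 6}     # roll 5 or 6
C = {2, 4, 6}  # roll an even number

print(mutually_exclusive(A, B))         # True
print(prob(A | B) == prob(A) + prob(B))  # True
print(mutually_exclusive(A, C))         # False: both contain 2
```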
P(B|A)
P(B|A) = P(B ∩ A) / P(A)
P(B|A')
P(B|A') = P(B ∩ A') / P(A')
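The conditional probability formulas can be checked the same way by counting outcomes. This is a minimal sketch assuming one roll of a fair die; the events A (even number) and B (greater than 3) are illustrative.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of outcomes)."""
    return Fraction(len(event), len(outcomes))

def given(b, a):
    """P(B|A) = P(B n A) / P(A)."""
    return prob(b & a) / prob(a)

A = {2, 4, 6}             # even number
B = {4, 5, 6}             # greater than 3
A_complement = outcomes - A  # A' = {1, 3, 5}

print(given(B, A))             # 2/3
print(given(B, A_complement))  # 1/3
```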
parent population
set of all possible data points from which you will draw your sample
census
Data is collected from every member of the population; this is practical when the population is small enough
Sampling fraction
Sample size/population size
Sampling error
The difference between an estimate calculated from a sample and the true population value
Simple random sampling
Assign everyone in the population a number and generate random numbers (on a calculator: SHIFT, then the decimal point key, then multiply by the population size). Take information from the members whose numbers come up. Use a sample of at least 30
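The calculator method can be imitated in code: a random number between 0 and 1, times the population size, rounded down, plus 1, with repeats thrown away. This is a sketch under those assumptions; the function name and the population of 500 are hypothetical.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick distinct member numbers from 1..population_size at random."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        # Random number in [0, 1) x population size, rounded down, plus 1
        # so that members are numbered from 1.
        number = int(rng.random() * population_size) + 1
        if number not in chosen:  # ignore repeats and draw again
            chosen.append(number)
    return chosen

sample = simple_random_sample(population_size=500, sample_size=30, seed=1)
print(sorted(sample))
```

In practice `random.sample(range(1, 501), 30)` does the same job in one call; the loop is written out only to mirror the calculator steps.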
Pros and cons of simple random sampling
- everyone has an equal chance of being chosen, so the sample is likely to give an accurate picture of the population and a good spread
- however, it's time-consuming, and you need access to the entire population, which is unlikely
Stratified sampling
The population is divided into groups that are expected to give different information, e.g. tomato sizes, with the population divided into tomato varieties. Each group should make up the same percentage of the sample as it does of the population
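The "same percentage" rule means each group gets (group size / population size) × sample size places, which is the group size times the sampling fraction. A sketch with made-up tomato counts (the variety names and numbers are hypothetical):

```python
def stratified_allocation(group_sizes, sample_size):
    """Number to sample from each group, in proportion to its size."""
    population_size = sum(group_sizes.values())
    # group size x sampling fraction (sample size / population size)
    return {group: round(size * sample_size / population_size)
            for group, size in group_sizes.items()}

varieties = {"cherry": 200, "plum": 300, "beefsteak": 500}
print(stratified_allocation(varieties, 50))
# {'cherry': 10, 'plum': 15, 'beefsteak': 25}
```

With less convenient numbers, rounding can make the group totals differ slightly from the intended sample size.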
Pros and cons of stratified sampling
- Results are likely to accurately reflect the population studied and take into account a wide spread
- you can’t always divide the population into groups and sometimes members will not fit into any group or will fit into multiple
Cluster sampling
The population is already in groups (clusters), but there is no reason to expect the groups to differ much, so one or more whole groups are used as the sample
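A sketch of the idea: choose whole clusters and use every member of each. Here the clusters are picked at random, which is an assumption (the card does not say how clusters are chosen); the school forms and names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical students grouped by form.
forms = {
    "7A": ["Ali", "Bea", "Cal"],
    "7B": ["Dev", "Eve"],
    "7C": ["Fay", "Gus", "Hal"],
}
sample = cluster_sample(forms, n_clusters=2, seed=3)
print(sample)
```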
Pros and cons of cluster sampling
- very easy to conduct
- clusters are often picked with human bias, so they may be limited in how representative they are
Systematic sampling
Select members at regular intervals from a list, e.g. every 10th person
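A minimal sketch of systematic sampling from an ordered list. It assumes the interval is population size ÷ sample size (rounded down) and that the starting point is chosen at random within the first interval; both are common choices but are assumptions here, and the list of people is made up.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Take every k-th member of an ordered list, starting at a random point."""
    if not 1 <= sample_size <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(population) // sample_size        # the interval, e.g. 100 // 10 = 10
    start = random.Random(seed).randrange(k)  # random start in the first interval
    return population[start::k][:sample_size]

people = [f"person {i}" for i in range(1, 101)]
print(systematic_sample(people, 10, seed=0))
```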
Pros and cons of systematic sampling
- it's very simple, and when the population is large you're likely to get a wide spread
- it's easy for the researcher to manipulate (e.g. by choosing where to start), so results can be biased or skewed towards a targeted outcome
Quota sampling
You decide in advance how many people from each group you will use (the quota), then fill each quota with whoever is available; who is chosen within each group is not random
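A sketch of filling quotas from whoever turns up, in the order they arrive. The people, groups and quotas are hypothetical; the example also shows the drawback that a quota may not be filled.

```python
def quota_sample(arrivals, quotas):
    """Fill each group's quota with whoever turns up first (non-random)."""
    remaining = dict(quotas)
    sample = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
    unfilled = {g: n for g, n in remaining.items() if n > 0}
    return sample, unfilled

# Hypothetical people stopped in the street, in order, with their age group.
arrivals = [("P1", "under 30"), ("P2", "under 30"), ("P3", "30+"),
            ("P4", "under 30"), ("P5", "30+")]
sample, unfilled = quota_sample(arrivals, {"under 30": 2, "30+": 3})
print(sample)    # ['P1', 'P2', 'P3', 'P5']
print(unfilled)  # {'30+': 1}
```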
Pros and cons of quota sampling
- easy and can accommodate population proportions to improve accuracy
- you may not be able to fill every quota, and because it's non-random, researchers have a lot of say in who is chosen
Opportunity sampling
You use whoever you can get, e.g. drop a net and use the cod you catch
Pros and cons of opportunity sampling
- quick and easy
- easily manipulated, e.g. by only talking to certain people or only cold-calling in one area
Self-selecting sample
Use whoever volunteers to be sampled
Pros and cons of self-selecting samples
- can allow you to get a spread
- volunteers tend to be a certain type of person, so results can be exaggerated and unrepresentative