quiz2 Flashcards
there are three types of data: quantitative, ordinal, and nominal. describe them
quantitative: numeric values with magnitude (think numbers)
ordinal: values or categories that can be ordered (think grades)
nominal: values or categories that can't be ordered (think colours)
what is the point of inferential statistics
to use well chosen samples to come to a probably correct conclusion about the population
what is a probability distribution
the description of probabilities for all possible outcomes
what is covariance. when is it a + covariance and when is it a - covariance
it describes how two variables vary together.
+: large y tends to go with large x (they rise and fall together)
-: large y tends to go with small x (one rises as the other falls)
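a minimal sketch of the sign of covariance, using made-up study data (the function name, variable names and numbers are all hypothetical, only there to show a + and a - case):

```python
def covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]          # hypothetical hours studied
grades = [55, 60, 70, 75, 90]    # rises with hours -> positive covariance
errors = [9, 7, 6, 4, 1]         # falls with hours -> negative covariance

cov_up = covariance(hours, grades)
cov_down = covariance(hours, errors)
print(cov_up, cov_down)
```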
in inferential statistics we need to form a hypothesis: we need a null hypothesis and an alternate hypothesis. What's the difference? What do we need to remember about hypotheses?
Alternate hypothesis is usually what we are hoping to conclude, null hypothesis is the opposite.
these two hypotheses have to cover all possibilities
we assume the null is true and look for the data to force us to conclude that it isn't. If the data would be very unlikely under the null, we reject it (like a proof by contradiction, but probabilistic rather than certain) and accept the alternate.
we can never conclude the null is true. We can only falsify it
what does the T-test do and what is its null hypothesis
if we have two samples which are both normal and have equal variance, the T-test will tell us if the distributions have different means
MUST BE normally-distributed and equal-variance
null hypothesis: means are equal
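a sketch of running the T-test with scipy's stats.ttest_ind, assuming that is the function meant here (the card doesn't name one). The two samples are simulated, so the numbers are illustrative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# two simulated samples: both normal, same spread, different means
group_a = rng.normal(loc=70, scale=10, size=200)
group_b = rng.normal(loc=80, scale=10, size=200)

# equal_var=True is the default, matching the equal-variance requirement
result = stats.ttest_ind(group_a, group_b)
print(result.pvalue)

if result.pvalue < 0.05:
    print("reject the null: the means differ")
else:
    print("cannot reject the null")
```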
what does the p-value tell us
all inferential tests end up with a probability: the p-value, which is the probability of seeing data at least as extreme as ours if the null hypothesis is true. In practice, if the p-value is small we can reject the null hypothesis and accept the alternative hypothesis.
if smaller than 0.05 we reject the null hypothesis. If greater than 0.05 we do not reject it.
what do you do if you don’t know if a distribution is normal or not
use stats.normaltest
where null hypothesis is that: data is normal
stats.normaltest(data).pvalue
if the pvalue returned is >0.05 we cannot falsify the null hypothesis, so we treat the data as normal enough (this doesn't prove it is normal)
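a small sketch of the check above on simulated data (the sample names are made up). One sample is drawn from a normal distribution and one clearly isn't:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
bell_shaped = rng.normal(loc=0, scale=1, size=500)   # drawn from a normal distribution
skewed = rng.exponential(scale=1, size=500)          # clearly not normal

p_bell = stats.normaltest(bell_shaped).pvalue
p_skewed = stats.normaltest(skewed).pvalue

for name, p in [("bell_shaped", p_bell), ("skewed", p_skewed)]:
    verdict = "cannot reject normality" if p > 0.05 else "reject normality"
    print(f"{name}: p = {p:.3g} -> {verdict}")
```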
what do you do if you don’t know if two distributions have equal variance
use Levene's test
which has a null hypothesis: two samples do have equal variance
stats.levene(data1, data2).pvalue
if p-value >0.05 we can assume they do have equal variance because we cannot falsify the null hypothesis
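a sketch of Levene's test on two simulated samples whose spreads are very different on purpose (names and numbers are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
narrow = rng.normal(loc=50, scale=1, size=300)     # small variance
wide_spread = rng.normal(loc=50, scale=5, size=300)  # much larger variance

p_levene = stats.levene(narrow, wide_spread).pvalue
print(p_levene)

if p_levene > 0.05:
    print("cannot reject equal variance, so proceed as if equal")
else:
    print("reject equal variance")
```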
we can transform data if it isn’t normal to make it normal enough.
Assuming all data are greater than 0, what are the 4 ways you can transform data and when would they be useful
e^x (if data left-skewed, longer on left)
x^2 (if data left-skewed, longer on left)
root(x) (if data right-skewed, longer on right)
log(x) (if data right-skewed, longer on right)
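a sketch of the right-skewed case, using simulated lognormal data and stats.skew as a rough measure of how lopsided the data is (positive = long right tail). Measuring skew this way is just for illustration and isn't part of the card:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# simulated data: all values > 0, with a long right tail
right_skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

skew_raw = stats.skew(right_skewed)
skew_sqrt = stats.skew(np.sqrt(right_skewed))  # pulls the right tail in a little
skew_log = stats.skew(np.log(right_skewed))    # pulls the right tail in more

print(skew_raw, skew_sqrt, skew_log)

# for left-skewed data the other two transforms stretch the right side instead:
#   right_side_stretched = left_skewed ** 2   or   np.exp(left_skewed)
```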
what is the issue with doing t-tests on more than 2 datasets
what should we do to prevent the issue
if you do multiple t-tests, it increases the likelihood that at least one of them incorrectly rejects the null hypothesis
instead you should use the Bonferroni correction, where you choose a threshold of 0.05/(num of t-tests conducted)
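a quick worked example of why this matters, for 4 groups compared pairwise. The helper names are hypothetical, and the "chance of at least one false rejection" line assumes the tests are independent, which is a simplification:

```python
def num_pairwise_tests(num_groups):
    """Number of distinct pairs you would t-test among num_groups groups."""
    return num_groups * (num_groups - 1) // 2

def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test threshold after the Bonferroni correction."""
    return alpha / num_tests

tests = num_pairwise_tests(4)              # 4 groups -> 6 pairwise t-tests
threshold = bonferroni_threshold(tests)    # 0.05 / 6

# chance of at least one incorrect rejection with NO correction,
# assuming independent tests and every null being true
familywise = 1 - (1 - 0.05) ** tests

print(tests, threshold, familywise)
```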
What is ANOVA and its purpose
to test if the means of any groups differ, it is like a t-test but for more than 2 groups
Must be:
- observations must be independent and identically distributed
- normally distributed
- equal variance
Null hypothesis: groups have the same mean
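a sketch of a one-way ANOVA using scipy's stats.f_oneway (assuming that is the function intended; the card doesn't name one). The three groups are simulated and one is given a different mean on purpose:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_1 = rng.normal(loc=70, scale=10, size=100)
group_2 = rng.normal(loc=70, scale=10, size=100)
group_3 = rng.normal(loc=85, scale=10, size=100)   # this group has a different mean

anova_p = stats.f_oneway(group_1, group_2, group_3).pvalue
print(anova_p)

if anova_p < 0.05:
    print("reject the null: at least one group mean differs (but not which one)")
else:
    print("cannot reject the null")
```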
What does ANOVA not tell you. What do you need to do to find out.
ANOVA tells you that there is a difference in the means (if there is) but we don’t know which groups have the different mean.
Use Post Hoc Analysis, only if we have an ANOVA p-value of less than 0.05.
i.e. the groups do not all have the same mean
How do we use the Post Hoc Analysis: Tukey's HSD and what does it return
use pandas.melt to get the data in the format the test wants (unpivoted data), and then you can use the post hoc Tukey test
it returns a list of pairs and tells us if they have different means. If the reject column is True for a pair, that pair's means are different
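a sketch of the melt-then-Tukey steps. The card doesn't say which library runs the test; this assumes statsmodels' pairwise_tukeyhsd, whose summary table has a "reject" column. The data and column names are made up:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(5)
# one column per group (simulated); group c has a different mean on purpose
wide_df = pd.DataFrame({
    "a": rng.normal(loc=70, scale=10, size=100),
    "b": rng.normal(loc=70, scale=10, size=100),
    "c": rng.normal(loc=85, scale=10, size=100),
})

# unpivot: one row per observation, columns 'variable' (group) and 'value'
long_df = pd.melt(wide_df)

posthoc = pairwise_tukeyhsd(long_df["value"], long_df["variable"], alpha=0.05)
print(posthoc.summary())   # one row per pair, with a 'reject' column

# reject flags in pair order: (a, b), (a, c), (b, c)
reject_flags = [bool(flag) for flag in posthoc.reject]
```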
give an example of where you would want to use a one-sided (one-tailed) test rather than a two-sided one.
what is a way you can conduct this test
if you only want to determine if there is a difference between groups in a specific direction. (e.g. will studying get me a better grade)
conduct your test and look at the p-value, but change the significance level to 0.10
a two-sided test where p < 0.10 is the same as a one-sided test where p < 0.05 (as long as the difference is in the direction you predicted)
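a sketch checking that relationship on simulated data. It assumes a scipy version where stats.ttest_ind accepts the alternative argument (1.6 or later); the "studying" samples are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
studied = rng.normal(loc=80, scale=10, size=100)       # hypothetical grades
not_studied = rng.normal(loc=70, scale=10, size=100)

two_sided = stats.ttest_ind(studied, not_studied)
# one-sided: alternative hypothesis is mean(studied) > mean(not_studied)
one_sided = stats.ttest_ind(studied, not_studied, alternative="greater")

# when the difference is in the predicted direction (statistic > 0),
# the one-sided p-value is half the two-sided one,
# so "two-sided p < 0.10" and "one-sided p < 0.05" agree
print(two_sided.pvalue / 2, one_sided.pvalue)
```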