Hypothesis Testing in R Flashcards
independent samples t-test code
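A minimal sketch of what this card's code might look like. The dataframe pirates_demo and its columns sex and age are made-up stand-ins, not the original data.

```r
# Hypothetical stand-in data: ages of pirates in two groups
set.seed(1)
pirates_demo <- data.frame(
  sex = rep(c("female", "male"), each = 20),
  age = c(rnorm(20, mean = 30, sd = 5), rnorm(20, mean = 27, sd = 5))
)

# Formula notation: dependent variable ~ grouping variable (exactly two groups)
sex_age_test <- t.test(age ~ sex, data = pirates_demo)

# Printing the object shows the t-statistic, df, p-value and confidence interval
sex_age_test
```

By default t.test() runs Welch's version of the test, which does not assume equal variances in the two groups.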
p-value definition
Assuming that the null hypothesis is true (i.e., that there is no difference between the groups), what is the probability that we would have gotten a test statistic at least as far away from 0 as the one we actually got?
It’s a bullshit detector aimed at the null hypothesis. If the p-value gets too small, the bullshit detector goes off.
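To make the definition concrete, here is a rough sketch that rebuilds a two-sided p-value by hand from the t-statistic and compares it with the one t.test() reports. The vectors group_a and group_b are made-up data.

```r
# Made-up data for two groups
set.seed(2)
group_a <- rnorm(25, mean = 10, sd = 2)
group_b <- rnorm(25, mean = 11, sd = 2)

result <- t.test(group_a, group_b)

t_stat <- unname(result$statistic)    # observed t-statistic
deg_free <- unname(result$parameter)  # degrees of freedom used by the test

# Probability, under the null, of a t-statistic at least this far from 0
# in either direction
p_by_hand <- 2 * pt(-abs(t_stat), df = deg_free)

# Should agree with the p-value stored in the test object
all.equal(p_by_hand, result$p.value)
```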
Does the p-value tell us the probability that the null hypothesis is true?
No!!! The p-value does not tell you the probability that the null hypothesis is true. In other words, if you calculate a p-value of .04, this does not mean that the probability that the null hypothesis is true is 4%. Rather, it means that if the null hypothesis were true, the probability of obtaining a result at least as extreme as the one you got is 4%. Now, this does indeed set off our bullshit detector, but again, it does not mean that the probability that the null hypothesis is true is 4%.
htest
R stores hypothesis tests in a special object class called htest. htest objects contain all the major results from a hypothesis test, from the test statistic (e.g., a t-statistic for a t-test, or a correlation coefficient for a correlation test), to the p-value, to a confidence interval.
Different hypothesis tests require the data to be passed to the function in different formats (vectors, dataframes, or tables).
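A small sketch of the point about input formats: one test fed a numeric vector, another fed a table of counts, both returning an htest object. The names scores and coin are made up for illustration.

```r
set.seed(3)
scores <- rnorm(30, mean = 52, sd = 10)                          # numeric vector
coin <- sample(c("heads", "tails"), size = 60, replace = TRUE)   # nominal data

# t.test() takes the raw numeric vector
vector_test <- t.test(x = scores, mu = 50)

# chisq.test() takes a table of counts built from the nominal data
table_test <- chisq.test(x = table(coin))

# Both results are stored as the same class of object
class(vector_test)
class(table_test)
```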
names()
returns the names of all of the elements stored in an htest object
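For instance, assuming a one-sample t-test on a made-up vector called scores, the element names can be listed and then used with $ to pull out individual results:

```r
set.seed(4)
scores <- rnorm(30, mean = 52, sd = 10)  # made-up data
score_test <- t.test(x = scores, mu = 50)

# List the names of everything stored in the test object
names(score_test)

# Pull out individual elements by name
score_test$statistic  # the t-statistic
score_test$p.value    # the p-value
score_test$conf.int   # the confidence interval
```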
one sample t-test
You can pull the data from a column of a df or from a separate vector; it doesn’t have to come from a table() function.
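A sketch of a one-sample t-test, once with the data taken from a dataframe column and once from a standalone vector. The dataframe pirates_demo and the null value of 30 are assumptions made for illustration.

```r
set.seed(5)
pirates_demo <- data.frame(age = rnorm(40, mean = 28, sd = 6))  # hypothetical data

# Data pulled from a dataframe column; mu is the null-hypothesis mean
from_df <- t.test(x = pirates_demo$age, mu = 30)

# Same data held in a separate vector
ages <- pirates_demo$age
from_vector <- t.test(x = ages, mu = 30)

# Both calls test the same numbers, so the results match
all.equal(from_df$p.value, from_vector$p.value)
```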
t tests compared to each other in bar chart form
You can pull the data from a df or from separate vectors; it doesn’t have to come from a table() function.
Using subset to select levels of an IV
use the %in% operator inside the subset argument to specify which levels of an IV you want to test
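A rough sketch, assuming a hypothetical IV called school with three levels, of which only two are compared:

```r
set.seed(6)
pirates_demo <- data.frame(
  school = rep(c("A", "B", "C"), each = 15),   # hypothetical IV with three levels
  tattoos = c(rpois(15, 8), rpois(15, 10), rpois(15, 9))
)

# Keep only the rows whose school is "A" or "B", then compare those two groups
ab_test <- t.test(
  tattoos ~ school,
  data = pirates_demo,
  subset = school %in% c("A", "B")
)

ab_test$estimate  # group means for the two selected levels only
```

Without the subset, t.test() would stop with an error here, because the grouping variable would have three levels instead of two.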
cor.test()
two ways to run a correlation test
To run a correlation test between two variables x and y, use the cor.test() function. You can do this in one of two ways. If x and y are columns in a dataframe, use the formula notation (formula = ~ x + y). If x and y are separate vectors (not in a dataframe), use the vector notation (x, y):
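A minimal sketch of both notations side by side, using a made-up dataframe pirates_demo with columns age and parrots:

```r
set.seed(7)
pirates_demo <- data.frame(
  age = round(rnorm(50, mean = 30, sd = 6)),  # hypothetical data
  parrots = rpois(50, lambda = 3)
)

# Formula notation: both variables go on the right-hand side of ~
formula_test <- cor.test(~ age + parrots, data = pirates_demo)

# Vector notation: pass the two vectors directly
vector_test <- cor.test(x = pirates_demo$age, y = pirates_demo$parrots)

# Both notations give the same correlation coefficient
all.equal(unname(formula_test$estimate), unname(vector_test$estimate))
```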
example correlation test
using subset() in the cor.test() function
Just like the t.test() function, we can use the subset argument in the cor.test() function to conduct a test on a subset of the entire dataframe. For example, to run the same correlation test between a pirate’s age and the number of parrots she’s owned, but only for female pirates, I can add the subset = sex == "female" argument:
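A sketch of that call, with a made-up dataframe pirates_demo standing in for the pirate data (the column names age, parrots and sex follow the description above):

```r
set.seed(8)
pirates_demo <- data.frame(
  sex = sample(c("female", "male"), size = 80, replace = TRUE),  # hypothetical data
  age = round(rnorm(80, mean = 30, sd = 6)),
  parrots = rpois(80, lambda = 3)
)

# Correlation between age and parrots, restricted to female pirates
female_test <- cor.test(
  ~ age + parrots,
  data = pirates_demo,
  subset = sex == "female"
)

female_test$estimate  # correlation coefficient for the female subset
```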
chisq.test()
used to determine whether there is a significant association between two categorical variables
You must first create a table of counts (e.g., with table()) to pass to the chisq.test() function.
This example has one nominal variable, and we are testing whether a pirate is equally likely to have attended either school.
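A sketch of such a one-sample test, assuming a hypothetical column school with two made-up levels:

```r
set.seed(9)
pirates_demo <- data.frame(
  school = sample(c("school_a", "school_b"), size = 100, replace = TRUE)  # hypothetical
)

# Step 1: count how many pirates attended each school
school_counts <- table(pirates_demo$school)

# Step 2: test the counts; with a one-way table, the null hypothesis is
# that every category is equally likely
school_test <- chisq.test(x = school_counts)

school_test
```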
2 sample chisq.test()
If you want to see if the frequency of one nominal variable depends on a second nominal variable, you’d conduct a 2-sample chi-square test.
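A sketch of the two-variable case, assuming hypothetical nominal columns school and eyepatch; the table now has one variable in the rows and the other in the columns:

```r
set.seed(10)
pirates_demo <- data.frame(
  school = sample(c("school_a", "school_b"), size = 120, replace = TRUE),  # hypothetical
  eyepatch = sample(c("no", "yes"), size = 120, replace = TRUE)
)

# Cross-tabulate the two nominal variables into a 2 x 2 table of counts
counts <- table(pirates_demo$school, pirates_demo$eyepatch)

# Test whether the distribution of one variable depends on the other
assoc_test <- chisq.test(x = counts)

assoc_test
```

For a 2 x 2 table, chisq.test() applies Yates' continuity correction by default; pass correct = FALSE to turn it off.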
apa-style conclusions using the apa() function
You can use the apa() function (from the yarrr package) to take a raw htest object and extract only the relevant results, formatted in APA style.