Lecture 5 - Statistical Tests I: Chi-Squared Flashcards
what is the mean and standard deviation of standard normal distribution (z-distribution)?
has a mean of 0 and a standard deviation of 1
how can we compare all normal distributions to the standard normal distribution?
• converting our value y into a number of standard deviations from the mean (a z-score)
• finding the probability with which this value lies in a range
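for example, a minimal sketch in R (the measurement, mean and standard deviation below are made up):

y <- 185           # a hypothetical measurement
mu <- 170          # mean of the distribution
sigma <- 10        # standard deviation of the distribution

z <- (y - mu) / sigma      # z-score: 1.5 standard deviations above the mean
pnorm(z)                   # probability of a value at or below y on the standard normal
pnorm(180, mean = mu, sd = sigma) - pnorm(160, mean = mu, sd = sigma)   # probability of lying between 160 and 180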
what type of experiments can test causation?
experiments that manipulate the explanatory variable can test causation
what should be considered when trying to select the correct statistical test?
• what type of response variable? - continuous, discrete/count, proportion, binary?
• what type of explanatory variable(s)? - continuous, discrete/count, proportion, binary, categorical?
• are you interested in differences or in trends/relationships?
• are the samples paired or independent?
normal distribution = parametric tests:
- powerful & easy to interpret
- use means
- require data (or residuals) to be normally distributed
- often require similar variance in groups
- can be used to answer complicated questions
non-normal distribution = non-parametric tests:
- less powerful, more conservative
- use medians (data usually ranked before test)
- usually no assumptions about distribution of data
- robust
- often restrictive, cannot answer more complex questions
how can you check for normality?
graphically, using histograms or quantile-quantile (Q-Q) plots, or via formal testing, e.g. the Shapiro-Wilk test
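a rough sketch of the graphical checks in R (my_data is a made-up variable name):

hist(my_data)       # histogram: look for a roughly symmetrical, bell-shaped curve
qqnorm(my_data)     # quantile-quantile (Q-Q) plot against the normal distribution
qqline(my_data)     # points should lie close to this reference line if the data are normal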
how can you tell if your values are of a normal distribution in R-Studio?
through using the shapiro-wilk command:
shapiro.test(your_variable)
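for example, with simulated data (names and numbers made up):

set.seed(1)
my_data <- rnorm(30, mean = 50, sd = 5)   # 30 simulated values, just for illustration
shapiro.test(my_data)                     # returns a W statistic and a p-value
# p > 0.05: no evidence against normality; p < 0.05: data are probably not normal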
how do you calculate the expected frequencies with a data set?
you expand your results table and work out the total for each characteristic (e.g. the number of people with brown hair, the number with blue eyes, etc.)
then you work out how many individuals are in the study altogether (the grand total)
then the expected frequency for a combination of two characteristics is: (number with characteristic 1 / grand total) x (number with characteristic 2 / grand total) x grand total - which simplifies to (row total x column total) / grand total
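a sketch of this in R with a made-up 2 x 2 table of counts (hair colour vs eye colour):

obs <- matrix(c(30, 10,
                20, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(hair = c("brown", "blond"), eyes = c("blue", "brown")))

n <- sum(obs)                                        # total number of people in the study
expected <- outer(rowSums(obs), colSums(obs)) / n    # (row total x column total) / grand total
chisq.test(obs)$expected                             # R's own expected-frequency table (matches `expected`)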
chi-squared tests for:
an association between categorical variables (y count, x categorical)
in what instance would you choose to create a Chi-Squared contingency table?
where response variable (y) is count (observations) and explanatory variable (x) is categorical
how do you conduct a chi-squared test of association?
we need to compare observed frequencies to expected frequencies to see whether the two variables are independent - this means working with probabilities
X^2 = ∑ (observed - expected)^2 / expected
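a sketch in R, using a made-up table of observed counts:

obs <- matrix(c(30, 10,
                20, 40), nrow = 2, byrow = TRUE)     # made-up observed frequencies

expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
sum((obs - expected)^2 / expected)                   # chi-squared statistic by hand

chisq.test(obs, correct = FALSE)                     # same statistic, plus df and p-value
# (correct = FALSE turns off Yates' continuity correction so it matches the by-hand value)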
what do we do once we have our value for chi-squared?
we must compare our chi-squared statistic with a critical value (which depends on the degrees of freedom) - if our statistic is larger than the critical value, then it is unlikely that the null hypothesis is true (i.e. p < 0.05), so we reject it
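a sketch in R, assuming a 2 x 2 table so there is (2 - 1) x (2 - 1) = 1 degree of freedom:

qchisq(0.95, df = 1)                        # critical value at the 5% level (about 3.84)
pchisq(5.2, df = 1, lower.tail = FALSE)     # p-value for a made-up statistic of 5.2 (< 0.05, so reject the null)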
in a pearson test, what can you infer from your R value and P value?
the r value tells you the strength and direction of the correlation on a scale from -1 to +1, where -1 is a perfect negative correlation, +1 is a perfect positive correlation and 0 is no correlation
the p-value then tells you whether the correlation is statistically significant - if it is smaller than 0.05 the correlation is significant, as there is less than a 5% chance of seeing a correlation this strong if the two variables were actually unrelated
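a sketch in R with simulated data (the variable names are made up):

set.seed(1)
leaf_length <- rnorm(20, mean = 10, sd = 2)
leaf_width  <- 0.5 * leaf_length + rnorm(20, sd = 0.5)   # built to correlate with length

cor.test(leaf_length, leaf_width, method = "pearson")
# the output reports r (labelled "cor") and a p-value for the null hypothesis
# that the true correlation is zero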
how can you check for normally distributed data using the R-Software?
through using the shapiro-wilk test command, shapiro.test(name of your variable), which gives a W statistic and a p-value - if the p-value is smaller than 0.05 we reject the null hypothesis of normality, i.e. the data are not normally distributed