QDA Flashcards
3 ways to numerically summarise a categorical variable
- Frequencies or counts
- Relative frequencies
- Relative cumulative frequencies
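As a rough illustration of the three summaries above, here is a minimal Python sketch. The `colours` data is made up for the example, and accumulating in alphabetical category order is an assumption (any fixed order works).

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made-up data).
colours = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

# Frequencies (counts) per category.
counts = Counter(colours)
n = sum(counts.values())

# Relative frequencies: each count divided by the total number of observations.
relative = {cat: c / n for cat, c in counts.items()}

# Relative cumulative frequencies, accumulated in alphabetical category order
# (the order is an assumption made for this example).
cumulative = {}
running = 0.0
for cat in sorted(counts):
    running += relative[cat]
    cumulative[cat] = running

for cat in sorted(counts):
    print(cat, counts[cat], relative[cat], cumulative[cat])
```

The last cumulative value should always be 1, which is a quick check that nothing was dropped.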
How can categorical variables be summarised visually?
Bar and pie charts
What are bar charts for, and what are the x and y axes?
Representing the frequency of each category. The y axis shows the frequencies and the x axis shows the categories
What are pie charts for?
Representing the relative frequency of each category as a slice of the pie
When describing the contents of a numerical variable we can look at different aspects of its distribution such as:
Measures of location such as the mean
Measures of spread and variability
Extreme values
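A minimal sketch of these three aspects using Python's `statistics` module. The `values` data is made up, and flagging extreme values with the 1.5 x IQR rule is an assumption of this example, not something the cards prescribe.

```python
import statistics

# Made-up numerical variable; 30.0 is deliberately far from the rest.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 30.0]

# Measures of location.
mean = statistics.mean(values)
median = statistics.median(values)

# Measures of spread and variability.
sd = statistics.stdev(values)                   # sample standard deviation
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
iqr = q3 - q1                                   # interquartile range

# Extreme values, using the 1.5 * IQR rule (an assumed convention).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
extremes = [v for v in values if v < lower or v > upper]

print(mean, median, sd, iqr, extremes)
```

Note how the single extreme value pulls the mean well above the median.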
When is a t-test used?
When the observations are independent and the errors are normally distributed. The test compares means
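To show how the means enter the calculation, here is a sketch of the test statistic for two independent samples. It uses Welch's unequal-variance form, which is an assumption (the cards do not say which variant is meant), and the two groups are made-up data. Turning the statistic into a p-value needs the t distribution, which the Python standard library does not provide, so that step is left out.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two
    independent samples, using Welch's unequal-variance formula."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se = math.sqrt(va / na + vb / nb)                        # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t, df

# Made-up measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.2, 4.8, 4.4, 4.6, 4.0]

t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)
```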
What is the Wilcoxon rank sum test?
How does it work?
A non-parametric alternative to the t-test, used when we cannot assume a normal distribution
It pools all measurements into one column, assigns a rank to each value, and compares the sum of ranks between the two groups
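The pooling-and-ranking step can be sketched as below. The function name `rank_sum` and the sample data are hypothetical, and giving tied values their average rank is the usual convention but is assumed here. The sketch only computes the rank sum; it does not compute a p-value.

```python
def rank_sum(sample_a, sample_b):
    """Sum of the ranks of sample_a within the pooled data.
    Tied values share the average of the ranks they occupy."""
    pooled = sorted(sample_a + sample_b)   # all measurements in one column
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        # Find the run of equal values starting at position i.
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; their mean is (i + 1 + j) / 2.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[v] for v in sample_a)

# Made-up measurements for two groups.
a = [1.2, 3.4, 2.2]
b = [4.1, 5.0, 3.9]

print(rank_sum(a, b), rank_sum(b, a))
```

The two rank sums always add up to n(n + 1)/2 for n pooled values, so a very small or very large sum for one group suggests the groups differ in location.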
What does a scatter plot do?
What should you look for, and how do you interpret it?
It displays two numerical variables of interest, with the independent variable on the x axis and the dependent variable on the y axis
Whether the relation is positive or negative; linear, quadratic or exponential; strong and clear or weak; and whether there are outliers
Two main types of analysis
Descriptive - Describing data using numerical or graphical summaries
Inferential - Using sample data to draw conclusions about a larger population
What are the main data types?
Categorical - Attributes observed for each sampling unit, e.g. binary categories
Numerical - Numerical values on a discrete, ordinal or continuous scale
What is a confidence interval?
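A range, calculated from the sample, that is likely to contain the true population mean/proportion. If the exercise was repeated many times, about 95% of the 95% confidence intervals would contain the true value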
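A minimal sketch of a confidence interval for a mean. It uses the normal approximation (a z value from `statistics.NormalDist`), which is an assumption of this example; for small samples the t distribution would usually be used instead. The function name `mean_ci` and the data are made up.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 when level = 0.95
    return mean - z * se, mean + z * se

# Made-up sample.
sample = [4.8, 5.1, 5.4, 4.9, 5.6, 5.0, 5.3, 5.2]

low95, high95 = mean_ci(sample, 0.95)
low99, high99 = mean_ci(sample, 0.99)
print(low95, high95)
print(low99, high99)
```

Asking for a higher confidence level gives a wider interval, as the 99% result shows.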
P value rule
P value <= α = Reject the null (significant)
P value > α = Fail to reject the null (not significant)
(By convention α = 0.05, so the P value must be 0.05 or less for a difference to be significant)
What does it mean to test a null hypothesis?
The null hypothesis is the default claim that you are trying to disprove
For example, test that the mean has a specific value against the alternative hypothesis that it does not:
H0: μ = μ0
H1: μ ≠ μ0
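One way to carry out this test and apply the p-value rule is sketched below. It assumes the population standard deviation `sigma` is known, which makes it a z-test; that simplification is an assumption of this example, chosen so the p-value can be computed with the standard library. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_mean(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0.
    Assumes the population standard deviation sigma is known."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value: probability of a result at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = p_value <= alpha     # the p-value rule
    return z, p_value, reject

# Made-up sample; mu0 and sigma are assumed values for the example.
sample = [12.0, 13.0, 11.0, 14.0, 12.5, 13.5, 12.0, 13.0]
z, p_value, reject = z_test_mean(sample, mu0=10.0, sigma=1.0)
print(z, p_value, reject)
```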
What are the type 1 and type 2 error probabilities?
α = P(type 1 error) = P(reject H0 | H0 is true)
β = P(type 2 error) = P(fail to reject H0 | H0 is false)
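The type 1 error probability can be checked by simulation: draw many samples for which H0 is true and count how often the test rejects. The sketch below assumes a two-sided z-test with known standard deviation 1 and H0: mean = 0; the function name, trial count and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

def type1_rate(trials=4000, n=20, alpha=0.05, seed=0):
    """Estimate P(reject H0 | H0 is true) for a two-sided z-test
    of H0: mean = 0 with known standard deviation 1."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    rejections = 0
    for _ in range(trials):
        # Sample drawn with H0 true: the population mean really is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        if abs(z) >= z_crit:
            rejections += 1                        # a type 1 error
    return rejections / trials

rate = type1_rate()
print(rate)
```

The estimated rate should land close to the chosen significance level of 0.05, though the exact value depends on the random draws.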
How to test for normality
With a quantile-quantile (Q-Q) plot: if the points lie close to a straight line, the data are approximately normal
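The points of a normal Q-Q plot can be computed as sketched below: each sorted sample value is paired with the quantile a normal distribution would give at the same position. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, as are the function name and the data. Drawing the plot itself is left to whatever plotting tool is in use.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)                          # sample quantiles
    reference = NormalDist(mean(sample), stdev(sample))
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Made-up sample.
sample = [4.8, 5.1, 5.4, 4.9, 5.6, 5.0, 5.3, 5.2, 4.7, 5.5]

points = qq_points(sample)
for theo, obs in points:
    print(round(theo, 3), obs)
```

If the two columns rise together roughly in step, the plotted points would fall near a straight line.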