lecture 1 & 2 Flashcards
estimates
claims about the population, based on data from a sample
why is statistics important
- Statistical models help us draw inferences from huge datasets
- Uncertainty → inferences always come with some level of uncertainty; statistics allows us to measure this uncertainty
- Gives tools to process info in a principled manner in order to draw inferences (claims) about the world
spurious correlations
occurs when two variables are statistically related but not directly causally related. The two variables falsely appear to be causally related, normally due to an unseen third factor.
e.g. eating ice cream increases the chances of being involved in a shark attack
third factor → being at the beach
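A minimal simulation of the ice cream and shark example, as a sketch only. All numbers are invented for illustration, and `pearson` and `residuals` are hypothetical helpers, not part of the lecture. Beach visitors drive both other variables, so the two look related until the third factor is removed.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


# Assumed data: the number of beach visitors per day is the confounder.
beach = [random.uniform(0, 100) for _ in range(1000)]
# Ice cream sales and shark encounters each depend on beach visitors only.
ice_cream = [2.0 * b + random.gauss(0, 20) for b in beach]
shark_encounters = [0.05 * b + random.gauss(0, 1) for b in beach]

# Raw correlation: strong, although neither variable causes the other.
r_raw = pearson(ice_cream, shark_encounters)

# Correlation after removing the part explained by beach visitors.
r_controlled = pearson(residuals(ice_cream, beach),
                       residuals(shark_encounters, beach))

print(f"raw correlation:        {r_raw:.2f}")
print(f"controlling for beach:  {r_controlled:.2f}")
```

The raw correlation should come out strongly positive, while the controlled one should sit close to zero.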
causes of spurious correlations
- coincidence
- confounding variables
- small sample size
- overfitting
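A rough sketch of how coincidence and small sample size can produce spurious correlations. The setup is an assumption for illustration: 20 completely unrelated random variables, compared pair by pair, once with 5 cases and once with 1000 cases. `strongest_correlation` is a hypothetical helper.

```python
import random
from itertools import combinations

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_correlation(n_cases, n_variables=20):
    """Largest |r| found among all pairs of independent random variables."""
    data = [[random.gauss(0, 1) for _ in range(n_cases)]
            for _ in range(n_variables)]
    return max(abs(pearson(a, b)) for a, b in combinations(data, 2))


# The variables are unrelated by construction, so any correlation is chance.
strongest_small = strongest_correlation(n_cases=5)
strongest_large = strongest_correlation(n_cases=1000)

print(f"strongest |r| with 5 cases:    {strongest_small:.2f}")
print(f"strongest |r| with 1000 cases: {strongest_large:.2f}")
```

With only 5 cases, searching through many pairs almost always turns up a very strong correlation by chance; with 1000 cases the strongest one stays weak.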
overfitting
occurs when a statistical model fits its training data too closely. When this happens, the model cannot perform accurately on unseen data, defeating its purpose.
When the model memorizes the noise and fits too closely to the training set, the model becomes “overfitted,” and it is unable to generalize well to new data.
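A small sketch of overfitting, under assumed data (y = 2x plus random noise). The "memorizer" is a stand-in for an overfitted model: it returns the y value of the nearest training point, noise included. It is compared with a simple straight-line fit. All function names are hypothetical.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable


def make_data(n):
    """Assumed data-generating process: y = 2x plus random noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + random.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Simple model: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Overfitted model: copy the y of the nearest training x, noise and all."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(500)
test_x, test_y = make_data(500)  # unseen data from the same process

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
memo_train, memo_test = (mse(memorizer, train_x, train_y),
                         mse(memorizer, test_x, test_y))

print(f"line:      train MSE {line_train:.2f}, test MSE {line_test:.2f}")
print(f"memorizer: train MSE {memo_train:.2f}, test MSE {memo_test:.2f}")
```

The memorizer reproduces its training data perfectly but should do worse than the line on the unseen data, because it has learned the noise rather than the underlying pattern.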
variables
a characteristic of a concept that takes on different values from one case to another or, for a given case, from one time to another
types of variables
- nominal
- ordinal
- quantitative
nominal variables
Different categories, no natural ordering (one category is not “more” than another)
E.g. religion, continent, colour, party, etc.
blue is different from red.
blue is more than red. → nonsensical
ordinal variables
Different categories, with a meaningful ordering.
The distance between two categories is not meaningful.
E.g. very dissatisfied < dissatisfied < neither dissatisfied nor satisfied < satisfied < very satisfied
quantitative variables
Different categories, with a meaningful ordering, AND the distance between two categories has a meaning.
e.g. Number of votes, Temperature (Degrees), GDP (€)
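A short sketch of what the three variable types allow in practice, using made-up data. The summary chosen for each type (mode, median, mean) is a common rule of thumb and an assumption here, not something stated on the cards.

```python
from statistics import mean, median_low, mode

# Nominal: categories without order, so only counting makes sense (the mode).
party = ["Party A", "Party B", "Party A", "Party C", "Party A"]
most_common_party = mode(party)

# Ordinal: categories with an order, so a median is meaningful, but a mean
# is not, because the distance between categories is not meaningful.
levels = [
    "very dissatisfied",
    "dissatisfied",
    "neither dissatisfied nor satisfied",
    "satisfied",
    "very satisfied",
]
responses = [
    "satisfied",
    "very dissatisfied",
    "neither dissatisfied nor satisfied",
    "satisfied",
    "dissatisfied",
]
ranks = [levels.index(r) for r in responses]  # position in the ordering
median_response = levels[median_low(ranks)]

# Quantitative: distances are meaningful, so arithmetic such as the mean works.
votes = [120, 80, 100]
mean_votes = mean(votes)

print("nominal, mode:      ", most_common_party)
print("ordinal, median:    ", median_response)
print("quantitative, mean: ", mean_votes)
```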