L18 - Regression 2 Flashcards
What are the two types of models and how do they differ?
Deterministic models have no randomness within them.
Probabilistic models include randomness.
Both describe the relationship between variables.
Describe what a deterministic model is and provide an example.
Deterministic models hypothesise an exact relationship between variables. They are suitable when prediction error is negligible.
- An example is distance = speed × time: at a constant speed, distance plotted against time is an exact straight line whose slope is the speed.
Describe a probabilistic model and provide an example.
Probabilistic models hypothesise two components:
- Deterministic
- Random error
Example: income tends to depend on education level, but it still varies among people with the same level of education.
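The difference between the two model types can be sketched in a few lines of Python. This is only an illustration: the function names, the coefficients (b0, b1) and the noise level are made-up values, not estimates from any real data.

```python
import random

def deterministic_distance(speed, time):
    # Deterministic: the output is fully determined by the inputs.
    return speed * time

def probabilistic_income(years_education, rng, b0=20000.0, b1=3000.0, noise_sd=5000.0):
    # Probabilistic: deterministic component + random error.
    # b0, b1 and noise_sd are hypothetical values chosen for illustration.
    deterministic_part = b0 + b1 * years_education
    random_error = rng.gauss(0.0, noise_sd)
    return deterministic_part + random_error

rng = random.Random(0)  # fixed seed so the run is repeatable

same_distance = [deterministic_distance(60, 2) for _ in range(3)]
incomes = [probabilistic_income(12, rng) for _ in range(3)]

print(same_distance)  # the same value every time
print(incomes)        # different values for the same education level
```

The deterministic function returns the same distance on every call, while the probabilistic one returns a different income each time even though the education level is unchanged.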
Within the line-of-best-fit regression model, what does B1 represent?
B1 estimates the amount of change in Y for a one-unit increase in X (AKA the slope).
Within the line-of-best-fit regression model, what does B0 represent?
B0 represents the predicted value of Y when the value of X is zero (AKA the Y-intercept).
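As a minimal sketch, B0 and B1 can be computed with the usual least-squares formulas. The helper name fit_line and the small dataset are assumptions made for illustration; the data were chosen to lie exactly on the line Y = 2 + 3X.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope: change in Y per one-unit change in X
    b0 = mean_y - b1 * mean_x   # intercept: predicted Y when X is zero
    return b0, b1

# Illustrative data lying exactly on Y = 2 + 3X.
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]

b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

Here the fitted slope is 3 (Y rises by 3 for each extra unit of X) and the fitted intercept is 2 (the value of Y at X = 0).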
What are the four cautions of regression?
- Spurious relationships
- Extrapolation
- Generalisation
- Causation
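What is a spurious correlation in regression, and why is it a problem?
This occurs when two variables show a mathematical relationship but are not actually directly linked. It is a problem because the regression can suggest a connection that does not really exist, for example through coincidence or because both variables depend on a third variable.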
What is extrapolation in terms of a regression model, and why is it a problem?
Extrapolation occurs when the line of best fit is used to make inferences outside the range of X values where the data points lie. This can be a problem because the relationship between the variables could change at higher or lower values of X.
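The risk can be sketched with a made-up example: suppose the true relationship is curved (here assumed to be Y = X^2, purely for illustration) but a straight line is fitted to data observed only for X between 0 and 4. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def true_y(x):
    # Hypothetical curved relationship, assumed only for this sketch.
    return x ** 2

xs = [0, 1, 2, 3, 4]             # observed range of X
ys = [true_y(x) for x in xs]
b0, b1 = fit_line(xs, ys)

def prediction_error(x):
    # Absolute gap between the fitted line and the true value at x.
    return abs((b0 + b1 * x) - true_y(x))

def is_extrapolation(x):
    # True when x lies outside the range of the observed data.
    return x < min(xs) or x > max(xs)

error_inside = prediction_error(2)    # within the observed range
error_outside = prediction_error(10)  # far outside the observed range
print(is_extrapolation(2), error_inside)
print(is_extrapolation(10), error_outside)
```

The line is a tolerable approximation inside the observed range, but the error grows quickly once X moves beyond it, because the line knows nothing about how the relationship behaves out there.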
What is generalisation in a regression model, and why can it be a problem?
Generalisation occurs when a conclusion is drawn from a small dataset and applied to a larger population. This can be a problem because the sample may not accurately represent the population.
What is an outlier, and why can it be a problem within a regression analysis?
Outliers are data points that are significantly different from the other data points within a sample. They can be a problem because they can skew the line of best fit and typically reduce the correlation (r) and r^2 values.
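A small sketch of the effect on r^2, using made-up data in which one point of an otherwise perfectly linear dataset is replaced by an outlier. The helper name r_squared is an assumption; in other datasets an outlier can also move r^2 in the opposite direction.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

xs = [1, 2, 3, 4, 5, 6]
ys_clean = [2, 4, 6, 8, 10, 12]    # perfectly linear, illustrative only
ys_outlier = [2, 4, 6, 8, 10, 0]   # last point replaced by an outlier

r2_clean = r_squared(xs, ys_clean)
r2_outlier = r_squared(xs, ys_outlier)
print(r2_clean, r2_outlier)
```

With the single outlier in place, r^2 falls from 1 to close to 0, even though the other five points still lie exactly on a line.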
What is an influential point within a dataset?
It is a point that significantly affects the line of best fit.
How does the removal of an influential point within a dataset affect the line of best fit?
It noticeably changes the slope of the line (B1), and often the intercept (B0) as well.
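To illustrate, the slope can be computed with and without a single point that sits far from the rest of the data. The dataset and the helper name slope are made up for this sketch.

```python
def slope(xs, ys):
    """Least-squares slope (B1) of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Five points with no trend, plus one point far away at (20, 20).
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 3, 2, 3, 2, 20]

slope_with = slope(xs, ys)
slope_without = slope(xs[:-1], ys[:-1])  # drop the last point
print(slope_with, slope_without)
```

With the distant point included the fitted slope is close to 1; once it is removed the slope drops to essentially 0, so that one point was driving the line.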
Are all influential points outliers?
no