1. Introduction to Linear Model Flashcards
What is a model?
Formal representation of a system
What are models represented as in statistics?
Models are represented as functions
e.g. height = months of age x 50 cm
Why are models represented as functions?
A function lets us write down a model/belief about how something works in the world
Allows precise specification of what is important (the arguments of the function) and how it occurs (the operations)
Precise specification = Predictions that can be tested against real-world data
If a model is a true representation then real-world data would closely match its predictions
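A minimal sketch of "model as a function", taking the toy rule from the example above (height = months of age x 50 cm) at face value. The observations are invented purely for illustration and are not real measurements; `predicted_height_cm` and `observations` are hypothetical names.

```python
def predicted_height_cm(months_of_age):
    """Toy model from the card above: height = months of age x 50 cm."""
    # Argument = what the model says matters; operation = how it matters.
    return months_of_age * 50


# Hypothetical (months of age, measured height in cm) pairs, made up for this sketch.
observations = [(1, 54), (6, 67), (12, 75)]

# Test the prediction against the data: measured minus predicted.
gaps = [measured - predicted_height_cm(months) for months, measured in observations]

for (months, measured), gap in zip(observations, gaps):
    print(f"age {months} months: predicted {predicted_height_cm(months)} cm, "
          f"measured {measured} cm, gap {gap} cm")
```

With these made-up numbers the gaps grow quickly with age, which is the kind of mismatch that would count against the toy rule being a true representation.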
What is the difference between a deterministic model and a statistical model?
Deterministic model = Describes an exact relationship (the same input always gives the same output)
Statistical model = Allows for case-by-case variability (individual data points differ from the value the model predicts)
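The contrast can be sketched by giving both models the same systematic part and adding random variability only to the statistical one. The coefficients (2.0 and 0.5), the value of sigma, and the function names are assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def deterministic(x):
    # Exact relationship: the same x always gives the same y.
    return 2.0 + 0.5 * x


def statistical(x, sigma=1.0):
    # Same systematic part, plus case-by-case variability drawn from N(0, sigma).
    return deterministic(x) + random.gauss(0.0, sigma)


xs = [1, 1, 1]  # three cases with an identical predictor value
det = [deterministic(x) for x in xs]
stat = [statistical(x) for x in xs]

print(det)   # three identical values
print(stat)  # three different values scattered around the deterministic one
```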
What is a linear model?
Estimating a model for a relationship
Linear model tries to explain variation in an outcome (Y axis/Dependent variable) using one or more predictors (X axis/Independent variables)
What is the basic linear model equation?
y_i = β0 + β1·x_i + e_i
y_i = Outcome variable
x_i = Predictor variable
β0 = Intercept
β1 = Slope
e_i = Residual
Subscript i = Each participant (PPT) has their own value of y, x and e
What is the residual?
Measure of how well the model fits each data point
Vertical distance (along the y axis) between the data point and the model line, i.e. observed y minus predicted y
Residual = Positive when the point is above the line and negative when it is below the line
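A small worked sketch of the model equation and its residuals. The intercept, slope, and data values are hypothetical numbers picked so the arithmetic is easy to follow.

```python
b0, b1 = 1.0, 2.0      # hypothetical intercept and slope
xs = [1.0, 2.0, 3.0]   # predictor values, one per participant
ys = [3.5, 4.0, 7.0]   # observed outcome values

# Model line: predicted y for each participant.
predicted = [b0 + b1 * x for x in xs]

# Residual = observed y minus predicted y (vertical distance to the line).
residuals = [y - p for y, p in zip(ys, predicted)]

for x, y, p, e in zip(xs, ys, predicted, residuals):
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x}: observed {y}, predicted {p}, residual {e} ({position} the line)")

# Each observation is recovered as intercept + slope * x + residual.
reconstructed = [b0 + b1 * x + e for x, e in zip(xs, residuals)]
```

Here the first point sits above the line (residual 0.5), the second below it (residual -1.0), and the third exactly on it (residual 0.0).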
What are the two types of outliers we can get?
Marginal - Outliers along one axis (unusual x value or unusual y value)
Joint - Outliers in the combination of x and y (don’t fit the pattern of the rest of the data)
What is the principle of least squares?
Process of obtaining a line of best fit by choosing the intercept and slope that give the minimum sum of squared errors (residuals). The fitted line is then used to predict the behaviour of the dependent variable
What does the principle of least squares do to our data?
Minimises the sum of squared residuals across all data points (not each residual individually)
Doing it across all data = Predicted values are, overall, as close as possible to the actual measured values of the outcome
What is the method of least squares?
Fit a line
Calculate residuals
Square residuals
Sum up squares
Line with the smallest sum of squares = line of best fit
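The steps above can be sketched as a function that scores any candidate line, alongside the standard closed-form solution for the simple linear model (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The data and the two candidate lines are made up for illustration, and the function names are hypothetical.

```python
def sum_of_squares(b0, b1, xs, ys):
    # Fit a line, calculate residuals, square them, sum them up.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in residuals)


def least_squares(xs, ys):
    # Closed-form estimates that minimise the sum of squared residuals.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

# Two candidate lines chosen by eye, then the least-squares line.
ss_candidate_a = sum_of_squares(0.0, 2.0, xs, ys)
ss_candidate_b = sum_of_squares(1.0, 1.5, xs, ys)
b0_hat, b1_hat = least_squares(xs, ys)
ss_best = sum_of_squares(b0_hat, b1_hat, xs, ys)

print(ss_candidate_a, ss_candidate_b, ss_best)
```

For this toy data the least-squares line has a smaller sum of squares than either hand-picked candidate, which is exactly what the principle asks for.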
How do we interpret the intercept of a simple linear model?
Expected value of y when x is 0
How do we interpret the slope of a simple linear model?
Expected change in y (in units of y) for a one-unit increase in x
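Both interpretations can be checked directly on a fitted line. The coefficients below are hypothetical values, not estimates from any real data.

```python
b0, b1 = 50.0, 2.0  # hypothetical intercept and slope


def predict(x):
    # Expected value of y at a given x under the simple linear model.
    return b0 + b1 * x


# Intercept: expected y when x is 0.
at_zero = predict(0)

# Slope: change in expected y for a one-unit increase in x (same anywhere on the line).
step_low = predict(1) - predict(0)
step_high = predict(11) - predict(10)

print(at_zero, step_low, step_high)
```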
What does e ~ N(0, sigma) mean?
Residuals (e) follow a normal distribution with a mean of 0
Sigma means standard deviation (estimated using model residuals)
Spread of the residuals should be the same at any point along the x axis (constant variance)
What does a large sigma suggest?
Data is more spread out/further away from the line
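A sketch of how sigma might be estimated from residuals. It assumes the usual convention for a simple linear model of dividing the sum of squared residuals by n - 2 (two estimated coefficients: intercept and slope). The two residual sets are invented for illustration, and `residual_sd` is a hypothetical helper.

```python
import math


def residual_sd(residuals, n_coefficients=2):
    # Square root of: sum of squared residuals / (n - number of estimated coefficients).
    ssr = sum(e ** 2 for e in residuals)
    return math.sqrt(ssr / (len(residuals) - n_coefficients))


# Made-up residuals: points hugging the line vs. points far from it.
tight = [0.1, -0.2, 0.1, 0.0, -0.1, 0.1]
spread = [1.0, -2.0, 1.0, 0.0, -1.0, 1.0]

sigma_tight = residual_sd(tight)
sigma_spread = residual_sd(spread)

print(sigma_tight, sigma_spread)
```

The second set has residuals ten times larger, so its estimated sigma is ten times larger too: a larger sigma goes with points lying further from the line.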