Lesson 4.1: Intro to Regression Flashcards
density curve in Excel
Week 3 Homework
- In Column ‘A’, generate a series of numbers from -3 to 3 with an increment of 0.1 (z-values).
- In Column ‘B’, compute the density (height of the curve) at each z-value using the NORM.S.DIST(z-value,FALSE) function.
- ‘TRUE’ = cumulative probability (area to the left of z), ‘FALSE’ = probability density at z.
- Select columns ‘A’ and ‘B’. Create a scatter plot with column ‘A’ on the x-axis and column ‘B’ on the y-axis.
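The difference between the FALSE and TRUE settings can be sketched in Python. This is only an illustration, assuming the standard library's `statistics.NormalDist` as a stand-in for NORM.S.DIST; the names `z_values`, `density`, and `cumulative` are made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# z-values from -3 to 3 in steps of 0.1 (61 points), like column A
z_values = [round(-3 + 0.1 * i, 1) for i in range(61)]

# Analogue of NORM.S.DIST(z, FALSE): height of the density curve at z
density = [std_normal.pdf(z) for z in z_values]

# Analogue of NORM.S.DIST(z, TRUE): cumulative probability P(Z <= z)
cumulative = [std_normal.cdf(z) for z in z_values]

# The density curve peaks at z = 0; half of the area lies to its left
peak = max(density)
print(z_values[density.index(peak)], round(peak, 4), cumulative[30])
```

Plotting `density` against `z_values` gives the bell-shaped curve; plotting `cumulative` instead gives an S-shaped curve rising from about 0 to about 1.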
density curve in R
Week 3 Homework
- x <- seq(-3, 3, 0.01)
- y <- dnorm(x)
- plot(x, y, type = "l")
linear regression
- an equation fitted to the given observations
- y = mx + b OR y = β1x + c (m or β1 is the slope; b or c is the intercept)
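As a rough illustration of fitting y = mx + b to observations, the sketch below uses the usual least-squares formulas for slope and intercept. The function name `fit_line` and the data points are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: forces the line through the point of means
    b = y_mean - m * x_mean
    return m, b

# Made-up observations that lie exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, b = fit_line(xs, ys)
print(m, b)
```

Because these points sit exactly on a line, the fit recovers slope 2 and intercept 1; with noisy data the line would only approximate the points.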
2 Variable Regression
- How a response variable “y” changes as the predictor (explanatory) variable “x” changes
Multiple Regression
- How a response variable “y” changes as the predictor (explanatory) variables “x1”, “x2”, … “xn” change
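A minimal sketch of multiple regression with two predictors, assuming a least-squares fit obtained by solving the normal equations directly. The helper names `solve` and `fit_multiple` and the data are hypothetical; a real analysis would normally use a statistics package.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_multiple(rows, ys):
    """Least-squares [c, b1, ..., bn] for y = c + b1*x1 + ... + bn*xn."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Made-up data generated from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
coeffs = fit_multiple(rows, ys)
print([round(c, 6) for c in coeffs])
```

Each coefficient describes how y changes when its predictor increases by one unit while the other predictor is held fixed.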
single-variable polynomial regression
y may curve: the fitted relationship is a straight line only at degree 1; terms of degree 2 or higher let it bend
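- $y = c + a_1 x$
- $y = c + a_1 x + a_2 x^2$
- $y = c + a_1 x + a_2 x^2 + a_3 x^3$
- $y = c + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$
- $y = c + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots + a_n x^n$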
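To see why a polynomial lets y curve, the sketch below evaluates a degree-1 and a degree-2 model at equally spaced x values and compares the step-to-step change in y. The coefficients and the helper name `poly_predict` are made up for illustration.

```python
def poly_predict(coeffs, x):
    """Evaluate y = c + a1*x + a2*x^2 + ... with coeffs = [c, a1, a2, ...]."""
    y = 0.0
    for a in reversed(coeffs):  # Horner's rule: highest power first
        y = y * x + a
    return y

line = [1.0, 2.0]            # y = 1 + 2x
quadratic = [1.0, 2.0, 0.5]  # y = 1 + 2x + 0.5x^2

xs = [0, 1, 2, 3]
line_ys = [poly_predict(line, x) for x in xs]
quad_ys = [poly_predict(quadratic, x) for x in xs]

# Change in y per unit step in x: constant for the line, growing for the quadratic
line_steps = [b - a for a, b in zip(line_ys, line_ys[1:])]
quad_steps = [b - a for a, b in zip(quad_ys, quad_ys[1:])]
print(line_steps, quad_steps)
```

The constant steps are what make the degree-1 model a straight line; the growing steps of the degree-2 model are the curvature.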
Ordinary Least Squares (OLS)
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
OLS residual
- vertical distance between a given point and the fitted line (observed y minus predicted y)
- OLS squares each of these distances and minimizes their sum
- doesn’t work well with high-dimensional data (many independent variables)
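A small sketch of residuals and their sum of squares, using made-up points. The line with slope 1.6 and intercept 0.5 is the least-squares line for these four points; the second line is an arbitrary alternative chosen for comparison. Function names are hypothetical.

```python
def residuals(xs, ys, m, b):
    """Observed y minus the y predicted by the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, m, b):
    """The quantity that OLS makes as small as possible."""
    return sum(r ** 2 for r in residuals(xs, ys, m, b))

# Made-up observations
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]

# Least-squares line for these points: slope 1.6, intercept 0.5
sse_ols = sum_squared_residuals(xs, ys, 1.6, 0.5)

# An arbitrary competing line: slope 1.5, intercept 1.0
sse_other = sum_squared_residuals(xs, ys, 1.5, 1.0)

print(round(sse_ols, 4), round(sse_other, 4))
```

Any line other than the least-squares one gives a larger sum of squared residuals on the same data, which is what "as small as possible" means in the OLS definition.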
machine-learning methods
2 types:
- supervised - has a response variable
- unsupervised - no response variable
supervised machine learning
Model development, Model verification, Model deployment
- split data into training and test data (e.g. 80/20)
- Model development: take training data, use a modeling technique (e.g. regression) to build a model (equation)
- Model verification: test the model on test data with known response values
- Subject to training and test/generalization errors
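The development and verification steps above can be sketched as follows. This is a minimal illustration on synthetic data, assuming a simple least-squares line as the modeling technique and mean squared error as the error measure; all names and numbers are made up.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean

def mean_squared_error(xs, ys, m, b):
    """Average squared gap between observed and predicted y."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(0)  # fixed seed so the split is repeatable

# Synthetic data (made up): y = 3x + 2 plus Gaussian noise
xs = [i / 10 for i in range(100)]
ys = [3 * x + 2 + random.gauss(0, 1) for x in xs]

# Shuffle the row indices, then split 80/20
indices = list(range(len(xs)))
random.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

# Model development: fit on the training data only
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
m, b = fit_line(train_x, train_y)

# Model verification: score on held-out data with known responses
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]
train_mse = mean_squared_error(train_x, train_y, m, b)
test_mse = mean_squared_error(test_x, test_y, m, b)
print(round(m, 2), round(b, 2), round(train_mse, 3), round(test_mse, 3))
```

`train_mse` is the training error and `test_mse` estimates the test/generalization error; the test rows play no part in fitting the line.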
supervised machine learning: generalization error
- Single split model assessment methodology
- The model is tested on a holdout sample
- Only the holdout sample accuracy is reported