Lesson 4.1: Intro to Regression Flashcards

1
Q

density curve in Excel

Week 3 Homework

A
  1. In column A, generate a series of numbers from -3 to 3 in increments of 0.1 (the z-values).
  2. In column B, compute the probability density at each z-value with the NORM.S.DIST(z-value, FALSE) function.
    - TRUE = cumulative probability (area under the curve to the left of z); FALSE = probability density (height of the curve at z).
  3. Select columns A and B. Create a scatter plot with column A on the x-axis and column B on the y-axis.
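For reference, the two quantities that NORM.S.DIST switches between can be written out. The symbols φ and Φ are the conventional names for the standard normal density and its cumulative distribution; they are not taken from the homework itself:

```latex
\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}, \qquad
\Phi(z) = \int_{-\infty}^{z} \varphi(t)\, dt
```

With FALSE the function returns φ(z), which is what the scatter plot traces; with TRUE it returns Φ(z).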
2
Q

density curve in R

Week 3 Homework

A
  1. x <- seq(-3, 3, 0.01)
  2. y <- dnorm(x)
  3. plot(x, y, type = "l")
3
Q

linear regression

A
  • an equation fitted to the given observations
  • $y = mx + b$ or, equivalently, $y = \beta_1 x + c$ (slope $m = \beta_1$, intercept $b = c$)

2 Variable Regression
- How a response variable “y” changes as the predictor (explanatory) variable “x” changes

Multiple Regression
- How a response variable “y” changes as the predictor (explanatory) variables “x1”, “x2”, … “xn” change
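Both cases can be sketched in R with lm(). This is a minimal illustration on simulated data; the variable names (x1, x2) and the true coefficients are assumptions made up for the example, not values from the lesson:

```r
# Simulated data: names and true coefficients are hypothetical
set.seed(1)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)

# Two-variable regression: one response, one predictor
fit_simple <- lm(y ~ x1)

# Multiple regression: one response, several predictors
fit_multi <- lm(y ~ x1 + x2)

coef(fit_simple)  # intercept and the slope on x1
coef(fit_multi)   # intercept and one slope per predictor
```

Each slope in the multiple regression describes how y changes with that predictor while the other predictors are held fixed.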

4
Q

single-variable polynomial regression

A

y may follow a curve: once higher powers of x are included, the relationship between x and y is no longer a straight line

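  • $y = c + a_1 x$
  • $y = c + a_1 x + a_2 x^2$
  • $y = c + a_1 x + a_2 x^2 + a_3 x^3$
  • $y = c + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$
  • $y = c + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots + a_n x^n$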
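A hedged R sketch of the first two forms above, fitted to simulated data. The true coefficients are invented for the example; only lm() and base R are used:

```r
# Simulated curved data; the true coefficients are hypothetical
set.seed(2)
x <- seq(-3, 3, 0.1)
y <- 1 + 0.5 * x + 2 * x^2 + rnorm(length(x), sd = 0.3)

# Degree 1: y = c + a1*x
fit_line <- lm(y ~ x)

# Degree 2: y = c + a1*x + a2*x^2
# I() makes x^2 plain arithmetic rather than formula syntax
fit_quad <- lm(y ~ x + I(x^2))

coef(fit_quad)                # estimates of c, a1, a2
summary(fit_line)$r.squared   # the straight line cannot follow the curve
summary(fit_quad)$r.squared   # the quadratic term captures it
```

Higher degrees follow the same pattern by adding I(x^3), I(x^4), and so on to the formula.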
5
Q

Ordinary Least Squares
(OLS)

A

The least squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
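To make the definition concrete, here is a small R sketch for the one-predictor case. The data and the helper name sse are assumptions for illustration; the slope and intercept formulas are the standard closed-form least squares solution:

```r
# Hypothetical data for illustration
set.seed(3)
x <- runif(50, 0, 10)
y <- 4 + 1.2 * x + rnorm(50)

# Sum of squared vertical distances for a candidate line
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

# Closed-form least squares slope and intercept for one predictor
m_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
c_hat <- mean(y) - m_hat * mean(x)

# lm() arrives at the same line
all.equal(unname(coef(lm(y ~ x))), c(c_hat, m_hat))

# Nudging the line in any direction increases the sum of squares
sse(c_hat, m_hat) < sse(c_hat + 0.1, m_hat)
sse(c_hat, m_hat) < sse(c_hat, m_hat + 0.05)
```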

6
Q

OLS

residual

A
  • residual: the vertical distance between a data point and the fitted line (observed y minus predicted y)
  • OLS squares each residual and minimizes the sum of those squares
  • OLS doesn’t work well with high-dimensional data (many independent variables)
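A short R sketch of how residuals can be computed, assuming made-up data and a simple one-predictor fit:

```r
# Hypothetical data for illustration
set.seed(4)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)

fit <- lm(y ~ x)

# Residual = observed y minus the fitted (predicted) y
res <- y - fitted(fit)

# residuals() returns the same values
all.equal(unname(residuals(fit)), unname(res))

# The quantity OLS minimizes: the sum of squared residuals
sum(res^2)
```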
7
Q

machine-learning methods

2

A
  1. supervised - the data include a response variable
  2. unsupervised - the data have no response variable
8
Q

supervised machine learning

A

Model development, Model verification, Model deployment

  1. Split the data into training and test data (e.g. 80/20)
  2. Model development: take the training data and use a modeling technique (e.g. regression) to build the model (an equation)
  3. Model verification: test the model on the test data, where the response value is known
  • Subject to training and test/generalization errors
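The split, development, and verification steps can be sketched in R as follows. The data set, the 80/20 ratio, and the choice of root mean squared error as the error measure are assumptions for the example, not a prescribed workflow:

```r
# Hypothetical data set for illustration
set.seed(5)
n   <- 200
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 1 + 2 * dat$x + rnorm(n)

# 1. Split: 80% of rows for training, the remaining 20% for testing
train_idx <- sample(n, size = round(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# 2. Model development: fit the regression on the training data only
fit <- lm(y ~ x, data = train)

# 3. Model verification: predict on test rows, whose response is known
pred <- predict(fit, newdata = test)

# Root mean squared error on each set
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
train_rmse <- rmse(train$y, fitted(fit))
test_rmse  <- rmse(test$y, pred)   # estimate of the generalization error
c(train = train_rmse, test = test_rmse)
```

The test rows play no part in fitting, so the error measured on them approximates how the model behaves on new data.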
9
Q

supervised machine learning

generalization error

A
  • Estimated with a single-split model assessment methodology
  • The model is tested on a holdout sample
  • Only the holdout sample accuracy is reported