Prediction Flashcards

1
Q

Two problem categories of prediction

A
  • Regression -> Linear regression
  • Classification -> Logistic regression
2
Q

What is Linear Regression?

A

We have seen a number of cases in which a scatter plot displays a correlation between variables.
We use the linear regression model to formalize this correlation.
It is a method for modeling the relationship between a dependent variable and one or more independent variables.

3
Q

Pearson’s correlation coefficient

A

Denoted by r, it is a measure of the linear correlation between two variables (columns).
Specifically, the r-value measures the strength and direction of the correlation.

r has the following properties:
-1 <= r <= 1
The further r is from 0, the stronger the correlation.
The slope of the regression line has the same sign as r.
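
As a rough illustration of these properties, r can be computed directly from its definition. This is a minimal sketch using only the standard library; the function name pearson_r and the sample data are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # How x and y vary together around their means
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # How each variable varies around its own mean
    var_x = sum((x - x_mean) ** 2 for x in xs)
    var_y = sum((y - y_mean) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]        # made-up data
scores = [52, 55, 61, 64, 70]  # made-up data
r = pearson_r(hours, scores)   # positive and close to 1: strong upward trend
```

Reversing the trend (for example, ys that fall as xs rise) flips the sign of r, which matches the sign of the regression slope.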

4
Q

Method of least squares

A

A fundamental concept of linear regression is the method of least squares.
This method finds the mathematically “best” line for the data.

Minimize:
\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
where y_i is the observed value and \hat{y}_i is the value predicted by the line.

5
Q

Root Mean Squared Error

A

We define the best line to be the line with the lowest Root Mean Squared Error (RMSE).
To measure this:
- For each observed y-value, subtract the y-value predicted by the line. This difference is the error term.
- Square each difference.
- Sum all the squared differences.
- Divide by the sample size.
- Take the square root.
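
The steps above can be sketched as a short function. The name rmse and the sample values are assumptions made for illustration only.

```python
import math

def rmse(observed, predicted):
    """Root Mean Squared Error between observed and predicted y-values."""
    # Error term for each point, squared
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    # Mean of the squared errors, then the square root
    return math.sqrt(sum(squared_errors) / len(squared_errors))

observed = [3.0, 5.0, 7.0]   # made-up observed values
predicted = [2.0, 5.0, 9.0]  # made-up predictions from some line
error = rmse(observed, predicted)
```

Because the square root and the division by the sample size do not change which line scores lowest, the line with the lowest RMSE is also the line with the lowest sum of squared errors.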

6
Q

How to find the slope and intercept of the regression line

A

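# X and Y are equal-length lists of observed values
X_mean = sum(X) / len(X)
Y_mean = sum(Y) / len(Y)
num = 0
den = 0
for i in range(len(X)):
    num += (X[i] - X_mean) * (Y[i] - Y_mean)
    den += (X[i] - X_mean) ** 2
beta_1 = num / den
beta_0 = Y_mean - beta_1 * X_mean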

beta_1 = slope
beta_0 = intercept
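
As a quick check of this calculation, it can be wrapped in a function and run on points that lie exactly on a known line. The function name fit_line and the data are hypothetical, chosen so that the expected slope and intercept are easy to verify by hand.

```python
def fit_line(X, Y):
    """Return (beta_0, beta_1): intercept and slope of the least-squares line."""
    X_mean = sum(X) / len(X)
    Y_mean = sum(Y) / len(Y)
    num = sum((x - X_mean) * (y - Y_mean) for x, y in zip(X, Y))
    den = sum((x - X_mean) ** 2 for x in X)
    beta_1 = num / den                 # slope
    beta_0 = Y_mean - beta_1 * X_mean  # intercept
    return beta_0, beta_1

X = [1, 2, 3, 4]
Y = [3, 5, 7, 9]  # made-up points lying exactly on y = 2x + 1
beta_0, beta_1 = fit_line(X, Y)
prediction = beta_0 + beta_1 * 5  # predicted y at x = 5
```

For these points the function recovers a slope of 2 and an intercept of 1, so the prediction at x = 5 is 11.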

7
Q

The logistic model

A

To model proportions, we use the logistic model, also known as the sigmoid function.
The most basic logistic model has the form:
f(x) = 1 / (1 + e^(-x))

Its curve is S-shaped and is bounded between 0 and 1 on the y-axis.
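
A minimal sketch of this function, using only the standard library, shows the bounded S-shape numerically. The name sigmoid and the sample inputs are chosen for illustration.

```python
import math

def sigmoid(x):
    """Basic logistic function: maps any real x into the interval (0, 1)."""
    # Note: math.exp can overflow for very large negative x; fine for a sketch
    return 1.0 / (1.0 + math.exp(-x))

# Outputs rise from near 0, pass through 0.5 at x = 0, and level off near 1
values = [sigmoid(x) for x in (-6, -2, 0, 2, 6)]
```

The curve is symmetric about the point (0, 0.5), so sigmoid(x) + sigmoid(-x) = 1 for any x.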

8
Q

Fitting the logistic model

A

The general logistic model has the form:
p(x) = 1 / (1 + e^(-(B_0 + B_1*x)))
Finding the values of B_0 and B_1 is not as easy as it is for the linear model.
It requires multivariate calculus.

Using scikit-learn, we can read the values from a fitted model ->
intercept = model.intercept_
coefficient = model.coef_
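
To give a feel for why the fit is iterative, here is a minimal sketch that estimates B_0 and B_1 by plain gradient descent on the average log loss. This is not a description of how scikit-learn's solvers work internally; gradient descent is swapped in because it is simple to read. The function name fit_logistic, the learning rate, the step count, and the data are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate (B_0, B_1) by gradient descent on the average log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the average log loss with respect to b0 and b1
        grad_b0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        grad_b1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        # Step against the gradient
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

# Made-up data: hours studied and pass (1) / fail (0), with some overlap
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 1)   # estimated pass probability after 1 hour
p_high = sigmoid(b0 + b1 * 8)  # estimated pass probability after 8 hours
```

The two partial derivatives are where the multivariate calculus comes in: each step nudges both parameters at once until the loss stops improving.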
