Prediction Flashcards
Two problem categories of prediction
- Regression -> Linear regression
- Classification -> Logistic regression
What is Linear Regression?
We have seen a number of cases in which a scatter plot displays a correlation between variables.
We use the linear regression model to formalize this correlation.
It's a method used to model the relationship between a dependent variable and one or more independent variables.
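In the simple one-variable case, the model is just a line: the predicted y is the intercept plus the slope times x. A minimal sketch, with hypothetical coefficient values chosen purely for illustration:

# Simple linear regression model: predicted y = intercept + slope * x
def predict(x, beta_0, beta_1):
    return beta_0 + beta_1 * x

# Hypothetical coefficients, just to show the form of the model
print(predict(3.0, beta_0=0.5, beta_1=2.0))  # 6.5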
Pearson’s correlation coefficient
Denoted by r, it is a measure of correlation between two variables (columns)
Specifically, the r-value measures the strength and direction of the correlation (see the sketch after the list below)
r has the following properties:
- -1 <= r <= 1
- The further r is from 0, the stronger the correlation.
- The slope of the regression line has the same sign as the r-value.
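A minimal sketch of computing r with numpy, using made-up data:

import numpy as np

# Hypothetical example data (not from the notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]
print(r)  # close to 1 -> strong positive correlation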
Method of least squares
A fundamental concept of linear regression is the method of least squares.
This method finds the mathematically “best” line for the data.
Minimize:
sum_{i=1}^{n} (y_i - y_hat_i)^2
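A small sketch of what is being minimized: the sum of squared errors for a candidate line, with made-up data and guessed values for the intercept b0 and slope b1:

# Sum of squared errors for a candidate line y_hat = b0 + b1 * x
def sum_squared_error(x, y, b0, b1):
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(sum_squared_error(x, y, 0.0, 2.0))     # an arbitrary guess
print(sum_squared_error(x, y, 0.14, 1.96))   # roughly the least-squares line for this data: smaller value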
Root Mean Squared Error
We define the best line to be the one with the lowest Root Mean Squared Error (RMSE)
To measure this:
- For each observed y value, subtract the y-value predicted by the line; this difference is the error term.
- Square each error term.
- Sum all the squared errors.
- Divide by the sample size.
- Take the square root (see the sketch after this list).
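A minimal Python sketch of these steps, assuming y holds the observed values and y_hat the line's predictions (both names are hypothetical):

import math

def rmse(y, y_hat):
    # Squared error for each observation
    squared_errors = [(yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)]
    # Mean of the squared errors, then the square root
    return math.sqrt(sum(squared_errors) / len(y))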
How to find the slope and intercept of the regression line
num, den = 0, 0  # X_mean and Y_mean are the means of X and Y
for i in range(len(X)):
    num += (X[i] - X_mean) * (Y[i] - Y_mean)
    den += (X[i] - X_mean) ** 2
beta_1 = num / den
beta_0 = Y_mean - beta_1 * X_mean
beta_1 = slope
beta_0 = intercept
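A quick cross-check sketch with made-up data: numpy's polyfit with degree 1 returns the same least-squares slope and intercept, which can then be used to predict:

import numpy as np

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

# np.polyfit(X, Y, 1) returns the least-squares slope and intercept
beta_1, beta_0 = np.polyfit(X, Y, 1)
print(beta_1, beta_0)  # approximately 1.96 and 0.14 for this data

# Predict y for a new x value with the fitted line
print(beta_0 + beta_1 * 6.0)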
The logistic model
To model proportions, we use the logistic model, also known as the sigmoid function
The most basic logistic model has the form:
f(x) = 1 / (1 + e^-(x))
This model has an S-shape and is bounded between 0 and 1 on the y-axis.
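A minimal sketch of the basic sigmoid:

import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^-(x)); the output always lies strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-5), sigmoid(0), sigmoid(5))  # ~0.007, 0.5, ~0.993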
Fitting the logistic model
General logistic model has the form:
p(x) = 1/(1 + e^-(B_0 + B_1*x))
Finding these values is not as easy as it is for the linear model: there is no closed-form solution, so fitting requires multivariate calculus and iterative optimization.
Using scikit-learn we can get the fitted values ->
intercept = model.intercept_
coefficient = model.coef_
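A minimal end-to-end sketch with scikit-learn, using made-up 0/1 data (scikit-learn expects a 2-D feature array):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: x values and 0/1 outcomes
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

intercept = model.intercept_    # B_0
coefficient = model.coef_       # B_1
print(intercept, coefficient)

# Predicted probability p(x) for a new x value
print(model.predict_proba([[3.5]])[:, 1])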