sklearn Flashcards
What is an independent variable called in sklearn?
Feature
What is the dependent variable called?
Output, Target
Find the R-squared in sklearn
reg.score(x_matrix, y)
Notice that x has been reshaped into a 2D array (e.g. x_matrix = x.reshape(-1, 1) for a single feature), because sklearn expects a 2D feature matrix
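A minimal sketch of the workflow, assuming a single feature x and target y as NumPy arrays (data and variable names are illustrative):
import numpy as np
from sklearn.linear_model import LinearRegression
# illustrative data; x starts out as a 1D array
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
x_matrix = x.reshape(-1, 1)   # sklearn expects a 2D feature matrix
reg = LinearRegression()
reg.fit(x_matrix, y)
reg.score(x_matrix, y)   # R-squared of the fitted model (0.6 for this data)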
Find the coefficients in sklearn
reg.coef_
Result is an array with one coefficient per feature
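Continuing the sketch from the R-squared card (same reg fitted on x_matrix, y):
reg.coef_   # array([0.6]) for that illustrative data, one entry per feature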
Find the intercept in sklearn
reg.intercept_
–> Returns a single float
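Continuing the same sketch:
reg.intercept_   # 2.2 for that illustrative data, a single float rather than an array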
Making predictions in sklearn
reg.predict(new_data)
new_data holds the feature values you want predictions for, as a 2D array like the training features; the result is an array rather than a float, because the predict method can take more than one observation at a time
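Continuing the same sketch (new_data is an illustrative name):
new_data = np.array([[6], [7]])   # 2D: one column per feature, one row per observation
reg.predict(new_data)   # array([5.8, 6.4]) for that illustrative data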
What is the ML term for an observation?
Sample
Each row in the dataset is a sample
How to calculate Adjusted R-squared in Python?
Set a cell to Markdown and write the formula there for reference: Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
Then put the formula in Python:
r2 = reg.score(x,y)
n = x.shape[0]
p = x.shape[1]
Notice that x does not need to be reshaped because it already contains 2 variables (it is already a 2D array). Then plug r2, n and p into the formula.
Remember: Adjusted R-squared builds on R-squared and penalizes it for the number of variables included in the model
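A minimal sketch, assuming x is a 2D feature array with two columns and y is the target (data and names are illustrative):
import numpy as np
from sklearn.linear_model import LinearRegression
# illustrative data: two features, five samples
x = np.array([[1, 10], [2, 9], [3, 7], [4, 8], [5, 4]])
y = np.array([2, 4, 5, 4, 5])
reg = LinearRegression()
reg.fit(x, y)
r2 = reg.score(x, y)   # plain R-squared
n = x.shape[0]   # number of observations (rows / samples)
p = x.shape[1]   # number of features (columns)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)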
What is the advantage of feature selection?
Simplifies models
Improves computational speed and prevents a series of unwanted issues (such as overfitting and multicollinearity) that arise from having too many features
What can you do with the F-statistic?
Test whether the model has merit
The null hypothesis is that all betas are equal to 0 –> H0: β1 = β2 = β3 = 0
If all betas are 0, then the model is useless
What is an F-Statistic?
Similar to the t-statistic from a t-test
A t-test tells you whether a single variable is statistically significant
An F-test tells you whether a group of variables is jointly significant
Based on the null hypothesis that all betas are equal to 0 –> H0: β1 = β2 = β3 = 0
How to interpret the P-value in the results table?
A low P-value (< 0.05) means that the coefficient is likely different from zero.
A high P-value (> 0.05) means that we cannot conclude that the explanatory variable affects the dependent variable (here: whether Average_Pulse affects Calorie_Burnage).
A high P-value is also called an insignificant P-value.
How is the P-value denoted in the results table?
P>|t|
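The results table referred to here is the statsmodels OLS summary; a minimal sketch with illustrative data:
import numpy as np
import statsmodels.api as sm
# illustrative data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
x_const = sm.add_constant(x)   # add the intercept term
results = sm.OLS(y, x_const).fit()
print(results.summary())   # table includes the coefficients, P>|t|, F-statistic and Prob (F-statistic)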
How to interpret the F-statistic and the change in its P-value?
Compare the F-statistic with and without a variable –> A lower F-statistic means the model is closer to being non-significant
Prob(F-statistic) can still be significant, but watch how it changes –> If it is higher with the variable included, drop that variable
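A minimal sketch of the comparison, with an illustrative DataFrame df containing a target y, a variable x1 and a candidate variable x2:
import pandas as pd
import statsmodels.api as sm
# illustrative data
df = pd.DataFrame({'y': [2, 4, 5, 4, 5],
                   'x1': [1, 2, 3, 4, 5],
                   'x2': [7, 6, 8, 5, 9]})
model_1 = sm.OLS(df['y'], sm.add_constant(df[['x1']])).fit()   # without the candidate variable
model_2 = sm.OLS(df['y'], sm.add_constant(df[['x1', 'x2']])).fit()   # with the candidate variable
print(model_1.fvalue, model_1.f_pvalue)   # F-statistic and Prob(F-statistic) without x2
print(model_2.fvalue, model_2.f_pvalue)   # the same with x2 included
# if the F-statistic drops and Prob(F-statistic) rises when x2 is added, x2 adds little and can be dropped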
What will this return?
from sklearn.feature_selection import f_regression
f_regression(x,y)
2 arrays:
1 with the F-statistics (one per feature, each from a simple regression of y on that feature alone)
1 with the corresponding p-values –> the Prob(F-statistic) of each of those simple regressions
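A minimal sketch, assuming x is a 2D feature array and y the target (data and names are illustrative):
import numpy as np
from sklearn.feature_selection import f_regression
# illustrative data: two features, five samples
x = np.array([[1, 10], [2, 9], [3, 7], [4, 8], [5, 4]])
y = np.array([2, 4, 5, 4, 5])
f_statistics, p_values = f_regression(x, y)   # one F-statistic and one p-value per feature
print(f_statistics)
print(p_values)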