Regression Flashcards
What is Linear Regression?
Linear Regression is a statistical method used to model the relationship between one or more independent variables (predictors) and a dependent variable (outcome).
What is the assumption in Linear Regression?
The core assumption behind linear regression is that there is a linear relationship between the independent variables and the dependent variable, i.e. a change in a predictor produces a proportional change in the outcome.
What is the mathematical equation behind Linear Regression?
One predictor: y = mx + b
Multiple predictors: y = b0 + b1x1 + b2x2 + … + bnxn
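As a small illustration, the multiple-predictor equation can be evaluated in a few lines of Python. This is only a sketch: the function name predict and the coefficient values are made up for the example.

```python
def predict(coefficients, features):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn.

    coefficients: [b0, b1, ..., bn] (illustrative values, not fitted ones)
    features:     [x1, ..., xn]
    """
    intercept, *slopes = coefficients
    return intercept + sum(b * x for b, x in zip(slopes, features))


# Hypothetical model: y = 1.0 + 2.0*x1 + 0.5*x2, evaluated at x1 = 3, x2 = 4
y_hat = predict([1.0, 2.0, 0.5], [3.0, 4.0])
print(y_hat)
```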
What is the value that we try to find in Linear Regression?
We try to find the coefficients that best fit the data.
How do we find the line in Linear Regression?
Using least squares: we choose the line that minimizes the sum of squared residuals.
What is Least Squares?
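Least squares picks the line with the smallest sum of squared residuals, where a residual is the difference between an observed value and the value the line predicts.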
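A minimal sketch of least squares for a single predictor, using the standard closed-form slope and intercept. The data points and the function names fit_line and sum_squared_residuals are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed - predicted)^2 over all data points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data that roughly follows y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
# Nudging the slope away from the fitted value should not lower the sum.
worse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best, worse)
```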
What are the methods to compare models in Linear Regression?
R-squared (R²) and the p-value.
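R-squared can be computed directly from the residuals, as in the sketch below. It assumes the usual definition, 1 - (sum of squared residuals) / (sum of squared differences from the mean); the p-value is not computed here, and the name r_squared and the sample values are illustrative.

```python
def r_squared(ys, predictions):
    """Fraction of the variation in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


ys = [1.0, 2.0, 3.0]
perfect = r_squared(ys, [1.0, 2.0, 3.0])    # predictions match the data exactly
mean_only = r_squared(ys, [2.0, 2.0, 2.0])  # always predicting the mean
print(perfect, mean_only)
```

A model that predicts every point exactly scores 1, while a model that only ever predicts the mean scores 0.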
What does Logistic Regression predict?
Logistic Regression (classification) gives the probability that a data point belongs to a class (e.g. Obese or Not Obese) based on one or more predictors (Size, Size and Genotype, etc.).
That probability is then used to classify the data point.
How does logistic regression find the best fit?
It converts the y axis from probability to log(odds), which turns the squiggle into a straight line that can be used to find the best fit.
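The conversion between probability and log(odds) can be sketched as follows. The function names log_odds and probability are chosen for this example only.

```python
import math


def log_odds(p):
    """Probability (strictly between 0 and 1) -> log(odds) = log(p / (1 - p))."""
    return math.log(p / (1 - p))


def probability(log_odds_value):
    """log(odds) -> probability, the inverse of log_odds."""
    return 1 / (1 + math.exp(-log_odds_value))


middle = log_odds(0.5)                  # even odds sit at the centre of the axis
round_trip = probability(log_odds(0.8))
print(middle, round_trip)

# Probabilities closer and closer to 1 map to ever larger log(odds) values.
for p in (0.9, 0.999, 0.999999):
    print(p, log_odds(p))
```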
How do we find the best fit line in Logistic Regression?
Using Maximum Likelihood
Why do we use maximum likelihood instead of Least Squares in Logistic Regression?
When the y axis is transformed from probability to log(odds), the class values 0 and 1 are pushed to negative and positive infinity. The residuals would therefore be infinite, so least squares cannot be used.
What are the steps for finding the best-fitting squiggle in Logistic Regression?
1) Transform the y axis from probability to log(odds).
2) Draw a candidate straight line, b0 + b1x1, on the log(odds) graph.
3) Project the data points, which sit at positive and negative infinity, onto the candidate line.
4) Read off the log(odds) value on the candidate line for each data point.
5) Transform each log(odds) value back into a probability p.
6) Take the likelihood of each data point given its actual class: p if it belongs to the class, 1 - p if it does not.
7) Combine the likelihoods: Product(likelihoods of the points in one class) * Product(likelihoods of the points in the other class)
OR
Sum(log(all likelihoods))
8) This gives the likelihood (or log-likelihood) of the candidate line.
9) Change the line and repeat to find the line that maximizes the likelihood.
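The steps above can be sketched in Python for a single predictor. This is a toy illustration under stated assumptions: the data are made up, and the coarse grid search over b0 and b1 stands in for the iterative optimizers that real libraries use.

```python
import math


def log_likelihood(b0, b1, xs, labels):
    """Sum of log-likelihoods of the data under the candidate line b0 + b1*x."""
    total = 0.0
    for x, label in zip(xs, labels):
        line_value = b0 + b1 * x              # steps 2-4: log(odds) on the candidate line
        p = 1 / (1 + math.exp(-line_value))   # step 5: back to a probability
        likelihood = p if label == 1 else 1 - p   # step 6: likelihood given the actual class
        total += math.log(likelihood)         # step 7: sum of the logs
    return total                              # step 8: score for this candidate line


# Made-up data: larger x tends to mean class 1, with some overlap
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 1, 0, 1, 1]

# Step 9: try many candidate lines and keep the one with the highest score.
candidates = [(b0 / 10, b1 / 10) for b0 in range(-60, 1) for b1 in range(0, 31)]
best_b0, best_b1 = max(
    candidates, key=lambda c: log_likelihood(c[0], c[1], xs, labels)
)
print(best_b0, best_b1, log_likelihood(best_b0, best_b1, xs, labels))
```

Summing logs rather than multiplying raw likelihoods gives the same best line and avoids very small products when there are many data points.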
What is the equation for Logistic Regression?