Book - Chapter 6 Analytical Theory Regression Flashcards
Linear regression is a useful tool for answering what question
What is a person's expected income
Logistic regression is a popular method for answering what question
What is the probability that an applicant will default on the loan
In a linear regression what is the output
Continuous variable
In linear regression what is the input
Continuous or discrete variables
What is a key assumption of linear regression
That the relationship between an input variable and an output variable is linear
What is a linear regression model
A probabilistic one that accounts for the randomness that can affect any particular outcome
Where would you use linear regression
Real estate, demand forecasting, and medicine, for example analysing the effect of a proposed radiation treatment on reducing tumour sizes
What is the model outcome of linear regression
A set of estimated coefficients that indicate the relative impact of each input variable
In linear regression what is a common technique to estimate the parameters
Ordinary least squares (OLS)
What is the goal of OLS
Find the line that best approximates the relationship between the outcome variable and the input variables
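As a minimal sketch of what OLS does with a single input variable, the closed-form estimates below choose the intercept and slope that minimise the sum of squared residuals. The function name ols_fit and the education/income figures are hypothetical and only for illustration.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Covariance-like and variance-like sums around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years of education vs. income in thousands
education = [10, 12, 14, 16, 18]
income = [25.0, 31.0, 38.0, 44.0, 52.0]

intercept, slope = ols_fit(education, income)
print(f"income = {intercept:.2f} + {slope:.2f} * education")
```

The slope is the estimated coefficient for the input variable: the expected change in the output for a one-unit change in the input.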
What is a categorical variable
A variable that takes one of a fixed set of labels rather than a numeric value, for example female or male
In regression what is the proper way to implement a categorical variable that can take on M different values
M - 1 binary variables
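A small sketch of the M - 1 encoding: one level is chosen as the reference and gets all zeros, and each remaining level gets its own 0/1 column. The helper dummy_encode and the region labels are hypothetical.

```python
def dummy_encode(values, reference=None):
    """Encode a categorical variable with M levels as M - 1 binary columns."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first level alphabetically is the reference
    kept = [level for level in levels if level != reference]
    # Each row has a 1 in the column matching its level; the reference level is all zeros
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows

regions = ["north", "south", "west", "south"]
columns, encoded = dummy_encode(regions, reference="north")
print(columns)  # the M - 1 column names
print(encoded)  # one row of 0/1 flags per observation
```

Dropping one level avoids redundancy: with an intercept in the model, the reference level is already represented by all of the binary variables being zero.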
What confidence level is commonly used for confidence intervals in linear regression
95%
In linear regression what are confidence intervals used for
To draw inferences on the population's expected outcome, whereas prediction intervals are used to draw inferences on the next possible outcome
What is a major assumption in linear regression modelling
That the relationship between the input variables and the output variable is linear
How would you evaluate a relationship between the input variable and the output variable
Plot the output variable against each input variable
What are common transformations in linear regression
Taking square roots or the logarithm of the variables
Creating a new input variable, such as age squared, and adding it to the linear regression model to fit a quadratic relationship between an input variable and the output
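The transformations above can be sketched as plain list operations. The ages and incomes are made-up values, and the variable names are hypothetical.

```python
import math

# Hypothetical observations
ages = [25, 40, 60]
incomes = [30000.0, 52000.0, 61000.0]

# Transform the output variable to reduce skew
log_income = [math.log(value) for value in incomes]
sqrt_income = [math.sqrt(value) for value in incomes]

# Add age squared as a new input so a linear model can fit a quadratic relationship
age_squared = [age ** 2 for age in ages]

# Each row: intercept term, age, age squared
design_rows = [[1.0, age, age_sq] for age, age_sq in zip(ages, age_squared)]
print(design_rows)
```

The model stays linear in its coefficients even though the fitted curve is quadratic in age.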
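What is N-fold cross-validation
A model evaluation technique that extends the common practice of randomly splitting the entire dataset into a training set and a testing set; it is useful when there is not enough data to set aside a separate holdout set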
What occurs in N-fold cross-validation
The entire dataset is randomly split into N datasets of approximately equal size
A model is trained against N - 1 of these datasets and tested against the remaining dataset. A measure of the model error is obtained.
This process is repeated a total of N times across the various combinations of N datasets taken N - 1 at a time
The observed N model errors are averaged over the N folds
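The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: the model is a one-input least-squares line, the error measure is mean squared error, and the names fit_line and n_fold_cv and the data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for one input variable."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def n_fold_cv(xs, ys, n_folds=5, seed=0):
    """Return (average error, per-fold errors) using mean squared error."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)                   # random split
    folds = [indices[i::n_folds] for i in range(n_folds)]  # N roughly equal folds
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]      # the other N - 1 folds
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out]
        errors.append(sum(squared) / len(squared))         # error on the held-out fold
    return sum(errors) / len(errors), errors               # average over the N folds

# Hypothetical data: a line with a small alternating disturbance
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
average_error, fold_errors = n_fold_cv(xs, ys, n_folds=5)
print(average_error, fold_errors)
```

Every observation is used for testing exactly once and for training N - 1 times, which is why the averaged error is a steadier estimate than a single train/test split.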
What are outliers
Observations that lie far from the rest of the data; they can result from bad data collection, data processing errors, or an actual rare occurrence
What is the input of logistic regression
Continuous or discrete variables
What is the output of logistic regression
Coefficients that indicate the impact of each driver
What are the use cases for logistic regression
Medical: measuring the likelihood of a patient's response to a treatment
Finance: determining the probability that an applicant will default on the loan
Marketing: determining the probability that a customer will switch carriers
Engineering: determining the probability that a mechanical part will experience a malfunction
In logistic regression as the value of y increases what happens to the probability
The probability of the outcome occurring increases
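A short sketch of why the probability rises: assuming the standard logistic function, f(y) = e^y / (1 + e^y), where y is the linear combination of the input variables, the output always lies between 0 and 1 and grows as y grows. The function name logistic and the sample values of y are chosen only for illustration.

```python
import math

def logistic(y):
    """Standard logistic function, algebraically equal to e^y / (1 + e^y)."""
    return 1.0 / (1.0 + math.exp(-y))

# Evaluate the probability at increasing values of y
sample_ys = [-4, -2, 0, 2, 4]
probabilities = [logistic(y) for y in sample_ys]
for y, p in zip(sample_ys, probabilities):
    print(f"y = {y:+d}  probability = {p:.3f}")
```

At y = 0 the probability is exactly 0.5; large negative values of y push it toward 0 and large positive values push it toward 1.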