Book - Chapter 6 Analytical Theory Regression Flashcards
Linear regression is a useful tool for answering what question
What is a person's expected income
Logistic regression is a popular method for answering what question
What is the probability that an applicant will default on the loan
In a linear regression what is the output
Continuous variable
In linear regression what is the input
Continuous or discrete variables
What is a key assumption of linear regression
That the relationship between an input variable and an output variable is linear
What is a linear regression model
A probabilistic one that accounts for the randomness that can affect any particular outcome
Where would you use linear regression
Real estate, demand forecasting, and medical applications, for example analysing the effect of a proposed radiation treatment on reducing tumour sizes
What is the model outcome of linear regression
A set of estimated coefficients that indicate the relative impact of each input variable
In linear regression what is a common technique to estimate the parameters
Ordinary least squares (OLS)
What is the goal of OLS
Find the line that best approximates the relationship between the outcome variable and the input variables
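As a minimal sketch of OLS with a single input variable, the closed-form estimates below minimise the sum of squared residuals. The function name `ols_fit` and the education/income figures are made up for illustration.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: years of education vs. income in thousands
education = [10, 12, 14, 16, 18]
income = [25.0, 31.0, 38.0, 45.0, 50.0]

intercept, slope = ols_fit(education, income)
print(f"income = {intercept:.2f} + {slope:.2f} * education")
```

The slope is the estimated coefficient: the expected change in the outcome for a one-unit change in the input.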
What is a categorical variable
A variable that takes one of a fixed set of discrete values, for example female or male
In regression what is the proper way to implement a categorical variable that can take on M different values
M - 1 binary variables
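A sketch of this encoding, assuming the first listed level is treated as the reference case (all zeros). The helper `dummy_code` and the state codes are hypothetical.

```python
def dummy_code(values, levels):
    """Encode a categorical column with M levels as M - 1 binary columns.

    The first entry of `levels` is the reference level and is encoded
    as all zeros; each remaining level gets its own 0/1 column.
    """
    non_reference = levels[1:]
    return [[1 if v == level else 0 for level in non_reference] for v in values]


states = ["CA", "NY", "TX", "CA"]
encoded = dummy_code(states, levels=["CA", "NY", "TX"])
print(encoded)  # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Using M binary variables instead would make one column a linear combination of the others and the intercept, which is why one level is left out.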
What confidence level is commonly used for intervals in linear regression
95%
In linear regression what are confidence intervals used for
To draw inferences on the population's expected outcome; prediction intervals are used to draw inferences on the next possible outcome
What is a major assumption in linear regression modelling
That the relationship between the input variables and the output variable is linear
How would you evaluate a relationship between the input variable and the output variable
Plot the output variable against each input variable
What are common transformations in linear regression
Taking square roots or the logarithm of the variables
Creating a new input variable, such as age squared, and adding it to the linear regression model to fit a quadratic relationship between an input variable and the output
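The transformations above might be applied as follows before fitting. The records and field names are invented for illustration.

```python
import math

# Hypothetical records; the field names are illustrative only
rows = [
    {"age": 25, "income": 32000.0},
    {"age": 40, "income": 58000.0},
    {"age": 60, "income": 51000.0},
]

for row in rows:
    row["age_squared"] = row["age"] ** 2           # allows a quadratic fit in age
    row["log_income"] = math.log(row["income"])    # natural log of the outcome
    row["sqrt_income"] = math.sqrt(row["income"])  # square-root alternative

print(rows[0])
```

The model remains linear in its coefficients even though it now describes a curved relationship in age.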
What is N-fold cross-validation
A technique for comparing fitted models when the dataset is too small for the common practice of randomly splitting it into a training set and a testing set
What occurs in N-fold cross-validation
The entire dataset is randomly split into N data sets of approximately equal size
A model is trained against N - 1 of these datasets and tested against the remaining dataset. A measure of the model error is obtained.
This process is repeated a total of N times across the various combinations of N datasets taken N - 1 at a time
The observed N model errors are averaged over the N folds
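The steps above can be sketched in plain Python. The function `n_fold_cv`, the toy "predict the training mean" model, and the data values are assumptions made for illustration; any fit and error functions with the same shape could be passed in.

```python
import random


def n_fold_cv(data, n_folds, fit, error, seed=0):
    """Return the model error averaged over n_folds folds.

    fit(train) returns a model; error(model, test) returns a number.
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled rows into n_folds groups of roughly equal size
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, test in enumerate(folds):
        # Train on the other n_folds - 1 folds, test on the held-out fold
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit(train)
        errors.append(error(model, test))
    return sum(errors) / n_folds


def fit_mean(train):
    """Toy model: always predict the mean of the training values."""
    return sum(train) / len(train)


def mse(model, test):
    """Mean squared error of the constant prediction on the test fold."""
    return sum((y - model) ** 2 for y in test) / len(test)


values = [3.0, 5.0, 4.0, 6.0, 5.5, 4.5, 5.0, 3.5, 6.5, 4.0]
cv_error = n_fold_cv(values, n_folds=5, fit=fit_mean, error=mse)
print(f"average error over 5 folds: {cv_error:.3f}")
```

Each observation is used for testing exactly once, so the averaged error uses the whole dataset without testing a model on data it was trained on.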
What are outliers
Observations that lie far from the bulk of the data; they can result from bad data collection, data processing errors, or an actual rare occurrence
What is the input of logistic regression
Continuous or discrete variables
What is the output of logistic regression
Coefficients that indicate the impact of each driver
What are the use cases for logistic regression
Medical: to estimate the likelihood of a patient's response to a treatment
Finance: to determine the probability that an applicant will default on a loan
Marketing: to determine the probability that a customer will switch carriers
Engineering: to determine the probability that a mechanical part will experience a malfunction
In logistic regression, as the value of y increases what happens to the probability
The probability of the outcome occurring increases
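A short sketch of why this holds: the logistic function maps a value y (here taken to be the linear combination of the input variables) to a probability between 0 and 1, and it is strictly increasing in y.

```python
import math


def logistic(y):
    """Logistic function: e^y / (1 + e^y), written as 1 / (1 + e^-y)."""
    return 1.0 / (1.0 + math.exp(-y))


for y in (-4, -2, 0, 2, 4):
    print(f"y = {y:+d}  ->  probability = {logistic(y):.3f}")
```

At y = 0 the probability is exactly 0.5; larger y pushes it toward 1 and smaller y toward 0.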
What is MLE
Maximum likelihood estimation; it is used to estimate the model parameters
In logistic regression what is null deviance
The value where the likelihood function is based only on the intercept term
What is the residual deviance in logistic regression
The value where the likelihood function is based on the parameters in the specified logistic model
What is pseudo-R squared
A measure of how well the fitted model explains the data as compared to the default model of no predictor variables and only an intercept term
If the pseudo-R squared value is near one what does that indicate
A good fit over the simple null model
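One common way to compute this measure, assumed here, is one minus the ratio of the residual deviance to the null deviance. The deviance figures below are invented for illustration.

```python
def pseudo_r_squared(null_deviance, residual_deviance):
    """Pseudo-R squared as 1 - residual deviance / null deviance."""
    return 1.0 - residual_deviance / null_deviance


# Made-up deviances: the fitted model cuts the deviance from 100 to 25
print(pseudo_r_squared(null_deviance=100.0, residual_deviance=25.0))  # 0.75
```

A value near zero means the predictors add little over the intercept-only model.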
How is logistic regression used as a classifier
To assign class labels to a person, item, or transaction based on the predicted probability provided by the model
What is the default probability threshold in logistic regression
0.5
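A minimal sketch of thresholding predicted probabilities into class labels. Treating a probability exactly equal to the threshold as positive is an assumption of this example, and the probabilities are made up.

```python
def classify(probabilities, threshold=0.5):
    """Label each case 1 if its predicted probability meets the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


predicted_probabilities = [0.92, 0.35, 0.50, 0.08, 0.71]
print(classify(predicted_probabilities))                 # [1, 0, 1, 0, 1]
print(classify(predicted_probabilities, threshold=0.8))  # [1, 0, 0, 0, 0]
```

Raising the threshold makes the classifier more conservative about assigning the positive label.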
How do you work out the false positive rate
Number of false positives divided by the number of negatives
How do you work out the true positive rate
Number of true positives divided by the number of positives
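Both rates can be computed directly from actual and predicted labels. The helper name `rates` and the label lists are hypothetical.

```python
def rates(actual, predicted):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    positives = sum(actual)              # cases that are actually positive
    negatives = len(actual) - positives  # cases that are actually negative
    return true_positives / positives, false_positives / negatives


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tpr, fpr = rates(actual, predicted)
print(tpr, fpr)  # 0.75 0.25
```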
What is the receiver operating characteristic (ROC) curve
It is the plot of the true positive rate against the false positive rate
When is the ROC curve useful
For evaluating logistic regression and other classifiers
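As a sketch of how the points on such a curve arise, the threshold can be swept across the observed scores and the two rates recorded at each step. The function `roc_points` and the scores are made up for illustration.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing labelled positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(actual, scores):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

The curve always runs from (0, 0) to (1, 1); the closer it bends toward the top-left corner, the better the classifier separates the two classes.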