Data Mining - Lecture Regression Flashcards
What are the steps in CRISP-DM?
- Understand business problem
- Understand data
- Prepare data
- Model Building
- Testing and Evaluating
- Deployment
Student Number?
2064381
What is the difference between single and multiple linear regression?
Single (also called simple) linear regression has only one predictor (independent) variable.
Multiple has multiple predictor (independent) variables.
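A minimal sketch of the difference, assuming NumPy and made-up toy data (the names x1, x2 and y are hypothetical). The only thing that changes between the two fits is how many predictor columns go into the design matrix:

```python
import numpy as np

# Hypothetical toy data: y depends on two predictors, x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2  # exact linear relation, no noise

# Single linear regression: intercept column plus ONE predictor.
X_single = np.column_stack([np.ones_like(x1), x1])
coef_single, *_ = np.linalg.lstsq(X_single, y, rcond=None)

# Multiple linear regression: intercept column plus TWO predictors.
X_multi = np.column_stack([np.ones_like(x1), x1, x2])
coef_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)

print("single:  ", coef_single)  # [intercept, coefficient of x1]
print("multiple:", coef_multi)   # [intercept, coefficient of x1, coefficient of x2]
```

Because the toy data were generated without noise, the multiple regression recovers the intercept and both coefficients used to build y; the single regression has to explain y with x1 alone.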
What is the Ordinary Least Squares (OLS) method?
A method for estimating the unknown parameters (the intercept and coefficients) in a linear regression.
-> In practice, you use it to find the best-fitting line for your regression model: the line with the smallest sum of squared errors.
How do you calculate the OLS?
You calculate the error (residual) for every observation Yi that you have.
Yi is the actual observed outcome for Xi.
The error for Yi is calculated by: Yi - Yhat_i
Yhat_i is the predicted (fitted) value, i.e. the model’s estimate of the outcome for Xi.
Sum of squared errors (SSE) = SUM (Yi - Yhat_i)^2. OLS picks the line with the smallest SSE.
Remember to compute and square each error before you add them up.
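The calculation can be sketched as follows, assuming NumPy and four made-up observations. The helper sum_squared_errors is a hypothetical name used only for illustration. It scores two hand-picked candidate lines and then the line that np.polyfit returns, which is the least-squares fit:

```python
import numpy as np

# Hypothetical observations.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

def sum_squared_errors(intercept, slope):
    """Sum of squared errors of the line y_hat = intercept + slope * x."""
    y_hat = intercept + slope * x   # the model's estimate for each x
    errors = y - y_hat              # one error per observation
    return float(np.sum(errors ** 2))  # square first, then add up

# Two hand-picked candidate lines.
sse_a = sum_squared_errors(0.0, 2.0)
sse_b = sum_squared_errors(1.0, 1.0)

# The least-squares line (np.polyfit returns the slope first, then the intercept).
slope_ols, intercept_ols = np.polyfit(x, y, 1)
sse_ols = sum_squared_errors(intercept_ols, slope_ols)

print(sse_a, sse_b, sse_ols)
```

No other straight line can score lower on these points than the least-squares line, so sse_ols is the smallest of the three values.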
Which two uses are there for a regression model?
- Predictive
Predict the outcome value for new records
- Explanatory/Descriptive
Explain the average effect of inputs on an outcome
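Both uses can come from the same fitted line. The sketch below assumes NumPy and invented data (hours studied and exam score are hypothetical variables, not from the lecture):

```python
import numpy as np

# Hypothetical records: hours studied (input) and exam score (outcome).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Fit score = intercept + slope * hours by least squares.
slope, intercept = np.polyfit(hours, score, 1)

# Explanatory use: the slope is the average change in score
# associated with one extra hour of study.
print("average effect of one extra hour:", slope)

# Predictive use: estimate the outcome for a new record (6 hours).
predicted_score = intercept + slope * 6.0
print("predicted score for 6 hours:", predicted_score)
```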
What is overfitting?
The goal of a model is to make good predictions on any new data you run it on, not just on the sample it was built from.
If you have a function that represents your sample too perfectly, it does not capture the ‘general’ relation between the variables, only the particular pattern (including noise) of the sample. Therefore, it will not be able to predict future values well. This is overfitting.
-> Can be seen in a graph if the function passes too close to the actual data points, and in an error on new data that is much larger than the error on the training data.
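A small illustration of overfitting, assuming NumPy and invented data: the true relation is the straight line y = 2x + 1, and the six training points carry a fixed, made-up disturbance. A degree-5 polynomial passes through all six points exactly, while a straight line does not:

```python
import numpy as np

# Hypothetical training sample: true relation y = 2x + 1 plus a fixed disturbance.
x_train = np.arange(6.0)                              # 0, 1, ..., 5
noise = 0.3 * np.array([1, -1, 1, -1, 1, -1])
y_train = 2.0 * x_train + 1.0 + noise

# New data: points between the training points, following the true relation.
x_test = x_train[:-1] + 0.5
y_test = 2.0 * x_test + 1.0

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

simple = np.polyfit(x_train, y_train, 1)    # straight line
flexible = np.polyfit(x_train, y_train, 5)  # degree 5: hits every training point

train_mse_simple = mse(simple, x_train, y_train)
train_mse_flexible = mse(flexible, x_train, y_train)
test_mse_simple = mse(simple, x_test, y_test)
test_mse_flexible = mse(flexible, x_test, y_test)

print("train:", train_mse_simple, train_mse_flexible)
print("test: ", test_mse_simple, test_mse_flexible)
```

In this example the flexible model has practically zero error on the training points but a clearly larger error than the straight line on the new points, which is the pattern described above.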
What is underfitting of a model?
The model performs poorly even on the training data. This is because the model is too simple to capture the relationship between the input examples (often called X) and the target values (often called Y).
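A minimal sketch of underfitting, assuming NumPy and made-up data in which the target is exactly the square of the input. A straight line cannot capture this curved relation, so its error is large on the very data it was fitted to:

```python
import numpy as np

# Hypothetical data with a curved relation: Y = X^2.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

train_mse_line = train_mse(1)  # too simple: a straight line
train_mse_quad = train_mse(2)  # matches the shape of the data

print("line:     ", train_mse_line)
print("quadratic:", train_mse_quad)
```

The straight line leaves a large training error, while the quadratic fits the same points almost exactly.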