Regression Flashcards
Is it possible to create models in Machine learning?
Machine learning is mainly about creating mathematical relations between features of a dataset. These mathematical relations are called models.
How was linear regression done prior to machine learning?
Long ago, it was done by hand by statisticians, and only on small datasets. With the availability of computers and their high computational power, we can now fit linear regression models on datasets of practically any size.
What was fearless about ML?
It refers to the observation that ML practitioners usually don’t spend much time evaluating the statistical validity of a method; they prefer to build a model and judge it by its performance rather than by the statistical soundness of the process.
If we don’t understand how a model’s intricacies work, how can we be 100% sure that it’s a successful model?
A mathematical interpretation of how a model works is not always possible. When an algorithm is designed, its mathematical steps are derived; when a model is applied, it is neither always possible nor always necessary to trace those steps. It is the model’s performance on unseen data that decides its success.
Do more data points bring new insights?
Yes. With more data, insights are supported by more evidence and are therefore more trustworthy. The same applies to model training: with more data, models generally train better.
Why is the model interpretation important?
Model interpretation is important because it explains how the model reached its final results or predictions. It also gives an indication of whether changes need to be made to improve the model’s performance.
What are the features of a machine learning model?
Features are the attributes or variables that are used in a machine learning model. Generally, there are independent and dependent features. Independent features are the input features of the model while the dependent feature is the output variable.
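As a minimal sketch, separating independent features from the dependent feature in pandas might look like this (the dataset and column names are made up for illustration):

```python
import pandas as pd

# Hypothetical dataset: 'area' and 'bedrooms' are independent features,
# 'price' is the dependent feature (the target we want to predict).
df = pd.DataFrame({
    "area":     [1200, 1500, 900, 2000],
    "bedrooms": [2, 3, 1, 4],
    "price":    [200000, 260000, 150000, 340000],
})

X = df.drop(columns="price")  # independent features (model inputs)
y = df["price"]               # dependent feature (model output)
```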
Is risk similar to variance?
For a machine learning model, the higher the error, the greater the chance of failure on unseen data, so error is a measure of the model’s risk. Risk can therefore be interpreted as the error associated with the model, and minimizing risk is equivalent to minimizing error or variance.
What are overfitting and underfitting?
Overfitting occurs when a machine learning model follows the noise in the training data too closely; it becomes too complex and makes poor predictions on unseen data. Underfitting is the opposite case, where the model is too simple to capture the patterns in the data and is therefore also unable to generalize to unseen data.
What does interpolation mean?
Interpolation is a statistical method by which related known values are used to estimate an unknown value. It is useful in treating missing values in machine learning.
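As a quick sketch, pandas provides linear interpolation for filling missing values (the series below is made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 14.0, np.nan, 18.0])

# Linear interpolation estimates each missing value from its known neighbours.
filled = s.interpolate(method="linear")
print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0]
```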
Why do we square the residual?
It is for mathematical reasons: if we don’t square the residuals, the positive and negative values cancel each other out, and the total error can come out as zero or close to zero even when the fit is poor.
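A small numeric sketch makes the cancellation problem concrete (the residual values below are made up):

```python
import numpy as np

residuals = np.array([2.0, -2.0, 3.0, -3.0])

print(residuals.sum())         # 0.0  -> raw errors cancel, hiding the misfit
print((residuals ** 2).sum())  # 26.0 -> squared errors expose it
```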
What is the limit of the error made by a regression model?
There is no predefined limit on the errors made by a linear regression model. The only condition is that the sum of squared error terms should be minimized.
Can we use either normalization or standardization?
We use either standardization or normalization because it brings all the variables to a uniform scale. If we don’t, there is a chance that a less important variable will be given more weight simply because of its larger scale.
Is it acceptable if the slope of a variable is very small in the model?
The slope of a variable tells us how much the dependent variable changes for a unit increase in that independent variable, holding the other variables constant. The larger the slope, the larger the change, and vice versa. A very small slope indicates that the feature has little influence on the target variable and can be removed.
Is it always good to add more variables that could potentially influence the outcome?
Yes, it is good to add more variables that influence the outcome. It makes the predictions more generic and reliable on unseen data.
Is there a way to validate the independence of error terms?
Yes, hypothesis testing can be used for this. For example, the Durbin-Watson test checks the residuals for autocorrelation, which would violate the independence assumption.
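A minimal sketch of the Durbin-Watson check with statsmodels (values near 2 suggest no first-order autocorrelation; the data here is synthetic):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

# Fit an OLS model and test its residuals for autocorrelation.
model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))  # ~2 indicates independent (uncorrelated) errors
```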
What do we mean by “linear predictor”?
A linear predictor means a linear relationship between the output and the input variables. The term “linear” refers to the coefficients/parameters of the model: any equation that is not linear in its parameters is a non-linear relation. For example, y = b0 + b1·x + b2·x² is still a linear model because it is linear in the coefficients b0, b1, and b2.
Don’t we divide the TSS by ‘n’ so we have something somehow ‘independent’ from the number of samples we take?
Dividing the total sum of squares by n would not change anything, because the residual sum of squares would be divided by n as well. Since R² is computed from the ratio 1 − RSS/TSS, the factor of 1/n cancels out.
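A minimal sketch of the cancellation, with made-up numbers:

```python
import numpy as np

y      = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.8])

rss = ((y - y_pred) ** 2).sum()    # residual sum of squares
tss = ((y - y.mean()) ** 2).sum()  # total sum of squares

# R² is identical whether or not both sums are divided by n.
print(1 - rss / tss)
print(1 - (rss / len(y)) / (tss / len(y)))
```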
Can we interpret R² as a measure to understand how linear our dataset is?
No. R² measures how much of the variance in the data the model can capture; the higher the value, the better the model’s performance.
How to choose between Normalization and Standardization?
Normalization typically means rescaling values to lie between 0 and 1, while standardization typically means rescaling values to express how many standard deviations each one is from the mean.
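A minimal sklearn sketch of both transforms on a made-up feature column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [10.0]])

print(MinMaxScaler().fit_transform(X).ravel())    # normalization: values mapped into [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # standardization: std. deviations from the mean
```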
What is an acceptable R-squared in the real world?
The higher the R² value, the better the model fits the given data, because a higher R² implies a lower residual sum of squares. There is no universal threshold: what counts as acceptable depends on the domain and on how noisy the data is.
Do we call it R squared because the square is mathematically useful somewhere else?
R is the correlation between the predicted values and the observed values of Y. R² is the square of this coefficient, and it indicates the percentage of the total variation in Y that is explained by the regression line.
Can we discriminate the features before the training? like finding the correlation between features against labels?
Yes, we can compute the correlations between the variables, and if any two features are highly correlated we drop one of them, since it is redundant and adds no new information to the model. Beyond that, exploratory data analysis lets us remove variables based on many other criteria.
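For example, with pandas one can inspect feature-to-feature and feature-to-label correlations (the frame below is synthetic):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100)})
df["x2"] = df["x1"] * 0.95 + rng.normal(scale=0.1, size=100)  # nearly redundant with x1
df["label"] = 3 * df["x1"] + rng.normal(size=100)

print(df.corr())  # a high x1-x2 correlation suggests dropping one of them
```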
Do we need to test for multicollinearity in case of more variables?
Yes, we need to check for multicollinearity before developing a model, and we drop the redundant variables. A common rule of thumb is to drop a variable whose variance inflation factor (VIF) is greater than 5.
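A minimal sketch of the VIF check with statsmodels, on synthetic data:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=100), "x3": rng.normal(size=100)})
X["x2"] = X["x1"] * 0.9 + rng.normal(scale=0.1, size=100)  # collinear with x1

# A VIF above 5 for a column is a common signal to drop it.
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))
```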
Is Linear Regression a good base model to compare other methods?
Yes. It is always good to develop a baseline model, such as linear regression for a continuous target, and then move to more complicated algorithms such as boosting or neural networks.
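A minimal sketch of such a comparison in sklearn, on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Score the complex model against the linear baseline with cross-validated R².
for model in (LinearRegression(), GradientBoostingRegressor(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(type(model).__name__, scores.mean())
```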
How to estimate the parameters in the model?
To estimate the parameters, we minimize the sum of squared errors, treating the parameters as the variables of the optimization. Applying an optimization algorithm (or, for ordinary least squares, a closed-form solution) gives the estimates.
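A minimal NumPy sketch on synthetic data, using the standard least-squares solver to minimize the sum of squared errors:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 4.0 + 3.0 * x[:, 0] + rng.normal(scale=0.5, size=100)

X = np.column_stack([np.ones(len(x)), x])     # add an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of squared errors
print(beta)                                   # ~[4.0, 3.0]
```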
Do we iterate to reduce the distance from the true value?
Yes. If a model does not perform well on the first attempt, it is not close enough to the true relationship. In that case we iterate by changing the model’s variables, the data, or the model’s hyperparameters.
How reliable is the prediction if the user provides a clean dataset to run the training?
Having clean data is good for a machine learning model, but it is not the only factor that determines reliability. Along with clean data, the choice of features and the model’s performance on unseen data are what make a model reliable.
How do we know if there is bias in our estimation and how to measure it?
Bias can be interpreted as the training error of the model. If the training error is very high, the model is biased: it is too simple to perform reliably on unseen data.
Is there a book or website that you recommend on today’s concepts?
You can refer to ISLR (An Introduction to Statistical Learning) by Gareth James and co-authors.
Are confidence interval and confidence band the same?
They are two different representations of the same thing. A confidence band is the lines on a probability plot or fitted line plot that depict the upper and lower confidence bounds for all points on a fitted line within the range of data. On a fitted line plot, the confidence interval for the mean response of a specified predictor value is the points on the confidence bands directly above and below the predictor value.
Can you describe a little bit about the difference between MLE and OLS? Especially under what kind of context we might favor one over the other?
Ordinary least squares (OLS), also called linear least squares, is a method for estimating the unknown parameters of a linear regression model. Maximum likelihood estimation (MLE) is a more general method for estimating the parameters of any statistical model. For a linear model with normally distributed errors the two coincide: maximizing the likelihood is equivalent to minimizing the sum of squared errors. MLE is favored when you are willing to assume a specific (especially non-Gaussian) error distribution, while OLS requires no distributional assumption.
Does R-square error increase with the addition of new features?
Yes. R² never decreases when new features are added, even if the features are uninformative. For this reason, adjusted R², which penalizes the number of features, is often reported instead; a sketch is below.
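A minimal sketch of the adjusted R² computation (the function name and sample numbers are illustrative):

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R² penalizes features that do not improve the fit."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

print(adjusted_r2(0.80, n_samples=100, n_features=5))   # ~0.789
print(adjusted_r2(0.80, n_samples=100, n_features=30))  # ~0.713, noticeably lower
```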
Can we use logarithmic regression in machine learning?
Yes. If the data fits a logarithmic equation better, the model can be fit on that. Machine learning is concerned with whichever relationship fits the given data best.
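A sketch of fitting a logarithmic relationship by transforming the input and reusing linear regression (the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=200)
y = 2.0 + 5.0 * np.log(x) + rng.normal(scale=0.2, size=200)

# The model is linear in its parameters even though y depends on log(x).
model = LinearRegression().fit(np.log(x).reshape(-1, 1), y)
print(model.intercept_, model.coef_)  # ~2.0 and ~5.0
```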
Is standardization the same as scaling?
Standardizing data expresses each point as the number of standard deviations it lies from the mean. Standardization is just one technique for performing feature scaling, so scaling is the broader term.
How do we know that linear regression is the wrong choice for a problem?
If the trained linear regression model does not perform well on unseen data even after all possible modifications, then it is not a good choice for that specific problem, and one should choose another algorithm suitable for regression.
PCA gives us coefficients to multiply with the variables (X) to find the Y, is it some sort of regression?
Even though PCA does that, it is an unsupervised method and does not perform any sort of regression; its computations are different from those of linear regression.
Can you use PCA to reduce dimension for your dataset for regression?
The main purpose of PCA is to reduce the number of features in a dataset, and it is applied mainly when that number is very high. PCA itself is not a regression method, but its output components can be used as the inputs to a regression model (this is known as principal component regression).
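A minimal sketch of that pipeline in sklearn, on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

# PCA reduces 20 features to 5 components; linear regression then fits on those.
model = make_pipeline(PCA(n_components=5), LinearRegression())
model.fit(X, y)
print(model.score(X, y))  # R² of the principal component regression
```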
Is ML always a black box?
It’s not always a black box. It’s generally a scale of interpretability, some ML algorithms like Linear Regression and Decision Trees are highly interpretable, others like Neural Networks are not very interpretable.
How is ML different from a model?
Machine learning algorithms are procedures that are implemented in code and run on data. Machine learning models are the output of those algorithms and consist of model data plus a prediction procedure.
What is the difference between linear and non-linear regression?
Nonlinear regression is a form of regression analysis in which the data is modeled as a nonlinear combination of the model parameters. In linear regression, by contrast, the model is a linear combination of the parameters.
What does the common notion that “Theory doesn’t always explain the success” mean?
There can be instances where a model performs very well on some data even though the theory behind the algorithm does not predict that it should. For example, deep neural networks work very well on large datasets, but it is not obvious from mathematical theory why they work.