General machine learning Flashcards

Question

What is min-max scaling (often called normalization)?

Answer 1

It is a form of feature engineering. The values are shifted and rescaled so that they range from 0 to 1. We do this by subtracting the minimum value and dividing by the difference between the maximum and the minimum: x' = (x - min) / (max - min). Scikit-learn provides a transformer for this called MinMaxScaler.
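The rescaling described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation; the function name `min_max_scale` and the choice to map a constant column to 0.0 are assumptions made for the example.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: when all values are identical the formula would
        # divide by zero, so this sketch maps every value to 0.0.
        return [0.0 for _ in values]
    # Subtract the minimum, then divide by (max - min).
    return [(v - lo) / span for v in values]


scaled = min_max_scale([10.0, 20.0, 15.0, 30.0])
print(scaled)  # [0.0, 0.5, 0.25, 1.0]
```

The smallest input maps to 0 and the largest to 1; everything else lands proportionally in between.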

Answer 2

It is a form of feature engineering. In statistics, standardization is the process of putting different variables on the same scale. We transform the data so that it has a mean of 0 and a standard deviation of 1. Note that this does not change the shape of the distribution: the result follows N(0,1) only if the original data was normally distributed. To standardize a variable, you calculate its mean and standard deviation. Then, for each observed value of the variable, you subtract the mean and divide by the standard deviation.
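As a rough sketch, the subtract-the-mean, divide-by-the-standard-deviation recipe looks like this in plain Python. The function name `standardize` and the use of the population standard deviation (rather than the sample one) are assumptions of this example.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divides by n, not n - 1).
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # mean is 5 and std is 2, so 9.0 becomes 2.0 and 2.0 becomes -1.5
```

After the transformation the values have mean 0 and standard deviation 1, but their relative spacing (and therefore the shape of the distribution) is unchanged.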

Answer 3

1) Select a more powerful model. 2) Feed the training algorithm with better features. 3) Reduce the constraints on the model (for example, reduce the regularization).

Answer 4

Ensemble learning is a machine learning paradigm where multiple models (often called “weak learners”) are trained to solve the same problem and combined to get better results.
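One simple way to combine models is majority voting. The sketch below is only an illustration: the three "models" are hard-coded prediction lists invented for the example, and `majority_vote` is a hypothetical helper, not a library function.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists by taking the most common label per sample."""
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


y_true = [1, 0, 1, 1, 0]
# Made-up predictions: each weak model is wrong on a different sample.
model_a = [0, 0, 1, 1, 0]  # wrong on sample 0
model_b = [1, 1, 1, 1, 0]  # wrong on sample 1
model_c = [1, 0, 1, 0, 0]  # wrong on sample 3

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble == y_true)  # True
```

Each individual model makes one mistake, but because the mistakes fall on different samples, the vote recovers the correct label every time. With an odd number of binary classifiers there are no ties to resolve.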

Answer 5

The bias error is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).

Answer 6

The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).

Answer 7

In statistics and machine learning, the bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa.

Answer 8

If our model does much better on the training set than on the test set, then we’re likely overfitting.

Answer 9

1) Cross-validation. 2) Train with more data. 3) Remove features. 4) Early stopping: when a model is trained iteratively, there is a point of diminishing returns after which further training hurts generalization, so training is stopped there (mainly used in deep learning). 5) Regularization. 6) Ensemble learning.
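Early stopping in particular is easy to sketch. The example below assumes a list of per-epoch validation losses is already available (the numbers are invented) and uses a hypothetical `patience` parameter: training stops once that many epochs pass without improving on the best loss seen so far.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping after `patience` epochs
    without an improvement on the best validation loss."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_loss


# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]
best_epoch, best_loss = early_stopping_epoch(losses, patience=2)
print(best_epoch, best_loss)  # 2 0.6
```

The loop never reaches the final 0.50 because it stops at epoch 4. That is the usual cost of a small patience value: it saves training time but can miss a later improvement.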

Answer 10

So that you can easily come back to any model you want. Make sure you save both the hyperparameters and the trained parameters, as well as the cross-validation scores and perhaps the actual predictions as well.

Answer 11

A model parameter is a configuration variable that is internal to the model and whose value can be estimated from data. For example, the mean and variance. - They are required by the model when making predictions. - Their values define the skill of the model on your problem. - They are estimated or learned from data.

Answer 12

A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data. Hyperparameters are often used in processes that help estimate model parameters. If you have to specify a model parameter manually, then it is probably a model hyperparameter. Some examples include: - The learning rate for training a neural network. - The C and sigma hyperparameters for support vector machines. - The k in k-nearest neighbors.

Answer 13

It is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process.

Answer 14

1) Grid search 2) Random search 3) Bayesian optimization 4) Gradient-based optimization 5) Evolutionary optimization 6) Population-based optimization

Answer 15

To give a concrete example, if you're using a support vector machine, you could try different values for gamma and C. Grid search would train an SVM for each pair of (gamma, C) values, evaluate each one using cross-validation, and select the pair that did best.
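The search loop itself is small. In the sketch below, `toy_cv_score` is a hypothetical stand-in for "train an SVM with these values and return its cross-validation score"; it is an invented formula so that the example runs without any data or ML library.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_score(gamma, C):
    # Hypothetical score surface, highest at gamma=0.1 and C=10.
    return -((gamma - 0.1) ** 2) - ((C - 10) ** 2) / 100.0


grid = {"gamma": [0.01, 0.1, 1.0], "C": [1, 10, 100]}
best_params, best_score = grid_search(grid, toy_cv_score)
print(best_params)  # {'gamma': 0.1, 'C': 10}
```

With 3 values for each of 2 hyperparameters the loop evaluates 9 combinations; the cost grows multiplicatively with every hyperparameter added. In practice, scikit-learn's `GridSearchCV` wraps the same idea around real estimators and real cross-validation.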

Answer 16

1) a dataset, 2) a cost function, 3) an optimization procedure, and 4) a model.

Answer 17

1) A false positive is when your prediction states a positive but the true value is negative. 2) A false negative is when your prediction states a negative but the true value is positive.
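These definitions translate directly into a counting loop. The helper name `count_outcomes` and the label lists are invented for illustration; labels are assumed to be 0 (negative) and 1 (positive).

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]
counts = count_outcomes(y_true, y_pred)
print(counts)  # {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 2}
```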

Answer 18

First, let's define some notation: TP = true positives, FP = false positives, FN = false negatives. Precision = TP/(TP+FP). Recall = TP/(TP+FN).
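A small worked example of both formulas follows. The counts are made up, and returning 0.0 when a denominator is zero is a convention chosen for this sketch rather than part of the definitions.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from raw counts."""
    # Assumption: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


precision, recall = precision_recall(tp=3, fp=2, fn=1)
print(precision, recall)  # 0.6 0.75
```

Here 3 of the 5 predicted positives are correct (precision 0.6), and 3 of the 4 actual positives were found (recall 0.75).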

Answer 19

It is an NxN matrix (N being the number of classes) showing the number of TP, TN, FP and FN for each class.
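For more than two classes the matrix can be built by counting (true label, predicted label) pairs. The sketch below assumes the convention that rows are true classes and columns are predicted classes; the class names and data are invented.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build an N x N count matrix: rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat", "bird"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
# cat [1, 1, 0]
# dog [0, 2, 0]
# bird [1, 0, 2]
```

The diagonal holds the correct predictions for each class. For a given class, the rest of its row counts its false negatives and the rest of its column counts its false positives.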

Answer 20

Accuracy = (TP+TN)/(TP+TN+FN+FP)

Answer 21

A tradeoff means that increasing one quantity leads to a decrease in the other. In this case, increasing precision typically leads to a decrease in recall, and vice versa.
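The tradeoff shows up when the decision threshold of a classifier is moved. The scores and labels below are invented, and `precision_recall_at` is a hypothetical helper written for this sketch.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up classifier scores and true labels.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

low = precision_recall_at(0.35, scores, labels)   # permissive threshold
high = precision_recall_at(0.65, scores, labels)  # strict threshold
print(low)   # (0.75, 1.0)
print(high)  # precision rises to 1.0 while recall drops to 2/3
```

Raising the threshold removes the one false positive (precision goes from 0.75 to 1.0) but also drops a true positive (recall goes from 1.0 to about 0.67).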

Answer 22

It is a tool that plots the sensitivity (recall, also called the true positive rate) against 1 - specificity (the false positive rate) at different classification thresholds. Sensitivity: measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition). Specificity: measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
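Sensitivity and specificity both depend on the decision threshold, so each threshold gives one point of the plot. The sketch below computes those points for a made-up set of scores; `sensitivity_specificity_points` is a hypothetical helper, and the data is assumed to contain at least one positive and one negative.

```python
def sensitivity_specificity_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        sensitivity = tp / positives      # true positive rate
        specificity = 1 - fp / negatives  # true negative rate
        points.append((threshold, sensitivity, specificity))
    return points


# Made-up scores and labels.
scores = [0.9, 0.8, 0.6, 0.4]
labels = [1, 0, 1, 0]

points = sensitivity_specificity_points(scores, labels)
for point in points:
    print(point)
# (0.9, 0.5, 1.0)
# (0.8, 0.5, 0.5)
# (0.6, 1.0, 0.5)
# (0.4, 1.0, 0.0)
```

As the threshold is lowered, sensitivity can only go up and specificity can only go down, which is what gives the curve its shape.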

Answer 23

In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data.
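The partitioning step can be sketched with index lists. The generator name `k_fold_indices` and the fixed `seed` are assumptions of this example; when the sample size is not divisible by k, the folds differ in size by at most one.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random partition
    folds = [indices[i::k] for i in range(k)]     # k roughly equal subsamples
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(n_samples=10, k=5))
for training, validation in splits:
    print(len(training), len(validation))  # 8 2 on every line
```

Every index appears in exactly one validation fold, so each observation is used for validation exactly once and for training k - 1 times.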

Answer 24

* **Bias:** This part of the generalization error is due to wrong assumptions, such as assuming that the data is linear when it is actually quadratic. A high-bias model is most likely to underfit the training data.
* **Variance:** This part is due to the model's excessive sensitivity to small variations in the training data. A high-variance model is likely to overfit the training data.
* **Irreducible error:** This part is due to the noisiness of the data itself. The only way to reduce this part of the error is to clean up the data.

Answer 25

**Increase complexity:** Typically the variance will increase and the bias will decrease. **Decrease complexity:** Typically the bias will increase and the variance will decrease. This is the essence of the bias/variance trade-off.

Answer 26

By noise we mean the data points that don’t really represent the true properties of your data, but are instead the result of random chance.

Answer 27

Overfitting happens because your model is trying too hard to capture the noise in your training dataset.

Answer 28

**Regularization** This is a technique, commonly used in regression, that constrains (regularizes) or shrinks the coefficient estimates towards zero. In other words, it discourages learning a more complex or flexible model, so as to avoid the risk of overfitting. **Example:** in ridge regression, the RSS is modified by adding a shrinkage quantity, λ times the sum of the squared coefficients. The coefficients are then estimated by minimizing this penalized function. Here, λ is the tuning parameter that decides how much we want to penalize the flexibility of our model.
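The shrinking effect of λ is easiest to see in the simplest possible case. The sketch below assumes a single feature and no intercept, so the ridge objective is sum((y - b*x)^2) + λ*b^2, and setting its derivative to zero gives the closed form b = Σxy / (Σx² + λ). The function name and the data are invented for the example.

```python
def ridge_slope(xs, ys, lam):
    """Coefficient b minimizing sum((y - b*x)**2) + lam * b**2
    (one feature, no intercept: a deliberately simplified setting)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Made-up data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 126.0):
    print(lam, ridge_slope(xs, ys, lam))
# 0.0 2.0    (no penalty: ordinary least squares)
# 14.0 1.0   (coefficient shrunk to half)
# 126.0 0.2  (strong penalty: coefficient close to zero)
```

With λ = 0 the estimate is the ordinary least-squares slope; as λ grows, the coefficient is pulled towards zero even though the data fits a slope of 2 perfectly.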

Answer 29

1) Logistic regression: if the resulting classifier fits well and is accurate, sensitive, and specific, we can conclude that the two variables share a relationship and are indeed associated. Note that logistic regression assumes a linear relationship (between the continuous variable and the log-odds of the outcome); testing whether this assumption holds is not straightforward. 2) Point-biserial correlation: a) similar to the Pearson coefficient, the point-biserial correlation can range from -1 to +1; b) the point-biserial calculation assumes that the continuous variable is normally distributed and homoscedastic. 3) Kruskal-Wallis H test (or parametric forms such as the t-test or ANOVA): a simple approach is to group the continuous variable by the categorical variable, measure the variance within each group, and compare it to the overall variance of the continuous variable. If the variance drops significantly after grouping, the categorical variable explains much of the variance of the continuous variable, so the two variables likely have a strong association.
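Of the three approaches, the point-biserial correlation is the simplest to compute by hand: it is the ordinary Pearson correlation with the two categories coded as 0 and 1. The sketch below relies on that coding; the function name and the data are invented, and no significance test is included.

```python
import math


def point_biserial(binary, continuous):
    """Pearson correlation between a 0/1 variable and a continuous variable."""
    n = len(binary)
    mean_b = sum(binary) / n
    mean_c = sum(continuous) / n
    cov = sum((b - mean_b) * (c - mean_c) for b, c in zip(binary, continuous))
    var_b = sum((b - mean_b) ** 2 for b in binary)
    var_c = sum((c - mean_c) ** 2 for c in continuous)
    return cov / math.sqrt(var_b * var_c)


# Made-up data: group 1 has clearly larger values than group 0.
group = [0, 0, 0, 1, 1, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

r = point_biserial(group, values)
print(round(r, 3))  # 0.878
```

Swapping which category is coded as 1 flips the sign of the result but not its magnitude. For real analyses, `scipy.stats.pointbiserialr` also returns a p-value.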

Answer 30

In statistics, Cramér's V is a measure of association between two nominal variables. It is based on Pearson's chi-squared statistic. 1) It is the intercorrelation of two discrete variables and may be used with variables having two or more levels. 2) It is a symmetrical measure: it does not matter which variable we place in the columns and which in the rows. 3) It can also be applied to goodness-of-fit chi-squared models when there is a 1xK table. 4) It varies from 0 (no association) to 1 (complete association). 5) It can be a heavily biased estimator of its population counterpart.
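For a contingency table with r rows, c columns and n observations, the statistic is V = sqrt(χ² / (n * min(r - 1, c - 1))). The sketch below computes it from raw counts. It assumes at least two rows and two columns with non-zero totals, and it applies no bias or continuity correction; the tables are invented.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.
    Assumes at least 2 rows and 2 columns, with no empty row or column."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * k))


perfect = cramers_v([[10, 0], [0, 10]])  # categories determine each other
none = cramers_v([[5, 5], [5, 5]])       # categories are independent
print(perfect, none)  # 1.0 0.0
```

Transposing the table gives the same value, which reflects the symmetry noted in point 2.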

(55 cards)