General Machine Learning Flashcards

1
Q

Why do you want to lock away a test set right from the beginning?

A

If you or your algorithm look at the test data, it increases the likelihood that your model will be biased. The bias we are trying to avoid is data snooping bias.

2
Q

What is the data snooping bias?

A

Data snooping bias is a statistical bias that appears when exhaustively searching for combinations of variables: the probability that a result arose by pure chance grows with the number of combinations tested.

3
Q

What is the sampling bias?

A

Sampling bias is a bias in which a sample is collected in such a way that some members of the intended population have a lower sampling probability than others. The result is a biased, non-random sample of the population.

4
Q

What is the confirmation bias?

A

Confirmation bias is the tendency to process information by looking for, or interpreting, information that is consistent with one’s existing beliefs.

5
Q

What is the exclusion bias?

A

Exclusion bias happens as a result of excluding some features from our dataset, usually under the umbrella of cleaning our data, because we think they are irrelevant. For example, in the Titanic survival prediction problem, one might disregard the passenger ID of the travelers, thinking it is completely irrelevant, without knowing that Titanic passengers were assigned rooms according to their passenger ID: the smaller the ID number, the closer to the lifeboats.

6
Q

What is the observer bias?

A

The tendency to see what we expect to see, or what we want to see. When researchers study a certain group, they usually come to the experiment with prior knowledge and subjective feelings about the group being studied.

7
Q

What is prejudice bias?

A

Prejudice bias happens as a result of cultural influences or stereotypes in the data. Example: a computer vision program that detects people at work, trained on Google Images. It will be fed thousands of images of men coding and women cooking, so the model might conclude that only men code and only women cook.

8
Q

What is measurement bias?

A

Systematic value distortion happens when there is an issue with the device used to observe or measure. This kind of bias tends to skew the data in a particular direction. Example: shooting image data with a camera that increases the brightness. This flawed measurement tool fails to replicate the environment in which the model will operate.

9
Q

What are the eight main steps of a machine learning project?

A

1) Frame the problem and look at the big picture. 2) Get the data. 3) Explore the data to gain insights. 4) Prepare the data. 5) Explore many different models and shortlist the best ones. 6) Fine-tune your models and combine them into a great solution. 7) Present your solution. 8) Launch, monitor, and maintain your system.

10
Q

When framing the problem, which question should you ask yourself?

A

1) What is the objective in business terms? 2) How will the solution be used? 3) How should performance be measured, and is it aligned with the business objective? 4) What would be the minimum performance needed to reach the business objective? 5) List and verify the validity of your assumptions.

11
Q

In the "get the data" step, what do you need to verify (5)?

A

1) List the data you need and how much you need. 2) Find and document where you can get the data. 3) Check legal obligations. 4) Ensure sensitive information is deleted or protected. 5) Sample a test set, put it aside, and never look at it (no data snooping).

12
Q

What do we mean by exploring the data (5 points)?

A

1) Study each attribute and its characteristics. 2) Check the percentage of missing values. 3) Identify the target attribute. 4) Visualize the data. 5) Study the correlations.

13
Q

What do we mean by preparing the data (7 points)?

A

1) Make sure to work on copies of the data (keep the original dataset intact). 2) Write functions for all data transformations. 3) Fix or remove outliers. 4) Fill in missing data. 5) Feature selection: drop the attributes that provide no useful information. 6) Feature scaling. 7) Change the type of data where appropriate, for example from continuous to discrete.

14
Q

What do we mean by shortlist promising models?

A

1) Train many quick-and-dirty models and compare their performance. 2) For each model, use N-fold cross-validation. 3) Analyze the most significant variables for each algorithm. 4) Analyze the types of errors the models make. 5) Perform a quick round of feature selection.

15
Q

What do we mean by fine tuning the system?

A

1) You will want to use as much data as possible for this step. 2) Fine-tune the hyperparameters using cross-validation. 3) Try ensemble methods; combining your best models will often produce better performance than running them individually. 4) Once you are confident about your final model, measure its performance on the test set to estimate the generalization error. 5) Note: do not tweak your model after measuring the generalization error; you would just overfit the test set.

16
Q

When presenting your solution, do not forget to: (6)

A

1) Document what you have done. 2) Create a nice presentation; make sure you highlight the big picture first. 3) Explain why your solution achieves the business objective. 4) Present interesting points you noticed along the way. 5) List your system's limitations. 6) Ensure your key findings are communicated through easy-to-remember statements. For example: the median income is the number one predictor of housing prices.

17
Q

What is a model validation technique?

A

It is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.

18
Q

What is the goal of cross-validation?

A

The goal of cross-validation is to test the model’s ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem).

19
Q

What are the 3 strategies to deal with missing values?

A

1) Get rid of the corresponding rows. 2) Get rid of the whole attribute. 3) Set the missing values to some value (zero, the mean, the median, etc.).

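A minimal sketch of the three strategies in Python, assuming pandas and Scikit-Learn are available; the DataFrame and column names are made up for illustration:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up data with a missing value
df = pd.DataFrame({"rooms": [3.0, 4.0, None, 5.0],
                   "price": [200.0, 250.0, 180.0, 300.0]})

# 1) Get rid of the corresponding rows
option1 = df.dropna(subset=["rooms"])

# 2) Get rid of the whole attribute
option2 = df.drop(columns=["rooms"])

# 3) Set the missing values to some value (here, the median)
option3 = df.copy()
option3[["rooms"]] = SimpleImputer(strategy="median").fit_transform(df[["rooms"]])
```
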
20
Q

One pro and one con of mean imputation.

A

Pro: the other attributes of the row are still used in our model. Con: the standard deviation is artificially lowered; your model thinks it has more data than it really does for the given attribute.

21
Q

What do we mean by one-hot encoding, and when is it used?

A

When we have an array of categorical values and it is not clear whether there is any order to the set, we create one binary attribute per category. Only one attribute will be equal to 1 (hot) and all the others will be 0 (cold).

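A small sketch using Scikit-Learn's OneHotEncoder; the categorical attribute and its values are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical attribute with no natural order
categories = pd.DataFrame({"ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"]})

encoder = OneHotEncoder()                    # output is a SciPy sparse matrix by default
one_hot = encoder.fit_transform(categories)

print(encoder.categories_)   # one binary column per category
print(one_hot.toarray())     # each row has a single 1 (hot); the rest are 0 (cold)
```
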
22
Q

What is a sparse matrix and why do we use it ?

A

A sparse matrix is a matrix in which most entries are zero. Substantial memory requirement reductions can be realized by storing only the non-zero entries.

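A small sketch with SciPy comparing dense and CSR (compressed sparse row) storage for a mostly-zero matrix; the matrix shape and values are arbitrary:

```python
import numpy as np
from scipy import sparse

# A mostly-zero matrix, stored densely and in CSR (compressed sparse row) form
dense = np.zeros((1000, 1000))
dense[0, 0] = 1.0
dense[500, 250] = 2.0

sparse_version = sparse.csr_matrix(dense)  # stores only the non-zero entries

dense_bytes = dense.nbytes  # ~8 MB for the dense array
sparse_bytes = (sparse_version.data.nbytes
                + sparse_version.indices.nbytes
                + sparse_version.indptr.nbytes)  # a few KB at most

print(dense_bytes, sparse_bytes)
```
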
23
Q

What is the difference between univariate and multivariate imputing ?

A

One type of imputation algorithm is univariate, which imputes values in the i-th feature dimension using only non-missing values in that feature dimension (e.g. impute.SimpleImputer). By contrast, multivariate imputation algorithms use the entire set of available feature dimensions to estimate the missing values (e.g. impute.IterativeImputer).

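A minimal sketch contrasting the two, using the Scikit-Learn classes mentioned above (note that IterativeImputer is still exposed through an experimental import); the data is made up:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 -- needed to expose IterativeImputer
from sklearn.impute import SimpleImputer, IterativeImputer

X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [4.0, np.nan],
              [np.nan, 10.0]])

# Univariate: each column is imputed using only that column (here, its mean)
univariate = SimpleImputer(strategy="mean").fit_transform(X)

# Multivariate: each feature with missing values is modelled from the other features
multivariate = IterativeImputer(random_state=0).fit_transform(X)

print(univariate)
print(multivariate)
```
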
24
Q

What is feature scaling, why do we scale the features, and what are the two most common ways to scale them?

A

1) Feature scaling is transforming the data so that all features are on the same scale. 2) With few exceptions, machine learning algorithms do not perform well when the input numerical attributes have very different scales. 3) Most optimization algorithms will also slow down considerably if the parameters do not have the same scale. 4) The two most common ways are min-max scaling and standardization.

25
Q

What is min-max scaling (often called normalization)?

A

It is a form of feature scaling. The values are shifted and rescaled so that they end up ranging from 0 to 1. We do this by subtracting the min value and dividing by the max minus the min. Scikit-Learn provides a transformer called MinMaxScaler for this.

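A minimal sketch, assuming Scikit-Learn is available; the values are made up:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [5.0], [10.0]])  # made-up values

# Manual min-max scaling: (x - min) / (max - min)
manual = (X - X.min()) / (X.max() - X.min())

# The same with Scikit-Learn's MinMaxScaler (default feature_range is (0, 1))
scaled = MinMaxScaler().fit_transform(X)

print(manual.ravel())
print(scaled.ravel())
```
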
26
Q

What is standardization ?

A

It is a form of feature scaling. In statistics, standardization is the process of putting different variables on the same scale: we transform the data so it has mean 0 and standard deviation 1. To standardize a variable, you calculate its mean and standard deviation; then, for each observed value of the variable, you subtract the mean and divide by the standard deviation.

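A minimal sketch of standardization, done manually and with Scikit-Learn's StandardScaler; the values are made up:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])  # made-up values

# Manual standardization: subtract the mean, divide by the standard deviation
manual = (X - X.mean()) / X.std()

# The same with Scikit-Learn's StandardScaler
scaled = StandardScaler().fit_transform(X)

print(manual.ravel())
print(scaled.ravel())  # mean 0, standard deviation 1
```
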
27
Q

What are the main ways to fix underfitting?

A

1) Select a more powerful model 2) feed the training algorithm with better features 3) Reduce the constraints on the model.

28
Q

What is ensemble learning ?

A

Ensemble learning is a machine learning paradigm where multiple models (often called “weak learners”) are trained to solve the same problem and combined to get better results.

29
Q

What is the bias error?

A

The bias error is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).

30
Q

What is the variance error ?

A

The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).

31
Q

What is the variance bias trade off?

A

In statistics and machine learning, the bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa.

32
Q

How to detect if we are overfitting ?

A

If our model does much better on the training set than on the test set, then we’re likely overfitting.

33
Q

How to prevent overfitting ?

A

1) Cross-validation. 2) Train with more data. 3) Remove features. 4) Early stopping when a model is trained iteratively; there is a point of diminishing returns (mainly in deep learning). 5) Regularization. 6) Ensemble learning.

34
Q

Why should we save every model we experiment with?

A

So that you can easily come back to any model you want. Make sure you save both the hyperparameters and the trained parameters, as well as the cross-validation scores and perhaps the actual predictions.

35
Q

What is a model parameter?

A

A model parameter is a configuration variable that is internal to the model and whose value can be estimated from the data, for example the mean and variance. - They are required by the model when making predictions. - Their values define the skill of the model on your problem. - They are estimated or learned from data.

36
Q

What is a model hyperparameter?

A

A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data. They are often used in processes to help estimate model parameters. If you have to specify a value manually, then it is probably a model hyperparameter. Some examples include: the learning rate for training a neural network; the C and sigma hyperparameters for support vector machines; the k in k-nearest neighbors.

37
Q

What do we mean by fine tuning the hyperparameter ?

A

It is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process.

38
Q

What are the 6 main approaches to tuning hyperparameters?

A

1) Grid search 2) Random search 3) Bayesian optimization 4) Gradient-based optimization 5) Evolutionary optimization 6) Population-based optimization

39
Q

What is the main idea behind grid search?

A

Grid search exhaustively trains and evaluates a model for every combination of the hyperparameter values you specify. To give a concrete example, if you're using a support vector machine, you could try different values for gamma and C: grid search would train an SVM for each pair of (gamma, C) values, evaluate each using cross-validation, and select the one that did best.

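A minimal sketch of the (gamma, C) example using Scikit-Learn's GridSearchCV; the dataset and grid values are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every (gamma, C) pair in this grid is trained and scored with cross-validation
param_grid = {"gamma": [0.01, 0.1, 1.0], "C": [0.1, 1.0, 10.0]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the (gamma, C) pair that did best
print(search.best_score_)   # its mean cross-validation score
```
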
40
Q

What is the common "recipe" that almost all machine learning algorithms follow?

A

1) a dataset, 2) a cost function, 3) an optimization procedure, 4) and a model.

41
Q

What are false positive and false negative?

A

1) A false positive is when your prediction states positive but the true value is negative. 2) A false negative is when your prediction states negative but the true value is positive.

42
Q

When using a classifier, what do we mean by precision and recall ?

A

First, let us define some notation: TP = true positive, FP = false positive, FN = false negative. Precision = TP / (TP + FP). Recall = TP / (TP + FN).

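A small sketch computing both quantities from a confusion matrix and checking them against Scikit-Learn's built-in metrics; the labels are made up:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # predicted labels (made up)

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(tp / (tp + fp), precision_score(y_true, y_pred))  # precision, both ways
print(tp / (tp + fn), recall_score(y_true, y_pred))     # recall, both ways
```
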
43
Q

What is a confusion matrix?

A

It is an N×N matrix showing the number of TP, TN, FP, and FN for each class.

44
Q

What is the accuracy of a classifier?

A

Accuracy = (TP+TN)/(TP+TN+FN+FP)

45
Q

What is the Precision/Recall trade-off?

A

A trade-off means that increasing one quantity leads to a decrease in the other. In this case, increasing precision leads to a decrease in recall, and vice versa.

46
Q

What is the ROC curve?

A

It is a tool that plots the sensitivity (recall, or true positive rate) against 1 − specificity (the false positive rate). Sensitivity measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition). Specificity measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).

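A minimal sketch with Scikit-Learn's roc_curve, which returns the false positive rate (1 − specificity) and the true positive rate (sensitivity) at each threshold; the scores are made up:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                      # actual labels (made up)
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]   # classifier scores (made up)

# fpr = 1 - specificity, tpr = sensitivity (recall), one point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

print(list(zip(fpr, tpr)))               # the points that would be plotted
print(roc_auc_score(y_true, y_scores))   # area under the ROC curve
```
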
47
Q

How is k-fold cross-validation performed?

A

In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data.

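A minimal sketch, assuming Scikit-Learn is available; the choice of dataset and model is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# cv=5: the data is split into 5 folds; each fold serves exactly once as validation data
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(scores)          # one score per fold
print(scores.mean())   # the usual summary over the k runs
```
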
48
Q

An important theoretical result of statistics and machine learning is the fact that a model’s generalization error can be expressed as the sum of three very different errors:

A
• Bias: This part of the generalization error is due to wrong assumptions, such as assuming that the data is linear when it is actually quadratic. A high-bias model is most likely to underfit the training data.
• Variance: This part is due to the model’s excessive sensitivity to small variations in the training data and is likely to overfit.
• Irreducible error: This part is due to the noisiness of the data itself. The only way to reduce this part of the error is to clean up the data.

49
Q

What tends to happen when we increase a model’s complexity? When we decrease it?

A

Increasing complexity: typically the variance will increase and the bias will decrease.

Decreasing complexity: typically the bias will increase and the variance will decrease.

This is the essence of the Bias/Variance trade-off.

50
Q

What do we mean by noise in the data?

A

By noise we mean data points that don’t really represent the true properties of your data, but rather random chance.

51
Q

What is the cause of overfitting?

A

Overfitting happens because your model is trying too hard to capture the noise in your training dataset.

52
Q

What do we mean by regularization? Give one example.

A

Regularization

This is a technique that constrains, regularizes, or shrinks the coefficient estimates towards zero. In other words, it discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.

Example: ridge regression, where the RSS (residual sum of squares) is modified by adding the shrinkage quantity λ × Σ βj². The coefficients are then estimated by minimizing this penalized function; λ is the tuning parameter that decides how much we want to penalize the flexibility of our model.

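A minimal sketch contrasting plain linear regression with ridge regression in Scikit-Learn, where alpha plays the role of λ; the synthetic data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=50)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha plays the role of the tuning parameter lambda

print(plain.coef_)   # unconstrained coefficient estimates
print(ridge.coef_)   # shrunk towards zero by the penalty
```
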
53
Q

If you want to know if a continuous variable is correlated with a categorical variable, what are your three options?

A

1) Logistic regression: if the resulting classifier has a high degree of fit (it is accurate, sensitive, and specific), we can conclude the two variables share a relationship and are indeed correlated. Note that logistic regression assumes a linear relationship between the continuous variable and the log-odds; testing whether this assumption holds is not straightforward.
2) Point-biserial correlation:
a) Similar to the Pearson coefficient, the point-biserial correlation can range from -1 to +1.
b) The point-biserial calculation assumes that the continuous variable is normally distributed and homoscedastic.

3) Kruskal-Wallis H test (or parametric forms such as the t-test or ANOVA):
A simple approach is to group the continuous variable using the categorical variable, measure the variance in each group, and compare it to the overall variance of the continuous variable. If the variance after grouping falls significantly, it means that the categorical variable can explain most of the variance of the continuous variable, and so the two variables likely have a strong association. (A small code sketch of options 2 and 3 follows below.)

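A small sketch of options 2 and 3 using SciPy; the data and group labels are made up:

```python
import numpy as np
from scipy import stats

# Hypothetical data: a continuous measurement and a binary group label
continuous = np.array([2.1, 3.4, 2.9, 5.6, 6.1, 5.8, 2.5, 6.4])
group = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# Point-biserial correlation (binary categorical vs. continuous), ranges from -1 to +1
r, p_value = stats.pointbiserialr(group, continuous)
print(r, p_value)

# Kruskal-Wallis H test: compare the continuous values across the groups
h, p_value = stats.kruskal(continuous[group == 0], continuous[group == 1])
print(h, p_value)
```
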
54
Q

What is Cramer V?

A

In statistics, Cramér's V is a measure of association between two nominal variables. It is based on Pearson's chi-squared statistic (a sketch of the calculation follows below).

1) It measures the association between two discrete variables and may be used with variables having two or more levels.
2) It is a symmetrical measure: it does not matter which variable we place in the columns and which in the rows.
3) It can also be applied to goodness-of-fit chi-squared models when there is a 1×k table.
4) It varies from 0 (no association) to 1 (complete association).
5) It can be a heavily biased estimator of its population counterpart.

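A small sketch computing Cramér's V from a chi-squared statistic with SciPy; the contingency table is made up:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table of two nominal variables (rows x columns)
table = np.array([[30, 10],
                  [15, 45]])

chi2, p, dof, expected = chi2_contingency(table)

n = table.sum()
k = min(table.shape) - 1            # min(number of rows, number of columns) - 1
cramers_v = np.sqrt(chi2 / (n * k))

print(cramers_v)  # 0 = no association, 1 = complete association
```
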
55
Q

How to detect if we are underfitting?

A

Your model is underfitting the training data when the model performs poorly on the training data. This is because the model is unable to capture the relationship between the input examples (often called X) and the target values (often called Y).

56
Q

What is data leakage?

A

Data leakage is when information from outside the training dataset is used to create the model.

For example: splitting candlestick (price bar) data randomly minute by minute instead of day by day, so that information from the same day leaks into both the training and test sets.

57
Q

What is data drift?

A

Data-drift is defined as a variation in the production data from the data that was used to test and validate the model before deploying it in production.

For example, Interactive Brokers did not send every trade in the live data feed but did in the historical data. This created a big difference between the live data and the training data.

58
Q

What is precision? What is another name used to describe precision?

A

precision = tp/(tp+fp)

precision = positive predictive value.

59
Q

What is recall? What is another name used to describe recall?

A

recall = tp/(tp+fn)

recall = sensitivity

recall = true positive rate

60
Q

What is specificity? What is another name used to describe specificity?

A

specificity = tn/(tn+fp)

specificity = true negative rate

61
Q

What is the F1 score and what is its formula?

A

The F1 score is the harmonic mean of precision and recall.

f1 = 2*(precision*recall)/(precision+recall)

f1 = 2tp/(2tp+fp+fn)

62
Q

What is data drift?

A

Data drift is the situation where the model’s input distribution changes over time.

63
Q

What is concept drift?

A

(Real) concept drift is the situation when the functional relationship between the model inputs and outputs changes.

The cause of the relationship change is some kind of external event or process. For example, suppose we try to predict life expectancy using geographic region as an input. As a region's development level increases (or decreases), the region loses its predictive power and our model degrades.