Terms Flashcards
Supervised Learning
Algorithms are trained using well-labeled training data, i.e., each input example is paired with a known output
Methods of solving linear regression
Singular Value Decomposition and QR Decomposition
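A minimal sketch of both approaches with numpy, assuming a hypothetical design matrix X and target vector y:
import numpy as np
X, y = np.random.rand(100, 3), np.random.rand(100)  # made-up data
# SVD-based least squares (np.linalg.lstsq uses an SVD internally)
w_svd, *_ = np.linalg.lstsq(X, y, rcond=None)
# QR decomposition: factor X = QR, then solve R w = Q^T y
Q, R = np.linalg.qr(X)
w_qr = np.linalg.solve(R, Q.T @ y)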
Difference between stochastic gradient descent and batch gradient descent
In batch gradient descent, all samples in the training set are used to compute the loss and gradient for each update.
In stochastic gradient descent, only one training sample (or a small mini-batch) is used for each update.
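A minimal sketch of one update step of each, assuming a squared-error loss on made-up data (X, y, w, lr are hypothetical names):
import numpy as np
X, y = np.random.rand(100, 3), np.random.rand(100)
w, lr = np.zeros(3), 0.01
# Batch gradient descent: gradient computed from ALL samples
grad = 2 * X.T @ (X @ w - y) / len(y)
w = w - lr * grad
# Stochastic gradient descent: gradient from ONE randomly chosen sample
i = np.random.randint(len(y))
grad_i = 2 * X[i] * (X[i] @ w - y[i])
w = w - lr * grad_i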
Mean Square Error (MSE) for evaluating regression models
The average of the squared differences between predicted and true values; measures how close a regression line is to a set of data points
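For example, with numpy (y_true and y_pred are hypothetical arrays):
import numpy as np
y_true, y_pred = np.array([3.0, 2.0, 4.0]), np.array([2.5, 2.0, 5.0])
mse = np.mean((y_true - y_pred) ** 2)  # average squared error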
Root Mean Squared Error for evaluating regression models
The square root of the MSE; shows how far predictions fall from the measured true values, in the same units as the target
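A small sketch on the same hypothetical arrays:
import numpy as np
y_true, y_pred = np.array([3.0, 2.0, 4.0]), np.array([2.5, 2.0, 5.0])
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # same units as the target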
Bias
Error introduced by approximating the true underlying function, which can be quite complex, by a simpler model
Low Bias
Fewer assumptions are made about the form of the target function, so the model can closely match the training dataset
High Bias
More assumptions are made about the form of the target function, so the model does not match the training dataset closely and underfitting occurs
Ways to reduce high bias
- Use a more complex model: the current model is too simple
- Increase the number of features: gives the model more information with which to capture the underlying pattern
- Reduce regularization of the model: regularization prevents overfitting, which is not the problem here, since high bias causes underfitting (see the sketch after this list)
- Increase the size of the training data: provides the model with more examples to learn from
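A minimal sketch of two of these fixes with scikit-learn; the data, polynomial degree, and alpha values are purely illustrative:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
X, y = np.random.rand(100, 2), np.random.rand(100)     # made-up data
X_poly = PolynomialFeatures(degree=2).fit_transform(X)  # add features / complexity
model = Ridge(alpha=0.01)                                # weaker regularization than, say, alpha=10
model.fit(X_poly, y)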
Mean Absolute Error (MAE) for evaluating regression models
Measures the average size of the errors in a collection of predictions without taking their direction into account
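For example, with numpy on hypothetical arrays:
import numpy as np
y_true, y_pred = np.array([3.0, 2.0, 4.0]), np.array([2.5, 2.0, 5.0])
mae = np.mean(np.abs(y_true - y_pred))  # average absolute error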
Coefficient of determination (R^2)
The proportion of variance in the outcome explained by the model; measures how well a statistical model predicts an outcome (from 0 to 1)
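A small sketch of the calculation on hypothetical arrays:
import numpy as np
y_true, y_pred = np.array([3.0, 2.0, 4.0]), np.array([2.5, 2.0, 5.0])
ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot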
Variance
Tells us how much a random variable differs from its expected value; for a model, it shows how much the model's predictions and performance change when it is trained on different subsets of the training data
What is overfitting?
Increased model complexity, giving low bias and high variance.
The model does well on the training set but cannot generalize to the test set
Underfitting
A simpler model, giving high bias and low variance; it fails to capture the underlying pattern and performs poorly even on the training data
Role of training set
Used to fit the model: train the model with data
Role of validation set
Provides an unbiased evaluation of the model while fine-tuning hyperparameters.
Improves generalization of the model.
Role of test set
Data the model has never seen before.
Allows for an unbiased evaluation of the model.
Cross validation
Separate your total training set into subsets: a training set and a validation set. Evaluate the model and choose hyperparameters.
Do this iteratively, selecting different training and validation splits each time, to reduce the bias that would come from using only one validation set
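A minimal sketch of this hyperparameter search with scikit-learn's cross_val_score; the model, data, and alpha grid are illustrative:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
X, y = np.random.rand(100, 3), np.random.rand(100)   # made-up data
for alpha in (0.1, 1.0, 10.0):                        # candidate hyperparameter values
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(alpha, scores.mean())                       # pick the alpha with the best average score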
K-fold cross validation and how big should k be
Cross-validation method in which the dataset is divided into k parts (folds); each fold is used once as the validation set while the remaining k-1 folds form the training set.
A common choice is k = 5 or k = 10: larger k suits small datasets (more data left for training in each fold), while smaller k keeps the cost manageable on large datasets
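A minimal sketch of the fold indices with scikit-learn's KFold (k = 5 here, data made up):
import numpy as np
from sklearn.model_selection import KFold
X = np.arange(20).reshape(10, 2)
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    print(train_idx, val_idx)  # each fold serves exactly once as the validation set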
When do we use logistic regression
Binary classification, like a churn model.
It is still a linear model because the outcome depends on a weighted sum of the inputs and parameters, not on their products or quotients
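A minimal sketch of a churn-style binary classifier with scikit-learn; the features and labels are made up:
import numpy as np
from sklearn.linear_model import LogisticRegression
X = np.random.rand(200, 4)                     # e.g., usage and account features
y = (np.random.rand(200) > 0.5).astype(int)    # 1 = churned, 0 = stayed (fake labels)
clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba(X)[:, 1]             # predicted churn probabilities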
What is a sigmoid function
Activation function that limits output to between 0 and 1
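The formula is sigmoid(x) = 1 / (1 + e^(-x)); a one-line numpy version:
import numpy as np
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # maps any real number into (0, 1)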
List evaluation metrics for classification methods
Accuracy, Precision, Recall, F1 score, Logistic/Cross Entropy Loss
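A minimal sketch computing them with scikit-learn; the labels and probabilities are made up:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, log_loss
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8]  # predicted probability of class 1
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred),
      log_loss(y_true, y_prob))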