4404 Flashcards

1
Q

What is machine learning

A

A machine learns from experience E with respect to a class of tasks T and a performance measure P if its performance at tasks in T, as measured by P, improves with experience E

2
Q

Machine learning branches

A

Supervised | Unsupervised

Under supervised learning: Regression and Classification
Regression for continuous data and predictions; Classification for discrete data and predictions

3
Q

Vectorization

A

Using vectors and matrices to represent data and feed it into an algorithm, so computations run as whole-array operations instead of element-by-element loops. This is called vectorization.
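A minimal sketch of the idea, assuming NumPy (the card does not name a library); the data and weights are made up for illustration, and the loop and the single matrix-vector product compute the same predictions.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 samples, 2 features
w = np.array([0.5, -0.25])                          # model weights (illustrative)
b = 1.0                                             # bias term

# Element-by-element loop
preds_loop = [sum(xj * wj for xj, wj in zip(x, w)) + b for x in X]

# Vectorized: one matrix-vector product over the whole dataset
preds_vec = X @ w + b

print(preds_loop)   # [1.0, 1.5, 2.0]
print(preds_vec)    # [1.  1.5 2. ]
```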

4
Q

Cause of underfitting
Solution to underfitting

A

Cause:
The model is too simple to learn the underlying structure of the data.
Solutions:
Select a more complex model
Feed better features to the learning algorithm
Reduce regularization

Underfitting leads to high bias.

5
Q

Cause of and solutions for overfitting

A

Cause:
The model is too complex relative to the amount and noisiness of the training data.
Solutions:
Use a model with fewer parameters (e.g. linear instead of a high-degree polynomial)
Constrain the model through regularization
Use more training data
Fix data errors and remove outliers

6
Q

Meaning of the cost (y-axis) with respect to Jtrain and Jval

A

Lower cost represents fewer prediction errors; higher cost represents more.
The relationship between Jtrain and Jval shows how well the model is generalizing to the data.

7
Q

Generalization

A

Ability of a model to make accurate predictions on unseen data

8
Q

How to mitigate overfitting

A

Increase the regularization parameter
Increase the number of training samples
Use fewer features
Stop training earlier (fewer epochs)

9
Q

How to mitigate underfitting

A

Decrease the regularization parameter
Use a more complex model or more features

10
Q

Regularization

A

Prevents overfitting by adding a penalty for model complexity.
Discourages the model from fitting the training data too closely, since that data includes noise and outliers.

11
Q

Regularization parameter

A

A term added to the cost function used to train the model. It controls the trade-off between fitting the training data and keeping the parameters small.
A larger parameter penalizes large coefficients more strongly
A smaller parameter allows the model to fit the training data more closely
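A minimal sketch of how the parameter enters the cost, assuming a mean-squared-error linear-regression cost with an L2-style penalty (the exact cost and the name lam are illustrative, not from the card).

```python
import numpy as np

def regularized_cost(X, y, w, b, lam):
    m = len(y)
    errors = X @ w + b - y
    fit_term = (errors ** 2).sum() / (2 * m)   # how closely the model fits the data
    penalty = lam * (w ** 2).sum() / (2 * m)   # regularization term, grows with large weights
    return fit_term + penalty

# lam = 0 recovers the unregularized cost; a large lam pushes the
# optimizer toward small weights even at the expense of training fit.
```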

12
Q

L1 Regularization (Lasso)

A

Adds a penalty proportional to the absolute value of the coefficients.
Encourages sparsity, driving some coefficients to exactly zero and effectively selecting a simpler model with fewer features.
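A hedged sketch with scikit-learn's Lasso (the library, alpha value, and toy data are assumptions, not from the card): with enough regularization, coefficients for irrelevant features drop to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 5 features, only 2 of them actually used below
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)   # coefficients for the 3 irrelevant features end up at (or very near) zero
```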

13
Q

L2 Regularization (Ridge)

A

Adds a penalty proportional to the square of the coefficients.
Encourages small coefficients, but does not necessarily drive them to zero.
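A companion sketch with scikit-learn's Ridge (same assumed library and toy setup as the Lasso sketch above): the squared penalty shrinks all coefficients but usually leaves none exactly zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)   # small values everywhere, but typically no exact zeros
```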

15
Q

Linear Classification

A

The perceptron is a linear, supervised, binary classification algorithm that applies when the classes are linearly separable.
Training data are linearly separable if they can be separated by a linear decision rule.
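A minimal perceptron training sketch, assuming labels in {-1, +1} and a fixed learning rate (both assumptions, not stated in the card); it updates the weights only on misclassified points and converges when the data are linearly separable.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    w = np.zeros(X.shape[1])   # weight vector
    b = 0.0                    # bias
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (x_i @ w + b) <= 0:   # misclassified (or on the boundary)
                w += lr * y_i * x_i
                b += lr * y_i
    return w, b
```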

16
Q

Hyperplane

A

A decision boundary separating different classes.

17
Q

Support vectors

A

The data points closest to the hyperplane. They determine the margin; the SVM maximizes the distance between the hyperplane and the nearest data points from either class.

18
Q

Linear SVM

A

When the data points are linearly separable, an SVM finds a linear hyperplane that separates the classes

19
Q

Non-linear SVM

A

When the data are not linearly separable, an SVM uses a technique called the 'kernel trick' to implicitly map the data into a higher-dimensional space where a linear hyperplane can separate the classes
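A hedged sketch using scikit-learn's SVC on a toy dataset of concentric circles (the library, dataset, and kernel choices are assumptions): the linear kernel cannot separate the circles, while the RBF kernel, via the kernel trick, can.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print(linear_svm.score(X, y))   # roughly chance-level accuracy
print(rbf_svm.score(X, y))      # near-perfect on this toy data
```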

20
Q

Logistic regression

A

Uses the sigmoid function to classify data into two distinct classes

21
Q

Sigmoid function

A

Maps any real-valued number into the range (0, 1), i.e. y ∈ (0, 1)
The output is interpreted as the probability that the input belongs to the positive class
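A small sketch of the sigmoid, sigma(z) = 1 / (1 + e^(-z)): inputs far below zero map near 0, inputs far above zero map near 1, and 0 maps to exactly 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # approx [0.0067, 0.5, 0.9933]
```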
