General Flashcards

1
Q

Stochastic Gradient Descent

A

Gradient descent variant that updates the parameters using a single observation at a time.

More efficient than batch gradient descent, especially with large datasets.
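
A minimal NumPy sketch of SGD fitting a line (y ≈ 3x + 2); the toy data, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 1000)
y = 3 * X + 2 + rng.normal(0, 1, 1000)    # toy data: y ≈ 3x + 2

w, b, lr = 0.0, 0.0, 0.001
for epoch in range(20):
    for i in rng.permutation(len(X)):     # update on one observation at a time
        err = (w * X[i] + b) - y[i]
        w -= lr * 2 * err * X[i]          # gradient of squared error at one point
        b -= lr * 2 * err
print(w, b)                               # approaches 3 and 2
```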

2
Q

Batch Gradient Descent

A

Gradient descent variant that scans the entire training set to compute the gradient before taking a single step.

3
Q

Localized Linear Regression

A

A variant of traditional linear regression that uses only local data points around a query point xᵢ to predict yᵢ (often called locally weighted regression).
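
A minimal NumPy sketch of one locally weighted prediction, assuming a Gaussian kernel whose bandwidth tau is an illustrative choice:

```python
import numpy as np

def locally_weighted_prediction(X, y, x_query, tau=0.5):
    # Weight each training point by its closeness to the query point.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    Xb = np.hstack([np.ones((len(X), 1)), X])   # add an intercept column
    W = np.diag(w)
    # Solve the weighted least-squares problem for the local parameters.
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return np.concatenate([[1.0], x_query]) @ theta

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.1, 1.9, 3.2])
print(locally_weighted_prediction(X, y, np.array([1.5])))
```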

4
Q

Type I Error (False Positive)

A

Incorrectly rejecting the null hypothesis in favor of the alternative hypothesis when the null is true.

Its probability is alpha (the significance level), set at the beginning of the experiment.

5
Q

Type II Error (False Negative)

A

Failing to reject the null hypothesis when it is false

Also known as beta. Note that power is (1 - beta)

6
Q

A/B Testing

A

Process of testing two groups against a desired metric to determine whether there is a statistically significant difference between them

General Strategy:

  1. Identify comparative statistic
  2. Determine sample size
  3. Analyze results (e.g., with the test sketch below)
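
A minimal sketch of step 3 for conversion-rate data, using a two-proportion z-test; the visitor and conversion counts are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return z, 2 * stats.norm.sf(abs(z))                 # z and two-sided p-value

z, p = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=240, n_b=5000)
print(z, p)   # reject H0 if p < alpha
```
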
7
Q

SVMs : General description

A

Simple: Machine learning model that uses a hyperplane to differentiate and classify different groups of data.

Detailed: An SVM identifies an appropriate hyperplane by maximizing the margin between the decision boundary and the closest points of each class.

If the data cannot be separated linearly, use transformations (kernels) to map it into higher dimensions.
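
A minimal scikit-learn sketch on a toy dataset that is not linearly separable; the RBF kernel supplies the higher-dimensional mapping, and C (see the soft-margin card below) sets the misclassification penalty.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0)   # C controls the soft-margin penalty
clf.fit(X, y)
print(clf.score(X, y))           # training accuracy
```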

8
Q

SVMs: Soft Margin Classification

A

A mechanism that reduces the overfitting of maximum-margin classification by allowing, but penalizing, misclassifications

9
Q

Bias / Variance Tradeoff

A

A tradeoff in machine learning models where you have the choice of reducing bias (error from the model failing to fit the data well) vs. reducing variance (how much the performance of a model varies across different datasets).

10
Q

Precision

A

TP / (TP + FP)

Measures the accuracy of positive predictions (but not necessarily identifying all of them).

11
Q

Recall

A

TP / (TP + FN)

Measures completeness of positive predictions

12
Q

F1 Score

A

2 / ((1/Precision) + (1/Recall))

Harmonic mean of precision and recall, ranging between 0 and 100%
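
A minimal sketch computing precision, recall, and F1 from confusion-matrix counts; the counts are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 / ((1 / precision) + (1 / recall))   # harmonic mean
    return precision, recall, f1

print(precision_recall_f1(tp=80, fp=20, fn=40))   # (0.8, 0.667, 0.727)
```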

13
Q

ROC Curve

A

Plots true positive rate (recall) against false positive rate. A good ROC curve goes toward the top left of the chart.

X = FPR
Y = TPR

14
Q

False Positive Rate
(1 - Specificity)

A

Proportion of negative instances that are incorrectly classified as positive (i.e. false positive)

FP / (FP + TN)

15
Q

Lasso Regression

A

Linear regression with an L1 penalty on the coefficient magnitudes added to the cost function. It shrinks coefficients and can drive some exactly to zero, performing automatic feature selection (see the sketch on the Elastic Net card).

16
Q

Elastic Net

A

Linear regression with a weighted mix of the L1 (lasso) and L2 (ridge) penalties, balancing lasso's sparsity with ridge's stability.
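
A minimal scikit-learn sketch of both penalties on toy data; alpha and l1_ratio are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)                        # pure L1 penalty
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)      # L1/L2 mix
print((lasso.coef_ == 0).sum(), (enet.coef_ == 0).sum())  # coefficients zeroed out
```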

17
Q

Early Stopping

A

A way of regularizing a model by stopping training once validation error reaches a minimum
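
A minimal sketch using scikit-learn's SGDRegressor, which holds out a validation fraction and stops once the validation score stops improving; the parameter values are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5, random_state=0)
model = SGDRegressor(early_stopping=True, validation_fraction=0.2,
                     n_iter_no_change=5, max_iter=1000, random_state=0)
model.fit(X, y)        # stops when validation error reaches a minimum
print(model.n_iter_)   # epochs actually run before stopping
```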

18
Q

Softmax Regression

A

Also known as multinomial logistic regression.

Classification with multiple classes. For each instance x, computes a score s_k(x) for each class k, then estimates class probabilities by applying the softmax function:

p_k = exp(s_k(x)) / Σ_j exp(s_j(x))
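
A minimal NumPy sketch of the softmax function; subtracting the max score is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))   # stabilize before exponentiating
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))    # probabilities summing to 1
```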

19
Q

Cross-entropy

A

Loss function used to measure difference between predicted and true probability distributions. Penalizes low probability on true labels significantly.

-(1/m) Σ_i Σ_k y_k(i) log(p_k(i)), where y_k(i) is 1 if k is the true class of instance i and 0 otherwise

Essentially the mean of -log(estimated probability of the true class).
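
A minimal NumPy sketch with one-hot labels; the small epsilon guarding log(0) and the toy predictions are illustrative.

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """Mean cross-entropy; y_true is one-hot (m, k), y_pred holds probabilities."""
    eps = 1e-12   # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cross_entropy(y_true, y_pred))   # mean of -log(0.7) and -log(0.8) ≈ 0.29
```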

20
Q

Accuracy

A

Number of correctly classified instances / number of all classified instances

21
Q

True Positive Rate
(Sensitivity)

A

Proportion of positive instances that are correctly classified as positive (i.e. true positive)

TP / (TP + FN)

22
Q

Specificity

A

True Negative Rate

Proportion of negative instances that are correctly classified as negative (i.e. true negative)

TN / (TN + FP)

23
Q

Gradient Descent

A

An algorithm that minimizes a particular function (in ML, the loss function) by taking small steps in the direction of steepest descent of that function.

Step 1: Take the derivative of the loss function for each parameter (i.e. take the gradient of the loss function).

Step 2: Initialize parameters with random values

Step 3: Plug parameters into the partial derivatives (gradient)

Step 4: Calculate the step sizes (slope from step 3 * learning rate)

Step 5: Calculate the new parameters (New = Old - Step Size)

Step 6: Repeat steps 3-5 until convergence (see the sketch below)
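
A minimal NumPy sketch of these steps for one-feature linear regression with squared-error loss; the toy data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 100)
y = 3 * X + 2 + rng.normal(0, 1, 100)     # toy data: y ≈ 3x + 2

w, b = 0.0, 0.0                           # Step 2: initialize parameters
lr = 0.01                                 # learning rate
for _ in range(5000):                     # Step 6: repeat until convergence
    y_hat = w * X + b
    grad_w = (2 / len(X)) * np.sum((y_hat - y) * X)   # Steps 1+3: gradient of MSE
    grad_b = (2 / len(X)) * np.sum(y_hat - y)
    w -= lr * grad_w                      # Steps 4-5: new = old - slope * lr
    b -= lr * grad_b
print(w, b)                               # approaches 3 and 2
```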

24
Q

Steps for K-fold Cross Validation

A

Step 1: Shuffle the data and split it into k equally sized blocks (folds)

Step 2: For each fold i, train the model on all data except fold i, and evaluate the validation error on fold i

Step 3: Average the validation errors from step 2 to get an estimate of the true error (see the sketch below)
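
A minimal scikit-learn sketch of the three steps with k = 5 on toy regression data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
errors = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])   # Step 2: train
    errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))
print(np.mean(errors))   # Step 3: average validation error
```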

25
Q

Bootstrapping

A

Drawing observations from a large data sample repeatedly (sampling with replacement) and then estimating some quantity of a population by averaging estimates from multiple smaller samples.

Useful for small data sets and helping to deal with class imbalance.
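
A minimal NumPy sketch estimating a mean and its 95% confidence interval by resampling with replacement; the data and resample count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=500)   # original sample

# Draw many resamples (with replacement) and record the statistic of interest.
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]
print(np.mean(boot_means), np.percentile(boot_means, [2.5, 97.5]))
```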

26
Q

Hyperparameter tuning:
Grid search

A

Forming a grid that is the Cartesian product of all hyperparameter values, then sequentially trying every combination and seeing which yields the best results.
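
A minimal scikit-learn sketch using GridSearchCV; the parameter values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}   # Cartesian product: 9 combos
search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```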

27
Q

Hyperparameter tuning:
Random Search

A

Randomly sample from the joint distribution of all parameters.
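
A minimal scikit-learn sketch using RandomizedSearchCV with log-uniform sampling distributions; the ranges and budget (n_iter) are illustrative.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
dists = {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)}
search = RandomizedSearchCV(SVC(), dists, n_iter=20, cv=5,
                            random_state=0).fit(X, y)
print(search.best_params_, search.best_score_)
```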

28
Q

ROC Curve
AUC?

A

Plots the true positive rate (y) against the false positive rate (x) for various thresholds.

Area under the curve (AUC) measures how well the classifier separates classes.
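
A minimal scikit-learn sketch computing the curve and the AUC; the labels and scores are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # predicted probabilities
fpr, tpr, thresholds = roc_curve(y_true, y_score)     # X = FPR, Y = TPR
print(roc_auc_score(y_true, y_score))                 # 1.0 = perfect separation
```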

29
Q

Conditional Probability

P(A | B)

A

P(A ∩ B) / P(B)

30
Q

Bayes Theorem

A

P(H|E) = P(H) * P(E|H) / P(E)
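
A worked example with assumed numbers (a diagnostic-test scenario), showing how the pieces of the theorem combine:

```python
p_h = 0.01            # P(H): prior probability of the hypothesis (has disease)
p_e_given_h = 0.95    # P(E|H): probability of a positive test if diseased
p_e_given_not = 0.05  # P(E|not H): false positive rate

# P(E) by total probability, then the posterior via Bayes' theorem.
p_e = p_e_given_h * p_h + p_e_given_not * (1 - p_h)
p_h_given_e = p_h * p_e_given_h / p_e
print(p_h_given_e)    # ≈ 0.16: a positive test alone leaves the disease unlikely
```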