Fundamentals of Machine Learning I (Not Part of Certification) Flashcards by Unknown Unknown

Characteristics of Regression?

Predicts a numeric value (the label) from one or more features
Fits a mathematical function on a training split of the data, checks it on a held-back validation split, and refines it to reduce (penalize) prediction error
Supervised
Metrics
- MAE, MSE, RMSE, R^2
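
As a rough illustration of the four regression metrics listed above, here is a minimal sketch in plain Python. The function name `regression_metrics` and the sample values are hypothetical and exist only for this example.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # back in the label's units
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)            # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    r2 = 1 - ss_res / ss_tot                       # share of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical labels and predictions
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because MSE squares each error, the single miss of 2 contributes more to MSE and RMSE than the two misses of 1 combined.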

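What is binary classification, and what are its characteristics?

Predicts one of two classes (for example, diabetic vs. not diabetic)
Supervised
Outputs a probability for the positive class
Maps the model output to a probability with a sigmoid (S-shaped) curve, then applies a threshold (commonly 0.5) to pick the class
Logistic regression is a binary classifier
Trained on a random subset of the data, with the rest held back for validation
Evaluated with a confusion matrix
Metrics: F1, Recall, Precision, AUC, TPR, FPR, ROC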
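
To make the sigmoid step concrete, the sketch below turns a raw model score into a probability and then into a class. The score `z` stands in for whatever a trained model would produce, and the 0.5 threshold is a common default rather than something the card prescribes.

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z, threshold=0.5):
    """Return 1 (positive) if the probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Hypothetical raw scores from a model
for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), predict_class(z))
```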

What is RMSE

RMSE (Root Mean Squared Error): the square root of the MSE. It measures the typical size of a prediction error in the same units as the label; lower is better.

What is MAE

MAE (Mean Absolute Error): the mean of the absolute differences between predicted and actual values; lower is better.

What is F1

F1 Score

Harmonic mean of Precision and Recall.
Formula: 2 * (Precision * Recall) / (Precision + Recall).
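
A small worked example of the formula, starting from confusion-matrix counts. The counts and the helper name `precision_recall_f1` are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts, guarding against 0/0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall, f1)
```

With precision 0.8 and recall 0.5, F1 comes out below the simple average of 0.65, because the harmonic mean is pulled toward the weaker of the two values.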

What is Recall

Recall (Sensitivity, TPR)

Proportion of actual positives correctly identified by the model.
Formula: TP / (TP + FN).

What is AUC

AUC (Area Under the Curve)

Measures overall classification performance as the area under the ROC curve.
Values range from 0 to 1, with 1 being perfect and 0.5 representing random guessing.

What is TPR

TPR (True Positive Rate, Recall)

Proportion of actual positives classified as positive.
Formula: TP / (TP + FN).

What is Precision

Precision

Proportion of predicted positives that are actual positives.
Formula: TP / (TP + FP).

What is FPR

FPR (False Positive Rate)

Proportion of actual negatives incorrectly classified as positive.
Formula: FP / (FP + TN).
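
The sketch below counts TP, FP, TN and FN from a list of true labels and predicted labels (1 = positive, 0 = negative), then applies the TPR and FPR formulas. The labels are invented for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 means positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

# Hypothetical true labels and predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
tpr = tp / (tp + fn)   # share of actual positives that were caught
fpr = fp / (fp + tn)   # share of actual negatives that were flagged
print(tp, fp, tn, fn, tpr, fpr)
```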

What is ROC

ROC (Receiver Operating Characteristic) Curve

Plots TPR vs. FPR across different thresholds.
Used to assess model performance over varying decision boundaries.
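
One way to see how the curve and its area (AUC) arise is to sweep the threshold over every predicted score, record an (FPR, TPR) point each time, and sum the area with the trapezoid rule. This is a minimal sketch with invented scores; the names `roc_points` and `auc` are hypothetical helpers, not a library API.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per threshold, from strictest to loosest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the piecewise-linear curve through the points (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels and predicted probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(labels, scores)
area = auc(points)
print(points, area)
```

Here one negative (0.4) outranks one positive (0.35), so the area is 0.75 rather than a perfect 1.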

What is R^2

R^2 (Coefficient of determination): the proportion of the variance in the label that the model explains. Values closer to 1 indicate a better fit.

What is MSE

MSE (Mean Squared Error): the mean of the squared differences between predicted and actual values. Squaring amplifies larger errors, so big misses count more than small ones.

What is multiclass classification?

Multiclass classification is used to predict which of multiple possible classes an observation belongs to. It calculates probability values for each class label and predicts the most probable class.

(Supervised)

What are the two types of algorithms used in multiclass classification?

The two types of algorithms are:

One-vs-Rest (OvR): Trains a binary classification function for each class.
Multinomial: Creates a single function that returns a probability distribution for all possible classes.

How does the One-vs-Rest (OvR) algorithm work?

The OvR algorithm trains separate binary classification functions for each class. Each function calculates the probability that an observation belongs to a specific class compared to all others, and the class with the highest probability is predicted.
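
A toy sketch of the prediction step: each class has its own binary function, and the class whose function returns the highest probability wins. The class names, the single feature `x`, and the (weight, bias) pairs are all hypothetical stand-ins for trained models.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (weight, bias) per class, as if each binary model were already trained
binary_models = {
    "square": (-2.0, 1.0),
    "circle": (0.5, 0.0),
    "triangle": (2.0, -3.0),
}

def ovr_predict(x, models):
    """Score x with every class-vs-rest function and return the most probable class."""
    probs = {label: sigmoid(w * x + b) for label, (w, b) in models.items()}
    best = max(probs, key=probs.get)
    return best, probs

label, probs = ovr_predict(3.0, binary_models)
print(label, probs)
```

Note that the per-class probabilities come from independent binary functions, so they are not required to add up to 1.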

What is a multinomial algorithm, and how does it work?

A multinomial algorithm creates a single function that returns a vector with probability values for each class. These values add up to 1, and the class with the highest probability is predicted. An example is the softmax function.
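
The softmax function mentioned above can be sketched in a few lines. The input scores are hypothetical raw outputs, one per class; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))       # index of the most probable class
print(probs, predicted)
```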

How can you evaluate a multiclass classification model?

You can evaluate by calculating binary classification metrics for each class or aggregate metrics across all classes. Metrics like accuracy, recall, precision, and F1-score can be derived from a multiclass confusion matrix.

How are binary classification metrics used in evaluating multiclass classification models?

In multiclass classification, binary metrics such as accuracy, recall, precision, and F1-score can be calculated for each individual class. These are derived from the confusion matrix, where True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) are recorded for each class. Aggregate metrics can then be calculated for overall model evaluation.
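
A sketch of how the per-class counts fall out of a multiclass confusion matrix. It assumes rows are actual classes and columns are predicted classes (a common layout, but an assumption here), and the matrix values are invented.

```python
def per_class_metrics(matrix):
    """Derive TP/FP/TN/FN, precision, recall and F1 for each class of a square matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                         # actual k, predicted other
        fp = sum(matrix[r][k] for r in range(n)) - tp    # predicted k, actually other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        results.append({"TP": tp, "FP": fp, "TN": tn, "FN": fn,
                        "precision": precision, "recall": recall, "f1": f1})
    return results

# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted)
matrix = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 3],
]
results = per_class_metrics(matrix)
accuracy = sum(matrix[k][k] for k in range(len(matrix))) / sum(map(sum, matrix))
macro_recall = sum(r["recall"] for r in results) / len(results)  # one aggregate option
print(results, accuracy, macro_recall)
```

Averaging the per-class values with equal weight (a macro average) is only one way to aggregate; weighting by class size is another.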

What metrics can be derived from a multiclass confusion matrix?

Metrics derived from a multiclass confusion matrix include:

Accuracy, Recall, Precision, F1-Score

Define Unsupervised Learning

Unsupervised learning uses data without labels to find patterns or groupings. Examples: clustering, dimensionality reduction.

Define Supervised Learning

Supervised learning uses labeled data (features and known outcomes) to train a model to make predictions. Examples: classification, regression.

What is clustering in machine learning?

Clustering is an unsupervised learning method where observations are grouped into clusters based on similarities in their features, without using labels.

Why is clustering considered unsupervised learning?

Clustering is unsupervised because it doesn’t rely on known label values. Instead, it groups data points based solely on feature similarities.

What is an example of clustering?

In a flower dataset with features like the number of leaves and petals, clustering groups similar flowers based on these features without knowing their species.

What is the centroid process in K-Means clustering?

1. Initialization: Choose the number of clusters k and randomly select k initial centroids.
2. Assignment Step: Calculate the distance from each point to every centroid, and assign each point to its nearest centroid.
3. Update Step: Recalculate each centroid as the mean of the points assigned to it.
4. Repeat: Iterate the assignment and update steps until the centroids stabilize or a maximum number of iterations is reached.
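
The four steps can be sketched in plain Python as follows. This is a teaching sketch, not a production implementation: the function name `kmeans`, the fixed random seed, the choice of initial centroids from the data points themselves, and the sample points are all assumptions made for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # 1. initialization
    clusters = []
    for _ in range(max_iter):
        # 2. assignment: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # 3. update: each centroid becomes the mean of its assigned points
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)          # keep an empty cluster's centroid
        # 4. repeat until the centroids stop moving
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Squared distance is used for the comparison because it ranks centroids in the same order as true Euclidean distance while skipping the square root.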

What is K-Means clustering?

K-Means is a clustering algorithm that assigns data points to clusters by minimizing the distance to the cluster's centroid, repeatedly adjusting centroids until stable clusters are formed.

How do you evaluate a clustering model?

Clustering models are evaluated by how well the clusters are separated, using metrics like average distance to centroid, maximum distance to centroid, and silhouette score.

What is a silhouette score in clustering?

A silhouette score measures cluster separation, ranging from -1 to 1, where values closer to 1 indicate better-defined clusters.
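
The card does not give a formula, so the sketch below uses the standard definition: for each point, a is the mean distance to the other points in its own cluster, b is the lowest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The overall score is the mean over all points. The function name and sample clusters are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def silhouette_score(clusters):
    """Mean silhouette over all points; clusters is a list of lists of tuples."""
    scores = []
    for i, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)        # convention for single-point clusters
                continue
            # a: mean distance to the other members (distance to itself is 0)
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: mean distance to the nearest other cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i and other)
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical groupings of the same four points
well_separated = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
badly_grouped = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
good = silhouette_score(well_separated)
bad = silhouette_score(badly_grouped)
print(good, bad)
```

Grouping nearby points together gives a score close to 1, while splitting neighbors across clusters drives the score below 0.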

What metrics are used to evaluate clusters?

Metrics include average distance to cluster center, average distance to other centers, maximum distance to center, and silhouette score.

What is a Centroid

A centroid is the central point of a cluster, representing the mean position of all data points assigned to that cluster in the feature space. It serves as the reference for assigning new points to clusters in algorithms like K-Means.

(31 cards)