Machine Learning Flashcards
Keyword/Topic
Definition
Encoding (One-Hot, Bag of Words)
Encoding techniques used to convert categorical data or text into numerical format. One-hot encoding represents each category as a binary vector. Bag of words represents text as a count of word occurrences.
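A minimal sketch, assuming a recent scikit-learn is installed; OneHotEncoder and CountVectorizer are its standard implementations, and the toy data is hypothetical:

from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer

# One-hot: each category becomes its own binary column.
colors = [["red"], ["green"], ["blue"], ["green"]]
onehot = OneHotEncoder().fit_transform(colors).toarray()

# Bag of words: each document becomes a vector of word counts.
docs = ["the cat sat", "the cat and the dog"]
vec = CountVectorizer()
counts = vec.fit_transform(docs).toarray()
print(vec.get_feature_names_out(), counts)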
Colour Palettes
Diverging Color Palettes:
Used to show data that has a meaningful midpoint (e.g., positive vs. negative values). Colors transition from one hue to a neutral midpoint and then to another hue.
Sequential Color Palettes:
Designed for ordered data, where colors transition gradually from light to dark (or vice versa) to represent increasing values.
Categorical Color Palettes:
Used for distinct, non-ordered categories. Each category is represented by a unique color to ensure clear differentiation.
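An illustrative sketch, assuming matplotlib is available; the built-in colormaps "RdBu", "viridis", and "tab10" correspond to the three palette types:

import numpy as np
import matplotlib.pyplot as plt

corr = np.random.uniform(-1, 1, (5, 5))         # data with a meaningful midpoint
plt.imshow(corr, cmap="RdBu", vmin=-1, vmax=1)  # diverging palette centered on 0
plt.colorbar()
plt.show()
# "viridis" (sequential) suits ordered magnitudes;
# "tab10" (categorical) suits unordered class labels.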
CSV and HDF5
CSV is a plain-text format storing tabular data, while HDF5 is a binary format designed for large datasets, supporting hierarchical data structures.
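A minimal pandas sketch (assuming pandas is installed, plus PyTables for HDF5; the file names are hypothetical):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
df.to_csv("data.csv", index=False)           # plain text, human-readable
df.to_hdf("data.h5", key="table", mode="w")  # binary, hierarchical (needs PyTables)
back = pd.read_hdf("data.h5", key="table")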
Stevens’ Scales
Nominal:
Categories with no inherent order (e.g., eye color, nationality).
Ordinal:
Categories with a meaningful order, but the intervals between them are not necessarily equal (e.g., ratings from "poor" to "excellent").
Interval:
Ordered categories with equal intervals, but no true zero point (e.g., temperature in Celsius or Fahrenheit).
Ratio:
Ordered categories with equal intervals and a true zero point, allowing for meaningful ratios (e.g., height, weight, age).
Spearman vs Pearson
Spearman measures rank correlation (monotonic relationships). Pearson measures linear correlation between two continuous variables.
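A quick SciPy sketch showing the difference on a monotonic but nonlinear relationship:

import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11).astype(float)
y = x ** 3                       # monotonic, but not linear

r_pearson, _ = pearsonr(x, y)    # < 1: the relationship is not linear
r_spearman, _ = spearmanr(x, y)  # = 1: the ranks agree perfectly
print(r_pearson, r_spearman)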
Data Wrangling
The process of cleaning and transforming raw data into a usable format for analysis, including handling missing values, filtering, and merging datasets.
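A minimal pandas sketch covering all three steps (the table and column names are hypothetical):

import numpy as np
import pandas as pd

sales = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, np.nan, 30.0]})
users = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bo"]})

clean = (sales
         .fillna({"amount": sales["amount"].mean()})  # handle missing values
         .query("amount > 0")                         # filter rows
         .merge(users, on="id", how="left"))          # merge datasets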
MAR, MCAR, MNAR
MCAR (Missing Completely At Random):
The probability of a value being missing is unrelated to any variables, observed or unobserved.
MAR (Missing At Random):
Missing data depends on observed variables.
MNAR (Missing Not At Random):
Missing data depends on the missing values themselves.
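A small simulation can make the three mechanisms concrete (the variables age and income are hypothetical):

import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, 1000)
income = 1000 * age + rng.normal(0, 5000, 1000)

mcar = rng.random(1000) < 0.2             # MCAR: pure chance
mar = age > 60                            # MAR: depends on observed age
mnar = income > np.quantile(income, 0.8)  # MNAR: depends on income itself
income_observed = np.where(mnar, np.nan, income)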
Misclassification
A classification error where a data point is assigned the wrong class label.
Variance
A measure of the spread of data points. High variance indicates a model is sensitive to fluctuations in the training data.
Bagging
An ensemble method that reduces variance by training multiple models on bootstrapped datasets and averaging their predictions.
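A minimal sketch with scikit-learn's BaggingClassifier, whose default base learner is a decision tree:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, random_state=0)
# Each of the 50 trees sees a different bootstrap sample; predictions are averaged.
clf = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.score(X, y))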
K-Means (Elbow Method)
A clustering algorithm. The Elbow Method helps determine the optimal number of clusters by plotting the sum of squared errors for different k values.
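A sketch of the Elbow Method with scikit-learn, printing the inertia (sum of squared errors) per k:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)  # plot k vs. inertia; the "elbow" marks a good k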
Boosting
1. Initialize the model with equal weights for all data points.
2. Train a weak learner (e.g., a decision tree).
3. Calculate errors and assign higher weights to misclassified points.
4. Train the next weak learner on the weighted data.
5. Combine all weak learners to form a strong model (see the sketch below).
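AdaBoost is the classic instance of this loop; a minimal scikit-learn sketch:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)
# AdaBoost runs exactly the reweighting loop above, using decision stumps.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.score(X, y))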
Agglomerative Clustering
A bottom-up hierarchical clustering method where clusters are merged iteratively based on similarity.
Cluster Assumption
Assumes data points within the same cluster share similar properties, foundational in unsupervised learning.
Linkage Criteria (Single, Complete, Average)
The linkage criterion defines the distance between two clusters when merging: single linkage uses the minimum pairwise distance, complete linkage the maximum, and average linkage the mean of all pairwise distances, as shown below.
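A minimal scikit-learn sketch; the linkage argument selects one of the criteria above:

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
# linkage may be "single", "complete", "average", or "ward"
labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X)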
Maximum Likelihood Estimator (MLE)
A method to estimate parameters by maximizing the likelihood function, representing the probability of observing the data given the model.
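A worked sketch for a biased coin: numerically maximizing the likelihood recovers the closed-form MLE, which is the sample mean:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
flips = rng.random(100) < 0.7  # coin with unknown bias p = 0.7

def neg_log_likelihood(p):
    return -(flips.sum() * np.log(p) + (~flips).sum() * np.log(1 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, flips.mean())  # the two estimates agree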
Regression vs Classification
Regression predicts continuous values; classification predicts discrete class labels.
Mean Square Error (MSE)
A loss function for regression that calculates the average squared difference between predicted and actual values.
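In symbols, MSE = (1/n) * sum((y_pred_i - y_true_i)^2); a tiny NumPy sketch:

import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])
mse = np.mean((y_true - y_pred) ** 2)  # (0.25 + 0 + 4) / 3 ≈ 1.42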
K-Fold Cross-Validation
A model validation method that splits data into k subsets, training on k-1 and validating on the remaining fold iteratively.
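A minimal sketch with scikit-learn's cross_val_score, which handles the fold rotation internally:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy over the 5 held-out folds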
Gradient Descent & SGD
Optimization algorithms: Gradient Descent minimizes loss by iteratively adjusting weights along the negative gradient computed on the full dataset. Stochastic Gradient Descent (SGD) updates weights using one example (or a small mini-batch) at a time, which is cheaper per step but noisier.
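A NumPy sketch of both, fitting a one-parameter linear model to synthetic data:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3 * x + rng.normal(0, 0.1, 100)

w, lr = 0.0, 0.1
for _ in range(100):                      # batch gradient descent on MSE
    grad = -2 * np.mean((y - w * x) * x)  # gradient of the mean squared error
    w -= lr * grad

i = rng.integers(100)                     # SGD: one random example per update
w -= lr * (-2 * (y[i] - w * x[i]) * x[i])
print(w)  # close to the true slope of 3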
L1 and L2 Regularization
L1 adds a penalty proportional to the absolute value of weights (sparse solutions). L2 penalizes the square of weights (weight decay).
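A sketch with scikit-learn's Lasso (L1) and Ridge (L2) showing the sparsity effect:

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=3, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)  # L1: many coefficients driven exactly to zero
ridge = Ridge(alpha=1.0).fit(X, y)  # L2: coefficients shrunk, but rarely zero
print((lasso.coef_ == 0).sum(), (ridge.coef_ == 0).sum())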
K-Nearest Neighbors (KNN)
A classification/regression algorithm that predicts based on the majority vote or average of the k closest data points.
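A minimal scikit-learn sketch on the Iris dataset:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:3]))  # majority vote among the 5 nearest neighbors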
Logistic Regression (Softmax)
A classification algorithm. Softmax generalizes logistic regression for multi-class classification.
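The softmax itself is a few lines of NumPy; it turns raw class scores into a probability distribution:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # non-negative, sums to 1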
No Free Lunch Theorem
No algorithm is universally best for all problems. Performance depends on the specific dataset.
Bias
Bias is the error introduced by approximating real-world problems with simplified models. High bias leads to underfitting.
Overfitting
A model captures noise in the training data, performing poorly on new data. Regularization and validation sets help mitigate this.
Validation Set
A subset of data used to tune model hyperparameters and assess performance during training.
Testing Set
A hold-out dataset used to evaluate the final performance of a trained model.
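A sketch of carving out training, validation, and testing sets with scikit-learn's train_test_split (a 60/20/20 split):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# Tune hyperparameters against (X_val, y_val); touch (X_test, y_test) only once, at the end.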
Reduce Variance in Trees
Techniques like bagging, random forests, and pruning reduce overfitting and variance in decision trees.
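A sketch contrasting a single tree with a random forest (assuming scikit-learn):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)  # single tree: high variance
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# The forest averages many decorrelated trees; pruning a single tree
# (max_depth, ccp_alpha) is another way to limit variance.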