Machine Learning Flashcards

1
Q

Keyword/Topic

A

Definition

2
Q

Encoding (One-Hot, Bag of Words)

A

Encoding techniques used to convert categorical data or text into numerical format. One-hot encoding represents each category as a binary vector. Bag of words represents text as a count of word occurrences.
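
A minimal sketch of both encodings using scikit-learn (assumed available; the sparse_output parameter needs scikit-learn ≥ 1.2, and all data is illustrative):

```python
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer

# One-hot: each category becomes a binary indicator column.
enc = OneHotEncoder(sparse_output=False)
colours = [["red"], ["green"], ["red"]]
print(enc.fit_transform(colours))  # columns ordered alphabetically: [green, red]

# Bag of words: each document becomes a vector of word counts.
vec = CountVectorizer()
corpus = ["the cat sat", "the cat ate"]
print(vec.fit_transform(corpus).toarray())
print(vec.get_feature_names_out())  # the vocabulary behind each column
```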

3
Q

Colour Palettes

A

Diverging Color Palettes:
Used to show data that has a meaningful midpoint (e.g., positive vs. negative values). Colors transition from one hue to a neutral midpoint and then to another hue.

Sequential Color Palettes:
Designed for ordered data, where colors transition gradually from light to dark (or vice versa) to represent increasing values.

Categorical Color Palettes:
Used for distinct, non-ordered categories. Each category is represented by a unique color to ensure clear differentiation.
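
If seaborn is available, the three palette families can be requested by name; the palette names below are illustrative choices, not the only options:

```python
import seaborn as sns

diverging   = sns.color_palette("coolwarm", 7)  # two hues around a neutral midpoint
sequential  = sns.color_palette("Blues", 7)     # light-to-dark for ordered values
categorical = sns.color_palette("tab10", 7)     # distinct hues for unordered categories
```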

4
Q

CSV and HDF5

A

CSV is a plain-text format storing tabular data, while HDF5 is a binary format designed for large datasets, supporting hierarchical data structures.
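
A small pandas sketch of writing and reading both formats (writing HDF5 additionally requires the PyTables package; filenames are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

df.to_csv("data.csv", index=False)           # plain text, one row per line
df.to_hdf("data.h5", key="table", mode="w")  # binary, hierarchical keys

same_csv = pd.read_csv("data.csv")
same_h5 = pd.read_hdf("data.h5", key="table")
```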

5
Q

Stevens’ Scales

A

Nominal:
Categories with no inherent order (e.g., colours, blood types).
Ordinal:
Categories with a meaningful order, but the intervals between them are not necessarily equal (e.g., survey ratings).
Interval:
Ordered categories with equal intervals, but no true zero point (e.g., temperature in Celsius or Fahrenheit).
Ratio:
Ordered categories with equal intervals and a true zero point, allowing for meaningful ratios (e.g., height, weight, age).

6
Q

Spearman vs Pearson

A

Spearman measures rank correlation (monotonic relationships). Pearson measures linear correlation between two continuous variables.
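
A quick SciPy illustration on toy data: y = x² is perfectly monotonic but not linear, so Spearman's rho is 1 while Pearson's r falls below 1:

```python
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # y = x**2: monotonic, not linear

r, _ = pearsonr(x, y)     # < 1: relationship is not perfectly linear
rho, _ = spearmanr(x, y)  # = 1: the ranks agree perfectly
print(r, rho)
```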

7
Q

Data Wrangling

A

The process of cleaning and transforming raw data into a usable format for analysis, including handling missing values, filtering, and merging datasets.
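
A minimal pandas sketch of the three steps mentioned above, on illustrative data:

```python
import pandas as pd

sales = pd.DataFrame({"id": [1, 2, 3], "amount": [100.0, None, 250.0]})
users = pd.DataFrame({"id": [1, 2], "region": ["EU", "US"]})

sales["amount"] = sales["amount"].fillna(sales["amount"].mean())  # impute missing value
large = sales[sales["amount"] > 150]                              # filter rows
merged = sales.merge(users, on="id", how="left")                  # merge datasets
```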

8
Q

MAR, MCAR, MNAR

A

MCAR (Missing Completely At Random):
Missing data is random and unrelated to any variables.

MAR (Missing At Random):
Missing data depends on observed variables.

MNAR (Missing Not At Random):
Missing data depends on the missing values themselves.

9
Q

Misclassification

A

A classification error where a data point is assigned the wrong class label.
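
With scikit-learn's metrics, misclassifications show up as the complement of accuracy and as off-diagonal counts in the confusion matrix; the labels below are illustrative:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]  # one point assigned the wrong class

print(1 - accuracy_score(y_true, y_pred))  # misclassification rate ≈ 0.2
print(confusion_matrix(y_true, y_pred))    # off-diagonal entries are the errors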

10
Q

Variance

A

A measure of the spread of data points. High variance indicates a model is sensitive to fluctuations in the training data.

11
Q

Bagging

A

An ensemble method that reduces variance by training multiple models on bootstrapped datasets and averaging their predictions.
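
A minimal scikit-learn sketch on toy data (BaggingClassifier's default base learner is a decision tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# 50 trees, each trained on a bootstrap sample; predictions are aggregated.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print(bag.score(X, y))
```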

12
Q

K-Means (Elbow Method)

A

A clustering algorithm. The Elbow Method helps determine the optimal number of clusters by plotting the sum of squared errors for different k values.
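
One common sketch of the Elbow Method: fit K-Means for a range of k and plot the inertia (sum of squared errors), looking for the bend; the data here are simulated blobs:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

plt.plot(ks, inertias, marker="o")  # the "elbow" suggests the optimal k
plt.xlabel("k")
plt.ylabel("sum of squared errors")
plt.show()
```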

13
Q

Boosting

A

1. Initialize the model with equal weights for all data points.
2. Train a weak learner (e.g., a decision tree).
3. Calculate errors and assign higher weights to misclassified points.
4. Train the next weak learner on the weighted data.
5. Combine all weak learners to form a strong model.
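
This reweighting procedure describes AdaBoost; a minimal scikit-learn sketch, using its default decision-stump base learner:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)

# 50 sequentially trained weak learners, each focusing on prior mistakes.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(ada.score(X, y))
```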

14
Q

Agglomerative Clustering

A

A bottom-up hierarchical clustering method where clusters are merged iteratively based on similarity.
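
A minimal scikit-learn sketch on simulated blobs (the linkage options are covered in the next card):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

# Start with each point as its own cluster; merge until 3 remain.
labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X)
```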

15
Q

Cluster Assumption

A

Assumes data points within the same cluster share similar properties; foundational in unsupervised learning.

Linkage criteria for measuring the distance between clusters in hierarchical clustering:
Single: minimum pairwise distance between the two clusters.
Complete: maximum pairwise distance.
Average: mean pairwise distance.
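
The three criteria map directly onto SciPy's hierarchical-clustering API; toy data below:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.default_rng(0).normal(size=(20, 2))

Z_single   = linkage(X, method="single")    # merge by min pairwise distance
Z_complete = linkage(X, method="complete")  # merge by max pairwise distance
Z_average  = linkage(X, method="average")   # merge by mean pairwise distance
```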

16
Q

Maximum Likelihood Estimator (MLE)

A

A method to estimate parameters by maximizing the likelihood function, representing the probability of observing the data given the model.
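
A small sketch for a Gaussian: minimise the negative log-likelihood numerically and compare with the closed-form MLE (the sample mean and standard deviation); the data are simulated:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

data = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=1000)

def neg_log_lik(params):
    mu, sigma = params
    return -norm.logpdf(data, mu, sigma).sum()  # -log L(mu, sigma | data)

res = minimize(neg_log_lik, x0=[0.0, 1.0],
               bounds=[(None, None), (1e-6, None)])  # sigma must stay positive
print(res.x)                    # numeric MLE for (mu, sigma)
print(data.mean(), data.std())  # closed-form MLE agrees
```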

17
Q

Regression vs Classification

A

Regression predicts continuous values; classification predicts discrete class labels.

18
Q

Mean Square Error (MSE)

A

A loss function for regression that calculates the average squared difference between predicted and actual values.
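
Written out with NumPy on illustrative values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])

mse = np.mean((y_true - y_pred) ** 2)  # (0.25 + 0 + 4) / 3 ≈ 1.42
```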

19
Q

K-Fold Cross-Validation

A

A model validation method that splits data into k subsets, training on k-1 and validating on the remaining fold iteratively.
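
A minimal scikit-learn sketch with k = 5 folds:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Each fold is held out once; the model trains on the other 4 folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average validation score across the 5 folds
```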

20
Q

Gradient Descent & SGD

A

Optimization algorithms: Gradient Descent minimizes the loss by iteratively adjusting weights using the gradient computed over the full dataset. Stochastic Gradient Descent updates the weights from one example (or a small batch) at a time, which is noisier but far cheaper per step.
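
A hand-rolled sketch of both update rules for one-parameter linear regression (y ≈ w·x), on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w, lr = 0.0, 0.1
for _ in range(100):                      # batch gradient descent:
    grad = -2 * np.mean((y - w * x) * x)  # gradient of MSE over all points
    w -= lr * grad

w_sgd = 0.0
for i in rng.permutation(100):                # stochastic gradient descent:
    grad = -2 * (y[i] - w_sgd * x[i]) * x[i]  # gradient from a single example
    w_sgd -= lr * grad

print(w, w_sgd)  # both should approach the true slope 3.0
```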

21
Q

L1 and L2 Regularization

A

L1 adds a penalty proportional to the absolute value of weights (sparse solutions). L2 penalizes the square of weights (weight decay).
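
A quick scikit-learn comparison on simulated data: Lasso (L1) drives uninformative coefficients to exactly zero, while Ridge (L2) only shrinks them; alpha = 1.0 is an arbitrary penalty strength:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       random_state=0)

print(Lasso(alpha=1.0).fit(X, y).coef_)  # L1: most noise coefficients exactly 0
print(Ridge(alpha=1.0).fit(X, y).coef_)  # L2: coefficients shrunk, not zeroed
```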

22
Q

K-Nearest Neighbors (KNN)

A

A classification/regression algorithm that predicts based on the majority vote or average of the k closest data points.
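
A minimal scikit-learn sketch with k = 5:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each prediction is the majority vote of the 5 nearest training points.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(knn.score(X_te, y_te))
```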

23
Q

Logistic Regression (Softmax)

A

A classification algorithm. Softmax generalizes logistic regression for multi-class classification.
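
The softmax function itself is short to write out; the scores below are illustrative logits:

```python
import numpy as np

def softmax(z):
    z = z - z.max()     # shift for numerical stability (result unchanged)
    e = np.exp(z)
    return e / e.sum()  # class probabilities summing to 1

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.66, 0.24, 0.10]
```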

24
Q

No Free Lunch Theorem

A

No algorithm is universally best for all problems. Performance depends on the specific dataset.

25
Q

Bias

A

Bias is the error introduced by approximating real-world problems with simplified models. High bias leads to underfitting.

26
Q

Overfitting

A

A model captures noise in the training data, performing poorly on new data. Regularization and validation sets help mitigate this.

27
Q

Validation Set

A

A subset of data used to tune model hyperparameters and assess performance during training.

28
Q

Testing Set

A

A hold-out dataset used to evaluate the final performance of a trained model.

29
Q

Reduce Variance in Trees

A

Techniques like bagging, random forests, and pruning reduce overfitting and variance in decision trees.
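
All three techniques are available in scikit-learn; a sketch on toy data (ccp_alpha enables cost-complexity pruning):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

bagged = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)  # pruned single tree
```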