ML Learning Flashcards

1
Q

List XGBoost benefits

A

L1/L2 regularisation prevents overfitting in high-dimensional spaces
Built-in handling of missing values
Built-in cross-validation
Supports early stopping
The learning curve can be inspected to choose a different checkpoint
Optimised for multi-core CPUs
Grows deep trees and prunes them, yielding trees optimised for inference
Supports multiple objective functions
Easy Python interface

2
Q

GBoost start - explain in words what the initial prediction is

A

The constant value that minimises the loss function over all training observations (e.g. the mean of the targets for squared loss)

3
Q

GBoost - explain in words how to refine the prediction over the previous prediction

A

Fit a classification/regression tree to the errors (residuals) of the previous prediction. Each leaf holds the value that minimises the loss over the observations that fall into that leaf

4
Q

GBoost - explain in words how the new tree's prediction (which corrects the errors of the previous prediction) is added

A

Scale the new tree's output by the learning rate, then add it to the previous prediction

5
Q

GBoost - how does the new tree handle each error?

A

Each observation's error (residual) is routed through the decision tree to a leaf; that leaf outputs the value that minimises the loss function over the errors that land in it
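
The four GBoost cards above can be sketched end-to-end in pure Python. This is an illustrative toy (squared loss, one feature, decision stumps), not how production libraries implement boosting:

```python
# Toy gradient boosting for squared loss on 1-D data.
# Trees are depth-1 stumps; real libraries use deeper trees plus
# regularisation and many engineering optimisations.

def fit_stump(x, residuals):
    """Find the threshold split minimising squared error of the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        # leaf value minimising squared loss = mean of the residuals in the leaf
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi, t=t, lv=lv, rv=rv: lv if xi <= t else rv

def gradient_boost(x, y, n_trees=20, lr=0.3):
    f0 = sum(y) / len(y)          # initial prediction: the mean minimises squared loss
    preds = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - p for yi, p in zip(y, preds)]   # errors of current prediction
        tree = fit_stump(x, residuals)                    # fit a tree to the errors
        trees.append(tree)
        # scale the tree by the learning rate, then add it
        preds = [p + lr * tree(xi) for p, xi in zip(preds, x)]
    return f0, lr, trees

def predict(model, xi):
    f0, lr, trees = model
    return f0 + sum(lr * t(xi) for t in trees)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = gradient_boost(x, y, n_trees=50, lr=0.3)
```

With squared loss, the pieces line up with the cards: the initial prediction is the mean, each leaf value is the mean of the residuals that land in it, and the learning rate controls how much of each new tree is added.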

6
Q

What is the purpose of the Linear Regression algorithm in machine learning?

A

Linear Regression is used to model the relationship between a dependent variable and one or more independent variables. It predicts continuous values by fitting a linear equation to the data. The goal is to minimize the sum of squared residuals between predicted and actual values.
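
As a concrete sketch of "fitting a linear equation by minimising squared residuals", here is the closed-form least-squares solution for a single feature in plain Python:

```python
# Simple linear regression (one feature) via the closed-form
# least-squares solution.

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope = covariance(x, y) / variance(x); intercept from the means
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return slope, intercept

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]          # exactly y = 2x + 1
slope, intercept = fit_line(x, y)
```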

7
Q

What is Logistic Regression used for in machine learning?

A

Logistic Regression is used for binary classification tasks. It models the probability that a given input belongs to a certain class using a logistic function (sigmoid) to output values between 0 and 1. It estimates the parameters using Maximum Likelihood Estimation.
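
A minimal sketch of the idea, assuming one feature and fitting by gradient ascent on the log-likelihood (a simple stand-in for full Maximum Likelihood Estimation):

```python
import math

# Toy logistic regression: one feature, gradient ascent on the
# log-likelihood of a sigmoid model.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=2000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        # gradient of the log-likelihood: sum of (y_i - p_i) * x_i
        grad_w = sum((yi - sigmoid(w * xi + b)) * xi for xi, yi in zip(x, y))
        grad_b = sum(yi - sigmoid(w * xi + b) for xi, yi in zip(x, y))
        w += lr * grad_w / len(x)
        b += lr * grad_b / len(x)
    return w, b

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]         # classes separated around x = 2.5
w, b = fit_logistic(x, y)
p_low = sigmoid(w * 0.0 + b)   # probability of class 1 at x = 0
p_high = sigmoid(w * 5.0 + b)  # probability of class 1 at x = 5
```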

8
Q

How do Decision Trees work in machine learning?

A

Decision Trees partition the data into subsets based on feature values, making decisions at each node to minimize impurity (like Gini index or entropy). They are simple to interpret and can be used for classification and regression tasks.
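
The impurity criterion can be sketched directly; a tree chooses the split whose weighted child impurity (here Gini) is lowest:

```python
# Gini impurity of a candidate split - the quantity a decision tree
# minimises when choosing where to partition the data.

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(left, right):
    n = len(left) + len(right)
    # weighted average of the child impurities
    return len(left) / n * gini(left) + len(right) / n * gini(right)

pure = split_impurity(["a", "a"], ["b", "b"])    # perfect split
mixed = split_impurity(["a", "b"], ["a", "b"])   # uninformative split
```

A split that perfectly separates the classes has impurity 0; a split that leaves both children mixed scores worse, so the tree prefers the first.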

9
Q

What is the key idea behind Random Forests in machine learning?

A

Random Forest is an ensemble method that builds multiple decision trees on random subsets of the data and features. It then aggregates their predictions (by majority vote for classification or averaging for regression) to improve accuracy and reduce overfitting.

10
Q

How does Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost) improve prediction accuracy?

A

Gradient Boosting builds an ensemble of trees sequentially, where each new tree corrects the errors of the previous ones by focusing on the residuals.

11
Q

What is the role of Support Vector Machines (SVM) in classification tasks?

A

SVM is a supervised learning algorithm used for classification and regression. It aims to find the hyperplane that best separates the data into distinct classes with the maximum margin. SVMs can handle both linear and nonlinear classification using the kernel trick.

12
Q

How does the K-Nearest Neighbors (KNN) algorithm work?

A

KNN is a simple, non-parametric algorithm used for classification and regression. It classifies a data point based on the majority class (for classification) or the average value (for regression) of its k nearest neighbors in the feature space.
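
The whole algorithm fits in a few lines; a sketch with Euclidean distance and majority vote:

```python
import math
from collections import Counter

# KNN classification: predict the majority class among the k closest
# training points (Euclidean distance).

def knn_predict(train, point, k=3):
    # train: list of (features, label) pairs
    nearest = sorted(train, key=lambda t: math.dist(t[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "blue"), ((0, 1), "blue"), ((1, 0), "blue"),
         ((5, 5), "red"), ((5, 6), "red"), ((6, 5), "red")]
```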

13
Q

What is the Naive Bayes classifier based on?

A

Naive Bayes is a probabilistic classifier based on Bayes’ Theorem, assuming independence between features. It calculates the probability of each class given the features and assigns the class with the highest probability. It’s often used for text classification tasks.
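
A sketch of multinomial Naive Bayes for text, using log-probabilities and Laplace smoothing (the smoothing is an implementation detail added here, not part of the card):

```python
import math
from collections import Counter, defaultdict

# Multinomial Naive Bayes: class score = log-prior + sum of per-word
# log-likelihoods, assuming words are independent given the class.

def train_nb(docs):
    # docs: list of (list_of_words, label)
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, None
    for c, n in class_counts.items():
        denom = sum(word_counts[c].values()) + len(vocab)  # Laplace smoothing
        score = math.log(n / total)
        score += sum(math.log((word_counts[c][w] + 1) / denom) for w in words)
        if best_score is None or score > best_score:
            best, best_score = c, score
    return best

docs = [(["cheap", "pills", "now"], "spam"),
        (["cheap", "offer"], "spam"),
        (["meeting", "tomorrow"], "ham"),
        (["project", "meeting", "notes"], "ham")]
model = train_nb(docs)
```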

14
Q

What is K-Means clustering used for?

A

K-Means is a clustering algorithm that partitions data into k clusters based on the mean of the points in each cluster. It minimizes the within-cluster variance by iteratively assigning points to clusters and recalculating the cluster centroids.
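
The assign/recalculate loop (Lloyd's algorithm) can be sketched in pure Python for 2-D points:

```python
import math
import random

# Lloyd's algorithm for k-means: assign each point to its nearest
# centroid, then move each centroid to the mean of its assigned points.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # initialise from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assignment step
            i = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):    # update step
            if cl:
                centroids[i] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = kmeans(points, k=2)
```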

15
Q

What is DBSCAN (Density-Based Spatial Clustering of Applications with Noise)?

A

DBSCAN is a density-based clustering algorithm that groups points closely packed together while marking outliers as noise. It requires two parameters: epsilon (the radius of a neighborhood) and min_samples (minimum points to form a dense region).
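
A minimal sketch of the core logic (pure Python, 2-D points): a point with at least min_samples neighbours within eps seeds a cluster that expands through density-connected points; everything else is noise:

```python
import math

# Toy DBSCAN. Labels: None = unvisited, -1 = noise, otherwise cluster id.

def dbscan(points, eps, min_samples):
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_samples:
            labels[i] = -1                 # noise (may become a border point later)
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = nbrs
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbours(j)
            if len(jn) >= min_samples:     # j is also a core point: keep expanding
                queue.extend(jn)
    return labels

points = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.5, 10), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.0, min_samples=2)
```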

16
Q

How does Hierarchical Clustering work?

A

Hierarchical Clustering creates a tree-like structure (dendrogram) by successively merging or splitting clusters based on their similarity. It can be agglomerative (bottom-up) or divisive (top-down) and doesn’t require the number of clusters to be predefined.
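
The agglomerative (bottom-up) variant can be sketched with single linkage: repeatedly merge the two closest clusters; the merge order is what a dendrogram records:

```python
import math

# Agglomerative clustering with single linkage (distance between clusters
# = distance between their closest pair of points).

def single_linkage(a, b):
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, n_clusters):
    clusters = [[p] for p in points]          # start: every point is its own cluster
    while len(clusters) > n_clusters:
        # find and merge the closest pair of clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.5)]
clusters = agglomerate(points, n_clusters=2)
```

Stopping at n_clusters is just one way to cut the dendrogram; keeping the full merge sequence gives the whole hierarchy.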

17
Q

What is the goal of Dimensionality Reduction?

A

Dimensionality Reduction techniques aim to reduce the number of features in a dataset while preserving its important information. This can help improve model performance, reduce overfitting, and speed up training.

18
Q

What is PCA (Principal Component Analysis)?

A

PCA is a linear dimensionality reduction technique that transforms data into a new coordinate system where the greatest variances in the data come first. It helps reduce dimensionality while retaining the most significant features.
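
A sketch of finding the first principal component of 2-D data via power iteration on the covariance matrix (pure Python; real implementations use an eigendecomposition or SVD):

```python
import math

# First principal component of 2-D data: centre the data, build the 2x2
# covariance matrix, and extract its top eigenvector by power iteration.

def first_pc(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(100):                     # power iteration
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

# data stretched along the y = x direction
points = [(-2, -2.1), (-1, -0.9), (0, 0.1), (1, 1.05), (2, 1.9)]
pc = first_pc(points)
```

For this data the direction of greatest variance is the diagonal, so the component comes out close to (1/sqrt(2), 1/sqrt(2)).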

19
Q

What is t-SNE (t-Distributed Stochastic Neighbor Embedding) used for?

A

t-SNE is a non-linear dimensionality reduction technique that is particularly effective for visualizing high-dimensional data in 2 or 3 dimensions. It preserves local neighborhood structure (points that are similar in high-dimensional space stay close in the embedding) rather than global pairwise distances, making it useful for visualizing clusters.

20
Q

What is UMAP (Uniform Manifold Approximation and Projection)?

A

UMAP is a non-linear dimensionality reduction technique that preserves both local and global structures in the data. It is often faster and more scalable than t-SNE while providing similar quality for visualizations.

21
Q

Give examples of how XGBoost, LightGBM, and CatBoost optimise gradient boosting

A

Each library optimises the process differently. XGBoost adds L1/L2 regularisation, uses second-order gradients, and handles sparse/missing values efficiently. LightGBM speeds up training with histogram-based split finding and leaf-wise tree growth. CatBoost uses ordered boosting to reduce target leakage and handles categorical features natively.