ML Kaggle (basic level) Flashcards

Question

The model has converged (roughly: it has reached a working state, and so on)

Answer 1

The model has converged

Answer 2

model convergence [kon-VUR-juhns]

Answer 3

The loss functions for linear models always produce a convex surface
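
A minimal sketch of what convexity means in practice, assuming a one-feature linear model with mean squared error. The data points and the two parameter settings `p` and `q` are made up for illustration. For a convex loss, the loss at the midpoint between two parameter settings is never higher than the average of the two losses.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Two arbitrary (weight, bias) settings and the point halfway between them.
p = (-3.0, 5.0)
q = (4.0, -2.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

loss_p = mse(p[0], p[1], xs, ys)
loss_q = mse(q[0], q[1], xs, ys)
loss_mid = mse(mid[0], mid[1], xs, ys)

# Convexity check at the midpoint: the loss surface lies below the chord.
convex_at_midpoint = loss_mid <= (loss_p + loss_q) / 2
print(convex_at_midpoint)
```

This checks only one pair of points, so it illustrates the property rather than proving it.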

Answer 4

as a result

Answer 5

convexity of loss function

Answer 6

The loss curve [kurv] illustrates how loss changes over iterations during training.

Answer 7

Learning rate is a floating-point number that you set and that influences how quickly the model converges. If it is too low, convergence takes too long; if it is too high, the model may never converge. The goal is to pick a learning rate that’s not too high and not too low, so that the model converges quickly.
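
The effect of the learning rate can be sketched on a toy problem. This example assumes a made-up one-parameter loss, (w - 3)^2, whose minimum is at w = 3; the function name `gradient_descent` and the three rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimise the toy loss (w - 3)^2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3.0)       # derivative of (w - 3)^2
        w -= learning_rate * grad  # step against the gradient
    return w

too_low = gradient_descent(0.001)  # moves toward 3, but far too slowly
good = gradient_descent(0.1)       # ends very close to 3
too_high = gradient_descent(1.1)   # overshoots more on every step and diverges

print(too_low, good, too_high)
```

With the same number of steps, only the middle rate ends near the minimum.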

Answer 8

Batch Size is a hyperparameter that defines how many examples the model processes before updating its weights and bias.

Answer 9

Hyperparameters are values you choose before training (learning rate, batch size, number of epochs, etc.). Parameters are values the model learns during training (weights, bias).

Answer 10

There are two common gradient descent techniques:
1. Stochastic [stoh-KAS-tik] Gradient Descent (SGD)
- Uses only one example per iteration (batch size = 1)
- Works given enough iterations but introduces noise (fluctuations in loss)
- "Stochastic" means that the single example is chosen randomly
2. Mini-Batch Stochastic Gradient Descent (Mini-Batch SGD)
- A compromise between full-batch and SGD
- Uses a small subset of examples per batch (1

Answer 11

1. A small batch size (1-32) behaves like SGD: noisier, but it learns faster. 2. A large batch size (128+) behaves like full-batch gradient descent: more stable, but it requires more memory.
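
A rough sketch of how batch size fits into a training loop, assuming a one-feature linear model trained with squared loss on noise-free toy data (y = 2x + 1). The function `train` and all of its default values are hypothetical. Setting `batch_size=1` gives SGD, a value in between gives mini-batch SGD, and the full dataset size gives full-batch gradient descent.

```python
import random

def train(xs, ys, batch_size, learning_rate=0.01, epochs=2000, seed=0):
    """Fit y = w * x + b, updating weights once per batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # random order is what makes it stochastic
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of the squared loss over the batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2 * x + 1 for x in xs]

w_sgd, b_sgd = train(xs, ys, batch_size=1)          # SGD
w_mini, b_mini = train(xs, ys, batch_size=4)        # mini-batch SGD
w_full, b_full = train(xs, ys, batch_size=len(xs))  # full batch

print(w_sgd, b_sgd, w_mini, b_mini, w_full, b_full)
```

Because the toy data has no noise, all three variants should end near w = 2 and b = 1; on real data the small-batch runs would show a noisier loss curve.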

Answer 12

An epoch [EE-pok] is one full pass through the entire training dataset; the number of epochs is a hyperparameter. One epoch is usually not enough for the model to learn meaningful patterns.

Answer 13

Logistic regression is a powerful technique for estimating probabilities. It’s commonly used in classification problems where the output needs to be a probability between 0 and 1.

Answer 14

It takes the input features, computes a linear combination of them, and applies a sigmoid function to transform the result into a probability.

Answer 15

The sigmoid function ensures that the model’s output is always between 0 and 1, making it suitable for probability estimation.
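
A minimal sketch of the sigmoid step, assuming a linear score z = w · x + b is computed first. The helper names, weights, and feature values are made up for illustration.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(features, weights, bias):
    """Linear score followed by sigmoid, as in logistic regression."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical example: z = 0.5 * 2.0 + 0.25 * (-1.0) + 0.1 = 0.85
p = predict_probability([2.0, -1.0], [0.5, 0.25], 0.1)
print(p)
```

A score of 0 maps to exactly 0.5; large positive scores approach 1 and large negative scores approach 0.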

Answer 16

Training logistic regression models follows the same process as training linear regression models, with two key distinctions: 1. Log Loss is used as the loss function. 2. Regularisation [reg-yuh-luh-ry-ZAY-shuhn] is applied to prevent overfitting.

Answer 17

Squared loss works well for a linear model, where the rate of change of the output is constant. However, the rate of change of a logistic regression model is not constant: the sigmoid curve is S-shaped rather than linear.
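
To make the contrast with squared loss concrete, here is a small sketch of Log Loss (binary cross-entropy). The function name `log_loss` and the clipping constant `eps` are assumptions for this example; the clipping only guards against taking log(0).

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy over labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

confident_right = log_loss([1], [0.99])  # small penalty
confident_wrong = log_loss([1], [0.01])  # large penalty
squared_wrong = (1 - 0.01) ** 2          # squared loss for the same mistake

print(confident_right, confident_wrong, squared_wrong)
```

Log Loss grows without bound as a confident prediction gets more wrong, while squared loss on a probability can never exceed 1.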

Answer 18

Classification is the task of predicting which of a set of classes an example belongs to.

Answer 19

The classification threshold [THRESH-hohld] is used to convert a probability estimate into a definitive class.

Answer 20

1) If the predicted probability is above the threshold, the example is assigned to the positive class. 2) If the predicted probability is below the threshold, the example is assigned to the negative class.
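
The two rules above can be sketched as a one-line function. The name `classify` and the default threshold of 0.5 are assumptions, and so is the tie rule: the card does not say what happens when the probability equals the threshold, so this sketch assigns ties to the positive class.

```python
def classify(probabilities, threshold=0.5):
    """Turn probabilities into class labels: 1 = positive, 0 = negative."""
    # Assumption: a probability exactly equal to the threshold counts as positive.
    return [1 if p >= threshold else 0 for p in probabilities]

default_labels = classify([0.1, 0.5, 0.9])
strict_labels = classify([0.1, 0.5, 0.9], threshold=0.8)
print(default_labels, strict_labels)
```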

Answer 21

The choice of classification threshold significantly impacts the model’s performance. A higher threshold tends to increase precision but may lower recall. A lower threshold increases recall but may lower precision.
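
A small sketch of this trade-off on made-up labels and probabilities. The function `precision_recall` is hypothetical, and returning 0.0 when a ratio is undefined is a convention chosen only for this example.

```python
def precision_recall(y_true, y_prob, threshold):
    """Precision and recall when probabilities >= threshold count as positive."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    # Assumption: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels and model outputs.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

low = precision_recall(y_true, y_prob, threshold=0.3)    # many positives predicted
high = precision_recall(y_true, y_prob, threshold=0.85)  # few positives predicted
print(low, high)
```

On this data the low threshold finds every positive but lets in false positives, while the high threshold makes no false positive predictions but misses most positives.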

Answer 22

A confusion matrix [MAY-triks] is used to evaluate the performance of a binary classifier.

Answer 23

A confusion matrix compares the model’s predictions with the actual ground truth, breaking them down into four possible outcomes: true positive, true negative, false positive, and false negative.
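
The four outcomes can be counted directly. This is a minimal sketch with made-up labels; the function name `confusion_matrix` and the dictionary layout are choices made for this example, not a particular library's API.

```python
def confusion_matrix(y_true, y_pred):
    """Count the four outcomes for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1  # correctly predicted positive
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1  # correctly predicted negative
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1  # negative wrongly predicted as positive
        else:
            counts["FN"] += 1  # positive wrongly predicted as negative
    return counts

# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```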

Answer 24

A hypothetical [hy-puh-THET-i-kuhl] perfect classification model would have 0 false positives and 0 false negatives. If a model achieves this, the performance metrics would be:
- Accuracy (share of all predictions that are correct) equals 1
- Recall (share of actual positives that are correctly identified) equals 1
- False Positive Rate (share of actual negatives that are incorrectly classified as positive) equals 0
- Precision (share of positive predictions that are correct) equals 1
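
These four metrics follow from the confusion-matrix counts. In this sketch the function `metrics` and the counts passed to it are made up for illustration, and it assumes every denominator is non-zero (both classes are present and at least one positive prediction is made).

```python
def metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }

# A perfect classifier: no false positives and no false negatives.
perfect = metrics(tp=40, tn=60, fp=0, fn=0)

# An imperfect classifier for comparison.
imperfect = metrics(tp=30, tn=50, fp=10, fn=10)

print(perfect)
print(imperfect)
```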

Answer 25

The ROC [ar-oh-SEE] curve shows how a model’s TPR [tee-pee-AR] (true positive rate) and FPR [ef-pee-AR] (false positive rate) change across classification thresholds.

Answer 26

AUC [ay-yoo-SEE], the area under the ROC curve, measures the overall performance of a classifier across all thresholds.
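
A rough sketch of how ROC points and AUC can be computed by hand, assuming binary labels, at least one example of each class, and the trapezoid rule for the area. The function names and the scores are made up for illustration.

```python
def roc_points(y_true, y_score):
    """(FPR, TPR) pairs, one per distinct score used as a threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve through the points, by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels and scores.
y_true = [0, 0, 1, 1]
mixed_auc = auc(roc_points(y_true, [0.1, 0.4, 0.35, 0.8]))   # one pair ranked wrongly
perfect_auc = auc(roc_points(y_true, [0.1, 0.2, 0.8, 0.9]))  # positives always ranked higher
print(mixed_auc, perfect_auc)
```

An AUC of 1.0 means every positive example is scored above every negative one; 0.5 corresponds to random ranking.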

Answer 27

The optimal threshold depends on the trade-off between false positives and false negatives.
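
One way to make this trade-off explicit is to assign a cost to each kind of error and pick the threshold with the lowest total cost. This is only a sketch under that assumption: the function `best_threshold`, the cost values, and the data are all hypothetical.

```python
def best_threshold(y_true, y_prob, fp_cost, fn_cost):
    """Try each distinct probability as a threshold and return the cheapest one."""
    def total_cost(threshold):
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
        fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
        return fp_cost * fp + fn_cost * fn
    return min(sorted(set(y_prob)), key=total_cost)

# Hypothetical labels and model outputs.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

# Missing a positive is expensive: a low threshold wins.
when_fn_costly = best_threshold(y_true, y_prob, fp_cost=1, fn_cost=5)

# A false alarm is expensive: a high threshold wins.
when_fp_costly = best_threshold(y_true, y_prob, fp_cost=5, fn_cost=1)

print(when_fn_costly, when_fp_costly)
```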

(53 cards)