AI & ML Flashcards

Question

Semi-Supervised Learning

Answer 1

An ML method that uses a small amount of labeled data and a large amount of unlabeled data to train systems. After initial training, the partially trained algorithm itself labels the unlabeled data; this is called pseudo-labeling. The model is then retrained on the resulting data mix without being explicitly programmed.
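The three steps (train on the labeled data, pseudo-label the rest, retrain on the mix) can be sketched as follows. This is a minimal illustration, not a production recipe: the nearest-centroid classifier, the 1-D toy data, and the names `fit_centroids` and `predict` are all assumptions made for brevity.

```python
def fit_centroids(points, labels):
    """Return the mean feature value for each class label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

# Toy data (hypothetical): two labeled points, six unlabeled points.
labeled_x = [1.0, 9.0]
labeled_y = ["low", "high"]
unlabeled_x = [0.5, 2.0, 3.0, 7.5, 8.0, 10.0]

# Step 1: train on the small labeled set.
centroids = fit_centroids(labeled_x, labeled_y)

# Step 2: pseudo-labeling, the partially trained model labels the rest.
pseudo_y = [predict(centroids, x) for x in unlabeled_x]

# Step 3: retrain on the mix of labeled and pseudo-labeled data.
centroids = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
```

In practice, pseudo-labels are often kept only when the model is confident in them; that filtering step is omitted here.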

Answer 2

Type of ML where an agent learns to make decisions by performing actions in an environment to maximize cumulative rewards
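One way to picture an agent learning from rewards is tabular Q-learning on a tiny corridor. Q-learning is only one of many algorithms of this type, and the environment, the constants, and the names `step` and `train` below are assumptions chosen for illustration.

```python
import random

N_STATES = 5           # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, 1)      # move left or move right
GAMMA, ALPHA = 0.9, 0.5  # discount factor and learning rate (assumed values)

def step(state, action):
    """Environment: move, stay inside the corridor, reward 1.0 at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random exploration
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = nxt
    return q

q_table = train()
# Greedy policy: in each non-goal state, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right in every state, which is the choice that maximizes cumulative (discounted) reward in this corridor.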

Answer 3

Reinforcement Learning from Human Feedback (RLHF). Uses human feedback to help ML models self-learn more efficiently; incorporates human feedback into the reward function so the model is better aligned with human goals, wants, and needs. Used throughout GenAI applications, including LLMs; significantly enhances model performance. Steps: data collection, supervised fine-tuning, training a reward model, optimization.

Answer 4

A measure of how well a machine learning model generalizes to data that is similar to the data on which it was trained. Overfitting: performs well on the training data, but doesn’t perform well on evaluation data. Underfitting: performs poorly on training data; the model may be too simple or the data features poor. Balanced: performs well on both training data and evaluation data.
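The three cases can be told apart by comparing a training score with an evaluation score. The sketch below is a rough heuristic only: the function name `diagnose_fit` and the thresholds `good` and `max_gap` are assumptions, and sensible values depend on the task and metric.

```python
def diagnose_fit(train_score, eval_score, good=0.8, max_gap=0.1):
    """Classify model fit from two scores in [0, 1] (higher is better).

    `good` and `max_gap` are illustrative thresholds, not standard values.
    """
    if train_score < good:
        return "underfitting"   # poor even on the data it was trained on
    if train_score - eval_score > max_gap:
        return "overfitting"    # good on training data, much worse on evaluation data
    return "balanced"           # good on both
```

For example, `diagnose_fit(0.99, 0.70)` reports overfitting, while `diagnose_fit(0.60, 0.58)` reports underfitting.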

Answer 5

The difference, or error, between the predicted and actual values. Arises from wrong choices or assumptions in the ML process. If high, the model doesn’t closely match the training data; this is considered underfitting. Reduce it by using a more complex model and increasing the number of features.

Answer 6

How much the performance of a model changes if it is trained on a different dataset with a similar distribution. If high, the model is very sensitive to changes in the training data; this is considered overfitting. Reduce it through feature selection (fewer, more important features) and by splitting the data into training and test sets multiple times.
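Splitting into training and test sets multiple times is commonly done with k-fold cross-validation; reading the card that way is an assumption. A minimal sketch of the index splitting, with the hypothetical helper name `k_fold_splits`:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(6, 3))
```

Each sample lands in the test set exactly once. Training and scoring the model once per split, then looking at how much the scores vary, gives a practical read on variance.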

Answer 7

A matrix that summarizes the performance of a machine learning model on a set of test data. A standard way to evaluate the performance of a classification model, e.g. binary classification. Precision is best when false positives are costly; Recall is best when false negatives are costly. F1 Score is best for balancing Precision and Recall, especially on imbalanced datasets; Accuracy is best for balanced datasets.
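The four metrics follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy; ratios with an empty denominator are 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Toy labels (hypothetical): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
```

Here the counts are TP=3, FP=1, FN=1, TN=5, giving precision, recall, and F1 of 0.75 and accuracy of 0.8.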

Answer 8

Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC). A value from 0 to 1, where 1 represents a perfect model. Shows how the true positive rate compares to the false positive rate at various thresholds, each of which has its own confusion matrix.
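A minimal sketch of how the curve and its area can be computed: each distinct score is used as a threshold, which yields one confusion matrix and therefore one (false positive rate, true positive rate) point. The function names and toy scores are assumptions, and the code expects at least one positive and one negative label.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, from the strictest threshold to the loosest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Toy data (hypothetical): labels and the model's predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(y_true, scores))
```

For this data the area is 8/9, which matches the fraction of positive/negative pairs where the positive example gets the higher score.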

Answer 9

MAE: Mean Absolute Error; MAPE: Mean Absolute Percentage Error; RMSE: Root Mean Squared Error. MAE, MAPE, and RMSE measure prediction error; lower values mean a more accurate model. R² measures how much of the variance in the target is explained by the model; close to 1 means the predictions are good.
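The four metrics can be computed directly from their standard definitions. The function name `regression_metrics` and the toy values are assumptions; MAPE requires nonzero actual values, and R² requires actual values that are not all identical.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MAPE (in percent), RMSE, and R² for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mape": mape, "rmse": rmse, "r2": r2}

# Toy values (hypothetical).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
result = regression_metrics(actual, predicted)
```

For these values MAE is 2.5, MAPE is 11.875 percent, RMSE is the square root of 6.5, and R² is 0.948.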

Answer 10

When a model is making predictions on new data. Real time: the model has to make decisions quickly as data arrives, and speed is preferred over perfect accuracy; e.g. chatbots. Batch: a large amount of data is analyzed all at once, and accuracy is preferred over speed; often used for data analysis.

Answer 11

When a model is making predictions on new data, close to where the data is being generated. Uses edge devices that run your model; less computational power, but close proximity to your data. Small Language Models on edge devices offer very low latency, a low compute footprint, and offline capability. Large Language Models on remote servers are more powerful, but have higher latency and must be online to be accessed.

Answer 12

Finding the best hyperparameter values to optimize model performance. Hyperparameters are settings that define the model structure, learning algorithm, and training process; they are set before training begins. Tuning improves model accuracy, reduces overfitting, and enhances generalization.
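One simple tuning strategy is grid search: try every combination from a fixed set of candidate values and keep the best one. Grid search is only one option (random search and Bayesian optimization are others), and `toy_validation_score` below is a hypothetical stand-in for "train a model with these settings and return its validation score".

```python
from itertools import product

def grid_search(objective, grid):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(learning_rate, batch_size):
    """Hypothetical score that peaks at learning_rate=0.01, batch_size=32."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 32) / 1000

grid = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}
best_params, best_score = grid_search(toy_validation_score, grid)
```

The cost grows with the product of the list lengths (9 evaluations here), which is why grids are usually kept small.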

Answer 13

Hyperparameter for how large or small the steps are when updating the model's weights during training. A high rate can lead to faster convergence but risks overshooting the optimal solution. A low rate may result in more precise but slower convergence.
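The effect of the step size is easy to see with plain gradient descent on f(x) = x², whose minimum is at x = 0. This is a toy sketch; the function name, starting point, and the three rates are assumptions picked to show each behavior.

```python
def gradient_descent(learning_rate, start=10.0, steps=20):
    """Minimize f(x) = x**2 (gradient 2*x) and return the final x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x  # step against the gradient
    return x

slow = gradient_descent(0.01)      # small steps: still far from 0 after 20 steps
good = gradient_descent(0.3)       # larger steps: very close to 0
diverged = gradient_descent(1.1)   # too large: overshoots further on every step
```

With a rate of 1.1, every update jumps past the minimum and lands farther away than it started, so the value grows instead of shrinking.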

Answer 14

Hyperparameter for the number of training examples used to update the model weights in one iteration. Smaller sizes give noisier updates and take more time per epoch, though the noise can help generalization. Larger sizes are faster per epoch and give smoother updates, but may generalize worse.
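Batch size determines how many weight updates happen in one pass over the data. A minimal sketch, where the helper name `mini_batches` and the toy dataset are assumptions:

```python
def mini_batches(samples, batch_size):
    """Yield consecutive slices of `samples` with at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(10))  # stand-in for 10 training examples

# Each batch triggers one weight update, so smaller batches mean more updates.
updates_small = len(list(mini_batches(samples, 2)))
updates_large = len(list(mini_batches(samples, 5)))
```

With 10 examples, a batch size of 2 gives 5 updates per pass and a batch size of 5 gives 2. When the dataset size is not a multiple of the batch size, the last batch is simply shorter.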

Answer 15

Hyperparameter that refers to how many times the model will iterate over the entire training dataset. Too few can lead to underfitting, while too many may cause overfitting.
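The underfitting side can be shown with a one-weight model trained by per-sample gradient descent to fit y = 2x. This is a toy sketch: the data, learning rate, and function name `train_weight` are assumptions, and it does not demonstrate overfitting, which needs noisy data and a held-out set.

```python
def train_weight(epochs, learning_rate=0.01):
    """Fit y = w * x on a tiny dataset; return w after the given number of epochs."""
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # true relationship: y = 2x
    w = 0.0
    for _ in range(epochs):           # one epoch = one full pass over the data
        for x, y in zip(xs, ys):
            error = y - w * x
            w += learning_rate * 2 * error * x  # gradient step on squared error
    return w

w_after_1 = train_weight(1)     # too few epochs: w is still far from 2
w_after_50 = train_weight(50)   # enough epochs: w is very close to 2
```

After a single epoch the weight is roughly 0.52, so the model underfits; after 50 epochs it has effectively reached 2.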

(39 cards)