First 100 ML/AI Terms Flashcards

1
Q

Artificial Intelligence (AI)

A

The simulation of human intelligence processes by machines, particularly computer systems.

2
Q

Machine Learning (ML)

A

A subset of AI that involves the use of algorithms and statistical models to enable computers to learn from data and make decisions without being explicitly programmed.

3
Q

Deep Learning

A

A subset of ML involving neural networks with many layers that learn complex patterns in data.

4
Q

Neural Network

A

A model of interconnected artificial neurons, loosely inspired by the structure of the human brain, that processes information in layers.

5
Q

Supervised Learning

A

A type of ML where the model is trained on labeled data (input-output pairs).

6
Q

Unsupervised Learning

A

A type of ML that deals with unlabeled data, finding patterns or structures in input data without predefined labels.

7
Q

Reinforcement Learning

A

An ML technique where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.

8
Q

Classification

A

A supervised learning task where the model predicts a discrete label for input data.

9
Q

Regression

A

A supervised learning task where the model predicts a continuous value for input data.

10
Q

Clustering

A

An unsupervised learning method that groups data points into clusters based on their similarity.

11
Q

Decision Tree

A

A model that splits data into branches based on feature values, used for classification and regression tasks.

12
Q

Random Forest

A

An ensemble learning method that constructs many decision trees during training and outputs the majority-vote class for classification (or the average prediction for regression).

13
Q

Support Vector Machine (SVM)

A

A supervised learning algorithm that finds the hyperplane best separating different classes in the data.

14
Q

K-Nearest Neighbors (KNN)

A

A simple ML algorithm that classifies data points based on the majority class of their k nearest neighbors.

15
Q

Gradient Descent

A

An optimization algorithm used to minimize the loss function in various ML models by iteratively adjusting parameters.
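
For intuition, here is a minimal sketch (assuming an illustrative one-parameter loss f(w) = (w - 3)^2): each step moves the parameter against the gradient, scaled by the learning rate.

# Minimal illustrative sketch: gradient descent on f(w) = (w - 3)^2 (assumed toy loss).
w = 0.0               # initial parameter guess
learning_rate = 0.1   # step size (see the next card)
for step in range(100):
    grad = 2 * (w - 3)             # derivative df/dw
    w = w - learning_rate * grad   # move against the gradient
print(w)  # converges toward the minimum at w = 3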

16
Q

Learning Rate

A

A hyperparameter that controls the step size at each iteration of gradient descent.

17
Q

Overfitting

A

A modeling error that occurs when the model learns the noise in the training data too well, performing poorly on new data.

18
Q

Underfitting

A

When a model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and new data.

19
Q

Bias

A

The error due to overly simplistic assumptions in the learning algorithm, leading to underfitting.

20
Q

Variance

A

The error due to the model’s sensitivity to small fluctuations in the training set, leading to overfitting.

21
Q

Hyperparameter

A

A parameter whose value is set before the learning process begins and controls the model’s training process.

22
Q

Cross-Validation

A

A technique for evaluating ML models by partitioning the data into subsets and training/testing the model multiple times.
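
A minimal sketch using scikit-learn (assumed to be installed): cross_val_score splits the data into 5 folds, training on 4 and scoring on the held-out fold each time.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
print(scores.mean())                          # average accuracy across folds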

23
Q

Regularization

A

Techniques like L1 (Lasso) and L2 (Ridge) used to prevent overfitting by penalizing large coefficients.

24
Q

Loss Function

A

A function that measures the discrepancy between the predicted value and the actual value, guiding model training.

25
Q

Activation Function

A

A function in neural networks that determines the output of a node given an input or set of inputs (e.g., ReLU, Sigmoid).

26
Q

Backpropagation

A

A method used in neural networks to calculate the gradient of the loss function and update the weights.

27
Q

Convolutional Neural Network (CNN)

A

A deep learning model commonly used for image recognition tasks, consisting of convolutional layers to extract features.

28
Q

Recurrent Neural Network (RNN)

A

A neural network with loops, allowing information to persist for sequential data processing like time series or natural language.

29
Q

Long Short-Term Memory (LSTM)

A

A type of RNN designed to overcome the vanishing gradient problem by introducing memory cells to retain information.

30
Q

Transfer Learning

A

A technique where a pre-trained model on one task is reused and fine-tuned for another related task.

31
Q

Ensemble Learning

A

A technique where multiple models are combined to improve the performance of the final model.

32
Q

Bagging

A

A method in ensemble learning that reduces variance by training multiple models on different random subsets of the training data.

33
Q

Boosting

A

An ensemble technique that sequentially trains models, with each new model focusing on the errors of the previous ones.

34
Q

Stochastic Gradient Descent (SGD)

A

A variation of gradient descent in which each parameter update uses only a single randomly chosen example (or a small random mini-batch) rather than the full dataset.

35
Q

Epoch

A

One complete pass through the entire training dataset during the learning process.

36
Q

Batch Size

A

The number of training examples used in one iteration of model training.

37
Q

Tokenization

A

The process of breaking down text into smaller units, like words or subwords, for processing in NLP tasks.
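
A minimal whitespace-based sketch in plain Python; production NLP systems typically use subword tokenizers (e.g., BPE or WordPiece) instead.

text = "Machine learning models read tokens, not raw text."
tokens = text.lower().replace(",", "").replace(".", "").split()
print(tokens)
# ['machine', 'learning', 'models', 'read', 'tokens', 'not', 'raw', 'text']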

38
Q

Word Embedding

A

A dense vector representation of words that captures semantic relationships, such as Word2Vec and GloVe.

39
Q

Attention Mechanism

A

A technique in deep learning that allows models to focus on specific parts of the input sequence when making predictions.

40
Q

Transformer

A

An architecture in deep learning that uses self-attention mechanisms and is commonly used for NLP tasks, such as in models like BERT and GPT.

41
Q

Natural Language Processing (NLP)

A

The field of AI that focuses on the interaction between computers and human language.

42
Q

Bag-of-Words (BoW)

A

A simple NLP model that represents text as an unordered collection of words, disregarding grammar and word order.
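
A minimal sketch using scikit-learn's CountVectorizer (assumed scikit-learn 1.0+ and a toy two-document corpus): each document becomes a vector of word counts, with word order discarded.

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat", "the dog sat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)        # sparse matrix of word counts
print(vectorizer.get_feature_names_out())   # the learned vocabulary
print(X.toarray())                          # counts per document, order ignored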

43
Q

Naive Bayes

A

A probabilistic classifier based on Bayes’ theorem, assuming independence between features.

44
Q

Principal Component Analysis (PCA)

A

A dimensionality reduction technique that transforms data into a set of orthogonal components.
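
A minimal sketch using scikit-learn (assumed available, with random toy data): 5-dimensional points are projected onto their top 2 principal components.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)              # 100 samples, 5 features (toy data)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)        # project onto the top 2 components
print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # share of variance captured by each component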

45
Q

t-Distributed Stochastic Neighbor Embedding (t-SNE)

A

A nonlinear dimensionality reduction technique for visualizing high-dimensional data.

46
Q

Feature Engineering

A

The process of creating new features from existing data to improve model performance.

47
Q

Feature Scaling

A

The process of normalizing or standardizing the range of independent variables or features in the data.

48
Q

One-Hot Encoding

A

A technique to represent categorical variables as binary vectors for use in ML models.
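
A minimal plain-NumPy sketch of the idea; in practice library equivalents such as scikit-learn's OneHotEncoder or pandas.get_dummies are normally used.

import numpy as np

colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))        # ['blue', 'green', 'red']
one_hot = np.array([[1 if c == cat else 0 for cat in categories] for c in colors])
print(one_hot)  # one binary column per category, a single 1 per row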

49
Q

Confusion Matrix

A

A table used to describe the performance of a classification model, showing true positives, false positives, true negatives, and false negatives.

50
Q

Precision

A

The ratio of true positive predictions to the total number of positive predictions made by the model.

51
Q

Recall (Sensitivity)

A

The ratio of true positive predictions to the total actual positive cases in the dataset.

52
Q

F1 Score

A

The harmonic mean of precision and recall, providing a single metric for model performance.
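
A minimal sketch (with assumed toy labels) computing precision, recall, and F1 directly from true/false positive and negative counts, tying the last three cards together.

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)  # 0.75 0.75 0.75 on this toy example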

53
Q

ROC Curve (Receiver Operating Characteristic)

A

A graphical plot illustrating the performance of a binary classifier as its discrimination threshold is varied.

54
Q

Area Under the Curve (AUC)

A

A single scalar value that summarizes the performance of a classifier, representing the probability of correctly ranking positive and negative instances.

55
Q

K-Means Clustering

A

An unsupervised learning algorithm that partitions data into K clusters based on feature similarity.
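
A minimal sketch using scikit-learn's KMeans on assumed toy 2-D data with two obvious groups.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [8, 8], [8, 9], [0.5, 1.5], [9, 8]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # the 2 learned centroids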

56
Q

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

A

A clustering algorithm that groups data based on density, identifying clusters of varying shapes and sizes.

57
Q

Hierarchical Clustering

A

A method of clustering that builds a hierarchy of clusters using either agglomerative (bottom-up) or divisive (top-down) approaches.

58
Q

Outlier Detection

A

The process of identifying data points that are significantly different from the rest of the dataset.

59
Q

Anomaly Detection

A

A method to identify unusual patterns or data points that do not conform to the expected behavior.

60
Q

Autoencoder

A

A type of neural network used to learn efficient representations (encodings) of data, commonly used for dimensionality reduction and anomaly detection.

61
Q

Generative Adversarial Network (GAN)

A

A deep learning model consisting of two neural networks (generator and discriminator) that compete to generate realistic synthetic data.

62
Q

Synthetic Data

A

Artificially generated data that imitates the statistical properties of real-world data, often used for privacy-preserving training.

63
Q

Bias-Variance Tradeoff

A

The tradeoff between error from overly simplistic assumptions (bias, which causes underfitting) and error from sensitivity to fluctuations in the training data (variance, which causes overfitting); reducing one typically increases the other.

64
Q

Cold Start Problem

A

A challenge in recommendation systems when there is little or no data for new users or items.

65
Q

Data Augmentation

A

A technique to increase the diversity of training data by applying transformations such as rotation, flipping, or scaling.

66
Q

Dropout

A

A regularization technique in neural networks where randomly selected neurons are ignored during training to prevent overfitting.
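
A minimal sketch using PyTorch (assumed installed): in training mode roughly half the activations are zeroed and the survivors rescaled; in evaluation mode dropout is disabled.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # drop each activation with probability 0.5
x = torch.ones(1, 8)
drop.train()
print(drop(x))             # ~half the values become 0, the rest scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))             # at inference, dropout is a no-op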

67
Q

Early Stopping

A

A method to prevent overfitting by stopping the training process once the model’s performance on a validation set starts to deteriorate.

68
Q

Vanishing Gradient

A

A problem in deep neural networks where gradients become too small, slowing down learning, commonly addressed using techniques like LSTM and ReLU.

69
Q

Softmax

A

An activation function that transforms a vector of values into a probability distribution, used in multi-class classification.
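
A minimal NumPy sketch; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # non-negative values summing to 1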

70
Q

Logistic Regression

A

A statistical model for binary classification that estimates the probability of a binary outcome using a logistic function.
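
The core of the model is the logistic (sigmoid) function applied to a linear combination of the features; a minimal sketch with assumed illustrative weights follows.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([0.8, -0.4]), 0.1   # illustrative learned weights and bias
x = np.array([2.0, 1.0])            # one example's features
print(sigmoid(x @ w + b))           # probability of the positive class, ~0.79 here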

71
Q

Multilayer Perceptron (MLP)

A

A class of feedforward neural network consisting of multiple fully connected layers of nodes, where each layer feeds into the next.

72
Q

Feature Importance

A

A measure of the contribution of each feature to the model’s predictions.

73
Q

Gradient Boosting

A

An ensemble technique that builds models sequentially, with each new model correcting the errors of the previous ones.

74
Q

XGBoost

A

An optimized implementation of gradient boosting, known for its speed and performance in ML competitions.

75
Q

Hyperparameter Tuning

A

The process of finding the best hyperparameter values for a model using methods like grid search or random search.

76
Q

Learning Curve

A

A plot that shows how a model’s performance improves as it learns from more training data.

77
Q

Min-Max Scaling

A

A feature scaling technique that normalizes data to a fixed range, usually [0, 1].
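
The transformation is x' = (x - min) / (max - min); a minimal NumPy sketch with toy values.

import numpy as np

x = np.array([10.0, 20.0, 30.0, 50.0])
x_scaled = (x - x.min()) / (x.max() - x.min())   # maps min -> 0, max -> 1
print(x_scaled)                                  # [0.   0.25 0.5  1.  ]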

78
Q

Normalization

A

Adjusting values in data to a common scale without distorting differences in ranges of values.

79
Q

Token

A

A unit of text (e.g., word, subword) used as input for natural language processing models.

80
Q

Bagging

A

Short for Bootstrap Aggregating, an ensemble learning method that uses multiple models trained on random subsets of data to improve performance.

81
Q

Cross-Entropy Loss

A

A loss function commonly used in classification tasks to measure the difference between true labels and predicted probabilities.
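
A minimal NumPy sketch for one-hot labels (toy values assumed): the loss is the average negative log-probability the model assigns to the true class.

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels; y_pred: predicted class probabilities
    y_pred = np.clip(y_pred, eps, 1.0)          # avoid log(0)
    return -np.sum(y_true * np.log(y_pred)) / len(y_true)

y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cross_entropy(y_true, y_pred))            # ~0.29 for these toy predictions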

82
Q

Batch Normalization

A

A technique to improve the training of deep neural networks by normalizing inputs of each layer.

83
Q

Adam Optimizer

A

An optimization algorithm that adjusts the learning rate based on estimates of the first and second moments of the gradients.

84
Q

BLEU Score (Bilingual Evaluation Understudy)

A

A metric for evaluating the quality of machine-translated text against one or more reference translations.

85
Q

Sparse Matrix

A

A matrix in which most of the elements are zero, often used in text mining and NLP.

86
Q

One-Hot Vector

A

A binary vector used to represent categorical data, where one element is ‘hot’ (1) and all others are ‘cold’ (0).

87
Q

Tokenization

A

The process of breaking down text into smaller units like words, subwords, or characters.

88
Q

Markov Chain

A

A stochastic model describing a sequence of possible events, where each event depends only on the state attained in the previous event.

89
Q

Monte Carlo Simulation

A

A computational technique that uses random sampling to obtain numerical results, often used in probabilistic modeling.

90
Q

Entropy

A

A measure of uncertainty or randomness in information theory, used in decision trees for determining data splits.

91
Q

Information Gain

A

A measure used in decision trees to determine the best feature to split on, based on the reduction in entropy.
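
A minimal NumPy sketch using the entropy definition from the previous card (toy labels assumed): the gain is the parent's entropy minus the weighted entropy of the children after a split.

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array([1, 1, 1, 0, 0, 0])                  # entropy = 1.0 bit
left, right = np.array([1, 1, 1]), np.array([0, 0, 0]) # a perfectly pure split
children = (len(left) / len(parent)) * entropy(left) + (len(right) / len(parent)) * entropy(right)
print(entropy(parent) - children)                      # information gain = 1.0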

92
Q

R-Squared (Coefficient of Determination)

A

A statistical measure of how well a regression model fits the data, defined as the proportion of variance in the target variable explained by the model.

93
Q

Learning Rate Decay

A

A technique that reduces the learning rate over time to stabilize the training process and improve model convergence.

94
Q

Embeddings

A

Low-dimensional, dense vector representations of high-dimensional data, often used in NLP.

95
Q

Perceptron

A

The simplest type of artificial neural network, consisting of a single layer of weights and a threshold activation function.
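
A minimal NumPy sketch (toy data assumed): a single perceptron with a threshold activation learns the AND function via the classic error-driven update rule.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                  # AND of the two inputs
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(20):                         # a few passes over the data
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0   # threshold activation
        w += lr * (target - pred) * xi      # perceptron update rule
        b += lr * (target - pred)

print([(1 if xi @ w + b > 0 else 0) for xi in X])  # [0, 0, 0, 1]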

96
Q

Gradient Clipping

A

A technique to prevent exploding gradients in neural networks by capping gradients to a maximum value.

97
Q

Hinge Loss

A

A loss function used for training SVMs, penalizing points that lie on the wrong side of the decision boundary.

98
Q

Natural Language Generation (NLG)

A

The use of AI to generate human-like text based on input data or patterns.

99
Q

Zero-Shot Learning

A

A model’s ability to make predictions on classes not seen during training by leveraging shared attributes or semantics.

100
Q

Few-Shot Learning

A

An ML approach where models learn to make predictions from a very small number of training examples.