First 100 ML/AI Terms Flashcards

1
Q

Artificial Intelligence (AI)

A

The simulation of human intelligence processes by machines, particularly computer systems.

2
Q

Machine Learning (ML)

A

A subset of AI that involves the use of algorithms and statistical models to enable computers to learn from data and make decisions without being explicitly programmed.

3
Q

Deep Learning

A

A subset of ML involving neural networks with many layers that learn complex patterns in data.

4
Q

Neural Network

A

A model of interconnected artificial neurons, loosely inspired by the structure of the human brain, that processes information in layers.

5
Q

Supervised Learning

A

A type of ML where the model is trained on labeled data (input-output pairs).

6
Q

Unsupervised Learning

A

A type of ML that deals with unlabeled data, finding patterns or structures in input data without predefined labels.

7
Q

Reinforcement Learning

A

An ML technique where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.

8
Q

Classification

A

A supervised learning task where the model predicts a discrete label for input data.

9
Q

Regression

A

A supervised learning task where the model predicts a continuous value for input data.

10
Q

Clustering

A

An unsupervised learning method that groups data points into clusters based on their similarity.

11
Q

Decision Tree

A

A model that splits data into branches based on feature values, used for classification and regression tasks.

12
Q

Random Forest

A

An ensemble learning method that constructs many decision trees during training and outputs the majority-vote class for classification (or the average prediction for regression).

13
Q

Support Vector Machine (SVM)

A

A supervised learning algorithm that finds the hyperplane best separating different classes in the data.

14
Q

K-Nearest Neighbors (KNN)

A

A simple ML algorithm that classifies data points based on the majority class of their k nearest neighbors.

15
Q

Gradient Descent

A

An optimization algorithm used to minimize the loss function in various ML models by iteratively adjusting parameters.
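
For intuition, here is a minimal sketch (assuming an illustrative one-parameter loss f(w) = (w - 3)^2): each step moves the parameter against the gradient, scaled by the learning rate.

# Minimal illustrative sketch: gradient descent on f(w) = (w - 3)^2 (assumed toy loss).
w = 0.0               # initial parameter guess
learning_rate = 0.1   # step size (see the next card)
for step in range(100):
    grad = 2 * (w - 3)             # derivative df/dw
    w = w - learning_rate * grad   # move against the gradient
print(w)  # converges toward the minimum at w = 3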

16
Q

Learning Rate

A

A hyperparameter that controls the step size at each iteration of gradient descent.

17
Q

Overfitting

A

A modeling error that occurs when the model learns the noise in the training data too well, performing poorly on new data.

18
Q

Underfitting

A

When a model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and new data.

19
Q

Bias

A

The error due to overly simplistic assumptions in the learning algorithm, leading to underfitting.

20
Q

Variance

A

The error due to the model’s sensitivity to small fluctuations in the training set, leading to overfitting.

21
Q

Hyperparameter

A

A parameter whose value is set before the learning process begins and controls the model’s training process.

22
Q

Cross-Validation

A

A technique for evaluating ML models by partitioning the data into subsets and training/testing the model multiple times.
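
A minimal sketch using scikit-learn (assumed to be installed): cross_val_score splits the data into 5 folds, training on 4 and scoring on the held-out fold each time.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
print(scores.mean())                          # average accuracy across folds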

23
Q

Regularization

A

Techniques like L1 (Lasso) and L2 (Ridge) used to prevent overfitting by penalizing large coefficients.

24
Q

Loss Function

A

A function that measures the discrepancy between the predicted value and the actual value, guiding model training.

25
Q

Activation Function

A

A function in neural networks that determines the output of a node given an input or set of inputs (e.g., ReLU, Sigmoid).

26
Q

Backpropagation

A

A method used in neural networks to calculate the gradient of the loss function and update the weights.

27
Q

Convolutional Neural Network (CNN)

A

A deep learning model commonly used for image recognition tasks, consisting of convolutional layers to extract features.

28
Q

Recurrent Neural Network (RNN)

A

A neural network with loops, allowing information to persist for sequential data processing like time series or natural language.

29
Q

Long Short-Term Memory (LSTM)

A

A type of RNN designed to overcome the vanishing gradient problem by introducing memory cells to retain information.

30
Q

Transfer Learning

A

A technique where a pre-trained model on one task is reused and fine-tuned for another related task.

31
Q

Ensemble Learning

A

A technique where multiple models are combined to improve the performance of the final model.

32
Q

Bagging

A

A method in ensemble learning that reduces variance by training multiple models on different random subsets of the training data.

33
Q

Boosting

A

An ensemble technique that sequentially trains models, with each new model focusing on the errors of the previous ones.

34
Q

Stochastic Gradient Descent (SGD)

A

A variation of gradient descent in which each parameter update uses only a single randomly chosen example (or a small random mini-batch) rather than the full dataset.

35
Q

Epoch

A

One complete pass through the entire training dataset during the learning process.

36
Q

Batch Size

A

The number of training examples used in one iteration of model training.

37
Q

Tokenization

A

The process of breaking down text into smaller units, like words or subwords, for processing in NLP tasks.
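
A minimal whitespace-based sketch in plain Python; production NLP systems typically use subword tokenizers (e.g., BPE or WordPiece) instead.

text = "Machine learning models read tokens, not raw text."
tokens = text.lower().replace(",", "").replace(".", "").split()
print(tokens)
# ['machine', 'learning', 'models', 'read', 'tokens', 'not', 'raw', 'text']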

38
Q

Word Embedding

A

A dense vector representation of words that captures semantic relationships, such as Word2Vec and GloVe.

39
Q

Attention Mechanism

A

A technique in deep learning that allows models to focus on specific parts of the input sequence when making predictions.

40
Q

Transformer

A

An architecture in deep learning that uses self-attention mechanisms and is commonly used for NLP tasks, such as in models like BERT and GPT.

41
Q

Natural Language Processing (NLP)

A

The field of AI that focuses on the interaction between computers and human language.

42
Q

Bag-of-Words (BoW)

A

A simple NLP model that represents text as an unordered collection of words, disregarding grammar and word order.
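
A minimal sketch using scikit-learn's CountVectorizer (assumed scikit-learn 1.0+ and a toy two-document corpus): each document becomes a vector of word counts, with word order discarded.

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat", "the dog sat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)        # sparse matrix of word counts
print(vectorizer.get_feature_names_out())   # the learned vocabulary
print(X.toarray())                          # counts per document, order ignored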

43
Q

Naive Bayes

A

A probabilistic classifier based on Bayes’ theorem, assuming independence between features.

44
Q

Principal Component Analysis (PCA)

A

A dimensionality reduction technique that transforms data into a set of orthogonal components.
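
A minimal sketch using scikit-learn (assumed available, with random toy data): 5-dimensional points are projected onto their top 2 principal components.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)              # 100 samples, 5 features (toy data)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)        # project onto the top 2 components
print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # share of variance captured by each component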

45
Q

t-Distributed Stochastic Neighbor Embedding (t-SNE)

A

A nonlinear dimensionality reduction technique for visualizing high-dimensional data.

46
Q

Feature Engineering

A

The process of creating new features from existing data to improve model performance.

47
Q

Feature Scaling

A

The process of normalizing or standardizing the range of independent variables or features in the data.

48
Q

One-Hot Encoding

A

A technique to represent categorical variables as binary vectors for use in ML models.
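
A minimal plain-NumPy sketch of the idea; in practice library equivalents such as scikit-learn's OneHotEncoder or pandas.get_dummies are normally used.

import numpy as np

colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))        # ['blue', 'green', 'red']
one_hot = np.array([[1 if c == cat else 0 for cat in categories] for c in colors])
print(one_hot)  # one binary column per category, a single 1 per row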

49
Q

Confusion Matrix

A

A table used to describe the performance of a classification model, showing true positives, false positives, true negatives, and false negatives.

50
Q

Precision

A

The ratio of true positive predictions to the total number of positive predictions made by the model.

51
Q

Recall (Sensitivity)

A

The ratio of true positive predictions to the total actual positive cases in the dataset.

52
Q

F1 Score

A

The harmonic mean of precision and recall, providing a single metric for model performance.
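
A minimal sketch (with assumed toy labels) computing precision, recall, and F1 directly from true/false positive and negative counts, tying the last three cards together.

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)  # 0.75 0.75 0.75 on this toy example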

53
Q

ROC Curve (Receiver Operating Characteristic)

A

A graphical plot illustrating the performance of a binary classifier as its discrimination threshold is varied.

54
Q

Area Under the Curve (AUC)

A

A single scalar value that summarizes the performance of a classifier, representing the probability of correctly ranking positive and negative instances.

55
Q

K-Means Clustering

A

An unsupervised learning algorithm that partitions data into K clusters based on feature similarity.
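
A minimal sketch using scikit-learn's KMeans on assumed toy 2-D data with two obvious groups.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [8, 8], [8, 9], [0.5, 1.5], [9, 8]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # the 2 learned centroids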

56
Q

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

A

A clustering algorithm that groups data based on density, identifying clusters of varying shapes and sizes.

57
Q

Hierarchical Clustering

A

A method of clustering that builds a hierarchy of clusters using either agglomerative (bottom-up) or divisive (top-down) approaches.

58
Q

Outlier Detection

A

The process of identifying data points that are significantly different from the rest of the dataset.

59
Q

Anomaly Detection

A

A method to identify unusual patterns or data points that do not conform to the expected behavior.

60
Q

Autoencoder

A

A type of neural network used to learn efficient representations (encodings) of data, commonly used for dimensionality reduction and anomaly detection.

61
Q

Generative Adversarial Network (GAN)

A

A deep learning model consisting of two neural networks (generator and discriminator) that compete to generate realistic synthetic data.

62
Q

Synthetic Data

A

Artificially generated data that imitates the statistical properties of real-world data, often used for privacy-preserving training.

63
Q

Bias-Variance Tradeoff

A

The tradeoff between error from overly simplistic assumptions (bias, which causes underfitting) and error from sensitivity to fluctuations in the training data (variance, which causes overfitting); reducing one typically increases the other.

64
Q

Cold Start Problem

A

A challenge in recommendation systems when there is little or no data for new users or items.

65
Q

Data Augmentation

A

A technique to increase the diversity of training data by applying transformations such as rotation, flipping, or scaling.

66
Q

Dropout

A

A regularization technique in neural networks where randomly selected neurons are ignored during training to prevent overfitting.
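
A minimal sketch using PyTorch (assumed installed): in training mode roughly half the activations are zeroed and the survivors rescaled; in evaluation mode dropout is disabled.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # drop each activation with probability 0.5
x = torch.ones(1, 8)
drop.train()
print(drop(x))             # ~half the values become 0, the rest scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))             # at inference, dropout is a no-op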

67
Q

Early Stopping

A

A method to prevent overfitting by stopping the training process once the model’s performance on a validation set starts to deteriorate.

68
Q

Vanishing Gradient

A

A problem in deep neural networks where gradients become too small, slowing down learning, commonly addressed using techniques like LSTM and ReLU.

69
Q

Softmax

A

An activation function that transforms a vector of values into a probability distribution, used in multi-class classification.
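
A minimal NumPy sketch; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # non-negative values summing to 1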

70
Q

Logistic Regression

A

A statistical model for binary classification that estimates the probability of a binary outcome using a logistic function.
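
The core of the model is the logistic (sigmoid) function applied to a linear combination of the features; a minimal sketch with assumed illustrative weights follows.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([0.8, -0.4]), 0.1   # illustrative learned weights and bias
x = np.array([2.0, 1.0])            # one example's features
print(sigmoid(x @ w + b))           # probability of the positive class, ~0.79 here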

71
Q

Multilayer Perceptron (MLP)

A

A class of feedforward neural network consisting of multiple fully connected layers of nodes, where each layer feeds into the next.

72
Q

Feature Importance

A

A measure of the contribution of each feature to the model’s predictions.

73
Q

Gradient Boosting

A

An ensemble technique that builds models sequentially, with each new model correcting the errors of the previous ones.

74
Q

XGBoost

A

An optimized implementation of gradient boosting, known for its speed and performance in ML competitions.

75
Q

Hyperparameter Tuning

A

The process of finding the best hyperparameter values for a model using methods like grid search or random search.

76
Q

Learning Curve

A

A plot that shows how a model’s performance improves as it learns from more training data.

77
Q

Min-Max Scaling

A

A feature scaling technique that normalizes data to a fixed range, usually [0, 1].
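
The transformation is x' = (x - min) / (max - min); a minimal NumPy sketch with toy values.

import numpy as np

x = np.array([10.0, 20.0, 30.0, 50.0])
x_scaled = (x - x.min()) / (x.max() - x.min())   # maps min -> 0, max -> 1
print(x_scaled)                                  # [0.   0.25 0.5  1.  ]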

78
Q

Normalization

A

Adjusting values in data to a common scale without distorting differences in ranges of values.

79
Q

Token

A

A unit of text (e.g., word, subword) used as input for natural language processing models.

80
Q

Bagging

A

Short for Bootstrap Aggregating, an ensemble learning method that uses multiple models trained on random subsets of data to improve performance.

81
Q

Cross-Entropy Loss

A

A loss function commonly used in classification tasks to measure the difference between true labels and predicted probabilities.
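
A minimal NumPy sketch for one-hot labels (toy values assumed): the loss is the average negative log-probability the model assigns to the true class.

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels; y_pred: predicted class probabilities
    y_pred = np.clip(y_pred, eps, 1.0)          # avoid log(0)
    return -np.sum(y_true * np.log(y_pred)) / len(y_true)

y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cross_entropy(y_true, y_pred))            # ~0.29 for these toy predictions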

82
Q

Batch Normalization

A

A technique to improve the training of deep neural networks by normalizing inputs of each layer.

83
Q

Adam Optimizer

A

An optimization algorithm that adjusts the learning rate based on estimates of the first and second moments of the gradients.

84
Q

BLEU Score (Bilingual Evaluation Understudy)

A

A metric for evaluating the quality of machine-translated text against one or more reference translations.

85
Q

Sparse Matrix

A

A matrix in which most of the elements are zero, often used in text mining and NLP.

86
Q

One-Hot Vector

A

A binary vector used to represent categorical data, where one element is ‘hot’ (1) and all others are ‘cold’ (0).

87
Q

Tokenization

A

The process of breaking down text into smaller units like words, subwords, or characters.

88
Q

Markov Chain

A

A stochastic model describing a sequence of possible events, where each event depends only on the state attained in the previous event.

89
Q

Monte Carlo Simulation

A

A computational technique that uses random sampling to obtain numerical results, often used in probabilistic modeling.

90
Q

Entropy

A

A measure of uncertainty or randomness in information theory, used in decision trees for determining data splits.

91
Q

Information Gain

A

A measure used in decision trees to determine the best feature to split on, based on the reduction in entropy.
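
A minimal NumPy sketch using the entropy definition from the previous card (toy labels assumed): the gain is the parent's entropy minus the weighted entropy of the children after a split.

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array([1, 1, 1, 0, 0, 0])                  # entropy = 1.0 bit
left, right = np.array([1, 1, 1]), np.array([0, 0, 0]) # a perfectly pure split
children = (len(left) / len(parent)) * entropy(left) + (len(right) / len(parent)) * entropy(right)
print(entropy(parent) - children)                      # information gain = 1.0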

92
Q

R-Squared (Coefficient of Determination)

A

A statistical measure of how well a regression model fits the data, defined as the proportion of variance in the target variable explained by the model.

93
Q

Learning Rate Decay

A

A technique that reduces the learning rate over time to stabilize the training process and improve model convergence.

94
Q

Embeddings

A

Low-dimensional, dense vector representations of high-dimensional data, often used in NLP.

95
Q

Perceptron

A

The simplest type of artificial neural network, consisting of a single layer of weights and a threshold activation function.
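
A minimal NumPy sketch (toy data assumed): a single perceptron with a threshold activation learns the AND function via the classic error-driven update rule.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                  # AND of the two inputs
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(20):                         # a few passes over the data
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0   # threshold activation
        w += lr * (target - pred) * xi      # perceptron update rule
        b += lr * (target - pred)

print([(1 if xi @ w + b > 0 else 0) for xi in X])  # [0, 0, 0, 1]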

96
Q

Gradient Clipping

A

A technique to prevent exploding gradients in neural networks by capping gradients to a maximum value.

97
Q

Hinge Loss

A

A loss function used for training SVMs, penalizing points that lie on the wrong side of the decision boundary.

98
Q

Natural Language Generation (NLG)

A

The use of AI to generate human-like text based on input data or patterns.

99
Q

Zero-Shot Learning

A

A model’s ability to make predictions on classes not seen during training by leveraging shared attributes or semantics.

100
Q

Few-Shot Learning

A

An ML approach where models learn to make predictions from a very small number of training examples.