AI & ML Flashcards

1
Q

Artificial Intelligence

A

Development of intelligent systems capable of performing tasks that typically require human intelligence

Examples include perception, reasoning, learning, problem-solving, decision-making

Used for technologies like computer vision, facial recognition, fraud detection, and intelligent document processing

2
Q

Machine Learning

A

Subset of AI focused on building methods that allow machines to learn; all ML is AI, but not all AI is ML

Data is leveraged to improve computer performance on a set of tasks

Makes predictions based on the data used to train the model, with no explicitly programmed rules

3
Q

Neural Network

A

Method in AI where nodes (artificial neurons) are connected and organized in layers; each layer passes its output to the next layer

Creates an adaptive system that computers use to learn from their mistakes and improve continuously

Consists of Input Layer, Hidden Layers, and Output Layer
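
A minimal sketch of one forward pass through the three kinds of layers, assuming a tiny network with 2 input nodes, 3 hidden nodes, and 1 output node. The weights, biases, and the ReLU/sigmoid activation choices are made-up illustrations, and training (adjusting the weights to learn from mistakes) is not shown.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """Each node takes a weighted sum of all inputs plus a bias, then applies an activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 3 hidden nodes -> 1 output node
HIDDEN_W = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
HIDDEN_B = [0.0, 0.1, 0.2]
OUTPUT_W = [[1.0, -1.0, 0.5]]
OUTPUT_B = [0.0]

def forward(x):
    hidden = dense_layer(x, HIDDEN_W, HIDDEN_B, relu)        # input layer -> hidden layer
    return dense_layer(hidden, OUTPUT_W, OUTPUT_B, sigmoid)  # hidden layer -> output layer

print(forward([1.0, 2.0]))  # a single value between 0 and 1
```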

4
Q

Deep Learning

A

Method in AI that teaches computers to process data in a way that is inspired by the human brain

Uses layers of artificial neurons, loosely modeled on biological neurons and synapses, to train a model; can process more complex patterns in the data than traditional ML

Used for Computer Vision and NLP; requires a large amount of input data and typically GPUs

5
Q

Generative AI

A

Subset of Deep Learning focused on generating new data similar to the data it was trained on, such as images, text, audio, video, code, etc.

Unlabeled Data is used to pre-train a Foundation Model built on a neural network; this model can then be adapted for more specific uses such as text generation, information extraction, chatbots, and more

6
Q

Training Data

A

Large dataset used to train ML models to process information and accurately predict outcomes; preparing it is the most critical stage in building a good model

Can be Structured or Unstructured; Labeled or Unlabeled

7
Q

Labeled Data

A

ML data that includes both input features and corresponding output labels

Used for Supervised Learning, where the model is trained to map inputs to known outputs

For example, dataset with images of animals where each image is labeled with the corresponding animal type

8
Q

Unlabeled Data

A

ML data that includes only input features without any output labels

For example, a collection of images without any associated labels

Used for Unsupervised Learning, where the model tries to find patterns or structures in the data

9
Q

Structured Data

A

Data is organized in a structured format, often in rows and columns

Tabular Data is data arranged in a table with rows representing records and columns representing features

Time Series Data is a series of data points collected or recorded at successive points in time

10
Q

Unstructured Data

A

Data that doesn’t follow a specific structure and is often text-heavy or multimedia content

Text Data is unstructured text such as articles, social media posts, or customer reviews

Image Data is data in the form of images, which can vary widely in format and content

11
Q

Supervised Learning

A

ML method that learns a mapping function from inputs to outputs, which can then predict the output for new, unseen input data

Needs Labeled Data; very powerful, but labeling millions of data points can be difficult

12
Q

Regression

A

Supervised Learning technique used to predict a numeric value based on input data

The output variable is continuous, meaning it can take any value within a range

Used when the goal is to predict a quantity or a real value; predicting house prices, stock prices, weather forecasting, etc.
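
A minimal sketch of regression in its simplest form, assuming ordinary least squares with a single input feature. The house sizes and prices are made up (they follow price = 15 × size + 50 exactly), and `fit_line` is a hypothetical helper name; real projects usually rely on a library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (hundreds of sq ft) -> price (thousands)
sizes = [10, 15, 20, 25]
prices = [200, 275, 350, 425]
slope, intercept = fit_line(sizes, prices)
print(slope * 30 + intercept)  # predicted price for a size-30 house: 500.0
```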

13
Q

Classification

A

Supervised Learning technique used to predict the categorical label of input data

Output variable is discrete, which means it falls into a specific category or class

Used for scenarios where decisions or predictions need to be made between distinct categories; fraud, image types, diagnostics

For example, classifying emails, identifying animals, or assigning labels to movies

14
Q

Training Set

A

Data set used to train the model

Typically, 60-80% of the dataset

For example, 800 labeled images from a dataset of 1000 images

15
Q

Validation Set

A

Data set used to tune hyperparameters and validate model performance

Typically, 10-20% of the dataset

For example, 100 labeled images for hyperparameter tuning

16
Q

Test Set

A

Data set used to evaluate the final model performance

Typically, 10-20% of the dataset

For example, 100 labeled images to test the model’s accuracy
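
A minimal sketch of an 80/10/10 split that reproduces the 800/100/100 image example, assuming a hypothetical list of 1,000 file names and a fixed random seed so the shuffle is reproducible. `split_dataset` is an illustrative helper; shuffling first keeps any ordering in the original data (for example, by date or class) from ending up concentrated in one split.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle, then cut into training, validation, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

images = [f"image_{i}.png" for i in range(1000)]  # hypothetical file names
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 800 100 100
```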

17
Q

Feature Engineering

A

Process of using domain knowledge to select and transform raw data into meaningful features

Helps enhance performance of ML models

18
Q

Feature Extraction

A

Feature Engineering technique where you extract useful information from raw data, such as deriving age from date of birth
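
A minimal sketch of the date-of-birth example, assuming age is computed relative to an explicit reference date so results are reproducible. `age_on` is a hypothetical helper name.

```python
from datetime import date

def age_on(birth: date, today: date) -> int:
    """Derive an 'age' feature from a raw date-of-birth field."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # 33 (day before the birthday)
print(age_on(date(1990, 6, 15), date(2024, 6, 15)))  # 34 (on the birthday)
```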

19
Q

Feature Selection

A

Feature Engineering technique where you select a subset of relevant features, like choosing important predictors in a regression model

20
Q

Feature Transformation

A

Feature Engineering technique where you transform data for better model performance, such as normalizing numerical data
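
A minimal sketch of one common normalization, min-max scaling to the range 0 to 1, applied to made-up income values. In practice the minimum and maximum are computed on the training set only and reused for validation and test data.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20_000, 50_000, 80_000]  # hypothetical raw feature
print(min_max_normalize(incomes))   # [0.0, 0.5, 1.0]
```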

21
Q

Unsupervised Learning

A

ML method for discovering inherent patterns, structures, or relationships within the input data

The machine must uncover and create the groups itself, but humans still assign labels to the output groups

Feature Engineering can help improve the quality of the training

Clustering use cases include customer segmentation, targeted marketing, recommender systems

22
Q

Clustering

A

Unsupervised Learning technique used to group similar data points together into clusters based on their features

For example, segment customers to understand different purchasing behaviors; then, target each segment with tailored marketing strategies
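
k-means is one widely used clustering algorithm (the card does not name a specific one). A minimal sketch, assuming two hypothetical customer features (annual spend, visits per month) and a naive initialization that uses the first k points as starting centroids; k-means++ initialization is more common in practice.

```python
import math

def kmeans(points, k, iterations=10):
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, clusters

# Hypothetical customers: (annual spend, visits per month)
customers = [(100, 2), (900, 10), (120, 3), (110, 2), (950, 12), (880, 9)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)  # low spenders in one segment, high spenders in the other
```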

23
Q

Association Rule Learning

A

Unsupervised Learning technique used to discover rules describing how items in the data relate to one another

For example, understand which products are frequently bought together; then, the supermarket can place associated products together to boost sales
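
Association rules are commonly scored with support (the share of all baskets containing both items) and confidence (how often the second item appears when the first one does). A minimal sketch computing both for item pairs in made-up shopping baskets; a full algorithm such as Apriori would also search for frequent itemsets, which is not shown, and `rule_stats` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

baskets = [  # hypothetical transactions
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(pair for basket in baskets for pair in combinations(sorted(basket), 2))

def rule_stats(antecedent, consequent):
    """Support and confidence for the rule 'antecedent -> consequent'."""
    together = pair_counts[tuple(sorted((antecedent, consequent)))]
    support = together / len(baskets)
    confidence = together / item_counts[antecedent]
    return support, confidence

print(rule_stats("butter", "bread"))  # (0.6, 1.0): every butter basket also has bread
print(rule_stats("bread", "butter"))  # (0.6, 0.75)
```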

24
Q

Anomaly Detection

A

Unsupervised Learning technique used to identify outliers and strange patterns in data

For example, fraud detection in credit card purchases
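
One simple unsupervised approach is a z-score rule: flag values that lie far from the mean, measured in standard deviations. A minimal sketch with hypothetical purchase amounts; the threshold of 2.0 is an assumption (3.0 is also common, but with only 10 values a single outlier cannot reach a population z-score of 3), and production fraud systems use far richer models.

```python
import statistics

def flag_anomalies(values, threshold):
    """Return values whose population z-score exceeds the threshold (no labels needed)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical credit card purchase amounts; one is far out of line with the rest
purchases = [25, 30, 22, 28, 35, 27, 24, 31, 26, 2500]
print(flag_anomalies(purchases, threshold=2.0))  # [2500]
```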

25
Q

Semi-Supervised Learning

A

ML method that uses a small amount of labeled data and a large amount of unlabeled data to train systems

After that, the partially trained algorithm itself labels the unlabeled data; this is called pseudo-labeling

The model is then re-trained on the resulting mix of labeled and pseudo-labeled data, without explicitly programmed rules
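
A minimal sketch of the train, pseudo-label, and re-train steps, assuming a toy one-feature dataset and a nearest-centroid classifier as the model (any classifier could fill that role). Real pipelines often keep only the pseudo-labels the model is confident about; this sketch keeps all of them, and both helper names are hypothetical.

```python
def fit_centroids(examples):
    """'Train' a nearest-centroid model: the mean feature value per label."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Pick the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

labeled = [(1.0, "small"), (9.0, "large")]   # small amount of labeled data
unlabeled = [1.5, 2.0, 8.0, 8.5, 2.5, 9.5]   # larger amount of unlabeled data

model = fit_centroids(labeled)                                # 1. train on labeled data
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]  # 2. pseudo-label the rest
model = fit_centroids(labeled + pseudo_labeled)               # 3. re-train on the mix
print(model)  # {'small': 1.75, 'large': 8.75}
```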

26
Q

Reinforcement Learning

A

Type of ML where an agent learns to make decisions by performing actions in an environment to maximize cumulative rewards
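
A minimal sketch of the agent/action/reward loop, assuming the simplest possible environment: a multi-armed bandit with a single state, where each action returns a noisy reward. The agent follows an epsilon-greedy strategy (usually exploit the best-known action, occasionally explore a random one); the reward means, noise level, and epsilon are made-up values.

```python
import random

def epsilon_greedy(reward_means, steps=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(reward_means)
    estimates = [0.0] * n_actions  # agent's running estimate of each action's value
    counts = [0] * n_actions
    cumulative_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                             # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(reward_means[action], 0.1)  # environment returns a noisy reward
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # running mean
        cumulative_reward += reward
    return estimates, counts, cumulative_reward

estimates, counts, total = epsilon_greedy([0.2, 0.5, 0.9])
print(counts)  # most pulls should go to the action with the highest mean reward
```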

27
Q

RLHF

A

Reinforcement Learning from Human Feedback

Uses human feedback to help ML models self-learn more efficiently; human feedback is incorporated into the reward function so the model is better aligned with human goals, wants, and needs

Used throughout GenAI applications, including LLMs; significantly enhances model performance

Steps: Data Collection, Supervised Fine-Tuning, Training Reward Model, Optimization

28
Q

Model Fit

A

Measurement of how well a machine learning model adapts to data that is similar to the data on which it was trained

Overfitting: performs well on the training data, but doesn’t perform well on evaluation data

Underfitting: performs poorly on training data; could be caused by a model that is too simple or by poor data features

Balanced: performs well on both training data and evaluation data

29
Q

Bias

A

Difference, or error, between predicted and actual value

Occurs due to wrong choices in the ML process

If high, the model doesn’t closely match the training data; this is considered underfitting

Reduce this by using a more complex model or increasing the number of features

30
Q

Variance

A

How much the performance of a model changes if trained on a different dataset with a similar distribution

If high, the model is very sensitive to changes in the training data; this is considered overfitting

Reduce this through feature selection (fewer, more important features) or by splitting into training and test data sets multiple times (cross-validation)

31
Q

Confusion Matrix

A

Matrix that summarizes the performance of a machine learning model on a set of test data

Best way to evaluate the performance of a classification model, e.g., binary classification

Precision best when false positives are costly; Recall best when false negatives are costly

F1 Score best for balance of Precision and Recall, especially for imbalanced datasets; Accuracy best for balanced datasets
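
A minimal sketch that builds the four confusion-matrix counts for a binary classifier and derives the four metrics from them. The labels are made up (1 = positive class), and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, share correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, share found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

actual    = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
print(confusion_counts(actual, predicted))  # (3, 1, 1, 5)
print(classification_metrics(*confusion_counts(actual, predicted)))
```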

32
Q

AUC-ROC

A

Area Under the Curve of the Receiver Operating Characteristic curve

Value from 0 to 1; 1 represents a perfect model

Plots the true positive rate against the false positive rate at various classification thresholds, with each threshold producing its own confusion matrix
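
A minimal sketch of the threshold sweep, assuming made-up labels (1 = positive) and model scores. Each distinct score serves as a threshold, which yields one confusion matrix and therefore one (false positive rate, true positive rate) point; the area under those points is estimated with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """One (FPR, TPR) point per threshold; assumes at least one positive and one negative."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]              # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical model confidence scores
print(auc(roc_points(labels, scores)))   # about 0.889 (8/9)
```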

33
Q

Regression Metrics

A

MAE: Mean Absolute Error; MAPE: Mean Absolute Percentage Error; RMSE: Root Mean Squared Error

MAE, MAPE, and RMSE measure the error, or how accurate the model is

R² measures the proportion of variance in the target that the model explains; close to 1 means predictions are good
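
A minimal sketch computing all four metrics on made-up actual and predicted values. MAPE is undefined when any actual value is 0, which this sketch does not guard against.

```python
import math

def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # percent
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # variation the model misses
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation in the target
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}

actual = [100, 200, 300, 400]     # hypothetical true values
predicted = [110, 190, 310, 390]  # hypothetical predictions
print(regression_metrics(actual, predicted))
```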

34
Q

Inferencing

A

When a model is making predictions on new data

Real Time: models have to make decisions quickly as data arrives, and speed is preferred over perfect accuracy; e.g., chatbots

Batch: a large amount of data is analyzed all at once, and accuracy is preferred over speed; often used for data analysis

35
Q

Edge Inferencing

A

When a model makes predictions on new data close to where that data is generated

Uses edge devices to run the model; less computational power, but close proximity to the data

Small Language Models on edge devices offer very low latency, low compute footprint, and offline capability

Large Language Models on remote servers are more powerful, but have higher latency and require a network connection for access

36
Q

Hyperparameter Tuning

A

Finding the best hyperparameter values to optimize model performance

Hyperparameters are settings that define the model structure, learning algorithm, and process; set before training begins

Tuning improves model accuracy, reduces overfitting, and enhances generalization
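
Grid search is one common tuning strategy: try every combination of candidate values and keep the one with the lowest validation error. A minimal sketch, assuming a one-weight linear model (y = w × x) trained by gradient descent and made-up training and validation data; the winner is whichever combination scores best on validation data, which need not be the one that fits the training data most closely.

```python
from itertools import product

def train(xs, ys, learning_rate, epochs):
    """Full-batch gradient descent on mean squared error for y = w * x."""
    w = 0.0
    for _ in range(epochs):
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
    return w

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]  # hypothetical training set
val_x, val_y = [4.0, 5.0], [8.1, 9.8]                # hypothetical validation set

grid = {"learning_rate": [0.001, 0.01, 0.1], "epochs": [5, 50]}
best = min(
    product(grid["learning_rate"], grid["epochs"]),
    key=lambda hp: mse(train(train_x, train_y, *hp), val_x, val_y),
)
print("best (learning_rate, epochs):", best)
```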

37
Q

Learning Rate

A

Hyperparameter for how large or small the steps are when updating the model’s weights during training

High rate can lead to faster convergence but risks overshooting the optimal solution

Low rate may result in more precise but slower convergence
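
A minimal sketch on the toy loss (w - 3)², whose minimum is at w = 3. Each step multiplies the distance to the minimum by (1 - 2 × learning rate), so 0.01 crawls toward 3, 0.1 gets close, 0.9 overshoots back and forth across 3 while still converging, and 1.1 overshoots further each step and diverges. The loss and the candidate rates are illustrative choices.

```python
def gradient_descent(learning_rate, steps=20, w=0.0):
    """Minimize the toy loss (w - 3)**2 starting from w = 0."""
    for _ in range(steps):
        gradient = 2 * (w - 3)         # derivative of (w - 3)**2
        w -= learning_rate * gradient  # step size scales with the learning rate
    return w

for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"learning_rate={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```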

38
Q

Batch Size

A

Hyperparameter for number of training examples used to update the model weights in one iteration

Smaller sizes can lead to more stable learning but require more time to compute

Larger sizes are faster but may lead to less stable updates
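
A minimal sketch of how batch size sets the number of weight updates per pass over the data, assuming a hypothetical training set of 1,000 examples. `minibatches` is an illustrative helper, not a library function; the last batch is smaller when the dataset size is not a multiple of the batch size.

```python
def minibatches(data, batch_size):
    """Yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

examples = list(range(1000))  # hypothetical training set
for batch_size in (10, 100, 1000):
    updates = sum(1 for _ in minibatches(examples, batch_size))
    print(batch_size, updates)  # 10 -> 100 updates, 100 -> 10, 1000 -> 1
```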

39
Q

Epochs

A

Hyperparameter that refers to how many times the model will iterate over the entire training dataset

Too few can lead to underfitting, while too many may cause overfitting
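
A minimal sketch of a training loop where one epoch is one full pass over the training data, assuming a one-weight linear model, mini-batch gradient descent, and made-up data that follows y = 2x exactly. It records training loss after each epoch; spotting overfitting would also require tracking a validation loss, which this toy omits.

```python
import random

def train_linear(xs, ys, epochs, batch_size=2, learning_rate=0.01, seed=0):
    """Mini-batch gradient descent for y = w * x; returns w and per-epoch training loss."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    history = []
    for _ in range(epochs):
        rng.shuffle(data)  # reshuffle each epoch so batches differ
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            gradient = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * gradient
        history.append(sum((w * x - y) ** 2 for x, y in data) / len(data))
    return w, history

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical data following y = 2x
for epochs in (1, 20):
    w, history = train_linear(xs, ys, epochs)
    print(epochs, round(w, 3), round(history[-1], 5))
```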