Artificial Intelligence & Applications Flashcards

1
Q

What is Artificial Intelligence (AI)?

A

AI is the ability of machines to think and act intelligently, like a human would.

2
Q

How does AI differ from traditional programming?

A

AI learns from data and makes decisions, while traditional programming involves telling a computer exactly what to do.

3
Q

What are key areas of AI?

A
  • Computer Vision
  • Machine Learning (ML)
  • Deep Learning (DL)
  • Data Mining
4
Q

What is Weak AI?

A

AI built for specific tasks—it doesn’t think like a human.

5
Q

Give examples of Weak AI.

A
  • Siri and Alexa
  • Chess-playing AI
  • Chatbots like GPT-4
6
Q

What is Strong AI?

A

AI that can think, learn, and adapt like a human.

7
Q

Does Strong AI exist?

A

Not yet! Scientists are still trying to create it.

8
Q

What is Machine Learning (ML)?

A

A method where AI is trained on data instead of being programmed manually.

9
Q

What are the two main types of Machine Learning?

A
  • Supervised Learning
  • Unsupervised Learning
10
Q

What is Supervised Learning?

A

The AI is trained on labeled data (data with answers).

11
Q

What is an example of Supervised Learning?

A

Teaching AI to recognize cats using labeled cat pictures.

12
Q

What is Unsupervised Learning?

A

AI finds patterns in data by itself—no labels.

13
Q

What is the difference between Machine Learning and Data Mining?

A

ML = models learn patterns from data in order to make predictions; Data Mining = analysts explore large datasets, often with statistical and ML tools, to discover previously unknown patterns.

14
Q

What is Deep Learning (DL)?

A

A subset of Machine Learning that uses neural networks with many layers.

15
Q

What are key models in Deep Learning?

A
  • CNNs (Convolutional Neural Networks)
  • RNNs (Recurrent Neural Networks)
  • Autoencoders
16
Q

What is CNN best for?

A

Images.

17
Q

What is RNN best for?

A

Sequences like speech and text.

18
Q

Where is AI used?

A
  • Computer Vision
  • Natural Language Processing (NLP)
  • Generative AI
19
Q

What is an application of AI in Computer Vision?

A

Medical imaging for detecting diseases from scans.

20
Q

What does Natural Language Processing (NLP) enable AI to do?

A

Understand language.

21
Q

What is an example of Generative AI?

A

GANs (Generative Adversarial Networks) that create new images, videos, and music.

22
Q

Fill in the blank: AI is __________ technology that mimics human intelligence.

A

[smart]

23
Q

True or False: Deep Learning makes AI less powerful than traditional Machine Learning.

A

False.

24
Q

What are the goals of data exploration?

A

✔️ Visualize patterns and trends
✔️ Summarize key statistics
✔️ Detect anomalies
✔️ Understand relationships between variables

Example: Analyzing diamond prices for trends related to carat size and cut.

25
Why is data visualization important?
📌 Find patterns and trends 📌 Understand relationships between variables 📌 Detect errors or missing data 📌 Make data easier to interpret ## Footnote Quote: 'Make both calculations and graphs.' – F.J. Anscombe, 1973
26
What types of charts are used for different purposes?
🔹 Relationship → Scatter plots 🔹 Composition → Pie charts 🔹 Comparison → Bar charts, line graphs 🔹 Location → Maps & heatmaps ## Footnote Example: Scatter plots for height vs. weight.
27
What did researchers Cleveland & McGill find about chart design?
✔️ Position & length are the most accurate ways to show numbers ✔️ Pie charts are harder to interpret than bar charts ## Footnote Best Practice: Use clear, simple charts.
28
What is the Grammar of Graphics?
✔️ A structured approach to designing visualizations ✔️ Ensures consistency in designing graphs ✔️ Used in tools like ggplot2 in R ## Footnote Helps in creating clear visualizations.
29
What are common data issues in data pre-processing?
❌ Missing Values ❌ Duplicates ❌ Inconsistent Data ❌ Noise & Outliers ## Footnote Solutions include filling missing values and standardizing formats.
30
What is feature engineering in data pre-processing?
✔️ Feature Selection → Keep important variables ✔️ Feature Transformation → Convert data into better formats ## Footnote Example: Standardizing price and carat size in a diamonds dataset.
31
Fill in the blank: The package used for creating visualizations in R is _______.
[ggplot2]
32
What is the purpose of a scatter plot in data visualization?
Helps us see the relationship between two variables ## Footnote Example: Carat vs. Price in diamond datasets.
33
True or False: Cleaning data is essential for accuracy.
True
34
What are the key takeaways from the data exploration and visualization process?
✔️ Data exploration helps us understand patterns ✔️ Visualization is key for discovering insights ✔️ Choosing the right chart aids interpretation ✔️ The Grammar of Graphics helps create structured visualizations ✔️ Cleaning data is essential for accuracy
35
When do we use Machine Learning?
When no direct formula exists to solve a problem and when we have data that can help find patterns. ## Footnote Example: Predicting customer purchase behavior.
36
What is Supervised Learning?
A type of ML where the model is trained on labeled data, learning from known answers.
37
What are the key features used in the Supervised Learning example (evaluating cars)?
* Buying Price * Maintenance Cost * Number of Doors * Seating Capacity * Luggage Boot Size * Safety Rating
38
What is Predictive Modeling?
When ML learns patterns from data to make predictions.
39
What is the difference between Regression and Classification?
* Regression → Predicts continuous values (e.g., house prices). * Classification → Assigns data into categories (e.g., spam or not spam).
40
What is Linear Regression?
A method to find the best-fit line Y = mx + c, where c is the intercept and m is the slope.
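Sketch: for a single input variable, the best-fit line can be computed directly with ordinary least squares. This is a minimal illustration in plain Python; the function name `fit_line` and the sample points are made up for the example.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    # The best-fit line always passes through the point (mean_x, mean_y)
    c = mean_y - m * mean_x
    return m, c

# Illustrative points that lie exactly on y = 2x + 1
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With these points the fit recovers slope 2 and intercept 1; with noisy data it returns the line that minimizes the squared vertical distances.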
41
What are the limitations of Linear Regression?
* Not good for non-linear relationships * Not good when there are too many outliers.
42
What is a Decision Tree?
A flowchart-like structure where each decision leads to an outcome.
43
What is the process of creating a Decision Tree?
* Pick the best feature * Split the data into groups * Keep splitting until groups are pure.
44
What is Random Forest?
A collection of multiple decision trees to improve accuracy and reduce overfitting.
45
How does Random Forest work?
* Train many Decision Trees on random data subsets * Use different features at each split * Combine all tree predictions.
46
What is k-Nearest Neighbors (k-NN)?
A method that classifies new data points based on the 'k' closest points in the dataset.
47
What is the process for k-NN classification?
* Choose k * Compute distances from the new point to all training points * Find the k closest neighbors * Assign the most common class label or take the average.
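Sketch: the four steps above, written as a small classifier in plain Python. It assumes Euclidean distance and majority voting; `knn_classify` and the sample points are hypothetical names and data chosen for illustration.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, new_point, k=3):
    # Compute the distance from the new point to every training point
    distances = [
        (math.dist(p, new_point), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Find the k closest neighbours
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Assign the most common class label among them
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["A", "A", "A", "B", "B", "B"]
prediction = knn_classify(points, labels, (2, 2), k=3)  # "A"
```

Because every prediction scans the whole training set, the cost grows with the dataset size, which is the slowness noted for k-NN on large datasets.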
48
What is a limitation of k-NN?
It is slow for large datasets.
49
List the main concepts of Supervised Learning.
* Uses labeled data * Regression vs. Classification * Linear Regression * Decision Trees * Random Forest * k-NN
50
What is the goal of Linear Regression?
To find the best-fit line that represents the relationship between variables.
51
True or False: Decision Trees can overfit.
True
52
Fill in the blank: Random Forest is an army of _______.
[Decision Trees]
53
What is Unsupervised Learning?
Unsupervised learning is a machine learning approach where the model learns patterns, structures, or groupings in the data without labeled outputs. ## Footnote Key characteristics include working with unlabelled data, finding hidden structures, and being used for clustering and dimensionality reduction.
54
What are the key characteristics of Unsupervised Learning?
* Works with unlabelled data (no predefined categories) * Finds hidden structures in data * Used for clustering & dimensionality reduction ## Footnote Example: Grouping similar books in an unorganized library.
55
What is Clustering?
Clustering is a method in unsupervised learning that groups similar data points together. ## Footnote Within a cluster, data points are similar; in different clusters, they are dissimilar.
56
Why is Clustering used?
* Data Reduction * Outlier Detection * Data Segmentation ## Footnote Clustering helps summarize large datasets, identifies unusual patterns, and groups customers by behavior.
57
What are some real-world applications of Clustering?
* Social Network Analysis * Image Segmentation * Data Annotation ## Footnote Examples include grouping users based on interests and dividing images for medical imaging.
58
What are the steps in Clustering?
* Define a distance metric to measure similarity * Form clusters by grouping similar data points * Maximize within-cluster similarity, minimize between-cluster similarity ## Footnote Common distance metrics include Euclidean and Manhattan distances.
59
What is K-Means Clustering?
K-Means is a partition-based clustering algorithm that groups data into k clusters. ## Footnote It is one of the most popular clustering algorithms.
60
How does K-Means Clustering work?
* Choose the number of clusters (k) * Select k random points as initial centroids * Assign each data point to the nearest centroid * Recalculate centroids by finding the mean of each cluster * Repeat until centroids stop changing ## Footnote Example: Grouping customers into low, medium, and high spenders.
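Sketch: the K-Means loop above in plain Python. This is an illustration under stated assumptions, not a production implementation: it uses Euclidean distance, keeps an empty cluster's old centroid, and the optional `initial_centroids` argument is added here only so the example is reproducible. The spending figures are invented.

```python
import math
import random

def k_means(points, k, initial_centroids=None, max_iters=100, seed=0):
    """Cluster points (tuples of numbers) into k groups."""
    # Pick k random points as starting centroids unless some are supplied
    if initial_centroids:
        centroids = list(initial_centroids)
    else:
        centroids = random.Random(seed).sample(points, k)
    clusters = []
    for _ in range(max_iters):
        # Assign every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical monthly spend per customer: low, medium, and high spenders
spend = [(10,), (12,), (11,), (50,), (52,), (49,), (100,), (98,), (103,)]
centroids, clusters = k_means(spend, k=3, initial_centroids=[(10,), (50,), (100,)])
```

With random starting points the result can differ from run to run, which is why K-Means is described as sensitive to initialization.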
61
What is the Elbow Method in K-Means?
The Elbow Method involves plotting the Within-Cluster Sum of Squares (WCSS) and looking for the 'elbow' point where adding more clusters stops improving the fit significantly. ## Footnote The bend in the curve indicates the optimal number of clusters (k).
62
What are the strengths of K-Means Clustering?
* Simple and efficient * Works well for large datasets ## Footnote K-Means is favored for its speed and ease of use.
63
What are the weaknesses of K-Means Clustering?
* Requires predefined k * Sensitive to initialization * Struggles with non-globular clusters ## Footnote These limitations can affect the clustering results.
64
What is Hierarchical Clustering?
Hierarchical Clustering builds a tree-like structure (dendrogram) instead of predefined partitions. ## Footnote It allows for a more flexible grouping of data.
65
How does Hierarchical Clustering work?
* Start with each data point as its own cluster * Merge the closest clusters based on a chosen distance metric * Repeat until one large cluster remains ## Footnote Agglomerative Clustering is a bottom-up approach.
66
What are common linkage criteria (ways to measure the distance between clusters) in Hierarchical Clustering?
* Single Linkage * Complete Linkage * Centroid Distance ## Footnote These criteria determine which clusters are merged at each step.
67
What are the strengths of Hierarchical Clustering?
* No need to specify k beforehand * Can capture arbitrarily shaped clusters (depending on the linkage used) ## Footnote This flexibility is an advantage over K-Means.
68
What are the weaknesses of Hierarchical Clustering?
Hierarchical Clustering is computationally expensive for large datasets. ## Footnote This can limit its practicality in big data scenarios.
69
What are the key takeaways from the study of Unsupervised Learning and Clustering?
* Unsupervised learning finds patterns in unlabeled data * Clustering groups similar data points together * K-Means is a fast, efficient partition-based method * Hierarchical clustering builds a tree-like structure * Choosing the right k is crucial for effective clustering ## Footnote These points summarize the fundamental concepts of clustering.
70
What is the main difference between Traditional Machine Learning and Deep Learning?
Traditional Machine Learning involves manually selecting features, while Deep Learning learns features automatically from raw data. ## Footnote Example: K-Means Clustering for ML vs. Neural Networks for DL.
71
What is the purpose of Gradient Descent in machine learning?
To minimize the loss, which reduces prediction errors and improves the model's accuracy. ## Footnote Each step adjusts the weights in the direction that lowers the loss, so predictions gradually improve.
72
What are the steps involved in the Backpropagation process?
* Forward Pass * Compute Loss * Backpropagation * Gradient Descent ## Footnote These steps help the model adjust weights to minimize prediction errors.
73
True or False: Neural Networks are inspired by the structure of the human brain.
True ## Footnote Neural networks use artificial 'neurons' to process information.
74
What are the three types of layers in a Neural Network?
* Input Layer * Hidden Layers * Output Layer ## Footnote Each layer plays a specific role in processing data and making predictions.
75
Fill in the blank: The equation for a prediction in a neural network is Prediction = Input × Weight → _______.
Activation Function ## Footnote This function determines the output based on the weighted input.
76
What are the two types of Loss Functions mentioned?
* Binary Cross-Entropy * Categorical Cross-Entropy ## Footnote These functions help determine how well the model is performing.
77
What is the role of an Optimizer in deep learning?
To update the model's weights from the computed gradients, controlling how large each step is and how it adapts during training. ## Footnote Examples include SGD and Adam Optimizer.
78
What dataset is used in the example of Handwritten Digit Recognition?
MNIST Dataset ## Footnote This dataset contains 60,000 training images and 10,000 test images.
79
What is the first step in building a simple neural network using Keras?
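Import Libraries ## Footnote Essential imports include Sequential, Dense, np_utils, and mnist. In recent Keras releases, the one-hot helper to_categorical is imported from keras.utils rather than through np_utils.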
80
What is the purpose of normalizing pixel values in the MNIST dataset?
To improve training efficiency by scaling values from 0-255 to 0-1. ## Footnote This normalization helps the model learn better.
81
What is One-Hot Encoding used for in the context of neural networks?
To convert labels into a format the neural network can understand. ## Footnote This is crucial for multi-class classification.
82
What activation function is used in the first layer of the example neural network?
ReLU (Rectified Linear Unit) ## Footnote This function helps in learning complex patterns.
83
What metric is used to evaluate the performance of the model on test data?
Accuracy ## Footnote This metric indicates how well the model predicts unseen data.
84
True or False: Keras simplifies the process of building neural networks.
True ## Footnote It provides an easy-to-use interface for model creation.
85
What is the goal of training a neural network on the MNIST dataset?
To correctly predict handwritten digits (0-9). ## Footnote This involves optimizing the model through various epochs.
86
What is the goal of Backpropagation and Gradient Descent?
Minimize the error between predicted and actual outputs.
87
What are the steps involved in Backpropagation?
* Forward Pass: Compute predictions * Compute Loss: Measure the error * Backpropagation: Calculate gradients * Gradient Descent: Adjust weights to reduce error
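Sketch: the four steps above for the simplest possible "network", a single linear neuron y = w*x + b trained with mean squared error. The gradients are written out by hand via the chain rule. The function name `train`, the learning rate, the epoch count, and the data are assumptions chosen for illustration.

```python
def train(xs, ys, learning_rate=0.05, epochs=1000):
    w, b = 0.0, 0.0
    n = len(xs)
    loss = 0.0
    for _ in range(epochs):
        # Forward pass: compute predictions
        preds = [w * x + b for x in xs]
        # Compute loss: mean squared error
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # Backpropagation: gradients of the loss with respect to w and b
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Gradient descent: step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss

# Illustrative data generated from y = 2x + 1
w, b, final_loss = train([0, 1, 2, 3], [1, 3, 5, 7])
```

After training, w and b end up close to 2 and 1. In a real deep network, a framework computes the gradients automatically for every layer.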
88
What do activation functions prevent in neural networks?
Without them, stacked layers collapse into a single linear model (like linear regression); activation functions add the non-linearity needed to learn complex relationships.
89
What is the equation without activation functions?
output = dot(W, input) + b
90
What is the equation with activation functions?
output = ReLU(dot(W, input) + b)
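Sketch: the two equations (with and without an activation) applied to one small layer in plain Python. The helper names `dense` and `relu` and the weight values are made up for illustration.

```python
def relu(x):
    return max(0.0, x)

def dense(weights, inputs, bias, activation=None):
    """One layer: output = activation(dot(W, input) + b)."""
    outputs = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, inputs)) + b   # dot(W, input) + b
        outputs.append(activation(z) if activation else z)
    return outputs

W = [[1.0, -2.0], [0.5, 0.5]]   # two neurons, two inputs each
b = [0.0, 1.0]
x = [1.0, 2.0]

linear_out = dense(W, x, b)         # [-3.0, 2.5]
relu_out = dense(W, x, b, relu)     # [0.0, 2.5]
```

The only difference is that ReLU clips the negative value to zero; that small non-linearity is what lets stacked layers model more than a straight line.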
91
What is a Linear Activation Function?
Output = Input (Straight line)
92
What is the main issue with the Sigmoid activation function?
Vanishing Gradient – When values go beyond ±3, the gradient becomes tiny, and learning slows down.
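Sketch: the vanishing gradient can be seen by evaluating the sigmoid's derivative, s(x) * (1 - s(x)), at a few inputs. The sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # derivative of the sigmoid

# The gradient peaks at 0.25 (x = 0) and shrinks quickly as |x| grows
grads = {x: sigmoid_grad(x) for x in (0, 3, 6, 10)}
```

At x = 0 the gradient is 0.25; by x = 3 it is already below 0.05, and at x = 10 it is below 0.0001, so weight updates become tiny.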
93
What does the Softmax activation function do?
Converts values into probabilities that sum to 1.
94
What is an example of Softmax output?
* Cat: 70% * Dog: 20% * Bird: 10%
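Sketch: a plain-Python softmax. The raw scores (logits) below are invented so that the output lands near the 70% / 20% / 10% example above.

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for cat, dog, bird
probs = softmax([2.0, 0.75, 0.05])   # roughly 0.70, 0.20, 0.10
```

The outputs are always positive and sum to 1, so they can be read as class probabilities.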
95
What are the benefits of the Tanh activation function?
Maps inputs between -1 and 1, allowing negative values.
96
What is a key problem with Tanh?
Still suffers from vanishing gradient for large values.
97
What distinguishes ReLU from other activation functions?
Only activates for positive values; negative inputs become 0.
98
What is a problem associated with ReLU?
Dead Neurons – if a neuron only gets negative values, it stops learning.
99
How does Leaky ReLU address the dead neuron problem?
Gives negative values a small value instead of zeroing them out.
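Sketch: ReLU and Leaky ReLU side by side, with their gradients. The slope `alpha=0.01` for negative inputs is a commonly used value, assumed here for illustration.

```python
def relu(x):
    return x if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    # Negative inputs are scaled by a small slope instead of being zeroed
    return x if x > 0 else alpha * x

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, alpha=0.01):
    return 1.0 if x > 0 else alpha

# For a negative input, ReLU passes no gradient back; Leaky ReLU passes a little
dead = relu_grad(-5.0)          # 0.0
alive = leaky_relu_grad(-5.0)   # 0.01
```

A zero gradient means the neuron's weights never update; the small non-zero slope keeps learning possible.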
100
What is the best use case for the Sigmoid activation function?
Probability outputs (binary classification).
101
What problems are associated with Softmax?
Can be computationally expensive.
102
When is Tanh best utilized?
In situations where negative values are needed.
103
What are the common uses for ReLU?
Deep learning models (default choice).
104
What is the main advantage of Leaky ReLU?
Prevents dead neurons and improves training.
105
Fill in the blank: Activation functions allow _______ models to learn complex data.
[deep learning]
106
What is the most commonly used activation function for hidden layers?
ReLU
107
What activation function is best for multi-class classification?
Softmax
108
What is the purpose of activation functions?
They enable deep networks to learn complex data relationships. ## Footnote Activation functions are crucial for introducing non-linearity into the model.
109
What is a key limitation of linear activation functions?
They can't handle complex patterns. ## Footnote Linear activation functions are insufficient for deep learning tasks.
110
Name a common activation function used for probabilities.
Sigmoid ## Footnote Sigmoid is often used in binary classification tasks.
111
What is the main drawback of the sigmoid activation function?
It suffers from vanishing gradients. ## Footnote This can slow down the learning process in deep networks.
112
Which activation function is used in multi-class classification?
Softmax ## Footnote Softmax converts logits to probabilities for multiple classes.
113
What range does the Tanh activation function map inputs to?
Between -1 and 1. ## Footnote Tanh is zero-centered, which often helps gradient flow compared with sigmoid.
114
What is the most commonly used activation function?
ReLU ## Footnote ReLU helps avoid vanishing gradients, making it popular in deep networks.
115
What problem does Leaky ReLU address?
The 'dying neuron' problem. ## Footnote Leaky ReLU allows small negative values to keep neurons active.
116
What is the purpose of a loss function?
It measures model error. ## Footnote Loss functions provide feedback on the model's performance.
117
What type of loss function is Log Loss?
Cross-Entropy Loss. ## Footnote It is commonly used for classification tasks.
118
What loss function is used when there are only two classes?
Binary Cross-Entropy. ## Footnote This is applicable in scenarios like cat vs. dog classification.
119
What is the formula for Cross-Entropy Loss?
L = -(y log(y_pred) + (1 - y) log(1 - y_pred)) ## Footnote This is the binary form for a single example, where y is the true label (0 or 1) and y_pred is the predicted probability.
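Sketch: the formula above as a plain-Python function for one example. The clipping constant `eps` is an addition (an assumption, not part of the card) that avoids taking log(0).

```python
import math

def binary_cross_entropy(y, y_pred, eps=1e-12):
    # Clip the prediction so log(0) never occurs
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))

# True label is 1 in both cases
confident_right = binary_cross_entropy(1, 0.9)   # about 0.105
confident_wrong = binary_cross_entropy(1, 0.1)   # about 2.303
```

A confident wrong prediction is penalized far more heavily than a confident right one is rewarded.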
120
What is backpropagation used for?
To adjust model weights to reduce error. ## Footnote It calculates the contribution of each weight to the total error.
121
Define Gradient Descent.
Step-by-step learning: an iterative optimization algorithm that repeatedly nudges the weights in the direction that lowers the loss. ## Footnote It is used to minimize the loss function.
122
What is Stochastic Gradient Descent (SGD)?
Updates weights after each training sample. ## Footnote This can lead to faster convergence but may introduce noise.
123
What is Mini-Batch Gradient Descent?
Updates weights using small batches. ## Footnote This method balances speed and stability during training.
124
What is the learning rate's role in gradient descent?
Controls how big the update steps are. ## Footnote A proper learning rate is crucial for effective training.
125
What does momentum do in gradient descent?
Adds a fraction of the previous weight update to the current one, which smooths the path and helps the optimizer move through small local minima and flat regions. ## Footnote This technique accelerates convergence.
126
What is dropout in the context of model training?
Randomly deactivates (zeros out) a fraction of neurons on each training step. ## Footnote This prevents over-reliance on any single neuron or feature; all neurons are used again at prediction time.
127
What does regularization do?
Penalizes overly complex models to encourage generalization. ## Footnote This helps to avoid overfitting.
128
What is early stopping?
Stops training when validation loss stops improving. ## Footnote This technique helps prevent overfitting by halting training at the right time.
129
What are epochs in machine learning?
Number of times the entire dataset is passed through the model. ## Footnote More epochs can improve learning but may also lead to overfitting.
130
What is the batch size in gradient descent?
Number of samples used per gradient update. ## Footnote The choice of batch size can affect training speed and model performance.
131
Name a common optimizer used in machine learning.
Adam ## Footnote Adam is known for its adaptive learning rate and is effective for many tasks.
132
What is the main takeaway regarding activation functions?
They enable deep learning. ## Footnote Activation functions are essential for neural networks to learn complex patterns.
133
True or False: Dropout and regularization help prevent overfitting.
True ## Footnote These techniques are widely used in training models to improve generalization.
134
What are the two main types of supervised machine learning?
Classification and Regression ## Footnote Classification involves categorical output, while regression predicts numerical output.
135
What does regression in AI predict?
Continuous values based on input data ## Footnote Example: Predicting house prices based on various factors.
136
How does regression work in AI?
AI learns a function y = f(x) that maps the inputs x to a predicted output y ## Footnote x represents input variables like house size, location, etc.
137
What is Mean Squared Error (MSE)?
A loss function that measures how far off AI’s predictions are ## Footnote Formula: MSE = (1/n) ∑(y_actual - y_predicted)².
138
What does a lower MSE indicate?
Predictions that are closer to the actual values, i.e. a better-fitting model
139
What is a Simple Artificial Neural Network (ANN) for regression composed of?
Layers of neurons: an input layer, one or more hidden layers, and an output neuron that produces the predicted number
140
What is the first step in building a regression model?
Load & Prepare Data
141
What is Min-Max Scaling?
Rescales data to [0,1]
142
What is Z-Score Normalization?
Makes data have mean = 0, std dev = 1
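Sketch: Min-Max Scaling and Z-Score Normalization in plain Python. Both helpers assume the values are not all identical (otherwise they would divide by zero); the house sizes are invented.

```python
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]   # rescale to [0, 1]

def z_score(values):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5   # population std dev
    return [(v - mean) / std for v in values]                 # mean 0, std dev 1

sizes = [50, 75, 100, 125, 150]   # hypothetical house sizes
scaled = min_max_scale(sizes)     # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = z_score(sizes)
```

In practice the minimum, maximum, mean, and standard deviation are computed on the training set only and then reused for validation and test data.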
143
What is the activation function used in the hidden layers of the regression model?
ReLU
144
What optimizer is used when compiling the regression model?
Adam
145
What is Text Classification?
AI sorts text into categories
146
What is the example dataset used for text classification?
IMDB movie reviews
147
How many reviews are in the IMDB dataset?
50,000 highly polarized reviews
148
What is the first step in preprocessing text for AI?
Convert words into a dictionary of numbers
149
What does One-Hot Encoding do?
Turns words into 0s & 1s
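Sketch: encoding one review as a vector of 0s and 1s, assuming the words have already been converted to integer indices. The function name `encode_review` is hypothetical, and a dimension of 10 is used instead of 10,000 to keep the output readable. When a vector marks several words at once, this is sometimes called multi-hot encoding.

```python
def encode_review(word_indices, dimension=10):
    vector = [0] * dimension
    for idx in word_indices:
        vector[idx] = 1      # mark that this word appears in the review
    return vector

review = [3, 5, 3, 8]             # word 3 appears twice
encoded = encode_review(review)   # [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
```

Word order and repeat counts are discarded; only the presence of each word is kept.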
150
What is the input layer dimension for the FCN used in sentiment analysis?
10,000-dimensional input
151
What activation function is used in the output layer for binary classification?
Sigmoid
152
What is the loss function used for the IMDB dataset?
Binary Crossentropy
153
What task is performed with the Reuters dataset?
Classify short news articles into 46 different categories
154
How is the Reuters classification different from IMDB classification?
46 categories instead of 2
155
What activation function is used in the final layer for multi-class classification?
Softmax
156
What loss function is used for the Reuters dataset?
Categorical Crossentropy
157
What is the main purpose of regression in the context of the Boston Housing Prices example?
Predict continuous values
158
What are the key components of text classification for IMDB and Reuters?
* Predicts categories * One-Hot Encoding * Activation functions: ReLU (hidden), Sigmoid (IMDB), Softmax (Reuters) * Loss functions: Binary Crossentropy (IMDB), Categorical Crossentropy (Reuters)
159
Why do we evaluate Machine Learning (ML) and Deep Learning (DL) models?
To ensure they do what we want, avoid overfitting/underfitting, and pick the best model and settings.
160
What is the problem with model evaluation?
How do you know your model isn’t just memorizing the training data?
161
What is the solution to model evaluation problems?
Split the data into training, validation, and test sets.
162
What is a Training Set?
Used to train the model.
163
What is a Validation Set?
Used during training to tune hyperparameters.
164
What is a Test Set?
Used after training to check final performance.
165
What is the Holdout method in model validation?
One-time split: Train (e.g. 60%), Val (20%), Test (20%).
166
When is the Holdout method best used?
For large datasets.
167
What is K-Fold Cross Validation (KCV)?
Split data into k parts, rotate training/testing.
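Sketch: generating the train/test index splits for K-Fold Cross Validation in plain Python. This version does not shuffle the data first, which is a simplifying assumption; `k_fold_indices` is a made-up name.

```python
def k_fold_indices(n_samples, k):
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    # Each fold is held out once; the remaining folds form the training set
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, 5))
```

The model is trained and scored once per split, and the k scores are averaged.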
168
When is K-Fold Cross Validation best used?
Best for small datasets; averaging over k folds gives a more reliable performance estimate than a single split.
169
What is Overfitting?
Too good on training, bad on new data.
170
What is Underfitting?
Bad on both training and new data.
171
What are the characteristics of Overfitting?
High variance, memorizing.
172
What are the characteristics of Underfitting?
High bias, guessing.
173
What is Early Stopping?
Stop training when validation loss goes up.
174
What is L2 Regularization?
Penalizes large weights to keep the model simple.
175
What is the L2 Regularization formula?
λ * Σ(weights²) → encourages smaller weights.
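Sketch: the L2 penalty added on top of the ordinary data loss. The value of `lam` (λ) and the weight lists are assumptions chosen for illustration.

```python
def l2_penalty(weights, lam=0.01):
    return lam * sum(w ** 2 for w in weights)   # λ * Σ(weights²)

def regularized_loss(data_loss, weights, lam=0.01):
    # Total loss = prediction error + penalty for large weights
    return data_loss + l2_penalty(weights, lam)

small = l2_penalty([0.1, -0.2, 0.05])   # tiny penalty
large = l2_penalty([3.0, -4.0, 2.0])    # 0.01 * (9 + 16 + 4) = 0.29
```

Because the penalty grows with the square of each weight, the optimizer is pushed toward smaller weights unless larger ones clearly reduce the data loss.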
176
What are Hyperparameters?
Settings you pick before training (not learned from data).
177
Give examples of Hyperparameters.
* Learning rate * Batch size * Number of layers * Activation functions
178
What is Grid Search?
Try every combo of settings.
179
When is Grid Search effective?
Good for small search spaces.
180
What is a disadvantage of Grid Search?
Super slow if too many options.
181
What is Random Search?
Pick random combos.
182
When is Random Search better?
Better for large/continuous spaces.
183
What are Classification Models used for?
To assign a class (label) to data.
184
What is an example of a Classification Model?
Is this email spam? Is the tumor benign or malignant?
185
What is Accuracy in model evaluation?
% of correct predictions.
186
What is a limitation of Accuracy?
Doesn't work well when classes are imbalanced.
187
What is a Confusion Matrix?
A table used to describe the performance of a classification model.
188
What does TP stand for in a Confusion Matrix?
True Positives.
189
What does FN stand for in a Confusion Matrix?
False Negatives.
190
What does FP stand for in a Confusion Matrix?
False Positives.
191
What does TN stand for in a Confusion Matrix?
True Negatives.
192
What is Precision?
TP / (TP + FP) → How many predicted positives were correct?
193
What is Recall?
TP / (TP + FN) → How many actual positives were found?
194
What is the F1 Score?
Harmonic mean of Precision & Recall → 2 * (P * R) / (P + R).
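Sketch: accuracy, precision, recall, and F1 computed from confusion-matrix counts in plain Python. Labels are assumed to be 1 (positive) and 0 (negative); the sample labels are invented.

```python
def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here TP = 2, FP = 1, FN = 2, TN = 5, giving accuracy 0.7, precision 2/3, recall 0.5, and F1 4/7. Accuracy looks acceptable even though half of the actual positives were missed.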
195
What is AUC?
Area Under the ROC Curve: measures the model's ability to distinguish between classes (1.0 is perfect, 0.5 is no better than random guessing).
196
When should you use F1 or AUC?
When data is imbalanced or missing a positive is worse than a few false alarms.
197
What are Regression Models used for?
When predicting a number (not a category).
198
Give examples of Regression Models.
* House prices * Stock market trends
199
What is MAE?
Mean Absolute Error – average of the absolute errors, less sensitive to outliers.
200
What is MSE?
Mean Squared Error – squares errors, punishes big mistakes more.
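Sketch: MAE and MSE on the same predictions, where one prediction is badly off. The numbers are made up.

```python
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 300]      # last prediction misses by 100

mae_value = mae(y_true, y_pred)    # (10 + 10 + 10 + 100) / 4 = 32.5
mse_value = mse(y_true, y_pred)    # (100 + 100 + 100 + 10000) / 4 = 2575.0
```

The single large miss accounts for almost all of the MSE, while it only moderately raises the MAE.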
201
What are Clustering Models used for?
When you don’t have labels – model tries to find natural groupings.
202
Give an example of a Clustering Model use case.
Segmenting customers into behavior types.
203
What metric is used for evaluating clustering?
Silhouette Coefficient.
204
What does a higher Silhouette Coefficient indicate?
Better clustering.
205
What is the primary problem with Fully Connected Networks (FCNs) regarding image processing?
Images are flattened into 1D vectors, losing spatial structure ## Footnote The network does not recognize the spatial relationships between pixels.
206
How do Convolutional Neural Networks (CNNs) address the issues of FCNs?
Preserve spatial locality and handle images more intelligently ## Footnote CNNs maintain the relationships between neighboring pixels.
207
What is a Convolutional Neural Network (CNN)?
A deep learning model that works especially well with visual data like images ## Footnote Used for applications like facial recognition, self-driving cars, medical imaging, and object detection.
208
What is the key idea behind how CNNs function?
They learn patterns in an image using filters ## Footnote This involves detecting features such as edges and textures.
209
What is the convolution operation in CNNs?
A small matrix called a kernel slides over the image, multiplying and summing to create a feature map ## Footnote This process helps detect edges, textures, and shapes.
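Sketch: the sliding-window operation in plain Python, assuming a square single-channel image, a square kernel, and no padding. As in most deep learning libraries, the kernel is not flipped. The function name `conv2d_valid`, the image, and the kernel are invented for illustration.

```python
def conv2d_valid(image, kernel, stride=1):
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for i in range(0, out_size * stride, stride):
        row = []
        for j in range(0, out_size * stride, stride):
            # Multiply the kernel with the patch under it and sum the results
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# 4x4 image with a vertical edge between the dark (0) and bright (1) halves
image = [[0, 0, 1, 1] for _ in range(4)]
# Kernel that responds where brightness increases from left to right
kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d_valid(image, kernel)   # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge sits, which is how a filter detects a feature.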
210
What does the term 'stride' refer to in the context of CNNs?
How far the filter moves each step ## Footnote Stride affects the size of the output feature map.
211
What are the two types of padding in CNNs?
'valid' (no padding) and 'same' (padding with zeros) ## Footnote Padding influences the output size of the feature maps.
212
What is the function of the convolutional layer in a CNN architecture?
Extracts feature maps ## Footnote This is the first layer that processes the input image.
213
What does the pooling layer do in a CNN?
Shrinks size while keeping important information ## Footnote It reduces the dimensionality of feature maps.
214
What is the purpose of the activation layer in a CNN?
Adds non-linearity (e.g., ReLU) ## Footnote This helps the model learn complex patterns.
215
What is the role of the fully connected layer in CNNs?
Final decision-making (e.g., classification) ## Footnote It combines all features extracted to make predictions.
216
How do CNNs learn?
By adjusting filters to minimize prediction error ## Footnote This involves a process of convolution, pooling, and backpropagation.
217
What is the basic process of CNN learning?
Input image goes through convolution + pooling, output passed to fully connected layers, loss function calculates error, backpropagation computes gradients, optimizer updates weights ## Footnote This iterative process helps improve the model's accuracy.
218
What is an example of the input and output of a convolutional layer?
Input: 32×32 image, Output: 30×30×16 ## Footnote This is based on using a 3×3 kernel with 16 filters, no padding ('valid'), and a stride of 1.
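Sketch: the arithmetic behind the output shape, using the standard output-size formula. The parameter count assumes a single-channel (grayscale) input, which the card does not state; both function names are made up.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Number of positions the kernel can occupy along one dimension
    return (input_size + 2 * padding - kernel_size) // stride + 1

def conv_param_count(kernel_size, in_channels, n_filters):
    # Each filter has kernel_size * kernel_size * in_channels weights plus one bias
    return (kernel_size * kernel_size * in_channels + 1) * n_filters

valid_size = conv_output_size(32, 3)             # 30 -> output is 30x30x16
same_size = conv_output_size(32, 3, padding=1)   # 32 -> 'same' padding keeps the size
params = conv_param_count(3, in_channels=1, n_filters=16)   # 160
```

The depth of the output (16) equals the number of filters, since each filter produces one feature map.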
219
What are the total learnable parameters in a CNN example provided?
255,632 ## Footnote This is the sum of all parameters across different layers.
220
What is max pooling in CNNs?
Picks the biggest number in a region ## Footnote This method helps retain the most significant features.
221
What is average pooling in CNNs?
Takes the average of numbers in a region ## Footnote This method can help smooth out the feature maps.
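Sketch: max pooling and average pooling over 2×2 regions in plain Python. The function name `pool2d` and the feature-map values are invented for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    reduce = max if mode == "max" else (lambda window: sum(window) / len(window))
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values inside the current region
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce(window))
        pooled.append(row)
    return pooled

fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]
max_pooled = pool2d(fmap, mode="max")   # [[6, 4], [7, 9]]
avg_pooled = pool2d(fmap, mode="avg")   # [[3.75, 2.25], [4.0, 5.25]]
```

Either way, the 4×4 map shrinks to 2×2, so later layers have four times fewer values to process.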
222
What are the benefits of pooling layers in CNNs?
* Reduces size * Controls overfitting * Speeds up training ## Footnote Pooling is crucial for efficient CNN performance.
223
What happens in the fully connected layer of a CNN?
All features combine to make a decision ## Footnote For example, it classifies images as either a cat or a dog.
224
What do convolutional layers learn using?
Filters ## Footnote Filters are also referred to as kernels.
225
What is the purpose of pooling in CNNs?
Downsample while keeping key info
226
What does activation introduce in CNNs?
Non-linearity (usually ReLU)
227
What is the role of fully connected layers in CNNs?
Combine all for final output
228
What is the function of filters (kernels) in convolutional layers?
Scan over input image and create feature maps
229
What is stride in the context of convolutional layers?
How far the filter moves
230
What does 'valid' padding mean?
No padding, smaller output
231
What does 'same' padding do?
Keeps input/output sizes the same (zero-padding edges)
232
What is the typical structure of each convolutional layer?
Convolution and Activation (e.g., ReLU)
233
What is the most common pooling method used in CNNs?
MaxPooling2D
234
What is the pool size and stride for MaxPooling2D?
Pool size: (2,2), stride: (2,2)
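As a rough illustration of max pooling with a (2,2) pool and (2,2) stride, here is a plain-Python sketch on a single feature map. It assumes the map has even height and width; `max_pool_2x2` is an illustrative name, not the Keras layer itself.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list (even height/width assumed)."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # Collect the four values in the current 2x2 window
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(window))  # keep only the largest value
        pooled.append(row)
    return pooled


fmap = [[1, 3, 2, 0],
        [4, 6, 1, 1],
        [0, 2, 9, 8],
        [1, 1, 7, 5]]
print(max_pool_2x2(fmap))  # [[6, 2], [2, 9]]
```

Replacing `max(window)` with `sum(window) / 4` would give average pooling instead.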
235
What do fully connected layers do with learned features?
Flatten them and feed into final classifier
236
What is LeNet known for?
An early CNN architecture for digit recognition (MNIST)
237
What is the key idea behind LeNet architecture?
Alternating Conv → ReLU → Pool
238
What do feature maps in LeNet use to save memory?
Shared weights
239
What is the training configuration for the LeNet model?
20 epochs, Batch size: 128, Optimizer: Adam
240
What are two regularization techniques mentioned for improving generalization?
Spatial Dropout and Hyperparameter Tuning
241
What problem can arise during CNN training?
Vanishing Gradients or Exploding Gradients
242
What does batch normalization do?
Makes training smoother, faster, and more stable
243
What is the formula for batch normalization?
Norm = (x - mean) / sqrt(var + ε); Out = γ * Norm + β
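The formula can be tried on a small batch of scalar values. This sketch assumes the mean and variance are taken over the batch, and that γ and β (learned during real training) are fixed at 1 and 0 for illustration.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a batch of scalars, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in values]


normalised = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalised)  # values centred on 0 with variance close to 1
```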
244
What are the benefits of using batch normalization?
* Reduces sensitivity to weight init * Slightly regularises model (less overfitting) * Allows higher learning rates
245
When is it recommended to add batch normalization in Keras?
Before the activation
246
What do convolutional layers do in a CNN?
Extract features
247
What is the purpose of pooling layers?
Shrink spatial size
248
What is the function of fully connected layers?
Make decisions
249
What do activation functions add to a neural network?
Non-linearity
250
What does batch normalization do during training?
Stabilises training
251
What is transfer learning?
Reusing a pretrained model for your own task
252
What are the benefits of transfer learning?
* Saves time * Works well with small datasets * Leverages existing learned features
253
What is one strategy for transfer learning?
Freeze layers and only retrain the final classifier
254
What is data augmentation?
Pretend you have more data by slightly changing existing images
255
List examples of data augmentation techniques.
* Rotate * Flip * Zoom * Brightness tweak * Crop
256
What does dropout do during training?
Randomly 'turn off' neurons
257
What problem does dropout help prevent?
* Overfitting * Reliance on specific paths in the network
258
What is a common value for dropout?
Dropout(0.5) → 50% of neurons are dropped
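A minimal sketch of what dropout does to a list of activations during training. It assumes the common "inverted dropout" convention, where surviving activations are scaled by 1 / (1 - rate) so the expected sum stays the same; the function name is illustrative.

```python
import random


def dropout(activations, rate=0.5, rng=None):
    """Zero each activation with probability `rate`; rescale the survivors."""
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5))  # each entry is 0.0 or 2.0
```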
259
What is a characteristic of VGG architecture?
Uses only 3×3 convolutions
260
What notable achievement is associated with AlexNet?
Won ImageNet 2012
261
What is a key feature of ResNet architecture?
Uses skip connections (identity mappings)
262
What does ReLU stand for and what does it do?
Rectified Linear Unit: computes max(0, x); a fast and simple activation function
263
What is the advantage of Leaky ReLU?
Allows small negative values to avoid 'dead neurons'
264
What does the sigmoid function do?
Maps to (0, 1); good for binary classification
265
What is the range of the tanh function?
Maps to (-1, 1)
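The four activation functions on the preceding cards can be written in a few lines. In this sketch the Leaky ReLU slope `alpha=0.01` is an assumed, commonly used default rather than a value from the cards.

```python
import math


def relu(x):
    return max(0.0, x)                    # negative inputs become 0


def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x      # small slope for negative inputs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))     # output in (0, 1)


def tanh(x):
    return math.tanh(x)                   # output in (-1, 1)


for f in (relu, leaky_relu, sigmoid, tanh):
    print(f.__name__, f(-2.0), f(2.0))
```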
266
What is the purpose of optimisers in training?
Update weights to minimise loss
267
What does SGD stand for?
Stochastic Gradient Descent
268
What does Adam optimiser combine?
Momentum + RMSprop
269
Why is visualising CNNs important?
* See what filters are learning * Debug issues * Understand model behaviour
270
What is one technique for visualising CNNs?
Feature map visualisation
271
Fill in the blank: Data augmentation helps fight overfitting by training on '______' versions of your data.
new
272
What is class imbalance?
When one class has far more examples than another.
273
What is an example of class imbalance?
Negative samples: 998, Positive samples: 2.
274
What accuracy could a model achieve by always predicting 'Negative' in a class imbalance scenario?
99.8% accuracy.
275
Why is class imbalance a problem?
Model ignores the minority class.
276
What is the impact of class imbalance on model predictions?
Biased boundaries = Bad predictions.
277
In which areas is class imbalance especially problematic?
* Medical diagnoses * Fraud detection * Rare event prediction
278
What is Binary Cross-Entropy Loss (BCE)?
L_BCE = -[y_i * log(ŷ_i) + (1 - y_i) * log(1 - ŷ_i)], where y_i is the true label (0 or 1) and ŷ_i is the predicted probability.
279
How does class imbalance affect the Binary Cross-Entropy Loss?
Majority class dominates the loss function.
280
What does the model optimize for in the presence of class imbalance?
Overall accuracy, not fair balance.
281
What is a solution to class imbalance in model training?
Weighted Loss Functions.
282
What is Weighted Binary Cross Entropy?
Assign higher importance (weight) to the minority class.
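To make the weighting concrete, here is a small sketch of a weighted binary cross-entropy averaged over samples. The weights `w_pos=5.0` and `w_neg=1.0` are illustrative assumptions, and the clipping constant `eps` is added only to avoid log(0).

```python
import math


def weighted_bce(y_true, y_pred, w_pos=5.0, w_neg=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for each class."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# The same uncertain prediction costs five times more on a positive sample
print(weighted_bce([1], [0.5]), weighted_bce([0], [0.5]))
```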
283
Provide a Keras example for setting class weights.
class_weight = {0: 1.0, 1: 5.0}, passed to model.fit(..., class_weight=class_weight).
284
What is Weighted Categorical Cross Entropy used for?
For multi-class problems.
285
What is one strategy for fixing imbalanced data?
Collect more data.
286
What is oversampling?
Duplicate minority class samples.
287
What is undersampling?
Remove majority class samples.
288
What is data augmentation?
Make more diverse samples for the minority.
289
What does SMOTE stand for?
Synthetic Minority Over-sampling Technique.
290
What are the steps involved in SMOTE?
* Pick a minority sample * Find its k-nearest neighbors * Interpolate a new sample between it and a neighbor.
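The three steps can be sketched in plain Python for points stored as tuples. This is an illustration of the interpolation idea only, not the implementation found in any particular library; `smote_sample` and its parameters are hypothetical names.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a near neighbour."""
    rng = rng or random.Random()
    base = rng.choice(minority)                                   # 1. pick a sample
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]  # 2. k nearest
    neighbour = rng.choice(neighbours)
    t = rng.random()                                              # 3. interpolate
    return tuple(b + t * (n - b) for b, n in zip(base, neighbour))


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_sample(minority))  # a new point on a segment between two samples
```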
291
What is an analogy for the SMOTE process?
Draw a line between two known dots and place a new dot somewhere along it.
292
What is the limitation of SMOTE?
Great for structured datasets – not ideal for raw images.
293
What is the purpose of data augmentation for images?
Trick your model into seeing new examples by tweaking real ones.
294
Provide a Keras example for data augmentation.
datagen = ImageDataGenerator(rotation_range=40, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest').
295
What is a warning regarding data augmentation and SMOTE?
Don’t augment or SMOTE your whole dataset before splitting into train/test.
296
What should you apply balancing/augmentation to?
Only on the training set.
297
Why is accuracy misleading in the presence of class imbalance?
Accuracy does not reflect the performance on the minority class.
298
What are better metrics to use when class imbalance exists?
* Recall * Precision * F1 score (accuracy alone is misleading here)
299
What does Recall measure?
TP / (TP + FN).
300
What does Precision measure?
TP / (TP + FP).
301
What does Accuracy measure?
(TP + TN) / total.
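The three formulas can be applied to the earlier imbalance example (998 negatives, 2 positives) with a model that always predicts 'Negative'. The helper name is illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall, precision and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"recall": recall, "precision": precision, "accuracy": accuracy}


# Always predicting 'Negative': every negative is right, every positive is missed
always_negative = classification_metrics(tp=0, fp=0, tn=998, fn=2)
print(always_negative)  # accuracy is 0.998 while recall is 0.0
```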
302
What is the ideal combination of metrics for a model?
High recall + high precision.
303
What is Dimensionality Reduction?
Shrinking your data without losing its meaning. ## Footnote Keeps only the important features, reduces memory & computation, helps models generalise better.
304
What is the Curse of Dimensionality?
More features → data becomes sparse. ## Footnote Sparse data → model overfits easier, fewer samples per feature = less reliable learning.
305
What is an Autoencoder?
A special kind of neural network that learns to compress then rebuild data.
306
What is the key idea behind an Autoencoder?
Learn a smart encoding of the input, then use that encoding to reconstruct the original.
307
What are the components of an Autoencoder?
* Encoder: Compresses input into smaller vector * Latent Space (Code): The compressed form * Decoder: Reconstructs the original from the code * Loss: Measures how close output is to original (e.g., MSE)
308
What is the goal of an Autoencoder?
Minimise the difference between input and output.
309
Fill in the blank: The __________ is the compressed form in an Autoencoder.
Latent Space (Code)
310
What is the architecture of an Autoencoder?
Input size = 784, hidden size = 128, code size = 32.
311
What is a Denoising Autoencoder?
Trains the autoencoder to remove noise from input images.
312
What does the input and target look like for a Denoising Autoencoder?
Input = Noisy image, Target = Clean image.
313
What is a Convolutional Autoencoder (CAE)?
Autoencoders for image data using Conv2D layers.
314
What is the role of the Encoder in a Convolutional Autoencoder?
It uses Conv2D layers and MaxPooling2D to compress the input.
315
What is the role of the Decoder in a Convolutional Autoencoder?
It reconstructs the image using Conv2D layers and upsampling methods.
316
True or False: Learnable upsampling in a Decoder leads to better performance than fixed upsampling.
True.
317
What are the applications of Autoencoders?
* Denoising * Compression * Image Colourisation * Anomaly Detection * Feature Extraction
318
Fill in the blank: In anomaly detection, a large __________ error indicates a likely anomaly.
reconstruction
319
What was the result of using a Convolutional Autoencoder on medical ultrasound images?
Successfully removed added annotations & noise.
320
What is the goal of a Computer Vision Pipeline?
Get machines to 'see' and understand images/videos.
321
What are the three levels of tasks in Computer Vision?
* Low-level * Mid-level * High-level
322
What are examples of low-level tasks in Computer Vision?
* Edge detection * Texture analysis * Color analysis
323
What are examples of mid-level tasks in Computer Vision?
* Segmentation * Object tracking
324
What are examples of high-level tasks in Computer Vision?
* Object recognition * Scene understanding
325
What is Image Segmentation?
Segmenting = Splitting an image into meaningful parts.
326
What are the types of Image Segmentation?
* Unsupervised Segmentation * Supervised Segmentation * Semantic Segmentation * Instance Segmentation
327
What is Unsupervised Segmentation?
No labels, cluster-based
328
What is Supervised Segmentation?
Learn from labeled data
329
What is Semantic Segmentation?
Label each pixel with a class (e.g., 'car')
330
What is Instance Segmentation?
Separates individual objects (e.g., car #1 vs car #2)
331
What does R-CNN stand for?
Region-based CNN
332
What is the primary function of R-CNN?
Detect objects in images using bounding boxes.
333
What is the pipeline of R-CNN?
* Input Image * Generate ~2000 region proposals (Selective Search) * Classify each region using CNN * Refine bounding boxes
334
Why is R-CNN considered slow?
* Classifies all 2k regions separately * Selective Search is not learnable * Trains 3 models: CNN + Classifier + Bounding Box Regressor
335
What are the key changes in Fast R-CNN?
* Runs CNN once on the image * Extracts a feature map * Regions of Interest (RoI) are pulled from that map * Everything is trained end-to-end in a single model
336
What are the pros of Fast R-CNN?
* Way faster * More efficient learning
337
What are the cons of Fast R-CNN?
Still uses Selective Search, which is slow.
338
What major improvement does Faster R-CNN introduce?
Adds a Region Proposal Network (RPN)
339
How does the Region Proposal Network (RPN) work?
* CNN → Feature Map * RPN slides across map, creates anchors * Predicts which anchor = object and how well it fits
340
What is the purpose of RoI Pooling?
Converts different-sized regions into fixed-size feature maps.
341
What is U-Net designed for?
Label every pixel in medical images.
342
What is the structure of U-Net?
* Downsampling path * Upsampling path * Skip connections between matching levels
343
What are the pros of U-Net?
* No Dense layers * Any input size allowed * Combines location + context
344
What is Mask R-CNN built upon?
Faster R-CNN
345
What additional feature does Mask R-CNN provide?
A branch for pixel-wise binary masks
346
What outputs does Mask R-CNN provide?
* Class * Bounding box * Object shape
347
What is the main purpose of R-CNN?
Object Detection
348
What is the main purpose of Fast R-CNN?
Faster Detection
349
What is the main purpose of Faster R-CNN?
Fully learnable Detection
350
What is the main purpose of U-Net?
Semantic Segmentation
351
What is the main purpose of Mask R-CNN?
Instance Segmentation
352
What is a key strength of R-CNN?
Accurate but slow
353
What is a key strength of Fast R-CNN?
Shared CNN pass
354
What is a key strength of Faster R-CNN?
Adds RPN
355
What is a key strength of U-Net?
Flexible, great for medical applications
356
What is a key strength of Mask R-CNN?
Adds mask branch to Faster R-CNN
357
What does NLP stand for?
Natural Language Processing ## Footnote A broad field for making computers understand and process language.
358
What are LLMs?
Large Language Models ## Footnote Deep learning models trained on massive text, such as GPT and BERT.
359
List three tools included in NLP.
* N-grams * TF-IDF * Bag of Words
360
What is the main goal of NLP?
Make machines understand and interpret human language (spoken or written)
361
What are the two major tasks of NLP?
* Natural Language Understanding (NLU) * Natural Language Generation (NLG)
362
Define Natural Language Understanding (NLU).
Get meaning from language.
363
Define Natural Language Generation (NLG).
Produce human-like text.
364
What is lexical ambiguity?
Word meaning ambiguity, e.g., 'bank' (money vs river).
365
What is semantic ambiguity?
Sentence meaning ambiguity, e.g., 'I saw him with a telescope.'
366
What is anaphoric ambiguity?
Referring to something earlier, e.g., 'He told his dog to sit, and it did.'
367
List three applications of NLU.
* Search * Word prediction * Text classification (e.g., spam detection)
368
What is the first step in the NLP pipeline?
Sentence Segmentation
369
Fill in the blank: The process of breaking sentences into words is called _______.
Tokenization
370
What is the difference between stemming and lemmatization?
* Stemming: Chops suffixes crudely * Lemmatization: Uses dictionary rules
371
Provide an example of stemming.
'drove' → 'drov'
372
Provide an example of lemmatization.
'drove' → 'drive'
373
What are stop words?
Common words with little meaning on their own (e.g., 'the', 'is', 'and').
374
Why are stop words removed in text analysis?
Helps reduce noise in text analysis.
375
What does POS tagging stand for?
Part of Speech tagging
376
List four types of labels assigned in POS tagging.
* Noun * Verb * Adjective * Adverb
377
What does Bag of Words (BoW) do?
Converts text into a vector of word counts, ignoring word order.
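A minimal Bag of Words sketch, assuming whitespace tokenization and a fixed, hand-picked vocabulary (both simplifications made for illustration).

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


vocab = ["the", "cat", "sat", "dog"]
print(bag_of_words("The cat sat the", vocab))  # [2, 1, 1, 0]
print(bag_of_words("sat the cat the", vocab))  # [2, 1, 1, 0] (order is ignored)
```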
378
What is a limitation of the Bag of Words model?
Loses grammar & order info.
379
What is the purpose of Information Retrieval Models?
Rank documents based on similarity to a search query.
380
What are the components of the TF-IDF formula?
* TF = How often word shows up in a doc * IDF = How rare word is in whole corpus * TF-IDF = TF × IDF
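A small sketch of TF × IDF on tokenized documents. Several TF-IDF variants exist; this one assumes TF is the relative frequency within the document and IDF = log(N / document frequency), with no smoothing.

```python
import math


def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc`, where each document is a list of tokens."""
    tf = doc.count(term) / len(doc)                     # how often in this doc
    df = sum(1 for d in corpus if term in d)            # docs containing the term
    idf = math.log(len(corpus) / df) if df else 0.0     # rarity across the corpus
    return tf * idf


docs = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
print(tf_idf("the", docs[0], docs))  # 0.0, since "the" appears in every document
print(tf_idf("cat", docs[0], docs))  # positive, since "cat" is rare
```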
381
What does TF-IDF emphasize?
Unique, meaningful words.
382
What does an N-gram predict?
Next word using previous N-1 words.
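For N = 2 (a bigram model), prediction uses only the single previous word. The sketch below counts word pairs in a toy sentence chosen for illustration and returns the most frequent follower.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which word follows each word."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model


def predict_next(model, word):
    """Most frequent follower of `word`, or None if the word was never seen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


tokens = "i like cats i like dogs i love cats".split()
model = train_bigram(tokens)
print(predict_next(model, "i"))      # like
print(predict_next(model, "zebra"))  # None
```

The `None` result for an unseen word shows why plain N-gram counts cannot handle sequences that never appeared in training.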
383
What is a Bigram?
2 words.
384
What is a Trigram?
3 words.
385
What is a limitation of N-grams?
High memory usage with large N.
386
True or False: N-grams can handle unseen sequences.
False
387
What do LLMs use to solve limitations of N-grams?
Neural networks.
388
What is the Bag of Words model?
Counts word occurrences, but ignores order.
389
What does TF-IDF measure?
Measures word importance across documents.
390
What is an N-Gram model?
Predicts the next word based on the previous N−1 words.
391
What is a Bigram?
A two-word sequence.
392
What is a limitation of the Bag of Words model?
Ignores long-term relationships (word order & meaning fade fast).
393
How many total reviews were in the IMDB dataset?
50,000 total reviews.
394
What is the distribution of positive and negative reviews in the IMDB dataset?
50% positive, 50% negative.
395
What type of classification is used with the IMDB dataset?
Binary classification (0 = negative, 1 = positive).
396
What is the first step in building the IMDB classifier with a Fully Connected Network?
Data Preprocessing.
397
What transformation is applied to word indices in data preprocessing?
Convert each review to a 10,000-length multi-hot vector (1 where a word index is present, 0 elsewhere).
398
What is the structure of the Fully Connected Network (FCN) used for IMDB classification?
Sequential model with three layers: Dense(16, activation='relu'), Dense(16, activation='relu'), Dense(1, activation='sigmoid').
399
What is the optimizer used in compiling the FCN model?
Adam.
400
What is the loss function used in the FCN model?
Binary crossentropy.
401
What is a major issue with Fully Connected Networks in text classification?
They ignore word order.
402
What do Recurrent Neural Networks (RNNs) retain across time steps?
Memory.
403
What is the key idea behind RNNs?
Input at time t → output + passes info (h_t) to next step.
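A scalar toy version of this recurrence, assuming the common form h_t = tanh(w_x · x_t + w_h · h_{t-1} + b). The weight values are arbitrary illustrative choices, not trained parameters.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h + b)  # new state mixes input and old state
        states.append(h)
    return states


# Only the first input is non-zero, yet later states stay non-zero:
# the hidden state carries information forward.
print(rnn_forward([1.0, 0.0, 0.0]))
```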
404
What does the hidden state (h) in RNNs do?
Carries context forward.
405
What is Backpropagation Through Time (BPTT) in RNNs?
Learning via forward pass through time, loss computed at final step, backward pass through each time step.
406
What is used to ensure all sequences are the same length in RNNs?
Padding sequences.
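A minimal sketch of padding one sequence of word indices to a fixed length. It assumes zeros are added at the front and that over-long sequences keep their last `max_len` items; other conventions (padding or truncating at the end) are equally possible.

```python
def pad_sequence(seq, max_len, pad_value=0):
    """Left-pad (or left-truncate) a list so it has exactly max_len items."""
    seq = seq[-max_len:]                               # keep the last max_len items
    return [pad_value] * (max_len - len(seq)) + seq    # fill the front with zeros


print(pad_sequence([5, 6], 4))            # [0, 0, 5, 6]
print(pad_sequence([1, 2, 3, 4, 5], 3))   # [3, 4, 5]
```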
407
What are word embeddings?
Dense vectors that encode meaning instead of one-hot vectors.
408
What does Word2Vec learn?
Word relationships.
409
What are the two models of Word2Vec?
* CBOW: Predict center word from context * Skip-Gram: Predict context from center word.
410
What is the first layer in the IMDB + RNN model?
Embedding layer.
411
What does LSTM stand for?
Long Short-Term Memory.
412
What problem do LSTMs solve in RNNs?
Vanishing gradients.
413
What are the three gates in LSTMs and their roles?
* Input Gate: Allow new info in * Forget Gate: Discard old info * Output Gate: Output current state.
414
What is the workflow for predicting the next word in text generation?
Tokenize text → convert to word indices → create sequences → train RNN model.
415
What activation function is used in the output layer of the RNN for text generation?
Softmax.
416
What is Actionability in AI?
Deep learning understands data; search algorithms act on it.
417
What is the goal of search algorithms?
Plan a sequence of actions that takes us from a start state to a goal state.
418
What is the Initial State in search algorithms?
Where the problem starts.
419
What does State Space refer to?
All possible configurations.
420
What are Actions in the context of search algorithms?
Choices you can make.
421
What is a Goal Test?
Did we solve it?
422
What is Path Cost?
Total cost of reaching a goal.
423
What is a Search Tree?
Structure of all possible steps.
424
What does Completeness mean in search algorithms?
Will it always find a solution if one exists?
425
What is Optimality in search algorithms?
Will it find the best solution?
426
What does Time Complexity measure?
How long does it take?
427
What is Space Complexity?
How much memory does it use?
428
What is Breadth-First Search (BFS)?
Explore level-by-level.
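Level-by-level exploration maps naturally onto a FIFO queue. This sketch assumes the state space is given as an adjacency dictionary; the graph below is a made-up example.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a path with the fewest steps from start to goal, or None."""
    queue = deque([[start]])      # FIFO queue of partial paths
    visited = {start}
    while queue:
        path = queue.popleft()    # oldest (shallowest) path first
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```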
429
Is Breadth-First Search (BFS) complete?
Yes.
430
Is Breadth-First Search (BFS) optimal?
Yes, if all steps cost the same.
431
What is Depth-First Search (DFS)?
Explore down a path, then backtrack.
432
Is Depth-First Search (DFS) always complete?
No.
433
Is Depth-First Search (DFS) optimal?
No.
434
What does Uniform Cost Search (UCS) do?
Always picks the lowest-cost path next.
435
Is Uniform Cost Search (UCS) complete?
Yes.
436
Is Uniform Cost Search (UCS) optimal?
Yes.
437
What is Greedy Search?
Uses h(x) (estimated distance to goal) and picks the closest-looking node.
438
Is Greedy Search optimal?
No.
439
Is Greedy Search always complete?
No.
440
What does A* Search algorithm use?
f(x) = g(x) + h(x), where g(x) = cost from start and h(x) = estimated cost to goal.
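A compact A* sketch on a small grid, where 0 is a free cell and 1 is a wall. It assumes unit step costs, 4-directional moves, and the Manhattan distance as h(x), which never overestimates under those assumptions.

```python
import heapq


def a_star(grid, start, goal):
    """Cost of the cheapest path from start to goal on a grid, or None."""
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]   # entries are (f, g, cell)
    best_g = {start: 0}
    while frontier:
        f, g, cell = heapq.heappop(frontier)   # lowest f = g + h first
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue                           # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    heapq.heappush(frontier, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None


grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))  # 6
```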
441
Is A* Search complete?
Yes.
442
Is A* Search optimal?
Yes, provided the heuristic h(x) is admissible (it never overestimates the true cost to the goal).
443
What is the objective of the 8-Puzzle?
Slide tiles to reach the goal state.
444
What is Monte Carlo Dropout?
Train the model with dropout, keep dropout on at test time, and run predictions multiple times; the spread of the outputs estimates uncertainty.
445
What does Explainable AI (XAI) do?
Makes AI decisions interpretable.
446
What is Grad-CAM used for?
Highlights which image regions influenced the decision.
447
What does Grad-CAM compute?
Gradient of the predicted class score with respect to the feature maps of the last convolutional layer.
448
What are some applications of Grad-CAM?
* Medical imaging * Object detection * Debugging deep models
449
What does AI stand for?
Artificial Intelligence: systems that act intelligently
450
Define Machine Learning (ML).
Learn from data without being explicitly programmed
451
What is Deep Learning (DL)?
ML + Neural networks with many layers
452
What is Data Mining?
Finding patterns in large datasets
453
List common issues in data cleaning.
* Missing values * Duplicate data * Noise and outliers * Inconsistencies
454
What are some fixes for data cleaning issues?
* Drop/merge/estimate missing or bad data * Visualize with summaries (histograms, boxplots, etc.)
455
What is the k-Nearest Neighbours (k-NN) method?
Find k closest labelled data points and assign the most common class (majority vote)
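The majority-vote rule can be sketched as follows, assuming Euclidean distance and training data stored as (point, label) pairs. The toy data is invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label of `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((8, 8), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (8, 9)))  # B
```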
456
What is the formula for Linear Regression?
Y = b + mX
457
What is Gradient Descent used for in Linear Regression?
To minimize loss
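As a sketch of how gradient descent fits Y = b + mX, the loop below minimises mean squared error on a toy dataset. The learning rate and number of epochs are assumed values that happen to converge for this data.

```python
def fit_line(xs, ys, lr=0.01, epochs=5000):
    """Fit y = b + m*x by gradient descent on mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(b + m * x) - y for x, y in zip(xs, ys)]
        grad_m = 2 / n * sum(e * x for e, x in zip(errors, xs))  # dLoss/dm
        grad_b = 2 / n * sum(errors)                             # dLoss/db
        m -= lr * grad_m    # step against the gradient
        b -= lr * grad_b
    return m, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]        # generated from y = 1 + 2x
m, b = fit_line(xs, ys)
print(round(m, 3), round(b, 3))  # close to 2.0 and 1.0
```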
458
Define Unsupervised Learning.
Learning from data without labeled responses
459
What is Partition-based clustering?
Group data into k clusters (e.g. k-means)
460
What does Hierarchical clustering do?
Build a tree of nested clusters (dendrograms)
461
What is Representation Learning?
Learns features automatically from raw data
462
What are Fully Connected Networks (FCN)?
Neural networks in which every neuron in one layer connects to every neuron in the next (Dense layers)
463
What are CNNs used for?
Vision tasks (e.g. Faster R-CNN, U-Net, Mask R-CNN)
464
What are RNNs/LSTMs primarily used for?
Language processing
465
What is Backpropagation?
Updates weights using error gradients
466
What does Batch Norm do?
Stabilizes training
467
What is Regularisation in the context of neural networks?
Prevents overfitting (e.g. Dropout)
468
List types of Loss Functions.
* Binary Classification: binary_crossentropy * Multiclass: categorical_crossentropy * Regression: mean_squared_error
469
What is Grid Search?
An exhaustive but slow hyperparameter search that tries every combination in a grid
470
What is Random Search?
A hyperparameter search that samples random combinations; efficient for large search spaces
471
What is Faster R-CNN used for?
Object detection
472
What is U-Net used for?
Semantic segmentation
473
What does Mask R-CNN do?
Instance segmentation (detect + pixel mask)
474
What does Bag of Words (BoW) do?
Counts words, ignores order
475
What does TF-IDF stand for?
Term Frequency–Inverse Document Frequency
476
What is an N-Gram?
Predicts next word using previous N-1 words
477
True or False: N-Grams can handle long-term dependencies.
False
478
What can be used to handle long-term dependencies in NLP?
RNNs / LSTMs
479
What does BFS stand for in search algorithms?
Breadth-First Search
480
Is BFS complete and optimal?
Complete: yes. Optimal: only when all step costs are equal.
481
Is DFS complete and optimal?
No to both: it is not always complete and it is not optimal.
482
What does UCS stand for?
Uniform Cost Search
483
Is UCS complete and optimal?
Yes
484
What is the A* algorithm known for?
Best combo of cost + heuristic
485
What is Monte Carlo Dropout used for?
Keep dropout ON at test time to estimate uncertainty
486
What does Grad-CAM do?
Highlights image regions used in CNN decisions
487
What should you understand for exam prep regarding architectures?
Each architecture’s use case
488
What should you compare for exam prep in NLP?
BoW vs TF-IDF vs Word Embeddings
489
What metrics should you practice for ML evaluation?
Confusion matrices, F1, AUC, etc.
490
Fill in the blank: Don’t just memorize — practice with _______.
examples