Artificial Intelligence & Applications Flashcards

Question

Why is data visualization important?

Answer 1

📌 Find patterns and trends 📌 Understand relationships between variables 📌 Detect errors or missing data 📌 Make data easier to interpret ## Footnote Quote: 'Make both calculations and graphs.' – F.J. Anscombe, 1973

Answer 2

🔹 Relationship → Scatter plots 🔹 Composition → Pie charts 🔹 Comparison → Bar charts, line graphs 🔹 Location → Maps & heatmaps ## Footnote Example: Scatter plots for height vs. weight.

Answer 3

✔️ Position & length are the most accurate ways to show numbers ✔️ Pie charts are harder to interpret than bar charts ## Footnote Best Practice: Use clear, simple charts.

Answer 4

✔️ A structured approach to designing visualizations ✔️ Ensures consistency in designing graphs ✔️ Used in tools like ggplot2 in R ## Footnote Helps in creating clear visualizations.

Answer 5

❌ Missing Values ❌ Duplicates ❌ Inconsistent Data ❌ Noise & Outliers ## Footnote Solutions include filling missing values and standardizing formats.

Answer 6

✔️ Feature Selection → Keep important variables ✔️ Feature Transformation → Convert data into better formats ## Footnote Example: Standardizing price and carat size in a diamonds dataset.

Answer 7

Helps us see the relationship between two variables ## Footnote Example: Carat vs. Price in diamond datasets.

Answer 8

✔️ Data exploration helps us understand patterns ✔️ Visualization is key for discovering insights ✔️ Choosing the right chart aids interpretation ✔️ The Grammar of Graphics helps create structured visualizations ✔️ Cleaning data is essential for accuracy

Answer 9

When no direct formula exists to solve a problem and when we have data that can help find patterns. ## Footnote Example: Predicting customer purchase behavior.

Answer 10

A type of ML where the model is trained on labeled data, learning from known answers.

Answer 11

* Buying Price * Maintenance Cost * Number of Doors * Seating Capacity * Luggage Boot Size * Safety Rating

Answer 12

When ML learns patterns from data to make predictions.

Answer 13

* Regression → Predicts continuous values (e.g., house prices). * Classification → Assigns data into categories (e.g., spam or not spam).

Answer 14

A method to find the best-fit line y = mx + c, where m is the slope and c is the intercept.
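
A minimal sketch of one common way to find such a line, ordinary least squares, in plain Python. The card does not name the fitting method, so least squares is an assumption here; `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Hypothetical points that lie exactly on y = 2x + 1.
m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)
```

With points that sit exactly on a line, the fit recovers that line's slope and intercept; with noisy points it returns the line that minimizes the squared vertical distances.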

Answer 15

* Not good for non-linear relationships * Not good when there are too many outliers.

Answer 16

A flowchart-like structure where each decision leads to an outcome.

Answer 17

* Pick the best feature * Split the data into groups * Keep splitting until groups are pure.

Answer 18

A collection of multiple decision trees to improve accuracy and reduce overfitting.

Answer 19

* Train many Decision Trees on random data subsets * Use different features at each split * Combine all tree predictions.

Answer 20

A method that classifies new data points based on the 'k' closest points in the dataset.

Answer 21

* Store the data * Choose k * Measure who’s closest * Pick the k nearest * Count votes & classify based on majority
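
The steps above can be sketched in plain Python. Euclidean distance is an assumption (the card only says "closest"), and `knn_classify` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label."""
    # Measure the distance from the query to every stored point and sort.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Keep the labels of the k nearest points.
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote decides the class.
    return Counter(nearest_labels).most_common(1)[0][0]


train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_classify(train, (2, 2)))
print(knn_classify(train, (6, 5)))
```

Every prediction sorts the whole training set, which is why the method slows down as the dataset grows.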

Answer 22

It is slow for large datasets.

Answer 23

* Uses labeled data * Regression vs. Classification * Linear Regression * Decision Trees * Random Forest * k-NN

Answer 24

To find the best-fit line that represents the relationship between variables.

Answer 25

[Decision Trees]

Answer 26

Unsupervised learning is a machine learning approach where the model learns patterns, structures, or groupings in the data without labeled outputs. ## Footnote Key characteristics include working with unlabelled data, finding hidden structures, and being used for clustering and dimensionality reduction.

Answer 27

* Works with unlabelled data (no predefined categories) * Finds hidden structures in data * Used for clustering & dimensionality reduction ## Footnote Example: Grouping similar books in an unorganized library.

Answer 28

Clustering is a method in unsupervised learning that groups similar data points together. ## Footnote Within a cluster, data points are similar; in different clusters, they are dissimilar.

Answer 29

* Data Reduction * Outlier Detection * Data Segmentation ## Footnote Clustering helps summarize large datasets, identifies unusual patterns, and groups customers by behavior.

Answer 30

* Social Network Analysis * Image Segmentation * Data Annotation ## Footnote Examples include grouping users based on interests and dividing images for medical imaging.

Answer 31

* Define a distance metric to measure similarity * Form clusters by grouping similar data points * Maximize within-cluster similarity, minimize between-cluster similarity ## Footnote Common distance metrics include Euclidean and Manhattan distances.

Answer 32

K-Means is a partition-based clustering algorithm that groups data into k clusters. ## Footnote It is one of the most popular clustering algorithms.

Answer 33

* Choose the number of clusters (k) * Select k random points as initial centroids * Assign each data point to the nearest centroid * Recalculate centroids by finding the mean of each cluster * Repeat until centroids stop changing ## Footnote Example: Grouping customers into low, medium, and high spenders.
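
A minimal one-dimensional sketch of these steps. To keep the run reproducible, the initial centroids are passed in rather than picked at random, which is a simplification of the second step; `kmeans_1d` and the spending figures are hypothetical.

```python
def kmeans_1d(values, centroids, max_iters=100):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iters):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


spend = [10, 12, 14, 50, 55, 60, 200, 210, 220]
centroids, clusters = kmeans_1d(spend, centroids=[10, 50, 200])
print(centroids)
print(clusters)
```

Different starting centroids can lead to different final clusters, which is the sensitivity to initialization that K-Means is known for.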

Answer 34

The Elbow Method involves plotting the Within-Cluster Sum of Squares (WCSS) and looking for the 'elbow' point where adding more clusters stops improving the fit significantly. ## Footnote The bend in the curve indicates the optimal number of clusters (k).

Answer 35

* Simple and efficient * Works well for large datasets ## Footnote K-Means is favored for its speed and ease of use.

Answer 36

* Requires predefined k * Sensitive to initialization * Struggles with non-globular clusters ## Footnote These limitations can affect the clustering results.

Answer 37

Hierarchical Clustering builds a tree-like structure (dendrogram) instead of predefined partitions. ## Footnote It allows for a more flexible grouping of data.

Answer 38

* Start with each data point as its own cluster * Merge the closest clusters based on a chosen distance metric * Repeat until one large cluster remains ## Footnote Agglomerative Clustering is a bottom-up approach.

Answer 39

* Single Linkage * Complete Linkage * Centroid Distance ## Footnote These metrics help determine how clusters are formed.

Answer 40

* No need to specify k beforehand * Can capture non-spherical clusters, depending on the linkage used ## Footnote This flexibility is an advantage over K-Means.

Answer 41

Hierarchical Clustering is computationally expensive for large datasets. ## Footnote This can limit its practicality in big data scenarios.

Answer 42

* Unsupervised learning finds patterns in unlabeled data * Clustering groups similar data points together * K-Means is a fast, efficient partition-based method * Hierarchical clustering builds a tree-like structure * Choosing the right k is crucial for effective clustering ## Footnote These points summarize the fundamental concepts of clustering.

Answer 43

Traditional Machine Learning involves manually selecting features, while Deep Learning learns features automatically from raw data. ## Footnote Example: K-Means Clustering for ML vs. Neural Networks for DL.

Answer 44

To reduce errors in predictions and improve the model's accuracy. ## Footnote Training adjusts the weights based on how wrong the predictions are.

Answer 45

* Forward Pass * Compute Loss * Backpropagation * Gradient Descent ## Footnote These steps help the model adjust weights to minimize prediction errors.

Answer 46

True ## Footnote Neural networks use artificial 'neurons' to process information.

Answer 47

* Input Layer * Hidden Layers * Output Layer ## Footnote Each layer plays a specific role in processing data and making predictions.

Answer 48

Activation Function ## Footnote This function determines the output based on the weighted input.

Answer 49

* Binary Cross-Entropy * Categorical Cross-Entropy ## Footnote These functions help determine how well the model is performing.

Answer 50

To help the AI learn faster by adjusting learning rates and strategies. ## Footnote Examples include SGD and Adam Optimizer.

Answer 51

MNIST Dataset ## Footnote This dataset contains 60,000 training images and 10,000 test images.

Answer 52

Import Libraries ## Footnote Essential Keras imports include Sequential, Dense, np_utils, and mnist.

Answer 53

To improve training efficiency by scaling values from 0-255 to 0-1. ## Footnote This normalization helps the model learn better.

Answer 54

To convert labels into a format the neural network can understand. ## Footnote This is crucial for multi-class classification.

Answer 55

ReLU (Rectified Linear Unit) ## Footnote This function helps in learning complex patterns.

Answer 56

Accuracy ## Footnote This metric indicates how well the model predicts unseen data.

Answer 57

True ## Footnote It provides an easy-to-use interface for model creation.

Answer 58

To correctly predict handwritten digits (0-9). ## Footnote This involves optimizing the model through various epochs.

Answer 59

Minimize the error between predicted and actual outputs.

Answer 60

* Forward Pass: Compute predictions * Compute Loss: Measure the error * Backpropagation: Calculate gradients * Gradient Descent: Adjust weights to reduce error
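
The four steps can be sketched on the smallest possible model, a single neuron y = w*x + b with a mean squared error loss. The model, the learning rate, and the data are assumptions chosen for illustration; real networks repeat the same loop with many more weights.

```python
def train(xs, ys, lr=0.1, epochs=1000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = None
    for _ in range(epochs):
        # Forward pass: compute predictions with the current weights.
        preds = [w * x + b for x in xs]
        # Compute loss: mean squared error between predictions and targets.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # Backpropagation: gradients of the loss with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Gradient descent: step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


# Hypothetical data generated from y = 2x + 1.
w, b, loss = train([0, 1, 2, 3], [1, 3, 5, 7])
print(round(w, 4), round(b, 4))
```

The learning rate controls the step size: too large and the updates overshoot and diverge, too small and many more epochs are needed.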

Answer 61

They prevent neural networks from behaving like linear regression models and allow them to learn complex relationships.

Answer 62

output = dot(W, input) + b

Answer 63

output = ReLU(dot(W, input) + b)
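
A small sketch of this formula for one dense layer, written with plain lists. The weights, input, and bias values are hypothetical.

```python
def dot(W, x):
    """Matrix-vector product: one weighted sum per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(values):
    """Replace negative values with 0 and keep positive values."""
    return [max(0.0, v) for v in values]


def dense(W, x, b):
    """output = ReLU(dot(W, input) + b)"""
    return relu([z + bi for z, bi in zip(dot(W, x), b)])


W = [[1.0, -1.0],
     [0.5, 2.0]]
x = [2.0, 3.0]
b = [0.0, -1.0]
print(dense(W, x, b))  # [0.0, 6.0]
```

The first neuron's weighted sum is negative, so ReLU outputs 0 for it; the second passes through unchanged.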

Answer 64

Output = Input (Straight line)

Answer 65

Vanishing Gradient – When inputs go beyond roughly ±3, the function saturates, the gradient becomes tiny, and learning slows down.

Answer 66

Converts values into probabilities that sum to 1.

Answer 67

* Cat: 70% * Dog: 20% * Bird: 10%
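
A minimal sketch of how softmax turns raw scores into class probabilities like these. The input scores are hypothetical, and subtracting the maximum is a common numerical-stability step that does not change the result.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Shift by the largest score so exp() never overflows.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(sum(probs))
```

The largest score always receives the largest probability, and the outputs always sum to 1 (up to floating-point rounding).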

Answer 68

Maps inputs between -1 and 1, allowing negative values.

Answer 69

Still suffers from vanishing gradient for large values.

Answer 70

Only activates for positive values; negative inputs become 0.

Answer 71

Dead Neurons – if a neuron only gets negative values, it stops learning.

Answer 72

Gives negative values a small value instead of zeroing them out.

Answer 73

Probability outputs (binary classification).

Answer 74

Can be computationally expensive.

Answer 75

In situations where negative values are needed.

Answer 76

Deep learning models (default choice).

Answer 77

Prevents dead neurons and improves training.

Answer 78

[deep learning]

Answer 79

They enable deep networks to learn complex data relationships. ## Footnote Activation functions are crucial for introducing non-linearity into the model.

Answer 80

They can't handle complex patterns. ## Footnote Linear activation functions are insufficient for deep learning tasks.

Answer 81

Sigmoid ## Footnote Sigmoid is often used in binary classification tasks.

Answer 82

It suffers from vanishing gradients. ## Footnote This can slow down the learning process in deep networks.

Answer 83

Softmax ## Footnote Softmax converts logits to probabilities for multiple classes.

Answer 84

-1 and 1. ## Footnote Tanh is zero-centered, which often speeds up training compared with sigmoid and can help with gradient flow.

Answer 85

ReLU ## Footnote ReLU helps avoid vanishing gradients, making it popular in deep networks.

Answer 86

The 'dying neuron' problem. ## Footnote Leaky ReLU allows small negative values to keep neurons active.

Answer 87

It measures model error. ## Footnote Loss functions provide feedback on the model's performance.

Answer 88

Cross-Entropy Loss. ## Footnote It is commonly used for classification tasks.

Answer 89

Binary Cross-Entropy. ## Footnote This is applicable in scenarios like cat vs. dog classification.

Answer 90

L = - (y log(y_pred) + (1 - y) log(1 - y_pred)) ## Footnote This formula calculates the loss based on predicted and actual values.
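
The formula can be checked with a few lines of Python. The clipping constant `eps` is an added safeguard against log(0) and is not part of the card's formula.

```python
import math


def binary_cross_entropy(y, y_pred, eps=1e-12):
    """L = -(y*log(y_pred) + (1 - y)*log(1 - y_pred)) for one sample."""
    # Clip the prediction so log() never receives 0.
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))


print(binary_cross_entropy(1, 0.9))  # confident and correct: small loss
print(binary_cross_entropy(1, 0.1))  # confident and wrong: large loss
```

The loss grows sharply as a confident prediction moves away from the true label, which is what pushes the model toward well-calibrated probabilities.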

Answer 91

To adjust model weights to reduce error. ## Footnote It calculates the contribution of each weight to the total error.

Answer 92

Step-by-Step Learning. ## Footnote It is an optimization algorithm used to minimize the loss function.

Answer 93

Updates weights after each training sample. ## Footnote This can lead to faster convergence but may introduce noise.

Answer 94

Updates weights using small batches. ## Footnote This method balances speed and stability during training.

Answer 95

Controls how big the update steps are. ## Footnote A proper learning rate is crucial for effective training.

Answer 96

Helps avoid local minima by adding past weight updates to the current one. ## Footnote This technique accelerates convergence.

Answer 97

Randomly removes neurons during training. ## Footnote This prevents over-reliance on certain features.

Answer 98

Penalizes overly complex models to encourage generalization. ## Footnote This helps to avoid overfitting.

Answer 99

Stops training when validation loss stops improving. ## Footnote This technique helps prevent overfitting by halting training at the right time.

Answer 100

Number of times the entire dataset is passed through the model. ## Footnote More epochs can improve learning but may also lead to overfitting.

Answer 101

Number of samples used per gradient update. ## Footnote The choice of batch size can affect training speed and model performance.

Answer 102

Adam ## Footnote Adam is known for its adaptive learning rate and is effective for many tasks.

Answer 103

They enable deep learning. ## Footnote Activation functions are essential for neural networks to learn complex patterns.

Answer 104

True ## Footnote These techniques are widely used in training models to improve generalization.

Answer 105

Classification and Regression ## Footnote Classification involves categorical output, while regression predicts numerical output.

Answer 106

Continuous values based on input data ## Footnote Example: Predicting house prices based on various factors.

Answer 107

AI learns a function y = f(x) to predict y ## Footnote x represents input variables like house size, location, etc.

Answer 108

A loss function that measures how far off AI’s predictions are ## Footnote Formula: MSE = (1/n) ∑(y_actual - y_predicted)².

Answer 109

A better AI model

Answer 110

Layers of neurons making decisions

Answer 111

Load & Prepare Data

Answer 112

Rescales data to [0,1]

Answer 113

Makes data have mean = 0, std dev = 1
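
A short sketch contrasting the two rescaling methods on these cards, min-max normalization and standardization. The function names and sample values are hypothetical, and both functions assume the values are not all identical (otherwise the denominator is zero).

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]


data = [2, 4, 6, 8]
scaled = min_max_scale(data)
z_scores = standardize(data)
print(scaled)
print([round(z, 3) for z in z_scores])
```

Min-max scaling fixes the range, while standardization fixes the mean and spread, so a single extreme value compresses min-max output more than it distorts standardized output.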

Answer 114

AI sorts text into categories

Answer 115

IMDB movie reviews

Answer 116

50,000 highly polarized reviews

Answer 117

Convert words into a dictionary of numbers

Answer 118

Turns words into 0s & 1s

Answer 119

10,000-dimensional input

Answer 120

Binary Crossentropy

Answer 121

Classify short news articles into 46 different categories

Answer 122

46 categories instead of 2

Answer 123

Categorical Crossentropy

Answer 124

Predict continuous values

Answer 125

* Predicts categories * One-Hot Encoding * Activation functions: ReLU (hidden), Sigmoid (IMDB), Softmax (Reuters) * Loss functions: Binary Crossentropy (IMDB), Categorical Crossentropy (Reuters)

Answer 126

To ensure they do what we want, avoid overfitting/underfitting, and pick the best model and settings.

Answer 127

How do you know your model isn’t just memorizing the training data?

Answer 128

Split the data into training, validation, and test sets.

Answer 129

Used to train the model.

Answer 130

Used during training to tune hyperparameters.

Answer 131

Used after training to check final performance.

Answer 132

One-time split: Train (e.g. 60%), Val (20%), Test (20%).

Answer 133

For large datasets.

Answer 134

Split data into k parts, rotate training/testing.
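
A minimal sketch of how the rotation works, using index lists only. It assumes the data has already been shuffled, and `k_fold_indices` is a hypothetical helper rather than a library function.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any leftover samples.
        end = start + fold_size if i < k - 1 else n_samples
        test_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        folds.append((train_idx, test_idx))
    return folds


folds = k_fold_indices(10, 5)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample is used for testing exactly once and for training in the other k-1 rounds; the k scores are then averaged.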

Answer 135

Best for small datasets; gives a more reliable performance estimate than a single split.

Answer 136

Too good on training, bad on new data.

Answer 137

Bad on both training and new data.

Answer 138

High variance, memorizing.

Answer 139

High bias, guessing.

Answer 140

Stop training when validation loss goes up.

Answer 141

Penalizes large weights to keep the model simple.

Answer 142

λ * Σ(weights²) → encourages smaller weights.

Answer 143

Settings you pick before training (not learned from data).

Answer 144

* Learning rate * Batch size * Number of layers * Activation functions

Answer 145

Try every combo of settings.

Answer 146

Good for small search spaces.

Answer 147

Super slow if too many options.

Answer 148

Pick random combos.

Answer 149

Better for large/continuous spaces.

Answer 150

To assign a class (label) to data.

Answer 151

Is this email spam? Is the tumor benign or malignant?

Answer 152

% of correct predictions.

Answer 153

Doesn't work well when classes are imbalanced.

Answer 154

A table used to describe the performance of a classification model.

Answer 155

True Positives.

Answer 156

False Negatives.

Answer 157

False Positives.

Answer 158

True Negatives.

Answer 159

TP / (TP + FP) → How many predicted positives were correct?

Answer 160

TP / (TP + FN) → How many actual positives were found?

Answer 161

Harmonic mean of Precision & Recall → 2 * (P * R) / (P + R).
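
A small worked example tying precision, recall, and F1 to the four confusion-matrix counts. The counts are hypothetical, and the function assumes at least one predicted positive and one actual positive (otherwise a denominator is zero).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion counts."""
    precision = tp / (tp + fp)   # how many predicted positives were correct
    recall = tp / (tp + fn)      # how many actual positives were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


precision, recall, f1, accuracy = classification_metrics(tp=8, fp=2, fn=4, tn=86)
print(round(precision, 3), round(recall, 3), round(f1, 3), round(accuracy, 3))
```

Here accuracy is 0.94 while recall is only about 0.67: a third of the actual positives were missed, which accuracy alone hides.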

Answer 162

Area Under the Curve (usually the ROC curve) measures the model’s ability to distinguish classes.

Answer 163

When data is imbalanced or missing a positive is worse than a few false alarms.

Answer 164

When predicting a number (not a category).

Answer 165

* House prices * Stock market trends

Answer 166

Mean Absolute Error – average of the absolute errors, less sensitive to outliers.

Answer 167

Mean Squared Error – squares errors, punishes big mistakes more.
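
A short comparison of the two regression metrics on the same predictions, where one prediction is badly off. The numbers are hypothetical.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 80]   # the last prediction is off by 40
print(mae(actual, predicted))  # 11.75
print(mse(actual, predicted))  # 404.25
```

The single large error contributes 40 of the 47 total absolute error but 1600 of the 1617 total squared error, which is why MSE punishes big mistakes more.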

Answer 168

When you don’t have labels – model tries to find natural groupings.

Answer 169

Segmenting customers into behavior types.

Answer 170

Silhouette Coefficient.

Answer 171

Better clustering.

Answer 172

Images are flattened into 1D vectors, losing spatial structure ## Footnote The network does not recognize the spatial relationships between pixels.

Answer 173

Preserve spatial locality and handle images more intelligently ## Footnote CNNs maintain the relationships between neighboring pixels.

Answer 174

A deep learning model that works especially well with visual data like images ## Footnote Used for applications like facial recognition, self-driving cars, medical imaging, and object detection.

Answer 175

They learn patterns in an image using filters ## Footnote This involves detecting features such as edges and textures.

Answer 176

A small matrix called a kernel slides over the image, multiplying and summing to create a feature map ## Footnote This process helps detect edges, textures, and shapes.
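
The sliding multiply-and-sum can be sketched with nested loops. No padding is applied, the kernel is not flipped (the convention deep learning libraries generally use), and the image and kernel values are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x4 image with a vertical edge between the dark and bright halves.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# A kernel that responds where brightness increases from left to right.
kernel = [[-1, 1],
          [-1, 1]]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge sits, which is how a filter "detects" a feature at a location.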

Answer 177

How far the filter moves each step ## Footnote Stride affects the size of the output feature map.

Answer 178

'valid' (no padding) and 'same' (padding with zeros) ## Footnote Padding influences the output size of the feature maps.

Answer 179

Extracts feature maps ## Footnote This is the first layer that processes the input image.

Answer 180

Shrinks size while keeping important information ## Footnote It reduces the dimensionality of feature maps.

Answer 181

Adds non-linearity (e.g., ReLU) ## Footnote This helps the model learn complex patterns.

Answer 182

Final decision-making (e.g., classification) ## Footnote It combines all features extracted to make predictions.

Answer 183

By adjusting filters to minimize prediction error ## Footnote This involves a process of convolution, pooling, and backpropagation.

Answer 184

Input image goes through convolution + pooling, output passed to fully connected layers, loss function calculates error, backpropagation computes gradients, optimizer updates weights ## Footnote This iterative process helps improve the model's accuracy.

Answer 185

Input: 32×32 image, Output: 30×30×16 ## Footnote This is based on using a 3×3 kernel with 16 filters.
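
These numbers follow from the standard output-size formula, floor((n + 2p - k) / s) + 1, sketched below. Stride 1 and no padding are assumptions consistent with the 30×30 result; the depth of 16 comes from the number of filters.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


print(conv_output_size(32, 3))             # 30: 'valid', no padding
print(conv_output_size(32, 3, padding=1))  # 32: 'same' for a 3x3 kernel
print(conv_output_size(32, 3, stride=2))   # 15: larger stride, smaller map
```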

Answer 186

255,632 ## Footnote This is the sum of all parameters across different layers.

Answer 187

Picks the biggest number in a region ## Footnote This method helps retain the most significant features.

Answer 188

Takes the average of numbers in a region ## Footnote This method can help smooth out the feature maps.
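
A minimal sketch comparing max pooling and average pooling on the same feature map, using a 2×2 window with stride 2. The `pool2d` helper and the values are hypothetical.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2-D feature map with max or average pooling."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current window.
            window = [feature_map[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


fm = [[1, 3, 2, 4],
      [5, 6, 1, 2],
      [7, 2, 9, 1],
      [3, 4, 5, 6]]
print(pool2d(fm, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(fm, mode="average"))  # [[3.75, 2.25], [4.0, 5.25]]
```

Both versions shrink the 4×4 map to 2×2; max pooling keeps the strongest response in each region, while average pooling smooths it.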

Answer 189

* Reduces size * Controls overfitting * Speeds up training ## Footnote Pooling is crucial for efficient CNN performance.

Answer 190

All features combine to make a decision ## Footnote For example, it classifies images as either a cat or a dog.

Answer 191

Filters ## Footnote Filters are also referred to as kernels.

Answer 192

Downsample while keeping key info

Answer 193

Non-linear twist (usually ReLU)

Answer 194

Combine all for final output

Answer 195

Scan over input image and create feature maps

Answer 196

How far the filter moves

Answer 197

No padding, smaller output

Answer 198

Keeps input/output sizes the same (zero-padding edges)

Answer 199

Convolution and Activation (e.g., ReLU)

Answer 200

MaxPooling2D

Answer 201

Pool size: (2,2), stride: (2,2)

Answer 202

Flatten them and feed into final classifier

Answer 203

An early CNN architecture for digit recognition (MNIST)

Answer 204

Alternating Conv → ReLU → Pool

Answer 205

Shared weights

Answer 206

20 epochs, Batch size: 128, Optimizer: Adam

Answer 207

Spatial Dropout and Hyperparameter Tuning

Answer 208

Vanishing Gradients or Exploding Gradients

Answer 209

Makes training smoother, faster, and more stable

Answer 210

Norm = (x - mean) / sqrt(var + ε) Out = γ * Norm + β
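
The two formulas can be sketched for a single feature across one batch. Here γ and β are passed in as fixed numbers for illustration; in a real network they are learned, and the batch values are hypothetical.

```python
def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of values, then scale by gamma and shift by beta."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    # Norm = (x - mean) / sqrt(var + eps); Out = gamma * Norm + beta
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in xs]


batch = [1.0, 2.0, 3.0, 4.0, 5.0]
normalized = batch_norm(batch)
shifted = batch_norm(batch, gamma=2.0, beta=0.5)
print([round(v, 3) for v in normalized])
print([round(v, 3) for v in shifted])
```

After normalization the batch has mean 0 and variance close to 1; γ and β then let the layer choose a different scale and offset if that helps.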

Answer 211

* Reduces sensitivity to weight init * Slightly regularises model (less overfitting) * Allows higher learning rates

Answer 212

Before the activation

Answer 213

Extract features

Answer 214

Shrink spatial size

Answer 215

Make decisions

Answer 216

Non-linearity

Answer 217

Stabilises training

Answer 218

Reusing a pretrained model for your own task

Answer 219

* Saves time * Works well with small datasets * Leverages existing learned features

Answer 220

Freeze layers and only retrain the final classifier

Answer 221

Pretend you have more data by slightly changing existing images

Answer 222

* Rotate * Flip * Zoom * Brightness tweak * Crop

Answer 223

Randomly 'turn off' neurons

Answer 224

* Overfitting * Reliance on specific paths in the network

Answer 225

Dropout(0.5) → 50% of neurons are dropped

Answer 226

Uses only 3×3 convolutions

Answer 227

Won ImageNet 2012

Answer 228

Uses skip connections (identity mappings)

Answer 229

ReLU: max(0, x); Fast and simple activation function

Answer 230

Allows small negative values to avoid 'dead neurons'

Answer 231

Maps to (0, 1); good for binary classification

Answer 232

Maps to (-1, 1)
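
The four activation functions on these cards, written out in plain Python. The leak slope `alpha=0.01` is a commonly used value and an assumption here.

```python
import math


def relu(x):
    """max(0, x): passes positives, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negatives keep a small slope instead of becoming 0."""
    return x if x > 0 else alpha * x


def sigmoid(x):
    """Squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any input into (-1, 1)."""
    return math.tanh(x)


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), leaky_relu(x), round(sigmoid(x), 3), round(tanh(x), 3))
```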

Answer 233

Update weights to minimise loss

Answer 234

Simple gradient descent

Answer 235

Momentum + RMSprop

Answer 236

* See what filters are learning * Debug issues * Understand model behaviour

Answer 237

Feature map visualisation

Answer 238

Class imbalance = When one class has way more examples than another.

Answer 239

Negative samples: 998, Positive samples: 2.

Answer 240

99.8% accuracy.
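
A quick check of this figure: a hypothetical "model" that always predicts the negative class, evaluated on 998 negative and 2 positive samples.

```python
labels = [0] * 998 + [1] * 2
predictions = [0] * len(labels)   # always predict the majority class

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

true_pos = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
false_neg = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
recall = true_pos / (true_pos + false_neg)

print(accuracy)  # 0.998
print(recall)    # 0.0
```

The accuracy looks excellent, yet recall on the positive class is zero: not a single positive sample is found.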

Answer 241

Model ignores the minority class.

Answer 242

Biased boundaries = Bad predictions.

Answer 243

* Medical diagnoses * Fraud detection * Rare event prediction

Answer 244

L_BCE = -[y_i * log(ŷ_i) + (1 - y_i) * log(1 - ŷ_i)], where ŷ_i is the predicted probability.

Answer 245

Majority class dominates the loss function.

Answer 246

Overall accuracy, not fair balance.

Answer 247

Weighted Loss Functions.

Answer 248

Assign higher importance (weight) to the minority class.

Answer 249

class_weights = {0: 1.0, 1: 5.0}.

Answer 250

For multi-class problems.

Answer 251

Collect more data.

Answer 252

Duplicate minority class samples.

Answer 253

Remove majority class samples.

Answer 254

Make more diverse samples for the minority.

Answer 255

Synthetic Minority Over-sampling Technique.

Answer 256

* Pick a minority sample * Find its k-nearest neighbors * Interpolate a new sample between it and a neighbor.
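
A simplified sketch of these three steps in plain Python. It is not the reference SMOTE implementation: Euclidean distance, the fixed random seed, and the `smote` helper itself are assumptions made for illustration.

```python
import math
import random


def smote(minority, k=2, n_new=3, seed=0):
    """Create n_new synthetic points by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample.
        sample = rng.choice(minority)
        # Find its k nearest neighbors among the other minority samples.
        others = [p for p in minority if p is not sample]
        neighbors = sorted(others, key=lambda p: math.dist(p, sample))[:k]
        neighbor = rng.choice(neighbors)
        # Interpolate: place a new point somewhere on the connecting line.
        gap = rng.random()
        synthetic.append(tuple(s + gap * (n - s)
                               for s, n in zip(sample, neighbor)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
new_points = smote(minority)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies.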

Answer 257

Draw a line between two known dots and place a new dot somewhere along it.

Answer 258

Great for structured datasets – not ideal for raw images.

Answer 259

Trick your model into seeing new examples by tweaking real ones.

Answer 260

datagen = ImageDataGenerator(rotation_range=40, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest').

Answer 261

Don’t augment or SMOTE your whole dataset before splitting into train/test.

Answer 262

Only on the training set.

Answer 263

Accuracy does not reflect the performance on the minority class.

Answer 264

* Recall * Precision * Accuracy

Answer 265

TP / (TP + FN).

Answer 266

TP / (TP + FP).

Answer 267

(TP + TN) / total.

Answer 268

High recall + high precision.

Answer 269

Shrinking your data without losing its meaning. ## Footnote Keeps only the important features, reduces memory & computation, helps models generalise better.

Answer 270

More features → data becomes sparse. ## Footnote Sparse data → model overfits easier, fewer samples per feature = less reliable learning.

Answer 271

A special kind of neural network that learns to compress then rebuild data.

Answer 272

Learn a smart encoding of the input, then use that encoding to reconstruct the original.

Answer 273

* Encoder: Compresses input into smaller vector * Latent Space (Code): The compressed form * Decoder: Reconstructs the original from the code * Loss: Measures how close output is to original (e.g., MSE)

Answer 274

Minimise the difference between input and output.

Answer 275

Latent Space (Code)

Answer 276

Input size = 784, hidden size = 128, code size = 32.

Answer 277

Trains the autoencoder to remove noise from input images.

Answer 278

Input = Noisy image, Target = Clean image.

Answer 279

Autoencoders for image data using Conv2D layers.

Answer 280

It uses Conv2D layers and MaxPooling2D to compress the input.

Answer 281

It reconstructs the image using Conv2D layers and upsampling methods.

Answer 282

* Denoising * Compression * Image Colourisation * Anomaly Detection * Feature Extraction

Answer 283

reconstruction

Answer 284

Successfully removed added annotations & noise.

Answer 285

Get machines to 'see' and understand images/videos.

Answer 286

* Low-level * Mid-level * High-level

Answer 287

* Edge detection * Texture analysis * Color analysis

Answer 288

* Segmentation * Object tracking

Answer 289

* Object recognition * Scene understanding

Answer 290

Segmenting = Splitting an image into meaningful parts.

Answer 291

* Unsupervised Segmentation * Supervised Segmentation * Semantic Segmentation * Instance Segmentation

Answer 292

No labels, cluster-based

Answer 293

Learn from labeled data

Answer 294

Label each pixel with a class (e.g., 'car')

Answer 295

Separates individual objects (e.g., car #1 vs car #2)

Answer 296

Region-based CNN

Answer 297

Detect objects in images using bounding boxes.

Answer 298

* Input Image * Generate ~2000 region proposals (Selective Search) * Classify each region using CNN * Refine bounding boxes

Answer 299

* Classifies all 2k regions separately * Selective Search is not learnable * Trains 3 models: CNN + Classifier + Bounding Box Regressor

Answer 300

* Runs CNN once on the image * Extracts a feature map * Regions of Interest (RoI) are pulled from that map * Everything is trained end-to-end in a single model

Answer 301

* Way faster * More efficient learning

Answer 302

Still uses Selective Search, which is slow.

Answer 303

Adds a Region Proposal Network (RPN)

Answer 304

* CNN → Feature Map * RPN slides across map, creates anchors * Predicts which anchor = object and how well it fits

Answer 305

Converts different-sized regions into fixed-size feature maps.

Answer 306

Label every pixel in medical images.

Answer 307

* Downsampling path * Upsampling path * Skip connections between matching levels

Answer 308

* No Dense layers * Any input size allowed * Combines location + context

Answer 309

Faster R-CNN

Answer 310

A branch for pixel-wise binary masks

Answer 311

* Class * Bounding box * Object shape

Answer 312

Object Detection

Answer 313

Faster Detection

Answer 314

Fully learnable Detection

Answer 315

Semantic Segmentation

Answer 316

Instance Segmentation

Answer 317

Accurate but slow

Answer 318

Shared CNN pass

Answer 319

Flexible, great for medical applications

Answer 320

Adds mask branch to Faster R-CNN

Answer 321

Natural Language Processing ## Footnote A broad field for making computers understand and process language.

Answer 322

Large Language Models ## Footnote Deep learning models trained on massive text, such as GPT and BERT.

Answer 323

* N-grams * TF-IDF * Bag of Words

Answer 324

Make machines understand and interpret human language (spoken or written)

Answer 325

* Natural Language Understanding (NLU) * Natural Language Generation (NLG)

Answer 326

Get meaning from language.

Answer 327

Produce human-like text.

Answer 328

Word meaning ambiguity, e.g., 'bank' (money vs river).

Answer 329

Sentence meaning ambiguity, e.g., 'I saw him with a telescope.'

Answer 330

Referring to something earlier, e.g., 'He told his dog to sit, and it did.'

Answer 331

* Search * Word prediction * Text classification (e.g., spam detection)

Answer 332

Sentence Segmentation

Answer 333

[Tokenization]

Answer 334

* Stemming: Chops suffixes crudely * Lemmatization: Uses dictionary rules

Answer 335

'drove' → 'drov'

Answer 336

'drove' → 'drive'

Answer 337

Common words with little meaning on their own (e.g., 'the', 'is', 'and').

Answer 338

Helps reduce noise in text analysis.

Answer 339

Part of Speech tagging

Answer 340

* Noun * Verb * Adjective * Adverb

Answer 341

Converts text into a vector of word counts, ignoring word order.
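
A minimal sketch of a bag-of-words vector. Lowercasing and splitting on whitespace stand in for proper tokenization, and the vocabulary is hypothetical.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = ["the", "cat", "sat", "dog"]
print(bag_of_words("The cat sat the cat", vocabulary))  # [2, 2, 1, 0]
print(bag_of_words("cat the sat cat the", vocabulary))  # [2, 2, 1, 0]
```

The two sentences contain the same words in a different order and produce identical vectors, which shows exactly what information is discarded.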

Answer 342

Loses grammar & order info.

Answer 343

Rank documents based on similarity to a search query.

Answer 344

* TF = How often word shows up in a doc * IDF = How rare word is in whole corpus * TF-IDF = TF × IDF
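
A small sketch of the calculation on a toy corpus. Several TF-IDF variants exist (different smoothing and log bases); this one uses relative term frequency and an unsmoothed natural-log IDF, which is an assumption, and the documents are hypothetical.

```python
import math


def tf_idf(term, doc, corpus):
    """TF-IDF of a term in one tokenized document, relative to the corpus."""
    # TF: share of the document's words that are this term.
    tf = doc.count(term) / len(doc)
    # IDF: log of (number of documents / documents containing the term).
    n_containing = sum(1 for d in corpus if term in d)
    if n_containing == 0:
        return 0.0
    idf = math.log(len(corpus) / n_containing)
    return tf * idf


corpus = [["the", "cat", "sat"],
          ["the", "dog", "barked"],
          ["the", "cat", "purred"]]
doc = corpus[0]
for term in ("the", "cat", "sat"):
    print(term, round(tf_idf(term, doc, corpus), 4))
```

"the" appears in every document and scores 0, "cat" appears in two of three and scores low, and "sat" is unique to this document and scores highest.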

Answer 345

Unique, meaningful words.

Answer 346

Next word using previous N-1 words.

Answer 347

High memory usage with large N.

Answer 348

Neural networks.

Answer 349

Counts word occurrences, but ignores order.

Answer 350

Measures word importance across documents.

Answer 351

Predicts the next word based on the previous N−1 words.
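
A minimal sketch for N = 2 (a bigram model), where the prediction depends on just one previous word. The tiny corpus and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for each word, which words follow it and how often."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of word, or None if unseen."""
    counts = model.get(word)
    return counts.most_common(1)[0][0] if counts else None


tokens = "i like tea . i like coffee . i like tea".split()
model = train_bigrams(tokens)
print(predict_next(model, "i"))       # like
print(predict_next(model, "like"))    # tea
print(predict_next(model, "python"))  # None
```

With larger N the table of contexts grows very quickly, and anything earlier than the last N-1 words never influences the prediction.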

Answer 352

A two-word sequence.

Answer 353

Ignores long-term relationships (word order & meaning fade fast).

Answer 354

50,000 total reviews.

Answer 355

50% positive, 50% negative.

Answer 356

Binary classification (0 = negative, 1 = positive).

Answer 357

Data Preprocessing.

Answer 358

Convert to 10,000-length one-hot vectors.

Answer 359

Sequential model with three layers: Dense(16, activation='relu'), Dense(16, activation='relu'), Dense(1, activation='sigmoid').

Answer 360

Binary crossentropy.

Answer 361

They ignore word order.

Answer 362

Input at time t → output + passes info (h_t) to next step.

Answer 363

Carries context forward.

Answer 364

Learning via forward pass through time, loss computed at final step, backward pass through each time step.

Answer 365

Padding sequences.

Answer 366

Dense vectors that encode meaning instead of one-hot vectors.

Answer 367

Word relationships.

Answer 368

* CBOW: Predict center word from context * Skip-Gram: Predict context from center word.

Answer 369

Embedding layer.

Answer 370

Long Short-Term Memory Networks.

Answer 371

Vanishing gradients.

Answer 372

* Input Gate: Allow new info in * Forget Gate: Discard old info * Output Gate: Output current state.

Answer 373

Tokenize text → convert to word indices → create sequences → train RNN model.

Answer 374

Deep learning understands data; search algorithms act on it.

Answer 375

Plan a sequence of actions that takes us from a start state to a goal state.

Answer 376

Where the problem starts.

Answer 377

All possible configurations.

Answer 378

Choices you can make.

Answer 379

Did we solve it?

Answer 380

Total cost of reaching a goal.

Answer 381

Structure of all possible steps.

Answer 382

Will it always find a solution if one exists?

Answer 383

Will it find the best solution?

Answer 384

How long does it take?

Answer 385

How much memory does it use?

Answer 386

Explore level-by-level.
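
A minimal sketch of level-by-level exploration on a small graph, using a queue of paths. The graph and node names are hypothetical.

```python
from collections import deque


def bfs(graph, start, goal):
    """Return a path with the fewest steps from start to goal, or None."""
    frontier = deque([[start]])   # queue of paths, oldest first
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Add every unvisited neighbor one level deeper.
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"], "F": []}
print(bfs(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```

Because shallower nodes are always expanded first, the first path that reaches the goal uses the fewest steps.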

Answer 387

Yes, if all steps cost the same.

Answer 388

Explore down a path, then backtrack.

Answer 389

Always picks the lowest-cost path next.

Answer 390

Uses h(x) (estimated distance to goal) and picks the closest-looking node.

Answer 391

f(x) = g(x) + h(x), where g(x) = cost from start and h(x) = estimated cost to goal.
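
A compact sketch of a search that orders its frontier by f(x) = g(x) + h(x), using a priority queue. The graph, step costs, and heuristic values are hypothetical; the heuristic never overestimates the true remaining cost, which the optimality of the result relies on.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (path, cost) of a cheapest path, ordering nodes by f = g + h."""
    # Frontier entries are (f, g, node, path); heapq pops the smallest f.
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for neighbor, cost in graph[node]:
            new_g = g + cost
            # Only keep a route if it is cheaper than any seen so far.
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + h[neighbor], new_g,
                                          neighbor, path + [neighbor]))
    return None, float("inf")


graph = {"S": [("A", 1), ("B", 4)],
         "A": [("B", 2), ("G", 6)],
         "B": [("G", 2)],
         "G": []}
h = {"S": 4, "A": 3, "B": 2, "G": 0}   # estimated cost to reach G
path, cost = a_star(graph, h, "S", "G")
print(path, cost)  # ['S', 'A', 'B', 'G'] 5
```

The direct edge A to G costs 6, but the search finds the cheaper route through B; setting every h value to 0 would turn the same code into uniform cost search.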

Answer 392

Slide tiles to reach the goal state.

Answer 393

Train model with dropout and keep dropout on at test time to run predictions multiple times.

Answer 394

Makes AI decisions interpretable.

Answer 395

Highlights which image regions influenced the decision.

Answer 396

Gradient of the predicted class score with respect to the feature maps of the last convolutional layer.

Answer 397

* Medical imaging * Object detection * Debugging deep models

Answer 398

Systems that act intelligently

Answer 399

Learn from data without being explicitly programmed

Answer 400

ML + Neural networks with many layers

Answer 401

Finding patterns in large datasets

Answer 402

* Missing values * Duplicate data * Noise and outliers * Inconsistencies

Answer 403

* Drop/merge/estimate missing or bad data * Visualize with summaries (histograms, boxplots, etc.)

Answer 404

Find k closest labelled data points and assign the most common class (majority vote)

Answer 405

Y = b + mX

Answer 406

To minimize loss

Answer 407

Learning from data without labeled responses

Answer 408

Group data into k clusters (e.g. k-means)

Answer 409

Build a tree of nested clusters (dendrograms)

Answer 410

Learns features automatically from raw data

Answer 411

A type of neural network architecture

Answer 412

Vision tasks (e.g. Faster R-CNN, U-Net, Mask R-CNN)

Answer 413

Language processing

Answer 414

Updates weights using error gradients

Answer 415

Stabilizes training

Answer 416

Prevents overfitting (e.g. Dropout)

Answer 417

* Binary Classification: binary_crossentropy * Multiclass: categorical_crossentropy * Regression: mean_squared_error

Answer 418

An exhaustive but slow optimization method

Answer 419

An efficient optimization method for large spaces

Answer 420

Object detection

Answer 421

Semantic segmentation

Answer 422

Instance segmentation (detect + pixel mask)

Answer 423

Counts words, ignores order

Answer 424

Term Frequency–Inverse Document Frequency

Answer 425

Predicts next word using previous N-1 words

Answer 426

RNNs / LSTMs

Answer 427

Breadth-First Search

Answer 428

Yes (equal cost only)

Answer 429

Uniform Cost Search

Answer 430

Best combo of cost + heuristic

Answer 431

Keep dropout ON at test time to estimate uncertainty

Answer 432

Highlights image regions used in CNN decisions

Answer 433

Each architecture’s use case

Answer 434

BoW vs TF-IDF vs Word Embeddings

Answer 435

Confusion matrices, F1, AUC, etc.

(490 cards)