Artificial Intelligence & Applications Flashcards
What is Artificial Intelligence (AI)?
AI is the ability of machines to think and act intelligently, like a human would.
How does AI differ from traditional programming?
AI learns from data and makes decisions, while traditional programming involves telling a computer exactly what to do.
What are key areas of AI?
- Computer Vision
- Machine Learning (ML)
- Deep Learning (DL)
- Data Mining
What is Weak AI?
AI built for specific tasks—it doesn’t think like a human.
Give examples of Weak AI.
- Siri and Alexa
- Chess-playing AI
- Chatbots like GPT-4
What is Strong AI?
AI that can think, learn, and adapt like a human.
Does Strong AI exist?
Not yet! Scientists are still trying to create it.
What is Machine Learning (ML)?
A method where AI is trained on data instead of being programmed manually.
What are the two main types of Machine Learning?
- Supervised Learning
- Unsupervised Learning
What is Supervised Learning?
The AI is trained on labeled data (data with answers).
What is an example of Supervised Learning?
Teaching AI to recognize cats using labeled cat pictures.
What is Unsupervised Learning?
AI finds patterns in data by itself—no labels.
What is the difference between Machine Learning and Data Mining?
ML = Models learn patterns from data in order to make predictions; Data Mining = Discovering previously unknown patterns in large datasets (often with ML and statistical tools) for people to interpret.
What is Deep Learning (DL)?
A special kind of Machine Learning that uses neural networks with many layers.
What are key models in Deep Learning?
- CNNs (Convolutional Neural Networks)
- RNNs (Recurrent Neural Networks)
- Autoencoders
What is CNN best for?
Images.
What is RNN best for?
Sequences like speech and text.
Where is AI used?
- Computer Vision
- Natural Language Processing (NLP)
- Generative AI
What is an application of AI in Computer Vision?
Medical imaging for detecting diseases from scans.
What does Natural Language Processing (NLP) enable AI to do?
Understand and process human language (text and speech).
What is an example of Generative AI?
GANs (Generative Adversarial Networks) that create new images, videos, and music.
Fill in the blank: AI is __________ technology that mimics human intelligence.
[smart]
True or False: Deep Learning makes AI less powerful than traditional Machine Learning.
False.
What are the goals of data exploration?
✔️ Visualize patterns and trends
✔️ Summarize key statistics
✔️ Detect anomalies
✔️ Understand relationships between variables
Example: Analyzing diamond prices for trends related to carat size and cut.
Why is data visualization important?
📌 Find patterns and trends
📌 Understand relationships between variables
📌 Detect errors or missing data
📌 Make data easier to interpret
Quote: ‘Make both calculations and graphs.’ – F.J. Anscombe, 1973
What types of charts are used for different purposes?
🔹 Relationship → Scatter plots
🔹 Composition → Pie charts
🔹 Comparison → Bar charts, line graphs
🔹 Location → Maps & heatmaps
Example: Scatter plots for height vs. weight.
What did researchers Cleveland & McGill find about chart design?
✔️ Position & length are the most accurate ways to show numbers
✔️ Pie charts are harder to interpret than bar charts
Best Practice: Use clear, simple charts.
What is the Grammar of Graphics?
✔️ A structured approach to designing visualizations
✔️ Ensures consistency in designing graphs
✔️ Used in tools like ggplot2 in R
Helps in creating clear visualizations.
What are common data issues in data pre-processing?
❌ Missing Values
❌ Duplicates
❌ Inconsistent Data
❌ Noise & Outliers
Solutions include filling missing values and standardizing formats.
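Sketch (illustrative only): one way to fill missing values is to replace them with the mean of the known values. Mean imputation is an assumption here, since the card does not name a method, and the function name and sample data are made up.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

# Hypothetical column with one missing entry.
filled = fill_missing_with_mean([1.0, None, 3.0])
```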
What is feature engineering in data pre-processing?
✔️ Feature Selection → Keep important variables
✔️ Feature Transformation → Convert data into better formats
Example: Standardizing price and carat size in a diamonds dataset.
Fill in the blank: The package used for creating visualizations in R is _______.
[ggplot2]
What is the purpose of a scatter plot in data visualization?
Helps us see the relationship between two variables
Example: Carat vs. Price in diamond datasets.
True or False: Cleaning data is essential for accuracy.
True
What are the key takeaways from the data exploration and visualization process?
✔️ Data exploration helps us understand patterns
✔️ Visualization is key for discovering insights
✔️ Choosing the right chart aids interpretation
✔️ The Grammar of Graphics helps create structured visualizations
✔️ Cleaning data is essential for accuracy
When do we use Machine Learning?
When no direct formula exists to solve a problem and when we have data that can help find patterns.
Example: Predicting customer purchase behavior.
What is Supervised Learning?
A type of ML where the model is trained on labeled data, learning from known answers.
What are example input features for a Supervised Learning model (here, for evaluating a car)?
- Buying Price
- Maintenance Cost
- Number of Doors
- Seating Capacity
- Luggage Boot Size
- Safety Rating
What is Predictive Modeling?
When ML learns patterns from data to make predictions.
What is the difference between Regression and Classification?
- Regression → Predicts continuous values (e.g., house prices).
- Classification → Assigns data into categories (e.g., spam or not spam).
What is Linear Regression?
A method that fits the best-fit line y = mx + c to the data, where m is the slope and c is the intercept.
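Sketch (illustrative only): fitting the line by ordinary least squares, assuming "best fit" means the smallest squared error. The function name fit_line and the sample points are made up.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c

# Hypothetical points lying exactly on y = 2x + 1.
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```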
What are the limitations of Linear Regression?
- Not good for non-linear relationships
- Not good when there are too many outliers.
What is a Decision Tree?
A flowchart-like structure where each decision leads to an outcome.
What is the process of creating a Decision Tree?
- Pick the best feature
- Split the data into groups
- Keep splitting until groups are pure (or a stopping rule such as a maximum depth is reached).
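Sketch (illustrative only): choosing a split on one numeric feature. Gini impurity is assumed as the measure of how "pure" a group is; the cards do not name a criterion. Function names and data are made up.

```python
def gini(labels):
    """Gini impurity: 0.0 means the group is pure (a single class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try each midpoint between sorted values; return (threshold, weighted impurity)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical feature values with their class labels.
threshold, impurity = best_threshold([1, 2, 3, 8, 9], ["bad", "bad", "bad", "good", "good"])
```

A full tree would repeat this search on each resulting group.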
What is Random Forest?
A collection of multiple decision trees to improve accuracy and reduce overfitting.
How does Random Forest work?
- Train many Decision Trees on random subsets of the data
- Consider a random subset of features at each split
- Combine all tree predictions (majority vote for classification, average for regression).
What is k-Nearest Neighbors (k-NN)?
A method that classifies new data points based on the ‘k’ closest points in the dataset.
What is the process for k-NN classification?
- Choose k
- Compute distances from the new point to all training points
- Find the k closest neighbors
- Assign the most common class label among the neighbors (classification) or take the average of their values (regression).
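Sketch (illustrative only): the steps above for classification, assuming Euclidean distance. The function name and the sample points are made up.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, new_point, k):
    # Distance from the new point to every training point.
    dists = [(math.dist(p, new_point), label)
             for p, label in zip(train_points, train_labels)]
    # Keep the k closest neighbors.
    dists.sort(key=lambda pair: pair[0])
    nearest = [label for _, label in dists[:k]]
    # Majority vote among those neighbors.
    return Counter(nearest).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
labels = ["A", "A", "A", "B", "B"]
near_a = knn_classify(points, labels, (2, 2), k=3)
near_b = knn_classify(points, labels, (8, 9), k=3)
```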
What is a limitation of k-NN?
It is slow for large datasets, because every prediction compares the new point with all training points.
List the main concepts of Supervised Learning.
- Uses labeled data
- Regression vs. Classification
- Linear Regression
- Decision Trees
- Random Forest
- k-NN
What is the goal of Linear Regression?
To find the best-fit line that represents the relationship between variables.
True or False: Decision Trees can overfit.
True
Fill in the blank: Random Forest is an army of _______.
[Decision Trees]
What is Unsupervised Learning?
Unsupervised learning is a machine learning approach where the model learns patterns, structures, or groupings in the data without labeled outputs.
Key characteristics include working with unlabelled data, finding hidden structures, and being used for clustering and dimensionality reduction.
What are the key characteristics of Unsupervised Learning?
- Works with unlabelled data (no predefined categories)
- Finds hidden structures in data
- Used for clustering & dimensionality reduction
Example: Grouping similar books in an unorganized library.
What is Clustering?
Clustering is a method in unsupervised learning that groups similar data points together.
Within a cluster, data points are similar; in different clusters, they are dissimilar.
Why is Clustering used?
- Data Reduction
- Outlier Detection
- Data Segmentation
Clustering helps summarize large datasets, identifies unusual patterns, and groups customers by behavior.
What are some real-world applications of Clustering?
- Social Network Analysis
- Image Segmentation
- Data Annotation
Examples include grouping users based on interests and dividing images for medical imaging.
What are the steps in Clustering?
- Define a distance metric to measure similarity
- Form clusters by grouping similar data points
- Maximize within-cluster similarity, minimize between-cluster similarity
Common distance metrics include Euclidean and Manhattan distances.
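Sketch (illustrative only): the two distance metrics named above, written for points given as equal-length tuples. Function names are made up.

```python
def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

d_euclid = euclidean((0, 0), (3, 4))
d_manhattan = manhattan((0, 0), (3, 4))
```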
What is K-Means Clustering?
K-Means is a partition-based clustering algorithm that groups data into k clusters.
It is one of the most popular clustering algorithms.
How does K-Means Clustering work?
- Choose the number of clusters (k)
- Select k random points as initial centroids
- Assign each data point to the nearest centroid
- Recalculate centroids by finding the mean of each cluster
- Repeat until centroids stop changing
Example: Grouping customers into low, medium, and high spenders.
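Sketch (illustrative only): the K-Means loop on one-dimensional data. The starting centroids are passed in so the run is repeatable; random.sample(points, k) would match the "random points" step. The function name and the spending figures are made up.

```python
def k_means_1d(points, centroids, max_iters=100):
    """Cluster 1-D points; `centroids` holds the k starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]  # an empty cluster keeps its centroid
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped changing
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customer spend: low, medium, and high spenders.
spend = [10, 12, 14, 50, 52, 54, 90, 92, 94]
centroids, clusters = k_means_1d(spend, centroids=[10, 50, 90])
```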
What is the Elbow Method in K-Means?
The Elbow Method involves plotting the Within-Cluster Sum of Squares (WCSS) against the number of clusters (k) and looking for the ‘elbow’ point where adding more clusters stops improving the fit significantly.
The bend in the curve indicates the optimal number of clusters (k).
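Sketch (illustrative only): computing WCSS for a given grouping of one-dimensional points. The groupings for k = 1, 2, 3 are written by hand here rather than produced by K-Means, and the data is made up.

```python
def wcss(clusters):
    """Sum of squared distances from each point to its own cluster mean."""
    total = 0.0
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)
        total += sum((p - centroid) ** 2 for p in cluster)
    return total

# Hand-made groupings of the same nine points for k = 1, 2, 3.
w1 = wcss([[10, 12, 14, 50, 52, 54, 90, 92, 94]])
w2 = wcss([[10, 12, 14, 50, 52, 54], [90, 92, 94]])
w3 = wcss([[10, 12, 14], [50, 52, 54], [90, 92, 94]])
```

Here WCSS falls sharply up to k = 3; splitting these tight groups further could only lower it by a small amount, which is the kind of flattening the elbow refers to.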
What are the strengths of K-Means Clustering?
- Simple and efficient
- Works well for large datasets
K-Means is favored for its speed and ease of use.
What are the weaknesses of K-Means Clustering?
- Requires predefined k
- Sensitive to initialization
- Struggles with non-globular clusters
These limitations can affect the clustering results.
What is Hierarchical Clustering?
Hierarchical Clustering builds a tree-like structure (dendrogram) instead of predefined partitions.
It allows for a more flexible grouping of data.
How does Hierarchical Clustering work?
- Start with each data point as its own cluster
- Merge the closest clusters based on a chosen distance metric
- Repeat until one large cluster remains
This bottom-up approach is called Agglomerative Clustering.
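Sketch (illustrative only): the bottom-up merge loop on one-dimensional points. It assumes single linkage (distance between the two closest members) and stops early at a requested number of clusters; both the function name and the data are made up.

```python
def agglomerative_1d(points, num_clusters=1):
    # Start with each data point as its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Merge the closest pair of clusters.
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

two_groups = agglomerative_1d([1, 2, 6, 7, 20], num_clusters=2)
one_group = agglomerative_1d([1, 2, 6, 7, 20])
```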
What are common linkage criteria (cluster-to-cluster distances) used in Hierarchical Clustering?
- Single Linkage
- Complete Linkage
- Centroid Distance
These criteria define how the distance between two clusters is measured, and so which clusters get merged.
What are the strengths of Hierarchical Clustering?
- No need to specify k beforehand
- Can find non-spherical clusters (depending on the linkage used)
This flexibility is an advantage over K-Means.
What are the weaknesses of Hierarchical Clustering?
Hierarchical Clustering is computationally expensive for large datasets.
This can limit its practicality in big data scenarios.
What are the key takeaways from the study of Unsupervised Learning and Clustering?
- Unsupervised learning finds patterns in unlabeled data
- Clustering groups similar data points together
- K-Means is a fast, efficient partition-based method
- Hierarchical clustering builds a tree-like structure
- Choosing the right k is crucial for effective clustering
These points summarize the fundamental concepts of clustering.
What is the main difference between Traditional Machine Learning and Deep Learning?
Traditional Machine Learning involves manually selecting features, while Deep Learning learns features automatically from raw data.
Example: K-Means Clustering for ML vs. Neural Networks for DL.
What is the purpose of Gradient Descent in machine learning?
To minimize the loss (the prediction error) and so improve the model’s accuracy.
It does this by repeatedly adjusting the weights in the direction that lowers the loss.
What are the steps involved in the Backpropagation process?
- Forward Pass
- Compute Loss
- Backpropagation
- Gradient Descent
These steps help the model adjust weights to minimize prediction errors.
True or False: Neural Networks are inspired by the structure of the human brain.
True
Neural networks use artificial ‘neurons’ to process information.
What are the three types of layers in a Neural Network?
- Input Layer
- Hidden Layers
- Output Layer
Each layer plays a specific role in processing data and making predictions.
Fill in the blank: The equation for a prediction in a neural network is Prediction = Input × Weight → _______.
Activation Function
This function determines the output based on the weighted input.
What are the two types of Loss Functions mentioned?
- Binary Cross-Entropy
- Categorical Cross-Entropy
These functions help determine how well the model is performing.
What is the role of an Optimizer in deep learning?
To decide how the weights are updated from the gradients (for example, the step size and use of momentum) so the model learns faster and more reliably.
Examples include SGD and Adam Optimizer.
What dataset is used in the example of Handwritten Digit Recognition?
MNIST Dataset
This dataset contains 60,000 training images and 10,000 test images.
What is the first step in building a simple neural network using Keras?
Import Libraries
Essential imports include Sequential, Dense, np_utils, and mnist (newer Keras releases drop np_utils; to_categorical is then imported from keras.utils).
What is the purpose of normalizing pixel values in the MNIST dataset?
To improve training efficiency by scaling values from 0-255 to 0-1.
This normalization helps the model learn better.
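Sketch (illustrative only): scaling pixel values from 0-255 to 0-1 in plain Python. The function name and the sample pixels are made up; a real MNIST pipeline would apply the same division to whole image arrays.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel intensities (0-255) into the range 0-1."""
    return [p / 255.0 for p in pixels]

scaled = normalize_pixels([0, 51, 255])
```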
What is One-Hot Encoding used for in the context of neural networks?
To convert labels into a format the neural network can understand.
This is crucial for multi-class classification.
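Sketch (illustrative only): one-hot encoding a digit label by hand. The function name is made up, and 10 classes are assumed because the digits run from 0 to 9.

```python
def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1 at the label's position."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

encoded = one_hot(3)
```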
What activation function is used in the first layer of the example neural network?
ReLU (Rectified Linear Unit)
This function helps in learning complex patterns.
What metric is used to evaluate the performance of the model on test data?
Accuracy
This metric indicates how well the model predicts unseen data.
True or False: Keras simplifies the process of building neural networks.
True
It provides an easy-to-use interface for model creation.
What is the goal of training a neural network on the MNIST dataset?
To correctly predict handwritten digits (0-9).
This involves optimizing the model through various epochs.
What is the goal of Backpropagation and Gradient Descent?
Minimize the error between predicted and actual outputs.
What are the steps involved in Backpropagation?
- Forward Pass: Compute predictions
- Compute Loss: Measure the error
- Backpropagation: Calculate gradients
- Gradient Descent: Adjust weights to reduce error
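Sketch (illustrative only): the four steps above for the smallest possible model, a single weight w with prediction w * x. Mean squared error is assumed as the loss, and the learning rate, epoch count, and data are made up. With one weight, the "backpropagation" step is just the derivative of the loss with respect to w.

```python
def train(xs, ys, lr=0.05, epochs=200):
    w = 0.0
    for _ in range(epochs):
        # Forward pass: compute predictions.
        preds = [w * x for x in xs]
        # Compute loss: mean squared error.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        # Backpropagation: gradient of the loss with respect to w.
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Gradient descent: step against the gradient.
        w -= lr * grad
    return w, loss

# Hypothetical data generated by y = 3x.
w, final_loss = train([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```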
What do activation functions prevent in neural networks?
They prevent neural networks from behaving like linear regression models and allow them to learn complex relationships.
What is the equation without activation functions?
output = dot(W, input) + b
What is the equation with activation functions?
output = ReLU(dot(W, input) + b)
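Sketch (illustrative only): the two equations above in plain Python for a layer with two neurons. The weights, bias, and input are made-up numbers.

```python
def relu(z):
    return max(0.0, z)

def dense(W, inputs, b, activation=None):
    # dot(W, input) + b: one weighted sum plus bias per output neuron.
    z = [sum(w * x for w, x in zip(row, inputs)) + bias for row, bias in zip(W, b)]
    # Without an activation the layer stays a purely linear function.
    return [activation(v) for v in z] if activation else z

W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, 1.0]
linear_out = dense(W, [1.0, 2.0], b)
relu_out = dense(W, [1.0, 2.0], b, activation=relu)
```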
What is a Linear Activation Function?
Output = Input (Straight line)
What is the main issue with the Sigmoid activation function?
Vanishing Gradient – When values go beyond ±3, the gradient becomes tiny, and learning slows down.
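Sketch (illustrative only): the sigmoid gradient is s(x) * (1 - s(x)), which peaks at 0.25 and shrinks quickly as the input moves away from zero. Function names are made up.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

grad_at_0 = sigmoid_grad(0.0)    # the largest the gradient can be
grad_at_3 = sigmoid_grad(3.0)    # already below 0.05
grad_at_10 = sigmoid_grad(10.0)  # close to zero
```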
What does the Softmax activation function do?
Converts values into probabilities that sum to 1.
What is an example of Softmax output?
- Cat: 70%
- Dog: 20%
- Bird: 10%
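Sketch (illustrative only): a softmax function in plain Python. The input scores are made up and chosen so that the output reproduces the 70% / 20% / 10% example above.

```python
import math

def softmax(scores):
    # Subtracting the largest score avoids overflow and does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for cat, dog, bird.
probs = softmax([math.log(7), math.log(2), math.log(1)])
```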
What are the benefits of the Tanh activation function?
Maps inputs between -1 and 1, allowing negative values.
What is a key problem with Tanh?
Still suffers from vanishing gradient for large values.
What distinguishes ReLU from other activation functions?
Only activates for positive values; negative inputs become 0.
What is a problem associated with ReLU?
Dead Neurons – if a neuron only gets negative values, it stops learning.
How does Leaky ReLU address the dead neuron problem?
Gives negative values a small value instead of zeroing them out.
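Sketch (illustrative only): ReLU next to Leaky ReLU. The slope alpha = 0.01 for negative inputs is an assumed value, not one given in these cards.

```python
def relu(x):
    """Negative inputs become 0."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Negative inputs are scaled by a small slope instead of being zeroed."""
    return x if x > 0 else alpha * x

relu_neg = relu(-5.0)
leaky_neg = leaky_relu(-5.0)
leaky_pos = leaky_relu(2.0)
```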
What is the best use case for the Sigmoid activation function?
Probability outputs (binary classification).
What problems are associated with Softmax?
Can be computationally expensive when there are many classes.
When is Tanh best utilized?
In situations where negative values are needed.
What are the common uses for ReLU?
Deep learning models (default choice).
What is the main advantage of Leaky ReLU?
Prevents dead neurons and improves training.
Fill in the blank: Activation functions allow _______ models to learn complex data.
[deep learning]
What is the most commonly used activation function for hidden layers?
ReLU
What activation function is best for multi-class classification?
Softmax
What is the purpose of activation functions?
They enable deep networks to learn complex data relationships.
Activation functions are crucial for introducing non-linearity into the model.
What is a key limitation of linear activation functions?
They can’t handle complex patterns.
Linear activation functions are insufficient for deep learning tasks.
Name a common activation function used for probabilities.
Sigmoid
Sigmoid is often used in binary classification tasks.
What is the main drawback of the sigmoid activation function?
It suffers from vanishing gradients.
This can slow down the learning process in deep networks.
Which activation function is used in multi-class classification?
Softmax
Softmax converts logits to probabilities for multiple classes.
What range does the Tanh activation function map inputs to?
-1 and 1.
Tanh is zero-centered, which often makes training converge faster than with sigmoid and helps gradient flow.
What is the most commonly used activation function?
ReLU
ReLU helps avoid vanishing gradients, making it popular in deep networks.
What problem does Leaky ReLU address?
The ‘dying neuron’ problem.
Leaky ReLU allows small negative values to keep neurons active.
What is the purpose of a loss function?
It measures model error.
Loss functions provide feedback on the model’s performance.
What type of loss function is Log Loss?
Cross-Entropy Loss.
It is commonly used for classification tasks.
What loss function is used when there are only two classes?
Binary Cross-Entropy.
This is applicable in scenarios like cat vs. dog classification.
What is the formula for (binary) Cross-Entropy Loss?
L = - (y log(y_pred) + (1 - y) log(1 - y_pred))
This formula calculates the loss for one example from the true label y (0 or 1) and the predicted probability y_pred.
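Sketch (illustrative only): the formula above for a single example. The small eps clip is an added safeguard against log(0) and is not part of the formula; the function name is made up.

```python
import math

def binary_cross_entropy(y, y_pred, eps=1e-12):
    # Clip the prediction so log() never receives exactly 0.
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))

confident_right = binary_cross_entropy(1, 0.9)  # small loss
confident_wrong = binary_cross_entropy(1, 0.1)  # large loss
```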
What is backpropagation used for?
To adjust model weights to reduce error.
It calculates the contribution of each weight to the total error.
Define Gradient Descent.
Step-by-Step Learning.
It is an optimization algorithm that minimizes the loss function by repeatedly moving the weights a small step in the direction of the negative gradient.
What is Stochastic Gradient Descent (SGD)?
Updates weights after each training sample.
This can lead to faster convergence but may introduce noise.
What is Mini-Batch Gradient Descent?
Updates weights using small batches.
This method balances speed and stability during training.
What is the learning rate’s role in gradient descent?
Controls how big the update steps are.
A proper learning rate is crucial for effective training.
What does momentum do in gradient descent?
Adds a fraction of the previous weight update to the current one, which smooths the updates and can help the model move past shallow local minima.
This technique accelerates convergence.
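Sketch (illustrative only): a momentum update applied to the toy loss f(w) = w * w, whose gradient is 2 * w. The values lr = 0.1 and beta = 0.9, the starting point, and the step count are assumptions chosen for the demo.

```python
def momentum_step(w, velocity, grad, lr=0.1, beta=0.9):
    # Keep a fraction (beta) of the previous update and add the new gradient step.
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

w, v = 5.0, 0.0
for _ in range(500):
    grad = 2 * w  # gradient of f(w) = w * w
    w, v = momentum_step(w, v, grad)
```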
What is dropout in the context of model training?
Randomly switches off (zeroes) a fraction of neurons at each training step.
This prevents over-reliance on particular neurons.
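Sketch (illustrative only): dropout applied to a list of activations. It assumes the common "inverted dropout" variant, where kept values are scaled up by 1 / (1 - rate) so the expected total stays the same; the cards do not specify a variant. Names and the rate are made up.

```python
import random

def dropout(activations, rate=0.5, rng=random):
    """Zero each activation with probability `rate`; scale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# A seeded generator makes the demo repeatable.
out = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
```

Dropout is only applied during training; at prediction time all neurons stay active.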
What does regularization do?
Penalizes overly complex models to encourage generalization.
This helps to avoid overfitting.
What is early stopping?
Stops training when validation loss stops improving.
This technique helps prevent overfitting by halting training at the right time.
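Sketch (illustrative only): deciding when to stop from a list of per-epoch validation losses. The patience parameter (how many epochs without improvement to tolerate) is an assumption, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # validation loss improved
            waited = 0
        else:
            waited += 1   # no improvement this epoch
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran every epoch

stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.62, 0.65, 0.7])
```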
What are epochs in machine learning?
Number of times the entire dataset is passed through the model.
More epochs can improve learning but may also lead to overfitting.
What is the batch size in gradient descent?
Number of samples used per gradient update.
The choice of batch size can affect training speed and model performance.
Name a common optimizer used in machine learning.
Adam
Adam is known for its adaptive learning rate and is effective for many tasks.
What is the main takeaway regarding activation functions?
They enable deep learning.
Activation functions are essential for neural networks to learn complex patterns.
True or False: Dropout and regularization help prevent overfitting.
True
These techniques are widely used in training models to improve generalization.
What are the two main types of supervised machine learning?
Classification and Regression
Classification involves categorical output, while regression predicts numerical output.
What does regression in AI predict?
Continuous values based on input data
Example: Predicting house prices based on various factors.
How does regression work in AI?
AI learns a function y = f(x) that predicts y from x
x represents input variables like house size, location, etc.
What is Mean Squared Error (MSE)?
A loss function that measures how far off AI’s predictions are
Formula: MSE = (1/n) ∑(y_actual - y_predicted)².
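Sketch (illustrative only): the MSE formula above in plain Python. The function name and the sample numbers are made up.

```python
def mse(y_actual, y_predicted):
    """Mean of the squared differences between actual and predicted values."""
    n = len(y_actual)
    return sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted)) / n

# Squared errors are 1, 0, and 4, so the mean is 5 / 3.
error = mse([3, 5, 7], [2, 5, 9])
```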
What does a lower MSE indicate?
A better AI model
What is a Simple Artificial Neural Network (ANN) for regression composed of?
Layers of connected neurons: an input layer, one or more hidden layers, and an output layer that produces the predicted value
What is the first step in building a regression model?
Load & Prepare Data
What is Min-Max Scaling?
Rescales data to [0,1]
What is Z-Score Normalization?
Makes data have mean = 0, std dev = 1
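Sketch (illustrative only): both scaling methods in plain Python. The z-score version assumes the population standard deviation (dividing by n), and both assume the values are not all identical. Names and data are made up.

```python
def min_max_scale(values):
    """Rescale so the smallest value becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

scaled = min_max_scale([10, 20, 30])
standardized = z_score([10, 20, 30])
```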
What is the activation function used in the hidden layers of the regression model?
ReLU
What optimizer is used when compiling the regression model?
Adam
What is Text Classification?
AI sorts text into categories
What is the example dataset used for text classification?
IMDB movie reviews
How many reviews are in the IMDB dataset?
50,000 highly polarized reviews
What is the first step in preprocessing text for AI?
Map each word to a number (integer index) using a word dictionary
What does One-Hot Encoding do?
Turns each review's word indices into a fixed-length vector of 0s & 1s (1 = word present)
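Sketch (illustrative only): turning a short review into a vector of 0s and 1s. The tiny word dictionary and the review are made up; the default dimension of 10,000 matches the input size used for the IMDB model in this deck.

```python
def vectorize(word_indices, dimension=10000):
    """Set position i to 1 for every word index i that appears in the review."""
    vec = [0] * dimension
    for i in word_indices:
        vec[i] = 1
    return vec

# Hypothetical word dictionary: word -> integer index.
word_index = {"great": 3, "movie": 5, "boring": 6}
review = ["great", "movie", "great"]
indices = [word_index[w] for w in review]
vector = vectorize(indices, dimension=8)
```

A repeated word still produces a single 1, so this encoding records which words are present, not how often they occur.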
What is the input layer dimension for the fully connected network (FCN) used in sentiment analysis?
10,000-dimensional input
What activation function is used in the output layer for binary classification?
Sigmoid
What is the loss function used for the IMDB dataset?
Binary Crossentropy
What task is performed with the Reuters dataset?
Classify short news articles into 46 different categories
How is the Reuters classification different from IMDB classification?
46 categories instead of 2
What activation function is used in the final layer for multi-class classification?
Softmax
What loss function is used for the Reuters dataset?
Categorical Crossentropy
What is the main purpose of regression in the context of the Boston Housing Prices example?
Predict continuous values
What are the key components of text classification for IMDB and Reuters?
- Predicts categories
- One-Hot Encoding
- Activation functions: ReLU (hidden), Sigmoid (IMDB), Softmax (Reuters)
- Loss functions: Binary Crossentropy (IMDB), Categorical Crossentropy (Reuters)