Definitions Flashcards
Edge AI
A technique in which the AI model is deployed on the device itself. For example, the AI model runs on a drone or a CCTV camera
Jetson Nano
Nvidia’s hardware containing a CPU and GPU that is placed on devices to run AI on their data. An implementation of Edge AI
RAG
Retrieval augmented generation
RAG (Retrieval-Augmented Generation) is a method for improving the capabilities of large language models like GPT. It combines the strengths of pre-trained generative models with information retrieval systems to produce more accurate, contextually relevant, and up-to-date answers.
Here’s how it works:
1. Retrieval: In a RAG system, the model first retrieves relevant documents or pieces of information from an external database or knowledge base. This external information helps the model fill in any gaps in its own knowledge and grounds its responses in more factual or up-to-date sources.
2. Generation: Once the relevant information is retrieved, the generative model processes it and uses it to craft a response. This means that instead of relying purely on what was learned during training, the model dynamically incorporates fresh, context-specific knowledge.
This combination is particularly useful when dealing with questions that require real-time or highly specific information, ensuring that responses are both coherent and factually reliable.
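The two steps can be sketched in a few lines of Python. This is a toy illustration only: the knowledge base, the word-overlap scoring, and the function names are all hypothetical, and a real system would typically use embeddings, a vector store, and an actual LLM call in place of these stand-ins.

```python
# Toy RAG sketch: keyword-overlap retrieval plus prompt assembly.
# KNOWLEDGE_BASE, retrieve and build_prompt are hypothetical names.

KNOWLEDGE_BASE = [
    "The Jetson Nano is an Nvidia board with a CPU and a GPU.",
    "DialogFlow is a chatbot framework by Google.",
    "Gradient descent minimizes a loss function step by step.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(question, documents, top_k=1):
    """Step 1 (Retrieval): rank documents by words shared with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Step 2 (Generation input): put the retrieved text ahead of the question."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

question = "Which company makes the Jetson Nano?"
context_docs = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, context_docs)  # this text would be sent to the LLM
print(prompt)
```

The generative model then answers from the retrieved context rather than only from what it learned during training.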
Chatbot categories
- Flow based
a. Rule based. (Can be achieved with programming)
b. AI/NLP (using chatbot frameworks such as DialogFlow, Rasa, Amazon Lex, Botpress) or custom implementation using LLM/LangChain
- Open Ended
DialogFlow
Chatbot framework by Google
Collaborative filtering
Recommendation technique that suggests items to users based on the preferences and behavior of similar users
K nearest neighbour
Machine learning algorithm that finds the nearest neighbours of a data point based on a set of features
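A minimal sketch of the idea, assuming Euclidean distance and a majority vote among the k closest points (the data and the function name `knn_predict` are made up for illustration):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-feature dataset with two groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["small", "small", "small", "large", "large", "large"]

prediction = knn_predict(points, labels, query=(2.0, 2.0))
print(prediction)
```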
Recommendation system
Popular application of AI which recommends relevant content to users based on their past activities
Two approaches to build a recommendation system
Content based filtering and collaborative filtering
10 stages of AI project lifecycle
- Requirements
- Data collection
- Data preparation
- EDA: Exploratory data analysis - Process of exploring the data to identify patterns and structures
- Feature engineering: Process of selecting the right attributes of the data and transforming them to train the model. This step is like selecting the right ingredients for making the required pizza
- Model selection and training
- Model evaluation
- Model fine tuning
- Model deployment
- Monitoring the feedback loop
Map and Reduce
Map and Reduce: Map is the step in distributed computing where the work is split up and processed in parallel on different computers in the cloud, and Reduce is the step of collecting the individual results and aggregating them into the final answer.
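The pattern can be imitated on a single machine. The sketch below counts words, with each text chunk standing in for the work given to one computer; the chunk contents and function names are hypothetical, and a real cluster would run the map step in parallel.

```python
from collections import Counter
from functools import reduce

def map_count_words(chunk):
    """Map step: each worker counts the words in its own chunk."""
    return Counter(chunk.lower().split())

def reduce_counts(total, partial):
    """Reduce step: merge one worker's partial counts into the running total."""
    total.update(partial)  # Counter.update adds counts together
    return total

chunks = [
    "edge ai runs on devices",
    "ai models run on the edge",
    "devices run ai",
]

partial_counts = [map_count_words(c) for c in chunks]  # done in parallel on a cluster
word_counts = reduce(reduce_counts, partial_counts, Counter())
print(word_counts.most_common(3))
```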
Feature Engineering
Feature Engineering: Feature Engineering is a process of transforming raw data into meaningful features (which can be new columns) such that these features help in improving the performance of the model being trained. Domain understanding and math/statistics can be used for doing feature engineering.
Scikit
Scikit-learn: A Python library for machine learning that covers data preprocessing, feature engineering, model training, and model evaluation.
Scaling
Scaling: A technique used to adjust data points to a common scale that is easy to compare and interpret, such as a scale from 0 to 1.
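One common way to do this is min-max scaling, sketched below under the assumption that the target range is 0 to 1 (the `ages` values are made up):

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [18, 30, 42, 66]
scaled = min_max_scale(ages)
print(scaled)
```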
Accuracy (Model Evaluation Metrics)
Accuracy: The percentage of correct predictions made with respect to overall predictions.
Accuracy and precision are both metrics used to evaluate the performance of a model, but they measure different aspects of prediction quality. Here’s the difference:
Accuracy:
• Definition: Accuracy refers to the overall correctness of a model’s predictions. It measures how often the model correctly predicts the true class (both positives and negatives).
• Formula:
Accuracy = (True Positives + True Negatives)/Total Predictions
• Example: If a model predicts 90 correct results out of 100 total predictions, the accuracy is 90%.
Accuracy is best used when the dataset is balanced, meaning the number of instances in each class is roughly equal.
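The point about balanced datasets can be seen with a small sketch. The data is hypothetical: 95 negatives, 5 positives, and a "model" that always predicts the negative class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Imbalanced, made-up dataset: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100  # a model that never predicts the positive class

score = accuracy(y_true, always_negative)
print(score)
```

The score looks high even though this model never finds a single positive case, which is why accuracy alone can mislead on imbalanced data.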
Precision (Model Evaluation Metrics)
Precision: The percentage of correct predictions made for a given class (e.g. class=“dog”) with respect to all predictions which resulted in the value of that class (e.g. class=“dog”).
A system with high precision might leave some good items out, but what it returns is of high quality.
Recall (Model Evaluation Metric)
Recall: The percentage of correct predictions made for a given class (e.g. class=“dog”) with respect to the total number of instances of that class (e.g. class=“dog”).
A system with high recall might give you a lot of duds, but it also returns most of the good items.
Recall, also known as sensitivity or true positive rate, is a model evaluation metric that measures the ability of a model to correctly identify all relevant instances within a dataset. It focuses on how well the model can detect true positives (actual positive cases) out of all the actual positives in the dataset.
Formula:
Recall = True Positives/(True Positives + False Negatives)
Where:
• True Positives (TP): Cases where the model correctly predicts the positive class.
• False Negatives (FN): Cases where the model incorrectly predicts the negative class, missing actual positives.
Interpretation:
• A high recall means that the model is able to detect most of the actual positive cases, but it may also increase the number of false positives (incorrectly identifying negatives as positives).
• A low recall means the model is missing many true positives, failing to identify a significant portion of actual positive cases.
Use Case:
Recall is particularly important in scenarios where missing positive cases is costly or critical, such as:
• Medical Diagnoses: Missing a positive diagnosis (e.g., cancer) can have severe consequences, so high recall is prioritized.
• Fraud Detection: Detecting fraudulent transactions is critical, even if it means flagging some legitimate transactions (higher false positives).
Trade-Off:
Recall often has a trade-off with precision, which measures how many of the positive predictions made by the model are actually correct. In scenarios where both metrics are important, the F1 score is used to balance recall and precision.
F1-Score (Model Evaluation Metric)
F1-Score: The F1-Score is the harmonic mean of precision and recall, providing a single metric that balances both.
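Precision, recall, and F1 can all be computed from the same counts. The sketch below uses made-up "dog"/"cat" labels, and the function name is hypothetical:

```python
def precision_recall_f1(y_true, y_pred, positive="dog"):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["dog", "dog", "dog", "dog", "cat", "cat", "cat", "cat"]
y_pred = ["dog", "dog", "cat", "cat", "dog", "cat", "cat", "cat"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here 2 of the 3 "dog" predictions are correct (precision 2/3), while only 2 of the 4 actual dogs are found (recall 1/2); F1 lands between the two.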
Ground Truth / Labeled data set
Data set that is labeled and used for training
Gradient Descent
An optimization algorithm that iteratively updates (corrects) a model's parameters based on error feedback
Gradient descent is an optimization algorithm used to minimize a function by iteratively moving towards the minimum of the function. It’s commonly used in machine learning to find the optimal parameters of a model, such as the weights in a neural network.
Here’s how it works:
1. Initialization: Start with a set of random parameters (weights).
2. Compute the gradient: At each iteration, calculate the gradient of the loss function with respect to each parameter. The gradient tells us the direction of the steepest ascent (increase) in the loss function.
3. Update the parameters: Move the parameters in the opposite direction of the gradient (steepest descent) by a small step, called the learning rate. This reduces the loss.
4. Repeat: Keep repeating the process until the parameters converge to values where the loss is minimized (or sufficiently small).
The key idea is that each step gets you closer to the optimal solution by gradually lowering the loss function. If the learning rate is too small, the process can be slow, and if it’s too large, it may overshoot the minimum.
Gradient descent can take several forms:
• Batch gradient descent: Uses the entire dataset to compute the gradient at each step.
• Stochastic gradient descent (SGD): Uses a single data point at each step.
• Mini-batch gradient descent: Uses a small batch of data points at each step, which is a compromise between the two.
It’s fundamental to training many machine learning models, especially deep learning models.
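The four steps can be sketched as batch gradient descent fitting a straight line y = w*x + b with a mean squared error loss. The data, learning rate, and step count are illustrative assumptions, and the parameters start at zero rather than at random values so that the run is reproducible.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # step 1: initial parameters (zeros here, for reproducibility)
    n = len(xs)
    for _ in range(steps):  # step 4: repeat
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Step 2: gradient of the loss with respect to each parameter.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step 3: move against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Because every step uses the whole dataset, this is the batch variant; computing the gradient from one point or a small subset per step would give the stochastic or mini-batch variants.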
Inference
Process of using a trained ML model to make predictions or decisions based on new, unseen data
Machine learning
A discipline in computer science where we train machines on data so that they can make predictions without explicit programming
Model
It's a file that contains the logic built from training on the input and output data provided to it
ML Inference
Phase where fresh input is provided to model to get an output
OR
Process of using a trained ML model to make predictions or decisions based on new, unseen data
ML training
Phase where input and output data are provided for training the model. The output is a model that is saved as a file
Labeled dataset
A collection of data where each instance is tagged with the correct answer or outcome, used to train machine learning models
binary classification
A machine learning task type where the ML model, based on the input data provided, classifies the output as one of two possible classes
Multiclass classification
A machine learning task type where the ML model, based on the input data provided, classifies the output as one of many possible classes
Regression
A machine learning task type that involves predicting a continuous numerical outcome, such as house prices or stock values
Regression mathematical technique
A mathematical technique that finds the best-fit line, the line for which the total distance to the data points is minimized
Total error
The sum of the distances from the best-fit line to the data points
Gradient descent
Optimisation technique that is used to find the best fit line
Categorical output
Relating to data that can be divided into specific groups or categories, such as “red” or “blue”, “male” or “female”
Continuous output
Relating to numerical data that can take on any value within a range, often represented by intervals on a scale
Machine learning task types
Machine learning tasks can be broadly categorized based on the type of problem the model is solving or the outcome it is expected to achieve. Here are the key types of machine learning tasks:
- Classification:
• Description: The goal of a classification task is to predict a discrete label or category for each input data point. The model assigns data points to predefined classes or categories.
• Examples:
• Spam detection (Spam or Not Spam).
• Image classification (e.g., Dog, Cat, Bird).
• Algorithms: Decision Trees, Support Vector Machines (SVM), Neural Networks, Logistic Regression.
- Regression:
• Description: Regression tasks aim to predict a continuous value for a given input. This is useful when the output is a numerical value rather than a class.
• Examples:
• Predicting house prices.
• Estimating the demand for a product over time.
• Algorithms: Linear Regression, Ridge Regression, Support Vector Regression (SVR), Neural Networks.
- Clustering:
• Description: Clustering is an unsupervised learning task where the goal is to group data points into clusters based on similarities. The clusters are not predefined, and the model finds patterns and groupings from the data.
• Examples:
• Customer segmentation (grouping customers based on purchasing behavior).
• Document clustering (organizing articles based on topic).
• Algorithms: k-Means, Hierarchical Clustering, DBSCAN.
- Dimensionality Reduction:
• Description: The task of reducing the number of input variables or features in a dataset while retaining important information. Dimensionality reduction helps in simplifying models and visualizing data.
• Examples:
• Reducing the features in a dataset with hundreds of columns.
• Visualizing high-dimensional data in 2D or 3D.
• Algorithms: Principal Component Analysis (PCA), t-SNE, Autoencoders.
- Anomaly Detection:
• Description: The goal is to identify outliers or unusual data points that do not conform to the general pattern of the data. This is particularly useful in identifying rare events or detecting fraud.
• Examples:
• Fraud detection in financial transactions.
• Identifying defective products in a manufacturing process.
• Algorithms: Isolation Forest, One-Class SVM, Autoencoders.
- Recommendation:
• Description: A task where the model predicts user preferences or suggests items based on patterns learned from historical data. These systems often use user behavior to recommend products, content, or actions.
• Examples:
• Movie recommendations (e.g., Netflix, YouTube).
• Product suggestions in e-commerce (e.g., Amazon).
• Algorithms: Collaborative Filtering, Matrix Factorization, Content-Based Filtering.
- Reinforcement Learning:
• Description: In reinforcement learning, an agent learns to take actions in an environment to maximize cumulative rewards over time. The task is sequential, and the agent must balance exploration (trying new actions) with exploitation (leveraging known actions).
• Examples:
• Game playing (e.g., AlphaGo, chess).
• Robotics (e.g., a robot learning to navigate obstacles).
• Algorithms: Q-Learning, Deep Q-Networks (DQN), Policy Gradients.
- Sequence Prediction:
• Description: Sequence prediction tasks involve predicting the next item in a sequence, such as time series data, language processing, or DNA sequences.
• Examples:
• Stock price prediction (time series forecasting).
• Language modeling (predicting the next word in a sentence).
• Algorithms: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Transformers.
- Structured Output:
• Description: This involves predicting more complex outputs that have internal structure, such as trees, sequences, or graphs. Instead of a simple label or value, the output is a structured object.
• Examples:
• Semantic segmentation in images (labeling every pixel).
• Parsing sentences into grammatical structures.
• Algorithms: Conditional Random Fields (CRFs), RNNs, Transformers.
- Ranking:
• Description: The goal is to order items based on relevance or importance. Ranking models are often used in search engines or recommendation systems where results are presented in a specific order.
• Examples:
• Web search result ranking (Google’s search results).
• Product ranking in an e-commerce platform.
• Algorithms: Learning to Rank (LTR), Gradient Boosted Decision Trees (GBDT).
Each of these task types has specific applications and requires different machine learning techniques depending on the data, problem context, and the goal of the model.
Supervised machine learning
A type of machine learning where the model is trained on a labeled dataset, learning to make predictions based on input-output pairs
unsupervised machine learning
A type of machine learning that deals with input data without labeled responses, aiming to find hidden structures or patterns in the data
An unlabeled dataset is provided to an ML program, which then learns to identify patterns and structures in the data without any explicit guidance
Supervised ML algorithm examples
Algorithms used for classification problems:
- Random Forest
- XGBoost
- Decision trees
Algorithms used for regression problems:
- Linear Regression
Unsupervised ML algorithm examples
DBSCAN
K-means
Hierarchical Clustering
supervised machine learning types
Can be divided into regression and classification
Few types of regression algo (supervised learning)
Linear regression
Polynomial regression
Few types of classification algo (supervised learning)
Logistic regression
Decision tree
Random Forest
XGBoost
Few types of unsupervised learning algorithms
K-means
DBSCAN
Hierarchical clustering
Decision tree (supervised learning algorithm - classification type)
A machine learning algorithm which predicts the outcome based on input features by classifying the data into iterative branches with each node acting like a decision point
A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. It works by splitting the dataset into subsets based on feature values, making decisions that lead to a tree-like structure of rules. Each node in the tree represents a decision based on a particular feature, and the branches represent the outcomes of that decision, ultimately leading to a prediction.
Key Concepts of Decision Trees:
1. Root Node: The first node of the tree that represents the entire dataset. The tree starts with a decision on the most important feature that best splits the data.
2. Internal Nodes: Intermediate decision points based on feature values. Each node asks a question or applies a condition about one of the input features.
3. Leaf Nodes: The final nodes that provide the prediction, whether it’s a class label in classification tasks or a continuous value in regression tasks.
4. Splitting: The process of dividing a node into two or more sub-nodes based on a condition or threshold on a feature. The goal is to split in such a way that the resulting subsets are as “pure” as possible (i.e., the samples within each subset belong to the same class or have similar values).
5. Decision Criteria (Impurity Measures): Decision trees use different criteria to decide the best feature to split on:
• Gini Index (used in classification): Measures the impurity of a dataset. A Gini index of 0 means the node is pure (contains only one class).
• Entropy (used in classification): Another measure of impurity. Lower entropy means higher homogeneity, and information gain is the reduction in entropy achieved by a split.
• Variance Reduction (used in regression): Measures the reduction in variance when a node is split, ensuring that similar values are grouped together.
6. Pruning: A technique used to avoid overfitting by removing nodes that provide little to no additional value. It simplifies the tree, making it more generalized for unseen data.
Decision Tree Algorithm Workflow:
1. Select the best feature to split the data based on the chosen decision criterion (Gini, entropy, etc.).
2. Split the dataset into subsets based on the chosen feature and its threshold.
3. Repeat the process recursively for each subset, creating new nodes and branches.
4. Stop when one of the stopping conditions is met, such as when the data cannot be split further, or a maximum tree depth is reached.
Advantages:
• Easy to interpret: Decision trees are simple and intuitive. You can visualize the flow of decisions from the root to the leaves.
• Non-parametric: Decision trees do not assume a linear relationship between features and target values, so they can model complex data.
• Handles both categorical and continuous data: They can be used for classification and regression tasks.
Disadvantages:
• Overfitting: Decision trees are prone to overfitting, especially when the tree becomes very deep. Pruning helps mitigate this issue.
• Instability: Small changes in the data can lead to different splits, drastically changing the structure of the tree.
• Bias towards dominant features: If one feature dominates, it might overshadow other important features.
Applications of Decision Trees:
• Credit scoring: To evaluate whether a loan applicant is likely to repay a loan.
• Medical diagnosis: To classify patients based on their symptoms.
• Customer segmentation: To classify customers into different categories for marketing purposes.
• Fraud detection: To identify whether a transaction is fraudulent.
Variations:
• Random Forest: An ensemble method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting.
• Gradient Boosted Decision Trees (GBDT): Another ensemble method where trees are built sequentially, with each tree correcting the errors of the previous ones.
In summary, decision trees are powerful tools for classification and regression tasks, easy to interpret, and flexible. However, without proper tuning and pruning, they can be prone to overfitting and may require ensemble methods for better generalization.
Entropy
Entropy measures the impurity or disorder in the dataset. The lower the entropy, the lower the disorder.
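As a rough illustration, entropy (in bits) and the Gini index can be computed from the class labels in a node. The formulas are the standard ones; the label lists and function names are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over the class proportions."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

pure = ["yes", "yes", "yes", "yes"]   # one class only: no disorder
mixed = ["yes", "yes", "no", "no"]    # evenly split: maximum disorder for 2 classes

print(entropy(pure), gini(pure))
print(entropy(mixed), gini(mixed))
```

A pure node scores 0 on both measures, while the evenly split node scores the two-class maximum (1 bit of entropy, Gini of 0.5), which is why a decision tree prefers splits that lower these values.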
Jupyter notebook
An interactive computing environment that enables users to create and share documents containing live code, equations, visualizations, and text
Naive Bayes classification task ML algorithm
Used for text data and small datasets
Random forest classification task ML algorithm
Used for complex datasets where overfitting is a concern
Decision tree classification task ML algorithm
It is easy to interpret and explain; suited to decision analysis
XGBoost classification task ML algorithm
Better predictive accuracy, robust
Tools for ML
Scikit-learn is the main Python library used for machine learning
Google Colab is an alternative to Jupyter Notebook that runs machine learning models in Google Cloud
Amazon SageMaker is another popular option for running ML models in the cloud
Azure ML is another option from Microsoft
features
input variables used by algorithm to make predictions or decisions
backward error propagation
Error feedback is propagated backward through the neural network
Deep learning
It is a machine learning technique that uses neural networks, loosely modeled on the human brain, to recognize patterns and make decisions
Statistical ML usage
The regular ML such as supervised, unsupervised learning
used when the features are simple
used when data is structured
used when data volume is low
used for small datasets, limited compute resources, interpretability requirements
Deep learning usage
Used when the features are complex
Used when data is unstructured
Used when data volume is high
Used for big datasets, complex features, images, videos, audio, and require high compute resources
Neural network architectures
- Feed forward neural network (FNN)
- Recurrent neural network (RNN)
- Convolutional neural network (CNN)
- Transformers
Feed forward neural network (FNN)
used for making decisions on a set of inputs
Best for static, structured data
use cases: temperature forecast, credit scoring
Recurrent neural network (RNN)
one of the popular architectures to build neural networks. Network with cycles
Best for text, time series data
use cases: language translation, speech to text
Convolutional neural network
Feed-forward network with convolutional filters
Best for images, video
use cases: image classification, object detection like facial recognition.
It works by breaking the task down into smaller pieces and combining the individual results to derive a final decision
transformer
self-attention mechanism
architecture used in ChatGPT
GPT
Generative pre-trained transformer
It's a generative AI model; GPT is an abbreviation of generative pre-trained transformer
Best for text, time-series, generative AI
Use cases: text summary, Q&A, translation
PyTorch
Library used for building ML projects using deep learning techniques, built by Meta
TensorFlow
Library used for building ML projects using deep learning techniques, built by Google
Machine learning classifications
Machine learning can be classified into several types based on how the algorithm learns and the kind of feedback it receives. The primary categories include:
- Supervised Learning:
In supervised learning, the model is trained using labeled data. That means both the input and the correct output are provided to the algorithm, and the model learns to map inputs to outputs. The goal is for the model to predict the output when new, unseen inputs are given.
• Examples: Classification (e.g., spam detection), Regression (e.g., predicting house prices).
• Algorithms: Linear Regression, Decision Trees, Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), Neural Networks.
- Unsupervised Learning:
In unsupervised learning, the model is provided with input data without labeled outputs. The goal is to identify patterns, structures, or relationships in the data without explicit guidance.
• Examples: Clustering (e.g., customer segmentation), Association (e.g., market basket analysis).
• Algorithms: k-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), Autoencoders.
- Semi-Supervised Learning:
This is a combination of supervised and unsupervised learning. The model is trained on a small amount of labeled data along with a large amount of unlabeled data. It’s useful when labeling data is expensive or time-consuming.
• Examples: Web page classification, Medical image analysis.
• Algorithms: Self-training, Co-training.
- Reinforcement Learning:
In reinforcement learning, an agent interacts with an environment, and for each action, it receives rewards or penalties. The goal is to learn a policy that maximizes cumulative rewards over time. This type of learning is commonly used in robotics, gaming, and control systems.
• Examples: Game playing (e.g., AlphaGo), Autonomous driving.
• Algorithms: Q-Learning, Deep Q-Networks (DQN), Policy Gradient Methods.
- Self-Supervised Learning:
In self-supervised learning, the system generates its own labels from the input data. The task is set up in such a way that the model learns useful representations without explicit labels (e.g., predicting the next frame in a video or filling in missing words in a sentence).
• Examples: Natural language processing, Computer vision.
• Algorithms: Transformers, BERT (Bidirectional Encoder Representations from Transformers), SimCLR.
- Transfer Learning:
Transfer learning involves taking a pre-trained model (usually from a different but related task) and fine-tuning it for a new task. This is especially useful when there is limited data for the new task.
• Examples: Image recognition (using models trained on large datasets like ImageNet), NLP tasks.
• Algorithms: Pre-trained models (e.g., GPT, ResNet).
Each classification method has its own applications and challenges, and the choice of the method depends on the problem at hand, the type of data available, and the desired outcomes.
Convolution filters
Convolutional filters, also known as kernels or feature detectors, are a core component of Convolutional Neural Networks (CNNs) used in deep learning, particularly for image processing and computer vision tasks. A convolutional filter is a small matrix, typically 3x3, 5x5, or 7x7 in size, that is applied to an input (e.g., an image) to extract features like edges, textures, or patterns.
How They Work
When a convolutional filter is applied to an image, it slides (or “convolves”) across the image. At each position it performs an element-wise multiplication between the filter values and the pixel values underneath it, then sums the products into a single number. This process generates an output known as a feature map, which highlights the specific patterns the filter is designed to detect.
Key Concepts:
1. Filter Size: The size of the filter (e.g., 3x3 or 5x5) defines the area of the input image that is considered in each step of the convolution. Smaller filters capture finer details, while larger ones can capture more abstract patterns.
2. Strides: The number of pixels by which the filter moves over the image. A stride of 1 moves the filter one pixel at a time, while a stride of 2 skips one pixel in each move, reducing the size of the output.
3. Padding: Padding adds a border of pixels (typically zeros) around the input image to control the spatial dimensions of the output. Without padding, the image shrinks as the filter slides over it.
4. Activation Maps: The result of applying a convolutional filter is an activation or feature map, which shows how strongly the image responds to the specific pattern the filter is detecting.
5. Multiple Filters: CNNs use multiple filters to capture different features. Early layers may capture basic features like edges or corners, while deeper layers capture more complex patterns like shapes, objects, or textures.
Example:
For instance, a simple 3x3 convolutional filter might detect vertical edges in an image. As it moves across the image, it responds strongly where vertical edges are present and weakly or not at all in other areas.
In CNNs, these filters are learned through training, meaning they automatically adjust their values to detect the most important features for the task, such as classifying images or detecting objects.
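The vertical-edge example can be sketched in plain Python. The image and filter values below are hand-picked for illustration (in a trained CNN the filter values would be learned), and the sketch assumes a stride of 1 and no padding.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the image patch element-wise, then sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# Tiny made-up image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-written 3x3 filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is smaller than the input because no padding is used, and its values are large where the window covers the edge and zero over the uniform bright region.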
Geometric mean
In simple terms, the geometric mean is a way to find the average of a set of numbers, but instead of adding them up, you multiply them and then find a root (like square root, cube root, etc.). It’s useful when the numbers are related to each other in a multiplicative way, such as growth rates or percentages.
Use Case Example: Investment Growth
Suppose you have an investment that grows by 10% in the first year, 50% in the second year, and 30% in the third year. These are percentage growth rates, and using the arithmetic mean wouldn’t give an accurate picture of the overall growth over time because percentages compound (multiply) on each other.
Instead, you’d use the geometric mean to better reflect the true average growth rate:
1. Convert the percentages to decimal factors:
• 10% → 1.10
• 50% → 1.50
• 30% → 1.30
2. Multiply them together:
1.10 * 1.50 * 1.30 = 2.145
3. Find the cube root (since there are 3 numbers):
∛(1.10 * 1.50 * 1.30) = ∛2.145 ≈ 1.290
4. Convert back to a percentage:
1.290 - 1 = 0.290 = 29.0%
This means that the average growth rate over the three years is about 29.0%, slightly below the 30% arithmetic mean, which more accurately represents the compounding effect of the growth.
In summary, the geometric mean is especially useful when you’re dealing with percentages or ratios that multiply over time.
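The investment calculation can be checked with a short sketch. The function name is hypothetical, and the rates are the ones from the example, written as decimals.

```python
import math

def geometric_mean_growth(rates):
    """Average growth rate per period for rates that compound on each other."""
    factors = [1 + r for r in rates]          # e.g. 10% becomes 1.10
    product = math.prod(factors)              # total growth over all periods
    return product ** (1 / len(factors)) - 1  # n-th root, back to a rate

rates = [0.10, 0.50, 0.30]
avg_growth = geometric_mean_growth(rates)
arithmetic_mean = sum(rates) / len(rates)

print(f"geometric mean growth: {avg_growth:.1%}")
print(f"arithmetic mean:       {arithmetic_mean:.1%}")
```

Growing by the geometric mean rate every year for three years reproduces the same total growth as the three actual rates, which the arithmetic mean does not.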