AI Fundamentals Flashcards
AI Intro
AI systems possess the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience. In short, they can learn, reason, act, and adapt.
Types
General/Strong AI - mimics human-like intelligence
Weak / Narrow AI - solutions designed to solve specific problems. Often equated with ML, the process of applying computer algorithms to capture the behavioral patterns of systems and processes based on input and output data collected from those systems. ML algorithms improve their performance as they are exposed to more data over time. Deep Learning is a subset of ML in which multilayered neural networks learn from vast amounts of data.
AI Models and How to Build Them
A model is a simplified representation of a process or system. To build a model, you need to:
- Define the problem - state the pain point/objective clearly, state the benefits, and define what success or failure looks like.
- Collect data from process inputs and outputs
- Configure and fit the model - specify the technical problem, select the model type, and choose the best algorithm for the dataset. Fitting involves performing optimization to obtain the best outcomes based on the available data and objectives.
- Use the model
Parameters and Hyperparameters
Model parameters (coefficients) - these are learned by the algorithm itself from the data.
Hyperparameters - these are defined prior to fitting. Setting hyperparameters is called model tuning, and it is a slow and costly optimization process.
Before using the model, check for overfitting and underfitting. Evaluate model performance using the holdout approach (i.e. a holdout train/test split).
Sample code - Simple digit recognition
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Select the model appropriate for the task
model = DecisionTreeClassifier()
# Train the model
model.fit(X=X_train, y=y_train)
# Generate predictions
prediction_results = model.predict(X=X_test)
# Test the model (here: accuracy of the predictions)
accuracy_score(y_true=y_test, y_pred=prediction_results)
Three flavors of Machine Learning
Supervised Learning - used to predict categories and quantities based on input measurements. Regression algorithms include Linear Regression, Ridge Regression, and ARIMA models; classification algorithms include Logistic Regression, Decision Tree Classifier, and Random Forest Classifier.
Unsupervised learning - Finding relationships and patterns in data. Used in:
Clustering - algorithms include K-Means
Anomaly detection - algorithms include Isolation Forest
Dimensionality reduction - algorithms include PCA
In Classification, model learns existing groups while in Clustering, model discovers groups on its own.
Reinforcement Learning - similar to learning by doing, using reward and punishment.
Supervised Learning Fundamentals
Common Classification Models:
- Decision Trees
- Logistic Regression
- Support Vector Machine
- Random Forest Classifier
Common Regression Models
- Linear Regression
- Lasso Regression
- Ridge Regression
Training and evaluating classification models
Use the train/test split method, aka the holdout method (e.g. 60% train / 40% test). Code example:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
Model Training
Model setup
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
Model fitting/training - model.fit(X_train, y_train)
Testing on testing data - model.predict(X=X_test)
Inspecting model outputs
y_predicted = model.predict(X_test_all)
Is y_predicted == y_true ?
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_predicted)
Confusion Matrix
TRUE POSITIVE = the model predicts Yes and the reality is Yes.
TRUE NEGATIVE = model predicts No and the reality is no.
FALSE POSITIVE = model predicts Yes but the reality is no (Type I error).
FALSE NEGATIVE = model predicts No but the reality is Yes (Type II error).
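A minimal sketch of how these four cells can be read out of a scikit-learn confusion matrix, assuming the binary labels y_true and y_predicted from the snippet above:
from sklearn.metrics import confusion_matrix

# For binary labels, ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_predicted).ravel()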
Accuracy, precision, recall
Metrics:
Accuracy: “How often did I make the correct diagnosis?”
Precision: “How often was I correct when I said a person has diabetes?” (= 1 - Type I error rate)
Recall: “What percentage of actual diabetes cases did my model detect?” (= 1 - Type II error rate)
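A short sketch computing these metrics with scikit-learn, assuming the y_test and y_predicted arrays from the earlier snippets:
from sklearn.metrics import accuracy_score, precision_score, recall_score

accuracy_score(y_true=y_test, y_pred=y_predicted)   # fraction of correct predictions
precision_score(y_true=y_test, y_pred=y_predicted)  # TP / (TP + FP)
recall_score(y_true=y_test, y_pred=y_predicted)     # TP / (TP + FN)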
Training and evaluating regression models
Difference compared to classification:
Target variable: Numerical (quantities)
Model structure: a line or surface fitted closely to the data, not separating it into regions. Errors are numbers.
Key metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE).
Regression metrics: Code examples
# Mean absolute error; range: [0..+Inf)
from sklearn.metrics import mean_absolute_error
# Median absolute error; range: [0..+Inf)
from sklearn.metrics import median_absolute_error
# R^2 (coefficient of determination); at most 1 (perfect fit), can be negative for poor fits
from sklearn.metrics import r2_score
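A minimal usage sketch, assuming y_test and y_predicted come from a fitted regression model such as LinearRegression:
from sklearn.metrics import mean_absolute_error, median_absolute_error, r2_score

mean_absolute_error(y_true=y_test, y_pred=y_predicted)    # average absolute error
median_absolute_error(y_true=y_test, y_pred=y_predicted)  # more robust to outliers
r2_score(y_true=y_test, y_pred=y_predicted)               # proportion of variance explained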
Dimensionality reduction (DR)
Dimensionality reduction is the process of reducing the number of variables under consideration by obtaining a set of principal variables. It is used to prepare data for other Supervised or Unsupervised Learning algorithms, and so is a preprocessing step.
Pros: reduce overfitting, obtain independent features, lower computational intensity, enable visualization.
Cons: compression => loss of information => loss of performance.
Always check model performance before and after DR to decide whether the sacrifice is worth taking.
Types
Feature selection (B ⊆ A)
Selecting a subset of existing features, based on predictive power
Non-trivial problem: looking for the best “team of features”, not the individually best features!
Feature extraction (B = f(A)) - Transforming and combining existing features into new ones. Linear or non-linear projections.
Common algorithms
Linear (faster, deterministic)
Principal Component Analysis (PCA)
from sklearn.decomposition import PCA
Latent Dirichlet Allocation (LDA) - text mining
from sklearn.decomposition import LatentDirichletAllocation
Non-linear (slower, non-deterministic)
Isomap
from sklearn.manifold import Isomap
t-distributed Stochastic Neighbor Embedding (t-SNE)
from sklearn.manifold import TSNE
Principal Component Analysis (PCA)
Family: Linear methods.
Intuition:
Principal components are directions of highest variability in data.
Reduction = keeping only top #N principal components. Assumption: Normal distribution of data.
Caveat: Very sensitive to outliers.
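A minimal PCA sketch, assuming a numerical feature matrix X; n_components=2 is an arbitrary choice for illustration:
from sklearn.decomposition import PCA

# Keep only the top 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the original variance captured by each kept component
print(pca.explained_variance_ratio_)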
Clustering
Cluster = Group of entities or events sharing similar attributes.
Clustering (AI) = The process of applying Machine Learning algorithms for automatic discovery of clusters.
Popular clustering algorithms
KMeans clustering - number of clusters must be stated manually
from sklearn.cluster import KMeans
Spectral clustering - number of clusters must be stated manually
from sklearn.cluster import SpectralClustering
DBSCAN - number of clusters does NOT need to be stated manually
from sklearn.cluster import DBSCAN
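A minimal KMeans sketch, assuming a feature matrix X; n_clusters=3 is an arbitrary choice for illustration:
from sklearn.cluster import KMeans

# Group the observations into 3 clusters
algorithm = KMeans(n_clusters=3)
cluster_labels = algorithm.fit_predict(X)

# Coordinates of the discovered cluster centers
print(algorithm.cluster_centers_)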
Cluster analysis and tuning
Unsupervised (no “ground truth” , no expectations)
Variance Ratio Criterion: sklearn.metrics.calinski_harabasz_score
“What is the average distance of each point to the center of the cluster AND what is the distance between the clusters?”
Silhouette score: sklearn.metrics.silhouette_score “How close is each point to its own cluster VS how close it is to the others?”
Supervised (“ground truth”/expectations provided)
Mutual information (MI) criterion: sklearn.metrics.mutual_info_score
Homogeneity score: sklearn.metrics.homogeneity_score
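A short usage sketch for these scores, assuming X and cluster_labels from the KMeans snippet above and, for the supervised case, known ground-truth labels y_true:
from sklearn.metrics import calinski_harabasz_score, silhouette_score, homogeneity_score

# Unsupervised: no ground truth needed
calinski_harabasz_score(X, cluster_labels)  # higher = denser, better-separated clusters
silhouette_score(X, cluster_labels)         # range [-1..1], higher is better

# Supervised: compares discovered clusters against known labels
homogeneity_score(y_true, cluster_labels)   # 1.0 = each cluster contains a single class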
Anomaly detection
Detecting unusual entities or events. Hard to define what's odd, but possible to define what's normal.
Use cases: credit card fraud detection, network security monitoring, heart-rate monitoring.
Approaches:
Thresholding - For quantities that are fairly stable over time.
Rate of change - Fast changing values, include derivative of target value in the model
Shape monitoring - model behavior in terms of the expected succession of values over time.
Algorithms
Robust covariance (simple, fast, assumes normal distribution)
from sklearn.covariance import EllipticEnvelope
Isolation Forest (powerful, but more computationally demanding and very slow)
from sklearn.ensemble import IsolationForest
One-Class SVM (normality not required, sensitive to outliers, many false negatives)
from sklearn.svm import OneClassSVM
Training and testing
Example: Isolation Forest
from sklearn.ensemble import IsolationForest
algorithm = IsolationForest()
# Fit the model
algorithm.fit(X)
# Apply the model and detect the outliers
results = algorithm.predict(X)
Evaluation
from sklearn.metrics import confusion_matrix, precision_score, recall_score
confusion_matrix(y_true, y_predicted)
Precision = How many of the anomalies I have detected are TRUE anomalies?
Recall = How many of the TRUE anomalies I have managed to detect?
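A minimal evaluation sketch. IsolationForest.predict returns +1 for inliers and -1 for outliers, so the labels are mapped to 0/1 before scoring; y_true is assumed to hold the known anomaly labels (1 = anomaly):
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Map IsolationForest output (+1 inlier, -1 outlier) to binary labels (0 normal, 1 anomaly)
y_predicted = np.where(results == -1, 1, 0)

precision_score(y_true, y_predicted)  # detected anomalies that are true anomalies
recall_score(y_true, y_predicted)     # true anomalies that were detected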
Selecting the right model
Model-to-problem fit
Type of Learning
Target variable defined & known? => Supervised. Classification? Regression?
No target variable, exploration? => Unsupervised. Dimensionality Reduction? Clustering? Anomaly Detection?
Defining the priorities
Interpretable models: linear models (Linear, Logistic, Lasso, Ridge regression), Decision Trees
Well performing models: tree ensembles (Random Forests, Gradient Boosted Trees), Support Vector Machines, Artificial Neural Networks
Simplicity first!
Using multiple metrics
Satisfying metrics
Cut-off criteria that every candidate model needs to meet.
Multiple satisfying metrics possible (e.g. minimum accuracy, maximum execution time, etc)
Optimizing metrics
Reflects the ultimate business priority (e.g. “minimize false positives”, “maximize recall”)
“There can be only one”
Final model:
Passes the bar on all satisfying metrics and has the best score on the optimization metric.
Interpretation
Global
“What are the general decision-making rules of this model?”
Common approaches: decision tree visualization, feature importance plot
Local
“Why was this specific example classified in this way?”
LIME algorithm (Local Interpretable Model-Agnostic Explanations)
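A rough sketch of local interpretation with the lime package (not shown in the original notes); it assumes a fitted classifier model with predict_proba, a numerical training matrix X_train, and a single row X_test[0]. Exact argument names may vary by lime version:
from lime.lime_tabular import LimeTabularExplainer

# Build an explainer around the training data distribution
explainer = LimeTabularExplainer(X_train, mode='classification')

# Explain a single prediction: which features pushed the model towards its decision?
explanation = explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(explanation.as_list())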
Deep Learning & Beyond
Human neuron
Multiple dendrites (inbound signal paths)
Nucleus (the processing unit)
Single axon (outbound signal path)
Artificial neuron
Multiple inputs
Transfer and activation functions
Single output
The basic network structure
Input Layer
Hidden Layer
Output Layer
How do we make them?
# Import the necessary objects from TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Initialize the sequential model
model = Sequential()

# Add the HIDDEN and OUTPUT layers, specifying the input size and the activation functions
model.add(Dense(units=32, input_dim=64, activation='relu'))  # relu = REctified Linear Unit
model.add(Dense(units=3, activation='softmax'))

# Prepare the model for training (multi-class classification problem)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
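A possible next step (not shown in the original snippet): training the compiled model. y_train is assumed to be one-hot encoded with 3 classes to match the output layer; epochs and batch_size are arbitrary illustration values:
# Train the network on the training data
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Evaluate loss and accuracy on held-out data
model.evaluate(X_test, y_test)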
Deep Neural Networks: what are they?
Shallow networks: 2-3 layers
Deep Neural Networks: 4+ layers
Types of DNNs
Feedforward - Applications: general purpose. Weak spot: images, text, time series.
Recurrent - Applications: speech, text.
Convolutional - Applications: image/video, text.
Layers and layers (a sketch combining these layers follows the list)
1. Dense: tensorflow.keras.layers.Dense
Single-dimensional feature extraction, signal transformation.
2. Convolutional: tensorflow.keras.layers.Conv1D, Conv2D, …
Multi-dimensional, shift-invariant feature extraction, signal transformation.
3. Dropout: tensorflow.keras.layers.Dropout
Overfitting prevention by randomly turning off nodes.
4. Pooling/sub-sampling: tensorflow.keras.layers.MaxPooling1D, MaxPooling2D, …
Overfitting prevention by sub-sampling.
5. Flattening: tensorflow.keras.layers.Flatten
Converting multi-dimensional to single-dimensional signals.
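A minimal sketch combining the layer types above into a small image classifier; the input shape (28x28 grayscale) and all layer sizes are arbitrary illustration values:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# Convolutional feature extraction on 28x28 grayscale images
model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
# Sub-sampling to reduce dimensionality and overfitting
model.add(MaxPooling2D(pool_size=(2, 2)))
# Randomly turn off nodes during training to prevent overfitting
model.add(Dropout(0.25))
# Convert the multi-dimensional feature maps into a single-dimensional vector
model.add(Flatten())
# Dense output layer for a 10-class problem
model.add(Dense(units=10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])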
Convolutional Neural Networks
Convolution
Mathematical operation describing how signals are transformed by passing through systems of
different characteristics.
Inputs:
1. Input signal (video, audio…)
2. Transfer function of the processing system (lens, phone, tube…)
Result: The processed signal
Example: Simulating the “telephone voice”
Convolution(raw audio, telephone system transfer function)
The beauty of it all
Traditional Computer Vision:
Deterministic pre-processing and feature extraction, hard-coded by the Computer Vision engineer through hours and hours of experimentation with different approaches.
Computer Vision, the Deep Learning way:
Get tons of labelled images and let the algorithm find the optimal kernels on its own.
Kernels == feature extractors.
Downside: Very data “hungry”!