AI Fundamentals Flashcards

1
Q

AI Intro

A

AI systems can reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience. In short, they can learn, reason, act, and adapt.
Types
General/Strong AI - mimics human-like intelligence
Weak / Narrow AI - solutions designed to solve specific problems. Most narrow AI today is built with Machine Learning (ML).
Machine Learning (ML) - the process of applying computer algorithms to capture the behavioral patterns of systems and processes based on input and output data collected from them. ML algorithms improve in performance as they are exposed to more data over time.
Deep Learning - a subset of ML in which multilayered neural networks learn from vast amounts of data.

2
Q

AI Models and How to Build Them

A

A model is a simplified representation of a process or system. To build a model, you need to:

  1. Define the problem - state the pain point/objective clearly, state the benefits, and define success or failure.
  2. Collect data from process inputs and outputs.
  3. Configure and fit the model - specify the technical problem, select the model type, and choose the best algorithm for the dataset. Fitting involves performing optimization to obtain the best outcomes based on the available data and objectives.
  4. Use the model

Parameters and Hyperparameters
Model parameters or coefficients - these are learnt by the algorithm itself from the data.

Hyperparameters are defined prior to fitting. Setting hyperparameters is called model tuning, and it is a slow and costly optimization process.
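
A minimal sketch of the distinction, assuming scikit-learn's Ridge regression and a small synthetic dataset (both chosen here for illustration, not taken from the card). `alpha` is a hyperparameter fixed before fitting; `coef_` and `intercept_` are parameters learned from the data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative assumption): y = 3*x0 - 2*x1 + 1 plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=200)

# Hyperparameter: chosen by us BEFORE fitting
model = Ridge(alpha=1.0)

# Parameters: learned by the algorithm DURING fitting
model.fit(X, y)
learned_coefficients = model.coef_
learned_intercept = model.intercept_
print(learned_coefficients, learned_intercept)
```

The learned values should land close to the coefficients used to generate the data; changing `alpha` and refitting is one step of model tuning.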

Before using the model, check for overfitting and underfitting. Evaluate model performance using the holdout approach (i.e. train/test splitting).
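
One way to run this check, sketched on an assumed synthetic dataset with decision trees of two different depths (the dataset, the depths, and the dictionary keys are illustrative choices): compare the score on the training data with the score on the held-out test data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem with 10% randomly flipped labels (illustrative assumption)
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           n_classes=3, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

scores = {}
for name, max_depth in (("shallow", 1), ("deep", None)):
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    # (training accuracy, test accuracy)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(name, scores[name])
```

A low score on both sets (the shallow tree) points to underfitting; a near-perfect training score with a clearly lower test score (the unrestricted tree) points to overfitting.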

3
Q

Sample code - Simple digit recognition

A
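# Assumes X_train, X_test, y_train, y_test already exist (e.g. from a train/test split)
from sklearn.tree import DecisionTreeClassifier
# Select the model appropriate for the task
model = DecisionTreeClassifier()
# Train the model
model.fit(X=X_train, y=y_train)
# Generate predictions
prediction_results = model.predict(X=X_test)
# Test the model (evaluate_predictions is a placeholder for a metric such as sklearn.metrics.accuracy_score)
evaluate_predictions(y_true=y_test,
                     y_pred=prediction_results)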
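
A self-contained variant of the same steps, assuming scikit-learn's bundled `load_digits` dataset, a 60/40 split, and `accuracy_score` as the evaluation function (none of these are named on the card; they are filled in here so the snippet runs end to end).

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 8x8 grayscale digit images, flattened to 64 features per sample
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Select and train the model
model = DecisionTreeClassifier(random_state=0)
model.fit(X=X_train, y=y_train)

# Generate predictions and score them against the held-out labels
prediction_results = model.predict(X=X_test)
accuracy = accuracy_score(y_true=y_test, y_pred=prediction_results)
print(f"Test accuracy: {accuracy:.2f}")
```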
4
Q

Three flavors of Machine Learning

A

Supervised Learning - used to predict categories (classification) and quantities (regression) based on some input measurements.
Regression algorithms include Linear Regression, Ridge Regression, and ARIMA models (for time series).
Classification algorithms include Logistic Regression, Decision Tree Classifier, and Random Forest Classifier.
Unsupervised learning - Finding relationships and patterns in data. Used in:
Clustering - algorithms include K-Means
Anomaly detection - algorithms include Isolation Forest
Dimensionality reduction - algorithms include PCA
In Classification, the model learns existing groups, while in Clustering the model discovers groups on its own.

Reinforcement Learning - similar to learning by doing using reward and punishment.

5
Q

Supervised Learning Fundamentals

A

Common Classification Models:

  1. Decision Trees
  2. Logistic Regression
  3. Support Vector Machine
  4. Random Forest Classifier

Common Regression Models:

  1. Linear Regression
  2. Lasso Regression
  3. Ridge Regression
6
Q

Training and evaluating classification models

A
Use the train/test split method (aka the holdout method), here with 60% of the data for training and 40% for testing.
Code example:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

Model Training
Model setup:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()

Model fitting/training:
model.fit(X_train, y_train)

Testing on testing data:
model.predict(X=X_test)

Inspecting model outputs
y_predicted = model.predict(X_test)

Is y_predicted == y_true? (y_true is the held-out labels, i.e. y_test)
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_predicted)

Confusion Matrix
TRUE POSITIVE = the model predicts Yes and the reality is Yes.
TRUE NEGATIVE = the model predicts No and the reality is No.
FALSE POSITIVE = the model predicts Yes but the reality is No (Type I error).
FALSE NEGATIVE = the model predicts No but the reality is Yes (Type II error).

Accuracy, precision, recall
Metrics:
Accuracy: “How often did I make the correct diagnosis?” (= (TP + TN) / all cases)
Precision: “How often was I correct when I said a person has diabetes?” (= TP / (TP + FP); it falls as false positives, i.e. Type I errors, increase)
Recall: “What percentage of actual diabetes cases did my model detect?” (= TP / (TP + FN) = 1 - Type II error rate)
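
The three metrics can be computed directly from the confusion matrix counts. A small sketch with made-up labels (1 = has diabetes, 0 = does not); the label values are illustrative only.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Made-up ground truth and predictions for ten patients
y_true      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_predicted).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 8 / 10
precision = tp / (tp + fp)                   # 3 / 4
recall = tp / (tp + fn)                      # 3 / 4
print(accuracy, precision, recall)

# The same values via the scikit-learn helpers
print(accuracy_score(y_true, y_predicted),
      precision_score(y_true, y_predicted),
      recall_score(y_true, y_predicted))
```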

7
Q

Training and evaluating regression models

A

Differences compared to classification:
Target variable: Numerical (quantities)
Model structure: a line or surface fitted closely to the data, not separating it into regions. Errors are numbers.
Key metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE).

Regression metrics: Code examples
# Mean absolute error; range: [0..+Inf), lower is better
from sklearn.metrics import mean_absolute_error

# Median absolute error; range: [0..+Inf), lower is better
from sklearn.metrics import median_absolute_error

# R^2 (coefficient of determination); range: (-Inf..1], higher is better
from sklearn.metrics import r2_score
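
A short worked example of these metrics on made-up numbers (the arrays are illustrative). RMSE is computed here as the square root of `mean_squared_error`, which is one common way to obtain it.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])   # absolute errors: 1, 0, 1, 3

mae = mean_absolute_error(y_true, y_pred)            # (1 + 0 + 1 + 3) / 4 = 1.25
medae = median_absolute_error(y_true, y_pred)        # median of 0, 1, 1, 3 = 1.0
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # sqrt(11 / 4)
r2 = r2_score(y_true, y_pred)                        # 1 - 11 / 20 = 0.45

# R^2 turns negative when predictions are worse than always predicting the mean
r2_bad = r2_score(y_true, y_true[::-1])              # 1 - 80 / 20 = -3.0
print(mae, medae, rmse, r2, r2_bad)
```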

8
Q

Dimensionality reduction (DR)

A

Dimensionality reduction is the process of reducing the number of variables under consideration by obtaining a set of principal variables. It is used to prepare data for other Supervised or Unsupervised Learning algorithms, so it is a preprocessing step.

Pros
Reduce overfitting
Obtain independent features
Lower computational intensity
Enable visualization
Cons
Compression => Loss of information => loss of performance

Always check model performance before and after DR to decide whether the sacrifice is worth taking.

Types
Feature selection (B ⊂ A)
Selecting a subset of existing features, based on predictive power
Non-trivial problem: looking for the best “team of features”, not the individually best features!

Feature extraction (B ? A)
Transforming and combining existing features into new ones. 
Linear or non-linear projections.
Common algorithms
Linear (faster, deterministic)
Principal Component Analysis (PCA)
from sklearn.decomposition import PCA

Latent Dirichlet Allocation (LDA) - text mining
from sklearn.decomposition import LatentDirichletAllocation

Non-linear (slower, non-deterministic)
Isomap
from sklearn.manifold import Isomap

t-distributed Stochastic Neighbor Embedding (t-SNE)
from sklearn.manifold import TSNE

Principal Component Analysis (PCA)
Family: Linear methods.
Intuition:
Principal components are directions of highest variability in data.
Reduction = keeping only the top N principal components.
Assumption: Normal distribution of data.
Caveat: Very sensitive to outliers.
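
A minimal PCA sketch, assuming scikit-learn's bundled digits dataset and an arbitrary choice of 10 components (both are illustrative). `explained_variance_ratio_` shows how much of the original variability each kept component retains.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 features per sample

# Keep only the top 10 principal components
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

# Share of the total variance retained by the kept components
retained = pca.explained_variance_ratio_.sum()
print(X.shape, "->", X_reduced.shape, f"variance retained: {retained:.2f}")
```

The gap between `retained` and 1.0 is the information lost to compression, which is why model performance should be compared before and after the reduction.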

9
Q

Clustering

A

Cluster = Group of entities or events sharing similar attributes.
Clustering (AI) = The process of applying Machine Learning algorithms for automatic discovery of clusters.

Popular clustering algorithms
KMeans clustering - number of clusters stated manually
from sklearn.cluster import KMeans

Spectral clustering - number of clusters stated manually
from sklearn.cluster import SpectralClustering

DBSCAN - number of clusters NOT stated manually
from sklearn.cluster import DBSCAN

Cluster analysis and tuning
Unsupervised (no “ground truth” , no expectations)
Variance Ratio Criterion: sklearn.metrics.calinski_harabasz_score
“What is the average distance of each point to the center of the cluster AND what is the distance between the clusters?”

Silhouette score: sklearn.metrics.silhouette_score
“How close is each point to its own cluster VS how close it is to the others?”

Supervised (“ground truth”/expectations provided)
Mutual information (MI) criterion: sklearn.metrics.mutual_info_score
Homogeneity score: sklearn.metrics.homogeneity_score
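
A sketch of how these scores might be used for tuning, assuming synthetic blobs with three hand-placed centers (an illustrative setup): the unsupervised scores pick the number of clusters, and the homogeneity score then compares the result with the known labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, homogeneity_score,
                             silhouette_score)

# Three well-separated synthetic clusters with known "ground truth" labels
X, y_true = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [-6, 6]],
                       cluster_std=0.6, random_state=0)

# Unsupervised scores for several candidate cluster counts
internal_scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    internal_scores[k] = (silhouette_score(X, labels),
                          calinski_harabasz_score(X, labels))

# Pick the cluster count with the highest silhouette score
best_k = max(internal_scores, key=lambda k: internal_scores[k][0])

# Supervised check against the known labels
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
homogeneity = homogeneity_score(y_true, labels)
print(best_k, internal_scores[best_k], homogeneity)
```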

10
Q

Anomaly detection

A
Detecting unusual entities or events. 
Hard to define what's odd, but possible to define what's normal. 
Use cases 
Credit card fraud detection 
Network security monitoring 
Heart-rate monitoring

Approaches:
Thresholding - For quantities that are fairly stable over time.

Rate of change - for fast-changing values; include the derivative of the target value in the model.

Shape monitoring - model behavior in terms of the expected succession of values over time.
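
The first two approaches can be sketched with NumPy on made-up heart-rate readings. The normal band (50 to 150) and the maximum allowed jump (30) are illustrative assumptions, not recommended values.

```python
import numpy as np

# Made-up heart-rate readings (beats per minute), one per minute
heart_rate = np.array([72, 74, 73, 75, 74, 110, 76, 75, 190, 74])

# Thresholding: flag readings outside a fixed "normal" band
low, high = 50, 150
threshold_anomalies = np.flatnonzero((heart_rate < low) | (heart_rate > high))

# Rate of change: flag readings that jump too far from the previous one
max_jump = 30
jumps = np.abs(np.diff(heart_rate))
rate_anomalies = np.flatnonzero(jumps > max_jump) + 1   # +1: index of the later reading

print(threshold_anomalies)   # [8]
print(rate_anomalies)        # [5 6 8 9]
```

Thresholding misses the spike to 110 because it stays inside the band, while the rate-of-change rule catches it; the latter also flags the readings where the signal returns to normal.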

Algorithms
Robust covariance (simple, fast, assumes normal distribution)
from sklearn.covariance import EllipticEnvelope 
Isolation Forest (powerful, but more computationally demanding, very slow)
from sklearn.ensemble import IsolationForest 

One-Class SVM (normality not required, sensitive to outliers, many false negatives)
from sklearn.svm import OneClassSVM

Training and testing
Example: Isolation Forest
from sklearn.ensemble import IsolationForest

algorithm = IsolationForest()

# Fit the model
algorithm.fit(X)

# Apply the model and detect the outliers (predict returns -1 for outliers, 1 for inliers)
results = algorithm.predict(X)

Evaluation
from sklearn.metrics import confusion_matrix, precision_score, recall_score

confusion_matrix(y_true, y_predicted)
Precision = How many of the anomalies I have detected are TRUE anomalies?
Recall = How many of the TRUE anomalies I have managed to detect?
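
An end-to-end sketch of training and evaluation, assuming synthetic data in which the anomalies are known in advance (a normal cloud plus ten points placed far away; all values are illustrative). Because `predict` returns -1/+1, the output is mapped to 0/1 labels before computing the metrics.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# 200 "normal" points around the origin plus 10 anomalies on a circle of radius 8
rng = np.random.default_rng(0)
normal_points = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
angles = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
anomalies = 8.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([normal_points, anomalies])
y_true = np.array([0] * 200 + [1] * 10)   # 1 = anomaly

# contamination = expected share of anomalies (known here, usually a guess)
algorithm = IsolationForest(contamination=10 / 210, random_state=0)
algorithm.fit(X)

# Map predict() output (-1 = outlier, +1 = inlier) to 1 = anomaly, 0 = normal
y_predicted = (algorithm.predict(X) == -1).astype(int)

print(confusion_matrix(y_true, y_predicted))
precision = precision_score(y_true, y_predicted)
recall = recall_score(y_true, y_predicted)
print(precision, recall)
```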

11
Q

Selecting the right model

A

Model-to-problem fit
Type of Learning
Target variable defined & known? => Supervised.
Classification?
Regression?

No target variable, exploration? => Unsupervised.
Dimensionality Reduction?
Clustering?
Anomaly Detection?

Defining the priorities
Interpretable models
Linear models (Linear, Logistic, Lasso, Ridge Regression)
Decision Trees

Well performing models
Tree ensembles (Random Forests, Gradient Boosted Trees)
Support Vector Machines
Artificial Neural Networks

Simplicity first!

Using multiple metrics
Satisfying metrics
Cut-off criteria that every candidate model needs to meet.
Multiple satisfying metrics possible (e.g. minimum accuracy, maximum execution time, etc)

Optimizing metrics
Illustrates the ultimate business priority (e.g. “minimize false positives” , “maximize recall”)
“There can be only one”

Final model:
Passes the bar on all satisfying metrics and has the best score on the optimization metric.
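
The selection rule can be sketched in plain Python. The candidate names, their scores, and the cut-off values below are made up for illustration; recall plays the role of the single optimizing metric.

```python
# Candidate models with made-up evaluation results (illustrative numbers only)
candidates = [
    {"name": "logistic_regression", "accuracy": 0.86, "latency_ms": 2, "recall": 0.71},
    {"name": "random_forest", "accuracy": 0.91, "latency_ms": 15, "recall": 0.80},
    {"name": "neural_network", "accuracy": 0.93, "latency_ms": 80, "recall": 0.84},
    {"name": "decision_tree", "accuracy": 0.79, "latency_ms": 1, "recall": 0.75},
]

def passes_cutoffs(candidate):
    """Satisfying metrics: every candidate must meet all of these."""
    return candidate["accuracy"] >= 0.85 and candidate["latency_ms"] <= 50

# Step 1: drop candidates that miss any cut-off
eligible = [c for c in candidates if passes_cutoffs(c)]

# Step 2: among the rest, take the best score on the one optimizing metric
final_model = max(eligible, key=lambda c: c["recall"])
print(final_model["name"])   # random_forest
```

The neural network has the best recall overall but is excluded by the latency cut-off, so the random forest is selected.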

Interpretation
Global
"What are the general decision-making rules of this model?" 
Common approaches: 
Decision tree visualization 
Feature importance plot
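
A sketch of the numbers behind a feature importance plot, assuming a Random Forest trained on scikit-learn's bundled iris dataset (an illustrative choice). Plotting is left out; the ranked values are what such a plot would display.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Impurity-based importances, one per feature, summing to 1
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name:20s} {importance:.3f}")
```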

Local
“Why was this specific example classified in this way?”
LIME algorithm (Local Interpretable Model-Agnostic Explanations)

12
Q

Deep Learning & Beyond

A

Human neuron
Multiple dendrites (inbound signal paths)
Nucleus (the processing unit)
Single axon (outbound signal path)

Artificial neuron
Multiple inputs
Transfer and activation functions
Single output
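
A single artificial neuron can be sketched in a few lines of NumPy. This assumes a weighted sum as the transfer function and ReLU as the activation; the input values, weights, and bias are made up.

```python
import numpy as np

def relu(z):
    """Activation function: pass positive values, clip negatives to zero."""
    return np.maximum(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    z = np.dot(weights, inputs) + bias   # transfer function: weighted sum of inputs
    return activation(z)                 # activation function: single output

inputs = np.array([0.5, -1.0, 2.0])     # multiple inputs
weights = np.array([0.4, 0.3, 0.1])     # made-up weights
bias = 0.1

output = neuron(inputs, weights, bias)  # 0.2 - 0.3 + 0.2 + 0.1 = 0.2
print(output)
```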

The basic network structure
Input Layer
Hidden Layer
Output Layer

How do we make them?
# Import the necessary objects from Tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Initialize the sequential model
model = Sequential()
# Add the HIDDEN and OUTPUT layers, specify the input size and the activation function
model.add(Dense(units=32, input_dim=64, activation='relu'))  # relu = REctified Linear Unit
model.add(Dense(units=3, activation='softmax'))
# Prepare the model for training (multi-class classification problem)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Deep Neural Networks: what are they?
Shallow networks:
2-3 layers
Deep Neural Networks
4+ layers

Types of DNNs
Feedforward - Applications: General purpose. Weak spot: Images, text, time-series.

Recurrent - Applications: Speech, Text

Convolutional - Applications: Image/Video, Text

Layers and layers
1. Dense: tensorflow.keras.layers.Dense
Single-dimensional feature extraction, signal transformation.
2. Convolutional: tensorflow.keras.layers.Conv1D, Conv2D, …
Multi-dimensional, shift-invariant feature extraction, signal transformation.
3. Dropout: tensorflow.keras.layers.Dropout
Overfitting prevention by randomly turning off nodes.
4. Pooling/sub-sampling: tensorflow.keras.layers.MaxPooling1D, MaxPooling2D, …
Overfitting prevention by sub-sampling.
5. Flattening: tensorflow.keras.layers.Flatten
Converting multi-dimensional to single-dimensional signals.

Convolutional Neural Networks
Convolution
Mathematical operation describing how signals are transformed by passing through systems of different characteristics.
Inputs:
1. Input signal (video, audio…)
2. Transfer function of the processing system (lens, phone, tube…)
Result: The processed signal
Example: Simulating the “telephone voice”
Convolution(raw audio, telephone system transfer function)
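
A tiny numeric illustration using `numpy.convolve`. The "system" here is an assumed 3-point moving average, described by its impulse response (the time-domain counterpart of a transfer function); the signal values are made up.

```python
import numpy as np

# Input signal: a single spike followed by a step up to 4
signal = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, 4.0])

# Impulse response of a made-up system: a 3-point moving average
kernel = np.array([1.0, 1.0, 1.0]) / 3.0

# The processed signal: each output is the average of a sample and its neighbours
processed = np.convolve(signal, kernel, mode="same")
print(processed)
```

The spike is smeared over three samples and the step edge is softened, which is the kind of transformation a kernel in a convolutional layer applies, except that there the kernel values are learned.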

The beauty of it all
Traditional Computer Vision:
Deterministic pre-processing and feature extraction, hard-coded by the Computer Vision engineer through hours and hours of experimentation with different approaches.
Computer Vision, the Deep Learning Way:
Get tons of labelled images and let the algorithm find the optimal kernels on its own.
Kernels == feature extractors.
Downside: Very data “hungry”!
