part 1 Flashcards
AI define
Machines performing jobs mimicking human behaviour
ML define
Foundation of an AI system, learns and predicts like a human
Machines that get better without explicit programming
DL define
Machines that have an artificial NN inspired by the human brain to solve complex problems
Data scientist define
Person with multi-disciplinary skills in maths, stats, predictive modelling and ML to make future predictions
Describe onion diagram of AI, ML, DL
AI contains ML which contains DL
Anomaly detection
Detects outliers or things out of place like a human
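A minimal sketch of one simple way to flag outliers, using z-scores (distance from the mean in standard deviations). The function name, threshold and sensor readings are made up for illustration; real anomaly detection services use more robust methods.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing is out of place
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical temperature readings with one value clearly out of place
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2]
outliers = find_outliers(readings)
print(outliers)
```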
Computer vision
be able to see like a human
NLP
Be able to process human languages and infer context
Conversational AI
be able to hold a conversation with a human
What is a dataset
Logical grouping of units of data that are closely related and/or share the same data structure
MNIST
Images of handwritten digits used to test classification, clustering and image processing algorithms e.g. computer vision ML models
COCO (common objects in context) dataset
Dataset of images of common objects, annotated in a JSON file (COCO format) that identifies objects or segments within each image
- features object segmentation, recognition in context, superpixel stuff segmentation
Azure has a data labelling service which can export in coco format
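A rough sketch of what a COCO-format annotation file looks like and how to read it. The file name, ids and box values are invented; real files carry extra fields (e.g. segmentation, area, iscrowd) that are left out here. Each bbox is [x, y, width, height] in pixels.

```python
import json

# Hypothetical, heavily trimmed COCO-style annotation file
coco_json = """
{
  "images": [{"id": 1, "file_name": "street.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bicycle"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 120, 50, 150]},
    {"id": 11, "image_id": 1, "category_id": 2, "bbox": [300, 200, 120, 80]}
  ]
}
"""

coco = json.loads(coco_json)

# Map category ids to names, then list the labels attached to image 1
category_names = {c["id"]: c["name"] for c in coco["categories"]}
labels = [
    category_names[a["category_id"]]
    for a in coco["annotations"]
    if a["image_id"] == 1
]
print(labels)
```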
Data labeling
Identifying raw data and adding one or more meaningful and informative labels to provide context so an ML model can learn
data labelling - supervised
Labels are a prerequisite to produce training data. Each piece generally labelled by a human
data labelling - unsupervised
Labels produced by machine, might not be human readable
ground truth
Properly labelled dataset used as objective standard to train and assess the model. Accuracy of trained model depends on accuracy of ground truth
Supervised learning
Data that has been labelled for training.
Task-driven - make a prediction
When the labels are known and you want a precise outcome. You need a specific value returned e.g. Classification, Regression
Unsupervised learning
Data has not been labelled, ML model needs to do its own labelling
Data-driven - recognise a structure or pattern
When labels not known and outcome doesn’t need to be precise.
Trying to make sense of data.
e.g. Clustering, dimensionality reduction, association
Reinforcement learning
No labelled data: there is an environment, and the ML model generates its own data over many attempts to reach a goal
Decisions-driven - Game AI, Learning Tasks, Robot Navigation
Neural network
Mimicking the brain. Node/neuron represents an algorithm
Data inputted into neuron and based on output, data passed to one of many other connected neurons.
Connections are weighted.
Network is organised into layers
Input layer, many hidden, and an output
How many layers for a NN to be called deep learning
3+
Feed Forward (FNN)
Neural networks where connections between nodes don’t form a cycle (always moving forward)
Back propagation
Moves backwards through the neural network adjusting weights to improve next iteration’s performance. How the Neural net learns.
Loss function
Function comparing ground truth to prediction to determine error rate. Calculated at the end of the forward pass; the result is then back propagated.
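A minimal sketch of the loss and back propagation cards together, assuming a single "neuron" with one weight (prediction = w * x) and mean squared error as the loss. A real network repeats the same idea across many weights and layers; the data and learning rate are made up.

```python
def mse_loss(w, xs, ys):
    """Loss: mean squared difference between prediction w*x and ground truth y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w, xs, ys):
    """Derivative of the loss with respect to the weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # ground truth follows y = 2x
w = 0.0                # start with an untrained weight
learning_rate = 0.05   # assumed step size

losses = []
for _ in range(50):
    losses.append(mse_loss(w, xs, ys))             # forward pass, then loss
    w -= learning_rate * mse_gradient(w, xs, ys)   # adjust weight against the error

print(round(w, 4), losses[0], losses[-1])
```

Each iteration lowers the loss, and the weight moves towards 2, the value that reproduces the ground truth.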
Activation functions
Function applied to a hidden layer node's output that affects what is passed to connected nodes (e.g. ReLU). Its derivative is used during back propagation
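A small sketch of two common activation functions applied to a single node. ReLU is named on the card; sigmoid and the helper name `neuron` are added here purely for illustration.

```python
import math

def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid: squash any value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through the activation."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(z)

# Weighted sum is 1.0*0.5 + 2.0*(-1.0) + 0.5 = -1.0
relu_out = neuron([1.0, 2.0], [0.5, -1.0], 0.5, relu)
sigmoid_out = neuron([1.0, 2.0], [0.5, -1.0], 0.5, sigmoid)
print(relu_out, sigmoid_out)
```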
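Dense
Layer in which every node connects to every node in the next layer (fully connected). Sometimes described loosely as the next layer increasing the number of nodes
Sparse
Layer in which only some node-to-node connections exist. Sometimes described loosely as the next layer decreasing the number of nodes
(dimensionality reduction is when nodes decrease from one layer to the next)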
GPU
Graphics processing unit, specifically designed to render high-res images and video concurrently
Can perform parallel operations on multiple sets of data - also used for non-graphics tasks e.g. ML and scientific computation
CPU cores vs GPU cores
CPU - 4-16 processor cores on average
GPU - thousands of processor cores
CUDA (compute unified device architecture)
Parallel computing platform and API by NVIDIA allowing developers to use CUDA-enabled GPUs for general purpose computing on GPUs (GPGPU)
NVIDIA
Company manufactures GPUs for gaming and professional markets
Major deep learning frameworks are integrated with
NVIDIA deep learning SDK - collection of NVIDIA libraries for deep learning, e.g. cuDNN (CUDA deep neural network library)
ML Pipeline stages
Data labelling -> supervised learning so model can learn by example
Feature engineering -> translate data to format ML models can understand
Training -> multiple iterations, getting smarter
Hyperparameter tuning -> Try different parameters to optimise outcome
Serving -> So model is accessible, host in VM or container
Inference -> sending a request to the model to make a prediction, e.g. real-time endpoint (for one request) or batch processing (slower, but could also be near real time)
Forecasting
Future prediction with relevant data: analysis of trends, not ‘guessing’
Prediction
Make future prediction without relevant data: uses statistics to predict future outcomes, more ‘guessing’, uses decision theory
Performance/Evaluation metrics
Used to evaluate ML algorithms:
Different types of metrics for different problems
- Classification metrics (accuracy, precision, recall, F1-score, ROC, AUC)
- Regression metrics (MSE, RMSE, MAE)
- Ranking metrics
- Statistical metrics (correlation)
- Computer vision metrics
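A short sketch of how four of the classification metrics above are computed from true/false positive and negative counts. The counts are invented, and the function assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that were right
    precision = tp / (tp + fp)                   # of predicted positives, how many were right
    recall = tp / (tp + fn)                      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```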
JupyterLab
Next-generation web interface for Jupyter, intended to eventually replace the classic Jupyter Notebook interface
Regression
Process of finding a function that maps a labelled dataset (supervised) to a continuous variable/number, e.g. what will the temperature be
Regression error
Distance of a data point (vector) from the regression line. Measured with MSE, RMSE, MAE; the fitted line is then used to predict future values
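A minimal sketch of the three regression error measures, assuming we already have actual values and the model's predictions. The temperatures are made up.

```python
import math

def regression_errors(actual, predicted):
    """Return MSE, RMSE and MAE for paired actual/predicted values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = math.sqrt(mse)                             # root mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    return mse, rmse, mae

# Hypothetical temperatures: what happened vs what the model predicted
actual = [20.0, 22.0, 25.0, 19.0]
predicted = [21.0, 20.0, 25.0, 22.0]
mse, rmse, mae = regression_errors(actual, predicted)
print(mse, rmse, mae)
```

MSE and RMSE penalise large misses more heavily than MAE because the errors are squared before averaging.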
Classification
Finding a function to divide a labelled dataset into classes/categories e.g. what weather category will it be. (supervised)
Classification line
Divides dataset with one side being one category, another being another category
Classification algorithms
Logistic regression, decision tree/random forest, neural networks, naive bayes, k nearest neighbours, SVM
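A rough sketch of one algorithm from the list, k nearest neighbours: a new point takes the most common label among its k closest labelled points. The weather data (temperature, humidity) and the function name are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(training, point, k=3):
    """Classify point by majority vote of its k nearest labelled neighbours."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (temperature in C, humidity in %) -> weather category
training = [
    ((30.0, 10.0), "sunny"),
    ((28.0, 20.0), "sunny"),
    ((27.0, 15.0), "sunny"),
    ((15.0, 90.0), "rainy"),
    ((12.0, 85.0), "rainy"),
    ((14.0, 80.0), "rainy"),
]

hot_dry = knn_predict(training, (29.0, 12.0))
cool_humid = knn_predict(training, (13.0, 88.0))
print(hot_dry, cool_humid)
```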
Clustering
Process of grouping unlabelled data based on similarities/differences (unsupervised)
e.g. K-means, K-medoids, Density-based, Hierarchical
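A minimal sketch of K-means on one-dimensional data: assign each point to its nearest centroid, move each centroid to the mean of its points, repeat. The points and starting centroids are made up, and real implementations handle many dimensions and smarter initialisation.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group unlabelled 1-D points around the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]   # no labels supplied
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```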
Confusion Matrix
Visualise model predictions vs ground truth labels (actual). Aka error matrix.
Top labels: predicted no, predicted yes
Side labels: actual no, actual yes
Size of confusion matrix
Number of categories x number of categories (ground truth x prediction), e.g. 2 x 2 for a yes/no classifier
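A small sketch of building a confusion matrix by counting, with rows as actual labels and columns as predicted labels (matching the side/top layout on the card above). The label lists are invented; with two categories the result is a 2 x 2 grid.

```python
def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical ground truth vs model predictions
actual    = ["no", "no",  "yes", "yes", "yes", "no"]
predicted = ["no", "yes", "yes", "no",  "yes", "no"]

matrix = confusion_matrix(actual, predicted, labels=["no", "yes"])
for label, row in zip(["actual no ", "actual yes"], matrix):
    print(label, row)
```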
use cases for anomaly detection
data cleaning
intrusion detection + fraud detection
systems health monitoring
sensor networks event detection
ecosystem disturbances
ML is more efficient + more accurate than detecting anomalies by hand
computer vision DL algorithms
CNN - image + video recognition inspired by how eyes process info + send to brain
Recurrent NN (RNN) - handwriting/speech recognition
Types of computer vision
image classification
object detection
semantic segmentation (identify segments + objects by drawing pixel mask) - good for objects in movement
image analysis - analyse image/video to apply descriptive + context labels
optical character recognition
facial detection
iOS app built with Azure Computer Vision
Seeing AI, developed for iOS: uses the device camera to identify people + objects, and the device audibly describes them for the visually impaired
Azure computer vision service offering
‘Computer Vision’ - analyse image/videos + extract description, tags, objects, text
‘Custom vision’ - Custom image classification + object detection models using own images
Face - Detect + identify people and emotions in images
Form Recognizer - translate scanned docs into key/value or tabular editable data
NLP
ML that understands the context of a corpus (a body of related text), enabling you to:
- analyse/interpret text in docs/emails
- interpret + contextualise spoken tokens e.g. sentiment analysis
- synthesise speech
- automatically translate
- interpret spoken or written commands + determine appropriate actions
NLP Azure service offering
Text analytics - sentiment analysis, key phrase extraction, identify language, entity recognition
Translator - real-time translation
Speech - transcribe into searchable text
LUIS (Language Understanding) - NLP service for understanding human language in your own application
Conversational AI
Tech that can participate in conversations w humans: chatbots, voice assistants, Interactive Voice Recognition Systems
Conversational AI use cases
Online customer support
Accessibility e.g. visually impaired
HR Processes - employee training
Healthcare
IoT
Software e.g. autocomplete search
Conversational AI Azure services
QnA Maker - create conversational q and a bot from knowledge base
Azure Bot Service - deploys the bot created with QnA maker. Intelligent serverless bot service scaling on demand. For creating/publishing/managing bots.
Responsible AI
ethical, transparent + accountable use of AI
Microsoft AI principles
Fairness
Reliability + Safety
Privacy + Security
Inclusiveness
Transparency
Accountability
Principle: Fairness
AI systems should treat all people fairly
Bias can be introduced during pipeline development, reinforcing societal stereotypes
E.g. systems dealing w opportunities/resources/info in criminal justice/employment/finance
Azure ML can show how each feature influences a model’s prediction, which helps reveal bias
Principle: Reliability + Safety
Should perform reliably + safely. Rigorous testing needed to ensure it works as expected before release to end users, with shortcomings reported to the user.
Critical safety importance: Autonomous vehicle, AI health diagnosis, autonomous weapons
Principle: Privacy + Security
Nature of ML model may require personally identifiable information
Ensure data protected so no leaking/disclosing.
Some cases, model can be run on user’s device avoiding vulnerability
Principle Privacy + Security: AI security principles to detect malicious actors
data origin and lineage, data use internal vs external, data corruption considerations, anomaly detection
Principle: Inclusiveness
If AI solutions can be designed for the minority of users, they can be designed for the majority, e.g. considering physical ability, gender, sexual orientation, ethnicity etc
Principle: Transparency
AI systems should be understandable. Interpretability/intelligibility is when end users can understand the AI’s behaviour.
Transparency: mitigates unfairness, helps debug systems, gains user trust
Be open about why AI is being used + its limitations
Open source AI framework can help
Principle: Accountability
Structures put in place to consistently enact the AI principles + take them into account.
AI should work within framework of governance + organisational principles, ethical + legal standards clearly defined.
Principles guide Microsoft on how they develop, sell + advocate when working w 3rd parties, and on pushing for regulation in line with the AI principles