03-Computer Vision Flashcards
What is Computer Vision
AI that can “see” the world and make sense of it
What is Image Classification
Train ML model to classify images based on their contents.
What is Object Detection
Train ML model to classify individual objects within an image and identify their location with a bounding box
What is Semantic Segmentation
Advanced ML technique in which INDIVIDUAL PIXELS in the image are CLASSIFIED according to the object to which they belong
What is Image analysis
Combine ML models with advanced image analysis techniques to extract information from images, including “tags” that could help catalog the image or even descriptive captions that summarize the scene shown in the image
What is Face detection, analysis, and recognition
Specialized form of object detection that locates human faces in an image. It can be combined with classification and facial geometry analysis techniques to infer details such as age and emotional state, and even to recognize individuals based on their facial features
Optical character recognition (OCR)
Technique to detect and read text in images.
What 4 things does Cognitive Services include
Cognitive Services includes
- Decision
- Language
- Speech
- Vision
What are 4 Decision services
- Anomaly Detector
- Content Moderator
- Metrics Advisor (Preview)
- Personalizer
What is Anomaly Detector
Identify potential problems early on
What is Content Moderator
Detect potentially offensive or unwanted content
What is Metrics Advisor
Monitor metrics and diagnose issues
What is Personalizer
Create rich, personalized experiences for every user
What are 5 Language services
- Immersive Reader
- Language Understanding
- QnA Maker
- Text Analytics
- Translator
What are 4 Speech services
- Speech to Text
- Text to Speech
- Speech Translation
- Speaker Recognition (Preview)
What is Immersive Reader
Helps readers of all abilities comprehend text using audio and visual cues
What is Language Understanding
Build natural language understanding into apps, bots, and IoT devices
What is QnA Maker
Create a conversational question and answer layer over your data
What is Text Analytics
Detect sentiment, key phrases, and named entities
What is Translator
Detect and translate more than 90 supported languages
What is Speech to Text
Transcribe audible speech into readable, searchable text
What is Text to Speech
Convert text to life-like speech for more natural interfaces
What is Speech Translation
Integrate real-time speech translation into your apps
What is Speaker Recognition
Identify and verify the people speaking based on audio
What are 5 Vision services
- Computer Vision
- Custom Vision
- Face
- Form Recognizer
- Video Indexer
What is Computer Vision (the Azure service)
Analyze content in images and video
What is Custom Vision
Customize image recognition to fit your business needs
What is Face
Detect and identify people and emotions in images
What is Form Recognizer
Extract text, key-value pairs, and tables from documents
What is Video Indexer
Analyze the visual and audio channels of a video, and index its content
What two pieces of information do you need to use a cognitive service
- Key to authenticate client applications
- Endpoint that provides the HTTP address at which your resource can be accessed
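A minimal sketch of how a client might combine the key and the endpoint, assuming the REST convention of sending the key in an `Ocp-Apim-Subscription-Key` header. The resource name, the API path, and the helper `build_request` are illustrative assumptions, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_request(endpoint: str, key: str, path: str, body: bytes) -> Request:
    """Join the resource endpoint with an API path and attach the key as a header."""
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed header name for the key
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder endpoint, key, and path: replace them with your own resource values.
req = build_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "vision/v3.2/analyze?visualFeatures=Tags",
    b'{"url": "https://example.com/photo.jpg"}',
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Keeping the key out of source code (for example in an environment variable) is the usual practice.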
2 ways to upload and label training images and train a model
- Custom Vision portal
- Custom Vision service programming language-specific software development kits (SDKs) - for programmers
What do Client Application developers need to use your model
- Project ID: Unique ID of Custom Vision project you created to train the model
- Model Name: Name you assigned to the model during publishing
- Prediction endpoint: HTTP address of the endpoint for the prediction resource to which you published the model (not the training resource)
- Prediction key: authentication key for the prediction resource to which you published the model (not the training resource)
What is Face detection
Face detection involves identifying regions of an image that contain a human face, typically by returning bounding box coordinates that form a rectangle around the face
What is Facial analysis
Uses algorithms to return information such as facial landmarks (nose, eyes, eyebrows, lips, etc.). These landmarks can be used as features to train an ML model from which you can infer information about a person, such as their age or perceived emotional state
What is Facial recognition
Identify known individuals from their facial features
Uses for facial detection, analysis, and recognition
- Security
- Social media
- Intelligent monitoring
- Advertising
- Missing persons
- Identity validation
How to improve accuracy of detection in images
- Image format should be JPEG, PNG, GIF, or BMP
- File size should be 4 MB or smaller
- Face size should range from 36 x 36 to 4096 x 4096 pixels. Smaller or larger faces will not be detected
- Other issues, such as extreme face angles or occlusion (objects blocking the face, such as sunglasses or a hand), can also impair detection. Best results are obtained when the faces are full-frontal or as near as possible to full-frontal
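The format and size limits above can be checked before an image is submitted. The sketch below is a hypothetical pre-flight check, not part of any SDK. The helper names are invented for illustration, and it assumes "4 MB" means 4 x 1024 x 1024 bytes.

```python
from pathlib import PurePath

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: 4 MB counted in binary units
MIN_FACE_PX, MAX_FACE_PX = 36, 4096


def image_is_acceptable(filename: str, size_bytes: int) -> bool:
    """Check the file extension and the file size against the stated limits."""
    extension = PurePath(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_BYTES


def face_is_detectable(width_px: int, height_px: int) -> bool:
    """Check that a face region falls inside the 36 x 36 to 4096 x 4096 range."""
    return (MIN_FACE_PX <= width_px <= MAX_FACE_PX
            and MIN_FACE_PX <= height_px <= MAX_FACE_PX)


print(image_is_acceptable("portrait.JPG", 1_000_000))  # True
print(face_is_detectable(20, 40))                      # False
```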
How to improve detection using video feeds
- Smoothing - turn it off because the potential blur between frames tends to reduce clarity of the image in individual frames
- Shutter speed - faster speed improves clarity of the images in each frame because the motion is reduced
- Shutter angle - use a lower shutter angle to produce clearer frames, resulting in better clarity for recognition
What happens when you intersect computer vision with natural language processing
Computer systems get the ability to process written or printed text. Computer vision is used to “read” the text, and natural language processing is used to make sense of it
What is optical character recognition (OCR)
Model trained to recognize individual shapes as letters, numerals, punctuation, or other elements of text.
How is OCR beneficial
- Note taking
- Digitizing forms, such as medical records or historical documents
- Scanning printed or handwritten checks for bank deposits
What is Form Recognizer
Form processing capabilities that you can use to automate the processing of data in documents such as forms, invoices, and receipts. Combines optical character recognition (OCR) with predictive models that can interpret form data by:
- Matching field names to values
- Processing tables of data
- Identifying specific types of field, such as dates, telephone numbers, addresses, totals, and others
How does Form Recognizer support automated document processing
- Custom models
- A pre-built receipt model
What are Custom models
Custom models enable you to extract key/value pairs and table data from forms. Custom models are trained using your own data, which helps to tailor this model to your specific forms.
What is the pre-built receipt model
Model is provided out-of-the-box.
Trained to recognize and extract data from sales receipts
What is an image to an AI application
Just an array of pixel values. These numerical values can be used as FEATURES to train ML models that make predictions about the image and its contents
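As a small illustration of pixels used as features, the sketch below flattens a tiny made-up grayscale image into one feature vector. The image values and the helper `to_features` are invented for this example. Real images are far larger and usually have three color channels.

```python
# A tiny 2 x 3 grayscale "image": each value is a pixel intensity from 0 to 255.
image = [
    [0, 128, 255],
    [64, 192, 32],
]


def to_features(pixels):
    """Flatten the rows into one feature vector scaled to the range 0 to 1."""
    return [value / 255 for row in pixels for value in row]


features = to_features(image)
print(len(features))  # 6 features, one per pixel
```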
Which two specialized domain models does Computer Vision service support
- Celebrities - service includes a model that has been trained to identify thousands of well-known celebrities
- Landmarks - service identifies famous landmarks
What other capabilities does Computer Vision provide
- Detect image types, e.g. clip art images or line drawings
- Detect image color schemes - specifically, identify the dominant foreground, background, and overall colors in an image
- Generate thumbnails - create small versions of images
- Moderate content - detect images that contain adult content or depict violent, gory scenes
What are most modern image classification solutions based on
Deep learning techniques that make use of convolutional neural networks (CNNs) to uncover patterns in the pixels that correspond to particular classes
Do you need to know deep learning techniques to train and publish your model as a software service
No, because the Custom Vision cognitive service encapsulates common techniques used to train image classification models
What are some potential uses for image classification
- Product identification - perform visual searches for specific products in online searches or even, in-store using a mobile device
- Disaster investigation - evaluate key infrastructure for major disaster preparation efforts, e.g. aerial surveillance images may show bridges, and the model can classify them as such
- Medical diagnosis - evaluate images from X-ray or MRI devices to quickly classify specific issues, such as cancerous tumors or many other conditions found through medical imaging.
What is Precision
What percentage of the class predictions made by the model are correct? If the model predicts that 10 images are oranges, and 8 of them are actually oranges, then the precision is 0.8 (80%)
What is Recall
What percentage of the actual members of a class did the model correctly identify? For example, if there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%)
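The orange and apple examples can be reproduced with the standard definitions, written in terms of true positives (TP), false positives (FP), and false negatives (FN). The function names are illustrative only.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Of everything predicted as the class, the fraction that was right."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Of everything that really is the class, the fraction that was found."""
    return true_positives / (true_positives + false_negatives)


# Oranges: 10 predicted, 8 correct, so TP = 8 and FP = 2.
orange_precision = precision(8, 2)
# Apples: 10 actual, 7 found, so TP = 7 and FN = 3.
apple_recall = recall(7, 3)
print(orange_precision, apple_recall)  # 0.8 0.7
```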
What is Average Precision (AP)
Overall metric that takes into account both precision and recall
What do client application developers need to use your classification model
- Project ID: the unique ID of the Custom Vision project you created to train the model
- Model name: the name you assigned to the model during publishing
- Prediction endpoint: the HTTP address of the endpoint for the prediction resource to which you published the model (not the training resource)
- Prediction key: the authentication key for the prediction resource to which you published the model (not the training resource)
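A sketch of how the four values might fit together in a REST call. The URL pattern (`customvision/v3.0/Prediction/.../classify/iterations/.../url`) and the `Prediction-Key` header are assumptions about the Custom Vision prediction REST API and should be checked against the current API reference. The helper name and all values are placeholders.

```python
def prediction_request(endpoint, project_id, model_name, prediction_key):
    """Build the URL and headers for classifying an image by URL (assumed pattern)."""
    url = (
        f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
        f"{project_id}/classify/iterations/{model_name}/url"
    )
    headers = {
        "Prediction-Key": prediction_key,  # key of the prediction resource
        "Content-Type": "application/json",
    }
    return url, headers


# Placeholder values: use the ones from your own published model.
url, headers = prediction_request(
    "https://example-prediction.cognitiveservices.azure.com/",
    "00000000-0000-0000-0000-000000000000",
    "fruit-classifier",
    "<prediction-key>",
)
print(url)
```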
What is Image classification
A machine learning based form of computer vision in which a model is trained to categorize images based on the primary subject matter they contain.
What is Object detection
Goes further than image classification to classify individual objects within the image, and to return the coordinates of a bounding box that indicates the object’s location
What are some sample applications of object detection
- Evaluate the safety of a building by looking for fire extinguishers or other emergency equipment
- Create software for self-driving cars or vehicles with lane assist capabilities
- Analyze medical imaging, such as MRI scans or X-rays, to detect known objects for medical diagnosis
What is smart tagging
It suggests classes and bounding boxes for images you add to the training dataset
Mean Average Precision (mAP)
Overall metric that takes into account both precision and recall across all classes in object detection
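As a rough sketch, mAP is the mean of the per-class average precision (AP) values. The class names and AP scores below are made up for illustration, and computing each AP from a precision-recall curve is not shown.

```python
def mean_average_precision(ap_by_class: dict) -> float:
    """Average the per-class AP values into a single mAP score."""
    return sum(ap_by_class.values()) / len(ap_by_class)


# Hypothetical per-class AP scores for a three-class object detector.
ap_scores = {"apple": 0.5, "banana": 0.75, "orange": 1.0}
map_score = mean_average_precision(ap_scores)
print(map_score)  # 0.75
```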
What are uses of face detection and analysis
- Security - facial recognition can be used in building security applications, and increasingly it is used in smartphone operating systems for unlocking devices
- Social media - automatically tag known friends in photographs
- Advertising - help direct advertisement to an appropriate demographic audience
- Missing persons - identify if a missing person is in the image frame
- Identity validation - useful at port-of-entry kiosks where a person holds a special entry permit
What functions does Face support
- Face detection
- Face verification
- Find similar faces
- Group faces based on similarities
- Identify people
What attributes can Face return
- Age
- Blur
- Emotion
- Exposure
- Facial hair
- Glasses
- Hair
- Head pose
- Makeup
- Noise
- Occlusion
- Smile
What is machine reading comprehension (MRC)
AI system that not only reads text characters, but also uses a semantic model to interpret what the text is about
What are uses of optical character recognition (OCR) technologies
- note taking
- digitizing forms, such as medical records or historical documents
- scanning printed or handwritten checks for bank deposits
What is OCR API good for
Quick extraction of small amounts of text in images. Operates synchronously to provide immediate results that can recognize text in numerous languages
What does the OCR API return when processing an image
- Regions in the image that contain text
- Lines of text in each region
- Words in each line of text
Also returns bounding box coordinates that define a rectangle to indicate the location in the image where the region, line, or word appears
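The region, line, and word hierarchy can be walked with nested loops. The sample response below is hand-written for illustration. Its field names (`regions`, `lines`, `words`, `text`, and a `boundingBox` string of the form "left,top,width,height") are assumptions about the JSON shape and should be checked against the API reference.

```python
# Hand-written sample shaped like an OCR result (illustrative, not real output).
sample = {
    "regions": [
        {
            "boundingBox": "20,25,180,60",
            "lines": [
                {
                    "boundingBox": "20,25,180,25",
                    "words": [
                        {"boundingBox": "20,25,80,25", "text": "Hello"},
                        {"boundingBox": "110,25,90,25", "text": "world"},
                    ],
                },
                {
                    "boundingBox": "20,60,150,25",
                    "words": [
                        {"boundingBox": "20,60,70,25", "text": "from"},
                        {"boundingBox": "100,60,70,25", "text": "OCR"},
                    ],
                },
            ],
        }
    ]
}


def extract_lines(result):
    """Join the words of every line in every region into plain strings."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


def parse_box(box):
    """Split a 'left,top,width,height' string into a tuple of integers."""
    left, top, width, height = (int(value) for value in box.split(","))
    return left, top, width, height


print(extract_lines(sample))  # ['Hello world', 'from OCR']
```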
What is Read API
Better suited than the OCR API when an image is text-dominant, a case in which the OCR API can produce false positives.
Uses the latest recognition models and is optimized for images that have a lot of text or a lot of visual noise
What 3-step process must your application follow to use the Read API
- Submit image to API and retrieve operation ID in response
- Use operation ID to check on the status of the image analysis operation, and wait until it has completed
- Retrieve the results of the operation
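The three steps form a submit, poll, and retrieve pattern. The sketch below runs against a hypothetical in-memory stand-in (`FakeReadService`) instead of the real service, so the method names, status strings, and result shape are all assumptions made for illustration.

```python
import time


class FakeReadService:
    """Hypothetical stand-in for the Read API, used only to show the flow."""

    def __init__(self, polls_until_done=2):
        self._remaining = polls_until_done

    def submit(self, image_url):
        return "operation-123"  # step 1 yields an operation ID

    def get_status(self, operation_id):
        if self._remaining > 0:
            self._remaining -= 1
            return "running"
        return "succeeded"

    def get_results(self, operation_id):
        return {"pages": [{"lines": [{"text": "Hello world"}]}]}


def read_text(service, image_url, poll_interval=0.0, max_polls=10):
    """Submit the image, wait for the operation to finish, then fetch results."""
    operation_id = service.submit(image_url)              # step 1: submit
    for _ in range(max_polls):                            # step 2: poll status
        if service.get_status(operation_id) == "succeeded":
            return service.get_results(operation_id)      # step 3: retrieve
        time.sleep(poll_interval)
    raise TimeoutError("Read operation did not finish in time")


result = read_text(FakeReadService(), "https://example.com/scan.png")
print(result["pages"][0]["lines"][0]["text"])  # Hello world
```

A real client would use a poll interval of a second or so and would also handle a failed status. Both are omitted here to keep the flow visible.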
How are results from the Read API arranged
Into a hierarchy
- Pages - one for each page of text, including information about the page size and orientation
- Lines - the lines of text on a page
- Words - words in a line of text
Each line and word includes bounding box coordinates indicating its position on the page
What does the Form Recognizer in Azure provide
Intelligent form processing capabilities that you can use to automate the processing of data in documents such as forms, invoices, and receipts
How does Form Recognizer support automated document processing
- Pre-built receipt model - provided out-of-the-box and is trained to recognize and extract data from sales receipts
- Custom models - extract key/value pairs and table data from forms. Custom models are trained using your own data, which helps to tailor this model to your specific forms. Starting with only 5 samples of your forms, you can train the custom model. After the first training exercise, you can evaluate the results and consider if you need to add more samples and re-train.