03-Computer Vision Flashcards

1
Q

What is Computer Vision

A

Area of AI in which software systems “see” the world and make sense of it

2
Q

What is Image Classification

A

Train an ML model to classify images based on their contents.

3
Q

What is Object Detection

A

Train an ML model to classify individual objects within an image and identify their location with a bounding box

4
Q

What is Semantic Segmentation

A

Advanced ML technique in which INDIVIDUAL PIXELS in the image are CLASSIFIED according to the object to which they belong
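As a minimal sketch of the idea, assume a segmentation model has already produced a score for every class at every pixel (the scores and class names below are made up). Giving each pixel the class with the highest score yields a per-pixel label map, which is what "classifying individual pixels" amounts to. The `label_pixels` helper is hypothetical.

```python
from typing import Dict, List

# scores[class_name][row][col] is a (made-up) model score for that class at that pixel.
Scores = Dict[str, List[List[float]]]

def label_pixels(scores: Scores) -> List[List[str]]:
    """Assign every pixel the class with the highest score (a per-pixel argmax)."""
    classes = list(scores)
    rows = len(scores[classes[0]])
    cols = len(scores[classes[0]][0])
    return [
        [max(classes, key=lambda c: scores[c][r][col]) for col in range(cols)]
        for r in range(rows)
    ]

# Hypothetical scores for a tiny 2 x 3 image.
example_scores: Scores = {
    "background": [[0.9, 0.2, 0.1], [0.8, 0.3, 0.7]],
    "car":        [[0.1, 0.8, 0.9], [0.2, 0.7, 0.3]],
}

mask = label_pixels(example_scores)
for row in mask:
    print(row)
```

Each output row lists the class chosen for each pixel, so the "car" pixels trace the object's shape rather than a bounding box.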

5
Q

What is Image analysis

A

Combines ML models with advanced image analysis techniques to extract information from images, including “tags” that could help catalog the image, or even descriptive captions that summarize the scene shown in the image

6
Q

What is Face detection, analysis, and recognition

A

Specialized form of object detection that locates human faces in an image. It can be combined with classification and facial geometry analysis techniques to infer details such as age and emotional state, and even to recognize individuals based on their facial features

7
Q

What is Optical character recognition (OCR)

A

Technique to detect and read text in images.

8
Q

What 4 categories of services does Cognitive Services include

A

Cognitive Services includes

  1. Decision
  2. Language
  3. Speech
  4. Vision
9
Q

What are 4 Decision services

A
  1. Anomaly Detector
  2. Content Moderator
  3. Metrics Advisor (Preview)
  4. Personalizer
10
Q

What is Anomaly Detector

A

Identify potential problems early on by detecting anomalies in time-series data

11
Q

What is Content Moderator

A

Detect potentially offensive or unwanted content

12
Q

What is Metrics Advisor

A

Monitor metrics and diagnose issues

13
Q

What is Personalizer

A

Create rich, personalized experiences for every user

14
Q

What are 5 Language services

A
  1. Immersive Reader
  2. Language Understanding
  3. QnA Maker
  4. Text Analytics
  5. Translator
15
Q

What are 4 Speech services

A
  1. Speech to Text
  2. Text to Speech
  3. Speech Translation
  4. Speaker Recognition (Preview)
16
Q

What is Immersive Reader

A

Helps readers of all abilities comprehend text using audio and visual cues

17
Q

What is Language Understanding

A

Build natural language understanding into apps, bots, and IoT devices

18
Q

What is QnA Maker

A

Create a conversational question and answer layer over your data

19
Q

What is Text Analytics

A

Detect sentiment, key phrases, and named entities

20
Q

What is Translator

A

Detect and translate text in more than 90 supported languages

21
Q

What is Speech to Text

A

Transcribe audible speech into readable, searchable text

22
Q

What is Text to Speech

A

Convert text to life-like speech for more natural interfaces

23
Q

What is Speech Translation

A

Integrate real-time speech translation into your apps

24
Q

What is Speaker Recognition

A

Identify and verify the people speaking based on audio

25
Q

What are 5 Vision services

A
  1. Computer Vision
  2. Custom Vision
  3. Face
  4. Form Recognizer
  5. Video Indexer
26
Q

What is Computer Vision

A

Analyze content in images and video

27
Q

What is Custom Vision

A

Customize image recognition to fit your business needs

28
Q

What is Face

A

Detect and identify people and emotions in images

29
Q

What is Form Recognizer

A

Extract text, key-value pairs, and tables from documents

30
Q

What is Video Indexer

A

Analyze the visual and audio channels of a video, and index its content

31
Q

What two pieces of information do you need to use a cognitive service

A
  1. Key to authenticate client applications
  2. Endpoint that provides the HTTP address at which your resource can be accessed
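A minimal sketch of how a client application typically combines the two values when calling a service over REST: the endpoint forms the base of the request URL, and the key travels in the `Ocp-Apim-Subscription-Key` HTTP header. The resource name, key, and the `vision/v3.2/analyze` route are placeholders assumed for illustration; the request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder values: the real ones come from your resource's keys and endpoint settings.
ENDPOINT = "https://my-resource.cognitiveservices.azure.com"  # hypothetical resource name
KEY = "<your-key>"

def build_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a request to an assumed Computer Vision analyze route."""
    url = endpoint.rstrip("/") + "/vision/v3.2/analyze?visualFeatures=Tags"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,  # the key authenticates the client
            "Content-Type": "application/json",
        },
    )

req = build_request(ENDPOINT, KEY, "https://example.com/photo.jpg")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; keeping construction separate makes it easy to see which part comes from the endpoint and which from the key.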

32
Q

What are 2 places to label training images and train a model

A
  1. Custom Vision portal
  2. Custom Vision service programming language-specific software development kits (SDKs) - for programmers

33
Q

What do Client Application developers need to use your model

A
  1. Project ID: Unique ID of the Custom Vision project you created to train the model
  2. Model Name: Name you assigned to the model during publishing
  3. Prediction endpoint: HTTP address of the endpoint for the prediction resource to which you published the model (not the training resource)
  4. Prediction key: Authentication key for the prediction resource to which you published the model (not the training resource)
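A minimal sketch of how those four values fit together in one request, assuming the Custom Vision Prediction REST API v3.0 route for classifying an uploaded image and its `Prediction-Key` header. Every value below is a placeholder, and the route and version should be checked against the current API reference before relying on them.

```python
import urllib.request

def build_prediction_request(
    prediction_endpoint: str,
    project_id: str,
    model_name: str,
    prediction_key: str,
    image_bytes: bytes,
) -> urllib.request.Request:
    """Build (but do not send) an image-classification request for a published model."""
    # Assumed v3.0 route: project ID and published model name are both part of the path.
    url = (
        f"{prediction_endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
        f"{project_id}/classify/iterations/{model_name}/image"
    )
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Prediction-Key": prediction_key,  # key of the prediction resource, not training
            "Content-Type": "application/octet-stream",
        },
    )

req = build_prediction_request(
    "https://my-prediction.cognitiveservices.azure.com",  # hypothetical endpoint
    "00000000-0000-0000-0000-000000000000",              # hypothetical project ID
    "fruit-classifier",                                   # hypothetical model name
    "<prediction-key>",
    b"...image bytes...",
)
print(req.full_url)
```

For an object detection model the assumed route uses `detect` in place of `classify`; the other three values are used the same way.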
34
Q

What is Face detection

A

Face detection involves identifying regions of an image that contain a human face, typically by returning bounding box coordinates that form a rectangle around the face
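The Face service reports each detected face's box as a `faceRectangle` with `top`, `left`, `width`, and `height` in pixels. A small hypothetical helper that converts that into the rectangle's two corner coordinates, shown with a made-up detection result:

```python
from typing import Dict, Tuple

def face_corners(face_rectangle: Dict[str, int]) -> Tuple[int, int, int, int]:
    """Convert a top/left/width/height box into (x1, y1, x2, y2) corner coordinates."""
    left = face_rectangle["left"]
    top = face_rectangle["top"]
    return (left, top, left + face_rectangle["width"], top + face_rectangle["height"])

# Made-up detection result in the shape described above.
detected_face = {"faceRectangle": {"top": 120, "left": 80, "width": 64, "height": 72}}
print(face_corners(detected_face["faceRectangle"]))  # (80, 120, 144, 192)
```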

35
Q

What is Facial analysis

A

Uses algorithms to return information such as facial landmarks (nose, eyes, eyebrows, lips, etc.). These landmarks can be used to train an ML model from which you can infer information about a person, such as their age or perceived emotional state

36
Q

What is Facial recognition

A

Identify known individuals from their facial features

37
Q

Uses for facial detection, analysis, and recognition

A
  1. Security
  2. Social media
  3. Intelligent monitoring
  4. Advertising
  5. Missing persons
  6. Identity validation
38
Q

How to improve accuracy of detection in images

A
  1. Image format should be JPEG, PNG, GIF, or BMP
  2. File size should be 4 MB or smaller
  3. Face size should range from 36 x 36 to 4096 x 4096 pixels. Smaller or larger faces will not be detected
  4. Avoid other issues such as extreme face angles and occlusion (objects blocking the face, such as sunglasses or a hand). Best results are obtained when faces are full-frontal or as near as possible to full-frontal
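The format, file-size, and face-size guidelines in this card can be checked before an image is submitted. This is a hypothetical pre-flight helper, not part of any SDK; it treats 4 MB as 4 × 1024 × 1024 bytes (an assumption) and cannot check angles or occlusion.

```python
from typing import List

ALLOWED_FORMATS = {"jpeg", "jpg", "png", "gif", "bmp"}
MAX_FILE_BYTES = 4 * 1024 * 1024  # "4 MB or smaller" (assumed binary megabytes)
MIN_FACE_PX, MAX_FACE_PX = 36, 4096

def preflight_problems(extension: str, file_bytes: int, face_w: int, face_h: int) -> List[str]:
    """Return reasons the image is unlikely to give good face detection results."""
    problems = []
    if extension.lower().lstrip(".") not in ALLOWED_FORMATS:
        problems.append("unsupported format")
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 4 MB")
    for side in (face_w, face_h):
        if not MIN_FACE_PX <= side <= MAX_FACE_PX:
            problems.append("face size outside 36-4096 px")
            break
    return problems

print(preflight_problems(".JPG", 2_000_000, 120, 140))  # []
print(preflight_problems("tiff", 5_000_000, 20, 20))    # all three problems
```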
39
Q

How to improve detection using video feeds

A
  1. Smoothing - turn it off, because the potential blur between frames tends to reduce the clarity of the image in individual frames
  2. Shutter speed - a faster shutter speed improves the clarity of the images in each frame because motion blur is reduced
  3. Shutter angle - use a lower shutter angle to produce clearer frames, resulting in better clarity for recognition
40
Q

What happens when you intersect computer vision with natural language processing

A

Computer systems gain the ability to process written or printed text: computer vision is used to “read” the text, and natural language processing to make sense of it

41
Q

What is optical character recognition (OCR)

A

Model trained to recognize individual shapes as letters, numerals, punctuation, or other elements of text.

42
Q

How is OCR beneficial

A
  1. Note taking
  2. Digitizing forms, such as medical records or historical documents
  3. Scanning printed or handwritten checks for bank deposits
43
Q

What is Form Recognizer

A

Form processing capabilities that you can use to automate the processing of data in documents such as forms, invoices, and receipts. Combines optical character recognition (OCR) with predictive models that can interpret form data by

  1. Matching field names to values
  2. Processing tables of data
  3. Identifying specific types of fields, such as dates, telephone numbers, addresses, totals, and others
44
Q

How does Form Recognizer support automated document processing

A
  1. Custom models
  2. A pre-built receipt model

45
Q

What are Custom models

A

Custom models enable you to extract key/value pairs and table data from forms. Custom models are trained using your own data, which helps to tailor this model to your specific forms.

46
Q

What is pre-built receipt model

A

Model that is provided out-of-the-box and trained to recognize and extract data from sales receipts

47
Q

What is an image to an AI application

A

Just an array of pixel values. These numerical values can be used as FEATURES to train ML models that make predictions about the image and its contents
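To make this concrete, here is a tiny made-up 3 x 3 grayscale image (0 = black, 255 = white) stored as a nested list. Flattening the rows gives a feature vector of nine numbers, and a simple statistic such as mean brightness is already a usable feature. Real images are larger and usually have three color channels, but the idea is the same.

```python
from typing import List

# A made-up 3 x 3 grayscale image: each value is one pixel's brightness (0-255).
image: List[List[int]] = [
    [0, 0, 255],
    [0, 255, 255],
    [255, 255, 255],
]

# Flatten row by row into a single feature vector for an ML model.
features: List[int] = [pixel for row in image for pixel in row]

# A simple derived feature: average brightness, scaled to the range 0-1.
mean_brightness = sum(features) / (len(features) * 255)

print(features)                   # [0, 0, 255, 0, 255, 255, 255, 255, 255]
print(round(mean_brightness, 3))  # 0.667
```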

48
Q

Which two specialized domain models does the Computer Vision service support

A
  1. Celebrities - service includes a model that has been trained to identify thousands of well-known celebrities
  2. Landmarks - service identifies famous landmarks
49
Q

What other capabilities does Computer Vision provide

A
  1. Detect image types, e.g. clip art images or line drawings
  2. Detect image color schemes - specifically, identify the dominant foreground, background, and overall colors in an image
  3. Generate thumbnails - create small versions of images
  4. Moderate content - detect images that contain adult content or depict violent, gory scenes
50
Q

What are most modern image classification solutions based on

A

Deep learning techniques that make use of convolutional neural networks (CNNs) to uncover patterns in the pixels that correspond to particular classes
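The core operation a CNN layer performs is a convolution: a small filter (kernel) slides over the pixel array and responds strongly wherever the pixels match the filter's pattern. The sketch below applies a hand-picked vertical-edge kernel in pure Python. In a real CNN the kernel values are learned during training rather than chosen by hand, and many filters are stacked in layers. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]

def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (valid positions only, stride 1) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Made-up 4 x 4 image: dark left half, bright right half (a vertical edge).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked kernel that responds to dark-to-bright transitions from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

print(convolve2d(image, vertical_edge))  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The strongest response lands exactly on the column where dark pixels meet bright ones, which is the kind of pattern a trained network can then associate with particular classes.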

51
Q

Do you need to know deep learning techniques to train and publish your model as a software service

A

Nope, because the Custom Vision cognitive service encapsulates common techniques used to train image classification models

52
Q

What are some potential uses for image classification

A
  1. Product identification - perform visual searches for specific products in online searches, or even in-store using a mobile device
  2. Disaster investigation - evaluate key infrastructure for major disaster preparation efforts, e.g. aerial surveillance images may show bridges, which the model classifies as such
  3. Medical diagnosis - evaluate images from X-ray or MRI devices to quickly classify specific issues found, such as cancerous tumors or many other medical conditions related to medical imaging diagnosis
53
Q

What is Precision

A

What percentage of the class predictions made by the model are correct? For example, if the model predicts that 10 images are oranges, of which eight were actually oranges, then the precision is 0.8 (80%)

54
Q

What is Recall

A

What percentage of the actual members of a class did the model correctly identify? For example, if there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%)
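The arithmetic in the precision and recall examples, as a short sketch: precision divides correct predictions of a class by all predictions of that class (true positives over true positives plus false positives), while recall divides them by all actual members of that class (true positives over true positives plus false negatives).

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the model's predictions for a class that were correct."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the actual members of a class that the model found."""
    return true_positives / (true_positives + false_negatives)

# 10 images predicted as oranges, 8 of them really oranges: 2 false positives.
print(precision(true_positives=8, false_positives=2))  # 0.8
# 10 actual apples, model found 7: 3 missed (false negatives).
print(recall(true_positives=7, false_negatives=3))     # 0.7
```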

55
Q

What is Average Precision (AP)

A

Overall metric that takes into account both precision and recall
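One common way to compute AP (an assumption here, since the card does not say which variant the service uses) is to rank the model's predictions for a class by confidence, record the precision at each rank where a correct prediction appears, and divide the sum by the number of actual positives. Each recorded precision corresponds to a step up in recall, so the score is high only when the model is both precise and complete.

```python
from typing import List, Tuple

def average_precision(predictions: List[Tuple[float, bool]], total_positives: int) -> float:
    """Non-interpolated AP for one class.

    predictions: (confidence, is_correct) pairs for every prediction of the class.
    total_positives: number of actual members of the class in the test set.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this point in the ranking
    return precision_sum / total_positives if total_positives else 0.0

# Made-up predictions: correct, wrong, correct; 3 real items exist (one was missed).
preds = [(0.95, True), (0.80, False), (0.60, True)]
print(round(average_precision(preds, total_positives=3), 3))  # 0.556
```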

56
Q

What do client application developers need to use your classification model

A
  1. Project ID - unique ID of the Custom Vision project you created to train the model
  2. Model name: the name you assigned to the model during publishing
  3. Prediction endpoint: the HTTP address of the endpoint for the prediction resource to which you published the model (not the training resource)
  4. Prediction key: the authentication key for the prediction resource to which you published the model (not the training resource)
57
Q

What is Image classification

A

ML based on computer vision in which a model is trained to categorize images based on the primary subject matter they contain.

58
Q

What is Object detection

A

Goes further than image classification to classify individual objects within the image, and to return the coordinates of a bounding box that indicates the object’s location

59
Q

What are some sample applications of object detection

A
  1. Evaluate the safety of a building by looking for fire extinguishers or other emergency equipment
  2. Create software for self-driving cars or vehicles with lane assist capabilities
  3. Medical imaging, such as MRI scans or X-rays, to detect known objects for medical diagnosis
60
Q

What is smart tagging

A

It suggests classes and bounding boxes for images you add to the training dataset

61
Q

What is Mean Average Precision (mAP)

A

Overall metric that takes into account both precision and recall across all classes in object detection
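Given an AP value for each class (see the Average Precision card), mAP is simply the mean of those values across all classes the detector handles. The class names and AP values below are made up.

```python
from typing import Dict

def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """Average the per-class AP values into one score for the whole detector."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Made-up per-class AP values for an object detection model.
ap_by_class = {"apple": 0.9, "banana": 0.7, "orange": 0.8}
print(round(mean_average_precision(ap_by_class), 3))  # 0.8
```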

62
Q

What are some uses of face detection and analysis

A
  1. Security - facial recognition can be used in building security applications, and increasingly it is used in smartphone operating systems for unlocking devices
  2. Social media - automatically tag known friends in photographs
  3. Advertising - help direct advertisements to an appropriate demographic audience
  4. Missing persons - identify whether a missing person is in the image frame
  5. Identity validation - port-of-entry kiosks where a person holds a special entry permit
63
Q

What functions does Face support

A
  1. Face detection
  2. Face verification
  3. Find similar faces
  4. Group faces based on similarities
  5. Identify people
64
Q

What attributes can Face return

A
  1. Age
  2. Blur
  3. Emotion
  4. Exposure
  5. Facial hair
  6. Glasses
  7. Hair
  8. Head pose
  9. Makeup
  10. Noise
  11. Occlusion
  12. Smile
65
Q

What is machine reading comprehension (MRC)

A

AI system not only reads text characters, but uses a semantic model to interpret what the text is about

66
Q

What are uses of optical character recognition (OCR) technologies

A
  1. Note taking
  2. Digitizing forms, such as medical records or historical documents
  3. Scanning printed or handwritten checks for bank deposits
67
Q

What is OCR API good for

A

Quick extraction of small amounts of text in images. Operates synchronously to provide immediate results, and can recognize text in numerous languages

68
Q

What does the OCR API return when processing an image

A
  1. Regions in the image that contain text
  2. Lines of text in each region
  3. Words in each line of text

Also returns bounding box coordinates that define a rectangle to indicate the location in the image where the region, line, or word appears
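A sketch of walking the region, line, and word hierarchy to recover each line's text and position. The response is hand-written for illustration and assumes the legacy OCR API shape, in which each `boundingBox` is a comma-separated string of left, top, width, and height; check the API reference for the exact schema.

```python
from typing import Dict, List, Tuple

def extract_lines(ocr_result: Dict) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (line_text, (left, top, width, height)) for every line in every region."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            text = " ".join(word["text"] for word in line.get("words", []))
            box = tuple(int(v) for v in line["boundingBox"].split(","))
            lines.append((text, box))
    return lines

# Hand-written example in the assumed OCR API response shape.
sample = {
    "regions": [
        {
            "boundingBox": "10,10,200,40",
            "lines": [
                {
                    "boundingBox": "10,10,200,20",
                    "words": [
                        {"boundingBox": "10,10,90,20", "text": "Total:"},
                        {"boundingBox": "110,10,100,20", "text": "$12.50"},
                    ],
                }
            ],
        }
    ]
}

print(extract_lines(sample))  # [('Total: $12.50', (10, 10, 200, 20))]
```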

69
Q

What is Read API

A

An improvement over the OCR API, which can have issues with false positives when the image is considered text-dominant.
Uses the latest recognition models and is optimized for images that have a lot of text or a lot of visual noise

70
Q

What 3-step process must your application follow to use the Read API

A
  1. Submit image to API and retrieve operation ID in response
  2. Use operation ID to check on the status of the image analysis operation, and wait until it has completed
  3. Retrieve the results of the operation
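The three steps amount to submit, poll, then read. The sketch shows only that control flow: `submit` and `get_result` are hypothetical stand-ins for the real HTTP calls (in the REST API the operation ID is returned via the `Operation-Location` response header, and the status moves through values such as `notStarted`, `running`, `succeeded`, and `failed`). A fake service is used so the example runs offline.

```python
import time
from typing import Callable, Dict

def read_text(
    submit: Callable[[bytes], str],
    get_result: Callable[[str], Dict],
    image: bytes,
    poll_seconds: float = 1.0,
    max_polls: int = 30,
) -> Dict:
    """Submit an image, poll the operation until it finishes, and return the result."""
    operation_id = submit(image)              # step 1: submit and get an operation ID
    for _ in range(max_polls):
        result = get_result(operation_id)     # step 2: check the operation status
        status = result.get("status")
        if status == "succeeded":
            return result                     # step 3: retrieve the results
        if status == "failed":
            raise RuntimeError(f"Read operation {operation_id} failed")
        time.sleep(poll_seconds)              # still notStarted/running: wait and retry
    raise TimeoutError(f"Read operation {operation_id} did not finish in time")

# Fake service for illustration: reports "running" twice, then "succeeded".
_statuses = iter(["running", "running", "succeeded"])

def fake_submit(image: bytes) -> str:
    return "op-123"

def fake_get_result(operation_id: str) -> Dict:
    return {"status": next(_statuses)}

result = read_text(fake_submit, fake_get_result, b"...", poll_seconds=0.0)
print(result["status"])  # succeeded
```

Capping the number of polls keeps a stuck operation from blocking the application forever.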
71
Q

How are results from the Read API arranged

A

Into a hierarchy

  1. Pages - one for each page of text, including information about the page size and orientation
  2. Lines - the lines of text on a page
  3. Words - words in a line of text

Each line and word includes bounding box coordinates indicating its position on the page
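A sketch of walking the pages, lines, and words hierarchy. The sample is hand-written and assumes the Read API v3 result shape (`analyzeResult.readResults` with one entry per page, each holding `lines` that contain `words`, and bounding boxes given as eight numbers, the x, y pairs of four corners); check the API reference for the exact schema.

```python
from typing import Dict, List

def page_texts(read_result: Dict) -> List[List[str]]:
    """Return, for each page, the list of its line texts."""
    return [
        [line["text"] for line in page.get("lines", [])]
        for page in read_result["analyzeResult"]["readResults"]
    ]

def word_count(read_result: Dict) -> int:
    """Count the words across all lines on all pages."""
    return sum(
        len(line.get("words", []))
        for page in read_result["analyzeResult"]["readResults"]
        for line in page.get("lines", [])
    )

# Hand-written example in the assumed Read API result shape.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1, "angle": 0, "width": 800, "height": 600, "unit": "pixel",
                "lines": [
                    {
                        "text": "Hello world",
                        "boundingBox": [10, 10, 120, 10, 120, 30, 10, 30],
                        "words": [
                            {"text": "Hello", "boundingBox": [10, 10, 60, 10, 60, 30, 10, 30]},
                            {"text": "world", "boundingBox": [70, 10, 120, 10, 120, 30, 70, 30]},
                        ],
                    }
                ],
            }
        ]
    },
}

print(page_texts(sample))  # [['Hello world']]
print(word_count(sample))  # 2
```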

72
Q

What does the Form Recognizer in Azure provide

A

Intelligent form processing capabilities that you can use to automate the processing of data in documents such as forms, invoices, and receipts

73
Q

How does Form Recognizer support automated document processing

A
  1. Pre-built receipt model - provided out-of-the-box and is trained to recognize and extract data from sales receipts
  2. Custom models - extract key/value pairs and table data from forms. Custom models are trained using your own data, which helps to tailor this model to your specific forms. Starting with only 5 samples of your forms, you can train the custom model. After the first training exercise, you can evaluate the results and consider if you need to add more samples and re-train.