Computer Vision Flashcards by Maria A

What are six common computer vision tasks?

Image classification
Object detection
Semantic segmentation
Image analysis
Face detection, analysis, and recognition
Optical character recognition (OCR)

What are Azure’s four computer vision services?

Computer Vision
Custom Vision
Face
Form Recogniser

What is the Computer Vision service?

A cognitive service in Microsoft Azure that provides pre-built computer vision capabilities.

What use cases do Custom Vision and Computer Vision both cover?

Object detection and image classification.

What is Custom Vision?

An image recognition service that lets you build, deploy, and improve your own image identifiers.

What would you use Azure Face service for?

Detecting faces in an image

What would you use Custom Vision for?

Identifying custom-defined objects in an image

What would you use Computer Vision’s OCR service for?

Reading text in an image

What would you use Computer Vision’s Image analysis service for?

Interpreting an image and suggesting an appropriate caption

Categorising an image

Suggesting relevant tags that could be used to index an image

Recognising landmarks and celebrities in an image.

When would you use Custom Vision over Computer Vision?

When you need to specify the labels and train custom models to detect them.

What is Azure Form Recogniser?

A specialised OCR service that lets you build automated data processing software using ML technology.

What is Form Recogniser used for?

Automating data entry in applications and enriching documents’ search capabilities.

What can Form Recogniser identify and extract?

Text
Key/Value Pairs
Selection Marks
Tables
Structures

What 3 things can Form Recogniser output?

The relationships in the original file
Bounding boxes
Confidence scores
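
One way to put the confidence scores to use is to accept high-confidence fields automatically and send the rest for manual review. The sketch below assumes a simplified, hypothetical field layout (`value` and `confidence` keys) and an arbitrary threshold of 0.8; neither comes from the service itself.

```python
# Hypothetical extracted fields; the names, values, and scores are made up
# for illustration and are not real service output.
sample_fields = {
    "InvoiceId": {"value": "INV-100", "confidence": 0.98},
    "Total": {"value": "42.00", "confidence": 0.61},
    "VendorName": {"value": "Contoso", "confidence": 0.91},
}


def split_by_confidence(fields: dict, threshold: float = 0.8) -> tuple[dict, dict]:
    """Return (accepted, needs_review) field values based on confidence."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # Fields at or above the threshold are accepted as-is.
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["value"]
    return accepted, needs_review


accepted, needs_review = split_by_confidence(sample_fields)
```

Here `Total` would land in `needs_review` because its score falls below the assumed threshold.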

What is Form Recogniser composed of?

Custom document processing models, prebuilt models, and the layout model.

What are custom models in Form Recogniser for?

Extracting document data from custom forms.

What are the types of custom model training in Form Recogniser?

Training without labels (unsupervised learning), and training with labels (supervised learning).

Why would you train a Form Recogniser custom model using unsupervised learning?

Because it doesn’t require intensive coding and maintenance, or manual data labelling.

Why would you train a Form Recogniser custom model using supervised learning?

Because it produces better-performing models, and can produce models that work with complex forms or with forms containing values without keys.

What documents do the prebuilt models cover?

Invoices, receipts, identity documents and business cards

What is the Prebuilt Receipt model used for?

Reading English sales receipts from Australia, Canada, Great Britain, India, and the United States.

What is the Prebuilt Business Cards model used for?

Extracting information such as a person’s name, job title, address, email, company, and phone numbers in English.

What is the Prebuilt Invoice model used for?

Extracting data from invoices in various formats and returning structured data.

What is the Prebuilt Identity documents model used for?

Extracting key information from world-wide passports and US driver licenses.

What are 4 guidelines for getting the best results when using Form Recogniser?

Images must be in JPEG, PNG, BMP, PDF, or TIFF format
File size must be less than 50 MB
Image size must be between 50 x 50 pixels and 10000 x 10000 pixels
PDF documents must be no larger than 17 x 17 inches
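
The four guidelines above can be checked before a file is submitted. This is a minimal sketch, not part of any Azure SDK: `InputFile` and `check_guidelines` are hypothetical names, and treating 50 MB as 50 x 1024 x 1024 bytes is an assumption.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"JPEG", "PNG", "BMP", "PDF", "TIFF"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumed meaning of "50 MB"
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_PDF_SIDE_IN = 17.0


@dataclass
class InputFile:
    """Hypothetical description of a file about to be submitted."""
    fmt: str          # e.g. "PNG" or "PDF"
    size_bytes: int
    width: float      # pixels for images, inches for PDFs
    height: float


def check_guidelines(f: InputFile) -> list[str]:
    """Return the guideline violations found (empty list means none)."""
    problems = []
    fmt = f.fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {f.fmt}")
    if f.size_bytes >= MAX_FILE_BYTES:
        problems.append("file is 50 MB or larger")
    if fmt == "PDF":
        # PDF dimensions are given in inches.
        if f.width > MAX_PDF_SIDE_IN or f.height > MAX_PDF_SIDE_IN:
            problems.append("PDF page larger than 17 x 17 inches")
    else:
        # Image dimensions are given in pixels.
        for side in (f.width, f.height):
            if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
                problems.append("image side outside 50 to 10000 pixels")
                break
    return problems
```

For example, `check_guidelines(InputFile("PNG", 1_000_000, 800, 600))` returns an empty list, meaning the file passes all four checks.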

What are the two APIs the Computer Vision service provides for reading text from images?

Read API and OCR API.

What is the OCR API for?

Quickly extracting small amounts of text in images. It can recognise text in numerous languages.

How does the OCR API work?

By operating synchronously to provide immediate results.

What does the OCR API return?

A hierarchy of information consisting of:
Regions in the image that contain text
Lines of text in each region
Words in each line of text
For each of these elements, the OCR API also returns bounding box coordinates that indicate where in the image the region, line, or word appears.
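
The region, line, and word hierarchy can be walked with nested loops. The sample below is hand-written to resemble an OCR API response and is not captured service output; the key names and the comma-separated "left,top,width,height" bounding box format are assumptions to verify against the API reference.

```python
# Hand-written sample shaped like an OCR API response (illustrative only).
sample_ocr_result = {
    "regions": [
        {
            "boundingBox": "20,15,300,90",
            "lines": [
                {
                    "boundingBox": "20,15,300,40",
                    "words": [
                        {"boundingBox": "20,15,120,40", "text": "Hello"},
                        {"boundingBox": "150,15,170,40", "text": "world"},
                    ],
                },
                {
                    "boundingBox": "20,60,200,40",
                    "words": [
                        {"boundingBox": "20,60,90,40", "text": "from"},
                        {"boundingBox": "120,60,100,40", "text": "OCR"},
                    ],
                },
            ],
        }
    ]
}


def parse_box(box: str) -> tuple[int, int, int, int]:
    """Split an assumed "left,top,width,height" string into integers."""
    left, top, width, height = (int(part) for part in box.split(","))
    return left, top, width, height


def ocr_lines(result: dict) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Walk regions -> lines -> words; return each line's text and box."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            # A line has no text of its own here; join its words.
            text = " ".join(word["text"] for word in line.get("words", []))
            lines.append((text, parse_box(line["boundingBox"])))
    return lines
```

Calling `ocr_lines(sample_ocr_result)` gives one entry per line, with the text rebuilt from its words.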

What is the Read API for?

Scanning documents with a lot of text. It can also automatically determine the proper recognition model to use, taking lines of text into account, and it supports images with printed text as well as handwriting.

How does the Read API work?

Asynchronously, so that it does not block your application while it reads the content and returns the results.

What does the Read API return?

A hierarchy of information consisting of:
Pages: one for each page of text, including information about the page size and orientation
Lines: the lines of text on a page
Words: the words in a line of text
Each line and word includes bounding box coordinates indicating its position on the page.
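
The page, line, and word hierarchy can be summarised in a few lines of code. The sample below is hand-written to resemble a Read API result and is not captured service output; the key names (`analyzeResult`, `readResults`, `angle`, and so on) are assumptions to verify against the API reference.

```python
# Hand-written sample shaped like a Read API result (illustrative only).
sample_read_result = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "angle": 0.0,
                "width": 800,
                "height": 600,
                "unit": "pixel",
                "lines": [
                    {
                        "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40],
                        "text": "Total due",
                        "words": [
                            {"boundingBox": [10, 10, 90, 10, 90, 40, 10, 40], "text": "Total"},
                            {"boundingBox": [100, 10, 200, 10, 200, 40, 100, 40], "text": "due"},
                        ],
                    },
                    {
                        "boundingBox": [10, 60, 120, 60, 120, 90, 10, 90],
                        "text": "42.00",
                        "words": [
                            {"boundingBox": [10, 60, 120, 60, 120, 90, 10, 90], "text": "42.00"},
                        ],
                    },
                ],
            }
        ]
    }
}


def summarise_pages(result: dict) -> list[dict]:
    """Collect page size, orientation angle, line texts, and word count."""
    pages = []
    for page in result["analyzeResult"]["readResults"]:
        pages.append({
            "page": page["page"],
            "size": (page["width"], page["height"]),
            "angle": page["angle"],
            "lines": [line["text"] for line in page["lines"]],
            "word_count": sum(len(line["words"]) for line in page["lines"]),
        })
    return pages
```

Unlike the OCR sketch's assumed string format, each bounding box here is assumed to be a list of eight numbers, one x and y pair per corner.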

Why would you use the Read API over the OCR API?

The OCR API can produce false positives when the image is text-dominant. The Read API uses the latest recognition models and is optimised for images that contain a significant amount of text or considerable visual noise.

What is the three-step process for using the Read API?

1. Submit an image to the API, and retrieve an operation ID in response.
2. Use the operation ID to check on the status of the image analysis operation, and wait until it has completed.
3. Retrieve the results of the operation.
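
The three steps amount to a submit-then-poll loop. The sketch below shows that pattern without calling Azure: `read_text` takes two hypothetical callables in place of the real HTTP requests, `FakeReadService` is an in-memory stand-in so the example runs offline, and the status strings and `result` key are assumptions.

```python
import time
from typing import Callable


def read_text(
    submit: Callable[[bytes], str],
    get_operation: Callable[[str], dict],
    image: bytes,
    poll_seconds: float = 1.0,
    max_polls: int = 30,
) -> dict:
    """Submit an image, poll until the operation finishes, return results."""
    # Step 1: submit the image and keep the operation ID.
    operation_id = submit(image)
    # Step 2: check the status until the operation has completed.
    for _ in range(max_polls):
        operation = get_operation(operation_id)
        status = operation["status"]
        if status == "succeeded":
            # Step 3: retrieve the results of the operation.
            return operation["result"]
        if status == "failed":
            raise RuntimeError(f"read operation {operation_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"read operation {operation_id} did not finish in time")


class FakeReadService:
    """In-memory stand-in for the service (hypothetical, for the demo)."""

    def __init__(self, polls_until_done: int = 2):
        self.polls_until_done = polls_until_done
        self.polls = 0

    def submit(self, image: bytes) -> str:
        return "op-123"

    def get_operation(self, operation_id: str) -> dict:
        self.polls += 1
        if self.polls < self.polls_until_done:
            return {"status": "running"}
        return {"status": "succeeded", "result": {"lines": ["Hello world"]}}
```

Passing the two calls in as functions keeps the polling logic separate from the transport, so the same loop could wrap real HTTP requests once the endpoint details are confirmed.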