Computer Vision Flashcards

1
Q

What are six common computer vision tasks?

A
Image classification
Object detection
Semantic segmentation
Image analysis
Face detection, analysis, and recognition
Optical character recognition (OCR)
2
Q

What are Azure’s four computer vision services?

A

Computer Vision
Custom Vision
Face
Form Recogniser

3
Q

What is the Computer Vision service?

A

A cognitive service in Microsoft Azure that provides pre-built computer vision capabilities.

4
Q

What use cases do Custom Vision and Computer Vision both cover?

A

Object detection and image classification.

5
Q

What is Custom Vision?

A

An image recognition service that lets you build, deploy, and improve your own image identifiers.

6
Q

What would you use Azure Face service for?

A

Detecting faces in an image

7
Q

What would you use Custom Vision for?

A

Identifying custom-defined objects in an image

8
Q

What would you use Computer Vision’s OCR service for?

A

Reading text in an image

9
Q

What would you use Computer Vision’s Image analysis service for?

A

Interpreting an image and suggesting an appropriate caption

Categorising an image

Suggesting relevant tags that could be used to index an image

Recognising landmarks and celebrities in an image.

10
Q

When would you use Custom Vision over Computer Vision?

A

When you need to specify the labels and train custom models to detect them.

11
Q

What is Azure Form Recogniser?

A

A specialised OCR service that lets you build automated data processing software using ML technology.

12
Q

What is Form Recogniser used for?

A

Automating data entry in applications and enriching documents’ search capabilities.

13
Q

What can Form Recogniser identify and extract?

A
Text
Key/Value Pairs
Selection Marks
Tables
Structures
14
Q

What 3 things can Form Recogniser output?

A

The relationships in the original file
Bounding boxes
Confidence scores

15
Q

What is Form Recogniser composed of?

A

Custom document processing models, prebuilt models, and the layout model.

16
Q

What are custom models in Form Recogniser for?

A

Extracting document data from custom forms.

17
Q

What are the types of custom model training in Form Recogniser?

A

Training without labels (unsupervised learning), and training with labels (supervised learning).

18
Q

Why would you train a Form Recogniser custom model using unsupervised learning?

A

Because it doesn’t require intensive coding and maintenance, or manual data labelling.

19
Q

Why would you train a Form Recogniser custom model using supervised learning?

A

Because it produces better-performing models, and can produce models that work with complex forms or with forms containing values without keys.

20
Q

What documents do the prebuilt models cover?

A

Invoices, receipts, identity documents and business cards

21
Q

What is the Prebuilt Receipt model used for?

A

Reading English sales receipts from Australia, Canada, Great Britain, India, and the United States.

22
Q

What is the Prebuilt Business Cards model used for?

A

Extracting information such as a person’s name, job title, address, email, company, and phone numbers from English business cards.

23
Q

What is the Prebuilt Invoice model used for?

A

Extracting data from invoices in various formats and returning structured data.

24
Q

What is the Prebuilt Identity documents model used for?

A

Extracting key information from worldwide passports and US driver licenses.

25
Q

What are 4 guidelines for getting the best results when using Form Recogniser?

A

Files must be in JPEG, PNG, BMP, PDF, or TIFF format

File size must be less than 50 MB

Image size between 50 x 50 pixels and 10000 x 10000 pixels

For PDF documents, no larger than 17 x 17 inches
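
The four guidelines above can be turned into a quick pre-submission check. What follows is a minimal sketch, not part of any Azure SDK: the `DocumentInfo` class and `guideline_problems` function are hypothetical names, the limits are copied from this card rather than from current service documentation, and "50 MB" is assumed to mean 50 x 1024 x 1024 bytes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits copied from the card above; treat them as study notes, not as an
# authoritative statement of the service's current limits.
ALLOWED_FORMATS = {"JPEG", "PNG", "BMP", "PDF", "TIFF"}
MAX_FILE_BYTES = 50 * 1024 * 1024      # assumption: "50 MB" means 50 MiB
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000  # image side length in pixels
MAX_PDF_SIDE_INCHES = 17               # PDF page side length in inches


@dataclass
class DocumentInfo:
    """Hypothetical description of a file about to be submitted."""
    file_format: str   # e.g. "PNG" or "PDF"
    size_bytes: int
    width: float       # pixels for images, inches for PDFs
    height: float


def guideline_problems(doc: DocumentInfo) -> list[str]:
    """Return a list of guideline violations (empty if none were found)."""
    problems = []
    fmt = doc.file_format.upper()
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {doc.file_format}")
    if doc.size_bytes >= MAX_FILE_BYTES:
        problems.append("file is 50 MB or larger")
    if fmt == "PDF":
        # PDFs are checked against the page-size guideline in inches.
        if max(doc.width, doc.height) > MAX_PDF_SIDE_INCHES:
            problems.append("PDF page is larger than 17 x 17 inches")
    else:
        # Everything else is checked against the pixel-size guideline.
        sides = (doc.width, doc.height)
        if min(sides) < MIN_SIDE_PX or max(sides) > MAX_SIDE_PX:
            problems.append("image is outside 50 x 50 to 10000 x 10000 pixels")
    return problems


print(guideline_problems(DocumentInfo("PNG", 1_000_000, 800, 600)))
```

Returning a list of problems rather than a single boolean makes it easier to tell the user everything that needs fixing in one pass.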

26
Q

What are the two APIs the Computer Vision service provides for reading text from images?

A

Read API and OCR API.

27
Q

What is the OCR API for?

A

Quickly extracting small amounts of text from images. It can recognise text in numerous languages.

28
Q

How does the OCR API work?

A

By operating synchronously to provide immediate results.

29
Q

What does the OCR API return?

A

A hierarchy of information consisting of:

Regions in the images that contain text
Lines of text in each region
Words in each line of text

For each of these elements, the OCR API also returns bounding box coordinates to indicate the location in the image where the region, line, or word appears.
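
To make the region, line, and word hierarchy concrete, here is a small sketch that walks a response of that shape. The sample dictionary is hand-written for illustration, not real service output. The field names (`regions`, `lines`, `words`, `boundingBox`, `text`) and the reading of each bounding box as "left, top, width, height" are assumptions about the response format, and the helper functions are hypothetical.

```python
# Hand-written sample shaped like the hierarchy described above.
# Field names and the bounding box layout are assumptions, not real output.
sample_response = {
    "regions": [
        {
            "boundingBox": "20,15,300,80",
            "lines": [
                {
                    "boundingBox": "20,15,300,35",
                    "words": [
                        {"boundingBox": "20,15,90,35", "text": "Hello"},
                        {"boundingBox": "120,15,100,35", "text": "world"},
                    ],
                },
                {
                    "boundingBox": "20,60,140,35",
                    "words": [{"boundingBox": "20,60,140,35", "text": "Goodbye"}],
                },
            ],
        }
    ]
}


def parse_box(box: str) -> tuple:
    """Split a comma-separated bounding box string into four integers."""
    left, top, width, height = (int(part) for part in box.split(","))
    return left, top, width, height


def lines_of_text(response: dict) -> list:
    """Walk regions -> lines -> words and return (line_text, line_box) pairs."""
    results = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            # A line's text is rebuilt by joining its words with spaces.
            text = " ".join(word["text"] for word in line.get("words", []))
            results.append((text, parse_box(line["boundingBox"])))
    return results


for text, box in lines_of_text(sample_response):
    print(text, box)
```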

30
Q

What is the Read API for?

A

Scanning documents with a lot of text.

It can also automatically determine the proper recognition model to use, taking lines of text into account, and it supports images with printed text as well as handwriting.

31
Q

How does the Read API work?

A

Asynchronously, so that your application is not blocked while the service reads the content and returns the results.

32
Q

What does the Read API return?

A

A hierarchy of information consisting of:

Pages - one for each page of text, including information about the page size and orientation

Lines - the lines of text on a page

Words - the words in each line of text

Each line and word includes bounding box coordinates indicating its position on the page.

33
Q

Why would you use the Read API over the OCR API?

A

The OCR API can produce false positives when the image is text-dominated.

The Read API uses the latest recognition models and is optimised for images that contain a significant amount of text or considerable visual noise.

34
Q

What is the 3-step process for using the Read API?

A
  1. Submit an image to the API, and retrieve an operation ID in response.
  2. Use the operation ID to check on the status of the image analysis operation, and wait until it has completed.
  3. Retrieve the results of the operation.
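
The three steps above follow a submit, poll, retrieve pattern, which can be sketched as below. This is an illustration only: `FakeReadService` is a hypothetical in-memory stand-in, not the Azure SDK or REST API, and its method names and the status strings `running`, `succeeded`, and `failed` are assumptions chosen for the example.

```python
import time


class FakeReadService:
    """Hypothetical in-memory stand-in for the service, for illustration only."""

    def __init__(self, polls_until_done: int = 2):
        self._polls_left = polls_until_done

    def submit(self, image_bytes: bytes) -> str:
        # Step 1: accept the image and hand back an operation ID.
        return "operation-1"

    def get_status(self, operation_id: str) -> str:
        # Step 2: report "running" until the pretend work has finished.
        if self._polls_left > 0:
            self._polls_left -= 1
            return "running"
        return "succeeded"

    def get_result(self, operation_id: str) -> list:
        # Step 3: return the recognised lines of text.
        return ["Hello world"]


def read_text(service, image_bytes, poll_interval=0.01, max_polls=50):
    """Submit an image, poll until the operation completes, return the result."""
    operation_id = service.submit(image_bytes)
    for _ in range(max_polls):
        status = service.get_status(operation_id)
        if status == "succeeded":
            return service.get_result(operation_id)
        if status == "failed":
            raise RuntimeError(f"operation {operation_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"operation {operation_id} did not finish in time")


print(read_text(FakeReadService(), b"fake image bytes"))
```

Capping the number of polls keeps a stuck operation from blocking the caller forever; a real client would likely use a longer interval between status checks.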