Implement knowledge mining and document intelligence solutions (10–15%) Flashcards
You are building a solution that uses Azure AI Search.
You need to execute the initial run of the indexer.
Which stages will be included during the initial run?
Document cracking, field mapping, skillset execution, and output field mapping are the stages of indexing.
You are building a knowledge mining solution by using Azure AI Search.
You need to ensure that the solution supports wildcard queries in search requests.
What should you include in the REST API request?
“queryType”: “full”
queryType “full” switches from the default simple query syntax to the full Lucene query syntax, adding support for more operators and query types, such as wildcard, fuzzy, regex, and field-scoped queries.
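A hedged sketch of such a request from Python (the service name, index name, key, api-version, and wildcard term are all placeholders):

```python
import requests

# Placeholder search service, index, and key.
url = "https://<service>.search.windows.net/indexes/<index>/docs/search?api-version=2023-11-01"
body = {
    "search": "lux*",      # wildcard term; only valid with the full Lucene syntax
    "queryType": "full",   # switch from the default simple syntax to full Lucene
}
response = requests.post(
    url,
    headers={"api-key": "<query-key>", "Content-Type": "application/json"},
    json=body,
)
print(response.json()["value"])  # matching documents
```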
You are building a knowledge mining solution that uses Azure AI Search.
You need to apply AI enrichment to your indexer pipeline to generate links to Wikipedia articles.
Which skill should you use?
Microsoft.Skills.Text.V3.EntityLinkingSkill
Microsoft.Skills.Text.V3.EntityLinkingSkill uses a pretrained model to generate links for recognized entities to articles in Wikipedia.
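A hedged sketch of how the skill could be declared inside a skillset definition, shown as a Python dict mirroring the REST JSON (the source path and target name are illustrative):

```python
entity_linking_skill = {
    "@odata.type": "#Microsoft.Skills.Text.V3.EntityLinkingSkill",
    "context": "/document",
    "inputs": [
        {"name": "text", "source": "/document/content"},
    ],
    "outputs": [
        # Each recognized entity includes a url that links to its Wikipedia article.
        {"name": "entities", "targetName": "linkedEntities"},
    ],
}
```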
What features exist for prebuilt Document Intelligence models?
Text extraction
Key-value pairs
Entities
Selection marks
Tables
Fields
What specific forms exist as prebuilt models in Document Intelligence?
Invoice
Receipt
W2
ID document model (US driver's licenses and international passports)
Business card
Health insurance card
What features are available in the Read model in Document Intelligence?
Text extraction
What generic prebuilt models exist in Document Intelligence?
Read model
General document model
Layout model
What features are available in the General document model in Document Intelligence?
Text extraction
Key-value pairs
Entities
Selection marks
Tables
What features are available in the Layout model in Document Intelligence?
Text extraction
Selection marks
Tables
What features are available in the Invoice model in Document Intelligence?
Text extraction
Key-value pairs
Selection marks
Tables
Fields
What features are available in the Receipt model in Document Intelligence?
Text extraction
Key-value pairs
Fields
What features are available in the W2 model in Document Intelligence?
Text extraction
Key-value pairs
Selection marks
Tables
Fields
What features are available in the ID document model in Document Intelligence?
Text extraction
Key-value pairs
Fields
What features are available in the Business card model in Document Intelligence?
Text extraction
Key-value pairs
Fields
Which file formats can be consumed by prebuilt Document Intelligence models?
JPEG
PNG
BMP
TIFF
PDF
What file size requirements exist for Document Intelligence documents?
The file must be smaller than 500 MB for the standard tier, and 4 MB for the free tier.
What image size requirements exist for Document Intelligence documents?
Images must have dimensions between 50 x 50 pixels and 10,000 x 10,000 pixels.
What limitations exist for PDF files in Document Intelligence?
PDF documents must have dimensions less than 17 x 17 inches or A3 paper size.
PDF documents must not be protected with a password.
How many pages are allowed for PDF and TIFF files in Document Intelligence?
PDF and TIFF files can have any number of pages but, in the standard tier, only the first 2000 pages are analyzed. In the free tier, only the first two pages are analyzed.
How do you use the Document Intelligence service?
For custom applications, use the REST API.
To explore the models and how they behave with your forms visually, you can experiment in the Azure AI Document Intelligence Studio.
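The Python SDK (azure-ai-formrecognizer) wraps the same REST API; a minimal sketch, assuming a placeholder endpoint and key and a publicly reachable document URL:

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key from your Document Intelligence (or Azure AI services) resource.
client = DocumentAnalysisClient(
    endpoint="https://<resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<key>"),
)

# Analyze a document with a prebuilt model (here, the Read model) from a URL.
poller = client.begin_analyze_document_from_url("prebuilt-read", "https://example.com/sample.pdf")
result = poller.result()
print(result.content)
```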
What underlying form models exist for custom forms in Document Intelligence?
Custom template models
Custom neural models
In the Read model in Document Intelligence, how can you select a page range for analysis?
Use the pages parameter
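In the REST API the page range goes in the pages query parameter of the analyze call; the Python SDK exposes the same option as a pages keyword argument. A hedged sketch, reusing the client from the earlier Read-model sketch:

```python
# `client` is a DocumentAnalysisClient as created in the earlier sketch.
# The SDK's `pages` keyword mirrors the REST API's `pages` query parameter.
poller = client.begin_analyze_document_from_url(
    "prebuilt-read",
    "https://example.com/long-document.pdf",
    pages="1-3,5",  # analyze only pages 1-3 and page 5
)
print(poller.result().content)
```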
What is the purpose of the Read model in Document Intelligence?
The Read model is ideal if you want to extract words and lines from documents with no fixed or predictable structure.
Which prebuilt Document Intelligence model supports Entity extraction?
The General document model
What entity types can be detected in the General Document model?
Person. The name of a person.
PersonType. A job title or role.
Location. Buildings, geographical features, geopolitical entities.
Organization. Companies, government bodies, sports clubs, musical bands, and other groups.
Event. Social gatherings, historical events, anniversaries.
Product. Objects bought and sold.
Skill. A capability belonging to a person.
Address. Mailing address for a physical location.
Phone number. Dialing codes and numbers for mobile phones and landlines.
Email. Email addresses.
URL. Webpage addresses.
IP Address. Network addresses for computer hardware.
DateTime. Calendar dates and times of day.
Quantity. Numerical measurements with their units.
What is the purpose of the Layout model in Document Intelligence?
Use when you need rich information about the structure of a document.
What fields can be identified by the ID Document model?
First and last names.
Personal information such as sex, date of birth, and nationality.
The country and region where the document was issued.
Unique numbers such as the document number and machine-readable zone.
Endorsements, restrictions, and vehicle classifications.
What fields are included in the Business Card model?
First and last names.
Postal addresses.
Email and website addresses.
Various telephone numbers.
What fields are included in the W-2 model?
Information about the employer, such as their name and address.
Information about the employee, such as their name, address, and social security number.
Information about the taxes that the employee has paid.
Describe custom template models
Custom template models accurately extract labeled key-value pairs, selection marks, tables, regions, and signatures from documents.
Training only takes a few minutes, and more than 100 languages are supported.
Describe custom neural models
Custom neural models are deep learned models that combine layout and language features to accurately extract labeled fields from documents.
This model is best for semi-structured or unstructured documents.
What is included in a successful response to a call to the Document Intelligence API?
A successful JSON response contains an analyzeResult object, which holds the extracted content and an array of pages with information about the document content.
Some fields in pages include pageNumber, angle, width, height, and words.
Each entry in the words array has the recognized text in its content field, a polygon array describing its position, and a confidence score.
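A minimal sketch of walking that structure, assuming response_json already holds the parsed body of a successful analyze call:

```python
# Navigate the analyzeResult structure described above.
result = response_json["analyzeResult"]

for page in result["pages"]:
    print(f"Page {page['pageNumber']}: {page['width']} x {page['height']}, angle {page['angle']}")
    for word in page.get("words", []):
        # Each word carries its recognized text, a bounding polygon, and a confidence score.
        print(word["content"], word["polygon"], word["confidence"])
```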
How can you improve confidence scores in Document Intelligence?
If the confidence values are low, make sure that the form you're analyzing has a similar appearance to the forms in the training set. If the form appearance varies, consider training more than one model, with each model focused on one form format.
What projects does the Azure Document Intelligence Studio support?
Document analysis models
Read: Extract printed and handwritten text lines, words, locations, and detected languages from documents and images.
Layout: Extract text, tables, selection marks, and structure information from documents (PDF and TIFF) and images (JPG, PNG, and BMP).
General Documents: Extract key-value pairs, selection marks, and entities from documents.
Prebuilt models
Custom models
What do you need to analyze a document in the Document Intelligence Studio?
Azure Document Intelligence or Azure AI service endpoint and key
What steps do you take to create and use a custom model in Document Intelligence Studio?
Create an Azure Document Intelligence or Azure AI Services resource
Collect at least 5-6 sample forms for training and upload them to your storage account container.
Configure cross-origin resource sharing (CORS). CORS enables Azure Document Intelligence Studio to store labeled files in your storage container.
Create a custom model project in Azure Document Intelligence Studio. You’ll need to provide configurations linking your storage container and Azure Document Intelligence or Azure AI Service resource to the project.
Use Azure Document Intelligence Studio to apply labels to text.
Train your model. Once the model is trained, you’ll receive a Model ID and Average Accuracy for tags.
Test your model by analyzing a new form that wasn’t used in training.
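A hedged sketch of the training step using the Python SDK; the exact method names vary across azure-ai-formrecognizer versions, and the endpoint, key, and SAS container URL are placeholders:

```python
from azure.ai.formrecognizer import DocumentModelAdministrationClient, ModelBuildMode
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint/key and the blob container (SAS URL) holding the labeled sample forms.
admin_client = DocumentModelAdministrationClient(
    "https://<resource>.cognitiveservices.azure.com", AzureKeyCredential("<key>")
)

poller = admin_client.begin_build_document_model(
    ModelBuildMode.TEMPLATE,  # or ModelBuildMode.NEURAL for a custom neural model
    blob_container_url="https://<storage>.blob.core.windows.net/<container>?<sas-token>",
    description="Custom template model for expense forms",
)
model = poller.result()
print(model.model_id)  # the Model ID you then use for analysis
```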
A person plans to use an Azure Document Intelligence prebuilt invoice model. To extract document data using the model, what are two calls they need to make to the API?
The Analyze Invoice function starts the form analysis and returns a result ID, which they can pass in a subsequent call to the Get Analyze Invoice Result function to retrieve the results.
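A hedged sketch of that two-call pattern against the REST endpoint (the path and api-version are assumptions and differ between API versions; older versions name the operations Analyze Invoice and Get Analyze Invoice Result):

```python
import time
import requests

# Placeholder resource endpoint and key.
endpoint = "https://<resource>.cognitiveservices.azure.com"
key = "<key>"
analyze_url = f"{endpoint}/formrecognizer/documentModels/prebuilt-invoice:analyze?api-version=2023-07-31"

# Call 1: start the analysis; the operation URL comes back in the Operation-Location header.
post = requests.post(
    analyze_url,
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"urlSource": "https://example.com/sample-invoice.pdf"},
)
operation_url = post.headers["Operation-Location"]

# Call 2: poll that URL until the analysis finishes, then read analyzeResult.
while True:
    result = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(1)
print(result["status"])
```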
A person needs to build an application that submits expense claims and extracts the merchant, date, and total from scanned receipts. What’s the best way to do this?
Use Azure Document Intelligence's prebuilt receipt model. It can intelligently extract the required fields even if they're labeled differently on different receipts.
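A hedged sketch with the Python SDK; the endpoint, key, and file name are placeholders, while MerchantName, TransactionDate, and Total follow the documented prebuilt-receipt field names:

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key.
client = DocumentAnalysisClient(
    "https://<resource>.cognitiveservices.azure.com", AzureKeyCredential("<key>")
)

# Submit a scanned receipt to the prebuilt receipt model.
with open("receipt.jpg", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-receipt", f)
receipt = poller.result().documents[0]

# Pull out the fields the expense application needs.
for name in ("MerchantName", "TransactionDate", "Total"):
    field = receipt.fields.get(name)
    if field:
        print(name, field.value, field.confidence)
```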
When would you use a Composed model?
A composed model is one that consists of multiple custom models. Typical scenarios where composed models help are:
1. When you don't know the submitted document type and want to classify it and then analyze it.
2. When you have multiple variations of a form, each with its own individually trained model.
How many custom models can you include in a composed model?
Standard pricing tier: 100
Free tier: 5
You want to create an Azure AI Document Intelligence model where the documents are in one of three formats: wills, probate declarations, and affidavits. Each has its own specific layout. What type of model should you use that will understand the format of the three document categories?
A Composed model consists of multiple custom models. Each submitted form is categorized as one of the custom form types and analyzed using the corresponding custom model.