Image Processing and OCR Flashcards

1
Q

What does OCR stand for?

A

Optical Character Recognition

2
Q

OCR turns what into what?

A

Image-based content into machine-readable text

3
Q

What are the 3 OCR Engines that come with all Grooper installs?

A

Tesseract OCR
Transym 4 OCR
Transym 5 OCR

4
Q

Matrix matching and feature recognition are part of what phase of an OCR engine’s operation?

A

Character Recognition

5
Q

Breaking up pixels into lines, words, and characters is part of what phase of an OCR engine’s operation?

A

Segmenting

6
Q

Many OCR engines spell-check OCR results to improve their accuracy. This is part of what phase of an OCR engine’s operation?

A

Post-Processing
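A minimal sketch of dictionary-based post-processing, assuming a tiny hypothetical lexicon. It uses Python's standard difflib module to snap an OCR'd word to its closest dictionary entry; it illustrates the general idea only and is not how Grooper or any specific OCR engine implements spell checking.

```python
import difflib

# Hypothetical lexicon; a real engine would use a much larger dictionary.
LEXICON = ["invoice", "total", "amount", "date", "customer"]

def spell_correct(word: str, lexicon=LEXICON, cutoff: float = 0.75) -> str:
    """Replace an OCR'd word with its closest lexicon entry, if one is close enough."""
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    # Keep the raw OCR result when nothing in the lexicon is similar enough.
    return matches[0] if matches else word

if __name__ == "__main__":
    for raw in ["lnvoice", "t0tal", "zebra"]:
        print(raw, "->", spell_correct(raw))
```

Here "lnvoice" and "t0tal" are corrected, while "zebra" is left alone because no entry clears the similarity cutoff.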

7
Q

In your own words, describe the Segmenting phase of an OCR engine.

A

This is when the image's pixels are broken up into lines, individual words, and characters.
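Segmenting can be illustrated with projection profiles, a classic technique: blank rows separate text lines, and blank columns within a line separate glyphs. This is a minimal sketch on an already-binarized grid (1 = ink); real engines also distinguish word gaps from character gaps and handle touching or broken characters, which this does not.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink (black) pixel, 0 = background (white)

def ink_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of consecutive non-zero entries."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment(image: Image) -> List[List[Tuple[int, int]]]:
    """Split an image into text lines (separated by blank rows), then split each
    line into glyph column spans (separated by blank columns)."""
    row_profile = [sum(row) for row in image]  # ink count per row
    result = []
    for top, bottom in ink_runs(row_profile):
        band = image[top:bottom]
        # Ink count per column, restricted to this line's rows.
        col_profile = [sum(row[x] for row in band) for x in range(len(image[0]))]
        result.append(ink_runs(col_profile))
    return result

if __name__ == "__main__":
    img = [
        [1, 1, 0, 1, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 1],
    ]
    print(segment(img))  # two lines, each with two glyph spans
```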

8
Q

OCR engines that obtain results by comparing a grid of pixels on an image to a grid of pixels of examples of characters are performing…

A

Matrix Matching
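Matrix matching can be sketched as scoring a glyph's pixel grid against stored character templates and picking the closest one. The 3x3 templates below are hypothetical and tiny; real engines normalize glyph size first and use large, font-specific template sets.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # 1 = ink, 0 = background

# Hypothetical 3x3 templates, for illustration only.
TEMPLATES: Dict[str, Grid] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

def similarity(a: Grid, b: Grid) -> float:
    """Fraction of pixels that agree between two same-sized binary grids."""
    cells = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(pa == pb for pa, pb in cells) / len(cells)

def match_character(glyph: Grid) -> Tuple[str, float]:
    """Return the best-matching template character and its agreement score."""
    best = max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
    return best, similarity(glyph, TEMPLATES[best])

if __name__ == "__main__":
    noisy_t = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]  # a "T" missing its bottom pixel
    print(match_character(noisy_t))  # "T" wins with 8 of 9 pixels agreeing
```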

9
Q

The Grooper activity that performs OCR is…

A

Recognize

10
Q

What image processing operation is required for an OCR
engine to obtain results, either through Grooper’s image
processing suite or via the OCR engine itself?

A

Thresholding (or binarizing) the image
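Thresholding turns grayscale pixels into pure black and white. A minimal sketch using Otsu's method, one common way to pick a single global threshold automatically; Grooper and individual OCR engines may use other (for example, adaptive) binarization methods, so treat this as illustration only.

```python
from typing import List

def otsu_threshold(pixels: List[int]) -> int:
    """Pick a global threshold for 8-bit grayscale values (0-255) using Otsu's
    method: maximize the between-class variance of the dark and light groups."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, weight_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]          # pixels with value <= t (dark class)
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels with value > t (light class)
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binarize(gray: List[List[int]]) -> List[List[int]]:
    """Return 1 for dark (ink) pixels and 0 for light (background) pixels."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[1 if p <= t else 0 for p in row] for row in gray]

if __name__ == "__main__":
    print(binarize([[10, 20, 200], [15, 220, 210]]))
```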

11
Q

Image processing in Grooper serves one (1) of three (3) basic purposes. What are they?

A

Archival Adjustments, OCR Cleanup, and Layout Data collection. (Archival Adjustments ONLY pertain to permanent image processing via the Image Processing activity.)

12
Q

The Grooper activity that performs permanent image processing is…

A

Image Processing

13
Q

In your own words, what is the benefit of performing
temporary image processing? How do you perform
temporary image processing in Grooper?

A

Temporary image processing is useful because it does not make permanent changes to the document itself. To perform it, assign a temporary IP Profile on the OCR Profile, then run the Recognize activity, which executes that OCR Profile.

14
Q
List three (3) common IP Commands used during
permanent image processing.
A

Auto Deskew, Auto Border Crop, Rotate

15
Q
List three (3) common IP Commands used during
temporary image processing.
A

Line Removal, Speck Removal, Negative Region Removal

16
Q

Grooper’s set of properties that pre-process and reprocess the OCR engine’s results are called…

A

Synthesis

17
Q

There are five (5) operations that comprise this synthesis functionality. List them.

A

Font Pitch Detection, Bound Region Processing, Iterative Processing, Cell Validation, Segment Reprocessing

18
Q

Where are the synthesis properties enabled and configured?

A

On an OCR Profile

19
Q

In your own words, what is “fuzzy regular expression”?
How does “fuzzy regular expression” improve Grooper’s
ability to extract data from poorly OCR’d pages?

A

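Fuzzy regular expression lets a pattern match text that is close to, but not exactly, what you are trying to find, based on a match percentage you set. Because small OCR errors no longer prevent a match, it reduces errors when extracting data from poorly OCR’d pages.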
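The "match percentage" idea can be sketched with approximate substring search (Sellers' algorithm): find the substring of the text that needs the fewest character edits to become the pattern, then convert that edit count into a percentage. This is a stand-in for illustration using a literal pattern; it is not Grooper's fuzzy regex engine, which works on full regular expressions.

```python
from typing import Tuple

def best_fuzzy_match(pattern: str, text: str) -> Tuple[int, int]:
    """Approximate substring search (Sellers' algorithm).

    Returns (distance, end_index): the fewest single-character edits
    (insert, delete, substitute) needed to turn `pattern` into some substring
    of `text`, and the text index just past where that best substring ends.
    """
    m = len(pattern)
    col = list(range(m + 1))  # D[i][0] = i: pattern prefix vs. empty text
    best = (col[m], 0)
    for j, ch in enumerate(text, start=1):
        new = [0]  # D[0][j] = 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new.append(min(col[i] + 1,          # extra character in the text
                           new[i - 1] + 1,      # pattern character missing from text
                           col[i - 1] + cost))  # match or substitution
        col = new
        if col[m] < best[0]:
            best = (col[m], j)
    return best

def fuzzy_similarity(pattern: str, text: str) -> float:
    """Best match percentage (0.0 to 1.0) of `pattern` anywhere in `text`."""
    if not pattern:
        return 1.0
    distance, _ = best_fuzzy_match(pattern, text)
    return 1.0 - distance / len(pattern)

if __name__ == "__main__":
    # "lNV0ICE" is two substitutions away from "INVOICE" (about 71%).
    print(fuzzy_similarity("INVOICE", "Page 1 lNV0ICE NUMBER"))
```

With a minimum match percentage of 70%, the misread "lNV0ICE" would still be accepted, while an exact regular expression would miss it.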

20
Q

How do you alter the normal cost to swap characters when using fuzzy regular expression?

A

Fuzzy Match Weightings
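The effect of weightings can be sketched as a weighted edit distance in which swaps OCR commonly confuses (such as "O" and "0") cost less than a full edit. The weight table below is hypothetical and the function is a general illustration, not Grooper's Fuzzy Match Weightings syntax or implementation.

```python
from typing import Dict, Tuple

# Hypothetical weightings: common OCR confusions cost less than a normal swap (1.0).
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("O", "0"): 0.2, ("0", "O"): 0.2,
    ("l", "1"): 0.3, ("1", "l"): 0.3,
    ("S", "5"): 0.3, ("5", "S"): 0.3,
}

def weighted_distance(a: str, b: str, weights=WEIGHTS, default: float = 1.0) -> float:
    """Edit distance from `a` to `b` where substitution costs come from `weights`
    (falling back to `default`); insertions and deletions cost `default`."""
    prev = [j * default for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * default]
        for j in range(1, len(b) + 1):
            ca, cb = a[i - 1], b[j - 1]
            sub = 0.0 if ca == cb else weights.get((ca, cb), default)
            cur.append(min(prev[j] + default,      # delete ca
                           cur[j - 1] + default,   # insert cb
                           prev[j - 1] + sub))     # keep or swap
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    print(weighted_distance("P0", "PO"))  # cheap swap: 0.2
    print(weighted_distance("PX", "PO"))  # normal swap: 1.0
```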

21
Q

How do you force a portion of a fuzzy regular expression to match normally (or non-fuzzily)?

A

Required Mode

22
Q

Non-text information obtained via permanent or
temporary image processing such as line locations,
checkbox locations and states, barcode values, and
detection of trained shapes is referred to as…

A

Layout Data

23
Q

Once non-text information obtained via permanent or
temporary image processing such as line locations,
checkbox locations and states, barcode values, and
detection of trained shapes is collected, where is this information stored in Grooper?

A

In a LayoutData.json file, which is stored on each page object where a Layout Data IP Command locates non-text data.

24
Q

What “tab marking” property will insert a tab character
(“\t”) between the highlighted values in the table below,
without adjusting the width of a tabbed space?

A

Detect Lines

25
Q

True or False: For the previous question, the lines must
be detected by a Line Removal or Line Detection
command BEFORE the extractor executes in order to insert
the tab character.

A

True
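The idea behind cards 24 and 25 can be illustrated with a minimal sketch: a tab is inserted between two words only when a previously detected vertical line falls in the gap between them, so with no detected lines, no tabs appear. The word tuples and line x-positions are hypothetical inputs; this does not reflect how Grooper stores or applies Layout Data internally.

```python
from typing import List, Tuple

Word = Tuple[str, int, int]  # hypothetical: (text, left_x, right_x) on one text line

def join_with_line_tabs(words: List[Word], vertical_lines: List[int]) -> str:
    """Join words left to right, inserting a tab wherever a detected vertical line
    lies in the gap between two words and a single space otherwise."""
    words = sorted(words, key=lambda w: w[1])
    parts = [words[0][0]] if words else []
    for (_, _, prev_right), (text, left, _) in zip(words, words[1:]):
        crosses = any(prev_right < x < left for x in vertical_lines)
        parts.append("\t" if crosses else " ")
        parts.append(text)
    return "".join(parts)

if __name__ == "__main__":
    row = [("Net", 10, 40), ("Amount", 45, 100), ("125.00", 130, 180)]
    print(repr(join_with_line_tabs(row, [115])))  # line detected between columns
    print(repr(join_with_line_tabs(row, [])))     # no lines detected, no tab
```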