What are the two types of emails in Structured Analytics?
Inclusive and non-inclusive.
What is an inclusive email?
An email that contains unique content not included in any other email, and thus, must be reviewed. An email with no replies or forwards is by definition inclusive. The last email in a thread is also by definition inclusive.
What is a non-inclusive email?
An email whose text and attachments are fully contained in other (inclusive) emails.
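The two definitions above can be sketched in a few lines of Python. This is only an illustration of the containment idea, not how Structured Analytics actually works: the `Email` class, `is_contained`, and `inclusive_ids` are hypothetical names, "contained" is simplified to a plain substring check on the body plus a subset check on attachment names, and the sketch assumes no two emails in the thread are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    # Hypothetical minimal model of an email, for illustration only.
    id: str
    body: str
    attachments: frozenset = frozenset()


def is_contained(inner: Email, outer: Email) -> bool:
    """True if inner's text and attachments are all present in outer."""
    return inner.body in outer.body and inner.attachments <= outer.attachments


def inclusive_ids(thread: list) -> list:
    """Return the ids of emails that no other email in the thread fully contains."""
    result = []
    for email in thread:
        contained = any(
            other is not email and is_contained(email, other) for other in thread
        )
        if not contained:
            result.append(email.id)
    return result


thread = [
    Email("e1", "Hi Bob", frozenset({"report.pdf"})),  # attachment dropped by replies
    Email("e2", "Sure.\nHi Bob"),
    Email("e3", "Thanks\nSure.\nHi Bob"),              # last email in the thread
]
print(inclusive_ids(thread))  # ['e1', 'e3']
```

Here e2 is non-inclusive because e3 contains all of it, while e1 stays inclusive because its attachment does not appear in any later email.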
Why would you review only inclusive emails?
Because they contain everything that needs to be reviewed.
By reviewing only inclusive emails and skipping duplicates, your review process will be much more efficient.
What are the common types of inclusive emails?
- The last email in a thread
- The end of attachments
- Change of text
- Change of sender or time
- Duplication
Are duplicate emails with blank attachments considered inclusive or non-inclusive?
Blank attachments are always considered unique, so these emails are inclusive.
What can email threading visualisation be used for?
- Quickly see the story of an email conversation
- Optimise your QC process
What field is required for email thread visualisation?
Email Author Date ID must be present for the emails.
This field is only available for emails run through a full analysis using structured analytics.
If you have multiple Structured Analytics sets, how do you choose which one to use for your current email thread visualisation?
Click display options and choose which Structured Analytics set to display.
Textual Near Duplicate Identification
What does it do?
- Takes the contents of the Extracted Text field for all docs with 30 MB of text or less
- Operates on text only and converts everything to lower case
- Sorts all docs from largest to smallest, then defines groups of documents organised around a principal document for each group
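The steps above can be sketched as a greedy grouping pass. This is a rough illustration under stated assumptions, not the product's algorithm: `similarity` uses Python's `difflib.SequenceMatcher` as a stand-in for whatever comparison Structured Analytics performs, the function and parameter names are hypothetical, and the 90 default is only an example value. A document that matches no existing principal starts a candidate group of its own, and candidate groups that never gain a second member are dropped at the end.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Percent similarity between two strings (stand-in measure, an assumption)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def group_near_duplicates(docs: dict, min_similarity: float = 90.0) -> dict:
    """Map each principal doc id to the ids in its group (principal first)."""
    # Text only, lower-cased.
    normalized = {doc_id: text.lower() for doc_id, text in docs.items()}
    # Largest document first.
    ordered = sorted(normalized, key=lambda d: len(normalized[d]), reverse=True)

    groups = {}
    for doc_id in ordered:
        best, best_score = None, min_similarity
        for principal in groups:
            score = similarity(normalized[principal], normalized[doc_id])
            if score >= best_score:
                best, best_score = principal, score
        if best is None:
            groups[doc_id] = [doc_id]   # candidate principal
        else:
            groups[best].append(doc_id)

    # Keep only groups that actually contain a near duplicate.
    return {p: members for p, members in groups.items() if len(members) > 1}


docs = {
    "a": "The quick brown fox jumps over the lazy dog today",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "Completely unrelated text about invoices",
}
groups = group_near_duplicates(docs)
```

In this example "a" is the longest document, so it becomes the principal and "b" joins its group, while "c" matches nothing and ends up in no group.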
If no groups are matched for a document what happens?
Nothing happens.
What happens for documents that are not textually similar to other documents or do not contain text?
Analyzed documents that are not textually similar enough to any other documents will not have fields populated for Textual Near Duplicate Principal or Textual Near Duplicate Group.
Docs that only contain numbers or are blank will have the Textual Near Duplicate Group set to numbers only or empty, respectively.
What is the minimum similarity percentage parameter for?
This parameter indicates how similar a document must be to a principal document to be placed into that principal’s group.
The higher the setting, the faster the process will run, because fewer comparisons have to be made.
What fields are created when you run Textual Near Duplicate Identification?
- Textual Near Duplicate Principal: Identifies the principal document with a “Yes” value.
- Textual Near Duplicate Similarity: The percent value of similarity between the near duplicate document and its principal document.
- Textual Near Duplicate Group: Acts as the identifier for a given group of textual near-duplicate documents.
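To make the relationship between the three fields concrete, here is a small sketch that fills them in for one group. The field names come from the notes above; everything else is an assumption made for illustration: the function name `near_duplicate_fields`, the blank Principal value on non-principal documents, the 100 similarity on the principal itself, and the use of the principal's id as the group identifier.

```python
def near_duplicate_fields(groups: dict, similarity: dict) -> dict:
    """Build per-document field values.

    groups:     principal id -> list of member ids (principal included)
    similarity: (principal id, member id) -> percent similarity
    """
    fields = {}
    for principal, members in groups.items():
        for doc_id in members:
            is_principal = doc_id == principal
            fields[doc_id] = {
                # "Yes" marks the principal; blank for other members (assumption).
                "Textual Near Duplicate Principal": "Yes" if is_principal else "",
                # Similarity is measured against the group's principal.
                "Textual Near Duplicate Similarity": (
                    100.0 if is_principal else similarity[(principal, doc_id)]
                ),
                # Group identifier; the principal's id is used here (assumption).
                "Textual Near Duplicate Group": principal,
            }
    return fields


fields = near_duplicate_fields(
    {"DOC-1": ["DOC-1", "DOC-2"]},
    {("DOC-1", "DOC-2"): 93.5},
)
```

Documents that belong to no group simply do not appear in the result, which mirrors the fields being left unpopulated for them.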
After running Textual Near Duplicate Identification, what are the next recommended actions?
- Setting up a Textual Near Duplicates view for a structured analytics set
- Assessing similarities and differences with Document Compare
- Viewing the Textual Near Duplicates Summary
- Viewing near-duplicate groups in the related items pane