USING ANALYTICS FOR REVIEW Flashcards
What are the two types of emails in Structured Analytics?
Inclusive
Non-Inclusive
What is an inclusive email?
An email that contains unique content not included in any other email, and thus, must be reviewed. An email with no replies or forwards is by definition inclusive. The last email in a thread is also by definition inclusive.
What is a non-inclusive email?
An email whose text and attachments are fully contained in other (inclusive) emails.
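The two definitions above can be illustrated with a toy sketch. It assumes each email is modelled as a set of message segments plus a set of attachments; the `Email` class, the segment labels and the file name are hypothetical, and this is not how Structured Analytics is actually implemented. Exact duplicates are not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    name: str
    segments: frozenset               # message segments quoted in the email
    attachments: frozenset = frozenset()


def is_contained(a, b):
    """True if all of a's text and attachments also appear in b, and b differs."""
    same = (a.segments, a.attachments) == (b.segments, b.attachments)
    return a.segments <= b.segments and a.attachments <= b.attachments and not same


def inclusive_emails(emails):
    """Names of emails whose content is not fully contained in another email."""
    return [
        e.name
        for e in emails
        if not any(is_contained(e, other) for other in emails if other is not e)
    ]


thread = [
    Email("original", frozenset({"m1"}), frozenset({"report.pdf"})),
    Email("reply", frozenset({"m1", "m2"})),            # attachment dropped
    Email("final reply", frozenset({"m1", "m2", "m3"})),
]

print(inclusive_emails(thread))  # ['original', 'final reply']
```

In this sketch the original stays inclusive because its attachment appears nowhere else, while the middle reply is non-inclusive because the final reply contains all of its text.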
Why would you review only inclusive emails?
Because they contain everything that needs to be reviewed.
By reviewing only inclusive emails and skipping duplicates, your review process will be much more efficient.
What are the common Inclusive emails?
- The last email in a thread
- The end of attachments
- Change of text
- Change of sender or time
- Duplication
Are duplicate emails with blank attachments considered inclusive or non-inclusive?
Blank attachments are always considered unique and therefore inclusive.
What can email threading visualisation be used for?
- Quickly see the story of an email conversation
- Optimise your QC process
What field is required for email thread visualisation?
Email Author Date ID must be present for the emails.
This field is only available for emails run through a full analysis using structured analytics.
If you have multiple Structured Analytics sets, how do you choose which one to use for your current email threading visualisation?
Click display options and choose which SA set to display.
Textual Near Duplicate Identification
What does it do?
- Takes the contents of the Extracted Text field for all docs with 30 MB of text or less
- Operates on text only and converts everything to lower case
- Sorts all docs from largest to smallest and then defines groups of documents organised around a principal document for each group.
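The steps above can be sketched roughly as follows. This is a minimal illustration only: the `similarity` function uses Python's `difflib.SequenceMatcher` as a stand-in, which is an assumption and not the measure Relativity uses, and the function names, document IDs and `min_similarity` default are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in percent (not Relativity's actual measure)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def group_near_duplicates(docs, min_similarity=90):
    """docs maps a document ID to its extracted text.

    Returns a dict mapping each principal document ID to its group members.
    Documents that match nothing are left out of the result.
    """
    # Text only, everything converted to lower case.
    texts = {doc_id: text.lower() for doc_id, text in docs.items()}
    # Largest documents first, so the largest becomes the principal.
    remaining = sorted(texts, key=lambda d: len(texts[d]), reverse=True)
    groups = {}
    while remaining:
        principal = remaining.pop(0)
        members = [
            d for d in remaining
            if similarity(texts[principal], texts[d]) >= min_similarity
        ]
        if members:
            groups[principal] = members
            remaining = [d for d in remaining if d not in members]
    return groups


docs = {
    "A": "The quarterly report is attached for your review and comment.",
    "B": "the quarterly report is attached for your review.",
    "C": "Lunch on Friday?",
}

print(group_near_duplicates(docs, min_similarity=80))  # {'A': ['B']}
```

Raising `min_similarity` to 95 in this example leaves every document ungrouped, which mirrors how a stricter setting produces fewer and tighter groups.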
Textual Near Duplicate Identification
If no groups are matched for a document, what happens?
Nothing happens.
Textual Near Duplicate Identification
What happens for documents that are not textually similar to other documents or do not contain text?
Analyzed documents that are not textually similar enough to any other documents will not have fields populated for Textual Near Duplicate Principal or Textual Near Duplicate Group.
Docs that only contain numbers or are blank will have the Textual Near Duplicate Group set to numbers only or empty, respectively.
Textual Near Duplicate Identification
What is the minimum similarity percentage parameter for?
This parameter indicates how similar a document must be to a principal document to be placed into that principal’s group.
The higher the setting, the faster the process will run, because fewer comparisons have to be made.
Textual Near Duplicate Identification
What fields does it create when you run Textual Near Duplicates Identification?
- Textual Near Duplicate Principal
Identifies the principal document with a “Yes” value.
- Textual Near Duplicate Similarity
The percent value of similarity between the near duplicate document and its principal document.
- Textual Near Duplicate Group
Acts as the identifier for a given group of textual near-duplicate documents.
Textual Near Duplicate Identification
After running TNDI what are the next recommended actions?
- Setting up a Textual Near Duplicates view for a structured analytics set
- Assessing similarities and differences with Document Compare
- Viewing the Textual Near Duplicates Summary
- Viewing Near Dup Groups in related items pane
Textual Near Duplicate Identification
How do you set up a view for TNDI?
Create new view
In the OTHER tab
GROUP BY the Destination Textual Near Duplicate Group
Apply the condition:
Textual Near Duplicate Group: is set
Use these output fields for the view:
- Textual Near Duplicate Principal
- Textual Near Duplicate Similarity
- Textual Near Duplicate Group
You now have a view that shows all TNDI documents in their groups
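As a rough analogy for what this view does, the sketch below filters a list of documents to those where the group field is set and then groups them by that field. The records, IDs and values are hypothetical sample data, and the code only mimics the view's logic; it does not call any Relativity API.

```python
# Hypothetical sample records using the three TNDI output fields.
documents = [
    {"id": "DOC-1", "principal": "Yes", "similarity": 100, "group": "DOC-1"},
    {"id": "DOC-2", "principal": None, "similarity": 93, "group": "DOC-1"},
    {"id": "DOC-3", "principal": None, "similarity": None, "group": None},
]


def near_duplicate_view(docs):
    """Keep docs where the group field is set, then group them by that field."""
    view = {}
    for doc in docs:
        if doc["group"]:                      # condition: group "is set"
            view.setdefault(doc["group"], []).append(doc["id"])
    return view


print(near_duplicate_view(documents))  # {'DOC-1': ['DOC-1', 'DOC-2']}
```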
Textual Near Duplicate Identification
Can I view Near Dup Groups in the Related Items Pane?
Yes
There is a button for it
Textual Near Duplicate Identification
Propagation For Responsiveness Field
Is this a good idea?
Nope.
It’s a bad idea.
If you mark an email within a group as not responsive, other potentially responsive emails within the group could be automatically coded as not responsive.
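The risk can be seen in a toy model of propagation. The `propagate` function, the group name and the email IDs below are hypothetical and only illustrate the idea; they are not Relativity's propagation feature.

```python
def propagate(coding, groups):
    """Copy a coded document's value to every uncoded member of its group."""
    result = dict(coding)
    for members in groups.values():
        coded = [coding[d] for d in members if d in coding]
        if coded:
            for d in members:
                result.setdefault(d, coded[0])
    return result


# One near-duplicate group; the reviewer has only looked at EMAIL-1.
groups = {"G1": ["EMAIL-1", "EMAIL-2", "EMAIL-3"]}
reviewer_coding = {"EMAIL-1": "Not Responsive"}

print(propagate(reviewer_coding, groups))
```

Here EMAIL-2 and EMAIL-3 end up coded as not responsive without anyone reading them, even though near duplicates can differ in exactly the text that matters.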
Name Normalisation
What is it?
Analyses email headers to identify aliases and the entities those aliases belong to.
Entities are merged with those created by Legal Hold, Processing or Case Dynamics
Name Normalisation
How does it work?
Creates aliases the first time it sees an email address or name. Then associates the aliases with the appropriate entity.
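A minimal sketch of the alias-to-entity idea is shown below. The matching rule (reuse an entity when a display name or address has been seen before) is an assumption made for illustration; the source does not describe how Relativity decides which entity an alias belongs to. The `normalise` function and the sample headers are hypothetical.

```python
from email.utils import parseaddr


def normalise(headers):
    """Map each alias (display name or address) to an entity name."""
    alias_to_entity = {}
    for header in headers:
        name, address = parseaddr(header)
        aliases = [a for a in (name, address.lower()) if a]
        # Reuse the entity of any alias seen before (assumed rule).
        entity = next(
            (alias_to_entity[a] for a in aliases if a in alias_to_entity), None
        )
        if entity is None:
            entity = name or address.lower()
        for alias in aliases:
            # An alias is created only the first time it is seen.
            alias_to_entity.setdefault(alias, entity)
    return alias_to_entity


headers = [
    "Jane Doe <jdoe@example.com>",
    "jdoe@example.com",
    "Jane Doe <jane.doe@example.org>",
]

print(normalise(headers))
```

All three aliases in this example resolve to the single entity "Jane Doe".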
Name Normalisation
Should you run Name Normalisation in the same Structured Analytics set as email threading or Textual Near Duplicate Identification?
No!
Best to run it in its own SA set for maximum flexibility. This makes it easier to make modifications if needed in future.
Name Normalisation
What fields are needed to run Name Normalisation?
Must have at least one From field and one other email header field such as To, CC or BCC, Subject or Date Sent.
Name Normalisation
If you don’t add alias values prior to running Name Normalisation, can you add them after?
Yes!
But you have to use the Merge mass operation to consolidate duplicate entities.
Cluster Visualisation
What is it?
Renders your cluster data as an interactive map, giving you a quick overview of your cluster sets and letting you drill into each cluster set to view subclusters and conceptually related clusters.
Clustering
What is it?
Analytics uses clustering to create groups of conceptually similar documents.
Unlike categorization, clustering doesn’t require much user input.
Concept Searching
What is it?
Concept Searching finds information without a precisely phrased query by applying a block of text to find docs that have similar conceptual content.
Concept Searching
How do you run a concept search or a “Find Similar Documents” search?
In the viewer, right-click on some text and choose Conceptual Search.
Conceptual Search uses the highlighted text.
Find Similar Documents uses the entire document.