Processing Flashcards
Goals of Processing
- Discern what data is found in a certain source
- Record all item-level metadata prior to processing
- Enable defensible reduction of data
Basic Processing Workflow 1/2
- Create new custodian entries
- Create new password bank entries
- Create a new processing profile to specify settings
- Create a new processing set that uses the profile, and add processing data sources to the saved set
Basic Processing Workflow 2/2
- Inventory the files located in the data sources
- Apply filters to inventoried files to narrow down the data sources
- Run reports to gauge how much you’ve narrowed down the set
- Discover the inventoried and filtered files, then publish to the workspace
What’s a Processing Profile
An object that stores the numbering, deNIST, extraction, and deduplication settings that the processing engine refers to when publishing documents in each data source
Creating/Editing a Processing Profile
- Go to the Processing Profile tab
- Click “New Processing Profile”
- Complete/modify fields
- Click Save
Fields - High Level
Name
Numbering Settings
Level Numbering
Inventory/Discovery Settings
Extraction Settings
Deduplication Settings
Publish Settings
Processing Profile Fields - Numbering Settings
Default Doc Number Prefix - can be overridden by a prefix on the Custodian field
Numbering Type:
Auto Numbering - uses the next available number for the prefix
Define Start Number - if the number is already taken, moves to the next available
Default Start Number
Number of Digits (range is 1 to 10)
Parent/Child Numbering
Suffix Always (child appended to parent with delimiter)
Continuous Always (next control number in sequence)
Continuous, Suffix on Retry
Delimiter - hyphen, period, or underscore (between parent and child numbers)
Level Numbering (format PPP.BBBB.FFFF.NNNN at the document level; see the sketch below)
Number of Digits for Level 2 (box number), Level 3 (folder number), and Level 4 (document number) [Level numbering cannot be used with Quick-Create Sets and cannot be changed upon publish, retry, or republish]
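To see how level numbering assembles a control number, here is a minimal Python sketch (the function name and padding defaults are illustrative assumptions, not Relativity's internal code):

```python
def level_number(prefix, box, folder, doc, box_digits=4, folder_digits=4, doc_digits=4):
    """Build a level-numbered control number in PPP.BBBB.FFFF.NNNN form."""
    return f"{prefix}.{box:0{box_digits}d}.{folder:0{folder_digits}d}.{doc:0{doc_digits}d}"

# Prefix "REL", box 1, folder 12, document 345 -> "REL.0001.0012.0345"
print(level_number("REL", 1, 12, 345))
```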
Processing Profile Fields - Inventory/Discovery Settings
DeNIST - Y/N
DeNIST Mode - All Files OR Do not break parent/child groups
Default OCR languages
Default time zone
Include/Exclude - Y/N (whether the listed file extensions are included or excluded)
Mode - All Files OR Do not break parent/child groups
File Extensions - list of extensions, no leading period, separated by hard returns (see the sketch below)
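As a rough illustration of how an include/exclude extension list behaves, here is a Python sketch (function and variable names are made up for illustration; this is not Relativity's filtering engine):

```python
def filter_by_extension(files, extensions, mode="include"):
    """Keep or drop files based on an extension list (extensions listed without a leading period)."""
    wanted = {ext.lower() for ext in extensions}

    def matches(name):
        return name.rsplit(".", 1)[-1].lower() in wanted

    if mode == "include":
        return [f for f in files if matches(f)]
    return [f for f in files if not matches(f)]

files = ["memo.docx", "photo.jpg", "mail.msg"]
print(filter_by_extension(files, ["docx", "msg"], mode="include"))  # ['memo.docx', 'mail.msg']
print(filter_by_extension(files, ["jpg"], mode="exclude"))          # ['memo.docx', 'mail.msg']
```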
Processing Profile Fields - Extraction Settings
Extract children - Y/N
When extracting children, do not extract: MS Office embedded images / MS Office embedded objects / email inline images
Email Output - MSG or MHT (MHT does not require duplicative storage of attachments)
Excel Text Extraction Method - Relativity / Native / Native (failover to dtSearch) / dtSearch (failover to Native) [dtSearch is faster but doesn't support some metadata or track changes]
Excel Header/Footer Extraction - Do not extract / Extract and place at end / Extract and place inline
PowerPoint Text Extraction Method
Word Text Extraction Method
OCR - if not essential to the processing job, disabling it is recommended to reduce processing time
OCR Accuracy - High/Medium/Low
OCR Text Separator - separator between the extracted text at the top of the page and the OCR-derived text at the bottom
Processing Profile Fields - Deduplication Settings
Deduplication Method - None / Global / Custodial
Propagate Deduplication Data - Yes / No (Yes populates the following metadata fields: All Custodians, Deduped Custodians, All Paths/Locations, Deduped Paths, and Dedupe Count)
NB - deduplication applies only to parent files; it does not apply to children (see the sketch below)
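A minimal sketch of the difference between global and custodial deduplication, keyed on file hashes (a simplified assumption of how masters and duplicates are picked, not Relativity's actual algorithm):

```python
def deduplicate(files, method="global"):
    """Keep the first ('master') file per key; later files with the same key are duplicates.

    files: list of dicts like {"hash": ..., "custodian": ..., "name": ...}
    method: "global" dedupes across all custodians; "custodial" dedupes
    only within each custodian; "none" keeps everything.
    """
    if method == "none":
        return files, []
    masters, duplicates, seen = [], [], set()
    for f in files:
        key = f["hash"] if method == "global" else (f["custodian"], f["hash"])
        if key in seen:
            duplicates.append(f)
        else:
            seen.add(key)
            masters.append(f)
    return masters, duplicates

docs = [
    {"hash": "abc", "custodian": "Smith", "name": "plan.docx"},
    {"hash": "abc", "custodian": "Jones", "name": "plan.docx"},
]
print(len(deduplicate(docs, "global")[0]))     # 1 master (Jones's copy is a duplicate)
print(len(deduplicate(docs, "custodial")[0]))  # 2 masters (one per custodian)
```

Under global deduplication one master survives per hash across all custodians; under custodial deduplication each custodian keeps their own master.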
Processing Profile Fields - Publish Settings
Auto-publish set - Y/N
Default destination folder - can create a new folder
Do you want to use source folder structure? Y/N
Parent/Child Numbering Type Examples
MSG with 3 Word attachments, where Word Child 3 (password protected) contains two sub-children:
Email Parent
  Word Child 1
  Word Child 2
  Word Child 3 (password protected)
    Sub Child 1
    Sub Child 2
For Suffix Always: Sub Child 2 = REL00001.0003.0002
For Continuous Always: Sub Child 1 and 2 = the last control numbers in the set (numbered at the end of the sequence)
For Continuous, Suffix on Retry: Sub Child 2 = REL00004.0002 (see the sketch below)
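A rough Python sketch of how Suffix Always numbering could produce the example above (the recursion and helper name are assumptions for illustration, not Relativity's implementation):

```python
def assign_suffix_numbers(parent_number, children, delimiter=".", digits=4):
    """Append each child's position to its parent's control number, recursively."""
    numbered = {}
    for i, child in enumerate(children, start=1):
        number = f"{parent_number}{delimiter}{i:0{digits}d}"
        numbered[child["name"]] = number
        numbered.update(assign_suffix_numbers(number, child.get("children", []), delimiter, digits))
    return numbered

family = [
    {"name": "Word Child 1"},
    {"name": "Word Child 2"},
    {"name": "Word Child 3", "children": [{"name": "Sub Child 1"}, {"name": "Sub Child 2"}]},
]
print(assign_suffix_numbers("REL00001", family)["Sub Child 2"])  # REL00001.0003.0002
```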
Prioritizing Publishing Speed Special Considerations
Deduplication Method = None
Create Source Folder Structure = No
Suffix Special Considerations
Secondary levels of documents have delimiter + 4 digits
If a file is unpublished (e.g., it errored) and:
Continuous Always is the numbering option, Relativity will not add a suffix
Suffix Always is the numbering option, Relativity will add a suffix
Continuous, Suffix on Retry is the numbering option, Relativity will add a suffix
It is possible to have both suffixed and non-suffixed children in case of an error
dtSearch Special Considerations
Faster, but does not populate:
Excel: Track Changes in extracted text
Word: Has Hidden Data in metadata field / Track Changes in metadata field
PowerPoint: Has Hidden Data / Speaker Notes
Processing Sets
An object to which you attach a processing profile and at least one data source and then use as the basis for a processing job.
A single processing set can contain multiple sources
Only one processing profile can be used per set
Can’t delete a workspace where there is an in-progress inventory/discovery/publish job in queue
Don’t add documents to a workspace and link those documents to an in-progress processing set
Relativity works within the bounds of the operating system and the programs installed on it (i.e., it can't tell whether a file was quarantined by antivirus software or deleted by a user)
Never stop Relativity services through Windows services
Using a Processing Set
Imagine that you're a litigation support specialist. The firm you work for has put you in charge of laying the groundwork to bring data owned by two specific custodians into a Relativity workspace because those custodians have just been identified as maintaining material that is potentially relevant to the case.
To do this, you need to create a new processing set using the Default profile. Once you save the set, you need to attach those two specific custodians to the set via two separate processing data sources.
You can now bring only these two custodians’ files through the processing phases of inventory, discovery, and publish.
Discovering Files
Discovery is the phase of processing in which the processing engine retrieves deeper levels of metadata not accessible during Inventory and prepares files for publishing to a workspace
Discovery Workflow
- Create a processing set
- Add data sources
- Inventory the files in the set to extract top-level metadata
- Apply filters to the inventoried data
- Run discovery on the refined data
- Publish the discovered files to the workspace
Monitoring Discovery Status
Monitor progress through the “Processing Set Status” display on the Processing Set layout
Discovery Status - Items
# of data sources
Inventory | Files Inventoried - # of files submitted
Inventory | Filtered Inventory - # of files excluded
Discover | Files Discovered - # of files discovered
Discover | Files with Extracted Text - # of files across all data sources that have had text extracted
Errors - [Unresolvable/Available to Retry/In Queue]
Publishing Files Overview
The step that loads processed data into the environment so reviewers can access the files
Publish:
- Applies all settings you specified on the profile to the documents you bring into the workspace
- Determines which is the master document and master custodian and which are duplicates
- Populates the ‘All Custodians’, ‘Other Sources’, and other fields with data (see the sketch below)
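A simplified sketch of how duplicate metadata could be rolled up onto the master document during publish (the field names come from the profile settings above, but the update logic itself is an illustrative assumption, not Relativity's actual behavior):

```python
from collections import defaultdict

def propagate_dedupe_metadata(masters, duplicates):
    """Attach All Custodians and Dedupe Count to each master, based on its duplicates."""
    by_hash = {m["hash"]: m for m in masters}
    extra_custodians = defaultdict(set)
    dupe_counts = defaultdict(int)
    for d in duplicates:
        extra_custodians[d["hash"]].add(d["custodian"])
        dupe_counts[d["hash"]] += 1
    for h, master in by_hash.items():
        master["All Custodians"] = sorted({master["custodian"]} | extra_custodians[h])
        master["Dedupe Count"] = dupe_counts[h]
    return masters

masters = [{"hash": "abc", "custodian": "Smith", "name": "plan.docx"}]
dupes = [{"hash": "abc", "custodian": "Jones", "name": "plan.docx"}]
print(propagate_dedupe_metadata(masters, dupes)[0]["All Custodians"])  # ['Jones', 'Smith']
```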
Publishing Files Guidelines
- If you use both the RDC and processing to bring data into a workspace, the processing engine won't deduplicate against files brought in through the RDC (it doesn't recognize RDC-imported data)
- Publish includes three distinct steps: deduplication document ID creation, master document publish, and overlay of deduplication metadata (multiple processing sets can publish at the same time)
Running File Publish
Click “Publish Files” (if Auto-publish was disabled on the profile)
NB - if documents don't have an actual date, Relativity will provide null values for the Created, Last Accessed, Last Modified, and Primary date fields
A confirmation message about the job appears - click Publish to proceed