Processing Flashcards by Terrence Laukkanen

Goals of Processing

Discern what data is found in a certain source
Record all item-level metadata prior to processing
Enable defensible reduction of data

Basic Processing Workflow 1/2

Create new custodian entries
New password bank entries
New processing profile to specify settings
New processing set that uses profile, and add processing data sources to saved processing set

Basic Processing Workflow 2/2

Inventory the files located in the data sources
Apply filters to inventoried files to narrow down the data sources
Run reports to gauge how much you’ve narrowed down the set
Discover the inventoried and filtered files, then publish to the workspace

What’s a Processing Profile?

An object that stores the numbering, deNIST, extraction and de-duplication settings that the processing engine refers to when publishing documents in each data source

Creating/Editing a Processing Profile

Go to the Processing Profile tab
Click “New Processing Profile”
Complete/modify fields
Click Save

Fields - High Level

Name
Numbering Settings
Level Numbering
Inventory/Discovery Settings
Extraction Settings
Deduplication Settings
Publish Settings

Processing Profile Fields - Numbering Settings

Default Doc Number Prefix - can be overruled by prefix on Custodian field
Numbering Type:
Auto Numbering - next available number of prefix
Define Start Number - if number already taken, moves to next available
Default Start Number
Number of Digits (Range is 1 to 10)
Parent/Child Numbering
Suffix Always (child appended to parent with delimiter)
Continuous Always (next control number in sequence)
Continuous, Suffix on Retry
Delimiter - hyphen, period, underscore (between parent and child)
Level Numbering (format PPP.BBBB.FFFF.NNNN, applied at document level)
Number of Digits - set separately for Level 2 (box number), Level 3 (folder number), and Level 4 (document number) [Level numbering cannot be used with Quick-Create Sets and cannot be changed upon publish, retry, or republish]
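
As an illustration of two of the numbering settings above, here is a minimal sketch (not Relativity code). The function names are hypothetical, and using a period between levels is an assumption taken from the PPP.BBBB.FFFF.NNNN format shown on the card.

```python
def next_available_number(start: int, taken: set) -> int:
    """Define Start Number: if the number is already taken, move to the next available one."""
    number = start
    while number in taken:
        number += 1
    return number


def level_number(prefix: str, box: int, folder: int, document: int,
                 digits=(4, 4, 4), delimiter: str = ".") -> str:
    """Level numbering: prefix, then box, folder, and document numbers, each zero-padded."""
    parts = [prefix]
    for value, width in zip((box, folder, document), digits):
        parts.append(str(value).zfill(width))
    return delimiter.join(parts)


print(next_available_number(5, {5, 6, 8}))  # 7
print(level_number("REL", 1, 2, 37))        # REL.0001.0002.0037
```

The `digits` tuple stands in for the per-level Number of Digits fields (Level 2, Level 3, Level 4).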

Processing Profile Fields - Inventory/Discover Settings

DeNIST - Y/N
DeNIST Mode - All Files OR Do not break parent/child groups
Default OCR languages
Default time zone
Include/Exclude - Y/N [file list of included or excluded extensions]
Mode - All Files OR Do not break parent/child groups
File Extensions [List, just file extension no period, separated by hard return]
Inclusion/Exclusion
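
The File Extensions list format (extension only, no period, one per line) can be illustrated with a short sketch. This is not how the processing engine is implemented. The function names and the `filter_type` values are hypothetical, and the sketch ignores the "Do not break parent/child groups" mode.

```python
from pathlib import PurePath


def parse_extension_list(text: str) -> set:
    """Read one extension per line, skipping blank lines and ignoring case."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def passes_filter(filename: str, extensions: set, filter_type: str) -> bool:
    """Return True if the file survives an 'include' or 'exclude' extension filter."""
    ext = PurePath(filename).suffix.lstrip(".").lower()
    if filter_type == "include":
        return ext in extensions
    if filter_type == "exclude":
        return ext not in extensions
    raise ValueError("filter_type must be 'include' or 'exclude'")


extensions = parse_extension_list("docx\nxlsx\n\nmsg\n")
print(passes_filter("Report.DOCX", extensions, "include"))  # True
print(passes_filter("setup.exe", extensions, "include"))    # False
print(passes_filter("setup.exe", extensions, "exclude"))    # True
```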

Processing Profile Fields - Extraction Settings

Extract children - Y/N
When extracting children, do not extract: MS Office embedded images / MS Office embedded objects / email inline images
Email Output - MSG or MHT (MHT does not require duplicative storage of attachments)
Excel Text Extraction Method - Relativity / Native / Native (failover to dtSearch) / dtSearch (failover to Native) [dtSearch is faster but doesn’t support some metadata information or track changes]
Excel Header/Footer Extraction - Do not extract / Extract and place at end / Extract and place inline
PowerPoint Text Extraction Method
Word Text Extraction Method
OCR - if not essential to processing job, recommended to disable to reduce processing time
OCR Accuracy - High/Medium/Low
OCR Text Separator - separator between extracted text at top of page and text derived from OCR at the bottom

Processing Profile Fields - Deduplication Settings

Deduplication Method - None / Global / Custodial
Propagate Deduplication Data - Yes / No (Yes to have metadata fields populated out of the following: All Custodians, Deduped Custodians, All Paths/Locations, Deduped Paths, and Dedupe Count)

NB - de-duplication only applies to parent files, it doesn’t apply to children
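
The difference between the three deduplication methods can be sketched as follows. This is a simplified illustration, not the engine's logic. The `ParentFile` class and `file_hash` field are hypothetical, and treating the first file encountered as the master is an assumption. Only parent files are modeled, since de-duplication doesn't apply to children.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentFile:
    name: str
    custodian: str
    file_hash: str  # hypothetical stand-in for whatever identifies identical content


def find_duplicates(files, method: str) -> list:
    """Return names of files treated as duplicates under None / Global / Custodial."""
    if method == "None":
        return []
    if method not in ("Global", "Custodial"):
        raise ValueError("method must be 'None', 'Global', or 'Custodial'")
    seen = set()
    duplicates = []
    for f in files:
        # Global compares across all custodians; Custodial only within one custodian.
        key = f.file_hash if method == "Global" else (f.custodian, f.file_hash)
        if key in seen:
            duplicates.append(f.name)
        else:
            seen.add(key)  # assumption: first file seen becomes the master
    return duplicates


files = [
    ParentFile("a.msg", "Smith", "h1"),
    ParentFile("b.msg", "Jones", "h1"),
    ParentFile("c.msg", "Smith", "h1"),
]
print(find_duplicates(files, "Global"))     # ['b.msg', 'c.msg']
print(find_duplicates(files, "Custodial"))  # ['c.msg']
```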

Processing Profile Fields - Publish Settings

Auto-publish set - Y/N
Default destination folder - can create a new folder
Do you want to use source folder structure? Y/N

Parent/Child Numbering Type Examples

MSG w/ 3 Word docs:
Email Parent
      Word Child 1
      Word Child 2
      Word Child 3 (password protected)
            Sub Child 1
            Sub Child 2

For Suffix Always: Sub Child 2 = REL00001.0003.0002
For Continuous: Sub Child 1 and 2 = last REL0000 numbers in the set (the end)
For Suffix on Retry: Sub Child 2 = REL00004.0002
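
The suffixed control numbers in these examples can be reproduced with a small formatting sketch. The `control_number` helper is hypothetical. The 5-digit number width and the period delimiter are assumptions read off the REL00001.0003.0002 example, and the 4-digit suffix follows the "delimiter + 4 digits" rule.

```python
def control_number(prefix: str, number: int, suffixes=(),
                   digits: int = 5, delimiter: str = ".") -> str:
    """Build a control number; each child level adds delimiter + a 4-digit suffix."""
    base = f"{prefix}{number:0{digits}d}"
    return delimiter.join([base] + [f"{s:04d}" for s in suffixes])


# Suffix Always: Sub Child 2 hangs off Word Child 3, which hangs off the email parent.
print(control_number("REL", 1, (3, 2)))  # REL00001.0003.0002

# Continuous, Suffix on Retry: Word Child 3 was numbered continuously as REL00004,
# and its sub children, found on retry, are suffixed to it.
print(control_number("REL", 4, (2,)))    # REL00004.0002
```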

Prioritizing Publishing Speed Special Considerations

Deduplication Method = None

Create Source Folder Structure = No

Suffix Special Considerations

Secondary levels of documents have delimiter + 4 digits
If a file is unpublished:
Continuous Always is the numbering option - Rel will not add a suffix
Suffix Always is the numbering option - Rel will add a suffix
Continuous, Suffix on Retry is the numbering option - Rel will add a suffix
Possible to have suffix/non-suffixed children in case of error

dtSearch Special Considerations

Faster, but does not populate:
Excel: Track Changes in extracted text
Word: Has Hidden Data in metadata field / Track Changes in metadata field
PowerPoint: Has Hidden Data / Speaker Notes

Processing Sets

An object to which you attach a processing profile and at least one data source and then use as the basis for a processing job.

A single processing set can contain multiple sources
Only one processing profile can be used
Can’t delete a workspace where there is an in-progress inventory/discovery/publish job in queue
Don’t add documents to a workspace and link those documents to an in-progress processing set
Rel works within the bounds of the operating system and programs installed on it (i.e. can’t tell if file was quarantined by anti-virus or deleted by user)
Never stop Relativity services through Windows services
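
The shape of a processing set (exactly one profile, one or more data sources) can be modeled roughly as below. This is a sketch for study purposes only. The class, field names, and `ready_for_job` check are hypothetical and are not part of any Relativity API.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingSet:
    name: str
    profile: str  # only one processing profile per set
    data_sources: list = field(default_factory=list)

    def add_data_source(self, source: str) -> None:
        """A single set can contain multiple data sources."""
        self.data_sources.append(source)

    def ready_for_job(self) -> bool:
        """A set needs a profile and at least one data source to back a processing job."""
        return bool(self.profile) and len(self.data_sources) >= 1


new_set = ProcessingSet(name="Case set", profile="Default")
print(new_set.ready_for_job())  # False
new_set.add_data_source("Custodian A source")
new_set.add_data_source("Custodian B source")
print(new_set.ready_for_job())  # True
```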

Using a Processing Set

Imagine that you’re a litigation support specialist. The firm you work for has put you in charge of setting the groundwork to bring data owned by two specific custodians into a Relativity workspace because those custodians have just been identified as maintaining material that is potentially relevant to the case.

To do this, you need to create a new processing set using the Default profile. Once you save the set, you need to attach those two specific custodians to the set via two separate processing data sources.

You can now bring only these two custodians’ files through the processing phases of inventory, discovery, and publish.

Discovering Files

Discovery is the phase of processing in which the processing engine retrieves deeper levels of metadata not accessible during Inventory and prepares files for publishing to a workspace

Discovery Workflow

Create a processing set
Add data sources
Inventory the files in the set to extract top-level metadata
Apply filters to the inventoried data
Run discovery on the refined data (!!!!)
Publish the discovered files to the workspace

Monitoring Discovery Status

Monitor progress through the “Processing Set Status” display on the Processing Set layout

Discovery Status - Items

# of data sources
Inventory | Files Inventoried - # of files submitted
Inventory | Filtered Inventory - # of files excluded
Discover | Files Discovered - # of files discovered
Discover | Files with Extracted Text - # of files across all data sources that have had text extracted
Errors - [Unresolvable/Available to Retry/In Queue]

Publishing Files Overview

The step that loads processed data into the environment so reviewers can access the files

Publish:

Applies all settings you specified on the profile to the documents you bring into the workspace
Determines which is the master document and master custodian and which are duplicates
Populates the ‘All Custodians’, ‘Other Sources’, and other fields with data

Publishing Files Guidelines

If using both RDC and Processing to bring data into a workspace, the processing engine won’t de-duplicate against files brought in through RDC (doesn’t recognize RDC-imported data)
Publish includes 3 distinct steps: deduplication document ID creation, master document publish, and overlaying deduplication metadata (it is possible for multiple processing sets to be publishing at the same time)

Running File Publish

Click “Publish Files” (if you disabled Auto-publish on profile)

NB - if documents don’t have an actual date, Rel will provide null values for Created, Last Accessed, Last Modified, Primary

Get a confirmation message about job - click Publish to proceed

Publishing Considerations

1. Control numbers are assigned from the top of the directory down
2. 3 distinct steps: deduplication document ID creation, master document publish, overlaying dedup metadata
3. After publishing, don't change the Control Number value
4. If there are multiple data sources, Rel starts the second source as soon as the first reaches the Deduplication and Document ID generation stage
5. Never disable a worker while it's completing a publish job
6. The Publish option is available even after publish is complete (can republish data sources)
7. If you've arranged for auto-publish, publish starts once discovery is complete, even if errors occur during discovery
8. Once you publish files, you are unable to delete or edit the data sources containing those files or change the de-duplication method
9. When you delete a document, Rel automatically recalculates deduplication and publishes a new document to replace the deleted one
10. If you arrange to copy source files to the Rel file share, Rel no longer needs access once you've published them (don't need to keep them in the file share)
11. If DeNIST is "Yes" on the profile but the Invariant database table is empty for the DeNIST field, you can't publish
12. Publish is a distributed process broken up into separate jobs, which leads to more stability by removing a single point of failure

Publish Process

1. Click Publish Files
2. Console event handler checks to make sure the set is valid and ready
3. Event handler inserts all data sources on the processing set into the processing set queue
4. Data sources wait in the queue to be picked up by an agent
5. Processing set manager agent picks up each data source based on its order, password bank entries are synced, and the agent submits each data source as an individual publish job to the processing engine
6. Processing engine publishes files to the workspace. Rel updates the reports to include all applicable data.
7. Any errors are logged in the errors tabs
8. Set up a review project on the documents published to the workspace

Monitoring Publish

# of data sources
Publish | Documents Published
Publish | Unpublished Files
Errors - [Unresolvable/Available to Retry/In Queue]

Canceling publishing

Click Cancel!
1. You can't cancel a republish job
2. Once the agent picks up the cancel publish job, no more errors are created
3. If you click "Cancel" while the status is "Waiting", you can re-submit the publish job
4. If you click "Cancel" after the job has been sent to the processing agent, the set is cancelled and is unusable
5. Errors resulting from a canceled job are given a canceled status and can't be retried
6. Once the agent picks up the cancel publish job, you can't delete or edit those data sources
NB - if publishing multiple sets with global de-duplication, you will need to cancel in reverse order (e.g. cancel in 3, 2, 1 order).
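
As a memory aid, the cancel rules can be written as a small decision sketch. The function names and return strings are hypothetical. Treating every status other than "Waiting" as "already sent to the processing agent" is a simplifying assumption.

```python
def cancel_outcome(status: str, is_republish: bool = False) -> str:
    """Summarize what happens when Cancel is clicked on a publish job."""
    if is_republish:
        return "cannot cancel a republish job"
    if status == "Waiting":
        return "cancelled; publish job can be re-submitted"
    # Assumption: any other status means the job already reached the processing agent.
    return "cancelled; set is unusable"


def cancel_order(sets_in_publish_order: list) -> list:
    """With global de-duplication, cancel multiple sets in reverse order."""
    return list(reversed(sets_in_publish_order))


print(cancel_outcome("Waiting"))                 # cancelled; publish job can be re-submitted
print(cancel_order(["Set 1", "Set 2", "Set 3"]))  # ['Set 3', 'Set 2', 'Set 1']
```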

Republishing Files

You can republish any time the Publish Files option is enabled, i.e. after the previous publish is complete. Republishing is required after retrying errors if you want to see the previously errored docs in the workspace. You will see the confirmation message again.

Republishing Considerations

1. All ready-to-retry errors resulting from the publish job are retried
2. Deduplication is respected
3. When you resolve errors, Relativity performs an overlay (only one file for the republished document)
4. Updates field mappings for files that previously returned errors
5. Processing set may not be republished if the numbering type on the set's profile has been changed
6. Start numbers on the processing set may not be changed
7. Changes made to the numbering type in a processing profile will not be respected after initial publishing; data source info can't be changed after initial publishing

Retrying Errors after publish

Error files are still published to Rel with their file metadata (but neither document metadata nor extracted text is available).
For resolvable issues such as password-protected files, you can retry errors after you publish files.

Processing Flashcards

(31 cards)