Processing Flashcards

1
Q

Goals of Processing

A

Discern what data is found in a certain source
Record all item-level metadata prior to processing
Enable defensible reduction of data

2
Q

Basic Processing Workflow 1/2

A
  1. Create new custodian entries
  2. New password bank entries
  3. New processing profile to specify settings
  4. New processing set that uses profile, and add processing data sources to saved processing set
3
Q

Basic Processing Workflow 2/2

A
  1. Inventory the files located in the data sources
  2. Apply filters to inventoried files to narrow down the data sources
  3. Run reports to gauge how much you’ve narrowed down the set
  4. Discover the inventoried and filtered files, then publish to the workspace
4
Q

What’s a Processing Profile?

A

An object that stores the numbering, deNIST, extraction and de-duplication settings that the processing engine refers to when publishing documents in each data source

5
Q

Creating/Editing a Processing Profile

A
  1. Go to the Processing Profile tab
  2. Click “New Processing Profile”
  3. Complete/modify fields
  4. Click Save
6
Q

Fields - High Level

A
Name
Numbering Settings
Level Numbering
Inventory/Discovery Settings
Extraction Settings
Deduplication Settings
Publish Settings
7
Q

Processing Profile Fields - Numbering Settings

A

Default Doc Number Prefix - can be overruled by prefix on Custodian field
Numbering Type:
Auto Numbering - next available number of prefix
Define Start Number - if number already taken, moves to next available
Default Start Number
Number of Digits (Range is 1 to 10)
Parent/Child Numbering
Suffix Always (child appended to parent with delimiter)
Continuous Always (next control number in sequence)
Continuous, Suffix on Retry
Delimiter - hyphen, period, underscore (between parent and child)
Level Numbering (format PPP.BBBB.FFFF.NNNN, at document level)
Number of Digits: Level 2 (box number), Level 3 (folder number), Level 4 (document number) [Level numbering cannot be used with Quick-Create Sets and cannot be changed upon publish, retry, or republish]
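
A minimal sketch of how these numbering settings could combine, assuming numbers are zero-padded to the configured digit count. The function names are hypothetical and are not part of Relativity; the card only gives the 1 to 10 digit range, the "move to next available" rule, and the PPP.BBBB.FFFF.NNNN format.

```python
def format_control_number(prefix: str, number: int, digits: int) -> str:
    """Zero-pad `number` to `digits` places and prepend the prefix (assumed padding)."""
    if not 1 <= digits <= 10:
        raise ValueError("Number of Digits must be between 1 and 10")
    return f"{prefix}{number:0{digits}d}"


def next_available(start: int, taken: set[int]) -> int:
    """Return `start`, or the next number after it that is not already taken."""
    n = start
    while n in taken:
        n += 1
    return n


def format_level_number(prefix: str, box: int, folder: int, doc: int,
                        digits: tuple[int, int, int] = (4, 4, 4)) -> str:
    """Build a PPP.BBBB.FFFF.NNNN style number: prefix, box, folder, document."""
    parts = [f"{value:0{width}d}" for value, width in zip((box, folder, doc), digits)]
    return ".".join([prefix, *parts])


print(format_control_number("REL", 1, 5))
print(next_available(3, {3, 4}))
print(format_level_number("REL", 1, 2, 15))
```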

8
Q

Processing Profile Fields - Inventory/Discover Settings

A

DeNIST - Y/N
DeNIST Mode - All Files OR Do not break parent/child groups
Default OCR languages
Default time zone
Include/Exclude - Y/N [File List of included or excluded extensions]
Mode - All Files OR Do not break parent/child groups
File Extensions [List, just file extension no period, separated by hard return]
Inclusion/Exclusion
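
The File Extensions format (one extension per line, no period) can be illustrated with a small sketch. The helper names are hypothetical, and the lower-casing and de-duplication are assumptions added for the example, not documented Relativity behavior.

```python
def normalize_extension_list(raw: str) -> list[str]:
    """Turn pasted text into one extension per entry, without leading periods."""
    extensions: list[str] = []
    for line in raw.splitlines():
        ext = line.strip().lstrip(".").lower()  # lower-casing is an assumption
        if ext and ext not in extensions:
            extensions.append(ext)
    return extensions


def is_selected(filename: str, extensions: list[str], mode: str) -> bool:
    """Apply the list as an inclusion ('include') or exclusion ('exclude') filter."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    listed = ext in extensions
    return listed if mode == "include" else not listed


extensions = normalize_extension_list(".PDF\n docx\n\npdf")
print("\n".join(extensions))  # the hard-return separated form the field expects
```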

9
Q

Processing Profile Fields - Extraction Settings

A

Extract children - Y/N
When extracting children, do not extract: MS Office embedded images / MS Office embedded objects / email inline images
Email Output - MSG or MHT (MHT does not require duplicative storage of attachments)
Excel Text Extraction Method - Relativity / Native / Native (failover to dtSearch) / dtSearch (failover to Native) [dtSearch is faster but doesn’t support some metadata information or track changes]
Excel Header/Footer Extraction - Do not extract / Extract and place at end / Extract and place inline
PowerPoint Text Extraction Method
Word Text Extraction Method
OCR - if not essential to processing job, recommended to disable to reduce processing time
OCR Accuracy - High/Medium/Low
OCR Text Separator - separator between extracted text at top of page and text derived from OCR at the bottom
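
The "failover" options for Excel text extraction follow a try-then-fall-back pattern. The sketch below shows that pattern only; the extractor functions are stand-ins invented for illustration and do not call Relativity or dtSearch.

```python
from typing import Callable


def extract_with_failover(path: str,
                          primary: Callable[[str], str],
                          fallback: Callable[[str], str]) -> tuple[str, str]:
    """Try the primary method; if it raises, use the fallback. Returns (method, text)."""
    try:
        return primary.__name__, primary(path)
    except Exception:
        return fallback.__name__, fallback(path)


def native_extract(path: str) -> str:
    # Stand-in that always fails, to demonstrate the failover path.
    raise RuntimeError(f"native extraction failed for {path}")


def dtsearch_extract(path: str) -> str:
    # Stand-in that always succeeds.
    return f"text of {path}"


# Mirrors the "Native (failover to dtSearch)" option.
result = extract_with_failover("book.xlsx", native_extract, dtsearch_extract)
print(result)
```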

10
Q

Processing Profile Fields - Deduplication Settings

A

Deduplication Method - None / Global / Custodial
Propagate Deduplication Data - Yes / No (Yes to have metadata fields populated out of the following: All Custodians, Deduped Custodians, All Paths/Locations, Deduped Paths, and Dedupe Count)

NB - de-duplication only applies to parent files, it doesn’t apply to children
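
A rough sketch of how the three deduplication methods differ, assuming Global compares hashes across all custodians and Custodial compares only within one custodian. The `ParentFile` type and `find_duplicates` function are hypothetical; only parent files are modeled, in line with the note above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentFile:
    name: str
    custodian: str
    file_hash: str


def find_duplicates(files: list[ParentFile], method: str) -> list[str]:
    """Return names of files treated as duplicates under the given method."""
    if method == "None":
        return []
    seen = set()
    duplicates = []
    for f in files:
        # Global: key on the hash alone. Custodial: key on custodian plus hash.
        key = f.file_hash if method == "Global" else (f.custodian, f.file_hash)
        if key in seen:
            duplicates.append(f.name)
        else:
            seen.add(key)  # first occurrence is kept as the master
    return duplicates


files = [
    ParentFile("a.msg", "Alice", "h1"),
    ParentFile("b.msg", "Bob", "h1"),
    ParentFile("c.msg", "Alice", "h1"),
]
for method in ("None", "Global", "Custodial"):
    print(method, find_duplicates(files, method))
```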

11
Q

Processing Profile Fields - Publish Settings

A

Auto-publish set - Y/N
Default destination folder - can create a new folder
Do you want to use source folder structure? Y/N

12
Q

Parent/Child Numbering Type Examples

A
MSG w/ 3 Word docs:
Email Parent
      Word Child 1
      Word Child 2
      Word Child 3 (password protected)
            Sub Child 1
            Sub Child 2

For Suffix Always: Sub Child 2 = REL00001.0003.0002
For Continuous: Sub Child 1 and 2 = last REL0000 numbers in the set (the end)
For Suffix on Retry: Sub Child 2 = REL00004.0002
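
The Suffix Always case can be reproduced with a short recursive sketch, assuming a "." delimiter, 4-digit suffixes, and a parent number of REL00001 as in the example above. The `Doc` type and function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    children: list["Doc"] = field(default_factory=list)


def number_suffix_always(doc: Doc, number: str, delimiter: str = ".",
                         suffix_digits: int = 4) -> dict[str, str]:
    """Assign `number` to doc, then parent number + delimiter + index to each child."""
    numbers = {doc.name: number}
    for index, child in enumerate(doc.children, start=1):
        child_number = f"{number}{delimiter}{index:0{suffix_digits}d}"
        numbers.update(number_suffix_always(child, child_number, delimiter, suffix_digits))
    return numbers


email = Doc("Email Parent", [
    Doc("Word Child 1"),
    Doc("Word Child 2"),
    Doc("Word Child 3", [Doc("Sub Child 1"), Doc("Sub Child 2")]),
])
numbers = number_suffix_always(email, "REL00001")
for name, number in numbers.items():
    print(f"{name}: {number}")
```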

13
Q

Prioritizing Publishing Speed Special Considerations

A

Deduplication Method = None

Create Source Folder Structure = No

14
Q

Suffix Special Considerations

A

Secondary levels of documents have delimiter + 4 digits
If a file is unpublished and the numbering option is:
Continuous Always: Rel will not add a suffix
Suffix Always: Rel will add a suffix
Continuous, Suffix on Retry: Rel will add a suffix
Possible to have suffix/non-suffixed children in case of error

15
Q

dtSearch Special Considerations

A

Faster, but does not populate:
Excel: Track Changes in extracted text
Word: Has Hidden Data in metadata field / Track Changes in metadata field
PowerPoint: Has Hidden Data / Speaker Notes

16
Q

Processing Sets

A

An object to which you attach a processing profile and at least one data source and then use as the basis for a processing job.

A single processing set can contain multiple sources
Only one processing profile can be used
Can’t delete a workspace where there is an in-progress inventory/discovery/publish job in queue
Don’t add documents to a workspace and link those documents to an in-progress processing set
Rel works within the bounds of the operating system and programs installed on it (i.e. can’t tell if file was quarantined by anti-virus or deleted by user)
Never stop Relativity services through Windows services

17
Q

Using a Processing Set

A

Imagine that you’re a litigation support specialist. The firm you work for has put you in charge of setting the groundwork to bring data owned by two specific custodians into a Relativity workspace because those custodians have just been identified as maintaining material that is potentially relevant to the case.

To do this, you need to create a new processing set using the Default profile. Once you save the set, you need to attach those two specific custodians to the set via two separate processing data sources.

You can now bring only these two custodians’ files through the processing phases of inventory, discovery, and publish.

18
Q

Discovering Files

A

Discovery is the phase of processing in which the processing engine retrieves deeper levels of metadata not accessible during Inventory and prepares files for publishing to a workspace

19
Q

Discovery Workflow

A
  1. Create a processing set
  2. Add data sources
  3. Inventory the files in the set to extract top-level metadata
  4. Apply filters to the inventoried data
  5. Run discovery on the refined data (!!!!)
  6. Publish the discovered files to the workspace
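
The key point of step 5 is that discovery runs only on what survives the inventory filters. A toy sketch of that ordering follows, with a hypothetical `keep` predicate standing in for whatever filters are applied; it models counts only and does no real processing.

```python
from typing import Callable, Iterable


def inventory_filter_discover(files: Iterable[str],
                              keep: Callable[[str], bool]) -> dict:
    """Inventory everything, filter, then discover only the refined set."""
    inventoried = list(files)                       # step 3: inventory
    refined = [f for f in inventoried if keep(f)]   # step 4: apply filters
    discovered = list(refined)                      # step 5: discover refined data only
    return {
        "files_inventoried": len(inventoried),
        "filtered_inventory": len(inventoried) - len(refined),  # files excluded
        "files_discovered": len(discovered),
        "discovered": discovered,
    }


result = inventory_filter_discover(
    ["mail.pst", "setup.exe", "notes.docx"],
    keep=lambda name: not name.endswith(".exe"),
)
print(result)
```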
20
Q

Monitoring Discovery Status

A

Monitor progress through the “Processing Set Status” display on the Processing Set layout

21
Q

Discovery Status - Items

A

# of data sources
Inventory | Files Inventoried - # of files submitted
Inventory | Filtered Inventory - # of files excluded
Discover | Files Discovered - # of files discovered
Discover | Files with Extracted Text - # of files across all data sources that have had text extracted
Errors - [Unresolvable/Available to Retry/In Queue]

22
Q

Publishing Files Overview

A

The step that loads processed data into the environment so reviewers can access the files

Publish:

  1. Applies all settings you specified on the profile to the documents you bring into the workspace
  2. Determines which is the master document and master custodian and which are duplicates
  3. Populates the ‘All Custodians’, ‘Other Sources’, and other fields with data
23
Q

Publishing Files Guidelines

A
  1. If using both RDC and Processing to bring data into a workspace, the processing engine won’t de-duplicate against files brought in through RDC (doesn’t recognize RDC-imported data)
  2. Publish includes 3 distinct steps: deduplication document ID creation, master document publish, and overlaying deduplication metadata (multiple processing sets can be publishing at the same time)
24
Q

Running File Publish

A

Click “Publish Files” (if you disabled Auto-publish on profile)

NB - if documents don’t have an actual date, Rel will provide null values for Created, Last Accessed, Last Modified, Primary

Get a confirmation message about job - click Publish to proceed

25
Q

Publishing Considerations

A
  1. Control numbers are assigned from top of directory down
  2. 3 distinct steps: deduplication document ID creation, master document publish, overlaying dedup metadata
  3. After published, don’t change Control Number value
  4. If multiple data sources, Rel starts second source as soon as first set reaches DeDuplication and Document ID generation stage
  5. Never disable a worker while it’s completing a publish job
  6. Publish option is available even after publish is complete (can republish data sources)
  7. If you’ve arranged for auto-publish, publish starts as soon as discovery is complete, even if errors occur during discovery
  8. Once you publish files, you are unable to delete or edit the data sources containing those files or change de-duplication method
  9. When you delete a document, Rel automatically recalculates deduplication and publishes a new document to replace the deleted one
  10. If you arrange to copy source files to Rel file share, Rel no longer needs access once you’ve published them (don’t need to keep them in the file share)
  11. If DeNIST is “YES” on the profile but Invariant database table is empty for DeNIST field, can’t publish
  12. Publish is a distributed process broken up into separate jobs, which leads to more stability by removing this single point of failure
26
Q

Publish Process

A
  1. Click Publish Files
  2. Console event handler checks to make sure set is valid and ready
  3. Event handler inserts all data sources on processing set into the processing set queue
  4. Data sources wait in queue to be picked up by an agent
  5. Processing set manager agent picks up each data source based on its order, password bank entries are synced, and agent submits each data source as an individual publish job to processing engine
  6. Processing engine publishes files to the workspace. Rel updates the reports to include all applicable data.
  7. Any errors are logged in the errors tabs
  8. Set up review project on documents published to workspace
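
Steps 3 to 5 describe a queue: every data source on the set is queued, then picked up in order and submitted as its own publish job. A simplified sketch of that flow follows; the dictionary keys and the `submit_job` callback are hypothetical, and password bank syncing and error logging are left out.

```python
from collections import deque
from typing import Callable


def publish_processing_set(data_sources: list[dict],
                           submit_job: Callable[[str], str]) -> list[str]:
    """Queue all data sources, then submit each one as an individual publish job."""
    # Event handler step: insert every data source into the queue, by its order.
    queue = deque(sorted(data_sources, key=lambda source: source["order"]))
    submitted = []
    # Agent step: pick up each data source in turn and submit one job per source.
    while queue:
        source = queue.popleft()
        submitted.append(submit_job(source["name"]))
    return submitted


jobs = publish_processing_set(
    [{"name": "Custodian B", "order": 20}, {"name": "Custodian A", "order": 10}],
    submit_job=lambda name: f"publish:{name}",
)
print(jobs)
```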
27
Q

Monitoring Publish

A

# of data sources
Publish | Documents Published
Publish | Unpublished Files
Errors [Unresolvable/Available to Retry/In Queue]

28
Q

Canceling publishing

A

Click cancel!

  1. You can’t cancel a republish job
  2. Once the agent picks up the cancel publish job, no more errors are created
  3. If you click “Cancel” while the status is “Waiting”, can re-submit the publish job
  4. If you click “Cancel” after the job has been sent to the processing agent, the set is cancelled and it is unusable
  5. Errors resulting from a canceled job are given a canceled status and can’t be retried
  6. Once the agent picks up the cancel publish job, can’t delete or edit those data sources

NB - if publishing multiple sets with global de-duplication, will need to cancel in reverse order (e.g. cancel in 3 2 1 order).
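
The cancel rules above can be summarized as a small decision function. This is only a study aid: the status strings other than "Waiting" and the function names are assumptions, and the set IDs are assumed to reflect the order in which the sets were published.

```python
def cancel_outcome(status: str, is_republish: bool = False) -> str:
    """Summarize what happens when Cancel is clicked on a publish job."""
    if is_republish:
        return "cannot cancel a republish job"
    if status == "Waiting":
        return "canceled; the publish job can be re-submitted"
    # Job was already sent to the processing agent.
    return "canceled; the set is unusable"


def cancel_order(set_ids: list[int]) -> list[int]:
    """With global de-duplication, cancel multiple sets in reverse order."""
    return sorted(set_ids, reverse=True)


print(cancel_outcome("Waiting"))
print(cancel_outcome("Publishing"))
print(cancel_order([1, 2, 3]))
```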

29
Q

Republishing Files

A

Can republish any time once the Publish Files option is enabled again after the previous publish is complete.
Required after retrying errors if you want to see the previously errored docs in the workspace.
Will see confirmation message again

30
Q

Republishing Considerations

A
  1. All ready-to-retry errors resulting from publish job are retried
  2. Deduplication is respected
  3. When you resolve errors, Relativity performs an overlay (only one file for the republished document)
  4. Updates field mappings for files that previously returned errors
  5. Processing set may not be republished if numbering type on set profile has been changed
  6. Start numbers on processing set may not be changed
  7. Changes made to numbering type in a processing profile will not be respected after initial publishing; Data source info can’t be changed after initial publishing
31
Q

Retrying Errors after publish

A

Error files are still published to Rel with their file metadata (but neither document metadata nor extracted text is available)
For resolvable issues such as password-protected files, can retry errors after you publish files