Processing Flashcards
Goals of Processing
Discern what data is found in a certain source
Record all item-level metadata as it existed prior to processing
Enable defensible reduction of data
Basic Processing Workflow 1/2
- Create new custodian entries
- Create new password bank entries
- Create a new processing profile to specify settings
- Create a new processing set that uses the profile, then add processing data sources to the saved processing set
Basic Processing Workflow 2/2
- Inventory the files located in the data sources
- Apply filters to inventoried files to narrow down the data sources
- Run reports to gauge how much you’ve narrowed down the set
- Discover the inventoried and filtered files, then publish to the workspace
What’s a Processing Profile
An object that stores the numbering, DeNIST, extraction, and deduplication settings that the processing engine refers to when publishing documents in each data source
Creating/Editing a Processing Profile
- Go to the Processing Profile tab
- Click “New Processing Profile”
- Complete/modify fields
- Click Save
Fields - High Level
- Name
- Numbering Settings
- Level Numbering
- Inventory/Discover Settings
- Extraction Settings
- Deduplication Settings
- Publish Settings
Processing Profile Fields - Numbering Settings
Default Doc Number Prefix - can be overridden by the prefix on the Custodian field
Numbering Type:
Auto Numbering - next available number of prefix
Define Start Number - if number already taken, moves to next available
Default Start Number
Number of Digits (Range is 1 to 10)
Parent/Child Numbering
Suffix Always (child appended to parent with delimiter)
Continuous Always (next control number in sequence)
Continuous, Suffix on Retry
Delimiter - hyphen, period, underscore (between parent and child)
Level Numbering (format PPP.BBBB.FFFF.NNNN, at document level)
Number of Digits - Level 2 (box number), Level 3 (folder number), Level 4 (document number) [Level numbering cannot be used with Quick-Create Sets and cannot be changed upon publish, retry, or republish]
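Sketch - Level Numbering Format
The PPP.BBBB.FFFF.NNNN format can be illustrated with a small sketch. The function name `format_level_number` and its parameters are hypothetical and not part of Relativity; the code only shows how a prefix and zero-padded box, folder, and document levels might combine, assuming a period as the separator as in the format shown.

```python
def format_level_number(prefix, box, folder, doc, digits=(4, 4, 4)):
    """Build a PPP.BBBB.FFFF.NNNN style number (illustrative helper only).

    digits gives the zero-padded width of levels 2, 3, and 4.
    """
    levels = (box, folder, doc)
    for value, width in zip(levels, digits):
        # Reject values that cannot be represented in the configured width.
        if value < 0 or len(str(value)) > width:
            raise ValueError(f"{value} does not fit in {width} digits")
    parts = [str(value).zfill(width) for value, width in zip(levels, digits)]
    return ".".join([prefix] + parts)


print(format_level_number("REL", 1, 2, 15))
```

Changing `digits` to, say, `(2, 3, 4)` mimics choosing a different Number of Digits for each level.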
Processing Profile Fields - Inventory/Discover Settings
DeNIST - Y/N
DeNIST Mode - All Files OR Do not break parent/child groups
Default OCR languages
Default time zone
Include/Exclude - Y/N [File list of included or excluded extensions]
Mode - All Files OR Do not break parent/child groups
File Extensions [List, just file extension no period, separated by hard return]
Inclusion/Exclusion
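Sketch - Extension Include/Exclude List
A minimal sketch of how an extension list (one extension per line, no period) could drive an inclusion or exclusion filter. The helper names `parse_extension_list` and `keep_file` are hypothetical, and the sketch treats every file independently, so it does not model the "Do not break parent/child groups" mode.

```python
def parse_extension_list(raw):
    """Split a hard-return separated list into normalized extensions."""
    return {line.strip().lower() for line in raw.splitlines() if line.strip()}


def keep_file(filename, extensions, mode):
    """Return True if the file passes an 'include' or 'exclude' filter."""
    _, dot, ext = filename.rpartition(".")
    ext = ext.lower() if dot else ""  # no dot means no extension
    if mode == "include":
        return ext in extensions
    if mode == "exclude":
        return ext not in extensions
    raise ValueError(f"unknown mode: {mode}")


extensions = parse_extension_list("docx\nxlsx\nmsg")
files = ["memo.DOCX", "budget.xlsx", "setup.exe", "README"]
print([f for f in files if keep_file(f, extensions, "include")])
print([f for f in files if keep_file(f, extensions, "exclude")])
```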
Processing Profile Fields - Extraction Settings
Extract children - Y/N
When extracting children, do not extract: MS Office embedded images / MS Office embedded objects / email inline images
Email Output - MSG or MHT (MHT does not require duplicative storage of attachments)
Excel Text Extraction Method - Relativity / Native / Native (failover to dtSearch) / dtSearch (failover to Native) [dtSearch is faster but doesn’t support some metadata information or track changes]
Excel Header/Footer Extraction - Do not extract / Extract and place at end / Extract and place inline
PowerPoint Text Extraction Method
Word Text Extraction Method
OCR - if not essential to processing job, recommended to disable to reduce processing time
OCR Accuracy - High/Medium/Low
OCR Text Separator - separator between extracted text at top of page and text derived from OCR at the bottom
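Sketch - Failover Between Extraction Methods
The "failover" options among the text extraction methods above follow a try-one-then-the-other pattern. The sketch below is only an illustration of that pattern: `extract_with_failover`, `native_extract`, and `dtsearch_extract` are hypothetical stand-ins, and treating any raised error as the failover trigger is an assumption, since the cards do not say what triggers failover.

```python
def extract_with_failover(path, primary, fallback):
    """Return (text, method_used), falling back if the primary raises."""
    try:
        return primary(path), "primary"
    except Exception:
        # Assumption: any failure in the primary extractor triggers the fallback.
        return fallback(path), "fallback"


# Hypothetical stand-ins for the two extraction methods.
def native_extract(path):
    if path.endswith(".corrupt.xlsx"):
        raise RuntimeError("native extraction failed")
    return f"native text of {path}"


def dtsearch_extract(path):
    return f"dtSearch text of {path}"


print(extract_with_failover("budget.xlsx", native_extract, dtsearch_extract))
print(extract_with_failover("old.corrupt.xlsx", native_extract, dtsearch_extract))
```

Swapping the `primary` and `fallback` arguments corresponds to choosing dtSearch (failover to Native) instead of Native (failover to dtSearch).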
Processing Profile Fields - Deduplication Settings
Deduplication Method - None / Global / Custodial
Propagate Deduplication Data - Yes / No (Yes to have metadata fields populated out of the following: All Custodians, Deduped Custodians, All Paths/Locations, Deduped Paths, and Dedupe Count)
NB - deduplication applies only to parent files; it doesn’t apply to children
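Sketch - Global vs Custodial Deduplication
A rough sketch of how the three deduplication methods differ when applied to parent files. The record shape (`name`, `custodian`, `hash`) and the function `deduplicate` are hypothetical; the hash is assumed to be precomputed, and the sketch does not reproduce how Relativity actually hashes or tracks documents.

```python
def deduplicate(parents, method):
    """Return the parent records kept under 'none', 'global', or 'custodial'.

    Each record is a dict with 'name', 'custodian', and 'hash' keys
    (a hypothetical shape chosen for this sketch).
    """
    if method == "none":
        return list(parents)
    if method not in ("global", "custodial"):
        raise ValueError(f"unknown method: {method}")
    seen = set()
    kept = []
    for doc in parents:
        # Global compares across all custodians; custodial only within one.
        if method == "global":
            key = doc["hash"]
        else:
            key = (doc["custodian"], doc["hash"])
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


parents = [
    {"name": "a.msg", "custodian": "Alice", "hash": "h1"},
    {"name": "b.msg", "custodian": "Bob", "hash": "h1"},
    {"name": "c.msg", "custodian": "Alice", "hash": "h1"},
    {"name": "d.msg", "custodian": "Bob", "hash": "h2"},
]
for method in ("none", "global", "custodial"):
    print(method, [d["name"] for d in deduplicate(parents, method)])
```

Only parent records are passed in, mirroring the note that deduplication does not apply to children.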
Processing Profile Fields - Publish Settings
Auto-publish set - Y/N
Default destination folder - can create a new folder
Do you want to use source folder structure? Y/N
Parent/Child Numbering Type Examples
MSG w/ 3 Word docs:
Email Parent
- Word Child 1
- Word Child 2
- Word Child 3 (password protected)
  - Sub Child 1
  - Sub Child 2
For Suffix Always: Sub Child 2 = REL00001.0003.0002
For Continuous Always: Sub Child 1 and 2 = the last REL numbers in the set (numbered at the end)
For Continuous, Suffix on Retry: Sub Child 2 = REL00004.0002
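Sketch - Reproducing the Numbering Examples
The three examples above can be reproduced with a toy model. Everything here is an assumption made for illustration: the function `number_family`, the mode names, the hard-coded family, and the idea that the two sub children only appear on retry once the password-protected Word Child 3 is opened. In the Continuous Always case the set contains only this one family, so "the end of the set" is simply the next two numbers.

```python
def number_family(mode, prefix="REL", digits=5, delimiter="."):
    """Number one email family under a parent/child numbering mode."""
    modes = ("suffix_always", "continuous_always", "continuous_suffix_on_retry")
    if mode not in modes:
        raise ValueError(f"unknown mode: {mode}")

    def control(n):
        return f"{prefix}{n:0{digits}d}"

    def suffixed(base, i):
        # Secondary levels get the delimiter plus four digits.
        return f"{base}{delimiter}{i:04d}"

    numbers = {"Email Parent": control(1)}
    next_free = 2  # next unused control number in this tiny set

    # First pass: the three Word children are found with the parent.
    for i in (1, 2, 3):
        name = f"Word Child {i}"
        if mode == "suffix_always":
            numbers[name] = suffixed(numbers["Email Parent"], i)
        else:
            numbers[name] = control(next_free)
            next_free += 1

    # Retry: Word Child 3 is unlocked and its two sub children appear.
    for i in (1, 2):
        name = f"Sub Child {i}"
        if mode == "continuous_always":
            numbers[name] = control(next_free)
            next_free += 1
        else:
            numbers[name] = suffixed(numbers["Word Child 3"], i)
    return numbers


for mode in ("suffix_always", "continuous_always", "continuous_suffix_on_retry"):
    print(mode, number_family(mode)["Sub Child 2"])
```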
Prioritizing Publishing Speed Special Considerations
Deduplication Method = None
Create Source Folder Structure = No
Suffix Special Considerations
Secondary levels of documents have delimiter + 4 digits
If a file is unpublished, whether Rel adds a suffix depends on the numbering option:
- Continuous Always: Rel will not add a suffix
- Suffix Always: Rel will add a suffix
- Continuous, Suffix on Retry: Rel will add a suffix
A family can have both suffixed and non-suffixed children in case of error
dtSearch Special Considerations
Faster, but does not populate:
Excel: Track Changes in extracted text
Word: Has Hidden Data in metadata field / Track Changes in metadata field
PowerPoint: Has Hidden Data / Speaker Notes