importing and exporting (med) Flashcards
what are some of the primary goals of processing?
- Discern, at an item level, exactly what data is found in a certain source.
- Record all item-level metadata as it existed prior to processing.
- Enable defensible reduction of data by selecting only items that are appropriate to move forward to review.
t or f: processing performs language identification.
false
what enables you to efficiently gather runtime diagnostic information?
the logging framework
what can you use logging for?
troubleshooting application problems when you need a very granular level of detail
why shouldn’t you set your logging to verbose when publishing documents to a workspace?
verbose logging can cause your worker to run out of resources, which can then cause your publish job to cease entirely
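example (a minimal sketch, not Relativity's logging framework): the resource cost of verbose logging can be illustrated with Python's standard logging module. The publish loop, the logger names, and the one-line-per-document pattern are assumptions made for illustration only.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts log records instead of writing them, to compare log volume."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


def publish_documents(logger, doc_count):
    # Hypothetical publish loop: one DEBUG line per document, one INFO summary.
    for i in range(doc_count):
        logger.debug("published document %d", i)
    logger.info("published %d documents", doc_count)


def records_emitted(level, doc_count=1000):
    # Use a separate, non-propagating logger per level so counts stay isolated.
    logger = logging.getLogger(f"publish_sketch_{level}")
    logger.setLevel(level)
    logger.propagate = False
    logger.handlers.clear()
    handler = CountingHandler()
    logger.addHandler(handler)
    publish_documents(logger, doc_count)
    return handler.count


verbose_count = records_emitted(logging.DEBUG)  # verbose: every document is logged
normal_count = records_emitted(logging.INFO)    # normal: only the summary is logged
print(verbose_count, normal_count)
```

at the verbose level the number of records grows with the number of documents published, which is the kind of growth that can exhaust a worker on a large publish job.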
what is a processing profile?
object that stores the numbering, deNIST, extraction and deduplication settings that the processing engine refers to when publishing documents in each data source that you attach to your processing set
what happens if you delete a processing profile that is associated with a processing set that you’ve already started?
the in-progress processing job will continue with the original profile settings you applied when the job was submitted
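example (a sketch under stated assumptions): one way to picture this behavior is a job that copies the profile settings at submit time. The ProcessingJob class, the profiles dictionary, and the setting names are hypothetical and do not reflect how Relativity stores profiles internally.

```python
import copy


class ProcessingJob:
    """Hypothetical job that snapshots its profile settings when submitted."""

    def __init__(self, profile_settings):
        # Deep copy at submit time, so later edits to (or deletion of)
        # the profile cannot change what the running job uses.
        self.settings = copy.deepcopy(profile_settings)


# Hypothetical profile store; the keys are illustrative only.
profiles = {"default": {"deduplication": "Global", "denist": True}}

job = ProcessingJob(profiles["default"])
del profiles["default"]  # the profile is deleted while the job is in progress

print(job.settings)
```

the job still holds its own copy of the settings after the profile is gone.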
why won’t Relativity re-extract text for a re-discovered file (unless an extraction error occurs)?
because processing always refers to the original master document and the original text stored in the database
is it possible for your workspace to contain a document family that has both suffixed and non-suffixed child documents?
yes
why is the same NIST list used for all workspaces in the environment?
because it is stored on the worker manager server
why don’t you need to set the Extract children field to Yes to have the files within PST and other container files extracted and processed?
because Relativity breaks down container files by default, without the need to specify that children be extracted
what happens when you select the dtSearch Text Extraction Method?
the Excel Header/Footer Extraction field below is made unavailable for selection because dtSearch automatically extracts header and footer information and places it at the end of the text
what would happen when you process files with both the OCR and the OCR Text Separator fields enabled?
any section of a document that required OCR will include text that says “OCR from image”, and this can pollute a dtSearch index, since now it has to add text that was not originally in the document
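example (a minimal sketch): the index pollution described above can be shown with a toy term index. The separator wording is taken from the card; the real marker text may differ. The extracted_text and index_terms helpers are hypothetical, and a real dtSearch index tokenizes differently.

```python
# Wording taken from the card above; the actual marker may differ.
SEPARATOR = "OCR from image"


def extracted_text(native_text, ocr_sections, use_separator):
    """Join native text and OCR'd sections, optionally marking each OCR section."""
    parts = [native_text]
    for section in ocr_sections:
        if use_separator:
            parts.append(SEPARATOR)
        parts.append(section)
    return "\n".join(parts)


def index_terms(text):
    """Toy index: the set of lowercased whitespace-separated words."""
    return {word.lower() for word in text.split()}


native = "Quarterly report"
ocr = ["scanned signature page"]

with_separator = index_terms(extracted_text(native, ocr, True))
without_separator = index_terms(extracted_text(native, ocr, False))

# Terms that exist in the index only because of the separator.
extra_terms = with_separator - without_separator
print(sorted(extra_terms))
```

the extra terms come from the separator alone, so a search for them would hit every document that required OCR.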
what would happen if you change the deduplication method in the middle of running a processing set?
could result in blank DeDuped Custodians or DeDuped Paths fields after publish, when those fields would otherwise display deduplication information
what does it mean to globally deduplicate your documents?
arranges for documents from each processing data source to be de-duplicated against all documents in all other data sources in your workspace; there should be no exact email duplicates in the workspace after you publish
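example (a sketch under stated assumptions): global deduplication can be pictured as comparing a hash of every item against all items seen so far, across every data source. Hashing the raw bytes with SHA-256 is an assumption for illustration; Relativity computes its deduplication hashes in its own way. The global_dedupe function and the sample data are hypothetical.

```python
import hashlib


def global_dedupe(data_sources):
    """Keep the first item seen for each hash across ALL data sources."""
    seen = {}        # hash -> (source, item) of the master document
    masters = []     # items that move forward
    duplicates = []  # (source, item, master) for items dropped as duplicates
    for source_name, items in data_sources.items():
        for item_name, content in items:
            digest = hashlib.sha256(content).hexdigest()
            if digest in seen:
                duplicates.append((source_name, item_name, seen[digest]))
            else:
                seen[digest] = (source_name, item_name)
                masters.append((source_name, item_name))
    return masters, duplicates


# Hypothetical data sources, each a list of (item name, raw bytes).
sources = {
    "custodian_a": [("memo.msg", b"budget memo"), ("notes.msg", b"meeting notes")],
    "custodian_b": [("fwd_memo.msg", b"budget memo"), ("plan.msg", b"project plan")],
}

masters, duplicates = global_dedupe(sources)
print(masters)
print(duplicates)
```

the copy of the memo in the second data source is dropped because its hash matches an item already seen in the first data source.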