How to Perform External Reprocessing

Search Results View for a Data Set > right-click in tree Copy to External Area | Load from External Area

Requires Imports - Add/Edit Permissions

Note: Digital Reef now restricts import and reprocessing of data to Projects using Parsing Library V2. You can no longer import or reprocess data in a Parsing Library V1 Project. Therefore, the Load from External Area option is no longer permitted in a V1 Project.

Users in a role with the appropriate permissions can perform external reprocessing after initial import of a Data Set, if some files require additional processing using an external tool before they can provide value in the system. For example, after examining the scan report for a Data Set, or after performing a search for a parsing status, you may notice the following:

  • You have password-protected files that you need to process externally (instead of using the password-cracking options available from the Project Settings). Remember to remove the passwords from the files before loading them back to eDiscovery.
  • You have encrypted files that you need to process externally.
  • You have a ZIP file that contains password-protected or encrypted files that you need to process externally. (The entire ZIP file is marked as password-protected or encrypted in this case.)
  • You have an MSG file with an attachment that is password-protected or encrypted and needs external processing.
  • You identify files that are in a format that the system cannot process. This could be a type of archive (for example, a type of ZIP archive) that is in an unsupported format and marked with a parsing error.
  • You identify OCR Candidates that you want to process externally.
  • You have GUL files that you want to process externally. A GUL file is a Jungum Global Viewer Korean Language File by Samsung Electronics. GUL files are not supported by Digital Reef for parsing, but you can use a tool to process the files externally and convert the GUL file to a supported file format.

To help you perform this external processing, you can use document-level options from Search Results for a Data Set to copy an entire view to an Export Data Area and then later load the processed files back into the system from an Export Data Area. The following steps summarize the process:

  1. You can perform an import with or without populating Project Data as part of the import process (that is, when setting up an import with Imports > New Data Set, you can either keep the Add to Project Data checkbox cleared or check it). The external processing procedure can be performed when Project Data is populated. However, be aware of the general reprocessing restrictions, as documented in the topic How to Perform Document Reprocessing of Results. These restrictions affect whether or not external reprocessing changes are reflected or ignored.
  2. Examine the reports for a specific Data Set (or perform searches) to see if you have files that need to be processed externally.
  3. From the Organization Settings, make sure you have a Connector that supports Export, and then define the appropriate number of Export Data Areas. For example, you can define one Export Data Area for regular Exports, and one Export Data Area for handling external processing. Under your external processing Export Data Area on the system, you may want to create two Folders: one that holds the documents to be processed (for example, a Folder named tobeprocessed), and one that holds the processed documents (for example, a Folder named processed).
  4. Use the Data Set scan report or parsing status search to identify files that may need further processing (for example, files that are identified as Protected or Encrypted). An individual file such as a password-protected PDF will appear as Protected. A ZIP file with one or more protected entries will appear as Encrypted. For .GUL files, the Warnings and Errors section of the Data Set scan report will contain Unsupported File type, and you can drill through that entry to get a list of the .GUL files.
  5. For the entire Search Result (either a drill-down Search from an item in the Data Set scan report, or a Search Result from a Search of a Data Set), select the Copy to External Area option from the Documents tab toolbar. For example, when you want to perform external OCR processing, you may want to drill through the Total OCR Candidates entry in the report, select all docs in the results, and then select the Copy to External Area option.

Note: The Process > Copy to External Area operation copies an entire Search Results view to the designated Export Data Area. It does not support individual file selection. Therefore, all docs from a results view must be selected.

  1. In the Copy to Data Area dialog, select the appropriate Export Data Area, and then use the Select button to choose the Path (Folder) in which you want to place the Search Results files you want to process externally (for example, the tobeprocessed Folder). Then click Copy to start the copy.
  2. Once the copy is complete, go to the Export Data Area. You will see that the name of each file copied to the Export Data Area uses the format <dochandle>_<filename>. At this point, you can copy the files to the Folder you created as your processing Folder.
  3. Use your tool of choice to perform the processing of the file (for example, to decrypt the file or crack the password). You must preserve the dochandle in each filename, and in the name of any directory that contains the dochandle.
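The requirement to preserve the dochandle can be checked before loading. The following is a minimal sketch, not part of the product: it assumes the dochandle contains no underscore or period (the source does not state the dochandle format), and the function names and the tobeprocessed and processed folder paths are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path


def leading_handle(name: str) -> str:
    """Return the text before the first '_' or '.' in a file or directory name.

    Assumption: this leading text is the dochandle, and the dochandle itself
    contains neither '_' nor '.'.
    """
    return re.split(r"[_.]", name, maxsplit=1)[0]


def unpreserved_handles(copied_dir: Path, processed_dir: Path) -> set[str]:
    """List dochandles found in copied_dir that no entry in processed_dir carries."""
    copied = {leading_handle(p.name) for p in copied_dir.iterdir()}
    processed = {leading_handle(p.name) for p in processed_dir.iterdir()}
    return copied - processed
```

For example, calling unpreserved_handles(Path("tobeprocessed"), Path("processed")) returns the handles whose processed counterpart is missing or was renamed without its dochandle. An empty set suggests every copied document still has a matching entry to load.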

Note: Since there are no filename extension requirements when loading back the files, you must carefully manage the original versus processed files and ensure that only the appropriate processed files reside in the location used to load the files back to the system. When processing a .ZIP file externally, the .ZIP you copied appears at the Export Data Area with the naming convention <dochandle>_<filename>.zip. You must preserve the dochandle when you extract the contents of the ZIP into a directory. Before you perform the Load operation from the Export Data Area, make sure that all children of the ZIP reside in the created directory, and be sure to delete the original .ZIP file. If you are externally processing .GUL files (for example, <dochandle>.gul), your conversion tool must create a directory using that naming convention. The created directory must contain the converted file along with any children of the .GUL file.
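The ZIP handling described in this note can be sketched with Python's standard zipfile module. This is an illustration under assumptions, not a documented procedure: the function name is hypothetical, and naming the directory after the ZIP without its .zip extension (that is, <dochandle>_<filename>) is an assumption, since the source only requires that the dochandle be preserved. Note also that zipfile decrypts only legacy ZIP encryption; AES-encrypted archives need a different tool.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def extract_for_reload(zip_path: Path, load_dir: Path,
                       password: bytes | None = None) -> Path:
    """Extract a copied ZIP into a directory that keeps the dochandle.

    Assumption: the directory is named after the ZIP minus its extension,
    so '<dochandle>_<filename>.zip' becomes '<dochandle>_<filename>'.
    """
    target = load_dir / zip_path.stem
    target.mkdir(parents=True, exist_ok=False)  # fail rather than mix old and new output
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target, pwd=password)
    # Remove the original ZIP only if it sits in the folder used for the Load,
    # so that only processed content remains there.
    if zip_path.parent.resolve() == load_dir.resolve():
        zip_path.unlink()
    return target
```

Keeping the original ZIP in a separate folder (for example, tobeprocessed) and extracting into the load folder (for example, processed) avoids the deletion step entirely.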

  1. Return to the same Search Result from which you performed the Copy to External Area operation, and then select the Load from External Area option from the Documents tab toolbar. (You do not have to select all the docs you copied before clicking the Load from External Area option.)
  2. In the Load from Data Area dialog, select the appropriate Export Data Area, and then use the Select button to choose the Path (Folder) from which you want to load the files that you have processed externally (for example, the processed Folder).
  3. From the Load from Data Area dialog, select the appropriate radio button option (Reprocess documents only, Reprocess documents with children, or Load External OCR Text) (and any associated options). Then click Load to start loading the files.
  4. As part of the Load operation using Reprocess documents only or Reprocess documents with children, the system will locate the files with the naming convention in the specified Folder, then import the processed files, replacing the originally imported files with the processed file versions. The parsingstatus field for the files will be updated to reflect SUCCESS, and the origparsingstatus field will identify the parsing status originally reported after import. All metadata fields used to denote relationships (for example, parent, container, docnum, importpath, and darelativepath) will be preserved. When viewing metadata, two additional metadata fields will identify the location of the externally processed files in document storage and their associated handles, extdarelativepath and extdahandle. If you perform the Load operation with the reprocess option Reprocess documents with children, and any document has been added from a given Data Set to Project Data, externally processed documents without existing children will have their changes reflected, but those with existing children will have their changes ignored.
  5. If you perform external OCR processing and then use the Load from External Area operation with the Load External OCR Text option, the operation loads only the corresponding content from text (.txt) files back to the system instead of replacing the original files in the system (such as PDFs or Word documents). The text files you supply for the Load from Data Area operation will be used to update the text for the associated documents, as long as you use the dochandle for each file. The file type and basic metadata such as origdocext do not change. The parsing status and OCR-related metadata fields do change, language-related metadata fields might be populated, and the content in the Text tab of the Document Viewer is updated. In addition, the ocrpath field identifies the external OCR path, which includes the Data Area information and the document handle for a given OCR-processed file. Selecting View Details for the Load from External Area task in the Work Basket enables you to verify the options in effect for the operation. See Load Files from a Data Area for External Reprocessing for an example. Once the OCR-processed documents have been loaded back to the system, their filemd5 duplicates will automatically undergo OCR processing to ensure that they have the OCR text and corresponding metadata.

  6. If the Load from External Area completes but with exceptions, a WarningDetails.csv will be generated for the task, and you can right-click the task to download the file from the Work Basket. The file will identify each error encountered (one per line), with the document handle and reason why the corresponding document was not reprocessed. For more information on the reported errors, see the section on the CSV Warnings File in Load Files from a Data Area for External Reprocessing.
  7. For any files that now have content as a result of the external processing, you can view them using the Document Viewer (for example, using the HTML and Text tabs). If an archive that was previously marked as protected or encrypted is now processed successfully, files from the archive will be added to the document list as newly parsed documents.
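When a Load completes with exceptions, the downloaded WarningDetails.csv can be summarized by reason to see which documents need another pass. The sketch below is hedged: the source says only that each line carries a document handle and a reason, so the column order (handle first, reason second) and the absence of a header row are assumptions, and the function name is hypothetical.

```python
from __future__ import annotations

import csv
from collections import defaultdict
from pathlib import Path


def group_warnings(csv_path: Path) -> dict[str, list[str]]:
    """Map each reported reason to the document handles that hit it.

    Assumption: column 0 is the document handle, column 1 is the reason,
    and there is no header row. Adjust the indexes if the real file differs.
    """
    by_reason: dict[str, list[str]] = defaultdict(list)
    with csv_path.open(newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            handle, reason = row[0].strip(), row[1].strip()
            by_reason[reason].append(handle)
    return dict(by_reason)
```

The returned mapping groups handles under each reason, which makes it easier to correct one class of problem at a time before repeating the Load.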

Note: When you perform an Export, you can use the ExtReprocessed field to see if a file has been processed externally and then loaded back into a Project. Values for this field are Y or N, depending on whether the extdahandle field is present for the file.
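The Y or N rule for ExtReprocessed can be expressed as a small check over exported metadata. This is only a sketch of the stated rule, not the system's implementation: it assumes each exported document is available as a dictionary of field names to values, treats a blank extdahandle as absent, and uses hypothetical function names.

```python
from __future__ import annotations


def ext_reprocessed(metadata: dict[str, str]) -> str:
    """Return 'Y' when the extdahandle field is present, otherwise 'N'.

    Assumption: a missing key and a blank value both count as absent.
    """
    return "Y" if (metadata.get("extdahandle") or "").strip() else "N"


def reprocessed_only(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only the records that were externally processed and loaded back."""
    return [r for r in records if ext_reprocessed(r) == "Y"]
```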