Load Files from a Data Area for External Processing

Search Results View for a Data Set > Load from External Area

Requires Imports - Add/Edit Permissions

Note: Digital Reef now restricts import and reprocessing of data to Projects using Parsing Library V2. You can no longer import, reindex, or reprocess data in a Parsing Library V1 Project. Therefore, this operation is no longer permitted in a V1 Project.

If a user whose role has the required permissions has used the Copy to External Area option to copy the contents of an entire Data Set Search Results view to an Export Data Area for external reprocessing, the files can be loaded back into the application with the Load from External Area option once processing is complete.

Note: The Copy to and Load from External Area operations require an entire Search Results view. They do not support individual file selection.

If you perform a Load from External Area operation and the task completes but with exceptions, a WarningDetails.csv will be generated for the task. See About the CSV Warnings File for Load from External Area.

For more information about external processing, see How to Perform External Processing. To learn about the general reprocessing restrictions, see the topic How to Perform Document Reprocessing of Results. These restrictions affect whether or not external reprocessing changes are reflected or ignored.

Load from External Area Options

  • Export Data Area (required) — Select a target Export Data Area from the list of available Export Data Areas (a paged list).
  • Folder (required) — As soon as you select an Export Data Area, the appropriate Folder information appears to the right. Navigate to and select the appropriate Folder, opening a top-level folder to navigate within that Folder. If a Folder name is truncated, hover over the name and a tooltip displays the entire name.

Select one of the following three modes for the operation:

  • Reprocess Documents Only — This mode reprocesses only the selected documents in the Search Results (for example, a parent document or email), not their children. The children of any reprocessed parent documents or emails are not included in the reprocessing and are not replaced in the Index. The parent document will reflect any updates to its content (for example, changes to the content of its children). Any container file, such as a disk image or archive, selected for reprocessing with this option is always skipped.
    • Update Native Files — Updates the original native files when the files are loaded back to the system from the Export Data Area. This option places the files from the Export Data Area in the depot (in a native container) and updates the dahandle and darelativepath information to provide the updated location for the native files. The file MD5 value for the updated native files will not change (for example, to preserve de-dupe information). This option, when set, also populates the metadata field processedstatus (for the parent items) with UPDATED_ORIGINAL.
  • Reprocess Documents with Children — This mode reprocesses the selected parent-level documents and all children of the reprocessed parent documents or emails, as long as the children of those parent-level records are not present in Project Data. This option helps discover the children of documents that could not previously be processed (for example, previously damaged or encrypted files that have been fixed in a Data Set). All parent-level records whose children are present in Project Data are not reprocessed, regardless of their source Data Set (that is, they are skipped). Before attempting to reprocess a given parent-level record, remove its children records from Project Data. If you select this option, the following options apply:
    • Sync Document Children to Project Data — For reprocessed documents that reside in Project Data, synchronizes the children of those documents to ensure that Project Data is updated to reflect changes (for example, newly discovered children). You can select one or both of the following options with this option:
      • Apply Parent Tags to Children — Copies the tags associated with a parent to all of its children. In this case, the children inherit the parent's tag history (for example, the Tag Apply events).
      • Apply New Tags to Children - Select Tags — Applies one or more tags that you select to the children. Use the Select Tags button to select the tags you want in a popup. A Selected:<tags> message appears to indicate what you have selected, or a Selected: 0 tags message indicates if you have not selected any tags for this option.
    • Update Native Files — Updates the original native files when the files are loaded back to the system from the Export Data Area. This option places the files from the Export Data Area in the depot (in a native container) and updates the dahandle and darelativepath information to provide the updated location for the native files. The file MD5 value for the updated native files will not change (for example, to preserve de-dupe information). This option, when set, also populates the metadata field processedstatus (for the parent items) with UPDATED_ORIGINAL.
  • Load External OCR Text — When performing OCR processing of documents externally, use this mode to load only the corresponding content from text (.txt) files back to the system instead of replacing the original files (such as PDFs). This allows you to perform OCR processing in place. The text files you supply at the Load from location are used to update the text for the associated documents, as long as you use the dochandle for each file. The file type and basic metadata such as origdocext do not change. The parsing status and OCR-related metadata fields do change, language-related metadata fields might be populated, and the content in the Text tab of the Document Viewer is updated. Once the OCR-processed documents have been loaded back to the system, their filemd5 duplicates automatically undergo OCR processing to ensure that they have the OCR text and associated metadata.
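
Before running a Load External OCR Text operation, it can help to check the staging folder for gaps. The following Python sketch is illustrative only and is not part of the product. It assumes that each text file sits directly in one folder and is named <dochandle>.txt. That naming convention, the function name check_ocr_text_files, and the sample handles are all assumptions made for this example, so confirm the layout your environment actually expects.

```python
import tempfile
from pathlib import Path


def check_ocr_text_files(folder, dochandles):
    """Classify each handle by whether <folder>/<dochandle>.txt looks loadable.

    The <dochandle>.txt naming is an assumption for illustration only.
    """
    folder = Path(folder)
    report = {"ok": [], "missing": [], "not_a_file": []}
    for handle in dochandles:
        candidate = folder / f"{handle}.txt"  # assumed naming convention
        if not candidate.exists():
            report["missing"].append(handle)      # no text file for this document
        elif not candidate.is_file():
            report["not_a_file"].append(handle)   # a directory where a file is expected
        else:
            report["ok"].append(handle)
    return report


def demo():
    """Build a throwaway folder with made-up handles and run the check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "1001.txt").write_text("recognized text", encoding="utf-8")
        (root / "1002.txt").mkdir()  # a folder that happens to carry the expected name
        return check_ocr_text_files(root, ["1001", "1002", "1003"])


if __name__ == "__main__":
    print(demo())
```

Handles reported as missing or not_a_file are the ones most likely to appear in the warnings CSV after the load.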

Load or Cancel Options

  • Load — Click this to start the load operation. If the task completes but with exceptions, you can right-click the task in the Work Basket and download the generated WarningDetails.csv file, which identifies each error encountered (one per line), with the document handle and reason for each. The Work Basket task indicates that the documents from the source view are being reprocessed externally (for example, Externally reprocessing documents from Drill through ds1 IMPORT_ERROR for [Damaged]).

Selecting View Details for a Load from External Area task enables you to verify the options in effect for the operation. Examine the details for the load task to see entries for the selected options (separate lines for each option).

Example of Details when Load External OCR Text is selected:

Load External OCR Text    True
Scan Date                 2017-11-15-19-28-08

  • Cancel — Click this to cancel the operation.

About the CSV Warnings File for Load from External Area

If the Load from External Area operation encounters exceptions, the Work Basket task displays a Warning icon, and you can right-click and use the Download option to download a CSV (WARNING_DETAILS_REPORT.csv) that contains the following column information:

  • Document Handle
  • Reason

The following information can appear in the Reason column, which provides information about why a given document was not reprocessed:

  • Corresponding file/directory not found at external location — Reported when a document in the view has no corresponding documents at the Load from location. This is also reported if you select the Load External OCR Text option for the operation and there is no corresponding text file at the Load from location.
  • Directory encountered when expecting a file — Reported if you select the Reprocess Documents Only option and a folder is found at the Load from location instead of a file.
  • Document skipped — Reported for a document that was skipped because it could not be reprocessed. For Reprocess Documents with Children, all parent-level records whose children are present in Project Data are skipped, regardless of their source Data Set. Before attempting to reprocess a given parent-level record with children, remove its children records from Project Data. For Reprocess Documents Only, a container file such as a disk image or archive will always be skipped.
  • Too many potential files/directories at external location — Reported when a document in the view has more than one corresponding document at the Load from location.
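
For a large load, grouping the downloaded warnings by reason makes it easier to decide what to fix first. The following Python sketch is illustrative only. It assumes the CSV is comma-delimited with a header row that uses the column names Document Handle and Reason exactly as listed above. The function name group_warnings_by_reason and the sample rows are made up for this example and are not real product output.

```python
import csv
import io
from collections import defaultdict


def group_warnings_by_reason(csv_text):
    """Map each Reason value to the list of Document Handles that reported it."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Column names are assumed to match the headings described in this topic.
        grouped[row["Reason"].strip()].append(row["Document Handle"].strip())
    return dict(grouped)


# Made-up sample rows; the handles are illustrative only.
SAMPLE = """Document Handle,Reason
doc-001,Document skipped
doc-002,Corresponding file/directory not found at external location
doc-003,Document skipped
"""

if __name__ == "__main__":
    for reason, handles in group_warnings_by_reason(SAMPLE).items():
        print(f"{reason}: {len(handles)} document(s) -> {', '.join(handles)}")
```

To run this against a real download, read the file contents into a string and pass it to the function in place of SAMPLE.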