How to Perform a Copy to Document Storage Operation

Data Sets > Selected Data Set > right-click > Copy To Document Storage

Requires Imports - Add/Edit Permissions

Users in a role with the appropriate permissions can copy the source files from a Data Set's import location to the document storage location where documents extracted from the Data Set are stored. This frees up space on the storage associated with the import location for future imports.

The Copy to Document Storage operation supports Exclusion Options (enabled by default in the Project Index Settings or defined in a template) that enable you to exclude certain document classes and/or NIST eDocs from the copy, as follows:

  • Document Class: Disk Image – Select this checkbox if you want the Copy to Document Storage operation to exclude files with a document class of Disk Image (for example, a Logical Evidence File). Clear this checkbox to include documents with a document class of Disk Image.
  • Document Class: Archive – Select this checkbox if you want the Copy to Document Storage operation to exclude files with a document class of Archive (for example, a RAR, TAR, or ZIP/ZIPX). Clear this checkbox to include documents with a document class of Archive.
  • Document Class: Message Archive – Select this checkbox if you want the Copy to Document Storage operation to exclude files with a document class of Message Archive (for example, mail containers such as a PST or NSF). Clear this checkbox to include documents with a document class of Message Archive.
  • NIST eDoc – Select this checkbox if you want the Copy to Document Storage operation to exclude NIST eDoc files. Clear this checkbox to include NIST eDoc files.

By default, a Copy to Document Storage operation excludes these document classes as well as NIST eDocs (for example, if you use the settings from a new Index template in an existing Organization, or the default Index settings in a new Organization).
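
The checkbox behavior described above can be pictured with a short sketch. This is an illustration only, not the product's implementation: the ExclusionOptions class, its field names, and the exclusion_reason function are hypothetical, and the order in which the checks are applied is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExclusionOptions:
    """Hypothetical model of the four checkboxes; True means 'selected' (exclude)."""
    disk_image: bool = True        # Document Class: Disk Image
    archive: bool = True           # Document Class: Archive
    message_archive: bool = True   # Document Class: Message Archive
    nist_edoc: bool = True         # NIST eDoc


def exclusion_reason(document_class: str, is_nist_edoc: bool,
                     options: ExclusionOptions) -> Optional[str]:
    """Return the matching Exclusion Option, or None if the file would be copied.

    Checking NIST eDoc first is an assumption made for this sketch.
    """
    if is_nist_edoc and options.nist_edoc:
        return "NIST eDoc"
    selected = {
        "Disk Image": options.disk_image,
        "Archive": options.archive,
        "Message Archive": options.message_archive,
    }
    if selected.get(document_class, False):
        return f"Document Class: {document_class}"
    return None


defaults = ExclusionOptions()                    # all four options selected
keep_archives = ExclusionOptions(archive=False)  # Archive checkbox cleared

print(exclusion_reason("Archive", False, defaults))       # excluded by default
print(exclusion_reason("Archive", False, keep_archives))  # None: included in the copy
```

Clearing a checkbox corresponds to setting the matching field to False, which causes files of that class to be included in the copy.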

You can perform a Copy to Document Storage as follows:

  • As an option for an eligible Data Set after import, where you can either use the current Copy to Document Storage Exclusion Options or change them. You do not have to use any of the Exclusion Options if you want to include the associated document classes and NIST eDocs. To copy source files to document storage after import, use the Copy to Document Storage right-click option for an eligible Data Set or the Copy to Document Storage toolbar option from the Imports Summary.
  • As part of the Index Settings applied for any Import, which means you enable the Automatic Copy to Document Storage operation in the Project Settings, an Organization Index Settings template, or a System-level Index template. Each import then uses the current Copy to Document Storage Exclusion Options. Performing this copy as part of import will impact import time.
  • As a Data Set option for a specific import, in which you use the current Copy to Document Storage Exclusion Options in effect for the Project.

From Imports, you can select a Data Set that has been copied to document storage and select View Configuration to view which Copy to Document Storage Exclusion Options were in effect.

Storage for data used by Organizations, including document storage, is selected and managed by Digital Reef and the IT Administrator; for more information, see View and Manage System Storage. Source files copied to document storage are always copied to the storage location and folder in which the Data Set is stored, and exporting or deleting the Project also exports or deletes the source files in document storage.

Important Notes:

  • The Copy to Document Storage operation can preserve the staging used by certain file types (as long as their associated document classes are not excluded by the operation), such as Forensic Image file types, multi-part RAR files, and Bloomberg Message Dump files.

  • For a standard import of a Data Set from a Data Area or for a shared Data Set added to the V2 Project, you can set the Copy to Document Storage option to Enabled during import configuration if you want the import to copy the source files from the import location to the storage location and folder in which the Data Set is stored. This enables you to free storage associated with the import location for future imports. This option is Disabled by default and is not available for a Load File import.
  • If this option is unavailable for a selected Data Set, the Data Set has already been copied to document storage.
  • When you issue this option, a copy task (Copy to Document Storage <Data Set Name>) appears in the Work Basket. If necessary, you can cancel this task from the Work Basket.
  • When you are examining how many documents are reported in the Work Basket task, note the following:

    • When you exclude a container as part of a document class and that container has hooked children (for example, a Disk Image), any children in the container are copied to document storage as long as they are not subject to the exclusions. For example, if you had an import that consisted of a Disk Image with 2 PSTs, 2 NIST eDocs, and 5 non-NIST eDocs, and you performed a Copy to Document Storage after import using all 4 default Exclusion Options, only the 5 non-NIST eDocs would be copied out of the Disk Image. In general, you should be able to view the Source Files count in the Scan Report for a Data Set, subtract any excluded files, add any directories, and add any eligible files hooked out of the excluded files (such as files from a Disk Image), and that should be the number of documents copied.
    • If you see a Warning icon in a completed Copy to Document Storage task, one or more files were either excluded from the copy or failed to copy. You can then download a CSV file that identifies the reason why these files were not copied. (See the next section for more information on the CSV file.) Since the Data Set is considered partially copied, you can select the Copy to Document Storage option from the Imports Summary again to retry the copy and potentially copy more of the files that were previously excluded or failed to copy. In this case, the Work Basket task for the retry reports only the number of documents that needed to be copied, not everything previously copied.
  • Some document metadata fields (dahandle and darelativepath) will be updated to reflect the new location of the documents after the Copy to Document Storage operation.
  • By default, the Copy to Document Storage operation uses all available processing Cores when accommodating a large Data Set that requires a distribution of work over multiple Analytic Engine resources. In this case, the software compresses all source data during the copy operation (that is, it utilizes containers for the appropriate file types). If you have the appropriate permissions, you can control the use of all available Cores for the Job using Job Management.
  • When viewing Connector information from Imports, if a Copy to Document Storage operation successfully copied everything in the Data Set, you will no longer see the Connector or Data Area information displayed for the Data Set in the Imports Summary, since the Data Area import location is no longer associated with that Data Set.
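
The rule of thumb given above for the document count reported in the Work Basket task (Source Files count, minus excluded files, plus directories, plus eligible files hooked out of excluded containers) can be written as simple arithmetic. The function name and the sample numbers below are hypothetical and for illustration only; take the actual values from the Scan Report and your own Data Set.

```python
def expected_copied_count(source_files: int,
                          excluded_files: int,
                          directories: int,
                          hooked_eligible_files: int) -> int:
    """Estimate the number of documents a Copy to Document Storage task reports.

    source_files          -- Source Files count from the Scan Report
    excluded_files        -- files dropped by the Exclusion Options
    directories           -- directories in the Data Set
    hooked_eligible_files -- eligible files hooked out of excluded containers
                             (for example, files from a Disk Image)
    """
    return source_files - excluded_files + directories + hooked_eligible_files


# Made-up numbers: 120 source files, 4 excluded, 10 directories, 25 hooked files.
estimate = expected_copied_count(120, 4, 10, 25)
print(estimate)  # 151
```

If the count in the Work Basket task differs from this estimate, the CSV warnings file described in the next section is the place to look for files that failed to copy.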

About the CSV Warnings File for Copy to Document Storage

If the Copy to Document Storage operation performs a partial copy, whether due to excluded files or errors, the Work Basket task displays a Warning icon. You can then use the Download option to download a CSV file (WARNING_DETAILS_REPORT.csv) that numbers each entry and contains the following columns for each file that was not copied:

  • docnum
  • importpath
  • filename
  • filemd5
  • filetype
  • size
  • parsingstatus
  • Reason File Not Copied

The following information can appear in the Reason File Not Copied column:

  • Excluded as <type of Exclusion Option> – Indicates that the file was excluded based on the Exclusion Options. There are four default Exclusion Options, as described earlier. For example, if you exclude Document Class: Archive, the CSV file contains an entry with Excluded as Document Class Archive for each excluded archive, such as a ZIP file.
  • The document or directory cannot be read.
  • The document or directory does not exist.
  • The connector cannot be read.
  • The document or directory has illegal characters in name.
  • The file is not a directory or regular file. – Reported for items such as a Special FIFO file.
  • An unexpected error occurred.

Note: For some types of unexpected errors, you may see additional details (for example, an I/O error for a corrupt disk image).
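
After you download the CSV file, a short script can summarize why files were not copied. The following is a minimal sketch, assuming the file has a header row that uses the column names listed above. The summarize_reasons function and the sample rows (paths, hashes, file types, and parsing status values) are made up for illustration.

```python
import csv
import io
from collections import Counter

REASON_COLUMN = "Reason File Not Copied"  # column name as listed above


def summarize_reasons(csv_file) -> Counter:
    """Count how many files share each value in the reason column."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        reason = (row.get(REASON_COLUMN) or "").strip()
        counts[reason or "(no reason given)"] += 1
    return counts


# Made-up sample data standing in for a downloaded WARNING_DETAILS_REPORT.csv.
sample = io.StringIO(
    "docnum,importpath,filename,filemd5,filetype,size,parsingstatus,"
    "Reason File Not Copied\n"
    "1,/import/a,one.zip,md5-placeholder-1,ZIP,1024,ok,"
    "Excluded as Document Class Archive\n"
    "2,/import/a,two.zip,md5-placeholder-2,ZIP,2048,ok,"
    "Excluded as Document Class Archive\n"
    "3,/import/b,gone.doc,md5-placeholder-3,DOC,512,ok,"
    "The document or directory does not exist.\n"
)

summary = summarize_reasons(sample)
for reason, count in summary.most_common():
    print(f"{count:>4}  {reason}")

# For a real download, open the file instead of using the sample:
# with open("WARNING_DETAILS_REPORT.csv", newline="") as f:
#     summary = summarize_reasons(f)
```

Grouping by reason makes it easier to separate files that were excluded on purpose from files that failed to copy and may be worth retrying.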