View Imports Reports

Imports > Reports

Requires Imports - View Permissions and Project - Reports - View Permissions to view the Reports.

The Imports Reports tab provides reporting information for all imports performed in the Project.

Reports Tab Toolbar

The Reports tab toolbar provides the following:

  • Oldest report: yyyy-MM-dd HH:mm:ss – Identifies the full date and time based on the oldest report generated or updated for this view (for a date other than today or yesterday). The timestamp is shown in the local time zone. If the report was generated or updated today, the report will show Today HH:mm:ss; if it was generated or updated yesterday, it will show Yesterday HH:mm:ss.
  • Exception Details... – Click this button to view the Exception Details popup, which contains detailed information about the individual exceptions reported for all imported data (all Data Sets). You can filter the list by reason, auxparsingstatus, or file type, then download the filtered list in a grid or download a document that is an exception.
  • Update – Click this button to recalculate the reports for this view with the latest information. You should click Update on reports generated prior to a change (for example, changes in the number of documents in the view). An update will also reflect any changes to the enabling/disabling of reports by your eDiscovery Administrator.
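The Today/Yesterday display rule for the Oldest report timestamp can be illustrated with a short sketch. This is not the product's implementation; the function name `report_age_label` and the sample dates are hypothetical, and both timestamps are assumed to already be in the local time zone.

```python
from datetime import datetime, timedelta

def report_age_label(generated: datetime, now: datetime) -> str:
    """Format a report timestamp following the toolbar rule described above.

    Both arguments are assumed to be in the same (local) time zone.
    """
    time_part = generated.strftime("%H:%M:%S")
    if generated.date() == now.date():
        return f"Today {time_part}"
    if generated.date() == now.date() - timedelta(days=1):
        return f"Yesterday {time_part}"
    # Any other day: full yyyy-MM-dd HH:mm:ss timestamp.
    return generated.strftime("%Y-%m-%d %H:%M:%S")

# Hypothetical sample values for illustration only.
now = datetime(2024, 3, 15, 9, 30, 0)
print(report_age_label(datetime(2024, 3, 15, 8, 5, 0), now))     # Today 08:05:00
print(report_age_label(datetime(2024, 3, 14, 23, 59, 59), now))  # Yesterday 23:59:59
print(report_age_label(datetime(2024, 3, 1, 12, 0, 0), now))     # 2024-03-01 12:00:00
```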

About Report Generation

  • By default, if you have Project - Reports - View permission, a given Reports tab shows all reports that apply to that particular view. If you want, you can disable the generation of selected reports for the Imports, Data Set, or Project Data views, as follows:
    • Use Project Settings > Reports to selectively disable the generation of reports for the Imports, Data Set, or Project Data views in a given Project.
    • Use Organization Settings > Templates > Reports to selectively disable the generation of reports for the Imports, Data Set, or Project Data views in a new or existing Reports template for the Organization.
    • For those with the appropriate Administrator permissions, you can create a System-level Reports template from the System Settings and apply it to Organizations and associated Projects to selectively disable the generation of reports for the Imports, Data Set, or Project Data views.
  • Reports for a large number of documents (for example, greater than 50,000) take time to generate, especially if all reports for a given view are generated. If report generation is taking a considerable amount of time, an In Progress Report task appears in the Work Basket. If you want, you can cancel the report generation.

How to Use the Individual Charts

You can use the charts to get detailed information about the data in all Imports.

Drill-through Support

For the Summary sections and charts, you can generally use the drill-through capabilities to get additional information, as follows:

  • Double-click an entry in a chart to perform a drill-through that generates an additional search result view focusing on the information you selected. You can drill through a particular entry in a Summary (for example, Mail Container Errors), a pie chart section, or an entry in the legend (for example, a Document Types entry). When you drill through an entry, an additional search is generated, and a task is created in the Work Basket for this new drill-through search result view. The drill-through search result view launches automatically to list the documents responsive to the search, based on the entry into which you drilled.

Note: The document count displayed for a given report entry may not always match the document count calculated for the drill-through search of that entry. Depending on the report, the drill-through search results may yield a higher count than the original report entry, since the drill-through search is generally more inclusive, looking for all records containing the associated value in some format. Note also that if you try to perform drill-through searches on the reports for a Shared Data Set (assuming you have permissions to do so), you may need to click Update to update the report information appropriately for use in the current Project.

Disabled Reports

Any report that has been explicitly disabled in the Report Settings is identified in the Reports tab as Report Disabled.

Reports without Data to Display

Any report for which there is no data to display is identified on the Reports tab as No Data to Display.

Hover Text Support

You can hover over any bar, pie, or column section to get more information about that section.

View by Count or Size

For charts that can be based on either Count or Size, you can switch between the two by selecting the one that is not currently active. For example, if Size is active, you can select Count.

Chart Download

If you have Document Reports permission, you can click the download icon, located at the top right of a chart, to download the appropriate file (usually CSV). Most charts support download.

Some charts may take time to download. During report downloads, a message will appear at the top of the screen to notify you about the number of downloads in progress. You can hover over the in-progress message to see the individual downloads in progress.

Chart Details

For most charts in this report (except Document Classification), you can click the details icon at the top right of a chart to view a Details popup with the document count and size information for the appropriate items. The Details popup for many charts supports a Document drop-down menu with Add Tags, Remove Tags, Add to, and Remove from actions. For most charts, the Details popup displays the Name (or Date), Count, and Size of documents for an item in the report, such as an error, a Tag, a Sent or Received Date, an email address, language, or Custodian. Some reports may have special Details that offer additional actions (Zero Day Details), offer additional tabs (Document Type Details), or are view only (Billing Report Details). For the Warnings and Errors and Custom Warnings and Errors reports, you can select an entry and click the Document Type Details icon, which displays Multi-Tab Details.

Warnings and Errors Summary

The Warnings and Errors summary shows the Name, Count, and Size for various types of warnings and errors reported each time data was added to the Project. This information will vary based on the added data.

Note: For descriptions of the possible Warnings and Errors, see the Error Codes and Reasons table (included below).

To get more information, you can do the following:

  • Click the download icon to download an XLSX file with separate tabs that give you detailed information about each type of error. For many errors, you will see count and size information based on the File Type. For the errors Unknown File Type, Read Error, Zero-Length File, and Attachment Save Error, you will see count and size information based on the File Extension instead.
  • Click the details icon to get basic count and size information for all errors in the summary.
  • Click Exception Details for the entire report to get more information about the reported errors (exception details). From the scan details, you can download a problem document or filter the exception list by reason, auxparsingstatus, or file type.
  • For a selected entry in the table, click the Document Type Details icon, which displays Multi-Tab Details. The Document Type Details for the Warnings and Errors report provides two tabs of information (File Type and File Extension) that apply to a selected entry in the Warnings and Errors report (or Custom Warnings and Errors report) for a Data Set. This information is also available to users who have permission to view reports for all Imports.
  • Double-click an entry in the summary to drill through that entry and get a report, and document list, based on a search query associated with that entry. For example, the row Mail Container Errors identifies failures for an identified mail container file such as a PST. Note that the drill-through searches within reports implicitly disable options such as Include Families and Include Metadata; when you run search queries using the primary searches, these options are enabled by default. Therefore, to match the number of results from a drill-through search on a report, you would need to disable the Include Families and Include Metadata options (for example, using Advanced Search).

Warnings and Errors: Codes and Reasons

The following table identifies the list of supported Warnings and Errors. This table applies to Warnings and Errors reported in the parsingstatus metadata field and origparsingstatus metadata field. Each entry shows the Display Name shown in the Warnings and Errors summary, a description, the searchable Code/Reason, and the Type (Warning or Error).

An Auxiliary Warning, which applies to the auxparsingstatus metadata field, is listed in a separate table. OCR errors are listed in their own table as well.

This table does not include Success (code 00000), which identifies documents that were parsed successfully. Note that Success includes all populated or empty directories found on disk. (Directories that cannot be accessed are reported by 01011 DIRECTORY_ACCESS_ERROR.)

Note: If you are searching a given view for error information using the parsingstatus, origparsingstatus, or ocrstatus field (for example, parsingstatus::00005), note that the codes are 5 digits, and the reasons are not case-sensitive. You can search for the code, the text, or both in the field (since the fields make each term searchable).
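As an illustration of the search note above, the following sketch builds a field-and-code search term in the `parsingstatus::00005` form and compares reasons without regard to case. The helper names `status_search_term` and `same_reason` are hypothetical and are not part of the product; only the `field::code` form, the three field names, and the 5-digit rule come from this page.

```python
import re
from typing import Union

# Status fields named in the note above.
STATUS_FIELDS = ("parsingstatus", "origparsingstatus", "ocrstatus")

def status_search_term(field: str, code: Union[int, str]) -> str:
    """Build a field::code search term with the code zero-padded to 5 digits."""
    if field not in STATUS_FIELDS:
        raise ValueError(f"unexpected status field: {field}")
    code_text = str(code).strip().zfill(5)
    if not re.fullmatch(r"\d{5}", code_text):
        raise ValueError(f"status code must be 1 to 5 digits: {code!r}")
    return f"{field}::{code_text}"

def same_reason(a: str, b: str) -> bool:
    """Compare two reason strings the way a case-insensitive search would."""
    return a.strip().casefold() == b.strip().casefold()

print(status_search_term("parsingstatus", 5))           # parsingstatus::00005
print(status_search_term("origparsingstatus", "1010"))  # origparsingstatus::01010
print(same_reason("nodata", "NODATA"))                  # True
```

Padding the code in a helper like this avoids the easy mistake of searching for `5` or `1010` when the field stores `00005` or `01010`.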

After reviewing the Warnings and Errors, you may need to take additional steps to address problem files (for example, damaged, encrypted, or password-protected files). For example, you may need to configure password-cracking options and supply password files, repair a PST, decrypt an NSF file, or address a protected ZIP file. If problem files such as damaged files can be fixed or addressed, you can have them reprocessed from their original import location, or you can copy them to an export area to be externally reprocessed. To do so, drill through entries in the report, such as Damaged, Encrypted, Protected, or Archive Extraction Error. Then, from the drill-through search results, select some or all of the files and use the Reprocess option or, for external reprocessing, the Copy to External Area option.
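When reviewing a list of exported parsing statuses, it can help to group them by the next step that the table on this page suggests. The sketch below is only an illustration: the `NEXT_STEP` mapping covers a small subset of the codes, its wording paraphrases the table descriptions, and the names `NEXT_STEP` and `triage` are hypothetical rather than product features.

```python
from collections import defaultdict

# Subset of codes from the Warnings and Errors table on this page.
# The next-step text paraphrases the guidance given in each description.
NEXT_STEP = {
    "00012": "contact support",          # ICU_CONV_ERROR
    "00016": "try reprocessing",         # FILE_OPEN
    "00024": "try reprocessing",         # SYSTEM_ERROR
    "00026": "try reprocessing",         # TIMEOUT
    "00027": "obtain password",          # ENCRYPTED
    "00029": "obtain password",          # PROTECTED
    "00037": "contact support",          # PERMISSION_ERROR
    "00041": "reprocessing not useful",  # ATTACH_OPEN
    "00042": "reprocessing not useful",  # ATTACH_SAVE
}

def triage(statuses):
    """Group 'code REASON' strings by suggested next step.

    Codes missing from NEXT_STEP fall into 'review manually'.
    """
    groups = defaultdict(list)
    for status in statuses:
        code = status.split()[0]
        groups[NEXT_STEP.get(code, "review manually")].append(status)
    return dict(groups)

# Hypothetical sample input for illustration only.
sample = ["00027 ENCRYPTED", "00026 TIMEOUT", "00029 PROTECTED", "00001 FAILURE"]
for step, items in triage(sample).items():
    print(f"{step}: {', '.join(items)}")
```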

 

Display Name for Warning/Error | Description | Searchable Code/Reason (parsingstatus field content) | Type (Warning or Error)
Processing Failure An unknown failure prevented processing of the file.

You can attempt to reprocess, but the error may recur. Check the filetype, importpath, and size fields for more information that may help you decide next steps, if warranted.
00001 FAILURE Error
No Data The file has no valid data (that is, no terms that are considered acceptable). It may have terms that do not meet the minimum or maximum term length, or that are on the Stop Words List.

Note that for a file with no content, the contentmd5 will be set to the filemd5.

The file may benefit from OCR processing, but reprocessing will not yield additional information.
00005 NODATA Warning
Read Error The read operation on the file failed.

In many situations, you can try reprocessing, and if that fails, attempt to work with the file in question. Note that with Parsing Library V2, you will see this status for Microsoft PowerPoint 3.0 and 4.0 files (which are ID only). For these files, reprocessing will have no effect.
00009 FILE_READ Error
Conversion Error The contents of the file could not be converted to the correct format during processing.

If you see this error, please contact Digital Reef Customer Support.
00012 ICU_CONV_ERROR Error
Empty Email The email has no content, or the email body may be damaged or invalid.

You may see this error if the email has only attached photos or its subject line contains the message. A small number of these errors may be considered expected and not of concern.
00015 NO_EMAIL_BODY Error
Open Error The file could not be opened or processed due to an unknown error. If this is a Lotus Notes NSF file, the file may not be a valid NSF file.

Try reprocessing, and if that fails, attempt to work with the file in question.
00016 FILE_OPEN Error
Zero-Length File The file has zero bytes of content.

A small number of these errors may be considered expected and not of concern.
00017 FILE_ZERO_LENGTH Error
Unknown File Type The software could not determine the document type for this file.

Examine the filetype and docext fields. You can try reprocessing, but the results may not be optimal and you may see the Partial Text Extraction warning.
00019 FILE_TYPE_UNDETERMINISTIC Error
Unsupported File Type For documents that were processed with Parsing Library V1 only, this parsing status indicates that the software was able to identify the file type, but the file type is not supported for full parsing.

Examine the filetype and docext fields. Reprocessing is not useful for files with this status.
00021 FILE_NOT_SUPPORTED Error
System Error The system encountered an error.

Try reprocessing, and if that fails, attempt to work with the file in question.
00024 SYSTEM_ERROR Error
Processing Halted The software had to halt document processing due to an error.

You can attempt to reprocess, but the error may recur. Check the filetype, importpath, and size fields for more information that may help you decide next steps, if warranted.
00025 PROCESS_ERROR Error
Processing Timeout Processing of the file exceeded the timeout value.

Try reprocessing, and if that fails, attempt to work with the file in question.
00026 TIMEOUT Error
Encrypted The file cannot be processed because it is encrypted. In general, this error applies to attachments extracted from Lotus Notes NSF files. It also applies to Zip/ZipX containers and disk partitions (for example, BitLocker-encrypted partitions from an LEF).

You need to obtain a password for the encrypted file.
00027 ENCRYPTED Error
Damaged In many cases, indicates that the file cannot be processed because it is damaged in some way. EnCase files, including Logical Evidence Files, are considered Damaged when there is a mismatch between the stored and calculated cyclic redundancy check (CRC). This status can also be reported for a PST that is damaged or has invalid properties from which the contents can still be extracted.

You can try to work with the file in question, but no other action is useful in eDiscovery for this error.
00028 FILE_DAMAGED Error
Protected The file cannot be processed because it is password-protected. If this is a protected Lotus Notes NSF file, a Lotus Notes key is required to open the file. This error may be reported for file types such as Microsoft Word or Adobe Acrobat PDF, but not for encrypted NSF attachments, mail containers other than NSF, archives, disk images, text files, EMLs, HTMLs, or vcalendar items.

You need to obtain a password for the password-protected file.
00029 PROTECTED Error
Parsing Error Some formatting of content in the file generated a parsing error (for example, incorrectly formatted archives).

You can attempt to reprocess, but the error may recur. Check the filetype (to see if it is a common type and supported), as well as the importpath and size fields for more information that may help you decide next steps, if warranted.
00031 PARSER_LIB_ERROR Error
Permission Error File permissions prevented the file from being processed.

If you see this error, please contact Digital Reef Customer Support for guidance.
00037 PERMISSION_ERROR Error
File Not Found The file could not be found at its import location. It may have been moved or deleted.

During processing of an email archive (for example, a PST), this error may be reported for an email that is either corrupt or missing from the archive.

This status can also be reported during processing of a disk image (for example, an L01 Logical Evidence File) if there was a problem mounting the disk image, or if a file could not be retrieved. Failure to retrieve a file could be caused by a filename with invalid characters, such as a / (forward slash), which typically marks a directory.

For a PST, the ideal solution would be to replace the PST. If not, then try to identify the problematic email and open it in MS Outlook to verify that it produces an error.

For an L01, please contact Digital Reef Customer Support for guidance, which may involve isolating certain folders that can be extracted and processed separately.
00038 FILE_NOT_FOUND Error
Partial Text Extraction The Strings Parser detected some text from certain files:

- After initial import of files identified as text but that include invalid characters. The Text tab of the appropriate Document Viewer displays the text.

- After reprocessing of files that are identified as Unknown files but that have some text in them. Reprocessing the Unknown files may not always yield optimal results. Although the software attempts to extract text, the text may be unusable (potentially garbage) and slow to extract.

If you see this warning with PST files, they might be partially corrupt. You may want to try reprocessing, and working with the PST in MS Outlook.
00040 PARTIAL_TEXT_EXTRACTION Warning
Attachment Open Error The parsing library could not open this email attachment. When parsing MSGs from a PST, the parsing library checks the pr_attach_data_bin MAPI field of entries in an attachments table. If the pr_attach_data_bin field is not present for a given attachment, that attachment is marked with this error.

Reprocessing is not useful for this error.
00041 ATTACH_OPEN Error
Attachment Save Error The parsing library could not save this email attachment. When parsing MSGs from a PST, the parsing library checks the pr_attach_data_bin MAPI field of entries in an attachments table. If the pr_attach_data_bin field is present but has a null value for a given attachment, that attachment is marked with this error.

A number of files reported with this status may indicate one of the following for an MSG attachment: 1) The attachment is actually a reference to a file on the disk of the user, 2) The binary data for the attachment is missing or of zero length, or 3) The attachment is an Archive Stub.

Reprocessing is not useful for this error.
00042 ATTACH_SAVE Error
Unsupported Lotus Note The Lotus Notes file can be identified (for example, based on its Form type) and extracted, but is not a standard Form type. Unsupported Form types include PA, BCMemo, Bookmark, TaskNotice, PAReadOnly, PO, Archive Profile, (Message), and PartEval. This exception is also returned when the Lotus Notes Form type is Unavailable but the Digital Reef software is able to determine that the type is a Script.

This parsing status is preserved upon reprocessing, although reprocessing is generally not useful for this warning. No further action in eDiscovery is needed.

If you see this warning, it may mean that Lotus Notes was used for more than email.
00043 UNSUPPORTED_LOTUS_NOTE Warning
Unknown Lotus Note The Lotus Notes file could not be identified (for example, based on its Form type) but it could be extracted.

This parsing status is preserved upon reprocessing, although reprocessing is generally not useful for this warning. No further action in eDiscovery is needed.

If you see this warning, it may mean that Lotus Notes was used for more than email, including custom applications.
00044 UNKNOWN_LOTUS_NOTE Warning
Mapped Lotus Note Form The Lotus Notes Form type was Unavailable, but the Digital Reef software could derive enough information (for example, MIME content) to enable processing of the file as an EML.

This parsing status is preserved upon reprocessing, although reprocessing is generally not useful for this warning.

No further action in eDiscovery is needed.
00045 MAPPED_LOTUS_NOTE Warning
Missing Attachment For a Bloomberg message, indicates that the message has one or more missing attachments. This occurs if an attachment resides in the XML file but not in the attachment archive.

This error may indicate an issue with the XML Bloomberg export, or potentially a file/folder name issue (for example, tar.gz archives may have the wrong doc extensions for XML).
00046 MISSING_ATTACHMENT Error
Archive Empty Zero files were extracted from the archive (container file). This applies to containers including ZIP, TAR, RAR, 7ZIP, EWF, LEF, PST, NSF, and MBOX. This does not apply to Bloomberg containers. For mail containers, the mailcontainererror metadata field will also be populated with this error.

Examine the filetype, importpath, and size fields. Reprocessing is not generally useful.
00047 ARCHIVE_EMPTY Error
Attachment Manifest Error For a parent file, indicates that there is an issue with an entry in the attachment manifest. This can be caused by the occurrence of special characters in an attachment filename.

Try to work with the parent email.
00048 ATTACH_MANIFEST_ERROR Error
Nested Duplicate Archive If a parent archive (for example, a ZIP file) contains nested duplicate archives, its duplicate archives report this error to indicate that the system protected against the repeated expansion, thereby avoiding potential size limit issues.

This error is intended to protect the system. No further action is necessary. Do not reprocess documents with this parsing status.
00049 NESTED_DUPLICATE_ARCHIVE Error
Unsupported PDF Form The software does not support Adobe Acrobat (PDF) files that are PDF Forms.

Review the files in Adobe Acrobat. No further action in eDiscovery is useful.
00050 UNSUPPORTED_PDF_FORM Error
Invalid Transport Header The email transport header is not valid (it is either not in a standard format or lacks the minimum number of fields). A transport header must have at least three of the standard fields (for example, to, from, bcc, or cc).

When this error is reported for items in Drafts and Sent folders, it may be expected, since these do not tend to have much metadata. If you see this error for a large number of email-related items, it may indicate a broader issue.
00051 INVALID_TRANSPORT_HEADER Error
File Type Map Failure The detected file type failed to map to a supported file type name.

If this occurs, please supply the file type information to Digital Reef Customer Support for potential patch purposes.
00052 FILE_TYPE_MAP_FAILURE Error
Error Extracting Child This parsing status is no longer populated for newly added data (see 00062 CONTAINER_EXTRACTION_WARNING instead). Existing data may still report this parsing status to warn of a potential problem with extraction from the container file.

00053 CHILD_ARCHIVE_EXTRACTION_ERROR Error
Lotus Notes Archive Stub The software was able to process and extract some text from Lotus Notes emails with an nsfform type of ArchiveStub. This warning status alerts you to the presence of stub files.

This parsing status is preserved upon reprocessing, although reprocessing for this warning is not generally useful.
00054 LOTUS_NOTES_ARCHIVE_STUB Warning
NSF Partial Extraction The software was only able to partially extract items from a Lotus Notes NSF.

You can try reprocessing, but this error may mean that some of the items are corrupted. In general, you may want to try to replace the NSF.
00056 NSF_PARTIAL_EXTRACTION Error
Third-Party Software Not Installed The third-party software required for processing is not installed on the system (for example, the UnRAR tool needed to process certain encrypted RAR files).

If you see this error, please contact Digital Reef Customer Support for guidance.
00057 THIRD_PARTY_SOFTWARE_NOT_INSTALLED Error
Attachment Encrypted For EMLs and ICS Calendar items from Lotus Notes NSF container files, indicates the presence of an encrypted attachment. The attachment, however, will not have this parsing status; it will have a parsing status of 00017 FILE_ZERO_LENGTH.

The parsing status is preserved upon reprocessing.

Forensic assistance may be required to perform decryption of the attachment.
00058 ATTACH_ENCRYPTED Error
Attachment Damaged For EMLs and ICS Calendar items from Lotus Notes NSF container files, indicates the presence of a damaged attachment. The attachment, however, will not have this parsing status; it will have a parsing status of 00017 FILE_ZERO_LENGTH.

This parsing status is preserved upon reprocessing.

Unless this error is widespread, a small number of files with this error may be expected and may not warrant further action.
00059 ATTACH_DAMAGED Error
RTF Flag Missing For MSGs that are extracted from a PST/OST and have only an RTF body but are missing the PidTagRtfInSync (0x0E1F) flag, this warning indicates that the software added the missing flag.

If this warning is widespread, it may indicate that the emails were created or converted by a non-Microsoft tool that left off the flag. Most likely, the content is not problematic, but in some cases, this could indicate a corruption issue. You may want to investigate some of the affected emails to see if there is a broader issue.
00060 RTF_FLAG_MISSING Warning
POI Error For Microsoft Office documents processed prior to 5.2.0.0, this warning indicates that a problem in the POI third-party library prevented the software from extracting attribute information such as docannotations, hiddendata, trackchanges, and markuphistory from documents.

This warning is for informational purposes only. No action is needed, including reprocessing, which will have no effect in this situation.
00061 POI_ERROR Warning
Container Extraction Warning For items from a container processed as of 4.3.11.0, this parsing status indicates a potential problem with extraction from the container file (for example, a PST). This warning may help flag emails that have certain issues when they are extracted and may not be perfect. This warning can also be reported if an MSG from a PST/OST contains an issue with the expected Block ID values.

This parsing status persists after reprocessing of the item.

If you see this warning reported for a large number of items from a container, examine some of the emails for signs of a broader issue.
00062 CONTAINER_EXTRACTION_WARNING Warning
No filemd5 Generated The software was unable to generate a filemd5 for this file.

If you see this error, try reprocessing.
00065 NO_FILEMD5_GENERATED Error
Attribute Extraction Warning For Microsoft Office documents processed as of Release 5.2.0.0, this warning indicates that the software's third-party library was unable to extract attribute information such as docannotations, hiddendata, trackchanges, and markuphistory from documents.

This warning is for informational purposes only. No action is needed, including reprocessing, which will have no effect in this situation.
00066 ATTRIBUTE_EXTRACTION_WARNING Warning
Missing OLE Text Warning This status applies to documents processed with Parsing Library V2 as of 5.2.5.x. For parent documents with one or more embedded OLE documents, this warning indicates that the parent document is missing some text from an OLE document (that is, text was not extracted for that document).

This warning is for informational purposes only. No action is needed, including reprocessing, which will have no effect in this situation.
00067 MISSING_OLE_TEXT_WARNING Warning
File ID only For documents processed with Parsing Library V2 only, this parsing status indicates that the software was able to identify the file type, but the file type support is limited to identification only, not full parsing.

You can examine the filetype and docext fields. Reprocessing is not useful for files with this status.
00068 FILE_ID_ONLY Warning
Virus Detection Error An error with the Virus Detection software prevented this file from being checked for viruses.

In general, if this error occurs, it will apply to all documents imported in a Data Set. If you happen to see only individual documents receive this status, you should reprocess the entire Data Set to get a revised parsing status. Also note that if for some reason two parsing statuses end up applying to a document (this status and a non-Virus Detection parsing status), the non-Virus Detection parsing status will take precedence and the Virus Detection Error status will not appear.
00069 VIRUS_DETECTION_ERROR Error
RTF Body Substituted For emails that are determined to have a corrupt RTF body, this warning indicates that the email body information was substituted with the PR_BODY MAPI field. Reprocessing will not improve this situation.

If this warning is widespread, it may indicate that there is a general issue with the emails in question and could indicate a corruption issue. You may want to investigate some of the affected emails to see if there is a broader issue with the data or collection.
00071 RTF_BODY_SUBSTITUTED Warning
SMF Data Extraction Error For a Short Message Format (SMF) record, this error indicates that the record was extracted from the SMF parent (for example, a Cellebrite XML file), but some aspect of the record could not be processed as expected.

For the SMF parent, this error indicates that either an entire section could not be extracted, or that some aspect of a record prevented the record from being extracted.
00072 SMF_DATA_EXTRACTION_ERROR Error
Excluded Archive Types This warning is reported for files that have been explicitly excluded from extraction based on their file extension. You specify file extensions that you want to exclude in the Excluded File Extensions for Extraction section of the Index Settings.

This warning is for informational purposes only. No action is needed, including reprocessing, which will have no effect in this situation.
01000 SKIPPED_FILE Warning
Special File The file is a Special File (that is, not a regular file or directory). Special Files are not subject to processing. For example, a named pipe (FIFO) is a Special File.

If you see this error, review the source data. Forensic investigation may be required. If necessary, contact Digital Reef Customer Support for guidance.
01001 SPECIAL_FILE Error
Decompress Error A compressed file could not be decompressed (for example, a GZIP file).

Try to work with the file.
01005 FILE_DECOMPRESS Error
Access Error The file could not be retrieved using the Connector and location associated with the import operation. Connectors and locations are defined by an Administrator as part of Organization Provisioning.

Try to investigate a possible Connector configuration issue. If the issue cannot be resolved, please contact Digital Reef Customer Support.
01006 CONNECTOR_RETRIEVE_ERROR Error
Lotus Notes Not Installed The Lotus Notes client is not installed on the system component (Analytics Node) that tried to process the file.

If you see this error, please contact Digital Reef Customer Support.
01008 LOTUS_NOTES_NOT_INSTALLED Error
Lotus Notes Not Licensed Either the system component does not have a valid Lotus Notes client or the license has expired.

If you see this error, please contact Digital Reef Customer Support.
01009 LOTUS_NOTES_NOT_LICENSED Error
Archive Extraction Error Extraction of one or more files from an archive failed.

Try reprocessing, but the error may recur. Try to work with the file in question.
01010 ARCHIVE_EXTRACTION_ERROR Error
Directory Protected Access to a directory is denied because the directory is protected.

If you see this error, please contact Digital Reef Customer Support.
01011 DIRECTORY_ACCESS_ERROR Error
Unsupported File for Connector The Connector does not support files of this type.

Try to investigate a possible source data issue or system configuration issue. If the issue cannot be resolved, please contact Digital Reef Customer Support for guidance.
01012 CONNECTOR_OPERATION_NOT_SUPPORTED Error
Missing EWF Files One or more of the EWF segment files are missing from the set.

This error may indicate an issue with the source data. Forensic investigation may be required. If necessary, contact Digital Reef Customer Support for guidance.
01013 EWF_FILE_MISSING Error
Invalid EWF Filename The software detected an invalid EWF segment filename.

This error may indicate an issue with the source data. Forensic investigation may be required. If necessary, contact Digital Reef Customer Support for guidance.
01014 EWF_FILE_INVALID_NAME Error
EWF Error An error occurred during processing of an EWF segment file.

You can try to reprocess, but the error may recur and may indicate an issue with the source data. Forensic investigation may be required. If necessary, contact Digital Reef Customer Support for guidance.
01015 EWF_FILE_ERROR Error
No Partitions The file system type could not be determined from the raw image. When this error occurs, the diskpartitions metadata field does not exist.

This error may indicate an issue with the source data. Forensic investigation may be required. If necessary, contact Digital Reef Customer Support for guidance.
01016 NO_PARTITIONS Error
Skipped Partitions All of the discovered partitions were skipped, either because their type was known and intentionally skipped, or because their type was unknown and could not be processed. The metadata field diskpartitions provides details.

Unless some parts of the LEF image have a different parsing status, the LEF most likely extracted properly and no action is needed. The skipped partitions (for example, the L02 and L03 segments) belong to a multi-part LEF image and are reported as skipped because they are associated with the first (L01) segment.
01017 PARTITIONS_SKIPPED Error
Partition Errors At least one partition type was supported but failed during processing. This typically occurs when a file system type is known and supported but the mount fails. Other partitions may have been processed successfully. The metadata fields ewfpartitions and diskpartitionstatus provide details.

You can try to reprocess, but the error may recur and may indicate an issue with the source data. Forensic investigation may be required. If necessary, contact Digital Reef Customer Support for guidance.
01018 PARTITION_ERRORS Error
No Fuse Support Fuse is not completely installed (that is, the Fuse kernel module, library, or binaries are not installed).

If you see this error, please contact Digital Reef Customer Support.
01019 FUSE_NOT_INSTALLED Error
Skipped Directory By request, the directory was skipped during processing. 01020 DIRECTORY_SKIPPED Error
Invalid Characters The filename contains invalid characters.

Examine the importpath field.
If this is from an L01, forensic investigation of the import may be required. Contact Digital Reef Customer Support for guidance.
01021 FILE_NAME_INVALID_CHARACTERS Error
Empty Filename in Disk Image The raw disk image (for example, an L01 disk image) was fully processed (with all possible contents extracted), but it contained at least one file/folder with an empty filename, thereby preventing extraction of the files beneath that point in the path.

If you see this error, forensic investigation may be required to identify empty filenames in the disk image. If necessary, contact Digital Reef Customer Support for guidance.
01023 DISK_IMAGE_EMPTY_FILENAME Error
Partitions Encrypted A disk partition is encrypted (for example, as a BitLocker-encrypted partition).

For a BitLocker-encrypted partition, make sure that you have the appropriate key file with the 48-digit key.
01024 PARTITIONS_ENCRYPTED Error

 

Auxiliary Parsing Status Warning

The following table identifies the current auxiliary parsing status Warnings, which are reported in the auxparsingstatus metadata field. Each entry identifies the Display Name shown in the Warnings and Errors summary, a description, the searchable Reason, the Type (Warning or Error), and whether the entry supports any actions upon drill-through.

Display Name for Warning | Description | Searchable Reason (auxparsingstatus field content) | Type (Warning or Error) | Supports Actions on Drill-through
Modern Attachment Retrieve Warning | For the email parent of a Modern Attachment, this warning indicates that a Modern Attachment could not be retrieved during processing, either because the Data Set was imported from a non-Office 365 Connector or because the files at the links are not retrievable with the in-use Connector's credentials. See Extract Office 365 Data for information on how you can drill through the entry in the Warnings and Errors report and use the Extract Office 365 Data option with an available Office 365 Connector (Exchange or SharePoint). | modern_attachment_retrieve_warning | Warning | Yes
Unzip Fallback Extracted | For a record within an archive that could not be processed using 7zip, this warning indicates that the record was successfully extracted from the archive using unzip as a fallback. | unzip_fallback_extracted | Warning | Yes
Unzip Fallback Processed | For an archive that could not be processed using 7zip, this warning indicates that the archive was successfully processed using unzip as a fallback. | unzip_fallback_processed | Warning | Yes

 

OCR Errors

The following table provides information about OCR errors for files that did not fully or successfully complete OCR processing. Error codes and reasons are reported in the parsingstatus metadata field; you can search the field using the 5-digit value, the text, or both.

If you see these errors, you can evaluate whether external OCR processing is warranted.

 

OCR Error Displayed in Summary | Description | Searchable Error Code/Reason
OCR Failure | The OCR processing of an image file failed unexpectedly. | 01200 OCR_FAILURE
OCR Initialization Failure | The OCR software engine could not be initialized (for example, because it is not installed correctly). | 01201 OCR_INIT_FAILED
OCR Not Licensed | Either the system component does not have a valid license for the OCR software or the license has expired. | 01202 OCR_NOT_LICENSED
OCR Image Load Error | An OCR processing error occurred when loading an image. | 01203 OCR_LOAD_IMAGE_ERROR
OCR Recognition Error | An OCR processing error occurred during recognition of a page in the image file. | 01204 OCR_RECOGNITION_ERROR
OCR Preprocessing Error | An OCR processing error occurred during image preprocessing. | 01205 OCR_PREPROCESSING_ERROR
OCR Low Confidence | The OCR engine reported low confidence in the correctness of recognized text from the image. | 01206 OCR_LOW_CONFIDENCE
OCR Timeout | OCR processing of a page in the image file exceeded the timeout value (2 minutes). | 01207 OCR_TIMEOUT
OCR Unsupported Image Type | The OCR software does not consider the file a supported image file format or image file type. | 01209 OCR_IMF_NOTSUP_ERR
OCR Missing TIFF Tag | A required TIFF basic tag is missing. | 01210 OCR_IMF_TAGMISSING_ERR
OCR Image Compression Error | An error occurred in image compression. | 01211 OCR_IMF_COMP_ERR
OCR Unknown Image Format | The OCR software detected an unknown image format. | 01212 OCR_IMF_IMGFORM_ERR
OCR File Format Error | The OCR software detected a file format error. | 01213 OCR_IMF_FILEFORMAT_ERR
OCR Unsupported Color File | The OCR software does not support a Color PCX file. | 01214 OCR_IMF_COLOR_ERR
OCR Protected Image | The OCR software detected a password-protected image file. | 01215 OCR_IMF_PASSWORD_WARN
OCR No Text | The OCR software generated a no text warning. | 01216 OCR_NO_TXT_WARN
OCR Missing Zone | The OCR software could not find the OCR Zone. | 01217 OCR_ZONE_NOTFOUND_ERR
OCR Insufficient Memory | There was not enough memory during image processing. | 01218 OCR_IMG_NOTENOUGHMEMORY_ERR
OCR Invalid Dimensions | The OCR software detected invalid rectangle dimensions. | 01219 OCR_IMG_RECT_ERR
OCR Unsupported Resolution | The OCR software detected an unsupported resolution. The resolution of image files must be between 75 and 600 DPI. The optimum resolution for OCR is 300 DPI. | 01220 OCR_IMG_DPI_ERR
OCR Missing Image | The OCR software could not find the image. | 01221 OCR_IMG_NOTFOUND_ERR
OCR Compressed Image Error | The OCR software could not process the compressed image. | 01222 OCR_IMG_COMPRESSED_ERR
OCR Bits per Pixel Error | The OCR software detected an unsupported bits-per-pixel value. | 01223 OCR_IMG_BITSPERPIXEL_ERR
OCR Unsupported Image Size | The OCR software detected an unsupported image size. This error occurs when either the height or width of the image file is less than 16 pixels, or when either the height or width exceeds 28 inches (71 cm) or 8400 pixels. | 01224 OCR_IMG_SIZE_ERR
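
The resolution and size limits described for 01220 and 01224 can be checked before a file is submitted for OCR. The following Python sketch is illustrative only: the function name, its parameters, and the order of the checks are assumptions and are not part of the Digital Reef software. Only the numeric limits and the two reason strings come from the table above.

```python
from typing import Optional

MIN_DPI, MAX_DPI = 75, 600         # supported resolution range (01220)
MIN_PIXELS, MAX_PIXELS = 16, 8400  # per-side pixel limits (01224)
MAX_INCHES = 28.0                  # per-side physical limit (01224)


def precheck_image(width_px: int, height_px: int, dpi: int) -> Optional[str]:
    """Return the reason an image would likely be rejected, or None.

    Hypothetical helper: the real OCR engine may apply these checks
    in a different order or apply additional checks.
    """
    if not MIN_DPI <= dpi <= MAX_DPI:
        return "OCR_IMG_DPI_ERR"
    for side in (width_px, height_px):
        # A side is out of range if it is too small, too large in pixels,
        # or too large in inches at the stated resolution.
        if side < MIN_PIXELS or side > MAX_PIXELS or side / dpi > MAX_INCHES:
            return "OCR_IMG_SIZE_ERR"
    return None
```

For example, a letter-size page scanned at 300 DPI (2550 x 3300 pixels) passes both checks, while a 2400-pixel-wide image at 75 DPI is 32 inches wide and would be flagged for size.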

Custom Warnings and Errors

If a user with permissions has added custom warnings and errors queries in the Index Settings to identify documents meeting those queries, and the report has been enabled in the Report Settings, this report appears on the Reports tab for a Data Set view or for the all Imports view. It identifies the name of each custom query as well as the count and size of all documents meeting that query, and it displays 0 if no documents meet the query. Drill-through is supported for each custom query.

To get more information for the Custom Warnings and Errors report, you can do the following:

  • Click to download an XLSX file with separate tabs that give you detailed information about each type of error. For many errors, you will see count and size information based on the File Type. For the errors Unknown File Type, Read Error, Zero-Length File, and Attachment Save Error, you will see count and size information based on the File Extension instead.
  • Click to get basic count and size information for all errors in the summary.
  • Click Exception Details for the entire report to get more information about the reported errors (exception details). From the scan details, you can download a problem document or filter the exception list by reason, auxparsingstatus, or file type.
  • For a selected entry in the table, click (Document Type Details), which displays Multi-Tab Details. The Document Type Details for the Warnings and Errors report provides two tabs of information (File Type and File Extension) that apply to a selected entry in the Warnings and Errors report (or Custom Warnings and Errors report) for a Data Set. This information is also available to users who have permissions to view reports for all Imports.

  • Double-click an entry in the summary to drill through that entry and get a report, and document list, based on a search query associated with that entry. Note that the drill-through searches within reports implicitly disable options such as Include Families and Include Metadata; when you run search queries using the primary searches, these options are enabled by default. Therefore, to match the number of results from a drill-through search on a report, you would need to disable the Include Families and Include Metadata options (for example, using Advanced Search).

OCR Candidates

Note: This chart is visible only if you have the appropriate Permissions. It applies to Data Sets at the Content or Analytic Index level.

For all Imports, this chart shows the number and size of calculated OCR Candidate files that could be submitted for OCR processing in an effort to extract content from the files. Queries defined in an Organization template and/or in the Index Settings determine the calculation of the OCR Candidates. If these OCR Candidate fields are populated, then OCR processing was not enabled during import (the default).

Total OCR Candidates — The total number and size of all OCR Candidates that are eligible for OCR processing.

No Content PDF — The number and size of no-content PDFs that are eligible for OCR processing.

Tiff — The number and size of TIFF files that are eligible for OCR processing.

Low Content PDF < 5 terms/page — The number of PDF files with a low amount of content (less than 5 terms per page and a longest word value of 0-3 characters) that are eligible for OCR processing.

No-Content Microsoft — The number of no-content Microsoft documents that are eligible for OCR processing.

OCR Failure — The number of OCR processing failures.
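
The Low Content PDF rule above combines two conditions. As a minimal sketch, assuming the per-page term count and the longest word length are already available as plain numbers (the function and parameter names are hypothetical, and the actual candidate queries are defined in the Organization template or Index Settings), the test can be written as:

```python
def is_low_content_pdf(terms_per_page: float, longest_word_length: int) -> bool:
    """Apply the documented rule: fewer than 5 terms per page and a
    longest word value of 0-3 characters.

    Hypothetical helper for illustration; it does not query the index.
    """
    return terms_per_page < 5 and 0 <= longest_word_length <= 3
```

Both conditions must hold, so a PDF with very few terms per page but at least one long word is not counted as a Low Content PDF under this rule.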

To perform OCR processing of selected Candidates:

  1. Double-click to drill-through the appropriate entry in the OCR Candidates chart for this Data Set to get a list of documents you can submit for OCR processing.
  2. Select some or all documents and select OCR to start the OCR Processing.
  3. A Work Basket task enables you to track the processing progress.
  4. When OCR processing is complete, the associated OCR Candidate values in the chart become 0. You can then use the Document Viewer to view the documents that were processed and their content.

For this chart, you can click to download a CSV file with the OCR Candidate information.

OCR Confidence

For any documents in the view that have been subject to OCR processing, this table displays the calculated OCR Confidence Level (a Name or numeric range, the associated document Count, and the associated document Size).

The information in this chart is reported as follows:

  • Not present identifies documents that were not subject to OCR Processing.
  • unknown identifies documents for which the OCR Confidence level could not be determined.
  • Each OCR Confidence Level is represented using a numeric range: 0-10 (Lowest Confidence), 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, and 91-100 (Highest Confidence).
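
The numeric ranges listed above are not uniform: the first range covers eleven values (0-10) and the rest cover ten each. The following Python sketch shows one way to map a confidence level to its range and tally documents per range. The function name is hypothetical, and the sketch handles only numeric levels, not the Not present or unknown entries.

```python
from collections import Counter


def confidence_bucket(level: int) -> str:
    """Map an OCR Confidence level (0-100) to its reporting range."""
    if not 0 <= level <= 100:
        raise ValueError(f"confidence level out of range: {level}")
    if level <= 10:
        return "0-10"
    # 11-20, 21-30, ..., 91-100: each range starts one past a multiple of 10.
    lower = (level - 1) // 10 * 10 + 1
    return f"{lower}-{lower + 9}"


# Tally a few sample levels (made-up values) by range.
counts = Counter(confidence_bucket(v) for v in [5, 15, 18, 95])
```

With the sample values shown, `counts` holds one document in 0-10, two in 11-20, and one in 91-100.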

Each letter in each page of a document has an OCR Confidence level, and an average Confidence level is computed based on all pages of a document from which text was extracted. (Pages from which no text was extracted do not contribute to the average.)

The average OCR Confidence level for a document is reported in the ocraverageconfidencelevel metadata field using a value in the range 0-100. A value of 0 is the lowest confidence level and a value of 100 is the highest confidence level. The lowest confidence level calculated for any page in a document is reported in the ocrlowestconfidencelevel metadata field. The ocrlowestconfidencelevel and ocraverageconfidencelevel fields use a padded 5-digit value (for example, 00010 is Confidence level 10) to support range searching. For example, to search for documents whose average OCR Confidence level is in the inclusive range 20-60, you would specify the following search (using the Standard search syntax):

ocraverageconfidencelevel::[00020~~00060]
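
If you build such range searches in a script, the padding is the part that is easy to get wrong. The following Python sketch reproduces the search string shown above; the helper names are hypothetical and the sketch only formats text, it does not run a search.

```python
def pad_level(level: int) -> str:
    """Format a confidence level as the padded 5-digit value."""
    if not 0 <= level <= 100:
        raise ValueError(f"confidence level out of range: {level}")
    return f"{level:05d}"


def confidence_range_query(low: int, high: int,
                           field: str = "ocraverageconfidencelevel") -> str:
    """Build an inclusive range search in the syntax shown above."""
    if low > high:
        raise ValueError("low must not exceed high")
    return f"{field}::[{pad_level(low)}~~{pad_level(high)}]"


query = confidence_range_query(20, 60)
```

Padding matters because the values are compared as text: unpadded, "100" sorts before "20", whereas "00100" correctly sorts after "00020".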

If you want to see the average number of terms calculated per page, check the averagenumberoftermsperpage metadata field.

Click to save the OCR Confidence information to a CSV file. When you select this option, you can name the CSV file, and you can select where to save the file locally.

OCR Sources

This chart provides OCR Count information for each imported Data Set, as follows:

  • Data Set — The name of the Data Set
  • Not Yet Processed — The number of OCR candidate documents that have not yet been submitted for OCR processing.
  • Successfully Processed — The number of documents that completed OCR processing successfully.
  • Failed Processing — The number of documents that failed OCR processing.

Note: You can double-click a row for a given Data Set in this report to drill through and see a list of documents that fall into the Not Yet Processed category (that is, the OCR candidate documents that have not yet been submitted for OCR processing). You cannot perform the drill-through if the Not Yet Processed value is 0.

Document Classification

This chart shows how document classification could reduce the amount of content for review.

The values are based on the sum of all files that are part of Imports (for example, all Data Sets).

Document Classification Information

The Document Classification chart provides key count and size information based on document class.

Important Notes:

Note: For an all Imports or Data Set view, the values shown in this chart reflect file MD5 deduplication, because the Deduplication Settings under Analytic Settings and the calculations based on family membership apply only to views of Data (and are initially calculated when data is added to Data by a user with Permissions).

  • For document classes subject to deduplication, the deduplicated count and size values are calculated and displayed according to the appropriate Deduplication setting (under Analytic Index Settings): either Global (the default) or Custodial. This setting, which applies for Data and views of Data, determines the processing of email, reporting of de-dupe counts and size, and how email is handled for an Export that includes duplicates.
  • The Document Classification chart title and the title displayed in the downloaded CSV for the chart identify which deduplication setting is being used to calculate the counts: (for Dedupe Setting: Global) or (for Dedupe Setting: Custodial). If you change the de-dupe mode after initially generating the report, you must click Update to recalculate based on the new setting.
  • The deduplication is calculated based on document class and family membership (MAG or DAG). For example, if the same Word document serves as a Message Attachment to two different email messages, it is counted twice, once per message family.
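
The difference between file MD5 deduplication and family-based deduplication can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the Doc record, its field names, and both functions are hypothetical, and the sketch ignores the Global and Custodial modes and any deduplication of the parent messages themselves.

```python
from typing import Iterable, NamedTuple


class Doc(NamedTuple):
    docclass: str   # for example, "Message_Attachment"
    family_id: str  # hypothetical identifier of the message or document family
    md5: str        # file hash


def family_dedup_count(docs: Iterable[Doc], docclass: str) -> int:
    """Count unique files per family: the same file in two families counts twice."""
    return len({(d.family_id, d.md5) for d in docs if d.docclass == docclass})


def md5_dedup_count(docs: Iterable[Doc], docclass: str) -> int:
    """Count unique files by hash alone, regardless of family."""
    return len({d.md5 for d in docs if d.docclass == docclass})


docs = [
    Doc("Message_Attachment", "family-1", "abc"),  # Word document on message 1
    Doc("Message_Attachment", "family-2", "abc"),  # same document on message 2
    Doc("Message_Attachment", "family-2", "abc"),  # repeated within family 2
]
```

With this sample data, the family-based count is 2 (once per message family), while the hash-only count is 1.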

Note: Keep in mind that reports will only be accurate if you keep the Families intact when running searches and saving documents to different views. This means that you must keep the Include Families checkbox enabled to ensure that Families remain intact.

  • The entries in the table now support the double-click drill-through capability.
Document Class | Count | Size | Deduplicated Count | Deduplicated Size
Container Files | The number of Container files (Archives, Message Archives, and Disk Images) in the view. | The size (by default, in bytes) of Container files in the view. | Container files are not subject to this deduplication analysis; therefore, this column is empty. | Container files are not subject to this deduplication analysis; therefore, this column is empty.
Directories | The number of directories in the view. | The size (by default, in bytes) of directories in the view. | Directories are not subject to this deduplication analysis; therefore, this column is empty. | Directories are not subject to this deduplication analysis; therefore, this column is empty.
EDOC OLE Attachments | The number of EDoc OLE attachments in the view. | The size of EDoc OLE attachments in the view. | The count of EDoc OLE attachments in the view after deduplication (based on family membership). | The size (for example, in GBytes) of EDoc OLE attachments in the view after deduplication (based on family membership).
Message Attachments | The number of message attachments in the view. | The size (by default, in bytes) of message attachments in the view. | The count of message attachments in the view after deduplication (based on family membership). | The size (for example, in GBytes) of message attachments in the view after deduplication (based on family membership).
Messages | The number of messages in the view. | The size (by default, in bytes) of messages in the view. | The count of messages in the view after deduplication. | The size (for example, in GBytes) of messages in the view after deduplication.
Message OLE Attachments | The number of Message OLE attachments in the view. | The size of Message OLE attachments in the view. | The count of message OLE attachments in the view after deduplication (based on family membership). | The size (for example, in GBytes) of message OLE attachments in the view after deduplication (based on family membership).
NIST EDoc Files | The number of EDocs that are NIST files in the view. | The size (by default, in bytes) of EDocs that are NIST files in the view. | NIST EDoc files are not subject to this deduplication analysis; therefore, this column is empty. | NIST EDoc files are not subject to this deduplication analysis; therefore, this column is empty.
Non-NIST EDoc Files | The number of non-NIST EDocs (that is, EDocs that are not NIST files) in the view. | The size (by default, in bytes) of non-NIST EDocs in the view. | The count of non-NIST EDocs (that is, EDocs that are not NIST files) in the view after deduplication. | The size (for example, in GBytes) of non-NIST EDocs (that is, EDocs that are not NIST files) in the view after deduplication.
Total Documents | The total number of documents in the view. | The total size (in bytes) of documents in the view. | Calculated regardless of whether the values for Directories, Container Files, and NIST EDocs are 0. | Calculated regardless of whether the values for Directories, Container Files, and NIST EDocs are 0.

Note: NIST refers to the National Institute of Standards and Technology, which provides the National Software Reference Library (NSRL). The NSRL includes a Reference Data Set of digital signatures for known, traceable software applications. The list is used to identify files with no evidentiary value. Digital Reef provides the NSRL database to support detection of files with signatures (hash codes) matching those in the NSRL database upon import. DeNIST refers to the removal of any file that has a digital signature matching one in the NIST NSRL list.

About the Downloaded Document Class CSV File

You can name and download the Document Classification table as a CSV file. The file contains additional entries, including a top entry for Total Documents. The CSV entries use the document class name, where applicable. Entries for directories, archives, message archives, and disk images will not show counts in views of Data as long as the default set of Exclusion Searches is in use. However, for the Imports view, or for a Data Set Scan Report view, these entries will report counts.

Note: All entries in the CSV file report Count and Size values. For document classes that support deduplicated count and size values, you will also see values populated in columns representing the Deduplicated Count and Deduplicated Size.

  • Total_Documents — The total document count and size for the view.
  • EDoc — The count and size of EDocs in the view.
  • Non-NIST_EDoc — The count and size of non-NIST EDocs in the view. For this entry, the Deduplicated Count and Deduplicated Size columns report the count and size of non-NIST EDocs in the view after deduplication.
  • EDoc_OLE_Attachment — The count and size of EDoc OLE Attachments in the view. For this entry, the Deduplicated Count and Deduplicated Size columns report the count and size of EDoc OLE Attachments in the view after deduplication (based on family membership).
  • Message — The count and size of Message Families (MAGs) in the view. For this entry, the Deduplicated Count and Deduplicated Size columns report the count and size of messages (Message Families, or MAGs) in the view after deduplication (based on family membership).
  • Non-NIST_Message_Attachment — The count and size of non-NIST Message Attachments in the view.
  • Message_Attachment — The count and size of all Message Attachments in the view. For this entry, the Deduplicated Count and Deduplicated Size columns report the count and size of Message Attachments in the view after deduplication (based on family membership).
  • Directory — The count and size of directories in the view (for example, the Imports view or a Data Set Scan Report view).
  • NIST_EDoc — The count and size of EDocs that are NIST files in the view.
  • Message_OLE_Attachment — The count and size of Message OLE Attachments in the view. For this entry, the Deduplicated Count and Deduplicated Size columns report the count and size of unique Message OLE Attachments in the view after deduplication (based on family membership).
  • Archive — The count and size of files that are File Archives or in compressed format in the view (for example, a GZIP, ZIP, RAR, or TAR file found on disk). This entry applies to a view that includes archive files (for example, the Imports view or a Data Set Scan Report view).
  • Message_Archive — The count and size of Message Archives in the view (for example, the Imports view or a Data Set Scan Report view).
  • NIST_Message_Attachment — The count and size of Message Attachments that are NIST files in the view.
  • Disk_Image — The count and size of Disk Images in a view that includes disk images (for example, the Imports view or a Data Set Scan Report view).

Note that NIST information is reported in the kftdesc metadata field.

Billing Summary

This report provides information about the total included file types as well as the excluded file types in the defined File Type Exclusion Groups.

About the Billing Summary

The Billing Summary provides key count and size information based on the included and excluded File Types. By default, the following File Type Exclusion Groups contain File Types that should be excluded from billing:

  • Compressed Types
  • Disk Image Types
  • Email Archive Types
  • File Archive Types

Note: The Billing Summary does not support drill-through.

You can configure the Exclusion Groups and file types that form the Billing Report information at the Project, Organization, or System level (for System Users in a System-level role with the appropriate permissions):

  • Project Billing Reports
  • Organization Billing Reports Template
  • System-level Billing Reports Template

See Container Files for a list of the File Types included in each Container Files category.

See Supported Files for a list of the Digital Reef Supported File Types.

The Billing Summary appears in the Reports tab for the Imports, Data Set, and Project Data views and is scoped to the appropriate view:

Total Included Types — The total number of files subject to billing based on the included File Types.

  • Count (default sort column) — For the appropriate view (Imports, Data Set, or Project Data), the total number of files subject to billing based on the included File Types.
  • Size — For the appropriate view (Imports, Data Set, or Project Data), the total size (in GB or MB) of the files subject to billing based on the included File Types.

Total Excluded Types — The total number of files that will be excluded from billing based on the File Types identified in each Exclusion Group.

  • Count (default sort column) — For the appropriate view (Imports, Data Set, or Project Data), the total number of files excluded from billing, as determined by the excluded File Types in the Exclusion Groups.
  • Size — For the appropriate view (Imports, Data Set, or Project Data), the total size (in GB or MB) of the files excluded from billing, as determined by the excluded File Types in the Exclusion Groups.

  • <Exclusion Group List> — Each Exclusion Group, in alphabetical order, as defined in the Project Billing Reports or Billing Reports template. There are four default Exclusion Groups initially defined for a Project, but you can add your own and select more file types to exclude.

    • Count (default sort column) — The number of files associated with a given File Type Exclusion Group in the appropriate view.
    • Size — The size (in GB or MB) of the files associated with a given File Type Exclusion Group in the appropriate view.
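
The relationship between the included totals, the excluded totals, and the per-group rows can be sketched in Python as follows. This is an illustration only: the function, its input shapes, and the file type names used in the example are assumptions, and the sketch assumes each file type belongs to at most one Exclusion Group.

```python
from typing import Dict, Iterable, Set, Tuple


def billing_summary(files: Iterable[Tuple[str, int]],
                    exclusion_groups: Dict[str, Set[str]]) -> dict:
    """Summarize (count, size) for included files, excluded files, and each group.

    files: (file_type, size_in_bytes) pairs for the view.
    exclusion_groups: group name -> file types excluded from billing.
    """
    type_to_group = {
        file_type: group
        for group, file_types in exclusion_groups.items()
        for file_type in file_types
    }
    included = [0, 0]
    excluded = [0, 0]
    # Groups are listed in alphabetical order, as in the report.
    per_group = {group: [0, 0] for group in sorted(exclusion_groups)}
    for file_type, size in files:
        group = type_to_group.get(file_type)
        targets = [included] if group is None else [excluded, per_group[group]]
        for totals in targets:
            totals[0] += 1
            totals[1] += size
    return {
        "Total Included Types": tuple(included),
        "Total Excluded Types": tuple(excluded),
        "groups": {g: tuple(t) for g, t in per_group.items()},
    }


# Hypothetical sample data: type names and sizes are made up.
groups = {"File Archive Types": {"ZIP"}, "Compressed Types": {"GZIP"}}
files = [("DOCX", 100), ("ZIP", 500), ("GZIP", 50), ("PDF", 200)]
summary = billing_summary(files, groups)
```

In this sample, two files (300 bytes) are subject to billing and two files (550 bytes) are excluded, one in each Exclusion Group.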

For this report, the following buttons are available:

  • Download button — By default, downloads the summary information and the details in an XLSX file, with a tab for the summary information and a tab for the detailed information.
  • — Displays File Type Exclusion Group Details, which includes a list of the file types within each Exclusion Group, along with the document counts and size values.

Note: You can control whether the Billing Summary appears for the Imports, Data Set, and Project Data views from the Project Reports or a Reports template. By default, this report is enabled to appear in the Imports, Data Set, and Project Data views. Note also that the Billing Summary uses the Document Types report.

Document Class

By default, this chart shows the information sorted By Size (descending order) in GB, MB, or KB, depending on the size of the data.

What you see for document classes depends on the population of data for your selected view. The document classes are as follows (in display format, not the official search format):

Note: When you search for a document class using the docclass metadata field (which is neither case-sensitive nor tokenized), you must either specify the entire name of the class (for example, Message_Attachment, Message_OLE_Attachment, eDoc_OLE_Attachment, Message_Archive) or use wildcards.

  • EDoc – The total number and size of files imported that are not an email, not from an email, and not any type of archive (for example, not a file or email archive). These are files that do not fall into any of the following other document classes: emails, email archives (containers) such as PST, OST, and NSF, file archives (compressed files such as ZIP), or disk images. A Word document on disk is an EDoc, as is an Excel document found in a ZIP file at the import location, or a Word document that has an embedded email.
  • Message – The total number or size of email messages (but not their attachments). An email file on disk or an email file found in a ZIP file at the import location falls into this category. Email attachments and archive container files such as PST, OST, and NSF are not counted in this category.
  • Message Attachment – The total number or size of all imported email attachments. Examples include an image file or Word document attached to an email, or an archive such as a ZIP file attached to an email.
  • Message OLE Attachment – The total number or size of all files embedded within a Message_Attachment (or another Message_OLE_Attachment). These embedded files are extracted during import. You can drill through this entry to see a list of documents that were embedded within a Message_Attachment (or another Message_OLE_Attachment). An example of this document class is a document embedded within a Word document that is attached to an email.
  • EDoc OLE Attachment – The total number and size of all files that were embedded within an EDoc (or another EDoc_OLE_Attachment). These embedded files are extracted during import. You can drill through this entry to see a list of documents that were embedded within an EDoc (or another EDoc_OLE_Attachment). Examples include a Word document within another Word document, or even an email embedded within a Word document.
  • Message Archive – The total number or size of documents that are Email Archives (email container files such as PST, OST, NSF found on disk). By default, email archives are excluded from Data by an Exclusion Search.
  • Archive – The total number or size of documents that are file archives found on disk or in compressed format (for example, a GZIP, ZIP, RAR, or TAR file found on disk). By default, compressed and file archive types are excluded from Data by an Exclusion Search.
  • Disk Image – The total number or size of documents that are disk images, such as an Expert Witness Compression Format File (for example, for EnCase and SMART). By default, disk images are excluded from Data by an Exclusion Search.
  • Directory – The total number and size of directories present for the imported data (empty, populated, skipped, or with access errors). By default, directories are excluded from Data by an Exclusion Search.

If you have permissions, you can optionally download the document class report information to a CSV file (by default, DigitalReefReport.csv). When you select this option, you can name the CSV file and select where to save it locally. The CSV also provides a Directory column with the total number and size of any directories present for the imported data (empty, populated, or with access errors).
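
The document class rules above can be summarized as a small decision function. This is an illustrative Python sketch only, not the product's implementation. The function name `classify`, its `kind`, `relation`, and `parent_class` parameters, and the labels `Message_Archive` and `Disk_Image` are assumptions made for the example.

```python
# Hypothetical labels for items that are not attached to or embedded in anything.
LOOSE_CLASSES = {
    "email": "Message",
    "email_archive": "Message_Archive",   # PST, OST, NSF
    "file_archive": "Archive",            # ZIP, GZIP, RAR, TAR
    "disk_image": "Disk_Image",
    "directory": "Directory",
    "file": "EDoc",
}

def classify(kind, relation="loose", parent_class=None):
    """Return the document class for one imported item.

    kind: one of the LOOSE_CLASSES keys (assumed labels).
    relation: "loose" (on disk or unpacked from an archive),
              "attached" (email attachment), or "embedded" (OLE).
    parent_class: class of the containing document, for embedded items.
    """
    if relation == "embedded":
        if parent_class in ("Message_Attachment", "Message_OLE_Attachment"):
            return "Message_OLE_Attachment"
        if parent_class in ("EDoc", "EDoc_OLE_Attachment"):
            return "EDoc_OLE_Attachment"
        raise ValueError("case not covered by the descriptions above")
    if relation == "attached":
        # Any file attached to an email, including a ZIP archive.
        return "Message_Attachment"
    return LOOSE_CLASSES[kind]
```

For example, `classify("email", "embedded", "EDoc")` follows the rule that an email embedded in a Word document is an EDoc OLE Attachment rather than a Message.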

Document Types

By default, this report shows the information By Size (in descending order), in GB, MB, or KB, depending on the size of the data.

Note: The Document Types report provides information to the Billing Summary.

The document types are categorized as follows:

  • Disk Images – The total number or size of documents that are disk images, such as a Logical Evidence File (LEF) or Expert Witness Compression Format File (for example, for EnCase, an E01). See Container Files for a complete list of disk image types. By default, disk images are excluded from Data by an Exclusion Search.
  • Email Archives – The total number or size of documents that are email archives (email container files such as PST, OST, NSF found on disk). See Container Files for a complete list of email archive types. By default, email archives are excluded from Data by an Exclusion Search.
  • Email Messages – The total number or size of all email documents, including loose emails (such as msg or eml files), emails from an email archive, or email attachments. Documents that are not identified as emails, such as email archives (email container files), are not counted in this category.
  • File Archives – The total number or size of documents that have a compressed type or file archive type (for example, GZIP, ZIP, RAR, and TAR). See Container Files for a complete list of compressed types and file archive types. By default, compressed and file archive types are excluded from Data by an Exclusion Search.
  • Images – The total number and size, in MBytes (MB), of files identified as image files (supported image types, such as PNG, JPEG, and TIFF). For a list of supported file types, see Supported File Types for Analysis.
  • Office Files – The total number or size of Microsoft Office documents (including Microsoft Office supported file types and versions, such as Microsoft Word, Excel, PowerPoint, Write, and Works). For a list of supported file types, see Supported File Types for Analysis.
  • PDF – The total number and size of documents in Adobe Acrobat (PDF), Adobe InDesign, or PDF Image format.
  • Other – The total number or size of documents that do not fall into any of the other categories (for example, a Text 7-Bit File, Internet HTML files, and directories). By default, directories are excluded from Data by an Exclusion Search.
  • Unknown – The total number or size of documents that are of a type not recognized by the system (for example, the Unknown format file type).

If you have permissions, you can optionally download the detailed document type report information (the file types) to a CSV file (FILETYTPE.csv). When you select this option, you are prompted to confirm or name the CSV file, and you can select the directory in which to save the file.

  • The File Type tab provides a list of the official file types, such as Internet HTML or Adobe Acrobat (PDF). Downloading from this tab produces a FILETYTPE.csv with the information.
  • The File Extension tab provides a list of the extensions for the files (for example, txt, pdf, and docx). For text/plain and unknown file types, the file extension is the actual extension of the file; for all other file types, the file extension is the standard extension associated with that file type. Not present represents files for which there is no discernible extension (for example, a directory does not have an extension). A blank entry indicates that the file had an empty extension (for example, just a space). Downloading from this tab produces a DOCEXT.csv with the information.
  • The Exceptions tab provides a list of extension exceptions, showing what would be affected if you decide to change the current file extension to the recommended extension. This chart provides columns for the Current extension, the Recommended extension, the Count, and the Size. The default sort order is by Count. Downloading from this tab produces a DOCEXT_CONFLICT.csv with the information.
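
The File Extension rule described above can be sketched in a few lines of Python. This is a hedged illustration, not the product's code. The function name `reported_extension` and the `STANDARD_EXTENSION` table (with its two sample entries) are hypothetical, and the function expects a bare file name rather than a full path.

```python
# Hypothetical lookup of the standard extension for a recognized file type.
STANDARD_EXTENSION = {
    "Adobe Acrobat (PDF)": "pdf",
    "Internet HTML": "html",
}

def reported_extension(file_name, file_type):
    """Return the value listed on the File Extension tab for one file."""
    if file_type in ("text/plain", "unknown"):
        # Use the file's actual extension.
        _, dot, extension = file_name.rpartition(".")
        if not dot:
            return "Not present"      # no discernible extension
        return extension.strip()      # "" stands for the blank entry
    # All other types report the standard extension for the file type.
    return STANDARD_EXTENSION[file_type]
```

Under these assumptions, a PDF saved as `report.tmp` is still listed under `pdf`, while a plain-text file named `README` is listed under Not present.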

Custodian Directories

If a value for the Custodian Directory Location is included in the Index Settings (such as a value of 1 for the first position), this chart shows the By Size or By Count information for each Custodian Directory found in that position at each Data Set import location. This chart will report Not Present when no Custodian information is discovered (for example, because no Custodian Directory Location was specified in the Index Settings). Note that Imports and Data Set views include this report; Project Data-based views report the Custodians report instead.
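
As a rough illustration of how a Custodian Directory Location value selects a directory, the following Python sketch picks the directory at a 1-based position in a path relative to the import location. The function name `custodian_directory` is hypothetical, and treating an out-of-range position as Not present is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath

def custodian_directory(relative_path, position=None):
    """Return the Custodian Directory for a file, or "Not present".

    relative_path: file path relative to the Data Set import location.
    position: Custodian Directory Location (1 = first directory level).
    """
    if position is None:
        return "Not present"          # no location set in the Index Settings
    directories = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    if position < 1 or position > len(directories):
        return "Not present"          # assumed handling for a missing level
    return directories[position - 1]
```

With a position of 1, a file at `smith/mail/inbox.pst` would be reported under `smith`.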

You can download a CSV with the Custodian Directory information.

For this chart, you can also open Details to see the document size and count information for each Custodian Directory.

Note: The Custodian Directory charts are based on Custodian Directory staging and are intended to provide information about the imported Custodian data for billing purposes. These charts have no ongoing relationship with the Custodians added to Project Data (that is, the charts will not reflect any assignments or other changes made to Custodians in Project Data).

Sources

This chart shows the By Size or By Count of each source of data that was imported into the Project. This chart focuses on the top five Data Areas.

You can download a CSV with the source data information.

For this chart, you can also open Details to see the document size and count information for each source of data.

Sending Domains

This chart shows the top 5 sending Domains associated with the All Imports view, based on the number of email messages sent from each Domain.

You can download a CSV with the Sending Domain information.

For this chart, you can also open Details to see the document size and count information for each Sending Domain.

Receiving Domains

This chart shows the top 5 receiving Domains associated with the All Imports view, based on the number of email messages received by each Domain.

You can download a CSV with the Receiving Domain information.

For this chart, you can also open Details to see the document size and count information for each Receiving Domain.

Dominant Languages

This chart shows the document count by dominant language (using the standard ISO 639-1 code for the language, such as en for English). The language-related charts apply if language detection was enabled at the time of import (for each Data Set under Imports).

Document count by dominant language is reported as follows:

  • The chart displays each language (Top 10) using its language code (many are two letters).
  • You can hover over an entry in the chart to see the language name and the total document count and size for a given language code.
  • If a document has multiple languages, the document will be counted for the dominant language only (instead of counted for each language detected).
  • unknown identifies documents for which the language could not be determined.
  • Not present identifies documents that were not subject to Language Detection at import, either because the feature was disabled at import for some of the data, or the documents did not have content, were identified as binary files such as images (when OCR processing is disabled at import), or were not parsed successfully.
  • You can download a CSV with the dominant language information. The downloaded CSV provides the full language names.
  • You can open Details to see the document count by dominant language.

Languages

This chart shows the document count per language (using the standard ISO 639-1 code for the language, such as en for English). These charts apply if language detection was enabled at the time of import (for each Data Set under Imports).

Document count by language is reported as follows:

  • The chart displays each language (Top 10) using its language code (many are two letters).
  • You can hover over an entry in the chart to see the language name and the total document count and size for a given language code.
  • If a document has multiple languages, the document will be counted for each language detected.
  • unknown identifies documents for which the language could not be determined.
  • Not present identifies documents that were not subject to Language Detection at import, either because the feature was disabled at import for some of the data, or the documents did not have content, were identified as binary files such as images (when OCR processing is disabled at import), or were not parsed successfully.
  • You can download a CSV with the language information. The downloaded CSV provides the full language names.
  • You can open Details to see the document count by language.

See Supported Languages for Language Detection for a list of languages that can be detected when language detection is enabled, along with their codes.
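
The difference between the Dominant Languages and Languages counts can be shown with a short Python sketch. It assumes each document is represented as a list of detected language codes with the dominant language first, and an empty list for a document that was not subject to language detection. The function names are hypothetical.

```python
from collections import Counter

def dominant_language_counts(documents):
    """Count each document once, under its dominant language only."""
    return Counter(codes[0] if codes else "Not present" for codes in documents)

def language_counts(documents):
    """Count each document once for every language detected in it."""
    counts = Counter()
    for codes in documents:
        if codes:
            counts.update(set(codes))
        else:
            counts["Not present"] += 1
    return counts

documents = [["en", "fr"], ["en"], ["fr", "en"], []]
dominant = dominant_language_counts(documents)
per_language = language_counts(documents)
```

Here `dominant` totals four entries (one per document), while `per_language` totals six, because the two multilingual documents are counted under both `en` and `fr`.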

Email Sent Date and Email Received Date

The Email Sent Date and Email Received Date reports enable you to see the volume of files associated with a range of email sent or received dates. This can help you make decisions about emails that need to be reviewed more carefully based on the date they were sent or received and how much email was involved (for example, you may focus on a large volume of emails sent 9 months ago).

Note: The document count displayed for a given entry (bar) in the Email Sent Date report or Email Received Date report represents one document per family (the parent message for each family). For these reports, when you double-click to perform a drill-through on a given bar, the drill-through search will include each document in each family matching the date range (both messages and attachments) and will have a higher count than the original entry. The document count displayed for a given entry (bar) in the Project Data-based Date report is calculated differently, and includes each document in each family (messages and attachments). Therefore, a drill-through of an entry in the Date report will have the same count as the original entry.
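
To make the counting difference in this note concrete, the following Python sketch models each family as a parent message mapped to its attachments. The data and function names are invented for illustration.

```python
def bar_count(families):
    """Count shown on a bar: one document per family (the parent message)."""
    return len(families)

def drill_through_count(families):
    """Count returned by a drill-through: every message and attachment."""
    return sum(1 + len(attachments) for attachments in families.values())

# Hypothetical families whose sent dates fall in the same date range.
families = {
    "msg-001": ["contract.docx", "terms.pdf"],
    "msg-002": [],
    "msg-003": ["photos.zip"],
}
```

For this sample, the bar would show 3 while a drill-through would return 6 documents, which is why the drill-through count is higher than the original entry.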

Email Sent Date and Email Received Date Options

Both the Email Sent Date and Email Received Date reports provide a number of options that enable you to work with start and end ranges. You can either use the start and end dates in effect when you initially view the report (which are derived from the earliest and latest dates for the view), or you can specify your own start and end dates.

On the left:

  • Start – Enables you to type a start date in the box or click the Calendar icon to use a calendar to specify zero-day email sent or email received date criteria.
  • End – Enables you to type an end date in the box or click the Calendar icon to use a calendar to specify zero-day email sent or email received date criteria.
    To use the Calendar:
    1. Click the icon to the right of the Start or End box.
    2. Click the date you want, either the current date or the date you typed (highlighted for you), or select another day in the current month.
    3. Click the left or right arrows in the top corners on either side of the month name to move back and forward a month.
    4. Click the month and year in the center, and then use the arrows to go back and forward a year. You can also select another month in the year shown.
    5. Once you make a complete date selection, the Calendar closes and you see the date formatted properly in the box.
  • – Click this button to have the report reset to the originally displayed date range (the default start and end date for the report).

On the right:

  • Previous Period – Moves the histogram information to the previous period. The period of time is dictated by your Start and End zero-day dates.
  • Next Period – Moves the histogram information to the next period. The period of time is dictated by your Start and End dates.
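
One plausible reading of Previous Period and Next Period is that the date window moves by its own length. The Python sketch below illustrates that reading only. The function name `shift_period` and the inclusive day-count arithmetic are assumptions, and the product may compute periods differently.

```python
from datetime import date, timedelta

def shift_period(start, end, direction):
    """Move an inclusive [start, end] window by its own length.

    direction: -1 for Previous Period, +1 for Next Period.
    """
    span = (end - start) + timedelta(days=1)   # inclusive number of days
    return start + direction * span, end + direction * span

previous = shift_period(date(2008, 1, 1), date(2008, 1, 31), -1)
following = shift_period(date(2008, 1, 1), date(2008, 1, 31), +1)
```

Because the shift uses a fixed number of days, a January window moved forward covers 31 days starting February 1 rather than exactly one calendar month.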

Chart Download, Zero Day Details, and Details Options

If you hover over a date block in the Email Sent Date or Email Received Date histogram, you will get a summary of the information for the block. For example, the hover text for the bar of a 2008 entry in the Email Sent histogram might display 2008: 113. A tooltip tells you that you can click to drill down or double-click to drill through. If you click once on a sent or received date block to drill down, you can get more date information for that sent or received date block. If you double-click to drill through an item in the report, the software performs a drill-through search. You can perform the drill-through search at any drill-down level. Also, from either the Email Sent Date or Email Received Date histogram, you can click the following:

  • Download – Enables you to download all available information to a CSV. The CSV always contains all available data (the data initially shown in the chart based on the earliest and latest dates found in the view). The CSV content does not change based on your selected start date or end date.
  • Zero Day Details – Displays a Zero Day Details popup that enables you to select and view zero-day dates (dates for which no document was processed). You have the option to hide weekend days.
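
The idea behind zero-day dates can be sketched as follows: list every date in a range that has no documents, optionally skipping weekends. This is a minimal Python illustration with a hypothetical function name and sample data, not the product's logic.

```python
from datetime import date, timedelta

def zero_days(start, end, dates_with_documents, hide_weekends=False):
    """Return the dates in [start, end] that have no documents."""
    result = []
    day = start
    while day <= end:
        is_weekend = day.weekday() >= 5          # Saturday = 5, Sunday = 6
        if day not in dates_with_documents and not (hide_weekends and is_weekend):
            result.append(day)
        day += timedelta(days=1)
    return result

# Hypothetical week (Monday to Sunday) with email on two days only.
active = {date(2008, 3, 3), date(2008, 3, 5)}
all_gaps = zero_days(date(2008, 3, 3), date(2008, 3, 9), active)
weekday_gaps = zero_days(date(2008, 3, 3), date(2008, 3, 9), active, hide_weekends=True)
```

In this sample, `all_gaps` contains five dates and `weekday_gaps` contains three, because the Saturday and Sunday are hidden.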

Email Addresses Sent and Email Addresses Received

The Email Addresses Sent and Received reports enable you to see the top 10 email addresses associated with sent and received email. This can help you make decisions about emails that need to be reviewed more carefully based on the email addresses.

An Email Address Sent entry is based on the from and sender metadata fields.

An Email Address Received entry is based on the to, cc, and bcc metadata fields.
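
The field mapping above can be illustrated with a short Python sketch that tallies the top 10 addresses from message metadata. The message dictionaries, the function name `top_addresses`, and the choices to lowercase addresses and to count an address once per message are assumptions made for the example.

```python
from collections import Counter

SENT_FIELDS = ("from", "sender")
RECEIVED_FIELDS = ("to", "cc", "bcc")

def top_addresses(messages, fields, limit=10):
    """Return the most frequent addresses found in the given fields."""
    counts = Counter()
    for message in messages:
        addresses = set()                 # count each address once per message
        for field in fields:
            addresses.update(a.lower() for a in message.get(field, []))
        counts.update(addresses)
    return counts.most_common(limit)

# Hypothetical message metadata.
messages = [
    {"from": ["ann@example.com"], "sender": ["ann@example.com"],
     "to": ["bob@example.com"], "cc": ["cy@example.com"]},
    {"from": ["Ann@example.com"], "to": ["bob@example.com"]},
]
sent = top_addresses(messages, SENT_FIELDS)
received = top_addresses(messages, RECEIVED_FIELDS)
```

In this sample, `ann@example.com` is counted twice as a sender even though it appears in both the from and sender fields of the first message.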

From either the Email Addresses Sent or Email Addresses Received report, you can click the following:

  • Download – Enables you to download all available Email Address Sent or Received information to an XLSX file. The download file always contains all available data. By default, the file is called DigitalReefReport.xlsx, but you can select the appropriate name when you save the file.
  • Details – Displays more information about the email addresses.