How to Export to a Database

This topic describes how to perform an Export to an MS SQL database. It is assumed that you have an MS SQL database configured and available within your network configuration.

Here are the main steps you must perform to establish an Export to an MS SQL database:

  1. Configure an Organization Export Database.
  2. Select the Export Database for the Project
  3. Select the Metadata Fields for Export.
  4. Set Up an Export to the Database.

The following sections describe these steps in more detail. Additional sections address the following:

Configure an Organization Export Database

The first setup step involves configuration of an MS SQL Export Database for the Organization. You can set up multiple MS SQL database connections available to support export load file information from Export Streams in the Organization Projects.

Note: You must have permissions to set up an MS SQL Export Database using Organization Settings > Export Databases.

  1. Navigate to Organization Settings > Export Databases. This area is intended to list all available database entries and enables you to add, edit, and delete existing entries.
  2. Click the top-level New Database option.
  3. For Name (required), specify the Digital Reef name for the database, which must be unique in the Organization. This is the name that appears in the list of available database connections for a Project (using the Project Export Settings).
  4. For Description (optional), you can specify a description that helps identify the database connection.
  5. Note the Type field, which is fixed at MS SQL.
  6. For IP (required), specify the IP address or fully qualified (or DNS-resolvable) hostname of the target database.
  7. For Port (optional), you can either have the software use the default well-known port for the database type, or specify another port that is appropriate based on the configuration of the database. For the MS SQL. Type, the well-known port is 1433.
  8. For Database Name(required), specify the actual database name.
  9. For Username (required), specify the username that the system will use to authenticate the database connection.
  10. For Password (required), specify the password needed to authenticate the database connection.
  11. Clicking the Validate button enables you to test and verify the connection to the database after you have specified all the information needed. Success is confirmed or specific failures are reported to you. A failure to validate the connectivity should be investigated before export, but does not prevent the database from being defined.
  12. Click OK when you are ready to create or edit the database. If you navigate away before applying your changes, you will be prompted to either confirm or cancel the operation.

Editing an Export Database Entry

After initial configuration, you can click Edit to modify a selected Export database entry. When editing an Export database entry, you see the current values for the database (with the exception of the password, which is represented by asterisks). Fields that you can edit are active; fields unavailable for edit are grayed out.

Deleting an Export Database Entry

You can delete an Export database entry as long as it is not the active database for any Project in the Organization. For example, you can delete the entry for a newly defined Export database that has not yet been selected as the active database for any Project. In this case, you can click Delete to delete the database entry, and a popup will ask you to confirm the deletion.

If the database has been selected as the active database for any Project in the Organization, clicking Delete will generate an error.

Select the Export Database for the Project

Although you can configure multiple MS SQL Export Databases for the Organization, you can have only one active database for a given Project. You select the active Database in the Project Export Settings. There is no default; you must explicitly select which Database you want to use as the active Database for the Project.

  1. Navigate to Project Settings > Export Settings.
  2. In the Other section, use the Export Database drop-down box to select the MS SQL database connection you want to serve as the active connection in the Project. Initially, the Connection drop-down box is blank, requiring selection of an available database. If you do not see any available databases in the list, you must first configure an Export database for the Organization (as described in the previous section).
  3. Click Save to save your changes or click Discard Changes to cancel them.

Once you have selected the active database for a Project and performed an Export to a database, you can select another available database to use for subsequent Exports, but do NOT set the Export Database Connection to blank (null). Doing so will cause all subsequent Exports of any database-enabled Export Stream to fail.

Select the Metadata Fields for Export

As part of the setup procedures prior to Export, you should decide which metadata fields you want to have populated in the selected Export load files and in the MS SQL database table (in the schema’s doc_xml field).

You can select metadata fields in either an Export Fields template at the Project level or in an Export Fields template at the Organization level . At the Project level, you can use the default System Created template, which initially has all of the fields, or you can add your own Export Fields template with only the fields you want. A new Project Export Fields template will be empty, allowing you to select available fields and use drag-and-drop to add fields to the list of Export Metadata fields. Remember to save your field changes.

Note: Your Export Fields template should include the DocID field and must include the docnum field. The Export will fail if the docnum field cannot be populated. Renamed fields are not recognized in the database, and field reordering is not recognized.

Set Up an Export to the Database

After you have established the database at the Organization and Project level and set up your Export Fields template, you can set up an Export to include export to the Export Database that has been set for the Project.

Exporting to a database does not preclude the export of files and other load files to an export location; it is an additional type of export to populate a database with load file information. In general, an Export requires a configured read/write Connector and Export Data Area to export data from the Project.

  1. Navigate to Exports > > New Export Stream.
  2. In the eDiscovery Export dialog that appears, specify the basic Export information, which includes the following:
  • Export Name
  • the Export field template to use (your Project Export Fields Profile or Export template)
  • the Export Location (a configured Export Data Area)
  • the Documents to Export criteria
  1. Select your desired Family Options and Duplicates Processing options.
  2. Select your Output Format Options. By default, DAT is selected, but CSV is generally useful for viewing load file information. You can specify as many formats as you want.
  3. In the Output Optionssection, select Export to DB if you want the Export load file information to an established MS SQL database. This option is cleared by default and is only selectable for the initial Export of the Export Stream. If you do not have a Database connection configured, selecting this option will display an error message in red (Select Database Connection in Project Settings > Export Settings). To fix this error, you must make sure that you configure an Export DB in the Organization Settings, and then select the appropriate Export DB Connection in the Project Export Settings. (Both of these procedures are described in the previous sections.) Once you have this in place, specify a Table Name, as follows:
  • Table Name <Table Name> — When you select Export to DB, the Table Name option appears. . You can either use the default Table Name for the Export Stream (using the format DR_<casename>_<streamname>), or you can override that name and specify your own. The table name can be a maximum of 128 characters and can include alphanumeric characters (letters and numbers), and underscores. The table name is validated. As long as the name is valid, the Digital Reef software then creates a table with that name (if it is not already present) to contain the document information. For subsequent Volume Exports, the last table name used for the Export Stream appears, but you can specify another table name. Note that two additional tables are also generated, once per database, to provide information about Export Settings and status for each produced Volume.
  1. In the Output Options section, select Include Full Text if you want to include the text from produced text files. This option is available for any Export of an Export Stream and it applies to DAT and EDRM load files, as well as an export to an MS SQL database. For an MS SQL database, the text populated in the table’s extracted_text field can be up to 2 GB per document. The 2 GB limit applies to R4.3.8.0; for 4.3.5 releases, the limit is 1 GB. If a file exceeds the limit, then the extracted_text field will be empty and the table’s text_link field will provide a reference to the extracted text file on disk. (The text_link field is not populated unless the limit is exceeded.)
  2. In the Production Settings section, select the production formats and other production settings that you want to include in the Export. Be sure to include Extracted Text if you have selected the Include Full Text option. (If you do not select Extracted Text, the extracted_text field will not be populated at all.)
  3. Click Run Export when you are ready to start the Export process. You can monitor the progress of the Export task in the Work Basket. At the selected Export Location for a successful Export, you can see the exported items and generated files, such as load files, settings files, production reports, and exceptions. For a successful Export, the configured database table will contain the populated records.

What to Do if the Export Fails

If a database connection fails during Export, or if there is an error populating the records in the database, the entire Export will fail. Any partially inserted records added during the Export process will be removed.

In the Work Basket, you will see a failure for the Export task. You can click the Failure icon in the Actions column of the Work Basket to get information about the failure, and you can right-click the Export task and select View Details to get key information, such as the Request Identifier for the task.

In general, check database connectivity and make sure you have all the required items configured to support a database Export. If necessary, consult your System Administrator.

After the failure, the Export will remain in a Staged state. When the error has been resolved, you can select the Volume and click Export to perform the Export. This will export the Staged volume.

MS SQL Schema

This section provides the MS SQL schema used by the Digital Reef software. When the user selects the Export to DB option in the Export dialog and a Table Name, the Digital Reef software populates a table with document information, as described in the table below. Two additional tables are also generated, once per database, to provide information about Export Settings and status for each produced Volume. These additional tables also appear below.

At least one Export database (an MS SQL database on an MS SQL server) must be configured for the Organization and then one active connection must be selected for the Project.

Note: The Null/Not Null column indicates whether or not a particular field can be Null. A value of Not null indicates that a field cannot be Null.

MS SQL Database Schema: Document Information Table

Column Name Data Type Maximum Length Allows Null Entries? Description
id bigint 19 No A value representing an entry in the table.
doc_id nvarchar 128 No The unique Doc ID assigned to a document for export when document-level numbering is used for the export. Sample Doc ID: DOC0000000011
doc_num nvarchar 128 No The unique Document Number assigned to the document at import. This is a three-part number representing the Document Number in the format C.V.N, where C =A Data Collection (Data Set) number, unique per Organization, V =A Data Collection Checkpoint Value, unique per Data Collection, and N = A document number, unique within the Data Collection Checkpoint.
insert_time datetime 23 No The date and time that the database record was populated.
vol_id nvarchar 128 No The Volume name (for example, VOL0001).
export_stream_handle nvarchar 128 No The handle value representing the Export Stream.
export_stream_name nvarchar 128 No The name assigned to the Export Stream.
native_link nvarchar 4000 Yes For export, the path to the renamed Native file (with a DocID). This field by default uses a prefix of DR\. The path is Windows-compatible (using \ to separate folders). The format of this field is configurable in the Export dialog, so it may contain the full path of the Volume directory, including a base path, Volume label and Volume #, document ID prefix, and starting ID (using the appropriate Pad size), and then the file extension. Example: DR\00001\DOC0000000026.doc
text_link nvarchar 4000 Yes If the extracted_text limit has been exceeded for a file, the text_link field provides the path to a produced text file instead. This field by default uses a prefix of DR\. The path is Windows-compatible (using \ to separate folders), and the text file always ends in .txt (e.g., DR\00001\ DOC00000001.txt). The format of this field is configurable in the Export dialog, so it may contain the full path of the Volume directory, including a base path, Volume label and Volume #, document ID prefix, and starting ID (using the appropriate Pad size), and then the file extension.
extracted_text nvarchar

2,147,483,647

maximum of 2 GB; if exceeded, text_link is used instead

Yes If the Export has the Include Text Load File option and Extracted Text Production Settings option set, this field can contain up to 2 GB of text in 4.3.8 (1 GB of text in 4.3.5) from a document. If a file exceeds the limit, then the extracted_text field will be empty and the text_link field will provide a reference to the extracted text file on disk. (The text_link field is not populated unless the limit is exceeded.)
doc_xml xml 2,147,483,647 Yes The XML representation of the document’s metadata. Note that this representation only includes the fields defined in the Export Fields Profile or template selected for the Export.
relationship_xml xml 2,147,483,647 Yes The XML representation of the relationship information for a document (for example, attachments and near duplicates).

 

MS SQL Database Schema: Generated DR Export Settings Table

Column Name Data Type Maximum Length Allows Null Entries? Description
vol_id nvarchar 128 No The Volume name (for example, VOL0001).
export_stream_handle nvarchar 128 No The handle value representing the Export Stream.
dr_version nvarchar 50 Yes The DR software version, which includes the base release number (for example, R4.3.8.0) followed by a hyphen and a GUID.
general_information xml 2,147,483,647 Yes The general information for an Export (which includes the Organization Name, Project Name, Project GUID, Metadata Template information, and the Export Data Area handle).
documents_to_export xml 2,147,483,647 Yes The Documents to Export information (which includes the option to export All Documents, or to export documents based on selected Tags and/or a typed query using the current Case search syntax setting).
source_view_handle nvarchar 50 Yes This field is not relevant to eDiscovery Export in this release. It is reserved for future use.
source_view_name nvarchar 255 Yes This field is not relevant to eDiscovery Export in this release. It is reserved for future use.
output_format_options xml 2,147,483,647 Yes The Output Format information for an Export (which includes the user-selected formats such as DAT and CSV, the Time Zone, Date/Time format, and Custom Postprocessing information for Export Volume encryption, if included).
load_file_options xml 2,147,483,647 Yes The Load File Options selected for an Export (which includes the Export to DB option, as well as the options Include Text, Separate Duplicates, Duplicate Overlays, ThreadGroup includes Attachments, Max Records, BegAttach starts with, DAT Encoding, and Unit of measure).
family_options xml 2,147,483,647 Yes The Family Options selected for an Export (which include the selection of Associated Family Files versus Tagged Content Only and Associated Threads, Separate Email Attachments with the selected email format, Separate OLE Attachments, Include Container Reference, and Remove Attached Archives).
duplicates_processing xml 2,147,483,647 Yes The Duplicates Processing options selected for an Export (which include the selection of Remove Duplicates from Export versus No Duplicate Removal and Remove Duplicates from Export and Load File, as well as Near-Duplicate Processing options).
production_settings xml 2,147,483,647 Yes The Production Settings selected for an Export (which include the selected path information, numbering, and selected output formats, such as native files, text files, and PDFs..
include_metadata bit 1 Yes When Custodian Reports are generated based on supplied queries, reflects whether the Include Metadata option applies to the queries. By default this option is cleared.
custodian_report_queries xml 2,147,483,647 Yes The queries supplied to generate Custodian Reports.
load_status int 10 Yes

A numeric value representing the status.

  • A value of 1 corresponds to Export started.
  • A value of 2 corresponds to Export finished.
  • A value of 3 corresponds to Export failed.
expected bigint 19   The number of documents expected to be exported.
exported bigint 19   The number of documents actually exported. If this count does not match the expected count, the Export will fail with a status of Export failed (value 3).

MS SQL Database Schema: Status Table

Column Name Data Type Maximum Length Allows Null Entries? Description
status int 10 No

A numeric value representing the status.

  • A value of 1 corresponds to Export started.
  • A value of 2 corresponds to Export finished.
  • A value of 3 corresponds to Export failed.
description nvarchar 255 Yes

A description of the reported status:

  • Export started
  • Export finished
  • Export failed
is_complete bit 1 Yes

Indicates whether or not the Export is finished.

  • A value of 1 indicates that the Export is complete.
  • A value of 0 indicates that the Export is not complete.