Generate a New Load File for an eDiscovery Export Stream or Volume

Exports > eDiscovery Exports > Selected Export Stream or Volume > Generate Load File...

Requires Exports - View Permissions to view Exports, and Add/Edit Permissions to make selections and save or discard changes.

Users in a role with permissions can use the Generate Load File option on the Documents tab for an eDiscovery Export Stream or Volume to generate a new load file, or a new version of an existing load file, based on any of the following:

  • A change to the Export Location.
  • A change to the load file formats (note that SQL DB does not apply to this operation).
  • A change to the time zone, or date/time formats.
  • A change in mapping used for an Export Volume or entire Stream, to generate a new version of an existing load file.
  • A change to load file options, such as Include Full Text, Separate Duplicates, Duplicate Overlays, ThreadGroup includes Attachments, Max Records per File, BegAttach starts with, and DAT File Encoding.
  • Last update dates, so that you can generate a filtered load file that contains only the entries last updated on the dates you select.

A common use case is changing the load file format. For example, you may have produced a series of Volumes for a given export and want to change the load file format from EDRM XML to DAT because client counsel has changed and the new counsel wants the load files produced in DAT format. You can select the Volume, select the toolbar option to generate a new load file, select the new format, and submit the request. A new DAT file appears in the same location as the previous XML file.

Note: If you generate a new version of an existing load file, the new version will reside under the appropriate Volume directory at the Export Area and will include a timestamp in the filename that enables you to track versions (for example, VOL0001-20130809171945.csv). The timestamp is the file system time for the operation. The original version is preserved (for example, VOL0001.csv). If you use Update Export to replay Volumes, you will also see update load files, which include the word update in the filename (for example, VOL0001-update-20130809201314). Generating a new load file also generates a settings file with a timestamp. This settings file does not reflect any changes you make to the generate load file options; it continues to reflect the settings (on the Settings tab) for the associated volume export (that is, what is persisted in the database).
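The versioned naming pattern described in this note can be sketched in Python. This is an illustration under stated assumptions, not the product's implementation: the helpers load_file_name and latest_version are hypothetical, and the sketch assumes that update load files also carry the file extension (the update example in the note omits it). Because the timestamp is a fixed-width yyyyMMddHHmmss value, comparing timestamps as strings also compares them chronologically.

```python
import re
from datetime import datetime


def load_file_name(volume, extension, when=None, update=False):
    """Build a load file name following the pattern in the note (hypothetical helper).

    No timestamp  -> original version, e.g. VOL0001.csv
    With `when`   -> regenerated version, e.g. VOL0001-20130809171945.csv
    update=True   -> update load file, e.g. VOL0001-update-20130809201314.csv
    """
    if when is None:
        return f"{volume}.{extension}"
    stamp = when.strftime("%Y%m%d%H%M%S")  # yyyyMMddHHmmss
    marker = "-update" if update else ""
    return f"{volume}{marker}-{stamp}.{extension}"


def latest_version(names, volume, extension):
    """Return the newest regenerated (non-update) version of a Volume's load file, or None."""
    pattern = re.compile(rf"{re.escape(volume)}-(\d{{14}})\.{re.escape(extension)}")
    stamps = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            stamps.append(match.group(1))
    if not stamps:
        return None
    # Fixed-width numeric timestamps sort chronologically as plain strings.
    return f"{volume}-{max(stamps)}.{extension}"


names = ["VOL0001.csv", "VOL0001-20130809171945.csv", "VOL0001-update-20130809201314.csv"]
print(latest_version(names, "VOL0001", "csv"))  # VOL0001-20130809171945.csv
```

The original file (VOL0001.csv) and update load files are deliberately ignored by latest_version, since the note describes them as separate from regenerated versions.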

Generate Load File Options

You can specify the following options:

  • Export Data Area (required) – Select an available Export Location from the drop-down list (listed in alphabetical order). These export mounts (export data areas) are managed as Organization Settings. Be sure to negotiate access to this location so that you can retrieve files after export. You can revisit the export location selection for each export operation.
  • Export Fields <list> – You can use the Export Fields template that is set as the default for the Project, or use the drop-down list to select another available Project-level Export Fields template, an Organization-level template, or a System-level template, if available based on your permissions. Note that listing a System-level Export Fields template requires System-level View permission for System Export Fields. (That is, you must be a System User in a role with at least View permission to see the list of System templates for Export Fields.) If your permissions allow, you can also create an Export Fields template.

Analytic Processing

  • ThreadGroup includes Attachments (cleared by default) — This option is available for any Export of an Export Stream. When selected, it enables the ThreadGroup fields (ThreadGroupID, ThreadGroupIndent, and ThreadGroupSort) to be populated for attachments that are part of the Thread Group (and exported separately using the Separate Email Attachments and Separate OLE Attachments options). In the hierarchy reported in the ThreadGroupSort field, the attachments appear with their associated parent message in the appropriate position. By default, the ThreadGroup fields are not populated for any attachments that are part of the Thread Group.

Output Options

Most Output Format options are available for any Export of an Export Stream:

  • DAT, LST, DII, CSV, EDRM XML – For any Export of an Export Stream, you can select any combination of the Output Formats except SQL DB, which is disabled for this operation because it does not apply to generating a load file and is selectable only at initial export. You can also clear all formats and export only files, which may be suitable in some cases.
    • DAT (default) — Exports tagged and associated files, or all files, from a view with support for the LexisNexis Concordance® format for eDiscovery. This export type provides a .DAT file. A Concordance DAT file provides all Digital Reef metadata fields subject to export. The Metadata List has more information.
    • LST — Exports tagged files or all files from a view to a kCura Relativity LST file. The LST file includes a small number of pertinent metadata fields such as DocID and TextLink.
    • DII — Exports tagged files or all files from a view to a CT Summation Document Image Information (DII) file. This file contains the Digital Reef fields mapped to standard DII tokens (for example, bcc becomes @BCC). User-selected fields and fields that do not have standard mappings are identified by a custom token, as @C.
    • CSV — Exports tagged files or all files from a view with a comma separated value (CSV) file serving as a manifest of files. The CSV file includes all Digital Reef metadata fields subject to export.
    • EDRM XML — Exports tagged files, or all files, from a view with support for the Electronic Discovery Reference Model (EDRM), which defines a standard for eDiscovery products and services so that data can be easily exchanged between organizations and eDiscovery products. The supported version is currently 1.1. With this export type, an XML file contains EDRM metadata as well as all Digital Reef metadata subject to export. The EDRM XML file enables an EDRM-compliant third-party application to import the exported files for further analysis.
    • SQL DB (disabled for this operation) — This format is currently not available for Generate Load File. It is selectable only for the initial Export of an Export Stream to export load file information to an established MS SQL database.
    • If you do not have a database connection configured for the initial Export, the Export dialog displays the following error message in red: No database connection has been selected in the "Export Database" dropdown in Project Settings > Export Settings. To fix this error, configure an Export DB in the Organization Settings, and then select the appropriate database connection in the Project. You can configure one or more MS SQL Export Databases for the Organization, but you can configure only one active database for the Project in the Project Export Settings. If you do not have any database available, consult your System Administrator. You can select which Metadata fields are populated using a Project Export Fields template, but the database does not recognize renamed fields or field reordering. Exporting to a database does not preclude the export of files and other load files to an export location; it is an additional type of export that populates a database with load file information. See How to Export to a Database for more information about the configuration steps required to export to an MS SQL database. If a database connection fails during Export, the entire Export fails. The Export remains in a Staged state, and when the error has been resolved, you can click the Export button to perform the Export.
      • DB Table Name <Table Name> (selectable at any Export) — This field is enabled only when you initially select the SQL DB format and your Export Stream is configured for export to an MS SQL database. You can either use the default DB Table Name for the Export Stream (in the format DR_<projectname>_<streamname>) or override that name and specify your own. If either the Project Name or the Stream Name contains characters from foreign languages, you will be prompted to change the DB Table Name. The DB Table Name is validated: it can be a maximum of 128 characters, and it cannot contain spaces (leading, trailing, or embedded), a leading digit, or characters other than underscore, a-z, A-Z, and 0-9. Unsupported characters (for example, ! " # $ % & * + . / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”) are converted to underscores automatically. The DB Table Name must not contain any MS SQL reserved keywords. As long as the DB Table Name is valid, the Digital Reef software creates a table with that name (if it is not already present) to contain the document information. For subsequent Volume Exports, the last DB Table Name used for the Export Stream appears, but you can specify another DB Table Name. Two additional tables are also generated, once per database, to provide information about Export Settings and status for each produced Volume. See How to Export to a Database for more information about the tables that are part of the schema.
  • Include Full Text (cleared by default) — This option is available for any Export of an Export Stream and requires the Extracted Text option. Set this option if you want the load file to include the text from the text files subject to Export. When selected on its own, without its nested Exclude Email Headers option, this option ensures that full email header information is included in the load file. This option applies to DAT and EDRM load files, and to an export to an MS SQL database. For an MS SQL database, the text in the extracted_text field can be up to 2 GB per document. If a file exceeds 2 GB, the extracted_text field is empty and the table's text_link field provides a reference to the extracted text file on disk. (The text_link field is not populated unless the limit is exceeded.) For DAT or EDRM load files, the text can be up to 12 MB of data per document (approximately 12 million ASCII characters for a given document; the number of characters is lower if the text contains non-ASCII characters or special characters such as ®). If a document's text exceeds the limit, the text is not included, but the load file includes a reference to the extracted text file. In general, if you want to use the Include Full Text option, be sure to enable the Extracted Text option (as a Production Settings option). The EDRM XML load file populates the EDRM XML element InlineContent when the Include Full Text option is selected; the DAT load file includes a field called inlinetext1 when the Include Full Text option is selected.
    • Exclude Email Headers (cleared by default, available when Include Full Text is selected) — This option affects load file content: it excludes the email header information from the included text of the text files subject to Export, so that the load file includes only the email body. For example, you can use this option to include only the message body for MS Teams data.
  • Separate Duplicates (cleared by default) — This option is available for any Export (or Load File Generation) and places the entries for non-duplicates and duplicates into separate load files (for example, VOL0001.csv and VOL0001-duplicates.csv). If you use the default setting of Remove Duplicates from Export, having a separate duplicates load file segregates records with no TextLink and NativeLink information into a separate load file (for example, for manual loading to Relativity). This option does not apply if you remove duplicates from both the export and the load file.
  • Duplicate Overlays (cleared by default) — This option is available for any Export (or Load File Generation) and triggers the generation of overlay manifests containing any updated records for previous volumes due to processing the current volume (for example, due to new or changed DuplicateCustodian metadata). DAT is the default output format, but you might want to select additional formats, such as CSV. For example, if VOL1 contains an original document, and duplicates of that file appear in VOL3 and VOL5, you would see entries in the CSV files VOL0001.csv, VOL0003-overlay.csv, and VOL0005-overlay.csv (as long as you selected CSV as an output format). Note that if you also use the Separate Duplicates option for an export or load file generation, you can see an overlay file for the duplicates CSV (for example, VOL0003-duplicates-overlay.csv). Overlay manifests are also generated automatically as part of the Generate Search Reports option.
    • Include All Master Duplicates (enabled when you select Duplicate Overlays for Export or Load File Generation and cleared by default): When you opt to generate overlay manifest files, the default behavior is to limit the master duplicate records from prior volumes in the stream and include only those with updates to metadata values based on corresponding duplicates added to the most recent volume. Existing master duplicate records without corresponding duplicates added to the most recent volume are therefore excluded from the overlay manifest files by default. If you want the overlay files to include all master duplicate records instead, select the Include All Master Duplicates checkbox.
    • Export Fields for Overlays (enabled when you select Duplicate Overlays for Export or Load File Generation): When you opt to generate overlay manifest files, you can use the associated drop-down menu to select an available Export Fields template to use for the overlay manifest files. This enables you to configure and then use a custom Export Fields template specifically for the overlay file, perhaps one with a smaller subset of fields. If you do not specify a custom Export Fields template for the overlay file, then your designated Export Fields System Created Template for the Project is used.
  • Max Records Per File (0, or disabled, by default) — For any Export of an Export Stream, you can set this option to a non-zero value to generate load file batches (chunks) based on a maximum number of records per batch. The Max Records value must be 0 or higher; the default value of 0 creates the standard Export, without load file batches. Keep in mind that the non-zero value you supply sets the upper boundary for each load file batch. The number of records for a given load file batch may be less than the Max Records Per File value to prevent a family from being split across two load file batches. Any family that is larger than the Max Records Per File value will be split across batches. This option eases the loading process in a downstream review tool, and helps reviewers get started with batches while others are loading. Note that this option applies to DAT, CSV, and EDRM XML load file types, but not DII or LST.
  • BegAttach starts with – For any Export of an Export Stream, you can specify this option with one of the following for the starting attachment (or embedded document) value:
    • Parent Email (set by default) — By default, this option uses the parent email or document ID to represent the beginning attachment range (BegAttach value) for an entire family, which may include email attachments or embedded documents (members of a MAG or DAG). For example, if doc1.doc with an ID of 00001 has three embedded documents (embed1.doc with ID 00002, embed2.doc with ID 00003, and embed3.doc with ID 00004), the BegAttach value contains parent ID 00001 for all members of the family.
    • First Attachment — Selecting this option uses the first email attachment ID or first embedded document ID to represent the beginning attachment range (BegAttach value) for an entire family (members of a MAG or DAG). For example, if doc1.doc with an ID of 00001 has three embedded documents (embed1.doc with ID 00002, embed2.doc with ID 00003, and embed3.doc with ID 00004), the BegAttach value contains first embedded document 00002 for all members of the family.
  • Generate – For a selected Export Stream, you can select one of the following radio button options:
    • A Load File for Each Volume (default) – Generates separate load files for each volume.
    • One Consolidated Load File – For a selected Stream only, you can select this option to generate a single, consolidated load file instead of separate load files for each volume. This option is grayed out and unavailable for a selected volume. If you select this option for a stream, the Include Volume Label and # in Path Fields option is set automatically and cannot be changed. Upon generation, the Export Location contains the consolidated load file in a Consolidated Load Files folder under the stream. The load file(s) generated are named <stream>-<timestamp>.<extension>, with the timestamp in YYYYMMDDHHMMSS format (for example, export1-20170120134146.csv). Note that this option respects your Filter by Last Update selections. If any volumes have been explicitly disabled (that is, the volume checkbox in the tree has been cleared), those disabled volumes are not included in the consolidated load file.
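The DB Table Name rules for the SQL DB format (default name pattern, 128-character limit, allowed characters, automatic conversion of punctuation) can be expressed as a small validation sketch in Python. The function names are hypothetical, the reserved-keyword set is only a sample, and the sketch reads "must not contain any MS SQL reserved keywords" as "must not equal a keyword", which may be looser than the product's actual check. It also assumes that all ASCII punctuation other than underscore is converted, while spaces and non-ASCII characters are reported as errors rather than converted.

```python
import string

# Illustrative subset only; the real list of T-SQL reserved keywords is much longer.
RESERVED_SAMPLE = {"SELECT", "FROM", "TABLE", "ORDER", "GROUP", "USER", "INDEX", "KEY"}

# Assumption: ASCII punctuation except underscore, plus curly double quotes, becomes "_".
CONVERTED = (set(string.punctuation) - {"_"}) | {"\u201c", "\u201d"}


def default_table_name(project, stream):
    """Default DB Table Name pattern: DR_<projectname>_<streamname>."""
    return f"DR_{project}_{stream}"


def convert_unsupported(name):
    """Replace convertible punctuation with underscores; spaces and non-ASCII are left for validation."""
    return "".join("_" if ch in CONVERTED else ch for ch in name)


def table_name_problems(name):
    """List rule violations; an empty list means the name passes these checks."""
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > 128:
        problems.append("longer than 128 characters")
    if any(ch.isspace() for ch in name):
        problems.append("contains spaces")
    if not name.isascii():
        problems.append("contains non-ASCII characters")
    if name[0] in "0123456789":
        problems.append("starts with a digit")
    if name.upper() in RESERVED_SAMPLE:
        problems.append("is a reserved keyword")
    return problems


candidate = convert_unsupported(default_table_name("acme", "export#1"))
print(candidate, table_name_problems(candidate))  # DR_acme_export_1 []
```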
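The size limits described for Include Full Text can be illustrated with a short Python sketch that decides whether extracted text is placed inline or referenced by link. The function name place_full_text is hypothetical. The sketch assumes that size is measured in UTF-8 bytes (consistent with the statement that non-ASCII characters reduce the number of characters that fit) and reads "12 MB" and "2 GB" as binary units (12 MiB and 2 GiB); the product's exact byte counts are not documented here.

```python
DAT_EDRM_LIMIT = 12 * 1024 * 1024  # assumption: "12 MB" read as 12 MiB
SQL_LIMIT = 2 * 1024 ** 3          # assumption: "2 GB" read as 2 GiB


def place_full_text(text, target, limit_bytes=None):
    """Decide whether extracted text goes inline or is referenced by link (sketch).

    target: "DAT", "EDRM", or "SQL". Size is measured as UTF-8 bytes, so a
    non-ASCII character such as "®" (2 bytes) uses up the limit faster than "a".
    Returns (placement, size_in_bytes).
    """
    if limit_bytes is None:
        limit_bytes = SQL_LIMIT if target == "SQL" else DAT_EDRM_LIMIT
    size = len(text.encode("utf-8"))
    if size <= limit_bytes:
        return ("inline", size)
    # Over the limit: the text is omitted and the load file (or text_link column)
    # refers to the extracted text file instead.
    return ("link", size)


print(place_full_text("®®®", "DAT", limit_bytes=5))  # ('link', 6)
```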
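The Max Records Per File rules (a non-zero value is an upper bound, a family is kept in one batch unless it is larger than the limit, and 0 disables batching) can be sketched in Python as a greedy batching pass over families. The function batch_records and its input shape are hypothetical. How the product splits an oversized family, and whether later families may share the last chunk of that family, is not described; this sketch simply starts a new batch afterwards.

```python
def batch_records(families, max_records):
    """Group families of record IDs into load file batches (illustrative sketch).

    families: families in export order, each a list of record IDs
              (parent first, then attachments or embedded documents).
    max_records: upper bound per batch; 0 disables batching.
    """
    if max_records < 0:
        raise ValueError("Max Records Per File must be 0 or higher")
    if max_records == 0:
        # Standard export: one load file containing every record.
        everything = [rec for family in families for rec in family]
        return [everything] if everything else []

    batches, current = [], []
    for family in families:
        if len(family) > max_records:
            # An oversized family cannot fit in any batch: close the open batch,
            # then split the family into consecutive chunks.
            if current:
                batches.append(current)
                current = []
            for start in range(0, len(family), max_records):
                batches.append(family[start:start + max_records])
        elif len(current) + len(family) > max_records:
            # Start a new batch rather than split this family across two batches.
            batches.append(current)
            current = list(family)
        else:
            current.extend(family)
    if current:
        batches.append(current)
    return batches


print(batch_records([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10, 11]], 4))
# [[1, 2, 3, 4], [5, 6], [7, 8, 9, 10], [11]]
```

The second batch holds only two records because adding the family [5, 6] to the first batch would exceed the limit, which matches the statement that a batch may be smaller than Max Records Per File.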
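The two BegAttach starts with settings can be compared with a minimal Python sketch that assigns a BegAttach value to every member of one family, using the doc1.doc example (IDs 00001 through 00004). The function name and mode strings are hypothetical, and the fallback for a family with no attachments under First Attachment is an assumption, since that case is not described.

```python
def beg_attach_values(family_ids, mode="parent"):
    """Map each record in one family to its BegAttach value (illustrative sketch).

    family_ids: the family's record IDs in order, parent first.
    mode: "parent" (Parent Email, the default) or "first_attachment".
    """
    if not family_ids:
        return {}
    if mode == "parent":
        start = family_ids[0]
    elif mode == "first_attachment":
        # Assumption: a family without attachments falls back to the parent ID.
        start = family_ids[1] if len(family_ids) > 1 else family_ids[0]
    else:
        raise ValueError(f"unknown BegAttach mode: {mode}")
    # Every member of the family shares the same BegAttach value.
    return {doc_id: start for doc_id in family_ids}


family = ["00001", "00002", "00003", "00004"]  # doc1.doc and its three embedded documents
print(beg_attach_values(family))                      # every value is 00001
print(beg_attach_values(family, "first_attachment"))  # every value is 00002
```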

Formatting Settings

These options include the following:

  • Time Zone – For any Export of an Export Stream, you can select a time zone and adjust the exported date and time metadata accordingly. Export supports a wide range of time zones. You can select from the displayed subset of the most common time zones, such as the default of Coordinated Universal Time (UTC), or select Other from the drop-down list to display an expanded pop-up list of other available time zones. On the expanded time zone list, you can use the Filter box to search for a time zone containing the characters you type. Once you find the time zone you want, click OK. The time zone affects document conversion to PDF, HTML, or TXT, and the export load file contents. In addition to selecting from the drop-down list, you can enter your own time zone using the standard time zone name (for example, America/New_York). Consult sites such as http://en.wikipedia.org/wiki/List_of_tz_database_time_zones and http://efele.net/maps/tz/us/ for a list of time zone names.
  • Date Format – For any Export of an Export Stream, you can select a date format. The default complete date and time format is MM/dd/yyyy HH:mm:ss. Use the Date Format drop-down box to select one of the following date formats, or type in your own date format using the guidelines for custom date/time formats:
    • MM/dd/yyyy (the default)
    • MM/dd/yy
    • yy/MM/dd
    • yyyy-MM-dd
    • dd-MMM-yy
  • Delimiter – If you type in your own format (for any Export of an Export Stream), you must select a separator (by default, a space) to separate the date information and the time information. You are not limited to a single character; you can supply a text string.
  • Time Format – For any Export of an Export Stream, you can use the Time Format drop-down box to select one of the following time formats, or type in your own time format using the guidelines for custom date/time formats:
    • HH:mm:ss (the default)
    • HH:m:s

    Guidelines for Specifying Custom Date/Time Formats:

    When supplying your own Date/Time format, see Formatting Characters for Custom Date/Time Formats, or consult sites such as http://download.oracle.com/javase/1.3/docs/api/java/text/SimpleDateFormat.html to learn about the formatting characters used to create date and time patterns. If you type any text other than formatting characters in the Date or Time format boxes and you want that text to be preserved, you must place the text in single quotes; to produce a literal single quote, type two single quotes (''). For example, type the Time format hh 'o''clock' a in the Time Format box to produce a time that preserves o'clock. This format (where a is the formatting character for AM or PM) may yield a time such as 12 o'clock PM. An example of a Date and Time format with the word at as the Separator is yyyy.MM.dd G at h:mm a. In this example, you would type yyyy.MM.dd G in the Date Format box (where G designates an era), the word at in the Separator box (no single quotes are needed for text typed in the Separator box), and h:mm a in the Time Format box. This format may yield a date/time such as 1996.07.10 AD at 12:08 PM.

    Note: The specified format affects the load file format of the date-only export metadata fields (for example, DateCreated), time-only fields (for example, TimeCreated), and the fields that represent combined dates and times for load files other than EDRM XML. EDRM XML has its own format and does not observe your date/time format.

  • Unit of measure — You can select the desired unit of measure for any Export of an Export Stream. You can choose Bytes (the default), KB, MB, or GB.
  • DAT File Encoding — For DAT load files only, you can set one of the following options for any Export of an Export Stream:
    • ASCII/UTF-8 (the default) — Produces an ASCII-delimited file with UTF-8 encoded values. UTF-8 and ASCII are identical for ASCII values only; for any non-ASCII value (for example, in file names, metadata values, or content), multiple bytes encoded according to the UTF-8 encoding rules will be used to represent the character. In this case, the DAT file would contain multibyte characters. Note that if you use this encoding type and want to import the DAT file back into the system, your Load File Import Settings must use the encoding type MIXEDMODE, which accommodates the ASCII/UTF-8 mix.
    • Unicode — Produces the DAT file using UTF-16 LE encoded values. Note that if you use this encoding type and want to import the DAT file back into the system, your Load File Import Settings must use the encoding type UTF16LE.
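To show how a custom Date Format, Delimiter, and Time Format combine into one value, here is a minimal Python sketch of a SimpleDateFormat-style formatter. It is not the product's formatter (which presumably relies on Java's SimpleDateFormat); it supports only the pattern letters y, M, d, H, h, m, s, a, and G, plus single-quote escaping, with English month names and G always producing AD. The function names are hypothetical.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def _tokenize(pattern):
    """Split a pattern into ("field", letter, count) and ("text", literal) tokens."""
    tokens, i, n = [], 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "'":
            if i + 1 < n and pattern[i + 1] == "'":
                tokens.append(("text", "'"))  # '' outside quotes is a literal quote
                i += 2
                continue
            i += 1
            literal = []
            while i < n:
                if pattern[i] == "'":
                    if i + 1 < n and pattern[i + 1] == "'":
                        literal.append("'")  # '' inside quotes is also a literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                literal.append(pattern[i])
                i += 1
            tokens.append(("text", "".join(literal)))
        elif ch.isalpha():
            j = i
            while j < n and pattern[j] == ch:
                j += 1
            tokens.append(("field", ch, j - i))  # run of one letter, e.g. "yyyy"
            i = j
        else:
            tokens.append(("text", ch))  # punctuation and spaces pass through
            i += 1
    return tokens


def format_pattern(dt, pattern):
    """Format dt with a subset of SimpleDateFormat letters: y M d H h m s a G."""
    out = []
    for token in _tokenize(pattern):
        if token[0] == "text":
            out.append(token[1])
            continue
        _, letter, count = token
        if letter == "y":
            out.append(f"{dt.year % 100:02d}" if count == 2 else f"{dt.year:0{count}d}")
        elif letter == "M":
            name = MONTHS[dt.month - 1]
            if count >= 4:
                out.append(name)
            elif count == 3:
                out.append(name[:3])
            else:
                out.append(f"{dt.month:0{count}d}")
        elif letter in "dHms":
            value = {"d": dt.day, "H": dt.hour, "m": dt.minute, "s": dt.second}[letter]
            out.append(f"{value:0{count}d}")
        elif letter == "h":
            hour12 = dt.hour % 12 or 12  # 12-hour clock: 0 and 12 both display as 12
            out.append(f"{hour12:0{count}d}")
        elif letter == "a":
            out.append("AM" if dt.hour < 12 else "PM")
        elif letter == "G":
            out.append("AD")  # simplification: only Common Era dates
        else:
            raise ValueError(f"pattern letter not supported in this sketch: {letter}")
    return "".join(out)


def format_date_time(dt, date_format="MM/dd/yyyy", time_format="HH:mm:ss", delimiter=" "):
    """Combine the Date Format, Delimiter, and Time Format settings into one value."""
    return format_pattern(dt, date_format) + delimiter + format_pattern(dt, time_format)


print(format_date_time(datetime(1996, 7, 10, 12, 8), "yyyy.MM.dd G", "h:mm a", " at "))
# 1996.07.10 AD at 12:08 PM
```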

Production Settings

For a given Export Stream or Volume, you can use the following options:

  • Base path (DR by default) — Allows you to change the base path that you want reported in the load file fields NativeLink, TextLink, and/or PDFLink, which are populated when you include the production of native, text, and/or PDF versions. You can use the default (DR), specify your own base path, or omit the base path completely (for example, if you plan on importing a DAT file back into the system and do not want to have to trim the base path in the Load File Import Settings). The base path can include any of the standard characters permitted in paths on Microsoft Windows and Linux (for example, a comma). However, you cannot specify the " (quotation marks), < (less than), > (greater than), or | (pipe) characters. Note that your specified base path determines what appears in the NativeLink, TextLink, and/or PDFLink export fields in the load file (if you include native, text, and/or PDF versions), but does not affect what appears at the physical export location after export.
  • Include Volume Label and # in Path Fields — Includes or excludes the Volume label and Volume # in the appropriate export metadata fields (NativeLink, TextLink, and/or PDFLink) of the load file. Select the checkbox to include the Volume label and Volume #. This checkbox will be set automatically and cannot be changed if you select the One Consolidated Load File option.
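The Base path and Include Volume Label and # in Path Fields options both shape the link fields. The Python sketch below illustrates that interaction under loudly stated assumptions: the helper names are hypothetical, the backslash separator and the relative path NATIVES\00001.doc are invented for illustration (the actual folder layout below each Volume is not described here), and "VOL0001" stands in for the Volume label plus Volume #. Only the disallowed characters (" < > |) and the option to omit the base path come from the description above.

```python
FORBIDDEN_IN_BASE_PATH = set('"<>|')


def check_base_path(base):
    """Reject the characters the Base path option does not allow."""
    bad = sorted(FORBIDDEN_IN_BASE_PATH.intersection(base))
    if bad:
        raise ValueError(f"base path contains disallowed characters: {' '.join(bad)}")
    return base


def link_value(base, relative_path, volume_dir=None):
    """Compose a NativeLink/TextLink/PDFLink-style value (hypothetical layout).

    base: the Base path ("DR" by default; "" omits it).
    volume_dir: e.g. "VOL0001" when Include Volume Label and # in Path Fields is set.
    relative_path: the file's path below the Volume (hypothetical example values).
    """
    check_base_path(base)
    # Empty parts (no base path, no volume label) are simply left out.
    parts = [part for part in (base, volume_dir, relative_path) if part]
    return "\\".join(parts)


print(link_value("DR", "NATIVES\\00001.doc", "VOL0001"))  # DR\VOL0001\NATIVES\00001.doc
print(link_value("", "NATIVES\\00001.doc"))               # NATIVES\00001.doc
```

As the option description notes, the base path changes only what the load file reports; it does not move anything at the physical export location.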

Filter by Last Update

For the Export Stream or Volume for which you want to generate a new load file, you may want to filter entries in the load file by one or more last update dates (listed in timestamp format with a checkbox for each timestamp). Selecting one or more checkboxes enables you to generate a load file that provides an "overlay" of the associated last update dates, thereby limiting the load file content to entries last updated on the dates reported in the timestamps.

Note: By default, your generated load file includes information from all last update dates (that is, no entries are filtered).

The Filter by Last Update section provides a list of last update dates for the Export Stream or Volume in timestamp format:

  • For an Export Stream, the list of timestamps is a superset that includes the last update dates of all the Stream's Volumes.
  • For a given Volume, the list will be a subset of last update dates, and the list of timestamps will represent the appropriate last update dates for the load file entries in the Volume.

Sample timestamp list (Last Update Dates)

The following sample shows available timestamps listed for an Export Stream (export1) and its three Volumes.

Note: Making selections is useful when the list includes multiple entries.

List for Export Stream export1:

2013-08-15-20-10-59 (represents the initial Export)

2013-08-15-21-13-33 (represents the last Update of entries for VOL0001)

2013-08-15-21-19-11 (represents the Export of VOL0002, which changed Duplicate Custodian information for an entry in VOL0001)

2013-08-15-17-57-06 (represents the last Update of entries for VOL0003)

List for export1-VOL0001:

2013-08-15-20-10-59

2013-08-15-21-13-33

2013-08-15-21-19-11

List for export1-VOL0002:

2013-08-15-21-19-11

List for export1-VOL0003:

2013-08-15-17-57-06
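The relationship between these lists, and the effect of selecting timestamps, can be sketched in Python using the sample values above. This is an illustration, not the product's implementation: the entry dictionaries and record IDs (00001 through 00005) are hypothetical, and the sketch assumes a Volume's list is the set of distinct last update values of its entries, with the Stream list being the union across Volumes. Sets are used, so the sketch does not reproduce the display order shown in the sample.

```python
def volume_timestamps(entries):
    """Distinct last update timestamps for a Volume's load file entries."""
    return {entry["last_update"] for entry in entries}


def stream_timestamps(volumes):
    """Stream-level list: the union (superset) of the timestamps of all its Volumes."""
    result = set()
    for entries in volumes.values():
        result |= volume_timestamps(entries)
    return result


def filter_by_last_update(entries, selected):
    """Keep entries last updated at a selected timestamp; no selection keeps everything."""
    if not selected:
        return list(entries)
    return [entry for entry in entries if entry["last_update"] in selected]


# Hypothetical entries matching the sample timestamp lists for export1.
volumes = {
    "VOL0001": [
        {"id": "00001", "last_update": "2013-08-15-20-10-59"},
        {"id": "00002", "last_update": "2013-08-15-21-13-33"},
        {"id": "00003", "last_update": "2013-08-15-21-19-11"},
    ],
    "VOL0002": [{"id": "00004", "last_update": "2013-08-15-21-19-11"}],
    "VOL0003": [{"id": "00005", "last_update": "2013-08-15-17-57-06"}],
}

print(sorted(stream_timestamps(volumes)))
print([e["id"] for e in filter_by_last_update(volumes["VOL0001"], {"2013-08-15-21-19-11"})])  # ['00003']
```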

 

About Timestamp Generation

The timestamps use the format yyyy-MM-dd-HH-mm-ss in UTC (Coordinated Universal Time). The following will cause a timestamp to be generated:

  • Complete Volume Export (without Prepare first)
  • Staging (Prepare) of a Volume
  • Export of a Prepared Volume
  • Change in Duplicate Custodian information
  • Update Export to retry Production Errors (right-click on a Volume and select Update Export)
  • Update Export to produce additional formats

Note: The Create Near-Duplicate Metadata operation does not generate a timestamp.
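The timestamp format described above (yyyy-MM-dd-HH-mm-ss in UTC) can be produced and parsed with Python's standard datetime module, as in the sketch below. The helper names are hypothetical, and the fixed UTC-4 offset in the usage line is only an example. Because every field is zero-padded, sorting these strings also sorts them chronologically.

```python
from datetime import datetime, timedelta, timezone

STAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"  # strftime equivalent of yyyy-MM-dd-HH-mm-ss


def last_update_stamp(moment):
    """Render a timezone-aware datetime as a last update timestamp in UTC."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime(STAMP_FORMAT)


def parse_last_update_stamp(stamp):
    """Parse a timestamp such as 2013-08-15-20-10-59 into an aware UTC datetime."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


utc_minus_4 = timezone(timedelta(hours=-4))  # example fixed offset, for illustration
print(last_update_stamp(datetime(2013, 8, 15, 16, 10, 59, tzinfo=utc_minus_4)))
# 2013-08-15-20-10-59
```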

To illustrate how timestamps are listed for updates, consider the following scenario:

  • You have 5 exported files in a Volume.
  • You run Update Export for the Volume twice to add more formats (once to add text file versions, and a second time to add PDF versions).
  • In this case, the timestamp list will include a timestamp for the second update only, since that last update affected all 5 entries in the Volume.
Main Options to Submit or Cancel the Load File Generation

At the bottom of the dialog, click OK when you are ready to submit the information, or click Cancel to cancel the operation.