View eDiscovery Export Stream Settings

Exports > eDiscovery Exports > Selected Export Stream > Settings tab

Requires Exports - View Permissions

Users in a role with the appropriate permissions can select an eDiscovery Export Stream and the Settings tab to view information about the status of the Stream and the export criteria in effect for the Stream.

The eDiscovery Export Stream Settings always reflect the current settings for the Export Stream.

If you have permissions to set up a subsequent Export of an Export Stream, some Stream Settings are editable, and your changes adjust the Stream Settings displayed (such as the Output Format section's load file types).

Note: On-demand creation of Near-Duplicate Metadata (using the Create Near Duplicate Metadata right-click option for an Export Stream) does not change the Near-Duplicate setting (Group Near-Duplicates) or Threshold for the Export Stream. The next Export of the Stream observes whatever Near-Duplicate setting and Threshold are selected in the Export dialog. (The Group Near-Duplicates checkbox setting and Threshold value can be managed at each Export of an Export Stream.)

eDiscovery Export Fields and Options

When you are setting up an eDiscovery Export Stream for the first time, you have all options available. Some options are associated, so your choice of some options may further dictate selections. If you are setting up a subsequent Volume export of an Export Stream, most settings are available. Here is a quick summary of what you can change at each Export and what you can only specify for the initial Export:

Selectable only at initial eDiscovery Export and grayed out for subsequent Volume Export of the stream:

  • Documents to Export
  • All selections in the Family Options section
  • Duplicates Processing selection (No Duplicate Removal, Remove Duplicates from Export, Remove Duplicates from Export and Load File)
  • Page-Level Numbering
  • Pad Size under Production Settings

Selectable at each Volume Export:

  • Export template
  • Export Name (you can select an existing Export Stream name; typing in a new name creates a new Export Stream)
  • Export Location
  • Selection of Export to DB Database Table
  • Group Near-Duplicates and associated options (Threshold, Minimum Terms, and Process Attachments)
  • Output Options for Export formats
  • Include Text (which requires the Extracted Text option)
  • Separate Duplicates
  • Duplicate Overlays
  • Max Records
  • BegAttach starts with
  • Configurable Production Settings (Base Path, Volume label, Include Volume, ID Prefix, Starting ID)
  • Output File Options, Native Extracted Text (with the option to Exclude Header Information), and PDF conversion, with its Highlight Search Terms (Parsing Library V2 only) and Generate Remaining Images options.

If you are viewing eDiscovery Export Settings from a Stream or Volume Settings tab, note that you see a read-only version of the current settings, or the settings you have selected for a particular Volume.

Export Overview provides key information about Export Locations, Directory Structure, and Generated Files and how to verify your exported items and generated files (load files, settings files, production reports, and exception files). It also provides a table of export exceptions and reviews how family associations are maintained across volumes.

Export Name, Template, and Export Location

The following fields dictate the Export fields template, Export Stream name, and Export Location for any export operation. For a given Export Stream, you can change the template and/or Export Location at any export (that is, any Volume export of an Export Stream), or you can create a new Stream with this information:

  • Export Name (required) You must either assign a unique name to a new eDiscovery Export Stream or, for a new Export Volume, select from a drop-down list of available Export Stream names within the current Project. A newly provided Export Stream name must be unique within the current Project. If it is not unique, you will see an error message when you navigate away from the field. (For long Export names, ellipses appear when the entire name is too wide for the menu, and you can select the configured name and scroll left or right to see the entire name.) The Export Stream name is subject to validation upon creation. The name can include alphanumeric characters, spaces between characters in the name (leading and trailing spaces are ignored), and some supported characters (such as a hyphen, underscore, and apostrophe). During validation, the software will also allow characters from foreign languages (for example, Korean characters). However, the following characters are not supported for Export Stream names and will generate an error message indicating that your entry contains invalid characters:

! " # $ % & * + . / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”

Note: These character restrictions apply to most tree items, such as Imports, Exports, Tags, Folders, Saved Searches, Workflows, Comparisons, Samples, and Synthetic Documents. To support auto-discovery of Custodians based on staging, a Custodian name has fewer restrictions regarding invalid characters. Note that you cannot create multiple Exports that are just uppercase and lowercase versions of the Export name (for example, you cannot add an Export named export2 if there is already an Export named Export2).

  • Export Location (required) Select an available Export Location from the drop-down list (alphabetical order). These export mounts (export data areas) are managed as Organization Settings. Be sure to negotiate access to this location to retrieve files after export. You can always revisit the selection of an export location for each export operation.
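The naming rules for a new Export Stream can be summarized in a short sketch. This is an illustration of the documented rules only, not the product's validation code; the function name `validate_export_name` and its arguments are hypothetical.

```python
# Characters this topic lists as unsupported in Export Stream names.
INVALID_NAME_CHARS = set('!"#$%&*+./:;<=>?@[\\]^{|}~\u201c\u201d')


def validate_export_name(name, existing_names):
    """Return the trimmed name, or raise ValueError if it would be rejected.

    Hypothetical helper that mirrors the documented rules for a NEW name.
    """
    trimmed = name.strip()  # leading and trailing spaces are ignored
    if not trimmed:
        raise ValueError("Export Name is required")
    bad = sorted(set(trimmed) & INVALID_NAME_CHARS)
    if bad:
        raise ValueError("invalid characters: " + " ".join(bad))
    # Names that differ only by case count as duplicates (export2 vs Export2).
    if trimmed.casefold() in {n.casefold() for n in existing_names}:
        raise ValueError("name already exists in this Project")
    return trimmed
```

For example, `validate_export_name("  Review Set-1  ", ["Export2"])` returns `"Review Set-1"`, while `"export2"` or `"Set: A"` would raise an error under these rules.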

Documents to Export

For the initial Export of an Export Stream only, you must identify which documents to Export, either by using the checkbox to have the export consider all documents in the view, or by focusing the export with a query in which you typically include and/or exclude certain Tags. If you decide to select Tags for a Tag query or type a query, you can do the following:

Note: In general, you should not change the name of a Tag if you have existing Exports that rely on that Tag.

  • Click Tags to use the Tag menu to select individual Tags. Select the appropriate Tag checkboxes and click OK. This enables you to initially populate the search query box with the selected Tags in the Tag view syntax format (for example, tag_view::“Potentially Hot”), where multiple Tag views are separated by a Boolean OR and quotes appear around the Tag name. You can then edit the Tag selection in the Tags popup or the formed Tag query from the query box.
  • Type the query (typically with Tag criteria) directly into the search query box, using the appropriate Boolean operators between the clauses, such as Tag view clauses. You can then edit the query in the query box. (If you have a long query that exceeds the width of the box, you can select the query and scroll left or right to view the query contents.)

Note: The search query box is generally used to supply a Tag Search format with the supported Boolean operators between Tag names (see Tagging Overview and Search Syntax topics). However, you can also use it to specify content or metadata. When you specify Tags directly, you can omit the quotes if the Tag name is a single word. Place the Tag name in quotes if the name contains multiple words. If the Tag name includes a valid special character such as a hyphen or underscore, place the Tag name in quotes.
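As an illustration of the Tag view syntax described above, the following sketch assembles a query from a list of Tag names. The helper names are hypothetical, the Tag name Privileged is a made-up example, and the use of straight double quotes around each Tag name is an assumption.

```python
def tag_view_clause(tag_name):
    """Format one Tag as a tag_view clause, always quoting the name."""
    return 'tag_view::"{}"'.format(tag_name)


def build_tag_query(tag_names):
    """Join tag_view clauses with a Boolean OR between them."""
    return " OR ".join(tag_view_clause(name) for name in tag_names)


query = build_tag_query(["Potentially Hot", "Privileged"])
print(query)
# tag_view::"Potentially Hot" OR tag_view::"Privileged"
```

Quoting every Tag name is a conservative choice: quotes are optional for single-word names but needed for names with multiple words, a hyphen, or an underscore.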

Family Options

All family (email or document family) options apply to the initial Export of an Export Stream only. Select one of the following options to control the scope of files flagged for export:

  • (cleared by default) — Exports only those files, attachments, or emails that were explicitly tagged (for example, as Potentially Responsive). An explicitly tagged email attachment or document attachment is always included in the Export when this option is enabled, so you will see the Separate Email Attachments and Separate PDF OLE Attachments options enabled and grayed out (so you cannot change the settings). This option does not export the associated email of a tagged attachment unless that email has been explicitly tagged as well. Using this option means that family relationships will not be maintained in the appropriate load files upon Export (that is, metadata fields such as AttachmentID, AttachmentRange, and BegAttach will be blank).
  • Associated Family (set by default) — For each item being exported, exports the other contents of its associated document family. Remember to set Separate Email Attachments and Separate PDF OLE Attachments (with or without its nested option) if you want to Export email attachments and PDF OLE attachments (with or without other OLE attachments) as separate files.
  • Associated Threads (cleared by default) — For each email item being exported, exports the other contents of its associated email thread (for example, all associated messages and contained attachments).

Additional family options are as follows and apply to the initial Export of an Export Stream only:

  • Separate Email Attachments — Either includes or excludes email attachments as separate files. By default (with the default Associated Family Files mode), the export processes email attachments of a parent email as separate files within their parent email (for example, an EML). Note that this option must be enabled if you want to use the Search Terms section to specify queries for either the PDF option Highlight Search Terms or the option. You cannot, for example, enable Highlight Search Terms unless Separate Email Attachments is enabled.
  • Separate PDF OLE Attachments (cleared by default) — When selected on its own without its nested option, restricts the export of separate OLE Attachments to just PDF OLE attachments (files embedded within parent PDF files, called PDF Portfolios). You can control this and its nested option when Associated Family or Associated Threads is enabled. These modes expand the initial document selection to include any missing parent or child documents prior to applying this option. When the family option that exports only explicitly tagged items is enabled, this option is fixed as enabled. Make sure that the Separate Email Attachments option is selected to ensure expected results regarding the export of separate OLE attachments attached to emails.
    • Separate OLE Attachments (cleared by default) — Extends the export of separate OLE attachments to include all other files embedded within files through Object Linking and Embedding. For example, if you export a Word document that contained an Excel spreadsheet with Separate OLE Attachments enabled, you would also export a discrete copy of the spreadsheet. Existing streams created prior to 4.3.11.0 that have this option set will be upgraded to have the Separate PDF OLE Attachments option set as well.
  • Separate OLE Attachments (cleared by default) — Exports copies of files embedded within other files through Object Linking and Embedding. For example, if you export a Word document that contained an Excel spreadsheet with Separate OLE Attachments enabled, you would also export a discrete copy of the spreadsheet. You can control this option when Associated Family Files or Associated Threads is enabled. These modes expand the initial document selection to include any missing parent or child documents prior to applying this option. When the family option that exports only explicitly tagged items is enabled, this option is fixed as enabled.
  • Include Container Reference (cleared by default) — This option ensures that the load file contains a record for each exported document's container (for example, a PST), if containers have not been removed from the Project (for example, with exclusion searches). When this option is set, each container for an exported document will be assigned a Doc ID so that it can be referenced in the ParentContainer metadata field. Note that container references added by this option appear in the load file only (they are not produced). Even without this option, you may see the ParentContainer field populated (for example, if an exported document's container file is part of the export and already has a Doc ID).
  • Remove Attached Archives (cleared by default) — Removes any successfully parsed archives that are family members (that is, part of the MAG or DAG). This option applies to File Archive types, Disk Image types, Message Archives, and Compressed types that have been successfully parsed and have a docclass of Message_Attachment, Message_OLE_Attachment, or EDoc_OLE_Attachment. (Only archives that have been successfully parsed can be removed.) This option does not apply if the Family scope is

For any Export of an Export Stream, you can process the following information:

  • Group Near-Duplicates (cleared by default) — You can select this option to enable Near-Duplicate processing, where the scope of the processing is restricted to the documents meeting the Documents to Export criteria. Near-Duplicate processing includes the calculation of pivot documents and the identification of the compliant Near-Duplicate documents. If a subsequent Export of an Export Stream enables Near-Duplicate processing, any newly added or newly Tagged documents that meet the criteria are evaluated. If you select the Group Near-Duplicates option, you must supply values in the appropriate range for the threshold and the minimum terms:
    • Threshold (required, 80 by default) — Specifies the similarity threshold used for the Near-Duplicate processing. By default, this operation uses a similarity Threshold of 80. You can specify another threshold value in the range where 0 detects a nonzero amount of similarity or commonality. To require a higher degree of similarity or commonality, select a higher value, such as 80 or 90; to require a moderate degree of similarity or commonality, select a value such as 40 or 50. In general, the lower the threshold, the more results you will see, since you are requiring less similarity or commonality. Specifying a higher threshold value yields a smaller number of results.
    • Minimum Terms <value> (required, 25 by default) — Specifies the minimum number of terms for Near-Duplicate processing. By default, the minimum number of terms for Near-Duplicate processing is 25. You can use this default or specify a value that you want to use (negative numbers and decimals are not permitted). The permitted range is 0 to 9999.
    • Process Attachments (optional, cleared by default) — Specifies whether email or OLE attachments are processed as part of Near-Duplicate processing. By default, attachments are not processed independently for Near-Duplicate processing. This option applies when you have Separate Email Attachments and/or Separate OLE Attachments set for the Export.
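The defaults and the Minimum Terms range above can be captured in a small sketch. The class and field names are hypothetical. Only the Minimum Terms check is implemented, because it is the only range this topic states in full.

```python
from dataclasses import dataclass


@dataclass
class NearDuplicateSettings:
    """Hypothetical container for the Group Near-Duplicates options."""
    threshold: int = 80                # documented default
    minimum_terms: int = 25            # documented default
    process_attachments: bool = False  # cleared by default

    def validate(self):
        # Minimum Terms: whole numbers only, documented range 0 to 9999.
        value = self.minimum_terms
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError("Minimum Terms must be a whole number")
        if not 0 <= value <= 9999:
            raise ValueError("Minimum Terms must be between 0 and 9999")
        return self
```

Calling `NearDuplicateSettings().validate()` succeeds with the defaults, while a value such as -1 or 2.5 for `minimum_terms` is rejected.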

About Near-duplicate processing for Export

Export near-duplicate processing differs from the near-duplicate processing performed as part of searches for near-duplicate documents. Export near-duplicate processing is characterized by the following:

  • Handling of Numerics based on Export Settings — By default, the Project Export Settings will include Numeric values for Export near-duplicate processing. If you want, you can have Numeric values ignored for Export near-duplicate processing by changing the Export Settings, but you must make this change before you perform any Export near-duplicate processing or the change will have no effect.
  • Removal of Tokens — For documents processed prior to Release 4.3.11.0, the software always removes Tokens for Export near-duplicate processing. For regular document processing during Import prior to Release 4.3.11.0, the software uses Tokens to identify the type of content in a document, errors, and supported Patterns (regular expressions).
  • Inclusion of Stop Words — The software always includes Stop Words for Export near-duplicate processing and therefore does not use the Stop Words list. Stop Words are always included for Indexed operations such as Term Searches and ignored by default for Clustering and similarity comparisons (including a search for near duplicates of a document).
  • Export near-duplicate processing observes a minimum term length setting of 1 character and a maximum term length setting of 64 characters.
  • Use of Shingling for 2 Adjacent Terms — The Export near-duplicate processing employs shingling for each set of 2 adjacent terms (that is, it will evaluate and partially overlap adjacent terms, 2 terms at a time). This imposes an order to the terms (and means that the 2 terms are not considered in the reverse order). For example, with the 2-term shingling, an occurrence of The quick brown fox will have The quick in one set, quick brown in a second set, and brown fox in a third set.
  • Near Duplicates are calculated at the end of the Export Prepare stage (after duplicates have been handled for the Export, and the Export contents have been determined).
  • A near-duplicate Work Basket task shows the potentially long-running Near Dupe task, which you can cancel if necessary. If all of your Export data is not backed by an Analytic Index, this Work Basket task will display a failure.
  • If you export many documents, be aware that the Near-Duplicate processing can be a time-consuming process.
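To make the 2-term shingling concrete, the following sketch produces the shingle sets for the example phrase used above. Splitting on whitespace is an assumption made for illustration; this topic does not describe the tokenizer the software uses.

```python
def two_term_shingles(text):
    """Return the ordered list of 2-term shingles for a run of text.

    Each shingle overlaps the next by one term, and term order is kept,
    so ("quick", "The") is never produced for "The quick".
    """
    terms = text.split()  # assumption: simple whitespace tokenization
    return [(terms[i], terms[i + 1]) for i in range(len(terms) - 1)]


print(two_term_shingles("The quick brown fox"))
# [('The', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```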

Output Options

All of the Output options are available at any Export of an Export Stream:

  • DAT, LST, DII, CSV, EDRM XML – For any Export of an Export Stream, you can select any combination of these export formats, or you can clear all formats and export only files, which may be suitable in some cases.
    • DAT (default) — Exports tagged and associated files, or all files, from a view with support for the LexisNexis Concordance® format for eDiscovery. This export type provides a .DAT file. A Concordance DAT file provides all Digital Reef metadata fields subject to export. The Metadata List has more information.
    • LST — Exports tagged files or all files from a view to a Relativity LST file. The LST file includes a small number of pertinent metadata fields such as DocID and TextLink.
    • DII — Exports tagged files or all files from a view to a CT Summation Document Image Information (DII) file. This file contains the Digital Reef fields mapped to standard DII tokens (for example, bcc becomes @BCC). User-selected fields and fields that do not have standard mappings are identified by a custom token, as @C.
    • CSV — Exports tagged files or all files from a view with a comma separated value (CSV) file serving as a manifest of files. The CSV file includes all Digital Reef metadata fields subject to export. If you are going to generate search reports, remember to select this option.
    • EDRM XML — Exports tagged files, or all files, from a view with support for the Electronic Discovery Reference Model (EDRM). The Electronic Discovery Reference Model defines a standard for eDiscovery products and services so that data can be easily exchanged between organizations and eDiscovery products. The supported version is currently 1.1. With this export type, an XML file contains EDRM metadata as well as all Digital Reef metadata subject to export. The availability of the EDRM XML file enables an EDRM-compliant, third-party application to import the exported files for further analysis.

Note: The fields included in an exported load file, as well as their order and names, are dictated by the settings managed by the Export Fields. See Manage Export Fields for more information. Regardless of the load file type, an additional .CSV file called <volume>-TagReasonCodes.csv is generated at export to identify how the Tags included in the export were originally applied. This .csv file lists Search IDs for each exported Tag. Each Search ID in the list references a task to which an exported Tag was applied. In the appropriate load file, a given document may report one or more Tags in the TagID field and one or more Search IDs in the SearchID field (one for each Tag applied).

  • Include Text (cleared by default) — This option is available for any Export of an Export Stream and generally requires selection of the Extracted Text option. Remember to set this option if you want to include the text from the text files subject to Export in the load file. When selected on its own without its nested option, this option will ensure that full email header information is included in the load file. This option applies to DAT and EDRM load files, and to an export to an MS SQL database. For an MS SQL database, the text in the extracted_text field can be up to 2 GB per document. If a file exceeds 2 GB, then the extracted_text field will be empty and the table’s text_link field will provide a reference to the extracted text file on disk. (The text_link field is not populated unless the limit is exceeded.) For DAT or EDRM load files, the text can be up to 12 MB of data per document (up to approximately 12 million ASCII characters for a given document, and the number of characters will be less if you have non-ASCII characters or special characters such as a ®). If a document's text is greater than the limit, the text will not be included, but the load file will include a reference to the extracted text file. In general, if you want to use the Include Text option, enable the Extracted Text option (as a Production Settings option). The EDRM XML load file populates the EDRM XML element InlineContent when the Include Text option is selected; the DAT load file includes a field called inlinetext1 when the Include Full Text option is selected.
    • Exclude Email Headers (cleared by default, available when Include Full Text is selected) — This option affects load file content and enables you to exclude the email header information from the included text of the text files subject to Export and include just the email body in the load file. For example, this would allow you to just include the email body in MS Teams data.
  • Separate Duplicates (cleared by default) — This option is available for any Export (or Load File Generation) and places the entries for non-duplicates and duplicates into separate load files (for example, VOL0001.csv and VOL0001-duplicates.csv). If you use the default setting of Remove Duplicates from Export, having a separate duplicates load file segregates records with no TextLink and NativeLink information into a separate load file (for example, for manual loading to Relativity). This option does not apply if you remove duplicates from both the export and the load file.
  • Duplicate Overlays (cleared by default) — This option is available for any Export (or Load File Generation) and triggers the generation of overlay manifest files containing any updated records for previous volumes due to processing of the current volume (for example, due to new or changed DuplicateCustodian metadata). DAT is the default output format, but you might want to select additional formats, such as CSV. For example, if VOL1 contains an original document, and duplicates of that file appear in VOL3 and VOL5, you would see entries in the CSV files VOL0001.csv, VOL0003-overlay.csv, and VOL0005-overlay.csv (as long as you selected CSV as an output format). Note that if you also use the Separate Duplicates option for an export or load file generation, you can see an overlay file for the duplicates CSV (for example, VOL0003-duplicates-overlay.csv). Overlay manifests are also generated automatically as part of the Generate Reports option.
    • Include All Master Duplicates (enabled when you select Duplicate Overlays for Export or Load File Generation and cleared by default): When you opt to generate overlay manifest files, the default behavior is to limit the master duplicate records from prior volumes in the stream and include only those with updates to metadata values based on corresponding duplicates added to the most recent volume. Existing master duplicate records without corresponding duplicates added to the most recent volume are therefore excluded from the overlay manifest files by default. If you want the overlay files to include all master duplicate records instead, select the Include All Master Duplicates checkbox.
    • Export Fields for Overlays (enabled when you select Duplicate Overlays for Export or Load File Generation): When you opt to generate overlay manifest files, you can use the associated drop-down menu to select an available Export Fields template to use for the overlay manifest files. This enables you to configure and then use a custom Export Fields template specifically for the overlay file, perhaps one with a smaller subset of fields. If you do not specify a custom Export Fields template for the overlay file, then your designated Export Fields System Created Template for the Project is used.
  • BegAttach starts with – For any Export of an Export Stream, you can specify this option with one of the following for the starting attachment (or embedded document) value:
    • Parent Email (default) — By default, this option uses the parent email or document ID to represent the beginning attachment range (BegAttach value) for an entire family, which may include email attachments or embedded documents (members of a MAG or DAG). For example, if doc1.doc with an ID of 00001 has three embedded documents (embed1.doc with ID 00002, embed2.doc with ID 00003, and embed3.doc with ID 00004), the BegAttach value contains parent ID 00001 for all members of the family.
    • First Attachment — Selecting this option uses the first email attachment ID or first embedded document ID to represent the beginning attachment range (BegAttach value) for an entire family (members of a MAG or DAG). For example, if doc1.doc with an ID of 00001 has three embedded documents (embed1.doc with ID 00002, embed2.doc with ID 00003, and embed3.doc with ID 00004), the BegAttach value contains first embedded document 00002 for all members of the family.
  • Max Records Per File (disabled by default) — For any Export of an Export Stream, you can set this option and supply a non-zero value if you want to generate load file batches (chunks) based on a maximum number of records per batch. If you select the Max Records Per File option, you must supply a value of 0 or higher, up to 12 digits (the field cannot be empty). Specifying 0 just creates the standard Export, without load file batches. If you want to generate load file batches, specify a non-zero value, and keep in mind that the non-zero value you supply sets the upper boundary for each load file batch. The number of records for a given load file batch may be less than the Max Records Per File value to prevent a family from being split across two load file batches. Any family that is larger than the Max Records Per File value will be split across batches. This option eases the loading process in a downstream review tool, and helps reviewers get started with batches while others are loading. Note that this option applies to DAT, CSV, and EDRM XML load file types, but not DII or LST.
  • DAT Encoding — For DAT load files only, you can set one of the following options for any Export of an Export Stream:
    • ASCII/UTF-8 (default) — Produces an ASCII-delimited file with UTF-8 encoded values. UTF-8 and ASCII are identical for ASCII values only; for any non-ASCII value (for example, in file names, metadata values, or content), multiple bytes encoded according to the UTF-8 encoding rules will be used to represent the character. In this case, the DAT file would contain multibyte characters. Note that if you use this encoding type and want to import the DAT file back into the system, your Load File Import Settings must use the encoding type MIXEDMODE, which accommodates the ASCII/UTF-8 mix.
    • Unicode — Produces the DAT file using UTF-16 LE encoded values. Note that if you use this encoding type and want to import the DAT file back into the system, your Load File Import Settings must use the encoding type UTF16LE.
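The Max Records Per File behavior described above (families kept together where possible, oversized families split) can be illustrated with a short sketch. The greedy fill strategy and the function name `batch_families` are assumptions made for illustration; the software's actual batching logic is not described in this topic.

```python
def batch_families(families, max_records):
    """Group families (lists of record IDs) into load file batches.

    Documented rules reflected here:
    - 0 means no batching (one standard load file).
    - A family that fits within max_records is never split.
    - A family larger than max_records is split across batches.
    """
    if max_records == 0:
        return [[rec for fam in families for rec in fam]]
    batches, current = [], []
    for fam in families:
        if len(fam) > max_records:
            # Oversized family: close the open batch, then split the family.
            if current:
                batches.append(current)
                current = []
            for i in range(0, len(fam), max_records):
                batches.append(fam[i:i + max_records])
            continue
        if len(current) + len(fam) > max_records:
            batches.append(current)  # close early to keep the family whole
            current = []
        current.extend(fam)
    if current:
        batches.append(current)
    return batches


families = [["D1", "D2", "D3"], ["D4", "D5"], ["D6"],
            ["D7", "D8", "D9", "D10", "D11"]]
for batch in batch_families(families, 4):
    print(batch)
# ['D1', 'D2', 'D3']
# ['D4', 'D5', 'D6']
# ['D7', 'D8', 'D9', 'D10']
# ['D11']
```

The first batch holds only three records because adding the next two-record family would exceed the limit of four; the five-record family cannot fit in any batch, so it is split.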

Production Settings

You can manage most of the configurable Production Settings for any Export of an Export Stream. This section starts with the different elements that make up the full path, as reported in the appropriate export metadata fields (NativeLink, TextLink, and/or PDFLink) in a load file. You can configure many portions of the reported path:

<Base path> \<Volume Label><4-digit Volume #>\<Output directory (optional)>\<5-digit Folder #>\<ID Prefix><Starting ID> <separator> <Page ID>.<extension>

Note: <Page ID> does not apply if you select the Page-Level Numbering option.

Note: The Volume Label and Document ID Prefix for the Document ID are initially derived from the Export Settings template (or the current Project Export Settings, if the Export Settings have been changed for the Project). Changing these settings in the Export dialog for a particular Export Stream effectively overrides the values in the Project Export Settings (or template), if the values differ.
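The path layout shown above can be sketched as a small function that assembles the reported path from its elements. All argument names are hypothetical, and the "-" separator, the Page ID format, and the TEXT output directory name are assumptions used only for illustration.

```python
def build_link_path(base_path, volume_label, volume_number, folder_number,
                    id_prefix, doc_number, extension,
                    output_dir=None, include_volume=False,
                    page_id=None, separator="-"):
    """Assemble a reported path (for example, a TextLink value).

    The "-" separator and the page_id format are illustrative assumptions.
    """
    parts = []
    if base_path:                        # the base path may be omitted
        parts.append(base_path)
    if include_volume:                   # Volume label + 4-digit Volume #
        parts.append("{}{:04d}".format(volume_label, volume_number))
    if output_dir:                       # optional output directory
        parts.append(output_dir)
    parts.append("{:05d}".format(folder_number))   # 5-digit Folder #
    file_name = "{}{:010d}".format(id_prefix, doc_number)
    if page_id is not None:              # omitted with Page-Level Numbering
        file_name += separator + page_id
    parts.append(file_name + "." + extension)
    return "\\".join(parts)


print(build_link_path("DR", "VOL", 1, 1, "DOC", 1, "txt",
                      output_dir="TEXT", include_volume=True))
# DR\VOL0001\TEXT\00001\DOC0000000001.txt
```

Because the Base Path only affects what is reported in the load file, changing `base_path` here changes the returned string and nothing else.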

  • Base Path (DR by default) — Enables you to specify the base path that you want reported in the load file fields NativeLink, TextLink, and/or PDFLink, which are populated when you include the production of native, text, and/or PDF versions. You can use the default base path (DR), specify your own base path, or omit the base path completely (for example, if you plan on importing a DAT file back into the system and do not want to have to trim the base path in the Load File Import Settings). Note that what you put in the base path determines what appears in the NativeLink, TextLink, and/or PDFLink export fields in the load file (if you include native, text, and/or PDF versions). This base path does not affect what appears at the physical export location after export, just the reporting of the path in the appropriate load file fields. The Base Path can include alphanumeric characters, and some supported characters permitted in paths on Microsoft Windows and Linux (such as a hyphen, period, comma, or underscore). You can specify a Base Path with a maximum of 50 characters. If you provide a Base Path that exceeds the limit, an information popup informs you of the limit and that one or more characters were trimmed at the end. During validation, the software will also allow characters from foreign languages (for example, Korean characters). However, spaces as well as the following characters are not supported for the base path and will generate an error message indicating that your entry contains invalid characters:

! " ' # $ % & * + / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”

  • Volume Label (required Volume prefix, VOL by default unless changed in Export Settings) — Enables you to specify a starting production Volume Label. The Volume Label (prefix) can include alphanumeric characters, and some supported characters (such as a hyphen, period, or underscore). You can specify a Volume Label with a maximum of 50 characters. If you provide a Volume Label that exceeds the limit, an information popup informs you of the limit and that one or more characters were trimmed at the end. During validation, the software will also allow characters from foreign languages (for example, Korean characters). However, spaces as well as the following characters are not supported for the Volume Label and will generate an error message indicating that your entry contains invalid characters:

    ! " ' # $ % & * + / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”

  • Volume # — Displays the current volume number that will be appended to the Volume Label (for example, 0001). You cannot configure this value.
  • Include Volume — Enables you to include or exclude the Volume label and Volume # in the appropriate Export metadata fields (NativeLink, TextLink, and/or PDFLink) of the load file. By default, this checkbox is cleared. Select the checkbox to include the Volume label and Volume #.

Note: An optional output directory, shown as <Output directory (optional)>, may appear after the Volume # based on selected file options. See the section on Production Options for Including Native, Text and/or PDF Versions.

  • Folder # — Displays the current 5-digit folder number that will be part of the path. You cannot configure this value.
  • ID Prefix (required Document prefix, DOC by default) — Enables you to specify the prefix you want to use for a Document ID, or select an available prefix for the Export Stream (as shown in the drop-down box). The default ID Prefix for the first Volume Export in an Export Stream is DOC (or the prefix configured in the Project Export Settings). For subsequent Volume Exports in a given Export Stream, the prefix shown in the dialog is the last ID Prefix used (as reflected in the Stream Export Settings). Your prefix selection then determines the Starting ID, which will be set to the appropriate value based on the existing prefix selected, or reset to 1 (for example, to 0000000001) for a new prefix for the Stream. Note that the Volume Settings will reflect your ID Prefix and Starting ID selections, with the Starting ID reflecting the next-available ID. In addition, the Export Stream Documents tab will reflect the ID Prefix and Starting ID used by each Volume in the Stream. ID prefixes other than the default prefix are specific to a given Export Stream. The ID (Document) Prefix can include alphanumeric characters, and some supported characters (such as a hyphen, period, or underscore). You can specify an ID Prefix with a maximum of 50 characters. If you provide an ID Prefix that exceeds the limit, an information popup informs you of the limit and that one or more characters were trimmed at the end. During validation, the software will also allow characters from foreign languages (for example, Korean characters). However, spaces as well as the following characters are not supported for the ID Prefix and will generate an error message indicating that your entry contains invalid characters:

! " ' # $ % & * + / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”

  • Starting ID — Enables you to specify a starting production Document ID for a given export. The default starting ID for the default prefix in a new Export Stream, or a new prefix in general, is a 10-digit starting ID, 0000000001. A separator follows the Doc ID, followed by the Page ID (if document-level numbering is used) and then the document extension. You can specify a value greater than (or equal to) the shown starting ID, but not a smaller value than that shown. As you add Export Volumes within a Stream for a given prefix, the starting ID value (and value shown on the Settings tab) will reflect the next-available ID.
  • <separator> between Starting ID and Page ID (applies only for document-level numbering, not page-level numbering) — Enables you to use the default separator of an underscore (_), or select a period or a hyphen as the separator between the Starting Doc ID and the Page ID.
  • Page ID (applies only for document-level numbering, not page-level numbering) — Enables you to specify a starting production Page ID for a given export that includes PDFs. The default is a 4-digit Page ID, 0001. The document extension is displayed at the end of the ID (for example, .pdf).
  • Page-Level Numbering (disabled by default) – You cannot change this option after the initial export of an Export Stream; therefore, its enabled or disabled state applies to all Volumes in a Stream. When enabled, this option uses incremental numbering to assign each page of a document (PDF included in the export) its own Doc ID instead of using document-level numbering, in which the same Doc ID supports suffixes for the different Page IDs. If you select this option, the Page ID part of the path no longer appears or applies. In addition to the Doc ID, page-level numbering affects a number of Export metadata fields, as follows:
    • Fields that report starting values — AttachmentID, BegAttach, BegDoc, NativeLink, NearDupePivotDocID, OLEChildID, OLEParentID, ParentContainer, ParentID, PDFLink, TextLink, ThreadGroupID, ThreadGroupSort, ThreadID, ThreadIDOrphanRef, and ThreadIDParentRef.
    • Fields that report ending values — EndAttach and EndDoc.
    • Fields that report document ranges — AttachmentRange and DocumentRange.
    • PageCount field — If included in the list of Export Fields for your selected Export Fields template, this field reports the number of pages produced for each document subject to export using page-level numbering.

    Note: An export that uses page-level numbering will fail if the export encounters missing images (for example, because of a conversion failure). In this case, you will see the failure in the Work Basket task, which will show the error message "Documents that could not be numbered at a page-level were encountered in the export. Please download the errors file for this task for more information." You can then click Download for the Work Basket task to download the errors file, WARNING_DETAILS_REPORT.csv. This file provides a list of document handles and the associated error for each (for example, CONNECTOR_FAILURE, CONNECTOR_READ_ERROR, CONVERSION_FAILURE, NATIVEFILE_NOT_FOUND, or UNKNOWN). An export using page-level numbering will also fail if the numbering of a staged volume no longer reflects the page count of the volume once it is actually being exported. In this case, the failed Work Basket task will show the error message "The image page count of this volume has been altered since staging. Please recreate the volume in order to update the page-level numbering."

  • Doc ID Pad Size (10 by default) — Enables you to specify a pad size for the Document ID. The default is 10, and you can specify a value 1-9. You cannot change the value of this option after the initial export of an Export Stream.
  • Page ID Pad Size (4 by default) – Available only for document-level numbering, not page-level numbering. Enables you to specify a pad size for the Page ID. The default is 4, and you can specify a value 1-4. You cannot change the value of this option after the initial export of an Export Stream.
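
As a rough illustration of how the numbering settings above combine, the following Python sketch assembles an ID from the ID Prefix, a padded Starting ID, the separator, and a padded Page ID. The function names, their parameters, and the assumption that the prefix is joined directly to the padded number are hypothetical; this is not the software's actual naming logic.

```python
def format_doc_id(prefix, number, doc_pad=10, page=None, page_pad=4,
                  separator="_", extension=""):
    """Build an ID string such as DOC0000000001_0001.pdf (assumed layout)."""
    if separator not in ("_", ".", "-"):
        raise ValueError("separator must be an underscore, period, or hyphen")
    # Zero-pad the document number to the Doc ID Pad Size.
    doc_part = f"{prefix}{number:0{doc_pad}d}"
    if page is None:
        # Page-level numbering: the Page ID part does not apply.
        return doc_part + extension
    # Document-level numbering: append the separator and the padded Page ID.
    return f"{doc_part}{separator}{page:0{page_pad}d}{extension}"


def page_level_ids(prefix, start, page_count, doc_pad=10):
    """Assumed page-level numbering: one consecutive Doc ID per page."""
    return [format_doc_id(prefix, start + i, doc_pad) for i in range(page_count)]


if __name__ == "__main__":
    print(format_doc_id("DOC", 1, page=1, extension=".pdf"))
    print(page_level_ids("DOC", 5, 3))
```

Under these assumptions, a three-page document exported with page-level numbering consumes three consecutive Doc IDs, so the next document starts three IDs later.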

Native, Extracted Text, and PDF Production Options

For any Export of an Export Stream, you can specify these additional Production options that enable you to export different versions of the files marked for export (Native, Text, PDF versions):

  • Native (set by default) – Whether to include Native versions of the files.
    • Output Directory (optional) – Enables you to specify a target directory for the native files that are exported. If you do not specify a directory, the files are exported to the volume directory. You can specify a maximum of 50 characters for an Output Directory name. Any characters exceeding the limit are trimmed at the end.

    Note: The Extension Conversion setting in the Project at the time of Export determines whether extension conversion occurs for that Export. By default, the Extension Conversion setting is On, which means that an Export will produce native files with the origdocext file extension, which is based on the intended file type instead of the file extension seen on disk. If you change the Extension Conversion setting to Off in your Project Export settings, the next Export produces the native files based on the docext field, which is the extension seen on disk. Your Extension Conversion setting determines how the NativeLink field is populated in the manifest at Export. It is important to note that the software will not use a document extension during native file production if that extension contains any of the following characters: \ / : * ? " < > | or ASCII characters 0 through 31. In this case, the produced native file will not have an extension.

     

  • Extracted Text (cleared by default) – Determine whether you want to include extracted text versions. By default, text files of the equivalent Native files are not exported, and text files produced by OCR processing are not exported. Select this option if you want to export text files of the equivalent Native files, as well as the text files produced by OCR processing. Note that if you use the Include Text option to include text in a DAT or EDRM load file, or in an MS SQL Database, you must also enable the Extracted Text option.
    • Output Directory (optional) – Enables you to specify a target directory for the extracted text files that are exported. If you do not specify a directory, the files are exported to the volume directory. You can specify a maximum of 50 characters for an Output Directory name. Any characters exceeding the limit are trimmed at the end.
    • Exclude Email Headers (cleared by default) – By default, email header information (metadata) is included in the produced text versions. This includes metadata from emails, calendar items, tasks, and journal entries. Set this option if you want the produced text versions to exclude metadata from the email header and include only the email body.
  • PDF (cleared by default) – Enables you to include PDF versions of native files in the Export. If you decide to perform this PDF conversion, Export will see if there are available images for the non-PDF native files (that is, images that were either imported through a Load File Import or part of an External Image Import). If not, Export will convert the non-PDF native files to PDF format. Copies of existing native PDF files are exported if you specify a separate directory for the PDFs. The native PDFs are used if you export native and PDF versions to the same directory or if the export is set up for PDF versions only. If both native and PDF formats are selected and go to the same directory, then a PDF with the naming convention <DocID>.orig.pdf will also appear. By default, selecting PDF Conversion alone will not convert image files.
    • Output Directory (optional) – Enables you to specify a target directory for the PDF files that are exported. If you do not specify a directory, the files are exported to the volume directory that contains the other exported files. You can specify a maximum of 50 characters for an Output Directory name. Any characters exceeding the limit are trimmed at the end.
    • Highlight Search Terms (V2 Parsing Library Projects only) – On a per-volume basis, when PDF is selected to export PDF versions, this option enables you to highlight search terms that match queries you supply in the Search Terms section. The Search Terms section appears when you select Highlight Search Terms and/or Generate Search Reports. To use Highlight Search Terms, you must have the Separate Email Attachments option enabled for the Export. (If you clear the Separate Email Attachments option, you will not be able to use Highlight Search Terms.) In the generated PDFs when Highlight Search Terms is enabled, all matching terms are shown in a single color (yellow). Metadata is not highlighted. You can use this option with or without the Generate Search Reports option, which generates a number of reports based on the supplied queries. This option is cleared by default. Note that your supplied search terms are subject to highlighting for each production of a volume (each time a PDF is generated for a document eligible for Export). This gives you the option to modify your search terms and have them highlighted in the PDFs generated for a volume. Note that if highlighting fails for certain documents for some reason, the export will still provide the PDF versions, just without highlighting, and the Export Exceptions CSV will report a PDF_HIGHLIGHT_WARNING.
    • Generate Remaining Images – On a per-volume basis, enables you to export image files in PDF format. By default, the export process (conversion) will not convert image files. The conversion process supports the following image formats:
      • Portable Network Graphics Format (png)
      • Tagged Image File Format (tiff)
      • Windows Bitmap (bmp)
      • Compuserve GIF (gif)
      • Progressive JPEG (jpg, jpeg)
      • JPEG 2000
      • JPEG 2000 jpf Extension
      • JPEG 2000 mj2 Extension
      • JPEG File Interchange
      • Paintbrush

    Note: The following characters are not supported for output directory names and will generate an error message indicating that the name contains invalid characters:
    ! " # $ % & * + . / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”
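
The naming rules described for the Base Path, Volume Label, ID Prefix, and Output Directory fields can be approximated with a short check like the following Python sketch. The function name `check_name`, the constant names, and the order of operations (trim to 50 characters first, then look for invalid characters) are assumptions made for illustration; the character sets are copied from the lists in this topic.

```python
# Invalid characters for Base Path, Volume Label, and ID Prefix (from this topic).
PREFIX_INVALID = set("!\"'#$%&*+/:;<=>?@[\\]^{|}~\u201c\u201d")
# Invalid characters for Output Directory names (includes the period).
OUTPUT_DIR_INVALID = set("!\"#$%&*+./:;<=>?@[\\]^{|}~\u201c\u201d")
MAX_LENGTH = 50


def check_name(value, invalid_chars, reject_spaces=False):
    """Return (trimmed_value, sorted list of invalid characters found)."""
    # Characters beyond the limit are trimmed at the end.
    trimmed = value[:MAX_LENGTH]
    bad = {ch for ch in trimmed if ch in invalid_chars}
    # Spaces are documented as unsupported for Base Path, Volume Label,
    # and ID Prefix; the topic does not state this for Output Directory.
    if reject_spaces and " " in trimmed:
        bad.add(" ")
    return trimmed, sorted(bad)


if __name__ == "__main__":
    print(check_name("VOL 01", PREFIX_INVALID, reject_spaces=True))
    print(check_name("natives.v1", OUTPUT_DIR_INVALID))
```

An empty list of invalid characters means the entry would pass this simplified check. Characters from other languages, such as Korean, pass because they are not in either set.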

Generate Search Reports (per-volume option, cleared by default) – Select this option when you want to generate reports (as CSVs, as part of the Export) based on queries that you provide (similar to Bulk Search). Note the following about generating these reports:

  • Make sure that the Separate Email Attachments option is enabled for the Export.
  • Enter queries in the Search Terms section to set up a streamlined Bulk Search that will be used to generate the reports. Selecting one or both of the Generate Search Reports and Highlight Search Terms options will display the Search Terms section. Make sure that the Separate Email Attachments option is enabled if you want to use one or both of the Generate Search Reports and Highlight Search Terms options. If Separate Email Attachments is subsequently disabled, then the Search Terms section will be closed, Generate Search Reports will be cleared, and Highlight Search Terms will be disabled. In this case, although you can still reselect the Generate Search Reports option and view your queries in the Search Terms section again, remember to reselect Separate Email Attachments to ensure proper behavior of the option and to make Highlight Search Terms selectable again. In general, if you clear the checkbox for Generate Search Reports and/or Highlight Search Terms after you have entered queries in the Search Terms section, your queries will be retained when you select either of the options again. Although you can use the Highlight Search Terms and Generate Search Reports options independently, enabling both options enables you to generate the search reports and see matching terms highlighted in a color in the exported PDF versions.
  • Review Include Metadata — This checkbox option applies when you select Generate Search Reports and is enabled by default to expand the search of each keyword in a query to include a set of metadata fields as well as content. You can select the Search Fields you want to have searched automatically. See Using the Include Metadata Option for a list of the default fields searched. It does not apply to Highlight Search Terms, since metadata is not highlighted in the exported PDF versions.

Search Terms

When you select the Highlight Search Terms (V2 Parsing Library Projects only) and/or Generate Search Reports options, this section appears so that you can specify search terms and/or more complex queries. You can specify search terms on a per-volume basis. You can include not only search terms and phrases, but also queries using all supported syntax, such as proximity searches and metadata field searches.

Note: For any use of the Search Terms section, keep the Separate Email Attachments option enabled for the Export (if it is disabled, use of the Search Terms section will cause the Export to fail).

  • For this streamlined version of Bulk Search for Export, supply search term queries in one of two ways:
    • Enter queries in the queries box — Enter the queries (one per line), or paste a series of search queries copied from a file. These may include clauses consisting of simple terms, phrases, field searches, or other forms of supported syntax. If you have Custodians in your Project, you can use these searches to identify one or more Custodians to report on for Generate Search Reports.

Note: The search term queries you run as part of Export are evaluated on a per-volume basis (that is, the queries are not maintained volume to volume). When you are viewing the Export Settings for a Volume, the first five queries are shown in the Queries box by default. You can use the scroll bar to navigate a longer list of queries.

  • After Export, your Export Data Area contains the generated reports (in CSV files). See the topic for details about these reports. For example, the <volume>_summary_count_size.csv file provides a summary of the document count/size and family count/size information for the files subject to the Export criteria. This file now reports the calculated deduplication counts by family and by the appropriate deduplication mode (Global, also known as Horizontal, or Custodial, also known as Vertical by Custodian).
  • The searchterms metadata field, a special Export-only field, applies to the Generate Search Reports option. As long as the selected Export Fields template contains the Export field searchterms, you can check the searchterms column in the generated load file to view a semicolon-separated list of the submitted search terms/queries matching a given document. For example, if a document matched the submitted terms demo and newsletter and the phrase of the, the searchterms field for that document would contain demo;newsletter;(of the).
  • The Export Data Area will also contain an overlay file in the appropriate format when there is updated data for a given volume (for example, if DuplicateCustodian field information changes for multiple Exports of an Export Stream).
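
If you post-process the generated load file, a searchterms value such as the one in the example above can be split back into individual terms. The following Python sketch is a minimal illustration only; the function name `parse_searchterms` is hypothetical, and the sketch assumes that terms never contain a semicolon and that a surrounding pair of parentheses marks a phrase.

```python
def parse_searchterms(field_value):
    """Split a searchterms field value into individual terms or phrases."""
    terms = []
    for item in field_value.split(";"):
        item = item.strip()
        if not item:
            # Skip empty entries, for example from an empty field.
            continue
        if item.startswith("(") and item.endswith(")"):
            # Assumed convention: phrases are wrapped in parentheses.
            item = item[1:-1]
        terms.append(item)
    return terms


if __name__ == "__main__":
    print(parse_searchterms("demo;newsletter;(of the)"))
```

More complex queries (for example, ones that contain their own parentheses or semicolons) would need a more careful parser than this.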

Actions

Choose one of the following actions when they are available (based on the state of the Export and whether the Export passes field validation):

Note: The buttons for these actions initially display an error status to indicate that you must supply an appropriate entry for any required fields before you can proceed. Hovering over a button tells you that errors have been found and that you can click the button to highlight all of the errors. When you address all errors correctly, the status changes to a success status. Note that for a New Export Volume, the Save Empty Export button appears disabled and does not display a status.

Save Empty Export (available for a new Export Stream only) – Click this button for a new Export Stream to create an empty Export with the settings you have chosen for configuration. You can use the empty Export if you want to deduplicate a given search against the empty Export (for example, to get the HTML size information based on the selected email format), without actually having the first Volume of the Export in place. Once the empty Export is in place, you can view the settings you selected on the Export tab, but the Documents tab will be empty. This button will be grayed out and unavailable when you are setting up a new Volume in an existing Export Stream.

– Click this button to stage the Export, which performs the preliminary calculations needed for the export. If errors are found, error popups appear in red. A Work Basket task is generated for the Staging in the format Staging <streamName> at <exportdatarea>:<projectName>-<streamName> (for example, Staging export1 at exportda1:test424-export1).

Export – Click this button to start the Export process. See Export Overview for more information about verifying the exported items and generated files (load files, settings files, production reports, and exceptions). That topic also reviews how family associations are maintained across volumes. Note the following:

  • For a new Export Stream, clicking Export creates the named Export Stream and the first Volume under Exports, and persists the Volume to disk.
  • For an existing Export Stream, clicking Export creates another Volume under the Export Stream, then stages and exports any Volume for this Export Stream that has not already been exported.
  • A Work Basket task is generated for an Export (not previously Staged) in the format Exporting <streamName> at <exportdatarea>:<projectName>-<streamName> (for example, Exporting export1 at exportda1:test424-export1). If the Export has been previously Staged, then the Work Basket task does not include the Export Data Area.
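
If you track Work Basket tasks outside the product (for example, in a log or spreadsheet), the documented task name formats for Staging and for an Export that was not previously Staged can be split into their parts. The following Python sketch is illustrative only; the function name `parse_task_name` and the regular expression are assumptions based on the two formats shown in this topic.

```python
import re

# <action> <streamName> at <exportdatarea>:<projectName>-<streamName>
TASK_PATTERN = re.compile(
    r"(?P<action>Staging|Exporting) (?P<stream>.+) at "
    r"(?P<area>[^:]+):(?P<project>.+)-(?P=stream)"
)


def parse_task_name(name):
    """Return the parts of a task name as a dict, or None if it does not match."""
    match = TASK_PATTERN.fullmatch(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_task_name("Staging export1 at exportda1:test424-export1"))
```

Task names for previously Staged Exports omit the Export Data Area, so they do not match this pattern and the function returns None for them.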

Cancel – Click this button to cancel the Export operation.

Note: The Export process will not process files for which there is no discernible content, such as image files, no-content files, or stop words-only files (if stop words are ignored in an Analytic Index, for similarity and Cluster operations).

Virus Detection of Exported Native Files

Upon export of native files, virus detection software installed on the system will check the exported native files for viruses. This virus detection performed at export occurs regardless of whether you have the Detect Viruses Index Setting enabled, as long as the virus detection software is installed. Any native files found to have viruses will be quarantined automatically. Once the export completes, a count of the files that were quarantined will be reported in the export volume report (and in the production report at the Export Area). Additional virus detection files will also be available at the Export Area for the volume. See the section about Virus Detection Files in Export Overview for more information.

As you monitor the progress of an export task in the Work Basket, note that the export task will not complete until the virus detection process is complete. Therefore, any export that includes native files will take longer to complete as of Release 5.4.0.0 because the export includes automatic virus detection.

Note: This feature requires installation of the virus detection software on the system.