Discover Fields for a Load File

Home > selected Project > Settings > Load File Import > Discover Fields

Requires Project - Export Fields - View, Add/Edit Permissions

The Discover Fields page lets you discover the fields present in a Load File. This is a necessary step in creating a Load File template, which contains mappings between these fields and Digital Reef metadata fields. You can then select the template when using the Imports > New Data Set from Load File option to import other Load Files of the same type (DAT, CSV, or EDRM XML). To discover the fields in a Load File, do the following:

  1. Select a Connector from the Connectors table.

  2. In the Folder panel, select a Connector folder and subfolders as needed until one or more Load Files are listed in the File Name panel.

  3. Select a Load File. The full path is shown in the Path box.

  4. If the file you selected is a CSV or DAT file, the Settings panel displays, specifying the default encoding and delimiters for the type of file selected (see below). The Preview panel displays the fields discovered in the Load File using the current settings. Under Columns, the fields are listed in the order in which they appear in the Load File; Value shows the first row value (if any) for each field. At any time, you can change the encoding and/or delimiters in the Settings panel and click Update Preview to update the Preview display.

    If you selected an XML Load File, the Settings panel does not display; the file is assumed to be an EDRM XML file, and the Preview panel displays the fields discovered using the appropriate settings.

  5. Click OK to return to the Load File Import page, with the discovered fields loaded into the Load File Fields panel, or Cancel to cancel the operation.

Settings

The Settings panel lets you specify the encoding type of, and the delimiters used in, the selected DAT or CSV Load File.

Encoding Type

Select the appropriate encoding type based on the encoding used to produce the Load File:

  • ISO8859 — Use for a file that has been encoded with only ISO8859-1 values.
  • UTF8 (the default) — Use for a file that has been encoded with only UTF-8 encoded values (pure UTF-8). UTF-8 and ASCII are identical for ASCII characters only; any non-ASCII character (for example, in file names, metadata values, or content) is represented by multiple bytes according to the UTF-8 encoding rules. In this case, the DAT file contains multibyte characters.
  • UTF16LE — Use for a file that has been encoded with only UTF-16 LE encoded values (pure UTF-16LE).
  • UTF16BE — Use for a file that has been encoded with only UTF-16 BE encoded values (pure UTF-16BE).
  • MIXED MODE — Use for a file whose delimiters are encoded as ASCII characters but whose content is UTF-8 encoded. This applies to DAT files generated from Digital Reef exports using the ASCII/UTF-8 option.
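
To see why the encoding type matters, the following minimal sketch encodes one sample value under each single-encoding option and prints the resulting bytes. The mapping from the option labels to Python codec names is an assumption made for illustration; it is not taken from Digital Reef.

```python
# Sample value with one non-ASCII character (e with acute accent, U+00E9).
sample = "Caf\u00e9"

# Assumed mapping from the Settings panel labels to Python codec names.
encodings = {
    "ISO8859": "iso-8859-1",
    "UTF8": "utf-8",
    "UTF16LE": "utf-16-le",
    "UTF16BE": "utf-16-be",
}

# Encode the same value under each option and show the bytes on disk.
encoded = {label: sample.encode(codec) for label, codec in encodings.items()}
for label, data in encoded.items():
    print(f"{label:8} {len(data)} bytes  {data.hex(' ')}")

# Reading UTF-8 bytes as ISO8859-1 does not raise an error,
# but the non-ASCII character comes back as two wrong characters.
garbled = encoded["UTF8"].decode("iso-8859-1")
print(garbled)
```

The ASCII letters occupy one byte each in both ISO8859-1 and UTF-8, while the accented character takes one byte in ISO8859-1 and two in UTF-8. Choosing the wrong encoding type therefore changes how field values (and possibly delimiters) are read.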

Delimiters

Accept the defaults for DAT or CSV (if you selected one of these file types) or select the appropriate ASCII character (based on ASCII value or represented character) from each drop-down, based on the contents of your Load File. If you selected an XML file, it is automatically treated as an EDRM XML file and previewed accordingly; the Settings panel does not display.

  • Column — Specifies the character used as a field separator to separate the columns in the Load File. If you do not supply the appropriate field delimiter for the Load File, an import of the Load File will fail because the delimiter information is required to parse data from the Load File. DAT default: ASCII-020, a nonprintable character. CSV default: , (comma).
  • Quote — Specifies the quote character used to mark the beginning and end of each field in the Load File. DAT default: þ, which is ASCII-254. CSV default: " (quote).
  • Multi Value — Specifies a character used to separate multiple values within a field (for example, multiple email addresses in a to or from field). DAT default: ; (semicolon), which is ASCII-059. CSV default: ; (semicolon).
  • Nested Value — Specifies a character used to handle nested values (such as tagging levels). Not all sources have a default for this separator. DAT default: \ (backslash), which is ASCII-092. CSV default: \ (backslash).
  • New line — Specifies the character used to mark the end of a line within a field. This helps to format long text in a field. DAT default: ®, which is ASCII-174. CSV default: ® (ASCII-174).
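
The role of each delimiter can be sketched as follows, using the DAT defaults listed above. This is an illustrative sketch only, not the product's parser: the function names and sample rows are hypothetical, and it assumes that the column delimiter never appears inside a quoted field.

```python
# DAT defaults from the list above.
COLUMN = "\x14"        # ASCII-020, nonprintable field separator
QUOTE = "\u00fe"       # thorn character, ASCII-254
MULTI_VALUE = ";"      # ASCII-059
NESTED_VALUE = "\\"    # ASCII-092
NEW_LINE = "\u00ae"    # registered sign, ASCII-174

def split_record(line):
    """Split one Load File line into field values and strip the quote characters."""
    fields = line.rstrip("\r\n").split(COLUMN)
    return [
        f[1:-1] if len(f) >= 2 and f[0] == QUOTE and f[-1] == QUOTE else f
        for f in fields
    ]

def discover_fields(lines):
    """Return (column name, first-row value) pairs, similar to the Preview panel."""
    header = split_record(lines[0])
    first = split_record(lines[1]) if len(lines) > 1 else []
    first += [""] * (len(header) - len(first))  # pad missing first-row values
    return list(zip(header, first))

def expand_value(value):
    """Expand in-field new lines, then split multi-value and nested-value markers."""
    value = value.replace(NEW_LINE, "\n")
    return [item.split(NESTED_VALUE) for item in value.split(MULTI_VALUE)]

# Hypothetical two-line DAT file: a header row and one data row.
names = ("DocID", "To", "Tags")
values = ("DOC-001", "a@example.com;b@example.com", "Review\\Privileged")
sample = [
    COLUMN.join(f"{QUOTE}{n}{QUOTE}" for n in names),
    COLUMN.join(f"{QUOTE}{v}{QUOTE}" for v in values),
]

for column, value in discover_fields(sample):
    print(column, "=", expand_value(value))
```

If COLUMN or QUOTE were set to characters that the file does not use, split_record would return the whole line as a single field, which illustrates why mismatched delimiter settings prevent the fields from being parsed.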

Note: If the encoding and delimiter selections used to discover fields for a template do not match the Load File you select when using the template, the import of that Load File fails.

Google Drive Metadata Fields

Because metadata handling within Google Vault is not fully reliable (a known issue), files from Google Vault are accompanied by CSV files (Gmail) and XML files (Google Drive) containing the correct metadata information. The following metadata fields are contained in the Google Drive XML files and are therefore discoverable in a Google Vault Load File. Using the Discover Fields page, you can map them to metadata fields or (at the Project level) Custom Fields:

  • #Author – Email address of the file's owner. For a shared drive file, this is the shared drive name.

  • Collaborators – Accounts and groups that have direct permission to edit the file or add comments; also includes users with indirect access to the file if this option is chosen during export.

  • Viewers – Accounts and groups that have direct permission to view the file; also includes users with indirect access to the file if this option is chosen during export.

  • Others – Accounts from the query that have indirect access to the file if access level information was excluded during export. May also include users for whom Google Vault couldn't determine permission levels at the time of export.

  • #DateCreated – Date the file was created in Drive; for non-Google files, date the file was uploaded to Drive.

  • #DateModified – Date the file was last modified.

  • #Title – Filename as assigned by the user. Because some operating systems can't expand zip files with extremely long filenames, Vault truncates the filename at 128 characters during export, but the value shown by #Title isn't truncated.

  • DocumentType – File type, with possible values of DOCUMENT, SPREADSHEET, PRESENTATION, FORM, and DRAWING.

  • SharedDriveID – Identifier of the shared drive that contains the file (if applicable).

  • SourceHash – Unique hash value for each version of a file; can be used to deduplicate file exports and to verify that the exported file is an exact copy of the source file. Supported by Google Docs, Sheets, and Slides files only.
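
As an illustration of how SourceHash can support deduplication, the sketch below keeps the first record for each hash value. The record layout (a list of dictionaries keyed by the field names above) and the hash strings are hypothetical placeholders, not real Google Vault output. Records without a SourceHash are kept, since the field is supported for Google Docs, Sheets, and Slides files only.

```python
def dedupe_by_source_hash(records):
    """Keep the first record per SourceHash; keep every record that has no hash."""
    seen = set()
    unique = []
    for record in records:
        source_hash = record.get("SourceHash")
        if not source_hash:
            # No hash available (for example, a non-Google file): cannot compare.
            unique.append(record)
            continue
        if source_hash in seen:
            continue  # same file version already kept
        seen.add(source_hash)
        unique.append(record)
    return unique

# Hypothetical records; the hash values are placeholders.
records = [
    {"#Title": "Budget", "DocumentType": "SPREADSHEET", "SourceHash": "hash-aaa"},
    {"#Title": "Budget", "DocumentType": "SPREADSHEET", "SourceHash": "hash-aaa"},
    {"#Title": "Budget", "DocumentType": "SPREADSHEET", "SourceHash": "hash-bbb"},
    {"#Title": "scan.pdf"},
]

unique_records = dedupe_by_source_hash(records)
print(len(unique_records), "of", len(records), "records kept")
```

Because the hash is unique per version of a file, two versions of the same document (the first and third records here) are both retained.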

Note: Custom Fields and Custom Field templates can be created and managed at both the Project and Organization levels, but are available for use in templates at the Project level only. To share a template containing Project-level Custom Fields with the Organization and System levels, or with other projects, download the template as an XML file using the Download as XML option, then use the Load from XML option to upload the template to the appropriate page at the Organization or System level or in another project. When you do so, however, the Custom Fields added to the template in the originating Project are not populated as Custom Fields at the Organization level or in the receiving project.