Create a Data Set from a Search Result

Data Set Search Results or Imports Search Results (System or File Metadata level only) > right-click in tree > Create Data Set from Search Results

Requires Imports - Add/Edit Permissions

Note: Digital Reef now restricts import and reprocessing of data to Projects using Parsing Library V2. You can no longer import or reprocess data in a Parsing Library V1 Project. Therefore, this operation is no longer available in a V1 Project.

Users in a role that has the required permissions (for example, Organization Administrators) can create a Data Set from an entire set of search results, using either of the following sources:

Search results of an existing Data Set that is at the System Metadata or File Metadata representation level.
Search results across all of Imports, where every Data Set in the results is at the System Metadata or File Metadata representation level.

Note: The Create Data Set from Search Results option is available only if the backing Data Set, or every Data Set under Imports, is at the File Metadata or System Metadata representation level. If the Data Set associated with the results is at the Content or Analytic level, the option is not available. For Imports results, if any Data Set under Imports is at the Content or Analytic level, the option is not available. Forensic images such as LEFs and other Expert Witness Format (EWF) files, including EnCase files, require File Metadata as the representation level for import; for other types of source files, a representation level of System Metadata may be acceptable.
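The availability rule in the note above can be restated as a small predicate. This is a hypothetical Python sketch for illustration only: Digital Reef does not expose such a function, and the `IndexLevel` enum and the `can_create_data_set_from_results` name are assumptions.

```python
from enum import Enum


class IndexLevel(Enum):
    """Hypothetical model of the four representation (Index) levels."""
    SYSTEM_METADATA = "System Metadata"
    FILE_METADATA = "File Metadata"
    CONTENT = "Content Index"
    ANALYTIC = "Analytic Index"


# Only these levels can back a "Create Data Set from Search Results" operation.
METADATA_LEVELS = {IndexLevel.SYSTEM_METADATA, IndexLevel.FILE_METADATA}


def can_create_data_set_from_results(backing_levels):
    """Return True if the option would be offered for these search results.

    backing_levels: one IndexLevel for a single Data Set search, or one per
    Data Set for a search across all of Imports. A single Content or Analytic
    Data Set is enough to make the option unavailable.
    """
    levels = list(backing_levels)
    # Assumption: with no backing Data Sets there are no results to use.
    return bool(levels) and all(level in METADATA_LEVELS for level in levels)


print(can_create_data_set_from_results([IndexLevel.FILE_METADATA]))  # True
print(can_create_data_set_from_results(
    [IndexLevel.SYSTEM_METADATA, IndexLevel.ANALYTIC]))  # False
```

The all-or-nothing check mirrors the Imports case: one Data Set at the Content or Analytic level is enough to remove the option.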

As a sample use case, you may want to create a Data Set from a Logical Evidence File (LEF) that you imported at the File Metadata representation level without populating Project Data. The following summarizes one supported procedure:

Import the LEF (an image of a computer) using the representation level File Metadata. For this initial import, do not select the option Add to Project Data.
Search the imported Data Set to identify the files that you want (for example, files in a specified directory).
In the search results, right-click the results in the tree and select Create Data Set from Search Results, specifying the Analytic Index representation level and Add to Project Data.
For the new Data Set, select Copy To Document Storage (from the Imports Summary, or by right-clicking the Data Set). This option copies the source files from the Data Set's import location to the Organization’s designated Document Storage, which lets you free the storage associated with the import location.
Delete the original LEF Data Set (at the File Metadata Index level).
Remove the original LEF file from the initial staging area.
With Project Data populated with the Data Set(s) you created from the LEF Data Set, you can now perform routine Project operations to search, tag, and export data from the Project.

When you select a search result to create a Data Set from another Data Set, note the following:

Be careful not to create overlapping subsets that are both added to Project Data.
The source Data Set must have a representation level of System Metadata or File Metadata, depending on the requirements of the source type you are handling. The new Data Set can be at any representation level.
You can monitor your import task in the Work Basket. While the task is in progress, right-click it and select View Details to see the state of the task, the various system components, and the configuration settings you used.
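One way to check for the overlap warned about above, before adding another subset of the same source to Project Data, is to compare document identifiers. This hypothetical Python sketch assumes you can obtain a stable identifier (for example, a hash value) for every document in each result; the `find_overlaps` function and its data shapes are illustrative and are not part of Digital Reef.

```python
def find_overlaps(proposed_ids, project_data_subsets):
    """Report documents that a proposed subset shares with existing subsets.

    proposed_ids: identifiers of the documents in the search result you plan
    to turn into a new Data Set (hypothetical; for example, hash values).
    project_data_subsets: mapping of Data Set name -> identifiers of subsets
    already created from the same source and added to Project Data.
    """
    proposed = set(proposed_ids)
    overlaps = {}
    for name, ids in project_data_subsets.items():
        shared = proposed & set(ids)
        if shared:
            # Sorted so the report is stable and easy to read.
            overlaps[name] = sorted(shared)
    return overlaps


existing = {"LEF - Finance dir": ["h1", "h2", "h3"], "LEF - HR dir": ["h7"]}
print(find_overlaps(["h3", "h4"], existing))  # {'LEF - Finance dir': ['h3']}
```

An empty result means the proposed subset shares no documents with subsets already in Project Data.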

Create Data Set from Result Options and Summary

Search Task — Identifies the search task, which includes the query and the target, that will serve as the source for this operation. Color coding helps you identify different elements in the query (for example, the Boolean operators each have a dedicated color, metadata fields are purple, and relationship operators such as family_of are green).
Documents — Identifies the current document count for the source search result.
Name – Assign a unique name that will be used to represent the new Data Set. The name must be unique within the Organization and is validated when the Data Set is created. The name can include alphanumeric characters, spaces between characters (leading and trailing spaces are ignored), and some other supported characters (such as a hyphen, underscore, and apostrophe). Characters from other languages (for example, Korean characters) are also accepted. However, the following characters are not supported in Data Set names and generate an error message indicating that your entry contains invalid characters:

! " # $ % & * + . / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”

Description – Assign a helpful description of this Data Set (up to 1024 characters).
Index Level – Select the appropriate Index level. Keep the default, Analytic Index, if you want the Data Set to take advantage of all analytic capabilities. The Index levels are as follows:

System Metadata – Restricts users to a system (structural metadata) view and a restricted subset of related operations. The Metadata List identifies the system (structural) metadata fields.

File Metadata – Restricts users to a metadata-only view of file (embedded) metadata as well as system metadata. This type is also associated with a restricted subset of related operations. When you select File Metadata, RAR, TAR, and ZIP archives are expanded by default to reveal the file metadata for the archive content, but you have the option to disable the expansion of RAR, TAR, and ZIP archives. File Metadata mode always supports the identification and import of Forensic Images (for example, EWF Files that collectively form a disk image).

Content Index – Gives users a view of document content and document metadata, thereby providing operations that enable analysis of both content and metadata. This is the only Index level you can later upgrade to an Analytic Index.

Analytic Index (default) – Enables users to take advantage of the additional analytics operations such as Document Similarity and Clustering. With this Indexing type, you can use a Project setting to ignore or include Stop Words for Document Similarity operations and Clustering, if applied.

Index Settings — If you have the appropriate permissions, you can review and edit the current Index Settings in effect for the Project by clicking Edit. This launches the Project Settings screen, from which you can control the Index Settings for the Project and the Data Set you are creating from a Result Set.
Pattern Detection – If you have the appropriate permissions, you can review and edit the current Pattern Detection Settings for the Project. By default, Pattern Detection is enabled, so you can click Edit to view the current Patterns screen for the Project. You can then use the Patterns screen to control the Patterns for the Project and the Data Set you are creating.
Batch Name and Other Legal Discovery Options – The optional Batch name or number is one of the Legal Discovery options you can set for a Data Set. To define the other available eDiscovery options for a Data Set, select Other Legal Discovery Options. In the Other Legal Discovery Options popup that appears, you can view or set the complete set of eDiscovery options. If you do not set a Batch name or number for the Data Set at import, the Data Set name is used as the Batch value.
Add to Project Data – Available only if the new Data Set is at the Content or Analytic Index level. Select this option to add the new Data Set to Project Data automatically, so that users can work with views of Project Data right away; leave it cleared to create the Data Set without adding it to Project Data. By default, this option is cleared. You cannot add data at the System or File Metadata Index level to Project Data. You must enable this option to have Custodians created automatically.

Note: If you want to take advantage of automatically creating Custodians, you must review Project Settings > Index Settings and supply values for the Custodian Directory Location and Media Directory Location fields (with values such as 1 and 2). This enables the software to recognize the Custodians. When adding Project Data, select the Add to Project Data option to create the Custodians automatically.
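The naming rules and the level-dependent options described in this section can be summarized as a pre-flight check. This is a hypothetical Python sketch, not the product's validator: `check_create_options`, `LevelCapabilities`, and `INDEX_LEVELS` are illustrative names, the invalid-character set is copied from the list above, and exact-match uniqueness is an assumption because the page does not say whether names are compared case-insensitively.

```python
from dataclasses import dataclass

# Characters listed above as unsupported in Data Set names.
INVALID_NAME_CHARS = set('!"#$%&*+./:;<=>?@[\\]^{|}~\u201c\u201d')


@dataclass(frozen=True)
class LevelCapabilities:
    """What each Index level permits, as described on this page."""
    content_view: bool         # document content is indexed and viewable
    analytics: bool            # Document Similarity, Clustering, More Like These
    add_to_project_data: bool  # eligible for the Add to Project Data option
    upgrade_to_analytic: bool  # can later be upgraded to an Analytic Index


INDEX_LEVELS = {
    "System Metadata": LevelCapabilities(False, False, False, False),
    "File Metadata": LevelCapabilities(False, False, False, False),
    "Content Index": LevelCapabilities(True, False, True, True),
    "Analytic Index": LevelCapabilities(True, True, True, False),
}


def check_create_options(name, existing_names, level,
                         add_to_project_data, auto_create_custodians=False):
    """Return (normalized_name, problems) for a proposed set of options."""
    problems = []
    normalized = name.strip()  # leading and trailing spaces are ignored
    if not normalized:
        problems.append("name is empty")
    bad = sorted({ch for ch in normalized if ch in INVALID_NAME_CHARS})
    if bad:
        problems.append("name contains invalid characters: " + " ".join(bad))
    # Assumption: exact-match comparison; case handling is not documented.
    if normalized in existing_names:
        problems.append("name is not unique within the Organization")
    if add_to_project_data and not INDEX_LEVELS[level].add_to_project_data:
        problems.append("Add to Project Data requires a Content or Analytic Index level")
    if auto_create_custodians and not add_to_project_data:
        problems.append("automatic Custodian creation requires Add to Project Data")
    return normalized, problems


print(check_create_options("  LEF subset - Dir_A  ", {"Finance"},
                           "Analytic Index", add_to_project_data=True))
# ('LEF subset - Dir_A', [])
print(check_create_options("Q1/Q2 #review", set(), "File Metadata",
                           add_to_project_data=True)[1])
```

The second call reports two problems: the `#` and `/` characters in the name, and Add to Project Data requested for a File Metadata Data Set. Hyphens, underscores, apostrophes, and non-Latin characters pass because they are not in the unsupported list.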

When you are ready, click Create Data Set to complete the process, or click Cancel to cancel the operation. While the Data Set is being created, an entry for the new Data Set appears in the Imports Summary table. Unless the Data Set is small, its State appears as In Progress while indexing is under way and changes to the appropriate level when indexing completes. When the operation completes, you are taken to the new Data Set under Imports.

Notes:

Depending on the size of each Data Set, the processing could take some time and can be monitored for progress in the Work Basket.
A representation level of Analytic Index additionally permits use of similarity-based Searches such as More Like These and Search by Synthetic Document.
If the creation of vectors for an Analytic Index fails for some reason, the representation level appears as Do Not Index. You can correct the issue by reindexing at the Analytic Index level.
If, when selecting the data in a Data Set, you type in a Data Area path that does not exist, the Work Basket generates a failure to inform you that no Data Areas exist. If you have a combination of available and unavailable Data Areas in the set, the entire set fails to import as a unit. Later, if the Data Areas specified become available, you can update the Data Set. If, however, the Data Areas in the set are incorrect, delete that Data Set and define another one.
You can select any available Index level for each Data Set, but be aware of how mixed levels of processing within Imports (for example, some Data Sets at the File or System Metadata Index level and others at the Content or Analytic Index level) behave when you search all of Imports: metadata searches (for example, field searches, or a search run with the Include Metadata checkbox enabled) return results that meet the metadata search query, but a search that includes a content (keyword) query and is run without the Include Metadata option returns an error message indicating that a Content Index configuration file is not present.
If you later need to accommodate changes other than document additions to a source Data Set (and perform the Indexing process again), you can add the revised Data Set to the Project under another name. (You may want to remove the old data area from the Project.)
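The mixed-level search behavior described in the notes above reduces to a short decision rule. This hypothetical Python sketch only models the documented case (Imports containing both metadata-level and content-indexed Data Sets); `all_imports_search_outcome` and its return strings are illustrative, and other combinations are rejected rather than guessed.

```python
METADATA_LEVELS = {"System Metadata", "File Metadata"}
INDEXED_LEVELS = {"Content Index", "Analytic Index"}


def all_imports_search_outcome(levels, has_content_query, include_metadata):
    """Describe the documented outcome of searching all of Imports.

    levels: Index level of each Data Set under Imports. Only the mixed case
    is documented, so other combinations raise ValueError.
    """
    mixed = (any(level in METADATA_LEVELS for level in levels)
             and any(level in INDEXED_LEVELS for level in levels))
    if not mixed:
        raise ValueError("outcome is only documented for mixed Index levels")
    if has_content_query and not include_metadata:
        # A keyword query needs a Content Index on every Data Set searched.
        return "error: Content Index configuration file not present"
    # Field searches, or any search run with Include Metadata enabled.
    return "results matching the metadata query"


levels = ["File Metadata", "Analytic Index"]
print(all_imports_search_outcome(levels, has_content_query=True,
                                 include_metadata=False))
# error: Content Index configuration file not present
```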