Add a Data Set for Short Message Format to the Project

Imports > New Data Set for Short Message Format

Requires Imports - Add/Edit Permissions

Users in a role with the appropriate permissions can use the New Data Set for Short Message Format import option to create a new Data Set that accommodates Short Message Format (SMF) data. Currently, this type of import applies to Cellebrite data only. Cellebrite Phone File Dumps are used to collect smartphone data, which can include a variety of items, such as Instant Messages and Chat Messages.

About Processing Cellebrite Data as an SMF Format

An import of Cellebrite data relies on the information in a Cellebrite XML file. This XML file is assigned a Digital Reef filetype called Cellebrite iPhone Backup.

Processing of the Cellebrite XML file will yield the following as individual items in the Data Set:

Instant Messages, each captured as an email (.eml) with an SMF datatype of Cellebrite_InstantMessage
Chat Messages, each captured as an email (.eml) with an SMF datatype of Cellebrite_ChatMessage
Calendar entries, each captured as an email (.eml) with an SMF datatype of Cellebrite_CalendarEntry
Files (for example, loose files, such as a JPG or an Apple PLIST Binary File) with an SMF datatype of Cellebrite_File
Emails (.emls) with an SMF datatype of Cellebrite_Email

The appropriate datatype for an item appears in an SMF-specific metadata field called smf_datatype. The datatypes related to Cellebrite will have the prefix Cellebrite_ in the smf_datatype field.

Other items from sections of the Cellebrite XML are captured in CSV files, one file per identified Cellebrite datatype. Such items include the following Cellebrite datatypes:

ActivitySensorData
Call
Contact
Cookie
DeviceConnectivity
InstalledApplication
Location
LogEntry
Note
Notification
Password
SearchedItem
SocialMedia
UserAccount
Voicemail
VisitedPage
WebBookmark
WirelessNetwork
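
The split between individual items and CSV output described above can be sketched as a small lookup. This is only an illustration: the datatype names come from the lists above, but the function and constant names are hypothetical and are not part of the product.

```python
# Illustrative only: names mirror the datatypes listed above; the mapping
# itself is an assumption for this sketch, not a Digital Reef API.
EML_DATATYPES = {"InstantMessage", "ChatMessage", "CalendarEntry", "Email"}
FILE_DATATYPES = {"File"}
CSV_DATATYPES = {
    "ActivitySensorData", "Call", "Contact", "Cookie", "DeviceConnectivity",
    "InstalledApplication", "Location", "LogEntry", "Note", "Notification",
    "Password", "SearchedItem", "SocialMedia", "UserAccount", "Voicemail",
    "VisitedPage", "WebBookmark", "WirelessNetwork",
}


def describe_output(cellebrite_type: str) -> str:
    """Return how a Cellebrite datatype is expected to surface in the Data Set."""
    if cellebrite_type in EML_DATATYPES:
        return f"individual .eml item, smf_datatype=Cellebrite_{cellebrite_type}"
    if cellebrite_type in FILE_DATATYPES:
        return f"individual file item, smf_datatype=Cellebrite_{cellebrite_type}"
    if cellebrite_type in CSV_DATATYPES:
        return f"rows in one CSV file for {cellebrite_type}"
    # Unknown sections may be skipped and reported as import warnings.
    return "unknown section"
```

For example, describe_output("ChatMessage") reports an individual .eml item, while describe_output("Call") reports rows in a CSV file.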

After import of your Cellebrite data, you can review the various files, such as the individual Cellebrite .emls, and check their metadata field information in the Document Viewer. The .emls have many of the standard metadata fields and fields that are specific to SMF. The following is a summary of some key fields for a Cellebrite .eml:

a filetype of email
an auxfiletype of eml
a msgsource value of Cellebrite
a msgclass of email
a docclass of Message
SMF metadata fields that start with smf (for example, smf_chatid, smf_datatype, smf_message_status, smf_message_type, and smf_source_application)

See the Metadata List for a full list of smf_ metadata fields, which appear in a separate table, along with the supported SMF Export-only fields (for example, SMF_DeletionDate, SMF_DeletionTime, SMF_ThreadLastActivityDate, SMF_ThreadLastActivityTime, SMF_ThreadStartDate and SMF_ThreadStartTime). The main metadata table also contains import fields that apply to Cellebrite (for example, isattach, and email-related fields supplying a unique identifier and name, such as cc_identifier and cc_name).
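
If you work with exported metadata outside the Document Viewer, the key fields summarized above could be checked with a sketch like the following. It assumes each document's metadata is available as a plain dictionary, which is an assumption made for illustration; the function names are hypothetical.

```python
# Expected values are taken from the summary above; the dict-based record
# format is an assumption made for this sketch.
EXPECTED_EML_FIELDS = {
    "filetype": "email",
    "auxfiletype": "eml",
    "msgsource": "Cellebrite",
    "msgclass": "email",
    "docclass": "Message",
}


def looks_like_cellebrite_eml(metadata: dict) -> bool:
    """True if the record has the key field values and a Cellebrite_ datatype."""
    for field_name, expected in EXPECTED_EML_FIELDS.items():
        if metadata.get(field_name) != expected:
            return False
    return str(metadata.get("smf_datatype", "")).startswith("Cellebrite_")


def smf_fields(metadata: dict) -> dict:
    """Return only the SMF-specific import fields (names starting with smf)."""
    return {k: v for k, v in metadata.items() if k.startswith("smf")}
```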

General Import Notes

In general, the process of adding data to a Project assumes that a service provider or enterprise System Administrator has made data available to your Organization using one or more Connectors, such as CIFS, NFS, Microsoft Exchange, or Microsoft SharePoint.

When you select data to add to a Project, note the following:

You can select one or more areas of data. You assign a name and description that represent the set of selected data areas (as a Data Set).
When you finish adding your Short Message Format data, each named set eventually appears in the Imports Summary.
You can monitor your import task in the Work Basket. Right-click the task and select View Details when the task is in progress. This will show you the state of the task, the various system components, and the configuration settings you used. For an import task that completes with exceptions, you will see the (warning) icon, which changes to a download icon upon hovering, enabling you to download the WARNING_DETAILS_REPORT.csv file. This file identifies the exceptions (for example, if an attachment was unable to be retrieved, if there was an unknown XML tag from a source such as a voicemail, or if there was an unknown XML section that was skipped). In this case, the file includes columns that identify the Document (the dochandle) and the appropriate Reason.
You can check the Warnings and Errors Report on the Reports tab for the Data Set or the document metadata for parsingstatus 00072 SMF_Data_Extraction_Error. For a Short Message Format (SMF) record, this error indicates that the record was extracted from the SMF parent (for example, a Cellebrite XML file), but some aspect of the record could not be processed as expected. For the SMF parent, this error indicates that either an entire section could not be extracted, or that some aspect of a record prevented the record from being extracted.
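
After downloading WARNING_DETAILS_REPORT.csv, you may want to group the exceptions by reason. The following is a minimal sketch that assumes the column headers are literally Document and Reason; the actual header text in the file may differ, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_by_reason(csv_text: str) -> dict:
    """Map each Reason to the list of Document (dochandle) values reporting it.

    Assumes header names "Document" and "Reason"; adjust to the real file.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Reason"]].append(row["Document"])
    return dict(grouped)
```

To use it with a downloaded report, read the file contents into a string first and pass that string to group_by_reason.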

How to Create a Data Set for Short Message Format

The New Data Set for Short Message Format screen is divided into several areas that enable you to set up an import of the new Data Set. You do not have to follow any particular order.

The different areas of the page are described in the following sections.

Select a Connector

From a list of available Connectors, select a Connector by clicking the entry for the Connector in the table. Each Connector is shown with the following information:

Connector Name — The name assigned to the Connector.
Description — The description for that Connector.
Type — The type of Connector (for example, CIFS, NFS, Exchange, or SharePoint). For details about the settings used to create a Connector, see Create a Connector.
Mode — The Connector mode, either Read or Read/Write. For import, either is valid. The Connector mode is determined when a Connector is created.
Server — The IP address or URL associated with the Connector.
Path — The mount point for the Connector, which determines what you see in the Data Area section. (You should see what is available from the mount point on down.)

Make Your Path Selections

As soon as you select a Connector, the appropriate information for the Path appears in the Data Area section below the Connector list. In this area, you can make one or more Data Area selections for the Data Set using a directory hierarchy, which reflects the organization of the data based on the Connector. Collectively, the selected Data Areas form a Data Set. The following explains how to use the hierarchy:

If you want to include all data in the directory structure in the import, select the top checkbox representing the top-level directory. In this case, the Data Areas icon appears to the right of the selected checkbox to identify this as a selected Data Area. Note that selecting all data in the structure may not be recommended, depending on the size and intent of the import.
If you want to navigate the directory structure to selectively include or exclude directories as part of the import, click the arrow to open a given directory. The arrow changes to indicate that there are directories below the selected directory. You can then continue to navigate the directory structure using the arrows.
When you locate data that you want to import, select the checkbox or multiple checkboxes at the appropriate level in the hierarchy. A Data Area icon appears beside the highest-level parent directory that you explicitly check.
When you make a selection, the folder you have selected is used to auto-populate the Data Set name, if you have not yet specified the name. After initial auto-population of the name, changing your folder selection will update the Data Set name to your most recent selection. If you clear your folder selections, the Data Set name will be cleared (unless you have edited the name after making your selection).
If you decide you want to skip certain items within a selected directory, clear any checkboxes that represent the items that you want to skip. For example, you may want to exclude snapshot directories from the import.
Once you explicitly select a directory and leave other directories cleared, the checkbox next to the top-level directory changes to indicate that one or more items have been selected, but not all items (in other words, the top-level directory is tri-state). A lower-level directory that contains a mix of selected and cleared items will also show the tri-state indicator.
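
The tri-state behavior described above follows a common pattern for checkbox trees. The sketch below is a generic model of that pattern, not the product's implementation; the Directory class and checkbox_state function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Directory:
    name: str
    checked: bool = False  # consulted only when the directory has no children
    children: List["Directory"] = field(default_factory=list)


def checkbox_state(node: Directory) -> str:
    """Return 'selected', 'cleared', or 'tri-state' for a directory."""
    if not node.children:
        return "selected" if node.checked else "cleared"
    states = {checkbox_state(child) for child in node.children}
    if states == {"selected"}:
        return "selected"
    if states == {"cleared"}:
        return "cleared"
    # A mix of selected and cleared items below this directory.
    return "tri-state"
```

In this model, a parent with one selected child and one cleared child reports tri-state, and so does every ancestor above it.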

View and Manage the Custodians and MediaIDs

As soon as you select a Connector and make one or more Data Area selections, the right side displays a section for Custodian information and a section for MediaID information. Reviewing this information helps you verify that your data will be imported as you expected for each Data Area of the Data Set. Each section provides the following information:

The Source Custodian or Source MediaID directory names on disk for each selected Data Area of the Data Set. These Source names reflect the staging of the data based on the Custodian Directory and Media Directory values in the current Project Index Settings, which specify a number of levels down from the Data Area. For example, a value of 1 for the Custodian Directory indicates that an auto-discovered Custodian will be in the first directory position, and a value of 2 for the Media Directory indicates that a MediaID will be in the second directory position. The Source names are view-only.

The equivalent Project Custodian or Project MediaID names, which are labeled to indicate whether they match an existing name in the Project or would be new to the Project. These Project names can be changed if you want to use other names for the import. For a given entry, you can click the down arrow in the combo box to review the list of existing names in the Project. You can then select a name from the list of existing names, or you can enter your own name in the box. Reviewing the list of Project Custodian names and Project MediaID names helps you spot discrepancies between your Source names and names that already exist in the Project. For example, you might want to change a Custodian name that is spelled incorrectly or formatted slightly differently, and so you decide to select an existing Project Custodian value from the drop-down list to ensure use of the correct value for the import.
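
To illustrate how the Custodian Directory and Media Directory values translate into Source names, the following sketch picks directory names by position below the Data Area. It is a simplified model under stated assumptions: the path is relative to the selected Data Area, uses forward slashes, and ends in a file name. The function name and the sample path are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def discover_names(
    relative_file_path: str, custodian_level: int = 1, media_level: int = 2
) -> Tuple[Optional[str], Optional[str]]:
    """Return (custodian, media_id) taken from directory positions.

    Levels count directories down from the Data Area, starting at 1.
    Only directories are considered, never the file name itself.
    """
    directories = PurePosixPath(relative_file_path).parent.parts

    def at(level: int) -> Optional[str]:
        return directories[level - 1] if 0 < level <= len(directories) else None

    return at(custodian_level), at(media_level)
```

With the values 1 and 2 from the example above, discover_names("jsmith/phone01/report.xml") returns ("jsmith", "phone01").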

While viewing the Source Custodian and Source MediaID names, you can filter the list of names, as follows:

For the Source Custodian and Source MediaID columns, you can use the Filter text box to filter by the Custodian name on disk or the MediaID name on disk. The filtering is not case-sensitive. (An icon indicates that filtering is available.) Using the Filter box enables you to pinpoint the items you want to work with based on a Filter term containing one or more characters you enter. You can explicitly apply a filter by typing one or more characters in the text box and pressing Enter (the return key). If you only type one or more characters in the text box, the software will automatically apply the filter for you, and the text box changes to a yellow background color. For any applied filter, you can then clear the filter by removing the text from the box (with or without pressing Enter) or by clicking the icon that appears at the far right of the Filter box. Clearing a filter restores the list to its original state.
For the Project Custodian and Project MediaID columns, you can use the Custom Combination Filter box with a drop-down option to filter and then select from a list of Project Custodians or Project MediaIDs. An icon indicates that this type of filtering is available. This Custom box enables you to either filter for information that contains the characters you type in the Filter box and then optionally make a selection, or click in the box or click the filter icon to use the drop-down to scan and then select from a list of possible values. For example, if you type smith in the box, you will filter by all Project Custodian or Project MediaID names that contain smith. If you then want to select a specific instance of a name, you can select just that name from the list. For any applied filter or selection, you can then clear the filter or selection by removing the text from the box or by clicking the icon that appears at the far right of the Filter box. Clearing a filter or selection restores the list to its original state.
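
The Source column filter described above behaves like a case-insensitive "contains" match. A minimal sketch of that behavior follows; the function name and the sample names are hypothetical, and the product's actual matching rules may differ in details.

```python
from typing import Iterable, List


def filter_names(names: Iterable[str], term: str) -> List[str]:
    """Keep names that contain the term, ignoring case.

    An empty term clears the filter and restores the full list.
    """
    needle = term.casefold()
    if not needle:
        return list(names)
    return [name for name in names if needle in name.casefold()]
```

For example, filtering ["JSmith", "smithers_b", "adoe"] with the term smith keeps the first two names.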

While working with the Project Custodian and Project MediaID entries, note the following about existing versus new entries:

Note: For a Custodian name or MediaID name to be considered EXISTING, it must have a corresponding view in Project Data (that is, a custodian_view or mediaid_view). A Custodian name or MediaID name with only a custodian or mediaid value for a Data Set is not considered EXISTING.

Each entry that is populated with a name will have one of the following labels:
- EXISTING — Indicates that the Custodian or MediaID name matches an existing Custodian or MediaID view name in Project Data.
- NEW — Indicates that the Custodian or MediaID name is new and does not match an existing Custodian or MediaID view name in Project Data.
- EXISTING — Indicates that you have made a change to select another existing Custodian or MediaID view name from the list of available names.
- NEW — Indicates that you have made a change to specify a new Custodian or MediaID name that does not match an existing Custodian or MediaID view name in Project Data.

A Project Custodian or Project MediaID entry cannot be blank. If any entry is blank, it will not have one of the labels described above, you will see an error message, and you will not be able to create the Data Set.
Any Project Custodian or Project MediaID name you specify is subject to character validation. The Custodian or MediaID name can include alphanumeric characters, spaces between characters in the name (leading and trailing spaces are ignored), as well as a number of supported characters (such as a hyphen, underscore, period, pound sign, dollar sign, percent sign, ampersand, and apostrophe). During validation, the software will also allow characters from foreign languages (for example, Korean characters). However, the following characters are not supported for Custodian or MediaID names and will generate an error message indicating that your entry contains invalid characters:
! " * + / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”
Any Project Custodian or Project MediaID name you specify does not have to be unique (that is, you might want multiple entries to have the same selected name).
Any Project Custodian or MediaID name you specify can be up to 100 characters long before being trimmed.
Your Project Custodian and MediaID name selections will be retained regardless of any filtering you do for the Source Custodians or Source MediaIDs.
Your Project Custodian and MediaID name selections will not be retained if you select one of the Refresh & Reset options, as described in the next section.

Refresh & Reset Options

You can use the Refresh & Reset drop-down menu to select one of the following options, which determine the scope of the refresh and reset:

Select Custodians and MediaIDs only if you need to refresh and reset only the Custodian and MediaID values (for example, to pick up changes to the Custodian and Media values in the Index Settings). This option will reset any of your changes to the Project Custodian and Project MediaID names.
Select Folders, Custodians, and MediaIDs if you need to pick up changes made to the Folder structure on disk as well as pick up changes to the Custodian and MediaID values in the Index Settings. This option will clear your checkbox selections for Folders and reset any of your changes to the Project Custodian and Project MediaID names. (No Custodian and MediaID names will appear until you make another Folder selection.)

Assign a Name and Optional Description

In this area, you supply a name and optional description for the Data Set in Short Message Format.

Name (required) — A unique name that will be used to represent the Data Set. (For many items, the name can have up to 100 characters. Some items, such as a Connector name, can have up to 255 characters. An Excluded Content Block name is limited to 32 characters.) Unless you explicitly provide a Data Set name before you make a folder selection, a folder you select is used to auto-populate the Data Set name. After initial auto-population of the name, changing your folder selection will update the Data Set name to your most recent selection. In general, if you clear your folder selections, the Data Set name will be cleared (unless you have edited the name after making your selection). The Data Set name must be unique within the Organization and is subject to validation upon creation or edit. The name can include alphanumeric characters, spaces between characters in the name (leading and trailing spaces are ignored), and some supported characters (such as a hyphen, underscore, and apostrophe). During validation, the software will also allow characters from foreign languages (for example, Korean characters). However, the following characters are not supported for Data Set names and will generate an error message indicating that your entry contains invalid characters:

! " # $ % & * + . / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”

Note: These character restrictions apply to most tree items, such as Imports, Exports, Tags, Folders, Saved Searches, Workflows, Comparisons, Samples, and Synthetic Documents. To support auto-discovery of Custodians based on staging, a Custodian name has fewer restrictions regarding invalid characters.
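
To see exactly how the Data Set name restrictions differ from the Custodian and MediaID restrictions, the two invalid-character lists in this topic can be compared directly. This is an illustrative sketch; the constant and function names are hypothetical.

```python
# Both sets are copied from the invalid-character lists in this topic.
CUSTODIAN_INVALID = set('!"*+/:;<=>?@[\\]^{|}~\u201c\u201d')
DATA_SET_INVALID = set('!"#$%&*+./:;<=>?@[\\]^{|}~\u201c\u201d')

# Characters allowed in a Custodian name but rejected in a Data Set name.
EXTRA_RESTRICTED = sorted(DATA_SET_INVALID - CUSTODIAN_INVALID)


def is_valid_data_set_name(raw: str) -> bool:
    """True if the trimmed name is non-empty and has no unsupported characters."""
    name = raw.strip()
    return bool(name) and not (set(name) & DATA_SET_INVALID)
```

Here EXTRA_RESTRICTED evaluates to ['#', '$', '%', '&', '.'], so a name such as Phone.Dump #1 could pass as a Custodian name but not as a Data Set name.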

Description — An optional, helpful description of this Data Set in Short Message Format. A description can have up to 255 characters. A Data Set represents one or more Data Areas (locations) that you select in a directory hierarchy.

Select an Index Level

For Index Level, select the appropriate Index representation level. Use the default of Analytic Index if you want the Data Set to take advantage of all analytic capabilities. The different levels of Indexing are as follows:

System Metadata – Restricts users to a system (structural metadata) view and a restricted subset of related operations. The Metadata List identifies the system (structural) metadata fields.
File Metadata – Restricts users to a metadata-only view of file (embedded) metadata as well as system metadata. This type is also associated with a restricted subset of related operations. When you select a File Metadata index level, RAR, TAR, and ZIP archives are expanded by default to reveal the file metadata for the archive content. File Metadata mode always supports the identification and import of Forensic Images (for example, EWF Files that collectively form a disk image). Mail containers are not processed for a File Metadata index level.
Content Index – Gives users a view of document content and document metadata, thereby providing operations that enable processing and analysis of both content and metadata. This is the only Index level you can later upgrade to an Analytic Index. (OCR requires either a Content Index or Analytic Index level.)
Analytic Index (default) – Enables users to take advantage of the additional analytics operations such as Document Similarity and Clustering. With this Indexing type, you can use an Advanced Analytic setting under the Project Index Settings to ignore or include Stop Words for Document Similarity operations and Clustering, if applied.

Note: As you add Data Sets to the Project, be aware that having mixed levels of processing within Imports (for example, a mix of Data Sets at a File or System Metadata Index level and Data Sets at a Content or Analytic Index level) behaves as follows when you search all of Imports: metadata searches (for example, field searches, or a search run with the Include Metadata checkbox enabled) will return results that meet the metadata search query, but a search that includes a content (keyword) query (run without the Include Metadata option) will return an error message to indicate that a Content Index configuration file is not present.
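
The Index level rules and the mixed-level search behavior described above can be summarized in a small model. This is a simplified sketch for reasoning about the rules, not the product's logic; the enum, function names, and returned strings are hypothetical.

```python
from enum import Enum
from typing import Iterable


class IndexLevel(Enum):
    SYSTEM_METADATA = "System Metadata"
    FILE_METADATA = "File Metadata"
    CONTENT = "Content Index"
    ANALYTIC = "Analytic Index"


_CONTENT_LEVELS = {IndexLevel.CONTENT, IndexLevel.ANALYTIC}


def supports_ocr(level: IndexLevel) -> bool:
    """OCR requires a Content Index or Analytic Index level."""
    return level in _CONTENT_LEVELS


def can_upgrade_to_analytic(level: IndexLevel) -> bool:
    """Only a Content Index can later be upgraded to an Analytic Index."""
    return level is IndexLevel.CONTENT


def search_all_imports(levels: Iterable[IndexLevel], content_query: bool) -> str:
    """Model a search across Imports that may contain mixed Index levels."""
    if content_query and any(level not in _CONTENT_LEVELS for level in levels):
        return "error: Content Index configuration file is not present"
    return "results returned"
```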

Select Index Settings

If you have the appropriate permissions, you can review and manage the current Project Index Settings by clicking Edit. This launches the Project Settings screen, from which you can control the Index Settings for the Project and this new Data Set import.

If you want to take advantage of automatically creating Custodians when you Add to Project Data, enter Index Setting values for the eDiscovery Settings (the Custodian Directory Location and Media Directory Location fields, with values such as 1 and 2). This enables the software to recognize the Custodians. When adding Project Data, select the Add to Project Data option to automatically create the necessary Custodians.

Review Pattern Detection Settings

If you have the appropriate permissions, you can review and manage the current Project Pattern Detection Settings. By default, Pattern Detection is enabled, so you can click Edit to open and manage the current Patterns screen for the Project. You can then use the Patterns screen to control the Patterns for the Project and this new Data Set import.

Note: If you decide to modify the current Project Patterns while setting up an import, keep in mind that those Pattern settings remain in effect until the next time you modify the Project Patterns. This means that if you tailor the Project Patterns for a particular import, those Pattern settings will also apply to subsequent imports. It is therefore recommended that you verify the current Project Pattern settings while setting up a given import. If you change the Project Patterns after import and need to perform a Pattern update for a Data Set, you can either use the Data Set Update Patterns option, or you can reprocess.

Assign an Optional Batch Name or Number

If you want, you can specify a Batch name or number for the new Data Set. If you do not set a Batch name or number as part of import, the Data Set name is used. To verify the Batch name or number after import, you can view the batch field in the document metadata, the Imports Summary (which reflects the value stored in the index), or the Data Set Report > Scan History, which reflects the name/value exactly as you type it here for Legal Discovery purposes. The Batch name or number also appears in the appropriate file manifest upon export.

View Other Legal Discovery Options

The optional Batch name or number is one of the Legal Discovery options you can set for a Data Set. To define the other available eDiscovery options for a Data Set, select Other Legal Discovery Options. In the Other Legal Discovery Options popup that appears, you can view or set the complete set of eDiscovery options.

Note: If you do not set a Batch name or number for the Data Set at import, the Data Set name is used as the Batch value.
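
The fallback rule for the Batch value can be expressed in one line. The sketch below treats None or an empty string as "not set", which is an assumption; the function name is hypothetical.

```python
from typing import Optional


def effective_batch(batch: Optional[str], data_set_name: str) -> str:
    """Return the Batch value, falling back to the Data Set name if unset."""
    return batch if batch else data_set_name
```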

Optionally Add to Project Data

If you have Project Data - Add/Edit permissions, and are adding data at the Analytic Index or Content Index level, you can select this checkbox to add the selected data to Project Data automatically, thereby enabling users to work with views of Project Data right away. By default, this option is cleared, which adds the data without assigning data to Project Data. Note that you cannot add data at the System or File Metadata Index level to Project Data.

Enable this option if you want to automatically create views for Custodians, MediaIDs, and Batches as part of the import. Note that when you perform an import at the Analytic Index level with this option (or select Add to Project Data as a right-click option for the Data Set after import), the software performs Custodian, MediaID, and Batch view generation for all documents in Project Data, not just the documents to be added with this Data Set batch.

Submit or Cancel the Operation

When you have finished the setup, click Create Data Set for Short Message Format to complete the process and return to the Imports Summary. If you do not want to perform the operation, click Cancel instead.

When you click Create Data Set, you will see a message if you have not yet supplied the information for a required field, such as Name. Once all of the required information is supplied, clicking Create Data Set will kick off the import process, and you will not be able to click Create Data Set again while the dialog is still displayed. When the import is complete, you will see your new Data Set appear in the Data Sets table. Unless you have a small Data Set, you will see that the State appears as In Progress while indexing is in progress. This changes to the appropriate level when the indexing completes.

You can monitor your import task in the Work Basket. Right-click the task and select View Details when the task is in progress. This will show you the state of the task, the various system components, and the configuration settings you used.

Note: If an Import fails, the operation will report an error (for example, in the Work Basket). If you also had the Add to Project Data option enabled for the Import, the Adding documents to Project Data operation will report an error as well, indicating that the required representation does not exist (e.g., an Analytic Index).