Using the Standard Search Syntax for Basic Queries

This topic describes how to form queries using the default Standard search syntax, focusing on common searches for content and metadata. For information about more advanced searches, see Use the Standard Syntax for More Advanced Searches.

Searching for content requires an index level of Analytic (the default indexing level) or Content. You cannot search for content in a System or File Metadata Index, which is limited to either the small subset of System metadata or all File metadata.

Basic Term Treatment

The absence of a Boolean operator between terms means that the terms are treated as a Phrase.

Note: A regular term search is not case-sensitive, so you can type terms in either lowercase or uppercase. For term Search operations (such as a Phrase search), Stop Words are valid searchable terms.

Examples:

on the edge

quick brown fox

camille saint-saens

Boolean Operators

These operators specify occurrence requirements for terms and are not case-sensitive (so you can use the uppercase or lowercase version of the operator):

  • AND links two terms (or phrases) that must be present. If you want to search for "and" as an actual word, you must quote it, or quote the phrase, if it is in a phrase.
  • OR links two terms (or phrases), where at least one of the terms or phrases must be present. If you want to search for "or" as an actual word, you must quote it, or quote the phrase, if it is in a phrase.
  • NOT precedes a term that must not be present. Valid uses of NOT are at the beginning of a query or grouping, or after AND or OR. (Therefore, speed AND NOT limit is a valid query, but speed NOT limit is not a valid query.) If you want to search for "not" as an actual word, you must quote it, or quote the phrase, if it is in a phrase.

Note: With the Standard Search syntax, occurrence operators such as + or - preceding a term are not valid. They are treated as part of the term.

Examples:

cape cod AND lighthouse

quick OR fast

NOT scary

speed AND NOT limit

yellow and blue

In general, the examples in this topic identify the supported Booleans AND, OR, NOT in uppercase. Lowercase is used when the intended meaning is to treat these as words (for example, in a phrase) instead of as Booleans.

Operator Order

Digital Reef evaluates operators in the following order:

  1. w/N (Proximity)
  2. OR
  3. AND

For example, consider the following search:

dog OR cat OR bird w/2 (angry OR happy)

Digital Reef interprets this as the following:

(dog OR cat OR (bird w/2 (angry OR happy)))

In this example, the Proximity portion is evaluated first, followed by the terms separated by OR. In this case, the search finds documents with dog or cat, or angry/happy birds. See Proximity Search for more information about performing a Proximity search.
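
The evaluation order above can be illustrated with a small precedence parser. This is only a sketch in Python, not the parser that Digital Reef uses. The function names (tokenize, parse, parenthesize) are hypothetical, and the sketch handles only single-word terms, parentheses, w/N, OR, and AND. It ignores NOT, phrases, and field searches.

```python
import re

# Levels are parsed loosest-first, so the last entry binds tightest.
# This mirrors the order listed above: w/N, then OR, then AND.
LEVELS = ["AND", "OR", "PROX"]

def tokenize(query):
    # Split into parentheses and whitespace-delimited words.
    return re.findall(r"[()]|[^\s()]+", query)

def is_operator(tok, level):
    if level == "PROX":
        return re.fullmatch(r"w/\d+", tok, re.IGNORECASE) is not None
    return tok.upper() == level

def parse(tokens, pos, depth=0):
    """Parse one precedence level and return (text, next_position)."""
    if depth == len(LEVELS):
        tok = tokens[pos]
        if tok == "(":
            text, pos = parse(tokens, pos + 1)
            return text, pos + 1  # skip the closing ")"
        return tok, pos + 1
    out, pos = parse(tokens, pos, depth + 1)
    grouped = False
    while pos < len(tokens) and is_operator(tokens[pos], LEVELS[depth]):
        op = tokens[pos]
        right, pos = parse(tokens, pos + 1, depth + 1)
        out += f" {op} {right}"
        grouped = True
    return (f"({out})" if grouped else out), pos

def parenthesize(query):
    """Return the query with the implied grouping made explicit."""
    text, _ = parse(tokenize(query), 0)
    return text

print(parenthesize("dog OR cat OR bird w/2 (angry OR happy)"))
# (dog OR cat OR (bird w/2 (angry OR happy)))
```

Making the grouping explicit in this way can help you confirm how a query without parentheses will be read before you run it.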

Treatment of Punctuation

Most Digital Reef Projects support V2 tokenization (introduced in Data Center Sprint 127/Enterprise Release 4.3.5), but some much older Projects may continue to support V1 tokenization. The tokenization version dictates the handling of punctuation that is embedded in a word and how this punctuation is treated when it is part of a numeric term.

For Projects using V2 tokenization, common punctuation embedded in a word is not captured as part of the word. Instead, the punctuation is replaced by a space. In a numeric term, certain types of punctuation are preserved, while others are not.

For any older Projects using the legacy V1 tokenization, this treatment of punctuation applied to common punctuation except the apostrophe, hyphen (and dash), and underscore.

For example, V2 tokenization enables you to use the following search to find documents with content such as Smith’s one-on-one in a new Project:

smith AND one on one
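
To see why this search matches, the following Python sketch models the V2 treatment of embedded punctuation in a very rough way. It is an assumption-laden illustration, not the product's tokenizer: the function names are hypothetical, only the apostrophe, hyphen, dashes, and underscore are modeled, text is lowercased for comparison, and numeric terms are not handled.

```python
import re

def v2_tokens(text):
    """Rough model: replace embedded punctuation with a space, then split."""
    # Assumption: only ' ’ - – — _ are modeled; numeric terms are ignored.
    spaced = re.sub(r"['’\-–—_]", " ", text.lower())
    return spaced.split()

def has_phrase(tokens, phrase):
    """True if the words of `phrase` appear consecutively in `tokens`."""
    words = phrase.lower().split()
    return any(tokens[i:i + len(words)] == words
               for i in range(len(tokens) - len(words) + 1))

tokens = v2_tokens("Smith’s one-on-one")
print(tokens)  # ['smith', 's', 'one', 'on', 'one']
print("smith" in tokens and has_phrase(tokens, "one on one"))  # True
```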

Note: When supplying quotation marks in searches, use straight quotes only (for example, "world peace" or patternvalue::'jsmith@someco.com'). Do not use any other style of quotation marks such as curly quotation marks, as their use will not necessarily produce expected results.
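
If you paste queries from a word processor, curly quotation marks can slip in. The following Python sketch shows one way to normalize them before submitting a query. The function name straighten_quotes is hypothetical, and the sketch covers only the four most common curly quotation marks.

```python
# Map common curly quotation marks to their straight equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"', "\u201d": '"',  # left and right double quotation marks
    "\u2018": "'", "\u2019": "'",  # left and right single quotation marks
})

def straighten_quotes(query):
    """Return the query with curly quotation marks replaced by straight ones."""
    return query.translate(CURLY_TO_STRAIGHT)

print(straighten_quotes("\u201cworld peace\u201d"))  # "world peace"
```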

Common Punctuation

The following table summarizes the treatment of common punctuation; that is, whether it is replaced by a space when it is embedded in a word based on the tokenization version. The table also indicates the treatment of the punctuation when it is part of a numeric term.

Note: In general, if particular punctuation marks or notations are required in your Project, you can capture them in a Pattern (regular expression). For example, you can create a Pattern for the copyright sign © where the Pattern contains \xA9. For more information about regular expression syntax, see Pattern (Regex) Syntax. For more information about managing Patterns, see Manage Project Patterns (Regular Expressions) and Add, Edit, or Copy a Pattern.

 

Punctuation Mark Known as Replaced by a Space in a Word in... Preserved as Part of a Term...
< and >, also « » Angle brackets (guillemets) V1, V2 No
' Apostrophe (straight or angled ’) V2 only Yes, as part of a numeric term, e.g., 111'000
{ and } Braces V1, V2 No
[ and ] Brackets V1, V2 No
: Colon V1, V2 No
, Comma V1, V2 Yes, as part of a numeric term, e.g., 111,000
– — Dash (e.g., En Dash, Em Dash) V2 only Yes, as part of a numeric term, e.g., 111–000 and 111—000
... Ellipses V1, V2 No
! Exclamation Point V1, V2 No
- Hyphen V2 only Yes, as part of a numeric term, e.g., 111-000
( and ) Parentheses V1, V2 No
. Period V1, V2 Yes, as part of a numeric term, e.g., 111.000
? Question Mark V1, V2 No
" Quotation Mark (double) V1, V2 No
; Semicolon V1, V2 Yes, as part of a numeric term, e.g., 111;000
/ Slash V1, V2 No
_ Underscore V2 only Yes, as part of a numeric term, e.g., 111_000

General Typography and Other Notations

The following table summarizes the treatment of typography and other notations.

Sign/Symbol Known as Replaced by a Space in a Word in... Preserved as Part of a Term...
´ acute accent (standalone diacritical mark) V1, V2 No, as a standalone character. When the acute accent appears over a letter, the software preserves the accent and the letter as a unit.
& ampersand V1, V2 No
* asterisk V1, V2 No
@ at sign V1, V2 No
\ backslash V1, V2 Yes, as part of a numeric term, e.g., \000
• bullet V1, V2 No
^ caret V1, V2 No
¸ cedilla V1, V2 No
¢ cent sign V1, V2 Yes, as part of a numeric term, e.g., 111¢
© copyright sign V1, V2 No
¤ currency sign V1, V2 Yes, as part of a numeric term, e.g., ¤000
† dagger V1, V2 No
° degree V1, V2 Yes, as part of a numeric term, e.g., 111°
¨ diaeresis (can be umlaut) V1, V2 No
$ dollar sign V1, V2 Yes, as part of a numeric term, e.g., $000
= equals sign V1, V2 No
¼ ½ ¾ fractions (quarter, half, three quarters) V1, V2 No
` grave accent V1, V2 No
· interpunct V2 only No
¡ inverted exclamation point V1, V2 No
¿ inverted question mark V1, V2 No
µ Mu (micro) No, preserved Yes, e.g., alphaµbeta and 111µ000
× multiplication sign V1, V2 No
¬ negation V1, V2 No
# number sign V1, V2 No
÷ obelus (division sign) V1, V2 No
º ª ordinal indicators No, preserved Yes, e.g., alphaºbeta and 111ªbeta
% percent V1, V2 Yes, as part of a numeric term, e.g., 111%
¶ pilcrow (paragraph marker) V1, V2 No
+ − plus sign and minus sign V1, V2 Yes, as part of a numeric term, e.g., +000 −000
± plus-minus V1, V2 Yes, as part of a numeric term, e.g., ±000
£ pound sterling (currency) V1, V2 Yes, as part of a numeric term, e.g., £000
® registered trademark V1, V2 No
§ section sign V1, V2 No
¹ ² ³ superscript V1, V2 No
~ tilde V1, V2 No
™ trademark V1, V2 No
¯ upperscore (macron) V1, V2 No
| and ‖ and ¦ vertical bar, pipe, broken bar V1, V2 No
¥ yen (currency) V1, V2 Yes, as part of a numeric term, e.g., ¥000

Note: The major difference between the initial V1 tokenization in older Projects and V2 tokenization is that V1 tokenization captured the apostrophe, hyphen (and dash), and underscore as part of a word, and V2 tokenization does not. Also, in V2 tokenization, standard tokenized metadata fields such as filename are tokenized the same as content. (For tokenized fields that search content, all punctuation is automatically ignored.)

Escaping Special Characters

\<special_character>

'<special_characters>'

Special characters include:

+ & | ! ( ) { } [ ] ^ " ~ * ? : \

' (only as a leading character in an email address)

This section provides usage notes about escaping special characters.

Note: In general, to understand the treatment of punctuation, general typography, and other notation in a term search, see Treatment of Punctuation. For more information on wildcards, see Wildcard Search.

You do not have to escape the apostrophe in terms such as it's or o'connor.

For many special characters, when you need to escape a single special character and treat the character literally, you can precede the special character with the \ (backslash) character.
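
As a minimal sketch of the single-backslash case, the following Python function precedes each special character from the list above with a backslash. The function name escape_term is hypothetical. It treats every listed character literally (including the * and ? wildcards), and it does not cover the separate rules in this section for untokenized fields, email addresses, or patternvalue.

```python
# Special characters as listed above: + & | ! ( ) { } [ ] ^ " ~ * ? : \
SPECIAL = set('+&|!(){}[]^"~*?:\\')

def escape_term(term):
    """Precede each special character in `term` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)

print(escape_term("a&p"))  # a\&p
```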

Note: When searching an untokenized field such as importpath, certain special characters cannot be escaped with a single backslash character and must instead be placed within double quotes if you want them treated literally. For example, the ! character cannot be escaped when searching a field like importpath because it conflicts with the supported parsing syntax, so it must be placed within double quotes. The * ? and $ characters also require special handling when searching untokenized fields; these characters must appear within double quotes and be prefaced with double escape characters (\\). (For tokenized fields that search content, all punctuation is automatically ignored.)

With RFC-2822 email address support, you do not have to escape some special characters in email addresses, such as a single !, :, &, or ~. The exceptions are the ' (single quote) as a leading character and symbols reserved for search syntax, such as { } ( ) [ ] and ^, which you must still escape.

For data processed as of 4.3.11.0, you must escape any backslash (\) characters in the patternvalue field for a local or UNC path. See Pattern Search and Pattern Value Search for an example.

For searches on views subject to name validation, such as a Tag view or Custodian view, if the view name contains a special character that is permitted in names, such as ( or ), you must enclose the view name in double quotes. Example: tag_view::"(dr)"

Examples:

\'user'@someco.com

a\&p

importpath::*\:Jane\ Bryant/Containers/mbox/files\:3025.eml

Field Content Search

<field>::<text_or_value>

<field> contains <text_or_value>

Field searches apply to content or a named metadata field that is indexed (a field name highlighted in purple followed by :: and the text or value for the field). In general, when you specify metadata fields in search queries, they will have purple highlighting. Note that the mixed-case Export-only fields (those without lowercase, indexed field equivalents) will not have the purple highlighting (for example, AllPaths or DocID), since they are not indexed and not intended to be searchable. Export-only fields that do have lowercase, indexed field equivalents (for example, DateEnded and dateended) will appear with the purple highlighting, but the field searched will be the indexed version.

You can also specify the field name, followed by the word contains and text or a value. Because contains has special meaning, if you want to search for the word contains, you must quote it ("contains"), or quote the phrase, if it is in a phrase. When you issue a field contains search, the report will display the clause (and executed query) like a field search (for example, author contains jones will appear as a clause in the search report and the executed query as author::jones). Unquoted occurrences of contains will appear in boldface to indicate the special meaning.
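
The relationship between the two forms can be sketched in Python as follows. This is only an illustration of the rewrite described above, not the product's implementation. The function name contains_to_field is hypothetical, and the sketch assumes a single clause with a lowercase, unquoted contains.

```python
import re

def contains_to_field(clause):
    """Rewrite '<field> contains <value>' as '<field>::<value>'."""
    # Assumption: one clause only, with an unquoted, lowercase "contains".
    m = re.fullmatch(r"\s*(\S+)\s+contains\s+(.+?)\s*", clause)
    if m is None:
        return clause  # not a field contains clause; leave unchanged
    field, value = m.groups()
    return f"{field}::{value}"

print(contains_to_field("author contains jones"))  # author::jones
```

A quoted "contains" does not match the pattern, so the clause is returned unchanged, in keeping with the quoting rule above.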

A metadata field that is tokenized is not case-sensitive and you can specify all or part of the field data, with or without wildcards. You can specify a phrase with the field data (but you should quote it).

A metadata field that is not tokenized requires all field content, which means that you must either specify the entire field content or use wildcards that cover the entire field content. You must observe case if the field is case-sensitive (many are not).

Examples (with field highlighting), individual term search (tokenized):

author contains jones

author::jones OR title::"question of honor"

to::franklin* AND author::roosevelt

filename::txt

report OR content::memo

content::(beach w/5 ball)

Example (with field highlighting), terms treated collectively (not tokenized):

docclass::message*

For all searches, the Include Metadata option expands the search of each keyword in a query to include a set of metadata fields as well as content (associated with a special contents field). This special field is always included and appears on the Metadata Search Fields list as grayed out (not selectable). For other fields, you can select the Metadata Search Fields you want to have searched automatically.

By default, the Include Metadata option is enabled for all term-based searches, which means that all individual keywords as well as the keywords in phrases, and in special searches such as Proximity Search, are subject to expansion (to include the set of metadata fields).

The Include Metadata option expands the entire search query to include the following list of Search fields by default:

  • author
  • category
  • checkedby
  • client
  • comment
  • comments
  • company
  • contents
    If you want to restrict a given keyword in a clause to content only, you can type content:: followed by the keyword, or content:: followed by a set of grouped keywords separated by the appropriate Boolean (for example, content::memo or content::(brokerage OR memo)). If you provide a query such as war OR content::peace, the software will expand the search of the keyword war to include a common subset of metadata fields such as subject::war OR author::war, but will restrict the search of the keyword peace to content only. Note that for emails, a content:: search applies to both the email subject and the email body.
  • department
  • editor
  • edocsubject
  • filename
  • fullparticipants
  • group
  • keywords
  • modifiedby
  • owner
  • subject
  • title

See The Metadata List for descriptions of these fields.

For information about metadata field expansion, which is enabled by default, see Using the Include Metadata Option.

Field Exists Search

<field>::<exists>

NOT <field>::<exists>

Confirms the presence or absence of a metadata field. When terms precede the NOT, remember to include an OR or AND.

Note: You will see an error if you attempt to search for an Analytic Metadata field using a target of a Data Set or all Imports. Such fields should only be searched using a Project Data-based target.

Examples (with field highlighting):

parent::<exists>

show AND NOT author::<exists>

NOT custodian_view::<exists>

stored_image::<exists>

Grouping Terms in a Query

(<grouped_terms>)

Use parentheses to group terms or fields within a query. Grouping can help clarify the relationships in a query.

In general, if you use more than one Boolean (AND, OR, NOT), you should use parentheses to group items and therefore clarify your search criteria.

Examples:

apple AND NOT (worm OR bug)

(opera OR operetta) AND phantom

Pattern Search and Pattern Value Search (for data processed as of 4.3.11.0)

pattern::<pattern_name>

patternvalue::'<value>'

For data processed as of 4.3.11.0, a Pattern search enables you to find documents matching a particular Pattern (defined to contain a regular expression). A Pattern Value search enables you to find documents with a matching Pattern value from the document content or email content (which does not include the email header).

  • If a Pattern is enabled, you can search for documents with that Pattern using the pattern metadata field and the Pattern name. You can type the Pattern name in either lowercase or uppercase format; regardless, the software always uses lowercase format. The System Patterns for email addresses (email), UNC paths (unc), and URIs/URLs (uri) are enabled by default, so you can always search for documents with those Patterns.
  • If a Pattern is enabled with values stored, then you can also search for a specific matching value using the patternvalue metadata field. For this search, you must place the value within single (straight) quotes for a literal search. The System Patterns email, unc, and uri have values stored by default, so you can always search for matching email addresses, UNC paths, and URIs/URLs from the document or email content. Note that for an email, the email header is not included in the content analyzed for Patterns.

Note: When searching for local or UNC paths using patternvalue, remember to escape each backslash (\) character in the value and match the case used in the value exactly (for example, if the value has a C:, you cannot find a match with c:). When you are searching for a URI/URL value, you must also match the case used in the value exactly.

See System Patterns and Numeric Settings for a list of all of the supplied System Patterns.

Examples:

pattern::email

patternvalue::'jsmith@someco.com'

patternvalue::'http://ecommerce.internet.com'

patternvalue::'http://www.state.gov/s/ct/'

patternvalue::'C:\\WINNT\\system32\\ole32.dll'
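
The backslash escaping for path values can be sketched in Python as follows. The function name patternvalue_query is hypothetical. The sketch only doubles backslashes and adds the single straight quotes; it does not change case (the value must already match exactly) and does not handle a value that itself contains a single quote.

```python
def patternvalue_query(value):
    """Build a patternvalue search, escaping each backslash in the value."""
    escaped = value.replace("\\", "\\\\")
    return f"patternvalue::'{escaped}'"

print(patternvalue_query(r"C:\WINNT\system32\ole32.dll"))
# patternvalue::'C:\\WINNT\\system32\\ole32.dll'
```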

Phrase Search

word1 word2 word3 ...

"<multi-word phrase>"

Terms that appear in the query without a Boolean (connector) between them are automatically treated as a phrase. This applies to regular content or metadata field content.

Note: Stop Words are valid searchable terms in a Phrase Search.

For clarity, place straight quotation marks around words that you want treated as a phrase (for example, for Freeform Search). For Advanced Search, an exact phrase field is formatted so that you do not supply the quotation marks.

Note: When supplying quotation marks in searches, use straight quotes only (e.g., "world peace"). Do not use any other style of quotation marks such as curly quotation marks, as their use will not necessarily produce expected results.

Use straight quotes when the phrase includes a Boolean operator that you want to search as a word. You can use wildcards in a multi-term phrase with normal content or metadata content (for tokenized fields, such as bcc, cc, to, and from). Do not use wildcards in phrases for fields that use path tokenization rules, such as mailfolder and osfolder.

Note: When you view the results of a phrase search, each instance of the phrase is highlighted in a color (unless you clear the Highlight Search Terms checkbox in the Document Viewer). Individual terms within the phrase are not highlighted on their own within the document.

Examples:

black bear OR big bad wolf

over the top

subject::"advisory boar*"

"take it or leave it"

"terms and conditions"

"not true"

"double indemnity"

Proximity Search

Unordered: <word1_phrase1> w/N <word2_phrase2>

Ordered: <word1_phrase1> pre/N <word2_phrase2>

w/N (Unordered): Finds the first word/phrase specified within N words of the second word/phrase, in any order (and possibly overlapping).

pre/N (Ordered): Finds the first word/phrase specified within N words of the second word/phrase, in that order.

Basics about Proximity Search

When building an Unordered (w/N) or Ordered (pre/N) Proximity Search, note that the Proximity operator will apply to the word or phrase on either side of the operator (that is, you can think of the word1/phrase1, the operator, and the word2/phrase2 as a single unit).

When specifying a value for the Proximity operator, note that a value of 1 means that the terms must be next to each other. Advanced Search supplies a default value of 10 for an Unordered Proximity Search (ANY of these words within 10 of ANY of these words), but you can also type the value that you want to use.

If you want to construct a Proximity Search that uses the same word for word1 and word2, you should use an Ordered Proximity Search to ensure that you get the correct results. For example:

legal pre/5 legal

You can issue a Proximity search that includes a group clause with an OR between terms:

dog w/5 (cat OR bird)

This search is the equivalent of (dog w/5 cat) OR (dog w/5 bird).
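
The following Python sketch models single-word Proximity matching over a list of words, and shows the OR equivalence above. It is a simplified illustration, not the search engine's algorithm. The function names are hypothetical, words are compared exactly, and phrases are not supported.

```python
def proximity_hit(tokens, first, second, n, ordered=False):
    """Model `first w/N second`, or `first pre/N second` when ordered=True."""
    firsts = [i for i, t in enumerate(tokens) if t == first]
    seconds = [i for i, t in enumerate(tokens) if t == second]
    for i in firsts:
        for j in seconds:
            gap = j - i
            if gap == 0:
                continue  # an occurrence cannot pair with itself
            if ordered and 0 < gap <= n:
                return True
            if not ordered and abs(gap) <= n:
                return True
    return False

def proximity_any(tokens, first, alternatives, n):
    # dog w/5 (cat OR bird) behaves like (dog w/5 cat) OR (dog w/5 bird)
    return any(proximity_hit(tokens, first, alt, n) for alt in alternatives)

words = "the dog chased a small bird".split()
print(proximity_any(words, "dog", ["cat", "bird"], 5))         # True
print(proximity_hit(words, "bird", "dog", 5, ordered=True))    # False
```

In this model, a value of 1 matches only adjacent words, and the ordered form requires the first word to appear before the second.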

You can also issue a Proximity search that includes a NOT operator:

dog NOT w/5 (cat OR bird)

Do not issue a Proximity Search that includes a group clause with an AND between terms, as this syntax is not valid:

dog w/5 (cat AND bird)

Note: Since you cannot create a Proximity Search that includes a group clause with an AND between terms, you should rewrite a search such as (jack AND jill) w/10 hill to avoid use of the AND between the terms in the group. For example, you could rewrite this search as (jack w/100 jill) w/10 hill.

Examples:

beach w/5 ball

content::(game w/5 time)

presidential NOT w/2 election

burden pre/5 proof

(jack OR jill) w/10 (pail OR water)

split second w/3 reaction w/3 time

(jack w/100 jill) w/10 hill

You can also group clauses (or a nested group of clauses) and apply a Proximity Search to multiple clauses or to the entire group.

Example:

The following example searches for "stock option" or grant, or "stock appreciation" or award, within 10 ordered words of either company or companies:

(("stock option" OR grant) OR ("stock appreciation" OR award)) pre/10 (company OR companies)

More About Using Chained Proximity (Radius) Operators

When constructing a search query with multiple Proximity (radius) operators, note the following:

  • The order of Proximity operators makes a difference.
  • When parentheses are not used, terms with chained Proximity operators should be read from left to right.
  • Terms with chained Proximity operators must use the same exact occurrence of a word.
  • Each step in the chain of Proximity operators forms a phrase, and the next word in the chain needs to have any portion of that phrase in its radius.

Consider this document text example, which has 2 occurrences of the word cat:

cat 1 2 3 4 tabby 6 copycat 8 9 10 11 happycat 13 14 15 16 17 18 19 persian 21 22 23 24 siamese cat

Using this example, the following searches would have a hit:

cat w/5 tabby w/20 persian

persian w/20 (cat w/5 tabby)

(cat w/5 tabby) AND (cat w/12 persian)

(tabby w/15 persian) w/10 happycat

(tabby w/15 persian) w/10 copycat

(tabby w/15 persian) w/1 happycat

(cat w/5 tabby w/15 persian w/5 siamese) where the search phrase itself has a maximum potential distance of 25

(tabby w/5 cat) w/100 (persian w/30 siamese) where each portion is a phrase, and each phrase is searched within the radius

Using this example and the notes above, the following searches would not have a hit. The first does not meet the radius when evaluated from left to right. The second fails because chained Proximity operators must use the exact same occurrence of a word (in this case, the same cat occurrence):

cat w/20 tabby w/5 persian

cat w/5 tabby w/12 persian
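
The chaining rules can be modeled with a short Python sketch. This is one interpretation that agrees with the unparenthesized examples above, not a description of the engine itself. The names chain_hit and distance_to_span are hypothetical. Each step extends a span of word positions, and the next word must fall within N words of any part of that span.

```python
DOC = ("cat 1 2 3 4 tabby 6 copycat 8 9 10 11 happycat 13 14 15 16 17 18 19 "
       "persian 21 22 23 24 siamese cat").split()

def distance_to_span(pos, lo, hi):
    """Distance in words from a position to the nearest part of a span."""
    if lo <= pos <= hi:
        return 0
    return min(abs(pos - lo), abs(pos - hi))

def chain_hit(tokens, first, steps):
    """Evaluate `first w/N1 word1 w/N2 word2 ...` from left to right.

    `steps` is a list of (n, word) pairs. Each starting occurrence of
    `first` is followed through the whole chain on its own.
    """
    for start, tok in enumerate(tokens):
        if tok != first:
            continue
        spans = {(start, start)}
        for n, word in steps:
            spans = {
                (min(lo, pos), max(hi, pos))
                for lo, hi in spans
                for pos, t in enumerate(tokens)
                if t == word and distance_to_span(pos, lo, hi) <= n
            }
        if spans:
            return True
    return False

print(chain_hit(DOC, "cat", [(5, "tabby"), (20, "persian")]))  # True
print(chain_hit(DOC, "cat", [(20, "tabby"), (5, "persian")]))  # False
print(chain_hit(DOC, "cat", [(5, "tabby"), (12, "persian")]))  # False
```

The last search fails in this model because the cat occurrence that is within 5 words of tabby is not the one that is within 12 words of persian.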

Range Search

Inclusive date, text, or value range:

<field>::[<start_range>~~<end_range>]

Exclusive date, text, or value range:

<field>::{<start_range>~~<end_range>}

Specify a metadata field name followed by :: (two colons) and start and end range separated by ~~.

An inclusive range is enclosed in brackets [] and finds everything between the start range and end range, including the start and end of the range. An exclusive range is enclosed in braces {} and finds everything between the start range and end range, not including the start/end of the range.

Restriction: If you do not use the calendar icon to specify dates (for example, when you use Freeform Search), you must supply the complete format, with the year, month, day, hours, minutes, and seconds separated by hyphens.

Examples:

datemodified::[2000-07-11-00-00-00~~2008-08-11-23-59-59]

size::{100~~1000}

docnum::[3.0.900~~3.0.1000]

Note that the docnum example shows an inclusive range search for the docnum field content, which uses a three-part number to identify a Data Set number (unique per Organization), a Data Set volume ID (unique per Data Set), and a document number (unique per Data Set volume). You must specify the entire three-part number when searching for a docnum value, since the field does not support wildcards.

If you specify a range search for a padded field such as the size field, the Query Executed shows the supplied range padded with leading zeros.
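
When you type a date range by hand, it is easy to omit part of the complete format. The following Python sketch builds the range text from datetime values. The function name date_range is hypothetical, and the format string is inferred from the datemodified example above.

```python
from datetime import datetime

# Year, month, day, hours, minutes, seconds, all separated by hyphens.
DATE_FORMAT = "%Y-%m-%d-%H-%M-%S"

def date_range(field, start, end, inclusive=True):
    """Build an inclusive [..] or exclusive {..} date range search."""
    opener, closer = ("[", "]") if inclusive else ("{", "}")
    return (f"{field}::{opener}{start.strftime(DATE_FORMAT)}"
            f"~~{end.strftime(DATE_FORMAT)}{closer}")

print(date_range("datemodified",
                 datetime(2000, 7, 11),
                 datetime(2008, 8, 11, 23, 59, 59)))
# datemodified::[2000-07-11-00-00-00~~2008-08-11-23-59-59]
```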

Special Email & Domain Searches

email-custodian::<emailaddr>

fullparticipants::<emailaddr>

participants::<emailaddr>

altparticipants::<emailaddr>

sentdomains::<domain>

rcvddomains::<domain>

participantdomains::<domain>

email-custodian::<emailaddr> searches for an email participant based on participant criteria (from the participants or altparticipants fields). Specifying an email address finds all messages with that email address in the fields altbcc and bcc, altcc and cc, altfrom and from, altsender and sender, and altto and to.

fullparticipants::<emailaddr> searches for all participant information from the fields altbcc and bcc, altcc and cc, altfrom and from, altsender and sender, and altto and to. This consolidated field also includes full name values (for example, Bill Smith <bsmith@someco.com>).

participants::<emailaddr> searches for an email participant from the fields bcc, cc, from, sender, and to, without full name values.

altparticipants::<emailaddr> searches for an email participant from the fields altbcc, altcc, altfrom, altsender, and altto, without full name values.

Domain Searches (sentdomains::<domain>, rcvddomains::<domain>, or participantdomains::<domain>) search for the sending, receiving, or participating domain part of an email.

Examples:

email-custodian::james@myco.com

fullparticipants::"all someco*"

fullparticipants::"Bill Smith <bsmith@someco.com>"

participants::bsmith@someco.com

altparticipants::jill.larkin@bigco.com

sentdomains::internet.com

rcvddomains::digitalreefinc.com

participantdomains::*pwcglobal*

Token (Pattern Occurrence) Search (for data processed prior to 4.3.11.0)

'token-<token_name>'

For data processed prior to Release 4.3.11.0, a token search enables you to find documents matching a particular type of content controlled by Patterns (regular expressions). When a Pattern is enabled, you can search for documents with that type of content using the Pattern’s token name, in lowercase format. For example, System Patterns for email addresses, UNC paths, and URIs/URLs are enabled by default, so you can search for documents with that content by specifying 'token-email', 'token-unc', or 'token-uri', respectively.

Note: For Projects with data processed using V2 tokenization, you must enclose a token search within single quotes. Projects that use V1 tokenization do not require the token searches to be placed within single quotes, although doing so will still yield expected results. Also note that the token- format works the same, and has the same requirement for being placed within single quotes.

For data processed prior to Release 4.3.11.0, the software also uses tokens to identify errors (for example, token-error_unknown_type, token-error_no_content, and token-error_parsing). You might see these types of tokens in a Clustered view of pre-4.3.11.0 data in the Top Terms list for a Cluster. See Tokens and Token Search for more information about the supported tokens. See System Patterns and Numeric Settings for a list of all of the System Patterns and their associated tokens.

Examples:

'token-email'

'token-uri'

'token-ssn'

'token-image' (for pre-4.3.11.0 data)

'token-asvg' (for pre-4.3.11.0 data)

Wildcard Search

* (asterisk) matches one or more characters.

? (question mark) matches one character.

Use wildcards anywhere within a term, field, or phrase with document content or metadata field content (for tokenized fields such as bcc, cc, to, and from). However, use wildcards carefully. Using a wildcard character as the first character of a term makes the search query more likely to expand to a term limit that the system may not be able to accommodate (based on available resources), and that causes the query to fail. Should a search fail for this reason (in which case, you see an error), refine your search. The system makes a best effort to handle term expansion based on system resources.

In general, when building queries, it is best to avoid an overly broad use of wildcards or a standalone wildcard. If your query is the equivalent of having a standalone * wildcard, the search will generate an error with a message. In this case, adjust and rerun the query. A standalone ? may not generate an error, but it will not give useful results, as it will match any document with a standalone character.

Wildcards can help search multi-term metadata fields that are untokenized (for example, importpath).

Examples:

"glob* aware"

"pea* and quiet"

conte?t

scien*

sp*n

title::court*
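
To reason about which single words a wildcard term can match, you can model the two wildcards as a regular expression. The following Python sketch follows the definitions above (* as one or more characters, ? as exactly one character). The function names are hypothetical, and the sketch says nothing about how the system expands terms internally or about term limits.

```python
import re

def wildcard_to_regex(term):
    """Translate a wildcard term into a compiled regular expression."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".+")  # one or more characters, as defined above
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    # Term searches are not case-sensitive, so ignore case here too.
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_match(term, word):
    """True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

print(wildcard_match("conte?t", "context"))  # True
print(wildcard_match("sp*n", "spoon"))       # True
print(wildcard_match("sp*n", "spn"))         # False
```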

Usage Notes

If your Search query expands to more terms than the system can reasonably accommodate based on resources, you should refine your Search. This is likely to happen when your query includes a broad use of wildcards and/or Patterns.