System Patterns and Numeric Settings

This topic describes the default System Patterns and the available Numeric Settings.

About System Patterns

A Pattern is a sequence of characters typically used to perform a pattern match. This method of pattern matching is widely used in computer programming for its simplicity and power. You can use Patterns to identify patterned data during the parsing process.

The Digital Reef system includes a number of predefined Patterns, called System Patterns. The email, UNC, and URI System Patterns are enabled by default. In new cases, these Patterns store values by default.

For data that is newly imported, updated, or reprocessed in 4.3.11.0 or later, each Pattern has a Pattern name. You can search for an enabled Pattern by specifying its Pattern name in the pattern metadata field. The software normalizes the name to lowercase, so you can enter it in uppercase or lowercase. For example, the email Pattern is enabled by default, so you can search for documents matching the Pattern:

pattern::email

If a Pattern is enabled with values stored, then you can also search for a specific value using the patternvalue metadata field. For this search, you must place the value within single quotes (for a literal search). For example, the email Pattern is enabled with values stored, so you can search for documents matching a Pattern value:

patternvalue::'jsmith@someco.com'

For data processed prior to 4.3.11.0, each Pattern has a Token name. The software normalizes the Token name to lowercase, so you can specify it in uppercase or lowercase. Search using the following format:

'token-<token_name>'
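The two search formats can be generated from a Pattern name with a small helper. The following Python sketch is illustrative only: the function name pattern_search and the legacy flag are hypothetical and are not part of the Digital Reef software. Lowercasing the name simply mirrors the normalization described above.

```python
def pattern_search(name, legacy=False):
    """Build a search string for an enabled Pattern.

    legacy=False -> pattern field search (data processed in 4.3.11.0 or later)
    legacy=True  -> Token name search (data processed prior to 4.3.11.0)
    """
    # The software normalizes names to lowercase; doing it here is optional.
    normalized = name.strip().lower()
    if legacy:
        return "'token-" + normalized + "'"
    return "pattern::" + normalized


print(pattern_search("EMAIL"))             # pattern::email
print(pattern_search("SSN", legacy=True))  # 'token-ssn'
```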

In general, a Pattern name must be unique within the Project and is subject to validation upon creation or edit. The name can include alphanumeric characters, spaces between characters in the name (leading and trailing spaces are ignored), and some supported characters (such as a hyphen, underscore, and apostrophe). During validation, the software will also allow characters from foreign languages (for example, Korean characters). However, the following characters are not supported for Pattern names and will generate an error message indicating that your entry contains invalid characters:

! " # $ % & * + . / : ; < = > ? @ [ \ ] ^ { | } ~ “ ”
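If you want to check a candidate Pattern name before entering it, the rule above can be approximated as follows. This Python sketch only tests for the unsupported characters listed in this topic; it does not check uniqueness within the Project, and the function name is hypothetical.

```python
# Characters this topic lists as unsupported in Pattern names
# (including the left and right curly double quotes).
INVALID_NAME_CHARS = set('!"#$%&*+./:;<=>?@[\\]^{|}~\u201c\u201d')


def invalid_chars_in_name(name):
    """Return the unsupported characters found in a candidate Pattern name."""
    # Leading and trailing spaces are ignored, as described above.
    trimmed = name.strip()
    return sorted({ch for ch in trimmed if ch in INVALID_NAME_CHARS})


print(invalid_chars_in_name("  my-pattern_1  "))  # []
print(invalid_chars_in_name("bad/name?"))         # ['/', '?']
```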

Note: See About Tokens and Patterns for more information about Patterns.

Each System Pattern is described in the sections that follow.

Social Security Numbers

Pattern Name: ssn

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::ssn

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-ssn'

Default State: Disabled

RegEx: \b([0-6]\d{2}|7[0-6]\d|77[0-2])(([ ]\d{2}[ ])|([\-]\d{2}[\-])|\d{2})(\d{4})\b

Description: The Social Security Number regex matches Social Security Numbers that conform to the SSN numbering rules and that are rendered in any combination of these formats:

  • 078051120
  • 078-05-1120
  • 078 05 1120

Note: The example SSN is not a valid number. You can find out more information about the history of this number from the Social Security Administration's web site.
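To see how the ssn RegEx behaves outside the product, you can try it in a general-purpose regular expression engine. The sketch below uses Python's re module, which may differ in minor ways from the engine that Digital Reef uses; the function name find_ssns is hypothetical. Note that the separators must be consistent within a number, and that area numbers above 772 are rejected.

```python
import re

# The ssn System Pattern RegEx, copied from this topic.
SSN_RE = re.compile(
    r"\b([0-6]\d{2}|7[0-6]\d|77[0-2])(([ ]\d{2}[ ])|([\-]\d{2}[\-])|\d{2})(\d{4})\b"
)


def find_ssns(text):
    """Return every substring of text that the ssn RegEx matches."""
    return [m.group(0) for m in SSN_RE.finditer(text)]


print(find_ssns("078051120, 078-05-1120, 078 05 1120"))
print(find_ssns("078-05 1120"))  # mixed separators: no match
print(find_ssns("900-12-3456"))  # area number out of range: no match
```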

Credit Card Numbers

Pattern Name: creditcard

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::creditcard

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-creditcard'

Default State: Disabled

RegEx: \b(\d{4}[ -]?\d{4}[ -]?\d{5}|\d{4}[ -]?\d{6}[ -]?\d{4}|\d{4}[ -]?\d{7}[ -]?\d{4}|\d{4}[ -]?\d{6}[ -]?\d{5}|\d{4}([ -]?\d{4}){3}|(\d{17,19}))\b

Description: This Pattern matches all known credit card formats. For this Pattern, the Digital Reef software performs special processing to ensure that each matched number passes the Luhn checksum used by all credit cards.

Examples:

  • 4539531370575900
  • 4539 5313 7057 5900
  • 4539-5313-7057-5900
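The Luhn checksum mentioned in the description can be sketched as follows. This is the standard Luhn algorithm combined with the creditcard RegEx from this topic; it is an assumption that the product's special processing is equivalent, and the function names are hypothetical.

```python
import re

# The creditcard System Pattern RegEx, copied from this topic.
CREDITCARD_RE = re.compile(
    r"\b(\d{4}[ -]?\d{4}[ -]?\d{5}|\d{4}[ -]?\d{6}[ -]?\d{4}"
    r"|\d{4}[ -]?\d{7}[ -]?\d{4}|\d{4}[ -]?\d{6}[ -]?\d{5}"
    r"|\d{4}([ -]?\d{4}){3}|(\d{17,19}))\b"
)


def luhn_ok(candidate):
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return bool(digits) and total % 10 == 0


def find_card_numbers(text):
    """Return RegEx matches in text that also pass the Luhn checksum."""
    return [m.group(0) for m in CREDITCARD_RE.finditer(text)
            if luhn_ok(m.group(0))]


print(find_card_numbers("4539 5313 7057 5900 and 4539 5313 7057 5901"))
# ['4539 5313 7057 5900']
```

The second number in the usage line has the right shape but fails the checksum, so it is filtered out.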

Phone Numbers (North American Numbering Plan)

Pattern Name: phone

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::phone

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-phone'

Default State: Disabled

RegEx: \(?\b[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}\b

Description: Matches a 10-digit phone number. The area code can be in parentheses, and the digit groupings can be separated by periods, spaces, or hyphens, as in these examples:

  • (123) 123-4567
  • 123 123 4567
  • 123.123.4567
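The following Python sketch applies the phone RegEx to sample text. It uses Python's re module as a stand-in for the product's engine, and the function name is hypothetical.

```python
import re

# The phone System Pattern RegEx, copied from this topic.
PHONE_RE = re.compile(r"\(?\b[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}\b")


def find_phone_numbers(text):
    """Return every substring of text that the phone RegEx matches."""
    return [m.group(0) for m in PHONE_RE.finditer(text)]


print(find_phone_numbers("Call (123) 123-4567 or 123.123.4567."))
# ['(123) 123-4567', '123.123.4567']
```

Because the RegEx requires exactly 10 digits between word boundaries, shorter or longer digit runs such as 12345 or 12345678901 are not matched.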

Email Addresses

Pattern Name: email

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::email

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-email'

Default State: Enabled with values stored

RegEx: (?i)(?<=^|[^a-z0-9!#$%\\&'*+/=?\^_`{|}~-])[a-z0-9!#$%\&'*+/=?\^_`{|}~-]{1,256}(?:\.[a-z0-9!#$%\&'*+/=?\^_`{|}~-]{1,256}){0,256}@(?:[a-z0-9](?:[a-z0-9-]{0,256}[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]{0,256}[a-z0-9])?(?=$|[^a-z0-9])

Description: Matches any email address as defined by RFC 2822. As long as the Store Value check box is enabled (the default), matching email addresses are normalized to lowercase rather than preserved in their original case. You can specify the @ sign when searching for the stored email addresses.

Dates (North American format)

Pattern Name: date_us 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::date_us

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-date_us'

Default State: Disabled

RegEx:\b(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}\b

Description: Supports the following date formats and ranges:

  • 1/1/00 to 12/31/99
  • 01/01/1900 to 12/31/2099

Dates (European format)

Pattern Name: date_euro

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::date_euro

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-date_euro'

Default State: Disabled

RegEx:\b(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])\b

Description: Supports the following date formats and ranges:

  • 00-1-1 to 99-12-31
  • 1900-01-01 to 2099-12-31
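Because the date_us and date_euro Patterns differ only in field order, some strings match both. The Python sketch below, which uses the two RegEx values from this topic and a hypothetical function name, shows which Pattern names a given string would match. It also shows that the RegEx checks only the numeric ranges of each field, not whether the date exists on the calendar.

```python
import re

# The date_us and date_euro System Pattern RegEx values, copied from this topic.
DATE_US_RE = re.compile(
    r"\b(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}\b"
)
DATE_EURO_RE = re.compile(
    r"\b(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])\b"
)


def matching_date_patterns(text):
    """Return the names of the date Patterns that match somewhere in text."""
    names = []
    if DATE_US_RE.search(text):
        names.append("date_us")
    if DATE_EURO_RE.search(text):
        names.append("date_euro")
    return names


print(matching_date_patterns("12/31/2099"))  # ['date_us']
print(matching_date_patterns("2099-12-31"))  # ['date_euro']
print(matching_date_patterns("01-02-03"))    # ['date_us', 'date_euro']
print(matching_date_patterns("02/31/2020"))  # ['date_us'], although February 31 does not exist
```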

IPV4 Addresses

Pattern Name: ipv4 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::ipv4

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-ipv4'

Default State: Disabled

RegEx:\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

Description: Supports any octet value from 0 to 255 with dots. For example:

  • 192.168.123.234
  • 192.16.1.1
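The octet range check is built into the ipv4 RegEx itself, as the following Python sketch illustrates. The function name is hypothetical, and Python's re module stands in for the product's engine.

```python
import re

# The ipv4 System Pattern RegEx, copied from this topic.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)


def find_ipv4(text):
    """Return every substring of text that the ipv4 RegEx matches."""
    return [m.group(0) for m in IPV4_RE.finditer(text)]


print(find_ipv4("hosts 192.168.123.234 and 192.16.1.1"))
# ['192.168.123.234', '192.16.1.1']
print(find_ipv4("256.1.1.1"))  # [] because 256 is outside 0 to 255
```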

UNC Paths

Pattern Name: unc

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::unc

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-unc'

Default State: Enabled with values stored

RegEx:(?i)(\b[^/:*\?"<>|\r\n\t\x00-\x1F]:|\\\\[a-z0-9]+)(\\[^/:*\?"<>|\r\n\t\x00-\x1F]{0,256}){1,256}

Description: Matches valid local and UNC paths. An incomplete path, such as c: or \\servername, is not returned as a match. By default, matching values are preserved in their original case, which could be uppercase, lowercase, or mixed case. This enables you to search for the value using the case in which it was stored.

Examples of local path values:

  • c:\
  • c:\Data\Subfolder
  • c:\Data\Subfolder\
  • c:\Data\Subfolder\MyFile.txt
  • c:\My Documents\My Letters
  • c:\My Documents\My Letters\
  • c:\My Documents\My Letters\Letter to Mum.txt

In general, in your search for a local path that contains one or more backslash (\) characters, you must escape each \ in the query. You must also match the case of the value exactly (for example, if the value has a C:, you cannot find a match with c:).

For data processed in 4.3.11.0 or later, you use a patternvalue field search to find a local path, escaping each backslash character and enclosing the entire value within single quotes. For example, to search for the path value c:\My Documents\My Letters, you format the search as follows:

patternvalue::'c:\\My Documents\\My Letters'

For data processed prior to 4.3.11.0, you enclose the entire path value within single quotes, as follows:

'c:\\My Documents\\My Letters'

Examples of UNC path values:

  • \\server\
  • \\server\c$
  • \\server\c$\
  • \\server\c$\autoexec.bat
  • \\server\data\Subfolder
  • \\server\data\Subfolder\
  • \\server\data\Subfolder\MyFile.txt
  • \\server\docs\My Letters
  • \\server\docs\My Letters\
  • \\server\docs\My Letters\Letter to Mum.txt

In general, in your search for a UNC path that contains one or more backslash (\) characters, you must escape each \ in the query. You must also match the case of the value exactly.

For data processed as of 4.3.11.0, you perform a patternvalue search for a UNC path that escapes each backslash character and encloses the entire value within single quotes. For example, to search for \\server\, you format the search as follows:

patternvalue::'\\\\server\\'

For data processed prior to 4.3.11.0, you enclose the entire value within single quotes, as follows:

'\\\\server\\'
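The escaping rule for local and UNC paths is mechanical: double every backslash, then wrap the value in single quotes. The Python sketch below builds both search forms; the function names are hypothetical, and the sketch does not attempt to handle values that themselves contain a single quote, because this topic does not describe that case.

```python
def patternvalue_query(value):
    """Search form for data processed in 4.3.11.0 or later."""
    # Escape each backslash by doubling it; case is left untouched because
    # path values must be matched in the case in which they were stored.
    escaped = value.replace("\\", "\\\\")
    return "patternvalue::'" + escaped + "'"


def legacy_value_query(value):
    """Search form for data processed prior to 4.3.11.0."""
    return "'" + value.replace("\\", "\\\\") + "'"


print(patternvalue_query(r"c:\My Documents\My Letters"))
# patternvalue::'c:\\My Documents\\My Letters'
print(legacy_value_query("\\\\server\\"))
# '\\\\server\\'
```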

Common Uniform Resource Identifiers (URIs/URLs)

Pattern Name: uri 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::uri

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-uri'

Default state: Enabled with values stored

RegEx: \b(?i)(file|gopher|news|nntp|telnet|ftps?|https?|sftp|ldaps?)://([-A-Z0-9.]+)(:[0-9]+)?(/[-A-Z0-9+\&@#$/%=~_|!:,.;()'*]*)?(\?[-A-Z0-9+\\&@#$/%=~_|!:,.;()'*]*)?

Description: Matches the most common URI schemes (file, gopher, news, nntp, telnet, ftp, ftps, http, https, sftp, ldap, ldaps) defined in RFC 3986. By default, matching values are preserved in their original case, which could be uppercase, lowercase, or mixed case. This enables you to search for the value using the case in which it was stored.

Examples of URI values:

https://courseweb.pitt.edu

http://www.state.gov/s/ct/

http://www.InternetNews.com

You can search for the stored URI/URL value, depending on how your data was processed.

For data processed in 4.3.11.0 or later, you use a patternvalue field search in which you enclose the entire URI/URL value within single quotes and use the correct case that reflects how the value was stored. For example:

patternvalue::'http://www.InternetNews.com'

For data processed prior to 4.3.11.0, you enclose the entire URI/URL value within single quotes and observe the correct case for the stored value. For example:

'http://www.InternetNews.com'

Patterns for Personally Identifiable Information (PII)

The remaining Patterns, described in alphabetical order, support Personally Identifiable Information (PII) from various countries and regions.

For data processed in 4.3.11.0 or later, you can search for any of these PII Patterns using the pattern field and the pattern name, as long as the Patterns are enabled. For example:

pattern::asvg

For data processed prior to 4.3.11.0, you can search for any of these PII Patterns using the format 'token-<token name>' for an enabled Pattern, without regard to case (that is, you can specify the Pattern's Token name as either lowercase or uppercase, since the name is normalized to lowercase). For example:

'token-asvg' or 'token-ASVG'

AK

Pattern Name: AK 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::ak

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-ak'

Default state: Disabled

RegEx: \b[3-6]\d{2}(0[1-9]|1[012])(0[1-9]|[12]\d|3[01])\d{4}\b

Description: Lithuania: Personal code (Asmens kodas).

Example:

45911231023

ASVG

Pattern Name: ASVG 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::asvg

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-asvg'

Default state: Disabled

RegEx:\b[0-9]{10}\b

Description: Austria: Social insurance number.

Example:

1234030869

AVS

Pattern Name: AVS 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::avs

Search using the Token Name (for data processed prior to 4.3.11.0) :'token-avs'

Default state: Disabled

RegEx:\b[0-9]{3}\.?[0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\b

Description: Switzerland: Old AVS format with personal data encoded.

Example:

324.65.242.000

AVS-2008

Pattern Name: AVS-2008 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::avs-2008

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-avs-2008'

Default state: Disabled

RegEx:\b756\.?[0-9]{4}\.?[0-9]{4}\.?[0-9]{2}\b

Description: Switzerland: New AVS format (13 digits with the constant prefix 756, which is the ISO 3166-1 numeric country code for Switzerland).

Example:

756.5152.7017.84

BE-ID

Pattern Name: BE-ID 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::be-id

Search using the Token Name (for data processed prior to 4.3.11.0) :'token-be-id'

Default state: Disabled

RegEx: \b\d{2}[.]?(0[1-9]|1[012])[.]?(0[1-9]|[12]\d|3[01])-?\d{3}[.]?\d{2}\b

Description: Belgium: Identification number of the National Register. Also used on SIS (social security) card.

Example:

70.01.16-287.31

BSN

Pattern Name: BSN 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::bsn

Search using the Token Name (for data processed prior to 4.3.11.0) :'token-bsn'

Default state: Disabled

RegEx:\b[0-9]{9}\b

Description: Netherlands: Burgerservicenummer, sofinummer (Citizen's Service Number).

Example:

987654321

CF

Pattern Name: CF

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::cf

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-cf'

Default state: Disabled

RegEx:[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}

Description: Italy: Codice fiscale.

Example:

PLDTLL47S04L424T

CNF

Pattern Name: CNF 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::cnf

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-cnf'

Default state: Disabled

RegEx: \b[1-9]\d{2}(0[1-9]|1[012])(0[1-9]|[12]\d|3[01])\d{6}\b

Description: Romania: Nr personal.

Example:

2121212121218

ČOP

Pattern Name: ČOP  

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::čop

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-čop'

Default state: Disabled

RegEx:\b[A-Z]{2}[0-9]{6}\b

Description: Czech Republic, Slovakia: Citizen's Identification Card Number (Číslo občianskeho preukazu).

Example:

AB379999

CPR

Pattern Name: CPR 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::cpr

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-cpr'

Default state: Disabled

RegEx: \b(0[1-9]|[12]\d|3[01])(0[1-9]|1[012])\d{2}-?\d{4}\b

Description: Denmark: CPR-nummer (personnummer).

Example:

020955-2017

DNI

Pattern Name: DNI 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::dni

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-dni'

Default state: Disabled

RegEx:\b[0-9,X,M,L,K,Y][0-9]{7}-?[A-Z]\b

Description: Spain: Documento Nacional de Identidad.

Example:

99999999-R

EGN

Pattern Name: EGN 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::egn

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-egn'

Default state: Disabled

RegEx: \b\d{2}([024][1-9]|[135][012])(0[1-9]|[12]\d|3[01])\d{4}\b

Description: Bulgaria: Uniform Civil Number (Bulgarian: Единен граждански номер).

Example:

7608010133

FN

Pattern Name: FN 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::fn

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-fn'

Default state: Disabled

RegEx: \b(0[1-9]|[12]\d|3[01])[.]?([04][1-9]|[15][012])[.]?\d{2}[ ]?\d{5}\b

Description: Norway: Fødselsnummer.

Example:

010168 46647

HETU

Pattern Name: HETU 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::hetu

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-hetu'

Default state: Disabled

RegEx: \b(0[1-9]|[12]\d|3[01])[.]?(0[1-9]|1[012])[.]?\d{2}[+\-A]\d{3}[0-9A-Z]\b

Description: Finland: Personal identity code (henkilötunnus).

Example:

101052-719E

IBAN

Pattern Name: IBAN 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::iban

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-iban'

Default state: Disabled

RegEx: \b[A-Z]{2}?[ ]?\d{2}[ ]?([0-9A-Z]{4}[ ]?){1,5}[0-9A-Z]{1,4}\b

Description: Europe: ISO 13616 with ISO 3166 country code prefix in compact format.

Example:

PL 10 1140 2017 0000 4202 0971 531

IK

Pattern Name: IK 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::ik

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-ik'

Default state: Disabled

RegEx: \b[1-6]\d{2}(0[1-9]|1[012])(0[1-9]|[12]\d|3[01])\d{4}\b

Description: Estonia: Isikukood (personal code)

Example:

47111119876

NHS

Pattern Name: NHS

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::nhs

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-nhs'

Default state: Disabled

RegEx:\b[0-9]{3}[ -]?[0-9]{3}[ -]?[0-9]{4}\b

Description: UK: UK NHS Number

Example:

401 023 2137

NI

Note: As of 4.3.10.0, the NI pattern is no longer included in a Patterns template for a new Organization or a new System Patterns Template. It remains in existing Projects.

Pattern Name: NI

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::ni

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-ni'

Default state: Disabled

RegEx:\b[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]{1}[0-9]{6}[A-DFM]?\b

Description: UK: National identification number

Example:

JG103759A

NINO

Pattern Name: NINO 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::nino

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-nino'

Default state: Disabled

RegEx: (?i)\b(?!BG)(?!GB)(?!NK)(?!KN)(?!TN)(?!NT)(?!ZZ)(?:[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z])[ ]?[0-9]{2}[ ]?[0-9]{2}[ ]?[0-9]{2}[ ]?([A-DFMP]\b|[ ])

Description: UK: National insurance number

Examples:

JG 12 13 16 A

AB123456C
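The nino RegEx uses negative lookaheads to reject prefixes that are not allocated (BG, GB, NK, KN, TN, NT, and ZZ) and the (?i) flag to ignore case. The Python sketch below applies it to sample text; the function name is hypothetical. Because the final group also accepts a trailing space in place of a suffix letter, the sketch strips whitespace from each match.

```python
import re

# The nino System Pattern RegEx, copied from this topic.
NINO_RE = re.compile(
    r"(?i)\b(?!BG)(?!GB)(?!NK)(?!KN)(?!TN)(?!NT)(?!ZZ)"
    r"(?:[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z])"
    r"[ ]?[0-9]{2}[ ]?[0-9]{2}[ ]?[0-9]{2}[ ]?([A-DFMP]\b|[ ])"
)


def find_ninos(text):
    """Return nino RegEx matches in text, with surrounding spaces removed."""
    return [m.group(0).strip() for m in NINO_RE.finditer(text)]


print(find_ninos("ref JG 12 13 16 A."))  # ['JG 12 13 16 A']
print(find_ninos("ab123456c"))           # ['ab123456c']
print(find_ninos("GB123456A"))           # [] because GB is an excluded prefix
```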

NIR

Pattern Name: NIR 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::nir

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-nir'

Default state: Disabled

RegEx: \b[123478][ ]?\d{2}(0[1-9]|1[012])[ ]?(\d{5}|2[AB]\d{3})[ ]?\d{3}[ ]?\d{2}\b

Description: France: Social security number (INSEE)

Example:

1 5102 46102 043 25

Personnr

Pattern Name: Personnr 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::personnr

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-personnr'

Default state: Disabled

RegEx: \b\d{2}(0[1-9]|1[012])(0[1-9]|[12]\d|3[01])[-+]\d{4}\b

Description: Sweden: Personal id number

Example:

610321-3499

PESEL

Pattern Name: PESEL 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::pesel

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-pesel'

Default state: Disabled

RegEx: \b\d{2}(0[1-9]|1[012])(0[1-9]|[12]\d|3[01])\d{5}\b

Description: Poland: National identification number

Example:

44051401458

PK-Germany

Pattern Name: PK-Germany 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::pk-germany

Search using the Token Name (for data processed prior to 4.3.11.0) :'token-pk-germany'

Default state: Disabled

RegEx: \b(0[1-9]|[12]\d|3[01])(0[1-9]|1[012])\d{2}-?[A-Z]-?\d{5}\b

Description: Germany: Personenkennziffer (Bundeswehr)

Example:

261083-C-20917

PK-Latvia

Pattern Name: PK-Latvia 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::pk-latvia

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-pk-latvia'

Default state: Disabled

RegEx: \b(0[1-9]|[12]\d|3[01])(0[1-9]|1[012])\d{2}-?[0-2]\d{4}\b

Description: Latvia: Personal no (Personas kods)

Example:

161171-22345

PPS

Pattern Name: PPS 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::pps

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-pps'

Default state: Disabled

RegEx:\b[0-9]{7}[A-Z]W?\b

Description: Ireland: Personal Public Service Number

Examples:

1234567A

1234567FW

RČ

Pattern Name: RČ

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::rč

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-rč'

Default state: Disabled

RegEx: \b\d{2}([05][1-9]|[16][012])(0[1-9]|[12]\d|3[01])/?\d{4}\b

Description: Czech Republic, Slovakia: Birth Number (Rodné číslo)

Example:

685229/4449

ssPIN

Pattern Name: ssPIN 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::sspin

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-sspin'

Default state: Disabled

RegEx: (?<=^|[^A-Za-z0-9+/=])[A-Za-z0-9+/]{22}([A-Za-z0-9+/]{4})?[A-Za-z0-9+/=]{2}(?=$|[^A-Za-z0-9+/=])

Description: Austria: New national identification number

Example:

MDEyMzQ1Njc4OWFiY2RlZg==

Steuer-ID

Pattern Name: Steuer-ID 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::steuer-id

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-steuer-id'

Default state: Disabled

RegEx: \b[0-9]{2}[ ]?[0-9]{3}[ ]?[0-9]{3}[ ]?[0-9]{3}\b

Description: Germany: Steuer-Identifikationsnummer

Example:

71 214 053 962

Szam

Pattern Name: Szam 

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::szam

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-szam'

Default state: Disabled

RegEx: \b[1-8][ ]?\d{2}([024][1-9]|[135][012])(0[1-9]|[12]\d|3[01])[ ]?\d{4}\b

Description: Hungary: Personal identification number (Személyi szám)

Example:

1 651105 6666

TAJ

Pattern Name: TAJ

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::taj

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-taj'

Default state: Disabled

RegEx: \b[0-9]{3}[ ]?[0-9]{3}[ ]?[0-9]{3}\b

Description: Hungary: Social insurance number (TAJ szám)

Example:

123 456 789

Tautotita

Pattern Name: Tautotita

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::tautotita

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-tautotita'

Default state: Disabled

RegEx: \b([A-Z]|[ABEZHIKMNOPTYX]{2})-?\d{6}\b

Description: Greece: Tautotita

Examples:

XZ-460380

S-737194

VSNR-RVNR

Pattern Name: VSNR-RVNR

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::vsnr-rvnr

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-vsnr-rvnr'

Default state: Disabled

RegEx: \b\d{2}(0[1-9]|[12]\d|3[01])(0[1-9]|1[012])\d{2}[A-Z]\d{3}\b

Description: Germany: Versicherungsnummer, Rentenversicherungsnummer

Example:

65170839J003

ZMR-Zahl

Pattern Name: ZMR-Zahl

Search using the Pattern Field (for data processed as of 4.3.11.0): pattern::zahl

Search using the Token Name (for data processed prior to 4.3.11.0) : 'token-zahl'

Default state: Disabled

RegEx:\b[0-9]{12}\b

Description: Austria: National identification number - Zentrales Melderegister (Central Register of Residents - CRR)

Example:

597109862729

About Numeric Settings

Numeric settings affect the parsing of document content. These settings are managed as part of the Index Settings for the Project.

The Project Index Settings include Parsing Settings that enable you to control the parsing of terms representing either currency, numeric quantities, or numeric terms. By default, these numeric settings are enabled, which allows the numeric values to be stored, and users can search for the appropriate numeric content. Disabling these settings means that numerics are not part of the Index. If you import with these settings disabled, you must reprocess or perform another import to have the numerics in the Index.

Consider the following when working with numeric settings:

  • For data processed prior to 4.3.11.0, although the numeric settings are enabled to permit search of individual numeric terms, the tokens reserved to identify the different types of numeric content are currently disabled (token-quantity, token-currency, and token-numeric_term), and you cannot use them to search for numeric content. Therefore, performing a search using the numeric tokens will not produce results.
  • Numeric Index Settings do not apply to Metadata-only Index types (System or File), since they are for content, not metadata.
  • Enabling all three numeric settings avoids concern over the formats supported by a particular numeric setting.
  • Numeric content will not appear in Clusters.
  • Custom Patterns can also be created to employ numeric content, if desired.

Numeric Quantities

Token Name: token-quantity (applies to data processed prior to 4.3.11.0)

Default State: Disabled; values not captured

Description: The following examples show how the parsing process identifies numeric quantities:

  • One or more numbers, which can be separated by a comma (,) or a period. Examples:
    • 100
    • 123,456
    • 1.1
  • A different base or radix, such as 0xA26.
  • Scientific notation, for example, 6.022×10^23.

The following formats are not identified as a single numeric quantity term by the parsing process:

  • Non-standard formats such as 5 476,85. The space separator is not recognized by the parser. A search for this (without quotes) will identify this as two terms.
  • Shorthand that includes caps (for example, 10M), because the parser treats the cap as lowercase.

Numeric Currencies

Token Name: token-currency (applies to data processed prior to 4.3.11.0)

Default state: Disabled; values not captured

Description: A currency value within a document must be unambiguous. To be recognized as a currency, a string must begin with one of these currency symbols ($, €, £, ¥), immediately followed by a numerical quantity. (Note that the results are not currently highlighted.)

The following representations of currency are not recognized as currency values:

Numerical Terms

Token Name: token-numeric_term (applies to data processed prior to 4.3.11.0)

Default state: Disabled; values not captured

Description: A numerical term contains numbers and other characters but does not match the definition of a numerical quantity or numerical currency. Examples include part numbers, serial numbers, phone numbers, chemical compounds, and percentages. Note that the hyphen character (-) is allowed in a numerical term. Unrecognized numerical quantities may be recognized as numerical terms instead. Examples:

  • 1-1
  • 75bn
  • 100% (results not currently highlighted)

In addition, one or more numbers followed immediately by a single occurrence of m, b, or k is accepted as a numeric term. Examples:

  • 135m
  • 1.212k
  • 1,345b
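The shorthand rule above can be approximated with a regular expression. The Python sketch below is not the parser's actual logic; the expression and the function name are assumptions made for illustration. It treats digits, optionally grouped with commas or periods, followed by exactly one m, b, or k as shorthand.

```python
import re

# Assumed shape of the shorthand: digits (optionally grouped with commas or
# periods) followed by exactly one of m, b, or k. Not the product's parser.
SHORTHAND_RE = re.compile(r"\d+(?:[.,]\d+)*[mbk]")


def is_shorthand_term(term):
    """Return True if the whole term looks like number-plus-m/b/k shorthand."""
    return SHORTHAND_RE.fullmatch(term) is not None


for term in ["135m", "1.212k", "1,345b", "75bn", "100%"]:
    print(term, is_shorthand_term(term))
```

In this sketch, 75bn and 100% return False because they do not end in a single m, b, or k, even though the topic lists both as numerical terms under the broader definition.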