Supported Languages for Language Detection

When the Language Detection feature is enabled, the Digital Reef software can detect the languages listed below in the documents of supplied data sets. A user with the appropriate permissions can enable Language Detection as follows:

Note: Language Detection is enabled by default for data import operations to identify supported languages. If Language Detection is disabled, reports will not show the detected languages or dominant languages.

The metadata fields language and dominantlanguage record the detected languages and the dominant language of a given document, using the letter codes listed below. You can search either field using a language code (for example, dominantlanguage:ar). All codes consist of two letters except those for the Chinese languages (zh-si and zh-tr).

You can also check the report for a view (for example, all of Imports, a Data Set in Imports, or Project Data) to see the document count per language and the dominant languages discovered during import.

In the Document Count per Language chart, a document that contains multiple languages is counted once for each language detected. In the Dominant Languages chart, such a document is counted only once, for its dominant language.

In both charts, documents whose language cannot be determined are identified as unknown. Not present identifies documents that were not subject to Language Detection at import, either because the feature was disabled at import for that Data Set, or because the documents had no content, were identified as binary files such as images (when OCR processing is disabled at import), or were not parsed successfully.
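The counting rules for the two charts can be sketched in Python. This is a minimal illustration, not the product's implementation: the Doc record, its fields, and both functions are hypothetical, and the sketch assumes that a document whose detection ran but returned no language is counted as unknown in both charts.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

NOT_PRESENT = "Not present"
UNKNOWN = "unknown"


@dataclass
class Doc:
    # Hypothetical per-document record; not the product's data model.
    # detection_ran is False when the document was not subject to Language
    # Detection (feature disabled, no content, binary file, or parse failure).
    detection_ran: bool
    languages: list[str] = field(default_factory=list)  # every detected code
    dominant: Optional[str] = None  # code of the dominant language, if any


def language_counts(docs):
    """Document Count per Language: count a document once per detected language."""
    counts = Counter()
    for doc in docs:
        if not doc.detection_ran:
            counts[NOT_PRESENT] += 1
        elif not doc.languages:
            counts[UNKNOWN] += 1
        else:
            for code in set(doc.languages):  # each language at most once per document
                counts[code] += 1
    return counts


def dominant_counts(docs):
    """Dominant Languages: count a document once, for its dominant language only."""
    counts = Counter()
    for doc in docs:
        if not doc.detection_ran:
            counts[NOT_PRESENT] += 1
        else:
            counts[doc.dominant or UNKNOWN] += 1
    return counts


docs = [
    Doc(True, ["en", "fr"], "en"),  # multilingual document
    Doc(True, ["fr"], "fr"),
    Doc(True),                      # detection ran but found no language
    Doc(False),                     # never subject to Language Detection
]
print(language_counts(docs)["fr"])  # 2
print(dominant_counts(docs)["fr"])  # 1
```

The only difference between the two functions is whether a multilingual document contributes to every detected language or to its dominant language alone. As a result, the per-language totals can exceed the number of documents, while the dominant-language totals always equal it.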

The following table lists the supported languages, sorted alphabetically by letter code.

Supported Language      Letter Code
Arabic                  ar
Bulgarian               bg
Czech                   cz
Danish                  da
German                  de
Greek                   el
English                 en
Spanish                 es
Estonian                et
Finnish                 fi
French                  fr
Hebrew                  he
Hindi                   hi
Hungarian               hu
Indonesian              id
Icelandic               is
Italian                 it
Japanese                ja
Korean                  ko
Dutch                   nl
Norwegian               no
Polish                  pl
Portuguese              pt
Romanian                ro
Russian                 ru
Slovakian               sk
Swedish                 sv
Swahili                 sw
Thai                    th
Ukrainian               ua
Vietnamese              vi
Chinese (Simplified)    zh-si
Chinese (Traditional)   zh-tr
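To build searches such as dominantlanguage:ar from a language name, the table can be held as a lookup map. The sketch below is a hypothetical client-side helper: it only formats the field:code search string described for the language and dominantlanguage fields and does not call any Digital Reef API. The names LANGUAGE_CODES, build_query, and code_for are illustrative assumptions.

```python
# Letter codes from the table of supported languages, keyed by code.
LANGUAGE_CODES = {
    "ar": "Arabic", "bg": "Bulgarian", "cz": "Czech", "da": "Danish",
    "de": "German", "el": "Greek", "en": "English", "es": "Spanish",
    "et": "Estonian", "fi": "Finnish", "fr": "French", "he": "Hebrew",
    "hi": "Hindi", "hu": "Hungarian", "id": "Indonesian", "is": "Icelandic",
    "it": "Italian", "ja": "Japanese", "ko": "Korean", "nl": "Dutch",
    "no": "Norwegian", "pl": "Polish", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "sk": "Slovakian", "sv": "Swedish", "sw": "Swahili",
    "th": "Thai", "ua": "Ukrainian", "vi": "Vietnamese",
    "zh-si": "Chinese simplified", "zh-tr": "Chinese traditional",
}

# The two searchable metadata fields named in this section.
SEARCH_FIELDS = ("language", "dominantlanguage")


def build_query(field_name: str, code: str) -> str:
    """Hypothetical helper: return a field:code string such as 'dominantlanguage:ar'."""
    if field_name not in SEARCH_FIELDS:
        raise ValueError(f"unknown field: {field_name!r}")
    if code not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    return f"{field_name}:{code}"


def code_for(language_name: str) -> str:
    """Hypothetical helper: look up a letter code by language name (case-insensitive)."""
    wanted = language_name.strip().lower()
    for code, name in LANGUAGE_CODES.items():
        if name.lower() == wanted:
            return code
    raise KeyError(language_name)


print(build_query("dominantlanguage", code_for("Arabic")))  # dominantlanguage:ar
```

The helper validates against this table rather than a generic ISO 639-1 list because the table uses cz and ua, whereas ISO 639-1 uses cs and uk. This assumes the metadata fields store exactly the codes shown in the table.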