Supported Languages for Language Detection
The Digital Reef software can detect a variety of languages in the documents of supplied data sets when the Language Detection feature is enabled. A user with the appropriate permissions can enable Language Detection as follows:
- As a Project Index Setting (for initial import or reprocessing)
- As a setting in an Organization Index Settings template, which can then be applied to all Projects
Note: Language Detection is enabled by default for data import operations to identify supported languages. If Language Detection is disabled, you will not see the languages and dominant languages calculated in reports.
Metadata fields (`language` and `dominantlanguage`) provide information about the detected languages and the dominant language for a given document, using the appropriate letter codes. You can search either of these fields using the code for the language (for example, `dominantlanguage:ar`). All of the codes consist of two letters except those for the Chinese languages (`zh-tr` and `zh-si`).
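As an illustration of the `field:code` search form described above, the following Python sketch builds a search string and rejects codes that are not in the table in this topic. The helper name `build_language_query` and the two constants are hypothetical and are not part of the Digital Reef software; only the field names, the letter codes, and the `field:code` form come from this topic.

```python
# Letter codes copied from the Supported Language table in this topic.
SUPPORTED_CODES = {
    "ar", "bg", "cz", "da", "de", "el", "en", "es", "et", "fi", "fr",
    "he", "hi", "hu", "id", "is", "it", "ja", "ko", "nl", "no", "pl",
    "pt", "ro", "ru", "sk", "sv", "sw", "th", "ua", "vi", "zh-tr", "zh-si",
}

# The two metadata fields named in this topic.
SEARCHABLE_FIELDS = {"language", "dominantlanguage"}


def build_language_query(field: str, code: str) -> str:
    """Return a 'field:code' search string.

    Hypothetical helper for illustration only; it is not a product API.
    """
    if field not in SEARCHABLE_FIELDS:
        raise ValueError(f"Unknown metadata field: {field!r}")
    if code not in SUPPORTED_CODES:
        raise ValueError(f"Unsupported letter code: {code!r}")
    return f"{field}:{code}"


print(build_language_query("dominantlanguage", "ar"))  # dominantlanguage:ar
```

Validating the code before building the string catches typos early, such as an unlisted code that would otherwise produce a search with no results.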
You can also check the report for a view (for example, all of Imports, a Data Set in Imports, or Project Data) to see the document count per language and the dominant languages discovered upon import. The two charts count documents with multiple languages differently:
- Document Count per Language: a document with multiple languages is counted once for each language detected.
- Dominant Languages: a document with multiple languages is counted for its dominant language only.
In either chart, unknown identifies documents whose language could not be determined. Not present identifies documents that were not subject to Language Detection at import, because the feature was disabled at import for that Data Set, the documents had no content, the documents were identified as binary files such as images (when OCR processing is disabled at import), or the documents were not parsed successfully.
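The difference between the two charts can be sketched in Python. This is only an illustration of the counting rules described above, not the way the software computes its reports; the sample records and the function names `count_per_language` and `count_dominant` are hypothetical.

```python
from collections import Counter

# Hypothetical sample: each record holds a document's detected languages
# and its single dominant language, using letter codes from this topic.
docs = [
    {"language": ["en", "fr"], "dominantlanguage": "en"},
    {"language": ["en"], "dominantlanguage": "en"},
    {"language": ["ar", "en"], "dominantlanguage": "ar"},
]


def count_per_language(documents):
    """Count each document once for every language detected in it."""
    counts = Counter()
    for doc in documents:
        counts.update(set(doc["language"]))
    return counts


def count_dominant(documents):
    """Count each document once, under its dominant language only."""
    return Counter(doc["dominantlanguage"] for doc in documents)


print(count_per_language(docs))
print(count_dominant(docs))
```

With this sample, the per-language counts total five (en 3, fr 1, ar 1) even though there are only three documents, while the dominant-language counts total three (en 2, ar 1). This is why the totals of the two charts can differ for the same view.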
The following table lists the supported languages, ordered by letter code.
Supported Language | Letter Code |
---|---|
Arabic | ar |
Bulgarian | bg |
Czech | cz |
Danish | da |
German | de |
Greek | el |
English | en |
Spanish | es |
Estonian | et |
Finnish | fi |
French | fr |
Hebrew | he |
Hindi | hi |
Hungarian | hu |
Indonesian | id |
Icelandic | is |
Italian | it |
Japanese | ja |
Korean | ko |
Dutch | nl |
Norwegian | no |
Polish | pl |
Portuguese | pt |
Romanian | ro |
Russian | ru |
Slovakian | sk |
Swedish | sv |
Swahili | sw |
Thai | th |
Ukrainian | ua |
Vietnamese | vi |
Chinese traditional | zh-tr |
Chinese simplified | zh-si |