Work with Search Results

After performing a type of Search operation, you can see your most recent results under Searches > Search History in the Navigation Tree, or go to the Work Basket to track and view results.

The All Docs tab for a search result lists all documents that met your search criteria. For query-based search results, documents are sorted by Score in descending order so that you see the highest-scoring documents first.

In the Reports tab of a Search Results set, where the search was performed using Freeform Search, Advanced Search, or Current Results, you can review the Search criteria based on your type of Search.

Search Report Summary

The Results Summary describes the query-based Search that was performed using Freeform Search, Advanced Search, or Current Results. The summary information depends on the nature of the Search performed. Bulk Searches and Workflows have their own reports (Bulk Search Report and Workflow Report); individual searches from a Bulk Search or a Workflow Step do provide the search report summary on the Reports tab.

Note: For a Search based on a query, the Results Summary title and the title displayed in the downloaded XLSX (or CSV) for the chart identify which deduplication setting is being used to calculate the counts: (for Dedupe Setting: Global) or (for Dedupe Setting: Custodial).

For a Search based on a query, the summary displays the query entered and executed, as well as the target and search settings used, as follows:

  • Search Type — The type of Search (for example, Freeform Search or Advanced Search). Freeform is reported for an individual search run as part of a Bulk Search, which has its own Bulk Search Report. In this case, the individual search appears in the tree as a child of the Bulk Search parent. Note that if you run a Bulk Search with typed or local file uploaded queries as a combined search, Freeform is the search type reported, but this search will not have a parent Bulk Search entry in the tree. (If you run a Bulk Search with a Connector file of queries as a combined search, the Connector file queries are subject to chunking, so you will see the parent Bulk Search Report entry in the tree and one or more child entries for the chunked Connector file queries, 1000 per chunk.)
  • Target — The target view selected to define the scope of the Search (for example, all of Project Data, or a Folder).
  • Filter by Data Sets — If applicable, a semicolon-separated list of the source Data Sets in Project Data that were selected for the Search (as part of selecting a Search Target). This option will be blank if no source Data Sets were selected, or if the option does not apply (for example, for a target selected for a Workflow, or for a Metadata Bulk Search, where you select the By Metadata Field option for a Bulk Search). For more information about filtering a search by Data Sets, see Select Data Set(s) when selecting a Search Target.
  • Include Families — Indicates whether the search includes all members of a family (MAG or DAG). An icon indicates whether the setting was enabled or disabled for the search. Family members are included as long as the Include Families option is set for the Search. By default, this option is Enabled to ensure that all available family members are included in the results (a parent email, parent document, associated attachments, and embedded messages or documents). When this option is enabled, your results may include more documents than are reported in the Doc Count. This indicates the inclusion of family members that are already in the source view but are not scored in the results against the query. You can identify these documents in the Documents list because they all have a Score of 0. If you disable the Include Families option for an Advanced Search or Bulk Search, the results will report Disabled for the Family Expansion, and the results will include just the documents that meet the Search criteria, not their family members. Note that this option is implicitly enabled for a Freeform Search or Current Results search.
  • Include Metadata — Indicates whether the search of each keyword in a query was expanded to include a set of metadata fields as well as content. An icon indicates whether the setting was enabled or disabled for the search. You can select the Search Fields you want to have searched automatically. When the Include Metadata option is enabled, all individual keywords, as well as the keywords in phrases and in special searches such as Proximity search, are subject to expansion. By default, this checkbox option is Enabled for Freeform Search, Advanced Search, and Bulk Search. It is always enabled for a Current Results Search. Your results report the per-term Doc Count for the entire set of fields (not each field). Note that when this option is enabled, you can control the expansion on a per-term basis and limit the search of a given keyword to just content by specifying content::<keyword> for a given keyword or content::(<keyword1> <keyword2> <keyword3>) for a group of keywords. In this case, your per-keyword document counts will reflect how you issued the Search. For more information about the default Standard syntax, see the topic Use the Standard Search Syntax.
  • Expand Synonyms — Indicates whether the Advanced or Bulk Search was performed with Expand Synonyms enabled or disabled. An icon indicates whether the setting was enabled or disabled for the search. When Expand Synonyms is enabled, the Search is performed with a list of synonyms for each specified term, and the terms appear when you open the Query Executed area. The Doc Count for all the synonyms for a term is reflected cumulatively in the Clause table entry for that term. Note that Expand Synonyms works for information in the contents, title, subject, comment, or comments fields. It will not work for terms accompanied by ~ or other syntax used for special Searches. By default, Expand Synonyms is Disabled for all Search types.
  • Results — The number of documents returned in the results and the total number of documents included in the scope of the Search (the target, for example, a single view of Project Data or the entire contents of Project Data). For example, 270 of 4343 means that 270 of the 4343 documents in the searched view met the query.

Two tabs on the right enable you to see either the query you entered, or the query that was actually executed:

  • Query Entered (default tab) — The exact query entered by the user who performed the Search. If the Search was performed using Advanced Search, this query reflects what was built from your selections. Color is used to help identify items such as Boolean operators and special syntax.
  • Query Executed — The query as interpreted by the software, reflecting the actual criteria used in the Search. Studying the query executed helps you see how your Search query was interpreted and expanded. For example, if you keep the default setting for Include Metadata, you will see the set of common metadata fields that were searched for each keyword. Color is used to help identify items such as Boolean operators and metadata fields. Viewing the query executed can also help you identify syntax issues that prevented you from getting the results you expected. For more information about Standard Search syntax, see Use the Standard Search Syntax.

Results Summary

  • Results Summary (for Dedupe Setting: Global | Custodial)

    The columns shown in this table and in the download report are determined by the Project Search Settings.

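    Note that you cannot necessarily add up the values in the Results Summary table to get the total number of hits, because a document can be responsive to more than one clause. The table can include the following columns: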

    • ID — An identifier for the search.
    • Clause — An entry for each clause or component of the Search query (see About Search Clauses for a summary of what constitutes a clause):
      • Doc Count — The number of documents or "hits" in the results that were responsive to the associated keyword (term) or clause.
      • Unique — The number of documents in the results that were uniquely responsive to the specific clause.
      • Clause Ratio — The percentage of the total document count providing "hits" (based on meeting the search criteria for the given clause).
      • Duplicates — Per clause, the number of documents that represent duplicates based on family membership (MAG or DAG). A document is a duplicate of another document only if both documents have the same dupe_fingerprint, their parents also have the same dupe_fingerprint, and the two documents have different parents.
      • Family Expansions — Per clause, the number of documents that represent any additional family members (of a MAG or DAG) that were included in the results but did not match this clause. This field will report NA for each clause if the Include Families option is not set for the Search.
      • Family Doc Count — Per clause, the number of families, where each family hit represents one of the following:
        • Each document that is not an email or from an email.
        • Each MAG, regardless of the number of email items in the MAG.

        Note: The dedupe counts and filesize are calculated and displayed according to the Project deduplication setting (under Analytic Index Settings), either the default of Global or Custodial. This setting, which applies for Project Data and views of Project Data, determines the processing of email, reporting of dedupe counts and size, and how email is handled for an Export that includes duplicates. For search results for an Imports view or a Data Set view, these counts have no meaning, as the dupe_fingerprint calculation comes into effect when documents are added to Project Data.

      • Total DeDuped Docs — Per clause, the number of documents that were responsive to the query, without including duplicate documents.
      • Total DeDuped File Size — Per clause, the file size (in GB) of the documents that were responsive to the query, without including duplicate documents.
      • Total DeDuped Docs w/Family — Per clause, the number of documents that were responsive to the query, including all of their family members, but not including duplicate documents.
      • Total DeDuped File Size w/Family — Per clause, the file size (in GB) of the documents that were responsive to the query, including all of their family members, but not including duplicate documents.
      • Unique DeDuped Docs — Per clause, the number of documents that were uniquely responsive to the specific clause, without including duplicate documents.
      • Unique DeDuped File Size — Per clause, the file size (in GB) of the documents that were uniquely responsive to the specific clause, without including duplicate documents.
      • Unique DeDuped Families (unique deduped documents with family) — Per clause, the number of entire document families that were uniquely responsive to the specific clause, not including duplicate documents.
      • Unique DeDuped Families File Size (file size of unique deduped documents with family) — Per clause, the file size (in GB) of the entire document families that were uniquely responsive to the specific clause, not including duplicate documents.
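The relationship between Doc Count, Unique, and Clause Ratio can be illustrated with a small worked example. This is a minimal sketch, not the product's implementation: the function name clause_counts, the sample document IDs, and the choice of denominator for Clause Ratio (here, the number of distinct documents hit by any clause) are assumptions made for illustration only.

```python
from typing import Dict, Set


def clause_counts(hits_by_clause: Dict[str, Set[str]], total_docs: int) -> Dict[str, dict]:
    """Compute Doc Count, Unique, and Clause Ratio for each clause.

    hits_by_clause maps a clause to the set of document IDs it matched.
    total_docs is the denominator ASSUMED for Clause Ratio.
    """
    report = {}
    for clause, hits in hits_by_clause.items():
        # Collect every document matched by any other clause.
        others: Set[str] = set()
        for other_clause, other_hits in hits_by_clause.items():
            if other_clause != clause:
                others |= other_hits
        report[clause] = {
            "doc_count": len(hits),
            # Unique: documents that only this clause matched.
            "unique": len(hits - others),
            "clause_ratio": round(100.0 * len(hits) / total_docs, 2) if total_docs else 0.0,
        }
    return report


# Hypothetical hits for the query "one OR two OR three".
hits = {
    "one": {"d1", "d2", "d3"},
    "two": {"d3", "d4"},
    "three": {"d5"},
}
all_hits = set().union(*hits.values())
report = clause_counts(hits, total_docs=len(all_hits))
summed_doc_counts = sum(row["doc_count"] for row in report.values())
```

In this sample, document d3 is responsive to two clauses, so the per-clause Doc Count values sum to more than the number of distinct documents in the results. This is why the columns cannot simply be added up.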

      About Search Clauses

      When examining search clauses in the report, it is important to remember that Digital Reef evaluates operators in the following order:

      1. w/N (Proximity)
      2. OR
      3. AND

      In general, if your query consists of terms or phrases separated by OR operators only, the individual terms or phrases are reported as separate clauses. For example, the following query will be reported as three clauses:

      one OR two OR three

      one

      two

      three

      The same is true even if your query consists of terms or phrases separated by either OR operators only, and also includes a term preceded by a NOT. For example, the following query would report four clauses:

      one OR two OR three OR NOT six

      one

      two

      three

      NOT six

      If your query consists of terms or phrases separated by AND operators only, one clause will be reported. For example, the following query will be reported as one clause:

      one AND two AND three

      one AND two AND three

      As soon as your query includes a mix of operators, the clause breakdown depends on how the query is interpreted (for example, based on the order in which the operators are evaluated). For example, the following query, issued without explicit grouping, is reported as one clause:

      one AND this OR that

      This query is interpreted as the following:

      one AND (this OR that)

      The following query is also reported as one clause:

      one OR two AND reports

      This query is interpreted as the following:

      (one OR two) AND reports

      If your query includes one or more sets of explicitly grouped terms, the different components of that query are evaluated. For example, the following query would report two clauses:

      (brokerage w/25 agreement) OR memo

      (brokerage w/25 agreement)

      memo

      However, the following query reports one clause:

      (brokerage w/25 agreement) AND memo

      (brokerage w/25 agreement) AND memo

      The following query will report two clauses:

      (one AND two) OR (three AND four)

      one AND two

      three AND four

      However, the following query reports one clause:

      (one OR two) AND (three OR four)

      (one OR two) AND (three OR four)

      The following query will report two clauses:

      dog w/5 (cat OR bird) OR cat w/10 mouse

      (dog w/5 (cat OR bird))

      (cat w/10 mouse)

      However, the following query will report one clause:

      dog w/5 (cat OR bird) AND cat w/10 mouse

      (dog w/5 (cat OR bird)) AND (cat w/10 mouse)

      The following query would report one clause:

      ted AND (bob AND bill)

      ted AND (bob AND bill)

      The following query would also report one clause:

      apple AND NOT (worm OR bug)

      apple AND (NOT (worm OR bug))

      A proximity search or a date range search will appear as a Search Clause as well.

      The software currently supports top-level clauses only.
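The clause examples above follow a simple pattern: because OR is evaluated before AND, any AND that sits outside parentheses makes the whole query one clause, and otherwise the query splits at each OR that sits outside parentheses. The following is a minimal sketch of that pattern, assuming uppercase operators and balanced parentheses. The function names are hypothetical, and the sketch does not reproduce how the software parses queries or formats clause text in the report.

```python
import re
from typing import List


def _join(tokens: List[str]) -> str:
    """Rejoin tokens, removing the spaces introduced around parentheses."""
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")


def top_level_clauses(query: str) -> List[str]:
    """Split a query into top-level clauses (illustrative only)."""
    # Tokens are "(", ")", or any run of non-space, non-parenthesis characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    depth = 0
    parts: List[List[str]] = [[]]
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        if depth == 0 and tok == "AND":
            # An AND outside parentheses binds last, so the query is one clause.
            return [query.strip()]
        if depth == 0 and tok == "OR":
            # An OR outside parentheses starts a new clause.
            parts.append([])
        else:
            parts[-1].append(tok)
    return [_join(p) for p in parts]


examples = [
    "one OR two OR three OR NOT six",
    "one AND this OR that",
    "(brokerage w/25 agreement) OR memo",
    "dog w/5 (cat OR bird) OR cat w/10 mouse",
]
clause_lists = [top_level_clauses(q) for q in examples]
```

Operators nested inside parentheses never split the query in this sketch, which mirrors the statement that only top-level clauses are reported.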

      If you want to search by domain lists, providing only one clause in the query for sentdomains::<domainlist> or participantdomains::<domainlist> ensures that you will see a doc count for each domain in the list. However, if your query includes more than the single clause, you will see one clause and doc count for the domain list as a whole. Even a clause with zero for a Doc Count is listed in the table.

      About the Download Report with Per-Clause Information

      The downloaded XLSX file for a search report provides key information over multiple tabs, as follows:

      • Glossary — This tab applies to all search result views and contains both a Glossary and a Legend. The Glossary helps you follow the Search Term Hit Count information in columns of the Total tab as well as the appropriate Batch and/or Custodian tabs that apply to the Search Result view. The Legend section identifies the prefixes used for each Batch and Custodian tab that applies to the search results view. For example, B1 may be the prefix used to represent a Batch called data1 (where the tab name is B1-data1), and C1 may be used to represent a Custodian called mikeg (where the tab name is C1-mikeg).
      • Summary — This tab applies to all search result views and summarizes the key counts for the searches, such as the Total Records in the search target view, the Total Dupes in that search target view (where Global refers to Global deduplication and Custodial refers to Custodial deduplication), the Total Search Hits (Deduped), and the Total Search Hits with Family (Deduped). This tab also reserves a section for a logo and other job and client information, and identifies the Search Settings for the search (for example, the Search Target, Query Entered, Include Families, and Include Metadata). A note is included if the entered query is truncated to meet the XLSX cell length restrictions.
      • Total — This tab provides the Search Term Hit Count information (per Clause, as it appears on the Reports tab, with the appropriate columns based on your selected Search Term Report Settings). An additional TOTAL line provides the statistics for the overall search (that is, all clauses in the search, combined).
      • Per-Custodian sheets — When included, these provide per-Custodian Hit Count details for the search results view. Custodian sheets apply only to Project Data and views of Project Data. Only Custodians that are responsive to the search will have tabs. An additional TOTAL line provides the per-Custodian statistics for the overall search (that is, all clauses in the search, combined). If you do not want to include the per-Custodian sheets in your downloaded search report, you can disable the Include Per-Custodian Sheets option in the Project Search Settings.
      • Per-Batch sheets — When included, these provide per-Batch (imported Data Set) Hit Count details for the search result view. Only Batches that are responsive to the search will have sheets. An additional TOTAL line provides the per-Batch statistics for the overall search (that is, all clauses in the search, combined). If you do not want to include the per-Batch sheets in your downloaded search report, you can disable the Include Per-Batch Sheets option in the Project Search Settings.

Combined Search Summary

For a Combined Search based on two or more queries, the summary displays the following information:

  • Search Type — The type of Search, Combined.
  • Target — The target view selected to define the scope of the Search (for example, all of Project Data, or a Folder).
  • Filter by Data Sets — If applicable, a semicolon-separated list of the source data sets in Project Data that were selected for the Search.
  • Include Families — Indicates whether the search includes all members of a family (MAG or DAG). An icon indicates whether the setting was enabled or disabled for the search. Family members are included as long as the Include Families option is set for the Search. By default, this option is Enabled to ensure that all available family members are included in the results (a parent email, parent document, associated attachments, and embedded messages or documents). When this option is enabled, your results may include more documents than are reported in the Doc Count. This indicates the inclusion of family members that are already in the source view but are not scored in the results against the query. You can identify these documents in the Documents list because they all have a Score of 0. If you disable the Include Families option for an Advanced Search or Bulk Search, the results will report Disabled for the Family Expansion, and the results will include just the documents that meet the Search criteria, not their family members. Note that this option is implicitly enabled for a Freeform Search or Current Results search.
  • Include Metadata — Indicates whether the search of each keyword in a query was expanded to include a set of metadata fields as well as content. An icon indicates whether the setting was enabled or disabled for the search. You can select the Search Fields you want to have searched automatically. When the Include Metadata option is enabled, all individual keywords, as well as the keywords in phrases and in special searches such as Proximity search, are subject to expansion. By default, this checkbox option is Enabled for Freeform Search, Advanced Search, and Bulk Search. It is always enabled for a Current Results Search. Your results report the per-term Doc Count for the entire set of fields (not each field). Note that when this option is enabled, you can control the expansion on a per-term basis and limit the search of a given keyword to just content by specifying content::<keyword> for a given keyword or content::(<keyword1> <keyword2> <keyword3>) for a group of keywords. In this case, your per-keyword document counts will reflect how you issued the Search. For more information about the default Standard syntax, see the topic Use the Standard Search Syntax.
  • Expand Synonyms — Indicates whether the Search was performed with Expand Synonyms enabled or disabled. An icon indicates whether the setting was enabled or disabled for the search. When Expand Synonyms is enabled, the Search is performed with a list of synonyms for each specified term, and the terms appear when you open the Query Executed area. The Doc Count for all the synonyms for a term is reflected cumulatively in the Clause table entry for that term. Note that Expand Synonyms works for information in the contents, title, subject, comment, or comments fields. It will not work for terms accompanied by ~ or other syntax used for special Searches.
  • Results — The number of documents returned in the results and the total number of documents included in the scope of the Search (the target, for example, a single view of Project Data or the entire contents of Project Data). For example, 270 of 4343 means that 270 of the 4343 documents in the searched view met the query.

Two tabs on the right enable you to see either the query you entered, or the query that was actually executed:

  • Query Entered (default tab) — The exact query entered by the user who performed the Search. If the Search was performed using Advanced Search, this query reflects what was built from your selections. Color is used to help identify items such as Boolean operators and special syntax.
  • Query Executed — The query as interpreted by the software, reflecting the actual criteria used in the Search. Studying the query executed helps you see how your Search query was interpreted and expanded. For example, if you keep the default setting for Include Metadata, you will see the set of common metadata fields that were searched for each keyword. Color is used to help identify items such as Boolean operators and metadata fields. Viewing the query executed can also help you identify syntax issues that prevented you from getting the results you expected. For more information about Standard Search syntax, see Use the Standard Search Syntax.

Drill-through Summary

A drill-through search performed by double-clicking an entry in a report on the Reports tab provides the following information (note that it does not support Include Families or Include Metadata; it operates on just the entry):

  • Search Type — The type of Search for drill-through searches with a query, Drill-through.
  • Target — The target view selected to define the scope of the Drill-through Search (for example, Imports, all of Project Data, or a Folder).
  • Results — The number of documents returned in the results and the total number of documents included in the scope of the Search (the target, for example, a single view of Project Data or the entire contents of Project Data). For example, 270 of 4343 means that 270 of the 4343 documents in the searched view met the query.

Note: Depending on the report, drill-through results may yield a higher count than the original report entry, since the drill-through search is generally more inclusive, looking for all records containing the associated value in some format.

Two tabs on the right enable you to see the query that was performed for the drill-through search. Usually, these two tabs match (with perhaps some uppercase/lowercase differences) for a drill-through search:

  • Query Entered (default tab) — The drill-through query performed for you when you double-clicked an entry in a report table.
  • Query Executed — The drill-through query as interpreted by the software.

Note: For users with Imports - View and Project - Reports - View permissions, a drill-through of the Total OCR Candidates entry in the OCR Candidates report does not provide all of the standard drill-through search information, since it is not associated with a query.

Metadata Bulk Search Summary

A Bulk Search that is performed using the By Metadata Field option has its own report summary. (A regular Bulk Search without this option generates the standard Bulk Search Report.)

  • Search Type — The type of Search, Metadata Bulk Search.
  • Target — The target view that was selected to define the scope of the Search (for example, all of Project Data, or a Folder).
  • Filter by Data Sets — A semicolon-separated list of the source data sets in Project Data that were selected for the Search.
  • Input File (if applicable) — The name of the uploaded text file that contains the values specified by the Bulk Search by Metadata operation.
  • Metadata Field — The name of the metadata field searched (for example, docnum, filemd5, messageid, entryid, or unid).
  • Include Families — Indicates whether the search includes all members of a family (MAG or DAG). An icon indicates whether the setting was enabled or disabled for the search. Family members are included as long as the Include Families option is set for the Search. By default, this option is Enabled to ensure that all available family members are included in the results (a parent email, parent document, associated attachments, and embedded messages or documents).
  • In the table shown to the right, the Result Category and Count columns display the following:
    • Provided — Identifies the total number of values supplied for this type of Metadata Bulk Search.
    • Matched — Identifies the number of values that had just one document match.
    • Not Matched — Identifies the number of values that did not match any documents.
    • Multiple Matches — Identifies the number of values that had multiple document matches.

    You can optionally save the Metadata Bulk Search information to an XLSX file. This XLSX file contains a Metadata Bulk Search Summary tab with basic count information per category and tabs that contain the values for each individual category (Matched, Not Matched, and Multiple Matches).
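The four result categories can be thought of as a classification of each supplied value by the number of documents it matched. The sketch below illustrates that idea only. The function name categorize, the sample docnum values, and the assumption that every provided value falls into exactly one of the other three categories are not taken from the product.

```python
from collections import Counter
from typing import Dict, Iterable, List


def categorize(provided: Iterable[str], matches: Dict[str, List[str]]) -> Counter:
    """Count provided values by how many documents each one matched.

    matches maps a provided value to the list of document IDs it matched;
    a value that is absent from matches is treated as having no matches.
    """
    counts = Counter({"Provided": 0, "Matched": 0, "Not Matched": 0, "Multiple Matches": 0})
    for value in provided:
        counts["Provided"] += 1
        n = len(matches.get(value, []))
        if n == 0:
            counts["Not Matched"] += 1
        elif n == 1:
            counts["Matched"] += 1  # exactly one document matched
        else:
            counts["Multiple Matches"] += 1
    return counts


# Hypothetical docnum values read from an uploaded input file.
provided_values = ["DOC-001", "DOC-002", "DOC-003", "DOC-004"]
found = {
    "DOC-001": ["a.eml"],
    "DOC-003": ["b.doc", "c.doc"],
    "DOC-004": ["d.pdf"],
}
summary = categorize(provided_values, found)
```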

Find Exact or Content Duplicates of a Document Summary

This summary will display the following:

  • Search Type — The type of Search, either Exact Duplicate Document or Content Duplicate Document (which is a search for Exact or Content Duplicates of a selected document).
  • Target — The target view selected to define the scope of the Search (for example, all of Project Data or a Folder).
  • Source Document — The name of the Source (pivot) document. For an email, this is the name of the file (for example, 000017221.eml). (The task always displays the subject of the email instead.)
  • Results — The number of documents returned in the results and the total number of documents included in the scope of the Search (the target, for example, a single Folder or the entire contents of Project Data). For example, 4 of 100 means that 4 documents of 100 documents in the searched view were exact or content duplicates of the selected document.

Find Near Duplicates of a Document Summary

This summary will display the following:

  • Search Type — The type of Search, Near Duplicate Document.
  • Target — The target view selected to define the scope of the Search (for example, all of Project Data, or a Folder).
  • Threshold — The similarity threshold that was used to perform the operation. The default is 80.
  • Source Document — The name of the file used as the Search pivot document to search for documents that are very close to the document you selected. For an email, this is the name of the file (for example, 000016550.eml). (The task always displays the subject of the email instead.)
  • Results — The number of documents returned in the results and the total number of documents included in the scope of the Search (the target, for example, a single Folder or the entire contents of Project Data). For example, 60 of 1000 means that 60 documents of 1000 documents in the searched view met the Near Duplicate Document similarity threshold. Note that if the only responsive document is the pivot document (the document you used for the search), then that document is the only hit listed (for example, 1 of 1000).

Exact Duplicates Group / Content Duplicates Group Summary

This summary will display the following:

  • Search Type — The type of Search, either Exact Duplicate Group or Content Duplicate Group (which is a search for Exact or Content Duplicates in a view).
  • Target — The target view selected to define the scope of the Search (for example, a Data Set shown to a user with the appropriate permissions).
  • Results — The number of documents returned in the results.

Find More Like These Summary

This summary will display the following:

  • Search Type — The type of Search, More Like These.
  • Target — The target view that was selected to define the scope of the Search.
  • Threshold — The similarity threshold that was used to perform the operation. The default is 20.
  • Results — The number of documents returned in the results and the total number of documents included in the scope of the Search (the selected documents and the target, for example, a single Folder or the entire contents of Project Data). For example, 25 of 500 means that 25 of the 500 documents in the target view met the specified similarity threshold.

If the set of Results has been tagged (using the Tag option on the Work Basket toolbar), the appropriate tag icons appear on all of the documents in the Documents tab of the results. You can also use the Tag menu in the Documents tab of the results to Tag one or more documents.

Search by Synthetic Document Summary

This summary will display the following information:

  • Search Type — The type of Search, By Synthetic Document.
  • Target — The target view that was selected to define the scope of the Search (for example, a Custodian or a Folder).
  • Threshold — The similarity threshold that was used to perform the operation. The default is 20.
  • Source Synthetic Document — The name of the Synthetic Document compared to a target view.
  • Results — The number of documents returned in the results and the total number of documents included in the scope of the Search (for example, a Custodian or Folder). For example, 25 of 500 means that 25 of the 500 documents in the target view met the Synthetic Document's specified similarity threshold.

Sample Summary

A Sample is a type of Search Result. Depending on how you create the Sample, you will see a Summary with either Sample by Size or Sample by Confidence Level.

Sample by Size

This type of Sample shows the following information:

  • Search Type — The summary displays Sample by Size.
  • Target — The target view selected to define the scope of the Sample (for example, all of Project Data, or a Folder).
  • Include Families — Indicates whether the Sample includes all members of a family (MAG or DAG). Family members are included as long as the Include Families option is set for the Sample.
  • Sample All Custodians — Indicates whether the Sample covered all Custodians in the target view.
  • Results / Sample Size — The number of documents returned in the results and the total number of documents included in the Sample.

Sample by Confidence Level

This type of Sample shows the following information:

  • Search Type — The summary displays Sample by Confidence Level.
  • Target — The target view selected to define the scope of the Sample (for example, all of Project Data, or a Folder).
  • Confidence Level — Identifies the selected Confidence level for the Sample, which can be a value in the range 90-99%.
  • Margin of Error — Identifies the selected Margin of Error value (also known as the Confidence Interval, or CI) for the Sample. This value can be in the range 1-5.
  • Include Families — Indicates whether the Sample includes all members of a family (MAG or DAG). Family members are included as long as the Include Families option is set for the Sample.
  • Sample All Custodians — Indicates whether the Sample covered all Custodians in the target view.
  • Results / Sample Size — The number of documents returned in the results and the total number of documents included in the Sample.
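To give a sense of how a Confidence Level and Margin of Error translate into a Sample Size, the following sketch applies the standard textbook sample-size formula with a finite population correction. This topic does not state how the software computes its sample size, so treat the formula, the function name sample_size, and the conservative proportion of 0.5 as assumptions for illustration only.

```python
import math
from statistics import NormalDist


def sample_size(population: int, confidence_pct: float, margin_pct: float, p: float = 0.5) -> int:
    """Estimate a sample size with the standard (Cochran) formula.

    population     -- number of documents in the target view
    confidence_pct -- confidence level as a percentage, for example 95
    margin_pct     -- margin of error as a percentage, for example 5
    p              -- assumed proportion; 0.5 gives the largest sample
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence_pct / 200.0)
    e = margin_pct / 100.0
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    # Finite population correction for the size of the target view.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


# Hypothetical target view of 100,000 documents.
typical = sample_size(100_000, confidence_pct=95, margin_pct=5)
strict = sample_size(100_000, confidence_pct=99, margin_pct=1)
```

Under these assumptions, raising the Confidence Level or lowering the Margin of Error increases the number of documents that must be sampled.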

If the set of Results has been tagged (which takes effect in Project Data), the appropriate tag icons appear on all of the documents in the All Docs tab of the results. You can also tag from the All Docs tab of the results.

You can work with results as follows:

  • From the Work Basket, Tag an entire set of search results.
  • From a selected results view in the tree, Tag an entire set of search results.
  • From the All Docs tab of a search results view, Tag one or more or all results.
  • From the All Docs tab of a search results view, browse document results and open the Document Viewer to view the content, metadata, or history of a selected document.
  • Perform additional operations (for example, add documents to a Folder or perform searches).
  • View reports on the Reports tab and view or download details for a specific report. From the popup details, you can add or remove Tags, or add or remove documents from select views.

For more information about the All Docs tab information, see Document Information for Search Results.

About Search Result Calculations

Each time you perform a search of one or more Indexed sets of data, the Digital Reef software ensures return of all relevant results for your query, in relevance order. Note that you may see variations in the scoring of a result when the same query is issued against different sets of data, because data sets often vary in both size and content. However, each individual search returns the most accurate and relevant results based on the search criteria and scope of that search.

For those interested in more detail about how search results are calculated, Digital Reef Index-based searches calculate score using an Apache Lucene relevance algorithm. This algorithm takes into account the following:

  • the number of times a term appears in a given document (documents with more occurrences are scored higher)
  • the inverse document frequency, in which the rarity of a term causes a higher contribution to the total score
  • how many of the query terms are found in the specified document
  • a normalizing factor used to make scores between queries comparable
  • the boost of a term in the query, if requested in the query
  • boost and length factors calculated during the Indexing process

See http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html for more technical information.
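The factors listed above correspond to the classic Lucene TF-IDF scoring formula. The sketch below is a simplified illustration modeled on the default Similarity behavior documented for Lucene releases of that era (square-root term frequency, logarithmic inverse document frequency, a coordination factor, a query normalization factor, and a length norm). It is not Digital Reef code. It assumes a single field, folds the index-time boost into the length norm alone, and uses hypothetical function and parameter names. Whether the product customizes any of these factors is not stated here.

```python
import math
from typing import Dict


def tf(freq: int) -> float:
    """Term frequency factor: more occurrences score higher."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: rarer terms contribute more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(query_boosts: Dict[str, float], term_freqs: Dict[str, int],
          doc_freqs: Dict[str, int], num_docs: int, doc_length: int) -> float:
    """Score one document against a query (simplified, single field).

    query_boosts -- query term -> boost requested in the query
    term_freqs   -- term -> occurrences in this document
    doc_freqs    -- term -> number of indexed documents containing the term
    """
    matched = [t for t in query_boosts if term_freqs.get(t, 0) > 0]
    if not matched:
        return 0.0
    # Coordination factor: how many of the query terms the document contains.
    coord = len(matched) / len(query_boosts)
    # Query normalization: makes scores from different queries more comparable.
    sum_sq = sum((idf(doc_freqs.get(t, 0), num_docs) * b) ** 2
                 for t, b in query_boosts.items())
    if sum_sq == 0:
        return 0.0
    query_norm = 1.0 / math.sqrt(sum_sq)
    # Length factor calculated at index time: shorter fields weigh more.
    length_norm = 1.0 / math.sqrt(doc_length)
    total = sum(tf(term_freqs[t]) * idf(doc_freqs.get(t, 0), num_docs) ** 2
                * query_boosts[t] * length_norm
                for t in matched)
    return coord * query_norm * total


# Hypothetical index statistics for a two-term query.
query = {"merger": 1.0, "memo": 1.0}
df = {"merger": 10, "memo": 200}
both_terms = score(query, {"merger": 2, "memo": 1}, df, num_docs=1000, doc_length=100)
one_term = score(query, {"merger": 2}, df, num_docs=1000, doc_length=100)
```

Because scores depend on index-wide statistics such as document frequency and the number of documents, the same document can score differently when the same query is run against a different set of data.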