Create Near-Duplicate Metadata for an Export Stream

Exports > selected Export Stream > right-click > Create Near-Duplicate Metadata

Requires Exports - Add/Edit Permissions

If you have the required permission, you can request Near-Duplicate processing on demand for a selected Export Stream to create its Near-Duplicate Metadata. This lets you run Near-Duplicate analysis at any time, unless the Export Stream's Near-Duplicate information is already up to date.

The Create Near-Duplicate Metadata right-click option is unavailable (grayed out) if the Near-Duplicate processing for the Export Stream is up to date.

Note: This operation does not change the Near-Duplicate setting (Group Near-Duplicates), the Threshold, or any other Near-Duplicate options for the Export Stream. The next Export of the Stream observes whatever Near-Duplicate setting and Near-Duplicate options are selected in the Export dialog. (The Near-Duplicate setting and related options can be managed at each Export of an Export Stream.)

The scope of the Near-Duplicate processing is restricted to the documents meeting the Documents to Export criteria of the Export Stream. Near-Duplicate processing includes the calculation of pivot documents from the documents included in the Export, and the identification of the compliant Near-Duplicate documents. If a subsequent Export of an Export Stream enables Near-Duplicate processing, any newly added or newly Tagged documents that meet the criteria are evaluated.
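
The idea of pivot documents and a similarity Threshold can be illustrated with a minimal sketch. This is not the product's algorithm, which is not described here; it assumes a simple Jaccard similarity over word sets, a greedy pivot selection, and that documents with fewer terms than the minimum are skipped. The names jaccard_percent and group_near_duplicates are hypothetical.

```python
def jaccard_percent(a, b):
    """Similarity of two term sets as a percentage (0 to 100)."""
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=80, min_terms=25):
    """Greedy grouping (illustrative assumption, not the product's method).

    Each document joins the first pivot it is similar enough to;
    otherwise it becomes a new pivot. Documents with fewer than
    min_terms distinct terms are skipped entirely.
    """
    groups = {}     # pivot id -> ids of its near-duplicates
    term_sets = {}  # doc id -> set of terms
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        if len(terms) < min_terms:
            continue
        term_sets[doc_id] = terms
        for pivot_id in groups:
            score = jaccard_percent(terms, term_sets[pivot_id])
            # A threshold of 0 still requires some nonzero similarity.
            if score > 0 and score >= threshold:
                groups[pivot_id].append(doc_id)
                break
        else:
            groups[doc_id] = []  # no pivot matched: this document is a new pivot
    return groups


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy cat",
    "c": "completely unrelated text about export streams",
}
# "a" and "b" share 7 of 9 distinct terms (about 78 percent).
print(group_near_duplicates(docs, threshold=70, min_terms=3))  # {'a': ['b'], 'c': []}
print(group_near_duplicates(docs, threshold=80, min_terms=3))  # {'a': [], 'b': [], 'c': []}
```

In this sketch, lowering the threshold from 80 to 70 is what allows document "b" to be grouped under pivot "a", which mirrors the general rule that a lower threshold yields more results.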

Near-Duplicate Options

  • Threshold (0-99) (80 by default) — Specifies the similarity threshold used for Near-Duplicate processing. You can specify any value in the range 0 to 99, where 0 detects any nonzero amount of similarity or commonality. To require a high degree of similarity or commonality, select a higher value, such as 80 or 90; to require a moderate degree, select a value such as 40 or 50. In general, the lower the threshold, the more results you will see, because you are requiring less similarity or commonality; a higher threshold yields fewer results.
  • Minimum Terms <value> (25 by default) — Specifies the minimum number of terms for Near-Duplicate processing. You can use the default or specify your own value (negative numbers and decimals are not permitted). No upper limit is enforced.
  • Process Attachments (cleared by default) — Specifies whether email or OLE attachments are processed as part of Near-Duplicate processing. By default, attachments are not processed independently for Near-Duplicate Processing. This option applies when you have Separate Email Attachments and/or Separate OLE Attachments set for the Export (it has no effect otherwise).
  • OK — Submits the Near-Duplicate threshold and begins processing, which you can monitor in a Work Basket task. Near-Duplicate processing can be time-consuming. If necessary, you can cancel the Near-Duplicate Work Basket task. If all of your Export data is not backed by an Analytic Index, this Work Basket task will display a failure.
  • Cancel — Cancels the operation.
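
The constraints on these options can be summarized in a short sketch. It only models the rules stated above (Threshold from 0 to 99, Minimum Terms as a non-negative whole number, Process Attachments taking effect only when attachments are separated for the Export). The class NearDuplicateOptions and the function attachments_processed are hypothetical names for illustration and are not part of any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NearDuplicateOptions:
    """Hypothetical container mirroring the dialog defaults."""
    threshold: int = 80
    minimum_terms: int = 25
    process_attachments: bool = False

    def __post_init__(self):
        # Both numeric options must be whole numbers (no decimals).
        for name in ("threshold", "minimum_terms"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{name} must be a whole number")
        if not 0 <= self.threshold <= 99:
            raise ValueError("threshold must be in the range 0 to 99")
        if self.minimum_terms < 0:
            raise ValueError("minimum_terms must not be negative")


def attachments_processed(options, separate_email=False, separate_ole=False):
    """Process Attachments has an effect only if the Export separates
    email and/or OLE attachments."""
    return options.process_attachments and (separate_email or separate_ole)


defaults = NearDuplicateOptions()
print(defaults.threshold, defaults.minimum_terms, defaults.process_attachments)  # 80 25 False
```

With this model, NearDuplicateOptions(threshold=100) or NearDuplicateOptions(minimum_terms=2.5) raises ValueError, while a large value such as minimum_terms=100000 is accepted.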