What are Group Duplicates options?
Group Duplicates options let you choose whether to run similarity analysis on your search results. Similarity analysis examines a results list, identifies documents that have similar content, and groups the similar documents together.
Choose the type of similarity analysis you want from the Group Duplicates drop-down list:
- No similarity analysis. If similarity analysis has already been performed, the results list is regenerated without similarity analysis.
- High similarity. Documents must be nearly identical in order for the service to include them in the same group of similar documents.
- Moderate similarity. Documents with relatively less similarity can be included in the same group of similar documents.
Similarity analysis may find one or more groups of similar documents in a results list. No document is included in more than one group. Conversely, if no documents in the list are similar enough to be grouped together, the results list contains no groups.
After similarity analysis has identified and grouped similar documents, it chooses one document in each group as the "lead document". These selections are based on user preferences that you can set up. In the results list, a lead document icon next to the title marks a document as a lead document. Aside from this, lead documents are no different from any other documents in the list.
The remaining documents in the group are called "shadow documents". Shadow documents do not appear among the listed documents in the results list. You can access them, however, by clicking a link immediately below the lead document entry or by clicking the lead document icon. For information on how to view shadow documents, see How do I view shadow documents?
Documents that are not included in any group of similar documents are called "distinct documents".
With similarity analysis on, each time you re-sort or change the view of your results list, the service re-analyses the documents in the list, re-groups them, and re-designates lead documents. Therefore, existing groups of similar documents or their lead documents may change.
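For readers who find it easier to picture the grouping as a procedure, the following Python sketch illustrates the terms used in this topic. It is not the service's actual method: the word-overlap measure (`jaccard`), the threshold values in `THRESHOLDS`, the function `group_duplicates`, and the rule that the first document in list order becomes the lead document are all assumptions made for illustration. In the service, lead documents are chosen according to your preferences.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, from 0.0 (none) to 1.0 (identical sets)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


# Hypothetical thresholds; the service's real values are not documented here.
THRESHOLDS = {"high": 0.9, "moderate": 0.6}


def group_duplicates(docs, level):
    """Group (title, text) pairs given in results-list order.

    Returns (groups, distinct). Each group is a dict with a "lead" title
    and a list of "shadows". Distinct documents belong to no group.
    """
    if level == "off":
        # No similarity analysis: every document is listed on its own.
        return [], [title for title, _ in docs]

    threshold = THRESHOLDS[level]
    clusters = []  # each cluster is a list of (title, text); first entry is the lead
    for title, text in docs:
        for cluster in clusters:
            # Compare against the cluster's lead only, so a document
            # can never land in more than one group.
            if jaccard(text, cluster[0][1]) >= threshold:
                cluster.append((title, text))
                break
        else:
            clusters.append([(title, text)])

    groups = [
        {"lead": c[0][0], "shadows": [t for t, _ in c[1:]]}
        for c in clusters
        if len(c) > 1
    ]
    distinct = [c[0][0] for c in clusters if len(c) == 1]
    return groups, distinct


docs = [
    ("A", "court rules on merger appeal today"),
    ("B", "court rules on merger appeal today"),
    ("C", "court rules on merger appeal this week"),
    ("D", "weather forecast for the weekend"),
]

print(group_duplicates(docs, "high"))
print(group_duplicates(docs, "moderate"))
print(group_duplicates(list(reversed(docs)), "moderate"))
```

With these sample documents, the "high" setting groups only the identical pair A and B, while "moderate" also pulls C into the same group. Running the analysis on the reversed list produces the same group members but a different lead document, which mirrors how re-sorting a results list can change the lead documents.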