We observed that the average clustering coefficient, plotted as a function of the similarity threshold, exhibits a peak that spans a range of threshold values. This peak is preceded by a steep decline in the number of edges of the similarity network, and its maximum aligns well with the best clustering outcome. Thus, when no reference set is available, the similarity threshold at this peak is a near-ideal setting for the subsequent network cluster analysis. The proposed method can serve as a general approach for choosing the similarity threshold used to generate the similarity network of large-scale molecular datasets.
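
The threshold scan described above can be illustrated with a minimal sketch. It is not the authors' OpenMP implementation. The function names (`average_clustering`, `acc_vs_threshold`, `peak_threshold`), the toy similarity values, and the rule of taking the highest interior local maximum of the curve are all assumptions made for illustration. Nodes with fewer than two neighbors are assumed to contribute a clustering coefficient of 0.

```python
from itertools import combinations


def average_clustering(n, edges):
    """Average local clustering coefficient of an undirected graph on nodes 0..n-1."""
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    total = 0.0
    for nbrs in neighbors:
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such nodes contribute 0 to the average
        # Count edges that connect two neighbors of the current node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += 2.0 * links / (k * (k - 1))
    return total / n if n else 0.0


def acc_vs_threshold(similarity, thresholds):
    """Return (threshold, edge count, average clustering coefficient) per threshold.

    similarity maps node pairs (i, j) to a similarity value; an edge is kept
    when its similarity is greater than or equal to the threshold.
    """
    n = 1 + max(max(pair) for pair in similarity)
    curve = []
    for t in thresholds:
        edges = [pair for pair, s in similarity.items() if s >= t]
        curve.append((t, len(edges), average_clustering(n, edges)))
    return curve


def peak_threshold(curve):
    """Threshold of the highest interior local maximum of the curve, or None.

    Endpoints are skipped because a very low threshold gives a near-complete
    graph whose clustering coefficient is trivially close to 1.
    """
    best = None
    for i in range(1, len(curve) - 1):
        prev_acc = curve[i - 1][2]
        t, _, acc = curve[i]
        next_acc = curve[i + 1][2]
        if acc > prev_acc and acc >= next_acc and (best is None or acc > best[1]):
            best = (t, acc)
    return best[0] if best else None


# Hypothetical toy data: two tight triangles joined by one weaker link.
similarity = {pair: 0.3 for pair in combinations(range(6), 2)}
similarity.update({(0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.9})
similarity.update({(3, 4): 0.8, (3, 5): 0.8, (4, 5): 0.8})
similarity[(2, 3)] = 0.5

curve = acc_vs_threshold(similarity, [0.2, 0.4, 0.6, 0.85, 0.95])
print(peak_threshold(curve))
```

On this toy data the edge count drops from 15 to 7 between thresholds 0.2 and 0.4, and the selected threshold is 0.6, where only the two triangles remain. Real molecular similarity networks would need a much finer threshold grid and a more scalable clustering coefficient computation.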

Gergely Zahoránszky‑Kőhalmi, Cristian G. Bologa and Tudor I. Oprea*

The Git repositories can be cloned as follows:
git clone https://cheminfonet@bitbucket.org/cheminfonet/accvsthreshold_openmp.git
git clone https://cheminfonet@bitbucket.org/cheminfonet/networkclusteringutilities.git