Large-scale similarity search profiling of ChEMBL compound data sets.

J Chem Inf Model 51(8):1831-9 (2011) PMID 21728295

A large-scale similarity search investigation has been carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis was performed using two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum of these types of fingerprints, i.e., MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied. On the basis of these search calculations, a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful. In 203 of 266 test cases (∼76%), a compound recovery rate of at least 50% was observed with at least the better performing fingerprint and one search strategy. The similarity search profile also revealed several general trends. For example, fingerprint searching was often characterized by an early enrichment of active compounds in database selection sets. In addition, compound activity classes have been categorized according to different similarity search performance levels, which helps to put the results of benchmark calculations into perspective. Therefore, a compendium of activity classes falling into different search performance categories is provided. On the basis of our large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.
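The kind of calculation summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each fingerprint is represented as a set of "on" bit indices, that similarity is measured with the Tanimoto coefficient, and that a k-nearest-neighbor strategy scores a database compound by the mean of its k highest similarities to the reference compounds. The abstract does not name its three search strategies, so the function names, the choice of `k`, and the selection-set size are all hypothetical.

```python
from typing import FrozenSet, Sequence

# Assumption: a fingerprint is the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_score(candidate: Fingerprint,
              references: Sequence[Fingerprint],
              k: int = 1) -> float:
    """Mean of the k highest similarities between a candidate and the references.

    With k=1 this reduces to taking the single most similar reference compound.
    """
    if not references:
        raise ValueError("at least one reference compound is required")
    sims = sorted((tanimoto(candidate, r) for r in references), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def recovery_rate(references: Sequence[Fingerprint],
                  database: Sequence[Fingerprint],
                  is_active: Sequence[bool],
                  k: int = 1,
                  selection_size: int = 100) -> float:
    """Fraction of all active database compounds found in the top-ranked selection set."""
    scores = [knn_score(fp, references, k) for fp in database]
    # Rank database compounds by decreasing score; ties keep database order.
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:selection_size]
    total_actives = sum(is_active)
    if total_actives == 0:
        return 0.0
    found = sum(1 for i in selected if is_active[i])
    return found / total_actives
```

Under these assumptions, an activity class would count toward the reported 203 of 266 cases (203/266 ≈ 0.76) if `recovery_rate` reached at least 0.5 for at least one fingerprint and one search strategy.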

DOI: 10.1021/ci200199u

Similar articles you may find interesting…

  1. Relic Vector Field and CMB Large Scale Anomalies

    arXiv:1305.4794 [astro-ph.CO] 21 May 2013

    We study the most general effects of relic vector fields on the inflationary background and density perturbations. Such effects are observable if the number of inflationary e-folds is close to the minimum required to solve the horizon problem. We show that this can potentially explain three CMB larg...
  2. Collapse of a cylindrically symmetric, self-similar scalar field with non-minimal coupling

    arXiv:1305.4866 [gr-qc] 21 May 2013

    We investigate self-similar scalar field solutions to the Einstein equations in whole cylinder symmetry. Imposing self-similarity on the spacetime gives rise to a set of single variable functions describing the metric. Furthermore, it is shown that the scalar field is dependent on a single unknown f...
  3. Devil's crevasse and macroscopic entanglement in two-component Bose-Einstein condensates

    arXiv:1305.5095 [quant-ph] 22 May 2013

    We give a detailed study of entanglement generated between two spin-1/2 BECs due to an Sz1 Sz2 interaction. The states that are generated show a remarkably rich structure showing fractal characteristics. In the limit of large particle number N, the entanglement shows a strong dependence upon whether...