Combinatorial chemistry and high-throughput screening create a demand for long-term storage of large compound collections. Poor chemical stability in a compound library can result in unacceptable false positives and false negatives in biological assays.

ChemStable is a free web server for in silico prediction of compound chemical instability, based on Bayesian modeling and the atom-centered fragment (ACF) generation method. The prediction model was built from a data set of 9,746 structurally diverse compounds with experimental chemical stability data in DMSO/H2O solutions stored at 50 °C for 105 days.

To use ChemStable, simply draw a chemical structure in the ChemDoodle window or upload a library in SDF format.

mTOR Predictor

Mammalian target of rapamycin (mTOR) is a highly conserved serine/threonine protein kinase (PK) and a vital component of the PI3K/Akt/mTOR signaling pathway. This pathway is deregulated in 50% of all human cancers. mTOR is a central controller of this pathway, regulating cell growth, proliferation, metabolism and angiogenesis, so there is great interest in developing clinical drugs that target mTOR.

mTORPredictor is a free web server for predicting whether a compound is an mTOR inhibitor or a non-inhibitor. It was developed using atom-centered fragments (ACFs) and a Bayesian method. The model was built from a data set of 1,264 structurally diverse compounds with experimental mTOR inhibition assay data. The scaffold hopping ability of mTORPredictor was also evaluated successfully. Detailed results can be found in our paper.

WEGA     Email Request

Shape comparison techniques based on Gaussian functions are widely used in virtual screening for drug discovery. For efficiency, most of them adopt the First Order Gaussian Approximation (FOGA), in which the shape density of a molecule is represented as a simple sum of the individual atomic shape densities. The Weighted Gaussian Algorithm (WEGA) is a new approach proposed to improve the accuracy of this first-order approximation. It significantly improves the accuracy of molecular volumes and reduces the error of shape similarity calculations by 37%, using the hard-sphere model as the reference, while keeping the simplicity and efficiency of FOGA. A program based on the new method has been implemented for molecular overlay and shape-based virtual screening. With more accurate shape similarity scores, the new algorithm also improves virtual screening results, particularly when a shape-feature combo scoring function is used.
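The first-order approximation described above can be sketched in a few lines. This is a minimal illustration of FOGA only, not the WEGA program: it assumes each atom is a spherical Gaussian whose width is chosen so that its volume matches the hard-sphere volume, and it assumes a commonly used amplitude of 2√2. The function names and the `(x, y, z, radius)` atom layout are hypothetical. WEGA would additionally scale each atomic Gaussian by a weight; that weighting formula is not reproduced here.

```python
import math

# Gaussian amplitude; 2*sqrt(2) is a commonly used value and is an assumption here.
P = 2.0 * math.sqrt(2.0)

def alpha(radius):
    """Gaussian exponent chosen so the Gaussian volume equals the hard-sphere volume."""
    return math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / radius ** 2

def pair_overlap(a, b):
    """Overlap integral of two atomic Gaussians; each atom is (x, y, z, radius)."""
    ai, aj = alpha(a[3]), alpha(b[3])
    d2 = sum((a[k] - b[k]) ** 2 for k in range(3))
    return P * P * math.exp(-ai * aj * d2 / (ai + aj)) * (math.pi / (ai + aj)) ** 1.5

def foga_overlap(mol_a, mol_b):
    """First-order overlap: a plain sum over all atom pairs, one atom from each molecule."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)

def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto from the cross overlap and the two self-overlaps."""
    v_ab = foga_overlap(mol_a, mol_b)
    return v_ab / (foga_overlap(mol_a, mol_a) + foga_overlap(mol_b, mol_b) - v_ab)
```

Because the score depends on the relative placement of the two molecules, a real overlay program would also optimize the rotation and translation of one molecule before reporting the similarity; this sketch scores a fixed pose.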


The abundance of compound bioactivity data and publicly accessible chemistry-oriented databases creates a good opportunity for ligand-based drug discovery. Machine learning plays an essential role in ligand-based virtual screening for drug lead discovery. Bayesian learning, one of the most important machine learning tools, is tolerant of noisy data, fast and efficient. This makes Bayesian learning well suited to handling HTS data, and it yields respectable results in many cases.

To exploit the full potential of big data in chemogenomics and Bayesian learning, we developed LBVS, an online virtual screening platform based on a naive Bayesian classifier. The popular public databases BindingDB and ChEMBL were used as data sources, and target-directed ligand-based virtual screening based on the naive Bayesian classifier was implemented in LBVS.
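As a rough illustration of how a naive Bayesian classifier scores a compound, the sketch below computes a log-odds score from the structural features present in a query, with add-one smoothing. It is a simplified, presence-only variant written for this page, not the LBVS implementation; the function names and the representation of a compound as a set of feature labels are assumptions.

```python
import math
from collections import Counter

def train(samples):
    """samples: list of (set_of_features, is_active). Returns per-class count tables."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for features, is_active in samples:
        totals[is_active] += 1
        counts[is_active].update(features)  # each feature counted once per compound
    return counts, totals

def log_odds(model, features):
    """Log-odds that a compound is active, using only the features it contains."""
    counts, totals = model
    # Smoothed class prior ratio.
    score = math.log((totals[True] + 1) / (totals[False] + 1))
    for f in features:
        # Add-one smoothing keeps unseen features from producing log(0).
        p_active = (counts[True][f] + 1) / (totals[True] + 2)
        p_inactive = (counts[False][f] + 1) / (totals[False] + 2)
        score += math.log(p_active / p_inactive)
    return score
```

A positive score favors the active class and a negative score favors the inactive class; a library can be ranked by sorting compounds on this score.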


Target dependent molecular similarity (TDMS) is a molecular similarity calculation method that combines chemical data and bioactivity data. The similarity of two molecules varies from one target to another, so the fingerprints of the molecules are weighted using bioactivity data. The growing volume of chemical-biological big data makes it possible to calculate similarity more specifically. TDMS is an attempt to mine this big data to improve on traditional similarity calculation methods. Virtual screening and scaffold hopping tests show that TDMS outperforms traditional similarity methods when appropriate weighting schemes are used.
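One simple way to picture a target-dependent similarity is a Tanimoto coefficient in which each fingerprint bit carries a weight derived from bioactivity data for a given target. The sketch below is an illustration under that assumption, not the published TDMS weighting scheme; `weighted_tanimoto` and the per-target weight tables are hypothetical.

```python
def weighted_tanimoto(bits_a, bits_b, weights, default=1.0):
    """Tanimoto similarity over sets of fingerprint bits, with a weight per bit.

    weights maps a bit to its (hypothetical) target-specific weight; bits that
    are missing from the table fall back to `default`.
    """
    common = bits_a & bits_b
    union = bits_a | bits_b
    denom = sum(weights.get(b, default) for b in union)
    if denom == 0:
        return 0.0
    return sum(weights.get(b, default) for b in common) / denom

# The same pair of molecules scored against two hypothetical targets.
mol_a, mol_b = {1, 2, 3}, {2, 3, 4}
weights_target_x = {2: 3.0}   # bit 2 matters for target X
weights_target_y = {1: 3.0}   # bit 1 matters for target Y
sim_x = weighted_tanimoto(mol_a, mol_b, weights_target_x)
sim_y = weighted_tanimoto(mol_a, mol_b, weights_target_y)
```

With all weights equal, the function reduces to the ordinary Tanimoto coefficient; with target-specific weights, the same two molecules receive different similarities for different targets.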


The Annotated Scaffold Database (ASDB) is an open, scaffold-oriented database built through systematic annotation. It contains 333,601 unique scaffold entries, and each scaffold is annotated with chemical identification, properties, drugs, natural products, and target information. The DrugBank, ChEMBL, UniProt and TCMSP databases are linked to ASDB.

Users may query ASDB by text or by chemical structure. Text queries support target names and target IDs (UniProt ID or ChEMBL ID). Chemical queries support full structure, substructure and similarity searches.

Scaffold-based polypharmacology network construction is provided for network visualization and analysis. One target can be associated with multiple scaffolds, and one scaffold may act on multiple targets; together these relationships form the scaffold-based polypharmacology network.
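The many-to-many relationship between scaffolds and targets can be pictured as a bipartite graph. The sketch below builds such a graph from scaffold-target pairs; it is only an illustration, and the function names and the placeholder identifiers are hypothetical rather than ASDB records.

```python
from collections import defaultdict

def build_network(pairs):
    """Build both directions of a bipartite scaffold-target graph from (scaffold, target) pairs."""
    scaffold_to_targets = defaultdict(set)
    target_to_scaffolds = defaultdict(set)
    for scaffold, target in pairs:
        scaffold_to_targets[scaffold].add(target)
        target_to_scaffolds[target].add(scaffold)
    return scaffold_to_targets, target_to_scaffolds

def multi_target_scaffolds(scaffold_to_targets, min_targets=2):
    """Scaffolds linked to at least `min_targets` targets, in sorted order."""
    return sorted(s for s, t in scaffold_to_targets.items() if len(t) >= min_targets)

# Placeholder identifiers for illustration only.
pairs = [
    ("scaffold_1", "target_A"),
    ("scaffold_1", "target_B"),
    ("scaffold_2", "target_A"),
]
s2t, t2s = build_network(pairs)
```

Here `multi_target_scaffolds(s2t)` picks out the scaffolds that connect several targets, which are the nodes of most interest in a polypharmacology analysis.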


Identifying the protein targets of a bioactive agent is critical for drug development. Experimentally identifying and validating a target for a bioactive agent is time-consuming and costly, so we developed PTS to predict potential targets conveniently. PTS is based on the principle that molecules with similar three-dimensional shapes share the same set of targets. It takes a query structure (a biologically active agent) and superimposes it on all ligands in the database. It then recommends potential targets in order of similarity score.
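The final ranking step can be sketched as follows, assuming the shape similarity of the query to every database ligand has already been computed. This is not the PTS code: the function name is hypothetical, and scoring each target by its best-matching ligand is an assumption made for the example.

```python
def rank_targets(similarities, ligand_targets):
    """Rank targets by the best similarity score among their known ligands.

    similarities: ligand id -> similarity score between the query and that ligand.
    ligand_targets: ligand id -> set of targets annotated for that ligand.
    """
    best = {}
    for ligand, score in similarities.items():
        for target in ligand_targets.get(ligand, ()):
            if score > best.get(target, float("-inf")):
                best[target] = score
    # Highest score first; ties are broken alphabetically by target name.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))

# Placeholder data for illustration only.
similarities = {"lig_1": 0.91, "lig_2": 0.45, "lig_3": 0.78}
ligand_targets = {
    "lig_1": {"target_A"},
    "lig_2": {"target_A", "target_B"},
    "lig_3": {"target_C"},
}
ranking = rank_targets(similarities, ligand_targets)
```

Other aggregation choices, such as averaging the top few ligand scores per target, would fit the same structure.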

PNDD for T2D


Covalent binding molecules exert their characteristic biological functions through irreversible or reversible covalent attachment to their targets. We present the Covalent Binder Database (cBinderDB), the first online database that provides a central resource for covalent binding molecules, their targets and mechanism information. The covalent binding molecules and related targets were collected from the scientific literature and further annotated using public databases such as ChEMBL, DrugBank, PubChem, RCSB PDB, UniProt and HGNC. Currently, cBinderDB contains 527 covalent binding molecules associated with 200 targets. The molecules are annotated with physicochemical properties, mechanism of covalent binding, corresponding targets, indication cross-links and so on. Targets are annotated with biological function, nucleophile residues, related diseases modulators, etc. Text and structure query tools are provided for searching the database. cBinderDB could be an important resource for the discovery of covalent binding molecules.


Chem Filter helps drug developers obtain safer compounds.

S2MA     Download

Molecular similarity methods are widely used in drug discovery. Most of them are global similarity methods that concentrate on overall structure comparison. However, they are limited when the molecules differ greatly in size or show discontinuous SAR (such as activity cliffs). Because of these problems, local similarity methods have gained more attention. They focus on the substructures that contribute most to activity. Molecules that share the same or similar substructures are more likely to be drug candidates.

We propose a new algorithm called the Steric Substructure Match Algorithm (S2MA). The substructures that contribute most to activity are defined by reference to the receptor-ligand binding mode, key fragments or chemical knowledge. These substructures are emphasized in the molecular overlay, and the remaining structure is also overlapped as far as possible. S2MA has been validated for accuracy, and the results show that it performs better in virtual screening than the Weighted Gaussian Algorithm (WEGA). The corresponding program has also been implemented.


TCMAnalyzer is a free web tool for analyzing traditional Chinese medicine (TCM) data. It aims to elucidate the underlying molecular mechanisms of TCM using network pharmacology methods. Chemoinformatics and bioinformatics tools, together with systematic annotations of compounds, molecular targets, pathways and diseases, provide a good opportunity to annotate TCM. TCMAnalyzer can construct various interaction networks, including TCM-ingredient-target, TCM-ingredient-disease and TCM-ingredient-pathway networks. These system-level networks help in better understanding the mechanism of action (MOA) of TCM. TCMAnalyzer also provides structure mining for screening bioactive TCM.