Downloads

Curated datasets

Pathogenesis (csv: 12 records, 6.91 KB)

Therapeutic Strategies (csv: 15 records, 9.12 KB)

Clinical Trials (csv: 3390 records, 1.83 MB)

Research Articles (csv: 98453 records, 135.74 MB)

Investigational Drugs (csv: 200 records, 120.98 KB)

Bioactive Compounds (csv: 200606 records, 146.68 MB)

Drug Targets (csv: 1404 records, 149.06 KB)

Experimental Models (csv: 77 records, 25.97 KB)

Associated Diseases (csv: 2 records, 2.71 KB)

Therapeutic Targets (csv: 103 records, 15.66 KB)

AI-ready datasets

(The code used to generate these datasets is available on GitHub)

🧪

RDKit Fingerprints

Description:

These datasets are suitable for users prioritizing interpretability, computational efficiency, and classical cheminformatics workflows.

Recommended use cases:

  • Structure-based similarity search & filtering
  • Quantitative structure-activity relationship (QSAR) analysis
  • Machine learning tasks for classification or regression
  • Large-scale and high-throughput molecular screening
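
As a rough illustration of the similarity search and filtering use case, the sketch below ranks a small library against a query by Tanimoto coefficient. It assumes each fingerprint is available as a string of 0s and 1s; the actual file layout and column names are not stated on this page, and the compound names, bit strings, and helper functions are hypothetical. It uses only the Python standard library; RDKit's own similarity functions would normally be preferred when the fingerprints are RDKit objects.

```python
from __future__ import annotations


def on_bits(bitstring: str) -> set[int]:
    """Convert a '0'/'1' string into the set of positions that are set."""
    return {i for i, ch in enumerate(bitstring) if ch == "1"}


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(
    query: str, library: dict[str, str], threshold: float = 0.0
) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above threshold, best match first."""
    q = on_bits(query)
    scored = [(name, tanimoto(q, on_bits(bits))) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints (8 bits each), for illustration only.
library = {
    "cpd_A": "11010010",
    "cpd_B": "11010000",
    "cpd_C": "00101101",
}
hits = rank_by_similarity("11010010", library, threshold=0.5)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

The threshold acts as the filter: compounds sharing too few on-bits with the query are dropped before ranking.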
🤖

MolFormer Embeddings

Description:

These datasets support advanced AI-driven analyses with flexible integration into modern machine learning models and scalable knowledge discovery.

Recommended use cases:

  • Dimensionality reduction and visualization
  • Embedding-based similarity and nearest-neighbor drug discovery
  • AI/Machine learning with transferable molecular features
  • Cross-dataset comparison and drug repurposing
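
As a rough illustration of embedding-based nearest-neighbor search, the sketch below ranks stored vectors by cosine similarity to a query vector. It assumes each compound's embedding can be read as a list of floats; the storage format and dimensionality of the MolFormer embeddings are not stated on this page, and the names, three-dimensional toy vectors, and helper functions are hypothetical. For the full datasets a vectorized library would be a more practical choice than pure Python.

```python
from __future__ import annotations

import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(
    query: list[float], embeddings: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Return the k entries most similar to the query, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "drug_A": [1.0, 0.0, 0.0],
    "drug_B": [0.6, 0.8, 0.0],
    "drug_C": [0.0, 0.0, 1.0],
}
neighbors = nearest_neighbors([1.0, 0.0, 0.0], embeddings, k=2)
for name, score in neighbors:
    print(f"{name}\t{score:.3f}")
```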