Downloads
Curated datasets
Pathogenesis (csv: 12 records, 6.91 KB)
Therapeutic Strategies (csv: 15 records, 9.12 KB)
Clinical Trials (csv: 3390 records, 1.83 MB)
Research Articles (csv: 98453 records, 135.74 MB)
Investigational Drugs (csv: 200 records, 120.98 KB)
Bioactive Compounds (csv: 200606 records, 146.68 MB)
Drug Targets (csv: 1404 records, 149.06 KB)
Experimental Models (csv: 77 records, 25.97 KB)
Associated Diseases (csv: 2 records, 2.71 KB)
Therapeutic Targets (csv: 103 records, 15.66 KB)
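Several of the files above exceed 100 MB, so it can help to stream them row by row instead of loading them whole. The following is a minimal sketch using only the Python standard library. The column names (`trial_id`, `phase`) and the in-memory sample are hypothetical stand-ins, since the actual schemas are not listed here.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, column):
    """Stream rows from an open CSV handle and tally the values of one column."""
    reader = csv.DictReader(handle)
    total = 0
    counts = Counter()
    for row in reader:
        total += 1
        # Empty or absent values are grouped under a placeholder key.
        counts[row.get(column) or "(missing)"] += 1
    return total, counts


# Toy stand-in for a downloaded file; these column names are assumptions.
sample = io.StringIO(
    "trial_id,phase\n"
    "T1,Phase 2\n"
    "T2,Phase 3\n"
    "T3,Phase 2\n"
)
total, by_phase = summarize_csv(sample, "phase")
print(total, by_phase.most_common(1))
```

For a real download, the same function could be given a handle from `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption and should be checked against the file.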
AI-ready datasets
(The code used to generate these datasets is available on GitHub.)
RDKit Fingerprints
Description:
These datasets are suitable for users prioritizing interpretability, computational efficiency, and classical cheminformatics workflows.
Recommended use cases:
- Structure-based similarity search & filtering
- Quantitative structure-activity relationship (QSAR) analysis
- Machine learning tasks for classification or regression
- Large-scale and high-throughput molecular screening
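As an illustration of the similarity search and filtering use case, the sketch below ranks a toy library by Tanimoto similarity to a query. It assumes each fingerprint is represented as a set of on-bit indices; the actual file layout of the fingerprint datasets is not described here, and the compound names and bit positions are invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query, library, threshold=0.0):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each value is the set of bits that are switched on.
library = {
    "cmpd_A": {1, 4, 9, 16},
    "cmpd_B": {1, 4, 9, 25},
    "cmpd_C": {2, 3, 5},
}
query = {1, 4, 9, 16}

hits = rank_by_similarity(query, library, threshold=0.5)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

A threshold of 0.5 is arbitrary here; a suitable cutoff depends on the fingerprint type and the screening goal.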
Datasets download:
MolFormer Embeddings
Description:
These datasets provide learned molecular embeddings that integrate flexibly into modern machine learning models and support scalable knowledge discovery.
Recommended use cases:
- Dimensionality reduction and visualization
- Embedding-based similarity and nearest-neighbor drug discovery
- AI/Machine learning with transferable molecular features
- Cross-dataset comparison and drug repurposing
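The embedding-based nearest-neighbor use case can be sketched with cosine similarity over dense vectors. The example below uses only the standard library and three-dimensional toy vectors with invented names; real embeddings would have many more dimensions, and how they are stored in the downloadable files is an assumption left open here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query, embeddings, k=2):
    """Return the k entries most similar to the query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings keyed by compound name.
embeddings = {
    "drug_X": [1.0, 0.0, 0.0],
    "drug_Y": [0.9, 0.1, 0.0],
    "drug_Z": [0.0, 1.0, 0.0],
}

neighbors = nearest_neighbors([1.0, 0.0, 0.0], embeddings, k=2)
for name, score in neighbors:
    print(f"{name}\t{score:.3f}")
```

This brute-force scan compares the query against every entry, which is fine for small sets; for a collection on the scale of the Bioactive Compounds file, a vectorized or approximate nearest-neighbor index would likely be preferable.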