---
license: mit
tags:
- molecule-generation
- cheminformatics
- biochemical-language-models
widget:
- text: "c1ccc2c(c1)"
example_title: "Scaffold Hopping"
---
## ChemBERTaLM
A molecule generator fine-tuned from the [ChemBERTa](https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k) checkpoint. It was introduced in the paper "Exploiting pretrained biochemical language models for targeted drug design", published in *Bioinformatics* (Oxford University Press), and first released in [this repository](https://github.com/boun-tabi/biochemical-lms-for-drug-design).

ChemBERTaLM is a RoBERTa model initialized from the [ChemBERTa](https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k) checkpoint and then fine-tuned on the MOSES dataset, a collection of drug-like compounds represented as SMILES strings.
## How to use
```python
from transformers import RobertaForCausalLM, RobertaTokenizer, pipeline

tokenizer = RobertaTokenizer.from_pretrained("gokceuludogan/ChemBERTaLM")
model = RobertaForCausalLM.from_pretrained("gokceuludogan/ChemBERTaLM")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate a molecule from scratch; with do_sample=True each run differs.
generator("", max_length=128, do_sample=True)
# Sample output
[{'generated_text': 'Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1'}]

# A partial SMILES string can also be passed as a prompt to steer generation,
# e.g. the scaffold "c1ccc2c(c1)" used in the widget example above.
```
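Since sampling can produce malformed SMILES, generated strings should be validated before downstream use. A full chemistry-aware check requires a toolkit such as RDKit (not shown here); as a minimal dependency-free sketch, the helper below (a hypothetical function, not part of this model or the paper) only verifies that parentheses and brackets are balanced and that every ring-closure digit appears an even number of times:

```python
def looks_like_valid_smiles(smiles: str) -> bool:
    """Cheap structural sanity check for a SMILES string:
    balanced ()/[] and every ring-closure label paired.
    Not a chemistry-aware parser; use RDKit for real validation."""
    depth = 0        # current '(' nesting level
    bracket = 0      # current '[' nesting level
    ring_counts = {} # ring-closure label -> occurrence count
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "[":
            bracket += 1
        elif ch == "]":
            bracket -= 1
            if bracket < 0:
                return False
        elif ch == "%" and i + 2 < len(smiles):
            # two-digit ring closure, e.g. %12
            label = smiles[i + 1:i + 3]
            ring_counts[label] = ring_counts.get(label, 0) + 1
            i += 2
        elif ch.isdigit() and bracket == 0:
            # single-digit ring closure (digits inside [] are charges/isotopes)
            ring_counts[ch] = ring_counts.get(ch, 0) + 1
        i += 1
    return depth == 0 and bracket == 0 and all(
        count % 2 == 0 for count in ring_counts.values()
    )

# The sample output above passes; the widget's scaffold prompt is an
# intentionally incomplete fragment (ring bond 2 unpaired), so it fails.
print(looks_like_valid_smiles("Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1"))  # True
print(looks_like_valid_smiles("c1ccc2c(c1)"))  # False
```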
## Citation
```bibtex
@article{10.1093/bioinformatics/btac482,
    author  = {Uludoğan, Gökçe and Ozkirimli, Elif and Ulgen, Kutlu O. and Karalı, Nilgün Lütfiye and Özgür, Arzucan},
    title   = "{Exploiting Pretrained Biochemical Language Models for Targeted Drug Design}",
    journal = {Bioinformatics},
    year    = {2022},
    doi     = {10.1093/bioinformatics/btac482},
    url     = {https://doi.org/10.1093/bioinformatics/btac482}
}
```