---
license: mit
tags:
- molecule-generation
- cheminformatics
- biochemical-language-models
widget:
- text: "c1ccc2c(c1)"
  example_title: "Scaffold Hopping"
---

## ChemBERTaLM

A molecule generator model finetuned from the [ChemBERTa](https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k) checkpoint. It was introduced in the paper ["Exploiting pretrained biochemical language models for targeted drug design"](https://doi.org/10.1093/bioinformatics/btac482), published in *Bioinformatics* (Oxford University Press), and first released in [this repository](https://github.com/boun-tabi/biochemical-lms-for-drug-design).

ChemBERTaLM is a RoBERTa model initialized from the ChemBERTa checkpoint and then finetuned on the MOSES dataset, a benchmark collection of drug-like compounds.

## How to use

```python
from transformers import RobertaForCausalLM, RobertaTokenizer, pipeline

tokenizer = RobertaTokenizer.from_pretrained("gokceuludogan/ChemBERTaLM")
model = RobertaForCausalLM.from_pretrained("gokceuludogan/ChemBERTaLM")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Start from an empty prompt to sample molecules from scratch; a SMILES
# prefix (e.g. a scaffold) can be passed instead to condition generation.
generator("", max_length=128, do_sample=True)

# Sample output:
# [{'generated_text': 'Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1'}]
```
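
Sampled SMILES strings are not guaranteed to be chemically valid, so generated candidates are typically filtered before use. Below is a minimal pure-Python syntax filter as a sketch (the `smiles_sanity_check` helper is hypothetical, not part of the model's tooling): it only checks that branch parentheses balance and that every ring-closure digit is paired. Real validation should parse the string, e.g. with RDKit's `Chem.MolFromSmiles`.

```python
from collections import Counter

def smiles_sanity_check(smiles: str) -> bool:
    """Cheap syntactic filter for generated SMILES strings.

    Only checks parenthesis balance and paired ring-closure labels;
    it is NOT a full SMILES parser.
    """
    depth = 0                  # current '(' nesting depth
    ring_closures = Counter()  # ring-bond labels seen so far
    in_brackets = False        # inside a [...] atom, where digits mean counts/charge
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_brackets = True
        elif ch == "]":
            in_brackets = False
        elif not in_brackets:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # closing a branch that was never opened
                    return False
            elif ch == "%":    # two-digit ring closure, e.g. %12
                ring_closures[smiles[i + 1:i + 3]] += 1
                i += 2
            elif ch.isdigit():  # single-digit ring closure
                ring_closures[ch] += 1
        i += 1
    # every branch closed, every ring label used an even number of times
    return depth == 0 and all(n % 2 == 0 for n in ring_closures.values())

# The sample output above passes the filter:
print(smiles_sanity_check("Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1"))  # True
# The bare scaffold prompt from the widget has an unclosed ring bond ('2'):
print(smiles_sanity_check("c1ccc2c(c1)"))  # False
```

In practice, generations that fail such checks are simply dropped and sampling is repeated until enough valid molecules are collected.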

## Citation

```bibtex
@article{10.1093/bioinformatics/btac482,
  author  = {Uludoğan, Gökçe and Ozkirimli, Elif and Ulgen, Kutlu O. and Karalı, Nilgün Lütfiye and Özgür, Arzucan},
  title   = "{Exploiting Pretrained Biochemical Language Models for Targeted Drug Design}",
  journal = {Bioinformatics},
  year    = {2022},
  doi     = {10.1093/bioinformatics/btac482},
  url     = {https://doi.org/10.1093/bioinformatics/btac482}
}
```