---
license: mit
tags:
- molecule-generation
- cheminformatics
- biochemical-language-models
widget:
- text: "c1ccc2c(c1)"
  example_title: "Scaffold Hopping"
---

## ChemBERTaLM

A molecule generator model finetuned from the [ChemBERTa](https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k) checkpoint.

It was introduced in the paper "Exploiting pretrained biochemical language models for targeted drug design", published in *Bioinformatics* by Oxford University Press, and first released in [this repository](https://github.com/boun-tabi/biochemical-lms-for-drug-design).

ChemBERTaLM is a RoBERTa model initialized with the [ChemBERTa](https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k) checkpoint and then finetuned on the MOSES dataset, a collection of drug-like compounds.

## How to use

```python
from transformers import RobertaForCausalLM, RobertaTokenizer, pipeline

tokenizer = RobertaTokenizer.from_pretrained("gokceuludogan/ChemBERTaLM")
model = RobertaForCausalLM.from_pretrained("gokceuludogan/ChemBERTaLM")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate a molecule from scratch by prompting with an empty string
generator("", max_length=128, do_sample=True)

# Sample output
[{'generated_text': 'Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1'}]
```

## Citation

```bibtex
@article{10.1093/bioinformatics/btac482,
  author  = {Uludoğan, Gökçe and Ozkirimli, Elif and Ulgen, Kutlu O. and Karalı, Nilgün Lütfiye and Özgür, Arzucan},
  title   = "{Exploiting Pretrained Biochemical Language Models for Targeted Drug Design}",
  journal = {Bioinformatics},
  year    = {2022},
  doi     = {10.1093/bioinformatics/btac482},
  url     = {https://doi.org/10.1093/bioinformatics/btac482}
}
```
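
## Scaffold-conditioned generation

The widget prompt above (`c1ccc2c(c1)`, a partial fused-ring SMILES) suggests seeding the generator with a scaffold fragment instead of an empty string. Below is a minimal sketch of this usage, reusing the pipeline from "How to use"; the sampling settings (`num_return_sequences`, and sampling via `do_sample`) are illustrative choices, not values prescribed by the paper.

```python
from transformers import RobertaForCausalLM, RobertaTokenizer, pipeline

tokenizer = RobertaTokenizer.from_pretrained("gokceuludogan/ChemBERTaLM")
model = RobertaForCausalLM.from_pretrained("gokceuludogan/ChemBERTaLM")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Seed generation with the partial SMILES from the widget example;
# the model completes the fragment into full molecules.
outputs = generator(
    "c1ccc2c(c1)",           # scaffold fragment used as the prompt
    max_length=128,
    do_sample=True,
    num_return_sequences=5,  # illustrative: draw several candidate completions
)
for out in outputs:
    print(out["generated_text"])
```

Note that sampled strings are not guaranteed to be valid SMILES; filtering candidates with a cheminformatics toolkit such as RDKit (e.g., `Chem.MolFromSmiles`) is a common post-processing step.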