---
license: mit
tags:
- molecule-generation
- cheminformatics
- biochemical-language-models
widget:
- text: "c1ccc2c(c1)"
example_title: "Scaffold Hopping"
---
## ChemBERTaLM
A molecule generator fine-tuned from the [ChemBERTa](https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k) checkpoint. It was introduced in the paper "Exploiting pretrained biochemical language models for targeted drug design", published in *Bioinformatics* (Oxford University Press), and first released in [this repository](https://github.com/boun-tabi/biochemical-lms-for-drug-design).

ChemBERTaLM is a RoBERTa model initialized from the [ChemBERTa](https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k) checkpoint and then fine-tuned on the MOSES dataset, a collection of drug-like compounds represented as SMILES strings.
## How to use
```python
from transformers import RobertaForCausalLM, RobertaTokenizer, pipeline

tokenizer = RobertaTokenizer.from_pretrained("gokceuludogan/ChemBERTaLM")
model = RobertaForCausalLM.from_pretrained("gokceuludogan/ChemBERTaLM")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate a molecule from scratch; with do_sample=True each run differs.
generator("", max_length=128, do_sample=True)
# Sample output
[{'generated_text': 'Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1'}]

# A partial SMILES string can also be passed as a prompt to steer generation,
# e.g. the scaffold "c1ccc2c(c1)" used in the widget example above.
```
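Since sampling can produce malformed SMILES, generated strings should be validated before downstream use. A full chemistry-aware check requires a toolkit such as RDKit (not shown here); as a minimal dependency-free sketch, the helper below (a hypothetical function, not part of this model or the paper) only verifies that parentheses and brackets are balanced and that every ring-closure digit appears an even number of times:

```python
def looks_like_valid_smiles(smiles: str) -> bool:
    """Cheap structural sanity check for a SMILES string:
    balanced ()/[] and every ring-closure label paired.
    Not a chemistry-aware parser; use RDKit for real validation."""
    depth = 0        # current '(' nesting level
    bracket = 0      # current '[' nesting level
    ring_counts = {} # ring-closure label -> occurrence count
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "[":
            bracket += 1
        elif ch == "]":
            bracket -= 1
            if bracket < 0:
                return False
        elif ch == "%" and i + 2 < len(smiles):
            # two-digit ring closure, e.g. %12
            label = smiles[i + 1:i + 3]
            ring_counts[label] = ring_counts.get(label, 0) + 1
            i += 2
        elif ch.isdigit() and bracket == 0:
            # single-digit ring closure (digits inside [] are charges/isotopes)
            ring_counts[ch] = ring_counts.get(ch, 0) + 1
        i += 1
    return depth == 0 and bracket == 0 and all(
        count % 2 == 0 for count in ring_counts.values()
    )

# The sample output above passes; the widget's scaffold prompt is an
# intentionally incomplete fragment (ring bond 2 unpaired), so it fails.
print(looks_like_valid_smiles("Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1"))  # True
print(looks_like_valid_smiles("c1ccc2c(c1)"))  # False
```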
## Citation
```bibtex
@article{10.1093/bioinformatics/btac482,
    author  = {Uludoğan, Gökçe and Ozkirimli, Elif and Ulgen, Kutlu O. and Karalı, Nilgün Lütfiye and Özgür, Arzucan},
    title   = "{Exploiting Pretrained Biochemical Language Models for Targeted Drug Design}",
    journal = {Bioinformatics},
    year    = {2022},
    doi     = {10.1093/bioinformatics/btac482},
    url     = {https://doi.org/10.1093/bioinformatics/btac482}
}
```