---
license: apache-2.0
language: zh
inference: false
tags:
- text-generation
- dialogue-generation
- pytorch
- inference acceleration
- gpt2
- gpt3
---
# YuYan-Dialogue
YuYan is a series of Chinese language models of different sizes, developed by Fuxi AI Lab, NetEase, Inc. They are trained on a large, high-quality dataset of Chinese novels.
YuYan belongs to the same family of decoder-only models as [GPT2 and GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained with the self-supervised causal language modeling objective.
YuYan-Dialogue is a dialogue model obtained by fine-tuning YuYan-11b on a large, high-quality multi-turn dialogue dataset. It has strong conversation generation capabilities.
## Model Inference Acceleration
As model size increases, inference takes longer and requires more computational resources.
To address this, we developed our own transformer inference acceleration framework, [EET](https://github.com/NetEase-FuXi/EET.git). More details are in [Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model](https://aclanthology.org/2022.naacl-industry.8/).
We combine our language model with the EET inference framework to provide industrial-grade inference performance.
## How to use
Our model is trained with [fairseq](https://github.com/facebookresearch/fairseq), so inference and fine-tuning depend on it.
For inference, we modified some parts of the original fairseq code, mainly
> fairseq-0.12.2/fairseq/sequence_generator.py
We integrate EET with `sequence_generator`. We replace the EOS token with a token that is unlikely to be sampled, which ensures the length of the generated text. The repetition penalty is also modified; you can change its strength by adjusting the value of `self.ban_weight`.
Then, to keep the EOS token in the final generated text, we change `include_eos=False` to `include_eos=True` on line 75 of
> fairseq-0.12.2/fairseq/data/dictionary.py
Finally, to pass in parameters from Python scripts, we remove lines 67 to 69 of
> fairseq-0.12.2/fairseq/dataclass/utils.py
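To make the role of `self.ban_weight` more concrete, here is a minimal sketch of a subtractive repetition penalty. It is not the code in the modified `sequence_generator.py`, whose exact logic is not shown in this README. The function name `penalize_repeats`, its arguments, and the subtractive form of the penalty are all assumptions made for illustration.

```python
import math


def penalize_repeats(log_probs, generated_ids, ban_weight=1.0):
    """Return a copy of `log_probs` with already-generated tokens penalized.

    Hypothetical illustration: each token id that appears in `generated_ids`
    has `ban_weight` subtracted from its log-probability exactly once,
    no matter how often it was generated. A larger `ban_weight` makes
    repetition less likely.
    """
    adjusted = list(log_probs)
    for token_id in set(generated_ids):
        adjusted[token_id] -= ban_weight
    return adjusted


# Toy vocabulary of three tokens; token 0 has already been generated twice.
log_probs = [math.log(p) for p in (0.5, 0.3, 0.2)]
adjusted = penalize_repeats(log_probs, generated_ids=[0, 0], ban_weight=1.0)

# Greedy choice after the penalty is applied.
best = max(range(len(adjusted)), key=adjusted.__getitem__)
print(best)
```

With the penalty applied, token 0 drops below token 1 even though it had the highest original probability, which is the effect a stronger `ban_weight` is meant to produce.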
The installation steps are as follows.
```
# install pytorch
pip install torch==1.8.1
# install fairseq
unzip fairseq-0.12.2.zip
cd fairseq-0.12.2
pip install .
cd ..  # return to the starting directory
# install EET
git clone https://github.com/NetEase-FuXi/EET.git
cd EET
pip install .
cd ..  # return to the starting directory
# install transformers (EET requirements)
pip install transformers==4.23
# make a folder, move the dictionary file and model file into it.
mkdir transformer_lm_gpt2_xxl_dialogue
mv dict.txt transformer_lm_gpt2_xxl_dialogue/
mv checkpoint_best_part_*.pt transformer_lm_gpt2_xxl_dialogue/
```
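Before running inference, it can help to confirm that the folder has the expected layout. The helper below is a hypothetical convenience: `check_model_dir` is not part of `inference.py`, fairseq, or EET, and it only checks for the file names used in the commands above.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found in `model_dir` (empty if none)."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    # The dictionary file moved in by the install steps.
    if not (root / "dict.txt").is_file():
        problems.append("dict.txt is missing")
    # The checkpoint is shipped as several checkpoint_best_part_*.pt files.
    if not list(root.glob("checkpoint_best_part_*.pt")):
        problems.append("no checkpoint_best_part_*.pt files found")
    return problems


for problem in check_model_dir("transformer_lm_gpt2_xxl_dialogue"):
    print("WARNING:", problem)
```

The script prints nothing when both the dictionary and at least one checkpoint part are present.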
`inference.py` is a script that provides an interface for initializing the EET object and the `sequence_generator`. It includes pre-processing and post-processing functions for text input and output, and you can modify it to suit your needs.
It also provides a simple object that organizes dialogue generation and the dialogue history.
Once the environment is ready, inference takes only a few lines of code.
``` python
from inference import Inference, Dialogue
model_path = "transformer_lm_gpt2_xxl_dialogue/checkpoint_best.pt"
data_path = "transformer_lm_gpt2_xxl_dialogue"
eet_batch_size = 10 # max inference batch size; adjust according to CUDA memory (40GB of memory is required)
inference = Inference(model_path, data_path, eet_batch_size)
dialogue_model = Dialogue(inference)
dialogue_model.get_repsonse("你好啊")
```
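For multi-turn use, the dialogue history has to be carried from one call to the next. The sketch below shows one plausible way to do that with a fixed-size window of turns. It is not the `Dialogue` class from `inference.py`; `DialogueHistory`, `chat_turn`, the newline separator, and the stand-in `echo` generator are hypothetical names and choices made so the example runs without the model.

```python
class DialogueHistory:
    """Keep only the most recent turns of a conversation (hypothetical helper)."""

    def __init__(self, max_turns=6, separator="\n"):
        # max_turns is assumed to be at least 1.
        self.max_turns = max_turns
        self.separator = separator
        self.turns = []

    def add(self, utterance):
        self.turns.append(utterance)
        # Drop the oldest turns once the window is exceeded.
        self.turns = self.turns[-self.max_turns:]

    def as_prompt(self):
        # Join the retained turns into a single prompt string.
        return self.separator.join(self.turns)

    def reset(self):
        self.turns = []


def chat_turn(history, user_text, generate):
    """Record the user turn, call `generate` on the prompt, record the reply."""
    history.add(user_text)
    reply = generate(history.as_prompt())
    history.add(reply)
    return reply


def echo(prompt):
    # Stand-in for the real model: answers based on the last line of the prompt.
    return "reply to: " + prompt.split("\n")[-1]


history = DialogueHistory(max_turns=4)
print(chat_turn(history, "你好啊", echo))
```

In a real setup, `generate` would be replaced by a call into the loaded model; the windowing keeps the prompt from growing without bound as the conversation continues.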
## Citation
If you find the technical report or this resource useful, please cite the following technical report in your paper.
- https://aclanthology.org/2022.naacl-industry.8/
```
@inproceedings{li-etal-2022-easy,
title = "Easy and Efficient Transformer: Scalable Inference Solution For Large {NLP} Model",
author = "Li, Gongzheng and
Xi, Yadong and
Ding, Jingzhen and
Wang, Duan and
Luo, Ziyang and
Zhang, Rongsheng and
Liu, Bai and
Fan, Changjie and
Mao, Xiaoxi and
Zhao, Zeng",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track",
month = jul,
year = "2022",
address = "Hybrid: Seattle, Washington + Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.naacl-industry.8",
doi = "10.18653/v1/2022.naacl-industry.8",
pages = "62--68"
}
```
## Contact Us
You can contact us by email:
xiyadong@corp.netease.com, dingjingzhen@corp.netease