---
license: apache-2.0

language: zh
inference: false
tags:
- text-generation
- dialogue-generation
- pytorch
- inference acceleration
- gpt2
- gpt3
---
# YuYan-Dialogue

YuYan is a series of Chinese language models of different sizes, developed by the Fuxi AI Lab at NetEase Inc. They are trained on a large, high-quality Chinese novel dataset.

YuYan belongs to the same family of decoder-only models as [GPT-2 and GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained with the self-supervised causal language modeling objective.

YuYan-Dialogue is a dialogue model obtained by fine-tuning YuYan-11b on a large, high-quality multi-turn dialogue dataset. It has strong conversational generation capabilities.

## Model Inference Acceleration 

As model size increases, inference time grows and more computational resources are required.

Therefore, we developed our own inference acceleration framework for transformer models, [EET](https://github.com/NetEase-FuXi/EET.git). More details can be found in [Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model](https://aclanthology.org/2022.naacl-industry.8/).

We combine our language model with the EET inference framework to provide industrial-grade inference performance.

## How to use

Our model is trained with [fairseq](https://github.com/facebookresearch/fairseq). As a result, inference and fine-tuning depend on it.

For inference, we modify some parts of the original fairseq code, mainly:
> fairseq-0.12.2/fairseq/sequence_generator.py

We integrate EET into the sequence generator. We replace the eos token with a token that is unlikely to be sampled, which guarantees the length of the generated text. The repetition penalty trick is also modified; you can change the penalty strength by adjusting the value of `self.ban_weight`.
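
The intent of these two tweaks can be sketched as follows. This is a minimal illustration, not the actual patch to `sequence_generator.py`; `adjust_logits` and its arguments are hypothetical names, with `ban_weight` standing in for `self.ban_weight`.

``` python
import torch

def adjust_logits(logits, prev_tokens, eos_id, ban_weight=1.0):
    """Suppress eos and penalize repeats before the next token is sampled.

    logits:      (batch, vocab) scores for the next token
    prev_tokens: (batch, seq)   tokens generated so far
    """
    # Push eos far below every other token so it is effectively never
    # sampled, forcing generation to continue to the requested length.
    logits[:, eos_id] = -float("inf")
    # Repetition penalty: subtract ban_weight from the scores of all
    # tokens that already appear in the generated prefix.
    penalty = torch.zeros_like(logits)
    penalty.scatter_(1, prev_tokens, ban_weight)
    return logits - penalty
```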

Then, to keep the eos token in the final generated text, we change `include_eos=False` to `include_eos=True` on line 75 of
> fairseq-0.12.2/fairseq/data/dictionary.py 
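
As a diff, the change amounts to the following (assuming the stock 0.12.2 file layout):

```
# fairseq-0.12.2/fairseq/data/dictionary.py, line 75
- include_eos=False,
+ include_eos=True,
```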

Finally, to allow parameters to be passed in from Python scripts, we remove lines 67 to 69 of
> fairseq-0.12.2/fairseq/dataclass/utils.py

Below is the installation tutorial.

```
# install pytorch
pip install torch==1.8.1

# install fairseq
unzip fairseq-0.12.2.zip
cd fairseq-0.12.2
pip install .

# install EET
git clone https://github.com/NetEase-FuXi/EET.git
cd EET
pip install .

# install transformers (EET requirements)
pip install transformers==4.23

# make a folder, move the dictionary file and model file into it.
mkdir transformer_lm_gpt2_xxl_dialogue
mv dict.txt transformer_lm_gpt2_xxl_dialogue/
mv checkpoint_best_part_*.pt transformer_lm_gpt2_xxl_dialogue/

```
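Before moving on, a quick import check can confirm the environment is wired up correctly (an illustrative snippet, not part of the original tutorial):

``` python
# sanity check: all three dependencies should import with the pinned versions
import torch, fairseq, transformers

print(torch.__version__)         # expected: 1.8.1
print(fairseq.__version__)       # expected: 0.12.2
print(transformers.__version__)  # expected: 4.23.x
```
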
`inference.py` is a script that provides an interface to initialize the EET object and the sequence generator. It includes some pre-processing and post-processing functions for text input and output. You can modify the script according to your needs.

In addition, it provides a simple object to organize dialogue generation and dialogue history.

Once the environment is ready, a few lines of code are enough to run inference.

``` python
from inference import Inference, Dialogue

model_path = "transformer_lm_gpt2_xxl_dialogue/checkpoint_best.pt"
data_path = "transformer_lm_gpt2_xxl_dialogue"
eet_batch_size = 10  # max inference batch size; adjust to your CUDA memory (40GB is needed)
inference = Inference(model_path, data_path, eet_batch_size)
dialogue_model = Dialogue(inference)
dialogue_model.get_repsonse("你好啊")
```
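Because the `Dialogue` object organizes the conversation history internally, multi-turn usage is just repeated calls to the same method. A hypothetical loop, reusing only the objects defined above:

``` python
# hypothetical multi-turn conversation; history is tracked by dialogue_model
for turn in ["你好啊", "今天天气怎么样？", "我们去散步吧"]:
    reply = dialogue_model.get_repsonse(turn)
    print(f"user: {turn}\nbot: {reply}")
```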
## Citation
If you find the technical report or this resource useful, please cite the following paper.
- https://aclanthology.org/2022.naacl-industry.8/
```
@inproceedings{li-etal-2022-easy,
    title = "Easy and Efficient Transformer: Scalable Inference Solution For Large {NLP} Model",
    author = "Li, Gongzheng  and
      Xi, Yadong  and
      Ding, Jingzhen  and
      Wang, Duan  and
      Luo, Ziyang  and
      Zhang, Rongsheng  and
      Liu, Bai  and
      Fan, Changjie  and
      Mao, Xiaoxi  and
      Zhao, Zeng",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track",
    month = jul,
    year = "2022",
    address = "Hybrid: Seattle, Washington + Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-industry.8",
    doi = "10.18653/v1/2022.naacl-industry.8",
    pages = "62--68"
}

```
## Contact Us
You can contact us by email:

xiyadong@corp.netease.com, dingjingzhen@corp.netease.com