---
language:
- ko
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---
# XLM-Roberta-base --> 8emotions!

## Label Dictionary

- label_dictionary
- emo2int = {
    "기쁨": 0, "당황": 1, "분노": 2,
    "불안": 3, "상처": 4, "슬픔": 5,
    "중립": 6
}
- kore2en = {
    "기쁨": "joy", "당황": "surprise", "분노": "anger",
    "불안": "fear", "상처": "hurt", "슬픔": "sadness",
    "중립": "neutral"
}
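
To turn a predicted class index back into a readable emotion name, the two dictionaries above can be combined. The following is a minimal sketch: `int2emo` and `decode_label` are hypothetical helper names that are not part of the model, and it assumes the model's output indices follow `emo2int`.

```python
# Label maps copied from this model card.
emo2int = {
    "기쁨": 0, "당황": 1, "분노": 2,
    "불안": 3, "상처": 4, "슬픔": 5,
    "중립": 6,
}
kore2en = {
    "기쁨": "joy", "당황": "surprise", "분노": "anger",
    "불안": "fear", "상처": "hurt", "슬픔": "sadness",
    "중립": "neutral",
}

# Hypothetical helper: invert emo2int so class ids map back to Korean labels.
int2emo = {idx: emo for emo, idx in emo2int.items()}


def decode_label(class_id: int) -> str:
    """Return the English emotion name for a class id (assumes ids follow emo2int)."""
    return kore2en[int2emo[class_id]]


print(decode_label(2))  # anger
```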

## Dataset

### 감성대화말뭉치 (Emotional Dialogue Corpus, AI Hub)
- Recommended input format: this model was trained on a chatbot dialogue dataset.
- ref: https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=86

### 한국어 감정 정보가 포함된 연속적 대화 데이터셋 (Korean Continuous Dialogue Dataset with Emotion Information, AI Hub)
- The 감성대화말뭉치 dataset has no neutral class, so this additional dataset is used.
- ref: https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=271 

- Finally, the two datasets were concatenated.

## Input Format (use the special tokens [USR] and [BOT] when calling the model API)

- Example:
[USR] 안녕. [BOT] 안녕하세요! 무엇을 도와드릴까요? [USR] 별일 없어.

- Please make sure to always use these two special tokens.

- A sample of the actual data:
![input_format](./INPUT.png)
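
The input format above can be built from a list of dialogue turns. This is a minimal sketch, not the author's preprocessing code: `format_dialogue` and the `"usr"`/`"bot"` speaker keys are hypothetical names, and it assumes turns are joined with single spaces as in the example.

```python
from typing import List, Tuple

# Special tokens required by this model card.
USR, BOT = "[USR]", "[BOT]"


def format_dialogue(turns: List[Tuple[str, str]]) -> str:
    """Join (speaker, text) turns into one string with [USR]/[BOT] prefixes.

    `speaker` must be "usr" or "bot" (hypothetical keys used only in this sketch).
    """
    parts = []
    for speaker, text in turns:
        if speaker not in ("usr", "bot"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        token = USR if speaker == "usr" else BOT
        parts.append(f"{token} {text.strip()}")
    return " ".join(parts)


example = format_dialogue([
    ("usr", "안녕."),
    ("bot", "안녕하세요! 무엇을 도와드릴까요?"),
    ("usr", "별일 없어."),
])
print(example)
```

The resulting string can then be passed to the tokenizer or inference API in place of raw text.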

## Metrics (F1, Accuracy, and Confusion Matrix)

- Confusion matrix:
![ConfusionMatrix](./CM.png)

- F1 and accuracy:

![Training_Steps](./Train.PNG)
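
For readers who want to reproduce these kinds of metrics on their own predictions, here is a small pure-Python sketch. It assumes integer class ids and macro-averaged F1; the card does not state which F1 averaging was used, so that choice and all function names here are assumptions for illustration only.

```python
from typing import List


def confusion_matrix(y_true: List[int], y_pred: List[int], num_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging, not confirmed by the card)."""
    n = len(cm)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, true class differs
        fn = sum(cm[i]) - tp                       # true i, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy labels for illustration only; these are not results from this model.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm, accuracy(cm), macro_f1(cm))
```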