---
title: Test ParaScore
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  ParaScore is a new metric for scoring the performance of paraphrase generation tasks.
  See the project at https://github.com/shadowkiller33/ParaScore for more information.
---

# Metric Card for ParaScore

## Metric description

ParaScore is a new metric for scoring the performance of paraphrase generation tasks.

## How to use

```python
from evaluate import load

parascore = load("transZ/test_parascore")
predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]
results = parascore.compute(predictions=predictions, references=references, lang="en")
```

## Output values

ParaScore outputs a dictionary with the following values:

`score`: ranges from 0.0 to 1.0, with higher values indicating better paraphrases.

## Limitations and bias

The [original ParaScore paper](https://arxiv.org/abs/2202.08479) showed that ParaScore correlates well with human judgment at both the sentence level and the system level, but this depends on the model and language pair selected.

## Citation

```bibtex
@article{Shen2022,
  archivePrefix = {arXiv},
  arxivId = {2202.08479},
  author = {Shen, Lingfeng and Liu, Lemao and Jiang, Haiyun and Shi, Shuming},
  journal = {EMNLP 2022 - 2022 Conference on Empirical Methods in Natural Language Processing, Proceedings},
  eprint = {2202.08479},
  month = {feb},
  number = {1},
  pages = {3178--3190},
  title = {{On the Evaluation Metrics for Paraphrase Generation}},
  url = {http://arxiv.org/abs/2202.08479},
  year = {2022}
}
```

## Further References

- [Official implementation](https://github.com/shadowkiller33/parascore_toolkit)
- [ParaScore paper](https://arxiv.org/abs/2202.08479)
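
## Examples

Because ParaScore scores candidate paraphrases against references, one natural use is ranking several candidates for the same source sentence. The sketch below does exactly that. It relies on two assumptions not confirmed by this card: that `compute` accepts parallel lists of predictions and references, and that the `score` entry comes back as a list aligned with `predictions`.

```python
from evaluate import load

parascore = load("transZ/test_parascore")

source = "The weather is nice today."
candidates = [
    "Today the weather is pleasant.",
    "The weather is nice today.",      # identical copy, not a real paraphrase
    "It is raining heavily outside.",  # unrelated sentence
]

# Score every candidate against the same source sentence.
# Assumption: `score` is returned as a list aligned with `predictions`.
results = parascore.compute(
    predictions=candidates,
    references=[source] * len(candidates),
    lang="en",
)

# Pick the candidate with the highest score.
best = max(range(len(candidates)), key=lambda i: results["score"][i])
print(f"Best paraphrase: {candidates[best]!r} (score={results['score'][best]:.3f})")
```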