---
title: Test ParaScore
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  ParaScore is a new metric for scoring the performance of paraphrase generation tasks.
  See the project at https://github.com/shadowkiller33/ParaScore for more information.
---

# Metric Card for ParaScore

## Metric description

ParaScore is a new metric for scoring the performance of paraphrase generation tasks.

## How to use

```python
from evaluate import load

parascore = load("transZ/test_parascore")
predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]
results = parascore.compute(predictions=predictions, references=references, lang="en")
```

## Output values

ParaScore outputs a dictionary with the following values:

`score`: ranges from 0.0 to 1.0, with higher values indicating better paraphrases.

## Limitations and bias

The [original ParaScore paper](https://arxiv.org/abs/2202.08479) showed that ParaScore correlates well with human judgment at both the sentence level and the system level, but this depends on the model and language pair selected.

## Citation

```bibtex
@article{Shen2022,
  archivePrefix = {arXiv},
  arxivId = {2202.08479},
  author = {Shen, Lingfeng and Liu, Lemao and Jiang, Haiyun and Shi, Shuming},
  journal = {EMNLP 2022 - 2022 Conference on Empirical Methods in Natural Language Processing, Proceedings},
  eprint = {2202.08479},
  month = {feb},
  number = {1},
  pages = {3178--3190},
  title = {{On the Evaluation Metrics for Paraphrase Generation}},
  url = {http://arxiv.org/abs/2202.08479},
  year = {2022}
}
```

## Further References

- [Official implementation](https://github.com/shadowkiller33/parascore_toolkit)
- [ParaScore paper](https://arxiv.org/abs/2202.08479)
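
## Examples

Because ParaScore scores candidate paraphrases against references, one natural use is ranking several candidates for the same source sentence. The sketch below does exactly that. It relies on two assumptions not confirmed by this card: that `compute` accepts parallel lists of predictions and references, and that the `score` entry comes back as a list aligned with `predictions`.

```python
from evaluate import load

parascore = load("transZ/test_parascore")

source = "The weather is nice today."
candidates = [
    "Today the weather is pleasant.",
    "The weather is nice today.",      # identical copy, not a real paraphrase
    "It is raining heavily outside.",  # unrelated sentence
]

# Score every candidate against the same source sentence.
# Assumption: `score` is returned as a list aligned with `predictions`.
results = parascore.compute(
    predictions=candidates,
    references=[source] * len(candidates),
    lang="en",
)

# Pick the candidate with the highest score.
best = max(range(len(candidates)), key=lambda i: results["score"][i])
print(f"Best paraphrase: {candidates[best]!r} (score={results['score'][best]:.3f})")
```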