chakki-works / sumeval

Well tested & Multi-language evaluation framework for text summarization.

License: Apache License 2.0

Python 100.00%
machine-learning text-summarization rouge bleu

sumeval's Introduction

Well tested & Multi-language evaluation framework for Text Summarization.

  • Well tested
  • Multi-language
    • Not only English: Japanese and Chinese are also supported, and other languages can be added easily.

Of course, the implementation is pure Python!

How to use

from sumeval.metrics.rouge import RougeCalculator


rouge = RougeCalculator(stopwords=True, lang="en")

rouge_1 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references="I went to Mars",
            n=1)

rouge_2 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"],
            n=2)

rouge_l = rouge.rouge_l(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

# You need spaCy to calculate ROUGE-BE

rouge_be = rouge.rouge_be(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

print("ROUGE-1: {}, ROUGE-2: {}, ROUGE-L: {}, ROUGE-BE: {}".format(
    rouge_1, rouge_2, rouge_l, rouge_be
).replace(", ", "\n"))
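For intuition about what rouge_n measures, here is a minimal, self-contained sketch of ROUGE-N as an F-measure over clipped n-gram overlap against a single reference. It is not sumeval's implementation: the lowercase `\w+` tokenizer is an assumption, and stopword removal, stemming and multiple references are left out, so its numbers will generally differ from RougeCalculator. The `alpha` weighting shown here (alpha = 0.5 gives the usual F1) is likewise an illustrative choice.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase and keep word characters."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f(summary, reference, n=1, alpha=0.5):
    """ROUGE-N F-measure of `summary` against one reference string."""
    s = ngrams(tokenize(summary), n)
    r = ngrams(tokenize(reference), n)
    overlap = sum((s & r).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(r.values())
    # Weighted harmonic mean of precision and recall.
    return precision * recall / ((1 - alpha) * precision + alpha * recall)


print(rouge_n_f("I went to the Mars from my living town.", "I went to Mars", n=1))
# P = 4/9, R = 4/4, so F1 = 8/13 (about 0.615)
```

With n=2 the same pair gives P = 2/8 and R = 2/3, because only "i went" and "went to" survive as shared bigrams.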

from sumeval.metrics.bleu import BLEUCalculator


bleu = BLEUCalculator()
score = bleu.bleu("I am waiting on the beach",
                  "He is walking on the beach")

bleu_ja = BLEUCalculator(lang="ja")
score_ja = bleu_ja.bleu("私はビーチで待ってる", "彼がベンチで待ってる")
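BLEUCalculator relies on SacréBLEU (see Dependencies). To illustrate what a sentence-level BLEU score combines, the sketch below computes clipped n-gram precisions for n = 1..4, takes their geometric mean and applies a brevity penalty. Whitespace tokenization and the `floor` value substituted for zero matches are assumptions of this sketch. SacréBLEU uses its own tokenizers and smoothing and reports scores on a 0-100 scale, so the numbers will not be identical. Japanese text would also need to be tokenized first.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4, floor=0.1):
    """Sentence BLEU against one reference, on whitespace tokens.

    A zero match count is replaced by `floor` (an assumed smoothing choice)
    so that short sentences do not collapse to 0.
    """
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        total = sum(hyp_ngrams.values())
        if total == 0:
            return 0.0  # hypothesis has fewer than n tokens
        matches = sum((hyp_ngrams & ngram_counts(ref, n)).values())
        log_precisions.append(math.log(max(matches, floor) / total))
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("I am waiting on the beach", "He is walking on the beach"))
# precisions 3/6, 2/5, 1/4 and floor 0.1/3, so (1/600) ** 0.25 (about 0.202)
```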

From the command line

sumeval r-nlb "I'm living New York its my home town so awesome" "My home town is awesome"

Output:

{
  "options": {
    "stopwords": true,
    "stemming": false,
    "word_limit": -1,
    "length_limit": -1,
    "alpha": 0.5,
    "input-summary": "I'm living New York its my home town so awesome",
    "input-references": [
      "My home town is awesome"
    ]
  },
  "averages": {
    "ROUGE-1": 0.7499999999999999,
    "ROUGE-2": 0.6666666666666666,
    "ROUGE-L": 0.7499999999999999,
    "ROUGE-BE": 0
  },
  "scores": [
    {
      "ROUGE-1": 0.7499999999999999,
      "ROUGE-2": 0.6666666666666666,
      "ROUGE-L": 0.7499999999999999,
      "ROUGE-BE": 0
    }
  ]
}

You can also pass files as input. Run sumeval -h for details.

Install

pip install sumeval

Dependencies

  • BLEU depends on SacréBLEU.
  • To calculate ROUGE-BE, spaCy is required.
  • To use lang="ja", janome or MeCab is required.
    • To calculate ROUGE-BE for Japanese, GiNZA is additionally required.
  • To use lang="zh", jieba is required.
    • To calculate ROUGE-BE for Chinese, pyhanlp is additionally required.
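The packages above are only needed for the features listed next to them. A quick way to see which ones are installed is to look up their modules without importing them. The module names (for example `MeCab` for MeCab's Python binding and `ginza` for GiNZA) are assumptions about how each package is usually imported; sumeval may load them differently.

```python
import importlib.util

# Feature -> module names to look for (assumed import names).
OPTIONAL_MODULES = {
    "BLEU": ["sacrebleu"],
    "ROUGE-BE": ["spacy"],
    "lang='ja' (janome or MeCab)": ["janome", "MeCab"],
    "ROUGE-BE for ja": ["ginza"],
    "lang='zh'": ["jieba"],
    "ROUGE-BE for zh": ["pyhanlp"],
}


def available(module_name):
    """True if a top-level module can be found; it is not imported."""
    return importlib.util.find_spec(module_name) is not None


def report():
    """Map each feature to {module name: installed?}."""
    return {
        feature: {name: available(name) for name in modules}
        for feature, modules in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for feature, modules in report().items():
        print(feature, modules)
```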

Test

sumeval checks its scores against two reference packages in its tests.

  • pythonrouge
    • It calls the original Perl script.
    • pip install git+https://github.com/tagucci/pythonrouge.git
  • rougescore
    • A simple Python implementation of the ROUGE score.
    • pip install git+https://github.com/bdusell/rougescore.git

Contributions Welcome 🎉

Add supported language

The tokenization and dependency-parsing logic for each language is located in sumeval/metrics/lang.

You can add a language by writing a class that inherits from BaseLang.
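A rough shape of such a class is sketched below. Only the constructor taking a language code is confirmed by this page (lang_en.py calls super().__init__("en")). The `BaseLang` here is a hypothetical stand-in so the sketch runs on its own, and the `tokenize` hook name and the character-level tokenization are assumptions. Check base_lang.py for the real methods (it also handles stemming and stopwords, and ROUGE-BE needs a dependency parser) before writing an actual language class.

```python
class BaseLang:
    """Hypothetical stand-in for sumeval's BaseLang (illustration only)."""

    def __init__(self, lang):
        self.lang = lang

    def tokenize(self, text):
        raise NotImplementedError


class LangZhCharSketch(BaseLang):
    """Hypothetical Chinese language class that tokenizes per character."""

    def __init__(self):
        super().__init__("zh")

    def tokenize(self, text):
        # Keep letters and digits (CJK characters count as letters) and
        # drop whitespace and punctuation such as "，" and "。".
        return [ch for ch in text if ch.isalnum()]


lang = LangZhCharSketch()
print(lang.tokenize("我在北京，很好。"))
```

A real Chinese class would more likely use a word segmenter such as jieba, as listed under Dependencies; per-character splitting just keeps the sketch dependency-free.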

sumeval's People

Contributors

averypai, icoxfog417, shirayu, tma15, yutanakamura-tky

sumeval's Issues

Support for Chinese?

Thank you for making this package. How can I use Chinese by inheriting from BaseLang? Could you give me some detailed steps?

super() takes at least 1 argument (0 given)

In the file \metrics\lang\lang_en.py, line 8, in __init__: super().__init__("en")
TypeError: super() takes at least 1 argument (0 given)
This prevents the demo from running. Please fix this mistake.
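The message "super() takes at least 1 argument (0 given)" is what Python 2 raises for a zero-argument super() call, so the demo was most likely run under Python 2. Running it with Python 3 avoids the error. The explicit two-argument form below works on both versions. The class names are hypothetical stand-ins, not sumeval's actual classes.

```python
class BaseLang(object):
    # Hypothetical stand-in. Inheriting from `object` makes this a
    # new-style class, which super() requires on Python 2.
    def __init__(self, lang):
        self.lang = lang


class LangEn(BaseLang):
    def __init__(self):
        # Zero-argument super() exists only in Python 3. Passing the class
        # and the instance explicitly works on Python 2 and Python 3.
        super(LangEn, self).__init__("en")


print(LangEn().lang)  # en
```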

Support for GiNZA v2.0

In GiNZA v2.0 the language model name changed from ja_ginza_nopn to ja_ginza. As a result, in an environment with GiNZA v2.0 installed, evaluating ROUGE-BE for Japanese raises an error in load_parser() in lang_ja.py.

Do you plan to support GiNZA v2.0?

Supporting multiple sentences for the calculation of ROUGE-L?

It seems to be common practice to calculate the ROUGE-L score over multiple sentences instead of concatenating all sentences into a single string. Splitting a summary or reference into multiple sentences can change how the longest common subsequences are determined, and therefore change the ROUGE-L score. Does sumeval support this?
Note that this is different from having multiple references for each summary. Thanks!

Please add CITATION.cff to make citation easier

I found this library via https://qiita.com/icoxfog417/items/65faecbbe27d3c53d212.

Thank you for the great library. I'm using sumeval for my graduation thesis and would like to cite it, so I would appreciate it if you could add a CITATION.cff!

By the way, I'm working on a topic similar to https://arxiv.org/abs/2108.03502.

Input parameter types not clearly documented

Is there documentation that clearly specifies the types of the input parameters, namely summary and references, for rouge.rouge_n(), rouge.rouge_l(), rouge.rouge_be() and bleu.bleu()? As far as I can tell, they are expected to be strings and lists, but I'm not completely sure. Any clarification would be helpful.

Are the references single sentences or summaries with many sentences?

Hi, the docs aren't quite clear. Let's say I have two different gold-standard annotations, each consisting of two sentences chosen as good summary sentences. Do I pass them both into the ROUGE methods like this?

rouge.rouge_2(summary, [
   'Sentence one from annotation one. Sentence two from annotation one.',
   'Sentence one from annotation two. Sentence two from annotation two.'
])

Or do I call rouge_2 multiple times, once for each gold standard annotation, and average the results?

rouge.rouge_2(summary, ['Sentence one from annotation one.', 'Sentence two from annotation one.'])
rouge.rouge_2(summary, ['Sentence one from annotation two.', 'Sentence two from annotation two.'])

Cannot use stemming in Python 3.6

Hello! First, I want to thank you for doing a great job.

When I try to use stemming in English, I get an error.

What I did

rouge = RougeCalculator(stopwords=True, stemming=True, lang="en")
rouge_1 = rouge.rouge_n(summary=summary, references=references, n=1)

Error message

  File "/home/vinter/.local/lib/python3.6/site-packages/sumeval/metrics/lang/base_lang.py", line 51, in stemming
    self.load_stemming_dict()
  File "/home/vinter/.local/lib/python3.6/site-packages/sumeval/metrics/lang/base_lang.py", line 82, in load_stemming_dict
    self._stemming = dict(lines)
ValueError: dictionary update sequence element #45 has length 3; 2 is required

I noticed that lines 77-81 in metrics/lang/base_lang.py cause the error.

I tried changing them as below, but it still fails on lines that contain three elements, such as "better good well".

    def load_stemming_dict(self):
        p = Path(os.path.dirname(__file__))
        p = p.joinpath("data", self.lang, "stemming.txt")
        if p.is_file():
            with p.open(encoding="utf-8") as f:
                lines = f.readlines()
-               lines = [ln.strip() for ln in lines]
+               lines = [ln.strip().split(' ') for ln in lines]
                lines = [ln for ln in lines if ln]
            self._stemming = dict(lines)
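The ValueError comes from dict(), which needs every element to be a two-item sequence. With the proposed split(' '), a line such as "better good well" yields three items. Below is a minimal sketch of a more defensive loader; the function name load_stemming_pairs is hypothetical. It maps the first token to the second, as dict() would, and uses split() so tabs, repeated spaces and blank lines are tolerated. Lines that do not have exactly two tokens are reported instead of being guessed at, since the intended mapping for them is not stated here.

```python
def load_stemming_pairs(lines):
    """Build {first token: second token} from whitespace-separated lines.

    Returns (mapping, skipped), where `skipped` lists the 1-based numbers of
    lines that do not contain exactly two tokens.
    """
    mapping, skipped = {}, []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()  # also handles tabs and repeated spaces
        if not parts:
            continue  # blank line
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
        else:
            skipped.append(lineno)
    return mapping, skipped


mapping, skipped = load_stemming_pairs(["running run\n", "better good well\n", "\n", "cats\tcat\n"])
print(mapping)  # {'running': 'run', 'cats': 'cat'}
print(skipped)  # [2]
```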

ROUGE-L for summary-level

Hi. I have a feature request for ROUGE-L calculation with multiple sentences.
There are two manners for ROUGE-L calculation: (1) sentence-level and (2) summary-level.
See also: https://github.com/google-research/google-research/tree/master/rouge#two-flavors-of-rouge-l

In pythonrouge, if we use a new line for each sentence, we can calculate summary-level ROUGE-L.
If we use the concatenated sentence, the value will be sentence-level ROUGE-L.

In Google's rouge implementation, they support two types of ROUGE-L (rougeL and rougeLsum).

From my experiment, sumeval supports only sentence-level ROUGE-L.
Is it correct? And do you have a plan to implement such an option?

Here is the test code to compare the results:

import numpy as np
import pytest

from pythonrouge.pythonrouge import Pythonrouge
from rouge_score.rouge_scorer import RougeScorer
from sumeval.metrics.rouge import RougeCalculator

SUMMARY = [
    (
        "the unusual format has been captured in a series of photographs by visual "
        "journalist anna erickson ."
    ),
    (
        "meet seattle 's rolling neighborhood of rvs , where each unassuming vehicle "
        "is a capsule home ."
    ),
    (
        "meet bud dodson , 57 , and welcome to his home : an rv in seattle 's sodo "
        "where he watches over the parking lot in exchange for a spot"
    ),
]

ABSTRACT = [
    (
        "around 30 people live a floating life in seattle 's sodo "
        "( south of downtown ) area in their rvs"
    ),
    (
        "there is one parking lot in particular where the owner lets them act as "
        "watchmen in exchange for a spot to live"
    ),
    (
        "visual journalist anna erickson , who photographed the community , said "
        "they are just grateful to have a home"
    ),
]

@pytest.mark.parametrize("stemming", [True, False])
def test_rouge_multiple_sentences(stemming):
    # In pythonrouge, a summary is represented as a list of sentences,
    # and each sentence is written on its own line.
    summ = SUMMARY
    abst = ABSTRACT

    pythonrouge = Pythonrouge(
        summary_file_exist=False,
        summary=[summ],
        reference=[[abst]],
        n_gram=2,
        ROUGE_SU4=False,
        ROUGE_L=True,
        recall_only=False,
        stemming=stemming,
        stopwords=False,
        length_limit=False,
    )
    res = pythonrouge.calc_score()
    pythonrouge_rouge_1 = res["ROUGE-1-F"]
    pythonrouge_rouge_2 = res["ROUGE-2-F"]
    pythonrouge_rouge_l = res["ROUGE-L-F"]
    print(f"pythonrouge_rouge_1={pythonrouge_rouge_1}")
    print(f"pythonrouge_rouge_2={pythonrouge_rouge_2}")
    print(f"pythonrouge_rouge_l={pythonrouge_rouge_l}")

    # In rouge_score, sentences are represented as the concatenated sentence with "\n"
    # In sumeval, it does not care about how to concat the sentences (?)
    abst = "\n".join(ABSTRACT)
    summ = "\n".join(SUMMARY)

    rouge_calculator = RougeCalculator(stopwords=False, stemming=stemming, lang="en")
    rouge_calculator_rouge_1 = rouge_calculator.rouge_n(summ, abst, n=1)
    rouge_calculator_rouge_2 = rouge_calculator.rouge_n(summ, abst, n=2)
    rouge_calculator_rouge_l = rouge_calculator.rouge_l(summ, abst)
    print(f"rouge_calculator_rouge_1={rouge_calculator_rouge_1}")
    print(f"rouge_calculator_rouge_2={rouge_calculator_rouge_2}")
    print(f"rouge_calculator_rouge_l={rouge_calculator_rouge_l}")

    rouge_scorer = RougeScorer(
        ["rouge1", "rouge2", "rougeL", "rougeLsum"], use_stemmer=stemming
    )
    res = rouge_scorer.score(abst, summ)
    rouge_scorer_rouge_1 = res["rouge1"].fmeasure
    rouge_scorer_rouge_2 = res["rouge2"].fmeasure
    rouge_scorer_rouge_l = res["rougeL"].fmeasure
    rouge_scorer_rouge_lsum = res["rougeLsum"].fmeasure
    print(f"rouge_scorer_rouge_1={rouge_scorer_rouge_1}")
    print(f"rouge_scorer_rouge_2={rouge_scorer_rouge_2}")
    print(f"rouge_scorer_rouge_l={rouge_scorer_rouge_l}")
    print(f"rouge_scorer_rouge_lsum={rouge_scorer_rouge_lsum}")

    try:
        np.testing.assert_array_almost_equal(
            np.array(
                [
                    rouge_scorer_rouge_1,
                    rouge_scorer_rouge_2,
                    rouge_scorer_rouge_l,
                ]
            ),
            np.array(
                [
                    rouge_calculator_rouge_1,
                    rouge_calculator_rouge_2,
                    rouge_calculator_rouge_l,
                ]
            ),
            decimal=5,
        )
    except AssertionError as e:
        if stemming:
            # If we use stemming in sumeval, the value will be different
            # https://github.com/chakki-works/sumeval/issues/20
            pass
        else:
            raise e
    np.testing.assert_array_almost_equal(
        np.array(
            [
                pythonrouge_rouge_1,
                pythonrouge_rouge_2,
                pythonrouge_rouge_l,
            ]
        ),
        np.array(
            [
                rouge_scorer_rouge_1,
                rouge_scorer_rouge_2,
                rouge_scorer_rouge_lsum,
            ]
        ),
        decimal=5,
    )
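The two ROUGE-L flavors described in this issue can be illustrated with a small sketch, assuming whitespace tokens and a single reference. Sentence-level ROUGE-L concatenates everything and takes one longest common subsequence (LCS). Summary-level ROUGE-L follows the union-LCS idea from the original ROUGE paper: each reference sentence is matched against every candidate sentence, and the matched reference positions are unioned. This is not sumeval's or rouge_score's code, and rouge_score's rougeLsum may differ in details such as how repeated tokens are counted.

```python
def lcs_positions(ref, cand):
    """Positions in `ref` covered by one longest common subsequence with `cand`."""
    m, n = len(ref), len(cand)
    # table[i][j] = LCS length of ref[:i] and cand[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if ref[i] == cand[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back from the bottom-right corner to recover matched positions.
    positions, i, j = set(), m, n
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            positions.add(i - 1)
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return positions


def f_measure(precision, recall, alpha=0.5):
    if precision == 0 or recall == 0:
        return 0.0
    return precision * recall / ((1 - alpha) * precision + alpha * recall)


def rouge_l_sentence_level(ref_sents, cand_sents):
    """Concatenate all sentences and take a single LCS."""
    ref = " ".join(ref_sents).split()
    cand = " ".join(cand_sents).split()
    hits = len(lcs_positions(ref, cand))
    return f_measure(hits / len(cand), hits / len(ref))


def rouge_l_summary_level(ref_sents, cand_sents):
    """Union LCS: each reference sentence against every candidate sentence."""
    refs = [s.split() for s in ref_sents]
    cands = [s.split() for s in cand_sents]
    hits = 0
    for ref in refs:
        covered = set()
        for cand in cands:
            covered |= lcs_positions(ref, cand)
        hits += len(covered)
    cand_len = sum(len(c) for c in cands)
    ref_len = sum(len(r) for r in refs)
    return f_measure(hits / cand_len, hits / ref_len)


ref = ["a b", "c d"]
cand = ["c d", "a b"]
print(rouge_l_sentence_level(ref, cand))  # 0.5: one LCS cannot follow both orders
print(rouge_l_summary_level(ref, cand))   # 1.0: each reference sentence is fully matched
```

Reordering sentences is exactly the case where the two flavors diverge, which is consistent with the pythonrouge (one sentence per line) results matching rougeLsum rather than rougeL in the test above.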

`From the command line` example fails

$ sumeval r-nlb "I'm living New York its my home town so awesome" "My home town is awesome"
Traceback (most recent call last):
  File "/usr/local/bin/sumeval", line 9, in <module>
    load_entry_point('sumeval==0.1.4', 'console_scripts', 'sumeval')()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 565, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2697, in load_entry_point
    return ep.load()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2370, in load
    return self.resolve()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2376, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Library/Python/2.7/site-packages/sumeval/cli/sum_eval.py", line 10
    score_desc: "ex: To calculate ROUGE-N, L, BE => 'r-nlb'",

ROUGE score does not match Pythonrouge when stemming=True

Hi @icoxfog417. Thank you for providing a great tool!
I found that the results differ between Pythonrouge and RougeCalculator when stemming=True is used.
Here is test code to reproduce it (the only change is the stemming option):

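import json
import os
import unittest

from pythonrouge.pythonrouge import Pythonrouge
from rougescore import rouge_n
from sumeval.metrics.rouge import RougeCalculator


class TestRouge(unittest.TestCase):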
    DATA_DIR = os.path.join(os.path.dirname(__file__), "data/rouge")

    def load_test_data(self):
        test_file = os.path.join(self.DATA_DIR, "ROUGE-test.json")
        with open(test_file, encoding="utf-8") as f:
            data = json.load(f)
        return data

    def test_rouge_with_stemming(self):
        data = self.load_test_data()
        rouge = RougeCalculator(stopwords=False, stemming=True)
        for eval_id in data:
            summaries = data[eval_id]["summaries"]
            references = data[eval_id]["references"]
            for n in [1, 2]:
                for s in summaries:
                    baseline = Pythonrouge(
                                summary_file_exist=False,
                                summary=[[s]],
                                reference=[[[r] for r in references]],
                                n_gram=n, recall_only=False,
                                length_limit=False,
                                stemming=True, stopwords=False)
                    b1_v = baseline.calc_score()
                    b2_v = rouge_n(rouge.tokenize(s),
                                   [rouge.tokenize(r) for r in references],
                                   n, 0.5)
                    v = rouge.rouge_n(s, references, n)
                    self.assertLess(abs(b2_v - v), 1e-5)
                    self.assertLess(abs(b1_v["ROUGE-{}-F".format(n)] - v), 1e-5) # noqa

Is this expected?
If so, is there any solution to match the results?

Broken with recent versions of sacrebleu

~/projects/{redacted}/venv/lib/python3.6/site-packages/sumeval/metrics/bleu.py in bleu(self, summary, references, score_only)
     51                     force=False, lowercase=self.lowercase,
     52                     tokenize=self.tokenizer,
---> 53                     use_effective_order=self.use_effective_order)
     54         else:
     55             _s = " ".join(summary)

TypeError: corpus_bleu() got an unexpected keyword argument 'smooth'
