chakki-works / sumeval

Well-tested, multi-language evaluation framework for text summarization.

License: Apache License 2.0

Python 100.00%

machine-learning text-summarization rouge bleu

sumeval's Issues

ROUGE score does not match Pythonrouge when stemming=True

Hi @icoxfog417. Thank you for providing a great tool!
I found that Pythonrouge and RougeCalculator give different results when stemming=True.
I have attached test code to reproduce it (only the stemming option is changed):

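import json
import os
import unittest

from pythonrouge.pythonrouge import Pythonrouge
from rougescore import rouge_n  # assumed source of the reference rouge_n(peer, models, n, alpha)
from sumeval.metrics.rouge import RougeCalculator


class TestRouge(unittest.TestCase):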
    DATA_DIR = os.path.join(os.path.dirname(__file__), "data/rouge")

    def load_test_data(self):
        test_file = os.path.join(self.DATA_DIR, "ROUGE-test.json")
        with open(test_file, encoding="utf-8") as f:
            data = json.load(f)
        return data

    def test_rouge_with_stemming(self):
        data = self.load_test_data()
        rouge = RougeCalculator(stopwords=False, stemming=True)
        for eval_id in data:
            summaries = data[eval_id]["summaries"]
            references = data[eval_id]["references"]
            for n in [1, 2]:
                for s in summaries:
                    baseline = Pythonrouge(
                                summary_file_exist=False,
                                summary=[[s]],
                                reference=[[[r] for r in references]],
                                n_gram=n, recall_only=False,
                                length_limit=False,
                                stemming=True, stopwords=False)
                    b1_v = baseline.calc_score()
                    b2_v = rouge_n(rouge.tokenize(s),
                                   [rouge.tokenize(r) for r in references],
                                   n, 0.5)
                    v = rouge.rouge_n(s, references, n)
                    self.assertLess(abs(b2_v - v), 1e-5)
                    self.assertLess(abs(b1_v["ROUGE-{}-F".format(n)] - v), 1e-5) # noqa

Is this expected?
If so, is there any solution to match the results?
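One way to see why two tools can disagree here: ROUGE-N counts n-gram overlap only after every token has been normalized, so implementations that stem words differently will count different overlaps on the same text (Pythonrouge wraps the ROUGE-1.5.5 Perl script, whose -m option applies the Porter stemmer). The minimal single-reference ROUGE-N sketch below takes a pluggable normalizer to make that dependency explicit. `naive_stem` is a hypothetical toy stemmer (it only strips a trailing "s"); it is neither the Porter stemmer nor sumeval's stemming.

```python
from collections import Counter
from typing import Callable, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def identity(word: str) -> str:
    return word


def rouge_n_f(summary: str, reference: str, n: int,
              normalize: Callable[[str], str] = identity,
              alpha: float = 0.5) -> float:
    """Single-reference ROUGE-N F-measure after normalizing every token."""
    s = ngrams([normalize(w) for w in summary.lower().split()], n)
    r = ngrams([normalize(w) for w in reference.lower().split()], n)
    overlap = sum((s & r).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(r.values())
    # F = 1 / (alpha / P + (1 - alpha) / R); alpha = 0.5 gives the harmonic mean
    return precision * recall / ((1 - alpha) * precision + alpha * recall)


def naive_stem(word: str) -> str:
    """Hypothetical toy stemmer: strips one trailing 's' from words longer than 3 chars."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word
```

For "the cats sat" against "the cat sat", ROUGE-1 is 2/3 with the identity normalizer and 1.0 with `naive_stem`, so any difference between two stemmers propagates directly into the score.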

Cannot use stemming in Python 3.6

Hello! First, I want to thank you for doing a great job.

When I try to use stemming in English, I get an error.

What I did

from sumeval.metrics.rouge import RougeCalculator

rouge = RougeCalculator(stopwords=True, stemming=True, lang="en")
rouge_1 = rouge.rouge_n(summary=summary, references=references, n=1)

Error message

  File "/home/vinter/.local/lib/python3.6/site-packages/sumeval/metrics/lang/base_lang.py", line 51, in stemming
    self.load_stemming_dict()
  File "/home/vinter/.local/lib/python3.6/site-packages/sumeval/metrics/lang/base_lang.py", line 82, in load_stemming_dict
    self._stemming = dict(lines)
ValueError: dictionary update sequence element #45 has length 3; 2 is required

I noticed that lines 77-81 in metrics/lang/base_lang.py cause the error.

I tried changing them as shown below, but it still fails for lines that contain three elements, such as "better good well".

    def load_stemming_dict(self):
        p = Path(os.path.dirname(__file__))
        p = p.joinpath("data", self.lang, "stemming.txt")
        if p.is_file():
            with p.open(encoding="utf-8") as f:
                lines = f.readlines()
-                lines = [ln.strip() for ln in lines]
+                lines = [ln.strip().split(' ') for ln in lines]
                lines = [ln for ln in lines if ln]
            self._stemming = dict(lines)
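A line such as "better good well" looks like a WordNet-style exception entry: an inflected form followed by one or more base forms. Assuming that layout (an assumption; the actual format of sumeval's stemming.txt is not shown here), a tolerant loader can keep the first base form and skip blank or malformed lines instead of handing raw rows to dict(). `parse_stemming_lines` is a hypothetical helper, not sumeval's code.

```python
from typing import Dict, Iterable


def parse_stemming_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map each word to a stem, assuming lines of the form 'word stem [stem ...]'.

    Hypothetical helper: when a line lists several candidate stems
    (e.g. 'better good well'), only the first one is kept.
    """
    table: Dict[str, str] = {}
    for line in lines:
        parts = line.split()  # splits on any whitespace and drops the newline
        if len(parts) < 2:    # skip blank or malformed lines
            continue
        word, stem = parts[0], parts[1]
        table.setdefault(word, stem)  # the first entry for a word wins
    return table
```

Inside load_stemming_dict this would replace the readlines/dict(lines) steps, e.g. `self._stemming = parse_stemming_lines(f)` inside the `with p.open(...) as f:` block.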

Please add CITATION.cff to make citation easier

I found this library via https://qiita.com/icoxfog417/items/65faecbbe27d3c53d212.

Thank you for the wonderful library. I am using sumeval for my graduation thesis and would like to cite it, so I would appreciate it if you could add a CITATION.cff file!

Incidentally, I am working on a topic similar to https://arxiv.org/abs/2108.03502.

Support for GiNZA v2.0

In GiNZA v2.0 the language model name changed from ja_ginza_nopn to ja_ginza, so in an environment where GiNZA v2.0 is installed, evaluating with ROUGE-BE (Japanese) raises an error in load_parser() in lang_ja.py.

Do you plan to support GiNZA v2.0?
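A hedged sketch of one way to handle the rename: try the new model name first and fall back to the old one. It assumes the parser is loaded with spacy.load, which raises OSError when a model package is not installed. `GINZA_MODEL_NAMES`, `load_first_available` and `load_ginza_parser` are hypothetical names, not the code in sumeval's lang_ja.py.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Candidate model names, newest first; per the issue, GiNZA v2.0 renamed
# ja_ginza_nopn to ja_ginza.
GINZA_MODEL_NAMES = ("ja_ginza", "ja_ginza_nopn")


def load_first_available(names: Sequence[str], loader: Callable[[str], T]) -> T:
    """Return loader(name) for the first name that loads without OSError."""
    errors: List[str] = []
    for name in names:
        try:
            return loader(name)
        except OSError as exc:  # spacy.load raises OSError for a missing model
            errors.append(f"{name}: {exc}")
    raise OSError("no model could be loaded:\n" + "\n".join(errors))


def load_ginza_parser():
    """Hypothetical replacement for a load_parser()-style helper."""
    import spacy  # imported lazily so load_first_available works without spaCy

    return load_first_available(GINZA_MODEL_NAMES, spacy.load)
```

Keeping the loader injectable means the fallback order can be tested without either GiNZA version installed.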

What exactly does rouge_be measure?

I tried looking through the code, but I'm still not clear on what rouge_be measures. What exactly is it measuring?
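In broad terms, ROUGE-BE compares "basic elements" instead of n-grams. A basic element is a small unit taken from a dependency parse, usually a (head, modifier, relation) triple such as ("cat", "black", "amod") for "black cat". Once both texts are reduced to such triples, scoring works like ROUGE-N: count clipped matches, then compute precision, recall and F. The sketch below assumes the triples are already extracted (the parsing step is omitted), and the H, HM and HMR matching modes are assumed to mirror those of the original ROUGE-BE tool; `rouge_be_f` and `be_key` are illustrative names, not sumeval's API.

```python
from collections import Counter
from typing import Iterable, Tuple

BE = Tuple[str, str, str]  # (head, modifier, relation)


def be_key(be: BE, mode: str) -> Tuple[str, ...]:
    """Project a basic element onto the parts used for matching."""
    head, modifier, relation = be
    if mode == "H":
        return (head,)
    if mode == "HM":
        return (head, modifier)
    if mode == "HMR":
        return (head, modifier, relation)
    raise ValueError(f"unknown mode: {mode}")


def rouge_be_f(summary_bes: Iterable[BE], reference_bes: Iterable[BE],
               mode: str = "HMR", alpha: float = 0.5) -> float:
    """F-measure over clipped matches of basic elements."""
    s = Counter(be_key(b, mode) for b in summary_bes)
    r = Counter(be_key(b, mode) for b in reference_bes)
    overlap = sum((s & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(r.values())
    return precision * recall / ((1 - alpha) * precision + alpha * recall)
```

The mode controls how strict a match is: "black cat" and "white cat" share the head "cat" but differ in the modifier, so they match under H but not under HM or HMR.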

Broken with recent versions of sacrebleu

~/projects/{redacted}/venv/lib/python3.6/site-packages/sumeval/metrics/bleu.py in bleu(self, summary, references, score_only)
     51                     force=False, lowercase=self.lowercase,
     52                     tokenize=self.tokenizer,
---> 53                     use_effective_order=self.use_effective_order)
     54         else:
     55             _s = " ".join(summary)

TypeError: corpus_bleu() got an unexpected keyword argument 'smooth'
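The error shows that the call still passes the older smoothing keyword `smooth` to sacrebleu's corpus_bleu, while newer sacrebleu releases name it `smooth_method` (with `smooth_value`); the exact release where the rename happened should be checked against the installed version. One hedged workaround is to choose the keyword by inspecting the installed function's signature. `smoothing_kwargs` is a hypothetical helper, not part of sumeval or sacrebleu.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def smoothing_kwargs(corpus_bleu: Callable[..., Any], method: str = "exp",
                     value: Optional[float] = None) -> Dict[str, Any]:
    """Build smoothing keyword arguments that the given corpus_bleu accepts."""
    params = inspect.signature(corpus_bleu).parameters
    kwargs: Dict[str, Any] = {}
    if "smooth_method" in params:  # newer naming
        kwargs["smooth_method"] = method
        if value is not None and "smooth_value" in params:
            kwargs["smooth_value"] = value
    elif "smooth" in params:  # older naming, as used by the failing call
        kwargs["smooth"] = method
        if value is not None and "smooth_floor" in params:
            kwargs["smooth_floor"] = value
    # If neither name exists, return {} and let the library use its defaults.
    return kwargs
```

The call would then spread the result into the existing keyword arguments, e.g. `sacrebleu.corpus_bleu(hypotheses, [references], **smoothing_kwargs(sacrebleu.corpus_bleu), ...)`. Pinning sacrebleu to a release that still accepts `smooth` is the simpler alternative.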

How to calculate ROUGE-SU4
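ROUGE-SU4 counts skip-bigrams (ordered word pairs with at most four words between them) plus unigrams, and scores their overlap the same way as ROUGE-N. The sketch below is a minimal single-reference version written from that definition; it is not a sumeval API, and the gap convention is assumed to follow ROUGE-1.5.5's maximum-gap option, so it is worth checking against ROUGE-1.5.5 (for example through Pythonrouge with ROUGE_SU4=True) before relying on it. `su_units` and `rouge_su_f` are hypothetical names.

```python
from collections import Counter
from typing import List


def su_units(tokens: List[str], max_gap: int = 4) -> Counter:
    """Unigrams plus skip-bigrams with at most max_gap words in between."""
    units = Counter((t,) for t in tokens)  # the "U" (unigram) part
    for i in range(len(tokens)):
        # j - i - 1 is the number of skipped words, so j stops at i + max_gap + 1
        for j in range(i + 1, min(len(tokens), i + max_gap + 2)):
            units[(tokens[i], tokens[j])] += 1
    return units


def rouge_su_f(summary: str, reference: str, max_gap: int = 4,
               alpha: float = 0.5) -> float:
    """Single-reference ROUGE-SU<max_gap> F-measure on whitespace tokens."""
    s = su_units(summary.lower().split(), max_gap)
    r = su_units(reference.lower().split(), max_gap)
    overlap = sum((s & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(r.values())
    return precision * recall / ((1 - alpha) * precision + alpha * recall)
```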

Move Japanese parser from CaboCha to GiNZA

CaboCha is no longer maintained and is not based on Universal Dependencies. In addition, installing it requires editing code, which makes building a CI environment difficult.

For these reasons, move the parser to GiNZA.

How to load a custom jieba

Are the references single sentences or summaries with many sentences?

Hi, the docs aren't quite clear on this: let's say I have two different gold-standard annotations, each of which consists of two sentences chosen as good summary sentences. Do I pass them both into the ROUGE methods like this?

rouge.rouge_2(summary, [
   'Sentence one from annotation one. Sentence two from annotation one.',
   'Sentence one from annotation two. Sentence two from annotation two.'
])

Or do I call rouge_2 multiple times, once for each gold standard annotation, and average the results?

rouge.rouge_2(summary, ['Sentence one from annotation one.', 'Sentence two from annotation one.'])
rouge.rouge_2(summary, ['Sentence one from annotation two.', 'Sentence two from annotation two.'])
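In the usual ROUGE convention, each element of references is one complete reference summary, however many sentences it has, so the first form (one string per annotation) is the conventional one, and several references are combined inside one call rather than averaged by hand; whether sumeval follows this exactly is an assumption here. One common combination rule pools counts: sum the clipped matches over all references, divide by the total reference n-gram count for recall, and by the number of references times the summary n-gram count for precision. The sketch below implements that rule with whitespace tokenization; `rouge_n_multi` is a hypothetical name, not sumeval's rouge_n.

```python
from collections import Counter
from typing import List


def ngram_counts(text: str, n: int) -> Counter:
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_multi(summary: str, references: List[str], n: int,
                  alpha: float = 0.5) -> float:
    """ROUGE-N F-measure with counts pooled over several full references."""
    s = ngram_counts(summary, n)
    matches = 0
    recall_total = 0
    for ref in references:  # each element is one complete reference summary
        r = ngram_counts(ref, n)
        matches += sum((s & r).values())  # clipped matches against this reference
        recall_total += sum(r.values())
    precision_total = len(references) * sum(s.values())
    if matches == 0:
        return 0.0
    precision = matches / precision_total
    recall = matches / recall_total
    return precision * recall / ((1 - alpha) * precision + alpha * recall)
```

Passing each sentence as its own reference, as in the second form, would instead treat every sentence as a separate gold summary, which changes both denominators.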

Implement CI

Implement a CI environment. This issue will be addressed after #11.

Support for Chinese?

Thank you for making this package. How can I add Chinese support by inheriting from BaseLang? Could you give me some detailed steps?
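The BaseLang hooks are not shown in this thread, so this sketch covers only the piece every language plugin needs for ROUGE-N and ROUGE-L: turning text into tokens. Chinese is written without spaces, so one simple choice, often used for Chinese ROUGE, is character-level tokens. `ChineseCharTokenizer` is hypothetical, and how it would plug into a BaseLang subclass (method names, constructor arguments) is an assumption to check against sumeval's source.

```python
import re
from typing import List

# Runs of ASCII letters/digits stay together; every other non-space
# character (e.g. each Chinese character or punctuation mark) is its own token.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|\S")


class ChineseCharTokenizer:
    """Hypothetical stand-in for the tokenize step of a Chinese BaseLang subclass."""

    def tokenize(self, text: str) -> List[str]:
        return _TOKEN_RE.findall(text)
```

If word-level tokens are preferred, `jieba.lcut(text)` returns a list of words and could replace the regex.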

Input parameter types not clearly documented

Is there documentation that clearly specifies the types of the input parameters, i.e. summary and references, for rouge.rouge_n(), rouge.rouge_l(), rouge.rouge_be() and bleu.bleu()? As far as I can tell, summary is expected to be a string and references a list, but I'm not completely sure. Any clarification would be helpful.
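Until the types are documented, a small guard can pin them down on the caller side. The sketch below follows the reading in the question (summary is one string, references is a list of strings, one per reference summary); accepting a bare string as a single reference is a caller-side assumption, not documented sumeval behavior, and `normalize_rouge_inputs` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple, Union


def normalize_rouge_inputs(summary: str,
                           references: Union[str, Sequence[str]]
                           ) -> Tuple[str, List[str]]:
    """Check that summary is a str and return references as a list of str."""
    if not isinstance(summary, str):
        raise TypeError(f"summary must be str, got {type(summary).__name__}")
    if isinstance(references, str):
        references = [references]  # treat a bare string as one reference
    refs = list(references)
    for i, ref in enumerate(refs):
        if not isinstance(ref, str):
            raise TypeError(f"references[{i}] must be str, got {type(ref).__name__}")
    return summary, refs
```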

Supporting multiple sentences for the calculation of ROUGE-L?

It seems to be common practice to calculate the ROUGE-L score over multiple sentences instead of concatenating all sentences into a single string. Splitting a summary/reference into multiple sentences can change how the longest common subsequences are determined, and therefore the ROUGE-L score. Does sumeval support this?
Note that this is different from having multiple references for each summary. Thanks!

Encountered an OSError

super() takes at least 1 argument (0 given)

In the file \metrics\lang\lang_en.py, line 8, in __init__:
    super().__init__("en")
TypeError: super() takes at least 1 argument (0 given)
This prevents the demo from running; please correct this mistake.
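That message is what Python 2 raises for a zero-argument super() call, a form that only exists in Python 3, so the demo is most likely running under Python 2; running it with Python 3 should avoid this particular error, although other Python-3-only syntax in the package may still fail on Python 2. If Python 2 compatibility were wanted, the explicit two-argument form works on both versions. `BaseLangSketch` and `LangENSketch` are hypothetical stand-ins, not sumeval's classes.

```python
class BaseLangSketch(object):
    """Hypothetical stand-in for sumeval's BaseLang."""

    def __init__(self, lang):
        self.lang = lang


class LangENSketch(BaseLangSketch):
    def __init__(self):
        # super().__init__("en") needs Python 3; the explicit
        # two-argument form below also works on Python 2.
        super(LangENSketch, self).__init__("en")
```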

`From commandline` command error

$ sumeval r-nlb "I'm living New York its my home town so awesome" "My home town is awesome"
Traceback (most recent call last):
  File "/usr/local/bin/sumeval", line 9, in <module>
    load_entry_point('sumeval==0.1.4', 'console_scripts', 'sumeval')()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 565, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2697, in load_entry_point
    return ep.load()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2370, in load
    return self.resolve()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2376, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Library/Python/2.7/site-packages/sumeval/cli/sum_eval.py", line 10
    score_desc: "ex: To calculate ROUGE-N, L, BE => 'r-nlb'",

Summary-level ROUGE-L

Hi. I have a feature request for ROUGE-L calculation over multiple sentences.
There are two ways to calculate ROUGE-L: (1) sentence-level and (2) summary-level.
See also: https://github.com/google-research/google-research/tree/master/rouge#two-flavors-of-rouge-l

In pythonrouge, if each sentence is placed on its own line, we get summary-level ROUGE-L.
If the sentences are concatenated into one, the value is sentence-level ROUGE-L.

Google's rouge implementation supports both types of ROUGE-L (rougeL and rougeLsum).

From my experiments, sumeval seems to support only sentence-level ROUGE-L.
Is that correct? And do you plan to implement such an option?

Here is the test code to compare the results:

import numpy as np
import pytest

from pythonrouge.pythonrouge import Pythonrouge
from rouge_score.rouge_scorer import RougeScorer
from sumeval.metrics.rouge import RougeCalculator

SUMMARY = [
    (
        "the unusual format has been captured in a series of photographs by visual "
        "journalist anna erickson ."
    ),
    (
        "meet seattle 's rolling neighborhood of rvs , where each unassuming vehicle "
        "is a capsule home ."
    ),
    (
        "meet bud dodson , 57 , and welcome to his home : an rv in seattle 's sodo "
        "where he watches over the parking lot in exchange for a spot"
    ),
]

ABSTRACT = [
    (
        "around 30 people live a floating life in seattle 's sodo "
        "( south of downtown ) area in their rvs"
    ),
    (
        "there is one parking lot in particular where the owner lets them act as "
        "watchmen in exchange for a spot to live"
    ),
    (
        "visual journalist anna erickson , who photographed the community , said "
        "they are just grateful to have a home"
    ),
]

@pytest.mark.parametrize("stemming", [True, False])
def test_rouge_multiple_sentences(stemming):
    # In pythonrouge, a summary is represented as a list of sentences.
    # When a list is passed, each sentence is written on its own line.
    summ = SUMMARY
    abst = ABSTRACT

    pythonrouge = Pythonrouge(
        summary_file_exist=False,
        summary=[summ],
        reference=[[abst]],
        n_gram=2,
        ROUGE_SU4=False,
        ROUGE_L=True,
        recall_only=False,
        stemming=stemming,
        stopwords=False,
        length_limit=False,
    )
    res = pythonrouge.calc_score()
    pythonrouge_rouge_1 = res["ROUGE-1-F"]
    pythonrouge_rouge_2 = res["ROUGE-2-F"]
    pythonrouge_rouge_l = res["ROUGE-L-F"]
    print(f"pythonrouge_rouge_1={pythonrouge_rouge_1}")
    print(f"pythonrouge_rouge_2={pythonrouge_rouge_2}")
    print(f"pythonrouge_rouge_l={pythonrouge_rouge_l}")

    # In rouge_score, multiple sentences are represented as one string joined with "\n".
    # sumeval does not seem to care how the sentences are joined (?)
    abst = "\n".join(ABSTRACT)
    summ = "\n".join(SUMMARY)

    rouge_calculator = RougeCalculator(stopwords=False, stemming=stemming, lang="en")
    rouge_calculator_rouge_1 = rouge_calculator.rouge_n(summ, abst, n=1)
    rouge_calculator_rouge_2 = rouge_calculator.rouge_n(summ, abst, n=2)
    rouge_calculator_rouge_l = rouge_calculator.rouge_l(summ, abst)
    print(f"rouge_calculator_rouge_1={rouge_calculator_rouge_1}")
    print(f"rouge_calculator_rouge_2={rouge_calculator_rouge_2}")
    print(f"rouge_calculator_rouge_l={rouge_calculator_rouge_l}")

    rouge_scorer = RougeScorer(
        ["rouge1", "rouge2", "rougeL", "rougeLsum"], use_stemmer=stemming
    )
    res = rouge_scorer.score(abst, summ)
    rouge_scorer_rouge_1 = res["rouge1"].fmeasure
    rouge_scorer_rouge_2 = res["rouge2"].fmeasure
    rouge_scorer_rouge_l = res["rougeL"].fmeasure
    rouge_scorer_rouge_lsum = res["rougeLsum"].fmeasure
    print(f"rouge_scorer_rouge_1={rouge_scorer_rouge_1}")
    print(f"rouge_scorer_rouge_2={rouge_scorer_rouge_2}")
    print(f"rouge_scorer_rouge_l={rouge_scorer_rouge_l}")
    print(f"rouge_scorer_rouge_lsum={rouge_scorer_rouge_lsum}")

    try:
        np.testing.assert_array_almost_equal(
            np.array(
                [
                    rouge_scorer_rouge_1,
                    rouge_scorer_rouge_2,
                    rouge_scorer_rouge_l,
                ]
            ),
            np.array(
                [
                    rouge_calculator_rouge_1,
                    rouge_calculator_rouge_2,
                    rouge_calculator_rouge_l,
                ]
            ),
            decimal=5,
        )
    except AssertionError as e:
        if stemming:
            # If we use stemming in sumeval, the value will be different
            # https://github.com/chakki-works/sumeval/issues/20
            pass
        else:
            raise e
    np.testing.assert_array_almost_equal(
        np.array(
            [
                pythonrouge_rouge_1,
                pythonrouge_rouge_2,
                pythonrouge_rouge_l,
            ]
        ),
        np.array(
            [
                rouge_scorer_rouge_1,
                rouge_scorer_rouge_2,
                rouge_scorer_rouge_lsum,
            ]
        ),
        decimal=5,
    )
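For comparison, the summary-level flavor can be sketched from its usual definition: for each reference sentence, take the union of its LCS matches against every candidate sentence, count those tokens (clipped so no token is counted more often than it occurs in either text), and divide by the total reference and candidate token counts. The clipping step is intended to follow the summary-level computation in Google's rouge_score; `lcs_positions` and `rouge_lsum_f` are hypothetical names, whitespace tokenization replaces rouge_score's tokenizer, and the numbers are not guaranteed to match either library exactly.

```python
from collections import Counter
from typing import List, Set


def lcs_positions(ref: List[str], cand: List[str]) -> Set[int]:
    """Indices in ref of one longest common subsequence with cand."""
    m, n = len(ref), len(cand)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if ref[i] == cand[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    positions: Set[int] = set()
    i, j = m, n
    while i > 0 and j > 0:  # backtrack through the DP table
        if ref[i - 1] == cand[j - 1]:
            positions.add(i - 1)
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return positions


def rouge_lsum_f(candidate_sents: List[str], reference_sents: List[str],
                 alpha: float = 0.5) -> float:
    """Summary-level ROUGE-L (union LCS) F-measure over sentence lists."""
    cands = [s.lower().split() for s in candidate_sents]
    refs = [s.lower().split() for s in reference_sents]
    m = sum(len(r) for r in refs)
    n = sum(len(c) for c in cands)
    if m == 0 or n == 0:
        return 0.0
    ref_counts = Counter(t for r in refs for t in r)
    cand_counts = Counter(t for c in cands for t in c)
    hits = 0
    for r in refs:
        union: Set[int] = set()
        for c in cands:
            union |= lcs_positions(r, c)
        for idx in sorted(union):
            token = r[idx]
            if ref_counts[token] > 0 and cand_counts[token] > 0:  # clip counts
                hits += 1
                ref_counts[token] -= 1
                cand_counts[token] -= 1
    if hits == 0:
        return 0.0
    precision = hits / n
    recall = hits / m
    return precision * recall / ((1 - alpha) * precision + alpha * recall)
```

The flavor matters when sentence order differs: with candidate sentences ["c d", "a b"] against references ["a b", "c d"], the summary-level score is 1.0, while the LCS of the concatenated texts covers only two of the four tokens.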
