chakki-works / sumeval
Well tested & Multi-language evaluation framework for text summarization.
License: Apache License 2.0
Hi @icoxfog417. Thank you for providing a great tool!
I found that the results differ between Pythonrouge and RougeCalculator when using stemming=True.
I attached the test code to reproduce this (just change the stemming option):
import json
import os
import unittest

from pythonrouge.pythonrouge import Pythonrouge
from rougescore import rouge_n  # assumed source of rouge_n, as in sumeval's own tests
from sumeval.metrics.rouge import RougeCalculator


class TestRouge(unittest.TestCase):
    DATA_DIR = os.path.join(os.path.dirname(__file__), "data/rouge")

    def load_test_data(self):
        test_file = os.path.join(self.DATA_DIR, "ROUGE-test.json")
        with open(test_file, encoding="utf-8") as f:
            data = json.load(f)
        return data

    def test_rouge_with_stemming(self):
        data = self.load_test_data()
        rouge = RougeCalculator(stopwords=False, stemming=True)
        for eval_id in data:
            summaries = data[eval_id]["summaries"]
            references = data[eval_id]["references"]
            for n in [1, 2]:
                for s in summaries:
                    # Baseline 1: the original ROUGE script via pythonrouge
                    baseline = Pythonrouge(
                        summary_file_exist=False,
                        summary=[[s]],
                        reference=[[[r] for r in references]],
                        n_gram=n, recall_only=False,
                        length_limit=False,
                        stemming=True, stopwords=False)
                    b1_v = baseline.calc_score()
                    # Baseline 2: rouge_n applied to sumeval's own tokens
                    b2_v = rouge_n(rouge.tokenize(s),
                                   [rouge.tokenize(r) for r in references],
                                   n, 0.5)
                    v = rouge.rouge_n(s, references, n)
                    self.assertLess(abs(b2_v - v), 1e-5)
                    self.assertLess(abs(b1_v["ROUGE-{}-F".format(n)] - v), 1e-5)  # noqa
Is this expected?
If so, is there any solution to match the results?
Hello! First, I want to thank you for doing a great job.
When I try to use stemming in English, I get an error.
What I do
rouge = RougeCalculator(stopwords=True, stemming=True, lang="en")
rouge_1 = rouge.rouge_n(summary=summary, references=references, n=1)
Error message
  File "/home/vinter/.local/lib/python3.6/site-packages/sumeval/metrics/lang/base_lang.py", line 51, in stemming
    self.load_stemming_dict()
  File "/home/vinter/.local/lib/python3.6/site-packages/sumeval/metrics/lang/base_lang.py", line 82, in load_stemming_dict
    self._stemming = dict(lines)
ValueError: dictionary update sequence element #45 has length 3; 2 is required
I noticed that lines 77-81 in metrics/lang/base_lang.py are causing the error.
I tried the change below, but it doesn't work for lines that contain 3 elements, such as "better good well":
     def load_stemming_dict(self):
         p = Path(os.path.dirname(__file__))
         p = p.joinpath("data", self.lang, "stemming.txt")
         if p.is_file():
             with p.open(encoding="utf-8") as f:
                 lines = f.readlines()
-                lines = [ln.strip() for ln in lines]
+                lines = [ln.strip().split(' ') for ln in lines]
                 lines = [ln for ln in lines if ln]
                 self._stemming = dict(lines)
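The ValueError arises because dict() needs exactly two items per entry, while a line such as "better good well" splits into three. One way around it, sketched here under an assumption about the file format, is to build the mapping explicitly. The helper name parse_stemming_lines is hypothetical, and the sketch assumes the first token is the surface form and the second token is its stem; any further tokens are treated as alternative stems and ignored. Whether that matches the intended meaning of stemming.txt is not confirmed by this report.

```python
def parse_stemming_lines(lines):
    """Build a word -> stem dict from whitespace-separated lines.

    Assumption: the first token is the surface form and the second token
    is its stem. Extra tokens (e.g. "well" in "better good well") are
    treated as alternative stems and ignored in this sketch.
    """
    stemming = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        word, stem = tokens[0], tokens[1]
        stemming[word] = stem
    return stemming


sample = ["better good well\n", "geese goose\n", "\n"]
print(parse_stemming_lines(sample))
```

Using split() without an argument also tolerates tabs and repeated spaces, which split(' ') does not.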
I found this library via https://qiita.com/icoxfog417/items/65faecbbe27d3c53d212.
Thank you for the great library. I am using sumeval to write my graduation thesis and would like to cite it, so could you please add a CITATION.cff?
By the way, I am working on a topic similar to https://arxiv.org/abs/2108.03502.
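In GiNZA v2.0, the language model name changed from ja_ginza_nopn to ja_ginza. As a result, in an environment where GiNZA v2.0 is installed, evaluating with ROUGE-BE (Japanese) raises an error in load_parser() of lang_ja.py.
Are there any plans to support GiNZA v2.0?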
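Until the library handles the rename, one possible workaround is to try the new model name first and fall back to the old one. The following is a minimal sketch, not sumeval's actual load_parser(): the helper name load_first_available is hypothetical, and it assumes the GiNZA model is loaded through spacy.load, which raises OSError when a model is not installed.

```python
def load_first_available(model_names, loader=None):
    """Return the first model that `loader` can load, trying names in order."""
    if loader is None:
        import spacy  # imported lazily so the helper can be tested without spaCy
        loader = spacy.load
    errors = []
    for name in model_names:
        try:
            return loader(name)
        except OSError as exc:  # spaCy raises OSError for a missing model
            errors.append(f"{name}: {exc}")
    raise OSError("no model could be loaded; tried " + "; ".join(errors))


# Prefer the GiNZA v2.0 name, then fall back to the older one:
# nlp = load_first_available(["ja_ginza", "ja_ginza_nopn"])
```

The loader argument exists only so the fallback logic can be exercised with a stand-in function when no model is installed.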
Tried looking through the code but I'm still not exactly clear on what rouge_be is measuring. What is rouge_be measuring?
~/projects/{redacted}/venv/lib/python3.6/site-packages/sumeval/metrics/bleu.py in bleu(self, summary, references, score_only)
51 force=False, lowercase=self.lowercase,
52 tokenize=self.tokenizer,
---> 53 use_effective_order=self.use_effective_order)
54 else:
55 _s = " ".join(summary)
TypeError: corpus_bleu() got an unexpected keyword argument 'smooth'
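The TypeError suggests that the installed version of the package providing corpus_bleu() no longer accepts a smooth keyword. Pinning a version compatible with sumeval is the simplest remedy. As an alternative, the sketch below translates keyword names against the callee's signature at call time. It is an illustration under assumptions, not the library's fix: the helper name call_with_renamed_kwargs, the stand-in function newer_corpus_bleu, and the smooth to smooth_method mapping are all hypothetical and should be checked against the signature of the installed package.

```python
import inspect


def call_with_renamed_kwargs(func, renames, *args, **kwargs):
    """Call `func`, renaming keywords it does not accept via `renames`."""
    accepted = inspect.signature(func).parameters
    adapted = {}
    for name, value in kwargs.items():
        new_name = renames.get(name)
        # Rename only when the old name is rejected and the new one is accepted.
        if name not in accepted and new_name in accepted:
            name = new_name
        adapted[name] = value
    return func(*args, **adapted)


def newer_corpus_bleu(hypotheses, references, smooth_method="exp"):
    """Hypothetical stand-in for a function whose keyword was renamed."""
    return f"scored with smooth_method={smooth_method}"


print(call_with_renamed_kwargs(
    newer_corpus_bleu, {"smooth": "smooth_method"},
    ["a b"], [["a b"]], smooth="floor"))
```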
CaboCha is no longer maintained and is not based on Universal Dependencies. Additionally, installing it requires editing code, which makes building a CI environment difficult.
Because of this, move the parser to GiNZA.
Hi, the docs aren't quite clear: let's say I have two different gold standard annotations, each of which includes 2 sentences chosen as good summary sentences. Do I pass them both into the ROUGE methods like this?
rouge.rouge_2(summary, [
    'Sentence one from annotation one. Sentence two from annotation one.',
    'Sentence one from annotation two. Sentence two from annotation two.'
])
Or do I call rouge_2 multiple times, once for each gold standard annotation, and average the results?
rouge.rouge_2(summary, ['Sentence one from annotation one.', 'Sentence two from annotation one.'])
rouge.rouge_2(summary, ['Sentence one from annotation two.', 'Sentence two from annotation two.'])
Implement a CI environment. This issue will be done after #11.
Thank you for making this package. How can I add support for Chinese by inheriting from BaseLang? Could you give me some detailed steps?
Is there documentation that clearly specifies the types of the input parameters, i.e. summary and references, for rouge.rouge_n(), rouge.rouge_l(), rouge.rouge_be() and bleu.bleu()? As far as I can understand, those parameters are expected to be a string and a list respectively, but I'm not completely sure. Any clarification would be helpful.
It seems to be common practice to calculate the ROUGE-L score over multiple sentences instead of stacking all sentences into a single string. Splitting a summary/reference into multiple sentences can change how the longest matching sequences are determined, and therefore change the ROUGE-L score. Does sumeval support this?
Note that this is different from having multiple references for each summary. Thanks!
$ sumeval r-nlb "I'm living New York its my home town so awesome" "My home town is awesome"
Traceback (most recent call last):
  File "/usr/local/bin/sumeval", line 9, in <module>
    load_entry_point('sumeval==0.1.4', 'console_scripts', 'sumeval')()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 565, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2697, in load_entry_point
    return ep.load()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2370, in load
    return self.resolve()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources/__init__.py", line 2376, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Library/Python/2.7/site-packages/sumeval/cli/sum_eval.py", line 10
    score_desc: "ex: To calculate ROUGE-N, L, BE => 'r-nlb'",
Hi. I have a feature request for ROUGE-L calculation with multiple sentences.
There are two ways to calculate ROUGE-L: (1) sentence-level and (2) summary-level.
See also: https://github.com/google-research/google-research/tree/master/rouge#two-flavors-of-rouge-l
In pythonrouge, if we put each sentence on a new line, we get summary-level ROUGE-L.
If we pass the sentences concatenated into one, the value is sentence-level ROUGE-L.
Google's rouge implementation supports both types of ROUGE-L (rougeL and rougeLsum).
From my experiment, sumeval supports only sentence-level ROUGE-L.
Is that correct? And do you have a plan to implement such an option?
Here is the test code to compare the results:
import numpy as np
import pytest
from pythonrouge.pythonrouge import Pythonrouge
from rouge_score.rouge_scorer import RougeScorer
from sumeval.metrics.rouge import RougeCalculator

SUMMARY = [
    (
        "the unusual format has been captured in a series of photographs by visual "
        "journalist anna erickson ."
    ),
    (
        "meet seattle 's rolling neighborhood of rvs , where each unassuming vehicle "
        "is a capsule home ."
    ),
    (
        "meet bud dodson , 57 , and welcome to his home : an rv in seattle 's sodo "
        "where he watches over the parking lot in exchange for a spot"
    ),
]

ABSTRACT = [
    (
        "around 30 people live a floating life in seattle 's sodo "
        "( south of downtown ) area in their rvs"
    ),
    (
        "there is one parking lot in particular where the owner lets them act as "
        "watchmen in exchange for a spot to live"
    ),
    (
        "visual journalist anna erickson , who photographed the community , said "
        "they are just grateful to have a home"
    ),
]


@pytest.mark.parametrize("stemming", [True, False])
def test_rouge_mutilple_sentences(stemming):
    # In pythonrouge, sentences are represented as the list of sentences.
    # If use list of sentences, each sentence has a new line.
    summ = SUMMARY
    abst = ABSTRACT
    pythonrouge = Pythonrouge(
        summary_file_exist=False,
        summary=[summ],
        reference=[[abst]],
        n_gram=2,
        ROUGE_SU4=False,
        ROUGE_L=True,
        recall_only=False,
        stemming=stemming,
        stopwords=False,
        length_limit=False,
    )
    res = pythonrouge.calc_score()
    pythonrouge_rouge_1 = res["ROUGE-1-F"]
    pythonrouge_rouge_2 = res["ROUGE-2-F"]
    pythonrouge_rouge_l = res["ROUGE-L-F"]
    print(f"pythonrouge_rouge_1={pythonrouge_rouge_1}")
    print(f"pythonrouge_rouge_2={pythonrouge_rouge_2}")
    print(f"pythonrouge_rouge_l={pythonrouge_rouge_l}")

    # In rouge_score, sentences are represented as the concatenated sentence with "\n"
    # In sumeval, it does not care about how to concat the sentences (?)
    abst = "\n".join(ABSTRACT)
    summ = "\n".join(SUMMARY)
    rouge_calculator = RougeCalculator(stopwords=False, stemming=stemming, lang="en")
    rouge_calculator_rouge_1 = rouge_calculator.rouge_n(summ, abst, n=1)
    rouge_calculator_rouge_2 = rouge_calculator.rouge_n(summ, abst, n=2)
    rouge_calculator_rouge_l = rouge_calculator.rouge_l(summ, abst)
    print(f"rouge_calculator_rouge_1={rouge_calculator_rouge_1}")
    print(f"rouge_calculator_rouge_2={rouge_calculator_rouge_2}")
    print(f"rouge_calculator_rouge_l={rouge_calculator_rouge_l}")

    rouge_scorer = RougeScorer(
        ["rouge1", "rouge2", "rougeL", "rougeLsum"], use_stemmer=stemming
    )
    res = rouge_scorer.score(abst, summ)
    rouge_scorer_rouge_1 = res["rouge1"].fmeasure
    rouge_scorer_rouge_2 = res["rouge2"].fmeasure
    rouge_scorer_rouge_l = res["rougeL"].fmeasure
    rouge_scorer_rouge_lsum = res["rougeLsum"].fmeasure
    print(f"rouge_scorer_rouge_1={rouge_scorer_rouge_1}")
    print(f"rouge_scorer_rouge_2={rouge_scorer_rouge_2}")
    print(f"rouge_scorer_rouge_l={rouge_scorer_rouge_l}")
    print(f"rouge_scorer_rouge_lsum={rouge_scorer_rouge_lsum}")

    try:
        np.testing.assert_array_almost_equal(
            np.array(
                [
                    rouge_scorer_rouge_1,
                    rouge_scorer_rouge_2,
                    rouge_scorer_rouge_l,
                ]
            ),
            np.array(
                [
                    rouge_calculator_rouge_1,
                    rouge_calculator_rouge_2,
                    rouge_calculator_rouge_l,
                ]
            ),
            decimal=5,
        )
    except AssertionError as e:
        if stemming:
            # If we use stemming in sumeval, the value will be different
            # https://github.com/chakki-works/sumeval/issues/20
            pass
        else:
            raise e

    np.testing.assert_array_almost_equal(
        np.array(
            [
                pythonrouge_rouge_1,
                pythonrouge_rouge_2,
                pythonrouge_rouge_l,
            ]
        ),
        np.array(
            [
                rouge_scorer_rouge_1,
                rouge_scorer_rouge_2,
                rouge_scorer_rouge_lsum,
            ]
        ),
        decimal=5,
    )
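To make the distinction between the two flavors concrete, here is a small self-contained sketch. It is not the implementation used by sumeval, pythonrouge, or rouge_score: the function names are hypothetical, tokenization is plain whitespace splitting, and the summary-level variant follows the basic union-LCS idea (for each reference sentence, take the union of reference positions matched against every candidate sentence) without the extra token-count clipping that some implementations apply.

```python
def lcs_positions(ref, cand):
    """Indices into `ref` that lie on one longest common subsequence with `cand`."""
    rows, cols = len(ref), len(cand)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if ref[i - 1] == cand[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover the matched reference positions.
    positions = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            positions.append(i - 1)
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return positions[::-1]


def f_measure(hits, ref_len, cand_len):
    """Balanced F-score from a hit count and the two lengths."""
    if hits == 0:
        return 0.0
    recall = hits / ref_len
    precision = hits / cand_len
    return 2 * precision * recall / (precision + recall)


def sentence_level_rouge_l(ref_sents, cand_sents):
    """Treat each side as one long token sequence and take a single LCS."""
    ref = [t for s in ref_sents for t in s.split()]
    cand = [t for s in cand_sents for t in s.split()]
    return f_measure(len(lcs_positions(ref, cand)), len(ref), len(cand))


def summary_level_rouge_l(ref_sents, cand_sents):
    """Union-LCS: per reference sentence, merge matches from all candidate sentences."""
    refs = [s.split() for s in ref_sents]
    cands = [s.split() for s in cand_sents]
    hits = 0
    for ref in refs:
        union = set()
        for cand in cands:
            union.update(lcs_positions(ref, cand))
        hits += len(union)
    return f_measure(hits, sum(map(len, refs)), sum(map(len, cands)))


reference = ["a b c", "d e f"]
candidate = ["d e f", "a b c"]  # same sentences, different order
print(sentence_level_rouge_l(reference, candidate))
print(summary_level_rouge_l(reference, candidate))
```

With the two sentences swapped, the single-sequence LCS can only match one of them, while the summary-level variant matches both, so sentence order alone is enough to make the two scores diverge.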