
Comments (12)

XingxingZhang commented on July 28, 2024

It looks like you are using single-reference BLEU evaluation. There are 8 references in the WikiLarge test set (available here: https://github.com/cocoxu/simplification).

The code I released can be used to produce the system output of EncDecA, Dress, and Dress-Ls. Please follow the evaluation protocol described in our paper. More suggestions can be found here.


XingxingZhang commented on July 28, 2024

< Why turn off the ignore-case option? I think uppercase vs. lowercase makes no difference for the words in this dataset.

I don't think it matters :)


Sanqiang commented on July 28, 2024

Yes, I also tried the dataset downloaded from https://github.com/cocoxu/simplification/tree/master/data/turkcorpus
But the 8-reference dataset doesn't have the NER replacement applied (e.g., PEOPLE@1, LOCATION@1, that kind of thing), so I cannot use your code directly (I am supposed to feed it test files preprocessed with an NER tool).
I wonder, do you still do the NER replacement for the 8-reference dataset?

Meanwhile, could you show me the command you run for the 8-reference test set? Since there are 8 references for iBLEU, and Wei Xu's paper doesn't say how to handle 8 references, do you take the mean or the max over the 8 per-reference scores?

I downloaded the dataset from https://github.com/cocoxu/simplification/tree/master/data/turkcorpus/truecased (because truecased text works better with NER tools) and used the Stanford NER tool to do the replacement. (I think I am doing this correctly, because I get exactly the same output when I convert wiki.full.aner.ori.test to wiki.full.aner.test.) But since your code doesn't seem to support the 8-reference dataset, I tried my own TensorFlow encoder-decoder model, which follows a setup similar to yours, and still didn't get 88% BLEU. My model does reach comparable performance on the WikiLarge/WikiSmall test sets (the non-8-reference ones) under mteval-v13a.pl, so perhaps I am just using the wrong script.
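To make the replacement step above concrete, here is a minimal sketch (my own illustration, not code from the dress repo) of turning NER-tagged tokens into indexed placeholders such as PEOPLE@1 and LOCATION@1, while recording the map needed to undo the substitution later. It assumes single-token entities already tagged as (token, tag) pairs, e.g. by Stanford NER, and borrows the tag names from this thread's examples.

```python
# Hypothetical illustration: anonymize NER-tagged tokens into TYPE@k placeholders.
# Input: (token, tag) pairs from an NER tagger; "O" marks non-entity tokens.
def anonymize(tagged_tokens):
    out, mapping, counters = [], {}, {}
    for token, tag in tagged_tokens:
        if tag == "O":
            out.append(token)
            continue
        key = (tag, token)
        if key not in mapping:  # first occurrence of this entity gets a fresh index
            counters[tag] = counters.get(tag, 0) + 1
            mapping[key] = f"{tag}@{counters[tag]}"
        out.append(mapping[key])
    # Invert the map for de-anonymization later: placeholder -> original token.
    restore = {ph: tok for (tag, tok), ph in mapping.items()}
    return out, restore

anon, restore = anonymize([("John", "PEOPLE"), ("lives", "O"),
                           ("in", "O"), ("Paris", "LOCATION")])
print(anon)     # ['PEOPLE@1', 'lives', 'in', 'LOCATION@1']
print(restore)  # {'PEOPLE@1': 'John', 'LOCATION@1': 'Paris'}
```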

In addition, are you still working on this task? So far, it is true that seq2seq without RL prefers to copy the complex text rather than simplify it, and I think you can address that by optimizing the evaluation metric (through RL), but I suspect the major reason is the attention mechanism (I am running experiments now to verify this).


XingxingZhang commented on July 28, 2024

"
Yes, I also tried dataset download from https://github.com/cocoxu/simplification/tree/master/data/turkcorpus
But the 8 references dataset doesn't do the NER replacement (e.g. PEOPLE@1, LOCATION@1 that kind of things), so I cannot directly use your code (I am supposed to be put preprocess with NER tools test files as input).
I wonder if you still do the NER replacement for 8 references dataset?
"
The "wiki.full.aner.map.t7" file in "data-simplification/wikilarge" folder contains all you need for NER anonymization/de-anonymization. Note that in test set, I only did NER anonymization for complex sentences and one of the reference sentences. But it doesn't matter since your system output will be de-anonymized anyway.


XingxingZhang commented on July 28, 2024

"
Meanwhile, I wonder could you show me the command you run for 8 references test set (since there are 8 references for iBLEU, the Xu Wei's paper didn't indicate how to do with 8 references, take the mean or max for 8 reference performances?)?
"
BLEU evaluation by default assumes there are multiple references [1][2]. Please refer to the documentation of Joshua or mteval-v13 for how to evaluate BLEU with multiple references.

[1] Papineni et al., "BLEU: a Method for Automatic Evaluation of Machine Translation", ACL 2002.
[2] https://en.wikipedia.org/wiki/BLEU
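For reference, a minimal sketch of multi-reference corpus BLEU in Python using NLTK (my own illustration; the thread itself uses Joshua / mteval-v13a.pl). The point is that n-gram matches are counted against all references of a sentence jointly, so there is no mean or max over per-reference scores:

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy data; real use would load the hypothesis file plus the 8 turkcorpus
# reference files, one token list per reference per sentence.
hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
references = [[  # one inner list holding ALL references for this hypothesis
    ["the", "cat", "sat", "on", "the", "mat"],
    ["a", "cat", "was", "sitting", "on", "the", "mat"],
]]
print(corpus_bleu(references, hypotheses))  # 1.0: hypothesis matches reference 1 exactly
```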


Sanqiang commented on July 28, 2024

I think I can reproduce results similar to the paper's. This is what I did:
(1) I use scripts/mteval-v13a.pl to evaluate your output against the single ground truth, which gives a BLEU(I, O) of roughly 60%.
(2) I use scripts/multi-bleu.perl to evaluate your output against the 8 references, which gives a BLEU(I, R) of roughly 90%. The original references are all lowercase, so I use the truecased references to make the casing match.
(3) iBLEU = 0.9 * BLEU(I, R) + 0.1 * BLEU(I, O), as in Xu's paper; the result is similar to your paper's.
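For reference, the published definition of iBLEU (Sun and Zhou, 2012, adopted with alpha = 0.9 in Xu et al.) penalizes overlap with the input using a subtraction rather than an addition. A minimal sketch with the rough numbers from this thread plugged in (illustrative only, not real results):

```python
# iBLEU (Sun & Zhou, 2012): reward closeness to the references,
# penalize closeness to the input sentence.
def ibleu(bleu_out_refs, bleu_out_input, alpha=0.9):
    return alpha * bleu_out_refs - (1 - alpha) * bleu_out_input

print(ibleu(0.90, 0.60))  # 0.75, using the approximate scores quoted above
```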

Am I correct, or is there any deviation from what you did?


XingxingZhang commented on July 28, 2024

< Am I correct, or is there any deviation from what you did?
No. I didn't use iBLEU and didn't mention iBLEU anywhere.


XingxingZhang commented on July 28, 2024

Here are the instructions for 8-reference BLEU evaluation on WikiLarge: https://github.com/XingxingZhang/dress/tree/master/experiments/evaluation/BLEU

Good luck!


Sanqiang commented on July 28, 2024

I got it.
So you use the 8-reference BLEU evaluation from https://github.com/XingxingZhang/dress/tree/master/experiments/evaluation/BLEU

I wonder, do you use only the 8 references, or 9 (the 8 references plus the original single ground truth)? (Based on the script you provide, I think you use only the 8 references, but I just want to double-check.)


XingxingZhang commented on July 28, 2024

Did you get the correct BLEU score?
=> BLEU = 0.8885


Sanqiang commented on July 28, 2024

Yes, I get the correct BLEU score. Thank you.


XingxingZhang commented on July 28, 2024

awesome!

