rouge's Introduction

ROUGE.js

A JavaScript implementation of the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric for summaries. This package implements the following metrics:

  • n-gram (ROUGE-N)
  • Longest Common Subsequence (ROUGE-L)
  • Skip Bigram (ROUGE-S)

Rationale

ROUGE is something of a standard metric for evaluating the performance of auto-summarization algorithms. However, with the exception of MEAD (which is written in Perl. Yes. Perl.), obtaining a copy of ROUGE to work with requires one to navigate a barely functional webpage, fill out forms, and sign a legal release somewhere along the way. These hurdles no doubt exist for good reason, but they get irritating when all one wishes to do is benchmark an algorithm.

Nevertheless, the paper describing ROUGE is available for public consumption. The appropriate course of action is then to convert the equations in the paper to a more user-friendly format, which takes the form of the present repository. So there. No more forms. See how life could have been made a lot easier for everyone if we were all willing to stop writing legalese or making people click submit buttons?

Quick Start

This package is available on NPM and can be installed like so:

npm install --save rouge

To use it, simply require the package:

import * as rouge from 'rouge';   // ES2015

// OR

var rouge = require('rouge');   // ES5

A small but growing number of tests exist. To run them:

npm test

This should give you many lines of colorful text in your CLI. Naturally, you'll need to have Mocha installed, but you knew that already.

NOTE: Function test coverage is 100%, but branch coverage numbers look horrible because the current testing implementation has no way of accounting for the additional code injected by Babel when transpiling from ES2015 to ES5. A fix is in the pipeline, but if anyone has anything good, feel free to PR!

Usage

Rouge.js provides three functions:

  • ROUGE-N: rouge.n(cand, ref, opts)
  • ROUGE-L: rouge.l(cand, ref, opts)
  • ROUGE-S: rouge.s(cand, ref, opts)

All three functions take a candidate string, a reference string, and a configuration object specifying additional options. Documentation for the options is provided inline in lib/rouge.js. Type signatures are specified and checked using Flowtype.
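
For instance, scoring a single candidate against a single reference with each metric might look like the sketch below. This assumes an empty options object falls back on the defaults documented in lib/rouge.js:

// Minimal sketch: score one candidate against one reference with each metric.
// Assumes an empty options object falls back on the documented defaults.
var rouge = require('rouge');

const ref = 'police killed the gunman';
const cand = 'police kill the gunman';

console.log(rouge.n(cand, ref, {})); // ROUGE-N
console.log(rouge.l(cand, ref, {})); // ROUGE-L
console.log(rouge.s(cand, ref, {})); // ROUGE-S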

Here's an example evaluating ROUGE-L using an averaged-F1 score instead of the DUC-F1:

import { l as rougeL } from 'rouge';

const ref = 'police killed the gunman';
const cand = 'police kill the gunman';

rougeL(cand, ref, { beta: 0.5 });
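
For reference, the F-measure being tuned here is the one from the ROUGE paper: F = ((1 + β²) · P · R) / (R + β² · P). A very large β collapses F to recall (the DUC setting), while a smaller β weights precision more heavily. The helper below is purely illustrative and not part of this library's API:

// Beta-weighted F-measure as defined in the ROUGE paper (illustrative only).
function fMeasure(precision, recall, beta) {
  const b2 = beta * beta;
  return ((1 + b2) * precision * recall) / (recall + b2 * precision);
}

console.log(fMeasure(0.8, 0.6, 0.5)); // weights precision more heavily
console.log(fMeasure(0.8, 0.6, 1.0)); // the balanced F1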

In addition, the main functions rely on a battery of utility functions specified in lib/utils.js. These handle tasks such as skip-bigram computation, string tokenization, sentence segmentation, and set intersection.
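
To give a flavor of that machinery, here is a rough sketch of skip-bigram extraction: ROUGE-S counts every in-order word pair, gaps allowed. The function name is illustrative and not necessarily what lib/utils.js actually exposes:

// Illustrative only: enumerate all in-order word pairs (skip bigrams).
function skipBigrams(tokens) {
  const pairs = [];
  for (let i = 0; i < tokens.length - 1; i++) {
    for (let j = i + 1; j < tokens.length; j++) {
      pairs.push(tokens[i] + ' ' + tokens[j]);
    }
  }
  return pairs;
}

skipBigrams(['police', 'killed', 'the', 'gunman']);
// => ['police killed', 'police the', 'police gunman',
//     'killed the', 'killed gunman', 'the gunman']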

Here's an example applying jackknife resampling as described in the original paper:

import { n as rougeN } from 'rouge';
import { jackKnife } from 'utils';

const ref = 'police killed the gunman';
const cands = [
  'police kill the gunman',
  'the gunman kill police',
  'the gunman police killed',
];

// Standard evaluation taking the arithmetic mean
jackKnife(cands, ref, rougeN);

// A function that returns the max value in an array
const distMax = (arr) => Math.max(...arr);

// Modified evaluation taking the distribution maximum
jackKnife(cands, ref, rougeN, distMax);
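
For intuition, the leave-one-out procedure works roughly as sketched below: drop one element at a time, score what remains, keep the best score per fold, and reduce the per-fold bests (arithmetic mean by default). This is a hedged illustration of the idea, not the library's actual jackKnife implementation:

// Illustrative leave-one-out jackknife; not the library's implementation.
function jackKnifeSketch(items, ref, scoreFn, reduce) {
  const mean = (arr) => arr.reduce((a, b) => a + b, 0) / arr.length;
  const bestPerFold = items.map((_, leftOut) => {
    const fold = items.filter((_, idx) => idx !== leftOut); // drop one item
    return Math.max(...fold.map((item) => scoreFn(item, ref)));
  });
  return (reduce || mean)(bestPerFold);
}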

Versioning

Development will be maintained under the Semantic Versioning guidelines as much as possible in order to ensure transparency and backwards compatibility.

Releases will be numbered with the following format:

<major>.<minor>.<patch>

And constructed with the following guidelines:

  • Breaking backward compatibility bumps the major (and resets the minor and patch)
  • New additions without breaking backward compatibility bump the minor (and reset the patch)
  • Bug fixes and miscellaneous changes bump the patch

For more information on SemVer, visit http://semver.org/.

Bug Tracking and Feature Requests

Have a bug or a feature request? Please open a new issue.

Before opening any issue, please search for existing issues and read the Issue Guidelines.

Contributing

Please submit all pull requests against *-wip branches. All code should pass JSHint/ESLint validation. Note that files in /lib are written in ES2015 syntax and transpiled to corresponding files in /dist using Babel. Gulp build pipelines exist and should be used.

The amount of data available for writing tests is unfortunately woefully inadequate. I've tried to be as thorough as possible, but that eliminates neither the possibility nor the existence of errors. The gold standard is the DUC dataset, but that too is form-walled and legal-release-walled, which is infuriating. If you have data in the form of a candidate summary, reference(s), and a verified ROUGE score that you do not mind sharing, I would love to add it to the test harness.

License

MIT


rouge's Issues

Constructor Uint8Array requires 'new'

I tried running the following:

var rouge = require('rouge');
const ref = 'police killed the gunman';
const cand = 'police kill the gunman';
console.log(rouge.l(cand, ref, { beta: 0.5 }));

I get the following error:

node_modules/rouge/src/rouge.js:48
                cTable.push(Uint8Array(trimmedCandidate.length + 1));
                            ^

TypeError: Constructor Uint8Array requires 'new'

I installed the package via npm install rouge.
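
For context, ES2015 typed-array constructors throw when invoked without new, so the likely fix (not verified against the repo) is to construct the array explicitly:

// Before (throws "Constructor Uint8Array requires 'new'"):
// cTable.push(Uint8Array(trimmedCandidate.length + 1));

// After (likely fix): typed arrays must be constructed with `new`.
cTable.push(new Uint8Array(trimmedCandidate.length + 1));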

Multi-sentence ROUGE-L scores

Hi,

I've been working with ROUGE for a while and I'm still not sure how to implement ROUGE-L correctly.
Both your implementation and the one I'm using (in Python) implement the summary-level ROUGE-LCS score as described in the paper. The thing is, the scores aren't close to the official ones (i.e. those from the Perl script).

Example

Ref:

brendan @entity8 is under pressure following @entity11 semi-final defeat . but the @entity10 boss says he will bounce back despite the criticism . @entity10 owners @entity9 maintain @entity8 wo n't be sacked . @entity13 hopes @entity18 commits his future to the @entity23 .

Summary:

brendan @entity8 insists he is the man to guide @entity10 to success . brendan @entity8 has not been rattled by the intensity of the criticism . @entity10 manager is under pressure following the semi-final defeat by @entity12 last sunday .

Experiment

  • Official scores: I'm actually using Python wrappers (files2rouge, which uses pyrouge). I verified those wrappers by scoring some prediction/reference pairs and getting exactly the same numbers.
---------------------------------------------
1 ROUGE-1 Average_R: 0.43902 (95%-conf.int. 0.43902 - 0.43902)
1 ROUGE-1 Average_P: 0.47368 (95%-conf.int. 0.47368 - 0.47368)
1 ROUGE-1 Average_F: 0.45569 (95%-conf.int. 0.45569 - 0.45569)
---------------------------------------------
1 ROUGE-2 Average_R: 0.20000 (95%-conf.int. 0.20000 - 0.20000)
1 ROUGE-2 Average_P: 0.21622 (95%-conf.int. 0.21622 - 0.21622)
1 ROUGE-2 Average_F: 0.20779 (95%-conf.int. 0.20779 - 0.20779)
---------------------------------------------
1 ROUGE-L Average_R: 0.41463 (95%-conf.int. 0.41463 - 0.41463)
1 ROUGE-L Average_P: 0.44737 (95%-conf.int. 0.44737 - 0.44737)
1 ROUGE-L Average_F: 0.43038 (95%-conf.int. 0.43038 - 0.43038)
  • Other implementations:
{
  "rouge-1": {
    "f": 0.43076922582721894,
    "p": 0.4827586206896552,
    "r": 0.3888888888888889
  },
  "rouge-2": {
    "f": 0.19999999501250013,
    "p": 0.21052631578947367,
    "r": 0.19047619047619047
  },
  "rouge-l": {
    "f": 0.048830315339539125,
    "p": 0.0507936507936508,
    "r": 0.047249907715024
  }
}
> rougeL(hyp, ref);
0.07972913936216687

The difference in the R1 and R2 scores does not really bother me, but it seems like we're not computing the right LCS.
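
For comparison, sentence-level ROUGE-L from Lin (2004) can be sketched as below: recall = LCS / |ref| and precision = LCS / |cand|, combined into an F-score. The official script additionally uses a union-LCS, summary-level formulation for multi-sentence inputs, which is the likeliest source of the gap above. The names here are illustrative, not this library's API:

// Classic dynamic-programming LCS length over word arrays (illustrative only).
function lcsLength(a, b) {
  const table = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      table[i][j] = a[i - 1] === b[j - 1]
        ? table[i - 1][j - 1] + 1
        : Math.max(table[i - 1][j], table[i][j - 1]);
    }
  }
  return table[a.length][b.length];
}

// Sentence-level ROUGE-L: recall = LCS/|ref|, precision = LCS/|cand|.
function rougeLSentence(cand, ref) {
  const c = cand.split(/\s+/);
  const r = ref.split(/\s+/);
  const lcs = lcsLength(r, c);
  const recall = lcs / r.length;
  const precision = lcs / c.length;
  const f = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { recall, precision, f };
}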

Calling with same input

Calling with an identical candidate and reference string causes the call to hang.

var rouge = require('rouge');
var input = "the same input sentence";
console.log(rouge.l(input, input, { beta: 0.5 }));
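// Expected once the hang is fixed (hedged, since the call currently never
// returns): identical strings match on every n-gram, LCS, and skip bigram,
// so each metric should return a perfect 1.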
