vipulraheja / coedit

Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023)

Home Page: https://aclanthology.org/2023.findings-emnlp.350/

Languages: Shell 100.00%
Topics: deep-learning, grammarly, large-language-models, llm, llms, nlp, text-editing, text-revision, writing-assistant, instruction-tuning

coedit's People

Contributors

vipulraheja

coedit's Issues

Fine-tuning on CoEdIT

There are two outputs/targets. How can a transformer be fine-tuned on multiple outputs?
Please let me know.

Has "STYLE (Formalize)" data been used in the training?

Hi authors, thank you very much for the great work.

From Table 1 in the paper, it seems you used around 86k examples for training.

However, the "train_coedit.jsonl" file I downloaded contains only 69k examples. After checking, I think the second-to-last row of the table, STYLE (Formalize), is missing from the released "train_coedit.jsonl" file.

May I know whether this portion of the data was used to train the CoEdIT model?

Best regards,
Michael
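
A check like the one described above can be sketched as follows. This is a minimal sketch, not the authors' tooling: it assumes each line of "train_coedit.jsonl" is a JSON object and that the task label is stored under a key named `task` (an assumption; change `key` if the released file uses a different field). The helper name `count_by_task` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_task(lines: Iterable[str], key: str = "task") -> Counter:
    """Count JSONL records per task label.

    `key` is an assumed field name; records without it are
    tallied under "<missing>" so they are not silently dropped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(key, "<missing>")] += 1
    return counts


# Usage with a real file (path is whatever you downloaded):
#   with open("train_coedit.jsonl", encoding="utf-8") as f:
#       for task, n in count_by_task(f).most_common():
#           print(task, n)
```

Comparing the resulting per-task totals against the rows of Table 1 would show which task, if any, is absent from the released file.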

dataset release?

Hi! Thank you for your great work. The models are excellent at preserving the salient content while making the requested edits. I was wondering if the dataset has already been released or will be soon? I couldn't find it when I searched just now.

Right way to use discofuse dataset?

Click here for Dataset link
Below is my understanding of how to use it. Is it correct?

The columns/features from the DiscoFuse dataset that will be the input to the encoder and decoder are:

  1. coherent_first_sentence

  2. coherent_second_sentence

  3. incoherent_first_sentence

  4. incoherent_second_sentence

The encoder will take these four columns as input and encode them into a sequence of hidden states. The decoder will then take these hidden states as input and decode them into a new sentence that fuses the two original sentences together.

The discourse_type, connective_string, has_coref_type_pronoun, and has_coref_type_nominal columns will not be used as input to the encoder or decoder. These columns provide additional information about each example, but they are not necessary for the task of sentence fusion.

Please correct me if I am wrong; otherwise, if this understanding is right, how shall I implement this task practically?
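
One possible way to set this up in practice, offered as a sketch and not confirmed by the authors in this thread: in a standard sequence-to-sequence setup the encoder sees only the source text and the decoder is trained to produce the target. Under that reading, the two incoherent sentences form the source and the coherent sentence(s) form the target, so all four columns are used but not all four go to the encoder. The instruction string, the `src`/`tgt` key names, and the helper `to_seq2seq_pair` below are hypothetical.

```python
from typing import Dict

# Hypothetical instruction prefix; the exact prompt wording is an assumption.
INSTRUCTION = "Make the text more coherent"


def _join(*parts: str) -> str:
    """Join non-empty sentence parts with a single space."""
    return " ".join(p.strip() for p in parts if p and p.strip())


def to_seq2seq_pair(row: Dict[str, str], instruction: str = INSTRUCTION) -> Dict[str, str]:
    """Turn one DiscoFuse row into a (source, target) text pair.

    Source: instruction + the two incoherent sentences (encoder input).
    Target: the coherent sentence(s) (what the decoder learns to produce).
    The second coherent sentence may be empty when both were fused into one.
    """
    source = f"{instruction}: " + _join(
        row["incoherent_first_sentence"], row["incoherent_second_sentence"]
    )
    target = _join(
        row["coherent_first_sentence"], row.get("coherent_second_sentence", "")
    )
    return {"src": source, "tgt": target}


# Illustrative row (made up for this example, not taken from the dataset).
example = {
    "incoherent_first_sentence": "It was raining.",
    "incoherent_second_sentence": "We stayed home.",
    "coherent_first_sentence": "It was raining, so we stayed home.",
    "coherent_second_sentence": "",
}
pair = to_seq2seq_pair(example)
```

The resulting pairs can then be tokenized and fed to any encoder-decoder fine-tuning loop, with `src` as the input text and `tgt` as the labels.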

Request for Evaluation Code and Metrics

Hello,

I recently read your interesting paper. The results look very promising and I'm excited to try out the COEDIT models.

In Section 4 "Experimental Setup" of the paper, several evaluation datasets and metrics are described. However, I didn't see the code for computing these metrics or the full evaluation pipeline provided in the linked GitHub repo.

Would it be possible for you to please open-source the code you used to evaluate the models and compute the reported metrics? Having access to the exact evaluation scripts and metric implementations would help ensure reproducibility and make it easier for others to benchmark against COEDIT and validate the results.

Thanks!
