vipulraheja / coedit

Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023)

Home Page: https://aclanthology.org/2023.findings-emnlp.350/

Shell 100.00%

deep-learning grammarly large-language-models llm llms nlp text-editing text-revision writing-assistant instruction-tuning

coedit's People

dumpmemory dykang ckqqqq copperdong mattkallo habibzadeh gijungcho vdubya roysh omowunmiaj

coedit's Issues

Fine tuning on Coedit

There are two outputs/targets. How can transformers be fine-tuned on multiple outputs?
Please let me know.
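
For anyone with the same question, a minimal sketch of one common workaround follows. It assumes each record holds one source text and a list of acceptable targets, and flattens it into one (source, target) pair per target so a standard sequence-to-sequence trainer can consume it. The field names `src`, `tgt`, and `targets` and the helper `expand_multi_target` are hypothetical, and nothing here is confirmed as the authors' approach.

```python
def expand_multi_target(examples):
    """Flatten records with several targets into single-target pairs.

    Assumed (hypothetical) record shape: {"src": str, "targets": [str, ...]}.
    """
    pairs = []
    for example in examples:
        for target in example["targets"]:
            # Repeat the same source once for every reference target.
            pairs.append({"src": example["src"], "tgt": target})
    return pairs
```

Each flattened pair can then be tokenized as an ordinary input/label example. Whether this matches the structure of the data in question would need to be checked against the actual file.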

Has "STYLE (Formalize)" data been used in the training?

Hi authors, Thank you very much for the great work.

From Table 1 in the paper, it seems you used around 86k examples for training.

However, the "train_coedit.jsonl" file I downloaded contains only 69k examples. After checking, I think the second-to-last row of the table (STYLE (Formalize)) is missing from the released "train_coedit.jsonl" file.
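
A check like this can be reproduced with a short script. The sketch below counts records per task in a JSON Lines file; it assumes each line is a JSON object and that the task label lives in a field called `task`, which is an assumption about the released file rather than something stated here. The helper name `count_by_field` is hypothetical.

```python
import json
from collections import Counter


def count_by_field(path, field="task"):
    """Count JSONL records grouped by the value of `field`.

    `field="task"` is an assumed key name; adjust it to the real schema.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Records without the field are grouped under a placeholder key.
            counts[record.get(field, "<missing>")] += 1
    return counts
```

Calling `count_by_field("train_coedit.jsonl")` and summing the values gives the total number of examples, which can be compared row by row against Table 1.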

May I know if you have used this data portion to train the coedit model?

Best regards,
Michael

dataset release?

Hi! Thank you for your great work. The models are excellent at preserving the salient content while making the requested edits. I was wondering if the dataset has already been released or will be soon? I couldn't find it when I searched just now.

Right way to use discofuse dataset?

Click here for Dataset link
Below is my understanding of how to use it. Is it correct?

The columns/features from DiscoFuse dataset that will be the input to the encoder and decoder are:

coherent_first_sentence
coherent_second_sentence
incoherent_first_sentence
incoherent_second_sentence

The encoder will take these four columns as input and encode them into a sequence of hidden states. The decoder will then take these hidden states as input and decode them into a new sentence that fuses the two original sentences together.

The discourse_type, connective_string, has_coref_type_pronoun, and has_coref_type_nominal columns will not be used as input to the encoder or decoder. These columns provide additional information about each example, but they are not necessary for the task of sentence fusion.

Please correct me if I am wrong; otherwise, if this understanding is right, how shall I implement this task practically?
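
As a starting point for the practical side, here is a minimal sketch of one possible framing. Note that it differs from the description above: it assumes the two incoherent sentences form the encoder input and the coherent text is the decoder target, rather than feeding all four columns to the encoder. That framing, the instruction prefix, the output keys `src` and `tgt`, and the helper name `discofuse_to_pair` are all assumptions for illustration, not something confirmed by the CoEdIT authors.

```python
def discofuse_to_pair(row, instruction="Make the text more coherent: "):
    """Build one (source, target) training pair from a DiscoFuse row.

    Assumption: incoherent sentences are the input, coherent text is the target.
    The default instruction string is a hypothetical placeholder.
    """
    source = " ".join(
        [row["incoherent_first_sentence"], row["incoherent_second_sentence"]]
    ).strip()
    # The second coherent sentence may be empty when the pair was fused into one.
    target = " ".join(
        [row["coherent_first_sentence"], row["coherent_second_sentence"]]
    ).strip()
    return {"src": instruction + source, "tgt": target}
```

The resulting `src` string would be tokenized as the encoder input and `tgt` as the labels; the remaining metadata columns are simply ignored.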

Request for Evaluation Code and Metrics

Hello,

I recently read your interesting paper. The results look very promising and I'm excited to try out the COEDIT models.

In Section 4 "Experimental Setup" of the paper, several evaluation datasets and metrics are described. However, I didn't see the code for computing these metrics or the full evaluation pipeline provided in the linked GitHub repo.

Would it be possible for you to please open-source the code you used to evaluate the models and compute the reported metrics? Having access to the exact evaluation scripts and metric implementations would help ensure reproducibility and make it easier for others to benchmark against COEDIT and validate the results.

Thanks!
