dialogue-absa's Introduction

Dialogue ABSA

  • DiaASQ
  • Exploring the inference results for DiaASQ on several models
    • ChatGPT (k-shot)
    • T5 (fine-tuned)
    • LLaMA 2 (fine-tuned)

dialogue-absa's Issues

[data prep] DiaASQ

Dataset Preprocessing

  • full-dialogue inference
  • speaker-specific inference
  • [-] reply-thread inference > pending

Data counts

train: 800, valid: 100 documents (threads)
for both zh and en

An experiment idea

Will separating the dialogue into sub-threads help the LLM do the task?
DiaASQ does model the reply relation in its reply mask $M^{Rp}$. A rough sketch of thread splitting is below.
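
A rough sketch of what sub-thread splitting could look like, assuming each example carries a replies list where replies[i] is the index of the sentence that sentence i replies to, with the root marked by a negative index (the field name and the root convention are assumptions about the released JSON):

# Rough sketch, not the repo's code: split one dialogue into reply sub-threads.
def split_into_threads(example):
    replies = example['replies']        # assumed: replies[i] = parent sentence index, < 0 for the root post
    sentences = example['sentences']
    children = {i: [] for i in range(len(replies))}
    roots = []
    for i, parent in enumerate(replies):
        if parent < 0:
            roots.append(i)
        else:
            children[parent].append(i)

    threads = []
    for root in roots:
        for head in children[root]:                    # each direct reply to the opening post starts a sub-thread
            stack, thread = [head], [sentences[root]]  # keep the opening post for context
            while stack:
                node = stack.pop()
                thread.append(sentences[node])
                stack.extend(reversed(children[node]))
            threads.append(thread)
    return threads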

Understanding data structure

(figure omitted: screenshot of a DiaASQ example's JSON structure)

Understanding data indices

sentences = train_example['sentences']
full_text = ' '.join(sentences)
tokens = full_text.split()   # quad indices are token-level offsets into this list
# find the quads
# (I'm being sneaky here)
quads = train_example['triplets']
for quad in quads:
    assert len(quad) == 10
    target_s, target_t = quad[0], quad[1]
    asp_s, asp_t = quad[2], quad[3]
    opn_s, opn_t = quad[4], quad[5]
    pol = quad[6]
    aspect_string = quad[7]
    target_string = quad[8]
    opn_string = quad[9]
    print(f'pol: {pol}')
    print(f'aspect_string: {aspect_string}')
    print(f'target_string: {target_string}')
    print(f'opn_string: {opn_string}')

    print(tokens[target_s:target_t])
    print(tokens[asp_s:asp_t])
    print(tokens[opn_s:opn_t])
    print('-----------------')

pol: other
aspect_string: 13promax
target_string: 信号
opn_string: 是硬伤吗?
['13', 'promax']
['信', '号']
['是', '硬', '伤', '吗', '?']

Full-Dialogue DiaASQ zh

  1. Every example is very long, so good delimiters or prompt cues are needed: without adding "Example" or a leading phrase here, ChatGPT treats the in-context examples as inputs it should also run inference on. (A prompt-formatting sketch is at the end of this section.)
    full_dialogue_dataset.py

  2. ChatGPT results

INFO:__main__:Found 22 files to be concatenated ...
INFO:dataset.diaasq.full_dialogue_dataset:Legal pool (k-examples pool) size: 409
INFO:__main__:Sanity check passed!
INFO:__main__:Writing inference file to output/diaasq/gpt-full-dialog-zh/FullDiaAsqDataset_gpt_eval.csv ...
INFO:__main__:Starting evaluation on valid (100 examples)...
{'aspect_f1': 0.4953959483845313,
 'iden_f1': 0.12240553480962135,
 'opinion_f1': 0.21926105385779768,
 'pair_ao_f1': 0.14426229503281143,
 'pair_ta_f1': 0.2990881458468997,
 'pair_to_f1': 0.11902231663541006,
 'quad_micro_f1': 0.084703537569097,
 'target_f1': 0.5363457759829979} 
  3. Paper statistics
(figure omitted: statistics reported in the DiaASQ paper)
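
A minimal sketch of the delimiter idea from point 1 (not the actual prompt code in full_dialogue_dataset.py): each in-context example is wrapped with an explicit marker so ChatGPT does not treat it as another dialogue to annotate.

# Hypothetical prompt builder for illustration only.
def build_prompt(instruction, ic_examples, test_dialogue):
    parts = [instruction]
    for i, (dialogue, answer) in enumerate(ic_examples, 1):
        parts.append(f'### Example {i}\nDialogue:\n{dialogue}\nAnswer:\n{answer}')
    parts.append('### Now annotate this dialogue (the examples above are for reference only)\n'
                 f'Dialogue:\n{test_dialogue}\nAnswer:')
    return '\n\n'.join(parts)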

⏰ [exp] DiaASQ

Discussion / Stop and Think

What's left to do for this dataset?

  1. DiaASQ Full Dialogue, en version (ChatGPT only)
  2. DiaASQ zh
    • #15
    • Speaker-Specific DiaASQ zh
  3. Dynamic / not-fixed in-context examples?
    • Following "What Makes Good In-Context Examples for GPT-3?"
    • A different in-context example for each test example, e.g. in the speaker-specific version, a Speaker A in-context example for a Speaker A test example (see the retrieval sketch below).
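
A minimal sketch of retrieval-based in-context example selection in the spirit of that paper (not implemented in this repo; the sentence-transformers model name is an arbitrary choice for illustration):

# kNN retrieval of in-context examples: pick the pool examples most similar to the test input.
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

def pick_in_context(test_text, pool_texts, k=1):
    test_vec = encoder.encode([test_text])              # (1, dim)
    pool_vecs = encoder.encode(pool_texts)               # (n, dim)
    test_vec = test_vec / np.linalg.norm(test_vec, axis=1, keepdims=True)
    pool_vecs = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = (pool_vecs @ test_vec.T).squeeze(-1)          # cosine similarity to the test example
    return np.argsort(-sims)[:k]                         # indices of the k nearest pool examples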

Evaluate if it is possible to move the whole research direction toward dialogue SA

  • We still have CASA as another dialogue SA choice.
  • However, CASA is not easy to model: its subtasks are mention, opinion, and polarity identification, so the dataset is formulated differently from triplet-extraction tasks and would need extra effort to convert.

[exp] DiaASQ T5 compute_metrics in Trainer

compute_metrics

I wrote a compute_metrics but needed to know what the EvalPrediction object contains.
Solution: pickle it out and study it until I can recover strings from it. The code below is used for that testing. Oddly, a tokenizer initialized with the same model name seems to give a different pad_token_id.

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    data_collator=collate_fn,
    compute_metrics=calc_sentiment_scores(tokenizer),
)
#%%
import pickle
pred_path = './preds.pkl'
label_path = './labels.pkl'
evalpred_path = './evalprediction.pkl'



def load_pickle(path):
    with open(path, 'rb') as f:
        return pickle.load(f)

evalpred = load_pickle(evalpred_path)
from transformers import AutoTokenizer
model_name = 'allenai/tk-instruct-base-def-pos'
tokenizer = AutoTokenizer.from_pretrained(model_name)
#%%
import numpy as np
p = evalpred.predictions[0]
p = np.argmax(p, axis=-1)
print('p:', p.shape)
p = tokenizer.batch_decode(p, skip_special_tokens=True, clean_up_tokenization_spaces=True)
print('p:', p)
#%%
l = evalpred.label_ids
l = np.where(l != -100, l, tokenizer.pad_token_id)  # swap the -100 ignore index for pad ids so decoding works
l = tokenizer.batch_decode(l, skip_special_tokens=True, clean_up_tokenization_spaces=True)
print('l:', l)
# print('preds:', preds)
# print('len(preds):', len(preds))          # 2 (why 2??)
# print('shape(preds[0]):', preds[0].shape)
# batch_size * max_output_len * vocab_size.
# shape(preds[1]): (290, 512, 768)

# %%
# labels shape: labels.shape (290, 183)
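
Based on the shapes inspected above, a compute_metrics could look roughly like this (a sketch, not necessarily the repo's calc_sentiment_scores; parse_and_score is a hypothetical helper that parses the generated tuples and returns the F1 dict):

import numpy as np

def calc_sentiment_scores(tokenizer):
    def compute_metrics(eval_pred):
        preds = eval_pred.predictions
        if isinstance(preds, tuple):       # predictions[0] holds the logits; predictions[1] looks like encoder states
            preds = preds[0]
        if preds.ndim == 3:                # (batch, max_output_len, vocab_size) logits -> token ids
            preds = np.argmax(preds, axis=-1)
        labels = np.where(eval_pred.label_ids != -100,
                          eval_pred.label_ids, tokenizer.pad_token_id)
        pred_str = tokenizer.batch_decode(preds, skip_special_tokens=True)
        label_str = tokenizer.batch_decode(labels, skip_special_tokens=True)
        return parse_and_score(pred_str, label_str)
    return compute_metrics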

[experiment] llm + diaASQ [en]

ChatGPT

  • To compare with the T5 version, use the same instruction.
  • Experiment with en first, then zh (the Chinese version).
  • Following 易庭's suggestion, switch to writing the data (zh) with a dataclass.

Configs

# reference: https://tsmatz.wordpress.com/2022/11/25/huggingface-japanese-summarization/
# Note : Do not use FP16 precision in mT5 fine-tuning.
seed: 42
data:
#   data_root: 'data/diaasq/speaker_dataset'
#   train_split_name: 'train'
#   test_split_name: 'valid'
  lang_src : 'en'

# proc_data and dataset follows the diaasq-t5-speaker-spec-en.yaml for experiment comparison
proc_data:
  type: 'speaker'
  data_root: 'data/diaasq/speaker_dataset/proc'
  train_ic_name: 't5_in_context' # use t5/create_kshot_dataset_split.py
  t5_train_split_name: 't5_train'
  test_ic_name: 't5_in_context'  # use the same in-context examples as in training
  t5_test_split_name: 't5_valid'

dataset:
  name: 'diaasq-speaker-spec-en'
  k: 1
  prompt_path: 'prompt/experiment/diaasq-speaker-spec-en-t5'
  in_context_strategy: None

model:
  model_name: 'gpt-3.5-turbo'
  max_tokens: 256 # t5: generation_max_length: 256 # t5: # max_length: 512
  temperature: 0

# private keys
envfile: './envs/.env'
output_dir: './results'
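
A rough sketch of how this config might drive the ChatGPT call (not the repo's runner; the config path, the python-dotenv usage, and the OPENAI_API_KEY variable name are assumptions), using the legacy openai<1.0 ChatCompletion API:

import os, yaml, openai
from dotenv import load_dotenv

cfg = yaml.safe_load(open('configs/diaasq-gpt-speaker-spec-en.yaml'))   # hypothetical config path
load_dotenv(cfg['envfile'])
openai.api_key = os.environ['OPENAI_API_KEY']           # assumed key name in the .env file

def query(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model=cfg['model']['model_name'],                # gpt-3.5-turbo
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=cfg['model']['max_tokens'],           # 256, matching the t5 generation_max_length
        temperature=cfg['model']['temperature'],         # 0 for deterministic outputs
    )
    return resp['choices'][0]['message']['content']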

[survey] DiaASQ

DiaASQ

2023 paper

Method summary

  1. A base encoder learns base contextual representations.
  2. Multi-view interaction layers apply three feature masks (thread mask, speaker mask, reply mask) and max-pool over the masked attentions (toy sketch below).
  3. RoPE (rotary position embedding).
  4. Grid tagging for decoding.
    (figure omitted: model architecture from the paper)
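
A toy sketch of the multi-view masked-attention idea in step 2 (not the authors' implementation; a real layer would add projections, multiple heads, and RoPE):

import torch
import torch.nn.functional as F

def multi_view_attention(h, thread_mask, speaker_mask, reply_mask):
    # h: (batch, seq_len, dim); each mask: (batch, seq_len, seq_len) with 1 = may attend.
    # Assumes every mask allows at least self-attention (diagonal = 1), otherwise softmax would produce NaNs.
    scores = h @ h.transpose(1, 2) / h.size(-1) ** 0.5
    views = []
    for mask in (thread_mask, speaker_mask, reply_mask):
        masked = scores.masked_fill(mask == 0, float('-inf'))
        views.append(F.softmax(masked, dim=-1) @ h)
    return torch.amax(torch.stack(views), dim=0)         # max pooling over the three masked views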

Results

  • All F1 scores are strict F1 (spans and elements must match exactly).
    (figure omitted: main results table from the paper)

Ablation

  • removing all feature masks
    (figure omitted: ablation results from the paper)

[exp] t5 + diaASQ [en]

T5-Generation Fine-tuning Task

Data Preparation

  • Can only use the specially preprocessed speaker-specific dataset, because the max input length for T5 is 512.
  • Preprocessing
    1. lib/create_speaker_data.py --cfg=configs/diaasq-t5-en.yaml creates the speaker-specific data. Note: do not
      use the speaker-spec configs here. The resulting data is saved to /home/nanaeilish/projects/research/sentiment-llm/data/diaasq/speaker_dataset/jsons_en (taking en as lang_src for example; modify the config if needed).
    2. lib/create_kshot_dataset_split.py --cfg=configs/diaasq-t5-speaker-spec-en.yaml --is_speaker_dataset creates the k-shot in-context examples for the speaker-specific data. The script reuses the data created above and saves the result to data/diaasq/speaker_dataset/proc/jsons_en.
    3. Don't forget to set speaker in step 2 so that the in-context examples all contain the same speaker.
  • Prompts need extra design
    • To be precise, "in-context example design".
    1. Filter the train set, keeping the speaker data examples with 3 sentiment tuples (complicated and demonstrative enough).
    2. Sort the data examples by number of triplets and choose the k shots with the fewest. T5 has a very strict input length limit (512), which barely fits a single data example (hence k = 1 in the experiment below). A selection sketch follows this list.
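
A minimal sketch of the selection logic described above (field names such as 'speaker' and 'triplets' are assumptions; the actual code is in lib/create_kshot_dataset_split.py and may differ):

def select_in_context_examples(train_examples, speaker, k=1, min_tuples=3):
    # keep only this speaker's examples that carry at least 3 sentiment tuples (demonstrative enough)
    pool = [ex for ex in train_examples
            if ex['speaker'] == speaker and len(ex['triplets']) >= min_tuples]
    # prefer the shortest demonstrations: T5's 512-token budget is tight,
    # so take the k candidates with the fewest triplets
    pool.sort(key=lambda ex: len(ex['triplets']))
    return pool[:k]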

Eval metrics (micro)

  • target, aspect, opinion F1
  • $\text{pair}_{t\text{-}a}$, $\text{pair}_{t\text{-}o}$, $\text{pair}_{a\text{-}o}$ F1
  • quadruple F1
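
Since all of these are strict micro F1 scores, a minimal sketch of how such a score can be computed over exact-match tuples (not the repo's evaluation code):

def strict_micro_f1(pred_tuples, gold_tuples):
    # pred_tuples / gold_tuples: sets of (doc_id, element_1, ..., element_n) over the whole eval set,
    # e.g. (doc_id, target, aspect, opinion, polarity) for the quadruple score
    pred, gold = set(pred_tuples), set(gold_tuples)
    tp = len(pred & gold)                 # a tuple counts only if every element matches exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0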

Loss

The order (of the generated tuples) does not matter for evaluation, but (needs inspection) it does seem to matter for the seq2seq training loss:
loss = loss_fct(lm_logits.view(-1, lm_logits.size(-1)), labels.view(-1))
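
For context, a self-contained toy version of that loss (it mirrors the token-level cross entropy HuggingFace uses for T5, with -100 marking ignored label positions; the shapes are dummies for illustration):

import torch
import torch.nn as nn

batch_size, tgt_len, vocab_size = 2, 5, 32128          # dummy shapes
lm_logits = torch.randn(batch_size, tgt_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, tgt_len))
labels[:, -1] = -100                                   # e.g. padded label positions are ignored

loss_fct = nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fct(lm_logits.view(-1, lm_logits.size(-1)), labels.view(-1))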

[test] DiaASQ dataset

Tests in the sense of unit tests / pytest.

  • Since I am constantly refactoring the code, writing tests takes too much time and they become outdated too quickly.
