Authorship Style Transfer with Inverse Transfer Data Augmentation

This is the official implementation of the paper [Authorship Style Transfer with Inverse Transfer Data Augmentation].

Overview

Authorship style transfer aims to modify the style of neutral text to match the unique speaking or writing style of a particular individual. We propose an inverse transfer data augmentation (ITDA) method, which leverages GPT-3.5 to create (neutral text, stylized text) pairs. We use this augmented dataset to train a BART-base model for style transfer. Our experiments on four datasets with distinct authorship styles show that ITDA is more effective than style transfer performed directly with GPT-3.5.
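The pair-construction idea can be sketched in a few lines. This is a minimal illustration, not the code in this repository: it assumes the inverse step maps existing stylized text to a neutral rewrite, `build_inverse_pairs` and `toy_neutralizer` are hypothetical names, and the toy function stands in for a GPT-3.5 call. Dropping outputs that come back empty or unchanged is also an assumption.

```python
from typing import Callable, List, Tuple


def build_inverse_pairs(
    stylized_texts: List[str],
    to_neutral: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Create (neutral, stylized) training pairs from stylized text.

    `to_neutral` is any callable that rewrites a stylized sentence into
    neutral language (an LLM call in practice; hypothetical here).
    """
    pairs = []
    for stylized in stylized_texts:
        neutral = to_neutral(stylized).strip()
        # Assumption: skip pairs where the rewrite is empty or unchanged,
        # since they carry no style signal for the downstream model.
        if neutral and neutral != stylized:
            pairs.append((neutral, stylized))
    return pairs


def toy_neutralizer(text: str) -> str:
    """Stand-in for the LLM: replaces one archaic phrase."""
    return text.replace("Thou art", "You are").replace("thou art", "you are")


if __name__ == "__main__":
    corpus = ["Thou art kind.", "Hello."]
    for neutral, stylized in build_inverse_pairs(corpus, toy_neutralizer):
        print(f"{neutral!r} -> {stylized!r}")
```

The resulting pairs have the neutral text as the source and the original stylized text as the target, which is the direction a compact forward-transfer model would be trained on.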

Evaluation Results

We evaluate ITDA on four benchmarks: Lin Daiyu, Shakespeare, Trump, and Lyrics. We adopt four metrics: BLEU and BERTScore (BS) measure content preservation, SC measures style transfer strength, and a GPT-4 score measures overall performance. Since user-provided text often spans a range of topics, we also collect a new test set of neutral texts on diverse topics for out-of-distribution evaluation.
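As a rough illustration of a classifier-based style metric, the sketch below assumes SC is the fraction of transferred outputs that a style classifier assigns to the target style. This README does not define SC, so that reading, the function name `style_classification_score`, and the keyword-based toy classifier are all assumptions made for the example.

```python
from typing import Callable, Sequence


def style_classification_score(
    outputs: Sequence[str],
    is_target_style: Callable[[str], bool],
) -> float:
    """Fraction of outputs judged to be in the target style.

    `is_target_style` would be a trained classifier in practice;
    any callable returning True/False works for this sketch.
    """
    if not outputs:
        raise ValueError("outputs must not be empty")
    hits = sum(1 for text in outputs if is_target_style(text))
    return hits / len(outputs)


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in: flags text containing an archaic pronoun."""
    return "thou" in text.lower()


if __name__ == "__main__":
    samples = ["Thou art wise", "You are wise", "Where art thou", "hello"]
    print(style_classification_score(samples, toy_classifier))
```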

Install the requirements

First, you need to create a virtual environment and activate it:

conda deactivate
conda create -n <env_name> python=3.8
conda activate <env_name>

Then, install the CUDA version of PyTorch:

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1

Finally, install the requirements:

conda install --file requirements.txt

Stylized Datasets

datasets/hlm, datasets/shakespeare, datasets/trump, datasets/lyrics

Training

(a) Cluster-based Demonstration Annotation

python kmeans.py
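One plausible reading of cluster-based demonstration selection is: embed the texts, cluster them with k-means, and hand-annotate the sample nearest each centroid. The sketch below illustrates that reading only; it is not taken from `kmeans.py`. The function names, the first-k-points initialization, and the 2-D toy vectors (standing in for sentence embeddings) are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(
    points: Sequence[Sequence[float]], k: int, iters: int = 20
) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means with a deterministic init (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, assign


def representatives(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Index of the point closest to each centroid, one per cluster."""
    centroids, assign = kmeans(points, k)
    reps = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            reps.append(min(idx, key=lambda i: math.dist(points[i], centroids[c])))
    return reps


if __name__ == "__main__":
    vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(representatives(vectors, 2))
```

Picking one sample per cluster keeps the annotation cost at k examples while covering the main regions of the data.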

(b) Stylized Text Augmentation

python stylized_augmentation.py

(c) Dynamic Inverse Transfer

python dynamic_inverse_transfer.py

(d) Fine-tune a Compact Model

python ft_bart_en.py   #For English Datasets
python ft_bart_ch.py   #For Chinese Datasets

Inference

python bart_transfer.py

Classifier Training

python classifer_train_en.py  #For English Datasets
python classifer_train_ch.py  #For Chinese Datasets

Evaluation

python evaluation/eval_content.py            #BLEU, BERTScore
python evaluation/classifier_metrics_en.py   #SC, For English Datasets
python evaluation/classifier_metrics_ch.py   #SC, For Chinese Datasets
python evaluation/GPT4_judge.py              #GPT-4 Score

Fixed Prompting

Fixed Few-shot Prompting for Forward Transfer or Inverse Transfer with GPT-3.5

python fixed_transfer.py
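To make the idea of a fixed few-shot prompt concrete, here is a small sketch of how the same demonstration pairs could be laid out for either direction. It is illustrative only and does not reproduce the prompts used by `fixed_transfer.py`: the function `build_fewshot_prompt`, the instruction wording, and the label format are assumptions.

```python
from typing import List, Tuple


def build_fewshot_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    direction: str = "forward",
    style: str = "Shakespeare",
) -> str:
    """Format fixed (neutral, stylized) demos into a few-shot prompt.

    direction="forward": neutral -> stylized.
    direction="inverse": stylized -> neutral.
    """
    if direction == "forward":
        src_label, tgt_label = "Neutral", style
        instruction = f"Rewrite the neutral text in the style of {style}."
    elif direction == "inverse":
        src_label, tgt_label = style, "Neutral"
        instruction = f"Rewrite the {style}-style text as plain, neutral text."
    else:
        raise ValueError("direction must be 'forward' or 'inverse'")

    lines = [instruction, ""]
    for neutral, stylized in demos:
        # Swap source and target depending on the transfer direction.
        src, tgt = (neutral, stylized) if direction == "forward" else (stylized, neutral)
        lines += [f"{src_label}: {src}", f"{tgt_label}: {tgt}", ""]
    # The final block leaves the target side open for the model to complete.
    lines += [f"{src_label}: {query}", f"{tgt_label}:"]
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [("You are kind.", "Thou art kind.")]
    print(build_fewshot_prompt(demos, "Where are you?", "forward"))
```

The returned string would then be sent to the chat model as the user message; the same demonstrations are reused for every input, which is what makes the prompting "fixed".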
