This is the official implementation of the paper [Authorship Style Transfer with Inverse Transfer Data Augmentation].
Authorship style transfer aims to rewrite neutral text so that it matches the distinctive speaking or writing style of a particular individual. We propose an inverse transfer data augmentation (ITDA) method, which leverages GPT-3.5 to create (neutral text, stylized text) pairs. We use this augmented dataset to train a BART-base model for style transfer. Experiments on four datasets with distinct authorship styles show that ITDA is more effective than performing style transfer directly with GPT-3.5.
We evaluate ITDA on four benchmarks: Lin Daiyu, Shakespeare, Trump, and Lyrics. We adopt four metrics: BLEU and BS (BERTScore) measure content preservation, SC measures style transfer strength, and the GPT-4 score measures overall performance. Since user-provided text often spans a range of topics, we also collect a new test set of neutral texts covering diverse topics for out-of-distribution evaluation.
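To make the content-preservation metrics more concrete, here is a minimal sketch of a BLEU-style score (clipped n-gram precision with a brevity penalty). It is a simplified illustration only: the function names `ngrams` and `simple_bleu`, the whitespace tokenization, and the bigram cutoff are assumptions made for this example, not the scoring code used in this repository.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU (hypothetical helper, for illustration).

    Uses whitespace tokens, clipped n-gram precision up to max_n,
    a geometric mean, and a brevity penalty. No smoothing is applied,
    so any n-gram order with zero overlap yields a score of 0.0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

A transferred sentence that copies its reference exactly scores 1.0, while one that shares no bigram with the reference scores 0.0. In practice a maintained BLEU implementation with proper tokenization and smoothing is preferable.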
First, you need to create a virtual environment and activate it:
conda deactivate
conda create -n <env_name> python=3.8
conda activate <env_name>
Then, install the CUDA build of PyTorch:
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1
Finally, install the requirements:
conda install --file requirements.txt
The four datasets are located in datasets/hlm (Lin Daiyu), datasets/shakespeare, datasets/trump, and datasets/lyrics.
(a) Cluster-based Demonstration Annotation
python kmeans.py
(b) Stylized Text Augmentation
python stylized_augmentation.py
(c) Inverse Transfer Data Augmentation
python dynamic_inverse_transfer.py
(d) Fine-tune a Compact Model
python ft_bart_en.py  # For English datasets
python ft_bart_ch.py  # For Chinese datasets
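Steps (c) and (d) above can be pictured with the following minimal sketch: each stylized sentence is mapped back to neutral text, and the resulting (neutral, stylized) pair becomes one source/target example for fine-tuning a sequence-to-sequence model. Everything named here is hypothetical: `build_training_pairs`, the `source`/`target` keys, and `toy_to_neutral` (a stand-in for the GPT-3.5 call) are assumptions for illustration and do not reflect the actual interfaces of `dynamic_inverse_transfer.py` or the fine-tuning scripts.

```python
from typing import Callable, Dict, Iterable, List


def build_training_pairs(
    stylized_texts: Iterable[str],
    to_neutral: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Turn stylized sentences into (neutral -> stylized) training examples.

    `to_neutral` is any callable performing the inverse transfer,
    for example a wrapper around an LLM API (assumed, not shown here).
    """
    pairs = []
    for stylized in stylized_texts:
        stylized = stylized.strip()
        neutral = to_neutral(stylized).strip()
        # Skip empty outputs and outputs that left the text unchanged,
        # since they carry no style-transfer signal.
        if not neutral or neutral == stylized:
            continue
        # The model is trained in the forward direction: neutral -> stylized.
        pairs.append({"source": neutral, "target": stylized})
    return pairs


def toy_to_neutral(text: str) -> str:
    """Toy stand-in for the LLM call: swaps a few archaic words."""
    swaps = {"thou": "you", "art": "are", "thy": "your"}
    return " ".join(swaps.get(word, word) for word in text.split())


pairs = build_training_pairs(["thou art kind", "plain words"], toy_to_neutral)
```

The point of the inverse direction is that authentic stylized text stays on the target side, so the compact model learns to produce genuine author style rather than imitated style.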
python bart_transfer.py
python classifer_train_en.py  # For English datasets
python classifer_train_ch.py  # For Chinese datasets
python evaluation/eval_content.py  # BLEU, BERTScore
python evaluation/classifier_metrics_en.py  # SC, for English datasets
python evaluation/classifier_metrics_ch.py  # SC, for Chinese datasets
python evaluation/GPT4_judge.py  # GPT-4 score
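As a rough illustration of a classifier-based style metric such as SC, the sketch below reports the fraction of transferred outputs that a style classifier assigns to the target author. This assumes SC is computed as simple classifier accuracy on the target style, which the text above does not spell out. The names `style_strength` and `toy_classifier` are hypothetical; the toy classifier stands in for the trained model.

```python
from typing import Callable, Sequence


def style_strength(
    outputs: Sequence[str],
    is_target_style: Callable[[str], bool],
) -> float:
    """Fraction of outputs the classifier labels as the target style."""
    if not outputs:
        return 0.0
    hits = sum(1 for text in outputs if is_target_style(text))
    return hits / len(outputs)


def toy_classifier(text: str) -> bool:
    """Toy stand-in for a trained style classifier (keyword check only)."""
    return "thou" in text.lower().split()


score = style_strength(
    ["Thou art kind", "you are kind", "thou wilt see", "hello there"],
    toy_classifier,
)
```

Here two of the four outputs are classified as the target style, so `score` is 0.5.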
Fixed Few-shot Prompting for Forward Transfer or Inverse Transfer with GPT-3.5
python fixed_transfer.py
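The idea of fixed few-shot prompting can be sketched as follows: the same small set of (neutral, stylized) demonstrations is placed in every prompt, and only the final input changes. Swapping the roles of the two sides gives the inverse direction. The function name `build_fewshot_prompt`, the prompt wording, and the `Neutral:`/`Stylized:` labels are assumptions for illustration, not the prompt used by `fixed_transfer.py`.

```python
from typing import List, Tuple


def build_fewshot_prompt(
    demos: List[Tuple[str, str]],
    text: str,
    style: str,
    direction: str = "forward",
) -> str:
    """Build a few-shot prompt from fixed (neutral, stylized) demonstrations.

    direction="forward": neutral -> stylized.
    direction="inverse": stylized -> neutral.
    """
    if direction == "forward":
        instruction = f"Rewrite the neutral text in the style of {style}."
        src_label, tgt_label = "Neutral", "Stylized"
        examples = demos
    elif direction == "inverse":
        instruction = f"Rewrite the {style}-style text as plain neutral text."
        src_label, tgt_label = "Stylized", "Neutral"
        # Flip each pair so the stylized side becomes the input.
        examples = [(stylized, neutral) for neutral, stylized in demos]
    else:
        raise ValueError("direction must be 'forward' or 'inverse'")

    lines = [instruction, ""]
    for src, tgt in examples:
        lines += [f"{src_label}: {src}", f"{tgt_label}: {tgt}", ""]
    # The final block leaves the target side open for the model to complete.
    lines += [f"{src_label}: {text}", f"{tgt_label}:"]
    return "\n".join(lines)


demos = [("you are kind", "thou art kind")]
forward_prompt = build_fewshot_prompt(demos, "you are late", "Shakespeare")
inverse_prompt = build_fewshot_prompt(
    demos, "thou art late", "Shakespeare", direction="inverse"
)
```

The resulting string would be sent to the chat model as the user message; the API call itself is omitted here.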