
ConGen

Implementation of ConGen: Unsupervised Control and Generalization Distillation For Sentence Representation (Findings of EMNLP 2022).

Citation

@inproceedings{limkonchotiwat-etal-2022-congen,
    title = "{ConGen}: Unsupervised Control and Generalization Distillation For Sentence Representation",
    author = "Limkonchotiwat, Peerat  and
      Ponwitayarat, Wuttikorn  and
      Lowphansirikul, Lalita  and
      Udomcharoenchaikit, Can  and
      Chuangsuwanich, Ekapol  and
      Nutanong, Sarana",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}

Installation

git clone https://github.com/KornWtp/ConGen.git
cd ConGen
pip install -e .

Our models (Small to Large)

Usage
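
Since the repo builds on Sentence-Transformers, a trained ConGen model can be loaded like any Sentence-Transformers checkpoint. Below is a minimal sketch; the model path is a placeholder, so substitute one of the released checkpoints above or your own trained model:

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder path: substitute a released ConGen checkpoint
# or a model you trained yourself.
model = SentenceTransformer("your-congen-model-path")

sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings.
print(float(util.cos_sim(embeddings[0], embeddings[1])))
```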

Training data

We use the training data from BSL's paper: monolingual version and multilingual version.

Development data

We use the STS-B development set from Sentence-Transformers.

Parameters

The full set of training hyperparameters:

| Models | Teacher Temp | Student Temp | Queue Size | Learning Rate |
|---|---|---|---|---|
| BERT-Tiny | 0.05 | 0.05 | 16384 | 5e-4 |
| BERT-Mini | 0.05 | 0.07 | 16384 | 3e-4 |
| Tiny-BERT-L4 | 0.05 | 0.05 | 65536 | 1e-4 |
| MiniLM-L3 | 0.05 | 0.07 | 16384 | 5e-4 |
| MiniLM-L6 | 0.05 | 0.07 | 65536 | 3e-4 |
| BERT-Small | 0.05 | 0.07 | 65536 | 3e-4 |
| MiniLM-L12 | 0.05 | 0.07 | 16384 | 5e-5 |
| Tiny-BERT-L6 | 0.05 | 0.07 | 65536 | 5e-5 |
| BERT-base | 0.05 | 0.07 | 65536 | 5e-5 |
| RoBERTa-base | 0.1 | 0.1 | 1024 | 5e-5 |
| Multilingual-DistilBERT | 0.05 | 0.07 | 65536 | 3e-4 |
| Multilingual-MiniLM-L12 | 0.05 | 0.07 | 65536 | 3e-4 |
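
For orientation, the sketch below shows, under our reading of the paper, one way these temperatures and the queue enter a ConGen-style distillation objective: the teacher's similarities to an instance queue, sharpened by the teacher temperature, supervise the student's queue similarities, sharpened by the student temperature. This is an illustrative sketch, not the repo's actual training code; all function and variable names are made up:

```python
import torch.nn.functional as F

def congen_distillation_loss(student_emb, teacher_emb, queue,
                             teacher_temp=0.05, student_temp=0.07):
    """Sketch of a ConGen-style instance-queue distillation loss.

    student_emb: (batch, dim) student embeddings of augmented inputs
    teacher_emb: (batch, dim) teacher embeddings of the originals
    queue:       (queue_size, dim) teacher embeddings used as the queue
    """
    # Cosine similarities of each embedding against the queue.
    s = F.normalize(student_emb, dim=-1) @ F.normalize(queue, dim=-1).T
    t = F.normalize(teacher_emb, dim=-1) @ F.normalize(queue, dim=-1).T

    # The teacher distribution (sharpened by teacher_temp) supervises
    # the student distribution (sharpened by student_temp) via
    # cross-entropy.
    teacher_probs = F.softmax(t / teacher_temp, dim=-1)
    student_log_probs = F.log_softmax(s / student_temp, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```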

Train your own model

Please set the model's parameters before training, then run:

bash train_congen.sh

To fine-tune the model hyperparameters, we search over the following grid (a driver sketch follows the arrays below):

learning_rate_all=(3e-4 5e-4 1e-4 3e-5 5e-5 1e-5)
queue_sizes=(262144 131072 65536 16384 1024)
teacher_temps=(0.01 0.03 0.05 0.07 0.09 0.1)
student_temps=(0.01 0.03 0.05 0.07 0.09 0.1)
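
One way to sweep this grid is a small driver around train_congen.sh. A hedged sketch in Python; the environment variable names are assumptions, so match them to whatever arguments train_congen.sh actually reads:

```python
import itertools
import os
import subprocess

# The hyperparameter grid from above.
learning_rates = ["3e-4", "5e-4", "1e-4", "3e-5", "5e-5", "1e-5"]
queue_sizes = [262144, 131072, 65536, 16384, 1024]
teacher_temps = [0.01, 0.03, 0.05, 0.07, 0.09, 0.1]
student_temps = [0.01, 0.03, 0.05, 0.07, 0.09, 0.1]

for lr, qs, tt, st in itertools.product(
        learning_rates, queue_sizes, teacher_temps, student_temps):
    # Variable names below are illustrative; adapt train_congen.sh
    # to pick them up.
    env = {**os.environ,
           "LEARNING_RATE": lr,
           "QUEUE_SIZE": str(qs),
           "TEACHER_TEMP": str(tt),
           "STUDENT_TEMP": str(st)}
    subprocess.run(["bash", "train_congen.sh"], env=env, check=True)
```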

Evaluation

Our evaluation code for sentence embeddings is based on a modified version of SentEval and SimCSE.

Before evaluation, please download the evaluation datasets by running

cd SentEval/data/downstream/
bash download_dataset.sh

Evaluation - Notebook

Please see https://github.com/KornWtp/ConGen/tree/main/notebook

Evaluation - Python

Then, back in the root directory, you can evaluate any Sentence-Transformers model using the SimCSE evaluation code. For example:

python evaluation.py \
    --model_name_or_path "your-model-path" \
    --task_set sts \
    --mode test

Main results - STS

In our paper, we report scores averaged over three models, as shown below:

Semantic Textual Similarity (STS) average scores. Teacher model: SimCSE-Unsup-RoBERTa-large (78.90).

| Methods | BERT-Tiny | BERT-Mini | Tiny-BERT-L4 | MiniLM-L3 | MiniLM-L6 | BERT-Small | MiniLM-L12 | Tiny-BERT-L6 | BERT-Base | RoBERTa-Base |
|---|---|---|---|---|---|---|---|---|---|---|
| #Param (M) | 4 | 11 | 14 | 17 | 22 | 29 | 33 | 67 | 109 | 125 |
| Finetuning-based |  |  |  |  |  |  |  |  |  |  |
| Sup-SimCSE | 72.35 | 76.52 | 78.19 | 76.49 | 78.86 | 78.59 | 80.48 | 81.23 | 81.57 | 82.52 |
| Unsup-SimCSE | 64.47 | 65.94 | 67.91 | 55.10 | 59.15 | 69.13 | 67.90 | 73.67 | 76.25 | 77.10 |
| Distillation-based |  |  |  |  |  |  |  |  |  |  |
| L2 | 73.32 | 76.07 | 77.03 | 76.66 | 77.51 | 77.30 | 78.79 | 78.95 | 78.97 | 79.00 |
| Making | 70.76 | 74.42 | 76.39 | 75.34 | 74.74 | 76.92 | 76.91 | 78.67 | 78.07 | 79.06 |
| SKD | 68.83 | 72.02 | 73.05 | 72.66 | 73.59 | 75.06 | 74.58 | 77.62 | 78.05 | 77.44 |
| CKD | 76.19 | 76.59 | 77.48 | 77.14 | 77.90 | 76.97 | 77.92 | 78.29 | 78.54 | 78.34 |
| Our proposed method |  |  |  |  |  |  |  |  |  |  |
| ConGen | 76.85 | 78.09 | 78.54 | 78.22 | 79.10 | 78.91 | 79.68 | 79.73 | 80.06 | 79.78 |

Full results

| Models | STS-12 | STS-13 | STS-14 | STS-15 | STS-16 | STS-B | SICK-R | Avg. |
|---|---|---|---|---|---|---|---|---|
| BERT-Tiny | 72.18 | 81.12 | 75.45 | 83.22 | 77.89 | 79.03 | 69.05 | 76.85 |
| BERT-Mini | 74.17 | 82.69 | 76.58 | 84.30 | 78.23 | 80.84 | 69.82 | 78.09 |
| Tiny-BERT-L4 | 74.30 | 83.07 | 77.37 | 84.70 | 79.06 | 80.99 | 70.26 | 78.54 |
| MiniLM-L3 | 74.00 | 82.93 | 76.58 | 84.35 | 78.57 | 81.00 | 70.09 | 78.22 |
| MiniLM-L6 | 75.06 | 83.86 | 77.29 | 85.01 | 79.67 | 81.92 | 70.89 | 79.10 |
| BERT-Small | 74.50 | 83.58 | 77.29 | 84.83 | 79.72 | 81.93 | 70.55 | 78.91 |
| MiniLM-L12 | 75.25 | 84.61 | 78.27 | 85.51 | 80.52 | 82.32 | 71.32 | 79.68 |
| Tiny-BERT-L6 | 75.53 | 84.76 | 78.33 | 85.72 | 80.42 | 82.25 | 71.12 | 79.73 |
| BERT-base | 75.58 | 85.13 | 78.54 | 85.75 | 81.12 | 82.81 | 71.47 | 80.06 |
| RoBERTa-base | 75.32 | 84.56 | 77.26 | 85.33 | 81.34 | 82.67 | 72.00 | 79.78 |

We also provide Thai sentence embedding models trained with ConGen!

Hyper-Parameters

| #Params | Models | Teacher Temp | Student Temp | Queue Size | Learning Rate |
|---|---|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 0.01 | 0.01 | 65536 | 3e-4 |
| <30M | ConGen-WangchanBERT-Small | 0.05 | 0.09 | 65536 | 5e-4 |
| >100M | ConGen-simcse-model-roberta-base-thai | 0.05 | 0.03 | 65536 | 3e-4 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 0.05 | 0.05 | 262144 | 1e-4 |

Thai semantic textual similarity benchmark

| #Params | Models | Spearman's Correlation (×100) |
|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 66.43 |
| <30M | ConGen-WangchanBERT-Small | 70.65 |
| >100M | ConGen-simcse-model-roberta-base-thai | 66.21 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 76.56 |

Thai transfer benchmark

Wisesight

| #Params | Models | Acc (×100) | F1 (×100, weighted) |
|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 61.55 | 62.19 |
| <30M | ConGen-WangchanBERT-Small | 64.77 | 65.30 |
| >100M | ConGen-simcse-model-roberta-base-thai | 65.07 | 65.28 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 67.84 | 68.31 |

Wongnai

| #Params | Models | Acc (×100) | F1 (×100, weighted) |
|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 42.67 | 44.78 |
| <30M | ConGen-WangchanBERT-Small | 43.38 | 45.99 |
| >100M | ConGen-simcse-model-roberta-base-thai | 41.32 | 41.57 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 47.22 | 48.63 |

Generated Review

| #Params | Models | Acc (×100) | F1 (×100, weighted) |
|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 54.26 | 52.69 |
| <30M | ConGen-WangchanBERT-Small | 58.22 | 57.03 |
| >100M | ConGen-simcse-model-roberta-base-thai | 49.81 | 47.94 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 58.00 | 56.80 |

