This project is a fork of nsfzyzz/generalization_metrics_for_nlp.

NLP metrics

This repository contains the code to reproduce the results from the paper Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data. Our main result is that metrics from HT-SR (Heavy-Tailed Self-Regularization) theory can predict the generalization of NLP models. Moreover, unlike existing generalization metrics that focus on the "generalization gap", the HT-SR metrics can predict the quality of NLP models, e.g., as measured by test-time BLEU scores when the task is neural machine translation.

We mainly study Transformers in this paper. For Transformer training, we follow Vaswani et al. Our implementation is based on an online repository that reproduces the results of Vaswani et al. with more easily configurable Transformer architectures. In addition to the HT-SR metrics, we also evaluate generalization metrics from Dziugaite et al. 2020 and Jiang et al. 2019.
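As a concrete illustration of what "no access to training or testing data" means here: the HT-SR metrics are computed from the trained weight matrices alone. Below is a minimal sketch using the open-source weightwatcher package, which the WW scripts in this repository build on; the toy model and the exact dataframe columns are assumptions for illustration, not the repository's pipeline.

import torch
import weightwatcher as ww

# Toy stand-in for a trained Transformer; any torch.nn.Module with linear layers works.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
)

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()              # per-layer dataframe of HT-SR metrics (weights only, no data)
print(details[["layer_id", "alpha"]])    # 'alpha' is the fitted PL exponent of each layer's ESD
print(watcher.get_summary(details))      # model-level averages used as generalization metrics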

Set up the environment

Step 1. Create a conda environment.

conda env create

Activate the environment.

conda activate NLP_metrics

Step 2. Download data and pretrained results.

./download_data.sh

Generate the experiment files, changing the checkpoint directory if necessary.

python create_experiment.py --CKPT_DIR <your_checkpoint_directory>

For example, on my machine, the checkpoint directory is /data/yyaoqing/Generalization_metrics_for_NLP/checkpoint/.

Reproduce the figures shown in the paper

Result 1. Examples of PL fittings.

You can check examples of PL and E-TPL fittings in visualization/Visualize_example_WW_layers.ipynb.

[Figure: examples of PL and E-TPL fittings]
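For intuition, a PL fit boils down to fitting a power law to the tail of the empirical spectral density (ESD) of a layer's weight correlation matrix. The following minimal sketch uses numpy and the powerlaw package on a random stand-in matrix; it illustrates the idea, not the notebook's exact code.

import numpy as np
import powerlaw  # pip install powerlaw

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 2048))              # stand-in for a trained weight matrix
eigs = np.linalg.eigvalsh(W @ W.T) / W.shape[1]   # ESD of the correlation matrix W W^T / N

fit = powerlaw.Fit(eigs)                          # chooses xmin and fits the tail
print("PL exponent alpha:", fit.power_law.alpha)
print("xmin:", fit.power_law.xmin)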

Result 2. Scatter plots.

Then, you can reproduce the scatter plots that compare the generalization metrics with the BLEU scores. Check visualization/reproduce_scatterplot.ipynb.

[Figure: scatter plots comparing generalization metrics with BLEU scores]
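If you want to build such a plot outside the notebook, the sketch below shows the general shape of the comparison; metrics.csv and its columns alpha and bleu are placeholders for whatever per-model results you have, not files shipped with the repository.

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

df = pd.read_csv("metrics.csv")                   # placeholder: one row per trained model
rho, _ = spearmanr(df["alpha"], df["bleu"])

plt.scatter(df["alpha"], df["bleu"], s=12)
plt.xlabel("PL exponent alpha")
plt.ylabel("test BLEU")
plt.title(f"Spearman rho = {rho:.2f}")
plt.savefig("scatter_alpha_vs_bleu.png", dpi=150)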

Result 3. Box plots.

You can also reproduce the box plots that rank the generalization metrics considered in the paper.

[Figure: box plots ranking the generalization metrics]

First, use the following commands to generate the time-wise correlations. The --bleu_type argument selects whether the correlation is computed against the test BLEU scores or the generalization gap.

python time_wise_correlation.py --bleu_type test
python time_wise_correlation.py --bleu_type gap
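A time-wise correlation here means correlating a metric with the BLEU score (or the generalization gap) across the saved checkpoints of a single training run. The sketch below illustrates the computation with Kendall's tau; the arrays are made-up placeholders, not outputs of the script.

from scipy.stats import kendalltau

# One value per saved checkpoint of a single model (made-up numbers).
alpha_per_ckpt = [4.1, 3.8, 3.5, 3.3, 3.2]
bleu_per_ckpt = [18.0, 21.5, 24.0, 25.1, 25.6]

tau, pvalue = kendalltau(alpha_per_ckpt, bleu_per_ckpt)
print(f"Kendall tau = {tau:.2f} (p = {pvalue:.3f})")
# With --bleu_type gap, the second series would instead hold the generalization gap.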

Second, generate the correlation results obtained when a single hyperparameter is varied.

python aggregate_hyperparameter_correlation.py

Now, you should have all the results. Check visualization/calculate_rank_correlation_with_colored_groups.ipynb to see the box plots.

Reproduce all the training results

Fully reproducing our results requires Slurm and about 6 TB of storage.

Step 1. Generate the Slurm configuration files. Use scripts/generate_script.ipynb to generate the training and evaluation Slurm configurations.

Step 2. Submit the Slurm files. Remember to change the directories in the Slurm files and create a folder for the Slurm logs.

mkdir slurm_logs

For training, do the following.

sbatch ./scripts/slurm_train_models.sh

For evaluation, use the following bash files.

sbatch ./scripts/slurm_eval_bleu.sh
sbatch ./scripts/slurm_compute_ww.sh
sbatch ./scripts/slurm_robust_measures.sh

Notice that we evaluate PL, E-TPL and EXP fittings. To select the distribution, change L23-33 in the file slurm_compute_ww.sh.
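For intuition about what choosing between these fits involves, the sketch below compares a power-law tail against truncated-power-law and exponential alternatives using the open-source powerlaw package on synthetic data; it is an analogy for the PL/E-TPL/EXP choice, not the code in slurm_compute_ww.sh.

import numpy as np
import powerlaw  # pip install powerlaw

rng = np.random.default_rng(0)
eigs = rng.pareto(3.0, size=2000) + 1.0   # synthetic heavy-tailed sample, a placeholder for a layer ESD

fit = powerlaw.Fit(eigs)
print("PL exponent alpha:", fit.power_law.alpha)

# Log-likelihood ratio tests: R > 0 favors the power law over the alternative.
for alt in ("truncated_power_law", "exponential"):
    R, p = fit.distribution_compare("power_law", alt)
    print(f"power_law vs {alt}: R = {R:.2f}, p = {p:.3f}")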

Step 3. After running all the evaluations, you will have JSON and pickle files similar to those in checkpoint.zip. You can then draw the scatter plots and calculate the rank correlations using the following commands.

./scripts/run_plot_scatterplot.sh
./scripts/run_hyperparameter_correlation.sh

After that, you will have all the plots and rank-correlation results, similar to those in plots.zip and results.zip.

Citation

If you find this repository useful for your work, please cite the following paper:

@TECHREPORT{yang2022evaluating,
  author =       {Yang, Yaoqing and Theisen, Ryan and Hodgkinson, Liam and Gonzalez, Joseph E and Ramchandran, Kannan and Martin, Charles H and Mahoney, Michael W},
  title =        {Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data},
  number =       {Preprint: arXiv:2202.02842},
  year =         {2022},
}

License

MIT
