davidorz / unicoder

This project forked from microsoft/unicoder

Unicoder model for understanding and generation.

License: MIT License

Shell 1.09% Python 96.63% Makefile 0.03% Batchfile 0.01% C++ 0.26% Cuda 0.60% Lua 0.07% Dockerfile 0.06% CSS 0.10% JavaScript 0.28% Jupyter Notebook 0.87%

Unicoder

This repo provides the code for reproducing the experiments in XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation (leaderboard).

We provide three models: Unicoder for understanding tasks, Unicoder for generation tasks (pre-trained with xDAE), and Unicoder for generation tasks (pre-trained with xFNP).

Unicoder for understanding tasks

We share a 12-layer model pre-trained on 100 languages.

This code can reproduce the experiments on the 9 XGLUE understanding tasks: NER, POS Tagging (POS), News Classification (NC), MLQA, XNLI, PAWS-X, Query-Ad Matching (QADSM), Web Page Ranking (WPR), and QA Matching (QAM).

For more details, see the understanding README.

Unicoder for generation tasks (pre-trained with xDAE)

We share a model with a 12-layer encoder and a 12-layer decoder, pre-trained on 100 languages.

The code can reproduce the experiments on the 2 XGLUE generation tasks: News Title Generation (NTG) and Question Generation (QG).

For more details, see the generation README.

Unicoder for generation tasks (pre-trained with xFNP)

We share a model with a 12-layer encoder and a 12-layer decoder, pre-trained on 100 languages.

The code can reproduce the experiments on the 2 XGLUE generation tasks: News Title Generation (NTG) and Question Generation (QG).

For more details, see ProphetNet.

How to cite

If you extend or use this work, please cite our paper.

@inproceedings{huang2019unicoder,
  title={Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks},
  author={Huang, Haoyang and Liang, Yaobo and Duan, Nan and Gong, Ming and Shou, Linjun and Jiang, Daxin and Zhou, Ming},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={2485--2494},
  year={2019}
}
@article{Liang2020XGLUEAN,
  title={XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation},
  author={Yaobo Liang and Nan Duan and Yeyun Gong and Ning Wu and Fenfei Guo and Weizhen Qi and Ming Gong and Linjun Shou and Daxin Jiang and Guihong Cao and Xiaodong Fan and Ruofei Zhang and Rahul Agrawal and Edward Cui and Sining Wei and Taroon Bharti and Ying Qiao and Jiun-Hung Chen and Winnie Wu and Shuguang Liu and Fan Yang and Daniel Campos and Rangan Majumder and Ming Zhou},
  journal={arXiv},
  year={2020},
  volume={abs/2004.01401}
}
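
To cite both papers from a LaTeX document, a minimal sketch follows. It assumes the two entries above are saved in a file named references.bib (a hypothetical file name) and uses the citation keys exactly as they appear in the entries. The bibliography style and the sample sentence are arbitrary choices for illustration.

```latex
\documentclass{article}
\begin{document}

% Citation keys match the two BibTeX entries above.
We use Unicoder~\cite{huang2019unicoder} and the XGLUE benchmark~\cite{Liang2020XGLUEAN}.

\bibliographystyle{plain}   % assumption: any standard style can be used here
\bibliography{references}   % assumption: the entries are saved as references.bib

\end{document}
```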

More Models in the Unicoder Family

Unicoder-VL (image): a monolingual (English) pre-trained model for image-language understanding tasks.
Unicoder-VL (video): a monolingual (English) pre-trained model for video-language understanding and generation tasks.
XGPT (image): a monolingual (English) pre-trained model for image captioning.
M^3P (image): a multilingual (100 languages) pre-trained model for image-language understanding and generation tasks.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
