tiancheng-luo / codetrans

This project forked from agemagician/codetrans

Pretrained Language Models for Source code

License: MIT License


CodeTrans



CodeTrans provides state-of-the-art pre-trained models for source code. The models were trained on several Nvidia RTX 8000 GPUs and a couple of Google TPUs using various state-of-the-art Transformer models.

For more information about our work, see our paper, "CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing".


CodeTrans Attention Visualization


This repository will be updated regularly with new pre-trained models for source code, in support of the software engineering community in general and of source code for Covid-19 research specifically.

⌛️  Models Availability

All original CodeTrans TensorFlow checkpoints can be downloaded from this Dropbox folder, and the PyTorch checkpoints are available on the Hugging Face model hub.

You can download all the datasets used in this research from this Dropbox folder.

🚀  Usage

How to use CodeTrans:

  • 🤖  Feature Extraction (FE):
    coming soon.

  • 💥  Fine Tuning (FT):
    coming soon.

  • ⚗️  Code Sequences Generation:
    coming soon.

  • 🧐  Visualization:
    coming soon.

  • 📈  Benchmark:
    coming soon.

📊  Expected Results

  • 💻  Function Documentation Generation (BLEU):

| Language / Model | Python | Java | Go | PHP | Ruby | JavaScript |
| --- | --- | --- | --- | --- | --- | --- |
| CodeTrans-ST-Small | 17.31 | 16.65 | 16.89 | 23.05 | 9.19 | 13.7 |
| CodeTrans-ST-Base | 16.86 | 17.17 | 17.16 | 22.98 | 8.23 | 13.17 |
| CodeTrans-TF-Small | 19.93 | 19.48 | 18.88 | 25.35 | 13.15 | 17.23 |
| CodeTrans-TF-Base | 20.26 | 20.19 | 19.50 | 25.84 | 14.07 | 18.25 |
| CodeTrans-TF-Large | 20.35 | 20.06 | 19.54 | 26.18 | 14.94 | 18.98 |
| CodeTrans-MT-Small | 19.64 | 19.00 | 19.15 | 24.68 | 14.91 | 15.26 |
| CodeTrans-MT-Base | 20.39 | 21.22 | 19.43 | 26.23 | 15.26 | 16.11 |
| CodeTrans-MT-Large | 20.18 | 21.87 | 19.38 | 26.08 | 15.00 | 16.23 |
| CodeTrans-MT-TF-Small | 19.77 | 20.04 | 19.36 | 25.55 | 13.70 | 17.24 |
| CodeTrans-MT-TF-Base | 19.77 | 21.12 | 18.86 | 25.79 | 14.24 | 18.62 |
| CodeTrans-MT-TF-Large | 18.94 | 21.42 | 18.77 | 26.20 | 14.19 | 18.83 |
| State of the art | 19.06 | 17.65 | 18.07 | 25.16 | 12.16 | 14.90 |

  • 💻  Source Code Summarization (BLEU):

| Language / Model | Python | SQL | C# |
| --- | --- | --- | --- |
| CodeTrans-ST-Small | 8.45 | 17.55 | 19.74 |
| CodeTrans-ST-Base | 9.12 | 15.00 | 18.65 |
| CodeTrans-TF-Small | 10.06 | 17.71 | 20.40 |
| CodeTrans-TF-Base | 10.94 | 17.66 | 21.12 |
| CodeTrans-TF-Large | 12.41 | 18.40 | 21.43 |
| CodeTrans-MT-Small | 13.11 | 19.15 | 22.39 |
| CodeTrans-MT-Base | 13.37 | 19.24 | 23.20 |
| CodeTrans-MT-Large | 13.24 | 19.40 | 23.57 |
| CodeTrans-MT-TF-Small | 12.10 | 18.25 | 22.03 |
| CodeTrans-MT-TF-Base | 10.64 | 16.91 | 21.40 |
| CodeTrans-MT-TF-Large | 12.14 | 19.98 | 21.10 |
| State of the art | -- | 18.40 | 20.50 |

  • 💻  Code Comment Generation (BLEU):

| Language / Model | Java |
| --- | --- |
| CodeTrans-ST-Small | 37.98 |
| CodeTrans-ST-Base | 38.07 |
| CodeTrans-TF-Small | 38.56 |
| CodeTrans-TF-Base | 39.06 |
| CodeTrans-TF-Large | 39.50 |
| CodeTrans-MT-Small | 20.15 |
| CodeTrans-MT-Base | 27.44 |
| CodeTrans-MT-Large | 34.69 |
| CodeTrans-MT-TF-Small | 38.37 |
| CodeTrans-MT-TF-Base | 38.90 |
| CodeTrans-MT-TF-Large | 39.25 |
| State of the art | 38.17 |

  • 💻  Commit Message Generation (BLEU):

| Language / Model | Java |
| --- | --- |
| CodeTrans-ST-Small | 39.61 |
| CodeTrans-ST-Base | 38.67 |
| CodeTrans-TF-Small | 44.22 |
| CodeTrans-TF-Base | 44.17 |
| CodeTrans-TF-Large | 44.41 |
| CodeTrans-MT-Small | 36.17 |
| CodeTrans-MT-Base | 39.25 |
| CodeTrans-MT-Large | 41.18 |
| CodeTrans-MT-TF-Small | 43.96 |
| CodeTrans-MT-TF-Base | 44.19 |
| CodeTrans-MT-TF-Large | 44.34 |
| State of the art | 32.81 |

  • 💻  API Sequence Recommendation (BLEU):

| Language / Model | Java |
| --- | --- |
| CodeTrans-ST-Small | 68.71 |
| CodeTrans-ST-Base | 70.45 |
| CodeTrans-TF-Small | 68.90 |
| CodeTrans-TF-Base | 72.11 |
| CodeTrans-TF-Large | 73.26 |
| CodeTrans-MT-Small | 58.43 |
| CodeTrans-MT-Base | 67.97 |
| CodeTrans-MT-Large | 72.29 |
| CodeTrans-MT-TF-Small | 69.29 |
| CodeTrans-MT-TF-Base | 72.89 |
| CodeTrans-MT-TF-Large | 73.39 |
| State of the art | 54.42 |

  • 💻  Programming Language and Synthesis (Accuracy):

| Language / Model | LISP |
| --- | --- |
| CodeTrans-ST-Small | 89.43 |
| CodeTrans-ST-Base | 89.65 |
| CodeTrans-TF-Small | 90.30 |
| CodeTrans-TF-Base | 90.24 |
| CodeTrans-TF-Large | 90.21 |
| CodeTrans-MT-Small | 82.88 |
| CodeTrans-MT-Base | 86.99 |
| CodeTrans-MT-Large | 90.27 |
| CodeTrans-MT-TF-Small | 90.31 |
| CodeTrans-MT-TF-Base | 90.30 |
| CodeTrans-MT-TF-Large | 90.17 |
| State of the art | 85.80 |
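
Most of the results above are BLEU scores, which measure n-gram overlap between a generated text and a reference text. As a rough illustration of what the metric computes, here is a minimal sketch of unsmoothed, single-reference, sentence-level BLEU on whitespace tokens. It is not the evaluation script behind the tables; the function names `ngrams` and `sentence_bleu`, the whitespace tokenization, and the absence of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed BLEU against one reference, scaled to the range 0..100.

    Hypothetical helper for illustration only: real evaluations usually
    add smoothing and task-specific tokenization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: hypotheses shorter than the reference are penalized.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "returns the sum of two numbers"
    print(sentence_bleu(reference, "returns the sum of two numbers"))
    print(sentence_bleu(reference, "returns the sum of numbers"))
```

An exact match scores 100, and a hypothesis sharing no words with the reference scores 0. Scores from this sketch are not directly comparable to the numbers in the tables.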

❤️  Community and Contributions

The CodeTrans project is an open-source project supported by various partner companies and research institutions. We are committed to sharing all our pre-trained models and knowledge. We would be more than happy if you could help us by sharing new pre-trained models, fixing bugs, proposing new features, improving our documentation, spreading the word, or supporting our project.

📫  Have a question?

We are happy to answer your questions on the CodeTrans issues page. If you have a private question or want to cooperate with us, you can always reach out to us directly via our RostLab email.

🤝  Found a bug?

Feel free to file a new issue with a descriptive title and description on the CodeTrans repository. If you have already found a solution to your problem, we would love to review your pull request!

✅  Requirements

For prediction, the Text to Text library is needed. For source code feature extraction or fine-tuning of our pre-trained models, PyTorch and the Transformers library from Hugging Face are needed. For model visualization, you need to install the BertViz library.
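
To check which of these libraries are already installed, a small script along the following lines may help. It is only a sketch: the import names (`t5` for the Text to Text library, `torch` for PyTorch, `transformers`, and `bertviz`) and the use-case labels are assumptions made for this example, and the helper `missing_packages` is hypothetical, not part of CodeTrans.

```python
from importlib.util import find_spec

# Assumed import names for the libraries listed in the requirements:
# Text to Text -> "t5", PyTorch -> "torch",
# Transformers -> "transformers", BertViz -> "bertviz".
REQUIREMENTS = {
    "prediction": ["t5"],
    "feature_extraction": ["torch", "transformers"],
    "fine_tuning": ["torch", "transformers"],
    "visualization": ["bertviz"],
}


def missing_packages(use_case):
    """Return the assumed import names for `use_case` that cannot be found."""
    return [name for name in REQUIREMENTS[use_case] if find_spec(name) is None]


if __name__ == "__main__":
    for use_case in REQUIREMENTS:
        missing = missing_packages(use_case)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{use_case}: {status}")
```

The script only checks whether the packages can be located; it does not verify versions or import them.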

🤵  Team

  • Technical University of Munich:
Ahmed Elnaggar, Wei Ding, Florian Matthes, Burkhard Rost
  • Google:
Llion Jones
  • Nvidia:
Tom Gibbs, Tamas Feher, Christoph Angerer

💰  Sponsors

Google Google Nvidia Software Campus

📘  License

The CodeTrans pretrained models are released under the terms of the MIT License.

✏️  Citation

If you use this code or our pretrained models for your publication, please cite the original paper:

@misc{elnaggar2021codetrans,
      title={CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing}, 
      author={Ahmed Elnaggar and Wei Ding and Llion Jones and Tom Gibbs and Tamas Feher and Christoph Angerer and Silvia Severini and Florian Matthes and Burkhard Rost},
      year={2021},
      eprint={2104.02443},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
