ibm-granite / granite-code-models

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Home Page: https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330

License: Apache License 2.0

granite-code-models's People

Contributors

ameza13, eltociear, leesaferite, mayank31398

granite-code-models's Issues

GQA?

Have the granite models been trained with grouped query attention?

Question about PSM vs SPM

I noticed in the paper that it says:

We train our models to work with both PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) modes, with relevant formatting control tokens,

Have you noticed any difference in accuracy from PSM vs SPM completion? I have seen PSM used before, but I like the idea of SPM. Since the model would be completing almost directly from the prefix, I wonder if it would generate higher quality completions than PSM. If there is no difference, that would be cool to know as well.

I'm still waiting on llama.cpp support to fully materialize before I can try out these models, but they look really nice!
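For readers unfamiliar with the two modes, the difference is only in how the prompt is ordered before the model generates the middle. The following is a minimal sketch, not the project's confirmed format: the sentinel token strings (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are assumed StarCoder-style names, and the SPM layout shown is one common variant. Check the tokenizer's special tokens and the paper before relying on either.

```python
# Sentinel token names are ASSUMPTIONS (StarCoder-style); verify them against
# the model's tokenizer before use.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_psm_prompt(prefix: str, suffix: str) -> str:
    """Prefix-Suffix-Middle: prefix first, then suffix, then generate the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def build_spm_prompt(prefix: str, suffix: str) -> str:
    """Suffix-Prefix-Middle (one common layout, assumed here): suffix first,
    so the generated middle directly follows the prefix text."""
    return f"{FIM_SUFFIX}{suffix}{FIM_PREFIX}{prefix}{FIM_MIDDLE}"


# Hypothetical example: fill in the expression after "return ".
prefix = "def add(a, b):\n    return "
suffix = "\n\nprint(add(1, 2))\n"

psm = build_psm_prompt(prefix, suffix)
spm = build_spm_prompt(prefix, suffix)
```

In the PSM prompt the last text before the middle sentinel is the suffix, while in the SPM prompt it is the prefix, which is the property the question above is asking about.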

Support infilling?

Hello, many thanks for the brilliant work!

Does the granite code model support infilling format for code completion?

Is softmax scaling optional?

Can you elaborate on the significance of the softmax scaling? I can't find it referenced in the paper, and it seems to be applied differently for each of the three attention methods in the HF implementation:

  • Eager attention applies it whenever the dtype isn't FP32 (since scale_attention_softmax_in_fp32, attention_softmax_in_fp32 and scale_attn_weights are all set).
  • SDPA sets a scale of None, though it seems prepared to change it to 1 if scale_attn_weights were unset. (?)
  • The flash-attn module has provisions for applying the scale in _flash_attention_forward, but that argument isn't passed so it defaults to None.

Presumably the models are trained with flash-attn so is this just not actually relevant?
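As background for the question above, here is a minimal pure-Python sketch of what the softmax scale does in scaled dot-product attention. It is not the HF implementation; `attention_weights` is a hypothetical helper written for illustration. The only library behaviour assumed is the documented one: both PyTorch's `scaled_dot_product_attention` and flash-attn treat a scale of `None` as `1 / sqrt(head_dim)`, so "None" does not mean "unscaled".

```python
import math


def attention_weights(query, keys, scale=None):
    """Softmax over scaled dot products of one query against several keys.

    Hypothetical helper for illustration. scale=None falls back to
    1/sqrt(head_dim), mirroring the documented default of common kernels.
    """
    head_dim = len(query)
    if scale is None:
        scale = 1.0 / math.sqrt(head_dim)
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Subtract the max score for numerical stability before exponentiating.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


query = [1.0, 0.0, 1.0, 0.0]
keys = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

default_weights = attention_weights(query, keys)              # scale=None
explicit_weights = attention_weights(query, keys, scale=0.5)  # 1/sqrt(4)
unscaled_weights = attention_weights(query, keys, scale=1.0)  # no scaling
```

With a head dimension of 4, the default and the explicit `0.5` scale give identical weights, while a scale of `1.0` produces a sharper distribution. Whether the three HF code paths end up numerically equivalent for these models is the open question in this issue and is not settled by this sketch.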

Provide .GGUF files?

Would it be possible to provide a full range of GGUF files for these wicked models?

I'm trying to convert the 3B myself, but running into issues.
