
databricks / dbrx

2.4K stars · 38 watchers · 229 forks · 65 KB

Code examples and resources for DBRX, a large language model developed by Databricks

Home Page: https://www.databricks.com/

License: Other

Python 100.00%
databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai

dbrx's People

Contributors

asnelling, bandish-shah, eltociear, hanlint, megha95


dbrx's Issues

HumanEval

When evaluating on HumanEval, does DBRX use any special prompts to improve performance?

Missing tokenizer when using vLLM

  File "/home/paas/vllm/vllm/engine/llm_engine.py", line 222, in _init_tokenizer
    self.tokenizer: BaseTokenizerGroup = get_tokenizer_group(
  File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/__init__.py", line 20, in get_tokenizer_group
    return TokenizerGroup(**init_kwargs)
  File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
  File "/home/paas/vllm/vllm/transformers_utils/tokenizer.py", line 66, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/paas/miniconda3/envs/naie/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 822, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/home/paas/miniconda3/envs/naie/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2086, in from_pretrained
    return cls._from_pretrained(
  File "/home/paas/miniconda3/envs/naie/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2327, in _from_pretrained
    raise OSError(
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
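For reference, a minimal sketch of the vLLM load path that hits this error (the local model directory is a placeholder; DBRX ships a custom tiktoken-based tokenizer, so trust_remote_code appears to be required, though that is an assumption on my part):

from vllm import LLM

llm = LLM(
    model="/path/to/dbrx-instruct",      # placeholder local path
    tokenizer="/path/to/dbrx-instruct",  # same directory; must contain the tokenizer files
    trust_remote_code=True,
    tensor_parallel_size=8,              # adjust to your GPU count
)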

Is DBRX the most powerful open-source LLM yet?

In the rapidly evolving landscape of natural language processing (NLP), the emergence of DBRX marks a significant milestone. Developed by Databricks, DBRX represents a quantum leap in the realm of large language models (LLMs), boasting unparalleled performance and accuracy across a multitude of benchmarks. This article delves into the intricate details and remarkable statistics that underscore the prowess of DBRX, positioning it as a frontrunner in the field.

Performance on Composite Benchmarks:

DBRX's superiority becomes evident when evaluated against established open and closed models across composite benchmarks. Notably, on the Hugging Face Open LLM Leaderboard, DBRX achieves an exceptional score of 74.5%, surpassing its closest competitor by a significant margin of 1.8%. Similarly, on the Databricks Model Gauntlet, DBRX outshines its peers with a commanding score of 66.8%, underscoring its unrivaled proficiency in diverse tasks encompassing world knowledge, commonsense reasoning, and language understanding.

Dominance in Specialized Domains:

Where DBRX truly shines is in specialized domains such as programming and mathematics. On benchmarks tailored to assess programming prowess like HumanEval and GSM8k, DBRX demonstrates remarkable superiority. For instance, on HumanEval, DBRX achieves an impressive score of 70.1%, outperforming Grok-1 by 6.9%, Mixtral Instruct by 15.3%, and the best-performing LLaMA2-70B variant by 37.9%. Similarly, on GSM8k, DBRX secures a notable 66.9%, surpassing competitors by margins ranging from 4.0% to 12.8%.

Unprecedented Versatility:

DBRX's exceptional performance across diverse domains underscores its versatility and adaptability. Whether tackling complex programming challenges or unraveling intricate linguistic nuances, DBRX consistently delivers unparalleled results. This versatility positions DBRX as a formidable tool for a wide array of applications, ranging from natural language understanding to specialized tasks in programming and mathematics.

Efficiency in Training and Inference:

Beyond its remarkable performance, DBRX also excels in training and inference efficiency. Leveraging a fine-grained mixture-of-experts (MoE) architecture, DBRX achieves superior FLOP efficiency compared to dense models, enabling faster training and inference without compromising on quality. Additionally, DBRX's inference throughput surpasses that of its counterparts, offering up to 150 tokens per second on Mosaic AI Model Serving.
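As a rough illustration of the FLOP argument (a simplified sketch using the common approximation of about 2 × active parameters FLOPs per generated token; not Databricks' own calculation):

# Rough illustration: decoding costs about 2 * (active parameters) FLOPs per token,
# so an MoE with 36B active parameters does roughly half the per-token compute of a
# 70B dense model even though it stores 132B total weights.
def flops_per_token(active_params):
    return 2.0 * active_params

print(f"DBRX (36B active):  ~{flops_per_token(36e9) / 1e9:.0f} GFLOPs/token")
print(f"Llama2-70B (dense): ~{flops_per_token(70e9) / 1e9:.0f} GFLOPs/token")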

Conclusion:

In conclusion, DBRX represents a paradigm shift in the landscape of large language models. Its exceptional performance, unmatched versatility, and superior efficiency position it as a frontrunner in the field, setting new standards for accuracy and proficiency. As the pinnacle of Databricks' innovation in NLP, DBRX promises to empower enterprises and researchers alike, heralding a new era of breakthroughs in natural language understanding and AI-driven applications.

What do you think?

How inference efficiency is measured

The tech report described the methodology of the inference efficiency measurement, but not in detail. It compared Llama2-70B and DBRX, and we are very interested in that comparison, so we also carried out some tests in which we spawned different numbers of synchronous clients to stress the service at different QPS levels. The performance we measured differs from the tech report: DBRX is faster than Llama2-70B when the traffic is below 0.35 QPS, but the latency-vs-QPS curve flips after that. By the way, we used the same prompt length and output length as in the tech report.

So I wonder if you could give more details about how the performance was tested.
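A minimal sketch of the kind of load test we mean (simplified; the endpoint URL, payload schema, and request counts are placeholders, not our actual harness):

import threading
import time
import requests  # assumes a plain HTTP text-generation endpoint

ENDPOINT = "http://localhost:8000/generate"      # placeholder URL
PAYLOAD = {"prompt": "...", "max_tokens": 256}   # stand-in for the tech report's prompt/output lengths

def client(latencies, n_requests=20):
    # Each synchronous client sends requests back-to-back; more clients means higher QPS.
    for _ in range(n_requests):
        start = time.time()
        requests.post(ENDPOINT, json=PAYLOAD, timeout=600)
        latencies.append(time.time() - start)

def run(num_clients):
    latencies, threads = [], []
    start = time.time()
    for _ in range(num_clients):
        t = threading.Thread(target=client, args=(latencies,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    elapsed = time.time() - start
    print(f"clients={num_clients}  QPS={len(latencies) / elapsed:.2f}  "
          f"mean latency={sum(latencies) / len(latencies):.2f}s")

for n in (1, 2, 4, 8):
    run(n)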

SiLU or GLU activation?

According to the model card on Hugging Face:

DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA).

However, when I run

from transformers import AutoConfig

config = AutoConfig.from_pretrained('/models/dbrx-instruct/')
print(config.ffn_config)

It shows:

DbrxFFNConfig {
  "ffn_act_fn": {
    "name": "silu"
  },
  "ffn_hidden_size": 10752,
  "moe_jitter_eps": 0,
  "moe_loss_weight": 0.05,
  "moe_normalize_expert_weights": 1,
  "moe_num_experts": 16,
  "moe_top_k": 4,
  "transformers_version": "4.38.1",
  "uniform_expert_assignment": false
}

This is somewhat misleading and confusing.
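For what it's worth, the two statements are consistent if the FFN is a gated linear unit whose gate nonlinearity is SiLU (i.e. SwiGLU). A minimal sketch of such a block, assuming that is what the "silu" in ffn_act_fn refers to (my own illustration, not the actual DbrxFFN code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    # Hypothetical sketch of a GLU feed-forward block with a SiLU gate (SwiGLU).
    def __init__(self, d_model: int, ffn_hidden_size: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, ffn_hidden_size, bias=False)
        self.w_up = nn.Linear(d_model, ffn_hidden_size, bias=False)
        self.w_down = nn.Linear(ffn_hidden_size, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # "GLU": elementwise product of a gated branch and a linear branch;
        # the gate nonlinearity is SiLU, which is why the config reports "silu".
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))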

How to get hands-on experience as a newbie

My first option is to run quantized versions.

Quantized

I read this https://github.com/databricks/dbrx#mlx

and then went to https://huggingface.co/mlx-community/dbrx-instruct-4bit

I read this

On my MacBook Pro M2 with 96GB of unified memory, DBRX Instruct in 4-bit eats 70.2GB of RAM for the above prompt.

I am on a MacBook Pro M1 Max with 64GB of memory.

I guess that's not enough?
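In case it does fit on a bigger machine, a minimal generation sketch with the mlx-lm package (assuming its load/generate helpers and the mlx-community/dbrx-instruct-4bit repo linked above) would be something like:

# Minimal sketch using mlx-lm; assumes "pip install mlx-lm" and enough unified memory.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/dbrx-instruct-4bit")
print(generate(model, tokenizer, prompt="What is Apache Spark?", max_tokens=100))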

Computing

My next option is to figure out a cheap way to run the model, but the details confuse me.

Can anyone help?

Loading over multiple GPUs in 8-bit and 4-bit with the transformers loader

I can load the instruct model using the transformers loader with 8-bit bitsandbytes, and it loads evenly across multiple GPUs.

However, I cannot seem to load the model in 4-bit precision over multiple GPUs. I managed to get the model to load across one 24GB GPU and then start loading onto a second GPU of equivalent size, but it will not move on to any of the remaining GPUs (7 in total). It OOMs on the second GPU while the others sit empty.

I've loaded other transformers-based models in 4-bit and never experienced this heavily unbalanced loading before.
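For reference, a minimal 4-bit multi-GPU loading sketch with transformers and bitsandbytes (the model path and per-device memory caps are placeholders, not a verified fix for the imbalance):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical sketch: spread a 4-bit quantized model across all visible GPUs by
# capping per-device memory so device_map="auto" does not overfill one card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
max_memory = {i: "20GiB" for i in range(torch.cuda.device_count())}  # headroom on 24GB cards

model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",          # or a local path
    quantization_config=bnb_config,
    device_map="auto",
    max_memory=max_memory,
    trust_remote_code=True,
)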

I have encountered a problem: LayerNorm.__init__() got an unexpected keyword argument 'bias'

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/home/roo/train/dbrx-instruct/generate.py", line 39, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/roo/anaconda3/envs/Meditron/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/home/roo/anaconda3/envs/Meditron/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3404, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 1261, in __init__
    self.transformer = DbrxModel(config)
  File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 1013, in __init__
    self.blocks = nn.ModuleList([
  File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 1014, in <listcomp>
    DbrxBlock(config, block_idx) for block_idx in range(config.n_layers)
  File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 856, in __init__
    self.norm_attn_norm = DbrxNormAttentionNorm(
  File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 642, in __init__
    self.norm_1 = nn.LayerNorm(hidden_size, bias=False)
TypeError: LayerNorm.__init__() got an unexpected keyword argument 'bias'
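For what it's worth, nn.LayerNorm only accepts a bias argument in newer PyTorch releases (2.1 or later, as far as I know), so this error usually points to an older torch install. A quick check:

import torch
from packaging import version  # packaging is installed alongside transformers

print(torch.__version__)
# nn.LayerNorm(..., bias=False) is assumed here to require PyTorch >= 2.1.
assert version.parse(torch.__version__) >= version.parse("2.1"), \
    "upgrade torch to pass bias=False to nn.LayerNorm"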

`convert_ids_to_tokens` not working as expected.

from transformers import AutoTokenizer
t = AutoTokenizer.from_pretrained('/models/dbrx-instruct/')
t.encode('请问你是谁')
# [15225, 57107, 57668, 21043, 39013, 223]
t.decode([15225, 57107, 57668, 21043, 39013, 223])
# '请问你是谁'
print(t.convert_ids_to_tokens(15225))
# '请'

I suppose it should output token text like

How to use API?

Hi, thank you for your wonderful work! However, I need to learn how to use the API in our project, so please show us how to use the API the next time you update README.md. Thank you so much~

generate.py : tiktoken.py throws Encoding import error

After setting everything up locally, both the generate.py from this GitHub repo and the minimal Python script on the Hugging Face page throw the same error.
I followed all the steps in this repo, and my brand-new venv has been populated with "pip install -r requirements.txt".


Traceback (most recent call last):
  File "/media/models/dbrx/generate.py", line 34, in <module>
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/models/dbrx/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 822, in from_pretrained
    return tokenizer_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/models/dbrx/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2086, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/media/models/dbrx/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2325, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/tiktoken.py", line 105, in __init__
    from tiktoken import Encoding  # type: ignore (thirdParty)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: cannot import name 'Encoding' from 'tiktoken' (/media/models/dbrx/tiktoken.py)
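Judging from the last line, Python is resolving tiktoken to the file /media/models/dbrx/tiktoken.py rather than the installed tiktoken package (that is my reading of the traceback, not a confirmed diagnosis). A quick way to check which module wins:

import tiktoken
# If this prints a path inside the model/checkout directory instead of
# site-packages, a local tiktoken.py is shadowing the real package.
print(tiktoken.__file__)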

training slow

Hello, I find it slow to train the MoE model, because the DbrxExperts forward pass in MoE training is serial.
[screenshot of the serial for loop in DbrxExperts]

Can the per-expert computation in this for loop be parallelized?
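For concreteness, here is a minimal sketch (my own illustration, not the actual DbrxExperts code) of the kind of serial per-expert loop in question; each iteration only touches the tokens routed to that expert, which is why batched or grouped expert kernels can in principle replace the Python loop:

import torch
import torch.nn.functional as F

def moe_forward(hidden, router_logits, experts, top_k=4):
    # hidden: (num_tokens, d_model); experts: list of per-expert FFN modules.
    weights = F.softmax(router_logits, dim=-1)       # (num_tokens, num_experts)
    topk_w, topk_idx = weights.topk(top_k, dim=-1)   # (num_tokens, top_k)
    out = torch.zeros_like(hidden)
    for e, expert in enumerate(experts):             # serial loop over experts
        token_idx, slot = (topk_idx == e).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue
        # Each expert sees only its routed tokens; iterations are independent.
        out[token_idx] += topk_w[token_idx, slot].unsqueeze(-1) * expert(hidden[token_idx])
    return out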

What's the optimal parallel strategy using TensorRT-LLM?

First of all, thanks for your great efforts. I read the PR you opened in the TensorRT-LLM repo and noticed that EP + TP, PP + TP, and plain TP are supported during inference. May I ask which one is optimal? Specifically, for the MoE layer, does EP or TP yield better performance?

Fine Tuning?

Do you support fine-tuning this model, for example with LoRA, DeepSpeed, etc.?

Real performance versus Llama2-70B?

I have a question about the inference data posted in this blog:
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

For an MoE model with 36B active parameters and 132B total parameters, inference performance should act roughly like a 90B dense model at a 2000-token prompt and 256 output tokens. How can it always perform better than the Llama2-70B dense model? As the batch size increases, it should outperform Llama2-70B at first, but then fall behind from batch size 3 or 4, because more and more experts are activated and all 132B parameters have to be loaded.

[chart from the blog comparing DBRX and Llama2-70B inference performance]
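To make the concern concrete, here is a rough back-of-the-envelope sketch (my own simplification; it ignores the KV cache, activations, and kernel efficiency) of how the expert weights that must be read per decoding step grow with batch size for a top-4-of-16 MoE versus a dense 70B model:

# Back-of-the-envelope sketch using the 36B-active / 132B-total figures from the blog.
TOTAL, ACTIVE = 132e9, 36e9
NUM_EXPERTS, TOP_K = 16, 4

per_expert = (TOTAL - ACTIVE) / (NUM_EXPERTS - TOP_K)  # roughly 8B params per expert
shared = ACTIVE - TOP_K * per_expert                   # roughly 4B always-active params

for batch in (1, 2, 3, 4, 8):
    # Worst case: every token picks different experts until all 16 are in use.
    experts_touched = min(NUM_EXPERTS, TOP_K * batch)
    moe_read = shared + experts_touched * per_expert
    print(f"batch={batch}  MoE weights read ~{moe_read / 1e9:.0f}B  vs dense 70B")

Under this simplification the MoE reads about 36B of weights at batch size 1 but approaches the full 132B around batch size 3 or 4, which matches the crossover described above.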
