databricks / dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
Home Page: https://www.databricks.com/
License: Other
When evaluating on HumanEval, does DBRX use any special prompts to improve performance?
File "/home/paas/vllm/vllm/engine/llm_engine.py", line 222, in _init_tokenizer
self.tokenizer: BaseTokenizerGroup = get_tokenizer_group(
File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/__init__.py", line 20, in get_tokenizer_group
return TokenizerGroup(**init_kwargs)
File "/home/paas/vllm/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
File "/home/paas/vllm/vllm/transformers_utils/tokenizer.py", line 66, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/paas/miniconda3/envs/naie/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 822, in from_pretrained
return tokenizer_class.from_pretrained(
File "/home/paas/miniconda3/envs/naie/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2086, in from_pretrained
return cls._from_pretrained(
File "/home/paas/miniconda3/envs/naie/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2327, in _from_pretrained
raise OSError(
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
In the rapidly evolving landscape of natural language processing (NLP), the emergence of DBRX marks a significant milestone. Developed by Databricks, DBRX represents a quantum leap in the realm of large language models (LLMs), boasting unparalleled performance and accuracy across a multitude of benchmarks. This article delves into the intricate details and remarkable statistics that underscore the prowess of DBRX, positioning it as a frontrunner in the field.
DBRX's superiority becomes evident when evaluated against established open and closed models across composite benchmarks. Notably, on the Hugging Face Open LLM Leaderboard, DBRX achieves an exceptional score of 74.5%, surpassing its closest competitor by a significant margin of 1.8%. Similarly, on the Databricks Model Gauntlet, DBRX outshines its peers with a commanding score of 66.8%, underscoring its unrivaled proficiency in diverse tasks encompassing world knowledge, commonsense reasoning, and language understanding.
Where DBRX truly shines is in specialized domains such as programming and mathematics. On HumanEval, which assesses programming, and GSM8k, which assesses mathematics, DBRX demonstrates remarkable superiority. For instance, on HumanEval, DBRX achieves an impressive score of 70.1%, outperforming Grok-1 by 6.9%, Mixtral Instruct by 15.3%, and the best-performing LLaMA2-70B variant by 37.9%. Similarly, on GSM8k, DBRX secures a notable 66.9%, surpassing competitors by margins ranging from 4.0% to 12.8%.
DBRX's exceptional performance across diverse domains underscores its versatility and adaptability. Whether tackling complex programming challenges or unraveling intricate linguistic nuances, DBRX consistently delivers unparalleled results. This versatility positions DBRX as a formidable tool for a wide array of applications, ranging from natural language understanding to specialized tasks in programming and mathematics.
Beyond its remarkable performance, DBRX also excels in training and inference efficiency. Leveraging a fine-grained mixture-of-experts (MoE) architecture, DBRX achieves superior FLOP efficiency compared to dense models, enabling faster training and inference without compromising on quality. Additionally, DBRX's inference throughput surpasses that of its counterparts, offering up to 150 tokens per second on Mosaic AI Model Serving.
In conclusion, DBRX represents a paradigm shift in the landscape of large language models. Its exceptional performance, unmatched versatility, and superior efficiency position it as a frontrunner in the field, setting new standards for accuracy and proficiency. As the pinnacle of Databricks' innovation in NLP, DBRX promises to empower enterprises and researchers alike, heralding a new era of breakthroughs in natural language understanding and AI-driven applications.
The tech report describes the methodology of the inference efficiency measurement, but not in detail. It compares Llama2-70B and DBRX. We are very interested in this comparison, so we ran our own tests, spawning different numbers of synchronous clients to stress the service at different QPS levels. The performance we measured differs from the tech report: DBRX is faster than Llama2-70B when traffic is below 0.35 QPS, but beyond that point the latency vs. QPS comparison flips. We used the same prompt length and output length as the tech report.
Could you give more details about how the performance was tested?
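For reference, the closed-loop setup described above (a fixed number of synchronous clients, each sending its next request only after the previous one returns) can be sketched as follows. `send_request` is a hypothetical stand-in for the real call to the serving endpoint, and the client counts are illustrative; this is not the tech report's methodology.

```python
import threading
import time
from statistics import mean


def send_request():
    """Hypothetical stand-in for one blocking call to the model endpoint."""
    time.sleep(0.01)  # replace with a real request using fixed prompt/output lengths


def run_closed_loop(num_clients, requests_per_client, request_fn=send_request):
    """Run synchronous clients in parallel; return (achieved_qps, mean_latency_s)."""
    latencies = []
    lock = threading.Lock()

    def client():
        # Each client waits for a response before sending its next request.
        for _ in range(requests_per_client):
            start = time.perf_counter()
            request_fn()
            elapsed = time.perf_counter() - start
            with lock:
                latencies.append(elapsed)

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start
    return len(latencies) / wall, mean(latencies)


if __name__ == "__main__":
    for n in (1, 2, 4):
        qps, lat = run_closed_loop(n, requests_per_client=5)
        print(f"clients={n} qps={qps:.1f} mean_latency={lat * 1000:.1f} ms")
```

With synchronous clients the achieved QPS is roughly the number of clients divided by the mean latency, so the offered load cannot be set independently of how fast the server responds. That may be one source of differences between measurements.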
According to the model card on Hugging Face:
DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA).
However, when I run
from transformers import AutoConfig
config = AutoConfig.from_pretrained('/models/dbrx-instruct/')
print(config.ffn_config)
It shows:
DbrxFFNConfig {
  "ffn_act_fn": {
    "name": "silu"
  },
  "ffn_hidden_size": 10752,
  "moe_jitter_eps": 0,
  "moe_loss_weight": 0.05,
  "moe_normalize_expert_weights": 1,
  "moe_num_experts": 16,
  "moe_top_k": 4,
  "transformers_version": "4.38.1",
  "uniform_expert_assignment": false
}
This is somewhat misleading and confusing, since the config names only a SiLU activation and does not mention GLU.
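One reading that would reconcile the two, offered as an assumption rather than a statement about the DBRX source: a gated linear unit still needs an activation for its gate branch, and `ffn_act_fn` may simply name that activation. With SiLU as the gate activation, a GLU computes `silu(x @ w1) * (x @ v1)` before the down projection. The sketch below uses tiny hand-written matrices, and the names `w1`, `v1`, and `w2` are hypothetical.

```python
import math


def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def glu_ffn(x, w1, v1, w2):
    """Gated linear unit: (silu(x @ w1) * (x @ v1)) @ w2."""
    gate = [silu(z) for z in matvec(x, w1)]      # activated gate branch
    up = matvec(x, v1)                           # plain linear branch
    hidden = [g * u for g, u in zip(gate, up)]   # elementwise gating
    return matvec(hidden, w2)                    # down projection


# Toy sizes: model dim 2, hidden dim 3 (the config above lists ffn_hidden_size 10752).
x = [1.0, -2.0]
w1 = [[0.5, 0.0, -1.0], [0.0, 0.25, 0.5]]
v1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(glu_ffn(x, w1, v1, w2))
```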
It would be great to have a version that works better locally, ideally one that can run on CPU.
My first option is to run a quantized version.
I read https://github.com/databricks/dbrx#mlx
and then went to https://huggingface.co/mlx-community/dbrx-instruct-4bit
where I read this:
On my Macbook Pro M2 with 96GB of Unified Memory, DBRX Instruct in 4-bit for the above prompt it eats 70.2GB of RAM.
I am on a MacBook Pro M1 Max with 64GB of memory.
I guess that's not enough?
My next step is to figure out a cheap way to run the model, but the details confuse me.
Can anyone help?
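A rough back-of-the-envelope check, assuming the 132B total parameter count quoted for DBRX and counting the weights only (no KV cache, activations, or quantization scales, all of which add overhead):

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 132e9  # total parameter count quoted for DBRX

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

At 4 bits the weights alone come to about 66 GB, which is in line with the 70.2GB figure quoted above and already exceeds 64GB of unified memory.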
I can load the instruct model using the transformers loader with 8-bit bitsandbytes, and it loads evenly across multiple GPUs.
However, I cannot load the model in 4-bit precision across multiple GPUs. The model fills one 24GB GPU and then starts loading onto a second GPU of the same size, but it never moves on to any of the remaining GPUs (7 in total). It runs out of memory on the second GPU while the others sit empty.
I've loaded other transformers-based models in 4-bit and have never experienced such heavily unbalanced loading before.
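One thing that may be worth trying, offered as an untested assumption rather than a confirmed fix: pass an explicit `max_memory` map together with `device_map="auto"` so that each GPU has a stated budget. The helper names `build_max_memory` and `load_4bit`, the 20GiB cap, and the GPU count are illustrative.

```python
def build_max_memory(num_gpus, per_gpu_gib):
    """Build a max_memory mapping that caps every GPU at the same budget."""
    return {gpu: f"{per_gpu_gib}GiB" for gpu in range(num_gpus)}


def load_4bit(model_path, max_memory):
    """Load a causal LM in 4-bit, sharded under the given per-GPU caps."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
        max_memory=max_memory,
        trust_remote_code=True,
    )


# Illustrative: seven 24GB cards, leaving a few GiB of headroom on each.
print(build_max_memory(num_gpus=7, per_gpu_gib=20))
```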
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "/home/roo/train/dbrx-instruct/generate.py", line 39, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/roo/anaconda3/envs/Meditron/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/home/roo/anaconda3/envs/Meditron/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3404, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 1261, in __init__
self.transformer = DbrxModel(config)
File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 1013, in __init__
self.blocks = nn.ModuleList([
File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 1014, in <listcomp>
DbrxBlock(config, block_idx) for block_idx in range(config.n_layers)
File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 856, in __init__
self.norm_attn_norm = DbrxNormAttentionNorm(
File "/home/roo/.cache/huggingface/modules/transformers_modules/model/modeling_dbrx.py", line 642, in __init__
self.norm_1 = nn.LayerNorm(hidden_size, bias=False)
TypeError: LayerNorm.__init__() got an unexpected keyword argument 'bias'
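This error is what an older PyTorch would produce: to my knowledge the `bias` argument of `nn.LayerNorm` was added in PyTorch 2.1, so an environment with an earlier release would fail at exactly this line. A small check, assuming that version boundary (the helper name is hypothetical):

```python
def supports_layernorm_bias(torch_version):
    """Return True if the version string is at least 2.1 (assumed boundary)."""
    # Strip local build suffixes such as "+cu118" before parsing.
    core = torch_version.split("+")[0]
    major, minor = (int(part) for part in core.split(".")[:2])
    return (major, minor) >= (2, 1)


if __name__ == "__main__":
    try:
        import torch
        print(torch.__version__, supports_layernorm_bias(torch.__version__))
    except ImportError:
        print("torch is not installed in this environment")
```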
from transformers import AutoTokenizer
t = AutoTokenizer.from_pretrained('/models/dbrx-instruct/')
t.encode('请问你是谁')
# [15225, 57107, 57668, 21043, 39013, 223]
t.decode([15225, 57107, 57668, 21043, 39013, 223])
# '请问你是谁'
print(t.convert_ids_to_tokens(15225))
# '请'
I suppose it should output token text like 请?
Hi, thank you for your wonderful work! We need to learn how to use the API in our project, so please show how to use the API the next time you update README.md. Thank you so much!
Does the tokenizer of this model need network access to load successfully?
After setting everything up locally, both generate.py from the GitHub repo and the minimal Python script on the Hugging Face page throw the same error.
I followed all the steps in this repo, and my brand-new venv was populated with "pip install -r requirements.txt".
Traceback (most recent call last):
File "/media/models/dbrx/generate.py", line 34, in <module>
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/models/dbrx/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 822, in from_pretrained
return tokenizer_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/models/dbrx/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2086, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/media/models/dbrx/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2325, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/tiktoken.py", line 105, in __init__
from tiktoken import Encoding # type: ignore (thirdParty)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: cannot import name 'Encoding' from 'tiktoken' (/media/models/dbrx/tiktoken.py)
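The last line of the traceback points at `/media/models/dbrx/tiktoken.py`, which suggests that a local file named `tiktoken.py` is shadowing the installed `tiktoken` package. A quick way to see which file Python resolves for a module name (the helper name is hypothetical):

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would import for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# If this prints a path inside the working directory rather than site-packages,
# a local file is shadowing the real package; renaming it should fix the import.
print(module_origin("tiktoken"))
```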
Most models set the PreTrainedModel class attribute _supports_sdpa to True. Why does DBRX set it to False?
First, thanks for your great efforts. I read the PR you opened in the TensorRT-LLM repo and noticed that EP + TP, PP + TP, and TP are supported during inference. May I ask which one is optimal? Specifically, for the MoE layer, does EP or TP yield better performance?
Do you support fine-tuning this model, for example with LoRA, DeepSpeed, etc.?
I have a question about the inference data posted in this blog:
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
For a MoE model with 36B active parameters and 132B total parameters, inference performance should resemble that of a roughly 90B dense model with 2000 prompt tokens and 256 output tokens. How can it always perform better than the dense Llama2-70B model? As the batch size increases, I would expect it to beat Llama2-70B at first and then fall behind from batch size 3 or 4, because all 132B parameters end up being loaded as more and more experts are activated.
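The intuition that more experts are touched as the batch grows can be made concrete under a simplifying assumption: each token picks 4 of the 16 experts uniformly and independently of the other tokens. Real routers are not uniform, so treat this as a rough guide only. Under that assumption, the expected number of distinct experts hit by b tokens is 16 * (1 - (12/16)^b).

```python
def expected_active_experts(batch_tokens, num_experts=16, top_k=4):
    """Expected distinct experts used by a batch under uniform, independent routing."""
    miss = (num_experts - top_k) / num_experts  # chance one token skips a given expert
    return num_experts * (1.0 - miss ** batch_tokens)


for b in (1, 2, 4, 8, 16):
    print(f"batch={b:2d} expected experts={expected_active_experts(b):.2f}")
```

Under this assumption a batch of 4 tokens already touches about 11 of the 16 experts per layer on average, so most expert weights are read at small batch sizes.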