
nouamanetazi / bloomz.cpp

811 stars · 15 watchers · 65 forks · 2.14 MB

C++ implementation for BLOOM

License: MIT License

Languages: Makefile 0.55%, Python 0.53%, C 85.49%, C++ 12.99%, Objective-C 0.06%, Swift 0.38%
Topics: bloom, cpp, multilingual


bloomz.cpp's People

Contributors

lapo-luchini, nouamanetazi, pcuenca, wauplin, yangyaofei


bloomz.cpp's Issues

Convert HF to ggml error.

Thanks for the great repo!!

I have an HF model (llama) laid out like this:

[image]

But when I run the script python convert_hf_to_ggml.py ../7b/ ../7b/, I get this error:

  File "convert_hf_to_ggml.py", line 81, in <module>
    config = AutoConfig.from_pretrained(model_name,   local_files_only=True)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 955, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/transformers/configuration_utils.py", line 617, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/transformers/configuration_utils.py", line 672, in _get_config_dict
    resolved_config_file = cached_file(
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/transformers/utils/hub.py", line 388, in cached_file
    raise EnvironmentError(
OSError: ../7b-vi/ does not appear to have a file named config.json. Checkout 'https://huggingface.co/../7b-vi//None' for available files.

How can I fix it?
Thank you
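For what it's worth, the traceback says AutoConfig.from_pretrained could not find a config.json inside the local model directory. Below is a minimal check, assuming the model really lives in the directory passed on the command line (a sketch, not part of the repo's converter):

```python
import os

model_dir = "../7b/"  # the local directory passed to the conversion script above
print(sorted(os.listdir(model_dir)))  # list what is actually in the directory
print("config.json present:", os.path.isfile(os.path.join(model_dir, "config.json")))
```

If config.json is missing, copying it from the original Hugging Face checkpoint should get past this particular error. Note also that this repo's converter targets BLOOM checkpoints, so a LLaMA checkpoint would likely need llama.cpp's own converter even once the path issue is fixed.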

Bloomz 176B inference doesn't work

Hello,

I have converted the bloomz model successfully, but inference doesn't work.

 ./main -m ./models/ggml-model-bloomz-f16.bin -t 8 -n 128
main: seed = 1679167152
bloom_model_load: loading model from './models/ggml-model-bloomz-f16.bin' - please wait ...
bloom_model_load: n_vocab = 250880
bloom_model_load: n_ctx   = 512
bloom_model_load: n_embd  = 14336
bloom_model_load: n_mult  = 1
bloom_model_load: n_head  = 112
bloom_model_load: n_layer = 70
bloom_model_load: f16     = 1
bloom_model_load: n_ff    = 57344
bloom_model_load: n_parts = 1
bloom_model_load: ggml ctx size = 333257.61 MB
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 349847586752, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 349847931776, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351081229760, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351081459328, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 350670590144, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 349848678784, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351081976768, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351082206336, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351493305664, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351493305664, available 349445931264)
Segmentation fault (core dumped)

I have enough CPU memory (420 GB). Any idea what the issue is?

Quantised model?

Hey, great job! Are you also sharing or hosting the quantized model on the HF Hub?

This would help people who just want to run 0-shot inference with the default model.

Something wrong with the tokenize function.

The ggml model converted from "YeungNLP/bloomz-396m-zh" or "WangZeJun/bloom-396m-chat" lacks some tokens, such as "焙" or "擀"; without the corresponding tokens, the generated result cannot be displayed. However, when running the model the official Python way, there is no such problem.

Sample output; notice the "�" sections:

main: prompt: '面包的烘焙制作流程'
main: number of tokens in prompt = 3
 24765 -> '面包'
   373 -> '的'
 28967 -> '烘'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


面包(24765)的(373)烘(28967)�(1165)�(237)技巧(16012):(1038)
(189)1(20).(17) (210)面(1157)条(1996)要(853)煮熟(43916),(355)否则(14458)容易(7305)粘(14494)。(420) 
(2813)2(21).(17) 应(23830)使用(2527)烤(15337)箱(8226)而不是(12285)微波(30656)炉(16613)加热(25228)面团(44449)。
(672)3(22).(17) 用(16647)冷水(33637)淋(15735)湿(10556)面团(44449)以防止(31473)黏(19639)在一起(10919)。
(672)4(23).(17) 在(3612)预(3119)热(4291)至(1546)摄氏(39868)175(13634)度(1423)时(1018)开始(3590)烘(28967)�(1165)�(237),(355)直到(8326)底部(26609)变得
(13044)金(1539)黄色(21313)并(1437)散(4711)发出(13801)香味(32740)即可(10134)享用(42892)</s>(2) [end of text]


main: mem per token =  4944640 bytes
main:     load time =   558.57 ms
main:   sample time =   516.50 ms
main:  predict time =  3674.82 ms / 52.50 ms per token
main:    total time =  4945.50 ms
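A small sketch of what is likely going on (bigscience/bloomz-560m is used here as a stand-in; the assumption is that the models above share BLOOM's byte-level BPE tokenizer): a rare character such as "焙" can be split across several byte-level tokens, and each fragment prints as "�" until the bytes from consecutive tokens are decoded together. The C++ side would need to buffer incomplete UTF-8 bytes across tokens the same way.

```python
from transformers import AutoTokenizer

# Stand-in model id; not one of the checkpoints from the report above.
tok = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")

ids = tok.encode("烘焙")
print(ids)                              # "焙" may map to more than one id
print([tok.decode([i]) for i in ids])   # per-token decode shows "�" fragments for split bytes
print(tok.decode(ids))                  # decoding all ids together restores "烘焙"
```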

Quantization doesn't work with Bloomz 176B

Hello,

I have successfully converted the bloomz 176B model to fp16.
However, quantization doesn't work and throws an error:

./quantize ./models/ggml-model-bloomz-f16.bin ./models/ggml-model-bloomz-f16-q4_0.bin 2
bloom_model_quantize: loading model from './models/ggml-model-bloomz-f16.bin'
bloom_model_quantize: n_vocab = 250880
bloom_model_quantize: n_ctx   = 512
bloom_model_quantize: n_embd  = 14336
bloom_model_quantize: n_mult  = 1
bloom_model_quantize: n_head  = 112
bloom_model_quantize: n_layer = 70
bloom_model_quantize: f16     = 1
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
Aborted (core dumped)

Any idea how this could be fixed?

setting -t 8 (n_threads) "locks" the python process

hello

Using an 8-core CPU machine, setting -t above 8 freezes the process with the 7b1 model; it does not reply back.

For example, 12 (which is > 8) in:

./main -m ./models/ggml-model-bloomz-7b1-f16-q4_0.bin -t 12 -n 256 -p 'translate "Hi, how are you?" in Spanish:'

I wrapped your binary and added a core-count protection in https://github.com/laurentperez/ava/blob/main/ava/src/main/kotlin/fr/ava/ia/service/hf/bloom/BloomService.kt#L20, but I'm no Python expert and can't investigate much why the process freezes.

my cpu is

zsh 2506 [1]  (git)-[main]-% lscpu
Architecture:                               x86_64
  CPU op-mode(s):                           32-bit, 64-bit
  Address sizes:                            39 bits physical, 48 bits virtual
  Byte Order:                               Little Endian
CPU(s):                                     8
  On-line CPU(s) list:                      0-7
Vendor ID:                                  GenuineIntel
  Model name:                               Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
    CPU family:                             6
    Model:                                  142
    Thread(s) per core:                     2
    Core(s) per socket:                     4
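A minimal guard (a sketch of a hypothetical wrapper, not part of this repo): clamp the requested thread count to the number of logical cores before invoking ./main, since asking for more threads than the machine has appears to hang inference.

```python
import os
import subprocess

requested_threads = 12
# Never pass more threads than the machine's logical core count.
threads = min(requested_threads, os.cpu_count() or 1)

subprocess.run([
    "./main",
    "-m", "./models/ggml-model-bloomz-7b1-f16-q4_0.bin",
    "-t", str(threads),
    "-n", "256",
    "-p", 'translate "Hi, how are you?" in Spanish:',
], check=True)
```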

[Open-to-community] Benchmark bloomz.cpp on different hardware

Hey hey,

We are working hard to help you unlock the true potential of open-source LLMs. In order for us to build better and cater to the majority of hardware, we need your help running benchmarks with bloomz.cpp 🤗

We are looking for the following information:

  1. Hardware information (CPU/ RAM/ GPU/ Threads)
  2. Inference time (time per token)
  3. Memory use

You can do so by following the quickstart steps in the project's README. 💯

Ping @NouamaneTazi and @Vaibhavs10 if you have any questions! <3

Happy benchmarking! 🚀
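If it helps, here is a small helper for collecting the numbers asked for above (a sketch, not something the repo ships): it runs ./main once, reuses the "predict time ... ms per token" line the binary already prints (see the timing output in other issues here), and records the child process's peak RSS for the memory figure. The model path and prompt are placeholders.

```python
import re
import resource
import subprocess

proc = subprocess.run(
    ["./main", "-m", "./models/ggml-model-bloomz-7b1-f16-q4_0.bin",
     "-t", "4", "-n", "64", "-p", "Hello"],
    capture_output=True, text=True,
)

# Parse the per-token timing line from the binary's own summary output.
match = re.search(r"predict time\s*=\s*([\d.]+) ms\s*/\s*([\d.]+) ms per token",
                  proc.stdout + proc.stderr)
if match:
    print(f"time per token: {match.group(2)} ms")

# Peak resident memory of the child process (Linux reports KiB here).
peak_rss_mb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024
print(f"peak memory: {peak_rss_mb:.0f} MB")
```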

Convert bloomz-7b1-mt to ggml error.

I ran into the following problem when converting the bloomz-7b1-mt model:

Some weights of BloomForCausalLM were not initialized from the model checkpoint at D:\projects\code\bloomz-7b1-mt and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
...
Traceback (most recent call last):
  File "D:\projects\code\bloomz.cpp\convert-hf-to-ggml.py", line 151, in <module>
    data.tofile(fout)
OSError: 67108864 requested and 0 written
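A quick sanity check (a sketch, not part of the conversion script; the path is an example): an OSError of the form "requested and 0 written" from numpy's tofile often means the target filesystem ran out of space mid-write, so it is worth checking free space before rerunning the conversion.

```python
import shutil

# Path where the ggml output file is being written; adjust to your setup.
free_gb = shutil.disk_usage("./models").free / 1e9
print(f"free space under ./models: {free_gb:.1f} GB")  # the f16 7b1 ggml file alone is roughly 14 GB
```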

download and convert the 7B1 model to ggml FP16 format fails!

As described in the README, when I try to run the convert-hf-to-ggml.py script I get the following error.

Loading model:  bigscience/bloomz-7b1
pytorch_model.bin:  68%|████████████████████████████████████████████████████████████████████████████████████▎                                       | 9.62G/14.1G [16:33<07:47, 9.68MB/s]
Traceback (most recent call last):
  File "/home/e/Downloads/bloomz.cpp/convert-hf-to-ggml.py", line 84, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, config=config, torch_dtype=torch.float16 if ftype == 1 else torch.float32, low_cpu_mem_usage=True)
  File "/home/e/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/home/e/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3268, in from_pretrained
    resolved_archive_file = cached_file(
  File "/home/e/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/home/e/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/e/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1461, in hf_hub_download
    http_get(
  File "/home/e/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 569, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 14138162687 but has size 9616967235 (pytorch_model.bin).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
Loading model:  bigscience/bloomz-7b1
pytorch_model.bin:  75%|████████████████████████████████████████████████████████████████████████████████████████████▋                               | 10.6G/14.1G [17:45<05:59, 9.92MB/s]
Traceback (most recent call last):
  File "/home/e/Downloads/bloomz.cpp/convert-hf-to-ggml.py", line 84, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, config=config, torch_dtype=torch.float16 if ftype == 1 else torch.float32, low_cpu_mem_usage=False)
  File "/home/e/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/home/e/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3268, in from_pretrained
    resolved_archive_file = cached_file(
  File "/home/e/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/home/e/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/e/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1461, in hf_hub_download
    http_get(
  File "/home/e/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 569, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 14138162687 but has size 10568333935 (pytorch_model.bin).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

I'm running: python3 convert-hf-to-ggml.py bigscience/bloomz-7b1 ./models

I believe this is the call that fails for some reason:
AutoModelForCausalLM.from_pretrained(model_name, config=config, torch_dtype=torch.float16 if ftype == 1 else torch.float32, low_cpu_mem_usage=True)

I do have enough disk space, so I'm not sure why the download fails around 10 GB. Also, my .cache directory has both unfinished files.

My operating system is Ubuntu 22.04.
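A possible workaround, following the suggestion in the error message itself (a sketch, assuming the partial files in ~/.cache are the problem): pre-download the checkpoint with huggingface_hub so the truncated pytorch_model.bin is discarded, then rerun the conversion script.

```python
from huggingface_hub import snapshot_download

snapshot_download(
    "bigscience/bloomz-7b1",
    force_download=True,    # discard the truncated pytorch_model.bin already in the cache
    resume_download=False,  # start the download from scratch rather than resuming
)
```

Once the files pass the size check, rerunning python3 convert-hf-to-ggml.py bigscience/bloomz-7b1 ./models should pick them up from the cache.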

Running 'make' warnings in bloom_model_load() function

In function ‘bool bloom_model_load(const string&, bloom_model&, gpt_vocab&, int)’:

main.cpp:422:89: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
422 | "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
423 | func, name.data(), tensor->ne[0], tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~
| |
| int64_t {aka long int}
main.cpp:422:95: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 6 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
422 | "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
423 | func, name.data(), tensor->ne[0], tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~
| |
| int64_t {aka long int}
main.cpp:430:93: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
430 | "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
431 | func, name.data(), tensor->ne[0] / n_parts, tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~~~~~~~~~~~
| |
| int64_t {aka long int}
main.cpp:430:99: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 6 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
430 | "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
431 | func, name.data(), tensor->ne[0] / n_parts, tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~
| |
| int64_t {aka long int}
main.cpp:437:93: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
437 | "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
438 | func, name.data(), tensor->ne[0], tensor->ne[1] / n_parts, ne[0], ne[1]);
| ~~~~~~~~~~~~~~~~~~~~~~~
| |
| int64_t {aka long int}
./main -h
usage: ./main [options]

This is on Ubuntu Focal.

Eval result is wrong compared to huggingface

When I use this on another model, it evals garbage, so I tested the code and found something.

Can you tell me how to fix this? I will fix it.

Here is the result:

use huggingface:

import torch
from transformers import AutoConfig
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-7b1-mt").eval()
inputs = torch.asarray([[1, 2, 3]])
logits = model(inputs).logits
print(logits[0][2].tolist()[:100])

result:

0.22344008088111877, 1.9642051458358765, 11.864399909973145, -1.549119472503662, 2.6078567504882812, 4.901538372039795, 5.479622840881348, 4.366276741027832, 3.108126401901245, 4.476998805999756, 5.407215118408203, 6.087584972381592, 6.882656097412109, 3.5824639797210693, 2.8524699211120605, 6.894238471984863, 6.143975734710693, 7.291437149047852, 5.1857380867004395, 5.704357147216797, 5.13894510269165, 4.8879570960998535, 4.2335052490234375, 3.943253993988037, 4.005831718444824, 1.9380242824554443, 2.3373711109161377, 2.0637481212615967, 1.5256991386413574, 8.54484748840332, 5.76274299621582, 5.615670204162598, 5.266860485076904, 5.444922924041748, 4.748494625091553, 3.3347055912017822, 6.756032943725586, 3.9474661350250244, 4.647606372833252, 3.971529483795166, 2.891402006149292, 3.3260879516601562, 3.4882469177246094, 4.8745317459106445, 5.117419719696045, 4.261842727661133, 4.353585243225098, 4.781613826751709, 4.096264839172363, 4.630766868591309, 4.105442047119141, 5.867022514343262, 2.2942967414855957, 3.494351387023926, 5.262766361236572, 4.534628868103027, 3.265615940093994, 4.927636623382568, 3.5005316734313965, 5.573263168334961, 3.197946310043335, 3.4139623641967773, 6.333967685699463, 6.041770935058594, 5.278609752655029, 4.178605079650879, 4.641434669494629, 3.7197110652923584, 5.69587516784668, 2.7639200687408447, 3.924497127532959, 4.449933052062988, 3.4940080642700195, 3.1619396209716797, 2.9798483848571777, 4.493366241455078, 3.155033588409424, 4.518561363220215, 4.083653450012207, 5.188224792480469, 4.6946539878845215, 5.858641624450684, 3.122354507446289, 5.2717390060424805, 2.3826353549957275, 3.1856019496917725, 6.717620849609375, 4.741221904754639, 3.156816005706787, 3.8298747539520264, 3.203200340270996, 5.476276397705078, 4.176375389099121, 3.668912410736084, 5.503058910369873, 5.73520040512085, 4.525679111480713, 3.5857861042022705, -0.6433481574058533, -0.5238689184188843

In this repo:

bloom_eval(model, params.n_threads, 0, { 1, 2, 3 }, logits, mem_per_token);
for (int i = 0; i < 100; i++) {
    std::cout << std::fixed << std::setprecision(15) << logits[i] << std::endl;
}

result:

-0.750549316406250, 1.381408691406250, 11.912109375000000, -2.324371337890625, 3.116577148437500, 4.497253417968750, 5.604492187500000, 4.389404296875000, 4.248657226562500, 5.013183593750000, 4.252197265625000, 5.837402343750000, 6.283691406250000, 4.027099609375000, 3.549743652343750, 6.813964843750000, 6.371337890625000, 7.534423828125000, 5.618286132812500, 6.014160156250000, 5.917968750000000, 5.415527343750000, 4.756469726562500, 4.302734375000000, 4.481933593750000, 2.578125000000000, 3.106567382812500, 2.498901367187500, 2.491088867187500, 8.800781250000000, 5.740478515625000, 5.986816406250000, 5.782470703125000, 6.171264648437500, 4.992431640625000, 3.993103027343750, 6.351440429687500, 4.086059570312500, 4.803222656250000, 4.089355468750000, 2.783569335937500, 3.649658203125000, 3.649780273437500, 4.830871582031250, 5.301757812500000, 4.125244140625000, 4.416015625000000, 4.733459472656250, 4.139587402343750, 4.553588867187500, 3.957397460937500, 5.420043945312500, 2.590698242187500, 3.759399414062500, 5.254394531250000, 4.708129882812500, 3.804687500000000, 4.893432617187500, 3.654541015625000, 5.492797851562500, 3.402587890625000, 3.385986328125000, 6.621337890625000, 6.488769531250000, 5.375976562500000, 4.649780273437500, 5.176757812500000, 3.185668945312500, 5.509765625000000, 2.897094726562500, 3.759765625000000, 4.225585937500000, 3.601074218750000, 3.360473632812500, 2.742187500000000, 4.666503906250000, 3.827758789062500, 4.143066406250000, 4.038940429687500, 4.867187500000000, 4.510742187500000, 5.198242187500000, 3.136718750000000, 4.870361328125000, 2.699951171875000, 3.326538085937500, 6.828735351562500, 5.053955078125000, 3.508911132812500, 3.542419433593750, 3.531616210937500, 5.666015625000000, 4.360961914062500, 3.968017578125000, 5.753784179687500, 6.312866210937500, 4.483673095703125, 3.922241210937500, -0.844238281250000, -1.286132812500000

The model is bloomz-7b1-mt, converted using the script in this repo.

several questions

I have only 16 GB of memory, so I tried to use the local-memory parameter. The model loaded and I could see the conversion start, but at the end it still says "Killed". I do see a 20 GB model file generated. Is that considered a success?

Also, I was trying to convert a fine-tuned BLOOM model (https://huggingface.co/BelleGroup/BELLE-7B-2M/tree/main). It was fine-tuned from 7B, but it looks like it is fp32 instead of fp16, so it's double the size. Do I need to supply any additional param when converting it to ggml? I ask because after the conversion the results are nonsense and weird characters.

Or should I convert their GPTQ 8-bit quantized model instead?

Is there any plan for mT0 model support?

BLOOMZ and mT0 models are related, and mT0-13B performs better than BLOOMZ-176B in some cases.

The mT0-13B will be a killer model for normal user devices after a GPTQ-4bit quantization.

Hope the model can be supported.

error

error loading model: unexpectedly reached end of file
llama_init_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

The model is bloomz-3b-zh, converted with the bloomz.cpp converter.

-n param doesn't work?

I am overriding the -n param with a higher-than-default number, like 1,000, to get a longer response, but it doesn't seem to be working. In fact, sometimes the response is shorter than the default 128 chars. Anyone else have this issue?

document sampling parameters and/or minimal "viable" codegen model ?

hello. your work is great 👍

I wrapped your binary under my bot/API project https://github.com/laurentperez/ava#what-models-or-apis-does-it-support-

I'm mostly interested in code (Python) generation from Bloom as a developer assist; I'm not using it for creative writing. However, I'm playing with translations to evaluate how the 7b1 model might respond to more complex Python prompts.

I ran inference with the bloomz-1b1, bloomz-3b and bloomz-7b1 models. So far the 7b1 model gives the best results, but it's being "too creative".

See the example below: the "Me encuentro muy bien / me alegro" parts were too creative, they did more than a translation:

curl -v -XPOST -H 'Content-Type: application/json' -d '{"msg":"translate \"Hi, how are you?\" in Spanish:"}' http://localhost:8080/hf/bloom

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000  
translate "Hi, how are you?" in Spanish: Me encuentro muy bien. ¿Cómo estas tú? Yo estoy?: me alegro</s> [end of text]

thanks !

There was a problem loading the model

After I run "python3 convert-hf-to-ggml.py bigscience/bloomz-7b1 ./models", there is a problem loading the model:

def load_model():
    model_path = "/aidata/yh/BelleGroup_BELLE-7B-1M-fp16/"  # You can modify the path for storing the local model
    model = AutoModelForCausalLM.from_pretrained(model_path, from_tf=True)
    model = model.half().cuda()
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    return model, tokenizer

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.9/site-packages/streamlit/scriptrunner/script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
  File "lianjie_web.py", line 20, in <module>
    model, tokenizer = load_model()
  File "/root/anaconda3/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
  File "/root/anaconda3/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 557, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "lianjie_web.py", line 15, in load_model
    model = AutoModelForCausalLM.from_pretrained(model_path, from_tf=True)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2612, in from_pretrained
    model, loading_info = load_tf2_checkpoint_in_pytorch_model(
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/modeling_tf_pytorch_utils.py", line 401, in load_tf2_checkpoint_in_pytorch_model
    from .modeling_tf_utils import load_tf_weights
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 40, in <module>
    from .generation import GenerationConfig, TFGenerationMixin
ImportError: cannot import name 'TFGenerationMixin' from 'transformers.generation' (/root/anaconda3/lib/python3.9/site-packages/transformers/generation/__init__.py)

What is the reason for this, please?
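The ImportError comes from the TensorFlow loading path that from_tf=True switches on, which this transformers install cannot import. Assuming the checkpoint in model_path is a regular PyTorch pytorch_model.bin (as BELLE/BLOOM checkpoints usually are), a sketch that avoids that path entirely:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/aidata/yh/BelleGroup_BELLE-7B-1M-fp16/"
# Drop from_tf=True so transformers loads the PyTorch weights directly
# instead of going through its TensorFlow conversion path.
model = AutoModelForCausalLM.from_pretrained(model_path)
model = model.half().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path)
```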
