intel-analytics / ipex-llm-tutorial

Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm

Home Page: https://github.com/intel-analytics/bigdl

License: Apache License 2.0

Jupyter Notebook 100.00%

ipex-llm-tutorial's People

Contributors

ariadne330, ch1y0q, hxsz1997, jason-dai, jinbridger, lalalapotter, mingyu-wei, novti, oscilloscope98, plusbang, qiuxin2012, sgwhat, shane-huang


ipex-llm-tutorial's Issues

pip install failed

❯ pip install --pre --upgrade ipex-llm[all]
zsh: no matches found: ipex-llm[all]
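
For reference, zsh expands square brackets as glob patterns, so the requirement specifier usually needs to be quoted, e.g. `pip install --pre --upgrade "ipex-llm[all]"`, to avoid the `no matches found` error.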

XPU inference failed

My code is based on bigdl-llm:

```python
from langchain import LLMChain, PromptTemplate
from bigdl.llm.langchain.llms import TransformersLLM
from langchain.memory import ConversationBufferWindowMemory

chatglm3_6b = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b'

llm_model_path = chatglm3_6b  # path to the Hugging Face LLM model

CHATGLM_V3_PROMPT_TEMPLATE = "问:{prompt}\n\n答:"

prompt = PromptTemplate(input_variables=["history", "human_input"], template=CHATGLM_V3_PROMPT_TEMPLATE)
max_new_tokens = 128

llm = TransformersLLM.from_model_id(
    model_id=llm_model_path,
    model_kwargs={"trust_remote_code": True, "temperature": 0},
)

llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    llm_kwargs={"max_new_tokens": max_new_tokens},
    memory=ConversationBufferWindowMemory(k=2),
)

VICUNA_PROMPT_TEMPLATE = "USER: {prompt}\nASSISTANT:"

llm_result = llm.generate([VICUNA_PROMPT_TEMPLATE.format(prompt="讲一个笑话"), VICUNA_PROMPT_TEMPLATE.format(prompt="作一首诗")] * 3)

print("-" * 20 + "number of generations" + "-" * 20)
print(len(llm_result.generations))
print("-" * 20 + "the first generation" + "-" * 20)
print(llm_result.generations[0][0].text)
```

but it returns:

```
Traceback (most recent call last):
File "D:\AI_projects\ipex-samples\main-bigdl.py", line 32, in
llm_result = llm.generate([VICUNA_PROMPT_TEMPLATE.format(prompt="讲一个笑话"), VICUNA_PROMPT_TEMPLATE.format(prompt="作一首诗")]*3)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 741, in generate
output = self._generate_helper(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 605, in _generate_helper
raise e
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 592, in _generate_helper
self._generate(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 1177, in _generate
self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\langchain\llms\transformersllm.py", line 248, in _call
output = self.model.generate(input_ids, streamer=streamer,
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\transformers\generation\utils.py", line 1538, in generate
return self.greedy_search(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\transformers\generation\utils.py", line 2362, in greedy_search
outputs = self(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 941, in forward
transformer_outputs = self.transformer(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 167, in chatglm2_model_forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 641, in forward
layer_ret = layer(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 191, in chatglm2_attention_forward
return forward_function(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 377, in chatglm2_attention_forward_8eb45c
query_layer = apply_rotary_pos_emb_chatglm(query_layer, rotary_pos_emb)
NotImplementedError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Could not run 'torch_ipex::mul_add' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torch_ipex::mul_add' is only available for these backends: [XPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

XPU: registered at C:/jenkins/workspace/IPEX-GPU-ARC770-windows/frameworks.ai.pytorch.ipex-gpu/csrc/gpu/aten/operators/TripleOps.cpp:521 [kernel]
BackendSelect: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\FunctionalizeFallbackKernel.cpp:290 [backend fallback]
Named: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\native\NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:53 [backend fallback]
AutogradCPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:57 [backend fallback]
AutogradCUDA: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:65 [backend fallback]
AutogradXLA: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:69 [backend fallback]
AutogradMPS: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:77 [backend fallback]
AutogradXPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:61 [backend fallback]
AutogradHPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:90 [backend fallback]
AutogradLazy: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:73 [backend fallback]
AutogradMeta: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:81 [backend fallback]
Tracer: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\torch\csrc\autograd\TraceTypeManual.cpp:296 [backend fallback]
AutocastCPU: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\autocast_mode.cpp:382 [backend fallback]
AutocastXPU: registered at C:/jenkins/workspace/IPEX-GPU-ARC770-windows/frameworks.ai.pytorch.ipex-gpu/csrc/gpu/aten/operators/TripleOps.cpp:521 [kernel]
AutocastCUDA: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\autocast_mode.cpp:249 [backend fallback]
FuncTorchBatched: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:710 [backend fallback]
FuncTorchVmapMode: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:157 [backend fallback]
```

My pip list is:
Package Version


accelerate 0.21.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.3.0
arxiv 2.1.0
async-timeout 4.0.3
attrs 23.2.0
backoff 2.2.1
beautifulsoup4 4.12.3
bigdl-core-xe-21 2.5.0b20240324
bigdl-llm 2.5.0b20240330
blinker 1.7.0
blis 0.7.11
Brotli 1.1.0
cachetools 5.3.3
catalogue 2.0.10
certifi 2024.2.2
cffi 1.16.0
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
cloudpathlib 0.16.0
colorama 0.4.6
coloredlogs 15.0.1
confection 0.1.4
contourpy 1.2.0
cryptography 42.0.5
cycler 0.12.1
cymem 2.0.8
dataclasses-json 0.6.4
deepdiff 6.7.1
Deprecated 1.2.14
deprecation 2.1.0
distro 1.9.0
duckduckgo-search 3.9.9
effdet 0.4.1
einops 0.7.0
emoji 2.11.0
et-xmlfile 1.1.0
exceptiongroup 1.2.0
faiss-cpu 1.7.4
fastapi 0.109.0
feedparser 6.0.10
filelock 3.13.3
filetype 1.2.0
flatbuffers 24.3.25
fonttools 4.50.0
frozenlist 1.4.1
fschat 0.2.35
fsspec 2024.3.1
gitdb 4.0.11
GitPython 3.1.43
greenlet 3.0.3
h11 0.14.0
h2 4.1.0
hpack 4.0.0
httpcore 1.0.5
httpx 0.26.0
httpx-sse 0.4.0
huggingface-hub 0.22.2
humanfriendly 10.0
hyperframe 6.0.1
idna 3.6
importlib_metadata 7.1.0
importlib_resources 6.4.0
iniconfig 2.0.0
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp 2024.1.0
iopath 0.1.10
Jinja2 3.1.3
joblib 1.3.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
langchain 0.0.354
langchain-community 0.0.20
langchain-core 0.1.23
langchain-experimental 0.0.47
langcodes 3.3.0
langdetect 1.0.9
langsmith 0.0.87
layoutparser 0.3.4
llama-index 0.9.35
lxml 5.2.0
Markdown 3.6
markdown-it-py 3.0.0
markdown2 2.4.13
markdownify 0.11.6
MarkupSafe 2.1.5
marshmallow 3.21.1
matplotlib 3.8.3
mdurl 0.1.2
metaphor-python 0.1.23
mpmath 1.3.0
msg-parser 1.2.0
multidict 6.0.5
murmurhash 1.0.10
mypy-extensions 1.0.0
nest-asyncio 1.6.0
networkx 3.2.1
nh3 0.2.17
nltk 3.8.1
numexpr 2.8.6
numpy 1.26.4
olefile 0.47
omegaconf 2.3.0
onnx 1.16.0
onnxruntime 1.15.1
openai 1.9.0
opencv-python 4.9.0.80
openpyxl 3.1.2
ordered-set 4.1.0
packaging 23.2
pandas 2.0.3
pathlib 1.0.1
pdf2image 1.17.0
pdfminer.six 20231228
pdfplumber 0.11.0
pikepdf 8.4.1
Pillow 9.5.0
pillow_heif 0.15.0
pip 23.3.1
pluggy 1.4.0
portalocker 2.8.2
preshed 3.0.9
prompt-toolkit 3.0.43
protobuf 4.25.3
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyclipper 1.3.0.post5
pycocotools 2.0.7
pycparser 2.22
pydantic 1.10.13
pydantic_core 2.16.3
pydeck 0.8.1b0
Pygments 2.17.2
PyJWT 2.8.0
pylibjpeg-libjpeg 2.1.0
PyMuPDF 1.23.16
PyMuPDFb 1.23.9
pypandoc 1.13
pyparsing 3.1.2
pypdf 4.1.0
pypdfium2 4.28.0
pyreadline3 3.4.1
pytesseract 0.3.10
pytest 7.4.3
python-dateutil 2.9.0.post0
python-decouple 3.8
python-docx 1.1.0
python-iso639 2024.2.7
python-magic 0.4.27
python-magic-bin 0.4.14
python-multipart 0.0.9
python-pptx 0.6.23
pytz 2024.1
pywin32 306
PyYAML 6.0.1
rapidfuzz 3.7.0
rapidocr-onnxruntime 1.3.8
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rpds-py 0.18.0
safetensors 0.4.2
scikit-learn 1.4.1.post1
scipy 1.12.0
sentence-transformers 2.2.2
sentencepiece 0.2.0
setuptools 68.2.2
sgmllib3k 1.0.0
shapely 2.0.3
shortuuid 1.0.13
simplejson 3.19.2
six 1.16.0
smart-open 6.4.0
smmap 5.0.1
sniffio 1.3.1
socksio 1.0.0
soupsieve 2.5
spacy 3.7.2
spacy-legacy 3.0.12
spacy-loggers 1.0.5
SQLAlchemy 2.0.25
srsly 2.4.8
sse-starlette 1.8.2
starlette 0.35.0
streamlit 1.30.0
streamlit-aggrid 0.3.4.post3
streamlit-antd-components 0.3.1
streamlit-chatbox 1.1.11
streamlit-feedback 0.1.3
streamlit-modal 0.1.0
streamlit-option-menu 0.3.12
strsimpy 0.2.1
svgwrite 1.4.3
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
thinc 8.2.3
threadpoolctl 3.4.0
tiktoken 0.5.2
timm 0.9.16
tokenizers 0.13.3
toml 0.10.2
tomli 2.0.1
toolz 0.12.1
torch 2.1.0a0+cxx11.abi
torchaudio 2.1.2
torchvision 0.16.0a0+cxx11.abi
tornado 6.4
tqdm 4.66.1
transformers 4.31.0
transformers-stream-generator 0.0.4
typer 0.9.4
typing_extensions 4.10.0
typing-inspect 0.9.0
tzdata 2024.1
tzlocal 5.2
unstructured 0.12.5
unstructured-client 0.22.0
unstructured-inference 0.7.23
unstructured.pytesseract 0.3.12
urllib3 2.2.1
uvicorn 0.29.0
validators 0.24.0
wasabi 1.1.2
watchdog 3.0.0
wavedrom 2.0.3.post3
wcwidth 0.2.13
weasel 0.3.4
websockets 12.0
wheel 0.41.2
wrapt 1.16.0
xformers 0.0.23.post1
xlrd 2.0.1
XlsxWriter 3.2.0
yarl 1.9.4
youtube-search 2.1.2
zipp 3.18.1

Hope you can help me with this!
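
For reference, the traceback shows the fused operator torch_ipex::mul_add, which is registered only for the XPU backend, being reached while the tensors are still on the CPU. Below is a minimal sketch of the device-placement pattern used in the other reports in this thread; it bypasses the LangChain wrapper, assumes an XPU device is available, and reuses the same local model path as above:

```python
import torch
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b'

# Load with 4-bit optimizations, then move the model to the XPU device.
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True).eval()
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Inputs must live on the same device as the model; otherwise CPU execution
# reaches XPU-only operators such as torch_ipex::mul_add and fails.
input_ids = tokenizer.encode("问:讲一个笑话\n\n答:", return_tensors="pt").to('xpu')

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```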

About the memory problem

Each time I interact with the model, the memory it occupies increases and is never released, so after many conversation turns the model crashes easily. How can I solve this problem?
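
One direction that may help (a minimal sketch, assuming intel_extension_for_pytorch is installed so that the torch.xpu namespace is available): explicitly drop unreferenced tensors and ask the XPU caching allocator to release unused blocks between conversation turns, and keep the chat history passed back to the model bounded so the KV cache does not grow without limit.

```python
import gc
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu namespace


def release_xpu_memory():
    # Drop unreferenced Python objects first, then return cached blocks to the
    # device (analogous to torch.cuda.empty_cache() on CUDA).
    gc.collect()
    torch.xpu.empty_cache()


# Example: call this between conversation turns.
release_xpu_memory()
```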

About the acceleration problem with XPU

After putting the model and inputs on the XPU, the model now works on an Intel laptop, but the inference time is about 588 seconds, which is too long for me. I suspect the GPU is not actually being used; may I ask what the problem is here? Thank you very much for any response.

The following is the code:

```python
import torch
import intel_extension_for_pytorch as ipex

from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
from transformers import AutoTokenizer

import time
import numpy as np

from gpu_benchmark_util import BenchmarkWrapper

# model_path = r"D:\rag\test_api\Baichuan2-7B-Chat"
model_path = r"C:\Users\Administrator\yishuo\chatglm2-6b"

prompt = """ 你是human_prime2,你是一个高级智能实体,你融合了最先进的算法和深度学习网络,专为跨越星际的知识探索与智慧 收集而设计。
你回答以下问题时必须跟哲学相结合,必须在15字内回答完,你会尽量参考知识库来回答。
以下是问题:请介绍钱.
以下是知识库:[{'对话': '什么是"帮费"?', '回复': '"帮费"是为**各库采买物料时,为护送官员以及送部的饭食 银拨配的额外款项。'}, {'对话': '怎么说?', '回复': '如果技术能够复制我们的外貌,它也许能够复制我们的**和感受。'}, {'对话': '你好。', '回复': '嘿,你好!你看起来长得和我可真像啊!'}].
"""
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# For Baichuan2:
# model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, optimize_model=True, load_in_4bit=True).bfloat16().eval()
# For ChatGLM2:
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, optimize_model=True, load_in_4bit=True).eval()

input_ids = tokenizer.encode(prompt, return_tensors="pt")
print("finish to load")

model = model.to('xpu')

# Keep the embedding layer on the CPU.
# For Baichuan2:
# model.model.embed_tokens.to('cpu')
# For ChatGLM2:
model.transformer.embedding.to('cpu')
input_ids = input_ids.to('xpu')

print("finish to xpu")

model = BenchmarkWrapper(model)

with torch.inference_mode():
    # warm up a few times as we use ipex
    for i in range(7):
        st = time.time()
        output = model.generate(input_ids, num_beams=1, do_sample=False, max_new_tokens=32)
        end = time.time()
        print(f'Inference time: {end-st} s')
        output_str = tokenizer.decode(output[0], skip_special_tokens=True)
        print(output_str)
```
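
One thing worth checking before digging further (a minimal sanity check, assuming intel_extension_for_pytorch 2.1 exposes the torch.xpu namespace as in the environment above): confirm that the Arc GPU is actually visible to PyTorch, since an invisible device means generation silently falls back to much slower CPU paths; also note that the first one or two iterations include warm-up and compilation overhead, so only the later timings are representative.

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu namespace

# If this prints False, the model never actually runs on the Arc GPU.
print("XPU available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("Device:", torch.xpu.get_device_name(0))
```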

Some issues in the tutorial (link changed, datasets deprecated, etc.)

  1. BigDL LLM package installation:
    I notice that in some chapters the package installation suggestion is:

    pip install --pre --upgrade bigdl-llm[all]
    

    However, I also see:

    pip install bigdl-llm[all]
    

    in some files. Is it necessary to unify this command?

  2. Links outdated:
    The links at the end of chapter 1 are all outdated. The affected passage from chapter 1 reads:

We have already verified many models on BigDL-LLM and provided ready-to-run examples, such as [Llama](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4), 
[Llama2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/llama2), 
[Vicuna](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/vicuna), 
[ChatGLM](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/chatglm),
[ChatGLM2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/chatglm2), 
[Baichuan](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/baichuan),
[MOSS](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/moss), 
[Falcon](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/falcon),
[Dolly-v1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/dolly_v1),
[Dolly-v2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/dolly_v2),
StarCoder([link1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4),
[link2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/starcoder)),
Phoenix([link1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4),
[link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/phoenix)),
RedPajama([link1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4),
[link2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/redpajama)),
[Whisper](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/whisper), etc. You can find model examples [here](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4).

All the links above have already been changed, so they all lead to 404 Page Not Found.
For example, the current address of Llama2 in the tutorial is: https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4, but the folder structure has been updated and it should be https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2 now.

  3. load_in_low_bit options outdated:
    The load_in_low_bit options shown in chapters 5/6 are outdated (see the sketch after this list).

    The latest version of bigdl-llm 2.4.0 supports sym_int4, asym_int4, sym_int5, asym_int5, sym_int8, nf3, nf4, fp4, fp8, fp16 or mixed_4bit.
    For GPU in chapter 6, the supported options are: sym_int4, asym_int4, sym_int5, asym_int5, sym_int8, nf3, nf4, fp4, fp8, fp16, mixed_fp4 or mixed_fp8.

  4. Sample audio files in chapter 5.2 deprecated:

    The Common Voice dataset is deprecated and, according to its Hugging Face page, will be deleted soon. As for the audio files (audio_en.mp3/audio_zh.mp3) downloaded via the wget command, they have already been removed from Hugging Face; using them leads to an EOF error when running the sample code in this section.

  5. GPU Acceleration Environment Setup:

    The command source /opt/intel/oneapi/setvars.sh is listed as a recommendation for Intel GPU acceleration. However, based on my own knowledge and experience, this command is mandatory and should be run whenever a new terminal session is created; otherwise we may encounter the error OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory.
    This is not exactly an error, but I believe it would be better to highlight this command in the tutorial, either in README.md or in 6_1_GPU_Llama2-7B.md.
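
For point 3 above, a minimal sketch of how the newer low-bit options are selected (the model id below is only a placeholder; substitute any local path or Hugging Face id, and note that the exact option names depend on the installed bigdl-llm release):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# Hypothetical example: pick one of the supported low-bit formats via
# load_in_low_bit instead of the boolean load_in_4bit flag.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id / local path
    load_in_low_bit="nf4",            # e.g. "sym_int4", "sym_int8", "fp8", ...
    trust_remote_code=True,
)
```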

About the install problem

I am not sure what is going wrong here; I followed the BigDL guide exactly. Thank you very much for any response.
Please help me.

GPU acceleration failed

I used the code from here:
https://github.com/intel-analytics/ipex-llm-tutorial/blob/original-bigdl-llm/Chinese_Version/ch_6_GPU_Acceleration/6_1_GPU_Llama2-7B.md

But it failed. Can you help with this?
Thanks.

```python
from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
from transformers import LlamaTokenizer, AutoTokenizer

chatglm3_6b = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm2-6b'

model_in_4bit = AutoModel.from_pretrained(pretrained_model_name_or_path=chatglm3_6b,
                                          load_in_4bit=True,
                                          optimize_model=False)
model_in_4bit_gpu = model_in_4bit.to('xpu')

# Note that AutoModelForCausalLM here is imported from bigdl.llm.transformers
model_in_8bit = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=chatglm3_6b,
    load_in_low_bit="sym_int8",
    optimize_model=False
)

model_in_8bit_gpu = model_in_8bit.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=chatglm3_6b)
```

The error shows:

```
(llm_310_whl) D:\AI_projects\ipex-samples>python main-test.py
C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-07 00:20:03,696 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
  File "D:\AI_projects\ipex-samples\main-test.py", line 6, in <module>
    model_in_4bit = AutoModel.from_pretrained(pretrained_model_name_or_path=chatglm3_6b,
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\bigdl\llm\transformers\model.py", line 320, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\bigdl\llm\transformers\model.py", line 434, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\transformers\models\auto\auto_factory.py", line 461, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\transformers\models\auto\configuration_auto.py", line 986, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\transformers\dynamic_module_utils.py", line 535, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?
```

And the pip list is:
accelerate 0.21.0
aiohttp 3.9.3
aiosignal 1.3.1
altair 4.2.2
annotated-types 0.6.0
astor 0.8.1
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
bigdl-core-xe-21 2.5.0b20240324
bigdl-llm 2.5.0b20240406
blinker 1.7.0
cachetools 5.3.3
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
contourpy 1.2.1
cryptography 42.0.5
cycler 0.12.1
dataclasses-json 0.5.14
decorator 5.1.1
entrypoints 0.4
exceptiongroup 1.2.0
executing 2.0.1
faiss-cpu 1.8.0
filelock 3.13.3
fonttools 4.51.0
frozenlist 1.4.1
fsspec 2024.3.1
gitdb 4.0.11
GitPython 3.1.43
google-ai-generativelanguage 0.2.0
google-api-core 2.18.0
google-auth 2.29.0
google-generativeai 0.1.0
googleapis-common-protos 1.63.0
greenlet 3.0.3
grpcio 1.62.1
grpcio-status 1.48.2
huggingface-hub 0.22.2
idna 3.6
importlib_metadata 7.1.0
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp 2024.1.0
ipython 8.23.0
jedi 0.19.1
Jinja2 3.1.3
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
langchain 0.0.180
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.21.1
matplotlib 3.8.4
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
mypy-extensions 1.0.0
networkx 3.3
numexpr 2.10.0
numpy 1.26.4
openai 0.27.7
openapi-schema-pydantic 1.2.4
packaging 24.0
pandas 2.2.1
pandasai 0.2.15
parso 0.8.4
pdfminer.six 20231228
pdfplumber 0.11.0
pillow 10.3.0
pip 23.3.1
prompt-toolkit 3.0.43
proto-plus 1.23.0
protobuf 3.20.3
psutil 5.9.8
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.22
pydantic 1.10.15
pydantic_core 2.16.3
pydeck 0.8.1b0
Pygments 2.17.2
Pympler 1.0.1
pyparsing 3.1.2
pypdf 3.9.0
pypdfium2 4.28.0
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
pytz 2024.1
PyYAML 6.0.1
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rpds-py 0.18.0
rsa 4.9
safetensors 0.4.2
sentencepiece 0.2.0
setuptools 68.2.2
six 1.16.0
smmap 5.0.1
SQLAlchemy 2.0.29
stack-data 0.6.3
streamlit 1.22.0
streamlit-chat 0.0.2.2
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
tiktoken 0.4.0
tokenizers 0.13.3
toml 0.10.2
toolz 0.12.1
torch 2.1.0a0+cxx11.abi
torchaudio 2.1.0a0+cxx11.abi
torchvision 0.16.0a0+cxx11.abi
tornado 6.4
tqdm 4.66.2
traitlets 5.14.2
transformers 4.31.0
typing_extensions 4.11.0
typing-inspect 0.9.0
tzdata 2024.1
tzlocal 5.2
urllib3 2.2.1
validators 0.28.0
watchdog 4.0.0
wcwidth 0.2.13
wheel 0.41.2
yarl 1.9.4
youtube-transcript-api 0.6.0
zipp 3.18.1

Loading with sym_int8 also fails.
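
For what it's worth, the AttributeError is raised because transformers falls back to an interactive trust_remote_code confirmation, and that prompt relies on signal.SIGALRM, which does not exist on Windows. A minimal sketch of one likely workaround, assuming the ChatGLM checkpoint is trusted, is to pass trust_remote_code=True explicitly so the interactive prompt (and the SIGALRM code path) is never reached:

```python
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

chatglm_path = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm2-6b'

# trust_remote_code=True skips the interactive confirmation in transformers,
# which uses signal.SIGALRM and therefore fails on Windows.
model_in_4bit = AutoModel.from_pretrained(pretrained_model_name_or_path=chatglm_path,
                                          load_in_4bit=True,
                                          optimize_model=False,
                                          trust_remote_code=True)
model_in_4bit_gpu = model_in_4bit.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(chatglm_path, trust_remote_code=True)
```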

[WIP] Some problems found during verification

During my verification on a new laptop, I found the following problems and possible improvements:

Chapter2

  • We introduce how to set up the environment; should we also explain how to obtain this tutorial through git before setting up the Jupyter service?

Chapter3

  • When loading the tokenizer of open_llama_3b_v2 in Notebook 3: Quick Start, the warning looks like an error message.

    Should we add legacy=False or just ignore this warning?

Chapter4

4.1 Run Transformer Models

  • The loading time of sym_int8 is considerably longer than that of sym_int4.
    It seems that we only demonstrate how to load in INT8 precision and never actually use this model_in_8bit in the notebook. After running the load_in_low_bit="sym_int8" cell, the available RAM decreases by ~7 GB. Should we change this Python code cell to a markdown cell for demonstration only?

  • I printed the time cost of multi-turn chat; the inference time increases as expected, but should we add some description of this phenomenon during multi-turn chat?

  • When I run multi-turn stream chat, many blank lines appear: #24

4.2 Speech Recognition

#26

  • The links to audio_en.mp3 in 4.2.5 and audio_zh.mp3 in 4.2.6 are broken.
  • WhisperFeatureExtractor is not explicitly used; could we add more description of WhisperFeatureExtractor to the note in 4.2.5?

Chapter5

  • The batch results are a little confusing; could we add separators between the answers?

  • The second chat turn in 5.4.2 takes noticeably longer; could we use a simpler prompt format?

Chapter6

6.1 ChatGLM2-6B

  • Maybe we should also add some description of the PYTHONUNBUFFERED=1 setting in 6.1.4.3.

  • When running the Python code cell in 6.1.5.3, a new cell always appears at the end.

  • Should we also pass history="" or change it to multi-turn style? Otherwise I get the following output when running it twice.

6.2 Baichuan-13B
