intel-analytics / ipex-llm-tutorial

Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm

Home Page: https://github.com/intel-analytics/bigdl

License: Apache License 2.0

Jupyter Notebook 100.00%

ipex-llm-tutorial's People

Contributors

ariadne330, ch1y0q, hxsz1997, jason-dai, jinbridger, lalalapotter, mingyu-wei, novti, oscilloscope98, plusbang, qiuxin2012, sgwhat, shane-huang


ipex-llm-tutorial's Issues

pip install failed

❯ pip install --pre --upgrade ipex-llm[all]
zsh: no matches found: ipex-llm[all]
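
For reference, zsh expands square brackets as glob patterns, so the requirement specifier usually needs to be quoted, e.g. `pip install --pre --upgrade "ipex-llm[all]"`, to avoid the `no matches found` error.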

XPU inference failed

My code is based on bigdl-llm:

```python
from langchain import LLMChain, PromptTemplate
from bigdl.llm.langchain.llms import TransformersLLM
from langchain.memory import ConversationBufferWindowMemory

chatglm3_6b = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b'

llm_model_path = chatglm3_6b  # path to the Hugging Face LLM model

CHATGLM_V3_PROMPT_TEMPLATE = "问:{prompt}\n\n答:"

prompt = PromptTemplate(input_variables=["history", "human_input"], template=CHATGLM_V3_PROMPT_TEMPLATE)
max_new_tokens = 128

llm = TransformersLLM.from_model_id(
    model_id=llm_model_path,
    model_kwargs={"trust_remote_code": True, "temperature": 0},
)

llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    llm_kwargs={"max_new_tokens": max_new_tokens},
    memory=ConversationBufferWindowMemory(k=2),
)

VICUNA_PROMPT_TEMPLATE = "USER: {prompt}\nASSISTANT:"

llm_result = llm.generate([VICUNA_PROMPT_TEMPLATE.format(prompt="讲一个笑话"), VICUNA_PROMPT_TEMPLATE.format(prompt="作一首诗")] * 3)

print("-" * 20 + "number of generations" + "-" * 20)
print(len(llm_result.generations))
print("-" * 20 + "the first generation" + "-" * 20)
print(llm_result.generations[0][0].text)
```

but it returns:

```
Traceback (most recent call last):
File "D:\AI_projects\ipex-samples\main-bigdl.py", line 32, in
llm_result = llm.generate([VICUNA_PROMPT_TEMPLATE.format(prompt="讲一个笑话"), VICUNA_PROMPT_TEMPLATE.format(prompt="作一首诗")]*3)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 741, in generate
output = self._generate_helper(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 605, in _generate_helper
raise e
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 592, in _generate_helper
self._generate(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 1177, in _generate
self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\langchain\llms\transformersllm.py", line 248, in _call
output = self.model.generate(input_ids, streamer=streamer,
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\transformers\generation\utils.py", line 1538, in generate
return self.greedy_search(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\transformers\generation\utils.py", line 2362, in greedy_search
outputs = self(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 941, in forward
transformer_outputs = self.transformer(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 167, in chatglm2_model_forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 641, in forward
layer_ret = layer(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 191, in chatglm2_attention_forward
return forward_function(
File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 377, in chatglm2_attention_forward_8eb45c
query_layer = apply_rotary_pos_emb_chatglm(query_layer, rotary_pos_emb)
NotImplementedError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Could not run 'torch_ipex::mul_add' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torch_ipex::mul_add' is only available for these backends: [XPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

XPU: registered at C:/jenkins/workspace/IPEX-GPU-ARC770-windows/frameworks.ai.pytorch.ipex-gpu/csrc/gpu/aten/operators/TripleOps.cpp:521 [kernel]
BackendSelect: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\FunctionalizeFallbackKernel.cpp:290 [backend fallback]
Named: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\native\NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:53 [backend fallback]
AutogradCPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:57 [backend fallback]
AutogradCUDA: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:65 [backend fallback]
AutogradXLA: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:69 [backend fallback]
AutogradMPS: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:77 [backend fallback]
AutogradXPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:61 [backend fallback]
AutogradHPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:90 [backend fallback]
AutogradLazy: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:73 [backend fallback]
AutogradMeta: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:81 [backend fallback]
Tracer: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\torch\csrc\autograd\TraceTypeManual.cpp:296 [backend fallback]
AutocastCPU: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\autocast_mode.cpp:382 [backend fallback]
AutocastXPU: registered at C:/jenkins/workspace/IPEX-GPU-ARC770-windows/frameworks.ai.pytorch.ipex-gpu/csrc/gpu/aten/operators/TripleOps.cpp:521 [kernel]
AutocastCUDA: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\autocast_mode.cpp:249 [backend fallback]
FuncTorchBatched: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:710 [backend fallback]
FuncTorchVmapMode: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:157 [backend fallback]
```

My pip list is:
Package Version


accelerate 0.21.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.3.0
arxiv 2.1.0
async-timeout 4.0.3
attrs 23.2.0
backoff 2.2.1
beautifulsoup4 4.12.3
bigdl-core-xe-21 2.5.0b20240324
bigdl-llm 2.5.0b20240330
blinker 1.7.0
blis 0.7.11
Brotli 1.1.0
cachetools 5.3.3
catalogue 2.0.10
certifi 2024.2.2
cffi 1.16.0
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
cloudpathlib 0.16.0
colorama 0.4.6
coloredlogs 15.0.1
confection 0.1.4
contourpy 1.2.0
cryptography 42.0.5
cycler 0.12.1
cymem 2.0.8
dataclasses-json 0.6.4
deepdiff 6.7.1
Deprecated 1.2.14
deprecation 2.1.0
distro 1.9.0
duckduckgo-search 3.9.9
effdet 0.4.1
einops 0.7.0
emoji 2.11.0
et-xmlfile 1.1.0
exceptiongroup 1.2.0
faiss-cpu 1.7.4
fastapi 0.109.0
feedparser 6.0.10
filelock 3.13.3
filetype 1.2.0
flatbuffers 24.3.25
fonttools 4.50.0
frozenlist 1.4.1
fschat 0.2.35
fsspec 2024.3.1
gitdb 4.0.11
GitPython 3.1.43
greenlet 3.0.3
h11 0.14.0
h2 4.1.0
hpack 4.0.0
httpcore 1.0.5
httpx 0.26.0
httpx-sse 0.4.0
huggingface-hub 0.22.2
humanfriendly 10.0
hyperframe 6.0.1
idna 3.6
importlib_metadata 7.1.0
importlib_resources 6.4.0
iniconfig 2.0.0
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp 2024.1.0
iopath 0.1.10
Jinja2 3.1.3
joblib 1.3.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
langchain 0.0.354
langchain-community 0.0.20
langchain-core 0.1.23
langchain-experimental 0.0.47
langcodes 3.3.0
langdetect 1.0.9
langsmith 0.0.87
layoutparser 0.3.4
llama-index 0.9.35
lxml 5.2.0
Markdown 3.6
markdown-it-py 3.0.0
markdown2 2.4.13
markdownify 0.11.6
MarkupSafe 2.1.5
marshmallow 3.21.1
matplotlib 3.8.3
mdurl 0.1.2
metaphor-python 0.1.23
mpmath 1.3.0
msg-parser 1.2.0
multidict 6.0.5
murmurhash 1.0.10
mypy-extensions 1.0.0
nest-asyncio 1.6.0
networkx 3.2.1
nh3 0.2.17
nltk 3.8.1
numexpr 2.8.6
numpy 1.26.4
olefile 0.47
omegaconf 2.3.0
onnx 1.16.0
onnxruntime 1.15.1
openai 1.9.0
opencv-python 4.9.0.80
openpyxl 3.1.2
ordered-set 4.1.0
packaging 23.2
pandas 2.0.3
pathlib 1.0.1
pdf2image 1.17.0
pdfminer.six 20231228
pdfplumber 0.11.0
pikepdf 8.4.1
Pillow 9.5.0
pillow_heif 0.15.0
pip 23.3.1
pluggy 1.4.0
portalocker 2.8.2
preshed 3.0.9
prompt-toolkit 3.0.43
protobuf 4.25.3
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyclipper 1.3.0.post5
pycocotools 2.0.7
pycparser 2.22
pydantic 1.10.13
pydantic_core 2.16.3
pydeck 0.8.1b0
Pygments 2.17.2
PyJWT 2.8.0
pylibjpeg-libjpeg 2.1.0
PyMuPDF 1.23.16
PyMuPDFb 1.23.9
pypandoc 1.13
pyparsing 3.1.2
pypdf 4.1.0
pypdfium2 4.28.0
pyreadline3 3.4.1
pytesseract 0.3.10
pytest 7.4.3
python-dateutil 2.9.0.post0
python-decouple 3.8
python-docx 1.1.0
python-iso639 2024.2.7
python-magic 0.4.27
python-magic-bin 0.4.14
python-multipart 0.0.9
python-pptx 0.6.23
pytz 2024.1
pywin32 306
PyYAML 6.0.1
rapidfuzz 3.7.0
rapidocr-onnxruntime 1.3.8
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rpds-py 0.18.0
safetensors 0.4.2
scikit-learn 1.4.1.post1
scipy 1.12.0
sentence-transformers 2.2.2
sentencepiece 0.2.0
setuptools 68.2.2
sgmllib3k 1.0.0
shapely 2.0.3
shortuuid 1.0.13
simplejson 3.19.2
six 1.16.0
smart-open 6.4.0
smmap 5.0.1
sniffio 1.3.1
socksio 1.0.0
soupsieve 2.5
spacy 3.7.2
spacy-legacy 3.0.12
spacy-loggers 1.0.5
SQLAlchemy 2.0.25
srsly 2.4.8
sse-starlette 1.8.2
starlette 0.35.0
streamlit 1.30.0
streamlit-aggrid 0.3.4.post3
streamlit-antd-components 0.3.1
streamlit-chatbox 1.1.11
streamlit-feedback 0.1.3
streamlit-modal 0.1.0
streamlit-option-menu 0.3.12
strsimpy 0.2.1
svgwrite 1.4.3
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
thinc 8.2.3
threadpoolctl 3.4.0
tiktoken 0.5.2
timm 0.9.16
tokenizers 0.13.3
toml 0.10.2
tomli 2.0.1
toolz 0.12.1
torch 2.1.0a0+cxx11.abi
torchaudio 2.1.2
torchvision 0.16.0a0+cxx11.abi
tornado 6.4
tqdm 4.66.1
transformers 4.31.0
transformers-stream-generator 0.0.4
typer 0.9.4
typing_extensions 4.10.0
typing-inspect 0.9.0
tzdata 2024.1
tzlocal 5.2
unstructured 0.12.5
unstructured-client 0.22.0
unstructured-inference 0.7.23
unstructured.pytesseract 0.3.12
urllib3 2.2.1
uvicorn 0.29.0
validators 0.24.0
wasabi 1.1.2
watchdog 3.0.0
wavedrom 2.0.3.post3
wcwidth 0.2.13
weasel 0.3.4
websockets 12.0
wheel 0.41.2
wrapt 1.16.0
xformers 0.0.23.post1
xlrd 2.0.1
XlsxWriter 3.2.0
yarl 1.9.4
youtube-search 2.1.2
zipp 3.18.1

Hope you can help me with this!
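
For reference, the traceback shows the fused operator torch_ipex::mul_add, which is registered only for the XPU backend, being reached while the tensors are still on the CPU. Below is a minimal sketch of the device-placement pattern used in the other reports in this thread; it bypasses the LangChain wrapper, assumes an XPU device is available, and reuses the same local model path as above:

```python
import torch
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b'

# Load with 4-bit optimizations, then move the model to the XPU device.
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True).eval()
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Inputs must live on the same device as the model; otherwise CPU execution
# reaches XPU-only operators such as torch_ipex::mul_add and fails.
input_ids = tokenizer.encode("问:讲一个笑话\n\n答:", return_tensors="pt").to('xpu')

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```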

About the memory problem

Each time I interact with the model, the memory it occupies increases and is never released, so after many conversation turns the model crashes easily. How can I solve this problem?
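
One direction that may help (a minimal sketch, assuming intel_extension_for_pytorch is installed so that the torch.xpu namespace is available): explicitly drop unreferenced tensors and ask the XPU caching allocator to release unused blocks between conversation turns, and keep the chat history passed back to the model bounded so the KV cache does not grow without limit.

```python
import gc
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu namespace


def release_xpu_memory():
    # Drop unreferenced Python objects first, then return cached blocks to the
    # device (analogous to torch.cuda.empty_cache() on CUDA).
    gc.collect()
    torch.xpu.empty_cache()


# Example: call this between conversation turns.
release_xpu_memory()
```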

About the acceleration problem with XPU

After putting the model and inputs on the XPU, the model now works on an Intel laptop, but the inference time is about 588 seconds, which is too long for me. I suspect the GPU is not actually being used; may I ask what the problem is here? Thank you very much for any response.

The following is the code:

```python
import torch
import intel_extension_for_pytorch as ipex

from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
from transformers import AutoTokenizer

import time
import numpy as np

from gpu_benchmark_util import BenchmarkWrapper

# model_path = r"D:\rag\test_api\Baichuan2-7B-Chat"
model_path = r"C:\Users\Administrator\yishuo\chatglm2-6b"

prompt = """ 你是human_prime2,你是一个高级智能实体,你融合了最先进的算法和深度学习网络,专为跨越星际的知识探索与智慧 收集而设计。
你回答以下问题时必须跟哲学相结合,必须在15字内回答完,你会尽量参考知识库来回答。
以下是问题:请介绍钱.
以下是知识库:[{'对话': '什么是"帮费"?', '回复': '"帮费"是为**各库采买物料时,为护送官员以及送部的饭食 银拨配的额外款项。'}, {'对话': '怎么说?', '回复': '如果技术能够复制我们的外貌,它也许能够复制我们的**和感受。'}, {'对话': '你好。', '回复': '嘿,你好!你看起来长得和我可真像啊!'}].
"""
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# For Baichuan2:
# model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, optimize_model=True, load_in_4bit=True).bfloat16().eval()
# For ChatGLM2:
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, optimize_model=True, load_in_4bit=True).eval()

input_ids = tokenizer.encode(prompt, return_tensors="pt")
print("finish to load")

model = model.to('xpu')

# Keep the embedding layer on the CPU.
# For Baichuan2:
# model.model.embed_tokens.to('cpu')
# For ChatGLM2:
model.transformer.embedding.to('cpu')
input_ids = input_ids.to('xpu')

print("finish to xpu")

model = BenchmarkWrapper(model)

with torch.inference_mode():
    # warm up a few times as we use ipex
    for i in range(7):
        st = time.time()
        output = model.generate(input_ids, num_beams=1, do_sample=False, max_new_tokens=32)
        end = time.time()
        print(f'Inference time: {end-st} s')
        output_str = tokenizer.decode(output[0], skip_special_tokens=True)
        print(output_str)
```
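
One thing worth checking before digging further (a minimal sanity check, assuming intel_extension_for_pytorch 2.1 exposes the torch.xpu namespace as in the environment above): confirm that the Arc GPU is actually visible to PyTorch, since an invisible device means generation silently falls back to much slower CPU paths; also note that the first one or two iterations include warm-up and compilation overhead, so only the later timings are representative.

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu namespace

# If this prints False, the model never actually runs on the Arc GPU.
print("XPU available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("Device:", torch.xpu.get_device_name(0))
```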

Some issues in the tutorial (link changed, datasets deprecated, etc.)

  1. BigDL LLM package installation:
    I notice that in some chapters the package installation suggestion is:

    pip install --pre --upgrade bigdl-llm[all]
    

    However, I also see:

    pip install bigdl-llm[all]
    

    in some files. Is it necessary to unify this command?

  2. Links outdated:
    The links at the end of chapter 1 are all outdated. The affected passage from chapter 1 reads:

We have already verified many models on BigDL-LLM and provided ready-to-run examples, such as [Llama](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4), 
[Llama2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/llama2), 
[Vicuna](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/vicuna), 
[ChatGLM](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/chatglm),
[ChatGLM2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/chatglm2), 
[Baichuan](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/baichuan),
[MOSS](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/moss), 
[Falcon](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/falcon),
[Dolly-v1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/dolly_v1),
[Dolly-v2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/dolly_v2),
StarCoder([link1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4),
[link2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/starcoder)),
Phoenix([link1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4),
[link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/phoenix)),
RedPajama([link1](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4),
[link2](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/redpajama)),
[Whisper](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4/whisper), etc. You can find model examples [here](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4).

All the links above have already been changed, so they all lead to 404 Page Not Found.
For example, the current address of Llama2 in the tutorial is: https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4, but the folder structure has been updated and it should be https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2 now.

  3. load_in_low_bit options outdated:
    The load_in_low_bit options shown in chapters 5/6 are outdated (see the sketch after this list).

    The latest version of bigdl-llm 2.4.0 supports sym_int4, asym_int4, sym_int5, asym_int5, sym_int8, nf3, nf4, fp4, fp8, fp16 or mixed_4bit.
    For GPU in chapter 6, the supported options are: sym_int4, asym_int4, sym_int5, asym_int5, sym_int8, nf3, nf4, fp4, fp8, fp16, mixed_fp4 or mixed_fp8.

  4. Sample audio files in chapter 5.2 deprecated:

    The Common Voice dataset is deprecated and, according to its Hugging Face page, will be deleted soon. As for the audio files (audio_en.mp3/audio_zh.mp3) downloaded via the wget command, they have already been removed from Hugging Face; using them leads to an EOF error when running the sample code in this section.

  5. GPU Acceleration Environment Setup:

    The command source /opt/intel/oneapi/setvars.sh is listed as a recommendation for Intel GPU acceleration. However, based on my own knowledge and experience, this command is mandatory and should be run whenever a new terminal session is created; otherwise we may encounter the error OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory.
    This is not exactly an error, but I believe it would be better to highlight this command in the tutorial, either in README.md or in 6_1_GPU_Llama2-7B.md.
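
For point 3 above, a minimal sketch of how the newer low-bit options are selected (the model id below is only a placeholder; substitute any local path or Hugging Face id, and note that the exact option names depend on the installed bigdl-llm release):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# Hypothetical example: pick one of the supported low-bit formats via
# load_in_low_bit instead of the boolean load_in_4bit flag.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id / local path
    load_in_low_bit="nf4",            # e.g. "sym_int4", "sym_int8", "fp8", ...
    trust_remote_code=True,
)
```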

About the install problem

I am not sure what is going wrong here; I followed the BigDL guide exactly. Thank you very much for any response.
Please help me.

GPU acceleration failed

I used the code from here:
https://github.com/intel-analytics/ipex-llm-tutorial/blob/original-bigdl-llm/Chinese_Version/ch_6_GPU_Acceleration/6_1_GPU_Llama2-7B.md

But it failed. Can you help with this?
Thanks.

```python
from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
from transformers import LlamaTokenizer, AutoTokenizer

chatglm3_6b = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm2-6b'

model_in_4bit = AutoModel.from_pretrained(pretrained_model_name_or_path=chatglm3_6b,
                                          load_in_4bit=True,
                                          optimize_model=False)
model_in_4bit_gpu = model_in_4bit.to('xpu')

# Note that AutoModelForCausalLM here is imported from bigdl.llm.transformers
model_in_8bit = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=chatglm3_6b,
    load_in_low_bit="sym_int8",
    optimize_model=False
)

model_in_8bit_gpu = model_in_8bit.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=chatglm3_6b)
```

The error shows:

```
(llm_310_whl) D:\AI_projects\ipex-samples>python main-test.py
C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-07 00:20:03,696 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
  File "D:\AI_projects\ipex-samples\main-test.py", line 6, in <module>
    model_in_4bit = AutoModel.from_pretrained(pretrained_model_name_or_path=chatglm3_6b,
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\bigdl\llm\transformers\model.py", line 320, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\bigdl\llm\transformers\model.py", line 434, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\transformers\models\auto\auto_factory.py", line 461, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\transformers\models\auto\configuration_auto.py", line 986, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "C:\ProgramData\anaconda3\envs\llm_310_whl\lib\site-packages\transformers\dynamic_module_utils.py", line 535, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?
```

And the pip list is:
accelerate 0.21.0
aiohttp 3.9.3
aiosignal 1.3.1
altair 4.2.2
annotated-types 0.6.0
astor 0.8.1
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
bigdl-core-xe-21 2.5.0b20240324
bigdl-llm 2.5.0b20240406
blinker 1.7.0
cachetools 5.3.3
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
contourpy 1.2.1
cryptography 42.0.5
cycler 0.12.1
dataclasses-json 0.5.14
decorator 5.1.1
entrypoints 0.4
exceptiongroup 1.2.0
executing 2.0.1
faiss-cpu 1.8.0
filelock 3.13.3
fonttools 4.51.0
frozenlist 1.4.1
fsspec 2024.3.1
gitdb 4.0.11
GitPython 3.1.43
google-ai-generativelanguage 0.2.0
google-api-core 2.18.0
google-auth 2.29.0
google-generativeai 0.1.0
googleapis-common-protos 1.63.0
greenlet 3.0.3
grpcio 1.62.1
grpcio-status 1.48.2
huggingface-hub 0.22.2
idna 3.6
importlib_metadata 7.1.0
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp 2024.1.0
ipython 8.23.0
jedi 0.19.1
Jinja2 3.1.3
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
langchain 0.0.180
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.21.1
matplotlib 3.8.4
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
mypy-extensions 1.0.0
networkx 3.3
numexpr 2.10.0
numpy 1.26.4
openai 0.27.7
openapi-schema-pydantic 1.2.4
packaging 24.0
pandas 2.2.1
pandasai 0.2.15
parso 0.8.4
pdfminer.six 20231228
pdfplumber 0.11.0
pillow 10.3.0
pip 23.3.1
prompt-toolkit 3.0.43
proto-plus 1.23.0
protobuf 3.20.3
psutil 5.9.8
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.22
pydantic 1.10.15
pydantic_core 2.16.3
pydeck 0.8.1b0
Pygments 2.17.2
Pympler 1.0.1
pyparsing 3.1.2
pypdf 3.9.0
pypdfium2 4.28.0
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
pytz 2024.1
PyYAML 6.0.1
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rpds-py 0.18.0
rsa 4.9
safetensors 0.4.2
sentencepiece 0.2.0
setuptools 68.2.2
six 1.16.0
smmap 5.0.1
SQLAlchemy 2.0.29
stack-data 0.6.3
streamlit 1.22.0
streamlit-chat 0.0.2.2
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
tiktoken 0.4.0
tokenizers 0.13.3
toml 0.10.2
toolz 0.12.1
torch 2.1.0a0+cxx11.abi
torchaudio 2.1.0a0+cxx11.abi
torchvision 0.16.0a0+cxx11.abi
tornado 6.4
tqdm 4.66.2
traitlets 5.14.2
transformers 4.31.0
typing_extensions 4.11.0
typing-inspect 0.9.0
tzdata 2024.1
tzlocal 5.2
urllib3 2.2.1
validators 0.28.0
watchdog 4.0.0
wcwidth 0.2.13
wheel 0.41.2
yarl 1.9.4
youtube-transcript-api 0.6.0
zipp 3.18.1

Loading with sym_int8 also fails.
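
For what it's worth, the AttributeError is raised because transformers falls back to an interactive trust_remote_code confirmation, and that prompt relies on signal.SIGALRM, which does not exist on Windows. A minimal sketch of one likely workaround, assuming the ChatGLM checkpoint is trusted, is to pass trust_remote_code=True explicitly so the interactive prompt (and the SIGALRM code path) is never reached:

```python
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

chatglm_path = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm2-6b'

# trust_remote_code=True skips the interactive confirmation in transformers,
# which uses signal.SIGALRM and therefore fails on Windows.
model_in_4bit = AutoModel.from_pretrained(pretrained_model_name_or_path=chatglm_path,
                                          load_in_4bit=True,
                                          optimize_model=False,
                                          trust_remote_code=True)
model_in_4bit_gpu = model_in_4bit.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(chatglm_path, trust_remote_code=True)
```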

[WIP] Some problems found during verification

During my verification on a new laptop, I found the following problems and possible improvements:

Chapter2

  • We introduce how to set up the environment; should we also explain how to obtain this tutorial through git before setting up the Jupyter service?

Chapter3

  • When loading the tokenizer of open_llama_3b_v2 in Notebook 3: Quick Start, the warning looks like an error message.

    Should we add legacy=False or just ignore this warning?

Chapter4

4.1 Run Transformer Models

  • The loading time of sym_int8 is considerably longer than that of sym_int4.
    It seems that we only demonstrate how to load in INT8 precision and never actually use this model_in_8bit in the notebook. After running the load_in_low_bit="sym_int8" cell, the available RAM decreases by ~7 GB. Should we change this Python code cell to a markdown cell for demonstration only?

  • I printed the time cost of multi-turn chat; the inference time increases as expected, but should we add some description of this phenomenon during multi-turn chat?

  • When I run multi-turn stream chat, many blank lines appear: #24

4.2 Speech Recognition

#26

  • The links to audio_en.mp3 in 4.2.5 and audio_zh.mp3 in 4.2.6 are broken.
  • WhisperFeatureExtractor is not explicitly used; could we add more description of WhisperFeatureExtractor to the note in 4.2.5?

Chapter5

  • The batch results are a little confusing; could we add separators between the answers?

  • The second chat turn in 5.4.2 takes noticeably longer; could we use a simpler prompt format?

Chapter6

6.1 ChatGLM2-6B

  • Maybe we should also add some description of the PYTHONUNBUFFERED=1 setting in 6.1.4.3.

  • When running the Python code cell in 6.1.5.3, a new cell always appears at the end.

  • Should we also pass history="" or change it to multi-turn style? Otherwise I get the following output when running it twice.

6.2 Baichuan-13B
