Comments (5)
ctx.save_for_backward(weight)
ctx.x = x
ctx.bias = bias
Why is this split up here? Could you try writing it like this instead?
ctx.save_for_backward(x, weight, bias)
x, weight, bias = ctx.saved_tensor()
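@GuoxiaWang Because the point of this issue is that, with recompute enabled, writing ctx.save_for_backward(weight) leaves the weight in backward without main_grad.
The form you suggest is what fused_layers.py originally used. I tested it as well: with recompute=full enabled it hits the strange error below, which would need a separate issue.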
outputs = model(**inputs)
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/meta_parallel/meta_parallel_base.py", line 37, in forward
output = self._layers(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1913, in forward
outputs = self.llama(
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1664, in forward
layer_outputs = self.recompute_training_full(
File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1535, in recompute_training_full
hidden_states = self.recompute_func(
File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/utils/__init__.py", line 142, in recompute
return fleet.recompute.recompute(function, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/recompute/recompute.py", line 532, in recompute
return _recompute_without_reentrant(function, preserve, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/recompute/recompute.py", line 399, in _recompute_without_reentrant
outputs = function(*args, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1531, in custom_forward
return module(*inputs)
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1228, in forward
outputs = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 901, in forward
query_states = self.q_proj(hidden_states)
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
return self.forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/layers/mpu/mp_layers.py", line 516, in forward
output_parallel = self.linear(
File "/workspace/PaddleNLP/llm/fused_layers.py", line 36, in forward
ctx.save_for_backward(x, weight, bias)
File "/usr/local/lib/python3.10/dist-packages/paddle/autograd/py_layer.py", line 91, in save_for_backward
self.container = tensors
ValueError: (InvalidArgument) save_for_backward only support Tensor, list of Tensor, tuple of Tensor. (at /opt/paddle/paddle/paddle/fluid/pybind/eager_py_layer.cc:644)
Update: setting reentrant=True for recompute avoids this bug. It only occurs with reentrant = False.
> Update: setting reentrant=True for recompute avoids this bug. It only occurs with reentrant = False.

@Xreki Could you please ask someone on the Paddle side who is familiar with recompute to take a look?
![image](https://github.com/PaddlePaddle/PaddleNLP/assets/12538138/84258d77-048e-41a2-9641-6d7a303ba6bf)
@Wong4j This is actually a known issue with reentrant=False.