Comments (6)
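Please ask your question
stream_safe_custom_device_allocator calls MarkAsWillBeFreed before it releases device memory.
If the stream that owns the current allocation has no event bound to it, this function creates a new event and records it on that stream. (The update of the will_be_freed_ flag inside MarkAsWillBeFreed also looks wrong: will_be_freed_ always stays false.)
What I don't understand is what the call to MarkAsWillBeFreed is for, and whether it is necessary. As I understand it, it is enough to make sure that all existing events have completed before the memory is released. Could adding a new event hurt performance or even cause functional errors? In the XPU stream safe allocator, CanBeFreed queries the status of all events before release to decide whether the memory can be freed directly. The GPU stream safe allocator's CanBeFreed does record on streams, but only on the streams in graph_capturing_stream_set_, and that set seems to be empty in the usual case?
In practice, with the custom device stream safe allocator enabled, I hit two bugs:
1. In a multi-process scenario, when the CPU allocator and the custom device allocator both release storage at about the same time (custom device slightly earlier than CPU), the event that the CPU allocator recorded through MarkAsWillBeFreed is null (my guess is that it was deleted when the custom device allocator called CanBeFreed), so the later query in CanBeFreed raises an error.
2. A model that runs fine without the stream safe allocator may, once it is enabled, fail to release device memory in time because of the extra event added by MarkAsWillBeFreed, and then run out of memory (OOM).
Both bugs went away after I removed the call to MarkAsWillBeFreed. So I would like to ask the PaddlePaddle maintainers: can the call to MarkAsWillBeFreed be removed?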
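To make the second bug easier to picture, here is a minimal toy model of the free path described above. It is only a sketch under stated assumptions: FakeStream, FakeEvent and ToyAllocation are hypothetical stand-ins rather than Paddle types, and events are simulated with plain counters instead of a real device runtime. Only the names MarkAsWillBeFreed, CanBeFreed, RecordStream and outstanding_event_map_ are taken from the issue.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a device stream (not a Paddle type).
struct FakeStream {
  int submitted = 0;  // work items enqueued on the stream so far
  int completed = 0;  // work items the device has finished
};

// Hypothetical stand-in for a device event (not a Paddle type).
struct FakeEvent {
  const FakeStream* stream = nullptr;
  int ticket = 0;  // value of `submitted` at the time of Record()

  void Record(const FakeStream& s) {
    stream = &s;
    ticket = s.submitted;
  }

  // Complete once the stream has finished everything enqueued before Record().
  bool Query() const {
    assert(stream != nullptr);  // querying an unrecorded event is an error
    return stream->completed >= ticket;
  }
};

// Toy model of an allocation tracked by a stream safe allocator.
class ToyAllocation {
 public:
  explicit ToyAllocation(FakeStream* owning) : owning_stream_(owning) {}

  // Another stream uses this allocation: remember an event for it.
  // Assumption: the owning stream itself is not tracked here.
  void RecordStream(FakeStream* s) {
    if (s == owning_stream_) return;
    outstanding_event_map_[s].Record(*s);
  }

  // Behaviour described in the issue: if the owning stream has no event
  // yet, create one and record it on that stream.
  void MarkAsWillBeFreed() {
    if (outstanding_event_map_.count(owning_stream_) == 0) {
      outstanding_event_map_[owning_stream_].Record(*owning_stream_);
    }
  }

  // Free only when every outstanding event has completed; completed
  // events are dropped from the map as they are seen.
  bool CanBeFreed() {
    for (auto it = outstanding_event_map_.begin();
         it != outstanding_event_map_.end();) {
      if (!it->second.Query()) return false;
      it = outstanding_event_map_.erase(it);
    }
    return true;
  }

 private:
  FakeStream* owning_stream_;
  std::map<FakeStream*, FakeEvent> outstanding_event_map_;
};
```

In this model, an allocation with no outstanding events is freed immediately if MarkAsWillBeFreed is skipped. If it is called while the owning stream still has pending work, CanBeFreed returns false until that work drains, which is one way the delayed release in bug 2 could arise.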
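Hello. The custom stream safe allocator implements the non-CUDA-graph case, i.e. the case where phi::backends::gpu::CUDAGraph::IsThisThreadCapturing() is false. MarkAsWillBeFreed can indeed be removed, and for a stream that is already in outstanding_event_map_ the corresponding event should be reused as well. We will open a PR to fix this.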
PR #63369 fixes this.
@ronny1996 Could the author of stream_safe_custom_device_allocator please help answer this? Thanks :)
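I have one more question: StreamSafeCustomDeviceAllocation::RecordStream only records an event for streams that are not yet in outstanding_event_map_. For a stream that is already in outstanding_event_map_, shouldn't it reuse the corresponding event and record it again?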
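The RecordStream question above can be illustrated with a small sketch. This is not the Paddle implementation or the change in the PR: SketchStream, SketchEvent and ReusingAllocation are hypothetical names, and events are simulated with counters. It only shows the idea of re-recording an existing event for a stream that is already in outstanding_event_map_, instead of skipping the record.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for a device stream (not a Paddle type).
struct SketchStream {
  int submitted = 0;  // work items enqueued so far
  int completed = 0;  // work items finished so far
};

// Hypothetical stand-in for a device event (not a Paddle type).
struct SketchEvent {
  int ticket = 0;  // value of `submitted` at the last Record()
  void Record(const SketchStream& s) { ticket = s.submitted; }
  bool Query(const SketchStream& s) const { return s.completed >= ticket; }
};

class ReusingAllocation {
 public:
  // Returns true if a new event was created, false if an existing
  // event for this stream was reused and re-recorded.
  bool RecordStream(SketchStream* s) {
    assert(s != nullptr);
    auto it = outstanding_event_map_.find(s);
    if (it == outstanding_event_map_.end()) {
      SketchEvent e;
      e.Record(*s);
      outstanding_event_map_.emplace(s, e);
      return true;
    }
    // Stream already tracked: move its event forward so that it also
    // covers the work enqueued since the previous record.
    it->second.Record(*s);
    return false;
  }

  // The allocation may be freed only when every tracked event is done.
  bool CanBeFreed() const {
    for (const auto& kv : outstanding_event_map_) {
      if (!kv.second.Query(*kv.first)) return false;
    }
    return true;
  }

  std::size_t EventCount() const { return outstanding_event_map_.size(); }

 private:
  std::map<SketchStream*, SketchEvent> outstanding_event_map_;
};
```

If the second call simply skipped streams already present in the map, the stored event would only cover the earlier work, so in this model CanBeFreed could return true while later work on the same stream was still using the memory. Re-recording the same event keeps one event per stream and avoids that gap.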
Thanks for the reply, my question has been resolved :)
Thank you! This issue can be closed.