Comments (3)
@sergachev any thoughts?
from xla.
Sorry for the late reply.
I am verifying case by case in this model whether fusion merging decisions are well taken. Because the current approach optimizes only for the execution time and not the amount of memory allocation, sometimes the decisions to prevent fusion that make execution faster may lead to much larger temporary allocations. For instance, fusion.1594 from the bad example above writes 2.6 GB, but can potentially be merged with its consumers. We might need to add a way to trade performance for memory utilization. As a way to experiment with this I can propose to steer the model manually towards more fusion by multiplying the unfused time here https://github.com/openxla/xla/blob/main/xla/service/gpu/gpu_performance_model.cc#L210 by 1.1x ... 2x - this should give a smooth control over that performance <-> memory utilization balance.
It can also be that some of the fusion merging decisions are wrong, then improving the performance model will help.
Thanks for the info. Increasing the scale factor of time_unfused did decrease the memory usage:
```cpp
GpuPerformanceModel::RunTimes t = GpuPerformanceModel::EstimateRunTimes(
    producer, &*cost_analysis_, gpu_device_info_, producer->users(),
    /*multi_output=*/false);
// scale > 1.0 biases the heuristic toward fusing: fusion is rejected only
// when it is more than `scale` times slower than staying unfused.
if (t.time_fused > scale * t.time_unfused) {
  ++num_fail_slower_if_fused_;
  return "will execute slower if fused";
}
```
| scale | Total bytes used |
|---|---|
| 1.0 | 54.55 GiB |
| 1.5 | 47.75 GiB |
| 2.0 | 34.37 GiB |
| inf | 34.04 GiB |
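The scaled comparison can be tried out in isolation. Below is a minimal standalone sketch of the heuristic; `RunTimes` and `RejectFusion` here are simplified stand-ins for illustration, not XLA's actual types or API:

```cpp
// Simplified stand-in for GpuPerformanceModel::RunTimes (hypothetical).
struct RunTimes {
  double time_unfused;  // estimated runtime if the producer stays unfused
  double time_fused;    // estimated runtime if the producer is fused
};

// Returns true when fusion should be rejected under the scaled heuristic.
// scale = 1.0 reproduces the original check ("fuse unless strictly slower");
// scale > 1.0 tolerates some slowdown from fusing, trading runtime for
// smaller temporary allocations; scale = infinity never rejects on speed.
bool RejectFusion(const RunTimes& t, double scale) {
  return t.time_fused > scale * t.time_unfused;
}
```

With scale = 1.0 a fusion that is 1.5x slower is rejected; with scale = 2.0 the same fusion is accepted, which matches the memory reductions in the table above.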