Comments (3)
@sergachev any thoughts?
from xla.
Sorry for the late reply.
I am verifying case by case in this model whether fusion merging decisions are well taken. Because the current approach optimizes only for the execution time and not the amount of memory allocation, sometimes the decisions to prevent fusion that make execution faster may lead to much larger temporary allocations. For instance, fusion.1594 from the bad example above writes 2.6 GB, but can potentially be merged with its consumers. We might need to add a way to trade performance for memory utilization. As a way to experiment with this I can propose to steer the model manually towards more fusion by multiplying the unfused time here https://github.com/openxla/xla/blob/main/xla/service/gpu/gpu_performance_model.cc#L210 by 1.1x ... 2x - this should give a smooth control over that performance <-> memory utilization balance.
It can also be that some of the fusion merging decisions are wrong, then improving the performance model will help.
Thanks for the info. Increasing the scale factor of time_unfused did decrease the memory usage:
```cpp
GpuPerformanceModel::RunTimes t = GpuPerformanceModel::EstimateRunTimes(
    producer, &*cost_analysis_, gpu_device_info_, producer->users(),
    /*multi_output=*/false);
// scale > 1.0 biases the heuristic toward fusing: fusion is rejected only
// when it is more than `scale` times slower than staying unfused.
if (t.time_fused > scale * t.time_unfused) {
  ++num_fail_slower_if_fused_;
  return "will execute slower if fused";
}
```
| scale | Total bytes used |
|---|---|
| 1.0 | 54.55 GiB |
| 1.5 | 47.75 GiB |
| 2.0 | 34.37 GiB |
| inf | 34.04 GiB |
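The scaled comparison can be tried out in isolation. Below is a minimal standalone sketch of the heuristic; `RunTimes` and `RejectFusion` here are simplified stand-ins for illustration, not XLA's actual types or API:

```cpp
// Simplified stand-in for GpuPerformanceModel::RunTimes (hypothetical).
struct RunTimes {
  double time_unfused;  // estimated runtime if the producer stays unfused
  double time_fused;    // estimated runtime if the producer is fused
};

// Returns true when fusion should be rejected under the scaled heuristic.
// scale = 1.0 reproduces the original check ("fuse unless strictly slower");
// scale > 1.0 tolerates some slowdown from fusing, trading runtime for
// smaller temporary allocations; scale = infinity never rejects on speed.
bool RejectFusion(const RunTimes& t, double scale) {
  return t.time_fused > scale * t.time_unfused;
}
```

With scale = 1.0 a fusion that is 1.5x slower is rejected; with scale = 2.0 the same fusion is accepted, which matches the memory reductions in the table above.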