GithubHelp home page GithubHelp logo

Comments (5)

GuoxiaWang avatar GuoxiaWang commented on May 31, 2024
        ctx.save_for_backward(weight)
        ctx.x = x
        ctx.bias = bias

这里为什么是拆开写的?试试下面的写法?

ctx.save_for_backward(x, weight, bias) 
x, weight, bias = ctx.saved_tensor()

from paddlenlp.

Wong4j avatar Wong4j commented on May 31, 2024

@GuoxiaWang 因为这个issue里面我关心的重点是:开启recompute的时候ctx.save_for_backward(weight)这种写法会遇到backward中的weight没有main_grad的问题。

你说的这种写法是fused_layer.py中原本的写法,我也测试过,开启recompute=full后会遇到下面这个奇怪的错误,这就需要开另外一个issue了。

    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/meta_parallel/meta_parallel_base.py", line 37, in forward
    output = self._layers(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1913, in forward
    outputs = self.llama(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1664, in forward
    layer_outputs = self.recompute_training_full(
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1535, in recompute_training_full
    hidden_states = self.recompute_func(
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/utils/__init__.py", line 142, in recompute
    return fleet.recompute.recompute(function, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/recompute/recompute.py", line 532, in recompute
    return _recompute_without_reentrant(function, preserve, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/recompute/recompute.py", line 399, in _recompute_without_reentrant
    outputs = function(*args, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1531, in custom_forward
    return module(*inputs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1228, in forward
    outputs = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 901, in forward
    query_states = self.q_proj(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/layers/mpu/mp_layers.py", line 516, in forward
    output_parallel = self.linear(
  File "/workspace/PaddleNLP/llm/fused_layers.py", line 36, in forward
    ctx.save_for_backward(x, weight, bias)
  File "/usr/local/lib/python3.10/dist-packages/paddle/autograd/py_layer.py", line 91, in save_for_backward
    self.container = tensors
ValueError: (InvalidArgument) save_for_backward only support Tensor, list of Tensor, tuple of Tensor. (at /opt/paddle/paddle/paddle/fluid/pybind/eager_py_layer.cc:644)

from paddlenlp.

Wong4j avatar Wong4j commented on May 31, 2024

更新一下,recompute设置reentrant=True,可以避开这个bug。仅reentrant = False会遇到这个bug。

from paddlenlp.

Wong4j avatar Wong4j commented on May 31, 2024

更新一下,recompute设置reentrant=True,可以避开这个bug。仅reentrant = False会遇到这个bug。

@Xreki 麻烦帮忙找Paddle这边熟悉recompute的同学看一下

from paddlenlp.

Xreki avatar Xreki commented on May 31, 2024

![image](https://github.com@Wong4j PaddlePaddle/PaddleNLP/assets/12538138/84258d77-048e-41a2-9641-6d7a303ba6bf)

@Wong4j 这个倒是reentrant=False时的已知问题

from paddlenlp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.