
Comments (8)

leaves-zwx commented on August 17, 2024

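DDP behaves normally but data parallelism does not. That pattern strongly suggests redundant boxing, so start by checking which boxings appear in the boxing log. DDP has only one allreduce boxing, right before the model update, and data parallelism should in theory be the same.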
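
The expectation that data parallelism needs only one allreduce can be sketched without any framework. This assumes a toy linear model with a summed squared-error loss (not Swin). Each rank computes the gradient on its S(0) shard of the batch, and a single simulated all-reduce of those per-rank gradients reproduces the full-batch gradient. All function names here are hypothetical helpers, not OneFlow APIs.

```python
from typing import List, Sequence


def grad_sse(w: int, xs: Sequence[int], ys: Sequence[int]) -> int:
    """Gradient of sum((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys))


def split_s0(data: Sequence[int], world_size: int) -> List[List[int]]:
    """S(0): split a batch along dim 0 into contiguous shards, one per rank.
    Assumes len(data) is divisible by world_size."""
    step = len(data) // world_size
    return [list(data[r * step:(r + 1) * step]) for r in range(world_size)]


def all_reduce_sum(values: Sequence[int]) -> List[int]:
    """Simulated all-reduce: every rank receives the sum over all ranks."""
    total = sum(values)
    return [total] * len(values)


world_size = 8
w = 1
xs = list(range(32))          # a batch of 32 toy samples
ys = [2 * x + 1 for x in xs]

# Each rank sees only its own shard, so its gradient is a partial sum.
local_grads = [
    grad_sse(w, xs_r, ys_r)
    for xs_r, ys_r in zip(split_s0(xs, world_size), split_s0(ys, world_size))
]
# One all-reduce before the update gives every rank the full-batch gradient.
synced = all_reduce_sum(local_grads)
full_batch_grad = grad_sse(w, xs, ys)
assert synced == [full_batch_grad] * world_size
```

With a mean-reduced loss, the per-rank gradients would have to be averaged (divided by the world size) instead of summed. The sum reduction keeps the integer arithmetic exact.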

from libai.

strint commented on August 17, 2024

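graph.debug(1) prints the memory usage inside the graph on GPU 0 by default.

After the graph has run once, print(graph) shows the ops created inside the graph, along with the SBP of their inputs and outputs.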


Ldpe2G commented on August 17, 2024

I found quite a few P->S boxings in oneflow.INFO.LOG.
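
The P->S entries can be made concrete. Under SBP semantics, a tensor with sbp P (partial sum) has a logical value equal to the elementwise sum of the per-rank local tensors, and S(0) means each rank holds one slice along dim 0. A P->S(0) conversion therefore amounts to a reduce-scatter. The pure-Python simulation below only illustrates the data movement. It is not how OneFlow implements boxing, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # rows correspond to dim 0


def elementwise_sum(tensors: List[Matrix]) -> Matrix:
    """Logical value of a P tensor: the elementwise sum of all local pieces."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*tensors)]


def p_to_s0(partials: List[Matrix]) -> List[Matrix]:
    """Simulated P -> S(0) boxing, i.e. a reduce-scatter along dim 0.

    partials[r] is rank r's local tensor. Rank r ends up with the r-th contiguous
    block of rows of the summed (logical) tensor. For clarity this simulation
    materializes the whole sum; a real reduce-scatter does not need to.
    Assumes the row count is divisible by the number of ranks."""
    world = len(partials)
    logical = elementwise_sum(partials)
    rows = len(logical) // world
    return [logical[r * rows:(r + 1) * rows] for r in range(world)]


# Two ranks, each holding a 4x2 partial sum.
partials = [
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    [[10, 20], [30, 40], [50, 60], [70, 80]],
]
shards = p_to_s0(partials)
```

Here rank 0 ends up with the first two rows of the summed tensor and rank 1 with the last two.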

I set graph.debug(1) but saw no memory usage printed. print(graph) shows that the ops inside all have broadcast SBP, but I don't see the SBP of the inputs and outputs either.

Ldpe2G commented on August 17, 2024

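That information only appears after the first run. Earlier I had printed the graph before running it. Now I can see that the inputs of some intermediate layers are P, while the network's input and output are both S(0).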

Ldpe2G commented on August 17, 2024

Boxings other than P2B that appear in Swin under data parallelism:

P2S

Swin contains a WindowAttention module. The comments added to its forward function below explain how the SBP changes at each step:

class WindowAttention(nn.Module):
    # ......
    def forward(self, x, mask):
        B_, N, C = x.shape
        # x.sbp = S(0)
        qkv = (
            self.qkv(x)  # S(0)
            .reshape(B_, N, 3, self.num_heads, C // self.num_heads)  # S(0)
            .permute(2, 0, 3, 1, 4)  # S(1)
        )
        # qkv.sbp = S(1)
        q, k, v = qkv[0], qkv[1], qkv[2]
        # q.sbp = P, k.sbp = P, v.sbp = P
        # The S(1) tensor becomes P after the slice op
        q = q * self.scale
        # q.sbp = P
        attn = flow.matmul(q, k, transpose_b=True)
        # attn.sbp = P
        relative_position_bias = self.relative_position_bias_table[
            self.relative_position_index.view(-1)
        ].view(
            self.window_size[0] * self.window_size[1],
            self.window_size[0] * self.window_size[1],
            -1,
        )
        # relative_position_bias.sbp = B
        relative_position_bias = relative_position_bias.permute(
            2, 0, 1
        ).contiguous()  # nH, Wh*Ww, Wh*Ww
        unsqueeze_relative_position_bias = relative_position_bias.unsqueeze(0)
        # unsqueeze_relative_position_bias.sbp = B

        # attn.sbp = P
        # unsqueeze_relative_position_bias.sbp = B
        # SBP deduction for this add requires attn to be S(0),
        # so a P2S boxing is inserted
        attn = attn + unsqueeze_relative_position_bias
        # Result: attn.sbp = S(0)

        attn = self.softmax(attn)
        # Result: attn.sbp = S(0)

        attn = self.attn_drop(attn)
        # Result: attn.sbp = S(0)

        # batch_matmul: deduction requires both attn and v to be S(0),
        # so one more P2S boxing is needed to convert v from P to S(0)
        x = flow.matmul(attn, v).transpose(1, 2).reshape(B_, N, C)
        # x.sbp = S(0)

        x = self.proj(x)
        x = self.proj_drop(x)
        # x.sbp = S(0)
        return x
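
The reasoning in the comment on the add op can be checked numerically. With attn as P and the bias as B, adding locally on every rank counts the bias once per rank, so P + B cannot be computed without communication. Converting attn to S(0) first (P->S) makes S(0) + B a purely local broadcasting add. The toy sketch below assumes two ranks and small integer lists standing in for tensors, with rows as the batch dimension. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # rows = batch dimension


def elementwise_sum(tensors: List[Matrix]) -> Matrix:
    """Logical value of a P tensor: the sum of the per-rank local pieces."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*tensors)]


def add_bias(rows: Matrix, bias: List[int]) -> Matrix:
    """Broadcasting add: the same bias row is added to every batch row."""
    return [[a + b for a, b in zip(row, bias)] for row in rows]


def p_to_s0(partials: List[Matrix]) -> List[Matrix]:
    """Simulated P -> S(0) boxing (reduce-scatter along the batch dim)."""
    logical = elementwise_sum(partials)
    step = len(logical) // len(partials)
    return [logical[r * step:(r + 1) * step] for r in range(len(partials))]


attn_partials = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]  # attn is P on 2 ranks
bias = [100, 200]                                          # bias is B: full copy on every rank

expected = add_bias(elementwise_sum(attn_partials), bias)

# Naive P + B: every rank adds the full bias, so the sum counts it twice.
naive = elementwise_sum([add_bias(p, bias) for p in attn_partials])

# P -> S(0) boxing first, then S(0) + B -> S(0) needs no communication.
shards = [add_bias(s, bias) for s in p_to_s0(attn_partials)]
boxed = [row for shard in shards for row in shard]

assert naive != expected
assert boxed == expected
```

The same argument applies to the batch matmul that follows: once attn is S(0), v must also be split along the batch dimension for each rank to compute its part locally, which is why v needs its own P->S boxing.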

S2S

S2B

(Send)2(Recv)

chengtbf commented on August 17, 2024

Quoting: "The S(1) tensor becomes P after the slice op"

Which OneFlow commit are you using? This was fixed in:

It was merged yesterday.
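
The slicing issue comes down to shapes. In the WindowAttention excerpt, qkv has shape (3, B_, num_heads, N, C // num_heads) after the permute, and S(1) means the batch dimension B_ is split across ranks. Indexing qkv[0] drops dim 0 on every rank, so the local results should together form S(0) of q (batch is now dim 0), not P. The sketch below checks this argument with a 2-D nested-list stand-in for qkv. It is an illustration under that simplification, not OneFlow code, and the helper names are hypothetical.

```python
from typing import List


def split_dim1(t: List[list], world: int) -> List[List[list]]:
    """S(1) on a 2-D nested list: rank r keeps columns [r*step, (r+1)*step) of every row."""
    step = len(t[0]) // world  # assumes even division
    return [[row[r * step:(r + 1) * step] for row in t] for r in range(world)]


def split_dim0(t: list, world: int) -> List[list]:
    """S(0): rank r keeps the r-th contiguous block along dim 0."""
    step = len(t) // world
    return [t[r * step:(r + 1) * step] for r in range(world)]


world = 2
# Logical qkv of shape (3, 4): dim 0 picks q/k/v, dim 1 stands in for the batch dim B_.
qkv = [[f"{name}{b}" for b in range(4)] for name in "qkv"]

local_qkv = split_dim1(qkv, world)           # S(1): each rank holds a (3, 2) piece
local_q = [piece[0] for piece in local_qkv]  # qkv[0] evaluated locally on each rank

assert local_q == split_dim0(qkv[0], world)  # the local results form S(0) of q
```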

Ldpe2G commented on August 17, 2024

I was using the build from the 29th. I'll update OneFlow and test again.

Ldpe2G commented on August 17, 2024

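After updating OneFlow, I re-measured GPU memory.

libai eager global, fp32, batch size 32

1 process, GPU 0 memory: 5722 M
8 processes, GPU 0 memory: 6474 M

libai graph, fp32, batch size 32

1 process, GPU 0 memory: 5098 M
8 processes, GPU 0 memory: 5986 M

Compared with before, memory usage with 8 GPUs is now much lower.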
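
The four reported figures can be compared directly. This small sketch uses only those numbers (GPU 0 memory, in M) to compute the extra memory of 8 processes over 1 process and the difference between graph and eager mode. The function names are hypothetical.

```python
from typing import Dict

# GPU 0 memory in M, fp32, batch size 32, as reported in this comment.
measurements: Dict[str, Dict[int, int]] = {
    "eager global": {1: 5722, 8: 6474},
    "graph": {1: 5098, 8: 5986},
}


def multi_process_overhead(data: Dict[str, Dict[int, int]]) -> Dict[str, int]:
    """Extra GPU 0 memory when going from 1 process to 8 processes, per mode."""
    return {mode: by_procs[8] - by_procs[1] for mode, by_procs in data.items()}


def graph_saving(data: Dict[str, Dict[int, int]]) -> Dict[int, int]:
    """How much less GPU 0 memory graph mode uses than eager, per process count."""
    return {procs: data["eager global"][procs] - data["graph"][procs] for procs in (1, 8)}


for mode, extra in multi_process_overhead(measurements).items():
    print(f"{mode}: +{extra} M on GPU 0 with 8 processes")
for procs, saved in graph_saving(measurements).items():
    print(f"{procs} process(es): graph uses {saved} M less than eager")
```

This gives an extra 752 M (eager) and 888 M (graph) on GPU 0 when going from 1 to 8 processes. Graph mode uses 624 M (1 process) and 488 M (8 processes) less than eager.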
