kyegomez / moe-mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta

Home Page: https://discord.gg/GYbXvDGevY

License: MIT License

ai ml moe multi-modal-fusion multi-modality swarms

moe-mamba's Introduction


MoE Mamba

Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta. The SwitchMoE architecture comes from the Switch Transformer paper. Help with it is still welcome: if you want to contribute, join the Agora Discord server and help out in the MoE Mamba channel.
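The SwitchMoE layer follows the Switch Transformer idea of top-1 ("switch") routing: a learned gate assigns each token to exactly one expert feed-forward network, and the expert's output is scaled by the gate probability. A minimal plain-PyTorch sketch of that routing idea (an illustration only, not this repo's SwitchMoE implementation; all names below are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySwitchMoE(nn.Module):
    """Top-1 (switch) routing sketch: each token is sent to one expert."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> flatten to (tokens, dim) for per-token routing
        b, s, d = x.shape
        tokens = x.reshape(-1, d)
        probs = F.softmax(self.gate(tokens), dim=-1)  # (tokens, num_experts)
        weight, idx = probs.max(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # scale each routed token's output by its gate probability
                out[mask] = weight[mask, None] * expert(tokens[mask])
        return out.reshape(b, s, d)

moe = TinySwitchMoE(dim=64, num_experts=4)
y = moe(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

The output keeps the input shape; only which expert processes each token varies. Real switch implementations add a load-balancing auxiliary loss and capacity limits, omitted here for brevity.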

PAPER LINK

Install

pip install moe-mamba

Usage

MoEMambaBlock

import torch 
from moe_mamba import MoEMambaBlock

x = torch.randn(1, 10, 512)
model = MoEMambaBlock(
    dim=512,
    depth=6,
    d_state=128,
    expand=4,
    num_experts=4,
)
out = model(x)
print(out)

MoEMamba

import torch 
from moe_mamba.model import MoEMamba 


# Create a tensor of token ids of shape (1, 512)
x = torch.randint(0, 10000, (1, 512))

# Create a MoEMamba model
model = MoEMamba(
    num_tokens=10000,
    dim=512,
    depth=1,
    d_state=512,
    causal=True,
    shared_qk=True,
    exact_window_size=True,
    dim_head=64,
    m_expand=4,
    num_experts=4,
)

# Forward pass
out = model(x)

# Print the shape of the output tensor
print(out.shape)
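Unlike MoEMambaBlock, this example consumes integer token ids rather than embeddings. A rough mental model of that flow, using plain-PyTorch stand-ins (the names embed, backbone, and to_logits are illustrative placeholders, not the repo's API):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a token-level language model pipeline:
# token ids -> embeddings -> backbone -> vocabulary logits.
num_tokens, dim = 10_000, 512
embed = nn.Embedding(num_tokens, dim)
backbone = nn.Identity()             # placeholder for the stacked MoE-Mamba blocks
to_logits = nn.Linear(dim, num_tokens)

ids = torch.randint(0, num_tokens, (1, 512))  # same input shape as the example above
logits = to_logits(backbone(embed(ids)))
print(logits.shape)  # torch.Size([1, 512, 10000])
```

Each position in the sequence thus ends up with a logit per vocabulary entry, which is the usual shape for next-token prediction.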

Code Quality 🧹

  • make style to format the code
  • make check_code_quality to check code quality (basically PEP 8)
  • black .
  • ruff . --fix

Citation

@misc{pióro2024moemamba,
    title={MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts}, 
    author={Maciej Pióro and Kamil Ciebiera and Krystian Król and Jan Ludziejewski and Sebastian Jaszczur},
    year={2024},
    eprint={2401.04081},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

License

MIT

moe-mamba's People

Contributors

dependabot[bot], kyegomez

moe-mamba's Issues

[BUG] I tried to run example.py as is but it fails

I installed the package with pip install moe-mamba, then ran poetry install followed by poetry run python example.py.

Why did I get RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x4 and 512x512), as shown in the following traceback?

Traceback (most recent call last):
  File "/home/marcelo/MoE-Mamba/example.py", line 2, in <module>
    from moe_mamba.model import MoEMamba
  File "/home/marcelo/MoE-Mamba/moe_mamba/__init__.py", line 1, in <module>
    from moe_mamba.model import MoEMambaBlock, MoEMamba
  File "/home/marcelo/MoE-Mamba/moe_mamba/model.py", line 4, in <module>
    from zeta.nn import FeedForward, MambaBlock, RMSNorm
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/__init__.py", line 28, in <module>
    from zeta.nn import *
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/__init__.py", line 1, in <module>
    from zeta.nn.attention import *
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/attention/__init__.py", line 14, in <module>
    from zeta.nn.attention.mixture_attention import (
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/attention/mixture_attention.py", line 8, in <module>
    from zeta.models.vit import exists
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/models/__init__.py", line 3, in <module>
    from zeta.models.andromeda import Andromeda
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/models/andromeda.py", line 4, in <module>
    from zeta.structs.auto_regressive_wrapper import AutoregressiveWrapper
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/structs/__init__.py", line 4, in <module>
    from zeta.structs.local_transformer import LocalTransformer
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/structs/local_transformer.py", line 8, in <module>
    from zeta.nn.modules import feedforward_network
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/modules/__init__.py", line 47, in <module>
    from zeta.nn.modules.mlp_mixer import MLPMixer
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/modules/mlp_mixer.py", line 145, in <module>
    output = mlp_mixer(example_input)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/modules/mlp_mixer.py", line 125, in forward
    x = mixer_block(x)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/modules/mlp_mixer.py", line 63, in forward
    y = self.tokens_mlp(y)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/zeta/nn/modules/mlp_mixer.py", line 30, in forward
    y = self.dense1(x)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marcelo/.cache/pypoetry/virtualenvs/moe-mamba-ehhCoYub-py3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 118, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x4 and 512x512)

Is the class SwitchMixtureOfExperts unused in the main model?

I noticed that there are two versions of the MoE class in the repo. One is in model.py, named SwitchMoE, which is used in MoEMamba. The other is in block.py, named SwitchMixtureOfExperts, which is not used in MoEMamba. What is the purpose of that, and what is the difference?

