agwabom / towards_moe
Implementation of "Towards Understanding Mixture of Experts in Deep Learning", NeurIPS 2022

License: MIT License


A PyTorch Implementation of "Towards Understanding Mixture of Experts in Deep Learning"

This is my implementation of "Towards Understanding Mixture of Experts in Deep Learning", which was accepted at NeurIPS 2022.

The implementation is still a work in progress.

NOTE: I am not an author of the paper!

Figure 1. A linear (left) and a non-linear (right) MoE model that learns to dispatch data points to 8 experts.
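For readers unfamiliar with the setup, here is a rough sketch of the top-1 routing used in such a model. This is illustrative NumPy, not code from this repo; the weight and function names are made up, and the router/experts are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_in, d_out = 8, 16, 2
# Random router and (linear) expert weights, purely for illustration.
router_w = rng.normal(size=(d_in, n_experts))
expert_w = rng.normal(size=(n_experts, d_in, d_out))

def moe_forward(x):
    """Top-1 MoE: each data point is dispatched to the single expert
    with the highest router score; that expert alone produces its output."""
    scores = x @ router_w                  # (batch, n_experts) router logits
    choice = scores.argmax(axis=1)         # hard top-1 dispatch per point
    out = np.empty((x.shape[0], d_out))
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            out[mask] = x[mask] @ expert_w[e]   # linear expert
    return out, choice

x = rng.normal(size=(32, d_in))
y, choice = moe_forward(x)                 # y: (32, 2), choice: expert index per point
```

A non-linear variant would replace the linear expert map with a small MLP; the dispatch logic stays the same.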

Dataset

Below is a t-SNE visualization of the synthetic data.

Figure 2. Each color denotes a cluster in the synthetic data

Figure 3. Labels of each data point in the synthetic data
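A minimal sketch of cluster-structured synthetic data of this kind (Gaussian clusters, each carrying a class label) might look as follows. This is an assumption-laden illustration, not the repo's generator or the paper's exact recipe; all names and the alternating-label rule are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_clusters(n_clusters=8, points_per_cluster=100, dim=16, noise=0.1):
    """Sample isotropic Gaussian clusters around unit-norm centers.
    Each cluster gets a binary label (alternating here, for illustration)."""
    centers = rng.normal(size=(n_clusters, dim))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    xs, cluster_ids, labels = [], [], []
    for c in range(n_clusters):
        pts = centers[c] + noise * rng.normal(size=(points_per_cluster, dim))
        xs.append(pts)
        cluster_ids += [c] * points_per_cluster
        labels += [c % 2] * points_per_cluster  # label determined by cluster
    return np.vstack(xs), np.array(cluster_ids), np.array(labels)

X, cluster_id, y = make_clusters()  # X: (800, 16)
```

Keeping the cluster id alongside the class label is what makes a dispatch-entropy evaluation (see Future work) possible.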

Performance

  • Performance after 500 epochs

Figure 4. Accuracy and loss curves for each model setting

| Model              | Test accuracy (%) | Number of filters |
|--------------------|-------------------|-------------------|
| Single (linear)    | 76.3              | 512               |
| Single (nonlinear) | 80.6              | 512               |
| MoE (linear)       | 96.2              | 128 (16*8)        |
| MoE (nonlinear)    | 100.0             | 128 (16*8)        |

Future work

Model side

  1. Add dispatch entropy evaluation
  2. Support language & image datasets
  3. Replicate the paper's results for linear/non-linear MoE
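The planned dispatch-entropy evaluation could be sketched as the entropy of the empirical expert-usage distribution: high entropy means dispatch is spread across experts, zero means collapse onto one. The helper name and shape are our assumption, not code from this repo:

```python
import numpy as np

def dispatch_entropy(expert_choice, n_experts=8):
    """Entropy of the empirical expert-usage distribution.
    Equals log(n_experts) for perfectly balanced dispatch,
    0 when every point goes to a single expert."""
    counts = np.bincount(expert_choice, minlength=n_experts)
    p = counts / counts.sum()
    p = p[p > 0]                       # skip unused experts (0 * log 0 = 0)
    return float(-(p * np.log(p)).sum())

balanced = np.repeat(np.arange(8), 10)   # every expert used equally
collapsed = np.zeros(80, dtype=int)      # one expert gets everything
```

Per-cluster dispatch entropy (conditioning on the cluster id of each point) would likewise measure whether each cluster is routed to a dedicated expert.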

Dataset side

  1. Fix synthetic data generation
    • Add cluster labels for entropy evaluation
    • The current version seems to differ from Figure 1 in the original paper

Reference

@misc{chen2022towards,
  title = {Towards Understanding Mixture of Experts in Deep Learning},
  author = {Chen, Zixiang and Deng, Yihe and Wu, Yue and Gu, Quanquan and Li, Yuanzhi},
  year = {2022},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2208.02813},
  url = {https://arxiv.org/abs/2208.02813}
}

