muri's Introduction

0. Introduction

This repository contains the source code for our SIGCOMM'22 paper "Multi-Resource Interleaving for Deep Learning Training".

1. Content

  • simulator/ contains code for simulation and is adapted from Tiresias. Please refer to <repo>/simulator/README.md for detailed information.
  • cluster_exp/ contains code for the real-cluster experiments. Please refer to <repo>/cluster_exp/README.md for detailed information.

2. Reproduce results (for SIGCOMM'22 artifact evaluation)

Please refer to <repo>/simulator/README.md and <repo>/cluster_exp/README.md for details.

Note: Because the execution scripts for the testbed experiments are tightly coupled to our internal company platform, we only demonstrate the functionality and provide pseudocode for the related scripts (e.g., run.sh, prepare_env.sh). Please adapt them to your platform if you would like to run the testbed experiments.

3. Contact

For any questions, please contact zhaoyh98 at pku dot edu dot cn.

muri's People

Contributors

rivendile


muri's Issues

A question about KVReader

I got the error "ModuleNotFoundError: No module named 'dataloader'" from nlp_model.py (line 9: from dataloader import KVReader). I tried installing the module with "pip install dataloader", but that didn't work. Was KVReader written by the authors?
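
When reproducing this report, it can help to first check whether a module named dataloader is visible to the interpreter at all. The snippet below is a minimal sketch using only the standard library; the helper name module_available is hypothetical and not part of the repository, and the issue does not say where KVReader comes from.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found.

    `module_available` is a hypothetical helper for illustration only.
    """
    return importlib.util.find_spec(name) is not None


# nlp_model.py expects `from dataloader import KVReader` to succeed.
if not module_available("dataloader"):
    print("'dataloader' is not installed; KVReader cannot be imported here.")
```

If the check passes but the import of KVReader still fails, the installed dataloader package is probably not the one nlp_model.py expects.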

Questions about the run_sim.py implementation

Hello. While reading the simulator code, I came across something I would like to ask about.

In the SRTF code (def shortest_first_sim_jobs(...)) in run_sim.py, the code that updates the remaining_iteration of each job in the runnable_jobs list subtracts the execution time including overhead.

As I understand it, this overhead is the time between the start of training and the start of the first iteration. It should therefore be subtracted only once, at the beginning, but the code appears to subtract it repeatedly.

Thanks for answering the question :)
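
The difference between the two accounting schemes described in this question can be illustrated with a small, self-contained sketch. It is not the code from run_sim.py: the function names, the per-round elapsed times, and the fixed iteration time are all assumptions made for illustration.

```python
from typing import List


def remaining_overhead_every_round(total_iters: float, iter_time: float,
                                   overhead: float, rounds: List[float]) -> float:
    """Charge the startup overhead in every scheduling round (hypothetical)."""
    remaining = float(total_iters)
    for elapsed in rounds:
        # Only the time left after the overhead counts as training progress.
        remaining -= max(0.0, elapsed - overhead) / iter_time
    return max(0.0, remaining)


def remaining_overhead_once(total_iters: float, iter_time: float,
                            overhead: float, rounds: List[float]) -> float:
    """Charge the startup overhead only once, at the beginning (hypothetical)."""
    remaining = float(total_iters)
    overhead_left = overhead
    for elapsed in rounds:
        paid = min(overhead_left, elapsed)  # overhead still owed this round
        overhead_left -= paid
        remaining -= (elapsed - paid) / iter_time
    return max(0.0, remaining)


rounds = [30.0, 30.0]  # seconds of execution in two scheduling rounds
print(remaining_overhead_every_round(100, 1.0, 10.0, rounds))  # 60.0
print(remaining_overhead_once(100, 1.0, 10.0, rounds))         # 50.0
```

Which scheme is appropriate depends on whether a job is restarted (and so pays the overhead again) in each round, which the question does not establish.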

A few questions about the paper

Hello, I have a few questions about this paper.

I have sent you an email and look forward to your reply. Thank you!

Question about the implementation of Themis scheduler

I have a question about the Themis simulator in run_sim.py. The function themis_sim_jobs first calls get_isolated_throughputs to get a list of the throughputs of the currently runnable jobs when each of them runs alone with just 1 / N of the total resources. However, get_isolated_throughputs returns the variable allocation, not the variable isolated_throughputs. I think this is a bug.

Looking forward to your reply!
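
The behavior this question expects can be sketched as follows. This is not the repository's get_isolated_throughputs: the function name, its parameters, and the linear-scaling throughput model are assumptions chosen only to show the difference between returning the per-job share and returning the per-job throughputs.

```python
from typing import List


def isolated_throughputs_sketch(per_gpu_throughput: List[float],
                                total_gpus: int) -> List[float]:
    """Throughput of each job when it runs alone on 1/N of the cluster.

    Hypothetical model: throughput scales linearly with the GPU share.
    """
    n = len(per_gpu_throughput)
    if n == 0:
        return []
    allocation = total_gpus / n  # equal 1/N share of the resources per job
    isolated_throughputs = [tp * allocation for tp in per_gpu_throughput]
    # Return the throughputs, not the allocation used to compute them.
    return isolated_throughputs


print(isolated_throughputs_sketch([2.0, 3.0], total_gpus=8))  # [8.0, 12.0]
```

Returning allocation instead would hand the caller a single share value rather than one throughput per runnable job, which is the mismatch the question points out.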
