queue's Introduction

This is a queuing script that automatically monitors GPU status and runs your shell command once the GPU conditions are satisfied.

Features

  • Periodically monitor the average GPU utilization rate and free memory, measured over a fixed duration.

  • Run your command when the GPUs satisfy all of the following conditions:

    • GPU utilization rate <= threshold
    • GPU free memory >= threshold
    • number of GPUs satisfying both conditions >= threshold
  • Send an e-mail to inform you:

    • when your command starts running
    • when an error occurs
    • when your command has finished
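
The run conditions listed above can be sketched as a small selection function. This is only an illustration, not the code in queue_script.py: the names `select_free_gpus` and `should_run` and their parameters are hypothetical, and plain Python lists stand in for the numpy arrays to keep the sketch self-contained.

```python
def select_free_gpus(avg_free_memory, avg_gpu_util, min_memory, max_util):
    """Return the ids of GPUs whose averaged readings satisfy both thresholds."""
    return [
        gid
        for gid, (mem, util) in enumerate(zip(avg_free_memory, avg_gpu_util))
        if mem >= min_memory and util <= max_util
    ]


def should_run(free_gpu_id, min_gpu):
    """The command is launched only when enough GPUs qualify."""
    return len(free_gpu_id) >= min_gpu


# Made-up averaged readings for three GPUs (MB of free memory, % utilization)
free_ids = select_free_gpus([8000, 3000, 6000], [10, 5, 50],
                            min_memory=5000, max_util=20)
print(free_ids, should_run(free_ids, min_gpu=1))
```

Here only GPU 0 passes both thresholds, so the command would run with `min_gpu=1` but not with `min_gpu=2`.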

Installation

Packages

  • pynvml, numpy

E-mail

  • Edit the send_mail() function in queue_script.py to use your own mail settings.
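
As a rough guide to what such a function might look like, here is a minimal sketch built on the standard library's `smtplib`. The actual signature of send_mail() in queue_script.py is not shown in this README, so the parameters, the helper `build_message`, and the host `smtp.example.com` are all assumptions to be replaced with your own values.

```python
import smtplib
from email.message import EmailMessage


def build_message(subject, body, sender, recipient):
    """Assemble a plain-text e-mail; nothing is sent here."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_mail(subject, body, sender, recipient, password,
              host="smtp.example.com", port=465):
    """Send the message over SMTP with SSL.

    host, port and the login credentials are placeholders (assumptions).
    """
    msg = build_message(subject, body, sender, recipient)
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(sender, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it possible to check the subject and body without contacting a mail server.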

Quick start

  • File structure:

      /queue 
          queue_script.py  
          user.py (user defined)  
          *.log (automatically generated)  
          README.md
    
  • Prepare your code and conda environment.

  • Create queue/user.py and define your own command function and server name, as follows:

    import subprocess
    import numpy as np
    ############################ Edit your server name ############################
    COMPUTER_NAME = 'Computer X'
    ###############################################################################
    
    def run_command(free_gpu_id, avg_free_memory, avg_gpu_util):
        """
        Input:
            free_gpu_id: numpy.array(int), ids of the GPUs that satisfy the conditions
            avg_free_memory: numpy.array(int), len=gpu_num
            avg_gpu_util: numpy.array(int), len=gpu_num
        Output:
            return_code: int, 0 for success, else failure
            task_out: str, stdout of the command
            task_err: str, stderr of the command
        """
        ############################## Edit your command ##############################
        cmd = "python /test.py"
        cwd = "/home/share/name"
        # Generate the shell command
        # Work on a copy so the caller's array is not modified
        candidate_memory = np.array(avg_free_memory)
        for gid in range(len(candidate_memory)):
            if gid not in free_gpu_id:
                candidate_memory[gid] = 0  # ignore GPUs that do not satisfy the conditions
        max_mem_id = np.argmax(candidate_memory)
        CMD_prefix = 'CUDA_VISIBLE_DEVICES=%d ' % max_mem_id    # use the qualifying GPU with the most free memory
        CMD = CMD_prefix + cmd
        ###############################################################################
    
        # launch subprocess
        task = subprocess.Popen(CMD, shell=True, cwd=cwd,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True)  # if universal_newlines=False the output will be bytes
        # communicate() waits for the command to finish; returncode is only set after that
        task_out, task_err = task.communicate()
        return_code = task.returncode
        return return_code, task_out, task_err
  • Run the queue script:

    python queue_script.py --monitor-interval 600 --measure-duration 10 --min-memory 5000 --max-util 20 --min-gpu 1

    This command means:

    • The script will check conditions every 600 seconds.
    • The command defined in user.py runs when at least 1 GPU has at least 5000 MB of free memory and a utilization rate of at most 20, both averaged over 10 seconds.
      Note: measure-duration and monitor-interval must both be >= 1; high-frequency polling is not allowed.
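
For illustration, the flags above and the `>= 1` rule could be parsed along the following lines. This is a sketch, not the parser used in queue_script.py: the flag names come from the command above, while `positive_int`, `build_parser`, the default values and the help texts are assumptions.

```python
import argparse


def positive_int(value):
    """argparse type that rejects values below 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be >= 1, got %s" % value)
    return number


def build_parser():
    parser = argparse.ArgumentParser(description="GPU queue (illustrative sketch)")
    # Defaults below simply mirror the example command; they are assumptions.
    parser.add_argument("--monitor-interval", type=positive_int, default=600,
                        help="seconds between two condition checks")
    parser.add_argument("--measure-duration", type=positive_int, default=10,
                        help="seconds over which readings are averaged")
    parser.add_argument("--min-memory", type=int, default=5000,
                        help="minimum average free memory per GPU, in MB")
    parser.add_argument("--max-util", type=int, default=20,
                        help="maximum average GPU utilization rate")
    parser.add_argument("--min-gpu", type=int, default=1,
                        help="minimum number of GPUs that must qualify")
    return parser


args = build_parser().parse_args(
    "--monitor-interval 600 --measure-duration 10 "
    "--min-memory 5000 --max-util 20 --min-gpu 1".split()
)
print(args.monitor_interval, args.measure_duration,
      args.min_memory, args.max_util, args.min_gpu)
```

With this validation, passing `--monitor-interval 0` makes argparse exit with an error message instead of starting a tight polling loop.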
