ulhpc / launcher-scripts

(DEPRECATED) A set of launcher scripts to be used with OAR and Slurm for running jobs on the UL HPC platform


README -- HPC @ UL

    Time-stamp: <Mer 2013-04-03 17:40 svarrette>

UL HPC Launcher scripts

Synopsis

This repository holds a set of launcher scripts to be used on the UL HPC platform. They are provided to make life easier (and work hopefully more efficient) for users of the infrastructure in the following typical workflows:

  • Embarrassingly parallel runs of repetitive and/or multi-parametric jobs over a Java/C/C++/Ruby/Perl/Python/R script, normally corresponding to one of the following cases:

      ◦ serial (or sequential) tasks all having similar durations, run on one node

      ◦ serial (or sequential) tasks having varying durations, run on one node

      ◦ serial (or sequential) tasks having varying durations, run on multiple nodes

  • MPI run on n processes (e.g. HPL), with abstraction of the MPI stack, the MPI script, an option to compile the code, etc.

  • MPI run on n processes per node (e.g. OSU Micro-benchmarks)

We propose here two types of contributions:

  • a set of example bash scripts that users can take as a starting point and adapt to their own workflow
  • (NOT YET IMPLEMENTED) a more generic Ruby script driven by a YAML configuration file that holds each user's specific settings.

General considerations

The UL HPC platform offers parallel computing resources, so it is important that you make efficient use of the computing nodes, even when processing serial jobs. In particular, you should avoid submitting purely serial jobs to the OAR queue, as this wastes computational power (11 out of 12 cores if you reserve one node on gaia, for instance).
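To check how many cores a job actually holds, and therefore how many a single serial task would leave idle, here is a minimal bash sketch. It assumes that inside an OAR job $OAR_NODEFILE names a file listing one line per reserved core, and falls back to nproc outside OAR; `count_reserved_cores` is a hypothetical helper name, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository).
# ASSUMPTION: inside an OAR job, $OAR_NODEFILE lists one line per reserved
# core; outside OAR we fall back to nproc (GNU coreutils).
count_reserved_cores() {
  if [ -n "${OAR_NODEFILE:-}" ] && [ -r "$OAR_NODEFILE" ]; then
    # One line per core: the line count is the number of reserved cores
    wc -l < "$OAR_NODEFILE" | tr -d ' '
  else
    nproc
  fi
}

cores=$(count_reserved_cores)
# A single serial task keeps only one of these cores busy
echo "Reserved cores: ${cores}, idle with one serial task: $((cores - 1))"
```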

Running a bunch of serial tasks on a single node

An example of bad practice in this context, where tasks run strictly one after the other so that all but one core of the node sit idle, is given in bash/serial/NAIVE_AKA_BAD_launcher_serial.sh, where you may recognize a pattern you use in your own scripts:

 for i in $(seq 1 "${NB_TASKS}"); do
    ${TASK} "$i"
 done

If you are more familiar with UNIX, you might argue that we can fork separate processes using the bash & (ampersand) control operator and the wait builtin. This is illustrated in bash/serial/launcher_serial_ampersand.sh and corresponds to the following pattern:

 for i in $(seq 1 "${NB_TASKS}"); do
    ${TASK} "$i" &
 done
 wait
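One limitation of the pattern above is that a bare wait (with no arguments) returns 0 even if some tasks failed. The following minimal bash sketch keeps the same fork-and-wait structure but waits on each PID individually so that failures are counted; `mytask` and `run_all_background` are hypothetical names, with `mytask` standing in for ${TASK} and deliberately failing for task 3.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ${TASK}: sleeps briefly, fails for task 3
mytask() {
  sleep 0.1
  [ "$1" -ne 3 ]
}

# Fork all tasks, then wait on each PID; prints the number of failed tasks
run_all_background() {
  local nb_tasks=$1 i pid failed=0
  local -a pids=()
  for i in $(seq 1 "$nb_tasks"); do
    mytask "$i" &
    pids+=("$!")            # remember the PID of the task just forked
  done
  for pid in "${pids[@]}"; do
    # `wait PID` returns that task's exit status, unlike a bare `wait`
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

echo "failed tasks: $(run_all_background 5)"
```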

This approach is straightforward and sufficient, provided that (1) you do not have a huge number of tasks to fork and (2) all tasks have a similar duration. For all other (serial) cases, an approach based on GNU parallel is more effective, as it lets you easily and efficiently keep n tasks running in parallel (-j n), where n is typically the number of cores of the node. This is illustrated in bash/serial/launcher_serial.sh and corresponds to the following pattern:

 seq "${NB_TASKS}" | parallel -u -j 12 ${TASK} {}
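If GNU parallel is not installed, the behaviour of -j n (never more than n tasks running, a new one started as soon as one finishes) can be approximated in plain bash with wait -n, which requires bash 4.3 or later. This is a swapped-in technique, not what bash/serial/launcher_serial.sh does; `run_pool` and `mytask` are hypothetical names, with `mytask` standing in for ${TASK}.

```shell
#!/usr/bin/env bash
# Requires bash >= 4.3 for `wait -n`.
# Hypothetical stand-in for ${TASK}
mytask() {
  sleep 0.2
  echo "task $1 done"
}

# Run tasks 1..nb_tasks with at most max_jobs of them running at once
run_pool() {
  local nb_tasks=$1 max_jobs=$2 i running=0
  for i in $(seq 1 "$nb_tasks"); do
    if [ "$running" -ge "$max_jobs" ]; then
      # All slots busy: block until any one background task exits
      wait -n
      running=$((running - 1))
    fi
    mytask "$i" &
    running=$((running + 1))
  done
  wait   # collect the last tasks still running
}
```

For example, `run_pool 24 12` mimics `seq 24 | parallel -j 12 mytask {}`. Unlike GNU parallel, it neither buffers nor orders output, which roughly corresponds to the -u flag used above.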

Not convinced these approaches are worth it? Take a look at the following completion times, measured on the chaos cluster for the task bash/serial/mytask.sh:

  +---------+---------------+--------+--------------+----------------------+-----------+
  | NB_TASK |    HOSTNAME   | #CORES |    TASK      |    APPROACH          |   TIME    | 
  +---------+---------------+--------+--------------+----------------------+-----------+
  |   24    | h-cluster1-32 |   12   | sleep {1-24} | Pure serial          | 5m0.483s  | 
  |   24    | h-cluster1-32 |   12   | sleep {1-24} | Ampersand + wait     | 0m24.141s |
  |   24    | h-cluster1-32 |   12   | sleep {1-24} | GNU Parallel (-j 12) | 0m36.404s |
  |   24    | h-cluster1-32 |   12   | sleep {1-24} | GNU Parallel (-j 24) | 0m24.257s |
  +---------+---------------+--------+--------------+----------------------+-----------+

The same benchmark, run with the sample argument file (see bash/serial/mytask.args.example) so that all tasks have the same duration:

  +---------+---------------+--------+---------+----------------------+-----------+
  | NB_TASK |    HOSTNAME   | #CORES | TASK    |    APPROACH          |   TIME    |
  +---------+---------------+--------+---------+----------------------+-----------+
  |   30    | h-cluster1-32 |   12   | sleep 2 | Pure serial          | 1m0.374s  |
  |   30    | h-cluster1-32 |   12   | sleep 2 | Ampersand + wait     | 0m2.217s  |
  |   30    | h-cluster1-32 |   12   | sleep 2 | GNU Parallel (-j 12) | 0m6.375s  |
  |   30    | h-cluster1-32 |   12   | sleep 2 | GNU Parallel (-j 24) | 0m4.255s  |
  +---------+---------------+--------+---------+----------------------+-----------+
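The timings in both tables can be reproduced, to within half a second, by a simple scheduling model: tasks start in input order, each on the first slot to become free (which is how parallel -j n hands out jobs), and start-up overhead is ignored. A sketch in bash, where `makespan` is a hypothetical helper taking the number of slots followed by the task durations in whole seconds:

```shell
#!/usr/bin/env bash
# Hypothetical model (not a benchmark): greedy list scheduling on SLOTS slots.
# Usage: makespan SLOTS DURATION...   (durations in whole seconds)
makespan() {
  local slots=$1
  shift
  local -a free_at=()
  local s d best end max=0
  for ((s = 0; s < slots; s++)); do
    free_at[s]=0                      # every slot is free at t = 0
  done
  for d in "$@"; do
    # Pick the slot that becomes free first (lowest index on ties)
    best=0
    for ((s = 1; s < slots; s++)); do
      if (( free_at[s] < free_at[best] )); then
        best=$s
      fi
    done
    end=$(( free_at[best] + d ))
    free_at[best]=$end
    if (( end > max )); then
      max=$end
    fi
  done
  echo "$max"                         # completion time of the last task
}
```

For example, `makespan 12 $(seq 1 24)` gives 36: tasks 1 to 12 start immediately, task 12+k starts when task k finishes and ends at 2k+12, so the last one ends at 36 s. The same model gives 300, 24 and 24 s for the other rows of the first table, and 60, 2, 6 and 4 s for the second. It also shows why -j 24 beats -j 12 here: sleep does not use the CPU, so running more tasks than cores costs nothing; this would not hold for CPU-bound tasks.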

GNU parallel

Resources:

Running a bunch of serial tasks on more than a single node

If you have hundreds of serial tasks that you want to run concurrently and you reserved more than one node, then the approach above, while useful, would require tens of scripts to be submitted as separate OAR jobs (each of them reserving one full node).

It is also possible to use GNU parallel in this case, through its --sshlogin option (adapted to use the oarsh connector). This is illustrated in the generic launcher proposed in `
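To give GNU parallel the list of reserved nodes, the OAR node file can be turned into a file of "ncpus/host" entries, one per line, the format accepted by its --sshloginfile option. A minimal bash sketch, assuming the node file lists one line per reserved core; `oar_to_sshloginfile` is a hypothetical helper, and how the oarsh connector is plugged in is left to the repository's launcher and not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository).
# ASSUMPTION: the node file has one line per reserved core, as OAR's
# $OAR_NODEFILE does. Output: one "ncpus/host" line per distinct host.
oar_to_sshloginfile() {
  local nodefile=$1
  # Group identical host lines, count them, then print "count/host"
  sort "$nodefile" | uniq -c | awk '{ print $1 "/" $2 }'
}
```

A possible usage, assuming ${TASK} is reachable at the same path on every node (e.g. on a shared file system): `oar_to_sshloginfile "$OAR_NODEFILE" > sshlogins.txt` followed by `seq "${NB_TASKS}" | parallel -u --sshloginfile sshlogins.txt ${TASK} {}`.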

Running MPI programs

An example launcher script for MPI jobs is provided in bash/MPI/mpi_launcher.sh; usage examples are given in examples/MPI/.
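The two MPI cases listed in the synopsis (n processes in total, or n processes per node) differ mainly in how the process count is derived from the reservation. A dry-run sketch that only prints the commands, assuming the node file has one line per reserved core and using Open MPI option names (-np, -hostfile, -npernode); `mpi_counts` and `show_mpi_commands` are hypothetical helpers and are not taken from bash/MPI/mpi_launcher.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from bash/MPI/mpi_launcher.sh).
# ASSUMPTION: one line per reserved core in the node file.
# Prints: <total cores> <number of distinct nodes>
mpi_counts() {
  local nodefile=$1 np nnodes
  np=$(wc -l < "$nodefile" | tr -d ' ')
  nnodes=$(sort -u "$nodefile" | wc -l | tr -d ' ')
  echo "$np $nnodes"
}

# Dry run: print (do not execute) both kinds of launch.
# Option names follow Open MPI's mpirun; adapt them to your MPI stack.
show_mpi_commands() {
  local nodefile=$1 prog=$2 per_node=${3:-1} np nnodes
  read -r np nnodes < <(mpi_counts "$nodefile")
  # n processes in total: one per reserved core (e.g. HPL)
  echo "mpirun -np ${np} -hostfile ${nodefile} ${prog}"
  # n processes per node (e.g. OSU Micro-benchmarks)
  echo "mpirun -np $((per_node * nnodes)) -npernode ${per_node} -hostfile ${nodefile} ${prog}"
}
```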

Contributing to this repository

Pre-requisites

Git

You should become familiar with Git if you are not already. Consider these resources:

git-flow

The Git branching model for this repository follows the guidelines of gitflow. In particular, the central repo (on github.com) holds two main branches with an infinite lifetime:

  • production: the production-ready benchmark data
  • devel: the main branch, where the latest developments take place. This is the default branch you get when you clone the repo.

Local repository setup

This repository is hosted on our GitHub. Once it is cloned, initialize any git submodules etc. by running:

$> cd launcher-scripts
$> make setup
