distribution's Issues
subsume flux-plans with the distribution repo
Consider pulling in the single wiki document in the flux-plans repo as an issue(?) here, and removing flux-plans, as it is not being used.
CTS-1 launch flux on full system
_Goals_
Determine Flux startup/teardown scalability. Find any bugs revealed at scale.
_Methodology_
Gather baseline SLURM job launch data so that SLURM overhead can be subtracted from Flux times. Use an MPI program that bootstraps with PMI, as Flux does. This SLURM data may also be useful for comparison with Flux job launch data (not covered here). Collect enough runs to extract valid statistics. Run at various scales, e.g. 1, 2, 4, 8, and 12 scalable units (one task per node).
Instrument and time the following phases of Flux startup/teardown:
- Full bootstrap/wireup time (split out PMI exchange)
- Time to execute rc1
- Time to execute rc3
Run through the same scales as the baseline, with enough runs to extract valid statistics. Log issues for all anomalies, linked from #12.
Extra credit: continue baseline and Flux scaling to one task per core.
Publish baseline and Flux statistics in this issue.
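For the baseline collection, a timing harness along these lines could work (a sketch only: the launcher prefix, node counts, and run count are parameters, not settled choices):

```shell
#!/bin/bash
# Sketch of a launch-timing harness. Nothing here is prescribed by the
# plan: the launcher prefix, node counts, and run count are all
# parameters. For the SLURM baseline one might call:
#   time_launches "srun -N" "192 384 768 1536 2304" 10
# and for Flux startup/teardown, wrap "flux start /bin/true" instead.
time_launches() {
  local launch=$1 scales=$2 runs=$3
  local nodes run start end total
  for nodes in $scales; do
    total=0
    for run in $(seq 1 "$runs"); do
      start=$(date +%s%N)                    # nanoseconds since epoch
      $launch"$nodes" /bin/true \
        || echo "FAIL nodes=$nodes run=$run" >&2
      end=$(date +%s%N)
      total=$(( total + (end - start) ))
    done
    # Mean wall-clock time per launch, in milliseconds.
    echo "nodes=$nodes runs=$runs mean_ms=$(( total / runs / 1000000 ))"
  done
}
```

Raw per-run times (not just the mean) would be worth keeping for the statistics; this only shows the shape of the loop.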
_Resource requirements_
Ramp-up beginning on opal (1 SU, 192 nodes, 6912 cores) to full CTS-1 system.
_Exit criteria_
- No failures in 10 consecutive runs of
srun -N2304 flux start /bin/true
- No run should take longer than N + baseline (FIXME: need value for N)
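A trivial wrapper for the consecutive-run criterion might look like this (a sketch; the full-system target command from above would be passed in):

```shell
#!/bin/bash
# Check the "no failures in N consecutive runs" criterion: run the given
# command N times, stopping at the first failure. The full-system target
# would be:
#   consecutive_ok 10 srun -N2304 flux start /bin/true
consecutive_ok() {
  local want=$1 i
  shift
  for i in $(seq 1 "$want"); do
    if ! "$@"; then
      echo "failed on run $i of $want" >&2
      return 1
    fi
  done
  echo "passed $want consecutive runs"
}
```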
_Issues to watch out for_
- flux module load -r all is called serially in rc1 (and the converse in rc3), and seems not to scale well (more info needed)
- resource-hwloc stores broken-out hwloc data in the KVS on every node
- load time of module.so out of NFS
distribution 0.2.0 release: ATS target 2 tracking issue
- develop packaging strategy for key platforms
- ATS machine/scheduler module to interface with flux/capacitor
- Sched:
- per-core scheduling
- rank identification, actually use all nodes
- Handle scheduling by core without oversubscription
- Offer allocation of groups of cores to processes, or processes on individual cores
- Support up to thousands (possibly as high as 7-10 thousand) in an instance
- Scalability:
- Job throughput
- Memory
- Have distribution level tests
- Testing:
- Node scalability baseline, uq-style?
- Memory scalability baseline and analysis
- Resilience baseline (kill some non-0 rank, kill rank 0)
- sched performance
flux wreckrun - output stdout "immediately"
I noticed the ordering of the stdout in Magpie was "out of order" compared to what I'm used to with Slurm. I suspect it has to do with output going to the kvs and not being "flushed" until after a process ends? I can't seem to figure out the magic to have stdout flushed immediately; just wanted to ask before I start digging more.
Here's my simple example:
#!/bin/bash
myhostname=`hostname`
for i in `seq 1 5`
do
echo ${myhostname} - ${i}
sleep 1
done
Basically output hostname 5 times with 1 second in between each output.
When I run this script with srun like so requesting 4 nodes:
srun -N4 -n4 --time=1 ./output.sh
Every second I see 4 lines output with 4 different hostnames, which is what I'd expect. When I run:
srun -N4 -n4 --time=1 src/cmd/flux start flux wreckrun -N4 output.sh
The job sits for 5 seconds with no output. After the 5th second, all 20 lines of output get dumped at once. It appears to be queued up and flushed at exit.
Playing with the stdio options in the flux-wreckrun
manpage doesn't seem to help.
If I instead open up flux in a terminal mode, i.e. w/ --pty
$ srun -N4 -n4 --time=1 --pty src/cmd/flux start
$ flux wreckrun -N4 output.sh
It behaves like what I would expect. So I suspect there is some behavior when flux recognizes you're in a terminal. Is there an option or switch to make things flush out "immediately" when not in a terminal?
Obviously, since Flux isn't bootstrapped yet, perhaps this is just a side effect of needing to go through SLURM to start a Flux instance. But this also seems to affect cases where you want stdout to go to a file instead of the console, e.g. monitoring how a job is going by tailing a file.
CTS-1 overall test plan
Plan for flux testing on CTS-1 during early access window.
_Target System_
CTS-1 is a large system arriving at Livermore in early Summer 2016
- 12 scalable units, 2,304 nodes, 82,944 cores
- Interconnect: Intel OPA
- Operating system: TOSS 3 (RHEL 7.2 based) with slurm-2.3.3
- Single SU test system: opal.llnl.gov
_Goals_
- Fulfill obligation to test Flux on CTS-1 hardware
- Demonstrate Flux ability to run one broker per node, full-system
- Demonstrate Flux ability to run full-system MPI jobs, on all cores
- High throughput testing
- Testing with debugging tools
- Solid OS integration (slurm, MPI, TOSS3, etc)
- Improved integration with TCE-packaged tools and MPI
_Plan_
Specific test areas with resource requirements and exit criteria
#13 CTS-1 launch Flux on full system
#14 CTS-1 flux-sched test plan
_Entry issues_
Issues that need to be solved in order for some tests to run
- flux-framework/flux-core#630 support launching Intel MPI jobs on Intel OPA fabric
- flux-framework/flux-sched#154 Questions that may arise in creating the first sched RPM package
- flux-framework/flux-core#679 instrument flux-core startup/teardown phases
- flux-framework/flux-sched#166 Performance analysis for flux-sched
_Exit issues_
Issues discovered and/or fixed during testing.
CTS-1 flux-sched testing plan
Goals
Understand the performance and scalability characteristics of flux-sched
as we vary the number of nodes/cores managed by a flux instance (i.e., up to CTS-1’s scales), job geometries (e.g., job sizes) and job submission rates. At the end, we will have figure-of-merit numbers as our baselines and the performance profiles for each test configuration for immediate or future performance improvements.
Order of Testing
Our campaign should be done in an "easy or more confident" to "hard or less confident" fashion so that we can address easier issues along the way. (Note that this is still a draft: I will need some discussion to decide what we need to test for different scheduling algorithms and exit criteria, and to refine the testing coverage.)
Test Types
- High Throughput Job Stress Test (Phase I)
- Use low node counts (e.g., CNs = {2,4,8,16,32})
- Run one broker per CN under our first-come, first-served scheduler plugin
- Each test will produce 10 performance profiles (one per 1000 jobs)
- Submit/Schedule/Execute constant unit-size jobs
- 10,000 single-process sleep 0 jobs in core-scheduling mode; compute the number of executed jobs per minute as the figure of merit, plus a performance profile per 1000 jobs
- 10,000 single-process MPI_sleep 0 jobs in core-scheduling mode; compute the number of executed jobs per minute as the figure of merit, plus a performance profile per 1000 jobs
- Submit/Schedule/Execute constant 1/2-CNs-size jobs
- 10,000 1/2-CNs-process sleep 0 jobs in core-scheduling mode; compute the number of executed jobs per minute as the figure of merit, plus a performance profile per 1000 jobs
- 10,000 1/2-CNs-process MPI_sleep 0 jobs in core-scheduling mode; compute the number of executed jobs per minute as the figure of merit, plus a performance profile per 1000 jobs
- Submit/Schedule/Execute variable-size jobs
- 10,000 sleep 0 jobs whose sizes are powers of 2 up to 1/2 CNs processes, cycling through these sizes, in core-scheduling mode; compute the number of executed jobs per minute as the figure of merit, plus a performance profile per 1000 jobs
- 10,000 MPI_sleep 0 jobs whose sizes are powers of 2 up to 1/2 CNs processes, cycling through these sizes, in core-scheduling mode; compute the number of executed jobs per minute as the figure of merit, plus a performance profile per 1000 jobs
- Scale-emulated Large-scale Stress Test (Phase II)
- Use medium node counts (e.g., CNs = {16,32,64})
- But run 36 brokers per CN (e.g., Flux sizes = {576, 1152, 2304}), each loading a distinct CTS-1 hwloc xml file
- One issue might be wrexec's many "fork/exec" calls hitting some "resource limits", in which case we may need to introduce a mode in which the fork/exec is skipped
- We repeat the above testing
- Scheduling algorithm Test (Phase III)
- We need a reasonable performance/scalability/correctness testing coverage for EASY backfill (TBD)
- We need a reasonable performance/scalability/correctness testing coverage for Hybrid Conservative backfill (TBD)
- We need a reasonable performance/scalability/correctness testing coverage for Conservative backfill (TBD)
- Large-Scale Test (Phase IV)
- Depending on the findings above, we may choose the most interesting test configurations on CTS-1 itself and run them all the way up to the full scale of Jade (but w/ one broker per node)
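The jobs-per-minute figure of merit above could be computed by a driver along these lines (a sketch; the real submit/run command, e.g. a wreckrun invocation, is passed as arguments):

```shell
#!/bin/bash
# Sketch of a throughput driver: run NJOBS instances of the given
# command back to back and report executed jobs per minute as the
# figure of merit. The real submit/run command is passed in, e.g.:
#   jobs_per_minute 10000 flux wreckrun -n1 sleep 0
jobs_per_minute() {
  local njobs=$1 i start end elapsed_ms
  shift
  start=$(date +%s%N)
  for i in $(seq 1 "$njobs"); do
    "$@" || echo "job $i failed" >&2
  done
  end=$(date +%s%N)
  elapsed_ms=$(( (end - start) / 1000000 ))
  # jobs/min = njobs * 60000 / elapsed_ms; the +1 avoids dividing by
  # zero when the command is trivially fast.
  echo "jobs=$njobs elapsed_ms=$elapsed_ms jpm=$(( njobs * 60000 / (elapsed_ms + 1) ))"
}
```

A real driver would also want to snapshot a performance profile every 1000 jobs; this only shows the figure-of-merit arithmetic.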
CTS-1 program launch testing
_Goals_
Test scalability and usability of Flux program launch on full system. Determine any bugs or scaling and usability issues.
_Methodology_
Launch and collect timing data for a series of programs, both MPI and non-MPI, and compare with baseline SLURM launch data as collected in #13. Utilize and/or enhance instrumentation already extant in flux-wreckrun and record the timing of phases, including
- state=reserved -> state=starting
- state=starting -> state=running
- state=complete -> wreckrun exited
as well as the entire time to fully execute each parallel program from a hot cache.
Run these tests through a similar scale as the baseline described in #13, with enough samples for statistical validity. Vary the number of tasks per broker rank as well as the number of total tasks for each program. Publish results in this issue.
Time permitting, include scale testing a program with increasing amounts of stdio and record impact to runtime (Tcompleted - Trunning).
_Exit criteria_
- Able to run full scale program:
flux wreckrun -n $((2304*cores_per_node)) mpi_hello
- No failures for 10 runs of unusual size (ROUS), either full system as in 1. above or a TBD threshold
- I/O for full scale program is captured to kvs
_Issues to watch out for_
- Scaling of kz io to KVS is unknown. Full system program launch should be verified before attempting to launch a program with even moderate I/O.
- Investigate usage of persist-directory to ensure the content-store doesn't fill up local tmp or tmpfs during these runs.
smallest serviceable slurm substitute
Smallest Serviceable Slurm Substitute
What follows are the requirements to replace the SLURM version
currently in use at LC, not a wish list for the perfect batch system.
The requirements are listed as bullet items with minimal text to
describe the item. This assumes an understanding of SLURM and its
features. For further details, reference the SLURM man pages.
References to SLURM commands are listed where appropriate. New
features in the versions of SLURM beyond v2.3.3 are not listed.
- Task launch
- Specify number of tasks
- Specify resources (at least nodes and cores)
- Number of resources (e.g., 4 nodes)
- include ranges
- Named resources (e.g., cluster, node[4-8], core[0-3])
- Memory size
- Generic resources
- Features
- Task distribution:
- Cyclic
- Block
- Plane
- Custom (base on configuration file)
- Task to resource mapping
- Number of tasks per node (or core)
- Number of cores per task
- Hardware threading (desired? allowed? disabled?)
- Task containment - confine tasks to allocated resources: sockets, cores, memory
- Wall clock limit
- Task prolog and epilog options
- Resource management
- Resources managed: Clusters, nodes, sockets, cores, threads, memory, GPU’s, burst buffers, file systems, licenses, etc.
- Add and remove resources from management
- Report and change status of resources: up, down, draining, allocated, idle
- Resource pools (aka partitions, queues)
- Resource weights (governs priority for selection)
- Resource sharing allowed (if so, to what degree?)
- Network topology
- Contiguous resources
- Switch topology
- Resource status (sinfo)
- Summary of nodes and states (idle, allocated, down, draining)
- Summarize for each node partition
- Rich reports of specific resources
- By node (scontrol show node)
- By partition (scontrol show partition)
- Job Specification
- Job category
- Batch script (sbatch)
- Interactive (salloc)
- includes xterm request (mxterm/sxterm)
- Single job step as job (srun)
- User / group
- Bank account
- Workload characterization key
- Min/max run times
- Priority (includes nice factor if any)
- QoS
- Queue
- Resource requirements
- Min/Max node counts
- Features, tags, processor architecture, processor speed
- (Minimum or specific) memory per (socket or node)
- (Minimum or specific) (sockets or cores) per node
- Tasks per node (or core)
- Cores per task
- Shared or exclusive
- Preferred network topology / node contiguity
- Licenses
- File systems
- Installed packages and libraries
- Allocated resources
- By count (e.g., number of nodes and cores)
- By name (e.g., node names, cpu’s, gpu’s, etc.)
- Node on which batch script is running
- State (includes reason for not running)
- Dependency (other job(s) starting/completing/exit code)
- Reservation
- Prolog and Epilog
- Re-queue request
- If preempted
- If resource fails
- Terminate (or continue) on resource failure
- Times
- Submit time
- Start-after time
- Estimated start time
- Actual start time
- Run time limit
- Actual run time
- Terminate time
- Exit Status (includes if signaled and by which signal)
- Job run info
- Job name
- Command
- Working directory
- Standard In / Out / Error
- Batch script
- Job category
- Job Submission
- Option to intercept submit request and alter, override, or insert policy-related options
- Job submission fails at submit time (as opposed to run time) when invalid options are specified
- #(Pound) directive support in batch script (e.g., #SBATCH -N) as optional means to convey job specifications
- Job status
- One-line job summary (squeue)
- Queued as well as running jobs
- Includes jobs of other users
- Verbose job record report (scontrol show job)
- Job step reports
- Includes record of associated batch script
- Job control
- Job removal and signaling (scancel)
- Job signal prior to termination (per specified grace time)
- Job modification (scontrol update job)
- Job hold/release
- Job prioritization factors
- Fair share
- Job size (favoring large or small)
- Queued time (FIFO)
- QoS contribution
- Queue contribution
- User nicing
- Scheduling (starting with a prioritized queue)
- Matches job’s requests with available resources
- Supports multiple rules for resource selection:
- Best fit
- First fit
- Balanced workload
- Job submission requires a bank account and user permission to use that account
- Honors time and resource size limits imposed by
- Queue
- QoS
- User/Bank
- Imposes limits on
- Number of jobs that can be queued at any given time
- Number of jobs that can be running at any given time
- Accommodates sharing requests and allowed sharing levels
- Waits specified time to accommodate node topology request
- Backfill option
- Conservative backfill: no higher-priority job is delayed
- EASY backfill: only the top-priority job cannot be delayed
- Provides estimated start times
- Considers jobs for multiple queues
- Supports job dependencies from other clusters
- Provide job preemption based on QoS or queue. Preemption action can be
- Suspension
- Checkpoint
- Terminate and Re-queue
- Terminate
- Support for job growth and shrinkage
- Quality of Service
- Affects job priority
- Allows exemptions from time and size limits
- Can impose an associated set of time and size limits
- Can amplify or dampen the usage charges
- Bank Accounts
- Fundamental to permitting user’s ability to submit jobs
- Reflects the sponsors’ claim to the cluster’s resources (i.e., the shares in fair share)
- Can impose an associated set of time and size limits
- Reservations
- Resources can be reserved in advance (DATs)
- Permitted jobs can run within those reservations
- Email user at job state transitions
- Begin
- End
- Fail
- Re-queue
- All
- Resource accounting
- Resource utilization (sreport)
- Times reported for specified time periods under the following categories:
- Allocated
- Idle
- Reserved
- System maintenance
- Unplanned down time
- Job accounting
- Individual job records (sacct)
- Job and job step records for a prescribed time period
- Includes most of the job parameters listed in Job Specification above
- Composite job reports (sreport)
- Aggregate job reports based on user, account, and workload characterization key
- Over a prescribed time period
- Includes listing of top users and top accounts
- Includes reports by job size
- Security
- Jobs can only be run by submitting user
- Job output can only be seen by submitting user
- System parameters can only be changed by authorized roles (see next item)
- Administration
- Role-based system administration and overrides
- User can monitor and alter (some) of own job parameters
- Operator can alter other users’ job parameters
- Coordinator can populate bank account memberships and limits
- Administrator can do all above and alter resource definitions
- User/bank management (sacctmgr)
- Cluster/partition/user/bank granularity
- Implicit permission to use bank
- Limits imposed at each level of the hierarchy
- Limits include:
- Max number of jobs running at any time in bank
- Max number of nodes for any jobs running in bank
- Max number of CPUs for any jobs running in bank
- Max number of pending + running jobs state at any time in bank
- Max wall clock time each job in bank can run
- Max (CPU*minutes) each job in bank can run
- System
- Save state and recover on restart
- Resources
- Jobs
- Usage statistics
- System can be restarted without losing queued jobs or killing running jobs
- Reliability
- High-availability backup to take over when the primary dies or hangs
- Resilient: able to adapt to failing or failed resources
- 24x7 operation
- System updates possible on a live system without losing queued or running jobs
- Robust
- Atomic changes
- System can never get in a corrupt or inconsistent state
- Complete recovery after crashes
- Performance
- Response to user commands in less than one or two seconds
- Scheduling loops under one minute
- Scalability
- Thousands of jobs
- Thousands of resources
- Thousands of users
- Visibility
- Pertinent info is logged
- System diagnostics facilitate a quick discovery of what went wrong
- Configuration
- System configuration read from file or database
- System configuration parameters can be changed live
- API
- Library to retrieve remaining time (libyogrt)
- Interface to lorenz
- Environment Variables
- Support for user defined environment variables to be used to input job specifications (e.g. SBATCH_ACCOUNT)
- System inserts variables into the execution environment to be used by user's script or application (e.g., SLURM_JOB_ID)
- Option to convey some or all of user's environment variables to run time execution environment.
build on RHEL 7 against available EPEL packages
It would be easier on our users if our flux distribution built on top of EPEL packages on Red Hat-based distros. EPEL 7 provides:
- zeromq-4.0.5
- czmq-1.4.1
- openpgm-5.2.122
- libsodium-1.0.5
A few issues:
- This is an ancient czmq (current is 3.0.2).
- Unsure if zeromq is built with libsodium and openpgm; if not, we would need to handle missing crypto and pgm better.
post 1.0 release notes format
This issue is for discussing the proper formatting for release notes (and possibly commit messages) post 1.0 release following discussion in flux-framework/flux-core#879
Primary issues up for discussion are:
- using topic tags in pull requests to generate automatic release notes
- how release notes should be formatted for releases
I personally don't know of any project with a formal topic tag system for the commit messages. So I can't recommend any. Some Googling had some people point to this project: https://wiki.typo3.org/CommitMessage_Format_%28Git%29. It's perhaps a decent one to start with for discussion.
For release notes, I'd like to suggest something I've seen with a few Apache projects (I don't know if this is formal, it may just be a style I've noticed)
For major version releases (i.e. in X.Y.Z, X is incremented), only a high-level description of major changes is added to the release notes. This makes sense as it's a major version release, so "bug fix" details aren't needed. Presumably this is something that would be written out by the team. Example here with the Spark 2.0.0 release: https://spark.apache.org/releases/spark-release-2-0-0.html
For minor version releases (i.e. in X.Y.Z, Y is incremented), include a high-level description of changes and details on any issues/bugs that were fixed. The high-level description is there because something non-trivial or non-bug-fix was done to warrant a minor version release; otherwise it would only warrant a revision increment. Presumably most could be auto-generated, but some people work to write out the high-level information (unless it's done via a topic tag). Example from the Spark 1.5.0 release: https://spark.apache.org/releases/spark-release-1-5-0.html
For revision version releases (i.e. in X.Y.Z, Z is incremented), include a list of the specific tickets/PRs fixed and the description of those fixes. Since this release should only contain bug fixes, a list of bugs fixed should be more than sufficient and should be auto-generated. If we use topic tags, information can be organized into sections. Example from Hadoop 2.7.3 http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/releasenotes.html (Spark doesn't even bother with this, just points you to a link in JIRA.)
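As a sketch of the auto-generation idea, topic-tagged subjects from `git log --pretty=format:%s` could be grouped into sections like so (the feat:/fix:/doc: tag set is purely illustrative, not a convention we've agreed on):

```shell
#!/bin/bash
# Group one-line commit/PR subjects by a leading topic tag. The tag set
# ("feat:", "fix:", "doc:") is purely illustrative. Subjects are read
# from stdin, one per line, e.g.:
#   git log --pretty=format:%s v1.0.0..HEAD | group_notes
group_notes() {
  local input tag matches
  input=$(cat)          # buffer stdin so we can scan it once per tag
  for tag in feat fix doc; do
    matches=$(printf '%s\n' "$input" | grep "^$tag: " | sed "s/^$tag: //")
    if [ -n "$matches" ]; then
      echo "## $tag"
      printf '%s\n' "$matches" | sed 's/^/- /'
    fi
  done
}
```

Untagged subjects are silently dropped here; a real tool would want an "other" bucket so nothing is lost from the notes.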
distribution 0.1.0 release: ATS target 1 tracking issue
- Develop distribution source release process, with RFC; See #8
- Default module list/initial program; either way, a way to specify modules that should be loaded outside of source. See flux-framework/flux-core#508
- Sched integration: make having a scheduler for submit to use dependable
- Sched installer of some description
- flux-core source package for out-of-tree build? See #5 and #4
- Sched autotools build (maybe)? See flux-framework/flux-sched#83
- integration tests for flux submit and general functionality
build in /opt on RHEL 6
EPEL 6 provides
- zeromq3-3.2.5
- czmq-1.4.1
- openpgm-5.1.118
- libsodium-0.4.5
Given that we are also dependent on python 2.7 which is built in /opt on our systems, we should probably just build flux in /opt and also build the following dependencies there:
- zeromq-4.1.4
- czmq-3.0.2
- skip libsodium and build zeromq with builtin tweetnacl
Which Slurm Features are currently available in Flux
There is a list of Slurm features in the wiki (https://github.com/flux-framework/distribution/wiki/Smallest-Serviceable-Slurm-Substitute-(S4)), but it is not clear which ones are already supported in Flux.
Could you please list them?
Also, do you know which ones are actually being used in your cluster? I mean, are there real applications exploiting them?
Additionally, does Flux support job checkpointing/restarting? If yes, is it being used in practice for failure recovery of MPI-based applications?
Finally, from the Flux paper, it seems that the main differences from Slurm are the I/O-aware scheduling and the KVS features. Is that right? Are there other features?
Thanks in advance and kind regards.
consider Ubuntu PPA for Flux projects and dependencies
It would be rather convenient for developers to have an Ubuntu PPA for flux projects and (especially) those packages they depend on that are not already available in usable form in Ubuntu.
document process for signing flux-framework tags and/or release tarballs
Here's a description of signing git tags with a PGP key.
Is there a best practice we could borrow and add to RFC 1 for signing tags and/or release tarballs of flux-framework projects?
This is probably not urgent for our initial tags, but we should figure it out before our first stable release.
need release script
As discussed in flux-framework/flux-core#526, it may be useful to provide a script for generating and uploading release materials for flux-framework projects.
Such a script would take a project and tag as arguments, then:
- checkout the tag
- run a project-specific script for generating a tarball (e.g. ./autogen.sh; ./configure; make dist)
- use the GitHub API to upload the resulting tarball and release notes as described here.
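As a sketch, a dry-run version might just print the steps it would take (the repo path and API endpoint shape here are assumptions; actual asset upload goes through a separate uploads endpoint):

```shell
#!/bin/bash
# Dry-run sketch of a release helper: given a project and tag, print the
# steps that would be run rather than running them. The repo path and
# API endpoint below are assumptions for illustration only.
release_plan() {
  local project=$1 tag=$2
  cat <<EOF
git -C $project checkout $tag
(cd $project && ./autogen.sh && ./configure && make dist)
curl -X POST https://api.github.com/repos/flux-framework/$project/releases \\
     -d '{"tag_name": "$tag"}'
EOF
}
```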
distribution process and content
We need to define process and content for a distribution release.
A minimal distribution release should contain
- A distribution version
- A set of package (name, version) tuples
- List of tests and test results, possibly some of the tests themselves
- distribution release notes
I'd like to propose that we add the following to the distribution repo:
- README.md - explain the purpose, content, and process for releases
- test/ - directory containing scripts and a record of testing including raw output, with a subdir per distribution release, cumulative
- NEWS - release notes (cumulative, added to the top, GNU style)
- versions - list of (project, version) tuples
- tarballs - list of (project, URI) tuples, with substitutions from versions
Then there should be a build infrastructure which can fetch tarballs, build them, and run tests. Possibly it could have an install target that installs to ${datadir}/flux-distribution. I guess tests should be installed so end-users can re-run tests against the installed packages on their systems.
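As a sketch of how the build infrastructure might consume these files, here's a loop that expands a versions file against a URL template (the "name version" line format and the @NAME@/@VERSION@ placeholders are assumptions for illustration):

```shell
#!/bin/bash
# Sketch: expand a "versions" file (lines of "name version"; this format
# is an assumption) against a URL template to produce tarball URLs that
# a build script could then fetch and build.
expand_tarballs() {
  local versions_file=$1 template=$2
  local name version url
  while read -r name version; do
    [ -z "$name" ] && continue             # skip blank lines
    url=${template//@NAME@/$name}
    url=${url//@VERSION@/$version}
    echo "$url"
  done < "$versions_file"
}
```

The separate tarballs file in the proposal would essentially be the output of this substitution, checked in for the record.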
We could also include OS distro specific metadata in a flux-distribution release. For example, yum or apt config files pointing to public repositories for built flux packages if we were to do that, rpm spec/deb metadata for packaging flux-distribution itself, with dependencies on the tested subpackages, and others required to re-execute tests.
Comments and ideas?
Smallest Serviceable Slurm Substitute - Take 2
The following is a more selective list of features and behavior that Flux needs to support in order to replace Slurm on LC systems. It seeks to pare #6 down to a more manageable level.
Area | SLURM Functionality | Flux Requirement |
---|---|---|
Resources | Nodes, cores, memory, GPUs, Licenses | |
Job Request | Quantity of: Nodes, cores, memory, GPUs, Licenses | |
| Option to request specific nodes by name | |
| Option for exclusive use of nodes | |
| Node features / tags | |
| Production or debug "queue" | |
| Charge Account | |
| WCKey | |
| Policy override request (incl. preemptable flag) | |
| Job dependency: on success/failure of prev job(s) | |
| Job dependency: eligible to run after specified time | |
| Job Name | |
| User-supplied annotation (aka comment field) | |
| Specify or inherit shell limits | |
| Specify or inherit environment variables | |
Job Script | directive support to convey submission options | jobspec conveys submission options and initial program arguments |
Job type | Batch | |
| Interactive | |
| xterm | |
Scheduler | Node or core-based scheduling | |
| Backfill scheduling | |
| Running Job preemption | |
Job Priority | Queue Wait time | |
| Fair-share | |
| Job Size | |
| Policy overrides (see below) | |
Policy | User permission to charge account (forms user/account/cluster tuple) | |
Policy: limits | Job size | |
| Wall clock | |
| Running Jobs per User | |
| Running Jobs per Node | |
Policy: Limit Scope | Per User / Charge Account / Cluster (most granular) | |
| Per Charge Account / Cluster | |
| Per "Queue" / Cluster | |
Policy: Overrides | exempt: exempt from limits, normal job priority | |
| expedite: exempt from limits, increased job priority | |
| standby: exempt from limits, very low priority, preemptable | |
| default: no overrides | |
| Allocate nodes, cores, memory, GPUs, Licenses | |
| Reserve nodes, cores, memory, GPUs, Licenses for Dedicated Application Times | |
| srun replacement: Launch tasks across nodes, cores, memory, GPUs | |
| Constrain tasks to nodes, cores, memory, GPUs | |
Status Command: resource display | State (Up, Down, Draining, etc.) | |
| Allocated Jobs | |
| "queue" | |
Status Command: Job Display - one job per line OR detailed display of all fields | Job Request (all job request fields defined above) | |
| User | |
| State (Queued, Running, etc.) | |
| Reason not running | |
| Times: submit, eligible, start, end | |
| Job priority components | |
| Exit status / signal number (if signaled) | |
Behavior | Options for Mail at job start, end, and failure | |
| Option to hold and release a queued job | |
| Option to signal and cancel job | |
| Option to cancel job or keep alive following resource failure | |
| Option to requeue job following resource failure | |
| Option to attach / debug running program | |
| Option to specify output and error files | |
| Prolog and Epilog for each job | |
| Get remaining time API | |
Database Maintains for each Cluster: Charge Account hierarchy and user permissions | User access control (unable to submit job without an authorized charge account) | |
| Promised shares of resources (cpu cycles) | |
| Limits (job size, wall clock, and running job) | |
| Policy overrides | |
Database Maintains for each Cluster | Job statistics for every job | |
| Node state changes with reasons for entering failed state | |
| Tracks the Workload Characterization Keys | |
| Defines the Operator, Coordinator, and Admin Roles | |
Accounting | Report job usage by user/account over requested period | |
| Report job usage by user/account by job size over requested period | |
| Report job usage by user/WCkey over requested period | |
| Report machine utilization over requested period | |
O_PATH undefined on TOSS2
The O_PATH flag introduced into cleanup.c in a recent commit (1daa5060eecb448116c3d0103d49d2b092f06367) is not available on TOSS 2. It would probably be valid, if slower and more expensive, to simply define it as 0 at configure time and make sure the fd gets closed after the unlinkat.