Comments (8)
@cmoussa1: awesome progress! Yeah, it will be good to find out which format @ryanday36 wants for porting his scripts. You could export some data to that format and see if it's good enough for @ryanday36.
from flux-accounting.
I put together two Python scripts for reporting on jobs run in LSF. The first (`/usr/tce/packages/lacct/lacct-1.0/bin/lacct.py` on Lassen, rzansel, butte, etc.) wraps the `bhist` command to read all jobs that meet a given selection query (specified by the command arguments: user, account/bank, wckey, start, end) into a list of JobRecord objects and then prints those out. The second (`/usr/tce/packages/lacct/lacct-1.0/bin/lreport.py`) imports lacct.py to get that list of JobRecords and then sums up the total usage according to the command arguments (e.g. byuser, bygroup, etc.).
So, the fastest way I could adapt those to Flux data would probably be a Python module providing a function that returns a list of JobRecords for a given user, bank, wckey, starttime, and endtime (each of these should default to 'all').
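To make the requested interface concrete, here is a minimal sketch of a function with that shape. The name `fetch_job_records`, the in-memory record dicts, and the filtering semantics are all illustrative assumptions, not an actual flux-accounting API; a real version would query the job database instead of taking a list.

```python
# Hypothetical sketch: one function returning job records that match the
# selection criteria, with every argument defaulting to 'all' (no filter).
def fetch_job_records(records, user="all", bank="all", wckey="all",
                      starttime="all", endtime="all"):
    def matches(rec):
        if user != "all" and rec["user"] != user:
            return False
        if bank != "all" and rec["group"] != bank:
            return False
        if wckey != "all" and rec["project"] != wckey:
            return False
        # keep jobs that *completed* inside the [starttime, endtime] window
        if starttime != "all" and rec["end"] < starttime:
            return False
        if endtime != "all" and rec["end"] > endtime:
            return False
        return True

    return [rec for rec in records if matches(rec)]
```

The field names (`user`, `group`, `project`, `end`) mirror the JobRecord attributes pasted below.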
Here's what my JobRecord object looks like:
```python
import time

class JobRecord(object):
    '''
    A record of an individual job.
    '''
    def __init__(self, jobid, user, group, project, nnodes,
                 hostlist, sub, start, end):
        self.jobid = jobid
        self.user = user
        self.group = group
        self.project = project
        self.hostlist = hostlist
        self.nnodes = nnodes
        self.sub = sub
        self.start = start
        self.end = end

    @property
    def elapsed(self):
        # sub/start/end are time.struct_time values parsed from bhist;
        # mktime converts them to seconds since the epoch
        return time.mktime(self.end) - time.mktime(self.start)

    @property
    def queued(self):
        return time.mktime(self.start) - time.mktime(self.sub)
```
And here are the arguments that I take for `lacct` (selection criteria for jobs):
```python
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Show information about completed jobs.')
    parser.add_argument('-P', action='store_true',
                        help='output "|" delimited columns for easy parsing')
    parser.add_argument('-v', action='store_true',
                        help='output more details about jobs')
    parser.add_argument('-s', metavar='<start_time>',
                        help='show jobs that completed after start_time')
    parser.add_argument('-e', metavar='<end_time>',
                        help='show jobs that completed before end_time')
    parser.add_argument('-j', metavar='<jobid>', help='show job with jobid')
    parser.add_argument('-u', metavar='<username>', help='show jobs run by username')
    parser.add_argument('-g', metavar='<group/bank>', help='show jobs run with group/bank')
    parser.add_argument('-p', metavar='<project/wckey>', help='show jobs run with project/wckey')
    parser.add_argument('-C', action='store_true', help='print command line.')
    main(parser.parse_args())  # main() is defined elsewhere in lacct.py
```
So I took a look at flux-core's job-archive and compared it with what you have listed in your JobRecord object. Here is a table comparing the two (as of now):
| JobRecord object | flux-core's job-archive DB |
| --- | --- |
| jobid | id |
| user | userid |
| group | account |
| project | missing (wckey) |
| hostlist | missing |
| nnodes | "rank" field in R column |
| sub | t_submit |
| start | t_start |
| end | t_inactive |
In terms of getting data out of the job-archive DB: since the database itself is SQLite, a Python interface (similar to the two scripts you described) should make it relatively easy to grab the data. Python has a pretty nice SQLite API, which is what I have been using with flux-accounting's database.
@cmoussa1, I just gave you lacct.py and lreport.py on the RZ.
A couple of things to note on your table above comparing the JobRecord object and the flux-core accounting DB: the 'group' in LSF is the bank, so you do have that. The 'project' is the Slurm wckey, which I don't believe you have. The hostlist isn't actually needed for accounting, but it can be useful for the hotline / system admins when a user calls in to ask about a job failure for a job that ran a week ago or similar.
@ryanday36: Yeah, we need to discuss this and make porting your scripts as easy as possible. Since @chu11 is out, @cmoussa1 or @grondo can tell you how to get the data into a format that is easy to slurp into your scripts.
You told us how you did this under LSF. Maybe you can start by describing that, plus the format of the output you get from LSF? Then we can tell you a few ways you can get this info. If an easy mechanism isn't there, we should open a ticket and put it into our plan.
I think this might be a good next development step for accounting (thanks @dongahn for the suggestion from #29), so I think I will pursue this next if there are no objections.
@ryanday36: do you think it would be possible to give me the scripts you mentioned above in this thread? I know you pasted some sections of them, but it might be nice to have the full files to work off of.
The other question probably worth discussing is where a usage-reporting script might reside: does it make sense to put those scripts in flux-accounting? Maybe it could be another subcommand, `flux account usage-report ...` or something. We could try to include all of the same options as the ones listed above.
This might also end up defining other gaps between what we report now and what we need to report in the future (I pointed a couple out already; fields like `group/bank`, `project`, etc.).
Just to update/let myself know where I am at: I've started poking around (early and experimentally) with implementing similar-style functionality within flux-accounting. Queries from the job-archive are fetched using pandas, which returns matching entries in a `DataFrame` object. I can iterate through these, extract the fields needed, and construct a `JobRecord` object:
```python
class JobRecord(object):
    '''
    A record of an individual job.
    '''
    def __init__(self, userid, jobid, ranks, t_submit, t_run, t_inactive, R):
        self.userid = userid
        self.jobid = jobid
        self.ranks = ranks
        self.t_submit = t_submit
        self.t_run = t_run
        self.t_inactive = t_inactive
        self.R = R

    @property
    def elapsed(self):
        # the t_* fields are already floating-point epoch timestamps in
        # the job-archive, so no time.mktime() conversion is needed here
        return self.t_inactive - self.t_run

    @property
    def queued(self):
        return self.t_run - self.t_submit

job_records = []
for index, row in dataframe.iterrows():
    job_record = JobRecord(
        row["userid"],
        row["id"],
        row["ranks"],
        row["t_submit"],
        row["t_run"],
        row["t_inactive"],
        row["R"],
    )
    job_records.append(job_record)
```
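As a side note on the conversion step, `itertuples()` is usually a faster alternative to the `iterrows()` loop above. Here is a small self-contained sketch of the same idea; the column names mirror the job-archive fields used in this thread, and plain dicts stand in for `JobRecord` objects to keep the example short.

```python
import pandas as pd

def to_job_records(dataframe):
    """Convert job-archive rows in a DataFrame into plain record dicts
    (a stand-in for constructing JobRecord objects)."""
    return [
        {
            "userid": row.userid,
            "jobid": row.id,
            "t_submit": row.t_submit,
            "t_run": row.t_run,
            "t_inactive": row.t_inactive,
        }
        for row in dataframe.itertuples(index=False)
    ]
```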
I think I am making pretty good progress on this so far. As of now we can pass in a few different options, similar to those you have listed above:

- `by-user`: show jobs run by username
- `by-jobid`: show job info from jobid
- `after-start-time`: show jobs that completed after start time
- `before-end-time`: show jobs that completed before end time

each of which returns a list of `JobRecord`s.
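One way to wire up commands like these is with argparse subparsers; the sketch below is illustrative only (the command names match the thread, but the actual flux-accounting CLI wiring may differ).

```python
import argparse

def build_parser():
    # hypothetical program name; the real command is `flux account ...`
    parser = argparse.ArgumentParser(prog="flux-account-jobs")
    parser.add_argument("-p", metavar="<path>",
                        help="path to the job-archive DB")
    sub = parser.add_subparsers(dest="command")

    by_user = sub.add_parser("by-user", help="show jobs run by username")
    by_user.add_argument("username")

    by_jobid = sub.add_parser("by-jobid", help="show job info from jobid")
    by_jobid.add_argument("jobid", type=int)

    return parser
```

Each subparser gets its own positional arguments, and `args.command` tells the dispatcher which list-of-`JobRecord`s query to run.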
```console
$ flux account -p /var/lib/flux/jobs.db by-user fluxuser
FOUND 6 RECORDS FROM USER 1001

RECORD 1
userid: 1001
username: fluxuser
jobid: 728382832640
t_submit: 1597421529.0616362
t_run: 1597421529.084426
t_inactive: 1597421529.1862037
nnodes: 1

RECORD 2
userid: 1001
username: fluxuser
jobid: 708015292416
t_submit: 1597421527.848035
t_run: 1597421527.8755805
t_inactive: 1597421527.9819384
nnodes: 1
...
```
Like we talked about in flux-framework/flux-core #3136, fields like `bank` and `project/wckey` can be added once those get passed into the job-archive. Currently I just have the records printed to the screen, but might it also be useful to have an option to write the `JobRecord`s to a file, maybe in CSV or another format? @ryanday36