flux-framework / flux-accounting

bank/accounting interface for the Flux resource manager

License: GNU Lesser General Public License v3.0


flux-accounting's Introduction


NOTE: The interfaces of flux-accounting are being actively developed and are not yet stable. The GitHub issue tracker is the primary way to communicate with the developers.

flux-accounting

flux-accounting is a bank/accounting interface for the Flux resource manager. It writes and saves user account information to persistent storage using Python's SQLite3 API, calculates fair-share values for users and banks based on historical job data, and generates job priority values for users with a multi-factor priority plugin.

Install Instructions

For instructions on using a VSCode Development Container, see this document in flux-core. You'll want to create the environment and then proceed with the steps below to build.

Building From Source
./autogen.sh
./configure --localstatedir=/var/
make -j
make check

To configure flux-accounting with a specific version of Python, pass the PYTHON_VERSION environment variable on the ./configure line (note: flux-accounting must be configured against the same version of Python as flux-core; this is the default behavior of ./configure if you choose the same prefix for flux-core and flux-accounting):

PYTHON_VERSION=3.7 ./configure --localstatedir=/var/

Configuring flux-accounting background scripts

There are a number of scripts that run in the background to update both job usage and fairshare values, and these require configuration when flux-accounting is first set up. The first thing to configure when creating the flux-accounting database is the PriorityUsageResetPeriod and PriorityDecayHalfLife parameters. Both are expressed as a number of weeks: PriorityDecayHalfLife defines the length of one job usage "chunk," and PriorityUsageResetPeriod defines how long jobs continue to play a factor in calculating a usage factor. If these parameters are not passed when creating the DB, PriorityDecayHalfLife is set to 1 week and PriorityUsageResetPeriod to 4 weeks, i.e. the flux-accounting database will store up to a month's worth of jobs broken up into one-week chunks:

flux account create-db --priority-decay-half-life=2 --priority-usage-reset-period=8

The other component to load is the multi-factor priority plugin, which can be loaded with flux jobtap load:

flux jobtap load mf_priority.so

After the DB and job priority plugin are set up, the update-usage subcommand should be set up to run as a cron job. This subcommand fetches the most recent job records for every user in the flux-accounting DB and calculates a new job usage value. This subcommand takes one optional argument, --priority-decay-half-life, which, like the parameter set in the database creation step above, represents the number of weeks to hold one job usage "chunk." If not specified, this optional argument also defaults to 1 week.

flux account update-usage --priority-decay-half-life=2 path/to/job-archive-DB

After the job usage values are re-calculated and updated, the fairshare values for each user also need to be updated. This can be accomplished by configuring the flux account-update-fshare script to also run as a cron job. It fetches user account data from the flux-accounting DB, recalculates the fairshare values, and writes them back to the DB.

flux account-update-fshare

Once the fairshare values for all of the users in the flux-accounting DB are updated, this updated information is sent to the priority plugin. This script can also be configured to run as a cron job:

flux account-priority-update

Run flux-accounting's commands:

usage: flux-account.py [-h] [-p PATH] [-o OUTPUT_FILE]
                       {view-user,add-user,delete-user,edit-user,view-job-records,create-db,add-bank,view-bank,delete-bank,edit-bank,print-hierarchy} ...

Description: Translate command line arguments into SQLite instructions for the Flux Accounting Database.

positional arguments:
  {view-user,add-user,delete-user,edit-user,view-job-records,create-db,add-bank,view-bank,delete-bank,edit-bank,print-hierarchy}
                        sub-command help
    view-user           view a user's information in the accounting database
    add-user            add a user to the accounting database
    delete-user         remove a user from the accounting database
    edit-user           edit a user's value
    view-job-records    view job records
    create-db           create the flux-accounting database
    add-bank            add a new bank
    view-bank           view bank information
    delete-bank         remove a bank
    edit-bank           edit a bank's allocation
    print-hierarchy     print accounting database

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  specify location of database file
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        specify location of output file

To run the unit tests in a Docker container, or launch into an interactive container on your local machine, you can run docker-run-checks.sh:

$ ./src/test/docker/docker-run-checks.sh --no-cache --no-home -I --
Building image el8 for user <username> <userid> group=20
[+] Building 0.7s (7/7) FINISHED

.
.
.

mounting /Users/moussa1/src/flux-framework/flux-accounting as /usr/src
[moussa1@docker-desktop src]$

User and Bank Information

The accounting tables in this database store information like username and ID, the banks to submit jobs against, the shares allocated to the user, as well as static limits, including a max number of running jobs at a given time and a max number of submitted jobs per user/bank combo.

Interacting With the Accounting DB

In order to add, edit, or remove information from the flux-accounting database, you must also have read/write access to the directory that the DB file resides in. The SQLite documentation states:

Since SQLite reads and writes an ordinary disk file, the only access permissions that can be applied are the normal file access permissions of the underlying operating system.

There are two ways you can interact with the tables contained in the Accounting DB. The first way is to launch into an interactive SQLite shell. From there, you can open the database file and interface with any of the tables using SQLite commands:

$ sqlite3 path/to/FluxAccounting.db
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> .tables
association_table bank_table

To get nicely formatted output from queries (like headers for the tables and proper spacing), you can also set the following options in your shell:

sqlite> .mode columns
sqlite> .headers on

This will output queries like the following:

sqlite> SELECT * FROM association_table;
creation_time  mod_time    deleted     username    bank        shares      max_jobs    max_wall_pj
-------------  ----------  ----------  ----------  ----------  ----------  ----------  -----------
1605309320     1605309320  0           fluxuser    foo         1           1           60       

The second way is to use flux-accounting's command line arguments. You can pass in a path to the database file, or it will default to the "compiled-in" path of ${prefix}/var/FluxAccounting.db.

With flux-accounting's command line tools, you can view a user's account information, add and remove users to the accounting database, and edit an existing user's account information:

$ flux account view-user fluxuser

creation_time    mod_time  deleted  username  bank   shares  max_jobs  max_wall_pj
   1595438356  1595438356        0  fluxuser  foo         1       100           60

Multiple rows of data can be loaded to the database at once using .csv files and the flux account-pop-db command. Run flux account-pop-db --help for .csv formatting instructions.

User and bank information can also be exported from the database using the flux account-export-db command, which will extract information from both the user and bank tables and place them into .csv files.

Release

SPDX-License-Identifier: LGPL-3.0

LLNL-CODE-764420

flux-accounting's People

Contributors: chu11, cmoussa1, dongahn, garlick, grondo, mergify[bot], ryanday36, stevwonder, vsoch

flux-accounting's Issues

accounting_cli.py: need default values for certain fields in association_table

Currently, all of the fields in the association table are required when adding a new user. Recently, though, I learned that account names are unique in Slurm, so we generally don’t need to specify the parent account when adding a user (though we do need to specify a parent when adding a new account). I don’t think that we want the parent account to be a required field in add-user. More generally, it’d be nice to have default values for several of the add-user options (see the sketch after this list):

  • parent-acct=
  • admin-level=1
  • shares=1
  • max-jobs=10
  • max-wall-pj=60
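
A minimal sketch of how these defaults could be declared with argparse; the flag names and default values below simply mirror the proposed list above and are illustrative, not final:

import argparse

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers()

subparser_add_user = subparsers.add_parser("add-user")
# every option below gets a default, so only the username stays required;
# the values mirror the proposed list
subparser_add_user.add_argument("--username", required=True)
subparser_add_user.add_argument("--parent-acct", default="")
subparser_add_user.add_argument("--admin-level", type=int, default=1)
subparser_add_user.add_argument("--shares", type=int, default=1)
subparser_add_user.add_argument("--max-jobs", type=int, default=10)
subparser_add_user.add_argument("--max-wall-pj", type=int, default=60)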

repo: add licensing documents to project

Now that this repo is in flux-framework, it needs the following documents to comply with C4.1:

NOTICE
NOTICE.LLNS
COPYING
LICENSE

As well as header boilerplate to all of the files currently in the repo.

view_job_records(): function takes 2 positional arguments but 3 were given

Was trying to use view_job_records() in the interactive Docker container via the command line and am running into the following issue:

$ flux account -p /var/lib/flux/jobs.db view-job-records 

Traceback (most recent call last):
  File "/usr/libexec/flux/cmd/flux-account.py", line 11, in <module>
    load_entry_point('flux-accounting==0.1.0', 'console_scripts', 'flux-account.py')()
  File "/usr/lib64/flux/python3.6/flux_accounting-0.1.0-py3.6.egg/accounting/accounting_cli.py", line 204, in main
TypeError: view_job_records() takes 2 positional arguments but 3 were given

The function definition for view_job_records() is as follows:

def view_job_records(conn, output_file, **kwargs):
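
For illustration, a minimal reproduction of the mismatch and one possible fix, assuming the CLI dispatch code passes the parsed filters as a third positional argument (the names here are hypothetical):

def view_job_records(conn, output_file, **kwargs):
    pass

conn, output_file, filters = object(), None, {"user": "fluxuser"}

# view_job_records(conn, output_file, filters)  # TypeError: takes 2 positional arguments but 3 were given
view_job_records(conn, output_file, **filters)  # unpacking the filters as keyword arguments works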

feedback on 'flux account ...' command

  1. it appears that user names currently have to be unique. We often have one user with multiple banks (accounts), so we’ll need the ability to create multiple users with the same username.
  2. account names are unique in Slurm, so we generally don’t need to specify the parent account when adding a user (though we do need to specify a parent when adding a new account). I don’t think that we want the parent account to be a required field in add-user. More generally, it’d be nice to have default values for several of the add-user options (e.g. admin-level=user --shares=1).

add docker-run-checks.sh

This repo will need a docker-run-checks.sh file, similar to the ones in flux-core and flux-sched to build the flux-accounting docker image and run tests (so, in a full directory layout, probably under src/test/docker).

They will be part of the strategy for CI testing of the flux-accounting project. This will help speed up deployment of an environment with the correct build dependencies and help keep a docker image with the latest master build.

fairshare: discussions on strategy, implementation

I figured this could serve as a place to document our discussion on the strategy for calculating fairshare. I plan to update this thread with information after our Webex meeting today. Here's my background information:

Originally, I was under the impression that fairshare values were calculated by passing in a user id, fetching its association id from the accounting database, and performing a Level Fairshare calculation based on the user's association information and current jobs in the queue. Essentially, I had thought that fairshare calculations would be constantly querying information from the accounting database in order to generate a priority value.

I've since learned that these fairshare calculations (at least, in Slurm's case) are performed in memory. The scheduler sorts all of the jobs in the queue using the fair tree algorithm, ordering users by the priority with which their jobs should be run. It calculates job usage that has occurred over the past couple of weeks (or some other configured amount of time), also applying a decay factor (to more heavily weigh the more recent jobs).

Chris Morrone had a modified fair tree implementation in flux-framework/flux-sched, but we've determined that implementation was very much a prototype/work-in-progress, and is probably not usable for our own fairshare calculation.

autoconf generates config.py

We discussed this at today's coffee hour. config.py can hold some variables pertaining to the Flux installation, and flux-accounting proper can use them to determine default locations of things. (We also discussed that we will ultimately need TOML support to allow admins to override where the flux-accounting database etc. can be found.)

install: name and structure of python module

Once #73 is merged, the python module provided by this repo will be installed as accounting alongside the flux and _flux modules. To me, accounting seems a bit too generic of a name. I wonder if either we can get accounting to be installed within the flux module OR add the flux- prefix to the name (assuming - is a valid character in a python module name). Or this may just be a bad suggestion, and we leave everything how it is currently.

delete-user: document deletion of job usage history on delete

Currently, the delete-bank subcommand removes all users under that bank when it is removed. After #79 is landed, the flux-accounting database will have another table (job_usage_factor_table) that references both the association_table and bank_table; when a bank (and its subsequent users) get removed, their job usage history will also get removed from this table. It might be a good idea to:

  • document this in the delete_bank() function
# deleting a bank will also remove any users
# that fall under the bank that gets deleted,
# as well as their job usage history from
# job_usage_factor_table
def delete_bank(conn, bank):
  • document this in create_db.py where the job_usage_factor_table gets created and make a note that rows in this table will get deleted when a user/bank combination is removed.
# Job Usage Factor Table
# stores past job usage factors for users
#
# when a user gets deleted via delete_user() or delete_bank(),
# their job usage history will also be removed
logging.info("Creating job_usage_factor table in DB...")
  • present some sort of prompt when calling delete-user or delete-bank to have the user confirm a deletion of a user or bank:
$ flux account delete-user fluxuser
This will remove all job usage history for fluxuser. Are you sure you want to continue? (y/n)
$ flux account delete-bank some_bank
This will remove all job usage history for the users under the bank you are about to delete. Are you sure you want to continue? (y/n)

Then again, maybe we don't want to delete a user's job usage history in the first place! In that case, I think that would involve just removing the ON DELETE CASCADE statement from the creation of job_usage_factor_table. Then, even if a user/bank combination ever gets removed from the flux-accounting database, their job usage history will still remain in the database, which might be behavior we want.
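
For reference, a minimal sketch of the cascade behavior being discussed, using simplified one-column schemas (the real tables have many more fields):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite3 leaves foreign key enforcement off by default
conn.execute("CREATE TABLE association_table (username text PRIMARY KEY)")
# with ON DELETE CASCADE, removing a user also removes their usage history;
# without the clause, the delete would be rejected (or, with no foreign key
# at all, the usage rows would simply be left orphaned)
conn.execute(
    """CREATE TABLE job_usage_factor_table (
           username text REFERENCES association_table (username) ON DELETE CASCADE,
           usage_factor real
       )"""
)
conn.execute("INSERT INTO association_table VALUES ('fluxuser')")
conn.execute("INSERT INTO job_usage_factor_table VALUES ('fluxuser', 0.5)")
conn.execute("DELETE FROM association_table WHERE username = 'fluxuser'")
print(conn.execute("SELECT count(*) FROM job_usage_factor_table").fetchone()[0])  # prints 0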

accounting_cli.py: change 'account' arg to 'bank'

This will keep consistency between what the 'account' field is really supposed to represent - the bank that the user belongs to:

- subparser_add_user.add_argument(
-    "--account", help="account to charge jobs against", metavar="ACCOUNT",
- )
+ subparser_add_user.add_argument(
+    "--bank", help="bank to charge jobs against", metavar="BANK",
+ )

...

- elif args.func == "add_user":
-     aclif.add_user(
-         ...
-         args.account,
-         ...
-     )
+ elif args.func == "add_user":
+     aclif.add_user(
+         ...
+         args.bank,
+         ...
+     )

docker: integrate flux-accounting with fluxorama interactive image

Setting up an interactive Docker image with flux-accounting included in the fluxorama image would probably be a good testing environment for getting hands-on experience with the accounting tools.

the interactive Dockerfile right now

Here are the contents of the Dockerfile as of now:

FROM grondo/fluxorama

LABEL maintainer="Christopher Moussa <[email protected]>"

COPY . /usr/lib64/flux-accounting/

RUN yum -y update \
  && yum install -y python3-pip python3-devel git \
  && cd /usr/local/bin \
  && ln -s /usr/bin/python3 python \
  && pip3 install --user --upgrade pip \
  && pip3 install tox

# go into flux-accounting, install dependencies, and run unit tests
RUN cd /usr/lib64/flux-accounting/ \
  && make dependencies \
  && make install \
  && make check \
  && pip3 install -r requirements.txt --user \
  && cd accounting/ \
  && ./create_db.py

This will place flux-accounting into /usr/lib64/.

install process

Trying to run flux account after going into the image will result in the following error:

 [fluxuser@6c6b7e187062 flux-accounting]$ flux account -h
Traceback (most recent call last):
  File "/usr/libexec/flux/cmd/flux-account.py", line 11, in <module>
    load_entry_point('flux-accounting==0.1.0', 'console_scripts', 'flux-account.py')()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2700, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2318, in load
    return self.resolve()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib64/flux/python3.6/flux_accounting-0.1.0-py3.6.egg/accounting/accounting_cli.py", line 16, in <module>
ModuleNotFoundError: No module named 'pandas'

I can go into flux-accounting and install the dependencies with pip3:

 [fluxuser@6c6b7e187062 flux-accounting]$ pip3 install -r requirements.txt --user

And now I can run the subcommands:

 [fluxuser@6c6b7e187062 flux-accounting]$ flux account -h
usage: flux-account.py [-h] {view-user,add-user,delete-user,edit-user} ...

Description: Translate command line arguments into SQLite instructions for the
Flux Accounting Database.

positional arguments:
  {view-user,add-user,delete-user,edit-user}
                        sub-command help
    view-user           view a user's information in the accounting database
    add-user            add a user to the accounting database
    delete-user         remove a user from the accounting database
    edit-user           edit a user's value

I can write to the database with sudo, which I think makes sense because fluxuser wouldn't necessarily be an admin and have permissions to add/remove users to FluxAccounting.db:

 [fluxuser@6c6b7e187062 accounting]$ sudo flux account add-user --username=fluxuser --admin-level=1 --account=acct --parent-acct=pacct --shares=10 --max-jobs=100 --max-wall-pj=1440

But I don't need sudo to view a user's account information (which, again, I think is appropriate behavior):

 [fluxuser@6c6b7e187062 accounting]$ flux account view-user fluxuser
   id_assoc  creation_time    mod_time  deleted user_name  admin_level account parent_acct  shares  max_jobs  max_wall_pj
0         1     1594237425  1594237425        0  fluxuser            1    acct       pacct      10       100         1440

I can think of a couple of things right now that need to be improved in my Dockerfile:

  • the --install-libs option in my Makefile needed to be changed in order for make install to work. It needed to change from /usr/lib/flux/python${FLUX_PYTHON_VERSION}/ to /usr/lib64/flux/python${FLUX_PYTHON_VERSION}/. I'm assuming this is due to the location of the Flux install in the Docker image?

  • the dependencies for flux-accounting need to be linked in order for the subcommands to work. My hack was to run pip3 install -r requirements.txt --user again, but maybe there is another way to do this in the Dockerfile while the image is being built?

I'll continue to try and update this issue with changes made to the Dockerfile or the build process to get this properly set up.

Weighted tree library

@cmoussa1, @SteVwonder, @ryanday36 and @garlick chatted about this at yesterday's flux-accounting meeting and today's coffee hour. There was a consensus that creating a fairtree library will position flux-accounting well for our future work.

  • INPUT: An account-tree object loaded from the flux-accounting database (So a separate load code needs to be developed and @cmoussa1 agreed to take a crack at this)

  • PROCESSING: perform "sorting depth first visit" on this tree to compute fairshare values for each and every account in the account tree

  • OUTPUT: Updated account-tree object (So a separate flush code needs to be developed to update the new states into the flux-accounting database and @cmoussa1 agreed to take a crack at this)

The library approach will allow flux-accounting to remain agile until our next architectural decision is made.
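
As a starting point, here is a rough sketch of the shape such a library could take; the node fields and the ranking rule are placeholders for discussion, not the agreed-upon algorithm:

class AccountNode:
    # one bank/account in the tree, with the minimum fields a fairshare pass needs
    def __init__(self, name, shares, usage=0.0):
        self.name = name
        self.shares = shares
        self.usage = usage
        self.fairshare = 0.0
        self.children = []

def sorting_dfs(node):
    # PROCESSING step: order children by their shares-to-usage ratio, assign a
    # fairshare value based on rank, then recurse into each subtree
    node.children.sort(key=lambda c: c.shares / (c.usage or 1.0), reverse=True)
    for rank, child in enumerate(node.children):
        child.fairshare = 1.0 - rank / len(node.children)  # placeholder formula
        sorting_dfs(child)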

repo: integrate flux-accounting with Flux install

Speaking with @dongahn on Friday, we discussed that we need to integrate flux-accounting's install with the system instance install so that it gets picked up by the flux(1) command driver, so it can be called with the following:

$ flux account view-user fluxuser

What steps are required to accomplish this? Did I miss anything else, @dongahn?

flux-account.py print-hierarchy should be more graceful where there is no account

(local_venv)  [fluxuser@7b0a61e07aa7 prototype]$ time flux-account.py create-db ./FluxAccounting.db
(local_venv)  [fluxuser@7b0a61e07aa7 prototype]$ flux-account.py print-hierarchy
Traceback (most recent call last):
  File "/usr/src/local_venv/bin/flux-account.py", line 11, in <module>
    load_entry_point('flux-accounting', 'console_scripts', 'flux-account.py')()
  File "/usr/src/accounting/accounting_cli.py", line 222, in main
    print(ph.print_full_hierarchy(conn))
  File "/usr/src/accounting/print_hierarchy.py", line 31, in print_full_hierarchy
    raise Exception("No root bank found")
Exception: No root bank found
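
One way to make this more graceful, assuming we keep raising from print_full_hierarchy(), is to catch the exception at the CLI layer and exit with a readable message instead of a traceback:

import sys

try:
    print(ph.print_full_hierarchy(conn))
except Exception as exc:  # e.g. "No root bank found" on a freshly created database
    sys.exit("flux-account.py: error: {}".format(exc))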

print-hierarchy: need to improve format of print-hierarchy's output

Late last week in a meeting with @SteVwonder and @dongahn, we discussed that a good first step for starting work on some of the "internals" for multi-factor job priority calculation would be to implement a way to describe and print the hierarchy of users and their account information in a format similar to sshare.

As of now, flux-accounting has a print-hierarchy function that achieves something similar:

--------------------------------------------------
Bank: A | Shares = 100 | Parent Bank: 
--------------------------------------------------
--------------------------------------------------
Bank: B | Shares = 1 | Parent Bank: A
--------------------------------------------------
  | user1 | Shares = 1
  | user2 | Shares = 1
--------------------------------------------------
Bank: C | Shares = 1 | Parent Bank: A
--------------------------------------------------
  | user3 | Shares = 1

But I should improve this function to more closely resemble the output of sshare:

Account|User|RawShares
A||100
 B||1
  B|user1|1
  B|user2|1
 C||1
  C|user3|1

where the hierarchy of accounts can more clearly be seen.


The pseudocode for implementing this could be something as follows:

for all banks in the bank table:
  find all sub banks

  for every sub bank:
    find all associations under this sub bank
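
A runnable version of that pseudocode could look like the following, assuming bank_table has bank, parent_bank, and shares columns and association_table has username, bank, and shares columns (the real column names may differ):

def print_hierarchy(conn, bank, indent=""):
    # print this bank, then its associations, then recurse into its sub banks
    (shares,) = conn.execute(
        "SELECT shares FROM bank_table WHERE bank = ?", (bank,)
    ).fetchone()
    print("{}{}||{}".format(indent, bank, shares))
    for username, user_shares in conn.execute(
        "SELECT username, shares FROM association_table WHERE bank = ?", (bank,)
    ):
        print("{} {}|{}|{}".format(indent, bank, username, user_shares))
    for (sub_bank,) in conn.execute(
        "SELECT bank FROM bank_table WHERE parent_bank = ?", (bank,)
    ).fetchall():
        print_hierarchy(conn, sub_bank, indent + " ")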

FluxAccounting.db: need a table to describe bank/account hierarchy

As discussed earlier this week, a new table describing bank and account information should be added to FluxAccounting.db, the database file storing user account information.

Currently, the lone table in this database, association_table, has a column labeled account and an optional parent_acct for every user that gets added to this database.

account       tinytext   NOT NULL
parent_acct   tinytext   NOT NULL

But the account column does not carry any bank amounts or information itself. We could probably describe this by creating a new table, maybe called bank_table:

CREATE TABLE IF NOT EXISTS bank_table (
    account_name        text    NOT NULL,
    parent_acct_name    text,
    shares              int     NOT NULL,
    PRIMARY KEY (account_name)
);

I believe creating a bank table would actually remove the need to have a parent_acct field in association_table, since that information would instead be stored in the bank table. This would also give us the ability to describe a bank/account hierarchy, similar to the usage files @ryanday36 has shared with us:

Account       | User   | 
root          |        |         
 root         | root   |  
 account_1    |        |  
  subacct_1   |        |  
  subacct_2   |        |  
   subacct_2_1| user1  |  
   subacct_2_2| user2  |  
  subacct_3   |        | 

Note that there is no usage data as of yet, as we still have to implement usage calculation per user and per account. This would probably be a next step after the creation of a bank/account table.

The three fields, account_name, parent_acct_name, shares, are the ones I can think of that could give some baseline functionality to creating a bank hierarchy. Can you think of any other fields that might be needed? @ryanday36

add setup.py, setuptools

In order to build this project and run tests, we need to add a setup.py file and use setuptools.

bank_table: need clarification on constraints with shares

Noted in the comments from #36 (thanks @SteVwonder for bringing this up!):

And with the shares, it doesn't look like the existing bank systems enforce that the sum of the child shares must be less than or equal to the parents shares. There is just a per-level normalization step that needs to happen (see my inline comment for more details). Maybe this is something we need to discuss during a coffee hour, but it could also just be a design choice that I wasn't aware of, so feel free to ignore if so.

Some of the constraints I included in the add-bank subcommand turned out not to be enforced by LLNL's existing banks, notably that parent shares >= sum(child shares). In fact, none of the share constraints exist in LLNL's existing banks. The referenced example has a bank (bank1) with 3 shares whose children have 16, 48, and 32 shares.

If that is in fact the case, then I need to remove those constraints from add-bank and edit-bank.

I should confirm with @ryanday36 on this - are constraints on shares required? Should we start to enforce those constraints in flux-accounting?

job-archive interface: filtering subcommands should instead be options

The job-archive interface currently has four subcommands that interact with the database and fetch job records:

  • by-user
  • by-jobid
  • after-start-time
  • before-end-time

These subcommands each query the database to fetch job records, but they cannot be combined with one another. For example, if you wanted to filter jobs by a specific user and before some certain time t, you would have to run two separate subcommands:

$ flux account -p path/to/db-file by-user fluxuser
...
$ flux account -p path/to/db-file before-end-time 1234
...

These should instead function like optional arguments, where combinations can be passed in to further refine queries:

$ flux account -p path/to/db-file view-job-records -u fluxuser --after-start-time 1234
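
A minimal sketch of how those combinable options could build a single query, assuming a job-archive table named jobs with userid, t_run, and t_inactive columns (illustrative names):

def fetch_job_records(conn, user=None, after_start_time=None, before_end_time=None):
    # every filter that was supplied contributes one WHERE clause, so any
    # combination of options narrows the same query
    clauses, params = [], []
    if user is not None:
        clauses.append("userid = ?")
        params.append(user)
    if after_start_time is not None:
        clauses.append("t_run > ?")
        params.append(after_start_time)
    if before_end_time is not None:
        clauses.append("t_inactive < ?")
        params.append(before_end_time)
    query = "SELECT * FROM jobs"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return conn.execute(query, params).fetchall()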

bank_table: delete-bank should be recursive for parent banks

In delete-bank, my current implementation fails on the following case:

      A
      |
   |-----|
   B     C
   |     |
 |---| |---|
 D   E F   G

If A is passed into delete_bank, then it will be deleted by the first delete execute. The first select_stmt will return B, C. B will be deleted and last_parent_bank_seen set to B. C will be deleted and last_parent_bank_seen set to C. The while loop will go to the next iteration, and the select statement will return F and G (which will then be deleted). D and E will still be left in the database.

We should probably make delete-bank recursive so that we can cover a case like this. This should also include adding a deep hierarchy example in my tests that confirms it works.
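
A recursive sketch that would handle the deep hierarchy above, assuming bank_table has bank and parent_bank columns (a simplification of the real schema):

def delete_bank(conn, bank):
    # delete every sub bank first, all the way down, so D and E are still
    # reached through B even after A is gone; then delete the bank itself
    for (sub_bank,) in conn.execute(
        "SELECT bank FROM bank_table WHERE parent_bank = ?", (bank,)
    ).fetchall():
        delete_bank(conn, sub_bank)
    conn.execute("DELETE FROM bank_table WHERE bank = ?", (bank,))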

print_hierarchy.cpp: move variables from 'global' scope

The following variables are defined in a global scope:

sqlite3_stmt *b_select_root_bank_stmt;
sqlite3_stmt *b_select_shares_stmt;
sqlite3_stmt *b_select_sub_banks_stmt;
sqlite3_stmt *b_select_associations_stmt;

As @SteVwonder noted in #64, if we want to expose get_sub_banks() as a function that other parts of the codebase can call, then we should move these definitions inside main() and pass them down to get_sub_banks().

job_archive_interface: implement job usage calculation

Okay, now that #57 has landed and the job-archive interface functions are their own module, we can move on and start to extend this module's functionality beyond just job record listing.

We discussed last week about this module playing a part in some of the "internal" work for fair share calculation. Although the static information comes directly from the flux-accounting database (user account information and bank information), this specific module (which interacts with flux-core's job-archive module) can fetch information pertaining to job history and calculate the usage factor in fair share.

Slurm's definition of usage

If we wanted to calculate job usage according to Slurm's definition, it would be the following:

U_normalized = U_user / U_total

where:

U_user is the processor * seconds consumed by all of a user's jobs in a given account over a fixed time period, and U_total is the total number of processor * seconds utilized across the cluster during that same time period.

So, I think this would entail fetching job records for some user U, summing up the number of cores and multiplying it by t_elapsed across those same jobs, and dividing it by the number of cores * t_elapsed across all job records in the job-archive. To only take the jobs that have completed in the past x minutes, we can add another filter when fetching job records to only fetch jobs that have completed after some certain time t (this could probably help represent the decay factor in calculating job usage, where more recent jobs weigh heavier than older jobs).

One note I thought of that I think we should keep in mind: in another issue thread, @ryanday36 made this good comment about charging usage:

Accounting by total number of cores works fine as long as users get charged for all of the cores when we give them a whole node. i.e., in slurm, a user can run something like 'srun -n1 ...' and only ask for 1 task on one core, but, on most of our clusters, we're giving them the whole node anyway, so we need to charge them for all 36 (or whatever) cores.

So, if I am understanding this correctly, that means even if a user submits a job with the following job spec:

> flux mini submit -n 1 hostname

this would mean we would need to charge for the whole node when performing the usage calculation, right? If that's the case, would our usage calculation instead become a product of the total num_nodes and t_elapsed across a user's jobs?

FWIW, I submitted two jobs in the fluxorama Docker container:

> [fluxuser@d215a637d63f flux-ba]$ flux mini submit -n 1 hostname
ƒAXNoQrj
> [fluxuser@d215a637d63f flux-ba]$ flux mini submit -N 1 hostname
ƒD3dvZMq

and the resulting job specs are fairly similar:

ƒAXNoQrj

[
  {
    "type": "slot",
    "count": 1,
    "with": [
      {
        "type": "core",
        "count": 1
      }
    ],
    "label": "task"
  }
]
ƒD3dvZMq

[
  {
    "type": "node",
    "count": 1,
    "with": [
      {
        "type": "slot",
        "count": 1,
        "with": [
          {
            "type": "core",
            "count": 1
          }
        ],
        "label": "task"
      }
    ]
  }
]

So I could probably re-use the count_ranks() helper function to count the number of nodes for a user's jobs and the t_elapsed property for all of the job records. And, from speaking with @chu11, to get a core count from a node, we'd have to look up node hardware info, which isn't stored in a job's R as of now.

So, maybe to start, I can calculate a usage factor based on nnodes * t_elapsed, and then adjust if we need to instead calculate a precise number of cores.
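
A starting-point sketch of that calculation, assuming job records are dicts exposing nnodes, t_run, and t_inactive (hypothetical field names):

def calc_usage_factor(user_jobs, all_jobs):
    # U_user: node-seconds consumed by this user's jobs over the time period
    u_user = sum(j["nnodes"] * (j["t_inactive"] - j["t_run"]) for j in user_jobs)
    # U_total: node-seconds consumed across all job records in the same period
    u_total = sum(j["nnodes"] * (j["t_inactive"] - j["t_run"]) for j in all_jobs)
    return u_user / u_total if u_total > 0 else 0.0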

create_db.py: break up current database into multi-table DB

Currently, create_db.py only creates one table that holds all of the information returned by flux job list-inactive.

write_jobs.py writes all job record information to that one table, inactive. We will want to convert this into a multi-table database, one that probably contains the following tables to start:

  1. job table and/or job step table
     • job statistics for every job (job id, user, group, project/wckey, nodes, submit time, start time, end time, elapsed time, hosts)
  2. node state changes with reasons for entering a failed state
  3. table for tracking wckeys
  4. a user table to differentiate different roles such as an operator, coordinator, and admin role

"flux-account.py -h" takes 5 secs to print out messages

This may be due to the way that I built my flux-accounting development environment. But there seems to be an excessive startup cost from flux-account.py. Is this expected?

(local_venv)  [fluxuser@7b0a61e07aa7 prototype]$ time flux-account.py -h
usage: flux-account.py [-h] [-p PATH] [-o OUTPUT_FILE]
                       {view-user,add-user,delete-user,edit-user,view-job-records,create-db,add-bank,view-bank,delete-bank,edit-bank,print-hierarchy}
                       ...

Description: Translate command line arguments into SQLite instructions for the
Flux Accounting Database.

positional arguments:
  {view-user,add-user,delete-user,edit-user,view-job-records,create-db,add-bank,view-bank,delete-bank,edit-bank,print-hierarchy}
                        sub-command help
    view-user           view a user's information in the accounting database
    add-user            add a user to the accounting database
    delete-user         remove a user from the accounting database
    edit-user           edit a user's value
    view-job-records    view job records
    create-db           create the flux-accounting database
    add-bank            add a new bank
    view-bank           view bank information
    delete-bank         remove a bank
    edit-bank           edit a bank's allocation
    print-hierarchy     print accounting database

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  specify location of database file
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        specify location of output file
(local_venv)  [fluxuser@7b0a61e07aa7 prototype]$ time flux-account.py create-db ./FluxAccounting.db

real	0m7.108s
user	0m0.690s
sys	0m0.733s

priority plugin: factor gap analysis

As discussed in our remote coffee hour yesterday, and a little in #7, I think this might be a good place to record our gap analysis for multi-factor job priority. I'll make a table listing the factors used, and we can edit this comment as we look at what we currently have in our architecture vs. what we don't.

Factor | Description | does Flux already manage this?
------ | ----------- | ------------------------------
Age | the length of time a job has been waiting in the queue, eligible to be scheduled | job-manager's t_submit
Association | a factor associated with each association | this could be managed in its own table in flux-framework/flux-accounting/create_db.py, which is planned to contain an association table for each cluster
Fair-share | the difference between the portion of the computing resource that has been promised and the amount of resources that has been consumed | we discussed multiple approaches on how this would be implemented, but one idea would be to have a fairshare.py program that can calculate a fairshare value for a user when its userid is passed in
Job size | the number of nodes or CPUs a job is allocated |
Nice | a factor that can be controlled by users to prioritize their own jobs | job-manager's priority
Partition | a factor associated with each node partition |
QOS | a factor associated with each Quality Of Service |
Site | a factor dictated by an administrator or a site-developed job_submit or site_factor plugin |
TRES | each TRES Type has its own factor for a job which represents the number of requested/allocated TRES Type in a given partition |

Each of these factors is a value from 0 to 1 and can be assigned its own weight. For example, we could place a greater weight on fair-share (or fair-tree) and reduce the weight on Site or QOS. These factors produce a weighted sum using all of the factors listed:

Job_priority =
	site_factor +
	(PriorityWeightAge) * (age_factor) +
	(PriorityWeightAssoc) * (assoc_factor) +
	(PriorityWeightFairshare) * (fair-share_factor) +
	(PriorityWeightJobSize) * (job_size_factor) +
	(PriorityWeightPartition) * (partition_factor) +
	(PriorityWeightQOS) * (QOS_factor) +
	SUM(TRES_weight_cpu * TRES_factor_cpu,
	    TRES_weight_<type> * TRES_factor_<type>,
	    ...)
	- nice_factor

example configuration:

PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=99000
PriorityWeightJobSize=0
PriorityWeightQOS=200000

This would result in an equation like the following. If no specific weight is listed in the .conf file, I believe its default value is 1.

Job_priority =
	site_factor +
	(1000) * (age_factor) +
	(1) * (assoc_factor) +
	(99000) * (fair-share_factor) +
	(0) * (job_size_factor) +
	(1) * (partition_factor) +
	(200000) * (QOS_factor) +
	SUM(TRES_weight_cpu * TRES_factor_cpu,
	    TRES_weight_<type> * TRES_factor_<type>,
	    ...)
	- nice_factor
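
As a quick sketch of that weighted sum in code (TRES terms omitted for brevity; the weights mirror the example configuration, with unset weights defaulting to 1):

weights = {
    "age": 1000,
    "assoc": 1,      # not set in the example .conf, defaults to 1
    "fairshare": 99000,
    "job_size": 0,
    "partition": 1,  # not set in the example .conf, defaults to 1
    "qos": 200000,
}

def job_priority(factors, site_factor=0, nice_factor=0):
    # each factor is a value in [0, 1]; its weight scales its contribution
    weighted = sum(weights[name] * factors.get(name, 0.0) for name in weights)
    return site_factor + weighted - nice_factor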

create_db.py: make file path to database configurable

create_db.py, the script responsible for creating the database file and the table inside the database, does not take any file path arguments. This means that when this script is called, the .db file (FluxAccounting.db) gets placed in the same directory:

.
|-- AUTHORS
|-- CONTRIBUTING.md
|-- COPYING
|-- LICENSE -> COPYING
|-- Makefile
|-- NOTICE -> NOTICE.LLNS
|-- NOTICE.LLNS
|-- README.md
|-- accounting
|   |-- FluxAccounting.db
|   |-- __init__.py
|   |-- accounting_cli.py
|   |-- accounting_cli_functions.py
|   |-- create_db.py

This is because of the sqlite3.connect() line in create_db.py, which takes a filepath string as an argument:

conn = sqlite3.connect(filepath)

which, for now, is just set to the same directory as create_db.py:

def main():
    create_db("FluxAccounting.db")

From talks with @SteVwonder, this functionality probably needs to undergo a couple of changes:

  • the database file probably needs to be created in a different location, one that can be specified by the user when they call this script. Probably something like the following:
$ ./create_db.py path/to/db

This change will also need to be reflected in accounting_cli.py, because it establishes a connection to the .db file when it is called with any of the subcommands (add-user, edit-user, etc.).

Maybe a strategy to accomplish this is to create a subcommand for both scripts to set the .db file location?

$ ./create_db.py set-db-location --filepath=/path/to/db
$ flux account set-db-location --filepath=/path/to/db
  • the commands that make a connection to/interact with the database probably need a failsafe of some kind in case they fail to establish a connection to the database. If they don't, the sqlite3 API will default to creating an empty .db file, which is no good. I think a workaround for this would be to interpret the database path as a URI; this way, we can enable read/write-only mode for the subcommands [1] (a fuller sketch follows the snippet below).
conn = sqlite3.connect('file:path/to/db?mode=rw', uri=True)
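
A sketch of that failsafe, wrapping the URI-mode connect so a missing database produces a readable error instead of a silently created empty file:

import sqlite3
import sys

def connect_rw(path):
    try:
        # mode=rw fails if the file does not exist, rather than creating it
        return sqlite3.connect("file:{}?mode=rw".format(path), uri=True)
    except sqlite3.OperationalError as exc:
        sys.exit("could not open {}: {}".format(path, exc))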

create_db.py: need to be able to create multiple accounts with the same username

The current structure of the association_table in flux-accounting has the restriction of unique usernames. However, I recently learned that there is often one user with multiple banks (accounts), so we’ll need the ability to create multiple users with the same username.

This should be a relatively simple fix, for the current table has the following parameters:

CREATE TABLE IF NOT EXISTS association_table (
                id_assoc      integer                           PRIMARY KEY,
                creation_time bigint(20)            NOT NULL,
                mod_time      bigint(20)  DEFAULT 0 NOT NULL,
                deleted       tinyint(4)  DEFAULT 0 NOT NULL,
                user_name     tinytext    UNIQUE    NOT NULL,
                admin_level   smallint(6) DEFAULT 1 NOT NULL,
                account       tinytext              NOT NULL,
                parent_acct   tinytext              NOT NULL,
                shares        int(11)     DEFAULT 1 NOT NULL,
                max_jobs      int(11)               NOT NULL,
                max_wall_pj   int(11)               NOT NULL
);

If we remove the UNIQUE constraint from the user_name field, we should be able to create multiple associations with the same username but different accounts.

job-archive: prefer a command line interface to accounting database to querying the database directly

On the question about whether it’s preferable for me to write my own scripts to directly query the database vs. having a front-end interface, I think it’s preferable to have a front-end interface. In general, more folks are comfortable with a command line interface than with directly interacting with the database, and a front-end interface can allow you more granularity in what types of queries different users are able to make. In Slurm, the separation of the database and the front-end interface also allows them to make changes to the database to add new functionality etc. without changing the user interface to the job data.

repo: clean up unused 'import' statements

A series of files in src/bindings/python/flux/accounting/ have unused import statements in them, as revealed by flake8:

accounting_cli_functions.py

accounting_cli_functions.py:14:1: F401 'pwd' imported but unused
accounting_cli_functions.py:15:1: F401 'csv' imported but unused

create_db.py

create_db.py:13:1: F401 'pandas as pd' imported but unused
create_db.py:15:1: F401 'argparse' imported but unused

job_archive_interface.py

job_archive_interface.py:12:1: F401 'sqlite3' imported but unused
job_archive_interface.py:16:1: F401 'os' imported but unused

print_hierarchy.py

print_hierarchy.py:12:1: F401 'sqlite3' imported but unused

view_job_records(): separate function into its own module

view_job_records()'s use case as of now is pretty much for generating job usage reports. You can pass a series of options to the function which dynamically generates a SQL statement to fetch job records which will get printed to stdout (or, you can specify an output file with the -o option and they will get written there instead, delimited by |):

[fluxuser@4b2cc1954de4 flux-ba]$ flux account -p /var/lib/flux/jobs.db view-job-records
   UserID  Username          JobID         T_Submit            T_Run       T_Inactive  Nodes                                                                              R
0    2201     user1   690986418176 1602260764.59201 1602260764.61358 1602260764.69989      1    {"version":1,"execution":{"R_lite":[{"rank":"0","children":{"core":"0"}}]}}
1    2201     user1   627987972096 1602260760.83690 1602260760.86002 1602260760.94067      1  {"version":1,"execution":{"R_lite":[{"rank":"0","children":{"core":"0-1"}}]}}
2    2001  fluxuser   417870118912 1602260748.31297 1602260748.33159 1602260748.39556      1    {"version":1,"execution":{"R_lite":[{"rank":"0","children":{"core":"0"}}]}}
3    2001  fluxuser   402435080192 1602260747.39266 1602260747.41392 1602260747.50026      1    {"version":1,"execution":{"R_lite":[{"rank":"0","children":{"core":"0"}}]}}
4    2202     user2  1093119508480 1602260788.56110 1602260788.58296 1602260788.66995      1  {"version":1,"execution":{"R_lite":[{"rank":"0","children":{"core":"0-2"}}]}}
5    2202     user2  1033342287872 1602260784.99817 1602260785.01889 1602260785.10323      1    {"version":1,"execution":{"R_lite":[{"rank":"0","children":{"core":"0"}}]}}

As discussed in our weekly team meeting yesterday, another use case for this function could be to fetch job information for the purpose of calculating usage values for users.

I think a good first step would be to separate this function from accounting_cli_functions.py and place it in its own file, where we can extend its functionality to support other use cases.

repo: squash previous commits into one "initial" commit

When importing flux-accounting into flux-framework, all of the previous (and very messy) commit history came along with it. These should be squashed into one "initial commit", which will allow all future work to comply with the C4.1 process.

flux account -h errors out

I used the python virtual environment to create a local development environment (e.g., using python setup.py develop).

(local_venv)  [fluxuser@7b0a61e07aa7 test]$ which flux-account.py
/usr/src/local_venv/bin/flux-account.py

And then I set up

(local_venv)  [fluxuser@7b0a61e07aa7 test]$ export FLUX_EXEC_PATH_PREPEND=/usr/src/local_venv/bin

But the flux account command encounters an error:

(local_venv)  [fluxuser@7b0a61e07aa7 test]$ flux account -h
Traceback (most recent call last):
  File "/usr/src/local_venv/bin/flux-account.py", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3095, in <module>
    @_call_aside
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3079, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3108, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 570, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 888, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 774, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'flux-accounting' distribution was not found and is required by the application

create_db.py: primary key for bank_table should be a fixed data type

Peeled off from one of the comments from #34:

I think people typically use a fixed data type like integer as the primary key because this will have performance advantages: Please see, for example, the following -- https://stackoverflow.com/questions/15477005/what-are-the-pros-and-cons-for-choosing-a-character-varying-data-type-for-primar

We may want to consider changing the primary key for the bank_table to an integer, maybe in the form of a bank_id or something.

print-hierarchy: add usage factor to print-hierarchy

After #79 lands, one of the immediate next steps will be to add the value returned by calc_usage_factor() to the output of print-hierarchy:

$ flux account print-hierarchy
Bank|User|RawShares|Usage
A||1
 B||1
  D||1
   D|user1|1|123
  E||1
 C||1
  F||1
   F|user2|1|456
   F|user3|1|789
  G||1
   G|user4|1|123

We can add this usage value to the association_table in the flux-accounting database for every user/bank combination in the database:

usage real DEFAULT 0.0,

And add an update_stmt to this value in the association_table at the end of calc_usage_factor().
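
For instance, the tail end of calc_usage_factor() could write the value back with something like the following sketch (the usage column matches the schema addition above; the WHERE keys are assumed):

conn.execute(
    "UPDATE association_table SET usage = ? WHERE username = ? AND bank = ?",
    (usage_factor, username, bank),
)
conn.commit()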

The other thing we'll need to change is the query used to fetch user information in print_hierarchy.py and print_hierarchy.cpp:

"""
SELECT association_table.username,
association_table.shares,
association_table.usage,
association_table.bank
FROM association_table
WHERE association_table.bank=?
"""

Add dependencies into README.md etc

It would be good to add the dependencies needed to play with these tools. I created a docker image off of flux-sched, which seems to bring in most of them, but not all: pandas, for example.

I was a bit lost in my attempt to play with these tools. If these tools can be used to some extent at this point, it will be good to add them to README.md.

Finally, a typo in the last sentence:

< ... retrieve user account information amd job record history and

... retrieve user account information and job ...

[discussion] where should multi-factor priorities be enforced

Spin off from #7 (comment)

We still haven't decided whether the multi-factor priority plugin will sort jobs at the job-manager level or at the external scheduler (e.g., flux-sched) level. While my preference is to do this at the job-manager level, we have to ensure this will not lead to the "ALLOC" thrashing problem. Let me open up a ticket and reason about whether the "ALLOC" thrashing will be a real issue or not.

This really goes back to how often multi-factor priorities could lead to a job ordering change in job-manager. Maybe we can reason about this step by step by looking at one factor at a time and also whether there will be combined effect when multiple factors interplay (e.g., queue time + job size...)

create_db.py: add error handling for failed DB creation

As mentioned in the comments in #36, there is no error handling in the create_db function. Even if the database fails to be created, there will be no error message or return code. We'll need to add some output in case the database fails to get created.
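
A sketch of what that could look like, assuming we just want an error message and a nonzero return code when the file cannot be created:

import sqlite3
import sys

def create_db(filepath):
    try:
        # mode=rwc creates the file if it does not already exist
        conn = sqlite3.connect("file:{}?mode=rwc".format(filepath), uri=True)
        # ... CREATE TABLE statements go here ...
        conn.commit()
        conn.close()
    except sqlite3.OperationalError as exc:
        print("Unable to create database file {}: {}".format(filepath, exc), file=sys.stderr)
        return 1
    return 0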

job archive interface tests: integrate unittest.mock()

@SteVwonder pointed out in #79 that time.time() can sometimes cause problems with unit tests. The job-archive-interface unit tests use time.time() in a number of tests, specifically when creating the fake job records and their timestamps.

We can instead use mock to override time.time() within our tests:

from unittest import mock

@mock.patch('time.time', mock.MagicMock(return_value=12345))
def test_00_some_unit_test(self):

[discussion] fairshare: implementation strategy of fairshare factor for submitted jobs

I figured this thread could be used to discuss the implementation strategy for a fairshare factor for users when submitting jobs.

Some background - my notes from Slurm docs

Users are granted permission to submit jobs against specific accounts. A user is granted a certain number of shares of the account they can charge jobs against.

flux-accounting keeps track of the number of shares allocated to a user in its association_table. To calculate a user's normalized shares, we would also need the number of shares allocated to all the siblings, to the account, its sibling accounts, its parent account, parent-siblings, and so on...

Usage for jobs is calculated using processor*seconds for every job, tracked in real time alongside a decay factor D, a number of seconds used to de-prioritize older job usage. That is then divided by the total number of processor*seconds to find a user's normalized usage spanning multiple time periods.

To my knowledge, the usage we care about is node usage, so maybe instead of processor*seconds, we can use node*seconds, which the job-archive DB already tracks, under the R field.

So, the simplified formula for calculating a fair-share factor for usage that spans multiple time periods and subject to a half-life decay is:

F = 2**(-U/S)

where:

F is the fair-share factor

S is the normalized shares

U is the normalized usage factoring in half-life decay (by default, this is five minutes)

But, since accounts have multiple users charging jobs against them, another layer of fairness is needed. A job's fair-share factor must be influenced by the computing resources delivered to jobs of other users drawing from the same account. So, the actual formula used is a refinement of the above:

F = 2**(-U/S)

where:

U is really effective usage, which is defined as:

U = U(child) + ( (U(parent) - U(child)) * S(child)/S(siblings) )
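
Put together as a sketch, transcribing the two formulas above directly:

def effective_usage(u_child, u_parent, s_child, s_siblings):
    # U = U(child) + ((U(parent) - U(child)) * S(child)/S(siblings))
    return u_child + (u_parent - u_child) * (s_child / s_siblings)

def fairshare_factor(u, s):
    # F = 2**(-U/S), where U is effective usage and S is normalized shares
    return 2 ** (-u / s)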


Implementation breakdown

So, flux-accounting already has a field that keeps track of the shares allocated to a user within an account. If no shares are specified when adding a user, then that user is given 1 share.

In terms of tracking usage, I think flux-core's job-archive will be able to calculate a user's job usage with the R field. Elapsed time can be calculated by subtracting t_run from t_inactive. nodes should also be able to be extracted from job-archive.

At first glance, it seems to me we would need to gather input from three different sources to generate a fair share factor:

  • the current queue of jobs to see which users have submitted jobs
  • the number of shares allocated to each user from flux-accounting's association_table
  • the most recent job usage from each user (and their respective account) from flux-core's job-archive. (EDIT: the most recent job usage from each user will probably be grabbed from memory, not a persistent database)

Is this a sound strategy? This probably warrants some more discussion as to whether this route is the one we want to pursue.

Is the scheduler the one that ultimately applies this fair share factor (and consequently re-orders jobs) to all of the jobs submitted to a queue? Maybe a fairshare calculation can be done elsewhere and passed to the scheduler?

[discussion] FluxAccounting.db: database file protection

A question that came up during today's call was how to protect the SQLite database file in flux-accounting.

One option that comes to mind is to protect the file with just Unix file permissions, only allowing read/write access to the admins and/or those who manage user accounts.

From just a few minutes of Googling, it looks like the sqlite3 module doesn't provide a direct way to password-protect the .db file on creation. Instead, an outside tool like SQLCipher would have to be installed. I don't know that I prefer this option, because it would add another dependency.

build system: convert to autotools

In a meeting today, we discussed converting over from setuptools to autotools. Benefits:

  • More consistent across the various flux-framework repos
  • Allows C++ within this repo
  • Flux-core already has the automake/TAP magic to get python unittests working with the build system and make check
  • Flux-sched already has the automake/configure magic to easily install a project within the same prefix as flux-core

Plan of attack:

  • @dongahn will do an initial copy over from flux-sched to support C++
  • @SteVwonder will do a copy over from flux-core of the python/TAP magic

create_db.py: add ability to create database with subcommand instead of calling script directly

The current way to create the flux-accounting database is to call the Python script create_db.py directly:

$ ./create_db.py -p path/to/FluxAccounting.db

It might be a good idea to instead wrap this under a subcommand so that it doesn't have to be run directly from the same directory where the file lies:

$ flux account create-db path/to/FluxAccounting.db

This should be a relatively straightforward change, since the script can already be imported in order to run some of the unit tests:

from accounting import create_db as c

# create database and make sure it exists
def test_00_test_create_db(self):
    c.create_db("FluxAccounting.db")

    assert os.path.exists("FluxAccounting.db")
