
cybergis / cybergis-compute-python-sdk

Home Page: https://cybergis.github.io/cybergis-compute-python-sdk

License: Apache License 2.0

Python 58.49% Jupyter Notebook 41.51%

Contributors: alexandermichels, jtsiv1, mitkotak, sanhitad207, taylorziegler, zhiyuli, zimo-xiao

cybergis-compute-python-sdk's Issues

[Bug] list_job() broken

The list_job() function seems to be broken. To replicate, run a job and then run:

cybergis.list_job()

The cause seems straightforward: it throws an error if remoteExecutableFolder is None. I've attached a screenshot and a copy/paste of the error below:

Screenshot: [Screenshot from 2022-09-03 15-27-39]

Copy/Paste:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 cybergis.list_job()

File /data/cigi/cybergisx-easybuild/conda/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/CyberGISCompute.py:277, in CyberGISCompute.list_job(self, raw)
    271 data = []
    272 for job in jobs['job']:
    273     data.append([
    274         job['id'],
    275         job['hpc'],
    276         job['remoteExecutableFolder']["id"] if (
--> 277             "id" in job['remoteExecutableFolder']) else None,
    278         job['remoteDataFolder']["id"] if (
    279             "id" in job['remoteDataFolder']) else None,
    280         job['remoteResultFolder']["id"] if (
    281             "id" in job['remoteResultFolder']) else None,
    282         job['remoteDataFolder'],
    283         job['remoteResultFolder'],
    284         json.dumps(job['param']),
    285         json.dumps(job['slurm']),
    286         job['userId'],
    287         job['maintainer'],
    288         job['createdAt'],
    289     ])
    291 if self.isJupyter:
    292     if len(data) == 0:

TypeError: argument of type 'NoneType' is not iterable
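A minimal sketch of a None-safe lookup for the folder IDs, assuming each job record is a dict in which remoteExecutableFolder, remoteDataFolder, and remoteResultFolder may be either a dict or None, as the traceback suggests. The helpers folder_id and job_row are hypothetical, not part of the SDK; job_row only mirrors the first few columns that list_job() assembles.

```python
from typing import Any, Mapping, Optional


def folder_id(folder: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Return folder["id"] if the folder record exists and has an id, else None.

    The original expression `folder["id"] if ("id" in folder) else None`
    raises TypeError when folder is None, because `in` cannot iterate None.
    """
    if not folder:
        return None
    return folder.get("id")


def job_row(job: Mapping[str, Any]) -> list:
    # Hypothetical row builder mirroring the first columns built in list_job().
    return [
        job["id"],
        job["hpc"],
        folder_id(job.get("remoteExecutableFolder")),
        folder_id(job.get("remoteDataFolder")),
        folder_id(job.get("remoteResultFolder")),
    ]


# A job whose folders were never created, as in the failing case.
print(job_row({"id": "1663963513C5Ygc", "hpc": "keeling_community",
               "remoteExecutableFolder": None,
               "remoteDataFolder": None,
               "remoteResultFolder": None}))
# → ['1663963513C5Ygc', 'keeling_community', None, None, None]
```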

Cancel Button missing in SDK

Issue

The SDK currently does not support canceling a job. This is a problem for users who accidentally submit a job and want to cancel it.

Solution

The proposed solution requires changes in both Core and the SDK.

Core

  • Add job/:jobId/delete_job path in server.ts
  • The validation code will be identical to that of job/:jobId/create_job
  • Create supervisor.popJobFromQueue(job), similar to supervisor.pushJobToQueue(job), in src/Supervisor.ts
  • Create this.queues[job.hpc].pop(job), similar to this.queues[job.hpc].push(job) (would this change only support removing recently submitted jobs?)
  • Create pop(item: Job), similar to push(item: Job), in src/Queue.ts
  • Modify redis.conf to support popping jobs

SDK

  • Add a delete method that calls /job/:jobId/delete, similar to submit
  • In the UI, create renderDelete and onDeleteButtonClick
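The open question in the Core bullets (whether pop(job) would only remove recently submitted jobs) depends on whether the queue removes from one end or by identity. As a language-neutral illustration, here is a Python sketch of an in-memory FIFO queue whose pop(item) removes a specific job by ID wherever it sits. Core's real queue is TypeScript and appears to be Redis-backed (per the redis.conf bullet); the JobQueue class and its methods are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Job:
    id: str
    hpc: str


class JobQueue:
    """Hypothetical FIFO queue of pending jobs for a single HPC."""

    def __init__(self) -> None:
        self._items: Deque[Job] = deque()

    def __len__(self) -> int:
        return len(self._items)

    def push(self, item: Job) -> None:
        # New submissions join the back of the queue.
        self._items.append(item)

    def next_job(self) -> Optional[Job]:
        # The scheduler takes the oldest pending job from the front.
        return self._items.popleft() if self._items else None

    def pop(self, item: Job) -> bool:
        """Remove the queued job with item.id, wherever it sits in the queue.

        Returns False when the job is not queued (for example, it has already
        been handed to the HPC), so cancellation needs a separate path there.
        """
        for index, queued in enumerate(self._items):
            if queued.id == item.id:
                del self._items[index]  # safe: we return before iterating further
                return True
        return False


queues = {"keeling_community": JobQueue()}
for job_id in ("a", "b", "c"):
    queues["keeling_community"].push(Job(job_id, "keeling_community"))
queues["keeling_community"].pop(Job("b", "keeling_community"))  # removes a job from the middle
```

Removing by identity means any still-queued job can be canceled, not only the newest one; jobs already running on the HPC would still need a separate cancellation mechanism.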

Users Need Quick Access to CyberGIS-Compute Usernames

With the introduction of the allow/deny list in Core, users need access to their usernames without submitting jobs. The "Logged in as ...." message is displayed inconsistently, so adding the username to the UI would likely be the simplest approach.

[Feature Request] An Interface to create manifest

Feature Request

One of the tasks for model contributors to CyberGIS-Compute is to create a manifest and add it to the underlying GitHub repository. Manifest creation is currently manual. It would be ideal if we could create a simple interface that generates a manifest from user inputs.

Describe the solution you'd like
An interface or a web form for model contributors to generate manifest files.
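As a starting point, here is a Python sketch of the backend step such a form would need: turning user inputs into manifest.json text with validation. It only covers the param_rules structure (type, require, options, default_value) seen in the wrfhydro manifest quoted in the IGUIDE request on this page; the top-level name and description fields are assumptions, and a real manifest likely needs more fields.

```python
import json
from typing import Any, Dict, List, Optional

# Only the parameter types seen in the wrfhydro example; others are not handled.
SUPPORTED_TYPES = {"string_input", "string_option"}


def param_rule(param_type: str, default_value: Any = None,
               options: Optional[List[str]] = None,
               require: bool = False) -> Dict[str, Any]:
    """Build one param_rules entry, validating the combination of fields."""
    if param_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported parameter type: {param_type}")
    rule: Dict[str, Any] = {"type": param_type}
    if require:
        rule["require"] = True
    if param_type == "string_option":
        if not options:
            raise ValueError("string_option parameters need a non-empty options list")
        if default_value is not None and default_value not in options:
            raise ValueError(f"default {default_value!r} is not one of {options}")
        rule["options"] = list(options)
    if default_value is not None:
        rule["default_value"] = default_value
    return rule


def build_manifest(name: str, description: str,
                   params: Dict[str, Dict[str, Any]]) -> str:
    """Assemble manifest.json text; 'name' and 'description' are assumed fields."""
    manifest = {
        "name": name,
        "description": description,
        "param_rules": {key: param_rule(**spec) for key, spec in params.items()},
    }
    return json.dumps(manifest, indent=4)


print(build_manifest("wrfhydro", "example model", {
    "LSM_Type": {"param_type": "string_option", "options": ["Noah", "NoahMP"],
                 "default_value": "NoahMP"},
    "Model_Version": {"param_type": "string_input", "require": True,
                      "default_value": "v5.2.0"},
}))
```

A web form would collect the same fields per parameter and call build_manifest; validating at generation time catches mistakes such as a default that is not among the options before the manifest reaches the repository.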

No "Submit New Job" Button if Job Restored

If you bring up the UI, go to the "Your Jobs" tab, and restore a random job, there is no "Submit New Job" button on the "Job Configuration" tab. See below:

[screenshot]

We should still include the "Submit New Job" button so that users can run a new job through the UI without having to re-run the "show_ui()" line.

[FeatureRequest-IGUIDE] Allow programmatically pre-filling values for model-specific parameters

The manifest.json in a model repository can specify model-specific parameters of different types, which are rendered in the SDK GUI.
A default value can also be defined in manifest.json for each parameter.

For example:
https://github.com/cybergis/cybergis-compute-v2-wrfhydro/blob/main/manifest.json#L35

"param_rules": {
    "Model_Version": {
        "type": "string_input",
        "require": true,
        "default_value": "v5.2.0"
    },
    "LSM_Type": {
        "type": "string_option",
        "options": ["Noah", "NoahMP"],
        "default_value": "NoahMP"
    },
    "Forcing_Path": {
        "type": "string_input",
        "require": true,
        "default_value": "<UPLOAD>"
    },
    "Merge_Output": {
        "type": "string_option",
        "options": ["True", "False"],
        "default_value": "True"
    }
},

IGUIDE is developing a simple "chained jobs" use case for wrfhydro in which two jobs run in sequence, and some of the model-specific parameters of the second job are determined by the outputs or job properties of the first job.
Currently, two jobs in one notebook can only act independently, and the notebook end user has to manually enter parameter values in the UI (textbox/dropdown/slider...) of the second job that match the outputs of the first job, which is error-prone.

I am proposing a new dictionary-type argument, "model_params", for the create_job_by_ui() function in the SDK,
with which the notebook developer can programmatically pre-fill values for model-specific parameters.

In the wrfhydro example, if the notebook developer wants to pre-fill two of the model-specific parameters:

# this value has to be identical to that of Job1
# ??? what is the correct way to retrieve model parameter values given a job object?
LSM_Type = job1.param_LSM_Type

# this value contains the job ID of Job1
# (a positional "{}" placeholder; the original "{Job1_ID}" would raise KeyError with .format(job1.id))
Forcing_Path = "{}/forcing".format(job1.id)

model_params_prefill = {
    "LSM_Type": LSM_Type,
    "Forcing_Path": Forcing_Path,
}
create_job_by_ui(..., model_params=model_params_prefill, ...)
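One way the proposed model_params argument could be resolved against the manifest: start from each parameter's default_value, override it with any pre-filled value, and reject names or option values the manifest does not declare. This sketches only that merge step; resolve_initial_values is a hypothetical helper, not an existing SDK function.

```python
from typing import Any, Dict, Optional


def resolve_initial_values(param_rules: Dict[str, Dict[str, Any]],
                           model_params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Compute the values the UI widgets should start with."""
    model_params = model_params or {}
    unknown = set(model_params) - set(param_rules)
    if unknown:
        raise KeyError(f"not declared in manifest param_rules: {sorted(unknown)}")
    values = {}
    for name, rule in param_rules.items():
        # Pre-filled value wins; otherwise fall back to the manifest default.
        value = model_params.get(name, rule.get("default_value"))
        options = rule.get("options")
        if options is not None and value is not None and value not in options:
            raise ValueError(f"{name}={value!r} is not one of {options}")
        values[name] = value
    return values


rules = {
    "LSM_Type": {"type": "string_option", "options": ["Noah", "NoahMP"],
                 "default_value": "NoahMP"},
    "Forcing_Path": {"type": "string_input", "require": True,
                     "default_value": "<UPLOAD>"},
}
# "JOB1_ID" is a placeholder for the first job's ID.
print(resolve_initial_values(rules, {"LSM_Type": "Noah", "Forcing_Path": "JOB1_ID/forcing"}))
```

Validating against the manifest keeps a programmatic pre-fill from silently putting a dropdown into a value it cannot display.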

SDK Logs/Downloads Throw Errors if Job Doesn't Run

Expected Behavior

When a job fails, we should not display "Job Completed" and we should not attempt to view/download logs or results, but instead notify the user that the job did not run, with a reason why it did not run, and next steps (run again, who to contact, etc.).

Actual Behavior

If a job is unable to run, the SDK says that the job is completed, attempts to view the logs, and throws errors. I assume a similar error will also occur if we try to download results automatically.

[screenshot]

The message says that the job completed, but we should check the job events to see whether that's actually true.

I assume this same issue comes up with Download Results and automatic download.

Steps to Reproduce the Problem

  1. Submit a job that fails to run because of malformed HPC configs.
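A sketch of the check described under Expected Behavior: look at the job's events before announcing "Job Completed", and only proceed to logs or downloads on success. It assumes each event is a dict with type and message keys and that a failed job produces an event of type JOB_FAILED; those names are assumptions about the Core's event format and should be checked against real job events.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: the Core's failure event types; check them against real job events.
FAILURE_EVENT_TYPES = {"JOB_FAILED"}


def job_outcome(events: List[Dict[str, str]]) -> Tuple[bool, Optional[str]]:
    """Return (succeeded, reason) for a job that has stopped running.

    reason is the message of the first failure event, or None on success.
    """
    for event in events:
        if event.get("type") in FAILURE_EVENT_TYPES:
            return False, event.get("message") or "no reason was reported"
    return True, None


def completion_message(events: List[Dict[str, str]]) -> str:
    succeeded, reason = job_outcome(events)
    if succeeded:
        # Only in this branch should the UI go on to show logs or download results.
        return "Job Completed"
    return (f"Job did not run: {reason}. "
            "You can fix the configuration and submit again, "
            "or contact the CyberGIS-Compute team.")
```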

Explanation of `/` in the Download Tab

Users report that they don't understand the dropdown in the Download tab. We should add text explaining that the dropdown options are the subdirectories of the result folder, not individual files, and that selecting / downloads the entire result folder.
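Besides explanatory text, the dropdown entries themselves could say what they are. A minimal sketch, assuming the UI has the list of result-folder subdirectory paths and builds (label, value) pairs, a form ipywidgets.Dropdown accepts for its options; download_options and the label wording are hypothetical.

```python
from typing import List, Tuple


def download_options(subdirectories: List[str]) -> List[Tuple[str, str]]:
    """Build (label, value) pairs for the download dropdown.

    "/" downloads the whole result folder; every other entry is a
    subdirectory of the result folder, not an individual file.
    """
    options = [("/ (entire result folder)", "/")]
    for path in sorted(subdirectories):
        if path == "/":
            continue  # already listed first
        options.append((f"{path} (subdirectory)", path))
    return options


print(download_options(["/output", "/logs"]))
# → [('/ (entire result folder)', '/'), ('/logs (subdirectory)', '/logs'), ('/output (subdirectory)', '/output')]
```

With these pairs as the Dropdown's options, users see the labels while the widget's value is still the plain path, so the download code would not need to change.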

Your Jobs tab page to show model name

The "Your Jobs" tab does not currently show the model name, so users cannot tell which model a job ran.

This happens on the current V2 branch (tag 2.1); I'm not sure whether it is a regression.

[screenshot]

[Feature Request] Redirect output to Jupyter Cell Console/Output

Feature Request

In the current setup, output from a submitted HPC job (status, etc.) goes to a separate tab in the CyberGIS-Compute interface widget. From the user's perspective, it would be better if this output were redirected to the cell output/console instead, which would improve overall user engagement with the CyberGIS-Compute interface.

Describe the solution you'd like
Status and other output from the submitted HPC job is visible in the output console of the cell containing the CyberGIS-Compute interface.

Describe alternatives you've considered
The existing implementation already offers an alternative: a tab in the CyberGIS-Compute widget showing status output from the submitted HPC job.
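A minimal sketch of the requested behavior: poll the job's events and print only the new ones, so they appear as ordinary output of the running cell rather than in a widget tab. fetch_events and is_finished are hypothetical callables standing in for whatever the SDK uses to query job status, and the event list is assumed to be append-only.

```python
import time
from typing import Callable, Dict, List


def stream_events(fetch_events: Callable[[], List[Dict[str, str]]],
                  is_finished: Callable[[], bool],
                  interval: float = 5.0,
                  emit: Callable[[str], None] = print) -> int:
    """Print newly seen events until the job finishes; return how many were printed."""
    seen = 0
    while True:
        finished = is_finished()  # check before fetching so final events are not missed
        events = fetch_events()
        for event in events[seen:]:
            emit(f"[{event.get('type', '?')}] {event.get('message', '')}")
        seen = len(events)
        if finished:
            return seen
        time.sleep(interval)
```

Passing a custom emit keeps the loop testable and would also let the same code write into a widget if both views are kept.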

SDK Errors out if cybergis_compute_user.json is Empty

Expected Behavior

The expected behavior would be that the SDK disregards/deletes the file and tries to authenticate using environmental variables.

Actual Behavior

If cybergis_compute_user.json is empty, the SDK seems to try to authenticate without credentials, which causes a vague "invalid input['is not of type(s) string']" error. See the screenshot below:

[screenshot]

Copy/Paste of the error stack from replicating:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 cybergis.show_ui()

File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/CyberGISCompute.py:555, in CyberGISCompute.show_ui(self, input_params, defaultJob, defaultDataFolder, defaultRemoteResultFolder, jupyterhubApiToken)
    553 if df is not None:
    554     self.ui.defaultRemoteResultFolder = df if df[0] == '/' else '/' + df
--> 555 self.ui.render()

File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/UI.py:78, in UI.render(self)
     73 """
     74 Render main UI by initializing, rendering,
     75 and displaying each component
     76 """
     77 self.init()
---> 78 self.renderComponents()
     79 divider = Markdown('***')
     80 # render main UI
     81 # 1. job template

File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/UI.py:155, in UI.renderComponents(self)
    153 self.renderResultLogs()
    154 self.renderDownload()
--> 155 self.renderRecentlySubmittedJobs()
    156 self.renderLoadMore()
    157 self.renderSubmitNew()

File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/UI.py:543, in UI.renderRecentlySubmittedJobs(self)
    541 if self.recently_submitted['output'] is None:
    542     self.recently_submitted['output'] = widgets.Output()
--> 543     jobs = self.compute.client.request('GET', '/user/job', {'jupyterhubApiToken': self.compute.jupyterhubApiToken})
    544 with self.recently_submitted['output']:
    545     display(Markdown('**Recently Submitted Jobs for ' + self.compute.username.split('@', 1)[0] + '**'))

File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/Client.py:67, in Client.request(self, method, uri, body)
     65     if 'messages' in data:
     66         msg = str(data['messages'])
---> 67     raise Exception('server ' + self.url + uri + ' responded with error "' + data['error'] + msg + '"')
     69 return data

Exception: server cgjobsup.cigi.illinois.edu:443/user/job responded with error "invalid input['is not of a type(s) string']"

This caused a user to lose access to CyberGIS-Compute for a few days and took quite a bit of debugging, so we need to make this login mechanism more robust.

Steps to Reproduce the Problem

  1. Login to CyberGISX
  2. Use the Hello World notebook once to populate a cybergis_compute_user.json file if necessary.
  3. Delete all of the contents of the file and save it.
  4. Try to run a job with CyberGISX using show_ui().

Specifications

  • Version: 0.2.3
  • Platform: CyberGISX and CJW
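A sketch of the expected behavior: treat an empty or unparseable cybergis_compute_user.json as missing, delete it, and fall back to the environment instead of sending empty credentials. The fallback reads JUPYTERHUB_API_TOKEN, which JupyterHub sets in single-user servers; the "token" key in the file and the helper names are assumptions, not the SDK's actual format or API.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional


def load_saved_login(path: Path) -> Optional[Dict[str, Any]]:
    """Return the saved login data, or None if the file is missing, empty, or invalid.

    Unusable files are deleted so the next successful login can recreate them.
    """
    if not path.exists():
        return None
    try:
        text = path.read_text(encoding="utf-8")
        data = json.loads(text) if text.strip() else None
    except (OSError, ValueError):  # json.JSONDecodeError is a subclass of ValueError
        data = None
    if not isinstance(data, dict) or not data:
        try:
            path.unlink()
        except FileNotFoundError:
            pass
        return None
    return data


def resolve_api_token(path: Path) -> Optional[str]:
    """Prefer a token from the saved file, else fall back to the environment."""
    saved = load_saved_login(path)
    token = saved.get("token") if saved else None  # hypothetical key name
    if isinstance(token, str) and token:
        return token
    return os.environ.get("JUPYTERHUB_API_TOKEN")
```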

Fail to download big zip file from job supervisor

@zimo-xiao
There was a large SUMMA job with a 5.9 GB zipped results file. The user failed to download it despite several attempts.


MemoryError                               Traceback (most recent call last)
in
----> 1 communitySummaJob.download(str(model_folder))

/opt/conda/envs/pysumma/lib/python3.7/site-packages/job_supervisor_client/Job.py in download(self, dir)
     92         dir = self.client.download('GET', '/supervisor/download/' + self.id, {
     93             "aT": self.JAT.getAccessToken()
---> 94         }, dir, self.protocol)
     95         print('file successfully downloaded under: ' + dir)
     96         return dir

/opt/conda/envs/pysumma/lib/python3.7/site-packages/job_supervisor_client/Client.py in download(self, method, uri, body, localDir, protocol)
     33         connection.request(method, uri, json.dumps(body), headers)
     34         response = connection.getresponse()
---> 35         body = response.read()
     36         contentType = response.getheader('Content-Type')
     37 

/opt/conda/envs/pysumma/lib/python3.7/http/client.py in read(self, amt)
    468             else:
    469                 try:
--> 470                     s = self._safe_read(self.length)
    471                 except IncompleteRead:
    472                     self._close_conn()

/opt/conda/envs/pysumma/lib/python3.7/http/client.py in _safe_read(self, amt)
    623             s.append(chunk)
    624             amt -= len(chunk)
--> 625         return b"".join(s)
    626 
    627     def _safe_readinto(self, b):

MemoryError:
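The traceback points at the cause: Client.download calls response.read() with no size argument, so http.client buffers the entire 5.9 GB body in memory before anything is written. The usual fix is to read fixed-size chunks and write each one to disk as it arrives. A sketch, where stream_to_file is a hypothetical helper that works with any file-like object supporting read(amt), including http.client.HTTPResponse:

```python
from typing import BinaryIO


def stream_to_file(response: BinaryIO, destination: str,
                   chunk_size: int = 8 * 1024 * 1024) -> int:
    """Copy the response body to destination in chunk_size pieces; return bytes written.

    Memory use stays around chunk_size regardless of the total download size.
    """
    written = 0
    with open(destination, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the body is exhausted
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

In Client.download this would replace `body = response.read()`; the Content-Type header can still be checked with response.getheader('Content-Type') before streaming, because headers are available as soon as getresponse() returns.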

Failing to set `jupyterhubHost` variable when JavaScript is not supported (e.g., JupyterLab)

Issue

The SDK cannot find JUPYTER_INSTANCE_URL in JupyterLab environments, and even when it can, JupyterLab does not support the JavaScript used to capture the variable as done here

Solution

Allow the user to specify the JupyterHub host IP and port, and migrate to the service URL instead of the Jupyter instance URL.

RoadMap

If enable_jupyter raises some kind of JavaScript exception, prompt the user to enter the URL and port, similar to login.
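A sketch of the roadmap item: try automatic detection first and, if it raises, ask the user for the host and port. detect stands in for the SDK's current JavaScript-based detection (via enable_jupyter) and is assumed to return a (host, port) tuple; resolve_jupyterhub_host and the default port of 443 are assumptions for illustration.

```python
from typing import Callable, Tuple


def resolve_jupyterhub_host(detect: Callable[[], Tuple[str, int]],
                            ask: Callable[[str], str] = input,
                            default_port: int = 443) -> Tuple[str, int]:
    """Return (host, port): try automatic detection, then fall back to asking the user."""
    try:
        return detect()
    except Exception as error:  # e.g. JavaScript is unavailable in JupyterLab
        print(f"Could not detect the JupyterHub host automatically: {error}")
    host = ask("JupyterHub host: ").strip()
    port_text = ask(f"Port [{default_port}]: ").strip()
    return host, int(port_text) if port_text else default_port
```

Injecting ask (defaulting to the built-in input) mirrors the existing login prompt while keeping the function testable.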

Allow to retrieve model results after SDK session dies

Users might not wait for a long-running job to finish, and the notebook session may die (the computer goes to sleep or the browser is closed). In other cases, the SDK UI stops updating the job status because of bugs or other problems.

Assuming the user knows the job_id, we need a way for the user to check the job status and retrieve model results in a new notebook session.

I wrote up a notebook for emergency use to handle this case (upon request from a user). But this notebook is NOT user-friendly.
https://github.com/cybergis/cybergis-compute-v2-wrfhydro/blob/main/retrieve_model_outputs.ipynb

It would be better to add a new function to the SDK that takes a job_id, displays (re-activates) the job status UI, and allows downloading results.

SDK Failing Silently if Hostname Not in Globus Configuration

If the hostname isn't registered in the Globus configuration, the SDK fails without any indication in the UI; the errors only go to the log at the bottom of the JupyterLab interface. This needs to fail gracefully and tell the end user what went wrong.

The error is below:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
File ~/.local/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/UI.py:600, in UI.onSubmitButtonClick.<locals>.on_click(change)
    598 data = self.get_data()
    599 if data['computing_resource'] != 'local_hpc':
--> 600     self.jupyter_globus = self.compute.get_user_jupyter_globus()
    601 if self.job['require_upload_data']:
    602     dataPath = self.uploadData['selector'].selected

File ~/.local/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/CyberGISCompute.py:537, in CyberGISCompute.get_user_jupyter_globus(self)
    530 def get_user_jupyter_globus(self):
    531     """
    532     Return the current job instance
    533 
    534     Returns:
    535          Job: Latest Job object instance
    536     """
--> 537     return self.client.request(
    538         'GET', '/user/jupyter-globus', {
    539             "jupyterhubApiToken": self.jupyterhubApiToken})

File ~/.local/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/Client.py:67, in Client.request(self, method, uri, body)
     65     if 'messages' in data:
     66         msg = str(data['messages'])
---> 67     raise Exception('server ' + self.url + uri + ' responded with error "' + data['error'] + msg + '"')
     69 return data

Exception: server cgjobsup-dev.cigi.illinois.edu:443/user/jupyter-globus responded with error "unknown host"
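This exception was raised inside a button callback (on_click in UI.onSubmitButtonClick), which is why it reached only the JupyterLab log rather than the notebook. A sketch of one way to surface such failures: wrap the callback so an exception becomes a readable message shown to the user. report_errors, the HINTS table, and the show callable are hypothetical; in the UI, show could write into an ipywidgets Output widget.

```python
import functools
from typing import Any, Callable

# Hypothetical mapping from server error text to advice for the user.
HINTS = {
    "unknown host": "This Jupyter host is not registered in the Globus configuration. "
                    "Please contact the CyberGIS-Compute team.",
}


def report_errors(show: Callable[[str], None]) -> Callable:
    """Decorator: run the callback, and on failure show a message instead of raising."""
    def decorator(callback: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(callback)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return callback(*args, **kwargs)
            except Exception as error:
                text = str(error)
                # Use a friendly hint when the error text matches a known case.
                hint = next((h for key, h in HINTS.items() if key in text), None)
                show(f"Submission failed: {hint or text}")
                return None
        return wrapper
    return decorator
```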

UI Not Fully Reset if User "Submits New Job"

This is a minor bug, but I wanted to document it here. I've noticed two small things don't get reset when a user runs a job and then clicks "Submit New Job" button on the "Job Configuration" tab:

  1. The "Job Configuration" Tab changes name to ":white_check_mark: Your Job Status" when a job completes and does not reset when clicking "Submit New Job."
  2. The "Download Job Result" Tab changes to ":white_check_mark: Download Job Result" and has information about the prior download. You can see both in the screenshot below:
    DownloadJobFiles

I think the solutions would be:

  1. Just don't change the "Job Configuration" tab name in the first place since it never really gives information on the job status.
  2. Remove the checkmark from the tab name of "Download Job Result" and remove the text about previous downloads.

Need to Catch HPC Denied Exception

Feature Request

Is your feature request related to a problem? Please describe.

The Core has a new feature (cybergis/cybergis-compute-core#99) which allows for allow/deny lists on HPCs. However, the exception is only shown in the Jupyter logs at the bottom-left of the page:
[screenshots]

Describe the solution you'd like

We should catch and better handle the exception so that users know what is wrong and what steps they can take to fix or work around it.

[Bug] random 504 error when downloading results back to jupyter via globus

This is a random error on the client side that may occur while downloading model results back to Jupyter using Globus.
The symptom is that the SDK throws an HTTP 504 (gateway timeout) error. After rerunning the show_ui cell and restoring the job, the UI shows the Globus download as still in progress even though it has finished (I could see the result files in Jupyter). As a result, the download path attribute of the SDK object is None, which breaks cells below that depend on its value, and you cannot re-download the results because the download always appears to be in progress.

I just registered this issue here and will post more error messages if it happens again.
As far as I can tell, this is NOT a new issue; I remember it happening several times months ago, but we did not get a chance to investigate the root cause.

A probable cause is on the Core side, where the Globus job status stops being tracked and updated because of some error.

[Bug] "Error displaying widget: model not found" with `--force-reinstall` flag

Expected Behavior

We should be able to install the SDK with the --force-reinstall flag.

Actual Behavior

When installing on CyberGISX (and using the --force-reinstall flag), the SDK seems to break. For example, on CyberGISX right now, if you run

pip install --force-reinstall git+https://github.com/cybergis/cybergis-compute-python-sdk.git

then restart the kernel and run the cells to connect to dev and bring up the UI, you get the following error:

Error displaying widget: model not found
📃 Found "cybergis_compute_user.json"
NOTE: if you want to login as another user, please remove this file
🎯 Logged in as [email protected]
Error displaying widget: model not found
hello_world Job Description: none

keeling_community HPC Description: none

Estimated Runtime: unknown

Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
😴 No Job to Work On
you need to submit your job first

⏳ Waiting for Job to Finish...
Error displaying widget: model not found
Recently Submitted Jobs for alexandermichels

id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1664464919Miqcu 942289 keeling_community {'id': '1664464920xVoco', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/1664464920xVoco', 'globusPath': '/1664464920xVoco', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:22:01.434Z', 'updatedAt': None, 'deletedAt': None} None {'id': '16644649210br1B', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/16644649210br1B', 'globusPath': '/16644649210br1B', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:22:01.742Z', 'updatedAt': None, 'deletedAt': None}
param slurm userId maintainer createdAt modelName
{"input_a": 50, "input_b": "foo"} {"time": "10:00", "num_of_task": 2, "cpu_per_task": 1} [email protected] community_contribution 2022-09-29T15:21:59.227Z hello_world
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1664464651phVmf 942287 keeling_community {'id': '16644646533Rf22', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/16644646533Rf22', 'globusPath': '/16644646533Rf22', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:17:34.373Z', 'updatedAt': None, 'deletedAt': None} None {'id': '1664464654mfhbf', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/1664464654mfhbf', 'globusPath': '/1664464654mfhbf', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:17:34.636Z', 'updatedAt': None, 'deletedAt': None}
param slurm userId maintainer createdAt modelName
{"input_a": 50, "input_b": "foo"} {"time": "10:00", "num_of_task": 2, "cpu_per_task": 1} [email protected] community_contribution 2022-09-29T15:17:30.757Z hello_world
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1663963513C5Ygc None keeling_community None None None
param slurm userId maintainer createdAt modelName
{} {} [email protected] community_contribution 2022-09-23T20:05:13.455Z None
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1663963051LhLUR None keeling_community None None None
param slurm userId maintainer createdAt modelName
{} {} [email protected] community_contribution 2022-09-23T19:57:30.989Z None
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1663277428FfZ06 None keeling_community None None None
param slurm userId maintainer createdAt modelName
{} {} [email protected] community_contribution 2022-09-15T21:30:27.518Z None
Error displaying widget: model not found
Error displaying widget: model not found
Welcome to CyberGIS-Compute
A scalable middleware framework for enabling high-performance and data-intensive geospatial research and education on CyberGIS-Jupyter

Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
📋 job events (live refresh)
Error displaying widget: model not found
📋 job logs
Output of this cell has been trimmed on the initial display.
Displaying the first 50 top outputs.
Click on this message to get the complete output.


Steps to Reproduce the Problem

  1. Reinstall the SDK on CyberGISX using the line !pip install --force-reinstall git+https://github.com/cybergis/cybergis-compute-python-sdk.git
  2. Restart the kernel
  3. Run
from cybergis_compute_client import CyberGISCompute

cybergis = CyberGISCompute(url="cgjobsup-dev.cigi.illinois.edu", isJupyter=True, protocol="HTTPS", port=443, suffix="v2")
cybergis.show_ui()

Specifications

  • Version: latest on Github (0.2.2 plus some commits), but was encountered before this version.
  • Platform: CyberGISX

JupyterLab widgets fail to render if user tries to install compute explicitly

Problem

The UI breaks if the user tries to install cybergis-compute-python-sdk on top of the version that is already provided.

import sys
!{sys.executable} -m pip install --ignore-installed git+https://github.com/cybergis/cybergis-compute-python-sdk.git

Screen Shot 2022-09-01 at 10 22 47 AM

Screen Shot 2022-09-01 at 10 20 07 AM

This happens because of dependency conflicts, as seen in the output of the pip command:

Screen Shot 2022-09-01 at 10 20 56 AM

Proposed Solution

Ask users to restart their container and avoid installing the cybergis-compute-python-sdk explicitly.
