cybergis / cybergis-compute-python-sdk
Home Page: https://cybergis.github.io/cybergis-compute-python-sdk
License: Apache License 2.0
The list_job() function seems to be broken. To replicate, run a job and then run:
cybergis.list_job()
It seems like a straightforward issue, though: it throws an error if the remoteExecutableFolder is None. I've attached a screenshot and copy/paste of the error below:
Copy/Paste:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 cybergis.list_job()
File /data/cigi/cybergisx-easybuild/conda/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/CyberGISCompute.py:277, in CyberGISCompute.list_job(self, raw)
271 data = []
272 for job in jobs['job']:
273 data.append([
274 job['id'],
275 job['hpc'],
276 job['remoteExecutableFolder']["id"] if (
--> 277 "id" in job['remoteExecutableFolder']) else None,
278 job['remoteDataFolder']["id"] if (
279 "id" in job['remoteDataFolder']) else None,
280 job['remoteResultFolder']["id"] if (
281 "id" in job['remoteResultFolder']) else None,
282 job['remoteDataFolder'],
283 job['remoteResultFolder'],
284 json.dumps(job['param']),
285 json.dumps(job['slurm']),
286 job['userId'],
287 job['maintainer'],
288 job['createdAt'],
289 ])
291 if self.isJupyter:
292 if len(data) == 0:
TypeError: argument of type 'NoneType' is not iterable
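The traceback points at the `"id" in job['remoteExecutableFolder']` check, which raises when the folder is None. A minimal sketch of a None-safe lookup follows, assuming each folder field is either None or a dict. `folder_id` is a hypothetical helper name used for illustration, not something that exists in the SDK, and the sample record only mimics the fields seen in the traceback.

```python
def folder_id(folder):
    """Return folder['id'] when folder is a dict that has one, else None."""
    if isinstance(folder, dict):
        return folder.get("id")
    return None


# Example job record shaped like the rows list_job() iterates over.
job = {
    "id": "1663963513C5Ygc",
    "remoteExecutableFolder": None,
    "remoteDataFolder": {"id": "1664464920xVoco"},
}
row = [
    job["id"],
    folder_id(job["remoteExecutableFolder"]),
    folder_id(job["remoteDataFolder"]),
    folder_id(job.get("remoteResultFolder")),  # missing key is treated like None
]
print(row)
```

Checking the type first, rather than testing membership directly, covers both the None case and a missing key.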
The SDK currently does not support cancelling a job. This can be an issue for users who accidentally submit a job that they want to cancel.
The proposed solution requires changes in both Core and the SDK.
Core:
- job/:jobId/delete_job path in server.ts, similar to job/:jobId/create_job
- supervisor.popJobFromQueue(job), similar to supervisor.pushJobToQueue(job), inside src/Supervisor.ts
- this.queues[job.hpc].pop(job), similar to this.queues[job.hpc].push(job) (Would this change only support removal of recently submitted jobs?)
- pop(item: Job), similar to push(item: Job), inside src/Queue.ts
- redis.conf to support job popping
SDK:
- delete, which makes a call to /job/:jobId/delete, similar to submit
- renderDelete and onDeleteButtonClick
With the introduction of the allow/denylist on Core, users need access to their usernames without submitting jobs. The "Logged in as ...." message appears to be inconsistent so adding the username to the UI would likely be the simplest approach.
One of the tasks for model contributors to CyberGIS-Compute is to create a manifest and add it to the underlying GitHub repository. The manifest creation process is currently manual. It would be ideal if we could create a simple interface to generate a manifest from user inputs.
Describe the solution you'd like
An interface or a web form for model contributors to generate manifest files.
If you bring up the UI, go to the "Your Jobs" tab, and then restore a random job, there is no "Submit New Job" button on the "Job Configuration" tab. See below:
We should include the "Submit New Job" button still so that users can run a new job through the UI without having to re-run the "show_ui()" line.
The manifest.json in a model repo allows model-specific parameters of different types to be specified, which are then rendered in the SDK GUI.
Default values can also be defined in manifest.json for each parameter.
For example:
https://github.com/cybergis/cybergis-compute-v2-wrfhydro/blob/main/manifest.json#L35
"param_rules": {
    "Model_Version": {
        "type": "string_input",
        "require": true,
        "default_value": "v5.2.0"
    },
    "LSM_Type": {
        "type": "string_option",
        "options": ["Noah", "NoahMP"],
        "default_value": "NoahMP"
    },
    "Forcing_Path": {
        "type": "string_input",
        "require": true,
        "default_value": "<UPLOAD>"
    },
    "Merge_Output": {
        "type": "string_option",
        "options": ["True", "False"],
        "default_value": "True"
    }
},
IGUIDE is developing a simple "chained jobs" use case for wrfhydro in which 2 different jobs are run in order, and some of the model-specific parameters of the 2nd job are determined by the outputs or job properties of the 1st job.
Currently, two jobs can only act independently in one notebook, and the notebook end user has to manually provide the correct parameter values in the UI (textbox/dropdown/slider...) of the 2nd job to match the outputs of the 1st job, which is error-prone.
I am proposing a new dictionary-type argument "model_params" for the create_job_by_ui() function in the SDK, with which the notebook developer can programmatically pre-fill values for model-specific parameters.
In the wrfhydro example, if the notebook developer wants to pre-fill 2 of the model-specific parameters:
# this value has to be identical to that of Job1
# ??? what is the correct way to retrieve model parameter values given a job object?
LSM_Type = job1.param_LSM_Type
# this value contains the job ID of Job1
# (a named placeholder needs a keyword argument, otherwise format() raises KeyError)
Forcing_Path = "{Job1_ID}/forcing".format(Job1_ID=job1.id)
model_params_prefill = {
    "LSM_Type": LSM_Type,
    "Forcing_Path": Forcing_Path
}
create_job_by_ui(..., model_params=model_params_prefill, ...)
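To make the proposal more concrete, here is a rough sketch of how pre-filled values could be merged into the "param_rules" from manifest.json before the UI is rendered. This is only an illustration under assumptions: `apply_prefill` is a hypothetical helper, not an existing SDK function, and the idea of overriding "default_value" is one possible design, not necessarily how create_job_by_ui() would handle it.

```python
import copy


def apply_prefill(param_rules, model_params):
    """Return a copy of param_rules with default_value overridden by model_params."""
    rules = copy.deepcopy(param_rules)  # leave the manifest data untouched
    for name, value in (model_params or {}).items():
        if name not in rules:
            raise KeyError("unknown model parameter: " + name)
        rule = rules[name]
        options = rule.get("options")
        # For option-type parameters, reject values the dropdown cannot show.
        if options is not None and value not in options:
            raise ValueError(
                "{!r} is not a valid option for {}".format(value, name))
        rule["default_value"] = value
    return rules


param_rules = {
    "LSM_Type": {
        "type": "string_option",
        "options": ["Noah", "NoahMP"],
        "default_value": "NoahMP",
    },
    "Forcing_Path": {
        "type": "string_input",
        "require": True,
        "default_value": "<UPLOAD>",
    },
}
prefilled = apply_prefill(
    param_rules,
    {"LSM_Type": "Noah", "Forcing_Path": "1664464919Miqcu/forcing"},
)
print(prefilled["LSM_Type"]["default_value"])
print(prefilled["Forcing_Path"]["default_value"])
```

Validating against "options" up front would surface a mismatch between the two jobs in the notebook instead of in the rendered widget.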
When a job fails, we should not display "Job Completed" and we should not attempt to view/download logs or results, but instead notify the user that the job did not run, with a reason why it did not run, and next steps (run again, who to contact, etc.).
If a job is unable to run, the SDK says that the job is completed, attempts to view the logs, and throws errors. I assume a similar error will also occur if we try to download results automatically.
The message says that the job completed; we should check the events to see if that's actually true.
I assume this same issue comes up with Download Results and automatic download.
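A rough sketch of what checking the events could look like is below. It assumes each event is a dict with "type" and "message" fields and that a failure is reported with an event type such as "JOB_FAILED" and a normal finish with "JOB_ENDED". Those field names and type strings are assumptions made for illustration and should be checked against what Core actually emits; `job_outcome` and `failure_reason` are hypothetical helpers.

```python
FAILURE_TYPES = {"JOB_FAILED"}  # assumed event type for a failed job
SUCCESS_TYPES = {"JOB_ENDED"}   # assumed event type for a finished job


def job_outcome(events):
    """Classify a job as 'failed', 'completed', or 'running' from its events."""
    types = [event.get("type") for event in events]
    # A failure anywhere in the history wins over a later "ended" event.
    if any(t in FAILURE_TYPES for t in types):
        return "failed"
    if any(t in SUCCESS_TYPES for t in types):
        return "completed"
    return "running"


def failure_reason(events):
    """Return the message of the first failure event, or None if there is none."""
    for event in events:
        if event.get("type") in FAILURE_TYPES:
            return event.get("message")
    return None


events = [
    {"type": "JOB_QUEUED", "message": "job queued"},
    {"type": "JOB_FAILED", "message": "unable to submit job"},
]
if job_outcome(events) == "failed":
    print("Job did not run: " + str(failure_reason(events)))
```

With a check like this, the UI could skip the log and download steps for a failed job and show the reason instead.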
Users are reporting that they don't understand the dropdown in the Download tab. We should add some text there explaining that the options in the dropdown are the subdirectories in the result folder, not the files, and that selecting / will download the entire results folder.
In the current setup, output from a submitted HPC job (status, etc.) goes to a separate tab in the CyberGIS-Compute interface widget. It would be beneficial, from the user's perspective, if this output could be redirected to the cell output/console instead. This would enhance overall user engagement with the CyberGIS-Compute interface.
Describe the solution you'd like
Status and other output from the submitted HPC job is visible in the output console of the cell containing the CyberGIS-Compute interface.
Describe alternatives you've considered
An alternative is available in the existing implementation: a tab in the CyberGIS-Compute widget showing status output from the submitted HPC job.
The expected behavior would be that the SDK disregards/deletes the file and tries to authenticate using environment variables.
If cybergis_compute_user.json is empty, the SDK seems to try to authenticate without credentials, which causes a vague "invalid input['is not of a type(s) string']" error. See screenshot below:
Copy/Paste of the error stack from replicating:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 cybergis.show_ui()
File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/CyberGISCompute.py:555, in CyberGISCompute.show_ui(self, input_params, defaultJob, defaultDataFolder, defaultRemoteResultFolder, jupyterhubApiToken)
553 if df is not None:
554 self.ui.defaultRemoteResultFolder = df if df[0] == '/' else '/' + df
--> 555 self.ui.render()
File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/UI.py:78, in UI.render(self)
73 """
74 Render main UI by initializing, rendering,
75 and displaying each component
76 """
77 self.init()
---> 78 self.renderComponents()
79 divider = Markdown('***')
80 # render main UI
81 # 1. job template
File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/UI.py:155, in UI.renderComponents(self)
153 self.renderResultLogs()
154 self.renderDownload()
--> 155 self.renderRecentlySubmittedJobs()
156 self.renderLoadMore()
157 self.renderSubmitNew()
File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/UI.py:543, in UI.renderRecentlySubmittedJobs(self)
541 if self.recently_submitted['output'] is None:
542 self.recently_submitted['output'] = widgets.Output()
--> 543 jobs = self.compute.client.request('GET', '/user/job', {'jupyterhubApiToken': self.compute.jupyterhubApiToken})
544 with self.recently_submitted['output']:
545 display(Markdown('**Recently Submitted Jobs for ' + self.compute.username.split('@', 1)[0] + '**'))
File /cvmfs/cybergis.illinois.edu/software/conda/cjw/python3-2022-06/lib/python3.9/site-packages/cybergis_compute_client/Client.py:67, in Client.request(self, method, uri, body)
65 if 'messages' in data:
66 msg = str(data['messages'])
---> 67 raise Exception('server ' + self.url + uri + ' responded with error "' + data['error'] + msg + '"')
69 return data
Exception: server cgjobsup.cigi.illinois.edu:443/user/job responded with error "invalid input['is not of a type(s) string']"
This caused a user to lose access to CyberGIS-Compute for a few days and took quite a bit of debugging, so we need to make this login mechanism more robust.
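One way to make the file handling more forgiving is sketched below: treat a missing, empty, or malformed cybergis_compute_user.json as "no saved credentials" so the caller can fall back to another login path. This is only a sketch under assumptions; `load_saved_token` is a hypothetical helper, and the "token" key is an assumed layout for the file, not confirmed from the SDK source.

```python
import json
import os
import tempfile


def load_saved_token(path):
    """Return the saved token string, or None if the file is unusable."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable file, or empty/malformed JSON
        # (json.JSONDecodeError is a subclass of ValueError).
        return None
    if not isinstance(data, dict):
        return None
    token = data.get("token")  # "token" is an assumed key name
    return token if isinstance(token, str) and token else None


# An empty file yields None instead of sending empty credentials to the server.
with tempfile.TemporaryDirectory() as tmp:
    empty_file = os.path.join(tmp, "cybergis_compute_user.json")
    open(empty_file, "w").close()
    print(load_saved_token(empty_file))
```

Returning None for every unusable case keeps the decision about what to do next (environment variables, a fresh login) in one place in the caller.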
cybergis_compute_user.json file if necessary.
show_ui().
It seems the Jupyter kernel may "sleep" and the SDK UI may get stuck if a long compute job is running on HPC.
A workaround would be to add a "Refresh" button to the SDK UI, allowing the user to manually refresh the job status if needed.
@zimo-xiao
There was a big summa job with a 5.9GB zipped results file. The user failed to download it despite trying several times.
MemoryError Traceback (most recent call last)
in
----> 1 communitySummaJob.download(str(model_folder))
/opt/conda/envs/pysumma/lib/python3.7/site-packages/job_supervisor_client/Job.py in download(self, dir)
92 dir = self.client.download('GET', '/supervisor/download/' + self.id, {
93 "aT": self.JAT.getAccessToken()
---> 94 }, dir, self.protocol)
95 print('file successfully downloaded under: ' + dir)
96 return dir
/opt/conda/envs/pysumma/lib/python3.7/site-packages/job_supervisor_client/Client.py in download(self, method, uri, body, localDir, protocol)
33 connection.request(method, uri, json.dumps(body), headers)
34 response = connection.getresponse()
---> 35 body = response.read()
36 contentType = response.getheader('Content-Type')
37
/opt/conda/envs/pysumma/lib/python3.7/http/client.py in read(self, amt)
468 else:
469 try:
--> 470 s = self._safe_read(self.length)
471 except IncompleteRead:
472 self._close_conn()
/opt/conda/envs/pysumma/lib/python3.7/http/client.py in _safe_read(self, amt)
623 s.append(chunk)
624 amt -= len(chunk)
--> 625 return b"".join(s)
626
627 def _safe_readinto(self, b):
MemoryError:
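The traceback shows `response.read()` pulling the whole body into memory before anything is written to disk. A minimal sketch of a chunked alternative is below, assuming the response object supports `read(amt)` the way `http.client.HTTPResponse` does. `stream_to_file` is a hypothetical helper name, and the `io.BytesIO` object only stands in for a real response in this example.

```python
import io
import os
import tempfile


def stream_to_file(response, path, chunk_size=1024 * 1024):
    """Copy a file-like HTTP response to disk in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the body is exhausted
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Stand-in for a real HTTP response: 3 MB of zero bytes.
fake_response = io.BytesIO(b"\0" * (3 * 1024 * 1024))
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "result.zip")
    size = stream_to_file(fake_response, target)
    print(size, os.path.getsize(target))
```

Memory use then stays around one chunk regardless of how large the results archive is.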
The SDK is not able to find JUPYTER_INSTANCE_URL in JupyterLab environments, and even if it can, it does not support Javascript to capture the variable as done here.
Allow the user to specify the Jupyter host IP and port, and migrate to the service URL instead of the Jupyter instance URL.
If enable_jupyter raises some kind of Javascript exception, prompt the user to enter the URL and port, similar to login.
Users might not wait for long-running job submission and the notebook session may die (computer goes into sleep mode or browser is closed). Or in some cases, the SDK UI stops updating job status due to bugs or something else.
Assuming the user knows the job_id, we need a way for the user to check the job status and retrieve model results in a new notebook session.
I wrote up a notebook for emergency use to handle this case (upon request from a user). But this notebook is NOT user-friendly.
https://github.com/cybergis/cybergis-compute-v2-wrfhydro/blob/main/retrieve_model_outputs.ipynb
It would be better to add a new function to the SDK that takes a job_id, displays (re-activates) the job status UI, and allows the results to be downloaded.
If the hostname isn't registered, the SDK will fail without giving any indication on the UI. The errors go to the log at the bottom of the lab interface. This needs to fail gracefully and give a better indication to the end-user.
The error is below:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
File ~/.local/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/UI.py:600, in UI.onSubmitButtonClick.<locals>.on_click(change)
598 data = self.get_data()
599 if data['computing_resource'] != 'local_hpc':
--> 600 self.jupyter_globus = self.compute.get_user_jupyter_globus()
601 if self.job['require_upload_data']:
602 dataPath = self.uploadData['selector'].selected
File ~/.local/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/CyberGISCompute.py:537, in CyberGISCompute.get_user_jupyter_globus(self)
530 def get_user_jupyter_globus(self):
531 """
532 Return the current job instance
533
534 Returns:
535 Job: Latest Job object instance
536 """
--> 537 return self.client.request(
538 'GET', '/user/jupyter-globus', {
539 "jupyterhubApiToken": self.jupyterhubApiToken})
File ~/.local/python3-0.9.0/lib/python3.8/site-packages/cybergis_compute_client/Client.py:67, in Client.request(self, method, uri, body)
65 if 'messages' in data:
66 msg = str(data['messages'])
---> 67 raise Exception('server ' + self.url + uri + ' responded with error "' + data['error'] + msg + '"')
69 return data
Exception: server cgjobsup-dev.cigi.illinois.edu:443/user/jupyter-globus responded with error "unknown host"
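As a starting point for failing more gracefully, the exception text raised in Client.request could be translated into a short message for the UI. The sketch below assumes the message format shown in the traceback above (`... responded with error "<text>"`). `explain_server_error` and the `HINTS` table are hypothetical, and the wording of the hint is a placeholder.

```python
import re

# Hypothetical mapping from server error text to guidance for the user.
HINTS = {
    "unknown host": (
        "This Jupyter host is not registered with the CyberGIS-Compute "
        "server. Please contact an administrator."
    ),
}


def explain_server_error(exc):
    """Turn a Client.request exception into a short user-facing message."""
    text = str(exc)
    # Pull out the quoted server error if the message has the expected shape.
    match = re.search(r'responded with error "(.*)"', text, re.DOTALL)
    error = match.group(1) if match else text
    for key, hint in HINTS.items():
        if key in error:
            return hint
    return "The server reported an error: " + error


try:
    raise Exception(
        'server cgjobsup-dev.cigi.illinois.edu:443/user/jupyter-globus '
        'responded with error "unknown host"')
except Exception as exc:
    print(explain_server_error(exc))
```

The on_click handler could wrap its server calls in a similar try/except and display the returned message in the UI instead of letting the exception go to the log.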
This is a minor bug, but I wanted to document it here. I've noticed two small things don't get reset when a user runs a job and then clicks "Submit New Job" button on the "Job Configuration" tab:
I think the solutions would be:
Is your feature request related to a problem? Please describe.
The Core has a new feature (cybergis/cybergis-compute-core#99) which allows for allow/deny lists on HPCs. However, the exception is only shown in the Jupyter logs at the bottom-left of the page:
Describe the solution you'd like
We should catch and better handle the exception so that users know what is wrong and what steps they can take to fix or work around it.
This is a random error message on the client side that may appear while downloading model results back to Jupyter using Globus.
The symptom is that the SDK throws a 504 bad gateway error. After rerunning the show_ui cell and restoring the job, the UI shows that the Globus download is still in progress even though it has finished (I could see the result files on Jupyter). As a consequence, the download path attribute of the SDK object is None, which affects cells below that depend on its value, and you cannot re-download the results because the UI always shows that the download is still in progress.
I just registered this issue here and will try to post more error messages when it happens again (if any).
As far as I can tell, this is NOT a new issue; I remember it happening several times months ago, but we did not get a chance to investigate the root cause.
A probable cause might be on the Core side, where the Globus job status stops being tracked and updated due to some error.
We should be able to install the SDK with the --force-reinstall flag.
When installing on CyberGISX (and using the --force-reinstall flag), the SDK seems to break. For example, on CyberGISX right now, if you run
pip install --force-reinstall git+https://github.com/cybergis/cybergis-compute-python-sdk.git
then restart the kernel and run the cells to connect to dev and bring up the UI, you get the following error:
Error displaying widget: model not found
keeling_community HPC Description: none
Estimated Runtime: unknown
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
😴 No Job to Work On
you need to submit your job first
⏳ Waiting for Job to Finish...
Error displaying widget: model not found
Recently Submitted Jobs for alexandermichels
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1664464919Miqcu 942289 keeling_community {'id': '1664464920xVoco', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/1664464920xVoco', 'globusPath': '/1664464920xVoco', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:22:01.434Z', 'updatedAt': None, 'deletedAt': None} None {'id': '16644649210br1B', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/16644649210br1B', 'globusPath': '/16644649210br1B', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:22:01.742Z', 'updatedAt': None, 'deletedAt': None}
param slurm userId maintainer createdAt modelName
{"input_a": 50, "input_b": "foo"} {"time": "10:00", "num_of_task": 2, "cpu_per_task": 1} [email protected] community_contribution 2022-09-29T15:21:59.227Z hello_world
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1664464651phVmf 942287 keeling_community {'id': '16644646533Rf22', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/16644646533Rf22', 'globusPath': '/16644646533Rf22', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:17:34.373Z', 'updatedAt': None, 'deletedAt': None} None {'id': '1664464654mfhbf', 'name': None, 'hpc': 'keeling_community', 'hpcPath': '/data/keeling/a/cigi-gisolve/scratch/1664464654mfhbf', 'globusPath': '/1664464654mfhbf', 'userId': '[email protected]', 'isWritable': False, 'createdAt': '2022-09-29T15:17:34.636Z', 'updatedAt': None, 'deletedAt': None}
param slurm userId maintainer createdAt modelName
{"input_a": 50, "input_b": "foo"} {"time": "10:00", "num_of_task": 2, "cpu_per_task": 1} [email protected] community_contribution 2022-09-29T15:17:30.757Z hello_world
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1663963513C5Ygc None keeling_community None None None
param slurm userId maintainer createdAt modelName
{} {} [email protected] community_contribution 2022-09-23T20:05:13.455Z None
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1663963051LhLUR None keeling_community None None None
param slurm userId maintainer createdAt modelName
{} {} [email protected] community_contribution 2022-09-23T19:57:30.989Z None
Error displaying widget: model not found
id slurmId hpc remoteExecutableFolder remoteDataFolder remoteResultFolder
1663277428FfZ06 None keeling_community None None None
param slurm userId maintainer createdAt modelName
{} {} [email protected] community_contribution 2022-09-15T21:30:27.518Z None
Error displaying widget: model not found
Error displaying widget: model not found
Welcome to CyberGIS-Compute
A scalable middleware framework for enabling high-performance and data-intensive geospatial research and education on CyberGIS-Jupyter
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
job events (live refresh)
Error displaying widget: model not found
job logs
Output of this cell has been trimmed on the initial display.
Displaying the first 50 top outputs.
Click on this message to get the complete output.
!pip install --force-reinstall git+https://github.com/cybergis/cybergis-compute-python-sdk.git
from cybergis_compute_client import CyberGISCompute
cybergis = CyberGISCompute(url="cgjobsup-dev.cigi.illinois.edu", isJupyter=True, protocol="HTTPS", port=443, suffix="v2")
cybergis.show_ui()
The UI breaks if the user tries to install cybergis-compute-python-sdk on top of the version that's already provided.
import sys
!{sys.executable} -m pip install --ignore-installed git+https://github.com/cybergis/cybergis-compute-python-sdk.git
This happens because of dependency conflicts, as seen in the output of the pip command.
Ask users to restart their container and avoid installing the cybergis-compute-python-sdk explicitly.