Comments (7)
> The one caveat is that the way I have currently structured `HPCExecutor` is that everything PSI/J-related is written out as a file on the remote machine

I think that's what I meant by the second bullet point.
I think, should it come to it, there is a somewhat distasteful but possible general solution to this: subclassing dicts. But it may be wise to defer making such a change while reasonable workarounds are possible.

That said, some of the suggestions that popped up here (e.g., the `parse_walltime` function, easier access to existing serialization routines, etc.) may still be worth considering.
from psij-python.
I've got opinions! :)
But I'll abstain for now.
Would `JobAttributes(duration=parse_walltime('12:00'))` work?
Also, can you tell me more about the serialization challenges? There is serialization code already in psij-python, which may help depending on the exact problem.
Allowing the `duration` parameter to be a `str` is an option, but I worry a bit about adding too many formats at the core level, since it trickles back into the formal specification, which would ideally be kept short.
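To sketch what I mean by `parse_walltime`: a hypothetical helper along these lines (the accepted formats here, "HH:MM" and "HH:MM:SS", are just one possible choice, and no such function currently exists in psij-python):

```python
from datetime import timedelta

def parse_walltime(walltime: str) -> timedelta:
    """Illustrative only: parse 'HH:MM' or 'HH:MM:SS' into a timedelta."""
    parts = [int(p) for p in walltime.split(":")]
    if len(parts) == 2:
        hours, minutes, seconds = parts[0], parts[1], 0
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"unrecognized walltime format: {walltime!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(parse_walltime("12:00"))  # 12:00:00
```

The user-facing API would still take a `timedelta`; only the convenience of building one from a string changes.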
Thanks for the reply! Your opinions are most certainly welcome and appreciated. That's why I posted here after all 😄
> Would `JobAttributes(duration=parse_walltime('12:00'))` work?
I'm not sure I follow this 100% --- apologies. What would `parse_walltime` be in this hypothetical scenario?
> Also, can you tell me more about the serialization challenges? There is serialization code already in psij-python, which may help depending on the exact problem.
I can try, but I must admit that I'm in the middle of a bug-squashing session and have not delved too deeply into this particular issue (because I have a workaround).
In short, in this scenario, I am using a workflow package called Covalent and a custom executor plugin I developed around PSI/J called covalent-hpc-plugin. Covalent works by decorating a function and specifying the various execution parameters in a class that is passed to the decorator.
As is common among workflow packages, one of the criteria is that the (decorated) function must be JSON serializable, I believe. However, passing in a class object with a `datetime.timedelta` object causes a JSON serialization error.

The way around this, of course, is for me (the plugin developer) to allow the user to pass in a JSON-serializable parameter (`str`, `int`, `float`, whatever) and then --- when the plugin ultimately calls PSI/J and makes the submission script --- instantiate the `timedelta` object when it's needed on the remote machine. This is what I'm doing, and it's fine, but it made me wonder if `timedelta` was the most user-friendly option in general (of course, it is perhaps the most unambiguous).
I recognize this may not be super insightful! Apologies if so --- I don't know the inner workings of Covalent's serialization approaches.
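To make the failure and the workaround concrete, here is a minimal standalone sketch (the `duration_minutes` key and the choice of minutes as the unit are just this example's convention, not anything from Covalent or PSI/J):

```python
import json
from datetime import timedelta

# The error described above, in isolation:
try:
    json.dumps({"duration": timedelta(minutes=10)})
except TypeError as exc:
    print(exc)  # Object of type timedelta is not JSON serializable

# Passing a JSON-safe value instead serializes fine; the timedelta can
# be rebuilt later, e.g. when the submission script is generated.
payload = json.dumps({"duration_minutes": 10})
duration = timedelta(minutes=json.loads(payload)["duration_minutes"])
```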
> Allowing the `duration` parameter to be a `str` is an option, but I worry a bit about adding too many formats at the core level since it trickles back into the formal specification which would ideally be kept short.
This is fair. Having a multitude of options is a potential concern, although I don't think DD:HH:SS as a `str` would be too outlandish, seeing as that is how most job schedulers conventionally accept the walltime anyway and is what most end-users are probably used to by force of habit. I certainly prefer `str` over something like `int`/`float`, because then PSI/J would have to specify a unit (e.g., minute, hour, whatever), which seems not great.
> Thanks for the reply! Your opinions are most certainly welcome and appreciated. That's why I posted here after all 😄
>
> > Would `JobAttributes(duration=parse_walltime('12:00'))` work?
>
> I'm not sure I follow this 100% --- apologies. What would `parse_walltime` be in this hypothetical scenario?
It would be a function provided by psij-python.
> > Also, can you tell me more about the serialization challenges? There is serialization code already in psij-python, which may help depending on the exact problem.
>
> I can try, but I must admit that I'm in the middle of a bug-squashing session and have not delved too deeply into this particular issue (because I have a workaround).
>
> In short, in this scenario, I am using a workflow package called Covalent and a custom executor plugin I developed around PSI/J called covalent-hpc-plugin. Covalent works by decorating a function and specifying the various execution parameters in a class that is passed to the decorator.
>
> As is common among workflow packages, one of the criteria is that the (decorated) function must be JSON serializable, I believe. However, passing in a class object with a `datetime.timedelta` object causes a JSON serialization error.
>
> The way around this, of course, is for me (the plugin developer) to allow the user to pass in a JSON-serializable parameter (`str`, `int`, `float`, whatever) and then --- when the plugin ultimately calls PSI/J and makes the submission script --- instantiate the `timedelta` object when it's needed on the remote machine. This is what I'm doing, and it's fine, but it made me wonder if `timedelta` was the most user-friendly option in general (of course, it is perhaps the most unambiguous).
>
> I recognize this may not be super insightful! Apologies if so --- I don't know the inner workings of Covalent's serialization approaches.
I think it's very useful.
At first I thought that json-serializability is indeed a requirement of covalent, and I wrote a reply based on that. As I was prepared to insert the evidence in the reply, in the form of pointers to code, I started having some doubts.
I'm looking at `_workflow/Lattice.__init__` and seeing `self.workflow_function = TransportableObject.make_transportable(self.workflow_function)`. This leads me to `TransportableObject.__init__` and `b64object = base64.b64encode(cloudpickle.dumps(obj))`.
I would think that cloudpickle would support more than what `json.dump` does, but there is still much uncertainty in my head. If you had a stack trace, perhaps that could help pinpoint where in covalent the issue is.
In any event, it's a fair point that properties of psij objects are vanilla json-serializable except for `pathlib.Path` and `timedelta` objects.
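Those two types are straightforward to handle with a custom encoder, if one controls the `json.dumps` call site --- a sketch (the class name and the seconds/string encodings are arbitrary choices here, not psij API):

```python
import json
import pathlib
from datetime import timedelta

class PsijFriendlyEncoder(json.JSONEncoder):
    """Hypothetical encoder covering the two non-JSON-native property
    types noted above; everything else falls through to the default."""

    def default(self, o):
        if isinstance(o, timedelta):
            return o.total_seconds()  # seconds is a choice, not a spec
        if isinstance(o, pathlib.Path):
            return str(o)
        return super().default(o)

print(json.dumps(
    {"duration": timedelta(hours=1), "stdout_path": pathlib.Path("/tmp/out")},
    cls=PsijFriendlyEncoder,
))
```

The catch, of course, is that a third-party library doing the `json.dumps` call will not know to pass `cls=`.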
> Allowing the `duration` parameter to be a `str` is an option, but I worry a bit about adding too many formats at the core level since it trickles back into the formal specification which would ideally be kept short.
> This is fair. Having a multitude of options is a potential concern, although I don't think DD:HH:SS as a `str` would be too outlandish, seeing as that is how most job schedulers conventionally accept the walltime anyway and is what most end-users are probably used to by force of habit. I certainly prefer `str` over something like `int`/`float`, because then PSI/J would have to specify a unit (e.g., minute, hour, whatever), which seems not great.
I think the universal struggle in designing an API is to have a set of principles that sufficiently constrain the result so as to avoid an incoherent mess. And by "incoherent mess" I mean the result coming from the persistent uncertainty when confronted with the question of "why don't we do it this way instead", because that invariably happens.
We have the job arguments as a list of strings. It may be more intuitive, especially if you come straight from the command line, to have it as a single string that gets magically tokenized, just like *sh would do it. Of course, you don't get that for free: it comes with having to deal with arguments with spaces, quotes, etc. (in some previous incarnation of something like PSI/J, we never could convince Condor to pass arguments correctly because of such an issue). We decided to opt for formalism in the argv case and other cases because, ultimately, psij is a tool presumably sitting between one formal specification (workflow) and another formal specification (the scheduler or other job execution mechanisms).
It's not too dissimilar with walltimes. I honestly never know what exact format a given scheduler accepts. Is "aa:bb" interpreted as "hh:mm" or "mm:ss"? Is "aa:bb:cc" "dd:hh:mm" or "hh:mm:ss"? If it's "hh:mm", can hh > 24, or do I have to switch to "dd:hh:mm"?
So the question here is: should we maintain the formalism^* principle, break it in some cases, or substitute it with another one? If we break it in some cases, is there something else that we could anchor the change on, such as, for example, the difficulties in applying a vanilla `json.dump` on a `JobSpec`? Even so, is the requirement imposed by an external library that generic objects be json-serializable something that passes whatever threshold of reasonableness we set?
*) It's more like "simple formalism". Surely one can also formalize the myriad of "aa:bb[:cc]" versions or the precise grammar of argument strings.
from psij-python.
> At first I thought that json-serializability is indeed a requirement of covalent
I know some workflow engines like Prefect require JSON serializability, but my understanding is actually what you mentioned --- that Covalent requires objects to be pickle-able. This is where I started to have some doubts as well.
Since it seems you were curious enough to go code-diving, here's an example:
```shell
pip install covalent covalent-hpc-plugin
covalent start
```

```python
import covalent as ct
from datetime import timedelta

executor = ct.executor.HPCExecutor(
    address="doesntmatter",
    username="alsodoesntmatter",
    job_attributes_kwargs={"duration": timedelta(minutes=10)},
)

@ct.electron(executor=executor)
def add(a, b):
    return a + b

@ct.lattice
def workflow(a, b):
    return add(a, b)

dispatch_id = ct.dispatch(workflow)(1, 2)
result = ct.get_result(dispatch_id, wait=True)
```
The traceback:
```
TypeError Traceback (most recent call last)
Cell In[2], line 18
13 @ct.lattice
14 def workflow(a, b):
15 return add(a, b)
---> 18 dispatch_id = ct.dispatch(workflow)(1, 2)
19 result = ct.get_result(dispatch_id,wait=True)
File ~/software/miniconda/envs/quacc/lib/python3.10/site-packages/covalent/_dispatcher_plugins/local.py:130, in LocalDispatcher.dispatch.<locals>.wrapper(*args, **kwargs)
127 lattice.build_graph(*args, **kwargs)
129 # Serialize the transport graph to JSON
--> 130 json_lattice = lattice.serialize_to_json()
132 # Extract triggers here
133 json_lattice = json.loads(json_lattice)
File ~/software/miniconda/envs/quacc/lib/python3.10/site-packages/covalent/_workflow/lattice.py:103, in Lattice.serialize_to_json(self)
101 attributes["transport_graph"] = None
102 if self.transport_graph:
--> 103 attributes["transport_graph"] = self.transport_graph.serialize_to_json()
105 attributes["args"] = []
106 attributes["kwargs"] = {}
File ~/software/miniconda/envs/quacc/lib/python3.10/site-packages/covalent/_workflow/transport.py:400, in _TransportGraph.serialize_to_json(self, metadata_only)
397 data["links"][idx].pop("edge_name", None)
399 data["lattice_metadata"] = encode_metadata(self.lattice_metadata)
--> 400 return json.dumps(data)
File ~/software/miniconda/envs/quacc/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
226 # cached encoder
227 if (not skipkeys and ensure_ascii and
228 check_circular and allow_nan and
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
File ~/software/miniconda/envs/quacc/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o)
195 return encode_basestring(o)
196 # This doesn't pass the iterator directly to ''.join() because the
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
File ~/software/miniconda/envs/quacc/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
252 else:
253 _iterencode = _make_iterencode(
254 markers, self.default, _encoder, self.indent, floatstr,
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
File ~/software/miniconda/envs/quacc/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
160 def default(self, o):
161 """Implement this method in a subclass such that it returns
162 a serializable object for ``o``, or calls the base implementation
163 (to raise a ``TypeError``).
(...)
177
178 """
--> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
180 f'is not JSON serializable')
TypeError: Object of type timedelta is not JSON serializable
```
So, I believe you are correct that the function arguments and returns have to be pickle-able, but I didn't investigate why this JSON serialization issue came up.
> So the question here is: should we maintain the formalism^* principle, break it in some cases, or substitute it with another one? If we break it in some cases, is there something else that we could anchor the change on, such as, for example, the difficulties in applying a vanilla `json.dump` on a `JobSpec`? Even so, is the requirement imposed by an external library that generic objects be json-serializable something that passes whatever threshold of reasonableness we set?
This is all a very reasonable take! And I largely agree! My only potential caveat to it is that PSI/J specifically exists within the context of reducing a very commonly used pattern across workflow management tools (broadly defined). While Covalent itself may not strictly require JSON-serializability, I believe both Prefect and FireWorks do, and I imagine there might be others. So, on one hand, that could be an argument to try to maintain JSON-serializability with PSI/J. At the same time, however, maybe this is relatively rare (I can't say for sure!).
Anyway, for me, I am pretty flexible. I can address this "on my end," and I don't really see it as a major dealbreaker or anything. But perhaps food for thought. Ultimately, I am happy with whatever decision you make --- it's a good conversation to have either way 😄
> So, I believe you are correct that the function arguments and returns have to be pickle-able, but I didn't investigate why this JSON serialization issue came up.
Oh, I see. The function is pickled, but the executor is json-serialized. This seems to be triggered in `_AbstractBaseExecutor.to_dict`.
Correct me if I'm wrong, but I think the following apply here:
- in principle, HPCExecutor could override `to_dict` and deal with timedeltas.
- the data that is shipped remotely for the executor need not actually be psij-python objects, since things get templated.
- allowing strings in the constructor might not fix the problem, since `json.dump(s)` looks at the internal property type, which will still be `timedelta`.
- this would also apply to `pathlib.Path` objects in `JobSpec` (i.e., `json.dumps(pathlib.Path('/tmp'))` also fails).
- the various PSI/J classes, such as `JobSpec`, `JobAttributes`, etc., will remain problematic even if all their properties are json-serializable (i.e., `json.dumps(JobAttributes())` fails).
- the `json` package does not have a mechanism to make a class serializable except for a custom `JSONEncoder`, which is not something that can be easily inserted into a third-party library like covalent.
- it might help to expose the various to/from dict methods in the psij serialization module/classes.
- an electron is not always an electron
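As a sketch of the first point, an overridden `to_dict` could run its result through a small recursive converter like this (the function name and the seconds/string encodings are assumptions here, not Covalent or psij API):

```python
import pathlib
from datetime import timedelta

def jsonable(value):
    """Recursively replace timedelta and pathlib.Path values with
    JSON-friendly equivalents; everything else passes through as-is."""
    if isinstance(value, timedelta):
        return value.total_seconds()  # seconds is an arbitrary choice here
    if isinstance(value, pathlib.Path):
        return str(value)
    if isinstance(value, dict):
        return {k: jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [jsonable(v) for v in value]
    return value

print(jsonable({"duration": timedelta(minutes=10), "log": pathlib.Path("out.log")}))
```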
> > So the question here is: should we maintain the formalism^* principle, break it in some cases, or substitute it with another one? If we break it in some cases, is there something else that we could anchor the change on, such as, for example, the difficulties in applying a vanilla `json.dump` on a `JobSpec`? Even so, is the requirement imposed by an external library that generic objects be json-serializable something that passes whatever threshold of reasonableness we set?
>
> This is all a very reasonable take! And I largely agree! My only potential caveat to it is that PSI/J specifically exists within the context of reducing a very commonly used pattern across workflow management tools (broadly defined). While Covalent itself may not strictly require JSON-serializability, I believe both Prefect and FireWorks do, and I imagine there might be others. So, on one hand, that could be an argument to try to maintain JSON-serializability with PSI/J. At the same time, however, maybe this is relatively rare (I can't say for sure!).
This discussion has helped quite a bit. I think there are two relevant issues from above with json serializability that I will restate:
- serializing properties (e.g., `json.dump(obj.__dict__)`) would require the internal types to be json-serializable; changing the constructor to accept a more lenient type and convert it to the otherwise non-json-serializable representation will not quite work.
- psij classes will still be non-json-serializable without a custom `JSONEncoder`; in other words, from my understanding of the `json` package, `json.dumps(job_spec)` will simply not work. So if that statement appears in a library, there is nothing that can be done short of replacing all relevant psij-python objects with dicts.
We could consider the idea of doing `class JobSpec(dict)` and storing all properties in the dict itself, but the idea of emulating classes with dicts for the sole purpose of making a specific method of serialization work seems a bit like an anti-pattern.
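For concreteness, the dict-subclassing idea (and why it "works" for serialization) looks roughly like this --- a toy class, not the actual psij `JobSpec` API:

```python
import json

class DictBackedSpec(dict):
    """Toy illustration only: store 'properties' in the dict itself so
    that json.dumps works directly, at the cost of class-like design."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

spec = DictBackedSpec(executable="/bin/date", node_count=2)
spec.node_count = 4        # attribute-style access still works...
print(json.dumps(spec))    # ...and json.dumps needs no custom encoder
```

One loses typed properties, validation, and documentation hooks in the bargain, which is part of why it feels like an anti-pattern.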
> Anyway, for me, I am pretty flexible. I can address this "on my end," and I don't really see it as a major dealbreaker or anything. But perhaps food for thought. Ultimately, I am happy with whatever decision you make --- it's a good conversation to have either way 😄
Indeed a good conversation.
> Correct me if I'm wrong, but I think the following apply here:
In general, all the points you mentioned are correct.
The one caveat is that the way I have currently structured `HPCExecutor` is that everything PSI/J-related is written out as a file on the remote machine and is not instantiated locally. As such, there are no actual PSI/J objects being called when the executor is specified. See here for an example of what I mean. That's also why none of the serialization issues are a problem for me at the moment, once I ensured that the user is not passing in a `timedelta` (unless I override `.to_dict()` like you suggested). This is obviously a specific example and may not be a representative usage pattern; also, this will inevitably change (for the better) when PSI/J remote is made/released.
> psij classes will still be non-json-serializable without a custom `JSONEncoder`; in other words, from my understanding of the `json` package, `json.dumps(job_spec)` will simply not work. So if that statement appears in a library, there is nothing that can be done short of replacing all relevant psij-python objects with dicts.
Good point. This seems like a bad game of whack-a-mole. From this conversation, I would lean towards not pushing for further JSON-serializability of PSI/J inputs and instead requiring the plugin developer (e.g., me) to either work with the workflow engine (e.g., Covalent) to address any issues or design around it (like I've done by having the user pass in a dictionary of keyword arguments to `HPCExecutor` --- not the `JobAttributes` or `ResourceSpec` itself! --- to instantiate PSI/J classes later).
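Concretely, the deferred-instantiation pattern looks something like this (the `duration_seconds` key is this example's convention rather than psij API, and the final `JobAttributes` call is left commented since it only runs where psij-python is installed):

```python
import json
from datetime import timedelta

# What travels through the workflow engine: plain JSON-safe values only.
payload = json.dumps(
    {"job_attributes_kwargs": {"duration_seconds": 600, "queue_name": "debug"}}
)

# What runs later, on the machine where psij-python is available:
kwargs = json.loads(payload)["job_attributes_kwargs"]
kwargs["duration"] = timedelta(seconds=kwargs.pop("duration_seconds"))
# attrs = psij.JobAttributes(**kwargs)  # instantiated only at this point
print(kwargs)
```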