Comments (7)

hategan commented on August 11, 2024

The one caveat is that the way I have currently structured HPCExecutor is that everything PSI/J-related is written out as a file on the remote machine

I think that's what I meant by the second bullet point.

I think, should it come to it, there is a somewhat distasteful but possible solution to this in general: subclassing dicts. But it may be wise to defer such a change while reasonable workarounds are possible.

That said, some of the suggestions that popped up here (e.g., the parse_walltime function, easier access to existing serialization routines, etc.) may still be worth considering.


hategan commented on August 11, 2024

I've got opinions! :)
But I'll abstain for now.

Would JobAttributes(duration=parse_walltime('12:00')) work?

Also, can you tell me more about the serialization challenges? There is serialization code already in psij-python, which may help depending on the exact problem.

Allowing the duration parameter to be a str is an option, but I worry a bit about adding too many formats at the core level, since it trickles back into the formal specification, which would ideally be kept short.


Andrew-S-Rosen commented on August 11, 2024

Thanks for the reply! Your opinions are most certainly welcome and appreciated. That's why I posted here after all 😄

Would JobAttributes(duration=parse_walltime('12:00')) work?

I'm not sure I follow this 100% --- apologies. What would parse_walltime be in this hypothetical scenario?

Also, can you tell me more about the serialization challenges? There is serialization code already in psij-python, which may help depending on the exact problem.

I can try, but I must admit that I'm in the middle of a bug-squashing session and have not delved too deeply into this particular issue (because I have a workaround).

In short, in this scenario, I am using a workflow package called Covalent and a custom executor plugin I developed around PSI/J called covalent-hpc-plugin. Covalent works by decorating a function and specifying the various execution parameters in a class that is passed to the decorator.

As is common among workflow packages, one of the criteria is that the (decorated) function must be JSON serializable, I believe. However, passing in a class object with a datetime.timedelta object causes a JSON serialization error.

The way around this, of course, is for me (the plugin developer) to allow the user to pass in a JSON serializable parameter (str, int, float, whatever) and then --- when the plugin ultimately calls PSI/J and makes the submission script --- instantiate the timedelta object when it's needed on the remote machine. This is what I'm doing, and it's fine, but it made me wonder if timedelta was the most user-friendly option in general (of course, it is perhaps the most unambiguous).

I recognize this may not be super insightful! Apologies if so --- I don't know the inner workings of Covalent's serialization approaches.

Allowing the duration parameter to be a str is an option, but I worry a bit about adding too many formats at the core level, since it trickles back into the formal specification, which would ideally be kept short.

This is fair. Having a multitude of options is a potential concern, although I don't think DD:HH:SS as a str would be too outlandish seeing as that is how most job schedulers conventionally accept the walltime anyway and is what most end-users are probably used to by force of habit. I certainly prefer str over something like int/float because then PSI/J would have to specify a unit (e.g. minute, hour, whatever), which seems not great.


hategan commented on August 11, 2024

Thanks for the reply! Your opinions are most certainly welcome and appreciated. That's why I posted here after all 😄

Would JobAttributes(duration=parse_walltime('12:00')) work?

I'm not sure I follow this 100% --- apologies. What would parse_walltime be in this hypothetical scenario?

It would be a function provided by psij-python.
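
Purely as an illustration of the idea (parse_walltime does not necessarily exist in psij-python today; the name and the accepted formats are assumptions), such a helper might look roughly like this:

from datetime import timedelta

def parse_walltime(spec: str) -> timedelta:
    # Hypothetical helper: accept "MM", "HH:MM", or "HH:MM:SS" and return a
    # timedelta; the interpretation of each form is an assumption here.
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return timedelta(minutes=parts[0])
    if len(parts) == 2:
        return timedelta(hours=parts[0], minutes=parts[1])
    if len(parts) == 3:
        return timedelta(hours=parts[0], minutes=parts[1], seconds=parts[2])
    raise ValueError(f"unrecognized walltime format: {spec!r}")

# e.g. JobAttributes(duration=parse_walltime('12:00')) would then pass a
# timedelta(hours=12) to the existing constructor.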

Also, can you tell me more about the serialization challenges? There is serialization code already in psij-python, which may help depending on the exact problem.

I can try, but I must admit that I'm in the middle of a bug-squashing session and have not delved too deeply into this particular issue (because I have a workaround).

In short, in this scenario, I am using a workflow package called Covalent and a custom executor plugin I developed around PSI/J called covalent-hpc-plugin. Covalent works by decorating a function and specifying the various execution parameters in a class that is passed to the decorator.

As is common among workflow packages, one of the criteria is that the (decorated) function must be JSON serializable, I believe. However, passing in a class object with a datetime.timedelta object causes a JSON serialization error.

The way around this, of course, is for me (the plugin developer) to allow the user to pass in a JSON serializable parameter (str, int, float, whatever) and then --- when the plugin ultimately calls PSI/J and makes the submission script --- instantiate the timedelta object when it's needed on the remote machine. This is what I'm doing, and it's fine, but it made me wonder if timedelta was the most user-friendly option in general (of course, it is perhaps the most unambiguous).

I recognize this may not be super insightful! Apologies if so --- I don't know the inner workings of Covalent's serialization approaches.

I think it's very useful.

At first I thought that json-serializability is indeed a requirement of covalent, and I wrote a reply based on that. As I was preparing to insert the evidence into the reply, in the form of pointers to code, I started having some doubts.
I'm looking at _workflow/Lattice.__init__ and seeing self.workflow_function = TransportableObject.make_transportable(self.workflow_function). This leads me to TransportableObject.__init__ and b64object = base64.b64encode(cloudpickle.dumps(obj)).

I would think that cloudpickle would support more than what json.dump does, but there is still much uncertainty in my head. If you had a stack trace, perhaps that could help pinpoint where in covalent the issue is.

In any event, it's a fair point that properties of psij objects are vanilla json-serializable except for pathlib.Path and timedelta objects.
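
As a quick sanity check of that hunch (this snippet is mine, not from the thread): cloudpickle round-trips a timedelta without complaint, while vanilla json.dumps rejects both timedelta and pathlib.Path:

import json
from datetime import timedelta
from pathlib import Path

import cloudpickle

# cloudpickle (what TransportableObject uses) handles timedelta fine
blob = cloudpickle.dumps(timedelta(minutes=10))
assert cloudpickle.loads(blob) == timedelta(minutes=10)

# plain json.dumps raises TypeError for timedelta and Path values
for obj in (timedelta(minutes=10), Path("/tmp")):
    try:
        json.dumps(obj)
    except TypeError as exc:
        print(exc)  # Object of type ... is not JSON serializable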

Allowing the duration parameter to be a str is an option, but I worry a bit about adding too many formats at the core level, since it trickles back into the formal specification, which would ideally be kept short.

This is fair. Having a multitude of options is a potential concern, although I don't think DD:HH:SS as a str would be too outlandish seeing as that is how most job schedulers conventionally accept the walltime anyway and is what most end-users are probably used to by force of habit. I certainly prefer str over something like int/float because then PSI/J would have to specify a unit (e.g. minute, hour, whatever), which seems not great.

I think the universal struggle in designing an API is to have a set of principles that sufficiently constrain the result so as to avoid an incoherent mess. And by "incoherent mess" I mean the result coming from the persistent uncertainty when confronted with the question of "why don't we do it this way instead", because that invariably happens.

We have the job arguments as a list of strings. It may be more intuitive, especially if you come straight from the command line, to have it as a single string that gets magically tokenized, just like *sh would do it. Of course, you don't get that for free. It comes with having to deal with arguments with spaces, quotes, etc. (in some previous incarnation of something like PSI/J, we never could convince Condor to pass arguments correctly because of such an issue). We decided to opt for formalism in the argv case and other cases because, ultimately, psij is a tool presumably sitting between one formal specification (workflow) and another formal specification (the scheduler or other job execution mechanisms).

It's not too dissimilar with walltimes. I honestly never know exactly what format a given scheduler accepts. Is "aa:bb" interpreted as "hh:mm" or "mm:ss"? Is "aa:bb:cc" "dd:hh:mm" or "hh:mm:ss"? If it's "hh:mm", can hh > 24, or do I have to switch to "dd:hh:mm"?

So the question here is: should we maintain the formalism^* principle, break it in some cases, or substitute it with another one? If we break it in some cases, is there something else that we could anchor the change on, such as, for example, the difficulties in applying a vanilla json.dump on a JobSpec? Even so, is the requirement imposed by an external library that generic objects be json-serializable something that passes whatever threshold of reasonableness we set?

*) It's more like "simple formalism". Surely one can also formalize the myriad of "aa:bb[:cc]" versions or the precise grammar of argument strings.


Andrew-S-Rosen commented on August 11, 2024

At first I thought that json-serializability is indeed a requirement of covalent

I know some workflow engines like Prefect require JSON serializability, but my understanding is actually what you mentioned --- that Covalent requires objects to be pickle-able. This is where I started to have some doubts as well.

Since it seems you were curious enough to go code-diving, here's an example:

pip install covalent covalent-hpc-plugin
covalent start

import covalent as ct
from datetime import timedelta

executor = ct.executor.HPCExecutor(
    address="doesntmatter",
    username="alsodoesntmatter",
    job_attributes_kwargs={"duration": timedelta(minutes=10)},
)

@ct.electron(executor=executor)
def add(a, b):
    return a + b

@ct.lattice
def workflow(a, b):
    return add(a, b)

dispatch_id = ct.dispatch(workflow)(1, 2)
result = ct.get_result(dispatch_id, wait=True)

The traceback:

TypeError                                 Traceback (most recent call last)
Cell In[2], line 18
     13 @ct.lattice
     14 def workflow(a, b):
     15     return add(a, b)
---> 18 dispatch_id = ct.dispatch(workflow)(1, 2)
     19 result = ct.get_result(dispatch_id,wait=True)

File ~/software/miniconda/envs/quacc/lib/python3.10/site-packages/covalent/_dispatcher_plugins/local.py:130, in LocalDispatcher.dispatch.<locals>.wrapper(*args, **kwargs)
    127 lattice.build_graph(*args, **kwargs)
    129 # Serialize the transport graph to JSON
--> 130 json_lattice = lattice.serialize_to_json()
    132 # Extract triggers here
    133 json_lattice = json.loads(json_lattice)

File ~/software/miniconda/envs/quacc/lib/python3.10/site-packages/covalent/_workflow/lattice.py:103, in Lattice.serialize_to_json(self)
    101 attributes["transport_graph"] = None
    102 if self.transport_graph:
--> 103     attributes["transport_graph"] = self.transport_graph.serialize_to_json()
    105 attributes["args"] = []
    106 attributes["kwargs"] = {}

File ~/software/miniconda/envs/quacc/lib/python3.10/site-packages/covalent/_workflow/transport.py:400, in _TransportGraph.serialize_to_json(self, metadata_only)
    397                 data["links"][idx].pop("edge_name", None)
    399 data["lattice_metadata"] = encode_metadata(self.lattice_metadata)
--> 400 return json.dumps(data)

File ~/software/miniconda/envs/quacc/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    226 # cached encoder
    227 if (not skipkeys and ensure_ascii and
    228     check_circular and allow_nan and
    229     cls is None and indent is None and separators is None and
    230     default is None and not sort_keys and not kw):
--> 231     return _default_encoder.encode(obj)
    232 if cls is None:
    233     cls = JSONEncoder

File ~/software/miniconda/envs/quacc/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o)
    195         return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed.  The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File ~/software/miniconda/envs/quacc/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

File ~/software/miniconda/envs/quacc/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type timedelta is not JSON serializable

So, I believe you are correct that the function arguments and returns have to be pickle-able, but I didn't investigate why this JSON serialization issue came up.

So the question here is: should we maintain the formalism^* principle, break it in some cases, or substitute it with another one? If we break it in some cases, is there something else that we could anchor the change on, such as, for example, the difficulties in applying a vanilla json.dump on a JobSpec? Even so, is the requirement imposed by an external library that generic objects be json-serializable something that passes whatever threshold of reasonableness we set?

This is all a very reasonable take! And I largely agree! My only potential caveat to it is that PSI/J specifically exists to reduce duplication of a very commonly used pattern across workflow management tools (broadly defined). While Covalent itself may not strictly require JSON-serializability, I believe both Prefect and FireWorks do, and I imagine there might be others. So, on one hand, that could be an argument to try to maintain JSON-serializability with PSI/J. At the same time, however, maybe this is relatively rare (I can't say for sure!).

Anyway, for me, I am pretty flexible. I can address this "on my end," and I don't really see it as a major dealbreaker or anything. But perhaps food for thought. Ultimately, I am happy with whatever decision you make --- it's a good conversation to have either way 😄


hategan commented on August 11, 2024

So, I believe you are correct that the function arguments and returns have to be pickle-able, but I didn't investigate why this JSON serialization issue came up.

Oh, I see. The function is pickled, but the executor is json-serialized.
This seems to be triggered in _AbstractBaseExecutor.to_dict.

Correct me if I'm wrong, but I think the following apply here:

  • in principle, HPCExecutor could override to_dict and deal with timedeltas (see the sketch after this list).
  • the data that is shipped remotely for the executor need not actually be psij-python objects, since things get templated.
  • allowing strings in the constructor might not fix the problem, since json.dump(s) looks at the internal property type, which will still be timedelta.
  • this would also apply to pathlib.Path objects in JobSpec (i.e., json.dumps(pathlib.Path('/tmp')) also fails).
  • the various PSI/J classes, such as JobSpec, JobAttributes, etc., will remain problematic even if all their properties are json-serializable (i.e., json.dumps(JobAttributes()) fails).
  • the json package does not have a mechanism to make a class serializable except for a custom JSONEncoder, which is not something that can be easily inserted into a third-party library like covalent.
  • it might help to expose the various to/from dict methods in the psij serialization module/classes.
  • an electron is not always an electron
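
Regarding the first bullet, a rough sketch of what such an override could look like (the base class below is a stand-in written for this sketch; Covalent's actual executor base class and the exact structure of its to_dict output are not reproduced here):

import json
from datetime import timedelta

class _FakeBaseExecutor:
    # Stand-in for Covalent's executor base class, just for this sketch.
    def __init__(self, **kwargs):
        self._attrs = kwargs

    def to_dict(self) -> dict:
        return dict(self._attrs)

class HPCExecutor(_FakeBaseExecutor):
    def to_dict(self) -> dict:
        # Replace timedelta values with total seconds so json.dumps succeeds.
        data = super().to_dict()
        return {k: (v.total_seconds() if isinstance(v, timedelta) else v)
                for k, v in data.items()}

ex = HPCExecutor(address="host", duration=timedelta(minutes=10))
print(json.dumps(ex.to_dict()))  # {"address": "host", "duration": 600.0}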

So the question here is: should we maintain the formalism^* principle, break it in some cases, or substitute it with another one? If we break it in some cases, is there something else that we could anchor the change on, such as, for example, the difficulties in applying a vanilla json.dump on a JobSpec? Even so, is the requirement imposed by an external library that generic objects be json-serializable something that passes whatever threshold of reasonableness we set?

This is all a very reasonable take! And I largely agree! My only potential caveat to it is that PSI/J specifically exists to reduce duplication of a very commonly used pattern across workflow management tools (broadly defined). While Covalent itself may not strictly require JSON-serializability, I believe both Prefect and FireWorks do, and I imagine there might be others. So, on one hand, that could be an argument to try to maintain JSON-serializability with PSI/J. At the same time, however, maybe this is relatively rare (I can't say for sure!).

This discussion has helped quite a bit. I think there are two relevant issues from above with json serializability that I will restate:

  • serializing properties (e.g., json.dump(obj.__dict__)) would require the internal types to be json-serializable; changing the constructor to accept a more lenient type and convert it to the otherwise non-json-serializable representation will not quite work
  • psij classes will still be non-json-serializable without a custom JSONEncoder; in other words, from my understanding of the json package, json.dumps(job_spec) will simply not work. So if that statement appears in a library, there is nothing that can be done short of replacing all relevant psij-python objects with dicts.

We could consider the idea of doing class JobSpec(dict) and storing all properties in the dict itself, but the idea of emulating classes with dicts for the sole purpose of making a specific method of serialization work seems a bit like an anti-pattern.
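
For what it's worth, a toy illustration of why the dict-subclass trick would satisfy json.dumps (a sketch of the pattern being dismissed above, not a proposal, and not the real psij JobAttributes):

import json
from datetime import timedelta

class DictBackedAttributes(dict):
    # Toy class: all properties live in the dict itself, in json-friendly form.
    def __init__(self, duration: timedelta):
        # Store seconds rather than the timedelta itself, otherwise
        # json.dumps would still fail on the dict's values.
        super().__init__(duration_seconds=duration.total_seconds())

    @property
    def duration(self) -> timedelta:
        return timedelta(seconds=self["duration_seconds"])

attrs = DictBackedAttributes(duration=timedelta(hours=12))
print(json.dumps(attrs))  # works, because json sees a dict
print(attrs.duration)     # still usable as an object: 12:00:00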

Anyway, for me, I am pretty flexible. I can address this "on my end," and I don't really see it as a major dealbreaker or anything. But perhaps food for thought. Ultimately, I am happy with whatever decision you make --- it's a good conversation to have either way 😄

Indeed a good conversation.


Andrew-S-Rosen commented on August 11, 2024

Correct me if I'm wrong, but I think the following apply here:

In general, all the points you mentioned are correct.

The one caveat is that the way I have currently structured HPCExecutor is that everything PSI/J-related is written out as a file on the remote machine and is not instantiated locally. As such, there are no actual PSI/J objects being created when the executor is specified. See here for an example of what I mean. That's also why none of the serialization issues are a problem for me at the moment, once I ensured that the user is not passing in a timedelta (unless I override .to_dict() like you suggested). This is obviously a specific example and may not be a representative usage pattern; it will also inevitably change (for the better) when PSI/J remote is made/released.

psij classes will still be non-json-serializable without a custom JSONEncoder; in other words, from my understanding of the json package, json.dumps(job_spec) will simply not work. So if that statement appears in a library, there is nothing that can be done short of replacing all relevant psij-python objects with dicts.

Good point. This seems like a bad game of whack-a-mole. From this conversation, I would lean towards not pushing for further JSON serializability of PSI/J inputs and instead requiring the plugin developer (e.g. me) to either work with the workflow engine (e.g. Covalent) to address any issues or design around it (like I've done by having the user pass in a dictionary of keyword arguments to HPCExecutor --- not the JobAttributes or ResourceSpec itself! --- to instantiate PSI/J classes later).
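
A small sketch of that design-around pattern, with hypothetical helper names (the user only ever supplies JSON-serializable values; PSI/J objects are constructed later, e.g. on the remote side):

import json
from datetime import timedelta

from psij import JobAttributes  # only instantiated where the job is actually submitted

# What the user hands to the executor: plain, JSON-serializable values.
job_attributes_kwargs = {"duration": "00:10:00"}
assert json.dumps(job_attributes_kwargs)  # serializes without complaint

def build_job_attributes(kwargs: dict) -> JobAttributes:
    # Hypothetical helper: convert lenient inputs into PSI/J objects,
    # assuming an "HH:MM:SS" convention for the duration string.
    kwargs = dict(kwargs)
    duration = kwargs.get("duration")
    if isinstance(duration, str):
        h, m, s = (int(x) for x in duration.split(":"))
        kwargs["duration"] = timedelta(hours=h, minutes=m, seconds=s)
    return JobAttributes(**kwargs)

# The timedelta only exists after build_job_attributes() runs, so nothing
# non-serializable ever travels through the workflow engine.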
