Mutation as the name for "nu"
Data prototype has a concept of nu for performing data-to-data transforms.
Firstly, this name is not descriptive at all, and while it makes sense in the context of a pure math description, programmers are unlikely to have that context.
A more descriptive name is preferable.
- "Transforms" (and by extension, probably other "trans-" prefixed words) are confusing because mpl already uses this term to mean something specific.
- "converters" is used by mpl to mean specifically "unit converters", so potentially falls into a similar boat
- OTOH, unit conversion is a specific case of this system, so potentially an option
- That said, it's not directly the converter, it is rather some level of adaptation needed
Thus I propose the term "mutator", though certainly open to other options.
The term "mutate", while used in a few docstrings, tests, and variable names in mpl, is not really used in any type names or public signatures outside of Transforms.mutated[xy]? (which return booleans).
Also has the advantage of using the same vowel sound as nu, so may help those who are familiar with the mathematical framing connect the concept.
Kinds of Mutators
compute
Using the same variable name but achieving a (potentially) different value.
Identity is a subset of this.
spelling on current main:
rename
Actually somewhat redundant to reuse + delete
spelling on current main:
reuse
{'x': A} -> {'x': A, 'y': A}
e.g. "color" expanding to "facecolor" and "edgecolor"
spelling on current main:
nu={"y": lambda x: x, "x": lambda x: x}
(or including "x" in expected/required keys, but including the y lambda)
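To make the two reuse spellings concrete, here is a rough sketch. It is not the data-prototype code: apply_nu is a hypothetical stand-in that assumes each nu function is called with the input values named by its parameters, and that an expected key without a nu entry gets a default identity.

```python
import inspect


def apply_nu(data, nu, expected_keys=()):
    """Hypothetical model of main: every function reads the original inputs."""
    # Assumption: expected keys without a nu entry default to identity.
    out = {key: data[key] for key in expected_keys if key not in nu}
    for key, func in nu.items():
        params = inspect.signature(func).parameters
        out[key] = func(**{name: data[name] for name in params})
    return out


data = {"x": 5}

# Spelling 1: an explicit identity for "x" plus a copy into "y".
both_lambdas = apply_nu(data, {"y": lambda x: x, "x": lambda x: x})

# Spelling 2: "x" is kept via expected keys, only the "y" lambda is given.
via_expected = apply_nu(data, {"y": lambda x: x}, expected_keys=("x",))

# The "color" example: one input feeds two outputs.
colors = apply_nu(
    {"color": "red"},
    {"facecolor": lambda color: color, "edgecolor": lambda color: color},
)
```

Under those assumptions both spellings produce the same mapping, and in the "color" case the original key is not carried over unless it is asked for.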
combine
{'x': A, 'y': B} -> {'z': Z}
spelling on current main:
nu={"z": lambda x, y: x+y}
spelling with #17:
mutual mutation
{'x': A, 'y': B} -> {'x': C, 'y': D}
Importantly, the computation for C and D both depend on the values from A and B.
Potentially has some performance concerns: often C and D could be computed together, but some frameworks may require computing them separately.
spelling on current main:
nu={"x": lambda x, y: x+y, "y": lambda x, y: x-y}
spelling with #17:
NOT POSSIBLE.
While in most cases #17 will upcast a single function to a list containing only that one function (plus units, if applicable), unlike main, it does operations sequentially.
Thus the value of x gets overridden by the first process, and it is not the same when processing y.
If x=1, y=2, then main with the nu specified above will give an output of x=3, y=-1. #17 will give an output of x=3, y=1.
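The difference can be reproduced with a small sketch. Both helpers are hypothetical models written for this note, not the code on main or in #17: one evaluates every function against the original inputs, the other feeds each result forward.

```python
import inspect


def call_by_name(func, values):
    """Call func with the entries of values named by its parameters."""
    params = inspect.signature(func).parameters
    return func(**{name: values[name] for name in params})


def apply_simultaneous(data, nu):
    # Model of main: every function sees the original inputs.
    return {key: call_by_name(func, data) for key, func in nu.items()}


def apply_sequential(data, nu):
    # Model of #17: each result overwrites the value seen by later functions.
    current = dict(data)
    for key, func in nu.items():
        current[key] = call_by_name(func, current)
    return current


nu = {"x": lambda x, y: x + y, "y": lambda x, y: x - y}
data = {"x": 1, "y": 2}

simultaneous = apply_simultaneous(data, nu)  # {"x": 3, "y": -1}
sequential = apply_sequential(data, nu)      # {"x": 3, "y": 1}
```

In the sequential model "y" is computed as 3 - 2 because "x" has already been replaced, which is exactly the mismatch described above.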
deletion
spelling on current main:
Neither provide a nu for "x" nor include it in required/expected keys (as those include a default identity)
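A small sketch of deletion, under the same kind of assumptions as elsewhere in this note: apply_nu is a hypothetical model in which only keys that have a nu entry or appear in the expected keys reach the output.

```python
import inspect


def apply_nu(data, nu, expected_keys=()):
    """Hypothetical model of main: output holds nu keys plus expected keys."""
    # Assumption: expected keys without a nu entry default to identity.
    out = {key: data[key] for key in expected_keys if key not in nu}
    for key, func in nu.items():
        params = inspect.signature(func).parameters
        out[key] = func(**{name: data[name] for name in params})
    return out


data = {"x": 1, "y": 2}
double_y = {"y": lambda y: y * 2}

# "x" has no nu entry and is not expected, so it is deleted.
dropped = apply_nu(data, double_y)

# Listing "x" as expected brings back the default identity.
kept = apply_nu(data, double_y, expected_keys=("x",))
```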
chaining
{'x': A} -> {'y': B} -> {'z': C}
Importantly, each step may be a more complex operation
spelling on current main:
NOT POSSIBLE, at least not in an elegant/composable way
nu={"z": lambda x: (lambda y: y+1)(x) + 1}
Is kind of the idea, but doesn't allow inspection or mutation of the internal structure.
Nor does it provide a way to e.g. add units in automatically aside from strictly before or strictly after.
If you also want to keep "y" in the final, you need to pass (and compute) it separately
spelling with #17:
nu={"x": [lambda x: x+1, lambda x: x+1], "z": lambda x: x}
(which will necessarily keep both x and z, set to the same value)
or
nu={"y": lambda x: x+1, "z": lambda y: y+1}
(which will necessarily keep both y and z, with different values)
While chaining was the purpose of #17, its implementation is less elegant than I would like.
It works reasonably well when chaining things with the same name, but falls apart rather quickly when trying to change names as in this example.
The deeply ingrained order dependence feels awkward and likely to do things that are not intended.
E.g. in the last example, did the user intend for y in the computation of z to be the newly modified version? (Perhaps not, but maybe.)
If you flip the y and z it looks the same, but is actually different on that branch.
But I think having intermediate values is useful.
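The order dependence can be seen in a sketch. apply_sequential is a hypothetical model of the sequential behavior described for #17 (not its real code), and the input is assumed to already contain a "y" so that both orderings can run.

```python
import inspect


def apply_sequential(data, nu):
    """Hypothetical model of #17: later functions see earlier results."""
    current = dict(data)
    for key, func in nu.items():  # dicts preserve insertion order
        params = inspect.signature(func).parameters
        current[key] = func(**{name: current[name] for name in params})
    return current


def y_from_x(x):
    return x + 1


def z_from_y(y):
    return y + 1


data = {"x": 1, "y": 10}
y_first = {"y": y_from_x, "z": z_from_y}
z_first = {"z": z_from_y, "y": y_from_x}

same_mapping = y_first == z_first  # dict equality ignores ordering
result_y_first = apply_sequential(data, y_first)  # z uses the new y
result_z_first = apply_sequential(data, z_first)  # z uses the input y
```

The two dictionaries compare equal, yet in this model z is 3 in one ordering and 11 in the other, which is the kind of surprise the paragraph above worries about.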
A proposal
The behavior on main has advantages including order independence of nu and relatively easily doing computations with multiple inputs and outputs.
The behavior on #17 allows treating units as just another nu function, i.e. separating individual transforms into single logical functions.
It also has the advantage of being able to use intermediate calculations, though with a significant drawback of order dependence and not being the most understandable system.
#17 introduces a list of functions for each variable to accomplish its goals.
The proposal then is to invert that a bit: instead of having a list of functions for each variable, have a list of "mutation stages", each of which acts as the behavior on main does today.
Thus if you want precisely the behavior of main, it is identical to just having a list of one stage.
But if you want intermediate values (and units behavior), you add separate stages.
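A minimal sketch of the idea, using a plain list of dictionaries. Everything here is hypothetical: apply_stage and run_stages are names made up for this note, and passing untouched keys through to the next stage is an assumption rather than settled behavior.

```python
import inspect


def apply_stage(data, nu):
    """One stage: every function reads the stage's input, as on main."""
    computed = {}
    for key, func in nu.items():
        params = inspect.signature(func).parameters
        computed[key] = func(**{name: data[name] for name in params})
    # Assumption: keys not touched by this stage pass through unchanged.
    return {**data, **computed}


def run_stages(data, stages):
    """Feed the output of each stage into the next one."""
    for nu in stages:
        data = apply_stage(data, nu)
    return data


# A single stage gives the mutual mutation result from main.
one_stage = run_stages(
    {"x": 1, "y": 2},
    [{"x": lambda x, y: x + y, "y": lambda x, y: x - y}],
)

# Two stages give chaining with a visible intermediate value "y".
chained = run_stages({"x": 1}, [{"y": lambda x: x + 1}, {"z": lambda y: y + 1}])
```

Order only matters between stages, where it is explicit in the list. Within a stage the result does not depend on the order of the dictionary.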
I've not yet written code for this, but I don't think it'll be that hard to do so.
I think I would lean towards separate objects to manage the interactions, rather than relying on a pure list of dictionaries.
This would allow us to give stages names, which in turn allows a (relatively) ergonomic way of saying: [MyStagePreUnits("pre units", ...), "units", MyStage("post units", ...)]
Mutation stages could each have "expected/required" keys, rather than just an overall set (with the default being to pass every input key plus every nu).
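Roughly what named stage objects with per-stage keys might look like. This is only a sketch: MutationStage, run_pipeline, the named_stages lookup, and the toy "units" stage (a made-up factor of 100) are all hypothetical and none of them exist in the code today.

```python
import inspect
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class MutationStage:
    name: str
    nu: dict = field(default_factory=dict)
    # None means: pass every input key plus every nu key.
    expected_keys: Optional[Tuple[str, ...]] = None

    def __call__(self, data):
        computed = {}
        for key, func in self.nu.items():
            params = inspect.signature(func).parameters
            computed[key] = func(**{name: data[name] for name in params})
        merged = {**data, **computed}
        if self.expected_keys is None:
            return merged
        return {key: merged[key] for key in self.expected_keys}


def run_pipeline(data, stages, named_stages):
    """Run stages in order; a string refers to a stage registered by name."""
    for stage in stages:
        if isinstance(stage, str):
            stage = named_stages[stage]
        data = stage(data)
    return data


# Toy stand-in for unit handling, purely for illustration.
units = MutationStage("units", {"x": lambda x: x * 100})

pipeline = [
    MutationStage("pre units", {"x": lambda x, offset: x + offset}, expected_keys=("x",)),
    "units",
    MutationStage("post units", {"y": lambda x: x / 2}),
]
result = run_pipeline({"x": 1, "offset": 2}, pipeline, {"units": units})
```

Here the first stage drops "offset" because it only expects "x", while the last stage uses the default and keeps both "x" and the new "y".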
More radical ideas/fallout that may be enabled (but I haven't thought through completely)
- Doing the caching at the stage level
- if so, do containers actually just become a MutationStage?
- That may be a bridge too far, and keeping a divide may be more useful, even if it could collapse
- Do FuncContainers actually cease to exist, even if not all containers do?
- The container would be the arguments to the func rather than the functions themselves. Functions become a Mutation Stage.
- Does the behavior that reaches into axes to get the transform/size become an optional MutationStage?
- Would decouple the majority of the stack from matplotlib specific code, potentially making this idea viable for other plotting/data analysis libraries.
- Only FuncContainer even uses it at this time (other than passing)
- How does the renderer/axes info get introduced if it does become optional?
- perhaps the "core" gets implemented independent of this, but a mpl-specific wrapper introduces this and mpl units behavior?
- Does argument parsing/defaulting just become a "MutationStage"?
These ideas may fall a little far into "I have a hammer so everything looks like a nail", but I could see a path where each of these makes sense.