Should Transformers be recursive? (go-cmp, closed)

dsnet commented on August 19, 2024
Should Transformers be recursive?

from go-cmp.

Comments (16)

rogpeppe commented on August 19, 2024

FWIW, my first experiment with this package was to test that transformers were repeatedly applied, as I think that makes things more scalable: always transforming towards the goal of comparable types.

I think the current implementation is both intuitive and useful.

jba commented on August 19, 2024

I'm probably confused, but as an unlikely working hypothesis I'm going to assume instead that both of you are confused.

Here's how I see it: At a given call of cmp.Equal, you have a value x (well, actually, two values x and y, but they behave independently w.r.t. transformers) and a set S of transformers that have passed the filters. You then do the following repeatedly:

  1. Choose a transformer t in S that can be applied to x. If none, terminate. If more than one, fail.
  2. Apply t to x, giving a new x.
  3. Remove t from S.

Step 3 ensures that this algorithm will terminate.

Now in comparing the x' and y' you get from applying the transforms, cmp.Equal may invoke itself recursively. This is the only recursion that should exist (unless a transformer has some internal recursion, which should be irrelevant to this discussion). In that recursive call, the above algorithm will be applied again to a new x and y, with a fresh S that may include the same transformers.

@dsnet, your tree example is handled by my algorithm. There is a transformer from *Node to *Node that sorts Node.Children and returns. When cmp.Equal recurses on the children, they each in turn will be transformed. In a sense the same transformer is being used recursively, but there is no danger of infinite recursion in transformer invocations.

@rogpeppe, if by "repeatedly applied" you mean at each recursive call of cmp.Equal, then that is what I'm suggesting. If, however, you mean that the same transformer is applied repeatedly at the same level, then I don't understand why that would be useful. Why wouldn't I just write the transformer to do all the work on the first call? (I suspect this is the step I'm confused about, and there is some case where it's useful to do that.)

bcmills commented on August 19, 2024

Imagine you have a non-binary tree, where each node is essentially a set of children. When comparing this data-structure, you may want to sort each node using a transformer.

Why wouldn't you implement that as a single Transformer that returns the sorted tree? (The user's transformer code can always use recursion internally...)

The other logical behavior we could consider would be to apply Transformers recursively until we either reach a fixed-point or a cycle (where a cycle that is not a fixed-point would cause a panic). But that would require us to have some way to reliably analyze the T to determine when it converges.
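
To make the fixed-point alternative concrete, here is a minimal sketch under strong assumptions. fixedPoint is a hypothetical helper, not part of go-cmp; it only works for comparable types, and it returns an error on a cycle where the text above suggests a panic.

```go
package main

import "fmt"

// fixedPoint applies f repeatedly until the value stops changing. It returns
// an error if a previously seen value recurs without being a fixed point.
// Restricting T to comparable types is an assumption of this sketch; it is
// what lets a map answer "have we seen this value before?".
func fixedPoint[T comparable](x T, f func(T) T) (T, error) {
	seen := map[T]bool{x: true}
	for {
		next := f(x)
		if next == x {
			return x, nil // fixed point reached
		}
		if seen[next] {
			var zero T
			return zero, fmt.Errorf("cycle detected at %v", next)
		}
		seen[next] = true
		x = next
	}
}

func main() {
	// Halving converges: 40, 20, 10, 5, 2, 1, 0, and 0 maps to itself.
	v, err := fixedPoint(40, func(n int) int { return n / 2 })
	fmt.Println(v, err)

	// Negation alternates between 1 and -1 and never settles.
	_, err = fixedPoint(1, func(n int) int { return -n })
	fmt.Println(err)
}
```

A function that neither converges nor repeats (for example, one that keeps incrementing) would loop forever here, which is the convergence-analysis problem mentioned above.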

bcmills commented on August 19, 2024

@jba The three steps you're describing are essentially what we're doing today, except that we don't do step (3); instead, we rely on the user's filter to indicate that the Transformer has reached a fixed-point.

jba commented on August 19, 2024

@bcmills Why is the current choice better than mine? I certainly see the advantage to having step 3—no need to filter in the case where the transformer's input and output types are the same. Are there good examples that argue for the current design?

bcmills commented on August 19, 2024

The current API is strictly more restrictive: every program which is correct under the current API is also correct under both of the possible alternatives (non-recursive Transformers, and iteration until fixed points), so absent a compelling use-case, it's a better default position to start from.

[Edit: actually, that's not true. Programs that are currently correct using filters to stop at fixed-points would terminate early with non-recursive transformers.]

jba commented on August 19, 2024

so absent a compelling use-case...

I think we have two compelling use-cases for the non-recursive design:

I haven't seen any use-cases for the current design. All the alleged examples are handled by the recursion of cmp.Equal itself.

I can certainly imagine examples. Like algebraic simplification, where each simplification rule is expressed as a separate transformer. But wouldn't the author be more likely to use a single transformer that does all the work?

All that said, if you're not convinced I don't see a problem with waiting a few months to gather more data. It's unlikely that the change will cause much breakage.

dsnet commented on August 19, 2024

@jba, I don't see how step 3 in your algorithm accomplishes anything, since you say that a recursive call to Equal uses "a fresh S". In that case, your algorithm is exactly what we have today (i.e., one that leads to infinite recursion).

Un-rolling your example, we end up with this:

  • First call to cmp.Equal
  • 1a: select t from S where S := {t}
  • 2a: obtain x' := t(x)
  • 3a: remove t from S such that S := {}
  • Second recursive call to cmp.Equal
  • 1b: select t from S where S := {t} (according to your description this has a fresh S)
  • 2b: obtain x'' := t(x')
  • 3b: remove t from S such that S := {}
  • Third recursive call to cmp.Equal
  • ... (infinite recursion)

The algorithm I had in mind for preventing recursion was basically to say that every Transformer has an implicit filter on it where it is applicable only if that Transformer does not already exist within the current Path. I prototyped such an implementation here: #26
The crucial implementation details are:

  • In the applyOption function when we use a transformer, we add it to the usedTrans set (with a defer to remove it from the set when returning)
  • In the applyFilters function, we check if a given transformer is already in usedTrans. If yes, then we exclude it from S.
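
The bookkeeping described in those two bullets might look roughly like the sketch below. It is a toy model and not the code from the prototype: the state type and its methods are hypothetical, transformers are reduced to names, and when several candidates apply it simply takes the first one.

```go
package main

import "fmt"

// state is a hypothetical, heavily simplified stand-in for the comparison
// state; only the usedTrans bookkeeping is modeled.
type state struct {
	usedTrans map[string]bool // transformers active on the current path
	log       []string        // record of what happened at each depth
}

// applicable mimics the filter step: transformers already in use are excluded.
func (s *state) applicable(all []string) []string {
	var out []string
	for _, name := range all {
		if !s.usedTrans[name] {
			out = append(out, name)
		}
	}
	return out
}

// compare mimics one recursive step: pick an applicable transformer, mark it
// as used for the duration of the nested comparison, and unmark it on return.
func (s *state) compare(all []string, depth int) {
	cands := s.applicable(all)
	if len(cands) == 0 {
		s.log = append(s.log, fmt.Sprintf("depth %d: compare directly", depth))
		return
	}
	name := cands[0]
	s.usedTrans[name] = true
	defer func() { delete(s.usedTrans, name) }()
	s.log = append(s.log, fmt.Sprintf("depth %d: apply %s", depth, name))
	s.compare(all, depth+1)
}

func main() {
	s := &state{usedTrans: map[string]bool{}}
	s.compare([]string{"t"}, 0)
	for _, line := range s.log {
		fmt.Println(line)
	}
}
```

With a single transformer t, the second level no longer selects t again, so the unrolled recursion above stops after one application.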

@bcmills said:

[Edit: actually, that's not true. Programs that are currently correct using filters to stop at fixed-points would terminate early with non-recursive transformers.]

Correct. A change in this behavior is subtle and can break someone (however unlikely) if they were explicitly depending on the recursive behavior. At least inside Google, there are no use cases that would be broken by the new policy. If we want to switch this policy, I'd rather do it sooner rather than later.


@rogpeppe, you mentioned that you find the current implementation useful. Did you have a real use-case in mind? So far all use-cases for recursive transformers have been theoretical.

bcmills commented on August 19, 2024

every Transformer has an implicit filter on it where it is applicable only if that Transformer does not already exist within the current Path

I think that's too surprising in general: what happens if you have a Transformer that flattens the top level of a recursive type, and the comparison steps through some number of non-Transform paths before reaching the next level of the data structure?

I think the rule would have to be that the Transformer only applies if that Transformer does not already exist within the Transform tail of the current Path. But that seems pretty subtle to me, and might not catch all instances of infinite recursion.

dsnet commented on August 19, 2024

I think the rule would have to be that the Transformer only applies if that Transformer does not already exist within the Transform tail of the current Path. But that seems pretty subtle to me.

Agreed; that seems subtle to me as well.

An alternative would be to add cmpopts.NonReentrantTransformer (better name required) that has the implicit filter added where it can't be called if the transformer itself exists in the Path (or last segment of the path as you suggest).

jba commented on August 19, 2024

@dsnet The only recursive calls to cmp.Equal that I think should occur are the ones dictated by the structure of the values being compared. That is, cmp.Equal calls itself recursively on the elements of a slice, the fields of a struct, and so on.

The steps 1b, 2b, etc. in your unrolling never happen. Once you get the x' and y' from applying each transformer at most once, you compare them directly.

jba commented on August 19, 2024

@dsnet:

every Transformer has an implicit filter on it where it is applicable only if that Transformer does not already exist within the current Path

I don't think this is right. For example, I have

type Node struct {
    Key int
    Children []*Node
}

and I want to sort the children at each level. I write a transformer func(*Node) *Node to sort the children of a single *Node. I want it to run on "" (the initial, empty path) as well as .Children[i], .Children[i].Children[j], and so on. It sounds like your suggestion would prevent this.
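
For concreteness, such a single-level function could look like the sketch below. The name sortChildren and the choice to order by Key are assumptions (the thread does not say how the children should be ordered), and child pointers are assumed to be non-nil.

```go
package main

import (
	"fmt"
	"sort"
)

type Node struct {
	Key      int
	Children []*Node
}

// sortChildren returns a shallow copy of n whose Children are ordered by Key.
// Grandchildren are left untouched, on the expectation that the comparison
// itself recurses into them and applies the same function at each level.
// Ordering by Key is an assumption; children are assumed to be non-nil.
func sortChildren(n *Node) *Node {
	if n == nil {
		return nil
	}
	out := &Node{Key: n.Key, Children: append([]*Node(nil), n.Children...)}
	sort.SliceStable(out.Children, func(i, j int) bool {
		return out.Children[i].Key < out.Children[j].Key
	})
	return out
}

func main() {
	root := &Node{Key: 1, Children: []*Node{{Key: 3}, {Key: 2}}}
	sorted := sortChildren(root)
	for _, c := range sorted.Children {
		fmt.Println(c.Key)
	}
}
```

Because the input and output types are both *Node, this is the shape of transformer whose re-application at nested levels is at stake in the rule being discussed.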

I think this is what @bcmills was talking about in his response, except I don't see why "some number of non-Transform paths" matters.

dsnet commented on August 19, 2024

That is, cmp.Equal calls itself recursively on the elements of a slice, the fields of a struct, and so on.

I see what you mean now. What you are suggesting is equivalent to a policy where every Transformer has an implicit filter on it where it is only applicable if that specific Transformer does not already exist within the tail of the current Path since the last non-Transform step.

This is a compromise between:

  • My suggestion where it does not exist in the Path at all
  • @bcmills's suggestion that it does not match the last step.

I think your suggestion is a strictly better policy than mine, which does not work on the theoretical use-case posed, and better than Bryan's suggestion, which does not prevent infinite cycles with func(T) R and func(R) T.

It sounds like your suggestion would prevent this.

Correct. That was the theoretical use-case that led us to the current policy.

dsnet commented on August 19, 2024

Here's my evaluation of an approach based on @jba's algorithm where:

  • Every Transformer has an implicit filter on it where it is only applicable if that specific Transformer does not already exist within the tail of the current Path since the last non-Transform step.

The prototype is here: #29. The actual logic to add this behavior is only ~10 lines long.
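
As a rough illustration of the rule itself (not the code in the prototype), the check can be modeled as a backwards scan over the path. The step type and the allowed function are hypothetical simplifications: a real path step carries much more information than a transformer name.

```go
package main

import "fmt"

// step is a hypothetical, simplified path step: it records only the name of
// the transformer that produced it, or "" for any non-Transform step
// (struct field, slice index, map index, indirect, type assertion).
type step struct {
	transform string
}

// allowed reports whether the named transformer may run at the end of path.
// It is blocked only if it already appears in the trailing run of Transform
// steps, i.e. since the last non-Transform step.
func allowed(path []step, name string) bool {
	for i := len(path) - 1; i >= 0; i-- {
		if path[i].transform == "" {
			return true // reached the last non-Transform step
		}
		if path[i].transform == name {
			return false // already applied since that step
		}
	}
	return true
}

func main() {
	sortStep, index := step{transform: "Sort"}, step{}
	fmt.Println(allowed(nil, "Sort"))                     // at the root: may run
	fmt.Println(allowed([]step{sortStep}, "Sort"))        // right after itself: blocked
	fmt.Println(allowed([]step{sortStep, index}, "Sort")) // after descending: may run again

	// A func(T) R followed by a func(R) T cannot bounce back to the first one.
	cycle := []step{{transform: "TtoR"}, {transform: "RtoT"}}
	fmt.Println(allowed(cycle, "TtoR")) // blocked
}
```

The last case is the one that a rule looking only at the final step would miss.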

The benefits:

trans := cmp.Transformer("Sort", func(in []int) []int {
	out := append([]int(nil), in...) // Copy input to avoid mutating it
	sort.Ints(out)
	return out
})

  • It does not prevent more advanced usages of Transformers that may actually want to rely on the recursive property of Transformers. Thus for the example shown above:

type Node struct {
    Key int
    Children []*Node
}

// This does not require a filter because Sort is not allowed to trigger again
// until cmp.Equal has descended into the []*Node type.
// However, it still triggers for each sub-Node containing []*Node.
trans := cmp.Transformer("Sort", func(in []*Node) []*Node {
	out := append([]*Node(nil), in...) // Copy input to avoid mutating it
	sort.Slice(out, ...)
	return out
})

  • It is very very very unlikely to break anyone already using cmp unless they were already relying on the same Transformer being recursively applied to its own output immediately.

The neutral:

  • This does not prevent all possible cycles. Go's lack of generics forces us to rely heavily on interface{} for generic transformers that operate on any slice, for example. The interface{} forces cmp.Equal to recursively descend into the interface, adding a TypeAssertion step to the Path and thus allowing the Transformer to apply again on the underlying concrete type. However, this is not any different than what we have to do today. The cases where you have to rely on interface{} are probably only going to be helper libraries like cmpopts, which are only written once and used by many. I'm okay if these cases are not easily handled.
    • We could adjust the rules to be special for TypeAssertion and Indirect steps, but I am opposed to this since they are still technically a natural part of the structure of a Go value. Thus, it is easier to justify specially handling the Transform step since it is the only step that does not directly correspond with recursive calls dictated by the structure of the value itself, while Indirect, TypeAssertion, SliceIndex, MapIndex, and StructField correspond directly to pointers, interfaces, slices, maps, and structs.
  • In the very very very rare case where someone really cared about a Transformer recursively applying immediately to its own output (I cannot think of any use case for this), this can be trivially achieved by simply returning the value in an interface{}. Thus, this approach does not prevent any use-case that is possible today.

The detriments:

  • There is additional logic to handle this implicit filter. This adds complexity to cmp, but the complexity is cheap: 10 lines.
  • There is additional cognitive load on the user using Transformers to know about the implicit filter (the documentation in my PR could be improved). However, it could be argued that it's easier on users since they generally don't have to think about the base-case filter that is currently required. In the event that the user writes an infinitely-recursive Transformer, the failure is obvious from the stack trace, and the solution is the same as today: add a base-case filter.

Conclusion: I believe the benefits outweigh the detriments and are worth doing. Thoughts?

jba commented on August 19, 2024

I disagree that a TypeAssertion corresponds to a change in structure. It's more of a change of viewpoint. But usability is more important than our philosophical differences. What are some examples where a type assertion naturally comes into play, and which rule makes them easier to write?

There is additional cognitive load on the user using Transformers to know about the implicit filter

I'm willing to bet that most people will think about it in the terms I used: each transformer is applied once per value per call, and the recursion is structural. So I think this way is easier to think about.

dsnet commented on August 19, 2024

What are some examples where a type assertion naturally comes into play, and which rule makes them easier to write?

Generally, I don't anticipate people creating transformers that return interfaces because transformers require you to create a copy of the input. Unless you use reflect, creating a copy usually requires you to create a value of a concrete type. You might as well return that concrete value.

Again, it's only in the cases of generic algorithms like cmpopts.SortSlices and cmpopts.SortMaps that this becomes a problem. However, those are the exception and not the norm.

I'm willing to bet that most people will think about it in the terms I used

Perhaps? I'm an example of someone who thinks of it the other way around, but I could be biased since I wrote the implementation. One problem I have with your high-level description is that "recursion is structural" is not well defined (again, someone may wonder whether TypeAssertion and Indirect are considered "structural" or not). Do you have a succinct description of the algorithm that is unambiguous as to exactly how it operates?
