GithubHelp home page GithubHelp logo

Comments (17)

cpovirk avatar cpovirk commented on May 28, 2024 1

Thanks, I'm glad to hear that BiFunction is at least an option. You have found a clever way to show how much more complex that makes the signature, though ;) (I do seriously think that the complexity may explain the difference between Collectors.toMap, which has complex signature that we don't want to become yet more complex (so it uses BinaryOperator), and Map.merge, which is relatively straightforward (so it uses BiFunction). But I'm speculating.

I wonder if the sweet spot might be BiFunction<V, V, @Nullable V>: We don't do the PECS wildcards that explode the length of the type (the way that BiFunction<? super V, ? super V, ? extends @Nullable V> would), but we have enough distinct types that we can put @Nullable where we want it.

Still, this matters relatively rarely, so it's not necessarily worth going more complex than BinaryOperator.

And I can give some data about how rarely! I ran some tests overnight, and I have a couple different things to report:

  • My proposal with @NonNull from my earlier post is complex enough that it causes problems...
    • ...for our Checker Framework users. The problem appears to be only in type inference, which is an area that the Checker Framework owners have been actively working on, so it might go away in time.
    • ...for some Kotlin experiments that we're doing. The problem there might also be only type inference, but I'm not going to look into it any more deeply because the Checker Framework issue is already a blocker for doing it now.
  • We do have several different users who rely on being able to return null from the mergeFunction of toImmutableMap. (I don't think we have any tests of our own, though... :)) Most (maybe all?) of them fit into the category of "If we see any duplicates, then drop the key altogether," which I will note does not actually work if there might be 3, 5, 7, etc. copies of a key, IIUC :) So the functionality gets used, which is all the more reason to document and test it.

So I think my plan now is:

  • Document the null handling.
  • Test the null handling.
  • Maybe change the nullness annotations someday if the Checker Framework issue gets fixed.
  • Maybe or maybe not agonize over BinaryOperator vs. BiFunction if we follow up on #6822

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024 1

Thanks also for catching toImmutableTable. I dug up the review, which I had forgotten about. It turns out that the author and I both complained about the JDK's null behavior, with the author even citing the "if you had three duplicates you'd be back to discarding two and putting in one" behavior that I "discovered" above... :)

If nulls are used to eliminate duplicates then yes, it is going to be error prone. However, I can imagine some exotic use cases, involving toImmutableTable but also toImmutableRangeMap, which we all need desperately ;), where you may want to use nulls as a way to avoid storing neutral elements in some collection / table / map (as being uninteresting or for whatever other reason). Merging -x and x could then result in null instead of 0, symmetric difference of two identical sets could result in null instead of empty set and so on. I know that this can be accomplished with post-processing (e.g. streaming, filtering out "zeroes", recollecting), and that in general it would be less error-prone method than the one involving nulls, but we are where we are.

That still gives us the option to change toImmutableTable, presumably to match toTable. That said, consistency might be less important there: I'd expect users to less commonly migrate from toTable to toImmutableTable than from toMap to toImmutableMap, since toImmutableTable has been available just as long as toTable.

I would love to see consistency between toImmutableTable and toTable. Firstly, it may be useful to some. Furthermore, consistency may be "less important", but it is still important, I hope :) After all, it may be the only quality that that led me - and consequently you - to this issue.

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

Nice catch, thanks! As you say:

in case of Guava, having "just one" V forces both parameters and return type to be annotated together

That does suggest that your proposed ImmutableRangeMap.toImmutableRangeMap overload would benefit from using BiFunction, following the Map.merge precedent you cited instead of the Collectors.toMap precedent. (Maybe there was an argument for having toImmutableMap follow Collectors.toMap so as to ease migration between the two, but that argument wouldn't apply for ImmutableRangeMap, which has no JDK equivalent.)

For ImmutableMap.toImmutableMap, we can't change the actual type used at this point. We could maybe change to something like:

public static <T extends @Nullable Object, K, V extends @Nullable Object>
    Collector<T, ?, ImmutableMap<K, @NonNull V>> toImmutableMap(
        Function<? super T, ? extends K> keyFunction,
        Function<? super T, ? extends @NonNull V> valueFunction,
        BinaryOperator<V> mergeFunction) {

That would let existing users stay unchanged (I hope! I'd want to test to see what happens in practice) while allowing anyone who wants to return null from mergeFunction to do so—at the cost of having the inputs to mergeFunction also change to have nullable types, which can be inconvenient (but is often fine, like if your merge function is (first, second) -> second).

I think that would at least be preferable to the alternative of...

public static <T extends @Nullable Object, K, V>
    Collector<T, ?, ImmutableMap<K, V>> toImmutableMap(
        Function<? super T, ? extends K> keyFunction,
        Function<? super T, ? extends V> valueFunction,
        BinaryOperator<@Nullable V> mergeFunction) {

...which would foist the nullable mergeFunction inputs onto everyone.

Even if we don't change the types, we should document the behavior.

(I've been assuming in all this that the current behavior is what we want. I think it probably is, if only to ease migration from Collectors.toMap. Maybe someone else on the team remembers specific discussions of this.)

Thought on any of that?

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024

That does suggest that your proposed ImmutableRangeMap.toImmutableRangeMap overload would benefit from using BiFunction, following the Map.merge precedent you cited instead of the Collectors.toMap precedent. (Maybe there was an argument for having toImmutableMap follow Collectors.toMap so as to ease migration between the two, but that argument wouldn't apply for ImmutableRangeMap, which has no JDK equivalent.)

It needs to be BinaryOperator. Not only there's this pleasant analogy between Map.merge <-> Collectors.toMap vs RangeMap.merge <-> ImmutableRangeMap.toImmutableRangeMap, BiFunction <-> BinaryOperator in both cases. What's more important, there are some some serious wildcard-related issues with signatures like...

Function<? super T, Range<K>> keyFunction,
Function<? super T, ? extends V> valueFunction,
BiFunction<? super V, ? super V, ? super V> mergeFunction

It seems that BinaryOperator<V> is a neccessity.

(EDIT: See next comment.)

Even if we don't change the types, we should document the behavior.

Please do!

Thought on any of that?

I guess that's all :-)

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024

I've made stupid mistake. mergeFunction can and possibly should be declared like...

Function<? super T, Range<K>> keyFunction,
Function<? super T, ? extends V> valueFunction,
BiFunction<? super V, ? super V, ? extends V> mergeFunction

... which is fine and causes no compilation errors (which previously were caused by using super instead of extends in the last type param). Even so, I think I would still prefer to use BinaryOperator<V> because of similarity to Collectors.toMap. Thanks for the hint anyway!

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

I have a change out for review for those first two bullets.

I'm going to generalize this issue to also cover potentially changing the nullness annotations at some future point.

Thanks again for the report.

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024

Unfortunately there are still some problems I missed before (sorry for that!). You will likely have to generalize at lest once more.

ImmutableSortedMap.toImmutableSortedMap and Maps.toImmutableEnumMap need merge/null-related documentation and tests too. This seems to be unquestionable. But there's more.

Tables.toTable and ImmutableTable.toImmutableTable are already documented and tested (that's good news), but ImmutableTable.toImmutableTable behave differently than, say, ImmutableMap.toImmutableMap (Tables.toTable accepts nulls from mergeFunction, ImmutableTable.toImmutableTable does not). This may be an argument for changing ImmutableMap.toImmutableMap and other toImmutableXYZ collectors to align with ImmutableTable.toImmutableTable (current behavior was unspecified so there may be still room for change, even if breaking for some; EDIT: this would make part of the annotation-related problems go away too!). WDYT?

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024

Here's current, undocumented behavior.

@Test
void toImmutableSortedMap() {
	ImmutableSortedMap<Integer, Integer> map = Stream.of(0, 0).collect(ImmutableSortedMap
		.toImmutableSortedMap(Comparator.<Integer>naturalOrder(), e -> e, e -> e, (a, b) -> null));
	assertThat(map).isEmpty();
}
	
@Test
void toImmutableEnumMap() {
	ImmutableMap<RoundingMode, RoundingMode> map = Stream
		.of(RoundingMode.UP, RoundingMode.UP, RoundingMode.DOWN, RoundingMode.DOWN)
		.collect(Maps.toImmutableEnumMap(e -> e, e -> e, (a, b) -> null));
	assertThat(map).isEmpty();
}

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

That's a tad embarrassing, given that I've been actively working on the various immutable-collection Collector APIs... :) I think toImmutableBiMap even popped into my head at one point, but I reassured myself that it doesn't have a mergeFunction overload. Clearly my brain was determined not to see the remaining work, so we needed someone else to do so. Thanks.

Thanks also for catching toImmutableTable. I dug up the review, which I had forgotten about. It turns out that the author and I both complained about the JDK's null behavior, with the author even citing the "if you had three duplicates you'd be back to discarding two and putting in one" behavior that I "discovered" above... :)

I'm inclined to think that a change to toImmutableMap and friends is more dangerous than we'd like now. And the current behavior might even be the least bad choice if we were designing the method today, given the JDK precedent (and given that users might like to be able to s/toMap/toImmutableMap/g without fear, assuming that they don't use null keys).

That still gives us the option to change toImmutableTable, presumably to match toTable. That said, consistency might be less important there: I'd expect users to less commonly migrate from toTable to toImmutableTable than from toMap to toImmutableMap, since toImmutableTable has been available just as long as toTable.

And that brings us to the fact that we were reviewing toImmutableTable and toTable on the same day, discussed (and documented and tested, as you saw) their null-handling behavior, and still chose different policies. Hmm.

I think it would be a step forward to test and document the current ImmutableSortedMap.toImmutableSortedMap and Maps.toImmutableEnumMap behavior. Beyond that... hmm.

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024

That still gives us the option to change toImmutableTable, presumably to match toTable.

It's also about consistency between toImmutableTable and toImmutableMap...

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

True, I have focused entirely on the "remove duplicates" use case, neglecting legitimate use cases like the one in the new test that we'll be releasing of these days:

    toImmutableMap(
        Map.Entry::getKey,
        Map.Entry::getValue,
        (a, b) -> {
          int result = a + b;
          return result == 0 ? null : result;
        });

In fairness, I have done that because "remove duplicates" was the only use case I encountered in practice with toImmutableMap in our codebase. And while I don't want to say we'd be "doing people a favor" by breaking their code (which does work if there are never more than two copies), especially if the alternative is to make their code throw but only in case of duplicates, I also don't feel confident that the null->remove behavior is necessarily the one that's going to lead to more resilient programs.

But consistency is still a good argument. And while there's always danger in changing behavior, the danger is minimal when the old behavior was "throw an exception."

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024

I also don't feel confident that the null->remove behavior is necessarily the one that's going to lead to more resilient programs.

Sadly, it probably will lead to less resilient programs. It's almost too tricky to use. Note that even in the example you gave in your last post, things may go wrong, depending on what happens before. It's easy to assume that the resulting map won't contain 0 values, but that's guaranteed only if there are no mappings to 0 in the collected entries. This fact is easy to miss when writing code and even easier to miss when reading.

Maybe it would be best to have @NonNull V since the beginning (it's too late now), but I still think the next best option is to stick to consistency (although it's getting harder and harder to believe ;)).

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

It's easy to assume that the resulting map won't contain 0 values, but that's guaranteed only if there are no mappings to 0 in the collected entries.

Yikes. That does sour me a little more on the idea.

On the other side: One other lesson that I could stand to remind myself of is that, as much as we hate for our libraries to promote fragile coding practices (like "removing duplicates" that works only with an even number of copies), we've gone too far the other direction at least once, leading to (e.g.) crashes in Android apps because we threw an exception.

I'm thinking here of ImmutableMap, whose build() method rejects duplicate keys. (That's surprisingly similar to what we're discussing here.) It's a very principled thing to do: There's a whole class of security bugs that arises when one part of a system keeps the first entry for a key and another part keeps the last. And there are cases in which a duplicate indicates some kind of problem. Maybe rejecting duplicates was a good choice for stateless server apps. But we kept hearing of cases in which it crashed apps or long-running pipelines or other stateful processes. That led us to introduce buildKeepingLast().

(Not that I'm advocating for having two methods here. Without a lot more users who might benefit, I don't see enough justification for a toImmutableMapDroppingDuplicates.)

If only we had universal nullness checking, in which case we could have made compilers reject any attempt to add null from the beginning!

[edit: It might be interesting to see whether the handful of "duplicate-removing" users I found inserted the duplicate-removal logic after seeing an exception in prod. It might also be interesting to see whether other people have seen the exception in prod, filed bugs, and resolved them in some other way. Maybe we're breaking people's apps when they really just want deduplication, or maybe we're catching real issues.]

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

(It turns out that we have a toImmutableMapIgnoringDuplicateEntries inside Google. (It's not in ImmutableMap itself but in a sort of "contrib" package.) It's a bit different than the use case above: It permits duplicate key-value entries (and includes them (well, one of them, but it shouldn't matter which :)) in the map), but if it sees a key with two different values, it throws an exception. It has more usages than I would have guessed—definitely more than the handful I saw who were attempting removal of all duplicate keys.)

from guava.

perceptron8 avatar perceptron8 commented on May 28, 2024

@cpovirk You seem to be hoping to publish a release soon. I'd like to ask if you are absolutely sure that toImmutabeMap null handling have to be different than toImmutableTable? If not, then maybe it would be wise to undo #6826 and wait until there are no doubts? BTW what about toImmutableSortedMap / toImmutableEnumMap? It seems that they are still untested and undocumented.

You probably have more important things to do and I understand that. I just wanted to remind you that publishing a release including #6826 will make it's changes permanent (behavior will change from unspecified to guaranteed).

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

Thanks. I still find everything about this kind of gross... :\

Mainly, though, I think that we have clear evidence that some users depend on the until-recently undocumented behavior. A change to that behavior is probably more likely to ruin someone's day than to improve it. If anything, I could see reconsidering toImmutableTable, since we could hope that no one depends on the NullPointerException there.... Oh, and maybe we'll see fit to change up the toImmutableMap nullness annotations someday, too.

from guava.

cpovirk avatar cpovirk commented on May 28, 2024

Looking at this again after a teammate flagged it in a followup to #7077, I think the problem with the Kotlin experiment is that I'm changing ImmutableMap.toImmutableMap but not changing the signature that we give Kotlin for Collectors.toMap (which toImmutableMap calls). (Or at least that's part of the problem.)

Currently, the blocker to doing anything here remains that we're on an older version of the Checker Framework with type-inference problems. If that problem goes away, then I should try again with a corresponding change to the toMap signature used by Kotlin.

from guava.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.