GregoryTravis commented on June 11, 2024

> As an alternative solution, we could re-open #9296 and actually try fixing the consistency (but that may make hash for Float values pretty complicated)

Yes, we might. @JaroslavTulach provided a patch to improve consistency (which was merged in a recent PR), but we still need to explore whether this is feasible. Given time constraints, I think it's acceptable to leave the hashing completely inconsistent and warn users when they attempt to rely on it. Users who are experienced with this issue will recognize the situation, and for everyone else, the warning will be enough to tell them not to rely on the hashing.

from enso.

JaroslavTulach commented on June 11, 2024

Hashing, equality and the behavior in a Map of builtin types like Integer and Float are an engine issue, not a libs-only issue. CCing @Akirathan

A quote from #9296:

> @radeusgd I agree with you that we should be more consistent. I think the best would be to change the hashing so that 2 and 2.0 have different hash values, so that (as you said) users wouldn't get the idea that they can count on that.

Compare that with 2.should_equal 2.0, as originally requested, on the assumption that regular Enso users are unlikely to differentiate between Integer and Float.

radeusgd commented on June 11, 2024

Related discussion: #9296 (comment)

radeusgd commented on June 11, 2024

As an alternative solution, we could re-open #9296 and actually try fixing the consistency (but that may make hash for Float values pretty complicated)

Akirathan commented on June 11, 2024

The engine expects the following contract:

The runtime expects the following semantics for all the comparators:
- Hash consistency:
  - If x == y then hash(x) == hash(y)
  - If hash(x) != hash(y) then x != y
- Consistency: if x == y then x == y for all the subsequent invocations.
- Symmetry: if x == y then y == x
- Reflexivity: x == x
- Transitivity: if x < y and y < z then x < z
- Antisymmetry: if x > y then y < x
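
As a rough illustration of what the reflexivity, symmetry and hash-consistency rules above require, here is a small Java checker that tests them over a finite set of sample values. It is only a sketch: ContractCheck and findViolation are hypothetical names, this is not the linked HashCodeTest, and it does not check the ordering rules or consistency across invocations. The demo pairs Java-style loose equality (comparing doubleValue()) with each boxed type's own hashCode, which already breaks hash consistency for 2 and 2.0.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

/** Hypothetical checker for part of the comparator contract, over a finite sample. */
public class ContractCheck {

    /** Returns a description of the first violation found, or null if none is found. */
    static <T> String findViolation(List<T> samples, BiPredicate<T, T> eq, ToIntFunction<T> hash) {
        for (T x : samples) {
            // Reflexivity: x == x
            if (!eq.test(x, x)) {
                return "reflexivity fails for " + x;
            }
            for (T y : samples) {
                // Symmetry: if x == y then y == x
                if (eq.test(x, y) && !eq.test(y, x)) {
                    return "symmetry fails for " + x + ", " + y;
                }
                // Hash consistency: if x == y then hash(x) == hash(y)
                if (eq.test(x, y) && hash.applyAsInt(x) != hash.applyAsInt(y)) {
                    return "hash consistency fails for " + x + ", " + y;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Number> samples = List.of(2L, 2.0, Long.MAX_VALUE, (double) Long.MAX_VALUE);
        // Java-style loose equality: both sides are compared as doubles.
        BiPredicate<Number, Number> looseEq = (a, b) -> a.doubleValue() == b.doubleValue();
        // Each boxed type keeps its own hash (Long.hashCode vs Double.hashCode).
        ToIntFunction<Number> boxedHash = Object::hashCode;
        System.out.println(findViolation(samples, looseEq, boxedHash));
        // prints: hash consistency fails for 2, 2.0
    }
}
```

Replacing the hash with one computed on doubleValue() makes this particular sample pass, because the loosely-equal pairs in it then share a double and therefore a hash.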

There are even tests that should cover this contract, for example in https://github.com/enso-org/enso/blob/a664dd9d56226bab96290a186f3a053ec0af4497/engine/runtime-integration-tests/src/test/java/org/enso/interpreter/test/HashCodeTest.java .

If I understand this correctly, your proposition contradicts these contracts.

> In our current implementation, Integer / Float pairs that are considered equal by == do not always hash the same. Values near +/-Double.MAX_VALUE in particular have different hash values. In contrast, 2 and 2.0 do have the same hash value, since our hash implementation takes care to make these hash the same, in most cases.

If there are values that are considered equal but do not have the same hash value, then this is a bug.

> For consistency, we should make Integer and Float values hash differently, so that users do not come to the conclusion.

I don't understand what "consistency" means here. I would rather be consistent with the contract we have specified, and fix the aforementioned bug.

Is there some inherent problem that I am missing?

radeusgd commented on June 11, 2024

> Is there some inherent problem that I am missing?

I understood from #9296 (comment) that it is quite hard to actually fix hashing to re-gain consistency.

@GregoryTravis how hard do you think it would be to actually fix it by making the hashes consistent? I think you were mentioning there were some drawbacks, could you please remind us what these were?

JaroslavTulach commented on June 11, 2024

> Is there some inherent problem that I am missing?

> it is quite hard to actually fix hashing

The problem isn't in hashing. The problem is in ==.

GregoryTravis commented on June 11, 2024

> Is there some inherent problem that I am missing?

> I understood from #9296 (comment) that it is quite hard to actually fix hashing to re-gain consistency.

> @GregoryTravis how hard do you think it would be to actually fix it by making the hashes consistent? I think you were mentioning there were some drawbacks, could you please remind us what these were?

There are drawbacks, but the advantages of consistency are more important, as I explain below.

The real problem is that in Java, for example, a long and a double can be equal according to == even when they are not mathematically equal, and such equal values can have different hash codes. It's an exception to hash/== consistency, just for numerical values.

Java works around this by distinguishing between .equals and ==, and also between the primitive double and the boxed Double. This only partially hides the problem.

In the example below, lng is the maximum long value. (This problem happens for other values around this value too.)

If you convert lng to a double, the result is definitely not mathematically equal to the original long, but == says they are equal, and they have different hash codes. To use them as hash keys, however, they have to be boxed as Long and Double, and those can't be compared with == at all: the compiler rejects it as a comparison of incomparable types. They are not .equals either, since they are different classes. So their hash codes can differ without violating hash/== consistency.

    long lng = 9223372036854775807L;  // Long.MAX_VALUE, i.e. 2^63 - 1
    double doub = (double)lng;  // rounds to the nearest double, 2^63
    Long lobj = Long.valueOf(lng);
    Double dobj = Double.valueOf(doub);
    System.out.println(lng);  // 9223372036854775807
    System.out.println(doub);  // 9.223372036854776E18
    System.out.println(lng == doub);  // true: lng is widened to double before comparing <== *here is the problem*
    System.out.println(lobj.equals(dobj));  // false
    System.out.println(dobj.equals(lobj));  // false
    System.out.println(Long.hashCode(lng));  // -2147483648
    System.out.println(lobj.hashCode());  // -2147483648
    System.out.println(Double.hashCode(doub));  // 1138753536
    System.out.println(dobj.hashCode());  // 1138753536

@JaroslavTulach's solution is to make equality more precise, which solves this problem: lng and doub above would no longer be equal. This goes against the convention commonly used for floating-point values, which is to hide the small inaccuracies caused by rounding and conversion, but in our case it is necessary. Comparing doubles and integers via BigDecimal also goes against efficiency expectations for low-level languages, but Enso is not a low-level language, and these conventions are decades old.

Here's what happens if you use these two 'equal' values as hash keys in Java:

    {
      HashMap<Object, String> h = new HashMap<>();
      h.put(lng, "1");   // autoboxed to Long
      h.put(doub, "2");  // autoboxed to Double
      System.out.println(h.get(lng));  // 1
      System.out.println(h.get(doub));  // 2
    }

    {
      HashMap<Object, String> h = new HashMap<>();
      h.put(lobj, "1");
      h.put(dobj, "2");
      System.out.println(h.get(lobj));  // 1
      System.out.println(h.get(dobj));  // 2
    }

We know that lng == doub, but they behave as different keys. Whether or not we explicitly box them as Long and Double, the behavior is exactly the same, since HashMap autoboxes primitive keys anyway.

So this is why I've been saying that, if we wanted to, we could make an exception to the hash/== convention for speed and tradition without violating standard expectations. But hash/== consistency is necessary for our engine, and there's no good reason to complicate the engine just to continue this ancient tradition. (Instead, we can warn users that using floating-point values as hash keys is dangerous.)
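
The more precise equality mentioned above can be sketched in Java with BigDecimal: both new BigDecimal(long) and new BigDecimal(double) represent their argument exactly, so compareTo compares the true mathematical values. This is an illustration under our own naming (ExactNumericEquality and exactEquals are hypothetical), not the engine's implementation.

```java
import java.math.BigDecimal;

/** Hypothetical sketch: compare a long and a double exactly instead of widening to double. */
public class ExactNumericEquality {

    static boolean exactEquals(long l, double d) {
        // new BigDecimal(double) throws for NaN and infinities; no long equals those anyway.
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return false;
        }
        // Both conversions are exact; compareTo, unlike equals, ignores scale.
        return new BigDecimal(l).compareTo(new BigDecimal(d)) == 0;
    }

    public static void main(String[] args) {
        long lng = Long.MAX_VALUE;
        double doub = (double) lng;
        System.out.println(lng == doub);             // true: lng is rounded to 2^63 first
        System.out.println(exactEquals(lng, doub));  // false: 2^63 - 1 is not 2^63
        System.out.println(exactEquals(2L, 2.0));    // true
        System.out.println(exactEquals(0L, -0.0));   // true: BigDecimal has no negative zero
    }
}
```

The price is an allocation and an arbitrary-precision comparison on every mixed Integer/Float comparison, which is the efficiency trade-off discussed above.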

JaroslavTulach commented on June 11, 2024

Consistency between == and hash is achieved by always converting a number-like object to Float (e.g. Java double) and then computing Double.hashCode.
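
A minimal Java sketch of this hashing approach, assuming Integer values fit in a Java long: every number is converted to a double before Double.hashCode is applied. It also assumes that 0.0 and -0.0 compare equal, so it normalizes the sign of zero, which Double.hashCode alone would hash differently. The names NumericHash and hashOf are hypothetical.

```java
/** Hypothetical sketch: one hash for integer and floating-point values, computed on the double. */
public class NumericHash {

    static int hashOf(double d) {
        // Assumption: 0.0 and -0.0 are equal, so map -0.0 to 0.0
        // (-0.0 == 0.0 is true in Java, so this branch catches both).
        if (d == 0.0) {
            d = 0.0;
        }
        return Double.hashCode(d);
    }

    static int hashOf(long l) {
        // Convert to double first, so that 2 and 2.0 get the same hash.
        return hashOf((double) l);
    }

    public static void main(String[] args) {
        System.out.println(hashOf(2L) == hashOf(2.0));                                 // true
        System.out.println(hashOf(0.0) == hashOf(-0.0));                               // true
        System.out.println(hashOf(Long.MAX_VALUE) == hashOf((double) Long.MAX_VALUE)); // true
    }
}
```

If a long l and a double d are exactly equal, then (double) l is exactly d, so both get the same hash, as the contract requires. The converse is not required: Long.MAX_VALUE and 2^63 are unequal under an exact comparison but still share a hash, which the contract permits.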

Consistency of conversions is achieved by converting Float to Decimal exactly. That has its own gotchas, but users are warned by a Warning attached to the result of such a conversion that things may not turn out as expected.
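
The gotcha of an exact conversion is easy to see with Java's BigDecimal, used here only as a stand-in for Enso's Decimal: new BigDecimal(double) keeps the full binary value of the double, while BigDecimal.valueOf(double) goes through Double.toString and yields the short decimal a user most likely typed. The sketch does not show how Enso attaches its Warning; ExactDecimalConversion and toDecimalExact are hypothetical names.

```java
import java.math.BigDecimal;

/** Hypothetical sketch: an exact double-to-decimal conversion keeps every binary digit. */
public class ExactDecimalConversion {

    /** Exact conversion: no rounding to a "nearest short decimal" happens. */
    static BigDecimal toDecimalExact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        double d = 0.1;
        System.out.println(toDecimalExact(d));      // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(BigDecimal.valueOf(d));  // 0.1 (via Double.toString)
        // The exact value differs from the decimal literal the user probably meant.
        System.out.println(toDecimalExact(d).compareTo(new BigDecimal("0.1")) == 0);  // false
    }
}
```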
