Comments (9)
As an alternative solution, we could re-open #9296 and actually try fixing the consistency (but that may make hash for `Float` values pretty complicated)
Yes, we might. @JaroslavTulach provided a patch to improve consistency (which was merged in a recent PR) but we still need to explore whether this is feasible or not. Given time constraints, I think it's acceptable to leave the hashing completely inconsistent, and warn users when they attempt to rely on it. For users who are experienced with this issue, they will find this a familiar situation, and for other users, who aren't, the warning will be enough to tell them that they should not rely on the hashing.
from enso.
Hashing, equality and behavior in a Map of builtin types like `Integer` and `Float` is an engine issue, not a libs-only issue. CCing @Akirathan
A quote from #9296:
@radeusgd I agree with you that we should be more consistent. I think the best would be to change the hashing so that 2 and 2.0 have different hash values, so that (as you said) users wouldn't get the idea that they can count on that.
Compare with 2.should_equal 2.0 as originally requested on the assumption regular Enso users are unlikely to differentiate between `Integer` and `Float`.
from enso.
Related discussion: #9296 (comment)
from enso.
As an alternative solution, we could re-open #9296 and actually try fixing the consistency (but that may make hash for `Float` values pretty complicated)
from enso.
The engine expects the following contract:
enso/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Ordering.enso
Lines 49 to 57 in 7098c5e
There are even tests that should cover this contract, for example in https://github.com/enso-org/enso/blob/a664dd9d56226bab96290a186f3a053ec0af4497/engine/runtime-integration-tests/src/test/java/org/enso/interpreter/test/HashCodeTest.java .
If I understand this correctly, your proposition contradicts these contracts.
In our current implementation, Integer / Float pairs that are considered equal by == do not always hash the same. Values near +/-Double.MAX_VALUE in particular have different hash values. In contrast, 2 and 2.0 do have the same hash value, since our hash implementation takes care to make these hash the same, in most cases.
If there are values that are considered equal but do not have the same hash value, then this is a bug.
For consistency, we should make Integer and Float values hash differently, so that users do not come to the conclusion.
I don't understand the word consistency here. I would rather be consistent with the contract we have specified, and fix the aforementioned bug.
Is there some inherent problem that I am missing?
from enso.
Is there some inherent problem that I am missing?
I understood from #9296 (comment) that it is quite hard to actually fix hashing to re-gain consistency.
@GregoryTravis how hard do you think it would be to actually fix it by making the hashes consistent? I think you were mentioning there were some drawbacks, could you please remind us what these were?
from enso.
Is there some inherent problem that I am missing?
it is quite hard to actually fix hashing
The problem isn't in hashing. The problem is in `==`.
from enso.
Is there some inherent problem that I am missing?
I understood from #9296 (comment) that it is quite hard to actually fix hashing to re-gain consistency.
@GregoryTravis how hard do you think it would be to actually fix it by making the hashes consistent? I think you were mentioning there were some drawbacks, could you please remind us what these were?
There are drawbacks, but the advantages of consistency are more important, as I explain below.
The real problem is that, for example, in Java, numbers are considered equal when they aren't really equal; the result is that equal values can have different hash codes. It's an exception to hash/== consistency, just for numerical values.
Java gets around these problems by distinguishing between `.equals` and `==`, and also by distinguishing between `double` and `Double`. This partially hides the problem.
In the example below, `lng` is the maximum long value. (This problem happens for other values around this value too.)
If you convert `lng` to a double, the result is definitely not mathematically equal to the original long, but `==` says they are. And they have different hash codes. But you can't use these as hash keys unless you promote them to be `Long` and `Double`, which are not equal by `==`. You aren't even allowed to compare them with `==`. They are not `.equals` either, since they are different classes. So their hash codes can be different, without violating hash/== consistency.
long lng = 9223372036854775807L;
double doub = (double)lng;
Long lobj = Long.valueOf(lng);
Double dobj = Double.valueOf(doub);
System.out.println(lng); // 9223372036854775807
System.out.println(doub); // 9.223372036854776E18
System.out.println(lng == doub); // true <== *here is the problem*
System.out.println(lobj.equals(dobj)); // false
System.out.println(dobj.equals(lobj)); // false
System.out.println(Long.hashCode(lng)); // -2147483648
System.out.println(lobj.hashCode()); // -2147483648
System.out.println(Double.hashCode(doub)); // 1138753536
System.out.println(dobj.hashCode()); // 1138753536
@JaroslavTulach's solution is to make equality more precise, which will solve this problem. Then `lng` and `doub` above will not be equal. This goes against the convention commonly used for floating-point values, which is to try to hide the small inaccuracies caused by rounding and conversion, but in our case it is necessary. Comparing doubles and integers via BigDecimal also goes against efficiency expectations for low-level languages, but Enso is not a low-level language, and these conventions are decades old.
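To make the idea of more precise equality concrete, here is a minimal Java sketch of an exact comparison via BigDecimal. It is an illustration only, not the engine's code: the class name `ExactNumericEquality` and the helper `exactEquals` are hypothetical, and treating NaN and infinities as unequal to every integer is an assumption of this sketch.

```java
import java.math.BigDecimal;

public class ExactNumericEquality {
    // Hypothetical helper: true only if the long and the double denote
    // the same mathematical value.
    static boolean exactEquals(long l, double d) {
        // Assumption of this sketch: NaN and infinities equal no integer.
        // (new BigDecimal(double) would throw for them anyway.)
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return false;
        }
        // new BigDecimal(double) converts the binary value exactly, without rounding.
        return new BigDecimal(d).compareTo(BigDecimal.valueOf(l)) == 0;
    }

    public static void main(String[] args) {
        long lng = Long.MAX_VALUE;
        double doub = (double) lng;
        System.out.println(lng == doub);            // true  (lng is rounded to a double first)
        System.out.println(exactEquals(lng, doub)); // false (2^63 - 1 vs 2^63)
        System.out.println(exactEquals(2L, 2.0));   // true
    }
}
```

With a comparison like this, `2` and `2.0` stay equal, while the pair around the maximum long value no longer is.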
Here's what happens if you use these two 'equal' values as hash keys in Java:
{
    HashMap<Object, String> h = new HashMap<>();
    h.put(lng, "1");   // autoboxed to Long
    h.put(doub, "2");  // autoboxed to Double
    System.out.println(h.get(lng)); // 1
    System.out.println(h.get(doub)); // 2
}
{
    HashMap<Object, String> h = new HashMap<>();
    h.put(lobj, "1");
    h.put(dobj, "2");
    System.out.println(h.get(lobj)); // 1
    System.out.println(h.get(dobj)); // 2
}
We know that `lng == doub`, but they are unequal as keys. Whether we explicitly promote them to `Long` and `Double` or not, the behavior is exactly the same.
So this is why I've been saying that we can make an exception to the hash/== convention and not be violating standard expectations, if we want to, for speed and tradition. But hash/== consistency is necessary for our engine, and there's no good reason to complicate the engine just to continue this ancient tradition. (Instead, we can warn users that using floating-point values as hash keys is dangerous.)
from enso.
Consistency between `==` and `hash` is achieved by always converting a number-like object to `Float` (e.g. Java `double`) and then computing `Double.hashCode`.
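A rough Java sketch of that hashing rule, under the assumption that integers are simply cast to `double` before hashing. The class `ConsistentNumericHash` and the helpers `hashOfInteger` and `hashOfFloat` are hypothetical names for illustration, not the engine's API.

```java
public class ConsistentNumericHash {
    // Hypothetical helper: hash an integer through its double form.
    static int hashOfInteger(long l) {
        return Double.hashCode((double) l);
    }

    // Hypothetical helper: a float is hashed directly.
    static int hashOfFloat(double d) {
        return Double.hashCode(d);
    }

    public static void main(String[] args) {
        long lng = Long.MAX_VALUE;
        double doub = (double) lng;
        // Java's own per-type hashes disagree for this pair.
        System.out.println(Long.hashCode(lng) == Double.hashCode(doub)); // false
        // Hashing through double makes them agree.
        System.out.println(hashOfInteger(lng) == hashOfFloat(doub));     // true
        System.out.println(hashOfInteger(2L) == hashOfFloat(2.0));       // true
    }
}
```

Any integer and float that denote the same mathematical value convert to the same `double`, so they hash the same. Unequal values may also collide, which the hash contract permits.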
Consistency of conversions is achieved by exact conversion of `Float` to `Decimal`. That has its own gotchas, but users are warned by a `Warning` attached to the result of such a conversion that things may not turn out as expected.
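One such gotcha can be seen in plain Java, using `BigDecimal` as a stand-in for Enso's `Decimal` (an assumption made purely for illustration). The class `ExactFloatToDecimal` and the helper `exact` are hypothetical names.

```java
import java.math.BigDecimal;

public class ExactFloatToDecimal {
    // Hypothetical helper: exact conversion of the double's binary value.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        // The double literal 0.1 is not exactly one tenth, and the exact conversion shows it.
        System.out.println(exact(0.1));
        // prints 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(exact(0.1).compareTo(new BigDecimal("0.1")) == 0); // false
        // 0.5 is exactly representable in binary, so nothing surprising happens.
        System.out.println(exact(0.5).compareTo(new BigDecimal("0.5")) == 0); // true
    }
}
```

The exact conversion is consistent, but its result can differ from the decimal literal the user typed, which is the kind of surprise a warning would flag.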
from enso.