Comments (11)
In that case I think we can drop the host type conversion function altogether and just have the subtype relation for binary integer types:
U8 ≤ {0..<2^8}
U16BE ≤ {0..<2^16}
U16LE ≤ {0..<2^16}
And the existing subtype relation for ranged integer types:
c ≤ a ∧ b ≤ d ⊢ {a..b} ≤ {c..d}
We can define the parseable types with a predicate matching the old host type function in structure:
parseable : Type → Prop
parseable U8
parseable U16BE
parseable U16LE
parseable (Array n t) if parseable t
parseable (Record {...}) if all of its fields are parseable
parseable (Cond e t1 t2) if parseable t1 ∧ parseable t2
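The two relations above can be sketched in Python as follows. This is a minimal illustration, not Fathom's implementation: the class names `Bin`, `Range`, `Array`, and `Record` are hypothetical, array lengths are simplified to literal integers, `Cond` is left out, and the half-open `{0..<2^n}` is stored with an inclusive upper bound of `2^n - 1`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Bin:
    """A binary integer type such as U8, U16BE or U16LE."""
    name: str
    bits: int


@dataclass(frozen=True)
class Range:
    """A ranged integer type {lo..hi}, inclusive; hi=None means no upper bound."""
    lo: int
    hi: Optional[int]


@dataclass(frozen=True)
class Array:
    length: int  # assumption: a literal length rather than an expression
    elem: "Type"


@dataclass(frozen=True)
class Record:
    fields: Tuple[Tuple[str, "Type"], ...]


Type = Union[Bin, Range, Array, Record]

U8, U16BE, U16LE = Bin("U8", 8), Bin("U16BE", 16), Bin("U16LE", 16)


def bounds(t: Union[Bin, Range]) -> Range:
    """The set of numbers a binary or ranged integer type denotes."""
    return Range(0, 2 ** t.bits - 1) if isinstance(t, Bin) else t


def is_subtype(a: Type, b: Type) -> bool:
    """a <= b, using only the integer rules: c <= a and b <= d gives {a..b} <= {c..d}."""
    if a == b:
        return True
    if isinstance(b, Range) and isinstance(a, (Bin, Range)):
        ra = bounds(a)
        if ra.lo < b.lo:
            return False
        if b.hi is None:
            return True
        return ra.hi is not None and ra.hi <= b.hi
    return False


def parseable(t: Type) -> bool:
    """Only types built from binary integers carry size and endianness."""
    if isinstance(t, Bin):
        return True
    if isinstance(t, Array):
        return parseable(t.elem)
    if isinstance(t, Record):
        return all(parseable(field_type) for _, field_type in t.fields)
    return False  # ranged integers have no binary layout
```

Because both sides are reduced to their bounds before comparing, `is_subtype(U16BE, Range(0, None))` holds in a single check, while `is_subtype(Range(0, 100), U8)` does not: a ranged type is never a subtype of a binary type in this sketch.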
from fathom.
A different example which requires type conversion to go the other way:
Header = record {
width: U16BE,
height: U16BE,
format: U16BE
}
Pixels(h: Header) = record {
data: Array (h.width * h.height * h.format) U8
}
Image = record {
header: Header,
pixels: Pixels(header)
}
Although this looks reasonable, it will not type check if the header field has been converted to a host record type before being passed to Pixels, which expects the original binary type. How can we solve this?
1. Duplicate the type definition
Define a new type representing the parsed header record and use that instead:
parsed_header = record {
width: {0..<2^16},
height: {0..<2^16},
format: {0..<2^16}
}
Pixels(h: parsed_header) = record {
This will now type check, but the duplication is annoying. Worse still, it manually does the work of the host type conversion function, so why not just use that instead?
2. Use the host conversion function
Pixels(h: host Header) = record {
This neatly solves the problem, until the user forgets to add host and it fails with a confusing error message. When will we ever want to pass a binary type without converting it to a host type? Is that even a meaningful operation, given that the binary type represents an unparsed value?
3. Implicitly apply the host conversion function
Implicitly applying the host conversion function to the type of all arguments solves this problem, but does it break anything else?
4. Subtyping of binary types
If we only apply the host function in the subtype relation then the original code will already type check, as the header value still has its original binary type.
5. Subtyping of host types
If host types are subtypes of the original binary type then we can pass in a host type where an equivalent binary type is expected, but this is a little weird.
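To make options 1 and 2 concrete, here is a rough Python sketch of what a structural host conversion function might look like. The tuple encoding of types and the name `host` as a Python function are assumptions made for illustration only; the thread does not specify how the function is defined.

```python
# Hypothetical encoding of types as tuples:
#   ("int", name, bits)               a binary integer type such as U16BE
#   ("range", lo, hi)                 a ranged integer type {lo..hi}, inclusive
#   ("array", length, elem)           an array of `length` elements
#   ("record", ((name, type), ...))   a record with named fields

U8 = ("int", "U8", 8)
U16BE = ("int", "U16BE", 16)


def host(t):
    """Map a binary type to its host type by recursing over its structure."""
    kind = t[0]
    if kind == "int":
        # U16BE becomes {0..<2^16}, stored here with an inclusive upper bound.
        return ("range", 0, 2 ** t[2] - 1)
    if kind == "array":
        return ("array", t[1], host(t[2]))
    if kind == "record":
        return ("record", tuple((name, host(ft)) for name, ft in t[1]))
    return t  # ranged types are already host types


HEADER = ("record", (("width", U16BE), ("height", U16BE), ("format", U16BE)))

# The hand-written duplicate from option 1.
PARSED_HEADER = (
    "record",
    (
        ("width", ("range", 0, 2 ** 16 - 1)),
        ("height", ("range", 0, 2 ** 16 - 1)),
        ("format", ("range", 0, 2 ** 16 - 1)),
    ),
)
```

Under this encoding `host(HEADER) == PARSED_HEADER`, which is the sense in which option 1 repeats by hand what the conversion function would compute.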
from fathom.
Some amount of subtyping is needed to get the first example to work in any of the cases, since the type of the array size is smaller than it has to be. Moreover, there doesn't appear to be any need to describe things as "host" or "binary" in order to explain the example. All that's needed is that U16BE is a subtype of {0..}, which is perfectly reasonable given the interpretation of the types as sets of numbers.
The second example is similar, but there is a slight bit of trickiness to the subtyping because it could happen in a number of different places: the * function is presumably overloaded to work with various different integer types, so it is ambiguous where conversions take place. Semantically, however, this shouldn't matter as we ought to get the same result whatever we choose. (This property of an overloaded function with respect to subtyping is called "coherence".)
Where the "host" vs "binary" distinction comes up is in an example like this:
record {
len: {0..100},
data: Array len {0..255}
}
It should be perfectly ok to construct records of this type and pass them to functions, etc. But the type cannot be parsed because neither field has enough information (size and endianness) to do that, so this type should not be considered a binary type.
It might be helpful to think of "binary" as being a typeclass which might also be reasonably named Parseable.
from fathom.
Thanks a bunch for all these examples and thoughts, this is extremely helpful, and I am reading with interest! Feels like we are getting closer!
> It might be helpful to think of "binary" as being a typeclass which might also be reasonably named Parseable.
The Power of Pi paper uses this explanatory bridge in section 3.1 as well. It's a good one!
from fathom.
Perhaps the parseable predicate could include sizes as (potentially infinite) sets of integers:
parseable : Type → Size → Prop
parseable U8 {1}
parseable U16BE {2}
parseable U16LE {2}
parseable (Array n t) (n * s) if parseable t s ∧ singleton s
parseable (Record {...}) (sum s) if all of its fields are parseable
parseable (Cond e t1 t2) (union s1 s2) if parseable t1 s1 ∧ parseable t2 s2
This checks that array elements can only have a single known size.
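A small Python sketch of this size-set version of the predicate follows. It makes several simplifying assumptions that are not in the rules above: types are encoded as strings and tuples, array lengths are literal integers, the condition expression of `Cond` is dropped, and size sets are finite `frozenset`s rather than potentially infinite sets. The function name `sizes` is hypothetical; it returns `None` when the type is not parseable.

```python
PRIMITIVE_SIZES = {"U8": 1, "U16BE": 2, "U16LE": 2}


def sizes(t):
    """Return the set of possible byte sizes of t, or None if t is not parseable."""
    if isinstance(t, str):
        size = PRIMITIVE_SIZES.get(t)
        return None if size is None else frozenset({size})

    tag = t[0]
    if tag == "Array":
        _, n, elem = t
        s = sizes(elem)
        # The `singleton s` side condition: elements must have one known size.
        if s is None or len(s) != 1:
            return None
        (only,) = s
        return frozenset({n * only})

    if tag == "Record":
        # Sum of field sizes, taken over every combination of choices.
        total = frozenset({0})
        for field in t[1]:
            s = sizes(field)
            if s is None:
                return None
            total = frozenset(a + b for a in total for b in s)
        return total

    if tag == "Cond":
        s1, s2 = sizes(t[1]), sizes(t[2])
        if s1 is None or s2 is None:
            return None
        return s1 | s2

    return None
```

For example, `sizes(("Cond", "U8", "U16LE"))` gives `frozenset({1, 2})`, so an array of that conditional type is rejected by the singleton check.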
from fathom.
Not to go off on a tangent, but sizes can be represented using the polynomial approach that Mark sketched out for determining field alignments, or for arrays a simpler method that just distinguishes between known fixed sizes and unknown/variable sizes:
size U8 = Fixed 1
size U16BE = Fixed 2
size U16LE = Fixed 2
size (Array n t) = if n == Fixed n' && size t == Fixed s then n' * s else Unknown
size (Record {...}) = sum of field sizes
size (Cond e t1 t2) = if size t1 == size t2 then size t1 else Unknown
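The fixed/unknown variant is simple enough to write out directly. The sketch below is an illustrative Python rendering under a few assumptions: `None` stands for `Unknown`, an `int` stands for `Fixed n`, an array length of `None` means the length is not a known constant (for instance because it depends on a parsed field), and the condition expression of `Cond` is omitted. The tuple encoding of types is hypothetical.

```python
from typing import Optional

PRIMITIVE_SIZES = {"U8": 1, "U16BE": 2, "U16LE": 2}


def size(t) -> Optional[int]:
    """Return the fixed byte size of t, or None when the size is unknown or variable."""
    if isinstance(t, str):
        return PRIMITIVE_SIZES.get(t)

    tag = t[0]
    if tag == "Array":
        _, n, elem = t
        elem_size = size(elem)
        # Fixed only if both the length and the element size are fixed.
        if n is None or elem_size is None:
            return None
        return n * elem_size

    if tag == "Record":
        total = 0
        for field in t[1]:
            field_size = size(field)
            if field_size is None:
                return None
            total += field_size
        return total

    if tag == "Cond":
        s1, s2 = size(t[1]), size(t[2])
        # Fixed only if both branches agree on a fixed size.
        return s1 if s1 is not None and s1 == s2 else None

    return None
```

With this encoding, `size(("Cond", "U16BE", "U16LE"))` is `2`, while `size(("Cond", "U8", "U16LE"))` is `None` because the branches disagree.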
from fathom.
How tricky will it be to integrate subtyping into the type checker? At least the subtype relation sketched out above does not have cycles, but it can require multiple steps to complete:
U16BE ≤ {0..<2^16} ≤ {0..}
Is it sufficient to just greedily apply subtyping repeatedly/speculatively whenever a type doesn't match?
from fathom.
Yeah, my plan was to add it as part of CONV, instead of just checking for alpha equivalence.
from fathom.
Alas, I have never added subtyping to a language before. I had always hoped to get to grips with Stephen Dolan's thesis, Algebraic Subtyping, beforehand, but perhaps I'll just blunder through! 😅
from fathom.
Based on my work in #215, I'm thinking it makes sense to think of 'formats' as 'descriptions of types' of type Format, rather than types in their own right. These can then be converted to their corresponding representation type by way of some built-in repr : Format -> Type function. We'd decouple the typing rules of format structures and host structures - so format structures can have their own typing rules, rather than trying to overload structures themselves.
This nicely sidesteps the issue of having silly cases where you might be able to construct elements of format types - because they have no constructors themselves this becomes an impossibility. It also means we don't need to use subtyping for this, which becomes rather complicated!
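As a loose illustration of the idea, the Python sketch below treats formats as plain data that describe a layout, with a separate function giving the representation type. Everything here is hypothetical: the class names, the `repr_type` function (named to avoid shadowing Python's built-in `repr`), and the use of Python's `int` and `dict` as stand-ins for representation types are assumptions, not the design from #215.

```python
import struct
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class IntFormat:
    """Describes a binary integer; struct_code is a Python struct format string."""
    name: str
    struct_code: str


@dataclass(frozen=True)
class RecordFormat:
    """Describes a sequence of named fields, each with its own format."""
    fields: Tuple[Tuple[str, "Format"], ...]


Format = Union[IntFormat, RecordFormat]

U8 = IntFormat("U8", "B")
U16BE = IntFormat("U16BE", ">H")
U16LE = IntFormat("U16LE", "<H")


def repr_type(fmt: Format) -> type:
    """Stand-in for repr : Format -> Type, using Python types as host types."""
    return int if isinstance(fmt, IntFormat) else dict


def parse(fmt: Format, data: bytes, offset: int = 0):
    """Read a value described by fmt from data; returns (value, next_offset)."""
    if isinstance(fmt, IntFormat):
        (value,) = struct.unpack_from(fmt.struct_code, data, offset)
        return value, offset + struct.calcsize(fmt.struct_code)
    record = {}
    for name, field_fmt in fmt.fields:
        record[name], offset = parse(field_fmt, data, offset)
    return record, offset


HEADER = RecordFormat((("width", U16BE), ("height", U16BE), ("format", U16BE)))
```

Parsing `bytes([0, 2, 0, 3, 0, 1])` with `HEADER` yields a `dict` whose fields are ordinary integers. The format value itself is never "inhabited"; only its representation type is, which mirrors the point about formats having no constructors.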
from fathom.
Funnily enough this means our language will look very much like the one described in the Power of Pi paper and implemented in Narcissus.
from fathom.