webassembly / interface-types

License: Other

Makefile 0.49% Python 1.13% Batchfile 0.95% CSS 0.04% Shell 0.38% OCaml 8.16% Standard ML 0.01% WebAssembly 85.92% JavaScript 2.93%
proposal

interface-types's Introduction

Interface Types Proposal for WebAssembly

Note: This proposal is currently inactive, with work having moved on to the component-model repo. This proposal repo could become active again in the future to resume work on the Adapter Functions future feature.


This repository is a clone of github.com/WebAssembly/spec/. It is meant for discussion, prototype specification and implementation of the Interface Types proposal.

See the explainer for an in-progress summary of the proposal.

interface-types's People

Contributors

andrewscheidecker, athos, augustushuang, binji, bnjbvr, cellule, chicoxyzzy, choikwa, dschuff, fgmccabe, fitzgen, flagxor, gahaas, gstraube, jakzale, jfbastien, jgravelle-google, kg, kripken, lgalfaso, lindig, lukewagner, mbebenita, msprotz, naturaltransformation, pjuftring, ppopth, rossberg, sunfishcode, vilie

interface-types's Issues

Video Call: July 18th 2019

No registration required. Email jgravelle [ at ] google (dot) com for the meeting link. (Meeting will use Zoom software)

Meeting will start at 12noon PDT (edited), and will last one hour (until 1pm PDT).


Notes available here

Would it be good to separate out JS specific semantics?

Reading the overview, I feel this proposal addresses several needs:

  • How to specify what all these i32s in wasm imports/exports actually mean for a given host embedding. This is definitely very much needed.
  • How to do this specifically in the case of JS.
  • How to store/refer to host objects that may be referred to by such an i32 (in this case, through extra tables).

And one concern that I don't see addressed:

  • How to do this for other possible host environments, such as a host that does not have a JS environment, but binds directly to C/C++.

It seems to me that these 4 concerns would be best addressed orthogonally. So far the wasm spec, although its primary implementation is intended for use with JS, has been specified as a very general low-level machine with no dependence on any JS semantics. This proposal could end up adding a lot of JS specifics, which may not serve future uses well, even if optional.

Specifically, I'd suggest:

  • Make "JS Bindings" a spec that is entirely separate from the wasm spec, and something only (optionally) relevant to JS hosts.
  • Not specify that mapping i32 to object has to go through tables, or how those tables are to be stored in a wasm module. If we have a generic way for embeddings to work with tables, this can still work, but I can imagine a particular embedding (even JS) could instead manage these itself in an array of objects in its managed space, without the need to use wasm tables for this. In the end, an i32 in linear memory will be an unreliable reference, regardless of whether tables are used or not.
  • Have a universal host bindings spec that talks about things that may concern any possible embedding / host language. Things like how to copy memory in both directions, and how to interface with wasm allocators may be universal.
    • We could even try to define ways of accessing data on both sides of the fence that are universal across possible embeddings, by making it very low level (allowing one to specify exact memory layout and alignment). This is of course very tricky, given that I can think of endless ways a host or wasm program could define what it means to be a "string" etc. And of course, different language implementations may have different representations. Probably not practical, just throwing it out there.
  • Have at least one other specific embedder bindings spec, to be able to contrast with JS-specific bindings and what we think are universal binding concerns. Bindings typical to C/C++ hosts would seem a logical addition to me, but maybe there are other examples. Again, entirely optional, but very helpful to create an eco-system of shared APIs across different embeddings.

Binding operators for pass-through functionality?

The current proposal allows WebIDL functions to be invoked on WebIDL values transparently and efficiently constructed from wasm linear memory, but what if what's in hand is a ref.host that has meaning when it's passed to a WebIDL function? Obvious cases are JS strings and views that flow into wasm from the host via anyref parameters and globals, and that are intended to be opaque to wasm. (This is not a security matter, just engineering.) Should there be e.g. a js-str or, perhaps better, a host-str operator, like there is a utf8-str operator?

Where does the source tuple come from?

(type $AddContactFuncWasm (func (param anyref i32 i32 i32 i32 i32) (result i32)))

Like above, how can we determine the types and order of the params here? I did not find any explanation in the proposal summary (Explainer.md).

Experiments with Record design

I've been experimenting with a design for structs/records in my polyfill of interface types (viewable here). This is all open for discussion, but here's a thing I tried and some of the stuff I ran in to.

Declaring a record type looks like:

(@interface type $Foo record
  (field $bar string)
  (field $baz int)
)

This declares a record type Foo with fields bar and baz.
Note that the fields are represented as indices in instructions to follow, but in the declaration the names are preserved in the binary. This is so languages like JS, Python, Lua, etc. can have reasonable string names in their code, e.g. (JS):

instance.exports.readFoo({bar: "hello", baz: 12});

There are two main instructions needed to interact with record objects in an interface adapter: creation and destructuring. My version has make-record and get-field.

get-field is straightforward: given a record and a field, get that field off the record. So get-field $Foo $baz pops a Foo and pushes an int. get-field takes two immediates: the type index and the field index.

make-record is straightforward too, but raises some interesting questions. make-record $Foo pops a string and an int, and pushes a Foo. In general make-record takes the type index as an immediate, and has one stack argument per field.


Where it gets interesting is the question: what arguments do we pass to make-record? Let's say we have a corresponding C struct:

struct Foo {
  char* bar;
  int bar_len;
  int baz;
};

and a function

void readFoo(struct Foo foo);

how does that foo argument translate to C's ABI? One reasonable way is to destructure the struct into its components, and pass those all as arguments individually. Another reasonable way is to stack-allocate the argument in the caller's frame, and pass the pointer in (this is what Clang does, and I think is a standard C ABI thing).

If we destructure, the adapter is just re-structuring those arguments back into a record. If we pass by pointer however, we now need some way to read fields off that pointer.
What I'm doing for the time being is defining an exported getter function for each field in the C struct, which the adapter invokes with call-export. This works, but can almost certainly be improved. I'm not sure how to improve it without respecifying load+store instructions in interface adapters. We would also need to do something similar for GC objects.
The nice thing about call-export is that it lets us defer reimplementing anything expressible with wasm instructions. The downsides are that it requires a specific kind of toolchain integration to generate those exports (not really too bad), and it relies on engine inlining to be efficient.
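Here is a minimal C sketch of the per-field getter approach for the pass-by-pointer case. All names are hypothetical: Foo_get_bar, Foo_get_bar_len, Foo_get_baz, FooRecord and adapter_lift_foo. FooRecord only stands in for whatever record value make-record would produce on the host side. adapter_lift_foo plays the role of an adapter that performs one call-export per field and then a make-record.

```c
#include <assert.h>
#include <stddef.h>

/* The C struct from the discussion: a (pointer, length) string plus an int. */
struct Foo {
  char* bar;
  int bar_len;
  int baz;
};

/* Hypothetical exported getters, one per field, callable via call-export. */
const char* Foo_get_bar(const struct Foo* foo) { return foo->bar; }
int Foo_get_bar_len(const struct Foo* foo) { return foo->bar_len; }
int Foo_get_baz(const struct Foo* foo) { return foo->baz; }

/* Hypothetical host-side record value that make-record would yield. */
struct FooRecord {
  const char* bar;
  int bar_len;
  int baz;
};

/* Roughly what the adapter does when handed a pointer to a Foo:
   read each field through its getter, then build the record. */
struct FooRecord adapter_lift_foo(const struct Foo* foo) {
  assert(foo != NULL);
  struct FooRecord r;
  r.bar = Foo_get_bar(foo);
  r.bar_len = Foo_get_bar_len(foo);
  r.baz = Foo_get_baz(foo);
  return r;
}
```

In the destructured alternative, the getters disappear: bar, bar_len and baz arrive as separate parameters and the adapter feeds them straight to make-record.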

So that's the general sketch of a design I've been working with. Thoughts?

How to polyfill this proposal in JS?

This is intended to extract an explicit discussion point about polyfilling this proposal today. A good bit of discussion about this happened in #25 already which led to #26, and then this question is also coming up in #57.

I think it's worth discussing what a JS polyfill for this proposal would look like and how it's envisioned to work. Much of the discussion so far seems to be centered around using existing WebAssembly JS APIs to inspect the module, extract custom sections, etc. This in turn was the motivation for #26, since the WebAssembly JS API can't access functions by index, only by export name. This is now starting to be considered as a design constraint for ideas like #57.

I would like to propose, however, that such a polyfill probably isn't suitable for this proposal at this time. This proposal currently depends on a number of distinct WebAssembly proposals that aren't yet stable:

  • Reference types
  • Multi-value
  • Using 64-bit integers in exported/imported functions in JS

(and this list may grow over time!)

This means that if you want to write a polyfill for this proposal today you won't be able to use existing WebAssembly JS APIs since any module using the above features will fail validation. My conclusion is that for a polyfill right-this-red-hot-second we'll need to have something much more intrusive which actually rewrites the bytes of the original WebAssembly file. (this is what wasm-bindgen does)

Thinking about this from a different angle I think that if we want to be able to polyfill this proposal today it's not sufficient to try to engineer the proposal so a shim only has to use WebAssembly JS APIs. Eventually I think this will probably be enabled, but it's not clear when all of the above proposals will land support in browser WebAssembly APIs.

What should the snowman be? (Proposal renaming)

In the talk at the CG Meeting the idea of ⛄-bindings was proposed, with the intention being that it was a new as-yet-unnamed thing.

What should we name it? We need to name the proposal itself, the new types we're specifying, the operators that convert to and from wasm types and ⛄ types, and the set of values they can produce. A reasonable naming scheme fits the pattern of x-bindings, x-types, x-operators, and x-values.

One option is to call the proposal wasm-bindings, and then have binding-types, binding-operators, and binding-values. A problem here is that "binding" is ambiguous: it could refer to a binding expression, to the stubs an engine would generate between modules, or to the act of binding itself. This proposal is potentially very abstract, and having concrete names for its components should facilitate discussion.

An idea that @lukewagner came up with offline was "WebAssembly External Bindings" for the proposal itself. This implies "external binding [types|operators|values]". This is precise, but verbose. For ease of communication, I think we can abbreviate that in most contexts to EBTypes, EBOperators, and EBValues.

Are there naming considerations I'm missing? Wildly better naming ideas?

Binding types for Nullable

WebIDL supports the concept of a Nullable type for both return values and arguments, and it is a popular feature for host bindings. This spec currently has no way to refer to a null object.

Proposal: For each "typed object table", the 0th index in the table is always the null object for that type. It is an error to set this index in the table to a valid object from either WASM or the host. The rest of the spec language holds, except that in the case where NEXT_SLOT would be incremented and that slot index passed, NEXT_SLOT is not incremented when a null object is passed.

This proposal still leaves no ability to pass a null STRING or ARRAY_BUFFER.
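Here is a minimal C sketch of the slot-0 convention. It assumes a fixed-size table and treats host objects as opaque pointers. ObjectTable, table_init, table_pass and table_set are hypothetical names. The sketch only shows two behaviours: passing null yields index 0 without consuming NEXT_SLOT, and writes to index 0 are rejected.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 16

/* Hypothetical model of one "typed object table". Slot 0 is permanently
   reserved for the null object, so real objects start at index 1. */
struct ObjectTable {
  const void* slots[TABLE_SIZE];
  int next_slot; /* NEXT_SLOT in the proposal's wording */
};

void table_init(struct ObjectTable* t) {
  for (int i = 0; i < TABLE_SIZE; i++) t->slots[i] = NULL;
  t->next_slot = 1; /* index 0 is the null object */
}

/* Host -> wasm: returns the index handed to wasm for obj.
   Null maps to index 0 and does NOT advance next_slot.
   Returns -1 if the table is full. */
int table_pass(struct ObjectTable* t, const void* obj) {
  if (obj == NULL) return 0;
  if (t->next_slot >= TABLE_SIZE) return -1;
  t->slots[t->next_slot] = obj;
  return t->next_slot++;
}

/* Setting index 0 (or an out-of-range index) is an error.
   Returns 0 on success, -1 on error. */
int table_set(struct ObjectTable* t, int idx, const void* obj) {
  if (idx <= 0 || idx >= TABLE_SIZE) return -1;
  t->slots[idx] = obj;
  return 0;
}
```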

How far do we want to go with a type system?

This proposal tries to use annotations to describe most simple types seen on the web, along with a flat calling convention.

However, WebIDL has a fairly robust type system containing nullables, sequence types, record types, and more, all of which can be composed together. For now, we could mandate that each "variant" of a type requires an explicit host binding and get around things that way.

For instance, CanvasRenderingContext2D.setLineDash could take a Float64Sequence, which one would need to construct and push items to manually:

Float64Sequence* seq = Float64Sequence_new();
Float64Sequence_push(seq, 2);
Float64Sequence_push(seq, 3);
...
CanvasRenderingContext2D_setLineDash(ctx, seq);
Float64Sequence_free(seq);

But it might be interesting to look at supporting a variant-ish type system where we can describe a complex WebIDL value directly, rather than special casing a variety of "simple types" in isolation and expecting bindings to fill in the rest.

Informal regular video call

There's enough interest in this, and enough of an open design space that it makes sense to have a venue for regular high-bandwidth conversation.

Sign-up/interest/finding-a-time form: here


For cadence, to start I'm thinking weekly.

We should have an agenda before each meeting. As for mechanism I propose copying the CG format and having a doc per meeting, in their own meetings/ folder.

For video call software, I assume Zoom works.


Feel free to nitpick the specifics here.

STRING i32 pairs for UTF-16LE

Regarding

JavaScript hosts might additionally provide:

STRING | Converts the next two arguments from a pair of i32s to a utf8 string. It treats the first as an address in linear memory of the string bytes, and the second as a length.

Any chance that there'll be support for JS-style strings (UTF-16LE) as well? I know this doesn't really fit into the C/C++ world, but languages approaching things the other way around will most likely benefit when not having to convert back and forth on every host binding call.
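The C sketch below shows the conversion a UTF-8-only STRING binding forces on a UTF-16 language for every string. It decodes a (pointer, length) UTF-8 buffer into UTF-16 code units, including surrogate pairs for code points above U+FFFF. The function name is hypothetical. It assumes well-formed UTF-8 and only guards against reading past the end, so a real implementation would also need to reject malformed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts len bytes of UTF-8 at src into UTF-16 code units in dst
   (capacity cap). Returns the number of code units written, or -1 if the
   output is too small or a sequence runs past the end of the input.
   Malformed sequences are NOT validated. */
long utf8_to_utf16(const uint8_t* src, size_t len, uint16_t* dst, size_t cap) {
  size_t i = 0, n = 0;
  while (i < len) {
    uint8_t b = src[i];
    /* Sequence length from the lead byte. */
    size_t k = b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2 : (b & 0xF0) == 0xE0 ? 3 : 4;
    if (i + k > len) return -1;
    uint32_t cp;
    if (k == 1) {
      cp = b;
    } else if (k == 2) {
      cp = ((uint32_t)(b & 0x1F) << 6) | (src[i + 1] & 0x3F);
    } else if (k == 3) {
      cp = ((uint32_t)(b & 0x0F) << 12) | ((uint32_t)(src[i + 1] & 0x3F) << 6) |
           (src[i + 2] & 0x3F);
    } else {
      cp = ((uint32_t)(b & 0x07) << 18) | ((uint32_t)(src[i + 1] & 0x3F) << 12) |
           ((uint32_t)(src[i + 2] & 0x3F) << 6) | (src[i + 3] & 0x3F);
    }
    i += k;
    if (cp < 0x10000) {
      if (n + 1 > cap) return -1;
      dst[n++] = (uint16_t)cp;
    } else {
      /* Astral code point: encode as a high/low surrogate pair. */
      if (n + 2 > cap) return -1;
      cp -= 0x10000;
      dst[n++] = (uint16_t)(0xD800 + (cp >> 10));
      dst[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
    }
  }
  return (long)n;
}
```

A binding that accepted UTF-16LE (pointer, length) pairs directly would let such a language skip this pass.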

Split `bind`s out into their own subsection?

Right now we have two subsections:

  1. Web IDL Type Subsection: contains webidl type definitions
  2. Web IDL Function Binding Subsection: contains function bindings definitions and binds that pair a bindings definition with an actual wasm function

It seems like it might make sense to split (2) into two subsections: one for the function bindings definitions and another for the bind pairings.

Binding to host provided String, Array, Set, Map, Error

When thinking about how host bindings might become useful, a few questions come up in my mind. For example, will it be possible to ...

  • Bind to the String constructor, providing UTF8 bytes, returning an object handle? (probably)
  • Test two such String object handles for equality, even though there is no method to bind to (i.e. without a JS helper)?
  • Bind to the Array constructor, providing an initial set of elements as variable arguments (0 to a lot), returning an object handle?
  • Perform indexed get/set on such Array object handles, even though there is no method to bind to?
  • Obtain and work with an iterator object handle on a Set or Map?

Context: If all of these would be possible, something like AssemblyScript could just use WASM<->JS interchangeable object handles directly for pretty much everything crossing the boundary instead of re-implementing a standard library on top of linear memory.

Binding type for dicts

Passing in dicts is a common Web API pattern.
We should consider making this efficient.
We probably need to take into account the idea that several versions of the set of keys might exist.

Hand-written Web IDL is hardly maintainable

When writing Web IDL by hand, at some point, a program author should use func-binding which has the following form:

func-binding $<name> import $<function-wasm-type> $<function-webidl-type>
    (param …)
    (result …)

When encoding the Web IDL definitions into a Wasm module (into a custom section), the $<function-wasm-type> is likely to be an indeterminate index rather than a fully-qualified name. Let me explain.

I see two main scenarios when having a Wasm module with Web IDL bindings:

  1. Either a compiler emits a Wasm module + the Web IDL bindings,
  2. Or a compiler emits a Wasm module without Web IDL bindings, and the latter are hand-written by the program author and encoded inside the Wasm module later.

In scenario 1, the compiler is able to resolve all the names. There is no issue.

In scenario 2, the compiler replaces the function names with indexes, and the type names with indexes too. Depending on the compiler, a “debug name custom section” can be generated (cf. wat2wasm --debug-names), but (i) the custom section format isn't standardized, and (ii) it contains only a mapping from function indexes to function names. In all cases, a mapping from type indexes to type names is missing.

So when writing func-binding $<name> import $<function-wasm-type> …, the program author has to use an index for $<function-wasm-type>. This index can change from one compilation to another, which makes hand-written Web IDL hardly maintainable. After each new compilation, the program author has to ensure that their type indexes haven't moved.

Am I missing something? If no, this is a problem to solve :-). One way to solve that is to introduce a new “debug type names custom section” that is basically a mapping from type indexes to type names, so that the Web IDL encoder can resolve all type names (and also function names) more easily.
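Here is a rough C sketch of what such a "debug type names" section could carry. It assumes the section reuses the shape of the name section's existing name maps: an LEB128 count, then an index and a length-prefixed name per entry. The section itself, encode_type_names and lookup_type_index are hypothetical. lookup_type_index shows the name-to-index resolution a Web IDL encoder would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes v as unsigned LEB128 and returns the number of bytes written. */
size_t write_uleb128(uint8_t* out, uint32_t v) {
  size_t n = 0;
  do {
    uint8_t byte = v & 0x7F;
    v >>= 7;
    if (v != 0) byte |= 0x80; /* more bytes follow */
    out[n++] = byte;
  } while (v != 0);
  return n;
}

struct NameAssoc {
  uint32_t index;
  const char* name;
};

/* Encodes count entries as: count, then (index, name length, name bytes).
   Entries must be sorted by increasing index; out must be large enough. */
size_t encode_type_names(uint8_t* out, const struct NameAssoc* entries, uint32_t count) {
  size_t n = write_uleb128(out, count);
  for (uint32_t i = 0; i < count; i++) {
    assert(i == 0 || entries[i].index > entries[i - 1].index);
    uint32_t len = (uint32_t)strlen(entries[i].name);
    n += write_uleb128(out + n, entries[i].index);
    n += write_uleb128(out + n, len);
    memcpy(out + n, entries[i].name, len);
    n += len;
  }
  return n;
}

/* Resolves a type name back to its index; returns -1 if absent. */
long lookup_type_index(const struct NameAssoc* entries, uint32_t count, const char* name) {
  for (uint32_t i = 0; i < count; i++)
    if (strcmp(entries[i].name, name) == 0) return (long)entries[i].index;
  return -1;
}
```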

Will fallback to a slow path disincentivize changing IDL constructs?

A potential issue that occurred to me: if a specification decides to change the signature of a method, either to enable new functionality or to clean up some prose by making use of new IDL features, implementations might no longer be eager to make these changes, as they could result in significant performance regressions on important sites.

It wasn't entirely clear to me whether this was considered.

Should snowman-bindings check unions?

In the snowman-bindings presentation at the June meeting, one of the proposed bindings is for enum/union/sum types.

I'm wondering, should these types be statically guaranteed to be valid?

For instance, if a function takes a bool as a parameter, where bool is defined as an enum of 0 or 1, can that function safely assume that the parameter will always be either 0 or 1, even when being called by malicious code?

If so, how do you think this guarantee could be enforced?
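Here is one possible enforcement strategy, sketched in C. It assumes the adapter is the only way to reach the callee. The adapter range-checks the raw i32 against the enum's case count and traps before the call, so the callee may rely on the value being valid. A trap is modelled here as returning 0. EnumType, check_enum, takes_bool and adapter_call_takes_bool are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum binding type: valid cases are 0 .. count-1. */
struct EnumType {
  uint32_t count;
};

/* Stores raw in *out and returns 1 if it names a valid case, else 0. */
int check_enum(struct EnumType t, uint32_t raw, uint32_t* out) {
  if (raw >= t.count) return 0;
  *out = raw;
  return 1;
}

/* Callee that relies on the guarantee instead of re-checking. */
static int callee_seen = -1;
void takes_bool(uint32_t b) {
  assert(b <= 1); /* never fires when reached through the adapter */
  callee_seen = (int)b;
}

/* Adapter: validates, then calls. Returns 0 where a real adapter would trap. */
int adapter_call_takes_bool(uint32_t raw) {
  const struct EnumType bool_type = { 2 };
  uint32_t v;
  if (!check_enum(bool_type, raw, &v)) return 0;
  takes_bool(v);
  return 1;
}
```

The guarantee only holds if a malicious caller cannot bypass the adapter and reach the core wasm function directly.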

Specify validation typing rules and/or algorithm

From the explainer:

Validating a Web IDL Bindings section primarily involves type checking all the contained binding expressions relative to the respective source and destination tuple types of the WebAssembly or Web IDL function signatures of the function binding. Although the precise static semantics are outside the scope of this [explainer] document, they should be able to follow the same general validation conventions as core WebAssembly.

This is a placeholder issue for specifying the typing rules and context and/or the algorithm for validating a webidl-bindings, since I didn't see an issue already open.

Need a `get-receiver` incoming binding operator?

Say I have this Web IDL interface:

interface Foo {
    void myMethod();
};

And I am implementing myMethod in Wasm with the following function:

(func $myMethod (param anyref)
  ;; ...
  )

For the bindings between Foo#myMethod and $myMethod, we need an incoming binding expression similar to get, but which gets the this receiver rather than an indexed parameter.

Note that this need comes up not just for methods on interfaces, but also for various callbacks on the Web where this is used to pass extra contextual information (e.g. event listeners).

Or alternatively, is the this receiver intended to always be the first element of the incoming Web IDL values? If so, then we should make this clearer and provide an example of using the this receiver from Wasm.

Video Call: August 15th 2019

No registration required. Email jgravelle [ at ] google (dot) com for the meeting link. (Meeting will use Zoom software)

Meeting will start at 12noon PDT, and will last one hour (until 1pm PDT).

Please suggest agenda items as comments on this issue.


Notes here

Allocator function considerations

Some of the incoming binding expressions reference an allocator function (namely, alloc‑utf8‑str and alloc‑copy), and there are a few considerations for the design.

First is: how should this be polyfilled? By referencing an allocator function by index, we can't access it directly via JS.
Given that a polyfill can be assumed to be part of the instantiation code, in theory we could say that it could modify the incoming wasm bytes to export the given function. However that would make the polyfill modify the underlying wasm module, exporting the allocator for the world to see and use.

Second: in the non-polyfill case, how odd is it that the embedder can call a non-exported function? Even though it's opt-in (if you don't want the function to be called, don't specify it as an allocator function in the webidl-binding section), it's still unusual to have a non-exported function being called from the outside world.

To simplify those, I think a reasonable thing to do is to have the allocator functions specify exports, either by name or by index.
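Here is a small C model of that suggestion. It makes three assumptions: the allocator is named by export, linear memory is a plain byte array, and a bump allocator stands in for malloc. find_export mimics looking up an export by name, the way a polyfill using the JS API would. alloc_utf8_str approximates the alloc‑utf8‑str steps: allocate via the named function, copy the bytes, and yield the (ptr, len) pair. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 1024

/* Toy linear memory with a bump allocator standing in for malloc.
   Address 0 is kept unused so it can signal allocation failure. */
static uint8_t memory[MEM_SIZE];
static uint32_t heap_top = 8;

uint32_t module_malloc(uint32_t size) {
  if (size > MEM_SIZE - heap_top) return 0;
  uint32_t p = heap_top;
  heap_top += size;
  return p;
}

/* The module's exports, looked up by name rather than by function index. */
typedef uint32_t (*AllocFn)(uint32_t);
struct Export {
  const char* name;
  AllocFn fn;
};
static const struct Export exports[] = { { "malloc", module_malloc } };

AllocFn find_export(const char* name) {
  for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++)
    if (strcmp(exports[i].name, name) == 0) return exports[i].fn;
  return NULL;
}

/* Allocate through the named export, copy the string bytes (no NUL) into
   linear memory, and return the (ptr, len) pair. Returns 0 on failure. */
int alloc_utf8_str(const char* alloc_name, const char* s, uint32_t* ptr, uint32_t* len) {
  AllocFn alloc = find_export(alloc_name);
  if (alloc == NULL) return 0;
  uint32_t n = (uint32_t)strlen(s);
  uint32_t p = alloc(n);
  if (p == 0) return 0;
  memcpy(memory + p, s, n);
  *ptr = p;
  *len = n;
  return 1;
}
```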

Thoughts? Other considerations I'm missing?

Add owned version of outgoing "copy" and "utf8-*" bindings

Currently, if you're passing an array of data from WebAssembly to JS (such as strings or a list of bytes), you have the option of using outgoing bindings like copy, utf8-*, or view. In some cases, though, the WebAssembly computes a value (e.g. renders the input as markdown) and then wants to return the computed value. WebIDL bindings currently don't provide a good way to manage this scenario.

The WebAssembly module needs to return the pointer/length to JS, and then after JS has copied it to its own heap (e.g. via TextDecoder or copying a typed array out) then the original allocation in the WebAssembly needs to be deallocated.

Currently tools like wasm-bindgen handle this by indeed having JS perform the deallocation, but it means that Rust-defined functions which return a string can't use vanilla WebIDL bindings and still require JS shims.

Would it be possible to add a new outgoing binding which copies the data out, but also has a free function listed to deallocate the data after JS has read it?
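Here is a C sketch of the requested behaviour. It assumes the binding names a free function alongside the data. The host copies (ptr, len) into its own buffer and then calls that free function, so the module's allocation does not leak and no JS shim is involved. module_alloc, module_free, render and owned_copy are hypothetical names. render just wraps its input in asterisks as a stand-in for something like markdown rendering.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the module's exported allocator pair. */
static int live_allocations = 0;
void* module_alloc(size_t n) { live_allocations++; return malloc(n); }
void module_free(void* p) { if (p) { live_allocations--; free(p); } }

/* The module computes a value and hands ownership back as (ptr, len). */
char* render(const char* input, size_t* len_out) {
  size_t n = strlen(input) + 2;
  char* out = module_alloc(n);
  if (out == NULL) return NULL;
  out[0] = '*';
  memcpy(out + 1, input, n - 2);
  out[n - 1] = '*';
  *len_out = n;
  return out;
}

/* Sketch of an "owned copy" outgoing binding: copy into a host-owned,
   NUL-terminated buffer, then release the module's allocation via free_fn. */
char* owned_copy(char* ptr, size_t len, void (*free_fn)(void*)) {
  char* host = malloc(len + 1);
  if (host != NULL) {
    memcpy(host, ptr, len);
    host[len] = '\0';
  }
  free_fn(ptr); /* the module's buffer is released either way */
  return host;
}
```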

Instanceof / Upcast equivalent

Host bindings, DOM in particular, use subclasses a lot. We would need some ability to do type discrimination. For instance, if I were to receive a DOMEvent object, I would likely want to be able to determine that this is DOMMouseEvent, and be able to call DOMMouseEvent-y things on it.

The details on runtime typechecking in the current proposal are vague (it just says "Throw TypeError (as now) if the wrong type is stored in a Table." but does not elaborate on what "the wrong type" means). Assuming that this is a strict 1:1 type-check, we would need some way to do runtime type discrimination, and a way to cast from one type to another.

Possible idea: a CAST instruction which takes two table indexes src and dest. The instruction checks src instanceof TableType(dest). If the check passes, 1 is pushed to the stack and the object's address copied into dest, so it exists in both tables. Otherwise, 0 is pushed to the stack. Cleanup for both src and dest is left to native code.

A variant of the instruction which takes a table index and a typeidx might also be entertained, for if (foo instanceof Type) checks that have no need to access the resulting cast object.
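The C sketch below is one reading of the CAST idea, with three assumptions:

- src and dest are (table, slot) pairs.
- Each table has an element type.
- Types form a single-inheritance chain, so instanceof is a walk up parent links.

Type, Object, Table, instance_of and cast are hypothetical names. On success the same reference ends up in both tables, and cleanup is left to the caller, as the proposal describes.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal single-inheritance type model: each type points at its parent. */
struct Type {
  const char* name;
  const struct Type* parent;
};
struct Object {
  const struct Type* type;
};

int instance_of(const struct Object* o, const struct Type* t) {
  for (const struct Type* c = o->type; c != NULL; c = c->parent)
    if (c == t) return 1;
  return 0;
}

#define SLOTS 8
/* A typed table: every object stored in it must be an instance of elem_type. */
struct Table {
  const struct Type* elem_type;
  const struct Object* slots[SLOTS];
};

/* If src[src_idx] is an instance of dest's element type, copy the reference
   into dest[dest_idx] and return 1 (the value CAST would push); else return 0. */
int cast(const struct Table* src, int src_idx, struct Table* dest, int dest_idx) {
  if (src_idx < 0 || src_idx >= SLOTS || dest_idx < 0 || dest_idx >= SLOTS) return 0;
  const struct Object* o = src->slots[src_idx];
  if (o == NULL || !instance_of(o, dest->elem_type)) return 0;
  dest->slots[dest_idx] = o;
  return 1;
}
```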

Video Call: July 25th 2019

No registration required. Email jgravelle [ at ] google (dot) com for the meeting link. (Meeting will use Zoom software)

Meeting will start at 12noon PDT, and will last one hour (until 1pm PDT).


Notes here

Managing JS facade objects with WebAssembly.ReferenceMap

@lukewagner and @tschneidereit and I were tossing around some ideas for weak references and finalization in JS, specifically to cover the wasm use case for host bindings where we want to wrap wasm objects (represented as addresses in the linear heap) in JS objects, and perform finalization on the wasm object when the JS object is GC'd.

Since weakrefs and finalization in JS are a can of worms, and since we don't know what we'll need for wasm once we have full GC support for wasm, I put together a simple proposal to cover what I perceive to be the main wasm use case at present.

Writeup here: https://github.com/lars-t-hansen/moz-sandbox/blob/master/refmap/ReferenceMap.md

Probably this proposal can be subsumed by, or be implemented in terms of, a "full" solution for weakrefs and finalization, if/when that appears. Also, the proposal does not extend wasm, only the JS interface to wasm.

Comments welcome, obviously.

String subsection

There was some discussion about using strings in the imports+exports, as well as immediates to call instructions in the (soon to be proposed) interface adapter function instructions.
There are a few questions here: 1) Should we constrain the design of this proposal such that it must be polyfillable? 2) Are strings too inefficient size-wise? 3) Are they uncomfortable semantics-wise?

Question 1) is an interesting discussion and we should have it somewhere (though I'm inclined to just say "yes"). I don't think I've heard anyone actually raise 3), but we might want something related to 2) regardless.

We will probably want to have a subsection for strings, that deduplicates them. For example, given a set of import bindings (syntax + semantics are made up, just note structure):

(@interface func $foo (export "foo") (allocator "malloc") ...)
(@interface func $bar (export "bar") (allocator "malloc") ...)
(@interface func $baz (export "baz") (allocator "malloc") ...)

we can translate that to a more binary-equivalent:

(@interface-section
  (string-subsection
    (0 "foo")
    (1 "bar")
    (2 "baz")
    (3 "malloc")
  )
  (func-subsection
    ($foo (export 0) (allocator 3) ...)
    ($bar (export 1) (allocator 3) ...)
    ($baz (export 2) (allocator 3) ...)
  )
)

This is similar to having a type subsection, which we already need.

This is separate from the names section because 1) polyfill-friendliness means keeping all the data in the custom section itself, and 2) these are non-omittable string immediates. I imagine most modules will have enough repetition that this out-of-lining will save space on average.
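Here is a C sketch of the out-lining step. It assumes strings are deduplicated by exact byte comparison and that indices are assigned in first-seen order. StringPool, intern, FuncEntry and outline_funcs are hypothetical names. Interning all export names before the allocator names reproduces the indices in the example above (foo=0, bar=1, baz=2, malloc=3).

```c
#include <assert.h>
#include <string.h>

#define MAX_STRINGS 64

/* The string subsection: each distinct string is stored once. */
struct StringPool {
  const char* strings[MAX_STRINGS];
  int count;
};

/* Returns the index of s, appending it if unseen; -1 if the pool is full. */
int intern(struct StringPool* pool, const char* s) {
  for (int i = 0; i < pool->count; i++)
    if (strcmp(pool->strings[i], s) == 0) return i;
  if (pool->count == MAX_STRINGS) return -1;
  pool->strings[pool->count] = s;
  return pool->count++;
}

/* One func-subsection entry after out-lining: indices instead of strings. */
struct FuncEntry {
  int export_idx;
  int allocator_idx;
};

/* Interns every export name first, then every allocator name. */
void outline_funcs(struct StringPool* pool, const char** exports,
                   const char** allocators, int n, struct FuncEntry* out) {
  for (int i = 0; i < n; i++) out[i].export_idx = intern(pool, exports[i]);
  for (int i = 0; i < n; i++) out[i].allocator_idx = intern(pool, allocators[i]);
}
```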

Weak Imports

Should we model weak imports as a Bindings-layer feature?

It's a feature that can be expressed in terms of bindings, it affects how modules are wired together, and it's a weaker constraint on the environment than wasm imports (which state "give me this or I will fail to instantiate").

At a first stab, we can describe this feature with no new wasm-core functionality, by for example adding a pair of bindings, one to wrap the imported object (with either the object if present, or something that traps if not), and a boolean global to check if the import succeeded or not.
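Here is a C sketch of that two-binding design, using hypothetical names. bind_weak_import records whether the host supplied the import. weak_import_present plays the role of the boolean global. weak_import_call forwards to the import if present and otherwise aborts, which stands in for a wasm trap. host_double is just an example host function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*ImportFn)(int);

/* State the bindings layer would set up for one weak import. */
static int weak_import_present = 0;        /* the boolean global */
static ImportFn weak_import_target = NULL; /* the wrapped import, if any */

void bind_weak_import(ImportFn provided) {
  weak_import_target = provided;
  weak_import_present = (provided != NULL);
}

/* The wrapper: forwards if the import exists, otherwise "traps". */
int weak_import_call(int x) {
  if (weak_import_target == NULL) {
    fprintf(stderr, "trap: weak import not provided\n");
    abort();
  }
  return weak_import_target(x);
}

/* Module code guards the call with the global and falls back otherwise. */
int use_feature(int x) {
  return weak_import_present ? weak_import_call(x) : -1;
}

/* An example host-provided function. */
int host_double(int x) { return 2 * x; }
```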

Maybe not for MVP which makes the timing awkward? Or maybe it's compelling enough to add.

This was mentioned in this morning's WASI meeting, @sunfishcode more thoughts?
Previous discussions: WebAssembly/WASI#36 , WebAssembly/design#1281

Returning arrays with snowman-bindings

After the discussion we had during the July 18th meeting, I'm reconsidering how dynamic-sized values (eg, arrays and strings) should be returned from boundary calls.

The current direction is that, for the following C++ code:

auto array = someWasmModule_getArray(...);
auto someData = array[i];

The C++ module should give its host a malloc-equivalent function at compile-time. Then, at runtime:

  • The C++ module calls someWasmModule_getArray,
  • someWasmModule creates an array internally and returns it,
  • The host calls the malloc function provided by the C++ module,
  • The host makes a byte-by-byte copy of the array from someWasmModule to the address in linear memory allocated by malloc,
  • The host returns that address to the C++ module.
  • The C++ module can then access its copy of the data freely.
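To make the cost concrete, here is a C sketch of the host glue for these steps. It assumes each module has its own toy linear memory and that the caller's malloc is a bump allocator. caller_malloc, callee_get_array and host_return_array are hypothetical names. Note that the whole array is copied even when the caller reads only one element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two separate toy linear memories: the callee module's and the caller's. */
static uint8_t callee_mem[256];
static uint8_t caller_mem[256];
static uint32_t caller_heap = 16;

/* The caller's malloc-equivalent, handed to the host ahead of time.
   Returns 0 on failure. */
uint32_t caller_malloc(uint32_t n) {
  if (n > sizeof caller_mem - caller_heap) return 0;
  uint32_t p = caller_heap;
  caller_heap += n;
  return p;
}

/* someWasmModule_getArray stand-in: builds an array in its own memory
   and returns (ptr, len). The contents are made up. */
uint32_t callee_get_array(uint32_t* len) {
  for (uint32_t i = 0; i < 4; i++) callee_mem[32 + i] = (uint8_t)(i * 10);
  *len = 4;
  return 32;
}

/* Host glue: call the callee, allocate in the caller via its malloc,
   copy the whole array, and return the caller-side address. */
uint32_t host_return_array(uint32_t* len_out) {
  uint32_t len;
  uint32_t src = callee_get_array(&len);
  uint32_t dst = caller_malloc(len);
  if (dst == 0) return 0;
  memcpy(caller_mem + dst, callee_mem + src, len);
  *len_out = len;
  return dst;
}
```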

There are some problems with that workflow:

  • It makes a memcpy in every case. Given that this copy goes to a language-controlled section of linear memory, it might be difficult or impossible for the host to elide that copy away in cases where it would otherwise know the copy is superfluous.
    • For instance, in the above code, the C++ module only needs one element of the returned array.
    • A function might fetch a string from the DOM, then pass it to a search function, then discard it; in which case, allocating space in linear memory only to discard it immediately afterwards is useless overhead.
  • There is no obvious way to return an array of references.

While these aren't blocking problems in the short term, in the long term, they might constrain the types of data that can be exchanged between wasm modules and hosts; especially if my recent OCAP bindings proposal gets traction.

So I'm wondering whether it might be worth biting the bullet and implementing first-class variable-sized types in wasm.

By first class, I mean having them as valtypes, that can be stored in stack variables, function arguments, return values, globals and so on.

Doing so would add some complexity to wasm:

  • An additional generic type
  • Additional instructions for:
    • Creating an array (from values, from tables, from linear memory),
    • Indexing, slicing, getting the size,
    • Copying, moving.

I'm not familiar with the internals of the big wasm VMs. How big a cost is this? How hard a sell would it be as an addition to the spec?

I think it's not overly complex, semantically. We're not talking about generics or monads here. Every instruction could still be validated in O(1) time (though array copying would be O(n) at runtime).

It would be, essentially, splitting off another part of the GC proposal, and implementing it as a stack-only feature, to give the host more information easily accessible to the compiler in some cases.

What do you think?

Statically rule out "weird" incoming binding expression

In a recent large refactoring of wasm-bindgen I ended up writing what amounts to translating an incoming webidl binding expression to a JS shim. In doing so I found that some incoming binding expressions don't really make much sense, for example:

(alloc-utf8-str "malloc" (alloc-utf8-str "malloc" (get 1)))

or things like:

(field "foo" (as i32 (get 1)))

I was wondering whether it'd be better to split incoming binding expressions into two forms. One form would actually acquire a value, and the second would convert that value to WebAssembly values. The idea is that, although some bindings are tree-like, in reality each binding at the top level is a conversion from a value to wasm values; once you have wasm values, it seems odd to apply further transformations that expect WebIDL-like values.

A strawman proposal for this would be:

in-value-expr

  Operator         Immediates             Children
  get              idx
  field            field-idx              in-value-expr

in-expr

  Operator         Immediates             Children
  as               wasm-type              in-value-expr
  alloc-utf8-str   alloc-func-name        in-value-expr
  alloc-copy       alloc-func-name        in-value-expr
  enum-to-i32      webidl-type            in-value-expr
  bind-import      wasm-type, binding     in-value-expr
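A minimal Rust sketch of the strawman split, assuming simplified immediates (indices as `u32`, function names and types as `String`s). The type names are hypothetical and this is not wasm-bindgen's actual AST; the point is that because every child of an `in-expr` is an `in-value-expr`, the two "weird" examples above cannot even be constructed.

```rust
/// Expressions that acquire an incoming (WebIDL-like) value.
#[derive(Debug, Clone, PartialEq)]
pub enum InValueExpr {
    Get(u32),
    Field(u32, Box<InValueExpr>),
}

/// Expressions that convert an acquired value into wasm values. Every child
/// is an InValueExpr, so an InExpr can never be nested inside another one.
#[derive(Debug, Clone, PartialEq)]
pub enum InExpr {
    As { wasm_type: String, value: InValueExpr },
    AllocUtf8Str { alloc_func: String, value: InValueExpr },
    AllocCopy { alloc_func: String, value: InValueExpr },
    EnumToI32 { webidl_type: String, value: InValueExpr },
    BindImport { wasm_type: String, binding: String, value: InValueExpr },
}

impl InValueExpr {
    /// Prints the expression in the s-expression style used above.
    pub fn to_sexpr(&self) -> String {
        match self {
            InValueExpr::Get(idx) => format!("(get {})", idx),
            InValueExpr::Field(idx, inner) => format!("(field {} {})", idx, inner.to_sexpr()),
        }
    }
}

impl InExpr {
    /// Prints the expression as (operator immediates child).
    pub fn to_sexpr(&self) -> String {
        let (op, imms, value) = match self {
            InExpr::As { wasm_type, value } => ("as", wasm_type.clone(), value),
            InExpr::AllocUtf8Str { alloc_func, value } => {
                ("alloc-utf8-str", format!("\"{}\"", alloc_func), value)
            }
            InExpr::AllocCopy { alloc_func, value } => {
                ("alloc-copy", format!("\"{}\"", alloc_func), value)
            }
            InExpr::EnumToI32 { webidl_type, value } => ("enum-to-i32", webidl_type.clone(), value),
            InExpr::BindImport { wasm_type, binding, value } => {
                ("bind-import", format!("{} {}", wasm_type, binding), value)
            }
        };
        format!("({} {} {})", op, imms, value.to_sexpr())
    }
}
```

With this shape, `(alloc-utf8-str "malloc" (alloc-utf8-str "malloc" (get 1)))` is a type error rather than something a consumer has to reject at runtime.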

Video Call: August 8th 2019

No registration required. Email jgravelle [ at ] google (dot) com for the meeting link. (Meeting will use Zoom software)

Meeting will start at 12 noon PDT, and will last one hour (until 1pm PDT).

Please suggest agenda items as comments on this issue.


Notes here

No extra module section

This proposal suggests adding a 'host bindings' section to a wasm module.

I do not support this idea. The basic reason is that we already have a mechanism for importing and host bindings are simply a particular kind of import.

More generally, this appears to violate the architectural principle of separation of concerns.

Fundamentally, we do need a way of layering multiple semantics over the core machine level semantics of wasm. But that is a separate topic ...

Apply `BindExport` to table elements as well

The explainer currently describes the BindExport operation as the way exports are combined with their WebIDL counterparts, and then says:

When the module's exports are extracted, each webidl-bind statement binding an export must invoke BindExport to replace the WebAssembly export with a bound Web IDL Callback value.

While integrating the current explainer into the wasm-bindgen project, though, I've found that our implementation of callbacks also needs WebIDL bindings for table elements. For example, we use table elements in JS to pass Rust-defined closures to JS, where JS would otherwise call the equivalent of wasm_exported_function_table.get(the_index)(...). In this case we want to leverage WebIDL bindings to pass arguments like strings back and forth efficiently.

Currently @fitzgen has a strawman AST format for WebIDL bindings which technically does allow for this, but from rereading the explainer it seems this wasn't originally considered.

I don't think it would change the proposal much, but it might be good to explain that BindExport is called for elements of a module's function tables whenever a function is accessed through a table (not just through an export item).
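A rough Rust sketch of the idea, assuming a toy instance made of a `Vec<u8>` linear memory and a table of raw `(ptr, len)` functions (`Instance`, `RawFn`, `get_bound` and `count_upper` are all hypothetical names). Accessing a table element goes through a binding step, analogous to BindExport, that lowers a host string into linear memory before calling the raw function.

```rust
/// A raw wasm-level function: receives linear memory plus a (ptr, len) pair.
pub type RawFn = fn(mem: &[u8], ptr: usize, len: usize) -> usize;

/// Hypothetical instance: linear memory plus a function table.
pub struct Instance {
    pub memory: Vec<u8>,
    pub table: Vec<RawFn>,
}

impl Instance {
    /// Table access with a binding applied: instead of the raw element,
    /// return a closure that takes a host string and does the lowering.
    pub fn get_bound(&mut self, index: usize) -> Option<impl FnMut(&str) -> usize + '_> {
        let raw = *self.table.get(index)?;
        Some(move |s: &str| {
            // Naive bump allocation: append the UTF-8 bytes to linear memory.
            let ptr = self.memory.len();
            self.memory.extend_from_slice(s.as_bytes());
            raw(&self.memory, ptr, s.len())
        })
    }
}

/// Example raw function: counts ASCII uppercase bytes in [ptr, ptr + len).
pub fn count_upper(mem: &[u8], ptr: usize, len: usize) -> usize {
    mem[ptr..ptr + len].iter().filter(|b| b.is_ascii_uppercase()).count()
}
```

A real binding would also free the lowered string after the call; the bump allocator here never reclaims memory, purely to keep the sketch short.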

Is it possible to share complicated JavaScript data structure directly with WebAssembly?

I'd like to know: is it possible to share complex JavaScript data structures such as "Object" or "Array" directly with the WebAssembly context?

As far as I know, sharing a JavaScript array with WebAssembly involves a lot of overhead. First, we need to encode the array data and pass it to the WebAssembly context through WebAssembly.Memory. Second, the WebAssembly module (C/C++ side) also needs to fetch that data again from linear memory through the array's pointer.

So is there a way to share this array entity, which has already been created and exists in V8 (or any other JavaScript engine), without reconstructing it in WebAssembly.Memory?
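To illustrate the overhead described above, here is a minimal Rust sketch of the copy path, with a `&[f64]` standing in for the JavaScript array and a `Vec<u8>` with a naive bump allocator standing in for `WebAssembly.Memory` (`LinearMemory` and its methods are hypothetical names). The host serializes every element into linear memory, and the module side decodes it again; wasm linear memory is little-endian, hence the `le` byte conversions.

```rust
/// Hypothetical linear memory with a bump allocator.
pub struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    pub fn new() -> Self {
        LinearMemory { bytes: Vec::new() }
    }

    /// Host side (copy #1): serialize each element into linear memory and
    /// return the (pointer, element count) pair handed to the module.
    pub fn write_f64_array(&mut self, values: &[f64]) -> (usize, usize) {
        let ptr = self.bytes.len();
        for v in values {
            self.bytes.extend_from_slice(&v.to_le_bytes());
        }
        (ptr, values.len())
    }

    /// Module side (copy #2, if the C/C++ code builds its own container):
    /// decode the elements back out of linear memory via the pointer.
    pub fn read_f64_array(&self, ptr: usize, len: usize) -> Vec<f64> {
        self.bytes[ptr..ptr + len * 8]
            .chunks_exact(8)
            .map(|chunk| {
                let mut raw = [0u8; 8];
                raw.copy_from_slice(chunk);
                f64::from_le_bytes(raw)
            })
            .collect()
    }
}
```

Every element is touched twice, which is exactly the cost the question is trying to avoid.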

why JSON?

Why is JSON called out as a special case in this proposal?
Even if we accepted this, where are all the other types (DOM Element, CSS, ..., URL, ... )?
Editorial: I think that it might be better to see web bindings as a special case of the more general inter-module binding problem.

Node + ctypes bindings?

In a node embedding, in addition to JavaScript types, it would be good to be able to express direct calls to system libraries. The "natural" way to do this with the host bindings proposal would be to add a ctypes library to node with properties that would allow it to be optimized to bare calls.

It looks like there is an early, work-in-progress library:
https://www.npmjs.com/package/ctypes

Some thought should go into how to make this work nicely with host bindings.

How much would "anyref" as a new value type help?

We have a pretty uncontroversial plan to introduce an anyref type to WASM in the future. Instead of restricting this type to tables only, what if we also allowed anyref as a WASM value type (not to be stored in linear memory) and added table_get and table_set operations? If anyref remains an opaque type for now (i.e. no new operators), I think this has minimal impact on engine complexity. (For non-JS embedded engines there are implementation strategies that don't require pointer maps; see the footnote* below.)

This could remove a lot of table/slot management complexity, especially for references that never need to be stored into memory, where table/slot management is just overhead. Instead, user code can implement its own WASM wrappers around imports and exports to manage tables and slots so that slot indexes can be stored into the linear memory.

I think this would be more forward-compatible, since I can imagine a world where managed data is dominant and JS/WASM interop is much smoother. In that future world the table bindings would be legacy complexity that we'd probably want to remove.

  * I specifically imagine native WASM engines (i.e., not browser engines) that have no support for GC, no pointer maps for stack walking, etc. In such engines, it would be less work to introduce their own stack-allocated handle tables than to track pointer maps internally, e.g. in JITed code.
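A minimal Rust sketch of the kind of user-level slot management described above, assuming `anyref` is modelled as an opaque `Rc<dyn Any>` (`AnyRef`, `SlotTable`, `store`, `load` and `release` are hypothetical names). References would stay in locals as plain values and only get an i32 slot when they have to be stored into linear memory; releasing a slot drops the table's reference so the host can collect the object.

```rust
use std::any::Any;
use std::rc::Rc;

/// Stand-in for an opaque anyref: it can be held and passed, not inspected.
pub type AnyRef = Rc<dyn Any>;

/// User-level slot table with a free list for reusing released slots.
#[derive(Default)]
pub struct SlotTable {
    slots: Vec<Option<AnyRef>>,
    free: Vec<u32>,
}

impl SlotTable {
    /// Stores a reference and returns an i32-sized index for linear memory.
    pub fn store(&mut self, r: AnyRef) -> u32 {
        if let Some(i) = self.free.pop() {
            self.slots[i as usize] = Some(r);
            i
        } else {
            self.slots.push(Some(r));
            (self.slots.len() - 1) as u32
        }
    }

    /// Turns a slot index back into a reference, if the slot is occupied.
    pub fn load(&self, slot: u32) -> Option<AnyRef> {
        self.slots.get(slot as usize)?.clone()
    }

    /// Clears the slot (dropping the reference) and makes it reusable.
    pub fn release(&mut self, slot: u32) {
        if let Some(entry) = self.slots.get_mut(slot as usize) {
            if entry.take().is_some() {
                self.free.push(slot);
            }
        }
    }
}
```

The design choice here is that this bookkeeping lives in user code rather than in the bindings layer, so references that never touch linear memory pay nothing for it.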

Video Call: August 29th 2019

No registration required. Email jgravelle [ at ] google (dot) com for the meeting link. (Meeting will use Zoom software)

Meeting will start at 12 noon PDT, and will last one hour (until 1pm PDT).

Please suggest agenda items as comments on this issue.


Notes here

Resolve the "TODO" in Explainer in "Export returning string (dynamically allocated)"

There is a TODO at the end of the "Export returning string (dynamically allocated)" section which describes two important problems that we need to solve (not just for strings, but for any type when returned as dynamically-allocated linear memory from a wasm export).

The two issues that need to be solved are:

  • if an exception is thrown (once we have exception handling) after a call-export "foo", where foo is wasm code that allocates and returns linear memory (or some other resource that needs to be released by a subsequent call), but before that release happens, the allocation must be released during unwinding
  • for optimization reasons, the linear memory allocation should not be released by memory-to-string, but rather some time after the caller's consuming string-to-memory, so that the engine can perform a memcpy directly from source to destination.

Since this is not an esoteric problem, but one everyone will hit early and often, we'd ideally offer a simple, compact, and "canonical" solution to both of these problems.


One possible solution is to have a defer instruction that says "call the given export with copies of the top-of-stack values (the count and types determined by the export's signature) at the end of the adapted call". Here, "adapted call" means treating the adapter function of the caller and the adapter function of the callee as a single call, as if the callee were inlined into the caller (which is the whole point, from an optimization POV).

So, the current example:

  (@interface func (export "greeting") (result string)
    call-export "greeting_"
    memory-to-string "mem" "free"
  )

could be replaced with

  (@interface func (export "greeting") (result string)
    call-export "greeting_"
    defer "free"
    memory-to-string "mem"
  )

Noting that:

  • free takes a single i32 and thus defer "free" will take a copy of the single top stack value (which is validated to be an i32), which we assume is the pointer of the (pointer,offset) pair.
  • the defer always immediately follows the call; in a less-trivial example, other adapter instructions (that can throw) could be inserted between the defer and memory-to-string, so the immediate defer ensures "free" is called in all cases.

Spec-wise, I imagine that the configuration (the input and output of every instruction) would contain a vector of (export, arguments) pairs that is appended to by defer and executed (LIFO, presumably) at the end of the adapted call.

This solution does feel a little "special" and irregular, though, so I'm interested to hear about any other options that are more regular. One way to rationalize this is to consider the adapted function callee to be inlined into the adapted function caller, without introducing a new scope, and then the defer is like a C++ RAII stack object.
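A Rust sketch of these `defer` semantics under simplifying assumptions: values are `i32`s, the exception is modelled as an `Err`, and `Config`, `Deferred`, `adapted_call` and `call_export` are hypothetical names. `defer` copies (rather than pops) the top-of-stack arguments, and the deferred calls run LIFO at the end of the adapted call whether it succeeds or unwinds, much like RAII destructors.

```rust
/// A deferred call: an export name plus the argument values copied from the
/// top of the stack when `defer` executed.
#[derive(Debug, Clone, PartialEq)]
pub struct Deferred {
    pub export: String,
    pub args: Vec<i32>,
}

/// Hypothetical adapter configuration: the value stack plus the vector of
/// deferred (export, arguments) pairs.
#[derive(Default)]
pub struct Config {
    pub stack: Vec<i32>,
    pub deferred: Vec<Deferred>,
}

impl Config {
    /// `defer "export"`: copy (not pop) the top `arity` stack values.
    pub fn defer(&mut self, export: &str, arity: usize) -> Result<(), String> {
        if self.stack.len() < arity {
            return Err(format!("defer {:?}: stack underflow", export));
        }
        let args = self.stack[self.stack.len() - arity..].to_vec();
        self.deferred.push(Deferred { export: export.to_string(), args });
        Ok(())
    }

    /// End of the adapted call: run the deferred calls in LIFO order.
    pub fn finish(&mut self, mut call_export: impl FnMut(&Deferred)) {
        while let Some(d) = self.deferred.pop() {
            call_export(&d);
        }
    }
}

/// Runs an adapted call body; deferred calls execute whether the body
/// returns Ok or Err (our stand-in for an exception during unwinding).
pub fn adapted_call<T>(
    body: impl FnOnce(&mut Config) -> Result<T, String>,
    call_export: impl FnMut(&Deferred),
) -> Result<T, String> {
    let mut cfg = Config::default();
    let result = body(&mut cfg);
    cfg.finish(call_export);
    result
}
```

For the `greeting` example, the body would push the pointer returned by `greeting_`, run `defer "free"`, and then lower the string; `free` only runs after the whole adapted call has finished.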

Video Call: August 1st 2019

No registration required. Email jgravelle [ at ] google (dot) com for the meeting link. (Meeting will use Zoom software)

Meeting will start at 12 noon PDT, and will last one hour (until 1pm PDT).

Please suggest agenda items as comments on this issue.


Notes

Finalization and linear allocation

The Overview document says that there is no provision for finalizing or freeing a table slot, but proposes that an index to a pending slot could be used.

That, of course, won't work for more complex allocation patterns (e.g. I retrieve three DOM elements, free the first and third, and keep the second around in a global; the third one gets the pending slot, and we can't free the first one). This means that such an object is leaked forever, with no way to unroot the object held in that slot.

Proposal: If the copy_elem proposed in issue #4 is implemented, one could finalize a slot by copying an uninitialized null slot over it. The slot would still not be able to be reused with a simple "pending slot" approach, but at least the object doesn't leak forever.

For a more robust solution, the WASM code should track free slots at runtime. For that, a companion swap_elem opcode to sort the table into short-lived "stack" slots and long-lived "heap" slots would be a bonus, but one could obviously do it the long way with three copy_elems.

The expectation so far seems to be that slots are stack-allocated and short-lived, but I don't see that being the case in real-world code. This can also be seen in the NEXT_SLOT global approach to export binding object allocation. Perhaps in the future, other allocation strategies could be explored (e.g. formalizing separate "stack" and "heap" spaces). One cheap, common idea is to reserve the lower end of the table for long-lived "heap" objects and use a NEXT_SLOT global that counts down from the end of the table for short-lived "stack" objects.
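A minimal Rust sketch of that split allocation strategy (`TableAllocator` and its methods are hypothetical names): heap slots grow up from index 0 and are recycled through a free list, while stack slots come from a NEXT_SLOT counter that counts down from the end of the table and is reset in LIFO order. It only tracks indices; actually clearing a freed element (e.g. with the copy_elem trick above) is left out.

```rust
/// Hypothetical table-slot allocator with a "heap" region growing up from 0
/// and a "stack" region growing down from the end of the table.
pub struct TableAllocator {
    size: u32,
    heap_top: u32,       // next never-used heap slot
    heap_free: Vec<u32>, // released heap slots available for reuse
    next_slot: u32,      // NEXT_SLOT: lowest stack slot in use (== size when empty)
}

impl TableAllocator {
    pub fn new(size: u32) -> Self {
        TableAllocator { size, heap_top: 0, heap_free: Vec::new(), next_slot: size }
    }

    /// Long-lived slot: reuse a freed slot if possible, else grow upward.
    pub fn alloc_heap(&mut self) -> Option<u32> {
        if let Some(i) = self.heap_free.pop() {
            return Some(i);
        }
        if self.heap_top < self.next_slot {
            self.heap_top += 1;
            Some(self.heap_top - 1)
        } else {
            None // the heap would collide with the stack region
        }
    }

    /// Frees a heap slot in any order (the table element itself would also
    /// need to be overwritten with null so the object is unrooted).
    pub fn free_heap(&mut self, slot: u32) {
        self.heap_free.push(slot);
    }

    /// Short-lived slot: decrement NEXT_SLOT.
    pub fn push_stack(&mut self) -> Option<u32> {
        if self.next_slot > self.heap_top {
            self.next_slot -= 1;
            Some(self.next_slot)
        } else {
            None
        }
    }

    /// Current NEXT_SLOT value, to restore later.
    pub fn stack_mark(&self) -> u32 {
        self.next_slot
    }

    /// Releases every stack slot allocated since `mark`.
    pub fn pop_stack_to(&mut self, mark: u32) {
        assert!(mark >= self.next_slot && mark <= self.size);
        self.next_slot = mark;
    }
}
```

With this, the three-DOM-elements case from above works: freeing the first and third heap slots while keeping the second makes both freed slots reusable.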

Exception object type as anyref or its subtype

We have been discussing using anyref as an exception type (or its supertype) recently, as @eholk mentioned in #9 (comment).

The current exception handling proposal imposes several difficulties (WebAssembly/exception-handling#30 and WebAssembly/exception-handling#31), and allowing opaque exception objects to be stored in locals and dynamically type-tested can solve most of them. (They don't have to be stored in linear memory.) The proposed anyref-as-WASM-value-type (#9) sounds like it satisfies many of these requirements.

One thing is that we use a 'tagged value' to represent an exception object: a pair of a tag and a list of values. Definitions of the related terminology are here. Tags can be used in many ways, e.g. to denote types (int, MyException&, ...) or languages (C++, JavaScript, ...). In C++ exception support, we use them to denote languages (we can't use them for types in C++ because of inheritance and such), so, for example, one tag can mean C++, another can mean JavaScript, etc.

Can we treat anyref as tagged values in general? It might be possible for most of non-exception objects to have the same predefined tag, making them essentially tagless. This way, we would also need a match instruction that dynamically tells if the current object (on stack) has the specified tag or not.
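A rough Rust sketch of this first option (treating every anyref as a tagged value), assuming tags are small integers and payloads are wasm scalar values; `Tag`, `Tagged`, `Value`, `DEFAULT_TAG` and `match_tag` are hypothetical names. Ordinary references share `DEFAULT_TAG`, and `match_tag` plays the role of the proposed `match` instruction.

```rust
use std::rc::Rc;

/// Hypothetical tag identity (an engine would allocate these).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tag(pub u32);

/// Predefined tag shared by ordinary, non-exception references.
pub const DEFAULT_TAG: Tag = Tag(0);

/// Payload values carried by a tagged value.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    I32(i32),
    I64(i64),
    F64(f64),
}

/// A tagged value: a tag plus a list of values.
#[derive(Debug)]
pub struct Tagged {
    pub tag: Tag,
    pub values: Vec<Value>,
}

/// Under this option, every anyref points at a tagged value.
pub type AnyRef = Rc<Tagged>;

/// Sketch of `match`: if the reference carries `tag`, yield its payload.
pub fn match_tag(r: &AnyRef, tag: Tag) -> Option<&[Value]> {
    if r.tag == tag {
        Some(r.values.as_slice())
    } else {
        None
    }
}
```

A C++ catch handler would then `match` on the C++ tag and rethrow anything else, e.g. a JavaScript exception.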

Or, can we make the tagged value type as a subtype of anyref? In that way we would also need some instructions like isinstanceof as suggested in #4, as well as also the match to dynamically test the tag. This way we need to introduce supertype/subtype hierarchy in the system.

cc @eholk @dschuff @KarlSchimpf

IndexedDb and XHR

Hello,

Thank you for the work on this proposal! I'm very much looking forward to its implementation (it is my understanding that this will make browser I/O much faster).

I have a few questions:

Will support for IndexedDB be included?
Will support for XHR (or its modern incarnation) be included?
What is the status of this proposal (there hasn't been any activity on the repository for almost a year)?

Again, thank you very much for your time and effort!

Performance concerns about UTF-8 strings

I've written some Rust web apps which are compiled to WebAssembly and run in the browser. I'm using wasm-bindgen for this.

Generally the code runs really fast (since both Rust and WebAssembly are quite fast), but there's one area in particular which is an order of magnitude slower: strings.

Rust uses UTF-8 strings, so whenever I send a string to or from JS it has to be transcoded between UTF-8 and UTF-16. This is done using the browser's TextEncoder.encodeInto and TextDecoder.decode APIs.

This is as optimal as we can currently get in terms of speed, but it's not good enough, because encoding/decoding is way slower than everything else:

DevTools Screenshot

The total script execution time is 2413ms. Out of that, 494ms is from the browser's decoding, and a further 434ms from garbage collecting the JS strings. So that means just passing strings from Rust to JS is taking up ~40% of the total execution time!

This encoding/decoding is so slow that it means that a pure JavaScript app is faster than a Rust app (the JS app only takes 1236ms for all scripting + gc)!

These are not big strings, they are all small strings like "row", "col-sm-6", "div", etc. The biggest string is "glyphicon-remove".

Unfortunately, I cannot avoid this string passing, because I'm calling native web APIs (like document.createElement, Node.prototype.textContent, etc.), so the usual solution of "move everything into Rust" doesn't work.

This is clearly a known concern for WebIDL bindings, which is why the proposal includes the utf8-str and utf8-cstr types.

However, I'm concerned that WebIDL won't actually fix this performance problem, because the browsers (at least Firefox) internally use UTF-16, so they'll still have to do the encoding/decoding, so the performance will be the same.

I know this is an implementation concern and not a spec concern, but is there any practical plan for how the browsers can implement the utf8-str and utf8-cstr types in a fast zero-copy O(1) way without needing to change their engines to internally use UTF-8?
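For reference, a minimal Rust sketch of the transcoding work at the boundary, assuming an engine that keeps both one-byte (Latin-1) and two-byte (UTF-16) strings; as far as I know, V8 and SpiderMonkey both have such a one-byte representation internally. `EngineString` and `from_utf8` are hypothetical names. For ASCII-only input such as "col-sm-6", the UTF-8 bytes are already valid one-byte characters, so lowering is a plain copy with no decoding; anything else needs an O(n) transcode to UTF-16. Either way this is still a copy, not the zero-copy O(1) path asked about.

```rust
/// Engine-side string, loosely modelled on engines with both one-byte
/// (Latin-1) and two-byte (UTF-16) representations.
#[derive(Debug, PartialEq)]
pub enum EngineString {
    OneByte(Vec<u8>),
    TwoByte(Vec<u16>),
}

/// Hypothetical lowering of a utf8-str argument into an engine string.
pub fn from_utf8(bytes: &[u8]) -> Result<EngineString, std::str::Utf8Error> {
    if bytes.is_ascii() {
        // ASCII is byte-for-byte identical in UTF-8 and Latin-1:
        // a memcpy, with no decoding step.
        Ok(EngineString::OneByte(bytes.to_vec()))
    } else {
        // General case (simplified: non-ASCII Latin-1 text also lands here):
        // validate and transcode to UTF-16, O(n).
        let s = std::str::from_utf8(bytes)?;
        Ok(EngineString::TwoByte(s.encode_utf16().collect()))
    }
}
```

Since the strings in the benchmark above are all short ASCII class and tag names, a fast path of this kind is where I'd expect most of the savings, if engines can use it.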
