webassembly / multi-memory

Multiple per-module memories for Wasm

Home Page: https://webassembly.github.io/multi-memory/

License: Other
multi-memory's Introduction

Build Status

Multi Memory Proposal for WebAssembly

This repository is a clone of github.com/WebAssembly/spec/. It is meant for discussion, prototype specification and implementation of a proposal to add support for multiple memories to WebAssembly.

Original README from upstream repository follows…

spec

This repository holds a prototypical reference implementation for WebAssembly, which is currently serving as the official specification. Eventually, we expect to produce a specification either written in human-readable prose or in a formal specification language.

It also holds the WebAssembly testsuite, which tests numerous aspects of conformance to the spec.

View the work-in-progress spec at webassembly.github.io/spec.

At this time, the contents of this repository are under development and known to be "incomplet and inkorrect".

Participation is welcome. Discussions about new features, significant semantic changes, or any specification change likely to generate substantial discussion should take place in the WebAssembly design repository first, so that this spec repository can remain focused. And please follow the guidelines for contributing.

citing

For citing WebAssembly in LaTeX, use this bibtex file.

multi-memory's People

Contributors

aheejin, andrewscheidecker, backes, binji, bnjbvr, cellule, chfast, chicoxyzzy, dschuff, eqrion, flagxor, gahaas, gumb0, honry, ia0, jfbastien, keithw, kg, kripken, lars-t-hansen, littledan, lukewagner, ms2ger, ngzhian, pjuftring, ppopth, rossberg, sunfishcode, tlively, xtuc


multi-memory's Issues

Use-case for multi-memory: Flutter Web Engine linking with wasm-compiled skia (C++) and ICU4X (Rust)

We are building a new mode for the Flutter Web Engine that compiles dart application code to WasmGC via the new dart2wasm compiler. As part of this effort, we are enabling direct wasm-to-wasm interop between the application wasm module and the skia wasm module. Our measurements have shown a serious improvement (~14x) in interop speed (vs JavaScript bindings) when wiring wasm imports and exports directly to each other. We have some other efforts we are working on to reduce code size which involve using a leaner version of skia which lacks some text layout functionality and supplementing that with ICU4X, which is a wasm-compiled Rust library.

However, without multi-memory support, linking directly against both of these modules and importing both of their memories is impossible.

Use-case for multi-memory: Lower-overhead tracking of store calls to facilitate networked rollback

A project I've been working on (https://github.com/kettle11/tangle) automatically networks WebAssembly without the Wasm needing to do anything and without adding excessive input latency. This works by borrowing a networking concept from games called 'rollback'. Rollback relies on determinism and takes periodic 'snapshots' of the code state to rollback and 'resimulate' if new events arrive from remote users.

At the moment Tangle works by periodically cloning the entire Wasm memory and globals, which works well enough for Wasm programs that use small amounts of memory, but this approach scales predictably poorly as memory usage increases.

What would be better is a lower-overhead way to track calls to store. I attempted calling out to the host whenever a call to store occurs, but there was far too much overhead.

Another solution is to have each networked Wasm module declare a chunk of memory that Tangle can use to track calls to store, but this introduces room for user-error and loses much of Tangle's 'it just works' magic.

The ideal solution for Tangle would be to be able to allocate some sort of global array, or other memory, that could be controlled by Tangle but written to from within the user Wasm program. It looks like the multi-memory extension would be perfect for this!
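As a sketch of how that could look (all names here are hypothetical, not part of Tangle or this proposal): the user program keeps its heap in memory 0, while a second, host-provided memory receives a log of stored addresses.

```wasm
(module
  ;; memory 0: the user program's ordinary heap
  (memory $app 16)
  ;; memory 1: a host-owned buffer Tangle could use to record stores
  (memory $log (import "tangle" "store_log") 1)
  (global $log_pos (mut i32) (i32.const 0))
  ;; instrumentation inserted before each store: append the touched address
  (func $record_store (param $addr i32)
    (i32.store $log (global.get $log_pos) (local.get $addr))
    (global.set $log_pos (i32.add (global.get $log_pos) (i32.const 4)))
  )
)
```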

Data segment syntax incompatible with bulk-memory-operations proposal

I ran into this issue while trying to test a WASM compiler that implements both the multi-memory and bulk-memory-operations specs.

So in the multi-memory proposal the load.wast test contains the following module:

(module
  (memory $mem1 (import "M" "mem") 2)
  (memory $mem2 3)

  (data $mem1 (i32.const 20) "\01\02\03\04\05")
  (data $mem2 (i32.const 50) "\0A\0B\0C\0D\0E")

  (func (export "read1") (param i32) (result i32)
    (i32.load8_u $mem1 (local.get 0))
  )
  (func (export "read2") (param i32) (result i32)
    (i32.load8_u $mem2 (local.get 0))
  )
)

Unfortunately, under the syntax the bulk-memory-operations proposal requires, these data segments are parsed as having the labels $mem1 and $mem2 instead of initializing those memories. To make this test compatible with the bulk-memory syntax, only a small change is required:

  (data (memory $mem1) (i32.const 20) "\01\02\03\04\05")
  (data (memory $mem2) (i32.const 50) "\0A\0B\0C\0D\0E")

It may be a good idea to change the multi-memory syntax to the bulk-memory-operations version since it's the more general one that would support both proposals.
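For reference, the full test module rewritten with the bulk-memory-compatible syntax would read:

```wasm
(module
  (memory $mem1 (import "M" "mem") 2)
  (memory $mem2 3)

  (data (memory $mem1) (i32.const 20) "\01\02\03\04\05")
  (data (memory $mem2) (i32.const 50) "\0A\0B\0C\0D\0E")

  (func (export "read1") (param i32) (result i32)
    (i32.load8_u $mem1 (local.get 0))
  )
  (func (export "read2") (param i32) (result i32)
    (i32.load8_u $mem2 (local.get 0))
  )
)
```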

Tracking Phase 4 Requirements

This is an issue to track the requirements needed to move multi-memory into Phase 4.

  • Two or more Web VMs have implemented the feature and pass the test suite
    • V8
    • Firefox
  • At least one toolchain has implemented the feature
    • Binaryen
  • The spec document has been fully updated in the forked repo.
  • The reference interpreter has been fully updated in the forked repo and passes the test suite.
  • The Community Group has reached consensus in support of the feature and consensus that its specification is complete.

I plan to propose the advancement to Phase 4 after we meet these requirements. Please reply if there are any remaining concerns for the phase advancement that are not listed here.

Text format for data segments may be confusing with passive data segments

According to the current rendered spec the text format for data segments attached to non-index-zero memories is:

(data $mem (i32.const 0) "...")

but I think this could be confusing and difficult to parse with data segments that are otherwise named for passive segment initialization:

(data $foo "...")

;; ...
memory.init $foo

Perhaps data attached to a nonzero memory could be parsed similarly to elem segments with nonzero tables?

;; currently used for tables
(elem $elem_segment_id (table $a) (i32.const 0) (; ... ;) )

;; possibility for memory
(data $data_segment_id (memory $a) (i32.const 0) (; ... ;) )

How is a non-default memory index to be denoted in the text format?

The context is this comment. Currently wasm-tools will accept a memory index denoted simply by an integer trailing the opcode. This conflicts with the SIMD spec, which wants an integer trailing the opcode to denote a vector lane in some contexts; for instructions with both a memory operand and a lane operand the conflict must be resolved.

I can't find anything in the proposal or interpreter here about what the present proposal thinks is a reasonable way of denoting the memory index in the text format.

Similarly to offset and alignment, may we assume that a keyword must be used, eg memory=n or memidx=n? @rossberg, opinions?
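For illustration, the spellings under discussion (none of them settled syntax) would look like:

```wasm
;; symbolic memidx, as this repo's tests currently write it:
(i32.load8_u $mem1 (local.get 0))
;; bare integer index, as wasm-tools currently accepts (clashes with SIMD lane indices):
(i32.load 1 (local.get 0))
;; hypothetical keyword form, analogous to offset= and align=:
(i32.load memory=1 (local.get 0))
```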

A question about high level language interaction

while the spec is reasonably simple and clear at wasm bytecode level,
i'm not sure how it's supposed to interact with high level languages compiled into wasm. (eg. C)

is non-default memory supposed to be accessible from languages like C at all?
as memory index is encoded as an immediate, it doesn't seem straightforward to implement "far" pointers which can access those memories.

or, is this feature mainly for offline tools which rewrite instructions?

Use-case for multi-memory: dart2wasm to implement ByteBuffer/TypedList objects with linear memory

The dart2wasm compiler has so far used WasmGC for its object representations, meaning that the dart runtime itself does not require any linear memory as of today except to communicate with an external module. However, the dart objects ByteBuffer or TypedList are implemented via a WasmGC array. The issue with this is that we sometimes want to use these ByteBuffer or TypedList objects in browser APIs that take a JS TypedArray, and currently there is no way to create a TypedArray object from a WasmGC array. Due to this, we are looking at changing the implementation of ByteBuffer and TypedList to instead use regions of linear memory, from which a JS TypedArray can be created.

The issue here is that without multi-memory support, dart2wasm cannot have its own linear memory and also import memory from an external module. We could import malloc and free from the external module and just use regions of the imported memory in this case, but we are not always guaranteed to be linking on an external module. This means we would have to maintain two separate implementations for linear-memory-backed ByteBuffer/TypedList objects depending on whether we are importing memory from an external module or not. This is an unfortunate burden which would be greatly simplified by multi-memory support.
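A minimal sketch of the desired shape (import and export names are hypothetical): the module defines its own linear memory for ByteBuffer/TypedList storage while still importing the external module's memory, and exports its own memory so JS can build a TypedArray view over its buffer.

```wasm
(module
  ;; memory imported from the external module being linked against
  (memory $external (import "env" "memory") 1)
  ;; the module's own linear memory backing ByteBuffer/TypedList storage,
  ;; exported so JS can create a TypedArray over its ArrayBuffer
  (memory $buffers (export "dart_buffers") 1)
  (func (export "read_byte") (param $addr i32) (result i32)
    (i32.load8_u $buffers (local.get $addr))
  )
)
```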

How would multiple memories work with function imports?

First of all, apologies if this is not the right place for feedback and questions. If this isn't the right place then just let me know what is.

It looks from the proposed specification like memidx is contained in the immediate part of the memory instructions, meaning that when a function is compiled it statically binds to a specific memory in the target module. But suppose the same function were then exported and imported into a separate module; is it expected that this function would still be bound to the same memidx in the new module? If so, this seems problematic: either the opportunity for code reuse by importing functions is limited to just those situations where the memidx in the exporting module and the importing module happen to align, or the importer would need some way to specify which memories a given imported function binds to.

Multi-Memory Lowering & Memory Imports

The wasm-split tool was recently updated to support writing instrumentation data to a separate, secondary memory. With the addition of the multi-memory lowering pass in Binaryen, we attempted to simplify the wasm-split tool to instrument only with secondary memory and ran into a problem. After wasm-split creates the second memory, the lowering pass combines the memories into one, creating a single memory that adds the page sizes of all memories together. This is a problem when the main memory is imported from JavaScript, as the memory size accessed during instantiation needs to increase for a split module. We don’t have a heuristic for knowing whether the module is split before the instance is created.

In general, creating the lowering pass for multi-memories has been difficult, as there are problems without great solutions. For example, how should independent memories that are imports or exports be represented in a single combined memory? Similarly, how do we adjust non-const data segment offsets for data segments that belong to a memory other than the first? It would be far simpler from a tooling perspective for browsers to add support for multi-memory.

Additional test cases for multi-memory

When testing the Wizard implementation of multi-memory, I looked into spectest coverage for the feature. There are a few tests for multi-memory, but unless I overlooked something, I didn't find many.

In this titzer/wizard-engine@a41d70c commit, I copied+mutated most spec tests that reference memories to have multi-memory variations.

I'd like to contribute these tests to the spec repo, as I think they are valuable. The reference interpreter passes all of them.

Binary format is out of date

The decoder in this branch is out of date with the bulk-memory-operations branch.

In particular, for data segments:

This repo:

let segment dat s =
  let index = at var s in
  let offset = const s in
  let init = dat s in
  {index; offset; init}

Bulk memory:

let data s =
  match vu32 s with
  | 0x00l ->
    let dmode = at active_zero s in
    let dinit = string s in
    {dinit; dmode}
  | 0x01l ->
    let dmode = at passive s in
    let dinit = string s in
    {dinit; dmode}
  | 0x02l ->
    let dmode = at active s in
    let dinit = string s in
    {dinit; dmode}
  | _ -> error s (pos s - 1) "invalid data segment kind"

I.e. it appears this repo is interpreting the reserved 0 byte as i32 index, when it should be a flags bitfield, with bit 6 indicating the presence of a memory segment index.

Possibility to dynamically add memories at runtime?

I couldn't figure out the answer by reading the overview.
Is it planned to be able for host functions to add more memories to an already-running Wasm instance?

There are a couple of use cases that would need this:

  • A Wasm module could ask the host to spawn another Wasm module, in which case it is desirable to share a memory between the parent and the child for "inter-process" ("inter-instance") communication.
  • Implementing some equivalent to mmap.

Might be related to #9

Overlap with recent issue raised on the main WASM design repo

I noticed this multi-memory proposal when commenting on this WASM design thread about memory support:

WebAssembly/design#1397

It seems this multi-memory proposal already supports most of what we were asking for in that thread, with the notable exception of decommiting memory to "shrink" memory regions that have previously grown. Is it possible that memory decommitting could be considered in this proposal, or a separate interrelated proposal?

- Casey

Referencing locations across memories

I did a brief search for this in issues, but didn't find anything. I am curious about referencing memory locations across memories. This can be used for running the same code on different memories. Is there a first-class standard way of doing this? I think there might be a way to do it in a library though.

Additional motivation: efficient memory mapping

An application may want to map an external buffer and read/write to it directly.

A concrete example is mapping a WebGL vertex buffer into a wasm module to have it filled by decoding a model file.

Contradiction in multi-memory binary format instruction rules

Hi,

I'm implementing multi-memories in Binaryen and noticed a contradiction while following the Multiple Memories for Wasm proposal.

Under Binary format, the proposal says "For loads and stores: Reinterpret the alignment value in the memarg as a bitfield; if bit 6 (the MSB of the first LEB byte) is set, then an i32 memory index follows after the offset immediate." The problem is this ordering requires parsing the offset before the memory index. While the memory index is always i32, with the introduction of 64-bit, the offset can now be i32 or i64. Before adding multi-memories support, the code checked the index type of the memory before parsing either an i32 or i64 offset.

As a result, the following steps are now performed to follow the proposal as currently written:

  1. Create an i64 for the offset
  2. Identify the memory based on the index following the offset
  3. Check whether the memory index type is i32
  4. If step 3 is true, check if the offset is a larger number than what fits into an i32
  5. If step 4 is true, throw an error
  6. If step 4 is false, down-cast the i64 offset to an i32 offset

It would be better if the i32 memory index immediately followed the alignment instead of the offset, because then the index type would be known and the code could parse the offset accordingly. I propose changing the sentence quoted above to "then an i32 memory index follows after the alignment".

@rossberg @titzer @kmiller68 @eqrion @jkummerow - seeking thoughts & sign off, thanks!

Text syntax for SIMD lane index vs memory index operands

This proposal adds a memidx to the syntax for load/store instructions: i32.load 1 loads from memory index 1.

The SIMD proposal adds instructions with a laneidx immediate that load or store a single lane of a vector: v128.load8_lane 1 loads a scalar and replaces lane index 1 in a v128 operand.

If the proposals are merged, it's unclear whether the 1 in v128.load8_lane 1 should be parsed as a memory or lane index. It's possible to disambiguate it by saying that if there's a single index, it's the lane index, and if there are two indices, it's the memory index followed by the lane index.

Alternatively, we could require that memory index immediates are wrapped in a (memory ..) tag like in an active data segment definition. i.e.:

  • v128.load8_lane 2 would load a scalar from memory 0 and replace lane index 2.
  • v128.load8_lane (memory 1) would be malformed due to missing a lane index.
  • v128.load8_lane (memory 1) 2 would load a scalar from memory 1 and replace lane index 2.

Implementation support

SpiderMonkey has implemented the proposal and it should be available in Nightly builds of Firefox starting on July 8th. You will need to add a preference flag for javascript.options.wasm_multi_memory in about:config. If anyone runs into issues, please file bugs here.

Memory index immediate vs. operand

I read that the memory instructions will receive a memory index immediate. That means there is no way to write code that is agnostic to the memory it operates on. I wonder whether a memory index specified via an operand is on the table?

Use-case for multi-memory: Component Model object transfer browser polyfill

The component model (https://github.com/WebAssembly/component-model) is based on a "shared-nothing" contract between two wasm modules, which means they do not directly import each other's memory. In the component model, when two components (wasm modules) communicate, the host runtime is responsible for facilitating copies of objects between the two components according to what is specified by their interface types, which allows each component to have isolated control over its own memory.

However, no browsers support this functionality yet, and those working on the component model have produced polyfill code that emulates the functionality of a host with component model support within the browser. However, without multi-memory support, the code copying of these objects from one memory to another must be written in JavaScript, which essentially means all interop between components is burdened with a jump to the JavaScript environment and back. That has been shown in our microbenchmarking to be much more costly than direct wasm-to-wasm calls (~14x performance difference in our benchmarks). With multi-memory support, the host polyfill for copying objects across modules could be emitted in WebAssembly, which means we would be able to avoid a context switch into the JavaScript environment.
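A sketch of what such a polyfill-emitted adapter could look like (module and import names are hypothetical): with multi-memory, the adapter imports both components' memories, and with bulk memory the copy itself becomes a single cross-memory memory.copy.

```wasm
(module
  ;; hypothetical adapter emitted by a component-model polyfill:
  ;; copies bytes from component A's memory into component B's
  (memory $a (import "component_a" "memory") 1)
  (memory $b (import "component_b" "memory") 1)
  (func (export "adapt") (param $src i32) (param $dst i32) (param $len i32)
    ;; multi-memory extends memory.copy with destination and source memory indices
    (memory.copy $b $a (local.get $dst) (local.get $src) (local.get $len))
  )
)
```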

Performance and memory use considerations

The overview already covers the fact that many implementations reserve a register for the heap pointer and suggests that this could be reserved for memory 0. This is a good idea, but we might consider taking it a little bit further.

In particular, while we might want to use all sorts of tricks for the primary memory, including a heap base register and large VM reservations to avoid bounds checking on 64-bit systems, we might not wish to pay the cost of those tricks on secondary memories, or it may be nice to avoid them when possible. On mobile systems and in other contexts, large reservations for every memory can make it impossible to run wasm effectively. We've had bugs reported from more restricted operating systems such as OpenBSD and (I suspect) from VM-based installations that strongly restrict the virtual memory space for user programs.

At Mozilla we've been tossing around the idea for some time that it may be desirable to designate some memories as "optimized-for-speed" and others as "optimized-for-size", for the lack of better nomenclature. It's possible that we should consider these attributes in the context of multi-memory. An optimized-for-speed memory might get a dedicated register for the heap base, and the VM tricks, while an optimized-for-space memory would get the smallest allocation and probably explicit bounds checking.

The problem with the attributes is the same as with "shared": the importer and the exporter have to have matching attributes on each memory at linking time. Yet resource optimization is a practical problem worth solving.

(Perhaps instead of optimized-for-speed and optimized-for-size the attributes should be "primary" and "secondary", and there can only be one "primary" per module.)

[js-api] Maximum number of memories

For JS embeddings, we need to define the limit for the maximum number of memories occurring in a given module. I'd propose 100 for starters.

Text format: parsing `v128.loadX_lane` and `v128.storeX_lane` requires backtracking

When a parser has parsed v128.load8_lane 0, it does not know whether 0 is a memory index and the memarg and lane index are still to come or whether 0 is the lane index and the memory index and memarg were both empty. To resolve this ambiguity, the parser has to initially assume the 0 is a memory index and if it encounters an error it has to retry parsing the instruction assuming 0 is the lane index. To my knowledge this is the only place in the text format that requires this kind of backtracking; I think we've tried to be careful to avoid it. Is it worth changing the format of memory indices to fix this? Unfortunately it's too late to change the format of the lane indices since they're already in the standard.

Use case for multi-memory: Sharing persistent datastructures between threads

Functional programmers are going to want to share memory with other processes, because that's one of functional programming's big claims: persistent datastructures consist almost entirely of immutable parts, so there's no obstacle to sharing. But from what I can tell, with today's shared-memory features this is unsafe. It could be made safe with a different type of memory that follows different rules, which I'll name Only-Write-On-Allocation (OWOA) memory.

I define OWOA memory as follows:

  • Each allocation can only be written once, during allocation, and is immutable and so safely sharable from then on.
  • References are resource handles. When all handles are released, the memory is freed. In a functional programming language this would tend to involve GC, but rust interfaces could do it with an intrinsic Rc type.
  • All accesses are checked, memory can only be accessed via valid references and accesses to offsets from that reference would need to be bounds checked, I do not know whether this is feasible at all. (You could remove this rule, correct and polite code would still work, but you'd usually want to share an OWOA with broad communities of processes, so incorrect code would be able to browse the OWOA, steal information from other processes by just looking at their allocations, and steal a lot more information through race-conditioned timing attacks)

A purely functional language could get by with just an OWOA memory, but a rust component couldn't, so it seems to me like we'd need multiple memories to make OWOA sharing practical.

I'm bringing this up because this (serialization, copy, deserialization; the inability to share persisted datastructures in a large pool) seems like the main disadvantage wasm runtimes have next to erlang runtimes and other functional, compartmented runtimes. If wasm-gc had this, then I think every language could be together at that point? 🙏

Potential alternative to this, though: I could imagine this being implemented by generally allowing wasm processes to lock parts of their memories and send a resource handle to other processes, that controls the lifetime of that stretch and enables access. This wouldn't require multiple memories, and it's notable that there only needs to be one OWOA memory for the entire runtime, so the need for multiple memories there is already kinda tenuous. But I don't know enough to say whether that would be unacceptably slow in a wasm jit. I'd be curious to know what maintainers think about this.

memarg encoding

How should memarg encode a non-zero memory index?

Using some range from the alignment field seems like the simplest way to do it in a backward compatible way. Fortunately, the alignment field stores log2(alignment), and so needs very little range. Using 3 bits for alignment is probably safe, and 4 bits is definitely safe.

So I propose that we use the lower 4 bits of the current alignment field to encode the alignment, and the 5th bit as a flag to indicate that the other immediate fields will be followed by a varuint32 memory index field.

Use case: Call stack sharing for microfunctions

We're trying to build an application where WASM is used to run "microfunctions", small stateless functions that can be written once (e.g. in Rust) and then ported via WASM to run in a variety of runtimes. A WASM Memory may be built once and then used again and again for multiple microfunction invocations. The buffers backing the WASM Memories would be owned and destroyed by the host environment.

The multi-memory proposal would be useful for our use case because it would enable a thread to carry around a Memory for its call stack (for Rust objects, not to be confused with WASM's operand stack) and plug it into any module, which has its own Memory for a heap and static variables. When only a single memory is used, it is much harder to reason about the thread safety of call stacks when multiple threads are calling the same module.

In other words, one Memory could be thread-local and mutable, and the other Memory could be cross-thread and read-only.
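A sketch of that split (import names hypothetical): the host plugs a thread-local stack Memory into the module alongside the module's own heap Memory.

```wasm
(module
  ;; thread-local and mutable: the Rust call stack, provided by the host
  (memory $stack (import "host" "call_stack") 1)
  ;; cross-thread: the module's own heap and static variables
  (memory $heap 4)
  (global $sp (mut i32) (i32.const 0))
  ;; push a value onto the thread's call-stack memory
  (func $push (param $v i64)
    (i64.store $stack (global.get $sp) (local.get $v))
    (global.set $sp (i32.add (global.get $sp) (i32.const 8)))
  )
)
```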

CC @hagbard @nciric @echeran

Related: rustwasm/wee_alloc#88

Read only memories?

Pardon my naivety, but one of the slightly weirder parts of wasm, from a system programmer's point of view, is the lack of separate data/rodata/bss segments. In particular I'm used to the OS being able to ensure data is read-only, and that trying to modify it will be caught. This is useful both for correctness but more importantly for security.

Has any thought been given to being able to mark memories read-only, so they cannot be altered after initialization? This proposal seems like it is currently the best place to at least ponder the idea.

Is there a way to access a non-hardcoded memory?

The load and store instructions include a memidx, so they hard code which memory they are to access.
Ideally, I would like to have pointers that include both the memory index and the offset in their value. I would like to be able to create functions that work with any memory. But if I understand correctly how this currently works, that's not possible? At least, not unless I essentially write a big switch over all my memories for every load and store?
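The workaround described, a per-memory branch, might look like this sketch (helper name hypothetical), since every load takes its memory as a compile-time immediate:

```wasm
(module
  (memory $m0 1)
  (memory $m1 1)
  ;; hypothetical helper: dispatch a load over a runtime memory index,
  ;; because i32.load binds its memory as a static immediate
  (func $load_any (param $memidx i32) (param $addr i32) (result i32)
    (if (result i32) (i32.eqz (local.get $memidx))
      (then (i32.load $m0 (local.get $addr)))
      (else (i32.load $m1 (local.get $addr)))
    )
  )
)
```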

Zero copy and 2D context

HTML Canvas provides CanvasRenderingContext, which uses an ImageData handle to interface with a byte array. Interfacing with all of this via Wasm requires creating ImageData from a byte array and copying the array from ImageData into the module's memory. It might be possible to do zero-copy access to the rendering context or image data directly using functionality from this proposal, if one of those types exported a WebAssembly.Memory.

Also see whatwg/html#5173
