ewasm / design

Ewasm Design Overview and Specification

License: Apache License 2.0

eth1 ethereum ewasm specification webassembly

design's Issues

EEI: calls and return data

I think since Byzantium has finally introduced returndatacopy and returndatasize the original result buffers of call* are obsolete (create is a different beast).

It would clean up semantics and implementations a lot if we removed the result buffer support from calls and required clients to use returndata*.

Related #12

Wasm interface/environment naming

eth_interface.md lists the methods exported to the Wasm interface. These methods will effectively be available in the languages used to write contracts, such as C, C++ and anything else that has an LLVM backend.

These methods are made available via import statements (e.g. (import $codeSize "ethereum" "codeSize" (result i64)))

Every import statement has the following properties:

local identifier (in the scope of the wasm code)
environment
method name
(return value)

Currently the environment in ewasm is set to ethereum.

There is an easy way to compile C or C++ code online: http://mbebenita.github.io/WasmExplorer/

This LLVM compiler has env hardcoded as the environment, and the method names are mangled according to C/C++ rules.

For C input, it is the raw method name:

extern unsigned long long codeSize();

int main() {
  codeSize();
}

becomes in C99 mode:

(module
  (memory 1)
  (export "memory" memory)
  (type $FUNCSIG$j (func (result i64)))
  (import $codeSize "env" "codeSize" (result i64))
  (export "main" $main)
  (func $main (result i32)
    (call_import $codeSize)
    (return
      (i32.const 0)
    )
  )
)

while in C++14 mode:

(module
  (memory 1)
  (export "memory" memory)
  (type $FUNCSIG$j (func (result i64)))
  (import $_Z8codeSizev "env" "_Z8codeSizev" (result i64))
  (export "main" $main)
  (func $main (result i32)
    (call_import $_Z8codeSizev)
    (return
      (i32.const 0)
    )
  )
)

Similarly, C++ namespaces are part of the name mangling:

namespace ethereum {
  extern unsigned long long codeSize();
}

int main() {
  ethereum::codeSize();
}

becomes

(module
  (memory 1)
  (export "memory" memory)
  (type $FUNCSIG$j (func (result i64)))
  (import $_ZN8ethereum8codeSizeEv "env" "_ZN8ethereum8codeSizeEv" (result i64))
  (export "main" $main)
  (func $main (result i32)
    (call_import $_ZN8ethereum8codeSizeEv)
    (return
      (i32.const 0)
    )
  )
)

I'm not sure the environment field will be accessible from C/C++, but it could easily become an attribute: __attribute__((environment("ethereum")))

Unless that happens and considering the complexity it might cause for other languages, perhaps the best choice is to stick to the standard environment and include ethereum in the method names:

ethereum_codeSize
ethereum_caller
and so on

In C/C++ this means importing them with the C name convention:

extern "C" {
  unsigned long long ethereum_codeSize();
}

EEI: Account handle

The main issue with the cost of e.g. getBalance() is that an account lookup in the database might be needed. The idea is to split loading the account from accessing the account metadata.

Instead of

getBalance(address, resultOffset);

we should have

handle = loadAccount(address);
getBalance(handle, resultOffset);

Similarly to getBalance() there should be getters for code hash, code, code size and others.

Pros

The cost of account lookup is separated from the cost of the getter.
Getting information about the current account should be cheap because it is already loaded (the handle to the current account could be predefined).
The cost of accessing the same account multiple times is lower.

Cons

Handles must be deterministic as they may leak. A simple solution would be to keep an array of loaded accounts and return the index into this array. Each call to loadAccount would append a new entry to the array.
Contracts are responsible for account management.

Alternatives

1. Just dump the account metadata to memory: loadAccount(memoryOffset). This is simpler, but might waste some memory when the contract is not interested in all the data. It is also not extensible, i.e. we cannot change the account representation in the future.
2. An extension of alternative 1 where the contract specifies a bit mask of the account fields it is interested in, e.g. loadAccount(BALANCE | NONCE, memoryOffset). This at least allows adding more fields to the account in the future. But the output would be a mess, especially when getting the account code is considered.
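
The handle proposal above can be sketched roughly as follows. This is only an illustration in Python of the "array of loaded accounts" idea: the class name AccountTable, the lookups counter and the choice of handle 0 for the current account are assumptions made for the sketch, not part of the EEI.

```python
class AccountTable:
    """Per-execution table of loaded accounts; a handle is an index into it.

    Hypothetical sketch: names and layout are assumptions, not the EEI.
    """

    def __init__(self, balances, current_address):
        self._balances = balances      # address -> balance (stands in for the database)
        self._loaded = []              # handle -> address
        self.lookups = 0               # number of (expensive) database lookups charged
        # Handle 0 is predefined for the currently executing account.
        self._loaded.append(current_address)

    def load_account(self, address):
        # Every call appends a new entry, so the returned handle depends
        # only on the order of calls and is therefore deterministic.
        self.lookups += 1
        self._loaded.append(address)
        return len(self._loaded) - 1

    def get_balance(self, handle):
        # Cheap getter: the account behind the handle is already loaded.
        return self._balances.get(self._loaded[handle], 0)
```

With this shape, a contract that reads the same account several times pays for one load_account and then only for the cheap getters.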

Research

Relaxed queues for async messages: http://www.faculty.idc.ac.il/gadi/MyPapers/2015ST-RelaxedDataStructures.pdf

August 31, 2016 @ 11:00 AM EST / 4:00 BST / 5:00 CEST

This is a planning issue for next Wednesday's weekly meeting.

WebAssembly (may) switch from AST to being a stack machine

See the conversation at Binaryen: WebAssembly/binaryen#663

We need to investigate how this affects eWASM.

Specify a license for this repo

Probably Apache 2.0 makes sense given its consideration for patents.

(Also webassembly/design uses Apache 2.0.)

@wanderer @gcolvin are you fine with Apache 2.0? There are only three of us as contributors to this repo so far.

optionally merkle-ized storage

My understanding is that the only function of merkleizing the entire storage space is thin client support. I wonder whether this is an opportunity to introduce an optional storage space which is not summarized at all. The burden of proof is on the dapp dev if they want to use this space and roll their own app-level thin client proof solution.

For some apps the limiting factor for transaction bandwidth is plain old CPU speed on one thread. Sharding is not enough to be competitive with centralized exchanges or architectures like BitShares.

Design a poll-based interface

In case WASM will not support asynchronous methods we might need to take an alternative approach.

A simple way is to have each operation work with polling, storageLoad would become:

ethereum.storageLoad(args)
ethereum.storageLoadResult() -> i32 (where 1 means result is written to the specified memory location)

And the following pseudocode in WASM:

ethereum.storageLoad()
do { 
  result = ethereum.storageLoadResult()
} while(!result)

The JS interface would need to do the sleep then. It's not efficient at all, but there's no way to sleep in WASM.
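
As a rough illustration of the polling scheme above, the host side could behave like the following Python model. The class PollingHost, the latency_polls parameter (which fakes an asynchronous backend) and the helper load_blocking are all hypothetical names introduced for this sketch.

```python
class PollingHost:
    """Toy host: storageLoad starts a request, storageLoadResult polls it."""

    def __init__(self, storage, memory_size=64, latency_polls=2):
        self.storage = storage                  # 32-byte key -> 32-byte value
        self.memory = bytearray(memory_size)    # stands in for wasm linear memory
        self._latency = latency_polls           # polls that report "not ready yet"
        self._pending = None                    # [key, result_offset, polls_left]

    def storage_load(self, key, result_offset):
        self._pending = [key, result_offset, self._latency]

    def storage_load_result(self):
        if self._pending is None:
            raise RuntimeError("no storageLoad in flight")
        key, offset, polls_left = self._pending
        if polls_left > 0:
            self._pending[2] -= 1
            return 0                            # result not available yet
        self.memory[offset:offset + 32] = self.storage.get(key, bytes(32))
        self._pending = None
        return 1                                # result written to memory


def load_blocking(host, key, result_offset):
    """The do/while loop from the pseudocode; returns how many polls it took."""
    host.storage_load(key, result_offset)
    polls = 1
    while not host.storage_load_result():
        polls += 1
    return polls
```

The busy loop in load_blocking is exactly the inefficiency mentioned above: the contract spins until the host reports completion.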

Metering: optional but enabled by default or required to enable or disable

eWASM defines metering as an optional layer to accommodate for these use cases.

https://github.com/ewasm/design/blob/master/rationale.md

I haven't read this in further detail yet, but assuming that this is an argument to a function: should metering be optional but enabled by default unless explicitly disabled, for greater security? Alternatively, should it be required to state explicitly whether it is enabled or disabled?

Arbitrary call return sizes

In EVM the caller must define the available space for return values.

In the current EVM2 design this is carried over, however it could be improved:

call doesn't define the return value space
return doesn't write the value to the caller's memory space (I understand that today this can depend on the VM anyway)
return places the values into an intermediate, in-memory storage (callstore) and the contract is charged according to the size of this
a new opcode, callResultCopy is introduced for copying between callstore and the caller's memory space (and callResultSize to retrieve the total size)
the callstore is erased when a call is executed

It is a rough design, but it could be ironed out.
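
To make the list above concrete, here is a small Python model of the callstore. The class name CallStore and the out-of-bounds behaviour (raising IndexError) are assumptions for the sketch; the proposal itself leaves error handling open.

```python
class CallStore:
    """Intermediate buffer that holds the return data of the last call."""

    def __init__(self):
        self._data = b""

    def on_call_start(self):
        # The callstore is erased when a call is executed.
        self._data = b""

    def on_return(self, data):
        # The callee's return places its data here instead of writing
        # directly into the caller's memory.
        self._data = bytes(data)

    def call_result_size(self):
        return len(self._data)

    def call_result_copy(self, memory, mem_offset, data_offset, length):
        # Assumed behaviour: reading or writing out of bounds is an error.
        if data_offset < 0 or data_offset + length > len(self._data):
            raise IndexError("read past end of callstore")
        if mem_offset < 0 or mem_offset + length > len(memory):
            raise IndexError("write past end of memory")
        memory[mem_offset:mem_offset + length] = \
            self._data[data_offset:data_offset + length]
```

The caller first asks call_result_size, grows or reserves memory as needed, and then copies only the slice it cares about.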

"Standard library" as "system library"

Based on the library design described in #17, I suggest supporting a small standard library at address 0x00..0f with the following exports:

memcpy(dst:i32, src:i32, len:i32)
memset(dst:i32, val:i32, len:i32)
TBD

Propose an ewasm subset for precompiles on the main chain

The only features ewasm would need to expose are:

useGas
calldata access (calldatacopy/calldatasize)
return / revert

This could be a way to get wasm VMs implemented and experimented with in a more controlled environment on the main chain.

It is not clear whether the precompiles would have "magic gas calculation rules" or just use "a metering process" on them.

Backwards Compatibility; Secure Metering Isolation

Overview

There are three ways to achieve backwards compatibility with EVM1's gas prices in EVM2 in a secure manner. Currently the prototype leaves EVM1 gasPrices unmodified, but this is not secure if a severe mismatch in the gas price of an EVM1 opcode is found in the future that doesn't affect EVM2.

1. Meter all the EVM1 contracts with the EVM2 gasPrices, which target 0.5 second processing time. This is perhaps the cleanest way, but it has the disadvantage that some EVM1 opcodes and precompiles will cost more than they currently do, which could break some contracts. It would also force all contracts to run in wasm VMs; there would be no way to fall back to an EVM1 implementation when running an EVM1 contract.
2. Lower all the EVM2 gasPrices to the point that all the EVM1 opcodes and precompiles cost less than or the same as they cost now. This might have the effect of lowering the overall gasLimit for EVM1 contracts and may make some EVM1 contracts unusable.
3. Have separate gasPrices for EVM1 and EVM2 contracts. This is the most pluralistic option but is more complex than options 1 and 2.

This issue is to explore option 3.

Rationale

As the recent DoS attacks have shown, metering in EVM1 can be inaccurate. EVM1 contracts therefore need to be isolated from EVM2 contracts if they are metered differently. Having different Metering Types for EVM1 and EVM2 would allow nodes to change gas limits and accept different gas prices for each type. In case of a DoS attack on EVM1, the gasLimit could be lowered just for EVM1 without affecting the operation of EVM2 contracts.

Metering Types

name	binary encoding
EVM1	0x00
EVM1 (after EIP 150)	0x01
EVM2	0x02

BlockHeader structure

Change the gasLimit and gasUsed fields to an array of [meterType, gasLimit, gasUsed] entries. If one of the metering types is omitted, then the block doesn't contain any computation that requires that type. The gas limit is calculated independently for each metering type using the canonical gas limit calculation as defined in the Yellow Paper.

Tx structure

In the tx, replace the gasLimit and gasPrice fields with an array that contains the elements [[meterType, gasPrice, gasLimit] ...], where each entry must have a unique metering type. If one of the metering types is omitted, then the tx will not fund any computation that requires that type.
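
A minimal Python sketch of how a node might validate the proposed tx field is shown below. The function names parse_gas_entries, can_fund and max_fee are hypothetical; only the metering type encodings and the [meterType, gasPrice, gasLimit] entry shape come from the text above.

```python
# Metering type encodings from the table above.
EVM1, EVM1_EIP150, EVM2 = 0x00, 0x01, 0x02
KNOWN_TYPES = (EVM1, EVM1_EIP150, EVM2)


def parse_gas_entries(entries):
    """Turn [[meterType, gasPrice, gasLimit], ...] into a dict keyed by type."""
    budgets = {}
    for meter_type, gas_price, gas_limit in entries:
        if meter_type not in KNOWN_TYPES:
            raise ValueError("unknown metering type")
        if meter_type in budgets:
            raise ValueError("metering types must be unique")
        budgets[meter_type] = (gas_price, gas_limit)
    return budgets


def can_fund(budgets, meter_type):
    # An omitted metering type funds no computation of that type.
    return meter_type in budgets and budgets[meter_type][1] > 0


def max_fee(budgets):
    # Upper bound on what the sender pays across all metering types.
    return sum(price * limit for price, limit in budgets.values())
```

For example, a tx carrying only an EVM2 entry could call EVM2 contracts but would fail as soon as execution needed EVM1 gas.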

Re-metering

This has been suggested by @poemm as a possible solution for updating gas costs in deployed contracts.

The current proposal is to meter contracts at deployment time, which would lock in gas costs from that point on. In this method any "metering statements" (aka. call $useGas) do not receive any special handling and are just treated as regular calls.

With re-metering we could have two options:

update the constants in previously inserted metering statements (with the special rule of handling the first statement in each block)
always remove metering statements prior to metering
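
The second option (strip, then meter again) can be illustrated on a toy instruction list. Everything here is an assumption made for the sketch: instructions are tuples, a metering statement is modelled as the single tuple ("useGas", n), the set BLOCK_ENDERS is a simplified stand-in for real basic-block detection, and the cost table is invented.

```python
# Simplified: these ops end a metered block in this toy model.
BLOCK_ENDERS = {"br", "br_if", "return", "call"}


def strip_metering(code):
    """Remove previously inserted metering statements."""
    return [ins for ins in code if ins[0] != "useGas"]


def inject_metering(code, costs):
    """Prefix each block with one useGas covering the whole block."""
    out, block = [], []

    def flush():
        if block:
            out.append(("useGas", sum(costs[ins[0]] for ins in block)))
            out.extend(block)
            block.clear()

    for ins in code:
        block.append(ins)
        if ins[0] in BLOCK_ENDERS:
            flush()
    flush()
    return out


def remeter(code, costs):
    # Option 2: always remove metering statements prior to metering.
    return inject_metering(strip_metering(code), costs)
```

Because the old statements are removed first, remeter is idempotent for a fixed cost table, which is the property one would want when a fee schedule change is applied to already deployed code.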

specify layer 1 compression

Our layer one compression will be slightly different from wasm's canonical layer one compression. Since we have global knowledge of all the code in the Ethereum state, we can deduplicate against that global state. This makes layer one compression more efficient.

ref

Exception (or error reporting) system

EVM1 provides no way to convey why an execution was stopped. We should think about ways of supporting different exceptions, perhaps even with messages.

This isn't only a change for the VM, but the protocol: transaction receipts should also include the execution outcome.

EEI: document exception conditions for the methods

What do they do when values are out of bounds? (Many of them currently in Hera throw an exception.)

We also need to document what the in-bounds values are.

EEI redesign process

This proposes a process for introducing changes to the EEI specification to move it from revision 2 to revision 3.

Each of the methods in the EEI should be reviewed and discussed (with alternative solutions proposed) in an individual discussion thread. See examples:

#106
#105

When a consensus about the changes is reached, changes are applied to the document.
In some cases we might want to reach out to https://ethresear.ch or https://ethereum-magicians.org.

In the end, revision 3 is published as a draft to be reviewed by a broader audience.

In what form do you want to keep the documents? Do we want to keep the obsoleted revisions?

Review if the reason for disallowing the start function still exists

Disallowing the start function and using a main function was decided based on a limitation in the Wasm Javascript API.

I think it is time to review this limitation and if it still holds.

Clarification of SelfDestruct: multiple invocations will overwrite the benficiary address.

From here: https://github.com/ewasm/design/blob/master/eth_interface.md#selfdestruct

I don't understand what this means:

Note: multiple invocations will overwrite the benficiary address.

Does it mean the beneficiary changes to the address specified by the most recent SELFDESTRUCT call?

Testing in remix, it seems the first address is the beneficiary.

Meeting - WEDNESDAY: August 10, 2016 @ 12:00 AM EST / 5:00 BST / 6:00 CEST

This is a planning issue for next Wednesday's weekly meeting. Here is the link: https://meet.jit.si/DizzyGorillasRejoiceAlone
@gcolvin says that 10:00 AM is too early. If we move to 11:00 AM, will that be too late for @axic @chriseth ?

Also feel free to add comments for agenda items.

New EEI method: abort

Abort execution and store a reason.

Parameters

reasonCode i32 the reason code
descriptionOffset i32 the memory offset to load the reason text from
descriptionLength i32 the length of the reason text (limited to 32 bytes)

Returns

nothing
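
A host-side sketch of this method in Python might look as follows. The exception class AbortExecution and the decision to reject descriptions longer than 32 bytes with ValueError (rather than truncating) are assumptions; the proposal only states the 32-byte limit.

```python
MAX_DESCRIPTION_LENGTH = 32  # limit stated in the parameter list above


class AbortExecution(Exception):
    """Raised by the host to stop execution and carry the stored reason."""

    def __init__(self, reason_code, description):
        super().__init__("abort %d" % reason_code)
        self.reason_code = reason_code
        self.description = description


def abort(memory, reason_code, description_offset, description_length):
    # Assumption: an over-long description is rejected, not truncated.
    if not 0 <= description_length <= MAX_DESCRIPTION_LENGTH:
        raise ValueError("description limited to 32 bytes")
    end = description_offset + description_length
    if description_offset < 0 or end > len(memory):
        raise IndexError("description outside of memory")
    # Never returns normally: the reason is handed to the embedder.
    raise AbortExecution(reason_code, bytes(memory[description_offset:end]))
```

The embedder would catch AbortExecution at the top of the call and record reason_code and description alongside the execution outcome.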

Investigate C/C++

There is an easy way to compile C or C++ code online: http://mbebenita.github.io/WasmExplorer/

Two-level imports aren't supported yet: WebAssembly/design#522 (comment)

Investigate Rust

There is an ongoing effort to have the Rust compiler output WASM:

(wasm/wast toolkit written in Rust: https://gitlab.com/DanielKeep/wasm)

Do not transfer ETH with selfDestruct

In EVM1 the SELFDESTRUCT is messy and complex due to the additional ETH transfer coupled with it.

Simplify selfDestruct by removing its address argument and not transferring ether. The ether is destroyed with the account.
Add a transfer() function that only transfers ether to another account. The target account's code is not executed. This function is needed to implement SELFDESTRUCT in evm2wasm.
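
The split can be sketched in Python with balances held in a plain dict. The function names mirror the proposal, while evm1_selfdestruct is a hypothetical helper showing how evm2wasm could rebuild the EVM1 behaviour from the two simpler primitives.

```python
def transfer(balances, sender, recipient, amount):
    """Move ether only; the recipient's code is not executed."""
    if amount < 0 or balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    balances[sender] = balances.get(sender, 0) - amount
    balances[recipient] = balances.get(recipient, 0) + amount


def self_destruct(balances, destroyed, account):
    """Simplified selfDestruct: no beneficiary, remaining ether is destroyed."""
    balances.pop(account, None)
    destroyed.add(account)


def evm1_selfdestruct(balances, destroyed, account, beneficiary):
    """EVM1-style SELFDESTRUCT composed from the two primitives (sketch)."""
    transfer(balances, account, beneficiary, balances.get(account, 0))
    self_destruct(balances, destroyed, account)
```

Calling self_destruct directly burns whatever the account still holds, whereas evm1_selfdestruct empties it first.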

add FAQ

It would be good to have an FAQ, with topics like

what are the high level goals of this project
what's the timeline
will this be compatible with WASM
how will gas work
how will solidity and serpent work

Formal-spec style description of the EEI

https://github.com/ewasm/design/blob/master/eth_interface.md

We should be trying to follow the format of the WASM spec here, instead of having our own style/format. This will force us to disambiguate a lot of the behaviour here which is potentially ambiguous.

I can do this work if people would like, or at least start the branch doing the work. Just want to make sure there is agreement that this should happen.

Define ABI

The word ABI is overused in ethereum and I think there are several levels to it. Not all of them are properly documented:

the way contracts pass data between themselves (was depending on the language, now it seems to be converging), this includes precompiled contracts
the way external inputs are entered to the contract (both during calls and creation) - it is defined in the Contract ABI
the way storage is used (specific to the language)

Since eWASM changes the word size from 256 bit to at most 64 bit, it is important to state whether it will follow the same ABI for contract data passing or define a new one, more appropriate to its word size.

should `unreachable` not OOG and just revert the state?

Currently unreachable works just like an invalid opcode in EVM1, but it could provide a nicer way of handling reverts

Support Constantinople changes

It seems the relevant EIPs are:

Last discussion here: ethereum/pm#50

Move precompiles into the Ethereum state

The system contracts are currently defined as precompiles and live at addresses 0x000...0 to 0x000...b.

We should research how these can be moved into being part of the Ethereum state.

Call stack metering / deterministic depth restriction

The stack size and stack depth may differ between engines (or between the machines the engine is executed on), especially in the case of a JIT engine. The target machine's stack size could have a big influence and potentially introduce non-determinism.

The number of locals can influence the amount of memory used in a stack frame, and the depth of the call stack may be different.

Investigate how Solidity could output wasm/llvm

Current Solidity code is tied to emitting EVM opcodes. Investigate the possibility of moving it to LLVM or emitting WASM directly.

add Gregs VM to the VM comparisons

https://github.com/ethereum/evm2.0-design/blob/master/comparison.md

August 17, 2016 @ 11:00 AM EST / 4:00 BST / 5:00 CEST

This is a planning issue for next Wednesday's weekly meeting.
Since we have been having trouble with Jitsi, let's try Google Hangouts.

https://hangouts.google.com/hangouts/_/qgkj5jkxzjhnbhm567f54ui6tqe?authuser=0&hl=en

Does getBlockTimestamp need to return an i64?

eWASM prototype todo

Rust-libeth not shared.

I followed this link from the readme, and got the standard github 'page not found'
https://github.com/ewasm/rust-libeth

Dumass

Backward compatibility

We have several options for backward compatibility with EVM1:

1. run both VMs and use wasm's magic number to determine which VM to run the code in
2. write an EVM1 interpreter in EVM2
3. transpile EVM1 contracts to EVM2

Option 1 would be the easiest to do but would have the additional concern of twice the surface area for consensus-breaking bugs.
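
Dispatching on the magic number is a one-line check. The sketch below is in Python; the function name select_vm and the string labels are made up for illustration. The 4-byte prefix \0asm is the standard WebAssembly binary magic.

```python
WASM_MAGIC = b"\x00asm"  # first four bytes of every WebAssembly binary


def select_vm(code):
    """Pick the VM for a contract based on its code prefix."""
    return "evm2" if code[:4] == WASM_MAGIC else "evm1"
```

Since 0x00 is the STOP opcode in EVM1, existing EVM1 code that happens to start with these bytes would only ever have halted immediately, so the prefix check should not change the behaviour of useful deployed contracts.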

Non-determinism with division by zero

Source: https://webassembly.github.io/spec/core/_download/WebAssembly.pdf

pp48:
idiv_u𝑁 (𝑖1, 𝑖2)
• If 𝑖2 is 0, then the result is undefined.

idiv_s𝑁 (𝑖1, 𝑖2)
• If 𝑗2 is 0, then the result is undefined.
• Else if 𝑗1 divided by 𝑗2 is 2^(𝑁−1), then the result is undefined.

irem_u𝑁 (𝑖1, 𝑖2)
• If 𝑖2 is 0, then the result is undefined.

irem_s𝑁 (𝑖1, 𝑖2)
• If 𝑖2 is 0, then the result is undefined.

pp61:
trunc_u𝑀,𝑁 (𝑧)
(Not a problem since this is floating point only.)

pp62:
trunc_s𝑀,𝑁 (𝑧)
(Not a problem since this is floating point only.)

(raised by @holiman)
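
For reference, here is a Python sketch of the four integer operators under the reading that an "undefined" result means a trap. That reading is my understanding of the spec's execution rules for numeric instructions and should be checked against the current text; the Trap class and the helper names are assumptions of the sketch.

```python
class Trap(Exception):
    """Models a WebAssembly trap."""


def to_signed(i, n):
    return i - (1 << n) if i >= 1 << (n - 1) else i


def to_unsigned(j, n):
    return j & ((1 << n) - 1)


def idiv_u(i1, i2, n=32):
    if i2 == 0:
        raise Trap("integer divide by zero")
    return i1 // i2


def irem_u(i1, i2, n=32):
    if i2 == 0:
        raise Trap("integer divide by zero")
    return i1 % i2


def idiv_s(i1, i2, n=32):
    j1, j2 = to_signed(i1, n), to_signed(i2, n)
    if j2 == 0:
        raise Trap("integer divide by zero")
    q = abs(j1) // abs(j2)            # truncate toward zero
    if (j1 < 0) != (j2 < 0):
        q = -q
    if q == 1 << (n - 1):             # -2^(N-1) / -1 is not representable
        raise Trap("integer overflow")
    return to_unsigned(q, n)


def irem_s(i1, i2, n=32):
    j1, j2 = to_signed(i1, n), to_signed(i2, n)
    if j2 == 0:
        raise Trap("integer divide by zero")
    r = abs(j1) % abs(j2)             # remainder takes the sign of the dividend
    return to_unsigned(-r if j1 < 0 else r, n)
```

Under this reading every engine either produces the same value or traps in the same cases, so the remaining question for eWASM is how a trap maps onto contract failure.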

selfDestruct clarification

Continuation of the discussion started here:

@pepyakin :

What does "the contract shall halt execution after this call" mean? Does it mean that the contract code is supposed to somehow return control after calling selfDestruct? What if it doesn't do so? Why not just trap?

@axic :

Next time please open an issue on the repo - it is way harder to track it that way.

It just means that selfDestruct effectively just marks the account for deletion in a buffer. Any subsequent calls overwrite that buffer. Any successful halt will enforce the selfdestruct.

Combining it with a trap would put the onus of detecting the condition on the VM implementation; not all VMs (especially browser VMs) make that easy.

Hm, I had the impression that VMs usually provide a way to trap from inside a host function, with the ability to distinguish between different host traps.

For example, binaryen implements traps with exceptions: you can implement selfDestruct to trap with a special string or just throw your own exception. WAVM also lets you do the same by throwing and catching exceptions, which may be created by the embedder.

As for browsers, trapping inside the browser is definitely possible! Traps can be implemented with JS exceptions, and JS exceptions are easy to discriminate.

For example, this is how abort is implemented in the experimental wasm musl implementation.

here is definition of TerminateWasmException
upon a call to abort this exception is thrown.
the code of main start executing here in a try block.
TerminateWasmException is caught here

So it seems easy to me. Maybe I misunderstood you?

And about the current approach:
What should happen if a contract called selfDestruct and then tried to touch the storage (read or write), or to call create or *call?

handling code validation

On account creation, if the code is invalid, should it OOG?

WASM modules as libraries

EVM doesn't have a real concept of libraries; rather, it was added retroactively with DELEGATECALL, and Solidity ensures that a contract defined as a library cannot make use of SSTORE/SLOAD. The VM, however, still needs to treat it the same as other contracts and ensure that a proper rollback mechanism is in place.

A WASM code is called a module, which defines the memory needed and has one or more functions.

One of the premises of using WASM is that we wouldn't need precompiled contracts given the speed loss caused by the bytecode is insignificant compared to EVM. Not using precompiles can also lead to a lot of code duplication.

I think it could be useful supporting a way to store WASM modules on the blockchain, which could be loaded by contracts during the linking stage. Perhaps these modules would be special contracts, which are not meant to be executed.

Ideas welcome how this could fit into the blockchain model we have.

Rename "return" to "finish"

The problem is that return is usually a keyword in languages and therefore in most of them when importing the return EEI method the user has to use an alternative name.

The benefit of the change is that this won't be a problem anymore.

Test suite for the EEI

Each test case should have:

wasm bytecode
account / block / tx state
expected return or revert data

The following methods must be tested:

I'd suggest starting with return; the other tests can then use it to return data, which reduces the complexity of checking each test's output.

Bootstrapping into useful testing phase

It would be fairly simple to move this from an isolated test into a more useful testing framework by adjusting one of the VM implementations.

The adjustment would be to run the eWASM VM when a contract's bytecode starts with the WASM bytecode signature (\0asm).

This can be easily achieved by using ethereumjs-vm, which then would provide a state and full blockchain.

Depending on ABI decision (#1), wrapper methods for callCode and delegateCall might be needed to transform between the new and current ABIs.

sstore/sload without fixed field size

Currently sstore/sload write/read in 256-bit chunks, similarly to EVM.

They could be changed to have a third parameter for length:

length must be > 0
gas should be calculated according to the length
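
A possible shape for the length-aware variants is sketched below in Python. The function names, the per-32-byte-word pricing and the concrete gas constants are placeholders chosen for the example; the issue only says that gas should depend on the length and that the length must be positive.

```python
def words(length):
    """Number of 32-byte words needed to cover `length` bytes."""
    return (length + 31) // 32


def storage_store(storage, key, memory, offset, length, gas_per_word=5000):
    """Write memory[offset:offset+length] under key; returns the gas charged."""
    if length <= 0:
        raise ValueError("length must be > 0")
    if offset < 0 or offset + length > len(memory):
        raise IndexError("source outside of memory")
    storage[key] = bytes(memory[offset:offset + length])
    return words(length) * gas_per_word   # placeholder price


def storage_load(storage, key, memory, offset, length, gas_per_word=200):
    """Read `length` bytes into memory, zero-padded; returns the gas charged."""
    if length <= 0:
        raise ValueError("length must be > 0")
    if offset < 0 or offset + length > len(memory):
        raise IndexError("destination outside of memory")
    value = storage.get(key, b"")[:length].ljust(length, b"\x00")
    memory[offset:offset + length] = value
    return words(length) * gas_per_word   # placeholder price
```

Charging per started word keeps a 32-byte access at the same price as a fixed-size slot would have, while longer values scale linearly.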

Determine Gas Price for opcodes

At some point we need to update the fee schedule with accurate gas prices.
One way to do this would be to measure how many cycles each equivalent opcode takes on physical hardware.