
webmachinelearning / webnn


🧠 Web Neural Network API

Home Page: https://www.w3.org/TR/webnn/

License: Other

Languages: Bikeshed 96.95%, JavaScript 2.65%, Python 0.28%, Makefile 0.12%

webnn's People

Contributors

a-sully, anssiko, brucedai, dependabot[bot], dontcallmedom, fdwr, gramalingam, honry, huningxin, ibelem, inexorabletash, mingmingtasd, philloooo, reillyeon, shiyi9801, spshin3, tomoyukilabs, wchao1115, wonsuk73, zolkis


webnn's Issues

OperandType of gemm / matmul return

[This issue was originally posted at https://github.com/w3c/machine-learning-workshop/issues/86 ]

@kpu wrote:

The spec says gemm returns "an Operand" (and the same thing for matmul).

If both arguments are tensor-quant8-asymm, what is the OperandType of the return? I can see use cases for tensor-int32, which is how it will actually be generated by existing hardware; tensor-quant8-asymm, for a fully quantized model; or even tensor-float32, for people who have only partly quantized their model.

This matters because the spec doesn't appear to have, e.g., a requantization operator to convert int32 to int8; and in any case one would need the ability to set the scaling factor, which is typically determined by running the model in advance to measure an appropriate value.
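For concreteness, here is a minimal sketch (plain JavaScript, not part of the spec) of the requantization step referred to above; the scale and zero-point values are hypothetical and would have to be calibrated in advance by running the model:

// Requantize an int32 accumulator to a quant8-asymm value.
// All scales and the zero point below are illustrative numbers.
function requantize(acc32, inputScale, filterScale, outputScale, outputZeroPoint) {
  const realValue = acc32 * inputScale * filterScale;      // dequantize the accumulator
  const q = Math.round(realValue / outputScale) + outputZeroPoint;
  return Math.min(255, Math.max(0, q));                    // clamp to the uint8 range
}

console.log(requantize(12345, 0.02, 0.01, 0.05, 128)); // 177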

[Explainer] explain the workflow from authoring tools to webnn

Developers usually use some kind of machine learning authoring tool, such as TensorFlow, PyTorch, or others, to create and train their models. It would be helpful if the explainer described the workflow by which developers can deploy the trained models and run inference with them via WebNN in web browsers. This would cover:

  1. Model conversion/adaption
  2. Handling of missing ops/data types
  3. The efficiency of the model inference

This might not be a complete list; please feel free to add anything I missed.
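To make the workflow concrete, here is a hypothetical end-to-end sketch in the style of the graph-building examples elsewhere in this repo; the weights file, its URL, and the nn.constant call are illustrative assumptions, and the conversion step itself happens offline in the authoring tool:

// Offline: convert the trained TensorFlow/PyTorch model into WebNN
// graph-building calls plus a binary weights file (handling any missing
// ops/data types during conversion or via custom ops).
// In the browser: fetch the weights and rebuild the graph.
const response = await fetch('model-weights.bin');           // hypothetical weights file
const weights = new Float32Array(await response.arrayBuffer());
const x = nn.input({type: 'tensor-float32', dimensions: [1, 4]});
const w = nn.constant({type: 'tensor-float32', dimensions: [4, 3]}, weights); // assumed constant factory
const y = nn.matmul(x, w);
// ...then compile and execute as in the examples elsewhere in this listing.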

Fingerprinting via machine-specific artifacts

Apropos of #3 and webmachinelearning/webmachinelearning-ethics#22, an efficient matmul implementation can be fingerprinted to determine hardware capabilities.

On pre-VNNI Intel, the only efficient way to implement 8-bit multiplication is via pmaddubsw, which produces a 16-bit result summed horizontally with saturation. I can construct matrices that test for this saturation, which indicates a pre-VNNI Intel CPU. ARM and NVidia, by contrast, implement signed × signed with 32-bit accumulation.

Saturating addition, which should be used for accuracy lest you generate large sign errors, can be used to infer the order of operations. So vpdpbusds saturation tells me what order the matmul ran in.

The slowdown from using AVX512 instructions is likely detectable with timing.

In floats one can also infer order of operations from rounding. This would reveal the SIMD length and possibly variations in the compiler used to build the user agent. A cache-efficient matmul implementation reveals cache sizes via floating point order of operations.
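As a minimal illustration of the saturation probe described above (plain JavaScript, purely illustrative): a dot product of [255, 255] uint8 activations with [127, 127] int8 weights is 64770 under 32-bit accumulation, but the pmaddubsw path saturates the 16-bit intermediate sum at 32767, so the two hardware paths give distinguishable results.

const input = [255, 255];    // uint8 activations
const weights = [127, 127];  // int8 weights

// 32-bit accumulation path (e.g. VNNI, ARM dot product, NVidia dp4a).
const exact = input[0] * weights[0] + input[1] * weights[1];   // 64770

// pmaddubsw path: adjacent products summed with signed 16-bit saturation.
const saturated = Math.min(exact, 32767);                      // 32767

console.log({ exact, saturated, distinguishable: exact !== saturated });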

Define set of exceptions for WebNN API

The WebNN API will need defined exceptions that report an appropriate error status (beyond the default JavaScript runtime exceptions).
Developers would use these exceptions to catch specific error states in the WebNN API.

TensorFlow conv2d expects channel_last filter layout regardless of input layout format

From the WebNN conv2d spec, the same layout parameter controls both the input and filter layout (below). In TensorFlow (and presumably TFLite), regardless of the input layout, the filter layout remains in the "channel_last" format i.e. [height, width, input_channels/groups, output_channels].

"nchw":
input tensor: [batches, input_channels, height, width]
filter tensor: [output_channels, input_channels/groups, height, width]
output tensor: [batches, output_channels, height, width]

"nhwc":
input tensor: [batches, height, width, input_channels]
filter tensor: [height, width, input_channels/groups, output_channels]
output tensor: [batches, height, width, output_channels]
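As a minimal sketch of the mismatch (plain JavaScript; the shapes are illustrative), adapting a TensorFlow "channel_last" filter [height, width, input_channels/groups, output_channels] to the WebNN "nchw" filter layout [output_channels, input_channels/groups, height, width] requires a permutation such as:

const tfFilterShape = [3, 3, 16, 32];      // [height, width, in_channels/groups, out_channels]
const permutation = [3, 2, 0, 1];          // -> [out_channels, in_channels/groups, height, width]
const webnnFilterShape = permutation.map(i => tfFilterShape[i]);
console.log(webnnFilterShape);             // [32, 16, 3, 3]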

Support declaring and setting graph inputs and outputs by string keys

As mentioned in WebML CG call on 28 May, this proposal is based on a conversation with @wchao1115 and @RafaelCintron.

The motivations are:

  1. Fix an issue in the existing spec that doesn't allow a graph to declare the keys of its inputs and outputs.
  2. String-based keys are more developer friendly and align better with the model-loader API.

The proposal includes:

  1. When building a graph, allow declaring the inputs and outputs by string keys.
  2. When executing a graph, allow setting the input and output buffers by string keys.

There are two examples to showcase the difference between the existing API and the new proposal.

Both examples build and execute a graph with the following topology (the original issue includes a figure): c = matmul(a, b).

Example code using the existing spec:

// ISSUE: there is not a chance to declare the keys of graph inputs.
const a = nn.input({type: 'tensor-float32', dimensions: [3, 4]});
const b = nn.input({type: 'tensor-float32', dimensions: [4, 3]});

const c = nn.matmul(a, b);

// ISSUE: the key of graph output is implied by the order of the elements in sequence.
const model = await nn.createModel([c]);

// Skip the code of creating compilation and execution

// ISSUE: it is hard to tell which input/output has index 0 or 1.
execution.setInput(0, bufferA);
execution.setInput(1, bufferB);
execution.setOutput(0, bufferC);

await execution.startCompute();

Example code using the new proposal:

// SOLUTION: explicitly declare the key of graph inputs by string.
const a = nn.input('a', {type: 'tensor-float32', dimensions: [3, 4]});
const b = nn.input('b', {type: 'tensor-float32', dimensions: [4, 3]});

const c = nn.matmul(a, b);

// SOLUTION: explicitly declare the key of graph outputs by string.
const model = await nn.createModel([{name: 'c', operand: c}]);

// Skip the code of creating compilation and execution

// SOLUTION: set the input and output buffers by string keys explicitly.
execution.setInput('a', bufferA);
execution.setInput('b', bufferB);
execution.setOutput('c', bufferC);

await execution.startCompute();

[op compatibility] matMul

This issue will track op compatibility resolution for matMul.

Signature:
matmul(a, b)

Arguments:
a: n-dim tensor
b: n-dim tensor

Docstring:
If both a and b are 2-D they are multiplied like conventional matrices. If one of a or b is 1-D, it is treated as a matrix-times-vector product.

If either argument is N-dimensional, N > 2, it is treated as a stack of matrices with dimensions given by the inner two indices, and the matrix multiplication is broadcast accordingly.

Example:
If a has shape [2, 3, 4, 5] and b has shape [5, 4], the resulting tensor will have a shape of [2, 3, 4, 4] as a is treated as a size 2 * 3 = 6 stack of [4, 5] matrices. These get broadcast multiplied over the [5, 4] matrix creating 6 [4, 4] matrices. They keep the original shape of a's outer dimensions, resulting in a shape of [2, 3, 4, 4].
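A minimal sketch (plain JavaScript, not the spec algorithm) of the output-shape rule in the example above, for the simple case where b is a 2-D matrix:

function matmulOutputShape(aShape, bShape) {
  const [m, kA] = aShape.slice(-2);   // inner two dims of the stacked matrices
  const [kB, n] = bShape;             // 2-D right-hand operand
  if (kA !== kB) throw new Error('inner dimensions must match');
  return [...aShape.slice(0, -2), m, n];
}

console.log(matmulOutputShape([2, 3, 4, 5], [5, 4])); // [2, 3, 4, 4]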

Notes:

  • Does not support fused bias (this will be taken care of by graph optimizer)
  • Does not support transpose arguments (this is taken care of by graph optimizer)

To be discussed:

  • Compatibility with underlying APIs / hardware

Add a more advanced example

We should add a more advanced example to the spec that makes use of (some of) the new ops we've added to the spec recently. We should keep a simple "Hello World" style example and in addition add another more advanced one within a reasonable LOC limit.

The examples in specs are often a starting point for web developers who try out a new API for the first time, so we need to make sure the examples are maintained alongside the spec definition.

WebNN API does not support MirrorPad, SquaredDifference, Pow, TransposeConv

As a style transfer use case (source: https://intel.github.io/webml-polyfill/examples/style_transfer/?prefer=none&b=WebGL&m=fast_style_transfer_onnx&s=image&d=0&f=WebNN),
some of the ops used by style transfer are not supported in the current WebML API spec.
Please add the following ops to the supported list, thanks:
MirrorPad, SquaredDifference, Pow, TransposeConv for tflite-format models.
MirrorPad, Pow, ConvTranspose for onnx-format models.

Create plan for how to move forward with Inference API

This is a followup to Issue 41.

Now that the group has decided that an inference API (also known as the Load/Run Model API) is within the charter, let's define the steps to move forward with this.

I've talked with some web standards experts, and will update this thread with concrete steps.

Remove Compilation.finish and setPreference

As thoroughly discussed in this PR as related to the failing state of Compilation.createExecution, both the Compilation.finish and Compilation.setPreference methods are redundant to what createExecution can provide. The removal of setPreference can be remedied by passing a param to createExecution, as @RafaelCintron pointed out in the thread.

Executing models

Per resolution on the 14 Feb 2019 call, this issue is for discussing requirements for an API for executing models.

@gregwhitworth, would you be in a position to help kick off discussion in this issue? I believe learnings from ONNX.js would be good input.

Remove quantization-specific params from OperandDescriptor

Keep OperandDescriptor scoped and versioning-friendly by removing quantization-specific params, e.g. scale and zeroPoint, while keeping the OperandType enum values as a straight data-type enum. Quantization-specific params could become arguments of a new overload for creating Operands.
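A hypothetical sketch of that split; nn.quantizedOperand is an invented name for the "new overload for creating Operands", not something defined by the spec:

// OperandDescriptor stays a plain type + dimensions descriptor...
const desc = {type: 'tensor-quant8-asymm', dimensions: [1, 256]};
// ...while quantization params move to a dedicated Operand-creating overload.
const q = nn.quantizedOperand(desc, 0.0039 /* scale */, 128 /* zeroPoint */);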

[op compatibility] conv2d

This issue will track op compatibility resolution for conv2d.

Signature:
conv2d(x, filter, padding, strides, dilations)

Arguments:
x: 4-dim tensor with logical shape [N, H, W, in_channels]
filter: 4-dim tensor with logical shape [H, W, in_channels, out_channels]
padding: array of ints. Padding for the beginning and ending along each spatial axis.
strides: array of ints that have length 1, 2 or 4. The stride of the sliding window for each dimension of input.
dilations: array of ints of length 2: [dil_height, dil_width] The dilation factor for each spatial dimension of the input.

Docstring:
Computes a 2-D convolution given 4-D input and filter tensors.

Notes:

  • Tensor layout should be a single logical type to leave transpose graph optimizations up to the browser (this is much more portable because the model author does not have to think about the hardware of the device).
  • Tensor layout should be NHWC because HTMLMediaElements are already in this format and we want to remove the burden of transposing from the user.

To be discussed:

  • Compatibility with underlying APIs / hardware

Explainer document

Web specs are expected to be reviewed by W3C's Technical Architecture Group (TAG), and the best practice is to seek such TAG review earlier rather than later in the spec design process.

An explainer is complementary to the formal spec document, and as a bonus large parts of the explainer document can be eventually repurposed as the formal spec's informative content (e.g. introduction, examples). To distinguish the two, the explainer as a whole should be readable and understandable by people who are not domain experts, while the formal spec is primarily aimed at implementers and assumes a certain level of domain expertise.

I've opened this meta issue to solicit feedback and comments on the WebNN API explainer and pushed an explainer template to the repo to be used as a starting point:

https://github.com/webmachinelearning/webnn/blob/master/explainer.md

It is a simple markdown file so making contributions should be straightforward. The explainer welcomes PRs from all group participants.

Debug pr-preview issues

From #76:

@anssiko Is there anyone who can help us getting the pr-preview bot back in working state? It's more convenient for the reviewers to have the bot update the preview automatically when a new commit arrives.

@wchao1115 Markdown is not supported by pr-preview currently.

That said, I think I fixed pr-preview for bikeshed (.bs) by reinstalling the GitHub pr-preview integration (aka GitHub App), here's an example of it working: #77

FTR: In an attempt to fix it, I first tried updating .pr-preview.json to its latest version in the webnn repo but that caused the whole webmachinelearning GH org to throw Internal Server Error 500 octocat so I quickly reverted to the old config, uninstalled, and reinstalled the integration and things seem to be working now. Fingers crossed.

FYI @tobie just in case someone else sees similar behavior.

@wchao1115 @huningxin If the webmachinelearning repos start to misbehave and throw 500 errors and I'm not around, @huningxin can also uninstall the integration via https://github.com/organizations/webmachinelearning/settings/installations to fix the immediate issue.

Edit: Let's consolidate all webnn repo pr-preview issues here so we keep track, and when the root cause has been identified open an issue upstream.

Handling unsupported OperandType

The current spec defines the OperandType enum:

enum OperandType {
  "float32",
  "int32",
  "uint32",
  "tensor-float32",
  "tensor-int32",
  "tensor-quant8-asymm"
};

The float16 and tensor-float16 types are being added in #35.

However, as mentioned by @wchao1115 in #26 (comment), there are situations where the selected device doesn't have native support for an OperandType. For example, some CPUs may not support tensor-float16, some GPUs may not support tensor-quant8-asymm, and some AI accelerators may not support tensor-float32. To allow the app to handle these situations gracefully, e.g. select a different device or use a different model with a supported operand type, the API should report an unsupported-OperandType error.

Opening this issue to explore the definition of the unsupported-OperandType error and the API behavior that returns it.

Thoughts?
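A hypothetical sketch only: the exact error name and shape are what this issue is meant to define, so 'NotSupportedError' and the fallback below are illustrative, not spec behavior:

let input;
try {
  input = nn.input({type: 'tensor-float16', dimensions: [1, 3, 224, 224]});
} catch (e) {
  if (e.name === 'NotSupportedError') {
    // Fall back to a float32 variant of the model, or select a different device.
    input = nn.input({type: 'tensor-float32', dimensions: [1, 3, 224, 224]});
  } else {
    throw e;
  }
}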

How to handle backwards compatibility for operation definitions

Issue #17 brought up the question of how we'll handle versioning. ONNX currently has a way to handle versioning, but the web rarely versions its APIs (they are normally frozen forever). That doesn't mean it isn't possible, but it is something we should investigate and come up with a solution for, since we know there are reasons for these operations to evolve.

TAG review

We agreed to request a W3C TAG Spec Review for the WebNN API.

Before opening the review request, I'll nudge @cynthia to make sure we meet the review readiness expectations. Notably, we do not yet have a comprehensive explainer for this spec.

Self-Review Questionnaire: Security and Privacy

In preparation for the TAG review #89, it is recommended we complete the Self-Review Questionnaire: Security and Privacy for the WebNN API.

Let's use this issue to document the responses to the following questions 2.1-2.17. Please find more context regarding these questions from the self-review document itself.

  • 2.1 What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?

This feature exposes the navigator.ml.getNeuralNetworkContext() factory that encapsulates the rest of the API surface used to create, compile, and run machine learning networks. The API allows web apps to make use of hardware acceleration for neural network inference.

  • 2.2 Is this specification exposing the minimum amount of information necessary to power the feature?

The API exposes the minimum amount of information necessary to address the identified use cases for the best performance and reliability of results.

  • 2.3 How does this specification deal with personal information or personally-identifiable information or information derived thereof?

No personal information is exposed.

  • 2.4 How does this specification deal with sensitive information?

No sensitive information is exposed.

  • 2.5 Does this specification introduce new state for an origin that persists across browsing sessions?

No.

  • 2.6 What information from the underlying platform, e.g. configuration data, is exposed by this specification to an origin?

No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform's neural network hardware acceleration capabilities relative to another underlying platform.

  • 2.7 Does this specification allow an origin access to sensors on a user's device?

No.

  • 2.8 What data does this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.

The API adheres to the same-origin policy.

  • 2.9 Does this specification enable new script execution/loading mechanisms?

No.

  • 2.10 Does this specification allow an origin to access other devices?

This specification enables access to the underlying hardware used to accelerate neural network inference.

  • 2.11 Does this specification allow an origin some measure of control over a user agent's native UI?

No.

  • 2.12 What temporary identifiers might this specification create or expose to the web?

No temporary identifiers are exposed.

  • 2.13 How does this specification distinguish between behavior in first-party and third-party contexts?

At the moment, the feature does not distinguish between first-party and third-party contexts. Since the feature gives developers access to hardware accelerated features of the device, we could make it be a policy controlled feature similar to WebXR and its xr-spatial-tracking feature identifier.

  • 2.14 How does this specification work in the context of a user agent's Private Browsing or "incognito" mode?

The feature works the same regardless of whether in-private browsing or incognito mode is active.

  • 2.15 Does this specification have a "Security Considerations" and "Privacy Considerations" section?

Work-in-progress at #122

  • 2.16 Does this specification allow downgrading default security characteristics?

No.

  • 2.17 What should this questionnaire have asked?

It asked good questions, in particular, 2.15 was helpful for outlining the concerned section.

Model Execution API

There are a couple of questions about the existing execution API:

  1. The current model execution API requires users to provide output buffers before execution. This is not very convenient, since it is an extra step and the user might not know the shape of the output beforehand. Also, for many models the output shape depends on the input shape, so it is an extra burden for users to figure that out.

  2. The current execution is built on the compilation of the full graph. While the execution API does not prevent users from executing a sub-graph of the model, it is not clear why the pre-compilation is needed and whether it should be internal to the execution, so that it can take care of sub-graph execution.

Packing operations for gemm / matmul

Most GEMM implementations have a packed representation. Some support packing in advance, like MKL does for float32 and oneDNN does for int8 if you know where to look, while not officially supporting it. This is particularly useful for inference, where usually one of the parameters is a constant. So a common use case is downloading parameters in some canonical format (row major or whatever), packing them, ideally in place (and throwing away the canonical form to save RAM), then passing the packed format to GEMM. The packed format is opaque and varies by hardware due to SIMD lengths etc. Throwing away the canonical form is key because otherwise it effectively doubles the RAM requirements for the model.

In theory a compiler with whole-program knowledge could pack in advance and throw away the canonical form. But in practice it's theoretically possible some new code will read the canonical values. So packing is usually done explicitly by the user. Or there would need to be a slow path that unpacks the values to emulate a canonical read.

Would you consider adding optional packing operators for a and b?

(Moved from w3c/machine-learning-workshop#85)
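A hypothetical sketch of what such optional packing operators might look like; the name nn.packGemmB is invented for illustration, and the packed operand would be opaque and hardware-specific:

// Pack the constant right-hand operand in advance...
const bPacked = nn.packGemmB(b);
// ...after which the canonical row-major buffer for b could be released,
// and gemm/matmul consumes the packed representation directly.
const c = nn.gemm(a, bPacked);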

Float16 type support

The current spec defines the tensor types "float32", "int32" and "8-bit quantized".

We should consider "float16" for a few reasons:

  • Certain hardware is significantly slower handling float32 compared to float16. This includes but is not limited to GPUs.
  • Storing the intermediate values in float16 can be beneficial to reduce memory footprint and memory bandwidth. This is true even if all the operations are done on float32 in the FPUs.
  • Supporting constant tensors in float16 would allow clients to halve the size of the model's parameters. This can reduce model size in memory and reduce network bandwidth when loading the page.

Implementations without native float16 support should be allowed to use float32 internally. This would help maximize performance on float32 hardware while keeping things simple for authors.
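A minimal arithmetic sketch of the size argument above (plain JavaScript; the parameter count is an arbitrary example): one million parameters stored as float16 need half the bytes of float32, whether on the wire or in memory.

const paramCount = 1000000;
const float32Bytes = paramCount * Float32Array.BYTES_PER_ELEMENT; // 4000000
const float16Bytes = paramCount * 2;                              // 2000000 (2 bytes per element)
console.log(float32Bytes / float16Bytes);                         // 2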

high level, low level - confusing naming

As it is a separate topic, I've made this issue as a continuation of this post: #3 (comment)

@anssiko wrote:

Re low-level and high-level, I observe much of the confusion arises from the inconsistent use of these adjectives in different contexts. Low-level & high-level APIs and low-level & high-level use cases do not map onto each other, and that causes us to talk past each other. We need to add definitions of these terms to the spec, or come up with better names.

Totally agree; it's easy to mix up these terms. This should be more self-describing in my opinion, more or less like this:

Predefined ML models:

  • predefined models (or as @anssiko mentioned "pre-canned models")
  • predefined models API

Generic ML:

  • ML API - top-level ML API for NN preparation, modification, running, training, and data preparation
  • ML operators API - operator-specific API
  • operators list
  • graph format

And each of the above will naturally have its own use cases.

High level vs low level

Hey everyone,

While reviewing this PR I had an issue, but it's a horizontal issue that should be discussed outside of that PR; maybe a good agenda item for our first telecon, @anssiko. I have had very few folks desire a low-level API for this, based on the use cases I've been able to procure.

Additionally, I'm not sure I would define some of the items denoted as low-level in the linked commit as low-level; rather, they are not using the pre-trained models exactly as-is. In the discussions I've had with many folks across Microsoft, the majority of client-side use cases that people are looking at for production scenarios would not require a low-level API. Effectively this comes back to the question of who the customer is: a library author, or a web developer solving the problem. There are pros and cons to each approach we take, and I don't know if we need to draw a fundamental line in the sand, but I do think I'm seeing a larger desire for higher-level APIs than for a lower-level one. Thoughts?

Standardization process status

Guys, I wasn't sure where to post this issue so I'm posting it here, but somebody please help me because I feel a little lost:

  • where are we with standardization process?
  • what is left to do?
  • what decisions are to be made?

(Computer) network-related use cases

The newly launched Web & Networks Interest Group is looking for network-related use case input from WebML CG. In the context of the IG, "networks" means computer networks, not to be confused with neural networks :-) This is how the IG introduces itself:

The mission of the Web & Networks Interest Group is to explore solutions for web applications to leverage network capabilities in order to achieve better performance and resources allocation, both on the device and network.

The IG has identified the following use case for this group that could benefit from exposure of more advanced (computer) network capabilities to web apps:

The Web & Networks IG should coordinate with [WebML CG] to explore on how to load balance computing between cloud-based and client-side machine learning using network hints including bandwidth and latency, radio power consumption, and available computing power and battery on the client.

Let's use this issue to discuss and solicit feedback on use cases and requirements on network characteristics (bandwidth, latency, others) beneficial for implementing logic for switching between in-browser and cloud-based inference at runtime. For example, what network hints are needed to estimate when fetching a pre-trained model from the server yields too big of a startup cost? An example API that might get extended as an outcome is the Network Information API.

By seeding use case input to the W&N IG, we can help make sure any future work on network information APIs considers requirements from the ML domain. I'm happy to package this group's input from the discussion in this issue and submit it to the W&N IG. I think they're expecting us to provide feedback by the TPAC timeframe in mid-September at the latest.

FYI @sudeepdi @dontcallmedom

[investigation] buffer sharing between GPU and ML accelerator

For WebNN interoperability for custom op support, so far, we have done the investigation and report out for WebNN-WASM interop and WebNN-WebGPU interop.

According to the WebNN interop investigation next steps discussion in WebML CG call on 3 Oct, the participants were interested in the buffer sharing between GPU and ML accelerator. Opening this issue to capture the requirement as well as share the status and data.

The idea is that WebNN allows running expensive ops (e.g. conv2d) on the ML accelerator and sharing buffers with a WebGPU compute shader that runs custom ops (e.g. add/relu). It can be illustrated by the following code sample.

// Create a WebNN model contains conv2d
const model = await createWebNNConv(filterValue, noBias, noRelu);
const compilation = await model.createCompilation();
// Let WebNN compilation for the ML accelerator
compilation.setPreference(nn.LOW_POWER);
await compilation.finish();
const execution = await compilation.createExecution();
// input, output, bias are tf.tensor
// Get underlying WebGPUBuffer
const inputBuffer = tf.backend().getBuffer(input.dataId);
const outputBuffer = tf.backend().getBuffer(output.dataId);
// Set WebGPUBuffer as input and output to WebNN execution
execution.setInputGPUBuffer(0, inputBuffer);
execution.setOutputGPUBuffer(0, outputBuffer);
// Execute the WebNN ops on ML accelerator
execution.startCompute();
// Execute the WebGPU ops on GPU
let addOutput = tf.add(output, bias);
let reluOutput = tf.relu(addOutput);
// Read back result from GPU
let result = await reluOutput.data();

Per recommendation from @walrusmcd (thanks!), the investigation will initially target the AI on the PC Devkit. This device has both a GPU and a VPU (as an example of an ML accelerator), both of which are supported by the D3D12 and DirectML APIs. The Chromium WebNN POC will be enhanced to support the above scenario.

There are some dependencies that need to be worked on:

  • Rebase the WebNN POC to the version where the WebGPU compute shader works on D3D12
  • Get the WebNN/DirectML backend working on the VPU
  • Get WebGPU-WebNN interop working on D3D12/DML for the GPU
  • Get WebGPU/D3D12/GPU and WebNN/DML/VPU interop working

Currently, we have done the rebase and gotten basic VPU support working in the WebNN/DML backend. We'll update here once we make progress on the WebGPU-WebNN interop on D3D12/DML.

All, please kindly let me know whether I missed anything.

Look into pre-canned models

At F2F we agreed to look into pre-canned (built-in platform-provided) models. See https://www.w3.org/2018/10/26-webmachinelearning-minutes.html#x03 for related discussion.

The group seemed to agree that support for built-in models is a v2 feature, and in v1 the API would support custom pre-trained models fetched from the server.

Tagging @gregwhitworth @cynthia @mmccool @huningxin who took part in this discussion.

I suggest we use this issue to solicit further input while making sure the v1 API provides extension points to allow support for pre-canned models in v2.

Revisit inference API to load and run a model

Can we revisit the idea of an API to load and run a model?

I've written up a draft explainer for what a Web ML Inference API might look like, as well as a bunch of issues and questions around it.

The idea was discussed way back in issue 3, and probably even earlier by many people in the group. For various reasons, the group decided to pursue a graph API instead.

Why revisit now? The TensorFlow team raised some concerns that a graph API may not be the right level of abstraction, due to how fast ML is evolving, and their experience with the rapid growth in operations in TensorFlow models. After digging a bit to understand where this caution came from, I learned about the efforts around MLIR, and how the TensorFlow team sees that fitting into the picture. Also, I had a chance to talk with Greg and others at Microsoft about the original reasons not to go with an inference API, and it seems like things may have changed.

If this is an interesting enough topic to people, we could consider talking about it during the face-to-face in Berlin.

Graph format

As I find it a separate topic from #3, I would like to continue conversations about the graph format itself (not operators and not the API) here.

So far we have 3 major discussions:

  1. Whether to include any graph format support
  2. Whether it should be the ONNX format
  3. Whether it should be a JSON format

For the reasons I gave in these (and other) posts, I opt for option 3, the JSON format.

Simple example with one hidden layer, 8 inputs and 9 outputs:

{
  activation: "tanh",
  layers: [8, 14, 9]
}

Same NN as above, but with weights data provided:

{
  activation: "tanh",
  layers: [8, 14, 9],
  setupData: 'ISEhIEhlbGxvIHdlYiBNTCAhISE='
}

Some "dummy" advanced example for JSON format:

{
  domain: 'custom-domain',
  dataType: 'fp32',

  layers: [
    {
      name: "input",
      pipes: [
        {
          name: "A",
          size: 30,
          to: ['addIfAboveInputA'] 
        },
        {
          name: "B",
          size: 30,
          to: ['addIfAboveInputB'] 
        }
      ]
    },
    {
      type: 'addIfAbove',
      a: 'addIfAboveInputA',
      b: 'addIfAboveInputB',
      value: 0
    },
    {
      type: 'onlyLowerThan',
      value: 50
    },
    {
      type: 'neurons',
      activation: 'tanh',
      size: 30
    },
    30
  ]

}

WebNN polyfill and samples

As we are iterating on the API design and adding more examples, a polyfill with runnable sample code would be helpful.

For example, in the PR #80 review, the running version of the examples based on a simple polyfill got good feedback from reviewers.

So I propose to create two repos that host the development of polyfill and sample code.

This practice is also adopted by other groups; for example, the W3C Immersive Web group develops webxr-polyfill and webxr-samples.

Web Neural Network API as-is in WASI?

There's an interesting proposal over at the WebAssembly System Interface repo (WebAssembly/WASI#59) that we should discuss.

Quoting OP:

It might sound too early to be talking about such fancy high level APIs but I think a WASI runtime with hardware accelerated neural networks and related algorithms would have a huge immediate market since it's such a hot topic. Probably this is another of those times when copying the web or following it closely is a good thing, the Web Neural Network API(examples) is being defined to provide this APIs to JS developers but might work even better with WASI.

Follow-ups over at the WASI repo or here. Consider this an early exploration. Currently this CG is not chartered to consider WASI but that could change in the future given adequate support for such a direction.

Cc @huningxin and @zolkis

More variations of supported convolution types needed.

The way we define conv2d today is sufficient for a typical usage of convolution. However, there are a couple of variants of the convolution operation we should consider supporting.

  1. Grouped convolution, used in AlexNet. A new groupCount param is needed, where the filter tensor shape becomes [out_channels / group_count, group_count, in_channels / group_count, H, W] when group_count > 1.
  2. Transposed convolution, used in autoencoders or models generating high-resolution images, e.g. skeleton tracking, etc. Also known as "backward" convolution, it is used to compute the convolution gradient during model training. The API needs an extra enum param to support it.

Note that depthwise convolution, as used by MobileNet, is implemented today as a two-pass convolution: the first depthwise pass is done with groupCount set to the number of input channels, while the second pointwise pass is simply a convolution with a 1x1 filter kernel size.
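A hypothetical sketch of that two-pass pattern; the trailing groupCount argument is the param proposed in item 1 above, and the filter/option variables are illustrative, not defined by the spec:

// 1) Depthwise pass: grouped convolution with groupCount == number of input channels.
const depthwise = nn.conv2d(input, depthwiseFilter, padding, strides, dilations, inputChannels /* groupCount */);
// 2) Pointwise pass: an ordinary convolution with a 1x1 filter kernel.
const pointwise = nn.conv2d(depthwise, pointwiseFilter1x1, [0, 0, 0, 0], [1, 1], [1, 1]);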

graph-building syntax simpler for web developers

I think it is useful to make the (graph-building) syntax simpler for the user. In fact, I think we can make it look almost exactly like an API for executing the ops directly (and the "eager" API), at least for the non-control-flow ops, as explained below.

(1) In the proposal, operands are represented by integers, implicitly associated with the order in which "addOperand" is called. It would be cleaner to have "addOperand" return an object/value that represents the operand. So, this should allow us to replace

let tensor1 = operandIndex++;
model.addOperand(float32TensorType);

by

let tensor1 = model.addOperand(float32TensorType);

(2) Instead of separately creating operands to represent the outputs of an op and separately adding the op (connecting the inputs and outputs up), it is better to have addOperation create the operands representing the outputs and return them. This would allow us to replace

// intermediateOutput0 is the output of the first ADD operation.
let intermediateOutput0 = operandIndex++;
model.addOperand(float32TensorType);
// Add the first ADD operation.
model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone], [intermediateOutput0]);

by

// Add the first ADD operation.
// intermediateOutput0 is the output of the first ADD operation.
let intermediateOutput0 = model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone]);

I omitted the type float32TensorType since it can be inferred from the operands. But where necessary, we can add the type as an extra parameter. This approach can still support operations that return multiple outputs.

(3) Furthermore, instead of representing operations by a constant like nn.ADD, it seems better to encapsulate them as a method ADD, allowing us to simplify

let intermediateOutput0 = model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone]);

to

let intermediateOutput0 = model.ADD([tensor0, tensor1, fusedActivationFuncNone]);

(4) Similarly, for constants, instead of separately creating an operand and then setting its value, we can use a single method to create a constant that does both. This would allow us to simplify:

// Add the operand for the NONE activation function, and set its value to FUSED_NONE.
let fusedActivationFuncNone = operandIndex++;
model.addOperand(scalarInt32Type);
model.setOperandValue(fusedActivationFuncNone, new Int32Array([nn.FUSED_NONE]));

to:

// Add the operand for the NONE activation function, and set its value to FUSED_NONE.
let fusedActivationFuncNone = model.Constant(scalarInt32Type, new Int32Array([nn.FUSED_NONE]));

(As discussed earlier, the type can be omitted if it can be inferred.)

Originally posted by @gramalingam in #15 (comment)

Survey graph-building APIs from native ecosystem

Per resolution on the 9 May 2019 CG call, this issue is for surveying graph-building APIs from the native ecosystem, aiming to support the discussion in #16. The current foundation spec is a direct derivative of Android NNAPI, which is a C-style API. During the CG call, the participants agreed to survey other graph-building APIs in the native ecosystem to learn API design patterns.

There were three APIs mentioned in the CG call.

@walrusmcd mentioned "would love to contribute our learnings from two Microsoft's graph-building APIs". Feel free to add them into the list. Thanks.

Executing operations

Per resolution on the 14 Feb 2019 call, this issue is for discussing requirements for an API for executing operations.

IIRC @dsmilkov volunteered to take the first stab at this issue (thanks!). To frame the discussion, perhaps a good start is to evaluate the requirements through the lens of existing ML frameworks as API consumers. I believe @huningxin's proof-of-concept might also provide useful input.

Custom operations

Starting a thread to open the discussion for supporting custom operations.

The ML field is fast moving and model architectures and operations are evolving quickly. In TensorFlow.js, we have around 200 ops and we still run into issues of missing ops when someone is trying to port a new model to the browser. I believe that the number of built-in ops will be relatively small and will grow very slowly due to standardization.

Thus, it is important to provide a way for library authors to write custom ops that can interop with the built-in neural net ops. That means having high-performance data exchange between custom ops and built-in ops. I stress the importance of high performance; otherwise, library authors would revert to implementing all of the ops using lower-level APIs (e.g. WebGPU).

A good way to start is to understand the scope and complexity of the problem. We can look at technical details on how browser vendors plan to implement built-in ops, which gives us details about where these ops run and where the data lives.

Suggestion: use an options dictionary for functions with lots of args

A syntax-only question/suggestion. There may be reasons for the current design (disclaimer: I am long C), but wouldn't the JavaScript code look better with a shorter signature containing the operands, with the rest named in a dictionary?

For instance, now there is

partial interface NeuralNetworkContext {
  Operand gemm(Operand a, Operand b, optional Operand c, 
               optional float alpha = 1.0, optional float beta = 1.0, 
               optional boolean aTranspose = false, optional boolean bTranspose = false);
};

and it could also be written (following the pattern used in many other web APIs) as

partial interface NeuralNetworkContext {
  Operand gemm(Operand a, Operand b, optional Operand c, optional GemmOptions options);
};

dictionary GemmOptions {
   float alpha = 1.0;
   float beta = 1.0; 
   boolean aTranspose = false; 
   boolean bTranspose = false;
};

while the code using defaults would stay the same, and code using non-defaults would look more clear:

let r = nn.gemm(a, b, c);

let q = nn.gemm(a, b, c, { alpha: 1.1, beta: 1.1, aTranspose:  true });

as opposed to

let q = nn.gemm(a, b, c, 1.1, 1.1, true);

Maybe not such a big improvement for gemm, but there are bigger opportunities for the likes of gru ops, improving the clarity of the function semantics and code readability as well.

I am aware this doesn't add much value to counter the arguments for the original design, if there is more to that than to make it look like a C API :).

Define the set of operations and their specification

As raised in the CG meeting, the first foundation spec only lists 32 operation types without information about how to use them.

We need to define the set of operations and their specification/semantics.

The set of operations could be derived from the use cases and corresponding models. For WebNN POC, there is a spreadsheet that lists supported models and their required operations. It can be used as a starting point.

By following the spirit of the WebML CG charter, the specification will be implementable on top of existing major platform APIs, such as Android NNAPI, Windows DirectML, and macOS/iOS MPS/BNNS. So when specifying operations, the platform API mapping/support needs to be looked into. For the WebNN POC, there is another spreadsheet that captures the native API mapping of supported operations. It can also be leveraged.

We can file an individual issue for each operation specification and use this one as the meta issue.

Add QuantizeLinear and DequantizeLinear for mixed precision

The current proposal has support for quantized types like tensor-quant8-asymm, and some operators support them. Many networks run in mixed precision, i.e. a quantized-output matrix multiply followed by logsoftmax in float32.

I propose adding https://github.com/onnx/onnx/blob/master/docs/Operators.md#DequantizeLinear and https://github.com/onnx/onnx/blob/master/docs/Operators.md#QuantizeLinear to make the quantized operators actually usable for many models.
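A hypothetical sketch of the mixed-precision pattern described above; the operator names dequantizeLinear and logSoftmax on the nn context are illustrative and not defined by the current spec:

const qOut = nn.matmul(qA, qB);                            // quantized matmul
const fOut = nn.dequantizeLinear(qOut, scale, zeroPoint);  // integer result -> float32
const out = nn.logSoftmax(fOut);                           // float32 op follows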

Support the execution of the sub-graph scenario

Opening this issue to follow up the discussion of #94 (comment).

@pyu10055 mentioned:

With this API, the computation is tied to the compilation. Should the compile method take the inputs/outputs pair, so that execution can only execute the graph compiled with that input/output pair? Otherwise, the compilation might not support the sub-graph execution scenario.

Actually, the existing builder.createModel allows specifying the outputs. With that, developers could create different sub-graphs (models) from a topology within a builder and compile/compute them individually. For example:

const builder = nn.createModelBuilder();
const a = builder.input('a', descA);
const b = builder.constant(descB, bufferB);
const c = builder.constant(descC, bufferC);
const d = builder.mul(a, b);
const e = builder.add(d, c);
const model1 = builder.createModel({d}); // d = a * b
const model2 = builder.createModel({e}); // e = a * b + c

@pyu10055 , please let us know any gaps of the existing API and share more details of your proposal. Thanks!
