webmachinelearning / webnn
Web Neural Network API
Home Page: https://www.w3.org/TR/webnn/
License: Other
[This issue was originally posted at https://github.com/w3c/machine-learning-workshop/issues/86 ]
@kpu wrote:
The spec says gemm returns "an Operand" (and the same thing for matmul).
If both arguments are tensor-quant8-asymm, what is the OperandType of the return? I can see use cases for tensor-int32 which is how it will actually be generated by existing hardware, tensor-quant8-asymm for a fully quantized model, or even tensor-float32 for people that have only partly quantized their model.
This matters because the spec doesn't appear to have e.g. a requantization operator to convert int32 to int8, and in any case one would need the ability to set the scaling factor, which is typically determined by running the model in advance to measure an appropriate value.
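To make the requantization question concrete, here is a sketch (not from the spec; the affine quantization scheme and the function name are assumptions) of the arithmetic a requantize step would need, which is why the combined scaling factor must be known in advance:

```javascript
// Hypothetical sketch: requantizing an int32 accumulator back to a
// quant8-asymm output, assuming the common affine scheme
//   real_value = scale * (quantized_value - zero_point).
// combinedScale would typically be inputScale * filterScale / outputScale,
// measured ahead of time by running the model on calibration data.
function requantize(acc32, combinedScale, outputZeroPoint) {
  const scaled = Math.round(acc32 * combinedScale) + outputZeroPoint;
  // Saturate to the uint8 range of tensor-quant8-asymm.
  return Math.min(255, Math.max(0, scaled));
}
```

Without an op like this in the graph, a fully quantized model has no way to feed one int8 matmul's output into the next.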
Developers usually use machine learning authoring tools, such as TensorFlow, PyTorch, and others, to create and train models. It would be helpful if the explainer described the workflow by which developers can deploy and run inference on trained models with WebNN in web browsers. This would cover:
This might not be a complete list. Please feel free to add anything I missed.
Apropos of #3 and webmachinelearning/webmachinelearning-ethics#22, an efficient matmul implementation can be fingerprinted to determine hardware capabilities.
On pre-VNNI Intel CPUs, the only efficient way to implement 8-bit multiplication is via pmaddubsw, which produces a 16-bit result summed horizontally with saturation. I can construct matrices that test for this saturation, which indicates a pre-VNNI Intel CPU, whereas ARM and NVIDIA implement signed * signed to 32-bit.
Saturating addition, which should be used for accuracy lest you generate large sign errors, can be used to infer the order of operations. So vpdpbusds saturation tells me what order the matmul ran in.
The slowdown from using AVX512 instructions is likely detectable with timing.
In floats one can also infer order of operations from rounding. This would reveal the SIMD length and possibly variations in the compiler used to build the user agent. A cache-efficient matmul implementation reveals cache sizes via floating point order of operations.
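To illustrate why order of operations leaks information, here is a minimal self-contained example of floating-point non-associativity (plain JavaScript float64, but the same effect applies to float32 SIMD reductions):

```javascript
// Floating-point addition is not associative, so the summation order of a
// matmul reduction leaks through the low bits of the result.
const xs = [1e16, 1, -1e16, 1];

// Strictly sequential sum: 1e16 + 1 rounds back to 1e16 in float64,
// so the first 1 is lost.
const seq = xs.reduce((a, b) => a + b, 0);

// A reordered (e.g. SIMD-lane-style) sum pairs the large values first,
// so both 1s survive.
const reordered = (xs[0] + xs[2]) + (xs[1] + xs[3]);
// seq === 1, reordered === 2
```

An adversarial input crafted along these lines distinguishes reduction strategies, and hence SIMD widths and blocking schemes, purely from the output bits.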
In OperandDescriptor, the size of each axis is defined in "dimensions" as "long".
The limits of the "long" type do not match the length definition of TypedArray.
For consistency and to avoid bugs caused by conversion, it would be better to accept a Number in "dimensions" and use ToLength() to obtain the size.
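A sketch of the suggested conversion, assuming "dimensions" accepted JavaScript Numbers and ran them through ECMAScript ToLength() semantics (the helper name is illustrative):

```javascript
// Approximation of the ECMAScript ToLength() abstract operation, which is
// how TypedArray lengths are interpreted: truncate toward zero, then clamp
// to [0, 2^53 - 1].
function toLength(value) {
  const n = Math.trunc(Number(value)) || 0; // NaN and -0 become +0
  return Math.min(Math.max(n, 0), Number.MAX_SAFE_INTEGER);
}
```

Using this instead of WebIDL "long" would avoid surprising 32-bit truncation when a dimension product is used to size a TypedArray.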
The WebNN API will need exceptions that report an appropriate error status (beyond the default JavaScript runtime exceptions). Developers would use these exceptions to catch specific error states in the WebNN API.
From the WebNN conv2d spec, the same layout parameter controls both the input and filter layout (below). In TensorFlow (and presumably TFLite), regardless of the input layout, the filter layout remains in the "channel_last" format, i.e. [height, width, input_channels/groups, output_channels].
"nchw":
input tensor: [batches, input_channels, height, width]
filter tensor: [output_channels, input_channels/groups, height, width]
output tensor: [batches, output_channels, height, width]
"nhwc":
input tensor: [batches, height, width, input_channels]
filter tensor: [height, width, input_channels/groups, output_channels]
output tensor: [batches, height, width, output_channels]
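A shape-level sketch of the transpose a framework would need if WebNN's "nchw" filter layout is kept as specified while the source model stores filters in TensorFlow's channel_last layout (the helper name is hypothetical):

```javascript
// TensorFlow filters are HWIO: [height, width, in_channels/groups, out_channels].
// WebNN's "nchw" convention expects OIHW: [out_channels, in_channels/groups,
// height, width]. This shows the shape permutation a framework would perform
// (the actual data would need the corresponding element transpose as well).
function hwioToOihw(shape) {
  const [h, w, iPerGroup, o] = shape;
  return [o, iPerGroup, h, w];
}
```

The cost of this transpose on every filter is one argument for letting the filter layout be specified independently of the input layout.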
One such model is NSNet2, which requires the shape and constant-of-shape operations. Additionally, static shape inference should also be supported, as it is the common case for many models.
Spun off from #17 and as agreed on 22 Aug 2019 CG call, this issue is to discuss and investigate backwards compatibility of the initial set of ops.
As mentioned in WebML CG call on 28 May, this proposal is based on a conversation with @wchao1115 and @RafaelCintron.
The motivations are:
The proposal includes:
There are two examples to showcase the difference between the existing API and the new proposal.
Both examples build and execute a graph with topology as:
Example code using the existing spec:
// ISSUE: there is no way to declare the keys of graph inputs.
const a = nn.input({type: 'tensor-float32', dimensions: [3, 4]});
const b = nn.input({type: 'tensor-float32', dimensions: [4, 3]});
const c = nn.matmul(a, b);
// ISSUE: the key of each graph output is implied by the order of the elements in the sequence.
const model = await nn.createModel([c]);
// Skip the code of creating compilation and execution
// ISSUE: it is unclear which input/output has index 0 or 1.
execution.setInput(0, bufferA);
execution.setInput(1, bufferB);
execution.setOutput(0, bufferC);
await execution.startCompute();
Example code using the new proposal:
// SOLUTION: explicitly declare the key of graph inputs by string.
const a = nn.input('a', {type: 'tensor-float32', dimensions: [3, 4]});
const b = nn.input('b', {type: 'tensor-float32', dimensions: [4, 3]});
const c = nn.matmul(a, b);
// SOLUTION: explicitly declare the key of graph outputs by string.
const model = await nn.createModel([{name: 'c', operand: c}]);
// Skip the code of creating compilation and execution
// SOLUTION: set the input and output buffers by string keys explicitly.
execution.setInput('a', bufferA);
execution.setInput('b', bufferB);
execution.setOutput('c', bufferC);
await execution.startCompute();
This issue will track op compatibility resolution for matMul.
Signature:
matmul(a, b)
Arguments:
a: n-dim tensor
b: n-dim tensor
Docstring:
If both a and b are 2-D, they are multiplied like conventional matrices. If one of a or b is 1-D, it is treated as a matrix-times-vector (dot) product.
If either argument is N-dimensional, N > 2, it is treated as a stack of matrices (a rank-3 view) with dimensions corresponding to the inner two indices. The matrix multiplication will be broadcast accordingly.
Example:
If a has shape [2, 3, 4, 5] and b has shape [5, 4], the resulting tensor will have a shape of [2, 3, 4, 4] as a is treated as a size 2 * 3 = 6 stack of [4, 5] matrices. These get broadcast multiplied over the [5, 4] matrix creating 6 [4, 4] matrices. They keep the original shape of a's outer dimensions, resulting in a shape of [2, 3, 4, 4].
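The shape rule in this example can be sketched as follows (simplified to the case where one argument carries all of the stack dimensions; the function name is illustrative):

```javascript
// Shape inference for the broadcast matmul described above: the trailing
// two dimensions are treated as matrices, and the remaining leading
// dimensions of the higher-rank argument form the stack.
function matmulShape(aShape, bShape) {
  const aMat = aShape.slice(-2), bMat = bShape.slice(-2);
  if (aMat[1] !== bMat[0]) throw new Error('inner dimensions must match');
  const stack = aShape.length >= bShape.length
    ? aShape.slice(0, -2)
    : bShape.slice(0, -2);
  return [...stack, aMat[0], bMat[1]];
}
```

For the example above, matmulShape([2, 3, 4, 5], [5, 4]) yields [2, 3, 4, 4].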
Notes:
To be discussed:
We should add a more advanced example to the spec that makes use of (some of) the new ops we've added to the spec recently. We should keep a simple "Hello World" style example and in addition add another more advanced one within a reasonable LOC limit.
The examples in specs are often a starting point for web developers who try out a new API for the first time, so we need to make sure the examples are maintained alongside the spec definition.
Here is a use case about style transfer; the source is https://intel.github.io/webml-polyfill/examples/style_transfer/?prefer=none&b=WebGL&m=fast_style_transfer_onnx&s=image&d=0&f=WebNN.
Some ops used by style transfer are not supported in the current WebML API spec.
Please add the following ops to the supported list, thanks:
mirrorpad, SquaredDifference, Pow, TransposeConv for TFLite-format models.
mirrorpad, Pow, ConvTranspose for ONNX-format models.
This is a followup to Issue 41.
Now that the group has decided that an inference API (also known as Load/ Run Model API) is within the charter, let's define the steps to move forward with this.
I've talked with some web standards experts, and will update this thread with concrete steps.
As thoroughly discussed in this PR, in relation to the failing state of Compilation.createExecution, both the Compilation.finish and Compilation.setPreference methods are redundant to what createExecution can provide. The removal of setPreference can be remedied by passing a param to createExecution, as @RafaelCintron pointed out in the thread.
Per resolution on the 14 Feb 2019 call, this issue is for discussing requirements for an API for executing models.
@gregwhitworth, would you be in a position to help kick off discussion in this issue? I believe learnings from ONNX.js would be good input.
Keep OperandDescriptor scoped and versioning-friendly by removing quantization-specific params (e.g. scale and zeroPoint), while keeping the OperandType enum values as a straight data-type enum. Quantization-specific params could instead be made arguments of a new Operand-creating overload.
This issue will track op compatibility resolution for conv2d.
Signature:
conv2d(x, filter, padding, strides, dilations)
Arguments:
x: 4-dim tensor with logical shape [N, H, W, in_channels]
filter: 4-dim tensor with logical shape [H, W, in_channels, out_channels]
padding: array of ints. Padding for the beginning and ending along each spatial axis.
strides: array of ints that have length 1, 2 or 4. The stride of the sliding window for each dimension of input.
dilations: array of ints of length 2: [dil_height, dil_width] The dilation factor for each spatial dimension of the input.
Docstring:
Computes a 2-D convolution given 4-D input and filter tensors.
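For reference, here is a sketch of the output spatial size implied by these arguments, using the conventional dilated-convolution formula (an assumption, since the docstring above does not spell it out):

```javascript
// Common output-size formula for a dilated, strided convolution:
//   out = floor((in + padBegin + padEnd - dilation*(kernel-1) - 1) / stride) + 1
function convOutputSize(inSize, kernel, padBegin, padEnd, stride, dilation) {
  const effectiveKernel = dilation * (kernel - 1) + 1;
  return Math.floor((inSize + padBegin + padEnd - effectiveKernel) / stride) + 1;
}
```

For example, a 224-wide input with a 3x3 kernel, padding 1 on each side, stride 2, and no dilation gives a 112-wide output, matching the first stage of typical MobileNet-style models.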
Notes:
To be discussed:
Web specs are expected to be reviewed by W3C's Technical Architecture Group (TAG), and the best practice is to seek such TAG review earlier rather than later in the spec design process.
An explainer is complementary to the formal spec document, and as a bonus large parts of the explainer document can be eventually repurposed as the formal spec's informative content (e.g. introduction, examples). To distinguish the two, the explainer as a whole should be readable and understandable by people who are not domain experts, while the formal spec is primarily aimed at implementers and assumes a certain level of domain expertise.
I've opened this meta issue to solicit feedback and comments on the WebNN API explainer and pushed an explainer template to the repo to be used as a starting point:
https://github.com/webmachinelearning/webnn/blob/master/explainer.md
It is a simple markdown file so making contributions should be straightforward. The explainer welcomes PRs from all group participants.
From #76:
@anssiko Is there anyone who can help us getting the pr-preview bot back in working state? It's more convenient for the reviewers to have the bot update the preview automatically when a new commit arrives.
@wchao1115 Markdown is not supported by pr-preview currently.
That said, I think I fixed pr-preview for bikeshed (.bs) by reinstalling the GitHub pr-preview integration (aka GitHub App), here's an example of it working: #77
FTR: In an attempt to fix it, I first tried updating .pr-preview.json to its latest version in the webnn repo but that caused the whole webmachinelearning GH org to throw Internal Server Error 500 octocat so I quickly reverted to the old config, uninstalled, and reinstalled the integration and things seem to be working now. Fingers crossed.
FYI @tobie just in case someone else sees similar behavior.
@wchao1115 @huningxin If the webmachinelearning repos start to misbehave and throw 500 errors and I'm not around, also @huningxin can uninstall the integration via https://github.com/organizations/webmachinelearning/settings/installations to fix the immediate issue.
Edit: Let's consolidate all webnn repo pr-preview issues here so we keep track, and when the root cause has been identified open an issue upstream.
The current spec defines the OperandType enum:
enum OperandType {
"float32",
"int32",
"uint32",
"tensor-float32",
"tensor-int32",
"tensor-quant8-asymm"
};
The float16 and tensor-float16 types are being added in #35.
However, as mentioned by @wchao1115 in #26 (comment), there are situations where the selected device doesn't have native support for an OperandType. For example, some CPUs may not support tensor-float16, some GPUs may not support tensor-quant8-asymm, and some AI accelerators may not support tensor-float32. To allow the app to gracefully handle these situations, e.g. select a different device or use a different model with a supported operand type, the API should report the unsupported OperandType error.
Opening this issue to explore the definition of the unsupported OperandType error and the API behavior for returning it.
Thoughts?
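One possible shape for the app-side handling, sketched with a stubbed compile step; the error name UnsupportedOperandTypeError and the compileFor() helper are hypothetical stand-ins, not proposed API:

```javascript
// Stand-in for a real compilation step; throws a hypothetical
// UnsupportedOperandTypeError when the device lacks native support.
function compileFor(device, operandType) {
  if (device === 'accelerator' && operandType === 'tensor-float32') {
    const e = new Error('tensor-float32 not supported on this device');
    e.name = 'UnsupportedOperandTypeError';
    throw e;
  }
  return { device, operandType };
}

// App-side logic: try the accelerator, fall back to another device when
// the operand type is reported as unsupported.
function compileWithFallback(operandType) {
  try {
    return compileFor('accelerator', operandType);
  } catch (e) {
    if (e.name !== 'UnsupportedOperandTypeError') throw e;
    return compileFor('cpu', operandType);
  }
}
```

The point of a well-defined error is exactly this kind of graceful device or model fallback, rather than an opaque generic failure.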
In issue #17 the question of how we'll handle versioning was brought up. ONNX currently has a way to handle versioning, but the web rarely versions APIs (they are normally frozen forever). That doesn't mean it isn't possible, but it is something we should investigate and come up with a solution for, since we know there are reasons for these operations to evolve.
We agreed to request a W3C TAG Spec Review for the WebNN API.
Before opening the review request, I'll nudge @cynthia to make sure we meet the review readiness expectations. Notably, we do not yet have a comprehensive explainer for this spec.
In preparation for the TAG review #89, it is recommended we complete the Self-Review Questionnaire: Security and Privacy for the WebNN API.
Let's use this issue to document the responses to the following questions 2.1-2.17. Please find more context regarding these questions from the self-review document itself.
- 2.1 What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
This feature exposes the navigator.ml.getNeuralNetworkContext() factory that encapsulates the rest of the API surface used to create, compile, and run machine learning networks. The API allows web apps to make use of hardware acceleration for neural network inference.
- 2.2 Is this specification exposing the minimum amount of information necessary to power the feature?
The API exposes the minimum amount of information necessary to address the identified use cases for the best performance and reliability of results.
- 2.3 How does this specification deal with personal information or personally-identifiable information or information derived thereof?
No personal information is exposed.
- 2.4 How does this specification deal with sensitive information?
No sensitive information is exposed.
- 2.5 Does this specification introduce new state for an origin that persists across browsing sessions?
No.
- 2.6 What information from the underlying platform, e.g. configuration data, is exposed by this specification to an origin?
No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform's neural network hardware acceleration capabilities relative to another underlying platform.
- 2.7 Does this specification allow an origin access to sensors on a user's device?
No.
- 2.8 What data does this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.
The API adheres to the same-origin policy.
- 2.9 Does this specification enable new script execution/loading mechanisms?
No.
- 2.10 Does this specification allow an origin to access other devices?
This specification enables access to the underlying hardware used to accelerate neural network inference.
- 2.11 Does this specification allow an origin some measure of control over a user agent's native UI?
No.
- 2.12 What temporary identifiers might this specification create or expose to the web?
No temporary identifiers are exposed.
- 2.13 How does this specification distinguish between behavior in first-party and third-party contexts?
At the moment, the feature does not distinguish between first-party and third-party contexts. Since the feature gives developers access to hardware-accelerated features of the device, we could make it a policy-controlled feature similar to WebXR and its xr-spatial-tracking feature identifier.
- 2.14 How does this specification work in the context of a user agent's Private Browsing or "incognito" mode?
The feature works the same regardless of whether in-private browsing or incognito mode is active.
- 2.15 Does this specification have a "Security Considerations" and "Privacy Considerations" section?
Work-in-progress at #122
- 2.16 Does this specification allow downgrading default security characteristics?
No.
- 2.17 What should this questionnaire have asked?
It asked good questions; in particular, 2.15 was helpful for outlining the section in question.
There are a couple of questions on the existing execution API:
The current model execution API requires users to provide output buffers before execution. This is not very convenient, since it is an extra step and the user might not know the shape of the output beforehand. Also, for many models the output shape depends on the input shape, so it is an extra burden for users to find it out.
The current execution is built on compilation of the full graph. While the execution API does not prevent users from executing a sub-graph of the model, it is not clear why the pre-compilation is needed, and whether it should be internal to the execution so that it can take care of sub-graph execution.
Most GEMM implementations have a packed representation. Some support packing in advance, like MKL does for float32 and oneDNN does for int8 if you know where to look while not officially supporting it. This is particularly useful for inference where usually one of the parameters is a constant. So a common use case is downloading parameters in some canonical format (row major or whatever), packing it ideally in-place (and throwing away the canonical form to save RAM), then passing the packed format to GEMM. The packed format is opaque and varies by hardware due to SIMD lengths etc. Throwing away the canonical form is key because otherwise it effectively doubles RAM requirements for the model.
In theory a compiler with whole-program knowledge could pack in advance and throw away the canonical form. But in practice it's theoretically possible some new code will read the canonical values. So packing is usually done explicitly by the user. Or there would need to be a slow path that unpacks the values to emulate a canonical read.
Would you consider adding optional packing operators for a and b?
(Moved from w3c/machine-learning-workshop#85)
The current spec defines the tensor types "float32", "int32" and "8 bits quantized".
We should consider "float16" for a few reasons:
Implementations without native float16 support should be allowed to use float32 internally. This would help maximize performance on float32 hardware while keeping things simple for authors.
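As an analogy for emulating a narrower float type with a wider one: JavaScript numbers are float64, and Math.fround rounds to the nearest float32, which mirrors how a float32 backend could emulate float16 (compute wide, then round at type boundaries where bit-exact narrow results matter):

```javascript
// Full float64 arithmetic, then rounding to the nearest float32 value.
// A float32 backend emulating float16 would do the analogous thing one
// level down: compute in float32, round to float16 at operand boundaries.
const wide = 0.1 + 0.2;              // float64 result
const narrowed = Math.fround(wide);  // same value rounded to float32
```

The two values differ only in the low bits, which is why such emulation is usually acceptable for inference accuracy while still benefiting from the smaller storage format.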
Per https://www.w3.org/TR/security-privacy-questionnaire/#considerations we should add a Security and privacy considerations section to the spec.
As it is a separate topic, I've made this issue as a continuation of this post: #3 (comment)
@anssiko wrote:
Re low-level and high-level, I observe that much of the confusion arises from the inconsistent use of these adjectives in different contexts. Low-level & high-level APIs and low-level & high-level use cases do not map onto each other, and that causes us to talk past each other. We need to add definitions of these terms to the spec, or come up with better names.
Totally agree, it is easy to mix these terms up; this should be more self-describing, in my opinion, more or less like this:
Predefined ML models:
Generic ML:
And each of above will have naturally its own use cases.
Hey everyone,
While reviewing this PR I had an issue, but it's a horizontal issue that should be discussed outside of that PR; maybe a good agenda item for our first telecon, @anssiko. I have had very few folks express a desire for a low-level API, based on the use cases I've been able to procure.
Additionally, I'm not sure that some of the items denoted as low-level in the linked commit are ones I would define as low-level; rather, they are not using the pre-trained models exactly as-is. In the discussions I've had with many folks across Microsoft, the majority of client-side use cases that people are looking at for production scenarios would not require a low-level API. Effectively this comes back to the question of who the customer is: a library author, or a web developer solving a problem. There are pros and cons to each approach, and I don't know that we need to draw a fundamental line in the sand, but I do see a larger desire for higher-level APIs than for a lower-level one. Thoughts?
Noise suppression use case was adopted by WebNN API in #61 (thanks @ibelem!):
https://webmachinelearning.github.io/webnn/#usecase-noise-suppression
This issue is to discuss the next steps:
Guys, I wasn't sure where to post this issue, so I'm posting it here; somebody please help me because I feel a little lost:
The newly launched Web & Networks Interest Group is looking for network-related use case input from WebML CG. In the context of the IG, "networks" means computer networks, not to be confused with neural networks :-) This is how the IG introduces itself:
The mission of the Web & Networks Interest Group is to explore solutions for web applications to leverage network capabilities in order to achieve better performance and resources allocation, both on the device and network.
The IG has identified the following use case for this group that could benefit from exposure of more advanced (computer) network capabilities to web apps:
The Web & Networks IG should coordinate with [WebML CG] to explore on how to load balance computing between cloud-based and client-side machine learning using network hints including bandwidth and latency, radio power consumption, and available computing power and battery on the client.
Let's use this issue to discuss and solicit feedback on use cases and requirements on network characteristics (bandwidth, latency, others) beneficial for implementing logic for switching between in-browser and cloud-based inference at runtime. For example, what network hints are needed to estimate when fetching a pre-trained model from the server yields too big of a startup cost? An example API that might get extended as an outcome is the Network Information API.
By seeding use case input to W&N IG, we can help make sure any future work on network information APIs consider requirements from the ML domain. I'm happy to package this group's input from discussion in this issue and submit it to the W&N IG. I think they're expecting us to provide feedback latest by TPAC timeframe mid-September.
@pyu10055 proposed this idea in #94 (comment)
ideally it would be great to allow a chained API for the Operands.
builder.constant(...).add(builder.constant(...))
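A minimal sketch of how such chaining could be layered on top of a graph builder; the Op class, its methods, and the builder object are illustrative only, not proposed spec names:

```javascript
// Each operand is an object whose methods create and return new operands,
// so graph construction reads left-to-right.
class Op {
  constructor(kind, inputs = [], value) {
    this.kind = kind;
    this.inputs = inputs;
    this.value = value;
  }
  add(other) { return new Op('add', [this, other]); }
  mul(other) { return new Op('mul', [this, other]); }
}

const builder = {
  constant(value) { return new Op('constant', [], value); }
};

// Matches the shape of the proposal:
// builder.constant(...).add(builder.constant(...))
const c = builder.constant(2).add(builder.constant(3));
```

Since every method returns a fresh operand, the same wrapper composes arbitrarily deep expressions without intermediate variables.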
For WebNN interoperability for custom op support, so far we have done the investigation and report-out for WebNN-WASM interop and WebNN-WebGPU interop.
Per the WebNN interop investigation next-steps discussion in the WebML CG call on 3 Oct, the participants were interested in buffer sharing between the GPU and an ML accelerator. Opening this issue to capture the requirement as well as to share the status and data.
The idea is that WebNN allows running expensive ops (e.g. conv2d) on the ML accelerator and sharing the buffer with a WebGPU compute shader that runs custom ops (e.g. add/relu). This is illustrated by the following code sample.
// Create a WebNN model contains conv2d
const model = await createWebNNConv(filterValue, noBias, noRelu);
const compilation = await model.createCompilation();
// Have WebNN compile for the ML accelerator
compilation.setPreference(nn.LOW_POWER);
await compilation.finish();
const execution = await compilation.createExecution();
// input, output, bias are tf.tensor
// Get underlying WebGPUBuffer
const inputBuffer = tf.backend().getBuffer(input.dataId);
const outputBuffer = tf.backend().getBuffer(output.dataId);
// Set WebGPUBuffer as input and output to WebNN execution
execution.setInputGPUBuffer(0, inputBuffer);
execution.setOutputGPUBuffer(0, outputBuffer);
// Execute the WebNN ops on ML accelerator
execution.startCompute();
// Execute the WebGPU ops on GPU
let addOutput = tf.add(output, bias);
let reluOutput = tf.relu(addOutput);
// Read back result from GPU
let result = await reluOutput.data();
Per a recommendation from @walrusmcd (thanks!), the investigation will initially target the AI on the PC Devkit. This device has both a GPU and a VPU (as an example of an ML accelerator), both supported by the D3D12 and DirectML APIs. The Chromium WebNN POC will be enhanced to support the above scenario.
There are some dependencies that need to be worked on:
Currently, we have done the rebase and gotten basic VPU support working in the WebNN/DML backend. We'll update here once we make progress on the WebGPU-WebNN interop on D3D12/DML.
All, please kindly let me know whether I missed anything.
At F2F we agreed to look into pre-canned (built-in platform-provided) models. See https://www.w3.org/2018/10/26-webmachinelearning-minutes.html#x03 for related discussion.
The group seemed to agree that support for built-in models is a v2 feature, and in v1 the API would support custom pre-trained models fetched from the server.
Tagging @gregwhitworth @cynthia @mmccool @huningxin who took part in this discussion.
I suggest we use this issue to solicit further input while making sure the v1 API provides extension points to allow support for pre-canned models in v2.
Can we revisit the idea of an API to load and run a model?
I've written up a draft explainer for what a Web ML Inference API might look like, as well as a bunch of issues and questions around it.
The idea was discussed way back in issue 3, and probably even earlier by many people in the group. For various reasons, the group decided to pursue a graph API instead.
Why revisit now? The TensorFlow team raised some concerns that a graph API may not be the right level of abstraction, due to how fast ML is evolving, and their experience with the rapid growth in operations in TensorFlow models. After digging a bit to understand where this caution came from, I learned about the efforts around MLIR, and how the TensorFlow team sees that fitting into the picture. Also, I had a chance to talk with Greg and others at Microsoft about the original reasons not to go with an inference API, and it seems like things may have changed.
If this is an interesting enough topic to people, we could consider talking about it during the face-to-face in Berlin.
As I find it a separate topic to #3, I would like to continue the conversation about the graph format itself (not operators and not the API) here.
So far we have 3 major discussions:
For the reasons I gave in these (and other) posts:
I opt for option 3 - JSON format.
Simple example with one hidden layer, 8 inputs and 9 outputs:
{
activation: "tanh",
layers: [8, 14, 9]
}
Same NN as above, but with weights data provided:
{
activation: "tanh",
layers: [8, 14, 9],
setupData: 'ISEhIEhlbGxvIHdlYiBNTCAhISE='
}
Some "dummy" advanced example for JSON format:
{
domain: 'custom-domain',
dataType: 'fp32',
layers: [
{
name: "input",
pipes: [
{
name: "A",
size: 30,
to: ['addIfAboveInputA']
},
{
name: "B",
size: 30,
to: ['addIfAboveInputB']
}
]
},
{
type: 'addIfAbove',
a: 'addIfAboveInputA',
b: 'addIfAboveInputB',
value: 0
},
{
type: 'onlyLowerThan',
value: 50
},
{
type: 'neurons',
activation: 'tanh',
size: 30
},
30
]
}
As we iterate on the API design and add more examples, a polyfill with runnable sample code would be helpful.
For example, in the PR #80 review, the running version of the examples based on a simple polyfill got good feedback from reviewers.
So I propose creating two repos to host development of the polyfill and the sample code.
This practice is also adopted by other groups; for example, W3C Immersive Web develops webxr-polyfill and webxr-samples.
There's an interesting proposal over at WebAssembly System Interface repo WebAssembly/WASI#59 we should discuss.
Quoting OP:
It might sound too early to be talking about such fancy high level APIs but I think a WASI runtime with hardware accelerated neural networks and related algorithms would have a huge immediate market since it's such a hot topic. Probably this is another of those times when copying the web or following it closely is a good thing, the Web Neural Network API(examples) is being defined to provide this APIs to JS developers but might work even better with WASI.
Follow-ups over at the WASI repo or here. Consider this an early exploration. Currently this CG is not chartered to consider WASI but that could change in the future given adequate support for such a direction.
Cc @huningxin and @zolkis
The way we define conv2d today is sufficient for typical usage of convolution. However, there are a couple of variants of the convolution operation we should consider supporting.
A groupCount param is needed, where the filter tensor shape becomes [out_channels / group_count, group_count, in_channels / group_count, H, W] when group_count > 1.
Note that depthwise convolution, the one used by MobileNet, is implemented today as a 2-pass convolution: the first depthwise pass is done with groupCount set to the number of input channels, while the second pointwise pass is simply a convolution with a 1x1 filter kernel size.
I think it is useful to make the (graph-building) syntax simpler for the user. In fact, I think we can make it look almost exactly like an API for executing the ops directly (and the "eager" API), at least for the non-control-flow ops, as explained below.
(1) In the proposal, operands are represented by integers, implicitly associated with the order in which "addOperand" is called. It would be cleaner to have "addOperand" return an object/value that represents the operand. So, this should allow us to replace
let tensor1 = operandIndex++;
model.addOperand(float32TensorType);
by
let tensor1 = model.addOperand(float32TensorType);
(2) Instead of separately creating operands to represent the outputs of an op and then separately adding the op (connecting the inputs and outputs up), it is better to have addOperation create the operands representing the outputs and return them. This would allow us to replace
// intermediateOutput0 is the output of the first ADD operation.
let intermediateOutput0 = operandIndex++;
model.addOperand(float32TensorType);
// Add the first ADD operation.
model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone], [intermediateOutput0]);
by
// Add the first ADD operation.
// intermediateOutput0 is the output of the first ADD operation.
let intermediateOutput0 = model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone]);
I omitted the type float32TensorType since it can be inferred from the operands. But where necessary, we can add the type as an extra parameter. This approach can still support operations that return multiple outputs.
(3) Furthermore, instead of representing operations by a constant like nn.ADD, it seems better to encapsulate them as a method ADD, allowing us to simplify
let intermediateOutput0 = model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone]);
to
let intermediateOutput0 = model.ADD([tensor0, tensor1, fusedActivationFuncNone]);
(4) Similarly, for constants, instead of separately creating an operand and then setting its value, we can use a single method to create a constant that does both. This would allow us to simplify:
// Add the operand for the NONE activation function, and set its value to FUSED_NONE.
let fusedActivationFuncNone = operandIndex++;
model.addOperand(scalarInt32Type);
model.setOperandValue(fusedActivationFuncNone, new Int32Array([nn.FUSED_NONE]));
to:
// Add the operand for the NONE activation function, and set its value to FUSED_NONE.
let fusedActivationFuncNone = model.Constant(scalarInt32Type, new Int32Array([nn.FUSED_NONE]));
(As discussed earlier, the type can be omitted if it can be inferred.)
Originally posted by @gramalingam in #15 (comment)
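Putting proposals (1) through (4) together, here is a toy illustration (not a proposed interface; the class and method names are stand-ins) of how the simplified graph-building syntax could look end to end:

```javascript
// Toy model builder combining the four proposals: addOperand returns an
// operand object (1), operations create and return their output operands
// (2), ops are methods rather than enum constants (3), and constants are
// created in a single call (4).
class Model {
  constructor() { this.ops = []; }
  addOperand(type) { return { type }; }
  constant(type, value) { return { type, value }; }
  add(inputs) {
    // Output type inferred from the first input, as suggested above.
    const out = { type: inputs[0].type };
    this.ops.push({ op: 'ADD', inputs, outputs: [out] });
    return out;
  }
}

const model = new Model();
const tensor0 = model.addOperand('tensor-float32');
const tensor1 = model.addOperand('tensor-float32');
const fusedActivationFuncNone = model.constant('int32', 0);
const intermediateOutput0 = model.add([tensor0, tensor1, fusedActivationFuncNone]);
```

Compared with index-based addOperand/addOperation calls, every graph edge here is an ordinary JavaScript value, so off-by-one index bugs simply cannot occur.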
Based on today's call the following resolution was made:
RESOLVED: Evolve WebNN API specification using https://github.com/intel/webml-polyfill/blob/master/docs/api.md as foundation specification
@huningxin - please begin the process of moving this proposal over to spec text so that issues can be filed against it.
Thank you.
[Edited by @anssiko: fix broken link]
Per resolution on the 9 May 2019 CG call, this issue is for surveying graph-building APIs from the native ecosystem, which aims to support the discussion in #16. The current foundation spec is a direct derivative of the Android NNAPI, which is a C-style API. During the CG call, the participants agreed to survey other graph-building APIs in the native ecosystem to learn from their design patterns.
There were three APIs mentioned in the CG call.
@walrusmcd mentioned "would love to contribute our learnings from two Microsoft's graph-building APIs". Feel free to add them into the list. Thanks.
In webmachinelearning/webnn-polyfill#23 (comment), @pyu10055 mentioned
this is a bit confusing: the builder creates models, but the topology is created before the model is created, and how is the topology made immutable for the model?
Today's spec lacks the detailed steps of ModelBuilder.createModel, which causes the confusion.
Per resolution on the 14 Feb 2019 call, this issue is for discussing requirements for an API for executing operations.
IIRC @dsmilkov volunteered to take the first stab at this issue (thanks!). To frame the discussion, perhaps a good start is to evaluate the requirements through the lens of existing ML frameworks as API consumers. I believe also @huningxin's proof-of-concept might provide useful input.
Starting a thread to open the discussion for supporting custom operations.
The ML field is fast moving, and model architectures and operations are evolving quickly. In TensorFlow.js, we have around 200 ops, and we still run into missing ops when someone tries to port a new model to the browser. I believe that the number of built-in ops will be relatively small and will grow very slowly due to standardization.
Thus, it is important to provide a way for library authors to write custom ops that can interop with the built-in neural net ops. That means having high-performance data exchange between custom ops and built-in ops. I stress the importance of high-performance, otherwise lib authors would revert back to implementing all of the ops using lower-level APIs (e.g. WebGPU).
A good way to start is to understand the scope and complexity of the problem. We can look at the technical details of how browser vendors plan to implement built-in ops, which tells us where these ops run and where the data lives.
A syntax-only question/suggestion. There may be reasons for the current design (disclaimer: I am long C), but wouldn't the JavaScript code look better with a shorter signature containing just the operands, with the rest passed as named members of a dictionary?
For instance, now there is
partial interface NeuralNetworkContext {
Operand gemm(Operand a, Operand b, optional Operand c,
optional float alpha = 1.0, optional float beta = 1.0,
optional boolean aTranspose = false, optional boolean bTranspose = false);
};
and it could also be written (following the pattern used in many other web APIs) as
partial interface NeuralNetworkContext {
Operand gemm(Operand a, Operand b, optional Operand c, optional GemmOptions options);
};
dictionary GemmOptions {
float alpha = 1.0;
float beta = 1.0;
boolean aTranspose = false;
boolean bTranspose = false;
};
while the code using defaults would stay the same, and code using non-defaults would look more clear:
let r = nn.gemm(a, b, c);
let q = nn.gemm(a, b, c, { alpha: 1.1, beta: 1.1, aTranspose: true });
as opposed to
let q = nn.gemm(a, b, c, 1.1, 1.1, true);
Maybe not so big an improvement for gemm, but there are bigger opportunities for the likes of the gru ops, improving the clarity of the function semantics and code readability as well.
I am aware this doesn't add much value against the arguments for the original design, if there is more to it than making it look like a C API :).
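The options-bag pattern maps naturally onto JavaScript destructuring with defaults; a standalone sketch of the idea (this is plain JS, not the WebNN implementation, and the returned object just echoes the resolved options for illustration):

```javascript
// Sketch of the options-dictionary pattern with destructuring defaults.
// The option names mirror the proposed GemmOptions dictionary.
function gemm(a, b, c, { alpha = 1.0, beta = 1.0,
                         aTranspose = false, bTranspose = false } = {}) {
  // A real implementation would dispatch to a backend; here we just
  // return the resolved options to show how defaults are filled in.
  return { a, b, c, alpha, beta, aTranspose, bTranspose };
}

console.log(gemm('A', 'B', 'C'));                       // all defaults
console.log(gemm('A', 'B', 'C', { alpha: 1.1 }));       // only alpha overridden
console.log(gemm('A', 'B', 'C', { aTranspose: true })); // beta stays 1.0
```

Callers only name the options they change, which is the readability win the proposal describes.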
As raised in CG meeting, the first foundation spec only lists 32 operation types without information about how to use them.
We need to define the set of operations and their specification/semantics.
The set of operations could be derived from the use cases and corresponding models. For WebNN POC, there is a spreadsheet that lists supported models and their required operations. It can be used as a starting point.
By following the spirit of the WebML CG charter, the specification will be implementable on top of existing major platform APIs, such as Android NNAPI, Windows DirectML, and macOS/iOS MPS/BNNS. So when specifying operations, the platform API mappings/support need to be looked into. For WebNN POC, there is another spreadsheet that captures the native API mapping of the supported operations. It can also be leveraged.
We can file an individual issue for each operation's specification and use this one as the meta-issue.
The current proposal has support for quantized types like tensor-quant8-asymm, and some operators support them. However, many networks run in mixed precision, e.g. a quantized-output matrix multiply followed by a logsoftmax in float32.
Propose adding https://github.com/onnx/onnx/blob/master/docs/Operators.md#DequantizeLinear and https://github.com/onnx/onnx/blob/master/docs/Operators.md#QuantizeLinear to make the quantized operators actually usable for many models.
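For reference, the ONNX semantics of these two operators reduce to a scale/zero-point affine mapping. A plain-JS sketch of the per-element math for the uint8 asymmetric case (note: ONNX specifies round-half-to-even, while `Math.round` rounds half up; the simpler rounding is used here for brevity):

```javascript
// Per-element QuantizeLinear: y = saturate(round(x / scale) + zeroPoint).
// For uint8 asymmetric quantization, saturation clamps to [0, 255].
function quantizeLinear(x, scale, zeroPoint) {
  const q = Math.round(x / scale) + zeroPoint;
  return Math.min(255, Math.max(0, q));
}

// Per-element DequantizeLinear: x = (q - zeroPoint) * scale.
function dequantizeLinear(q, scale, zeroPoint) {
  return (q - zeroPoint) * scale;
}

const scale = 0.5, zeroPoint = 128;
const q = quantizeLinear(3.0, scale, zeroPoint);  // 134
const x = dequantizeLinear(q, scale, zeroPoint);  // 3
console.log(q, x);
```

With these two primitives, an int32 accumulator output can be requantized to int8, or dequantized to float32 for a mixed-precision tail such as logsoftmax.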
Opening this issue to follow up the discussion of #94 (comment).
@pyu10055 mentioned:
With this API, the computation is tied to the compilation. Should the compile method take the inputs/outputs pair, so that execution can only execute the graph compiled with that input/output pair? Otherwise, the compilation might not support the sub-graph execution scenario?
Actually, the existing builder.createModel allows specifying the outputs. With that, developers could create different sub-graphs (models) from a topology within a builder and compile/compute them individually. For example:
const builder = nn.createModelBuilder();
const a = builder.input('a', descA);
const b = builder.constant(descB, bufferB);
const c = builder.constant(descC, bufferC);
const d = builder.mul(a, b);
const e = builder.add(d, c);
const model1 = builder.createModel({d}); // d = a * b
const model2 = builder.createModel({e}); // e = a * b + c
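A plain-JS analogy may make the shared-topology idea concrete: both sub-graphs reference the same intermediate node, and each "model" captures only the outputs it needs (the names here are illustrative closures, not the WebNN compile/compute API):

```javascript
// Illustrative sketch: two 'models' sharing one topology, evaluated
// independently. Real WebNN compiles graphs; this just evaluates closures.
const b = 2, c = 3;        // constants
const mul = (a) => a * b;  // d = a * b
const add = (d) => d + c;  // e = d + c

// model1 outputs d; model2 outputs e, reusing the same mul node.
const model1 = (a) => ({ d: mul(a) });
const model2 = (a) => ({ e: add(mul(a)) });

console.log(model1(4).d); // 8
console.log(model2(4).e); // 11
```

Compiling each model separately would let a backend optimize and execute just the sub-graph reachable from the requested outputs.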
@pyu10055, please let us know about any gaps in the existing API and share more details of your proposal. Thanks!