w3c / machine-learning-charter

Discussions on a possible charter for a future W3C Working Group developing Machine Learning solutions

Home Page: https://w3c.github.io/machine-learning-charter/charter.html


machine-learning-charter's People

Contributors

anssiko, dontcallmedom


machine-learning-charter's Issues

Descriptive input formats for building graphs?

I wonder if the WG has discussed and rejected defining descriptive input (e.g. JSON, JSON-LD) for building graphs.

Currently, compute graphs are built programmatically, with dictionaries used for options.
Taking that a bit further, an alternative could be a generic builder method that accepts ops as string enum parameters, together with input and output descriptions, options, etc., all specified in a dictionary (and serializable as JSON/JSON-LD).

That would allow e.g. prior transformations based on client needs. The code would look similar, just with more dictionaries and fewer method calls.
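As a rough sketch of what such a descriptive input could look like, the snippet below interprets a JSON graph description against a builder whose op methods are dispatched by op name. Everything here — the description shape, `buildFromDescription`, and the mock builder — is hypothetical, not part of any current spec:

```javascript
// Hypothetical: a compute graph described as plain data rather than
// built by direct method calls.
const graphDescription = {
  inputs: { x: { dataType: "float32", shape: [1, 4] } },
  ops: [
    { op: "relu", input: "x", output: "y" },
    { op: "softmax", input: "y", output: "out" }
  ],
  outputs: ["out"]
};

// A mock builder that records the operations it is asked to build.
class MockBuilder {
  constructor() { this.calls = []; }
  input(name, desc) { this.calls.push(["input", name]); return { name }; }
  relu(operand) { this.calls.push(["relu", operand.name]); return { name: operand.name + "_relu" }; }
  softmax(operand) { this.calls.push(["softmax", operand.name]); return { name: operand.name + "_softmax" }; }
}

// Generic interpreter: op names are plain strings in the description,
// dispatched to builder methods of the same name.
function buildFromDescription(builder, desc) {
  const operands = {};
  for (const [name, d] of Object.entries(desc.inputs)) {
    operands[name] = builder.input(name, d);
  }
  for (const node of desc.ops) {
    operands[node.output] = builder[node.op](operands[node.input]);
  }
  return desc.outputs.map((n) => operands[n]);
}

const builder = new MockBuilder();
const [out] = buildFromDescription(builder, graphDescription);
```

Because the description is plain data, it could be serialized as JSON or JSON-LD and transformed before it ever reaches the builder.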

WebGPU interoperability

(Related to the WebML WG Charter in development at #19)

We discussed WebGPU interoperability expectations on our 6 October 2022 call and concluded that working with WebGPU contributors is important for the success of the WebNN spec. I'd like us to revise the charter language around WebNN-WebGPU interoperability expectations accordingly.

The initial charter mentions WebGPU in the context of Out of Scope and Coordination as follows:

To avoid overlap with existing work, generic primitives used by traditional machine learning algorithms such as base linear algebra operations are out of scope. The WebGL and WebGPU shaders and WebAssembly SIMD are expected to address these requirements, see the Coordination section for details.

The GPU for the Web Working Group defines a WebGPU Shading Language that may be used to implement traditional machine learning algorithms efficiently. The Web Machine Learning Working Group should coordinate with this group to avoid overlap.

This issue is open to solicit proposals on how to update the above. Tagging @RafaelCintron, who provided feedback on the call.

Is a graph of operations the right level of abstraction for a web standard?

Given the rapid evolution of operations, and how fast ML as a field is evolving, is it premature to create a working group to propose a standard that’s expected to endure forever?

The number of operations defined in TensorFlow, ONNX, PyTorch and other machine learning frameworks has been growing at double-digit rates every year. Meanwhile, incompatible changes have led to new versions of some operations (e.g., suffixed with v2) while others have fallen into disuse.

At Google, the Android and TensorFlow teams have experience shipping a neural network API to a billion devices. They are moving away from an operation-based API along the lines of what the Web NN API is proposing. For a little more context, see this FAQ in the Model Loader API Explainer.

Should this work continue to incubate in the community group until the field of ML stabilizes a bit more, or at least there’s wider agreement about a stable path forward?

Even without new APIs, ML on the web has been rapidly improving, as JavaScript libraries have added support for WebGL and WASM. Developers are only beginning to take advantage of the capabilities that exist today. We’d love for them to get access to the latest hardware acceleration too, in a way that makes sense for the long-term.

On-device training

Currently, on-device training is out of scope:

Training capabilities are out of scope due to limited availability of respective platform APIs.

We should assess whether to adjust this for 2023-2025.

We have explored on-device training in the context of our W3C workshop w3c/machine-learning-workshop#82. Google's Teachable Machine project is one success case where on-device training has been used in a web context to solve real-world problems.

WebRTC coordination

(Related to the WebML WG Charter in development at #19)

We added an Integration with real-time video processing use case based on learnings from our experimentation in webmachinelearning/webnn#226

For the next charter, we could be more explicit and confident in the Coordination section, which currently reads:

WebRTC Working Group
The WebRTC Working Group defines the MediaStream interface and related media processing APIs that likely make use of Machine Learning capabilities afforded by the WebNN API.

We could tweak this a little:

WebRTC Working Group
The WebRTC Working Group defines the MediaStream interface and related media processing APIs that enable integration with Machine Learning capabilities afforded by the WebNN API.

Or keep this intact. Thoughts @dontcallmedom?

Level of abstraction for neural net operations

(Related to the WebML WG Charter in development at #19)

The WebNN explainer has a nice section that explains the rationale for the chosen level of abstraction for the neural network operations in the WebNN API.

It was proposed we could integrate some of this explainer text into the next charter to provide more context on the level of abstraction. This could fit into the Scope section.

Tagging @wchao1115 @huningxin and @jbingham to confirm this would be appropriate.

Making sure the charter is flexible enough

Could we make the charter broad enough or extensible enough so that it can accommodate other APIs in the future? In the community group, we've discussed a graph API, a model loader API, and operation-specific APIs. It would be nice if we didn't have to update the charter if we want to add other APIs in the future.

@anssiko has some ideas

Detailed explainer for Web NN API

We at Google are concerned there is insufficient detail in this incubation to proceed to a standard. Could the CG have a conversation about the potential for creating a Working Group prior to drafting this charter? For example, it would be great to see a detailed Explainer fleshed out first.

Let’s clarify what the rationale is for this approach, as opposed to others, and why some feel it’s ready to move to the standard track.

Evidence of customer demand that’s not met by WebGL and WASM

In order to support adding new web ML APIs to Chromium, it would be helpful to have examples of customers who would in fact use this functionality if it shipped.

An ideal customer would meet these criteria:

  • Has already deployed ML models into web apps using existing JavaScript libraries
  • Has tried WebGL or WASM in earnest (or WebGL/WASM-based libraries like TensorFlow.js) and has performance data indicating that it will not work for their needs
  • Is willing to meet and talk about their requirements.

We already believe that web ML APIs will be beneficial for performance, and we are convinced of the theoretical benefit. In this thread, let’s capture customers willing to go on the record about their experiences with ML on the web today.

Coordination between the CG and WG

Nota bene: The aspects discussed in this issue are operational, and as such we do not need to codify these aspects in the proposed charter document itself. That said, I thought it'd be good to share this proposal here to capture any comments and in general to give a heads-up to the community on what we're planning.

Given PR #9 suggests that close coordination between the Community Group and Working Group is needed, I propose the WG adopt a work mode similar to that of the WebGPU CG and WG to minimize friction for participants crossing the group boundary.

In practice this proposal, if adopted, would mean the CG/WG boundary can remain pretty transparent for regular participants, and the existing, already familiar day-to-day tooling used in the CG would be shared with the WG. For example, both the CG and WG would use the same:

  • Mailing list(s)
  • GitHub org and repos
  • Teleconferences and meetings

We'd automate tasks such as IPR checks on pull requests targeted at Working Group deliverables to make sure the participants can focus on the technical work without the need to worry about (less exciting) process-related aspects.

@tidoust, you set up the WebGPU CG/WG in this fashion a couple of months ago. Feel free to share your experiences on how that has worked and what we should learn from you.

FYI @dontcallmedom

Speech synthesis and machine learning

[@dontcallmedom suggested that I should raise this here. It was originally a comment at https://github.com/w3c/strategy/issues/367#issuecomment-1431300909]

The description of areas in scope in the charter appears a little lopsided to me, since Speech Recognition is called out but not Speech Synthesis. Specifically, just as you need to infer text from wave input in speech recognition, you need to infer meaning and pronunciation guides ('text understanding') in preparation for speech synthesis, and I wonder why machine learning is not being applied to that and included in the scope of the charter (i.e. for the linguistic analysis of the text, rather than what drives the audio hardware; this could also provide input to other TTS specs at the W3C, such as SSML, CSS, etc.). If that doesn't fall within the scope of the work, I think the charter should briefly indicate why it is not addressing the use of machine learning for that function, when it does address inference for speech recognition.

Graph vs model loader

If the Community Group could agree on a model format, would we prefer to ship a model loader API instead?
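To illustrate the trade-off this question touches on, here is a minimal sketch of the two API shapes. Both "APIs" below are mocks for illustration only; names like `load` and `"agreed-format"` are purely hypothetical:

```javascript
// Graph API style: the web app constructs the compute graph op by op.
const graphApi = {
  input: (name) => ({ kind: "input", name }),
  relu: (x) => ({ kind: "relu", arg: x })
};
const graph = graphApi.relu(graphApi.input("x"));

// Model loader style: the app hands over an opaque model in one agreed
// serialization format, and the browser does the rest. The format is the
// hard part: the CG would have to standardize it first.
const loaderApi = {
  load(bytes, format) {
    if (format !== "agreed-format") throw new Error("unknown model format");
    return { kind: "model", size: bytes.length };
  }
};
const model = loaderApi.load(new Uint8Array(16), "agreed-format");
```

The graph path needs no serialization format at all, while the loader path is only viable once the group agrees on one.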

Dedicated ML hardware accelerators: NPU, VPU, xPU

(Related to the WebML WG Charter in development at #19)

The initial version of WebNN specifies two device types, "cpu" and "gpu".

However, the API is extensible with new device types, and in our discussions support for NPU, VPU, or XPU has come up as a new "v2" feature.

The initial charter refers to "dedicated ML hardware accelerators" in its Motivation and Background, but if this is important we could be more explicit regarding NPU/VPU/XPU device type support.
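As a hedged sketch of how extensible device types might be used in practice, the snippet below tries preferred device types in order and falls back when one is unsupported. The `ml` object is a stand-in for `navigator.ml`, and `"npu"` is a hypothetical future device type, not part of the current spec:

```javascript
// Mock of a context factory supporting only the initial "cpu" and "gpu"
// device types, as in the first version of WebNN.
const ml = {
  supported: new Set(["cpu", "gpu"]),
  createContext({ deviceType = "cpu" } = {}) {
    if (!this.supported.has(deviceType)) {
      throw new TypeError(`unsupported deviceType: ${deviceType}`);
    }
    return { deviceType };
  }
};

// Try preferred device types in order, falling back when unsupported.
function createContextWithFallback(preferences) {
  for (const deviceType of preferences) {
    try {
      return ml.createContext({ deviceType });
    } catch {
      // Not supported on this platform; try the next preference.
    }
  }
  return ml.createContext(); // platform default
}

const ctx = createContextWithFallback(["npu", "gpu", "cpu"]);
```

A design like this would let the charter keep the device-type list open-ended while apps still express a preference for dedicated accelerators.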

@wchao1115 @huningxin for feedback.

AC review comments

I have a few comments and questions following my AC review of the Web ML charter:

  1. In section 2, the "Allows to ..." phrasing seems strange. Suggest "Allows construction of ...", "Allows compilation of ...", "Allows input to be set up ..."
  2. In section 2, should "are not be tied" be "are not tied" or "are not to be tied"? And similar for "are to be implementable"
  3. In section 2, "It may also work on", it's not clear what "It" refers to. I suggest "The Working Group may also work on"
  4. In section 3.2, the Model Loader API state is "Adopted from the Web ML CG". Has this document now been adopted into the WG? If it has, what further progress is expected from the CG (the charter says "Depending on the CG progress")? If it hasn't, should the status be "Draft CG Report" rather than "Adopted from the Web ML CG"?
  5. In section 5.2, "how WebNN access data" should be "how WebNN accesses data"
  6. In section 6, "(Working|Interest)" should be "Working"
  7. In section 12, the change log is empty. Should this section describe changes from the previous charter?

Possible data process spec

In our experience porting the Machine Learning/Deep Learning ecosystem to JavaScript, we found that, beyond the well-discussed topics of machine learning use cases, runtime, efficiency, and privacy, the data and data-processing pipeline/spec should also be discussed.

The need arises mainly from the fact that deep learning models cannot work independently: data processing is needed for both the inputs and outputs of a model.

Therefore, we wonder whether it is possible to set up such a data-processing spec, offering a universal and handy API set for data-related operations.
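As a minimal illustration of the kind of pre- and post-processing the issue refers to, the snippet below scales raw pixel values for a model input and maps raw output scores back to a label. All function names here are illustrative, not a spec proposal:

```javascript
// Pre-processing: scale raw 0-255 pixel values to the [0, 1] floats a
// typical model input expects.
function normalize(pixels) {
  return pixels.map((p) => p / 255);
}

// Post-processing: map a model's raw output scores to the best label.
function argmaxLabel(scores, labels) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return labels[best];
}

const input = normalize([0, 128, 255]);
const label = argmaxLabel([0.1, 0.7, 0.2], ["cat", "dog", "bird"]);
```

Today every app reimplements these steps in its own way; a shared spec would standardize the glue on both sides of the model.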

OpenXLA coordination

We explored areas of collaboration with the OpenXLA project during our WebML WG Teleconference – 17 November 2022.

We did a small-scale WebNN op compatibility exploration with XLA-HLO in 2020. During our call today there was agreement that we should consider revisiting this study with StableHLO, which targets feature completeness by EOY 2022. A good starting point for this effort would thus be ~Q1-Q2'23.

In terms of charter edits, this could be an addition to the External Coordination section, for example:

OpenXLA Project

OpenXLA Project develops StableHLO, a portable ML compute operation set that makes frameworks easier to deploy across different hardware. WebNN API targets diverse hardware platforms and defines an operation set whose high-level operations can be decomposed to low level primitives that can map to StableHLO operations. Coordination and alignment between these two operation sets is beneficial to the open ML ecosystem.
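To make the decomposition idea concrete, here is a plain-JavaScript sketch of a high-level op (softmax) expressed in terms of the kind of low-level primitives StableHLO defines (exponential, reduce-sum, divide). This is illustration only, not actual StableHLO:

```javascript
// Decompose softmax into low-level primitives, one per step.
function softmaxDecomposed(x) {
  const max = Math.max(...x);                   // subtract max for numerical stability
  const exps = x.map((v) => Math.exp(v - max)); // exponential primitive
  const sum = exps.reduce((a, b) => a + b, 0);  // reduce-sum primitive
  return exps.map((e) => e / sum);              // divide primitive
}

const probs = softmaxDecomposed([1, 2, 3]);
```

If WebNN's high-level ops can be decomposed this way, alignment with StableHLO's primitive set becomes a mapping exercise rather than a redesign.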

Set of ops supported must be more comprehensive

(Related to the WebML WG Charter in development at #19)

We discussed "v2" use cases for WebNN and @wchao1115 shared the following feedback:

… for v2, one of the constant feedback from our external partners when discussing WebNN for their use case has been the ops
… the set of ops supported must be more comprehensive
… this needs to be more explicit goal, this is important
… related to that, use cases around transformers

The current charter Scope enumerates a few common ones: "convolution, pooling, softmax, normalization, fully connected, activation, recurrent neural network (RNN) and long short-term memory (LSTM)". This is not meant to be an all-inclusive list, and it gives the WG the ability to adapt to changes in this landscape.

At a minimum, we should review the bullets in the Scope section and see whether to explicitly mention some of the more recent work, such as transformers. We want to provide enough detail to give good direction without constraining the WG too much. The list of ops mentioned in the charter would remain open-ended.

Is this API likely to be a long-term solution?

Google has already shared with the group that we believe the operations in Web NN (and ONNX and TF Lite) are too high level to be a good long-term solution for the Web, or for Android, or for ML practitioners more generally.

The number of operations in the Android NN API has grown from 30-40 in 2017 to 120 in 2020. The number of operations in TensorFlow has grown to over 1000. ML researchers are publishing new operations all the time, even daily. The potential number of operations is unbounded. Growth year over year has been 20-30%. That would be really hard to maintain for a web standard. Worse yet, operations fall into disuse, or are superseded, or undergo incompatible changes. The web could be stuck supporting them forever.

Also, given that devices don’t get updated often due to the hardware release cycle and device upgrade cycle, a static set of operations is limited in its ability to meet developers’ and users’ needs.

That’s why the TensorFlow and Android NN API teams are actively working on replacements for the current TensorFlow, TF Lite, and NN API operation sets, with the goal of having something extensible, that does not require defining and growing an operation set at such a rapid rate.

The plan on Android is to replace the current NN API with a lower-level instruction set. There are multiple candidates, with no clear winner yet. We want to develop the instruction set in an open, vendor-neutral, standards-based way that would work for Android (an open-source project) as well as the Web -- including Windows, macOS, and iOS.

This is the plan for the TensorFlow ecosystem too.

In other words, at Google we expect that a graph API -- on Android -- will be obsolete around the time the Web NN might ship in the major browsers. So what should we do?

IIUC, one possible argument is that it’s ok if web APIs are replaced. There’s precedent. It’s more important for the web to evolve and provide better solutions for web developers, even if those have a lifespan of just a few years.

And to be fair, the new solutions don’t exist yet, and there’s a risk they might not be available for a long time. Is it better to move ahead with a tried-and-true approach, modeled after the Android NN API? That was announced in 2017 and is still supported. Even after a replacement launches, the NN API in its present form will continue to be supported for some number of years. Why not give the web the same opportunity?

First, is this the argument that others have for moving forward?

Second, what do the web standards experts think? Is it ok to launch an API we expect to replace in a few years?

If we decide it’s worth moving ahead, even with the risk that we’re shipping a stop-gap solution with significant known limitations, we can talk about how to mitigate those risks, probably in a separate issue.
