
Comments (9)

padenot commented on June 3, 2024

I seem to recall there are ongoing discussions about reconciling WHATWG Streams and MediaStreams, but I'm not sure where they happen.

from mediacapture-worker.

domenic commented on June 3, 2024

Hey, Streams spec editor here; just wanted to say I'm really interested in figuring out whether there's something streams can help with here. Maybe the FAQ entry I wrote up, "Are readable streams appropriate for my use case?", would be helpful reading?

I seem to recall there are ongoing discussions about reconciling WHATWG Streams and MediaStreams, but I'm not sure where they happen.

I have briefly started looking into this, largely in service of my toy proposal MediaStreamRecorder, which is kind of a proof of concept of how and why to integrate them, at least for one major use case. I don't yet understand media streams well enough to figure out what a full integration looks like. For example, I am confused as to why and how a single type (MediaStream) is used for such disparate cases as media from a camera, media fetched from a URL, and media sent over RTC. (I am sure it makes sense; I just haven't been able to sit down, read all the specs, and see how it works.) So I'd need some help from willing participants on the "MediaStream side" to figure out how we can best mesh these together.


Regarding this issue in particular, and following the link to #30 (comment), it seems like one way streams could potentially integrate is to have VideoProcessors be transform streams. That is, instead of

const processor = new VideoProcessor();
processor.onvideoprocess = event => {
  process(event.inputImageBitmap).then(output => {
    event.outputImageBitmap = output;
  });
};

mediaStreamTrack.addProcessor(processor);

you could do something like

const processor = new TransformStream({
  async transform(imageBitmap, controller) {
    controller.enqueue(await process(imageBitmap));
  }
});

mediaStreamTrack.addProcessor(processor);

or perhaps some more functional API like

mediaStreamTrack.readable.pipeThrough(processor).pipeTo(someAppropriateDestination);
// mediaStreamTrack.readable exposes a stream of ImageBitmaps, and is
// generically usable by anyone, not just processors.

although I don't know if this kind of source -> processor -> dest setup is appropriate, as the existing drafted API seems more of a mutate-in-place thing, maybe for good reason.
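For what it's worth, that source -> processor -> dest shape can already be mocked end to end with WHATWG streams in a runtime that ships them (modern browsers, Node 18+). In this sketch `runPipeline` and `processFrame` are illustrative names, and a plain array stands in for the track's frames, since `mediaStreamTrack.readable` itself is hypothetical:

```javascript
// Mock of the source -> processor -> dest pipeline using WHATWG streams.
// An array of fake "frames" stands in for a real media source.
async function runPipeline(frames, processFrame) {
  // Source: enqueues every frame, then closes.
  const source = new ReadableStream({
    start(controller) {
      for (const frame of frames) controller.enqueue(frame);
      controller.close();
    }
  });

  // Processor: applies the (possibly async) per-frame transform.
  const processor = new TransformStream({
    async transform(frame, controller) {
      controller.enqueue(await processFrame(frame));
    }
  });

  // Destination: collects processed frames.
  const results = [];
  const destination = new WritableStream({
    write(frame) { results.push(frame); }
  });

  await source.pipeThrough(processor).pipeTo(destination);
  return results;
}

// runPipeline(['f1', 'f2'], f => f.toUpperCase()) resolves to ['F1', 'F2']
```

Nothing here is mediacapture-specific; it just demonstrates that the plumbing and backpressure come for free once a track exposes a readable side.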

Extending this API to allow processors to exist in either a worker or the main thread is pretty easy and I could outline that too.


Now, the bad news is that transform streams are not fully specced yet :(. And so I don't want to block you on them in any way. So maybe this is all just dreaming, or "v2" work. Or maybe I'll get my act together and get transform streams working soon enough so that by the time every browser is ready to implement offline video processing, transform streams are standing ready. We'll see.

Hope this helps!


padenot commented on June 3, 2024

MediaStreams are objects that contain MediaStreamTracks. Those MediaStreamTracks contain a stream of media (either audio frames or video frames). The important characteristic of MediaStreams and MediaStreamTracks is that they are clocked to a real-time clock, in the sense that we can associate a (real-)time with each frame. The fact that the clock has to be real-time (per spec) makes them unsuitable for our offline processing needs here.

We need a construct that has the following characteristics:

  • Is not clocked to a real-time clock
  • Lets authors access the underlying data in meaningful units (e.g., a full video frame, a video frame tile, etc.)
  • Can be connected to multiple outputs (tee-ing is fine)
  • Can have multiple inputs

It looks like Streams could work here, but I need to read more on them, and, as you mention, the interesting part for us is not done yet.
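As a sketch of how the first three requirements could map onto streams (all names here are illustrative, not from any spec): a pull-based ReadableStream only produces data when a consumer asks for it, so there is no real-time clock; chunks can be whatever unit the author chooses; and tee() gives two outputs. Multiple inputs have no built-in streams primitive and would need a custom combining step.

```javascript
// A pull-based source of "frames": data is produced on demand, not on a clock.
function makeFrameSource(frames) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      // pull() runs only when a consumer requests data.
      if (i < frames.length) controller.enqueue(frames[i++]);
      else controller.close();
    }
  });
}

// Read a stream to completion, collecting its chunks.
async function drain(stream) {
  const out = [];
  const reader = stream.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return out;
    out.push(value);
  }
}

// Multiple outputs via tee(): both branches see the same frames.
// const [a, b] = makeFrameSource(['f0', 'f1']).tee();
```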

It has been suggested that we could have a model similar to the Web Audio API, where you connect various "generators" and "processors" together, and then work around the MediaStream/Stream issue the same way the Web Audio API did (via AudioContext and OfflineAudioContext). Although this model can work for audio, I don't think exclusively relying on connecting built-in processing units is a suitable model for video, in the sense that copying a video frame is very expensive (unlike copying an audio buffer), possibly involving round trips back and forth from a GPU, etc.

Better would be to use the new Worklets, and let authors fold multiple processing passes into one loop over the pixels, instead of iterating multiple times over the image, perhaps sending a texture over to the GPU to be processed by a shader (being well aware of the latency implications) before the eventual compositing or read-back.
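A toy illustration of the folding idea, operating on a bare RGBA pixel buffer rather than real frame data (the 0.299/0.587/0.114 weights are the standard Rec. 601 luma coefficients): grayscale conversion and a brightness adjustment are combined into a single loop over the buffer, where a naive node-graph model would iterate over the whole image once per processing step.

```javascript
// Two logical passes (grayscale, then brightness) folded into one loop
// over an RGBA pixel buffer, touching each pixel exactly once.
function processPixels(rgba, brightness) {
  for (let i = 0; i < rgba.length; i += 4) {
    // Rec. 601 luma, then brightness, computed together per pixel.
    const gray = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    const value = Math.min(255, gray + brightness);
    rgba[i] = rgba[i + 1] = rgba[i + 2] = value; // alpha (i + 3) untouched
  }
  return rgba;
}
```

In a browser the buffer would typically come from `ctx.getImageData(...).data`; here a hand-built Uint8ClampedArray works the same way.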


kakukogou commented on June 3, 2024

Agree with @padenot that MediaStream is not suitable for offline processing here. Instead, we turned to focus on a more basic issue: there is no "offline (non-realtime) video source" on the web platform today, and it came out that we want an object which can decode video and has an interface for "seek to a certain position and then grab frames sequentially". The current proposal is to extend HTMLMediaElement in this way:

partial interface HTMLMediaElement {
    Promise<void> seekToNextFrame();
};

Developers could seek (or actually play) a video element in "frame" units (instead of "time" units), so that the video source is no longer tied to a realtime clock and the underlying data can be accessed frame by frame.

Please refer to Mozilla Bugzilla bug #1235301 for discussion and an experimental implementation.

For accessing the video frame data, developers could draw the video element onto a canvas element and get an ImageData out, upload the video element as a WebGL texture, or create an ImageBitmap from the video element via the createImageBitmap() API.
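A sketch of what that grab-frames-sequentially loop could look like. The seeking and grabbing steps are abstracted into callbacks so the loop itself is runtime-independent; `collectFrames` is an illustrative name, not part of any proposal, and in a browser the callbacks would wrap the video element as shown in the trailing comment:

```javascript
// Repeatedly advance a frame-based source and collect each frame.
// `seekNext` resolves when a new frame is available and rejects at end of
// stream; `grabFrame` captures the current frame.
async function collectFrames(seekNext, grabFrame, maxFrames = Infinity) {
  const frames = [];
  while (frames.length < maxFrames) {
    try {
      await seekNext();
    } catch {
      break; // e.g. seekToNextFrame() rejecting once playback reaches the end
    }
    frames.push(await grabFrame());
  }
  return frames;
}

// Hypothetical browser usage with the proposed API:
//   const bitmaps = await collectFrames(
//     () => video.seekToNextFrame(),
//     () => createImageBitmap(video));
```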

This proposal aims to provide an offline (non-realtime) video source, so it does not solve the inverse problem of how to save (or encode) video frames into a video file on a non-realtime clock.


kakukogou commented on June 3, 2024

@mhofman, this proposal, HTMLMediaElement.seekToNextFrame(), is somewhat similar to your idea at TPAC of separating the realtime and offline processing APIs. Any thoughts from the app developers' view?

@domenic, I feel that this proposal, HTMLMediaElement.seekToNextFrame(), is related to ReadableStream. Is there any possibility of reshaping it into a Stream-like API?


domenic commented on June 3, 2024

@kakukogou I'm sorry, I got a little lost: what do you mean by "this proposal"? Do you mean seekToNextFrame()?


kakukogou commented on June 3, 2024

@domenic, yes, and sorry for my bad wording.


domenic commented on June 3, 2024

No problem! It seems to me that the seekToNextFrame() idea is not too connected to streams; that's more about playback and what the user sees on their screen, if I understand correctly.

But if that API idea is just a workaround for allowing developers to access frame data, and there's no desire for showing users precisely-seeked-to individual frames, then I agree it would be nicer to represent this as a ReadableStream.

I'd imagine something like new VideoFrameStream(arg), where arg can be an HTMLMediaElement or maybe a MediaStream. It then gives you a readable stream of... ImageBitmaps, or ImageData, or Blobs, or ArrayBuffers; I am not sure which would be best. If that is what you were thinking, I'd be happy to help work on that. It seems pretty self-contained.
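To make that concrete, here is one way such a VideoFrameStream could be approximated in userland today. It is sketched as a factory around a hypothetical `nextFrame` callback (which would wrap whatever decode/seek primitive the element exposes) rather than as a real constructor, since no such API exists:

```javascript
// Userland approximation of the hypothetical VideoFrameStream: wrap a
// frame-producing callback in a pull-based ReadableStream. `nextFrame`
// resolves to the next frame, or to null at end of stream.
function videoFrameStream(nextFrame) {
  return new ReadableStream({
    async pull(controller) {
      const frame = await nextFrame();
      if (frame === null) controller.close();
      else controller.enqueue(frame);
    }
  });
}

// Hypothetical browser usage (grabBitmapFrom is a made-up helper):
//   const reader = videoFrameStream(() => grabBitmapFrom(videoElement)).getReader();
//   const { value: firstBitmap } = await reader.read();
```

Because the stream is pull-based, frames are only decoded when the consumer asks for them, which matches the offline (non-realtime) requirement discussed above.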


padenot commented on June 3, 2024

I agree with @domenic: we should try to unify everything with Streams and MediaStream instead of designing ad-hoc solutions for everything. It will be a clear win in the long run.

