
Virtual Video Device for Background Replacement with Deep Semantic Segmentation

License: Apache License 2.0

Languages: Makefile 3.22%, C++ 73.07%, Python 5.83%, C 9.79%, Shell 1.77%, CMake 6.33%
Topics: tensorflow, tflite, deeplab, cpp, python, opencv, bodypix, body-pix, video, deep-learning, deeplabv3, mediapipe

backscrub's Introduction

BackScrub

(or The Project Formerly Known As DeepBackSub)

Virtual Video Device for Background Replacement with Deep Semantic Segmentation

Screenshots with my stupid grinning face (Credits for the nice backgrounds to Mary Sabell and PhotoFunia)

Maintainers

License

backscrub is licensed under the Apache License 2.0. See LICENSE file for details.

Building

Install dependencies (sudo apt install libopencv-dev build-essential v4l2loopback-dkms curl).

Clone this repository with git clone --recursive https://github.com/floe/backscrub.git. To speed up the checkout you can additionally pass --depth=1 to git clone. This is fine if you only want to download and build the code; for development, however, it is not recommended.

Use cmake to build the project: create a subfolder (e.g. build), change to that folder and run: cmake .. && make -j $(nproc || echo 4).

Deprecated: Another option to build everything is to run make in the root directory of the repository. While this will download and build all dependencies, it comes with a few drawbacks, such as missing XNNPACK support. It might also break with newer versions of Tensorflow Lite, as upstream support for this build method has been removed. Use at your own risk.

Usage

First, load the v4l2loopback module (extra settings needed to make Chrome work):

sudo modprobe v4l2loopback devices=1 max_buffers=2 exclusive_caps=1 card_label="VirtualCam" video_nr=10

Then, run backscrub (-d -d for full debug, -c for capture device, -v for virtual device, -b for wallpaper):

./backscrub -d -d -c /dev/video0 -v /dev/video10 -b ~/wallpapers/forest.jpg

Some cameras (e.g. the Logitech Brio) need the video source switched to MJPG by passing -f MJPG in order for higher resolutions to become available.

For regular usage, set up a configuration file /etc/modprobe.d/v4l2loopback.conf:

# V4L loopback driver
options v4l2loopback max_buffers=2
options v4l2loopback exclusive_caps=1
options v4l2loopback video_nr=10
options v4l2loopback card_label="VirtualCam"

To auto-load the driver on startup, create /etc/modules-load.d/v4l2loopback.conf with the following content:

v4l2loopback

Requirements

Tested with the following dependencies:

  • Ubuntu 20.04, x86-64
    • Linux kernel 5.6 (stock package)
    • OpenCV 4.2.0 (stock package)
    • V4L2-Loopback 0.12.5 (stock package)
    • Tensorflow Lite 2.5.0 (from repo)
  • Ubuntu 18.04.5, x86-64
    • Linux kernel 4.15 (stock package)
    • OpenCV 3.2.0 (stock package)
    • V4L2-Loopback 0.10.0 (stock package)
    • Tensorflow Lite 2.1.0 (from repo)

Tested with the following software:

  • Firefox
    • 90.0.2 (works)
    • 84.0 (works)
    • 76.0.1 (works)
    • 74.0.1 (works)
  • Skype
    • 8.67.0.96 (works)
    • 8.60.0.76 (works)
    • 8.58.0.93 (works)
  • guvcview
    • 2.0.6 (works with parameter -c read)
    • 2.0.5 (works with parameter -c read)
  • Microsoft Teams
    • 1.3.00.30857 (works)
    • 1.3.00.5153 (works)
    • 1.4.00.26453 (works)
  • Chrome
    • 87.0.4280.88 (works)
    • 81.0.4044.138 (works)
  • Zoom - yes, I'm a hypocrite, I tested it with Zoom after all :-)
    • 5.4.54779.1115 (works)
    • 5.0.403652.0509 (works)

Background

In these modern times where everyone is sitting at home and skype-ing/zoom-ing/webrtc-ing all the time, I was a bit annoyed about always showing my messy home office to the world. Skype has a "blur background" feature, but that starts to get boring after a while (and it's less private than I would personally like). Zoom has some background substitution thingy built-in, but I'm not touching that software with a bargepole (and that feature is not available on Linux anyway). So I decided to look into how to roll my own implementation without being dependent on any particular video conferencing software to support this.

This whole shebang involves three main steps with varying difficulty:

  • find person in video (hard)
  • replace background (easy)
  • pipe data to virtual video device (medium)

Finding person in video

Attempt 0: Depth camera (Intel Realsense)

I've been working a lot with depth cameras previously, also for background segmentation (see SurfaceStreams), so I just grabbed a leftover RealSense camera from the lab and gave it a shot. However, the depth data in a cluttered office environment is quite noisy, and no matter how I tweaked the camera settings, it could not produce any depth data for my hair...? I looked like a medieval monk who had the top of his head chopped off, so ... next.

Attempt 1: OpenCV BackgroundSubtractor

See https://docs.opencv.org/3.4/d1/dc5/tutorial_background_subtraction.html for a tutorial. Should work OK for mostly static backgrounds and small moving objects, but does not work for a mostly static person in front of a static background. Next.
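
For reference, a minimal sketch of this approach (assuming a stock OpenCV install; this is not backscrub code). It also shows the failure mode: a person who holds still is gradually learned into the background model and fades out of the foreground mask.

#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);
    // history=500 frames, varThreshold=16, shadow detection on
    auto sub = cv::createBackgroundSubtractorMOG2(500, 16.0, true);
    cv::Mat frame, fgmask;
    while (cap.read(frame)) {
        sub->apply(frame, fgmask);        // updates the model *and* yields the mask
        cv::imshow("foreground mask", fgmask);
        if (cv::waitKey(1) == 27) break;  // ESC quits
    }
    return 0;
}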

Attempt 2: OpenCV Face Detector

See https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html for a tutorial. Works okay-ish, but obviously only detects the face, and not the rest of the person. Also, it only roughly matches an ellipse, which looks rather weird in the end. Next.
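
Likewise, a rough sketch of the cascade approach (again not backscrub code; the cascade file path depends on your OpenCV install). Only face rectangles come back, which is why compositing an ellipse around them looked so odd.

#include <opencv2/opencv.hpp>

int main() {
    cv::CascadeClassifier face;
    face.load("haarcascade_frontalface_default.xml");  // path depends on your install
    cv::VideoCapture cap(0);
    cv::Mat frame, gray;
    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        std::vector<cv::Rect> faces;
        face.detectMultiScale(gray, faces);  // rectangles around faces only
        for (const auto& r : faces)
            cv::rectangle(frame, r, cv::Scalar(0, 255, 0), 2);
        cv::imshow("faces", frame);
        if (cv::waitKey(1) == 27) break;
    }
    return 0;
}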

Attempt 3: Deep learning!

I've heard good things about this deep learning stuff, so let's try that. I first had to find my way through a pile of frameworks (Keras, Tensorflow, PyTorch, etc.), but after I found a ready-made model for semantic segmentation based on Tensorflow Lite (DeepLab v3+), I settled on that.

I had a look at the corresponding Python example, C++ example, and Android example, and based on those, I first cobbled together a Python demo. That was running at about 2.5 FPS, which is really excruciatingly slow, so I built a C++ version which manages 10 FPS without too much hand optimization. Good enough.
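
For the curious, a bare-bones sketch of what TFLite inference looks like from C++ (illustrative only; the real logic lives in this repo's sources, and the model path is just an example):

#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
    auto model = tflite::FlatBufferModel::BuildFromFile("models/deeplabv3_257_mv_gpu.tflite");
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    interpreter->SetNumThreads(2);
    interpreter->AllocateTensors();
    float* input = interpreter->typed_input_tensor<float>(0);
    // ... fill `input` with the preprocessed RGB float frame ...
    interpreter->Invoke();
    float* output = interpreter->typed_output_tensor<float>(0);  // per-pixel class scores
    (void)input; (void)output;
    return 0;
}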

I've also tested a TFLite-converted version of the Body-Pix model, but the results weren't much different from DeepLab for this use case.

More recently, Google has released a model specifically trained for person segmentation that's used in Google Meet. This performs way better than DeepLab, both in terms of speed and accuracy, so it is now the default. It needs one custom op from the MediaPipe framework, but that was quite easy to integrate. Thanks to @jiangjianping for pointing this out in the corresponding issue.
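
Registering a custom op with the TFLite resolver is essentially a one-liner; here is a sketch, with the op name and namespaces taken from MediaPipe's transpose_conv_bias sources bundled with this repo (treat the exact identifiers and include path as assumptions that may differ between versions):

#include "tensorflow/lite/kernels/register.h"
#include "transpose_conv_bias.h"  // bundled MediaPipe op; include path illustrative

tflite::ops::builtin::BuiltinOpResolver make_resolver() {
    tflite::ops::builtin::BuiltinOpResolver resolver;
    // register the transposed-convolution-with-bias op the Meet model needs
    resolver.AddCustom("Convolution2DTransposeBias",
                       mediapipe::tflite_operations::RegisterConvolution2DTransposeBias());
    return resolver;
}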

Replace Background

This is basically one line of code with OpenCV: bg.copyTo(raw,mask); Told you that's the easy part.
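
Spelled out as a sketch (with the mask semantics assumed here: non-zero marks background pixels, and all three images share the same size):

#include <opencv2/opencv.hpp>

// Wherever `mask` is non-zero, pixels from the prepared background `bg`
// overwrite the camera frame `raw` in place.
void replace_background(cv::Mat& raw, const cv::Mat& bg, const cv::Mat& mask) {
    bg.copyTo(raw, mask);
}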

Virtual Video Device

I'm using v4l2loopback to pipe the data from my userspace tool into any software that can open a V4L2 device. This isn't too hard because of the nice examples, but there are some catches, most notably color space. It took quite some trial and error to find a common pixel format that's accepted by Firefox, Skype, and guvcview, and that is YUYV. Nicely enough, my webcam can output YUYV directly as raw data, so that does save me some colorspace conversions.
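
A condensed sketch of what opening and configuring the loopback device involves (cf. loopback.cc in this repo; device path handling and most error checks are trimmed):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <cstring>

int open_loopback(const char* dev, int width, int height) {
    int fd = open(dev, O_WRONLY);
    if (fd < 0) return -1;
    struct v4l2_format fmt;
    std::memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    fmt.fmt.pix.width        = width;
    fmt.fmt.pix.height       = height;
    fmt.fmt.pix.pixelformat  = V4L2_PIX_FMT_YUYV;  // the common denominator found above
    fmt.fmt.pix.sizeimage    = width * height * 2; // YUYV: 2 bytes per pixel
    fmt.fmt.pix.bytesperline = width * 2;
    fmt.fmt.pix.field        = V4L2_FIELD_NONE;
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) { close(fd); return -1; }
    return fd;  // afterwards: write(fd, yuyv_frame, fmt.fmt.pix.sizeimage) per frame
}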

End Result

The dataflow through the whole program is roughly as follows:

  • init
    • load background.png, convert to YUYV
    • initialize TFLite, register custom op
    • load Google Meet segmentation model
    • setup V4L2 Loopback device (w,h,YUYV)
  • loop
    • grab raw YUYV image from camera
    • extract portrait ROI in center
      • downscale ROI to 144 x 256 (*)
      • convert to RGB float32 (*)
      • run Google Meet segmentation model
      • convert result to binary mask using softmax
      • denoise mask using erode/dilate
    • upscale mask to raw image size
    • copy background over raw image with mask (see above)
    • write() data to virtual video device

(*) these are required input parameters for this model
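
The mask post-processing steps condense into roughly the following sketch (the 144x256x2 output layout matches the Meet model; the 0.5 threshold and single erode/dilate pass are illustrative choices, not necessarily the exact values used):

#include <opencv2/opencv.hpp>
#include <cmath>

cv::Mat mask_from_output(const float* out, int h, int w, cv::Size frame_size) {
    cv::Mat mask(h, w, CV_8UC1);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            // two logits per pixel (background, person); 2-class softmax
            float bg     = out[(y * w + x) * 2 + 0];
            float person = out[(y * w + x) * 2 + 1];
            float p = 1.0f / (1.0f + std::exp(bg - person));
            mask.at<uint8_t>(y, x) = (p < 0.5f) ? 255 : 0;  // 255 = background
        }
    cv::erode(mask, mask, cv::Mat());   // denoise: shave single-pixel speckles
    cv::dilate(mask, mask, cv::Mat()); // ...and grow regions back
    cv::resize(mask, mask, frame_size); // upscale to raw image size
    return mask;
}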

Limitations/Extensions

As usual: pull requests welcome.

See Issues and Pull Requests for currently discussed/in-progress extensions, and also check out the experimental branch.

Fixed

  • The project name isn't catchy enough. Help me find a nice backronym.
  • Resolution is currently hardcoded to 640x480 (lowest common denominator).
  • Only works with Linux, because that's what I use.
  • Needs a webcam that can produce raw YUYV data (but extending to the common YUV420 format should be trivial)
  • Should probably do an erosion (+ dilation?) operation on the mask.
  • Background image size needs to match camera resolution (see issue #1).
  • CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS. Fixed via Google Meet segmentation model.
  • Uses stock Deeplab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance? Fixed via Google Meet segmentation model.

Other links

Firefox preferred formats: https://searchfox.org/mozilla-central/source/third_party/libwebrtc/webrtc/modules/video_capture/linux/video_capture_linux.cc#142-159

Feeding obs-studio

We have been notified that some snap-packaged versions of obs-studio are unable to detect/use a virtual camera as provided by backscrub. Please check the details for workarounds if this applies to you.

backscrub's People

Contributors

benbe, cristicc, cypmon, floe, marschwar, mkogan1, oleid, peckto, phlash, progandy, razzziel, skybert, vekkt0r


backscrub's Issues

Improve temporal stability of masked image

The masked image is often quite unstable over time: details that appear in one frame may be gone in the next. This is particularly distracting when it affects parts of the body or face (not necessarily the borders).

Maybe decrease the likelihood of a pixel being removed from the mask based on the number of pixels that changed mask membership in its neighborhood: i.e., the more pixels in an area suddenly change membership, the less likely it is that the whole area should be removed.
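
A simpler cousin of that idea, plain temporal smoothing of the probability map before thresholding, might look like the following sketch (alpha is an illustrative blend weight, not a tuned value):

#include <opencv2/opencv.hpp>

// Blend the current float probability map with the previous frame's,
// damping single-frame flicker; then threshold to a binary mask.
cv::Mat smooth_mask(const cv::Mat& prob, cv::Mat& prev, float alpha = 0.75f) {
    if (prev.empty()) prev = prob.clone();
    cv::addWeighted(prob, alpha, prev, 1.0f - alpha, 0.0, prev);
    cv::Mat mask;
    cv::threshold(prev, mask, 0.5, 255.0, cv::THRESH_BINARY);
    mask.convertTo(mask, CV_8UC1);
    return mask;
}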

Segmentation result quality is subpar

Hi, first of all, thanks for the project and for trying to bring background removal to Linux, but for me the result is really bad and I don't know why.
For me it often looks like this:
(screenshot)

Here is the debug output of the deepseg binary:

./deepseg -d -d -c /dev/video0 -v /dev/video4
deepseg v0.2.0
(c) 2021 by [email protected]
https://github.com/floe/deepbacksub
debug:  2
ccam:   /dev/video0
vcam:   /dev/video4
width:  640
height: 480
back:   images/background.png
threads:2
model:  models/segm_full_v679.tflite

vid_format->type                = 2
vid_format->fmt.pix.width       = 640
vid_format->fmt.pix.height      = 480
vid_format->fmt.pix.pixelformat = 1448695129
vid_format->fmt.pix.sizeimage   = 614400
vid_format->fmt.pix.field       = 1
vid_format->fmt.pix.bytesperline= 1280
vid_format->fmt.pix.colorspace  = 8

tensor #0: 1
tensor #0: 144
tensor #0: 256
tensor #0: 3
tensor #244: 1
tensor #244: 144
tensor #244: 256
tensor #244: 2

And this is from v4l2-ctl --all -d /dev/video0

Driver name      : uvcvideo
Video input : 0 (Camera 1: ok)
Format Video Capture:
        Width/Height      : 640/480
        Pixel Format      : 'YUYV' (YUYV 4:2:2)
        Field             : None
        Bytes per Line    : 1280
        Size Image        : 614400
        Colorspace        : sRGB
        Transfer Function : Rec. 709
        YCbCr/HSV Encoding: ITU-R 601
        Quantization      : Default (maps to Limited Range)
        Flags             : 
opencv_version
4.5.1

So this looks OK to me.

Any ideas where I should start debugging this?

Green blob!

I'm just getting started with deepbacksub, and this is the result: I'm a big green blob!

I've tried all the different models included in the models folder. I've also tried more threads. I've adjusted the resolution from as low as 640x480 up to 1080p. I'm not sure what else to try to debug this issue. Any ideas?

Backronym suggestion: ViViBaRe

Virtual Video Background Removal

Pros: Catchy, explanatory, clever (i.e. makes your background bare)
Cons: Looks like spongebob meme text

Compile error

Hello,

I get the following compile error and cannot find the problem:

g++ deepseg.cc loopback.cc -Ofast -march=native -fno-trapping-math -fassociative-math -funsafe-math-optimizations -Wall -pthread -I ../tensorflow.git -I ../tensorflow.git/tensorflow/lite/tools/make//downloads/absl -I ../tensorflow.git/tensorflow/lite/tools/make//downloads/flatbuffers/include -I/usr/include/opencv4/opencv -I/usr/include/opencv4 -lrt -ldl -L ../tensorflow.git/tensorflow/lite/tools/make//gen/linux_x86_64/lib/ -ltensorflow-lite -lopencv_stitching -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_dnn_objdetect -lopencv_dnn_superres -lopencv_dpm -lopencv_highgui -lopencv_face -lopencv_freetype -lopencv_fuzzy -lopencv_hdf -lopencv_hfs -lopencv_img_hash -lopencv_line_descriptor -lopencv_quality -lopencv_reg -lopencv_rgbd -lopencv_saliency -lopencv_shape -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_superres -lopencv_optflow -lopencv_surface_matching -lopencv_tracking -lopencv_datasets -lopencv_text -lopencv_dnn -lopencv_plot -lopencv_ml -lopencv_videostab -lopencv_videoio -lopencv_viz -lopencv_ximgproc -lopencv_video -lopencv_xobjdetect -lopencv_objdetect -lopencv_calib3d -lopencv_imgcodecs -lopencv_features2d -lopencv_flann -lopencv_xphoto -lopencv_photo -lopencv_imgproc -lopencv_core -o deepseg
/usr/bin/ld: ../tensorflow.git/tensorflow/lite/tools/make//gen/linux_x86_64/lib//libtensorflow-lite.a(interpreter_builder.o): undefined reference to symbol 'dlsym@@GLIBC_2.2.5'
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libdl.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [Makefile:27: deepseg] Fehler 1

Thank you

core dumped sometimes

deepseg.cc

terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(3.4.10-dev) /home/zj/opencv/modules/core/src/matrix.cpp:466: error: (-215:Assertion failed) 0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows in function 'Mat'

at line 233: cv::Mat roi = raw(roidim);

Aborted (core dumped)

Hello,
I got the following error:

deepseg v0.2.0
(c) 2021 by [email protected]
https://github.com/floe/deepbacksub
debug:  2
ccam:   /dev/video0
vcam:   /dev/video1
width:  640
height: 480
flip_h: no
flip_v: no
threads:2
back:   (none)
model:  models/segm_full_v679.tflite

deepseg: loopback.cc:58: int loopback_init(const char*, int, int, int): Assertion `ret_code != -1' failed.
Aborted (core dumped)

(-215:Assertion failed) sz.width % 2 == 0 && sz.height % 3 == 0 in function 'CvtHelper'

This is the output of ./deepseg -d -d -c /dev/video0 -v /dev/video1:

deepseg v0.2.0
(c) 2021 by [email protected]
https://github.com/floe/deepbacksub
debug:  2
ccam:   /dev/video0
vcam:   /dev/video1
width:  640
height: 480
flip_h: no
flip_v: no
threads:2
back:   (none)
model:  models/segm_full_v679.tflite

vid_format->type                = 2
vid_format->fmt.pix.width       = 640
vid_format->fmt.pix.height      = 480
vid_format->fmt.pix.pixelformat = 1448695129
vid_format->fmt.pix.sizeimage   = 614400
vid_format->fmt.pix.field       = 1
vid_format->fmt.pix.bytesperline= 1280
vid_format->fmt.pix.colorspace  = 8

tensor #0: 1
tensor #0: 144
tensor #0: 256
tensor #0: 3
tensor #244: 1
tensor #244: 144
tensor #244: 256
tensor #244: 2
Invalid MIT-MAGIC-COOKIE-1 key
QSettings::value: Empty key passed
QSettings::value: Empty key passed
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(3.4.9) /home/abuild/rpmbuild/BUILD/opencv-3.4.9/modules/imgproc/src/color.simd_helpers.hpp:104: error: (-215:Assertion failed) sz.width % 2 == 0 && sz.height % 3 == 0 in function 'CvtHelper'

Thanks in advance!

Other segmentation models

I have stumbled upon this repo:
Portrait-segmentation

It contains multiple portrait segmentation models, mostly geared towards smartphones. Some of the models seem pretty good and fast. Would it be possible to implement some of these models here?

Error: `> Invalid number of channels in input image:`

Hello,
Thank you very much for developing this, was looking for something similar for a while.

Trying to run deepseg the following error occurs:

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(3.4.8) /builddir/build/BUILD/opencv-3.4.8/modules/imgproc/src/color.simd_helpers.hpp:88: error: (-2:Unspecified error) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<2>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::<unnamed>::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'
> Invalid number of channels in input image:
>     'VScn::contains(scn)'
> where
>     'scn' is 3

At line:
https://github.com/floe/deepbacksub/blob/b5d0db71da41fffb79106ff644e6344096ba5f49/deepseg.cc#L152

The camera is Lenovo P50 integrated camera:

v4l2-ctl --all
Driver Info:
        Driver name      : uvcvideo
        Card type        : Integrated Camera: Integrated C
        Bus info         : usb-0000:00:14.0-8
        Driver version   : 5.5.15
        Capabilities     : 0x84a00001
                Video Capture
                Metadata Capture
                Streaming
                Extended Pix Format
                Device Capabilities
        Device Caps      : 0x04200001
                Video Capture
                Streaming
                Extended Pix Format
Priority: 2
Video input : 0 (Camera 1: ok)
Format Video Capture:
        Width/Height      : 1280/720
        Pixel Format      : 'YUYV' (YUYV 4:2:2)
        Field             : None
        Bytes per Line    : 2560
        Size Image        : 1843200
        Colorspace        : sRGB
        Transfer Function : Default (maps to sRGB)
        YCbCr/HSV Encoding: Default (maps to ITU-R 601)
        Quantization      : Default (maps to Limited Range)
        Flags             :
Crop Capability Video Capture:
        Bounds      : Left 0, Top 0, Width 1280, Height 720
        Default     : Left 0, Top 0, Width 1280, Height 720
        Pixel Aspect: 1/1
Selection Video Capture: crop_default, Left 0, Top 0, Width 1280, Height 720, Flags:
Selection Video Capture: crop_bounds, Left 0, Top 0, Width 1280, Height 720, Flags:
Streaming Parameters Video Capture:
        Capabilities     : timeperframe
        Frames per second: 10.000 (10/1)
        Read buffers     : 0
                     brightness 0x00980900 (int)    : min=-64 max=64 step=1 default=0 value=0
                       contrast 0x00980901 (int)    : min=0 max=95 step=1 default=0 value=0
                     saturation 0x00980902 (int)    : min=0 max=100 step=1 default=64 value=64
                            hue 0x00980903 (int)    : min=-2000 max=2000 step=1 default=0 value=0
 white_balance_temperature_auto 0x0098090c (bool)   : default=1 value=1
                          gamma 0x00980910 (int)    : min=100 max=300 step=1 default=100 value=100
           power_line_frequency 0x00980918 (menu)   : min=0 max=2 default=1 value=1
                                0: Disabled
                                1: 50 Hz
                                2: 60 Hz
      white_balance_temperature 0x0098091a (int)    : min=2800 max=6500 step=1 default=4600 value=4600 flags=inactive
                      sharpness 0x0098091b (int)    : min=0 max=7 step=1 default=2 value=2
         backlight_compensation 0x0098091c (int)    : min=0 max=2 step=1 default=1 value=1
                  exposure_auto 0x009a0901 (menu)   : min=0 max=3 default=3 value=3
                                1: Manual Mode
                                3: Aperture Priority Mode
              exposure_absolute 0x009a0902 (int)    : min=10 max=2047 step=1 default=384 value=384 flags=inactive
         exposure_auto_priority 0x009a0903 (bool)   : default=0 value=1

Any advice on how to resolve this appreciated,
Many thanks and regards.

Update experimental to latest tflite release

There's a new upstream tflite release, r2.5, that we could update our submodule link for tflite to. Or are there any particular reasons to keep it at the current (old, untagged) revision?

Assertion in loopback_init

Hi
When I'm trying to run deepseg, I get an assertion failure when trying to set VIDIOC_S_FMT:
deepseg: loopback.cc:57: int loopback_init(const char*, int, int, int): Assertion `ret_code != -1' failed.
I put an assertion after VIDIOC_G_FMT and the return code is also -1.

My loopback is configured to /dev/video2 and I have changed to that in deepseg.cc.

Any idea how to proceed from here?

Invalid argument

Hi, congratulations, and excuse my bad English.

My problem is that I use a Lenovo notebook with an embedded EasyCamera:

deepseg v0.2.0
(c) 2021 by [email protected]
https://github.com/floe/deepbacksub
debug: 2
ccam: /dev/video0
vcam: /dev/video1
width: 640
height: 480
flip_h: no
flip_v: no
threads:2
back: (none)
model: models/segm_full_v679.tflite

loopback.cc:66(loopback_init): Failed to set device video format: Invalid argument
Failed to initialize vcam device.

Custom Training

I just discovered this software and it works, but not particularly well. If I want it to detect me, I have to be farther away from my PC than my headphone cable reaches, so it would be pretty nice if I could train it on myself. Is something like this implemented, and if so, how does it work?

Close small holes in mask

The full Meet model often leaves small holes in my face. It could be worthwhile to add an optional hole-filling step.
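
One hypothetical hole-filling step is a morphological close (dilate, then erode), which removes holes smaller than the kernel without moving the mask border much; a sketch:

#include <opencv2/opencv.hpp>

void close_holes(cv::Mat& mask, int radius = 3) {  // radius is illustrative
    cv::Mat kernel = cv::getStructuringElement(
        cv::MORPH_ELLIPSE, cv::Size(2 * radius + 1, 2 * radius + 1));
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, kernel);
}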

Acronym suggestion: TinGS

@floe not sure how to contact you but since the name is an "issue" here is my entry:
TinGS => TinGS is not a Green Screen :D

Error when running deepseg

Compiled successfully after setting TFBASE=../tensorflow, but the following error greets me when starting it:

./deepseg -d -c /dev/video0 -v /dev/video1

deepseg v0.1.0
(c) 2020 by [email protected]
https://github.com/floe/deepseg
debug: 1
ccam: /dev/video0
vcam: /dev/video1
width: 640
height: 480
back: background.png
threads:2
deepseg: loopback.cc:57: int loopback_init(const char*, int, int, int): Assertion `ret_code != -1' failed.

Aborted

macOS compatibility

Is there any chance to make this work on macOS? Would definitely fancy up our Slack calls!

'cv::Exception': Invalid number of channels in input image

Hello,

If I run ./deepseg -d -d -c /dev/video0 -v /dev/video1, this output appears:

deepseg v0.2.0
(c) 2021 by [email protected]
https://github.com/floe/deepbacksub
debug:  2
ccam:   /dev/video0
vcam:   /dev/video1
width:  640
height: 480
flip_h: no
flip_v: no
threads:2
back:   (none)
model:  models/segm_full_v679.tflite

vid_format->type                = 2
vid_format->fmt.pix.width       = 640
vid_format->fmt.pix.height      = 480
vid_format->fmt.pix.pixelformat = 1448695129
vid_format->fmt.pix.sizeimage   = 614400
vid_format->fmt.pix.field       = 1
vid_format->fmt.pix.bytesperline= 1280
vid_format->fmt.pix.colorspace  = 8

tensor #0: 1
tensor #0: 144
tensor #0: 256
tensor #0: 3
tensor #244: 1
tensor #244: 144
tensor #244: 256
tensor #244: 2
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(3.4.9) /home/abuild/rpmbuild/BUILD/opencv-3.4.9/modules/imgproc/src/color.simd_helpers.hpp:88: error: (-2:Unspecified error) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<2>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0>; cv::impl::{anonymous}::SizePolicy sizePolicy = (cv::impl::<unnamed>::SizePolicy)2; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'
> Invalid number of channels in input image:
>     'VScn::contains(scn)'
> where
>     'scn' is 1

Thanks in advance!

Other background replacement code bases - a round up

Having recently discovered that the open source Jitsi video conferencing solution offers ML-driven background replacement, I thought it would be interesting to round up who else is doing this here on GitHub and what tech is being used.

  • Search: https://github.com/search?p=13&q=virtual+background&type=Repositories
  • 187 results! Which, with a bit of duplicate removal and filtering by star rating, boils down to:
    • Jitsi-Meet, 100% client-side, tflite.js (optionally compiled to WASM, optionally with SIMD support), using BodyPix models.
    • ViBa, a tidy 100% python re-working of the original mixed-tech solution by Ben Elder, using python3-tensorflow, python3-opencv & BodyPix models.
    • Volcomix virtual background, the inspiration for Jitsi team, 100% client-side, using tflite.js (compiled to WASM & SIMD required) and either BodyPix or MediaPipe Meet models. Really well documented and tested.
    • EasyJitsi, 100% client-side React app, using tf.js and BodyPix. Small, nice demo site but slow (3FPS on my laptop)
    • VirtBG, 100% client-side, single file implementation, using tf.js, BodyPix. Similar performance to EasyJitsi above as expected. Great example of minimal bloat though!

Windows 10 Compatibility

I tried to launch after installing opencv and tensorflow through pip with Python 3.8. Let me know what I can do to help troubleshoot.

python .\deepseg.py
2020-04-26 18:52:55.986659: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-04-26 18:52:55.989906: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
using NumPy version 1.18.3
using TFLite version 2.2.0-rc3
[ WARN:0] global C:\projects\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (1113) SourceReaderCB::OnReadSample videoio(MSMF): OnReadSample() is called with error status: -1072875772
[ WARN:0] global C:\projects\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (1125) SourceReaderCB::OnReadSample videoio(MSMF): async ReadSample() call is failed with error status: -1072875772
[ WARN:1] global C:\projects\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (1159) CvCapture_MSMF::grabFrame videoio(MSMF): can't grab frame. Error: -1072875772
Traceback (most recent call last):
  File ".\deepseg.py", line 75, in <module>
    roi = img[0:540,210:750] # row0:row1, col0:col1
TypeError: 'NoneType' object is not subscriptable
[ WARN:1] global C:\projects\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (674) SourceReaderCB::~SourceReaderCB terminating async callback

Core dump while running deepseg

binithb@pop-os:~/workspace/github/deepbacksub$ ./deepseg -d -d -c /dev/video0 -v /dev/video1
deepseg v0.2.0
(c) 2021 by [email protected]
https://github.com/floe/deepbacksub
debug: 2
ccam: /dev/video0
vcam: /dev/video1
width: 640
height: 480
back: images/background.png
threads:2
model: models/segm_full_v679.tflite

deepseg: loopback.cc:58: int loopback_init(const char*, int, int, int): Assertion `ret_code != -1' failed.
Aborted (core dumped)
binithb@pop-os:~/workspace/github/deepbacksub$ ls -ltr /dev/video*
crw-rw----+ 1 root video 81, 1 Jan 18 12:37 /dev/video1
crw-rw----+ 1 root video 81, 0 Jan 18 12:37 /dev/video0
crw-rw----+ 1 root video 81, 2 Jan 18 12:38 /dev/video2
crw-rw----+ 1 root video 81, 3 Jan 18 12:38 /dev/video3
crw-rw----+ 1 root video 81, 4 Jan 18 12:54 /dev/video4
binithb@pop-os:~/workspace/github/deepbacksub$ cat /etc/i
ifplugd/ init.d/ initramfs/ initramfs-tools/ inputrc insserv.conf.d/ io.elementary.appcenter/ iproute2/ issue issue.diverted issue.net issue.net.diverted
binithb@pop-os:~/workspace/github/deepbacksub$ cat /etc/issue
Pop!_OS 20.04 LTS \n \l

deepseg_core_popos20.zip
binithb@pop-os:~/workspace/github/deepbacksub$ uname -a
Linux pop-os 5.8.0-7630-generic #32~1609193707~20.04~781bb80-Ubuntu SMP Tue Jan 5 21:23:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
binithb@pop-os:~/workspace/github/deepbacksub$

Twitch Test

Would be great to have your team test out DeepBackSub on Twitch; many music artists are looking for more tools for live streaming sets to their fans, and so are the Twitch users who stumble across them. If this ever happens, please let us know :)

A huge request from all of you...

can you guys crash my Windows 10 irreparably? I will reinstall it in the morning. It's my boss: he has infected my whole computer and denied me access. Please help.
I haven't done anything wrong, I swear; he is just torturing me, going through all sorts of files, even my confidential medical patient files. I feel like a fool for asking but I am genuinely grossed out. So please.

Error comparison of integer expressions of different signedness

I'm getting this error on Ubuntu 20.04

g++ tensorflow//tensorflow/lite/tools/make//gen/linux_x86_64/lib//libtensorflow-lite.a deepseg.cc loopback.cc transpose_conv_bias.cc -Ofast -march=native -fno-trapping-math -fassociative-math -funsafe-math-optimizations -Wall -pthread -I tensorflow/ -I tensorflow//tensorflow/lite/tools/make//downloads/absl -I tensorflow//tensorflow/lite/tools/make//downloads/flatbuffers/include -ggdb -I/usr/include/opencv4/opencv -I/usr/include/opencv4 -lrt -ldl -L tensorflow//tensorflow/lite/tools/make//gen/linux_x86_64/lib/ -ltensorflow-lite -ldl -lopencv_stitching -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_dnn_objdetect -lopencv_dnn_superres -lopencv_dpm -lopencv_highgui -lopencv_face -lopencv_freetype -lopencv_fuzzy -lopencv_hdf -lopencv_hfs -lopencv_img_hash -lopencv_line_descriptor -lopencv_quality -lopencv_reg -lopencv_rgbd -lopencv_saliency -lopencv_shape -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_superres -lopencv_optflow -lopencv_surface_matching -lopencv_tracking -lopencv_datasets -lopencv_text -lopencv_dnn -lopencv_plot -lopencv_ml -lopencv_videostab -lopencv_videoio -lopencv_viz -lopencv_ximgproc -lopencv_video -lopencv_xobjdetect -lopencv_objdetect -lopencv_calib3d -lopencv_imgcodecs -lopencv_features2d -lopencv_flann -lopencv_xphoto -lopencv_photo -lopencv_imgproc -lopencv_core -o deepseg
deepseg.cc: In function ‘int fourCcFromString(const string&)’:
deepseg.cc:45:22: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::__cxx11::basic_string<char>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
   45 |   for (auto i = 0; i < in.size(); ++i)
      |                    ~~^~~~~~~~~~~

Person is only picked up on part of the video image

For some reason, I'm cut off on one side of the video. You can see the hard cut on the left side.

When using the google meet model, it's even worse and only detects on the right half of the video. Any ideas?

./deepseg -c /dev/video0 -v /dev/video4 -m models/deeplabv3_257_mv_gpu.tflite -b images/bac.jpg -t 16
deepseg v0.2.0
(c) 2021 by [email protected]
https://github.com/floe/deepbacksub
debug:  0
ccam:   /dev/video0
vcam:   /dev/video4
width:  640
height: 480
back:   images/bac.jpg
threads:16
model:  models/deeplabv3_257_mv_gpu.tflite

(screenshot)

Process parts of image individually / pad image to model aspect

To avoid clipping the image partway, it would be nice if the image could be split into two (or more) overlapping areas that are fed to the NN and recombined after detection (e.g. by ORing the results together). This is a bit slower overall but would allow arbitrary aspect ratios to be handled. This might also allow feeding a scaled image into the NN and refining the result area by area.
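
A sketch of this suggestion with two overlapping tiles; the segment callback stands in for one model invocation, and the overlap fraction is arbitrary:

#include <opencv2/opencv.hpp>

cv::Mat segment_tiled(const cv::Mat& frame, cv::Mat (*segment)(const cv::Mat&)) {
    int tile_w = frame.cols * 2 / 3;  // two tiles with ~33% overlap
    cv::Rect left(0, 0, tile_w, frame.rows);
    cv::Rect right(frame.cols - tile_w, 0, tile_w, frame.rows);
    cv::Mat mask(frame.size(), CV_8UC1, cv::Scalar(0));
    cv::Mat lm = segment(frame(left));
    cv::Mat rm = segment(frame(right));
    cv::bitwise_or(mask(left), lm, mask(left));    // recombine by ORing
    cv::bitwise_or(mask(right), rm, mask(right));
    return mask;
}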

GPU vs Edge TPU

Have you given any thought to offloading the processing to a TF core? Something like a Coral USB TPU.

Instructions

Hi
Is it possible for someone who has this program working to list out the process to install everything and set up deepbacksub so it runs? I have followed the list of requirements but I'm stuck on the error below. All the files are in the correct folders but something is not set up correctly.

Error:
In file included from deepseg.cc:19:0:
tensorflow/lite/interpreter.h:26:10: fatal error: tensorflow/lite/allocation.h: No such file or directory
#include "tensorflow/lite/allocation.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:39: recipe for target 'deepseg' failed
make: *** [deepseg] Error 1

Cheers Ivan

macOS Support

Hello,

Thank you very much for your code on GitHub. I was wondering if there is a way to port this code to macOS. Any suggestion will be helpful.

Allow for Debug Cam output

Allow for a second v4l2loopback cam output to be enabled which receives the image mask detected by the NN (plain mask, no source image). Depending on what is available, this might provide the greyscale "likelihood" map from the NN for further processing.

Possible Model Improvements

While searching for background removal solutions, I found a similar project that doesn't have a proper repository. It apparently uses the BodyPix 2 model for segmentation, which is specialized for humans. Seeing as both are TensorFlow models with similar APIs, it might be worth adding this alternative model as an option for higher performance.

loopback error when launching deepseg

Hi,

I've successfully compiled deepseg, but when I execute it, it shows me this error:

deepseg v0.1.0
(c) 2020 by [email protected]
https://github.com/floe/deepseg
debug: 0
ccam: /dev/video1
vcam: /dev/video0
width: 640
height: 480
back: background.png
threads:2
deepseg: loopback.cc:57: int loopback_init(const char*, int, int, int): Assertion `ret_code != -1' failed.
[1] 115948 abort (core dumped) ./deepseg

Im working on Pop_OS 20.04, OpenCV 4.2 and tensorflow lite from repo.

OpenCV

Hello,

Although I followed the directions on the OpenCV site to install, when I run make on the deepbacksub project it still gives me the error:
Makefile:23: *** Couldn't find OpenCV. Stop.

Could anyone suggest the proper way to install OpenCV? Maybe I'm missing something.

Idea: Pre-Filtering on camera noise

While discussing some things about ANNs with @martok, it has been noted that when experimenting with such networks they seem to be sensitive to noise in their inputs. As cameras are physical sensors[citation needed], they create noise. This noise is especially noticeable in dark environments, e.g. at night. Filtering out this noise (it doesn't need to be perfect) should help with getting better detection rates with the various ANNs. Having this pre-processing step might even help to adapt the white balance of the input image to that of the ANN's training data (as noted in #29 et al.).
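
As one possible shape of such a pre-filter (parameters are illustrative, and plenty of alternatives exist): a light bilateral filter suppresses sensor noise while preserving edges before the frame reaches the network.

#include <opencv2/opencv.hpp>

cv::Mat denoise(const cv::Mat& frame) {
    cv::Mat out;
    cv::bilateralFilter(frame, out, 5 /*diameter*/, 25 /*sigmaColor*/, 25 /*sigmaSpace*/);
    return out;
}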

Recommendation: Multiple segmentation runs for each frame

Calculating multiple slightly shifted segmentation masks would let us make a higher resolution combined mask. The segmentation input would be shifted by a few pixels up or down each time it is executed (usually in a square grid). Next the masks are upscaled and shifted by the same amount and then averaged. This is a form of image super resolution. This all could be done in parallel.

This same technique can also be extended to include separate larger jumps sideways to increase the span of the mask. Each larger jump would then be followed by the small super-resolution-related jumps.

The idea to cover the whole image was previously mentioned by @BenBE in #58 (comment)

This can also be extended to slightly change the sampling positions between frames, and then average two or more frames.
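
A sketch of the basic multi-pass averaging with small shifts only (the shift set and uniform weighting are illustrative, and segment again stands in for one model run):

#include <opencv2/opencv.hpp>

cv::Mat segment_multi(const cv::Mat& frame, cv::Mat (*segment)(const cv::Mat&)) {
    cv::Mat acc(frame.size(), CV_32FC1, cv::Scalar(0));
    const cv::Point2f shifts[] = {{0, 0}, {2, 0}, {0, 2}, {2, 2}};  // pixel offsets
    for (const auto& s : shifts) {
        cv::Mat shifted, m, mf;
        cv::Mat fwd = (cv::Mat_<double>(2, 3) << 1, 0, -s.x, 0, 1, -s.y);
        cv::warpAffine(frame, shifted, fwd, frame.size());  // shift the input
        m = segment(shifted);                               // run the model
        cv::Mat back = (cv::Mat_<double>(2, 3) << 1, 0, s.x, 0, 1, s.y);
        cv::warpAffine(m, m, back, frame.size());           // shift the mask back
        m.convertTo(mf, CV_32FC1, 1.0 / 255.0);
        acc += mf;                                          // accumulate
    }
    acc *= 1.0 / 4.0;                                       // average the passes
    cv::Mat out;
    acc.convertTo(out, CV_8UC1, 255.0);
    return out;
}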

Just a green screen

I'm not even really sure where to start providing info or debugging. Everything seems to be "working" but when the window pops up it's just a green screen. Any idea where to start?

Meet model card has different specs

The output shape in the model card is 265x256x2; however, the model that we have here outputs 256x256x1. The model size also differs by a few KB.

deepseg.py: error: unrecognized arguments: -d -d -c /dev/video0 -v /dev/video2

Hi there. First thank you for your time! :)

When I'm running deepseg.py I get this error about unrecognized arguments:

./deepseg.py -d -d -c /dev/video0 -v /dev/video2

[ WARN:0] global /build/opencv/src/opencv-4.5.1/modules/videoio/src/cap_gstreamer.cpp (961) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
using NumPy  version 1.20.0
using TFLite version 2.4.1
usage: deepseg.py [-h] [-i IMAGE] [-m MODEL_FILE] [--input_mean INPUT_MEAN] [--input_std INPUT_STD]
deepseg.py: error: unrecognized arguments: -d -d -c /dev/video0 -v /dev/video2

If I run without arguments the output is:

./deepseg.py

[ WARN:0] global /build/opencv/src/opencv-4.5.1/modules/videoio/src/cap_gstreamer.cpp (961) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
using NumPy  version 1.20.0
using TFLite version 2.4.1
Traceback (most recent call last):
  File "/home/frojnd/System/deepbacksub/./deepseg.py", line 58, in <module>
    interpreter = tf.lite.Interpreter(model_path=args.model_file)
  File "/usr/lib/python3.9/site-packages/tensorflow/lite/python/interpreter.py", line 205, in __init__
    _interpreter_wrapper.CreateWrapperFromFile(
ValueError: Could not open 'deeplabv3_257_mv_gpu.tflite'.

pacman -Q opencv

opencv 4.5.1-1

pacman -Q gstreamer

gstreamer 1.18.3-1

lsmod | grep v4l

v4l2loopback           45056  0
videobuf2_v4l2         36864  1 uvcvideo
videobuf2_common       65536  2 videobuf2_v4l2,uvcvideo
videodev              290816  4 videobuf2_v4l2,v4l2loopback,uvcvideo,videobuf2_common
mc                     61440  4 videodev,videobuf2_v4l2,uvcvideo,videobuf2_common

v4l2-ctl --all

Driver Info:
	Driver name      : uvcvideo
	Card type        : Integrated Camera: Integrated C
	Bus info         : usb-0000:00:1a.0-1.6
	Driver version   : 5.10.11
	Capabilities     : 0x84a00001
		Video Capture
		Metadata Capture
		Streaming
		Extended Pix Format
		Device Capabilities
	Device Caps      : 0x04200001
		Video Capture
		Streaming
		Extended Pix Format
Priority: 2
Video input : 0 (Camera 1: ok)
Format Video Capture:
	Width/Height      : 1280/720
	Pixel Format      : 'YUYV' (YUYV 4:2:2)
	Field             : None
	Bytes per Line    : 2560
	Size Image        : 1843200
	Colorspace        : sRGB
	Transfer Function : Rec. 709
	YCbCr/HSV Encoding: ITU-R 601
	Quantization      : Default (maps to Limited Range)
	Flags             : 
Crop Capability Video Capture:
	Bounds      : Left 0, Top 0, Width 1280, Height 720
	Default     : Left 0, Top 0, Width 1280, Height 720
	Pixel Aspect: 1/1
Selection Video Capture: crop_default, Left 0, Top 0, Width 1280, Height 720, Flags: 
Selection Video Capture: crop_bounds, Left 0, Top 0, Width 1280, Height 720, Flags: 
Streaming Parameters Video Capture:
	Capabilities     : timeperframe
	Frames per second: 10.000 (10/1)
	Read buffers     : 0
                     brightness 0x00980900 (int)    : min=-64 max=64 step=1 default=-16 value=-16
                       contrast 0x00980901 (int)    : min=0 max=95 step=1 default=28 value=28
                     saturation 0x00980902 (int)    : min=0 max=100 step=1 default=40 value=40
                            hue 0x00980903 (int)    : min=-180 max=180 step=1 default=0 value=0
 white_balance_temperature_auto 0x0098090c (bool)   : default=1 value=1
                          gamma 0x00980910 (int)    : min=48 max=300 step=1 default=100 value=100
           power_line_frequency 0x00980918 (menu)   : min=0 max=2 default=2 value=1
				0: Disabled
				1: 50 Hz
				2: 60 Hz
      white_balance_temperature 0x0098091a (int)    : min=2800 max=6500 step=1 default=4600 value=4600 flags=inactive
                      sharpness 0x0098091b (int)    : min=1 max=7 step=1 default=1 value=1
         backlight_compensation 0x0098091c (int)    : min=0 max=2 step=1 default=0 value=0
                  exposure_auto 0x009a0901 (menu)   : min=0 max=3 default=3 value=0
				0: Auto Mode
				1: Manual Mode
				2: Shutter Priority Mode
				3: Aperture Priority Mode
              exposure_absolute 0x009a0902 (int)    : min=1 max=5000 step=1 default=333 value=500 flags=inactive
         exposure_auto_priority 0x009a0903 (bool)   : default=0 value=1
                   pan_absolute 0x009a0908 (int)    : min=-36000 max=36000 step=3600 default=0 value=0
                  tilt_absolute 0x009a0909 (int)    : min=-36000 max=36000 step=3600 default=0 value=0
                  zoom_absolute 0x009a090d (int)    : min=0 max=10 step=1 default=0 value=0
                        privacy 0x009a0910 (bool)   : default=0 value=0

v4l2-ctl --list-formats-ext

ioctl: VIDIOC_ENUM_FMT
	Type: Video Capture

	[0]: 'YUYV' (YUYV 4:2:2)
		Size: Discrete 640x480
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 640x360
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 352x288
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 320x240
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 800x448
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 960x540
			Interval: Discrete 0.100s (10.000 fps)
		Size: Discrete 1280x720
			Interval: Discrete 0.100s (10.000 fps)
		Size: Discrete 424x240
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
	[1]: 'MJPG' (Motion-JPEG, compressed)
		Size: Discrete 640x480
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 640x360
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 352x288
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 320x240
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 800x448
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 960x540
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 1280x720
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
