SAD: Segment Any RGBD

🎉🎉🎉 Welcome to the Segment Any RGBD GitHub repository! 🎉🎉🎉


🤗🤗🤗 Segment Any RGBD is a toolbox to segment rendered depth images based on SAM! Don't forget to star this repo if you find it interesting!

[Demo figure. Columns: input to SAM (RGB or rendered depth image); SAM masks with class and semantic masks; 3D visualization for SAM masks with class and semantic masks.]

🥳 Introduction

We find that humans can naturally identify objects from a visualization of the depth map, so we first map the depth map ([H, W]) to the RGB space ([H, W, 3]) with a colormap function and then feed the rendered depth image into SAM. Compared to the RGB image, the rendered depth image discards texture information and focuses on geometry. In SAM-based projects such as SSA, Anything-3D, and SAM 3D, the inputs to SAM are all RGB images. We are the first to use SAM to extract geometry information directly. The following figures show that depth maps rendered with different colormap functions yield different SAM results.
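The depth-to-RGB step can be sketched as follows. This is a minimal illustration, not the repo's code: the function name `render_depth` and the blue-green-red ramp are assumptions made for the example, and any colormap function (for instance one from a plotting library) could be substituted for the ramp.

```python
import numpy as np

def render_depth(depth, anchors=None):
    """Map a depth map of shape [H, W] to a uint8 RGB image of shape [H, W, 3].

    Assumes the map contains at least one finite depth value.
    """
    if anchors is None:
        # Hypothetical blue -> green -> red ramp, chosen only for illustration.
        anchors = np.array([[0, 0, 255], [0, 255, 0], [255, 0, 0]], dtype=np.float64)
    depth = np.asarray(depth, dtype=np.float64)
    valid = np.isfinite(depth)
    lo, hi = depth[valid].min(), depth[valid].max()
    # Normalize finite depths to [0, 1]; a constant map collapses to 0.
    scale = hi - lo if hi > lo else 1.0
    norm = np.where(valid, (depth - lo) / scale, 0.0)
    # Piecewise-linear interpolation between the anchor colors, per channel.
    stops = np.linspace(0.0, 1.0, len(anchors))
    channels = [np.interp(norm, stops, anchors[:, c]) for c in range(3)]
    return np.stack(channels, axis=-1).round().astype(np.uint8)

depth = np.array([[0.0, 0.5], [1.0, 2.0]])
rgb = render_depth(depth)  # shape (2, 2, 3), ready to be passed to SAM as an image
```

Swapping the `anchors` array changes how depth differences show up as color contrast, which is one way to see why different colormap functions lead to different SAM masks.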

😎 Method

In this repo, we provide two modes: feeding either the RGB images or the rendered depth images to SAM. In each mode, the user obtains the semantic masks (one color per class) and the SAM masks with their classes. The overall structure is shown in the following figure. We use OVSeg for zero-shot semantic segmentation.
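The text above does not spell out how a class is attached to each SAM mask. One plausible way, shown here purely as an assumption, is a majority vote over a per-pixel class map. The names `label_masks` and `paint_semantic` are hypothetical, and the inputs stand in for SAM masks and an OVSeg-style prediction.

```python
import numpy as np

def label_masks(masks, semantic, num_classes):
    """Give each boolean mask the most frequent class among its pixels (-1 if empty)."""
    labels = []
    for mask in masks:
        votes = np.bincount(semantic[mask], minlength=num_classes)
        labels.append(int(votes.argmax()) if mask.any() else -1)
    return labels

def paint_semantic(masks, labels, shape, ignore=-1):
    """Build a class map from labeled masks; uncovered pixels keep the ignore value."""
    out = np.full(shape, ignore, dtype=np.int64)
    # Paint larger masks first so that smaller masks on top are not overwritten.
    order = sorted(range(len(masks)), key=lambda i: -int(masks[i].sum()))
    for i in order:
        if labels[i] != ignore:
            out[masks[i]] = labels[i]
    return out

# Toy per-pixel class map with three classes (stand-in for a semantic prediction).
semantic = np.array([[0, 0, 1],
                     [0, 1, 1],
                     [2, 2, 2]])
m1 = np.zeros((3, 3), dtype=bool); m1[:2, :2] = True  # top-left block
m2 = np.zeros((3, 3), dtype=bool); m2[2, :] = True    # bottom row
m3 = np.zeros((3, 3), dtype=bool)                      # empty mask
labels = label_masks([m1, m2, m3], semantic, num_classes=3)
painted = paint_semantic([m1, m2, m3], labels, semantic.shape)
```

In this sketch, `labels` corresponds to the "SAM masks with the class" output and `painted` to a semantic mask that can be colored one color per class.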

🤩 Comparison

  • RGB images mainly represent texture information, while depth images contain geometry information, so the RGB images are more colorful than the rendered depth images. As a result, SAM produces many more masks for RGB inputs than for depth inputs, as shown in the following figure.
  • The rendered depth image alleviates SAM's over-segmentation. For example, the table is split into four parts on the RGB image, and one of them is classified as a chair in the semantic results (yellow circles in the following figure). In contrast, the table is treated as a single object on the depth image and is correctly classified. Part of a person's head is classified as wall on the RGB image (blue circles in the following figure), but it is correctly classified on the depth image.
  • Two objects that are very close to each other may be segmented as one object on the depth image, such as the chair in the red circle. In this case, the texture information in the RGB image is essential for separating the objects.

🔥 Demos

SAIL-VOS 3D Dataset

[Demo figure. Columns: input to SAM (RGB or rendered depth image); SAM masks with class and semantic masks; 3D visualization for SAM masks with class and semantic masks.]

ScanNetV2 Dataset

[Demo figure. Columns: input to SAM (RGB or rendered depth image); SAM masks with class and semantic masks; 3D visualization for SAM masks with class and semantic masks.]

⚙️ Installation

Please see the installation guide.

💫 Try Demo

🤗 Try Demo on Hugging Face

Hugging Face Spaces

🤗 Try Demo Locally

We provide the UI (ui.py) and example inputs (/UI/) to reproduce the demos above. We use the OVSeg checkpoint ovseg_swinbase_vitL14_ft_mpt.pth for zero-shot semantic segmentation and the SAM checkpoint sam_vit_h_4b8939.pth. Put both files under this repo, then try the UI on your own computer:

python ui.py 

Click one of the Examples at the bottom and the example inputs will be filled in automatically. Then click 'Send' to generate and visualize the results. Inference takes around 2 minutes for ScanNet and around 3 minutes for SAIL-VOS 3D.
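Before launching the UI, it can help to confirm that both checkpoints are in place. The following is a small sketch that assumes "under this repo" means the repository root; the helper name `missing_checkpoints` is hypothetical and not part of the repo.

```python
from pathlib import Path

# Checkpoint file names as given above.
CHECKPOINTS = ("ovseg_swinbase_vitL14_ft_mpt.pth", "sam_vit_h_4b8939.pth")

def missing_checkpoints(repo_dir="."):
    """Return the checkpoint file names that are not found directly under repo_dir."""
    root = Path(repo_dir)
    return [name for name in CHECKPOINTS if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found; run: python ui.py")
```

If the checkpoints live somewhere else in your setup, pass that directory as `repo_dir`.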

Data Preparation

Please download SAIL-VOS 3D and ScanNet to try more demos.

LICENSE

Shield: CC BY-NC 4.0

This repo is developed based on OVSeg, which is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

However, portions of the project are under separate license terms: CLIP and ZSSEG are licensed under the MIT license; MaskFormer is licensed under CC BY-NC; openclip is licensed under the license at its repo; SAM is licensed under the Apache License.

Contributors

jingkang50, w1zheng, jun-cen, xingyi-li
