whu-usi3dv / mobile-seed

[IEEE RAL'24 & IROS'24] Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots

Home Page: https://whu-usi3dv.github.io/Mobile-Seed/

License: BSD 2-Clause "Simplified" License

Languages: Python 95.48%, MATLAB 3.78%, Dockerfile 0.52%, Shell 0.21%
Topics: boundary-detection, semantic-segmentation, dual-task-learning, realtime-segmentation, robot-vision

Mobile-Seed's Introduction

This is the official PyTorch implementation of the following publication:

Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots
Youqi Liao, Shuhao Kang, Jianping Li, Yang Liu, Yun Liu, Zhen Dong, Bisheng Yang, Xieyuanli Chen
IEEE RA-L 2024
Paper | arXiv | Project page | Video

🔭 Introduction

TL;DR: Mobile-Seed is an online framework for simultaneous semantic segmentation and boundary detection on compact robots.

(Figure: Motivation)

Abstract: Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation, realtime semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work, we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach. The encoder is divided into two pathways: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary information by learning channel-wise relationships, allowing for precise weight assignment of each channel. Furthermore, we introduce a regularization loss to mitigate the conflicts in dual-task learning and deep diversity supervision. Compared to existing methods, the proposed Mobile-Seed offers a lightweight framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while maintaining an online inference speed of 23.9 frames-per-second (FPS) with 1024×2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on CamVid and PASCAL Context datasets confirm our method’s generalizability.
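To make the channel-wise fusion idea concrete, here is a minimal PyTorch sketch of input-conditioned, per-channel fusion of two feature streams. It illustrates the general technique only; it is not the actual AFD implementation, and every name in it is hypothetical.

# Minimal sketch of channel-wise dynamic fusion (illustrative, not the real AFD)
import torch
import torch.nn as nn

class ChannelWiseFusion(nn.Module):
    """Fuse semantic and boundary features with learned per-channel weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # global context per channel
        self.mlp = nn.Sequential(                        # learns channel-wise relations
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, sem_feat, bnd_feat):
        ctx = self.pool(torch.cat([sem_feat, bnd_feat], dim=1))
        w = self.mlp(ctx)                                # (N, C, 1, 1), input-conditioned
        return w * sem_feat + (1.0 - w) * bnd_feat       # per-channel weighted blend

sem, bnd = torch.randn(2, 64, 64, 128), torch.randn(2, 64, 64, 128)
fused = ChannelWiseFusion(64)(sem, bnd)                  # -> (2, 64, 64, 128)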

🆕 News

  • 2023-11-22: [Project page] (with introduction video) is available! 🎉
  • 2023-11-22: [Preprint paper] is available! 🎉
  • 2023-11-26: We updated the video on the project page with an in-depth analysis of Mobile-Seed. More qualitative results will be available soon!
  • 2023-11-27: The introduction video on YouTube is available now!
  • 2024-01-20: Code and pre-trained models are available now!
  • 2024-02-21: Accepted by IEEE RAL'24!
  • 2024-03-06: Updated the data pre-processing code for the CamVid and PASCAL Context datasets!

💻 Installation

Our Mobile-Seed is built on MMSegmentation 0.29.1. Please refer to the installation page for more details. For a quick start, we provide a Docker image on OneDrive and Baidu disk (code: djry).

If you want to build from source, here is a quick installation example:

conda create --name mobileseed python=3.7 -y
conda activate mobileseed
pip install -r requirements.txt
mim install mmengine
mim install mmcv-full
git clone https://github.com/WHU-USI3DV/Mobile-Seed.git
cd Mobile-Seed
pip install -v -e .
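After installation, a quick Python sanity check (assuming the steps above succeeded) can confirm the environment:

# Verify that the core packages import and CUDA is visible
import torch, mmcv, mmseg
print('torch:', torch.__version__, '| CUDA:', torch.cuda.is_available())
print('mmcv:', mmcv.__version__, '| mmseg:', mmseg.__version__)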

🚅 Usage

Evaluation

NOTE: data preprocessing is not necessary for evaluation. We provide pre-trained models for the Cityscapes, CamVid and PASCAL Context datasets. Please download the Mobile-Seed weights from OneDrive or Baidu disk (code: MS24) and put them in a folder such as ckpt/. For a fair comparison, we also provide pre-trained weights of the baseline method AFFormer on OneDrive and Baidu disk (code: zesm).

Example: evaluate Mobile-Seed on Cityscapes:

# Single-GPU testing
bash tools/dist_test.sh ./configs/Mobile_Seed/MS_tiny_cityscapes.py /path/to/checkpoint_file.pth 1 --eval mIoU
  • Mobile-Seed Performance:

    Dataset               mIoU   mBIoU (3px)   FLOPs
    Cityscapes            78.4   43.3          31.6G
    CamVid                73.4   45.2          4.1G
    PASCAL Context (60)   47.2   22.1          3.7G
    PASCAL Context (59)   43.0   16.2          3.7G
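For intuition, mBIoU scores the overlap between predicted and ground-truth boundaries within a small pixel tolerance (3 px above). Below is a rough sketch of one common formulation; it is illustrative only and may differ from the exact evaluation protocol used in the paper.

# Illustrative boundary IoU with a pixel tolerance (not necessarily the paper's protocol)
import numpy as np
import cv2

def class_boundary(mask):
    # 1-pixel-wide boundary of a binary mask via a morphological gradient
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_GRADIENT, kernel) > 0

def boundary_iou(pred_mask, gt_mask, tol_px=3):
    # Dilate both boundaries by the tolerance, then compute plain IoU
    kernel = np.ones((2 * tol_px + 1, 2 * tol_px + 1), np.uint8)
    p = cv2.dilate(class_boundary(pred_mask).astype(np.uint8), kernel) > 0
    g = cv2.dilate(class_boundary(gt_mask).astype(np.uint8), kernel) > 0
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union > 0 else 1.0

# mBIoU then averages boundary_iou over the per-class binary masks.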

Training

Download the weights of AFFormer pretrained on ImageNet-1K from Google Drive or AliDrive and put them in a folder such as ckpt/. On the Cityscapes dataset, we trained Mobile-Seed for 160K iterations with an Intel Core i9-13900K CPU and an NVIDIA RTX 4090 GPU, which took approximately 22 hours. Example: train Mobile-Seed on Cityscapes:

# Single-GPU training
bash tools/dist_train.sh ./configs/Mobile_Seed/MS_tiny_cityscapes.py 1

# Multi-GPU training
bash tools/dist_train.sh ./configs/Mobile_Seed/MS_tiny_cityscapes.py <GPU_NUM>
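If the pretrained backbone weights live somewhere other than the path the config expects, the standard MMSegmentation 0.x config mechanism lets you override it. The field layout and checkpoint filename below are assumptions; check MS_tiny_cityscapes.py for the actual keys.

# Hypothetical override config, following common MMSegmentation 0.x conventions
_base_ = ['./MS_tiny_cityscapes.py']

model = dict(
    backbone=dict(
        init_cfg=dict(type='Pretrained',
                      checkpoint='ckpt/afformer_tiny_imagenet1k.pth')))  # assumed filename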

Data preprocessing

Cityscapes

We provide processed Cityscapes data on OneDrive and Baidu disk (code: 5n7t). If you want to process the data from scratch, please follow these steps:

  • Download the files gtFine_trainvaltest.zip, leftImg8bit_trainvaltest.zip and leftImg8bit_demoVideo.zip from the Cityscapes website to data_orig/, and unzip them:
unzip data_orig/gtFine_trainvaltest.zip -d data_orig && rm data_orig/gtFine_trainvaltest.zip
unzip data_orig/leftImg8bit_trainvaltest.zip -d data_orig && rm data_orig/leftImg8bit_trainvaltest.zip
unzip data_orig/leftImg8bit_demoVideo.zip -d data_orig && rm data_orig/leftImg8bit_demoVideo.zip
  • Create training semantic labels: python data_preprocess/cityscapes_preprocess/code/createTrainIdLabelImgs.py <data_path>
  • Generate .png training semantic boundary labels by running the following command:
# In the MATLAB Command Window
run code/demoPreproc_gen_png_label.m

This will create instance-insensitive semantic boundary labels for network training in data_proc_nis/. For the difference between instance-insensitive and instance-sensitive labels, please refer to SEAL.
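Conceptually, an instance-insensitive semantic boundary marks pixels whose semantic (trainId) label differs from a neighbor's, ignoring instance identity. A rough Python sketch of that idea follows; it is illustrative only, and the repo's MATLAB pipeline remains the authoritative implementation.

# Illustrative only: mark pixels whose trainId differs from a 4-neighbor
import numpy as np

def semantic_boundary(train_ids, ignore_id=255):
    # Binary instance-insensitive boundary map from a trainId label image
    lbl = train_ids.astype(np.int32)
    edge = np.zeros(lbl.shape, dtype=bool)
    edge[:, 1:] |= lbl[:, 1:] != lbl[:, :-1]   # horizontal label changes
    edge[1:, :] |= lbl[1:, :] != lbl[:-1, :]   # vertical label changes
    edge &= lbl != ignore_id                   # drop ignored regions
    return edge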

CamVid & PASCAL Context

  • Semantic boundary label generation: python data_preprocess/camvid_pascal_preprocess/label_generator.py <dataset> <data_path>. We split the training and test sets of CamVid according to PIDNet to avoid same-area evaluation. An example invocation follows this list.
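For example, for CamVid (the dataset keyword and path below are assumptions; check the script's help output for the exact arguments):

python data_preprocess/camvid_pascal_preprocess/label_generator.py camvid /path/to/CamVid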

🔦 Demo

Here is a demo script to test a single image. For more details, refer to MMSegmentation's documentation.

python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SEG_FILE} \
[--out_sebound ${SEBOUND_FILE}] [--out_bibound ${BIBOUND_FILE}] [--device ${DEVICE_NAME}] [--palette ${PALETTE}]

Example: visualize Mobile-Seed predictions on Cityscapes:

python demo/image_demo.py demo/demo.png configs/Mobile_Seed/MS_tiny_cityscapes.py \
/path/to/checkpoint_file /path/to/outseg.png --device cuda:0 --palette cityscapes
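Inference can also be run programmatically through the standard MMSegmentation 0.x Python API. This covers the plain segmentation output; the boundary outputs are produced by the repo's demo/image_demo.py shown above.

# Plain segmentation inference via the MMSegmentation 0.x API
from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot

config = 'configs/Mobile_Seed/MS_tiny_cityscapes.py'
checkpoint = '/path/to/checkpoint_file'             # same checkpoint as above
model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'demo/demo.png')
show_result_pyplot(model, 'demo/demo.png', result)  # overlay the predicted labels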

💡 Citation

If you find this repo helpful, please give us a star. Please consider citing Mobile-Seed if this program benefits your project.

@article{liao2024mobileseed,
  title={Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots},
  author={Youqi Liao and Shuhao Kang and Jianping Li and Yang Liu and Yun Liu and Zhen Dong and Bisheng Yang and Xieyuanli Chen},
  journal={IEEE Robotics and Automation Letters},
  year={2024},
  doi={10.1109/LRA.2024.3373235}
}

🔗 Related Projects

We sincerely thank the excellent projects:

  • AFFormer for the head-free Transformer;
  • SeaFormer for the squeeze-enhanced axial Transformer;
  • FreeReg for the excellent template;
  • DDS for a novel view on deep diverse supervision;
  • DFF for dynamic feature fusion and Cityscapes data preprocessing.


Mobile-Seed's Issues

How to train the Cityscapes dataset with Mobile-Seed?

I have the following questions about training:

  1. Can I use tools/train.py for training?
  2. Where is the dataset path set?
  3. Which file contains the overall framework of the network?

Could the author please answer these questions? Thank you very much!

Dataset format conversion

Hi author, I would like to ask how to generate Cityscapes datasets that can be used for instance segmentation. After I do the format conversion using the script in the Cityscapes project, it doesn't differentiate instances with different colors.
For example:
(attached image: aachen_000000_000019_gtFine_instanceIds)
@martin-liao

demoPreproc_gen_png_label.m

Hello! I ran inference with the Mobile-Seed pre-trained weights and the results were very good, so I would like to retrain the model myself. However, I am a bit confused about how to use the demoPreproc_gen_png_label.m script. I hope you can offer some guidance. Thank you very much!

About other fusion methods

Hello!
I have had the opportunity to read your paper, in which you wrote: "... Therefore, the fusion weights should be conditioned on the input. There are dynamic fusion methods [17], [19] that can adapt the weights for semantic edge detection and semantic segmentation tasks. However, calculating fusion weights in both spatial and channel dimensions is still too cumbersome for the lightweight framework." Since [17] is the paper "Dynamic Feature Fusion for Semantic Edge Detection", I am wondering how the "adaptive weight learner" proposed in the DFF model could fuse semantic edge detection and semantic segmentation results. My guess is that inserting edge detection layers into the semantic segmentation layers and multiplying that tensor by the weights from the "adaptive weight learner" might help fuse the two tasks' outputs. But fusing different side outputs of edge detection is quite different from fusing the results of edge detection and segmentation; could it still be helpful?
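To make my guess concrete, here is a purely illustrative sketch of what I imagine a DFF-style adaptive weight learner doing (hypothetical code, not taken from the DFF or Mobile-Seed implementations):

# Purely illustrative: per-pixel adaptive weights fusing K candidate maps
import torch
import torch.nn as nn

class AdaptiveWeightFusion(nn.Module):
    def __init__(self, in_channels, num_maps):
        super().__init__()
        # weight learner: predicts one spatial weight map per candidate map
        self.learner = nn.Conv2d(in_channels, num_maps, kernel_size=1)

    def forward(self, feat, maps):
        # feat: (N, C, H, W) features that condition the weights
        # maps: (N, K, H, W) candidates, e.g. edge side-outputs, or an edge
        #       map stacked with a segmentation logit map
        w = torch.softmax(self.learner(feat), dim=1)   # (N, K, H, W) weights
        return (w * maps).sum(dim=1, keepdim=True)     # (N, 1, H, W) fused map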
I would greatly appreciate your insights on this matter. By the way, the method proposed in your work is impressive; I am also curious how this idea came to you. 😊

The bibound prediction image is blank

Hello, author! After training I was able to get very good results on the semantic segmentation and edge detection tasks, but during prediction the bibound output shows up blank, with nothing like the result in the demo. What could be the reason for this? Thank you for your answer.

bibound output:
(attached image: demo_bibound_iter160000)
