Zero-Shot Pose Estimation and Tracking of Autonomous Mobile Robots - A Multi-Camera Multi-Robot Perception Framework

This repo contains code, data and instructions to replicate the paper 'Zero-Shot Pose Estimation and Tracking of Autonomous Mobile Robots - A Multi-Camera Multi-Robot Perception Framework'. The work is divided into 5 phases: zero-shot robot pose estimation, pose uncertainty estimation, camera placement optimization, data association for multi-camera multi-robot navigation, and sensor fusion. The overall proposed framework is shown below. We also provide instructions for model-based instance-level training for pose estimation, to compare it with the proposed zero-shot pipeline.

[Figure: overall framework of the proposed multi-camera multi-robot perception pipeline]

Zero-Shot Pose Estimation and Uncertainty Quantification

Instructions for stage-wise execution of the pose pipeline -

Step 0 - Follow this notebook for segmenting out the target robot from the query image using a text prompt.

Step 1 - Follow the instructions provided in this notebook to generate segmentation masks and associated centroids for the target robot. Save the mask and centroid locations in a text file.

Step 2 - Run ALIKE to obtain 2D keypoints. (Note - We also re-train ALIKE and fine-tune the model with custom data. Custom weights will be made available soon!)

git clone https://github.com/Shiaoming/ALIKE
cd ALIKE
pip install -r requirements.txt
python demo.py -h
usage: demo.py [-h] [--model {alike-t,alike-s,alike-n,alike-l}]
               [--device DEVICE] [--top_k TOP_K] [--scores_th SCORES_TH]
               [--n_limit N_LIMIT] [--no_display] [--no_sub_pixel]
               input

positional arguments:
  input                 Image directory or movie file or "camera0" (for
                        webcam0).

optional arguments:
  -h, --help            show this help message and exit
  --model {alike-t,alike-s,alike-n,alike-l}
                        The model configuration
  --device DEVICE       Running device (default: cuda).
  --top_k TOP_K         Detect top K keypoints. -1 for threshold based mode,
                        >0 for top K mode. (default: -1)
  --scores_th SCORES_TH
                        Detector score threshold (default: 0.2).
  --n_limit N_LIMIT     Maximum number of keypoints to be detected (default:
                        5000).
  --no_display          Do not display images to screen. Useful if running
                        remotely (default: False).
  --no_sub_pixel        Do not detect sub-pixel keypoints (default: False).

Step 3 - Run keypointness.py to filter 2D keypoints based on confidence scores and assign semantic labels.
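
keypointness.py implements this filtering step. For orientation only, here is a rough sketch of the idea, assuming ALIKE's keypoints and per-point scores have been saved as NumPy arrays and the Step 1 mask is available (file names and the threshold are illustrative):

import numpy as np

# Illustrative inputs: ALIKE keypoints (N, 2), their scores (N,), and the
# binary robot mask (H, W) produced in Step 1.
keypoints = np.load("alike_keypoints.npy")
scores = np.load("alike_scores.npy")
mask = np.load("robot_mask.npy").astype(bool)

SCORE_TH = 0.2  # illustrative confidence threshold

# Keep keypoints that are confident and fall inside the robot mask.
xs = keypoints[:, 0].astype(int)
ys = keypoints[:, 1].astype(int)
keep = (scores >= SCORE_TH) & mask[ys, xs]
filtered = keypoints[keep]

# Attach a placeholder semantic label per keypoint (e.g. the robot instance id).
labels = np.zeros((len(filtered), 1))
np.savetxt("filtered_keypoints_2d.txt", np.hstack([filtered, labels]))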

Step 4 - Run ZoeDepth using the instructions below to obtain a depth map for the query image. This depth map is used to create an initial point cloud of the target robot. The corresponding 3D keypoint locations can also be computed using the pinhole camera model.

  1. It is recommended to fetch the latest MiDaS repo via torch hub.
import torch
torch.hub.help("intel-isl/MiDaS", "DPT_BEiT_L_384", force_reload=True)  # Triggers fresh download of MiDaS repo
  2. Use ZoeDepth to predict depth in the query image
##### sample prediction
# Load the ZoeD_N model via torch hub (per the ZoeDepth README)
model_zoe_n = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
zoe = model_zoe_n.to(DEVICE)

# Local file
from PIL import Image
image = Image.open("/path/to/image.jpg").convert("RGB")  # load
depth_numpy = zoe.infer_pil(image)  # as numpy

depth_pil = zoe.infer_pil(image, output_type="pil")  # as 16-bit PIL Image

depth_tensor = zoe.infer_pil(image, output_type="tensor")  # as torch tensor

# Tensor 
from zoedepth.utils.misc import pil_to_batched_tensor
X = pil_to_batched_tensor(image).to(DEVICE)
depth_tensor = zoe.infer(X)

# From URL
from zoedepth.utils.misc import get_image_from_url

# Example URL
URL = "path_to_image"

image = get_image_from_url(URL)  # fetch
depth = zoe.infer_pil(image)

# Save raw
from zoedepth.utils.misc import save_raw_16bit
fpath = "/path/to/output.png"
save_raw_16bit(depth, fpath)

# Colorize output
from zoedepth.utils.misc import colorize

colored = colorize(depth)

# save colored output
fpath_colored = "/path/to/output_colored.png"
Image.fromarray(colored).save(fpath_colored)

Generate a point cloud from the above depth map by following these instructions. This serves as the initial point cloud.
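
The linked instructions cover point-cloud generation end to end; purely as an illustration of the pinhole back-projection mentioned in Step 4, depth can be lifted to 3D as follows (the intrinsics are placeholders for your calibrated camera):

import numpy as np

# Placeholder intrinsics (fx, fy, cx, cy) from camera calibration.
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0

depth = depth_numpy  # (H, W) metric depth map from ZoeDepth (see above)
H, W = depth.shape
u, v = np.meshgrid(np.arange(W), np.arange(H))

# Pinhole back-projection: Z = depth, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
Z = depth
X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy
points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

# Optionally keep only the points inside the robot mask from Step 1 before
# saving the result as the initial point cloud.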

Step 5 - Follow these instructions and install COLMAP to build a 3D SfM representation of the target object. This serves as the final, accurate point cloud.

Step 6 - Run icp.py to align the two point clouds and obtain a transformation matrix T. Use T to refine the locations of the 3D keypoints and obtain more accurate estimates.
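
icp.py performs this alignment; a minimal sketch of the same operation with Open3D's ICP, under assumed file names for the two clouds and the 3D keypoints, looks roughly like this:

import numpy as np
import open3d as o3d

# Illustrative file names: ZoeDepth-based initial cloud and COLMAP SfM cloud.
source = o3d.io.read_point_cloud("initial_depth_cloud.ply")
target = o3d.io.read_point_cloud("colmap_sfm_cloud.ply")

threshold = 0.05   # max correspondence distance (metres); tune per setup
init = np.eye(4)   # initial alignment guess

reg = o3d.pipelines.registration.registration_icp(
    source, target, threshold, init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
T = reg.transformation  # 4x4 transform aligning the initial cloud to the SfM cloud

# Apply T to the 3D keypoints (stored as an (N, 3) array) to refine them.
kps = np.loadtxt("keypoints_3d.txt")
kps_h = np.hstack([kps, np.ones((len(kps), 1))])
refined = (T @ kps_h.T).T[:, :3]
np.savetxt("refined_keypoints_3d.txt", refined)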

Step 7 - Use cv2.solvePnP to estimate the robot's 6D pose. Refer here for more details.
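
A minimal example of the PnP step, assuming the refined 3D keypoints and their 2D detections are stored in corresponding order and the camera intrinsics are known (paths and intrinsics below are placeholders):

import cv2
import numpy as np

# Refined 3D keypoints (Step 6) and their corresponding 2D detections (Step 3).
object_points = np.loadtxt("refined_keypoints_3d.txt").astype(np.float64)  # (N, 3)
image_points = np.loadtxt("filtered_keypoints_2d.txt")[:, :2].astype(np.float64)  # (N, 2)

# Placeholder intrinsics from camera calibration.
K = np.array([[615.0, 0.0, 320.0],
              [0.0, 615.0, 240.0],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)  # assume negligible lens distortion for illustration

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix
# (R, tvec) is the robot's 6D pose in the camera frame.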

Step 8 - Run zero_uncertain.py to estimate the covariance associated with the above pipeline.
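
The exact covariance estimation method is described in the paper and implemented in zero_uncertain.py. Purely as a loose illustration of the idea (not the paper's method), a Monte Carlo style estimate perturbs the 2D keypoints and takes the sample covariance of the resulting PnP poses:

import cv2
import numpy as np

object_points = np.loadtxt("refined_keypoints_3d.txt").astype(np.float64)
image_points = np.loadtxt("filtered_keypoints_2d.txt")[:, :2].astype(np.float64)
K = np.array([[615.0, 0.0, 320.0], [0.0, 615.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

def pose_vector(img_pts):
    # 6-vector [rvec, tvec] from a PnP solve.
    _, rvec, tvec = cv2.solvePnP(object_points, img_pts, K, dist)
    return np.concatenate([rvec.ravel(), tvec.ravel()])

# Perturb the 2D keypoints with 1-pixel Gaussian noise and re-solve PnP.
samples = np.stack([pose_vector(image_points + np.random.normal(0, 1.0, image_points.shape))
                    for _ in range(100)])
cov = np.cov(samples, rowvar=False)  # 6x6 covariance of [rvec, tvec]
print(np.linalg.norm(cov))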

We are working on providing a Docker image for easier setup. Stay tuned!

Instance Level Robot Pose Estimation (Optional)

Download the pre-trained weights using this link. If you wish to retrain the model, download the dataset from here: Volta Pose. We fine-tune the YOLOv8n-pose model for keypoint detection. Refer to this link for more details on the model architecture.

Model Training

from ultralytics import YOLO
# Load a model
model = YOLO('yolov8n-pose.pt')  # load a pretrained model
# Train the model
results = model.train(data='volta_pose.yaml', epochs=200, imgsz=640)

Place the trained model in the same directory as markerless_cam_pose.py and execute the script. The pose of the robot is estimated with respect to the camera's global origin.
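
For a quick sanity check of the trained model outside markerless_cam_pose.py, the detected keypoints can be read directly from the Ultralytics results (weight and image paths below are placeholders):

from ultralytics import YOLO

# Placeholder paths for the trained weights and a test image.
model = YOLO('runs/pose/train/weights/best.pt')
results = model('test_frame.jpg')

for r in results:
    # (num_instances, num_keypoints, 2) pixel coordinates of detected keypoints
    print(r.keypoints.xy)
    # per-instance bounding boxes in xyxy format
    print(r.boxes.xyxy)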

Instance Level Pose Uncertainty Estimation (Optional)

  1. Install ultralytics and replace the default predict.py file with the nms_predict.py provided in this repository. The model is expected to return 300 bounding boxes during inference. NOTE - You may have to rebuild/reinstall the package for the changes to take effect.
  2. Run the uncertainty_computation.py script to estimate epistemic model uncertainty and associated covariance. (refer to the paper for the exact method).
  3. Sample covariance data is provided in cov_data.txt.

Camera Placement Optimization

  1. Replace the path to your map file in the cam_placement_optimizer script.
  2. Tune the hyperparameters such as covariance_threshold, d_max and grid size based on the nature of the operating environment.
  3. Run the cam_placement_optimizer script; the computed camera locations are printed on the map.
  4. Using the homography of the camera, project the camera positions from the image plane to the world plane (see the sketch below). NOTE: This algorithm only determines the x and y locations of each camera; the cameras are assumed to be mounted at a fixed height d = 750 cm.
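
A minimal sketch of step 4, assuming the image-to-world homography H has already been estimated (e.g. with cv2.findHomography from ground-plane correspondences) and saved; the camera pixel coordinates are illustrative:

import cv2
import numpy as np

# H: 3x3 image-plane -> world-plane homography (assumed already estimated).
H = np.load("image_to_world_homography.npy").astype(np.float64)

# Camera x, y locations in pixels, as printed by the optimizer (illustrative values).
cam_px = np.array([[[412.0, 230.0]], [[118.0, 305.0]]], dtype=np.float64)

cam_world = cv2.perspectiveTransform(cam_px, H)  # (N, 1, 2) world-plane x, y
print(cam_world.reshape(-1, 2))
# The mounting height is fixed separately at d = 750 cm.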

Multi-Robot Tracking and Data Association

  1. Execute the track_volta_annotated.py script. It tracks one or more robots within the camera's field of view; the corresponding track locations/poses of the robots are printed on every frame.
  2. The association.py script associates multiple sensor measurements with their respective robot tracks. It runs the Hungarian algorithm to minimize a cost function that assigns measurements to tracks; the cost depends on the norm of the covariance associated with the camera (a minimal sketch of this assignment step follows this list). The cases of overlapping and non-overlapping camera views are handled separately. Refer to the paper for more details.
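
The core of the assignment step in association.py is a Hungarian solve over a covariance-dependent cost. A minimal sketch of that step with SciPy, where the cost matrix is a placeholder built from measurement-to-track distances rather than the paper's exact cost function:

import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j]: cost of assigning measurement j to track i. In the paper the cost
# depends on the norm of the camera covariance; here it is a placeholder
# built from measurement-to-track distances.
tracks = np.array([[1.0, 2.0], [4.0, 1.5], [0.5, 5.0]])        # current track positions
measurements = np.array([[1.1, 2.1], [0.4, 5.2], [3.9, 1.4]])  # new detections

cost = np.linalg.norm(tracks[:, None, :] - measurements[None, :, :], axis=-1)

row_ind, col_ind = linear_sum_assignment(cost)  # Hungarian algorithm
for t, m in zip(row_ind, col_ind):
    print(f"track {t} <- measurement {m} (cost {cost[t, m]:.3f})")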

Sensor Fusion

  1. In this work, we use RTAB-Map as the base RGB-D SLAM framework. We use the robot_localization package and implement an extended Kalman filter that communicates with the RTAB-Map node.
  2. We consider 5 different sensors - onboard LiDAR, IMU, wheel encoders, an onboard depth camera and an external monocular camera.
  3. Install RTAB-Map using the instructions provided in the RTAB-Map repo and launch it with the custom launch file camera_sensor_fusion.launch provided in this repo. NOTE - The initial covariance and the noise covariance matrices might need some tuning based on the experimental setup and operating conditions.

We extend our thanks to the many wonderful works that were used in this project -

  1. Ultralytics
  2. robot_localization
  3. RTABMap
  4. Colmap
  5. Segment Anything Model
  6. ALIKE
  7. ZoeDepth

Additional details can be found here.

Recommended citation

@inproceedings{raghavan2024multiviewperception,
  author = {Dharini Raghavan and Raghu Krishnapuram and Bharadwaj Amrutur},
  pages = {1--8},
  title = {{Zero-Shot Pose Estimation and Tracking of Autonomous Mobile Robots - A Multi-Camera Multi-Robot Perception Framework}},
  year = {2024}
}
