paulpanwang / pope

Welcome to the project repository for POPE (Promptable Pose Estimation), a state-of-the-art technique for 6-DoF pose estimation of any object in any scene using a single reference.

Home Page: https://paulpanwang.github.io/POPE/

Shell 0.25% Python 96.54% JavaScript 0.59% TypeScript 2.17% HTML 0.08% SCSS 0.01% Dockerfile 0.36%

pose-estimation segment-anything dinov2 image-matching

pope's Introduction

👋 Panwang Pan: https://paulpanwang.github.io/

👨‍💻 I'm a Research and Development specialist at ByteDance Pico, with prior experience as a Senior Algorithm Engineer at Alibaba Cloud. We are seeking research interns and full-time employees in Beijing or Shanghai who aspire to publish in top-tier journals and conferences. Candidates interested in my research areas are encouraged to reach out by email.

🌱 Research Focus
- 3D Object/Scene Generation
- 3D Human Generation
🎸 Resources Available
- Access to 200+ GPUs for training
- Expert mentorship and guidance
📫 How to reach out to me: [email protected]
⭐ My GitHub Profile: https://paulpanwang.github.io/


pope's Issues

About using multiple images

In your work, you mention that better results can be achieved by inputting multiple images. I would like to evaluate with multiple images as input. How should I proceed?

Thanks in advance for your help!

no module name einops

Hello, I followed your tutorial, but when I run python3 visual_sam.py I get this error:

Traceback (most recent call last):
  File "/home/apicoo3569/POPE/visual_sam.py", line 1, in <module>
    from pope_model_api import *
  File "/home/apicoo3569/POPE/pope_model_api.py", line 53, in <module>
    from src.matcher import Matcher, default_cfg
  File "/home/apicoo3569/POPE/src/matcher/__init__.py", line 1, in <module>
    from .matcher import Matcher
  File "/home/apicoo3569/POPE/src/matcher/matcher.py", line 3, in <module>
    from einops.einops import rearrange
ModuleNotFoundError: No module named 'einops'

Please help me fix this. Thank you.
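The traceback above says only that the einops package cannot be imported in the active environment, so installing it there (for example with pip install einops) is the usual remedy. As a minimal sketch, the helper below checks which modules are importable before a script is run. The function name missing_modules is hypothetical and not part of POPE.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'einops' is the module named in the traceback; extend the list as needed.
    missing = missing_modules(["einops"])
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Running this with the same interpreter that runs visual_sam.py helps rule out the case where the package was installed into a different environment.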

Explanation on the demo

Thanks for the contribution and releasing the code for this project, the work done is really interesting.

Regarding the visual_3dbbox.py demo, could you explain what prompt.txt and target.txt are? I look forward to testing on other prompt and target images.

question about dinov2

Hello, I have a question about how to use DINOv2. Could you please help me? I instantiated a vit_small ViT model and tried to load the pretrained weights using the load_pretrained_weights function from utils. Here's the code I wrote:
self.vit_model = vits.__dict__['vit_small'](...)
load_pretrained_weights(self.vit_model,'https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth', None)
However, I encountered the following error:
Traceback (most recent call last):
  File "/data/PycharmProjects/train.py", line 124, in <module>
    model = model(aff_classes=args.num_classes)
  File "/data/PycharmProjects/models/locate.py", line 89, in __init__
    load_pretrained_weights(self.vit_model, pretrained_url, None)
  File "/data/PycharmProjects/models/dinov2/dinov2/utils/utils.py", line 32, in load_pretrained_weights
    msg = model.load_state_dict(state_dict, strict=False)
  File "/home/ustc/anaconda3/envs/locate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DinoVisionTransformer:
    size mismatch for pos_embed: copying a param with shape torch.Size([1, 1370, 384]) from checkpoint, the shape in current model is torch.Size([1, 257, 384]).

Could you please help me understand what might be causing this issue? Thank you for your assistance.
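One possible reading of the two shapes in the error: a ViT positional embedding has one entry per image patch plus one class token, so its length follows from the image size and patch size. The sketch below does that arithmetic. The image sizes 518 and 224 are assumptions chosen because they reproduce the reported numbers with patch size 14; the function name is hypothetical.

```python
def vit_num_tokens(img_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Number of positional-embedding entries for a square image.

    extra_tokens counts non-patch tokens (1 for a single class token).
    """
    if img_size % patch_size != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    side = img_size // patch_size  # patches along one image side
    return side * side + extra_tokens


if __name__ == "__main__":
    print(vit_num_tokens(518, 14))  # 1370, the checkpoint's pos_embed length
    print(vit_num_tokens(224, 14))  # 257, the current model's pos_embed length
```

If this reading is right, the model was instantiated for a smaller input size than the checkpoint was saved with, and constructing it with the matching image size (or interpolating the checkpoint's positional embedding) would be the things to try.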

Do you plan to release the code?

I see that most of this repo so far does not have code to run POPE. Do you plan to release it in the future?

About demo

Hello, I would like to run your code. A simple demo, a sample script, or inference on a complete dataset would be fine. Could you tell me the corresponding command or method?

Thanks in advance for your help.

Estimate the 6DoF Object Pose

Hi,

Thanks for the nice work!

I noticed that only the relative 3D rotation accuracy is evaluated and reported in the paper. How about the relative 3D translation?
Is it possible to estimate the full 6DoF pose using POPE?

xFormers not available

Greetings,
It shows that xFormers is not available, but I checked the PyTorch index and there is no xformers build for CUDA 11.3. May I ask how you installed xformers?

The index I checked and the warning are as follows:

https://download.pytorch.org/whl/cu113
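A message like "xFormers not available" typically comes from an optional-import guard: the code tries to import the accelerated library and falls back to a plain implementation when the import fails. The sketch below shows that general pattern only. Whether the DINOv2 code bundled with POPE follows exactly this structure is an assumption, and attention_backend is a hypothetical helper.

```python
import warnings

try:
    # Optional dependency: only used when it is installed.
    import xformers.ops  # noqa: F401
    XFORMERS_AVAILABLE = True
except ImportError:
    warnings.warn("xFormers not available")
    XFORMERS_AVAILABLE = False


def attention_backend() -> str:
    """Report which attention implementation this sketch would select."""
    return "xformers" if XFORMERS_AVAILABLE else "pytorch"


if __name__ == "__main__":
    print(attention_backend())
```

With a guard of this kind the warning is informational: the program keeps running on the fallback path, usually at the cost of speed or memory.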

Bounding Box Visualization

I saw that you mention visualization in the paper: "It is important to note that the visualization of object boxes incorporates ground-truth translation to address scale ambiguity."

Q. Does it also need the 3D size of the object to visualize, as in the code below? Or is the 3D size of the matched CAD model not needed to estimate the bounding box from the pose?

POPE/visual_3dbbox.py

Lines 31 to 41 in 92c5cdb

x, y, z = 3.793429999999999719e-02, 3.879959999999999659e-02, 4.588450000000000167e-02
_3d_bbox = np.array([
    [-x, -y, -z],
    [-x, -y, z],
    [-x, y, z],
    [-x, y, -z],
    [x, -y, -z],
    [x, -y, z],
    [x, y, z],
    [x, y, -z],
])
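The quoted lines hard-code three half-extents and build the eight box corners from them, which suggests that drawing the box does require the object's 3D size in addition to the pose. As a rough sketch of how such corners could be projected into an image with a pinhole camera, the code below uses plain Python. The intrinsics K, rotation R, and translation t are made-up illustrative values, the function names are hypothetical, and the corner ordering differs from the one in visual_3dbbox.py.

```python
from itertools import product


def box_corners(hx, hy, hz):
    """Eight corners of an axis-aligned box centred at the origin."""
    return [(sx * hx, sy * hy, sz * hz) for sx, sy, sz in product((-1, 1), repeat=3)]


def project(points, K, R, t):
    """Project 3D object points to pixels: x = K (R X + t), zero skew assumed."""
    pixels = []
    for X in points:
        # Transform from the object frame into the camera frame.
        Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        if Xc[2] <= 0:
            raise ValueError("point is behind the camera")
        u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
        v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
        pixels.append((u, v))
    return pixels


if __name__ == "__main__":
    K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]  # assumed
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]          # assumed
    t = (0.0, 0.0, 0.5)                                              # assumed, metres
    corners = box_corners(0.0379343, 0.0387996, 0.0458845)
    for u, v in project(corners, K, R, t):
        print(round(u, 1), round(v, 1))
```

Scaling t changes the projected box size, which is consistent with the paper's remark that ground-truth translation is used for visualization to address scale ambiguity.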

The OnePose dataset download link is invalid

Runnable Dockerfile

Hi, I have built a Dockerfile that passes the current codebase's test command. Should I just submit a pull request?

P.S.

The codebase is runnable, but some problems remain, such as the unavailable xFormers.

The generation of data pairs

Thank you for making the awesome project open-source!

I've noticed that the data pairs are pre-defined in the JSON files (as for LMO). How do you generate these data pairs: randomly, or according to some principle?

Looking forward to your reply.

How to calculate K0

Hello, I am currently trying to use your algorithm. I was wondering how you calculated K0, the intrinsic matrix for the reference image. The reference image is in turn a section of a scene image, right?
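If the reference image really is a crop of a larger scene image, the standard pinhole relation is that cropping shifts the principal point by the crop offset and resizing scales the focal lengths and principal point. The sketch below shows only that textbook adjustment. Whether POPE computes K0 this way is not confirmed by the source, the function names are hypothetical, and the half-pixel offset some libraries apply on resize is ignored.

```python
def crop_intrinsics(K, x0, y0):
    """Intrinsics after cropping with top-left corner at pixel (x0, y0)."""
    return [
        [K[0][0], K[0][1], K[0][2] - x0],
        [K[1][0], K[1][1], K[1][2] - y0],
        [K[2][0], K[2][1], K[2][2]],
    ]


def resize_intrinsics(K, sx, sy):
    """Intrinsics after resizing by factors sx (width) and sy (height)."""
    return [
        [K[0][0] * sx, K[0][1] * sx, K[0][2] * sx],
        [K[1][0] * sy, K[1][1] * sy, K[1][2] * sy],
        [K[2][0], K[2][1], K[2][2]],
    ]


if __name__ == "__main__":
    K_scene = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]  # assumed
    K_crop = crop_intrinsics(K_scene, x0=100, y0=50)
    K_ref = resize_intrinsics(K_crop, sx=0.5, sy=0.5)
    for row in K_ref:
        print(row)
```

The focal lengths are unchanged by the crop itself; only a later resize rescales them.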

CrossAtten(default_cfg['coarse'], token_dim, ['cross']*2 ) TypeError: __init__() takes 2 positional arguments but 4 were given

Namespace(w_tr=10.0, w_rot=10.0, warmup=10000, batch=32, steps=120000, lr=0.003, clip=2.5, weight_decay=1e-05, num_workers=4, no_ddp=True, gpus=4, ckpt='', name='bla', exp=None, use_mini_dataset=False, dataset='objverse', no_pos_encoding=False, noess=False, cross_features=False, use_single_softmax=False, l1_pos_encoding=False)
xFormers not available
xFormers not available
Traceback (most recent call last):
  File "/data/users/liming/CV/POPE/train_dinov2_pose.py", line 243, in <module>
    train(args.gpus, args)
  File "/data/users/liming/CV/POPE/train_dinov2_pose.py", line 50, in train
    model = DINOv2Poser(default_cfg)
  File "/data/users/liming/CV/POPE/models/dinov2_regression_modelv3.py", line 105, in __init__
    self.cross_attentionAll = CrossAtten(default_cfg['coarse'], token_dim, ['cross']*2 )
TypeError: __init__() takes 2 positional arguments but 4 were given
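Python raises this exact kind of message when a constructor that accepts a single argument besides self is called with three. The sketch below reproduces it with a stand-in class. CrossAttenSketch and its one-parameter signature are hypothetical, and the config values are made up; the real CrossAtten definition in the repository is not shown in the issue.

```python
class CrossAttenSketch:
    """Stand-in whose constructor takes only a config (plus self)."""

    def __init__(self, config):
        self.config = config


def try_construct():
    """Call the constructor with three arguments and return the error text."""
    try:
        CrossAttenSketch({"d_model": 256}, 384, ["cross"] * 2)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_construct())
```

One plausible cause, then, is that the CrossAtten class being imported has a different signature from the one the call site in dinov2_regression_modelv3.py expects, for example because of a mismatched file version.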

Relative pose vs Actual pose

Hi,

I am new to pose estimation, so this question might be stupid. Is it correct that your method, POPE, gives a relative pose, while methods like OnePose provide the actual pose?
I would greatly appreciate your help!
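To make the distinction concrete: with the convention X_cam = R X_obj + t, the relative pose between two views of the same object is R_rel = R2 R1^T and t_rel = t2 - R_rel t1, and an absolute pose in the second view can be recovered from a known reference pose by composing the two. The sketch below uses that convention in plain Python. The convention, the function names, and the example poses are assumptions for illustration, not POPE's API.

```python
def transpose3(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec3(A, v):
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]


def relative_pose(R1, t1, R2, t2):
    """Pose taking view-1 camera coordinates to view-2 camera coordinates."""
    R_rel = matmul3(R2, transpose3(R1))
    Rt1 = matvec3(R_rel, t1)
    t_rel = [t2[i] - Rt1[i] for i in range(3)]
    return R_rel, t_rel


def compose(R1, t1, R_rel, t_rel):
    """Absolute pose in view 2 from the view-1 pose and the relative pose."""
    R2 = matmul3(R_rel, R1)
    Rt1 = matvec3(R_rel, t1)
    t2 = [Rt1[i] + t_rel[i] for i in range(3)]
    return R2, t2


if __name__ == "__main__":
    R1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # assumed reference pose
    t1 = [0.0, 0.0, 1.0]
    R2 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
    t2 = [0.1, 0.0, 1.0]
    R_rel, t_rel = relative_pose(R1, t1, R2, t2)
    print(R_rel, t_rel)
    print(compose(R1, t1, R_rel, t_rel))
```

When the relative translation is estimated from two images alone, it is generally known only up to scale, so recovering a metric absolute pose needs extra information such as a known reference pose and scale.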

