
paulpanwang / pope

Stars: 126 · Watchers: 10 · Forks: 11 · Size: 214.75 MB

Welcome to the project repository for POPE (Promptable Pose Estimation), a state-of-the-art technique for 6-DoF pose estimation of any object in any scene using a single reference.

Home Page: https://paulpanwang.github.io/POPE/

Languages: Shell 0.25%, Python 96.54%, JavaScript 0.59%, TypeScript 2.17%, HTML 0.08%, SCSS 0.01%, Dockerfile 0.36%
Topics: pose-estimation, segment-anything, dinov2, image-matching

pope's Introduction

👨‍💻 I'm a research and development specialist at ByteDance Pico, previously a Senior Algorithm Engineer at Alibaba Cloud. We are seeking research interns and full-time employees in Beijing or Shanghai who aspire to publish in top-tier journals and conferences. Candidates passionate about my research areas are encouraged to reach out via email.

  • 🌱 Research Focus

    • 3D Object/Scene Generation
    • 3D Human Generation
  • 🎸 Resources Available

    • Access to 200+ GPUs for training
    • Expert mentorship and guidance
  • 📫 How to reach out to me: [email protected]

  • ⭐ My GitHub Profile: https://paulpanwang.github.io/

pope's People

Contributors

athinkingneal, ir1d, paulpanwang


pope's Issues

About using multiple images

Your work mentions that better results can be achieved by inputting multiple images. I would like to evaluate with multiple images as input; how should I proceed?

Thanks in advance for your help!

No module named einops

Hello, I followed your tutorial, but when I run python3 visual_sam.py, I get this error:
Traceback (most recent call last):
  File "/home/apicoo3569/POPE/visual_sam.py", line 1, in <module>
    from pope_model_api import *
  File "/home/apicoo3569/POPE/pope_model_api.py", line 53, in <module>
    from src.matcher import Matcher, default_cfg
  File "/home/apicoo3569/POPE/src/matcher/__init__.py", line 1, in <module>
    from .matcher import Matcher
  File "/home/apicoo3569/POPE/src/matcher/matcher.py", line 3, in <module>
    from einops.einops import rearrange
ModuleNotFoundError: No module named 'einops'

Please help me fix this. Thank you.
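The traceback shows that the einops package is not installed in the active environment, so installing it (for example with pip install einops) should resolve this particular error. Before running the demo, a quick check like the sketch below can report every missing module at once instead of failing on the first one. The module list is an assumption: einops comes from the traceback above, while torch and numpy are only illustrative; the real requirements should come from whatever dependency list the repository provides.

```python
import importlib.util

# Modules to check. "einops" comes from the traceback above; the others are
# illustrative assumptions, not the project's confirmed requirement list.
REQUIRED = ["einops", "torch", "numpy"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All listed modules are importable.")
```

Note that a module name and its pip package name do not always match (for example, the cv2 module is installed from opencv-python), so the printed pip command is only a starting point.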

Explanation on the demo

Thanks for the contribution and releasing the code for this project, the work done is really interesting.

Regarding the visual_3dbbox.py demo, could you explain what prompt.txt and target.txt contain? I look forward to testing with other prompt and target images.

Question about DINOv2

Hello, I have a question about how to use DINOv2. Could you please help me? I instantiated a vit_small ViT model and tried to load the pretrained weights using the load_pretrained_weights function from utils. Here's the code I wrote:
self.vit_model = vits.__dict__['vit_small'](...)
load_pretrained_weights(self.vit_model,'https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth', None)
However, I encountered the following error:
Traceback (most recent call last):
  File "/data/PycharmProjects/train.py", line 124, in <module>
    model = model(aff_classes=args.num_classes)
  File "/data/PycharmProjects/models/locate.py", line 89, in __init__
    load_pretrained_weights(self.vit_model, pretrained_url, None)
  File "/data/PycharmProjects/models/dinov2/dinov2/utils/utils.py", line 32, in load_pretrained_weights
    msg = model.load_state_dict(state_dict, strict=False)
  File "/home/ustc/anaconda3/envs/locate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DinoVisionTransformer:
    size mismatch for pos_embed: copying a param with shape torch.Size([1, 1370, 384]) from checkpoint, the shape in current model is torch.Size([1, 257, 384]).

Could you please help me understand what might be causing this issue? Thank you for your assistance.
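The two shapes in the error can be explained with patch-grid arithmetic. A ViT position embedding holds one entry per image patch plus one for the class token, so with patch size 14 a 224×224 input gives 16×16 + 1 = 257 entries, while the checkpoint's 1370 entries correspond to a 37×37 grid, i.e. a 518×518 input. The sketch below does this arithmetic; it assumes a square input and a single class token (no register tokens), and the helper names are hypothetical. If this reading is right, building the model with img_size=518 and patch_size=14, which appears to be how the official DINOv2 torch.hub models are constructed, should let the checkpoint load; DinoVisionTransformer then interpolates the position embedding for other input resolutions at run time.

```python
import math


def num_pos_tokens(img_size, patch_size, cls_tokens=1):
    """Position-embedding length for a square image: one per patch plus class token(s)."""
    grid = img_size // patch_size
    return grid * grid + cls_tokens


def infer_img_size(num_tokens, patch_size, cls_tokens=1):
    """Recover the square training resolution implied by a pos_embed length."""
    patches = num_tokens - cls_tokens
    grid = math.isqrt(patches)
    if grid * grid != patches:
        raise ValueError(f"{num_tokens} tokens do not form a square patch grid")
    return grid * patch_size


print(num_pos_tokens(224, 14))   # 257  (shape of the freshly built model)
print(num_pos_tokens(518, 14))   # 1370 (shape stored in the checkpoint)
print(infer_img_size(1370, 14))  # 518
```

math.isqrt requires Python 3.8 or newer; the traceback shows a Python 3.7 environment, where int(round(math.sqrt(patches))) gives the same result for numbers this small.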

About demo

Hello, I would like to run your code. A simple demo, a sample script, or inference on a complete dataset would all be fine. Could you tell me the corresponding command or method to run it?

Thanks in advance for your help.

Estimate the 6DoF Object Pose

Hi,

Thanks for the nice work!

I noticed that only the relative 3D rotation accuracy is evaluated and reported in the paper. What about the relative 3D translation?
Is it possible to estimate the full 6DoF pose using POPE?
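For context on this question: a relative pose recovered from two-view correspondences generally determines the translation only up to an unknown scale, which is consistent with the paper's note, quoted in another issue on this page, that ground-truth translation is used when visualizing boxes. A common way to evaluate such estimates is the geodesic angle for rotation and the angle between directions for translation. The sketch below shows these two standard metrics; it is not claimed to be the exact metric used in the POPE paper, and the function names are hypothetical.

```python
import numpy as np


def rotation_error_deg(R_pred, R_gt):
    """Geodesic distance between two rotation matrices, in degrees."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_direction_error_deg(t_pred, t_gt):
    """Angle between translation directions; the unknown scale is ignored."""
    a = t_pred / np.linalg.norm(t_pred)
    b = t_gt / np.linalg.norm(t_gt)
    return float(np.degrees(np.arccos(np.clip(a @ b, -1.0, 1.0))))


def rot_z(deg):
    """Rotation about the z axis by `deg` degrees (helper for the example)."""
    th = np.radians(deg)
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


print(rotation_error_deg(rot_z(30), rot_z(0)))  # approximately 30
print(translation_direction_error_deg(np.array([1.0, 0, 0]), np.array([2.0, 0, 0])))  # 0.0
```

Because the translation metric normalizes both vectors, a prediction that is correct up to scale scores zero error, which is exactly the information a two-view estimate can provide.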

Bounding Box Visualization

I saw that you mentioned visualization in the paper, "It is important to note that the visualization of object boxes incorporates ground-truth translation to address scale ambiguity."

Q. Does the visualization also need the object's 3D size, as in the code below? Or can the bounding box be estimated from the pose without the 3D size of the matched CAD model?

POPE/visual_3dbbox.py

Lines 31 to 41 in 92c5cdb

x, y, z = 3.793429999999999719e-02, 3.879959999999999659e-02, 4.588450000000000167e-02
_3d_bbox = np.array([
    [-x, -y, -z],
    [-x, -y, z],
    [-x, y, z],
    [-x, y, -z],
    [x, -y, -z],
    [x, -y, z],
    [x, y, z],
    [x, y, -z],
])
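To see why the object's size enters, note that drawing a 3D box means projecting its eight corners with the pinhole model u ~ K (R X + t): the corners X are built from the box half-extents (the x, y, z values in the excerpt above), and t fixes where the box lands in the image. The sketch below builds the corners in the same order as that excerpt and projects them. The function names, the numpy formulation, and the example intrinsics are assumptions for illustration, not the repository's code.

```python
import numpy as np

# Corner sign pattern in the same order as the _3d_bbox array in visual_3dbbox.py.
CORNER_SIGNS = np.array([
    [-1, -1, -1], [-1, -1, 1], [-1, 1, 1], [-1, 1, -1],
    [1, -1, -1], [1, -1, 1], [1, 1, 1], [1, 1, -1],
], dtype=float)


def box_corners(half_extents):
    """8 corners of an axis-aligned box centred on the object origin."""
    return CORNER_SIGNS * np.asarray(half_extents, dtype=float)


def project_points(X_obj, R, t, K):
    """Project Nx3 object-frame points to Nx2 pixels (pinhole, no distortion)."""
    X_cam = X_obj @ R.T + t          # object frame -> camera frame
    uvw = X_cam @ K.T                # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide


K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
corners = box_corners([0.1, 0.1, 0.1])
pixels = project_points(corners, np.eye(3), np.array([0.0, 0.0, 1.0]), K)
print(pixels.shape)  # (8, 2)
```

Multiplying both the half-extents and t by the same factor leaves the projected box unchanged, which is the scale ambiguity the paper's note refers to: the image alone cannot separate object size from distance.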

Runnable Dockerfile

Hi, I have built a Dockerfile that passes the current codebase's test command. Should I just submit a pull request?

P.S.

The codebase is runnable, but some problems remain, such as xFormers being unavailable.

The generation of data pairs

Thank you for making the awesome project open-source!

I've noticed that the data pairs are predefined in the JSON files (e.g., for LMO). How do you generate these pairs: randomly, or according to some principle?

Looking forward to your reply.

How to calculate K0

Hello, I am currently trying to use your algorithm. How did you calculate K0, the intrinsic matrix of the reference image? The reference image is in turn a crop of a scene image, right?
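If the reference image is indeed a crop of a full scene image (this is the question being asked, not something confirmed here), its intrinsics follow from the scene camera's K by standard pinhole geometry: cropping shifts the principal point by the crop's top-left corner, and resizing scales the first and second rows of K by the horizontal and vertical resize factors. The sketch below is that textbook update with a hypothetical helper name and example values; it ignores half-pixel-centre conventions, which some pipelines handle differently.

```python
import numpy as np


def crop_resize_intrinsics(K, x0, y0, crop_w, crop_h, out_w, out_h):
    """Intrinsics of a crop starting at pixel (x0, y0), resized to (out_w, out_h).

    Textbook pinhole update; half-pixel-centre conventions are ignored.
    """
    K_new = np.asarray(K, dtype=float).copy()
    # Cropping moves the pixel origin to the crop's top-left corner.
    K_new[0, 2] -= x0
    K_new[1, 2] -= y0
    # Resizing scales pixel coordinates: row 0 by sx, row 1 by sy.
    K_new[0, :] *= out_w / crop_w
    K_new[1, :] *= out_h / crop_h
    return K_new


K_scene = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K0 = crop_resize_intrinsics(K_scene, x0=220, y0=140, crop_w=200, crop_h=200, out_w=100, out_h=100)
print(K0)
```

A quick sanity check is to project one 3D point with both matrices: its pixel in the crop should equal its scene pixel minus (x0, y0), multiplied by the resize factors.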

CrossAtten(default_cfg['coarse'], token_dim, ['cross']*2 ) TypeError: __init__() takes 2 positional arguments but 4 were given

Namespace(w_tr=10.0, w_rot=10.0, warmup=10000, batch=32, steps=120000, lr=0.003, clip=2.5, weight_decay=1e-05, num_workers=4, no_ddp=True, gpus=4, ckpt='', name='bla', exp=None, use_mini_dataset=False, dataset='objverse', no_pos_encoding=False, noess=False, cross_features=False, use_single_softmax=False, l1_pos_encoding=False)
xFormers not available
xFormers not available
Traceback (most recent call last):
  File "/data/users/liming/CV/POPE/train_dinov2_pose.py", line 243, in <module>
    train(args.gpus, args)
  File "/data/users/liming/CV/POPE/train_dinov2_pose.py", line 50, in train
    model = DINOv2Poser(default_cfg)
  File "/data/users/liming/CV/POPE/models/dinov2_regression_modelv3.py", line 105, in __init__
    self.cross_attentionAll = CrossAtten(default_cfg['coarse'], token_dim, ['cross']*2 )
TypeError: __init__() takes 2 positional arguments but 4 were given
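The message means that the CrossAtten constructor in the checked-out code accepts one argument besides self, while models/dinov2_regression_modelv3.py passes three (default_cfg['coarse'], token_dim, and ['cross']*2). A likely, though unconfirmed, cause is that the training script and the module defining CrossAtten come from different versions of the code. The hypothetical stub below reproduces the same error and shows how inspect.signature can reveal what a class actually expects before it is called.

```python
import inspect


class CrossAttenStub:
    """Hypothetical stand-in for a CrossAtten whose __init__ takes only a config."""

    def __init__(self, cfg):
        self.cfg = cfg


def construction_error(cls, *args):
    """Try to build cls(*args); return the TypeError message, or None on success."""
    try:
        cls(*args)
    except TypeError as exc:
        return str(exc)
    return None


print(inspect.signature(CrossAttenStub))  # (cfg)
print(construction_error(CrossAttenStub, {"coarse": {}}, 256, ["cross"] * 2))
```

Calling inspect.signature on the real CrossAtten class, imported from wherever the repository defines it, should show which arguments that version expects.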

Relative pose vs Actual pose

Hi,

I am new to pose estimation, so this question might be naive. Is it correct that your method, POPE, gives the relative pose, while methods like OnePose provide the absolute pose?
I would greatly appreciate your help!
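Conceptually, a relative pose becomes an absolute object pose once the reference view's object pose is known. With 4×4 homogeneous transforms, if T_ref maps object coordinates into the reference camera and T_rel maps reference-camera coordinates into the query camera, then the query pose is T_rel · T_ref. The sketch below assumes exactly these conventions (they are illustrative, not taken from the POPE code), and for translation it assumes T_rel's translation is already metric, which a two-view estimate generally does not provide without extra scale information.

```python
import numpy as np


def make_pose(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def absolute_from_relative(T_ref, T_rel):
    """Object->query-camera pose from object->reference-camera and reference->query."""
    return T_rel @ T_ref


T_ref = make_pose(np.eye(3), [0.0, 0.0, 1.0])  # object 1 unit in front of the reference camera
T_rel = make_pose(np.eye(3), [0.1, 0.0, 0.0])  # illustrative reference->query transform
T_query = absolute_from_relative(T_ref, T_rel)
print(T_query[:3, 3])
```

The rotation part composes the same way regardless of scale, which is why relative rotation can be evaluated on its own, while the translation part inherits whatever scale T_rel carries.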
