
Comments (8)

cansik commented on June 11, 2024

OK, I will check whether the OpenVINO runtime has the same issue (I should be able to drop the ONNX file directly into it); maybe it works with an alternative backend. Anyway, thanks a lot already for the model conversion and the example script you've provided!

It would be really helpful to be able to run BodyPix on various machines, since it is one of the only pretrained body-part segmentation models. The python-tf-bodypix version has become quite difficult to install lately, which is why I wanted to make a clean rewrite based on ONNX or OpenVINO.

from pinto_model_zoo.

PINTO0309 commented on June 11, 2024

That is a great initiative. I've done some miscellaneous ONNX conversions as a hobby and will implement a way to eliminate the above error when I get around to it. It's not very difficult.


cansik commented on June 11, 2024

I was able to add OpenVINO as an additional runtime. The output seems correct as far as I can tell (it is somehow oddly translated, but I experienced that with DirectML as well). It is of course not as fast, but at least it is a solution that runs on any OS (on CPU).

Would you be open to a PR with the OpenVINO runtime and some additional cleanups of the demo script?

Screenshot 2024-01-26 at 12 20 59


cansik commented on June 11, 2024

Affine Transformation and Resize to Original Size

OK, the translation problem occurred because I did not resize the output maps to the original image size before applying the affine transform. It seems to work now, but I am not sure if it's the correct way. For poses detected further away from the center, I am still getting masks that do not match the original image (they are shifted a bit to the left or right):

image

I am applying the affine transformation (# Fine-tune position of mask image) and then resize the output to the original image size. Am I missing something?

I've already added padding for the input image, but the offset is still visible.
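
One possible source of a residual horizontal shift is undoing the padding and the resize in the wrong order. A minimal sketch, assuming the input was letterboxed (scaled with preserved aspect ratio, then padded symmetrically) and that the mask is already at the padded network resolution. `letterbox_params` and `unletterbox_mask` are hypothetical helpers, not part of the demo script:

```python
import numpy as np

def letterbox_params(orig_h, orig_w, net_h, net_w):
    """Scaled content size and symmetric padding used to fit an image into net_h x net_w."""
    scale = min(net_h / orig_h, net_w / orig_w)
    new_h, new_w = round(orig_h * scale), round(orig_w * scale)
    pad_top = (net_h - new_h) // 2
    pad_left = (net_w - new_w) // 2
    return new_h, new_w, pad_top, pad_left

def unletterbox_mask(mask, orig_h, orig_w):
    """Crop the padding away first, then resize (nearest neighbour) to the original size."""
    net_h, net_w = mask.shape[:2]
    new_h, new_w, pad_top, pad_left = letterbox_params(orig_h, orig_w, net_h, net_w)
    cropped = mask[pad_top:pad_top + new_h, pad_left:pad_left + new_w]
    # Nearest-neighbour index maps computed from pixel centres.
    rows = np.minimum(((np.arange(orig_h) + 0.5) * new_h / orig_h).astype(int), new_h - 1)
    cols = np.minimum(((np.arange(orig_w) + 0.5) * new_w / orig_w).astype(int), new_w - 1)
    return cropped[rows][:, cols]
```

If the padded mask is resized to the original size without cropping first, the padding is stretched along with the content, which shifts the mask more the further it is from the center.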

Unique Keypoints

I also noticed that sometimes too many keypoints are returned from extract_max_score_points_unique. I've added the following lines of code to extract only the unique indices, so that a valid pose can always be created.

# only get unique values
unique_first_values, unique_indices = np.unique(keypoints_classidscorexy[:, 0], return_index=True)
keypoints_classidscorexy = keypoints_classidscorexy[unique_indices]
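
Note that `np.unique(..., return_index=True)` keeps the first row found for each class id, which is not necessarily the best-scoring one. A minimal sketch that keeps the highest-scoring row instead, assuming the columns are ordered `[class_id, score, x, y]` as the variable name suggests (the helper name `keep_best_per_class` is hypothetical):

```python
import numpy as np

def keep_best_per_class(keypoints_classidscorexy):
    """Return one row per class id, choosing the row with the highest score."""
    kp = np.asarray(keypoints_classidscorexy, dtype=np.float32)
    if kp.size == 0:
        return kp.reshape(0, 4)
    # lexsort uses the last key as primary: class id ascending, then score descending.
    order = np.lexsort((-kp[:, 1], kp[:, 0]))
    kp_sorted = kp[order]
    # After sorting, the first occurrence of each class id is its best-scoring row.
    _, first_idx = np.unique(kp_sorted[:, 0], return_index=True)
    return kp_sorted[first_idx]
```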

Thresholds / Constants

I am wondering whether it would make sense not to hard-code the thresholds inside the graph, but to expose them as inputs:

image

Of course it would be possible to use onnx and get the specific node and adjust the value by code, but wouldn't an input make more sense?

Part Color Overlapping

Maybe related to the threshold thing, but it seems that the colored part map overlaps at the edge and creates a rainbow of parts. Do you have an idea why this is happening?

Screenshot 2024-01-26 at 16 41 54


PINTO0309 commented on June 11, 2024

I know; it doesn't work on Linux either, not just on Mac.


cansik commented on June 11, 2024

It doesn't seem to be a problem with the OS, but with the execution provider. I tried it on Windows with the CPU provider and the same error happened. With DirectML or CUDA, it isn't a problem.


PINTO0309 commented on June 11, 2024

You're right. I have already confirmed that in advance as well. So far, except for the TensorRT Provider, the runtime seems to be buggy.


PINTO0309 commented on June 11, 2024

> I am applying the affine transformation (# Fine-tune position of mask image) and then resize the output to the original image size. Am I missing something?

I have made significant changes to the processing flow, keeping the processing to the minimum necessary in order to optimize the model. Concretely, every part of the computational graph that would not have had a fixed shape was recalculated so that it has a fixed shape. (This is my own tuning technique and may be difficult to understand.)

> I also noticed that sometimes too many keypoints are returned from extract_max_score_points_unique. I've added the following line of code to only extract the unique indices to always be able to create a valid pose.

I don't think you are wrong. My implementation is pretty messy because I was processing and optimizing the model and testing single-person inference while also thinking about ways to perform multi-person detection efficiently. I agree with your implementation.

> Of course it would be possible to use onnx and get the specific node and adjust the value by code, but wouldn't an input make more sense?

Yes, it would. That is the part I was quite torn about, too. Many of the models I have committed are very rarely used by good engineers like yourself, so it seems more likely that people are looking for a demonstration that works quickly and without much thought. That was the only factor in the final decision. In fact, all of the models I use in my own research are processed so that thresholds can be supplied externally.

> Maybe related to the threshold thing, but it seems that the colored part map overlaps at the edge and creates a rainbow of parts. Do you have an idea why this is happening?

I think the cause may be the Resize (Bilinear) and Sigmoid near the last layer, together with the part of the mask generation that forces a division by 255 and treats the resulting 0 or 1 as a boolean. I haven't looked into it in much detail, but I can't deny that such discrepancies exist, especially since the post-processing is implemented in a very forced manner. The stride is 16, so a very small ROI is stretched linearly by a factor of 16 both vertically and horizontally. This is supposed to be compensated by the offset, but that does not seem to be working. In Google's design this offset was an output of the model, but in my design it is calculated and embedded in the model.

Both Google's official implementation from five years ago and the tf-bodypix repository I cited originally handled all post-processing programmatically. However, that post-processing was designed in such a way that GPUs and accelerators could not be used effectively. In other words, the post-processing here is designed to be as efficient as possible on hardware, in exchange for accepting a small amount of accuracy degradation.
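
As an illustration of how interpolation can produce such a rainbow (an assumption about the mechanism, not a confirmed diagnosis of this model): if a map of part ids is upsampled linearly, pixels on the border between two parts receive intermediate values, which are then colored as unrelated parts. A one-dimensional sketch with made-up part ids 2 and 9 and the stride of 16 mentioned above:

```python
import numpy as np

STRIDE = 16  # upsampling factor, matching the stride discussed above

# Low-resolution row of part ids: two cells of part 2 next to two cells of part 9.
part_ids = np.array([2, 2, 9, 9], dtype=np.float32)

src_x = np.arange(part_ids.size)                                  # low-res cell centres
dst_x = (np.arange(part_ids.size * STRIDE) + 0.5) / STRIDE - 0.5  # high-res pixel centres

# Linear interpolation of the ids themselves invents ids between 2 and 9 at the border.
linear = np.rint(np.interp(dst_x, src_x, part_ids)).astype(int)

# Nearest-neighbour lookup only ever returns ids that exist in the low-res map.
nearest_idx = np.clip(np.rint(dst_x).astype(int), 0, part_ids.size - 1)
nearest = part_ids[nearest_idx].astype(int)

print(sorted(set(linear.tolist())))   # contains ids 3..8 that were never predicted
print(sorted(set(nearest.tolist())))  # only the original ids
```

Resizing id maps with nearest-neighbour interpolation, or interpolating per-part scores and taking the argmax afterwards, avoids the invented ids.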

