Comments (3)

junyongyou commented on June 14, 2024

Hi, there are two issues you need to think about.

1. Image generator in a batch. I personally think padding should be avoided, as it can either change image quality or change the convolution results. I grouped images by resolution, so that the images served in each batch all have the same resolution.

2. In principle, TRIQ can handle arbitrary resolutions. However, when I developed the first version of TRIQ (i.e., the repo as it is now), I simply used the largest resolution in the image set. Therefore, you can define maximum_position_encoding (line 127 in transformer_iqa.py) according to your images. This value should be set to H*W/(32*32) + 1, where H and W are the height and width of the largest resolution among your images. I have also since improved TRIQ using a spatial pooling method and got comparable performance; I will release that later. For now, if you want to test TRIQ, you can just set maximum_position_encoding.
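As a rough illustration of that arithmetic, here is a minimal sketch that derives the value from a list of image sizes. The helper name `max_position_encoding` and the `stride` parameter are hypothetical and not part of the TRIQ repo; rounding up for sizes that are not multiples of 32 is also an assumption, so it is worth checking the result against the actual feature-map shape of your backbone.

```python
import math

def max_position_encoding(sizes, stride=32):
    """Return H*W/(stride*stride) + 1 for the largest image in `sizes`.

    `sizes` is an iterable of (height, width) tuples.
    Hypothetical helper: rounding up with ceil is an assumption for
    sizes that are not exact multiples of the stride.
    """
    def cells(height, width):
        # Number of stride x stride cells covering the image
        return math.ceil(height / stride) * math.ceil(width / stride)

    return max(cells(h, w) for h, w in sizes) + 1


# Example: the largest image here is 768 x 1024
sizes = [(384, 512), (768, 1024), (480, 640)]
print(max_position_encoding(sizes))
```

The printed number is what would be assigned to maximum_position_encoding under these assumptions.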

from triq.

arp95 commented on June 14, 2024

The grouping solution sounded the best to me. The problem with our dataset is that the classes are not equally distributed across every possible image resolution. For example, for images of size 2000x2000 we might have only three of the four classes. This is why I couldn't go ahead with the grouping approach.
The best alternative I found was to use padding only during the training phase, which gives a fixed-size feature map on top of which the transformer can be applied. What do you think about this?

junyongyou commented on June 14, 2024

Hi, the way I handle this situation is to split the train/val/test sets carefully and then use augmentation (in my case only horizontal flip) to make sure the images in each batch have the same resolution. I first group the images by resolution. Then, in each batch (you probably cannot use a large batch size), I serve only images with the same resolution. If the number of images with a given resolution is smaller than the batch size, I fill the batch with duplicates of those images and their horizontally flipped versions. I personally don't think padding is a good solution, as it potentially changes image quality and definitely changes the convolution results.
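The batching idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the TRIQ repo: `make_batches` is a hypothetical helper, images are represented by file paths with known (height, width), and each batch entry carries a flag telling the loader whether to flip that image horizontally.

```python
from collections import defaultdict
from itertools import cycle, islice

def make_batches(items, batch_size):
    """Group images by resolution and build same-resolution batches.

    `items` is a list of (path, (height, width)) tuples.
    Returns a list of batches; each batch is a list of (path, flip)
    pairs, where flip=True means "serve the horizontally flipped image".
    Hypothetical helper written for illustration only.
    """
    # Step 1: group image paths by their resolution
    groups = defaultdict(list)
    for path, size in items:
        groups[size].append(path)

    batches = []
    for paths in groups.values():
        for start in range(0, len(paths), batch_size):
            batch = [(p, False) for p in paths[start:start + batch_size]]
            missing = batch_size - len(batch)
            if missing > 0:
                # Step 2: fill a short batch with flipped copies first,
                # then plain duplicates, all from the same resolution group
                filler = cycle([(p, True) for p in paths] +
                               [(p, False) for p in paths])
                batch.extend(islice(filler, missing))
            batches.append(batch)
    return batches


items = [("a.jpg", (1000, 1000)), ("b.jpg", (1000, 1000)),
         ("c.jpg", (2000, 2000))]
for batch in make_batches(items, batch_size=4):
    print(batch)
```

In a real generator the batches would typically be shuffled each epoch, and the flip flag would be applied when the image is loaded.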
