Hi, Could you please tell how you handled different image sizes as i

Handling different size inputs during training (triq issue, closed)

junyongyou commented on June 14, 2024

Handling different size inputs during training

from triq.

Comments (3)

junyongyou commented on June 14, 2024

Hi, there are two issues you need to think about.

1. Image generator in a batch. I personally think padding should be avoided, as it can either change image quality or change the convolution results. I grouped images by resolution, so that the images served in each batch all have the same resolution.

2. Basically, TRIQ can handle arbitrary resolutions. However, when I developed the first version of TRIQ (i.e., the repo as it is now), I simply used the largest resolution in the image set. Therefore, you can define maximum_position_encoding (line 127 in transformer_iqa.py) according to your images. This value should be set to H*W/(32*32) + 1, where H and W are the height and width of the largest resolution among your images.

I have also improved TRIQ using a spatial pooling method and got comparable performance; I will release it later. For now, if you want to test TRIQ, you can just set maximum_position_encoding.
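
As a rough illustration of the H*W/(32*32) + 1 rule, here is a minimal sketch in plain Python. The helper name `max_position_encoding` is hypothetical and not part of the triq repo. Rounding each dimension up to a multiple of 32 is an assumption for images whose sides are not exact multiples of 32, and "largest" is interpreted here as the image that produces the most 32x32 cells.

```python
import math


def max_position_encoding(sizes, stride=32):
    """Return H*W/(stride*stride) + 1 for the largest image in `sizes`.

    `sizes` is an iterable of (height, width) pairs.
    Assumption: each side is rounded up to a multiple of `stride`, so the
    result is an upper bound when a side is not an exact multiple.
    """
    largest = 0
    for height, width in sizes:
        cells = math.ceil(height / stride) * math.ceil(width / stride)
        largest = max(largest, cells)
    # The "+ 1" follows the formula given in the comment above.
    return largest + 1


# Example with two resolutions; the larger one determines the value.
value = max_position_encoding([(384, 512), (768, 1024)])
print(value)  # 24 * 32 + 1 = 769
```

The returned value is what one would then assign to maximum_position_encoding in transformer_iqa.py.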

arp95 commented on June 14, 2024

The grouping solution sounded the best to me. The problem with our dataset is that we don't have an equal distribution of classes across every possible resolution. For example, for images of size 2000x2000 we might have only three of the four classes. This is why I couldn't go ahead with the grouping approach.
The best remaining solution seemed to be padding only during the training phase, which would give a fixed-size feature map on top of which the transformer could be used. What do you think about this?

junyongyou commented on June 14, 2024

Hi, the way I handle this situation is to carefully split the train, validation, and test sets, and then use augmentation (in my case only horizontal flip) to make sure the images in each batch have the same resolution. I first group the images by resolution. Then, in each batch (you probably cannot use a large batch size), I serve only images with the same resolution. If the number of images with a given resolution is smaller than the batch size, I fill the batch with duplicates of those images and their horizontally flipped versions. I personally don't think padding is a good solution, as it potentially changes image quality and definitely changes the convolution results.
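
The grouping-and-filling idea described above could be sketched roughly as follows. This is not the code from the triq repo: the function names, the `(image_id, flip)` batch representation, and the choice to fill every short batch (including the last partial batch of a large group) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle


def group_by_resolution(items):
    """Group image ids by resolution.

    `items` is an iterable of (image_id, (height, width)) pairs.
    """
    groups = defaultdict(list)
    for image_id, size in items:
        groups[size].append(image_id)
    return dict(groups)


def make_batches(groups, batch_size):
    """Build batches in which every image has the same resolution.

    Each batch is (size, [(image_id, flip), ...]); flip=True means the
    loader should horizontally flip that image. Short batches are filled
    with flipped copies and duplicates of images from the same group.
    """
    batches = []
    for size, ids in groups.items():
        for start in range(0, len(ids), batch_size):
            batch = [(i, False) for i in ids[start:start + batch_size]]
            if len(batch) < batch_size:
                # Fill with flipped copies first, then plain duplicates.
                filler = cycle([(i, True) for i in ids] +
                               [(i, False) for i in ids])
                while len(batch) < batch_size:
                    batch.append(next(filler))
            batches.append((size, batch))
    return batches


items = [("a", (384, 512)), ("b", (384, 512)),
         ("c", (384, 512)), ("d", (768, 1024))]
batches = make_batches(group_by_resolution(items), batch_size=4)
for size, batch in batches:
    print(size, batch)
```

When the images are actually loaded, the flip flag could be applied with something like `tf.image.flip_left_right`. Shuffling the batch order between epochs is left out of this sketch.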
