junyongyou / triq

TRIQ implementation

License: MIT License

Python 3.37% Jupyter Notebook 96.63%

image-quality-assessment

triq's Introduction

TRIQ Implementation

TF-Keras implementation of TRIQ as described in Transformer for Image Quality Assessment.

Installation

Clone this repository.
Install the required Python packages. The code was developed in PyCharm with Python 3.7. The requirements.txt file was generated by PyCharm, and the code should also run with the latest versions of the packages.

Training a model

An example of training TRIQ can be seen in train/train_triq.py. An argument parser would normally be used, but the authors prefer a dictionary in which the parameters are defined; the script is easy to convert to take command-line arguments. In principle, the following parameters can be defined:

args = {}
args['multi_gpu'] = 0 # gpu setting, set to 1 for using multiple GPUs
args['gpu'] = 0  # If having multiple GPUs, specify which GPU to use

args['result_folder'] = r'..\databases\experiments' # Define result path
args['n_quality_levels'] = 5  # Choose between 1 (MOS prediction) and 5 (distribution prediction)

args['transformer_params'] = [2, 32, 8, 64]

args['train_folders'] = [  # Define folders containing training images
    r'..\databases\train\koniq_normal',
    r'..\databases\train\koniq_small',
    r'..\databases\train\live'
]
args['val_folders'] = [  # Define folders containing testing images
    r'..\databases\val\koniq_normal',
    r'..\databases\val\koniq_small',
    r'..\databases\val\live'
]
args['koniq_mos_file'] = r'..\databases\koniq10k_images_scores.csv'  # MOS (distribution of scores) file for KonIQ database
args['live_mos_file'] = r'..\databases\live_mos.csv'   # MOS (standard deviation of scores) file for LIVE-wild database

args['backbone'] = 'resnet50' # Choose from ['resnet50', 'vgg16']
args['weights'] = r'...\pretrained_weights\resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'  # Define the path of ImageNet pretrained weights
args['initial_epoch'] = 0  # Define initial epoch for use in fine-tune

args['lr_base'] = 1e-4 / 2  # Define the base learning rate in the warmup and rate decay approach
args['lr_schedule'] = True  # Choose between True and False, indicating if learning rate schedule should be used or not
args['batch_size'] = 32  # Batch size, should be chosen to fit in the GPU memory
args['epochs'] = 120  # Maximum number of epochs; early stopping can optionally be set in the callbacks

args['image_aug'] = True # Choose between True and False, indicating if image augmentation should be used or not
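
As a rough illustration of converting the dictionary to command-line arguments, the sketch below maps a subset of the keys above onto argparse options and returns a plain dict again. The function names build_parser and parse_to_dict and the --no_image_aug flag are hypothetical and are not part of train/train_triq.py.

```python
import argparse


def build_parser():
    """Build a parser whose option names mirror a few keys of the args dictionary."""
    parser = argparse.ArgumentParser(description='Train TRIQ (illustrative sketch)')
    parser.add_argument('--gpu', type=int, default=0)
    parser.add_argument('--n_quality_levels', type=int, choices=[1, 5], default=5)
    parser.add_argument('--backbone', choices=['resnet50', 'vgg16'], default='resnet50')
    parser.add_argument('--transformer_params', type=int, nargs=4, default=[2, 32, 8, 64])
    parser.add_argument('--train_folders', nargs='+', default=[])
    parser.add_argument('--lr_base', type=float, default=1e-4 / 2)
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--epochs', type=int, default=120)
    # Hypothetical switch: augmentation stays on unless --no_image_aug is passed.
    parser.add_argument('--no_image_aug', dest='image_aug', action='store_false')
    return parser


def parse_to_dict(argv=None):
    """Return the parsed options as a plain dict with the same keys as above."""
    return vars(build_parser().parse_args(argv))


args = parse_to_dict(['--backbone', 'vgg16', '--batch_size', '16'])
print(args['backbone'], args['batch_size'], args['image_aug'])
```

Because the result is still a dictionary, code that reads args['backbone'] and similar keys would not need to change.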

Predict image quality using the trained model

After TRIQ has been trained and the weights have been stored in an h5 file, it can be used to predict the quality of images of arbitrary size:

    args = {}
    args['n_quality_levels'] = 5
    args['backbone'] = 'resnet50'
    args['weights'] = r'..\TRIQ.h5'
    model = create_triq_model(n_quality_levels=args['n_quality_levels'],
                              backbone=args['backbone'])
    model.load_weights(args['weights'])

Then use ModelEvaluation to predict the quality of an image set.

In the "examples" folder, an example script examples\image_quality_prediction.py is provided to use the trained weights to predict quality of example images. In the "train" folder, an example script train\validation.py is provided to use the trained weights to predict quality of images in folders.
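
When the model is built with n_quality_levels = 5, it predicts a distribution over the five quality levels rather than a single score. A minimal sketch of reducing such a distribution to one MOS-like value is shown below, assuming the output is a vector of five non-negative weights ordered from quality level 1 to 5. The helper distribution_to_mos is hypothetical and the example numbers are made up; ModelEvaluation may do this differently.

```python
def distribution_to_mos(probabilities, scale=(1, 2, 3, 4, 5)):
    """Expected value of a predicted score distribution over the quality scale."""
    if len(probabilities) != len(scale):
        raise ValueError('one probability per quality level is required')
    total = sum(probabilities)
    if total <= 0:
        raise ValueError('probabilities must sum to a positive value')
    # Renormalise so that small numerical drift in the prediction does not bias the mean.
    return sum(p * s for p, s in zip(probabilities, scale)) / total


example = [0.05, 0.10, 0.30, 0.40, 0.15]  # made-up distribution, not real model output
print(round(distribution_to_mos(example), 3))
```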

A potential issue is image shape mismatch. For example, if an image is too large, then line 146 in transformer_iqa.py should be changed to increase the pooling size. For example, it can be changed to self.pooling_small = MaxPool2D(pool_size=(4, 4)) or even larger.
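
To judge in advance whether an image is likely to hit this mismatch, the sequence length can be estimated by hand. The sketch below assumes that the backbone reduces each side by a factor of 32 (rounding up), that pooling reduces it further (rounding down), and that one extra token is prepended. These assumptions are inferred from the default maximum_position_encoding of 193 and are not confirmed by the code; count_tokens and fits are hypothetical helpers.

```python
import math


def count_tokens(height, width, backbone_stride=32, pool_size=1):
    """Estimate the transformer sequence length for one input image.

    Assumptions (not confirmed by the source): the backbone shrinks each side
    by `backbone_stride` (rounding up), pooling shrinks it further (rounding
    down), and one extra token is prepended to the flattened feature map.
    """
    feat_h = math.ceil(height / backbone_stride) // pool_size
    feat_w = math.ceil(width / backbone_stride) // pool_size
    return feat_h * feat_w + 1


def fits(height, width, maximum_position_encoding=193, **kwargs):
    """True if the estimated sequence length fits the positional encoding."""
    return count_tokens(height, width, **kwargs) <= maximum_position_encoding


print(count_tokens(1440, 1919, pool_size=2))  # a 1919 x 1440 image with 2 x 2 pooling
print(fits(1440, 1919, pool_size=2), fits(1440, 1919, pool_size=4))
```

Under these assumptions a 1919 x 1440 image gives 661 tokens with 2 x 2 pooling, which matches the 661-versus-193 shape mismatch quoted in the issues, and 166 tokens with 4 x 4 pooling, which would fit.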

Prepare datasets for model training

This work uses two publicly available databases: KonIQ-10k ("KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment" by V. Hosu, H. Lin, T. Sziranyi, and D. Saupe) and LIVE-wild ("Massive online crowdsourced study of subjective and objective picture quality" by D. Ghadiyaram and A.C. Bovik).

The two databases were merged and then split into training and testing sets. Please see the README in databases for details.

Make MOS files (note: do NOT include a header line):

For a database with the score distribution available, the MOS file looks like this (koniq format):

    image path, voter number of quality scale 1, voter number of quality scale 2, voter number of quality scale 3, voter number of quality scale 4, voter number of quality scale 5, MOS or Z-score
    10004473376.jpg,0,0,25,73,7,3.828571429
    10007357496.jpg,0,3,45,47,1,3.479166667
    10007903636.jpg,1,0,20,73,2,3.78125
    10009096245.jpg,0,0,21,75,13,3.926605505
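
As an illustration of the koniq format, the sketch below parses one line into a normalised vote distribution and checks that the MOS implied by the votes agrees with the last column. The helpers parse_koniq_line and mos_from_votes are hypothetical and are not the repository's own loader.

```python
def parse_koniq_line(line):
    """Parse one koniq-format line into (image name, vote distribution, MOS)."""
    fields = [field.strip() for field in line.split(',')]
    image_name = fields[0]
    votes = [int(v) for v in fields[1:6]]  # voter counts for quality scales 1..5
    mos = float(fields[6])
    total = sum(votes)
    distribution = [v / total for v in votes]  # normalised so that it sums to 1
    return image_name, distribution, mos


def mos_from_votes(distribution, scale=(1, 2, 3, 4, 5)):
    """Mean opinion score implied by a normalised vote distribution."""
    return sum(p * s for p, s in zip(distribution, scale))


name, dist, mos = parse_koniq_line('10004473376.jpg,0,0,25,73,7,3.828571429')
print(name, round(mos_from_votes(dist), 6), mos)
```

For the first example line, the 105 votes give (3 x 25 + 4 x 73 + 5 x 7) / 105 = 402 / 105, which agrees with the listed value 3.828571429.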

For a database with the standard deviation available, the MOS file looks like this (live format):

    image path, standard deviation, MOS or Z-score
    t1.bmp,18.3762,63.9634
    t2.bmp,13.6514,25.3353
    t3.bmp,18.9246,48.9366
    t4.bmp,18.2414,35.8863
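
The live-format scores lie on a 0-100 scale, whereas the koniq scores use a 1-5 scale. The sketch below parses one live-format line and applies the linear mapping score / 25 + 1 that appears in the loader snippet quoted in the issues. Scaling the standard deviation by the same factor is an assumption of this sketch, and parse_live_line and rescale_score are hypothetical helpers.

```python
def rescale_score(score_0_100):
    """Map a score on a 0-100 scale linearly onto the 1-5 scale."""
    return score_0_100 / 25.0 + 1.0


def parse_live_line(line):
    """Parse one live-format line into (image name, std on 1-5 scale, score on 1-5 scale)."""
    fields = [field.strip() for field in line.split(',')]
    image_name = fields[0]
    std = float(fields[1])    # second column: standard deviation
    score = float(fields[2])  # third column: MOS or Z-score
    # Assumption: a standard deviation scales with the same factor but is not shifted.
    return image_name, std / 25.0, rescale_score(score)


name, std, score = parse_live_line('t1.bmp,18.3762,63.9634')
print(name, round(std, 4), round(score, 4))
```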

The format of MOS file ('koniq' or 'live') and the format of MOS or Z-score ('mos' or 'z_score') should also be specified in misc/imageset_handler/get_image_scores.

In the train script in train/train_triq.py the folders containing training and testing images are provided.
Pretrained ImageNet weights can be downloaded (see README in .\pretrained_weights) and pointed to in the train script.

Trained TRIQ weights

TRIQ has been trained on KonIQ-10k and LIVE-wild databases, and the weights file can be downloaded here.

State-of-the-art models

Three other models are also included in the work. The original implementations of these metrics are employed, and they can be found below.

Koncept512: "KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment"

SGDNet: "SGDNet: An end-to-end saliency-guided deep neural network for no-reference image quality assessment"

CaHDC: "End-to-end blind image quality prediction with cascaded deep neural network"

Comparison results

We have conducted several experiments to evaluate the performance of TRIQ; please see results.pdf for detailed results.

Error report

In case errors/exceptions are encountered, please first check all the paths. After fixing any path issues, please report remaining errors in Issues.

FAQ

To be added

ViT (Vision Transformer) for IQA

This work is heavily inspired by ViT ("An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"). The module vit_iqa contains an implementation of ViT for IQA, which mainly follows the ViT-PyTorch implementation. Pretrained ViT weights can be downloaded here.

triq's Issues

Combined database normalisation

How was the combined database normalisation in the script group_generator.py (starting from line 64) done?

Accuracy and loss function visualisation

Hi! How was the training accuracy and loss function visualised? I tried using TensorBoard, but it only shows "epoch_loss".

Does transformer really help?

Hi @junyongyou, I noticed that your triq model has a total of 23M parameters, most of which are from ResNet50. In this sense, Transformer layers are just like an FC head. The transformer layers you used (with parameters (2, 32, 8, 64)) even have fewer parameters than the projection head used in Koncept512.

So I am wondering how much the transformer actually helps over using an FC head. Do you have the standard train-test results on the CLIVE and KonIQ datasets, so that I can easily compare with other SoTA methods? Thank you very much.

Training

Hello, I want to repeat your work and rewrite it in PyTorch. Can you tell me more about the training details? By "A base learning rate 5e-5 was used for pretraining", do you mean pretraining on the same datasets (KonIQ-10k and LIVEC)?

Same output for every input image

def create_triq_model(n_quality_levels,
                      input_shape=(None, None, 3),
                      backbone='resnet50',
                      transformer_params=(2, 32, 8, 64),
                      maximum_position_encoding=193,
                      vis=False):
    chanDim = -1
    # define the model input
    inputs = Input(shape=input_shape)
    filters = (32, 64, 128)
    # loop over the number of filters
    for (i, f) in enumerate(filters):
        # if this is the first CONV layer then set the input
        # appropriately
        if i == 0:
            x = Rescaling(1./255)(inputs)

        # CONV => RELU => BN => POOL
        x = Conv2D(f, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
    
    x = Conv2D(256, (3, 3), padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(axis=chanDim)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    
    x = ZeroPadding2D(padding=(1, 1))(x)
    x = Conv2D(2048, (3, 3), padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(axis=chanDim)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    dropout_rate = 0.1
    
    transformer = TriQImageQualityTransformer(
        num_layers=transformer_params[0],
        d_model=transformer_params[1],
        num_heads=transformer_params[2],
        mlp_dim=transformer_params[3],
        dropout=dropout_rate,
        n_quality_levels=n_quality_levels,
        maximum_position_encoding=maximum_position_encoding,
        vis=vis
    )
    outputs = transformer(x)
  
    model = Model(inputs=inputs, outputs=outputs)
    model.summary()
    return model

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
input_shape = (564, 504, 3)
#model = create_triq_model(n_quality_levels=5, input_shape=input_shape, backbone='vgg16')
model = create_triq_model(n_quality_levels=1, input_shape=input_shape, backbone='resnet50')

from tensorflow.keras.optimizers import Adam
opt = Adam(learning_rate=0.001, decay=1e-3 / 200)
model.compile(loss="mean_squared_error", optimizer=opt)
model.fit(trainImagesX, trainY, validation_data=(valImagesX, valY),
          epochs=108, batch_size=16)

In the above code, I have modified the create_triq_model function so that it uses a custom CNN instead of ResNet or VGGNet. The custom CNN's output shape is (18, 16, 2048). This output is fed to TriQImageQualityTransformer.

The issue is that after training the model predicts the same value for every input. I have experimented with various hyperparameters. It might output different values for different hyperparameter settings but for a particular setting, for every image as input, it outputs the same output. One more thing to note is that if I do not use a transformer but instead use an Artificial Neural Network, then the network trains well.

Can you please suggest what I am doing wrong here?

About the tensorflow version

Hi,

The requirements.txt file says that the TensorFlow version used in this project is 2.2.0. However, when I tried to run the train_triq.py file, an error occurred, which said that "Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.3.0 and strictly below 2.5.0 (nightly versions are not supported)". It seems that the function tensorflow_addons.activations.gelu does not support TensorFlow 2.2.0.

I'm not familiar with TensorFlow. Therefore, I want to check the TensorFlow version and discuss why this error happened.

Could you please provide me a copy of the CSIQ dataset?

It's hard to find at the official location. Thanks a lot.

Cannot run image_quality_prediction.py

When I run image_quality_prediction.py, I get the error "You are trying to load a weight file containing 13 layers into a model with 14 layers." Could you help me find out what went wrong?

About dataset

Hello, I have a question. The data shape of the KonIQ-10k dataset is not consistent: some images are (224,224), while others are (224,224,3). I could not find where this difference is handled. Can you tell me more about the details? Thanks a lot.

The test set

Hi, I would like to ask what the size of the KonIQ test set is at test time. Also, when using the LIVE test set, its quality scores are in 0-100, whereas the predictions are in 0-5. How do I calculate SROCC and PLCC?
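
One point that may help with the second question: SROCC depends only on rank order, and PLCC is unchanged by a positive linear rescaling of either variable, so both can be computed without first bringing the two scales together. The pure-Python sketch below illustrates this with made-up numbers; pearson, ranks and spearman are illustrative helpers, and in practice scipy.stats.pearsonr and scipy.stats.spearmanr compute the same quantities.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(ranks(x), ranks(y))


ground_truth = [63.96, 25.34, 48.94, 35.89]  # made-up scores on a 0-100 scale
predictions = [3.4, 2.1, 3.1, 2.2]           # made-up predictions on a 1-5 scale
rescaled = [s / 25.0 + 1.0 for s in ground_truth]

# Both pairs of values agree up to floating-point error.
print(pearson(ground_truth, predictions), pearson(rescaled, predictions))
print(spearman(ground_truth, predictions), spearman(rescaled, predictions))
```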

About koniq-10k dataset

Hello, I can't open the website http://database.mmsp-kn.de/koniq-10k-database.html to download the KonIQ-10k dataset. Is there another way to download the dataset? Thanks.

OOM

Hello, I have a question. I want to predict the quality of all the pictures in KonIQ using the trained model, so I used a loop to process all the pictures in the folder, but I ran into the following problem. Can you help me?

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[64,386,514] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_10/pool1_pad/Pad (defined at E:/Graudate/Code/triq-master-play/src/examples/image_quality_prediction.py:21) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_predict_function_106843]

Input

Hello, I have been reproducing this project recently, and it feels great. Now I encounter a problem. I want to input two pictures at a time (or input one, and then enter the model and then segment it), I have read it for a long time, but I didn't find where to modify it. For the input-shape of None type, I can do nothing. Looking forward to your comments and guidance, thank you very much. - a beginner

training

Have you encountered such a problem?

File "E:/Graudate/Code/triq-master-play/src/train/train_triq.py", line 77, in train_main
model.compile(loss=loss, optimizer=optimizer, metrics=[metrics])
File "D:\tools\Anaconda\set\envs\python37tf\lib\site-packages\tensorflow\python\keras\engine\training.py", line 324, in compile
with self.distribute_strategy.scope():
File "D:\tools\Anaconda\set\envs\python37tf\lib\site-packages\tensorflow\python\keras\engine\training.py", line 455, in distribute_strategy
return self._distribution_strategy or ds_context.get_strategy()
AttributeError: 'Model' object has no attribute '_distribution_strategy'

paper link

Hi Junyong, could you provide the paper link? Thank you very much!

Save model config data

Hey, I wanted to ask which would be the best way to save model config data? After training it only saves weights and not the model config data itself. I tried changing it in callbacks.py "save_weights_only=False", but it did not work. Are there any other ways how to deal with this? Thank you in advance!

plcc

Hello, I would like to ask what is the value of PLCC of the training set you get, when the epoch of training is 120? I think the result I get is a bit wrong.

Shape error when running image_quality_prediction.py

Thanks for your work, it is very cool. I tested a JPG image of size 1919 × 1440, and it shows me this:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [1,661,32] vs. [1,193,32]
	 [[node model/tri_q_image_quality_transformer/add_1 (defined at /data2/zhx3/triq/src/models/transformer_iqa.py:198) ]] [Op:__inference_predict_function_9663]

Errors may have originated from an input operation.
Input Source operations connected to node model/tri_q_image_quality_transformer/add_1:
 model/tri_q_image_quality_transformer/concat (defined at /data2/zhx3/triq/src/models/transformer_iqa.py:194)

Function call stack:
predict_function

save model architecture

Hey, I wanted to ask - is it possible to save the whole model architecture in a json file?

Error when run image_quality_prediction.py

I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Segmentation fault (core dumped)

I get the error above. Changing the stack size did not help, and I don't know what the cause is.

Error when run image_quality_prediction.py

Hello, an error occurred when I ran image_quality_prediction.py with your TRIQ.h5. I don't know how to solve it. Would you please look at it for me? Thank you.

request for trained model

Hi author,

I read your paper and think the ideas in it are very ingenious.
But I don't have enough computing power to train this model. Can you provide a trained weight file?

thank you very much :)

Issue with Training - Generator error

Hello!
I followed all the instructions for training and prepared the data and labels accordingly. When I ran the training script, it ran for a few steps (say 170/2135) and then stopped, throwing exceptions.

I then changed return np.array(images_aug), np.array(y_scores) to return np.array(images_aug, dtype='object'), np.array(y_scores, dtype='object'), but now the script just hangs and does not consume much GPU memory after a while (700MB/16GB). I even tried training from scratch (without loading the ImageNet pretrained weights), but still no luck.

My conda env details:
tensorflow-gpu==2.1.0
tensorflow_addons==0.8.3
h5py==2.10.0

TRIQ failure on images of particular size range

Hello,

Thank you for the great implementation of TRIQ.

I am able to run TRIQ successfully on most images, however it seems a particular range of resolutions causes failure.

First, I load in TRIQ model,

args = {}

args['n_quality_levels'] = 5
args['backbone'] = 'resnet50'
args['weights'] = 'path/TRIQ.h5'

model = create_triq_model(n_quality_levels=args['n_quality_levels'],
                          backbone=args['backbone'])

model.load_weights(args['weights'])

An example image link is below,

https://hpmlawatl.com/wp-content/uploads/2013/07/640x4802.gif

import numpy as np
from PIL import Image

test_image = "/path/640x4802.gif"
image = Image.open(test_image).convert('RGB')
image = np.asarray(image, dtype=np.float32)
image = image[:,:,:3]
image /= 127.5
image -= 1.
prediction = model.predict(np.expand_dims(image, axis=0))

This shows an error,

InvalidArgumentError:  required broadcastable shapes
	 [[node model/tri_q_image_quality_transformer/add_1
 (defined at /home/ubuntu/production/triq/src/models/transformer_iqa.py:197)
]] [Op:__inference_predict_function_10094]

However, I can then resize the same image, to sizes both LARGER OR SMALLER, and the image will run successfully. As an example, this image can be set to either 512 X 384 OR 1024 X 768 and TRIQ will run fine.

test_image = "/path/640x4802.gif"
image = Image.open(test_image).convert('RGB')
img_sizes = image.size
print("Original Image size is, " + str(img_sizes[0])+  " " + str(img_sizes[1]))

size_cutoff = 1024 # This sets to 1024 X 768
size_cutoff = 512 # This sets to 512 X 384

if img_sizes[0] != size_cutoff and img_sizes[1] != size_cutoff:
    max_size = max(img_sizes)
    scale_factor = size_cutoff / max_size
    x_dim = round(img_sizes[0]*scale_factor)
    y_dim = round(img_sizes[1]*scale_factor)
    image = image.resize((x_dim,y_dim),Image.ANTIALIAS)

image = np.asarray(image, dtype=np.float32)
image = image[:,:,:3]
image /= 127.5
image -= 1.
prediction = model.predict(np.expand_dims(image, axis=0))

In order to pin this down I did a bit of empirical testing, and:

Values of size_cutoff = 513 will fail, while size_cutoff = 512 is okay.

Similarly, size_cutoff = 1057 will fail while size_cutoff = 1056 is okay.

I understand if an image is too small or large the TRIQ will fail. What I am not understanding is why images of a particular size (640X480) will fail, while the same image resized to be smaller (512, 384) or larger (1024, 768) will run successfully.

Any insight you have would be helpful.

dataset

mos_scale = [1, 2, 3, 4, 5]
image_files = {}
with open(mos_file, 'r+') as f:  # Open the file; the file pointer is at the start of the file
    lines = f.readlines()  # Read the file line by line
    for line in lines:
        content = line.split(',')  # Split each line into a list of fields
        image_file = content[0].replace('"', '').lower()  # First field: the image file name

        if using_single_mos:
            score = float(content[-1]) if mos_format == 'mos' else float(content[1]) / 25. + 1

Hello, we are puzzled by this code. When "single_mos" is used, you rescale the MOS of the LIVE-challenge data to [1-5], but the MOS column of the LIVE data set should be content[-1] rather than content[1]. We think the code should be
score = float(content[-1]) if mos_format == 'mos' else float(content[-1]) / 25. + 1

AttributeError: 'MyCSVLogger' object has no attribute 'file_flags'

An error, "AttributeError: 'MyCSVLogger' object has no attribute 'file_flags'", occurred when I ran the code.

Handling different size inputs during training

Hi,

Could you please tell me how you handled different image sizes as input during the training phase? Let's say we have three images of size (1080x1080), (1608x1608) and (2000x2000). If we give these images as input to the network during training, how is this taken care of? Were the images padded with zeros to the resolution of the largest image?
Thanks.