
Comments (11)

titu1994 commented on July 29, 2024

Turns out there exists K.cumsum, with which I can compute the CDF quite easily. Yeesh. This gives the correct answer for the loss:

The following script demonstrates the difference; its output follows:

import numpy as np

# Ground-truth score distribution and two predictions: y_pred1 is off by
# one bin, y_pred2 puts its mass at the opposite end of the scale.
y_true = np.array([[0, 0, 0, 0, 0, 0, 0, 0.9, 0.1, 0]])
y_pred1 = np.array([[0, 0, 0, 0, 0, 0, 0.9, 0, 0.1, 0]])
y_pred2 = np.array([[0.9, 0, 0, 0, 0, 0, 0, 0, 0.1, 0]])

def emd_1(y_true, y_pred):
    # EMD on the CDFs: cumsum turns the score histograms into CDFs, so
    # mass placed far from the true bins is penalized more heavily.
    return np.sqrt(np.mean(np.square(np.abs(np.cumsum(y_true, axis=-1) - np.cumsum(y_pred, axis=-1)))))

def emd_2(y_true, y_pred):
    # RMSE on the raw distributions: blind to how far the mass moved.
    return np.sqrt(np.mean(np.square(np.abs(y_true - y_pred))))

print("EMD 1")
print("Loss 1: ", emd_1(y_true, y_pred1))
print("Loss 2: ", emd_1(y_true, y_pred2))

print("EMD 2")
print("Loss 1: ", emd_2(y_true, y_pred1))
print("Loss 2: ", emd_2(y_true, y_pred2))
Output:

EMD 1
Loss 1:  0.284604989415
Loss 2:  0.752994023881

EMD 2
Loss 1:  0.40249223595
Loss 2:  0.40249223595


titu1994 commented on July 29, 2024

And how would I compute the CDF inside the loss function? It's a tensor, not a NumPy array.


qzchenwl commented on July 29, 2024

There is also the scan function in TensorFlow:

import tensorflow as tf

def cumsum(tensor):
    # accumulates along the first axis
    return tf.scan(lambda a, b: tf.add(a, b), tensor)


titu1994 commented on July 29, 2024

Well, since K.cumsum already calls tf.cumsum in the backend, it's good enough for loss calculation.
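
Something like this, mirroring emd_1 above with the Keras backend (the name earth_mover_loss is just illustrative):

import keras.backend as K

def earth_mover_loss(y_true, y_pred):
    # Compare the CDFs of the true and predicted score distributions
    cdf_true = K.cumsum(y_true, axis=-1)
    cdf_pred = K.cumsum(y_pred, axis=-1)
    return K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))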


titu1994 commented on July 29, 2024

It will take roughly 16 hours to train for 10 epochs again. Yeesh. At least my laptop is free today anyway.


tfriedel commented on July 29, 2024

@titu1994 I noticed you are only training the top layer (whereas in the paper they train the inner layers with a 10x lower learning rate). I guess you are doing it for performance reasons. You know the trick where you make a new network consisting only of the fully connected layer + dropout + softmax, and feed it the features you got from the other layers as input? That's a LOT faster.
See an example here:
https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson3.ipynb


titu1994 commented on July 29, 2024

@tfriedel Yes, I am training only the final dense layer, since I don't have the memory to train the full MobileNet model at an image size of 224x224x3 with a batch size of 200 on a 4 GB laptop GPU.

I know about that "trick" you mentioned. Under ordinary circumstances, I would think about applying it. However, this is a dataset of 255,000 images, taking roughly 13 GB of disk space. On top of that, I am doing random horizontal flips on the train set. So make that 510,000 images x 7 x 7 spatial size x 1024 filters x 4 bytes ≈ 102.36 GB.

Edit: If you take the output of the global average pooled features instead, you would require only about 2.1 GB of disk space. Hmm, perhaps this can be done after all. However, I won't have time to improve this codebase after I finish finetuning the current model.

Computing a forward pass for that many images would take roughly 3.5 hours. Of course, after that, training the single fully connected layer would be blazingly fast, if I were able to load that large a numpy array into my 16 GB of RAM (which I can't). Now, if there were some way to chunk the numpy arrays into separate files and load them via the TF Dataset API, it would be more tractable (a rough sketch of the idea follows at the end of this comment).

Edit: I forgot to mention that this isn't an ordinary classification problem, where you can simply save the class number in a file, load it later, and one-hot encode it to get the final classification output. For each image, you need an array of size 10, normalized by its scores, which is fed to the network to get the correct output score and minimize the earth mover's distance loss. Saving and loading such an aligned set of image features and output scores would require even more space and make the data loading even more unwieldy.

Simply put, it would require significant engineering of the entire codebase to do it the "fast" way. The method you suggest is for toy datasets (whose feature arrays you can save and load quickly), or for those who have dedicated supercomputers and enough time to engineer such a training framework.

Given the significant challenges, the only "plus" side I can see is that doing something like this would let me train larger NIMA classifiers (using, say, a NASNet or an Inception-ResNet-v2 model as the base classifier).
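
Roughly, the chunked caching I have in mind would look something like this (chunk size, file names, and helper names purely illustrative, assuming the global average pooled features of 1024 floats per image are saved alongside the 10-bin score targets):

import numpy as np

CHUNK = 10000  # images per file; illustrative value

def save_chunks(features, scores, prefix="nima_feats"):
    # features: (N, 1024) pooled MobileNet features
    # scores: (N, 10) normalized score distributions (the EMD targets)
    for i in range(0, len(features), CHUNK):
        np.savez("%s_%06d.npz" % (prefix, i),
                 x=features[i:i + CHUNK], y=scores[i:i + CHUNK])

def chunk_generator(paths, batch_size=200):
    # Yields (x, y) batches indefinitely, holding only one chunk in RAM
    # at a time; suitable for feeding Keras' fit_generator
    while True:
        for path in paths:
            data = np.load(path)
            x, y = data["x"], data["y"]
            for i in range(0, len(x), batch_size):
                yield x[i:i + batch_size], y[i:i + batch_size]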


tfriedel commented on July 29, 2024

I think the 7 * 7 in your calculation is before average pooling, but you would take the values from after it, so it really only takes 4 KB per image, or about 2 GB of RAM. So it would fit into RAM.
But yeah, it's a problem with the image augmentation, especially if you are not only doing flipping but also cropping.
The chunking of the numpy arrays can be done with bcolz, as in this example:
https://github.com/fastai/courses/blob/master/deeplearning2/imagenet_process.ipynb
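
Roughly, saving and re-opening a feature array with bcolz looks something like this (directory name and shapes illustrative):

import bcolz
import numpy as np

features = np.random.rand(1000, 1024).astype(np.float32)

# Write the array to a chunked, compressed on-disk store
carr = bcolz.carray(features, rootdir="features.bcolz", mode="w")
carr.flush()

# Re-open later; chunks are read from disk on demand
carr = bcolz.open("features.bcolz")
batch = carr[0:200]  # only this slice is loaded into RAM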

I'm currently trying to finetune the whole network with code that's based on yours but does random cropping and finetuning of the whole network with different learning rates. Will keep you updated!


titu1994 commented on July 29, 2024

@tfriedel Make sure you are using the updated calculation of the loss function that I posted a few hours back. The difference is slight, but by finetuning the whole network you might see more of a difference.


tfriedel commented on July 29, 2024

Yeah, I've already incorporated the new loss, thanks!
I'm not using the TF Dataset API, but I adapted code I once wrote for a Kaggle competition. It's based on ImageDataGenerator, which I modified to use a BcolzArrayIterator (so I don't have to hold these huge numpy arrays in RAM) and a function that does random cropping/flipping with the torchvision transforms API as a preprocessing step.
That said, I looked into what TF has to offer in that regard, and there are some functions like tf.random_crop, tf.image.crop_and_resize, and so on.
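
Something along these lines, say for 256x256 source images cropped to the 224x224 network input (sizes illustrative):

import tensorflow as tf

def augment(image):
    # image: a (256, 256, 3) image tensor
    image = tf.random_crop(image, size=[224, 224, 3])
    image = tf.image.random_flip_left_right(image)
    return image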


titu1994 commented on July 29, 2024

Ah, got it. Seems I was looking in the wrong place: tf.random_crop is what I needed, and I was searching for it in tf.image.* (semantic mistake, I guess?). Anyway, I am just about done finetuning 5 epochs on the new loss, and it seems somewhat promising.

I'm now gonna continue the next 15 epochs using random crops. Hopefully it yields even better results.

