
Comments (9)

jonycgn commented on June 11, 2024

Hi Jillian,

I'm not sure why you think that mean squared error must be averaged across the RGB channels. The current code treats the RGB information as a colorspace. The computed mean squared error is then the squared Euclidean distance in this 3-dimensional colorspace, averaged across pixels. I can't see anything fundamentally wrong with that, it is simply a different convention (which makes sense for other colorspaces as well).
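The two conventions differ only by a constant factor (the number of channels), as a quick NumPy sketch illustrates; the array shapes here are illustrative, not taken from the library:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 64, 64, 3))  # batch of RGB images
y = rng.random((4, 64, 64, 3))  # reconstructions

# Convention 1: squared Euclidean distance in RGB space, averaged over pixels.
mse_per_pixel = np.mean(np.sum((x - y) ** 2, axis=-1))

# Convention 2: average over all elements, including the channel axis.
mse_per_element = np.mean((x - y) ** 2)

# The two conventions differ exactly by the number of channels (3 here).
assert np.isclose(mse_per_pixel, 3 * mse_per_element)
```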

When you set lambda to a very small number, eventually, the bit rate should end up collapsing to 0 bpp, and the distortion will be very large. This should be expected. Have you tried raising lambda? What is the smallest bpp that you can successfully train?
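The collapse at small lambda follows directly from the rate-distortion Lagrangian these models optimize, roughly rate + lambda * distortion (the exact form in the repository may differ); a schematic sketch with illustrative numbers:

```python
def rd_loss(bpp, mse, lmbda):
    """Rate-distortion Lagrangian: rate plus weighted distortion.

    With a very small lmbda, the rate term dominates, so the optimum
    pushes the bit rate toward 0 bpp at the cost of large distortion.
    """
    return bpp + lmbda * mse

# With a tiny lambda, a 0-bpp/high-MSE solution beats a moderate one:
collapse = rd_loss(bpp=0.0, mse=600.0, lmbda=1e-5)  # rate term is zero
moderate = rd_loss(bpp=0.3, mse=150.0, lmbda=1e-5)  # rate term dominates
assert collapse < moderate
```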

from compression.

Jillian2017 commented on June 11, 2024

Hi, Johannes:
Thanks for your explanation.
Yes, I agree that this is not a fundamental error, and it makes little difference in training. In my opinion, though, the mean squared error should be averaged across all elements rather than across pixels only: if the input and output were not 3-channel images but arbitrary arrays of elements, the MSE would naturally be the average over all those elements.
As for the convergence problem: I trained the model with lambda = 0.001. The loss does trend downward at first, but after a certain number of iterations (maybe a hundred thousand) the MSE oscillates between 150 and 600, and it does not converge even with a very high last_step (1e7). My training data contains about 2,000 patches, and the learning rate is unchanged. Is data augmentation needed at low bit rates? With a higher lambda, such as 0.1, there is much less oscillation.
Best regards.


jonycgn commented on June 11, 2024

Hi!

While it doesn't matter fundamentally how you compute MSE and is a matter of preference (it's just off by a factor of 3), I realized that almost all publications (including my own) have been using the way of computing MSE that you prefer. So, to make things consistent, I changed it in commit 164bc19. This should also resolve some issues where people reported not being able to reproduce results in the papers.

Thanks for reporting this!

Next, I'm going to try to reproduce your problem at low bitrates. I'll report back soon.


Jillian2017 commented on June 11, 2024

Hi, Johannes:
Thanks a lot. I really appreciate your reply and look forward to your report.
Best regards.


jonycgn commented on June 11, 2024

Jillian,

I was able to train models at quite low bitrates just fine (lambda=0.001 and lambda=0.00025). I don't see any oscillation happening, or any other problems with convergence.

From what dataset did you gather your image patches for training?

Could you check out the latest code, try again, and see if the problem is still happening?


Jillian2017 commented on June 11, 2024

Johannes:
Thanks a lot for your timely reply. I have changed the way I read images: on my side, 256*256 patches are randomly cropped from high-resolution images. My dataset contains fewer than 3,000 high-resolution images, drawn from DIV2K and CLIC's training sets. Could it be that the dataset is too small? There are many high-resolution images on the web, but most are in a lossy format such as JPEG, so I think they are poor candidates for training an image compression model. Do you have any suggestions for data augmentation based on these lossy images?
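The random-cropping step described above can be sketched in a few lines of NumPy; this is an illustrative stand-in, not the pipeline actually used in the repository:

```python
import numpy as np

def random_crop(image, patch_size=256, rng=None):
    """Randomly crop a patch_size x patch_size patch from an HxWxC image."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    top = rng.integers(0, h - patch_size + 1)
    left = rng.integers(0, w - patch_size + 1)
    return image[top:top + patch_size, left:left + patch_size]

# Crop a 256x256 training patch from a high-resolution frame:
image = np.zeros((1080, 1920, 3), dtype=np.uint8)
patch = random_crop(image, 256)
assert patch.shape == (256, 256, 3)
```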
In any case, I will download your latest code and train again.
Best regards.


jonycgn commented on June 11, 2024

~3000 images should be more than enough to train a model. These image compression models are generally in the underfitting regime (meaning it's very hard for them to "memorize" the data).

However, you should know that the resolution of images can have a big effect on the performance. If you are evaluating on Kodak, for example, you should train with images that have a similar resolution (or if you use high resolution images, downsample them for training).

The latest change also contains some improvements to the input pipeline, so it's possible that fixes it. Please try again and let me know.


Jillian2017 commented on June 11, 2024

Hi, Johannes:
I have downloaded your latest code and run it again, but the result does not seem to improve. My training dataset contains almost 3,000 images in PNG format, lambda is 0.001, the patch size is 256*256, and the number of filters is 192. After 526,144 steps the loss still oscillates; attached is a screenshot from TensorBoard.
[Screenshot: TensorBoard loss curves, 2019-01-01 19:30:52]
Looking forward to your advice. Best regards.


jonycgn commented on June 11, 2024

Hi, these curves look completely normal. What you are seeing is the natural fluctuation of the loss due to stochastic gradients. Try running the model on one of the Kodak images (with "python bls2017.py -v compress") and compare the results to the corresponding plot in the appendix of this paper. You should get something matching the Balle 2017 curve quite closely.
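To distinguish genuine non-convergence from ordinary stochastic-gradient noise, it can help to smooth the raw loss with an exponential moving average (similar to what TensorBoard's smoothing slider does); a small illustrative sketch:

```python
import itertools

def ema(values, decay=0.99):
    """Exponential moving average of a sequence of loss values."""
    smoothed, avg = [], values[0]
    for v in values:
        avg = decay * avg + (1 - decay) * v
        smoothed.append(avg)
    return smoothed

# A noisy but flat sequence oscillating between 150 and 600:
noisy = list(itertools.islice(itertools.cycle([150.0, 600.0]), 1000))
smooth = ema(noisy)

# The smoothed tail settles near the mean (375): fluctuation, not divergence.
assert abs(smooth[-1] - 375.0) < 30.0
```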

I'm closing the bug for now. If your results are way off, and you still think there is a bug in the library, you can reopen it. If you need tips for tweaking the dataset/training, please post to the Google group.

Again, thanks for pointing out the MSE inconsistency!

