Comments (16)

NoahVl commented on July 30, 2024

Hey! Thanks for taking an interest in our code 😄 Sorry for my late reply, I didn't get notified for some reason.

It seems like we got an accuracy of 87% on the testing set (this was on the age labels, which we didn't use in the end), but I will look into the notebook more tomorrow. Could you tell me what dataset you're training on and what you're trying to predict?

If I remember correctly, the resizing you're describing was done on purpose. When we train the StyleGAN2 model, we use smaller images than the ones the pre-trained ResNet/MobileNet models were trained on (224x224). This means that when we want to classify the (StyleGAN) generated images, we have to upscale them to 224x224 (we let StyleGAN generate 64x64 images due to computational constraints). To properly capture the noise/artifacts introduced by the interpolation method, we therefore decided to downscale and then upscale the images during training of the classifier as well. I believe we do this using bilinear interpolation; not doing so caused worse performance, because the images become pixelated, which these pre-trained classifiers don't seem to appreciate.

Our reasoning was that this way of downscaling and upscaling would be less out of distribution and therefore a safer bet. However, you'd have to test this for yourself to be sure. I'm not actually sure if we did.
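For concreteness, the pipeline described above could look something like this with torchvision transforms. This is a minimal sketch under the assumptions in this thread (64x64 StyleGAN resolution, 224x224 classifier input, bilinear interpolation), not necessarily the notebook's exact code, and the name `train_transform` is made up:

    # Sketch: emulate StyleGAN-resolution artifacts during classifier training.
    from torchvision import transforms
    from torchvision.transforms import InterpolationMode

    train_transform = transforms.Compose([
        # Downscale to the StyleGAN generation resolution (64x64)...
        transforms.Resize((64, 64), interpolation=InterpolationMode.BILINEAR),
        # ...then upscale to the 224x224 input the pre-trained ResNet/MobileNet
        # expects, so the classifier trains on the same interpolation
        # artifacts it will later see on generated images.
        transforms.Resize((224, 224), interpolation=InterpolationMode.BILINEAR),
        transforms.ToTensor(),
    ])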

Also feel free to ask more questions, I'm sorry we didn't properly comment this notebook. It was quickly thrown together because the MobileNet classification was causing a lot of frustration.

NoahVl commented on July 30, 2024

Also, are you using the same pre-trained ResNet classifier we're using? Or are you trying to train it from scratch?

DavidMrd commented on July 30, 2024

Hi, thank you for your answer. I am using the pre-trained ResNet classifier and trying to predict gender on the CelebA dataset. I tried to fine-tune/retrain it on CelebA, but the accuracy I got was about 0.57.

NoahVl commented on July 30, 2024

Hey! So, the accuracy I reported previously was for the age classifier that we didn't end up using. For the ResNet CelebA gender classifier (which we used for generating the explanations of both face models) we got a validation accuracy of 97% and a testing accuracy of 97% as well. I made a new notebook for validating and testing the trained models, so you can see it there too. When testing on the FFHQ dataset, where the labels likely aren't ground truth, we get an accuracy of 88% over the whole dataset. Note that we didn't train on this dataset, because the authors did not and we found the labels to be unreliable. But I suppose it is good to know how the model performs on the data the StyleGAN2 model is trained on, assuming the labels are at least somewhat reliable.

I also tried re-running the classifier_training_celeba.ipynb notebook (which I've now updated to show the results) and got around the same validation accuracy (97%). I did not run it on the testing set, because I saw that unfreezing and training the third-to-last layer caused the validation performance to drop a bit this time. You could therefore choose not to train that layer and just stop earlier.

All that being said, you shouldn't be getting an accuracy of 57% when fine-tuning the PyTorch-provided pre-trained model with this notebook. Are you using a smaller batch size that might be causing this? Could you try running the training notebook again and see if you get similar performance to ours? You might have changed something that caused the huge drop in performance.

Do let me know if you try, I'll be happy to help!

DavidMrd commented on July 30, 2024

Hi, I tested the new notebook and again got a validation and test accuracy of ~57%. Did you preprocess the images or the labels?

NoahVl commented on July 30, 2024

How strange. I'll clone the repository again, download the data from scratch using the notebook we provided, and see if I can replicate the behavior you're seeing. I'm certain we didn't change the CelebA images (we did manually filter the plant dataset, because I believe there were some pictures of frogs and houses in there); it could have something to do with the labels, but we'll see.

DavidMrd commented on July 30, 2024

Ok, thank you a lot!

NoahVl commented on July 30, 2024

After having downloaded everything from scratch and running the classifier_testing_celeba.ipynb code again, I get the exact same results as before, as listed below:
[screenshot of the validation and test results]

Maybe something went wrong during your Kaggle download, causing you to miss some images? I have 202,599 images in that folder after downloading the CelebA dataset.
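If it helps, you could count them with something like this (a minimal sketch; `celeb_dir` is an assumption for wherever you extracted the Kaggle archive, using the nested folder layout from the notebooks):

    import os

    celeb_dir = "celeba"  # assumption: adjust to your download location
    # The Kaggle archive nests the images one directory level deeper.
    image_path = os.path.join(celeb_dir, "img_align_celeba", "img_align_celeba")
    n_images = sum(1 for f in os.listdir(image_path) if f.endswith(".jpg"))
    print(n_images)  # should print 202599 for the complete CelebA dataset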

Maybe you can also try cloning this repo from scratch and downloading everything again, to see if that changes something. I'm sadly not sure what is causing this, but I'll gladly help if you still have questions :)

NoahVl commented on July 30, 2024

Ah, I see. If I understand correctly, you're trying to train your own classifier and want to know what our accuracy was (we should've put this in the paper), but you're not trying to test our model to see if you get the same accuracy on your machine? So you're not re-running our training notebook and getting the 57%, but using your own training script? Sorry for the misunderstanding.

I just took a quick look at your repo and see that when you train, you immediately unfreeze all the layers of the pre-trained network. In my experience fine-tuning these image models, it is usually better to gradually unfreeze the top layers, or to leave only a few of the top layers unfrozen (depending on the size of the data and how close it is to the original training data). Have you tried this already? Code for the layer freezing of the classifiers is available in the training notebook.
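As a rough illustration of that strategy (a minimal sketch with a torchvision ResNet-18; the specific model and which layers to unfreeze are assumptions, not the notebook's exact code):

    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(pretrained=True)

    # Freeze the whole pre-trained backbone first...
    for param in model.parameters():
        param.requires_grad = False

    # ...replace the head for binary gender classification
    # (new parameters are trainable by default)...
    model.fc = nn.Linear(model.fc.in_features, 2)

    # ...and only later, if needed, gradually unfreeze the top block.
    for param in model.layer4.parameters():
        param.requires_grad = True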

DavidMrd commented on July 30, 2024

Hi, sorry for the late reply. I was trying to use your code, but somehow the labels and the images are not loaded in the same order, so when training or testing, the labels do not correspond to the images. This is the issue I am experiencing when running your code on my machine.

NoahVl commented on July 30, 2024

Hmm, how odd. Are you using Windows? Were you able to fix it?

DavidMrd commented on July 30, 2024

You can fix it by sorting the image paths in the __init__ methods of the Dataset classes; os.listdir returns files in an arbitrary, platform-dependent order, so the listing has to be sorted to line up with the labels:

    image_path = os.path.join(celeb_dir, "img_align_celeba", "img_align_celeba")
    # Sort the directory listing so the image order is deterministic
    # and matches the order of the labels.
    list_sorted = os.listdir(image_path)
    list_sorted.sort()
    self.images = [os.path.join(image_path, file)
                   for file in list_sorted if file.endswith('.jpg')]

NoahVl commented on July 30, 2024

And now you do get the same accuracy? If you want, you can create a merge request and I'll merge it after testing; then you'll have contributed to this repo!

tmabraham commented on July 30, 2024

I can confirm that I had this same issue, and @DavidMrd's fix (with `list_sorted` used consistently, as above) worked for me.

tmabraham commented on July 30, 2024

A similar change is needed for the FFHQ dataset as well.
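Presumably something like this in the FFHQ Dataset's __init__ (a hypothetical sketch; `ffhq_dir` and the `.png` extension are assumptions, not the repo's exact code):

    # Hypothetical: the same deterministic-ordering fix, applied to FFHQ.
    self.images = sorted(
        os.path.join(ffhq_dir, file)
        for file in os.listdir(ffhq_dir)
        if file.endswith('.png')
    )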

NoahVl commented on July 30, 2024

Hey, thanks guys! @DavidMrd let me know if you want to make that merge request, otherwise I'll do it.

I think the reason for this error is that I'm using Windows and you guys are using Linux: os.listdir makes no guarantee about ordering, so different filesystems can return the files in different orders. Thank you both for testing :)
