ats-privacy-replication's People

Contributors

stfwn

ats-privacy-replication's Issues

Bug in CIFAR-100 transforms

There seems to be a bug in the transforms used to normalize CIFAR-100: the mean and std from CIFAR-10 are used. This is where the bug occurs:

https://github.com/stfwn/mscai-fact-ai/blob/4e1bec5a2162ff9e5a27aa346d1e395f2e0e725e/original/benchmark/comm.py#L52-L53

The test set is normalized correctly. It seems likely that this harms test accuracy.

Details

At first the transforms are initialized correctly by a _build_cifar100 function in the inversefed module here:

https://github.com/stfwn/mscai-fact-ai/blob/4e1bec5a2162ff9e5a27aa346d1e395f2e0e725e/original/benchmark/comm.py#L104

But these are then overridden by the ones containing the bug, built in the benchmark's own build_transform function here:

https://github.com/stfwn/mscai-fact-ai/blob/4e1bec5a2162ff9e5a27aa346d1e395f2e0e725e/original/benchmark/comm.py#L110-L111

Misc

Notably (though not necessarily a bug), the first set of transformations contains a random crop and random horizontal flip while the second does not.

# First transforms
Compose(
    RandomCrop(size=(32, 32), padding=4)
    RandomHorizontalFlip(p=0.5)
    Compose(
        ToTensor()
        Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
    )
)

# Bugged transforms
Compose(
    ToTensor()
    Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)
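
A minimal sketch of a fix, assuming the statistics printed in the first transform above really are the intended CIFAR-100 values (the variable names here are placeholders, not the ones in comm.py):

import torchvision.transforms as transforms

# CIFAR-100 statistics, copied from the (correct) first transform above.
cifar100_mean = [0.5071598291397095, 0.4866936206817627, 0.44120192527770996]
cifar100_std = [0.2673342823982239, 0.2564384639263153, 0.2761504650115967]

# The training pipeline should normalize with the same statistics as the test set.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(cifar100_mean, cifar100_std),
])
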
  • Discuss with TA
  • Figure out what difference this makes, if any.
  • Discuss with TA again

Refactor of attack

  • create blackbox attack class with all functions
  • add attack to main
  • Ask authors about which images are used to compute PSNR metrics for tables

Unclear if M^s and M^r are implemented/respected

Here's the paragraph above Algorithm 1, which outlines the top policy search process.

image

  • Does the original implementation do this correctly? It seems like it doesn't in original/benchmark/search_transform_attack.py. --> Yes
  • Do we do this correctly at the moment? --> At the moment we call into search_transform_attack.py, so I think not. Yes

Thoughts?

Set up GPU environment

  • Receive credentials & instructions on how to connect and run
  • (optional) Write convenience scripts to make it nicer 😄

Bug in batch_generate.py

They are using the following code to draw a random transformation from the list of subpolicies:

random.randint(-1, 50)

The problem is that there are 50 policies and they set the upper bound to 50 (inclusive), which occasionally causes an IndexError: list index out of range exception.

  • Fix it in our codebase
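
A minimal sketch of a fix, with policies as a placeholder name for the sub-policy list: random.randint is inclusive on both ends, so 50 is a possible return value for a 50-element list, and -1 silently wraps around to the last element instead of failing.

import random

policies = list(range(50))  # stand-in for the 50 sub-policies

# Bugged: idx = random.randint(-1, 50)  -> IndexError when 50 is drawn
idx = random.randrange(len(policies))   # valid indices 0..49 only
policy = policies[idx]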

`sample_list` vs. "100 randomly selected images"

https://github.com/stfwn/mscai-fact-ai/blob/c0d88b3dfd9568ada9a4e079e25e8a06952b79f6/original/benchmark/search_transform_attack.py#L187

# Same as:
list(range(200, 700, 5))

This line creates sample indices to be used in computing S_pri and seems to refer to this part of Appendix A:

image

  • This does not seem random. Maybe they mean 'random' as in 'non-sequential' and implemented it this way so that it's reproducible? (A seeded alternative is sketched below.)
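
If they really wanted 100 randomly selected images while keeping the run reproducible, a seeded sample would do it. This is only a sketch of the alternative, not what the authors run; dataset_size is a placeholder.

import random

dataset_size = 50000  # placeholder: size of the set the indices refer to
rng = random.Random(42)  # fixed seed keeps the selection reproducible
sample_list = sorted(rng.sample(range(dataset_size), 100))  # 100 distinct, non-sequential indices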

Later on they have these sequential sets of 100 samples to compute S_acc:

https://github.com/stfwn/mscai-fact-ai/blob/c0d88b3dfd9568ada9a4e079e25e8a06952b79f6/original/benchmark/search_transform_attack.py#L209-L210

Here they don't claim any randomness.

Add ConvNet model to `models.py`

I refactored the ConvNet model into our codebase and did some trial runs to reverse-engineer which width they could have used. I think we can safely go with width=16 and come close enough to reproduce: that one levels out at about 94% test accuracy on F-MNIST, and they report 94.25% 👍

image

  • Red: width=16
  • Dark blue: width=8
  • Light blue: width=32
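
For reference, this is roughly what a width-parameterized ConvNet looks like. It is a generic sketch to show how width scales the channel counts, not the inversefed architecture we actually import.

import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ConvNetSketch(nn.Module):
    """Illustrative width-scaled ConvNet; channel counts are multiples of `width`."""
    def __init__(self, width=16, num_classes=10, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, width),
            conv_block(width, 2 * width),
            nn.MaxPool2d(2),
            conv_block(2 * width, 4 * width),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(4 * width, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))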

Undocumented augmentations

Summary

In cifar100_train.py, as soon as you add any augmentation, you also get a random crop and flip for free. This should be documented somewhere, but I don't think it is.

Details

When printing the transforms in the actual train function with trainloader.dataset.transform, here's what you get in various situations.

No augmentations

Run with:

python benchmark/cifar100_train.py --arch ResNet20-4 --data cifar100 --epochs 200 --aug_list ''  --mode aug

And get these transforms:

# Train
Compose(
    ToTensor()
    Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)

# Test
Compose(
    ToTensor()
    Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
)

Run with:

python benchmark/cifar100_train.py --arch ResNet20-4 --data cifar100 --epochs 200 --aug_list ''  --mode crop

And get:

# Train
Compose(
    RandomCrop(size=(32, 32), padding=4)
    RandomHorizontalFlip(p=0.5)
    ToTensor()
    Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)
# Test
Compose(
    ToTensor()
    Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
)

With augmentations

Run with:

python benchmark/cifar100_train.py --arch ResNet20-4 --data cifar100 --epochs 200 --aug_list '43-18-18'  --mode aug

And get these transforms on the train set:

# Train
Compose(
    RandomCrop(size=(32, 32), padding=4)
    RandomHorizontalFlip(p=0.5)
    <benchmark.comm.sub_transform object at 0x7f40fd0a41c0>
    ToTensor()
    Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)

# Test
Compose(
    ToTensor()
    Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
)
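
A minimal reconstruction of the behaviour described above, with hypothetical names; the real logic lives in benchmark/comm.py, so this only documents what we observe: crop and flip are added whenever the mode is 'crop', or whenever the mode is 'aug' and a non-empty policy is passed.

import torchvision.transforms as transforms

def build_transform_sketch(mode, policy_transform, normalize):
    """Hypothetical reconstruction of the observed behaviour, not the code in comm.py."""
    ops = []
    # Crop + flip come "for free" in crop mode and whenever a policy is given in aug mode.
    if mode == 'crop' or (mode == 'aug' and policy_transform is not None):
        ops += [transforms.RandomCrop(32, padding=4),
                transforms.RandomHorizontalFlip()]
    if mode == 'aug' and policy_transform is not None:
        ops.append(policy_transform)  # the searched sub-policy, e.g. '43-18-18'
    ops += [transforms.ToTensor(), normalize]
    return transforms.Compose(ops)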

Small model for augmentations search

This is about the first model from Section 4.4, page 5: "M^s is used for privacy quantification. It is trained only with 10% of the original training set for 50 epochs. This overhead is equivalent to the training with the entire set for 5 epochs, which is very small."

  • Create a script for building a small, reproducible subset of CIFAR-100 for training and evaluating the small model used in policy search (with a fixed seed); a sketch follows below
  • Train the small model on this subset
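
A minimal sketch of such a script, assuming CIFAR-100 is loaded through torchvision; the seed and names are placeholders.

import torch
import torchvision

trainset = torchvision.datasets.CIFAR100('./data', train=True, download=True)

# Fixed-seed permutation so the small model M^s always sees the same 10% subset.
generator = torch.Generator().manual_seed(42)
indices = torch.randperm(len(trainset), generator=generator)[: len(trainset) // 10]
small_trainset = torch.utils.data.Subset(trainset, indices.tolist())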

Run all the experiments required for tables

Table 1

Done

Table 2

Done (with possible extra runs for enhancement if there is time)

Table 3

TODO: list this

Table 4

Running this tonight (the commented-out lines are part of the table but have already been run):

# Table 4
## a) Training
# python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss
python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss --aug-list 3-1-7
python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss --aug-list 43-18-18
python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss --aug-list 3-1-7+43-18-18

## b) Training
# python3.9 main.py train --model convnet --dataset fmnist -e 60 --bugged-loss
python3.9 main.py train --model convnet --dataset fmnist -e 50 --bugged-loss --aug-list 21-13-3
python3.9 main.py train --model convnet --dataset fmnist -e 50 --bugged-loss --aug-list 7-4-15
python3.9 main.py train --model convnet --dataset fmnist -e 50 --bugged-loss --aug-list 7-4-15+21-13-3

## a) attacks
for img_idx in {0..5}
do
        # python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx
        python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx --aug-list 3-1-7
        python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx --aug-list 43-18-18
        python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx --aug-list 3-1-7+43-18-18

        # python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx
        python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx --aug-list 21-13-3
        python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx --aug-list 7-4-15
        python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx --aug-list 7-4-15+21-13-3
done

Minor accuracy bug

Summary

Accuracy is computed per batch and averaged over the batch dimension, resulting in about 0.2% error in practice versus computing it per sample.

Details

Accuracy is computed per batch here:

https://github.com/stfwn/mscai-fact-ai/blob/d838a4baf957fcb59e6ed702566ccaf8a9af974f/ATSPrivacy/inversefed/data/loss.py#L120-L128

All the batch accuracies are summed up:

https://github.com/stfwn/mscai-fact-ai/blob/d838a4baf957fcb59e6ed702566ccaf8a9af974f/ATSPrivacy/inversefed/training/training_routine.py#L88-L89

And averaged to produce the epoch metric:

https://github.com/stfwn/mscai-fact-ai/blob/d838a4baf957fcb59e6ed702566ccaf8a9af974f/ATSPrivacy/inversefed/training/training_routine.py#L101

But since the test set is 10k samples, the batch size is 128, and drop_last=False in the dataloader, this leads to one last batch of (10000 / 128 - 10000 // 128) * 128 = 16 samples, each counting 128 / 16 = 8 times more than the other samples.

In the worst case those 16 are all correct while all hypothetical other samples of the last batch would be wrong, so that batch contributes a full 100% to the tally instead of the 16 correct samples it represents. If all the other samples have 70% accuracy, this inflates the reported accuracy by about 0.3 percentage points: (78*0.7 + 1)/79 ≈ 70.4% per-batch versus (9984*0.7 + 16)/10000 ≈ 70.0% per-sample.

In practice it only matters about 0.2% on average.
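
A minimal sketch of the per-sample alternative (names are placeholders): count correct predictions and samples instead of averaging per-batch accuracies, so the smaller last batch is weighted correctly.

import torch

@torch.no_grad()
def per_sample_accuracy(model, loader, device):
    correct, total = 0, 0
    for inputs, targets in loader:
        preds = model(inputs.to(device)).argmax(dim=1)
        correct += (preds == targets.to(device)).sum().item()
        total += targets.size(0)
    # Every sample counts exactly once, regardless of batch size.
    return correct / total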

Extensions

Candidates

  • Rescale ImageNet
    1. Train ResNet20 on it
    2. Do policy search on it --> reuse the policies found on CIFAR-100 instead.
    3. Do reconstructions on it
    4. Report PSNR and val accuracy
    5. Compare with paper claims
  • Do policy search on different dataset
  • Add transformations to transformation library
  • Use different model
  • Test policies with more transformations
  • Try different kinds of attacks

Shizzle doesn't work

  • Try ResNet20 with width=64 instead of width=16
  • Try ConvNet with more channels

Number of tested policies lower than declared

The random policy generator can occasionally sample the same policy more than once. In that case the number of evaluated policies is lower than expected; for our setup it is 1592 instead of 1600. The same issue exists in the original implementation. A possible fix is sketched below.

  • Decide whether to fix
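
A minimal sketch of a possible fix, with sample_policy as a placeholder for whatever function draws one random policy: keep drawing until the requested number of distinct policies is reached.

import random

def sample_unique_policies(sample_policy, n):
    """Collect n distinct policies; sample_policy() returns one policy, e.g. [43, 18, 18]."""
    seen = set()
    while len(seen) < n:
        seen.add(tuple(sample_policy()))
    return [list(p) for p in seen]

# Toy usage: 1600 distinct 3-step policies over 50 transformations.
rng = random.Random(0)
policies = sample_unique_policies(lambda: [rng.randrange(50) for _ in range(3)], 1600)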

Contact authors

  • Ask about 0.5 * loss bug
  • Ask about which images are used to compute PSNR metrics for tables
  • Ask whether the model used to compute PSNR in Figure 2 is just M^s
  • Ask what Figure 4-3 (and such) means under 'figure implementation' on page 11
    image

@Sloetoe did this around 12PM today.

Bug in inversefed

I noticed these lines:

https://github.com/stfwn/mscai-fact-ai/blob/4e1bec5a2162ff9e5a27aa346d1e395f2e0e725e/original/inversefed/data/loss.py#L48

https://github.com/stfwn/mscai-fact-ai/blob/4e1bec5a2162ff9e5a27aa346d1e395f2e0e725e/original/inversefed/data/loss.py#L104

In both cases, the loss is halved before being returned.

The line in the Classification class was acknowledged as a bug here and subsequently fixed. The author posted an update for Table 1 and said they would try to update the paper on arXiv, which hasn't happened yet.

  • The PSNR class contains the same halving and was not discussed. Find out if this is a bug or intended.

The acknowledged bug is present in the reference implementation we're working with because it depends on this work, and it seems we must assume that experiments were run with it present.

  • Discuss with TA
  • Decide if we fix it in our experiments or not
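
For context, scaling the loss by a constant scales every gradient by the same constant, so for plain SGD the halving behaves like halving the learning rate, and it also scales the gradients the reconstruction attack matches against. A tiny self-contained illustration:

import torch

x = torch.tensor([2.0], requires_grad=True)

(x ** 2).sum().backward()
print(x.grad)  # tensor([4.])

x.grad = None
(0.5 * (x ** 2).sum()).backward()
print(x.grad)  # tensor([2.]) -- exactly half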

Create logging structure

  • design a logging structure that makes plotting the results easy (one possible layout is sketched below)
  • add logging for the augmentation-search part
  • add logging for training the model with and without augmentations
  • add logging for the attack part
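
One possible layout, sketched as a proposal rather than a decision: append one JSON record per event (search step, training epoch, attack result) so plots can be produced by filtering on a few keys. The path and field names are placeholders.

import json
import time

def log_record(path, **record):
    """Append one JSON line per event, e.g. stage='train', epoch=3, test_acc=0.71."""
    record['timestamp'] = time.time()
    with open(path, 'a') as f:
        f.write(json.dumps(record) + '\n')

# Usage:
log_record('runs/resnet20_fmnist.jsonl', stage='train', epoch=3,
           test_acc=0.712, aug_list='3-1-7')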
