stfwn / ats-privacy-replication

License: MIT License
There seems to be a bug in the transforms used to normalize CIFAR-100: the mean and std from CIFAR-10 are used. This is where the bug occurs:
The test set is normalized correctly, so it seems likely that this mismatch harms test accuracy.
At first the transforms are initialized correctly by a _build_cifar100 function in the inversefed module here:
But these are then overridden by the ones containing the bug in their own build_transform function here:
Notably (but not necessarily a bug), the first set of transformations contains a random crop and random horizontal flip while the second does not.
# First transforms
Compose(
RandomCrop(size=(32, 32), padding=4)
RandomHorizontalFlip(p=0.5)
Compose(
ToTensor()
Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
)
)
# Bugged transforms
Compose(
ToTensor()
Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)
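A minimal sketch of a fix, assuming build_transform should simply reuse the CIFAR-100 statistics that _build_cifar100 already computes; the constant names and function signature below are illustrative, not the repository's actual API.

```python
import torchvision.transforms as transforms

# CIFAR-100 statistics, matching the values printed in the first set of transforms above.
CIFAR100_MEAN = (0.5071598291397095, 0.4866936206817627, 0.44120192527770996)
CIFAR100_STD = (0.2673342823982239, 0.2564384639263153, 0.2761504650115967)

def build_transform(train: bool = True):
    """Hypothetical corrected version: always normalize with CIFAR-100 stats."""
    ops = []
    if train:
        # Keep the crop/flip from the first set of transforms on the train split.
        ops += [transforms.RandomCrop(32, padding=4),
                transforms.RandomHorizontalFlip()]
    ops += [transforms.ToTensor(),
            transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD)]
    return transforms.Compose(ops)
```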
Here's the paragraph above Algorithm 1, which outlines the top policy search process:
original/benchmark/search_transform_attack.py
search_transform_attack.py
, so I think not. Thoughts?
Edit: this might not actually be the relevant part.
They are using the following code to draw a random transformation from the list of subpolicies:
random.randint(-1, 50)
The problem is that there are 50 policies (valid indices 0 through 49), but random.randint is inclusive on both ends: the upper bound of 50 occasionally causes an IndexError: list index out of range exception, and the lower bound of -1 silently wraps around to the last policy.
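A minimal fix sketch: random.randrange is exclusive on the upper bound and starts at 0, so it can never produce -1 or an out-of-range index. The policies list is a stand-in for the actual subpolicy list.

```python
import random

policies = list(range(50))  # stand-in for the 50 subpolicies

idx = random.randrange(len(policies))  # uniform over 0..49
policy = policies[idx]
```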
# Same as:
list(range(200, 700, 5))
This line creates sample indices to be used in computing S_pri and seems to refer to this part of appendix A:
Later on they have these sequential sets of 100 samples to compute S_acc:
Here they don't claim any randomness; a sketch of both index schemes follows.
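To make the two schemes concrete: the S_pri indices follow the snippet above, while the block starts for S_acc are an assumption for illustration, since the actual offsets aren't quoted here.

```python
# S_pri: 100 strided indices, same as list(range(200, 700, 5)) above.
s_pri_indices = [200 + 5 * i for i in range(100)]

# S_acc: sequential blocks of 100 samples each; these start offsets are
# illustrative, not taken from the repository.
s_acc_blocks = [list(range(start, start + 100)) for start in range(0, 1000, 100)]
```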
I refactored the ConvNet model into our codebase and did some trial runs to reverse-engineer which width they could have used. I think we can safely go with width=16 and come close enough to reproduce: that one levels out at about 94% test accuracy on F-MNIST, and they report 94.25.
Widths tried:
- width=16
- width=8
- width=32
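For reference, a minimal sketch of how a width parameter typically scales this kind of ConvNet; the channel counts growing as multiples of width follow the inversefed style, but the exact layer layout here is an assumption, not the repository's definition.

```python
import torch.nn as nn

class ConvNetSketch(nn.Module):
    """Illustrative only: channel counts scale with `width` (16 -> 16/32/64)."""

    def __init__(self, width=16, num_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(num_channels, 1 * width, 3, padding=1),
            nn.BatchNorm2d(1 * width), nn.ReLU(),
            nn.Conv2d(1 * width, 2 * width, 3, padding=1),
            nn.BatchNorm2d(2 * width), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(2 * width, 4 * width, 3, padding=1),
            nn.BatchNorm2d(4 * width), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(4 * width, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```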
In cifar100_train.py
, as soon as you add any augmentation, you also get a random crop and flip for free. This should be documented somewhere, but I don't think it is.
When printing the transforms in the actual train function with trainloader.dataset.transform (a sketch of the printing follows), here's what you get in various situations.
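The printing itself is just this, added inside the train function; the validloader name is an assumption about what the test-split loader is called.

```python
print(trainloader.dataset.transform)  # train-split transforms
print(validloader.dataset.transform)  # test-split transforms; loader name assumed
```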
Run with:
python benchmark/cifar100_train.py --arch ResNet20-4 --data cifar100 --epochs 200 --aug_list '' --mode aug
And get augmentations:
# Train
Compose(
ToTensor()
Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)
# Test
Compose(
ToTensor()
Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
)
Run with:
python benchmark/cifar100_train.py --arch ResNet20-4 --data cifar100 --epochs 200 --aug_list '' --mode crop
And get:
# Train
Compose(
RandomCrop(size=(32, 32), padding=4)
RandomHorizontalFlip(p=0.5)
ToTensor()
Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)
# Test
Compose(
ToTensor()
Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
)
Run with:
python benchmark/cifar100_train.py --arch ResNet20-4 --data cifar100 --epochs 200 --aug_list '43-18-18' --mode aug
And get these transforms:
# Train
Compose(
RandomCrop(size=(32, 32), padding=4)
RandomHorizontalFlip(p=0.5)
<benchmark.comm.sub_transform object at 0x7f40fd0a41c0>
ToTensor()
Normalize(mean=[0.4914672374725342, 0.4822617471218109, 0.4467701315879822], std=[0.24703224003314972, 0.24348513782024384, 0.26158785820007324])
)
# Test
Compose(
ToTensor()
Normalize(mean=[0.5071598291397095, 0.4866936206817627, 0.44120192527770996], std=[0.2673342823982239, 0.2564384639263153, 0.2761504650115967])
)
A bit about the experimental setup and code: Tiny-ImageNet, computational requirements (hours, epochs).
That is about the first model from section 4.4, page 5: "Ms is used for privacy quantification. It is trained only with 10% of the original training set for 50 epochs. This overhead is equivalent to the training with the entire set for 5 epochs, which is very small." (10% of the data for 50 epochs is the same compute as 5 epochs on the full set.)
Done
Done (with possible extra runs for enhancement if there is time)
TODO: list this
Running this tonight (commented-out lines are part of the table but have already been run):
# Table 4
## a) Training
# python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss
python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss --aug-list 3-1-7
python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss --aug-list 43-18-18
python3.9 main.py train --model resnet20 --dataset fmnist -e 50 --bugged-loss --aug-list 3-1-7+43-18-18
## b) Training
# python3.9 main.py train --model convnet --dataset fmnist -e 60 --bugged-loss
python3.9 main.py train --model convnet --dataset fmnist -e 50 --bugged-loss --aug-list 21-13-3
python3.9 main.py train --model convnet --dataset fmnist -e 50 --bugged-loss --aug-list 7-4-15
python3.9 main.py train --model convnet --dataset fmnist -e 50 --bugged-loss --aug-list 7-4-15+21-13-3
## a) attacks
for img_idx in {0..5}
do
# python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx
python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx --aug-list 3-1-7
python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx --aug-list 43-18-18
python3.9 main.py attack --model resnet20 --dataset fmnist --optimizer inversed --image-index $img_idx --aug-list 3-1-7+43-18-18
# python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx
python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx --aug-list 21-13-3
python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx --aug-list 7-4-15
python3.9 main.py attack --model convnet --dataset fmnist -e 50 --optimizer inversed --image-index $img_idx --aug-list 7-4-15+21-13-3
done
Accuracy is computed per batch and averaged over the batch dimension, resulting in about 0.2% error in practice versus computing it per sample.
Accuracy is computed per batch here:
All the batch accuracies are summed up:
And averaged to produce the epoch metric:
But since the test set is 10k samples, the batch size is 128, and drop_last=False in the dataloader, the last batch has (10000 / 128 - 10000 // 128) * 128 = 16 samples, each of which counts 128 / 16 = 8 times more than a sample in a full batch.
In the worst case those 16 samples are all correct while the 112 hypothetical samples that would complete the batch would all be wrong: the tally records one 100% batch instead of the 16/128 = 12.5% a full batch would contribute. If all other batches have 70% accuracy, this inflates the score by about 1.1 percentage points: ((78*0.7+1)/79 - (78*0.7+0.125)/79)*100 ≈ 1.1.
In practice the discrepancy is only about 0.2% on average; a sketch comparing the two ways of computing accuracy follows.
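A minimal sketch of the difference, assuming standard PyTorch-style evaluation; the function and variable names are illustrative.

```python
import torch

def eval_accuracy(model, testloader):
    """Returns (per_batch_acc, per_sample_acc) to show the averaging bias."""
    model.eval()
    batch_accs, correct, total = [], 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            preds = model(images).argmax(dim=1)
            hits = (preds == labels).sum().item()
            batch_accs.append(hits / len(labels))  # equal weight per batch
            correct += hits                        # equal weight per sample
            total += len(labels)
    per_batch_acc = sum(batch_accs) / len(batch_accs)  # overweights the 16-sample batch 8x
    per_sample_acc = correct / total                   # the usual, unbiased definition
    return per_batch_acc, per_sample_acc
```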
Overleaf?
Candidates:
- width=64 instead of width=16: unlikely.
cnt = 0
...
total_cost = cost / cnt
This leads to division by zero and therefore an inf loss, which cripples optimization schemes that don't use the sim cost function (where this is fixed).
The original inversefed repo doesn't have cnt, so this was added in ATSPrivacy.
One solution could be to take only the angle into account; another is to guard the division, as in the sketch below.
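A guard sketch, assuming an inversefed-style accumulation loop; the function and variable names are illustrative, not the repository's actual code.

```python
import torch

def total_cost(trial_grads, target_grads, cost_fn="l2"):
    """Illustrative cost accumulation with a guard against cnt == 0."""
    cost, cnt = 0.0, 0
    for trial, target in zip(trial_grads, target_grads):
        if cost_fn == "l2":
            cost = cost + ((trial - target) ** 2).sum()
            cnt += 1
        # other cost functions may skip entries, potentially leaving cnt at 0
    if cnt == 0:
        # fail loudly instead of dividing by zero and poisoning the loss with inf
        raise ValueError("no gradient terms contributed to the cost")
    return cost / cnt
```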
The random policy generator can occasionally sample the same policy more than once. In that case the number of evaluated policies is lower than expected, i.e. for our setup it's 1592 instead of 1600. The same issue exists in the original implementation; a sketch of sampling unique policies follows.
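A minimal sketch of drawing unique policies, assuming each policy is a tuple of subpolicy indices; the counts match our setup, but the policy representation is an assumption.

```python
import random

def sample_unique_policies(n=1600, num_subpolicies=50, policy_length=3):
    """Keep drawing until n distinct policies have been collected."""
    seen = set()
    while len(seen) < n:
        seen.add(tuple(random.randrange(num_subpolicies)
                       for _ in range(policy_length)))
    return list(seen)

assert len(sample_unique_policies()) == 1600
```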
I noticed these lines:
In both cases, the loss is halved before being returned.
The line in the Classification class was acknowledged as a bug here and subsequently fixed. The author posted an update for table 1 and said they would try to update the paper on arXiv, which hasn't happened yet.
The PSNR class contains the same halving and was not discussed; find out whether this is a bug or intended. The acknowledged bug is present in the reference implementation we're working with because it depends on this work, so it seems we must assume that the experiments were run with it present. An illustration of the halving pattern follows.
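For concreteness, a hypothetical illustration of the pattern in question; these are not the actual class bodies, only the shape of the halving.

```python
import torch

class Classification:
    """Hypothetical sketch: the loss is halved before being returned."""
    def __call__(self, outputs, labels):
        loss = torch.nn.functional.cross_entropy(outputs, labels)
        return 0.5 * loss  # the halving in question

class PSNR:
    """Hypothetical sketch: the same 0.5 factor applied to the metric."""
    def __call__(self, img, ref, max_val=1.0):
        mse = ((img - ref) ** 2).mean()
        return 0.5 * 10 * torch.log10(max_val ** 2 / mse)  # halved PSNR
```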
It instead uses models initialized in inversefed/nn/models.py, which in turn must be trained with benchmark/cifar100_train.py.
There is a minus sign in the accuracy metric computation function that is not in equation (10).