domainbed's Introduction

Welcome to DomainBed

DomainBed is a PyTorch suite containing benchmark datasets and algorithms for domain generalization, as introduced in In Search of Lost Domain Generalization.

Current results

Result table

Full results for commit 7df6f06, in LaTeX format, are available here.

Available algorithms

The currently available algorithms are:

Send us a PR to add your algorithm! Our implementations use ResNet50 / ResNet18 networks (He et al., 2015) and the hyper-parameter grids described here.

Available datasets

The currently available datasets are:

Send us a PR to add your dataset! Any custom image dataset with folder structure dataset/domain/class/image.xyz is readily usable. While we include some datasets from the WILDS project, please use their official code if you wish to participate in their leaderboard.

Available model selection criteria

Model selection criteria differ in what data is used to choose the best hyper-parameters for a given model:

  • IIDAccuracySelectionMethod: A random subset from the data of the training domains.
  • LeaveOneOutSelectionMethod: A random subset from the data of a held-out (not training, not testing) domain.
  • OracleSelectionMethod: A random subset from the data of the test domain.

Quick start

Download the datasets:

python3 -m domainbed.scripts.download \
       --data_dir=./domainbed/data

Train a model:

python3 -m domainbed.scripts.train \
       --data_dir=./domainbed/data/MNIST/ \
       --algorithm IGA \
       --dataset ColoredMNIST \
       --test_env 2

Launch a sweep:

python -m domainbed.scripts.sweep launch \
       --data_dir=/my/datasets/path \
       --output_dir=/my/sweep/output/path \
       --command_launcher MyLauncher

Here, MyLauncher is your cluster's command launcher, as implemented in command_launchers.py. At the time of writing, the entire sweep trains tens of thousands of models (all algorithms x all datasets x 3 independent trials x 20 random hyper-parameter choices). You can pass arguments to make the sweep smaller:

python -m domainbed.scripts.sweep launch \
       --data_dir=/my/datasets/path \
       --output_dir=/my/sweep/output/path \
       --command_launcher MyLauncher \
       --algorithms ERM DANN \
       --datasets RotatedMNIST VLCS \
       --n_hparams 5 \
       --n_trials 1

After all jobs have either succeeded or failed, you can delete the data from failed jobs with python -m domainbed.scripts.sweep delete_incomplete and then re-launch them by running python -m domainbed.scripts.sweep launch again. Specify the same command-line arguments in all calls to sweep as you did the first time; this is how the sweep script knows which jobs were launched originally.

To view the results of your sweep:

python -m domainbed.scripts.collect_results \
       --input_dir=/my/sweep/output/path

Running unit tests

DomainBed includes some unit tests and end-to-end tests. While not exhaustive, they are a good sanity check. To run the tests:

python -m unittest discover

By default, this only runs tests which don't depend on a dataset directory. To run those tests as well:

DATA_DIR=/my/datasets/path python -m unittest discover

License

This source code is released under the MIT license, included here.

domainbed's People

Contributors

accumulated, aengusl, alexrame, ashok-arjun, daysm, dnap512, gordon-guojun-zhang, igul222, jc-audet, jean72human, jpgard, jungwon-choi, lopezpaz, m-just, mathieuchevalley, minhlong94, mohomran, prockenschaub, ryoungj, shahtalebi, shakedpe, sirrob1997, teeann, yugeten, zdhnarsil

domainbed's Issues

Adding Learning Rate Schedules

Since optimizers are currently defined in algorithms.py and used in the update method, where would one add a learning rate schedule (e.g. torch.optim.lr_scheduler.ReduceLROnPlateau) while keeping to the existing design patterns?
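
A minimal sketch of one possibility, assuming the optimizer is created in __init__ as in ERM and that update() returns a dict with a 'loss' entry; the subclass name is hypothetical and this is not an official DomainBed pattern:

import torch
from domainbed.algorithms import ERM

class ERMWithScheduler(ERM):
    def __init__(self, input_shape, num_classes, num_domains, hparams):
        super().__init__(input_shape, num_classes, num_domains, hparams)
        self.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
            self.optimizer, mode='min', factor=0.5, patience=5)

    def update(self, minibatches, unlabeled=None):
        result = super().update(minibatches, unlabeled)
        # ReduceLROnPlateau expects a monitored metric; validation loss is not
        # visible inside update(), so this steps on the training loss instead.
        self.scheduler.step(result['loss'])
        return result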

MLP with depth=1

Hi,
Thanks for this great library.
We encountered a minor issue: the MLP class does not support the case of depth=1; the minimum depth is currently 2. Supporting a purely linear model would be a nice addition.
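
For illustration, a minimal sketch (not the actual domainbed.networks.MLP) of an MLP that degenerates to a single linear layer when depth == 1:

import torch.nn as nn

class FlexibleMLP(nn.Module):
    """Sketch only: falls back to a single nn.Linear when depth == 1."""
    def __init__(self, n_inputs, n_outputs, width, depth, dropout=0.0):
        super().__init__()
        self.n_outputs = n_outputs
        if depth == 1:
            self.net = nn.Linear(n_inputs, n_outputs)
        else:
            layers = [nn.Linear(n_inputs, width), nn.ReLU(), nn.Dropout(dropout)]
            for _ in range(depth - 2):
                layers += [nn.Linear(width, width), nn.ReLU(), nn.Dropout(dropout)]
            layers.append(nn.Linear(width, n_outputs))
            self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)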

2 test environments?

Hi, really cool repository :)

In the sweep, all_combinations considers both 1 and 2 test envs. However, Appendix B of your paper seems to consider only the 1-test-env case (one per column). Could you clarify what I am missing?

Best regards,

About the DANN result

When we set algorithm=DANN and run the code directly, we cannot obtain the results in the paper. Here is the output of the program run. Why? Thanks.
Args:
algorithm: DANN
checkpoint_freq: None
data_dir: /dataset/DG/
dataset: PACS
holdout_fraction: 0.2
hparams: None
hparams_seed: 0
output_dir: train_output
save_model_every_checkpoint: False
seed: 0
skip_model_save: False
steps: None
task: domain_generalization
test_envs: [0]
trial_seed: 0
uda_holdout_fraction: 0
HParams:
batch_size: 32
beta1: 0.5
class_balanced: False
d_steps_per_g_step: 1
data_augmentation: True
grad_penalty: 0.0
lambda: 1.0
lr: 5e-05
lr_d: 5e-05
lr_g: 5e-05
mlp_depth: 3
mlp_dropout: 0.0
mlp_width: 256
nonlinear_classifier: False
resnet18: False
resnet_dropout: 0.0
weight_decay: 0.0
weight_decay_d: 0.0
weight_decay_g: 0.0
env0_in_acc env0_out_acc env1_in_acc env1_out_acc env2_in_acc env2_out_acc env3_in_acc env3_out_acc epoch gen_loss mem_gb step step_time
0.2092739475 0.1931540342 0.2953091684 0.3119658120 0.3083832335 0.2724550898 0.2414122137 0.2242038217 0.0000000000 0.8979552984 7.9268550873 0 0.7343387604
disc_loss env0_in_acc env0_out_acc env1_in_acc env1_out_acc env2_in_acc env2_out_acc env3_in_acc env3_out_acc epoch gen_loss mem_gb step step_time
1.2359811942 0.8078096400 0.7652811736 0.8528784648 0.8482905983 0.9812874251 0.9580838323 0.8374681934 0.8382165605 7.1856287425 -0.685258171 8.2017307281 300 0.4992028658
11.189239563 0.4655277608 0.4621026895 0.7425373134 0.7371794872 0.7559880240 0.7335329341 0.7468193384 0.7414012739 14.371257485 3.4806638861 8.2017307281 600 0.4944124389
238.60761779 0.1061622941 0.1442542787 0.2739872068 0.3055555556 0.1474550898 0.1886227545 0.2512722646 0.2369426752 21.556886227 5.7296134837 8.2017307281 900 0.4963528244
2082.6883511 0.2245271507 0.2371638142 0.3187633262 0.3290598291 0.5022455090 0.4760479042 0.3985368957 0.3885350318 28.742514970 873.57823590 8.2017307281 1200 0.4959800331
1.0357982743 0.2556436852 0.2567237164 0.3928571429 0.3696581197 0.5381736527 0.4940119760 0.4109414758 0.4152866242 35.928143712 3.9007051329 8.2017307281 1500 0.4922414263
24.230077548 0.4002440513 0.3936430318 0.5986140725 0.5854700855 0.7020958084 0.6437125749 0.5540712468 0.5528662420 43.113772455 569.27031482 8.2017307281 1800 0.4939069223
1.1945504181 0.3837705918 0.3911980440 0.5772921109 0.5641025641 0.7215568862 0.6706586826 0.6141857506 0.6394904459 50.299401197 1.0432499917 8.2017307281 2100 0.5033365639
382.80088246 0.4051250763 0.4034229829 0.6753731343 0.6752136752 0.8203592814 0.7754491018 0.7019720102 0.7082802548 57.485029940 392.48228594 8.2017307281 2400 0.4943927431
429.56468669 0.3715680293 0.3716381418 0.6988272921 0.7179487179 0.8248502994 0.7485029940 0.7659033079 0.7605095541 64.670658682 2884.0271361 8.2017307281 2700 0.4916381876
1.2151116145 0.4130567419 0.4107579462 0.6023454158 0.6047008547 0.7754491018 0.7365269461 0.6669847328 0.6433121019 71.856287425 0.6902019918 8.2017307281 3000 0.4960203767
244.13614944 0.3343502135 0.3496332518 0.6476545842 0.6517094017 0.7133233533 0.6826347305 0.7045165394 0.6917197452 79.041916167 314.87799603 8.2017307281 3300 0.4902488399
70.464921302 0.5009151922 0.4938875306 0.6087420043 0.5940170940 0.7717065868 0.7664670659 0.6428117048 0.6471337580 86.227544910 170.52687933 8.2017307281 3600 0.4870569730
2427.8547357 0.1696156193 0.1809290954 0.3678038380 0.3632478632 0.3016467066 0.2814371257 0.4736005089 0.4840764331 93.413173652 1438.1708783 8.2017307281 3900 0.4942047254
890.91614838 0.1848688225 0.1882640587 0.4024520256 0.4273504274 0.5254491018 0.4880239521 0.3555979644 0.3350318471 100.59880239 1838.5080866 8.2017307281 4200 0.4956636135
9005.7036366 0.3123856010 0.2787286064 0.4205756930 0.4059829060 0.5853293413 0.5658682635 0.3619592875 0.3656050955 107.78443113 1594.6786078 8.2017307281 4500 0.4901909248
10238.970430 0.2733374009 0.2591687042 0.5042643923 0.4935897436 0.5209580838 0.4670658683 0.5063613232 0.4777070064 114.97005988 12094.406260 8.2017307281 4800 0.4918605773
385962.36794 0.1928004881 0.2542787286 0.3155650320 0.3354700855 0.3031437126 0.3353293413 0.3005725191 0.3095541401 119.76047904 111583.26512 8.2017307281 5000 0.4945144749

Multi-GPU Launcher

It's a very excellent repo! Thanks for building it.

But I am wondering if there's a way to add a multi-GPU launcher to "command_launchers.py" for more convenient use.

For example, if I have a machine with 8 GPUs, I would like to launch 8 jobs at a time, one per GPU.

Is there a way to do this, or any ideas so that I could contribute one?
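
A rough sketch of what such a launcher could look like, in the spirit of the existing launchers in command_launchers.py (which receive a list of shell commands); the function name and the round-robin scheme below are illustrative, not an official implementation:

import os
import subprocess
import time

def multi_gpu_launcher(commands, n_gpus=8):
    """Illustrative only: keep one job running per GPU until all commands finish."""
    procs = [None] * n_gpus
    while commands or any(p is not None for p in procs):
        for gpu in range(n_gpus):
            if procs[gpu] is not None and procs[gpu].poll() is not None:
                procs[gpu] = None                      # job on this GPU finished
            if procs[gpu] is None and commands:
                cmd = commands.pop(0)
                env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
                procs[gpu] = subprocess.Popen(cmd, shell=True, env=env)
        time.sleep(1)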

Unable to get the code running

Getting the following error when even the simple python3 -m domainbed.scripts.train is run:

FileNotFoundError: Found no valid file for the classes raw. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp

Trying this with the PACS dataset, and found that all image types are indeed valid (i.e. png).

Different results in the README page and the paper

Hi,

I may have missed something, but it seems what's reported in the "Current results" section of the README differs from Table 4 in the original paper. I wonder whether it has been updated, and how?

Thank you!

ColorMNIST domain error

In the paper, the ColoredMNIST domains are {0.1, 0.3, 0.9} [screenshot from the paper].

But in the code, the ColoredMNIST domains are {0.1, 0.2, 0.9} [screenshot from the code].

Implement checkpointing

As currently implemented, the model parameters are saved only after the last training step finishes. However, this doesn't conveniently allow users to access the best-performing model parameters for the different validation techniques. In fact, I think only the parameters for oracle validation can be accessed this way, since early stopping is disabled there, and the printed best-performing hyperparameters give the respective tuning step.

To support the other validation techniques, it would probably be helpful to have a flag where we persist model parameters at each validation checkpoint. Once we run collect_results.py, we might also have an option to prune these model checkpoints and only keep the ones that correspond to the best-performing one for a specific validation technique and test environment. This would help to save disk space (we keep only model checkpoints) and make it even more convenient to browse through them.

Maybe there is an even better way of implementing this.
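
For what it's worth, a minimal sketch of the kind of flag described above (a --save_model_every_checkpoint flag does appear in the args dumps elsewhere in these issues, so something similar may already exist upstream; the key names here are illustrative):

import os
import torch

def save_checkpoint(algorithm, hparams, output_dir, filename):
    # Illustrative keys; adapt to whatever train.py actually stores.
    torch.save({
        'model_hparams': hparams,
        'model_dict': algorithm.state_dict(),
    }, os.path.join(output_dir, filename))

# Inside the training loop, at every validation checkpoint:
# if args.save_model_every_checkpoint:
#     save_checkpoint(algorithm, hparams, args.output_dir, f'model_step{step}.pkl')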

Extension to Unsupervised Domain Adaptation

Thank you for your research and work on DomainBed.

In Appendix E.1 you mention the possibility of extending the method .update(minibatches, unlabeled) to accept a minibatch of unlabeled examples.

Could you explain how you think this extension would be implemented best? Would unlabeled specify whether minibatches is unlabeled? I'm working on extending the .update method of AbstractDANN for UDA.
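
Not an authors' answer, but for reference: in the current train.py call, unlabeled appears to be a separate list of input-only minibatches (one per test/UDA environment) rather than a flag. A minimal sketch of an ERM extension that consumes it, using entropy minimization chosen arbitrarily for illustration (the class name is hypothetical):

import torch
import torch.nn.functional as F
from domainbed.algorithms import ERM

class ERMPlusEntropy(ERM):
    def update(self, minibatches, unlabeled=None):
        all_x = torch.cat([x for x, y in minibatches])
        all_y = torch.cat([y for x, y in minibatches])
        loss = F.cross_entropy(self.predict(all_x), all_y)
        if unlabeled is not None:
            u_x = torch.cat(unlabeled)                 # inputs only, no labels
            probs = F.softmax(self.predict(u_x), dim=1)
            entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1).mean()
            loss = loss + 0.1 * entropy                # arbitrary weight
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return {'loss': loss.item()}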

sweep accuracy far lower than Full results for commit 7df6f06

I followed the command lines mentioned in README.md and randomly chose an algorithm to get some results. However, I found my results are more than 20% below what is reported. Could you please help me figure out the problem?

I have already rechecked my PACS dataset, which is the same as "https://drive.google.com/uc?id=0B6x7gtvErXgfbF9CSk53UkRxVzg" in download.py. All command lines can be found below.

python -m domainbed.scripts.sweep launch \
--data_dir=/home/guoweiyu/yidong/data/PACS \
--output_dir=/home/guoweiyu/yidong/dg/sweep \
--command_launcher multi_gpu \
--algorithms DANN \
--datasets PACS \
--n_hparams 1 \
--n_trials 1

Environment:
Python: 3.7.3
PyTorch: 1.3.1
Torchvision: 0.4.2
CUDA: 10.1.243
CUDNN: 7603
NumPy: 1.16.2
PIL: 5.4.1
Args:
algorithm: DANN
checkpoint_freq: None
data_dir: /home/guoweiyu/yidong/data/PACS/
dataset: PACS
holdout_fraction: 0.2
hparams: None
hparams_seed: 0
output_dir: train_output
save_model_every_checkpoint: False
seed: 0
skip_model_save: False
steps: None
test_envs: [0]
trial_seed: 0
HParams:
batch_size: 32
beta1: 0.5
class_balanced: False
d_steps_per_g_step: 1
data_augmentation: True
grad_penalty: 0.0
lambda: 1.0
lr: 5e-05
lr_d: 5e-05
lr_g: 5e-05
mlp_depth: 3
mlp_dropout: 0.0
mlp_width: 256
resnet18: False
resnet_dropout: 0.0
weight_decay: 0.0
weight_decay_d: 0.0
weight_decay_g: 0.0

(base) guoweiyu@bj08:~/yidong/dg/DomainBed-master$ python -m domainbed.scripts.collect_results --input_dir=/home/guoweiyu/yidong/dg/sweep
Total records: 170

-------- Dataset: PACS, model selection method: training-domain validation set
Algorithm A C P S Avg
DANN 69.7 +/- 0.0 68.1 +/- 0.0 96.9 +/- 0.0 64.8 +/- 0.0 74.9

-------- Averages, model selection method: training-domain validation set
Algorithm PACS Avg
DANN 74.9 +/- 0.0 74.9

-------- Dataset: PACS, model selection method: leave-one-domain-out cross-validation
Algorithm A C P S Avg
DANN 40.3 +/- 0.0 68.1 +/- 0.0 94.1 +/- 0.0 64.8 +/- 0.0 66.8

-------- Averages, model selection method: leave-one-domain-out cross-validation
Algorithm PACS Avg
DANN 66.8 +/- 0.0 66.8

-------- Dataset: PACS, model selection method: test-domain validation set (oracle)
Algorithm A C P S Avg
DANN 21.1 +/- 0.0 18.1 +/- 0.0 29.0 +/- 0.0 18.6 +/- 0.0 21.7

-------- Averages, model selection method: test-domain validation set (oracle)
Algorithm PACS Avg
DANN 21.7 +/- 0.0 21.7

Wrong batch size being used

If I run the ERM on PACS via the following command:

python -m domainbed.scripts.train --data_dir=datasets/ --algorithm ERM --dataset PACS --hparams='{"resnet18": "True"}'

and print the all_x and all_y in the ERM algorithm like:

 def update(self, minibatches):
        all_x = torch.cat([x for x,y in minibatches])
        all_y = torch.cat([y for x,y in minibatches])
        print(all_x.shape)
        print(all_y.shape)

I get this output for the parameters and the sizes:

Args:
        algorithm: ERM
        checkpoint_freq: None
        data_dir: ../../projects/DomainBed/datasets/
        dataset: PACS
        holdout_fraction: 0.2
        hparams: {"resnet18": "True"}
        hparams_seed: 0
        output_dir: train_output
        seed: 0
        skip_model_save: False
        steps: None
        test_envs: [0]
        trial_seed: 0
HParams:
        batch_size: 32
        class_balanced: False
        data_augmentation: True
        groupdro_eta: 0.01
        irm_lambda: 100.0
        irm_penalty_anneal_iters: 500
        lr: 5e-05
        mixup_alpha: 0.2
        mldg_beta: 1.0
        mlp_depth: 3
        mlp_dropout: 0.0
        mlp_width: 256
        mmd_gamma: 1.0
        mtl_ema: 0.99
        resnet18: True
        resnet_dropout: 0.0
        sag_w_adv: 0.1
        weight_decay: 0.0
torch.Size([96, 3, 224, 224])
torch.Size([96])

Since all_x and all_y represent batches of shape batch_size x C x H x W, shouldn't the batch size be 32 (as stated in the hyperparameters) instead of the 96 suggested by the shapes?

Maybe I missed something, since this is exactly a factor of 3.
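
For context (not an official answer): update() receives one minibatch per training environment, so with PACS (4 domains), test_envs=[0] and batch_size=32 there are 3 labeled minibatches, and torch.cat stacks them into 96 rows. A self-contained illustration:

import torch

batch_size, n_train_envs = 32, 3       # PACS has 4 domains; test_envs=[0] leaves 3
minibatches = [(torch.randn(batch_size, 3, 224, 224),
                torch.randint(0, 7, (batch_size,)))
               for _ in range(n_train_envs)]
all_x = torch.cat([x for x, y in minibatches])
print(all_x.shape)                      # torch.Size([96, 3, 224, 224])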

About test-domain validation set Score...

Thanks for making a great library.
However, while I was measuring the performance of my algorithm with sweep.py, I found that the test-domain validation set score came out lower than the training-domain validation set score.
It's not just my algorithm; it's common in the existing sweep data.
I think this is not normal. Any idea why this is happening?

Extending Domainbed code for a new algorithm

Thank you so much for providing such good and efficient responses to all issues.
In the supp material of the paper, it is mentioned that:
Algorithms are classes that implement two methods: .update(minibatches) and
.predict(x). The update method receives a list of minibatches, one minibatch per training
domain, and each minibatch containing one input and one output tensor

What exactly are the input and output tensors? Also, if you could provide a bit more explanation on how to integrate a new model by inheriting from ERM, it would be helpful.
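
For readers with the same question, a minimal sketch (not official documentation) of a new algorithm inheriting from ERM; each (x, y) pair in minibatches is the input tensor and label tensor of one training domain. To be usable from the scripts, the class name would also need to be added to the ALGORITHMS list in algorithms.py and given hyper-parameters in hparams_registry.py.

import torch
import torch.nn.functional as F
from domainbed.algorithms import ERM

class MyAlgorithm(ERM):                # hypothetical example
    def update(self, minibatches, unlabeled=None):
        per_domain_losses = []
        for x, y in minibatches:       # one (input, label) minibatch per training domain
            per_domain_losses.append(F.cross_entropy(self.predict(x), y))
        loss = torch.stack(per_domain_losses).mean()
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return {'loss': loss.item()}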

Get best performing hyperparameters / Re-run hyperparameters with different trial seed

As it is currently implemented, launching a sweep with n_trials > 1 allows for reporting std. deviation of the best performing hyperparameter setting based on the different validation techniques. If I correctly understood the implementation, this computes n_trials different in- and out-splits and runs them for each of the sampled hyperparameter settings.

Users with limited resources who are implementing a new algorithm might not want to do that. A more convenient and faster way would be an option where users are able to tune hyperparameters and then run ONLY the best performing setting for each of the validation techniques on additional n_trials seeds to report std. deviations of choosing different in- and out-splits.

Also: What is the current way of obtaining the best performing hyperparameter setting for each of the validation techniques?

cdpl parameters in SelfReg are not registered in the optimizer

In the SelfReg algorithm, cdpl is defined in L1031-1040. However, these parameters are not added to self.network or passed to self.optimizer, which means self.cdpl is never updated.
I went through the paper's main code base, independent of DomainBed, at https://github.com/dnap512/SelfReg/blob/334af5fc2430d4183210956bc4d86d86816feeee/codes/model/resnet18_selfreg.py#L174
and observed that self.proj (the counterpart of self.cdpl) is registered directly on the model, so the optimizer optimizes these parameters too.
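
A sketch of the kind of fix being suggested (not a confirmed patch): pass the union of the main network's parameters and the cdpl head's parameters to the optimizer, e.g. from SelfReg.__init__:

import itertools
import torch

def build_selfreg_optimizer(network, cdpl, hparams):
    """Illustrative: optimize both the main network and the cdpl projection head."""
    return torch.optim.Adam(
        itertools.chain(network.parameters(), cdpl.parameters()),
        lr=hparams['lr'],
        weight_decay=hparams['weight_decay'])

# self.optimizer = build_selfreg_optimizer(self.network, self.cdpl, self.hparams)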
Thank you

Slow training speed for custom dataset (million images)

My dataset has 2 domains: the first domain, used for training, contains 1.35 million images,
and the other domain contains about 10,000 images.

I modified the source code in domainbed/datasets.py so that I can load my dataset

I trained with the following command:

CUDA_VISIBLE_DEVICES=2 python3 -m domainbed.scripts.train --data_dir=../mydataset/ --algorithm CORAL --dataset mydataset

However, the run gets stuck after the following text output:

Environment:
        Python: 3.6.7
        PyTorch: 1.7.1+cu92
        Torchvision: 0.8.2+cu92
        CUDA: 9.2
        CUDNN: 7603
        NumPy: 1.19.5
        PIL: 8.3.1
Args:
        algorithm: CORAL
        checkpoint_freq: None
        data_dir: ../mydataset/
        dataset: mydataset
        holdout_fraction: 0.2
        hparams: None
        hparams_seed: 0
        output_dir: train_output
        save_model_every_checkpoint: False
        seed: 0
        skip_model_save: False
        steps: None
        task: domain_generalization
        test_envs: [1]
        trial_seed: 0
        uda_holdout_fraction: 0
HParams:
        batch_size: 256
        class_balanced: False
        data_augmentation: True
        lr: 5e-05
        mmd_gamma: 1.0
        nonlinear_classifier: False
        resnet18: True
        resnet_dropout: 0.0
        weight_decay: 0.0

Is there any way to handle such a big dataset, or anything I can do in this situation?

Severe decrease in MMD performance on DomainNet dataset

It seems the CORAL and MMD algorithms are very similar, and based on the "Current results" table they achieve close results on most datasets. However, there is a big difference in their accuracy on the DomainNet dataset (41.5 vs. 23.4). I was wondering, is there any specific reason for this, or might it be a bug?

Single source performance

Hi, are there any results for the single-source domain generalization case? I would appreciate it if reliable results could be provided.

Getting access to domain knowledge

Currently, samples from the training domains are just passed as minibatches to the algorithms in algorithms.py. If I'm correct, this leaves the implemented algorithms no option for splitting them into the different domains and denies them access to domain knowledge. However, using domain membership is a common strategy for domain generalization algorithms.

Am I missing this option, or how would one go about implementing it? Can one split the minibatches based on num_domains to reconstruct the domains, or are they shuffled?
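
For reference (hedged, not an authors' reply): the minibatches list is not shuffled across domains; it holds one (x, y) batch per training environment, so per-domain quantities can be computed directly, e.g.:

def per_domain_feature_means(featurizer, minibatches):
    # Each list entry is the (x, y) batch of a single training environment,
    # so no un-shuffling is needed to recover domain membership.
    return [featurizer(x).mean(dim=0) for x, _ in minibatches]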

Interpreting results after training algorithms

I tried to train the algorithm ERM on PACS dataset. The final logs at the end of training are as follows:

env0_in_acc env0_out_acc env1_in_acc env1_out_acc env2_in_acc env2_out_acc env3_in_acc env3_out_acc epoch loss step step_time

0.8377059182 0.8239608802 0.9845415778 0.9423076923 0.9947604790 0.9670658683 0.9812340967 0.9299363057 3900.0000000 0.0155880866 3900 0.5439552005

However, I do not understand what they represent. If out_acc represents the out-of-domain accuracy, then the numbers do not make sense, since it is greater than 90 percent for 3 domains, whereas in PACS only the test domain Photo reaches more than 90 percent out-of-domain accuracy.

How do I obtain the out-of-domain accuracy on the individual test domains (Photo, Art Painting, Cartoon, Sketch)?

Library versions

Could you please specify the library versions (PyTorch, torchvision and CUDA) for these scripts?

About algorithm MMD

I'm wondering whether the MMD algorithm really implements "Domain Generalization with Adversarial Feature Learning",
since there seems to be no adversarial autoencoder in the code?

ERM results from a randomly initialized network.

The paper provides results for ERM using ResNet50 models initialized with ImageNet weights. Please let me know if there are results available for models trained from scratch as well.

It'll be interesting to see how much of an effect ImageNet weights have on the final results. Using an ImageNet initialization is in a way equivalent to indirectly using the entire ImageNet dataset for training. While this will not invalidate the claims in the paper as most DG methods use ImageNet weights as a starting point, training a model from scratch will give a fairer idea of the contribution of the ERM algorithm itself for DG.

About MLDG's update()

This is written in MLDG's update function:
"TODO: update() has at least one bug, possibly more. Disabling this whole algorithm until it gets figured out."
Is this bug related to the second-order derivatives?
If not, please explain what kind of problem it refers to.

Predefined groups for training groupDRO

Hello,
Thank you very much for this amazing comparison. I have slight confusion regarding the implementation of groupDRO.
In the groupDRO paper, the authors carefully construct the training datasets to contain a mixture of groups defined by leveraging human knowledge (identifying some spurious correlations).

However, in this repository and in the paper's comparison I fail to see such a construction of predefined groups for groupDRO. As far as I understand the implementation, the dataset is randomly sampled and each sample in the minibatch is considered to come from one particular training group. I understand that predefining groups on a new dataset is difficult, but is it justifiable to compare groupDRO under such a relaxation? I looked into the paper associated with this repo, but I did not see any mention of this.

Thank you so much for your time and patience!

underlying_length of train_loader is 1

I noticed that you use WeightedRandomSampler when training, but it leads to a wrong steps_per_epoch and a wrong 'epoch' value in the logs.
When length == self.INFINITE, self.underlying_length becomes 1.

class FastDataLoader(object):
    INFINITE = 'infinite'
    EPOCH = 'epoch'
    """DataLoader wrapper with slightly improved speed by not respawning worker
    processes at every epoch."""
    def __init__(self, dataset, weights, batch_size, num_workers,
        length=EPOCH):
        super(FastDataLoader, self).__init__()

        if length == self.EPOCH and weights != None:
            raise Exception("Specifying sampling weights with length=EPOCH is "
                "illegal: every datapoint would eventually get sampled exactly "
                "once.")

        if weights == None:
            weights = torch.ones(len(dataset))

        if length == self.INFINITE:
            batch_sampler = torch.utils.data.BatchSampler(
                torch.utils.data.WeightedRandomSampler(weights,
                    replacement=True,
                    num_samples=batch_size),
                batch_size=batch_size,
                drop_last=True)
            print(length==self.INFINITE)
        else:
            batch_sampler = torch.utils.data.BatchSampler(
                torch.utils.data.SequentialSampler(dataset),
                batch_size=batch_size,
                drop_last=False
            )
        self.length = length
        self.underlying_length = len(batch_sampler)
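
One possible workaround (illustrative only, not a confirmed upstream fix) is to base the reported epoch length on the dataset size rather than on the single-batch sampler used for infinite sampling:

def batches_per_epoch(dataset_len, batch_size):
    """How many batches make up one pass over the data, for 'epoch' logging."""
    return max(1, dataset_len // batch_size)

# In FastDataLoader.__init__, one could then set, e.g.:
# self.underlying_length = (batches_per_epoch(len(dataset), batch_size)
#                           if length == self.INFINITE else len(batch_sampler))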

ENV:
Python==3.7
Pytorch==1.3.1
Torchvision==0.4.2

Guarantee that sampled hyperparameters are the same for future commits

Just opening this here to keep track of the issue mentioned in #12:

"We will have to find a solution to guarantee that the same random parameters are sampled in hparams_registry given the same random seed, for all future commits. Right now, adding algorithms or conditional logic in hparams_registry changes the random hyperparameters sampled for a given algorithm, dataset, and seed."

Why are there no proper epochs?

As of now, the training loop just uses continuous update steps, and hence checkpoints usually occur in the middle of epochs, as reported by the fractional 'epoch' values (step / steps_per_epoch).

For me, this doesn't make too much sense. Shouldn't we count proper epochs and then define checkpoint_freq in terms of the epochs instead of the steps such that we report validation accuracies for every checkpoint_freq COMPLETED epochs?

This way we could also follow proper design for learning rate schedules since scheduler.step(val_loss) is usually invoked after every epoch and for torch.optim.lr_scheduler.ReduceLROnPlateau the learning rate gets adapted based on the change of the validation loss over multiple consecutive epochs.

Terra Incognita dataset download

Thanks again for the nice work. I am currently working on a new approach for domain generalization, and your code/framework has been of great help. I just have a problem downloading the Terra Incognita dataset.

python3 download.py --data_dir $DATA_DIR
Downloading...
From: http://www.vision.caltech.edu/~sbeery/datasets/caltechcameratraps18/eccv_18_all_images_sm.tar.gz
To: terra_incognita/terra_incognita_images.tar.gz
Traceback (most recent call last):
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1646, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1623, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1486, in __init__
    self.firstmember = self.next()
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1094, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/gzip.py", line 287, in read
    return self._buffer.read(size)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/gzip.py", line 474, in read
    if not self._read_gzip_header():
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/gzip.py", line 422, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'<!')

The links in "http://www.vision.caltech.edu/~sbeery/datasets/caltechcameratraps18/eccv_18_all_images_sm.tar.gz" and here "http://www.vision.caltech.edu/~sbeery/datasets/caltechcameratraps18/eccv_18_all_annotations.tar.gz" seem inactive.
I could not find more information on the original dataset website https://beerys.github.io/CaltechCameraTraps/.

I am not sure you can do anything about it, but just to let you know.
Sincerely
Alexandre Ramé

about domain adaptation

Hi, when I run "python train.py --data_dir=../../dataset/ --algorithm ERM --dataset RotatedMNIST --task domain_adaptation --uda_holdout_fraction 0", I get the error below. I didn't modify the code; is there something wrong? What is the meaning of "uda_holdout_fraction"? Thanks!
Traceback (most recent call last):
  File "/home/weiyuhua/Code/DomainBed/domainbed/scripts/train.py", line 211, in <module>
    for x,_ in next(uda_minibatches_iterator)]
StopIteration

KeyError: 'env5_in_acc' | DomainNet experiment

While running domainbed.scripts.collect_results, I am getting a "KeyError: 'env5_in_acc'" error. Also, I notice the DomainNet experiment only prints env0-env4 while it is supposed to print env5 as well. I was wondering if you could help me fix it.

The error:
Traceback (most recent call last):
  File "./anaconda3/envs/wavenet3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "./anaconda3/envs/wavenet3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/./DomainBed/domainbed/scripts/collect_results.py", line 187, in <module>
    print_results_tables(records, selection_method, args.latex)
  File "/./DomainBed/domainbed/scripts/collect_results.py", line 91, in print_results_tables
    grouped_records = get_grouped_records(records).map(lambda group:
  File "/./DomainBed/domainbed/lib/query.py", line 111, in map
    return Q([fn(x) for x in self._list])
  File "/./DomainBed/domainbed/lib/query.py", line 111, in <listcomp>
    return Q([fn(x) for x in self._list])
  File "/./DomainBed/domainbed/scripts/collect_results.py", line 92, in <lambda>
    { **group, 'sweep_acc': selection_method.sweep_acc(group['records']) }
  File "/./DomainBed/domainbed/model_selection.py", line 34, in sweep_acc
    .map(lambda _, run_records: self.run_acc(run_records))
  File "/./DomainBed/domainbed/lib/query.py", line 109, in map
    return Q([fn(*x) for x in self._list])
  File "/./DomainBed/domainbed/lib/query.py", line 109, in <listcomp>
    return Q([fn(*x) for x in self._list])
  File "/./DomainBed/domainbed/model_selection.py", line 34, in <lambda>
    .map(lambda _, run_records: self.run_acc(run_records))
  File "/./DomainBed/domainbed/model_selection.py", line 89, in run_acc
    return test_records.map(self._step_acc).argmax('val_acc')
  File "/./DomainBed/domainbed/lib/query.py", line 111, in map
    return Q([fn(x) for x in self._list])
  File "/./DomainBed/domainbed/lib/query.py", line 111, in <listcomp>
    return Q([fn(x) for x in self._list])
  File "/./DomainBed/domainbed/model_selection.py", line 81, in _step_acc
    'test_acc': record[test_in_acc_key]
KeyError: 'env5_in_acc'

DomainNet experiment output:
env0_in_acc env0_out_acc env1_in_acc env1_out_acc env2_in_acc env2_out_acc env3_in_acc env3_out_acc env4_in_acc env4_out_acc epoch loss step step_time
0.0028049034 0.0023896104 0.0046749346 0.0036818138 0.0031480809 0.0030443507 0.0027463768 0.0024927536 0.0041326210 0.0050911376 0.0000000000 5.1279506683 0 0.6886174679

About reproduced RSC result in DomainBed

Hi, I want to know about the RSC results (reproduced on DomainBed) on the PACS, OfficeHome and VLCS datasets. I can't find the specific per-domain results of RSC in your paper.
If you can, could you show me the results in this form?
A C P S
88.1±0.1 77.9±1.3 97.8±0.0 79.1±0.9

Here are what I want to know!

PACS

  1. Average accuracy of each domain (ResNet18 with RSC)

  2. Average accuracy of each domain (ResNet50 with RSC)

OfficeHome

  1. Average accuracy of each domain (ResNet18 with RSC)

  2. Average accuracy of each domain (ResNet50 with RSC)

VLCS

  1. Average accuracy of each domain (ResNet18 with RSC)

  2. Average accuracy of each domain (ResNet50 with RSC)

Training output and settings explaination

Hi!
Thank you for your repository. I'd like some clarification about the training output; setting test_envs to 0, for example, I get this output:
[screenshot of training output]
What do the columns represent?
Moreover, what's the difference between different test_envs configurations?
Thank you for your help!

Incorrect reference to CDANN paper in Readme

I suppose the reference to the CDANN paper in the Readme is wrong. It points to the following paper (https://arxiv.org/abs/1807.08479), which actually uses MMD in its method, and not the domain discriminators that are used in the implementation of CDANN in DomainBed (code).

I think the correct reference to the approach CDANN should be the following paper, which uses domain discriminators.

Ya Li, Xinmei Tian, Mingming Gong, Yajing Liu, Tongliang Liu, Kun Zhang, and Dacheng Tao. Deep domain generalization via conditional invariant adversarial networks. ECCV, 2018 (Link).

An issue related to model selection

Thanks for your work and code framework.

When I run your code with the default hyperparameters, I find the performance of oracle selection is weaker than that of the training domain selection.

Is there anything wrong?

[screenshot of results table]

Could you share the selected hyperparameters to reproduce the results?

Hi, thanks to your great work.

I'm trying to reproduce the results reported in the paper. For ERM, the default hyperparameter setting yields results similar to those reported on PACS, but slightly worse than reported on VLCS. I suspect the difference comes from the hyperparameters, so I'd like to know the searched values. Could you share the selected hyperparameters, at least for ERM?

Different Train/Val split for PACS

Hi, I find that you do not use the default train/val split for the PACS dataset; for this dataset one usually follows the original train/val split by the authors. Will this cause a problem?

Saving and loading trained models

Hi, I was wondering how to save a model and then use it later. It turns out that each training instance saves 'model.pkl' into the output directory, and it seems to hold only 'algorithm.module'. Is there a way to save the model and use it later as a standalone module? I am moving from Keras to PyTorch, so I'm kind of new to these functionalities. I'd appreciate an answer.
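
A hedged sketch of reloading model.pkl for standalone inference, assuming train.py saved a dict with the algorithm name, hyper-parameters and a 'model_dict' state dict (adjust the keys to whatever your checkpoint actually contains):

import torch
from domainbed import algorithms

save_dict = torch.load('train_output/model.pkl', map_location='cpu')
algorithm_class = algorithms.get_algorithm_class(save_dict['args']['algorithm'])
algorithm = algorithm_class(
    save_dict['model_input_shape'],
    save_dict['model_num_classes'],
    save_dict['model_num_domains'],
    save_dict['model_hparams'])
algorithm.load_state_dict(save_dict['model_dict'])
algorithm.eval()

# x: a preprocessed batch of images, shape (N, C, H, W)
# logits = algorithm.predict(x)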

Setting the Seed.

Hi,
I am unable to clearly understand the seed setup.
If I understand correctly, you set the various seeds once at the beginning of train.py and then multiple runs are executed within the same process.
Is this correct, or is my interpretation wrong?
Can you please elaborate on the exact protocol used?

IGA implementation

Thank you for the great work! Your code is highly functional and I hope it will become the go-to open-source software for domain generalization with fair comparisons.

My question is related to the current implementation of the IGA algorithm, in

class IGA(ERM):

for i, (x, y) in enumerate(minibatches):
    [....]
    grads.append( autograd.grad(env_loss, self.network.parameters(), retain_graph=True) )
    
mean_loss = total_loss / len(minibatches)
mean_grad = autograd.grad(mean_loss, self.network.parameters(), retain_graph=True)
[....]
        penalty_value += (g - mean_g).pow(2).sum()
[....]
(mean_loss + self.hparams['penalty'] * penalty_value).backward()

I believe there is a small error: a "create_graph" is missing when calling autograd.grad.

In detail, with the current implementation the penalty_value has no effect on training. As proof, if we replace the last "backward" with

(self.hparams['penalty'] * penalty_value).backward()

we obtain the error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Another proof is the lack of impact of the hyper-parameter self.hparams['penalty']

The fix is simple: just add create_graph=True to the autograd.grad calls, to explicitly allow further operations on the gradients, i.e. to obtain a graph that can be backpropagated through:

for i, (x, y) in enumerate(minibatches):
    [....]
    grads.append( autograd.grad(env_loss, self.network.parameters(), retain_graph=True, create_graph=True) )
    
mean_loss = total_loss / len(minibatches)
mean_grad = autograd.grad(mean_loss, self.network.parameters(), retain_graph=True, create_graph=True)

Overall, I am still (currently) unable to reproduce the results from the paper, but I believe this is a step forward.

Rotated-MNIST results

Would it be possible to provide the results for Rotated-MNIST at commit 7df6f06 with 2-digit precision?
Indeed, most methods are between 97.8 and 98.0 under oracle model selection, which makes the comparison harder.
Thank you in advance.

About the IRM implementation

Hi,

Thanks very much for this amazing project!

I found that the penalty term in IRM is summed over batch samples while the ERM term is averaged. Since both terms are averaged in the original IRM code, may I know whether there is any particular reason behind this implementation?

Thank you for your time.

Requirements up to date ?

I just cloned the repo and tried to run the train command. However, I got the following error:
AttributeError: module 'torchvision.transforms' has no attribute 'InterpolationMode'

Is the requirements.txt file up to date?
Here is my setup:

Environment:
        Python: 3.8.5
        PyTorch: 1.7.1
        Torchvision: 0.8.2
        CUDA: 11.0
        CUDNN: 8004
        NumPy: 1.20.3
        PIL: 8.3.2
Args:
        algorithm: ERM
        checkpoint_freq: None
        data_dir: ./domainbed/data/MNIST/
        dataset: RotatedMNIST
        holdout_fraction: 0.2
        hparams: None
        hparams_seed: 0
        output_dir: train_output
        save_model_every_checkpoint: False
        seed: 0
        skip_model_save: False
        steps: None
        task: domain_generalization
        test_envs: [0]
        trial_seed: 0
        uda_holdout_fraction: 0
HParams:
        batch_size: 64
        class_balanced: False
        data_augmentation: True
        lr: 0.001
        nonlinear_classifier: False
        resnet18: False
        resnet_dropout: 0.0
        weight_decay: 0.0
