
facebookresearch / multigrain

Code for "MultiGrain: a unified image embedding for classes and instances"

Home Page: https://arxiv.org/abs/1902.05509

License: Other

Python 100.00%

multigrain's Introduction

MultiGrain

MultiGrain is a neural network architecture that solves both image classification and image retrieval tasks.

The method is described in "MultiGrain: a unified image embedding for classes and instances" (arXiv link).

BibTeX reference:

@ARTICLE{2019arXivMultiGrain,
       author = {Berman, Maxim and J{\'e}gou, Herv{\'e} and Vedaldi, Andrea and
         Kokkinos, Iasonas and Douze, Matthijs},
        title = "{{MultiGrain}: a unified image embedding for classes and instances}",
      journal = {arXiv e-prints},
         year = "2019",
        month = "Feb",
}

Please cite it if you use it.

Installation

The MultiGrain code requires

  • Python 3.5 or higher
  • PyTorch 1.0 or higher

and the packages listed in requirements.txt.

The requirements can be installed:

  • Either by setting up a dedicated conda environment: conda env create -f environment.yml followed by source activate multigrain
  • Or with pip: pip install -r requirements.txt

Using the code

Extracting features with pre-trained networks

We provide pre-trained networks with ResNet-50 trunks for the following settings (top-1 accuracies given at scale 224):

| λ   | p | augmentation | top-1 | weights            |
|-----|---|--------------|-------|--------------------|
| 1   | 1 | full         | 76.8  | joint_1B_1.0.pth   |
| 1   | 3 | full         | 76.9  | joint_3B_1.0.pth   |
| 0.5 | 1 | full         | 77.0  | joint_1B_0.5.pth   |
| 0.5 | 3 | full         | 77.4  | joint_3B_0.5.pth   |
| 0.5 | 3 | autoaugment  | 78.2  | joint_3BAA_0.5.pth |

We provide fine-tuned networks for scales larger than 224, as described in Supplementary E. Only the pooling coefficient is fine-tuned:

| network         | scale  | p   | top-1 | weights          |
|-----------------|--------|-----|-------|------------------|
| NASNet-A-Mobile | 350 px | 1.7 | 75.1  | joint_1B_1.0.pth |
| SENet154        | 400 px | 1.6 | 83.0  | joint_3B_1.0.pth |
| PNASNet-5-Large | 500 px | 1.7 | 83.6  | joint_1B_0.5.pth |

To load a network, use the following PyTorch code:

import torch
from multigrain.lib import get_multigrain

# Instantiate a MultiGrain model with a ResNet-50 trunk
net = get_multigrain('resnet50')

# Load pretrained weights from a checkpoint file
checkpoint = torch.load('base_1B_1.0.pth')
net.load_state_dict(checkpoint['model_state'])

The network takes images of any resolution. A normalization pre-processing step is applied, with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225].
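
For illustration, a minimal pre-processing sketch using torchvision; the 256/224 resize-and-crop policy and the file name are assumptions made for the example, only the mean/std values come from above:

import torch
from PIL import Image
from torchvision import transforms

# Normalization constants quoted above
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Example pipeline; the network also accepts other resolutions
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

img = Image.open('example.jpg').convert('RGB')
batch = preprocess(img).unsqueeze(0)  # shape (1, 3, 224, 224)

net.eval()
with torch.no_grad():
    output = net(batch)  # output structure depends on the MultiGrain model class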

The pretrained weights do not include whitening of the features (important for retrieval), which is specific to each evaluation scale; follow the steps below to compute and apply a whitening.

Evaluation of the networks

scripts/evaluate.py evaluates the network on standard benchmarks.

Classification results

Evaluating a network on ImageNet-val is straightforward using options from evaluate.py. For instance, the following command:

IMAGENET_PATH=  # the path that contains the /val and /train image directories

python scripts/evaluate.py --expdir experiments/joint_3B_0.5/eval_p4_500 \
--imagenet-path $IMAGENET_PATH --input-size 500 --dataset imagenet-val \
--pooling-exponent 4 --resume-from joint_3B_0.5.pth

using the joint_3B_0.5.pth pretrained weights, should reproduce the top-1/top-5 results of 78.6%/94.4% given in Table 2 of the article for ResNet-50 MultiGrain with p=3, λ=0.5, p*=4 and scale s*=500.

Retrieval results

The implementation of the evaluation on the retrieval benchmarks in evaluate.py is in progress, but one may already use the dataloaders implemented in datasets/retrieval.py for this purpose.

Training

The training is performed in three steps; see the help (-h flag) of each script for the detailed parameter list. Only the initial joint training script benefits from multi-GPU hardware; the remaining scripts are not parallelized.

Joint training

scripts/train.py trains a MultiGrain architecture.

Important parameters:

  • --repeated-augmentations: number of repeated augmentations per batch; N=3 was used in our joint trainings, while N=1 is vanilla uniform sampling.
  • --pooling-exponent: pooling exponent in GeM pooling; p=1 is vanilla average pooling.
  • --classif-weight: weighting factor between the margin loss and the classification loss (parameter λ in the paper)

Other useful parameters:

  • --expdir: dedicated directory for the experiment
  • --resume-from: takes either an expdir or a model checkpoint file to resume from
  • --pretrained-backbone: initialize the backbone weights from the model zoo

Input size fine-tuning of GeM exponent

scripts/finetune_p.py determines the optimal p* for a given input resolution by fine-tuning (see Supplementary E in the paper for details). Alternatively, one may use cross-validation to determine p*, as done in the main article.

Whitening of the retrieval features

scripts/whiten.py computes a PCA whitening and modifies the network accordingly, integrating the reverse transformation into the fully-connected classification layer as described in the article. The script takes a list and a directory of whitening images; the list given in data/whiten.txt is relative to the Multimedia Commons file structure.
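
For illustration, a minimal sketch of how a PCA whitening can be estimated from a set of embedding vectors; this is the standard computation, not the repository's whiten.py implementation, and it assumes a recent PyTorch with torch.linalg:

import torch

def compute_pca_whitening(features, eps=1e-8):
    # features: (N, D) matrix of embeddings
    mean = features.mean(dim=0)
    centered = features - mean
    # Covariance matrix and its eigendecomposition
    cov = centered.t() @ centered / (features.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)
    # Whitening matrix: project onto eigenvectors, scale by 1/sqrt(eigenvalue)
    W = torch.diag((eigvals + eps).rsqrt()) @ eigvecs.t()
    return mean, W

# Hypothetical usage on embeddings extracted from the whitening image list:
# mean, W = compute_pca_whitening(feats)   # feats: (N, 2048)
# whitened = (feats - mean) @ W.t()        # whitened embeddings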

Example training procedure

For example, the results with p=3 and λ=0.5 at scale s*=500 can be obtained with

# train network
python scripts/train.py --expdir experiments/joint_3B_0.5 --repeated-augmentations 3 \
--pooling-exponent 3 --classif-weight 0.5 --imagenet-path $IMAGENET_PATH

# fine-tune p*
python scripts/finetune_p.py --expdir experiments/joint_3B_0.5/finetune500 \
--resume-from experiments/joint_3B_0.5 --input-size 500 --imagenet-path $IMAGENET_PATH

# whitening 
python scripts/whiten.py --expdir experiments/joint_3B_0.5/finetune500_whitened \
--resume-from experiments/joint_3B_0.5/finetune500 --input-size 500 --whiten-path $WHITEN_PATH

Fine-tuning existing network

In Appendix E we report fine-tuning results on several pretrained networks. This experiment can be reproduced using the finetune_p.py script. For example, in the case of SENet154 at scale s*=450, the following command should yield 83.1 top-1 accuracy with p*=1.6:

python scripts/finetune_p.py --expdir experiments/se154/finetune450 \
--pretrained-backbone --imagenet-path $IMAGENET_PATH --input-size 450 --backbone senet154 \
--no-validate-first

Contributing

See the CONTRIBUTING file for how to help out.

License

MultiGrain is CC BY-NC 4.0 licensed, as found in the LICENSE file.

The AutoAugment implementation is based on https://github.com/DeepVoltaire/AutoAugment. The Distance Weighted Sampling and margin loss implementation is based on the authors' implementation at https://github.com/chaoyuaw/sampling_matters.

multigrain's People

Contributors

bermanmaxim, mdouze


multigrain's Issues

Image retrieve

How can I use the UKBench dataset for training? Should I modify the use of IN1K in train.py, as follows?

transforms = get_transforms(IN1K, args.input_size, args.augmentation, args.backbone)
datas = {}
for split in ('train', 'val'):
    imload = preloader(args.imagenet_path, args.preload_dir_imagenet) if args.preload_dir_imagenet else default_loader
    datas[split] = IdDataset(IN1K(args.imagenet_path, split, transform=transforms[split], loader=imload))

That is, should I write a file analogous to imagenet.py?
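
For reference, a minimal sketch of the kind of dataset wrapper being asked about, using torchvision's generic ImageFolder rather than the repository's IN1K/IdDataset classes; the path and directory layout below are hypothetical:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

UKBENCH_PATH = '/path/to/ukbench'  # hypothetical layout: <split>/<group_id>/*.jpg

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

loaders = {}
for split in ('train', 'val'):
    dataset = datasets.ImageFolder(f'{UKBENCH_PATH}/{split}', transform=transform)
    loaders[split] = DataLoader(dataset, batch_size=64, shuffle=(split == 'train'))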

where is the data?

The project README says: "IMAGENET_PATH= # the path that contains the /val and /train image directories".
Where can I get the val and train images?
Do I need to download them from ImageNet, or can you provide them?

Thank you for your attention

Train error when using pretrained model

Model state loaded from experiments/joint_3B_0.5/checkpoint_2.pth
Traceback (most recent call last):
File "scripts/train.py", line 254, in
run(args)
File "scripts/train.py", line 158, in run
begin_epoch, loaded_extra = checkpoints.resume(model, optimizers, metrics_history, args.resume_epoch, args.resume_from)
File "/tmp/multigrain/multigrain/utils/checkpoint.py", line 114, in resume
optimizer.load_state_dict('optimizer_state')
File "/tmp/multigrain/multigrain/modules/multioptim.py", line 28, in load_state_dict
for k, v in D.items():
AttributeError: 'str' object has no attribute 'items'

I think this line:

optimizer.load_state_dict('optimizer_state')

should be changed to:

optimizer.load_state_dict(checkpoint['optimizer_state'])

Is that right?
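
For comparison, a minimal sketch of the usual PyTorch resume pattern; the checkpoint path and the 'model_state'/'optimizer_state' keys come from the snippets above, while the 'epoch' key is an assumption and may differ from the repository's checkpoint.py:

import torch

checkpoint = torch.load('experiments/joint_3B_0.5/checkpoint_2.pth', map_location='cpu')

# Pass the stored state dictionaries, not the string keys
model.load_state_dict(checkpoint['model_state'])
optimizer.load_state_dict(checkpoint['optimizer_state'])
start_epoch = checkpoint.get('epoch', 0)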

What is the purpose of `eps` in GeM layer?

I notice that the GeM layer uses eps to clamp the input feature maps, but I cannot find any corresponding description in the original paper.

Nevertheless, I directly plugged this eps-clamped GeM layer into my own network and it improved performance. However, I still don't understand the purpose of the eps-clamping, and I wonder how GeM performed in your experiments without this trick.
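
For context, a common GeM pooling implementation clamps activations to a small eps before raising them to the power p; this keeps the base of the fractional power strictly positive and avoids NaN gradients when activations are exactly zero. A minimal sketch of that standard formulation, not necessarily identical to the repository's layer:

import torch
import torch.nn.functional as F
from torch import nn

class GeM(nn.Module):
    """Generalized-mean pooling: (mean of x^p over spatial dims)^(1/p)."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(float(p)))  # learnable exponent
        self.eps = eps

    def forward(self, x):
        # Clamp keeps x > 0 so the fractional power is well defined
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.adaptive_avg_pool2d(x, 1)
        return x.pow(1.0 / self.p)

# Usage: GeM(p=3.0)(feature_maps) maps (N, C, H, W) -> (N, C, 1, 1)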

about the resume training

Thanks for your work. I tried to train the model from joint_3B_0.5.pth with the ImageNet-2012 data, using the command:

python scripts/train.py --expdir experiments/joint_3B_0.5 --repeated-augmentations 3 --pooling-exponent 3 --classif-weight 0.5 --imagenet-path /root/cv/data/imagenet2012 --resume-from /root/zhx3/project/multigrain/experiments/joint_3B_0.5.pth

The log shows:

arguments:
augmentation: full
backbone: resnet50
batch_size: 64
beta_init: 1.2
beta_lr: 1.0
classif_weight: 0.5
dry: false
epoch_len_factor: 2.0
epochs: 120
expdir: experiments/joint_3B_0.5
global_sampling: false
gradient_accum: 1
imagenet_path: /root/cv/data/imagenet2012
input_size: 224
learning_rate: 0.2
lr_drops_epochs:
- 30
- 60
- 90
lr_drops_factors:
- 0.1
- 0.1
- 0.1
momentum: 0.9
no_cuda: false
no_validate_first: false
pooling_exponent: 3.0
preload_dir_imagenet: null
pretrained_backbone: null
repeated_augmentations: 3
resume_epoch: -1
resume_from: /root/zhx3/project/multigrain/experiments/joint_3B_0.5.pth
save_every: 10
shuffle_val: false
weight_decay: 0.0001
workers: 20

Indexing IN1K train dataset... OK! cached to data/IN1K-train-cached-list.pth

Indexing IN1K val dataset... OK! cached to data/IN1K-val-cached-list.pth

Multigrain model with resnet50 backbone and p=3.0 pooling:
MultiGrain(
  (features): Sequential(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer2): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (3): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer3): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (3): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (4): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (5): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer4): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
  )
  (pool): Layer(name='gem', p=3.0)
  (classifier): Linear(in_features=2048, out_features=1000, bias=True)
  (normalize): Layer(name='l2n')
  (weighted_sampling): DistanceWeightedSampling()
)
Reinitializing metrics, metrics file: experiments/joint_3B_0.5/metrics.yaml
Model state loaded from /root/zhx3/project/multigrain/experiments/joint_3B_0.5.pth
No optimizer state found in /root/zhx3/project/multigrain/experiments/joint_3B_0.5.pth
[Ep -1/120] (0/40037) train_cross_entropy 1.89 (1.89), train_margin 0.0718 (0.0718), 
train_loss 0.981 (0.981), train_beta 0.823 (0.823), train_top1 64.1 (64.1), 
train_top5 78.1 (78.1), train_data_time 2870 (2870), train_batch_time 8710 (8710)
[Ep -1/120] (1/40037) train_cross_entropy 5.35 (3.62), train_margin 0.156 (0.114), 
train_loss 2.76 (1.87), train_beta 0.778 (0.8), train_top1 9.38 (36.7), train_top5 28.1 (53.1), 
train_data_time 3.94 (1440), train_batch_time 214 (4460)
[Ep -1/120] (2/40037) train_cross_entropy 8.77 (5.34), train_margin 0.198 (0.142), 
train_loss 4.48 (2.74), train_beta 0.723 (0.775), train_top1 0 (24.5), train_top5 3.12 (36.5), 
train_data_time 3.89 (960), train_batch_time 208 (3050)
[Ep -1/120] (3/40037) train_cross_entropy 8.57 (6.15), train_margin 0.288 (0.178), 
train_loss 4.43 (3.16), train_beta 0.722 (0.762), train_top1 0 (18.4), train_top5 0 (27.3), 
train_data_time 3.84 (721), train_batch_time 209 (2340)
[Ep -1/120] (4/40037) train_cross_entropy 7.61 (6.44), train_margin 0.487 (0.24), 
train_loss 4.05 (3.34), train_beta 0.752 (0.76), train_top1 0 (14.7), train_top5 0 (21.9), 
train_data_time 3.85 (577), train_batch_time 205 (1910)
[Ep -1/120] (5/40037) train_cross_entropy 8.08 (6.71), train_margin 0.489 (0.282), 
train_loss 4.28 (3.5), train_beta 0.802 (0.767), train_top1 0 (12.2), train_top5 0 (18.2), 
train_data_time 3.91 (482), train_batch_time 207 (1630)
[Ep -1/120] (6/40037) train_cross_entropy 7.65 (6.85), train_margin 0.57 (0.323), 
train_loss 4.11 (3.59), train_beta 0.839 (0.777), train_top1 0 (10.5), train_top5 0 (15.6), 
train_data_time 3.84 (414), train_batch_time 230 (1430)
[Ep -1/120] (7/40037) train_cross_entropy 7.53 (6.93), train_margin 0.62 (0.36), 
train_loss 4.07 (3.65), train_beta 0.874 (0.789), train_top1 0 (9.18), train_top5 0 (13.7), 
train_data_time 4.65 (362), train_batch_time 220 (1280)
[Ep -1/120] (8/40037) train_cross_entropy 7.45 (6.99), train_margin 0.802 (0.409), 
train_loss 4.13 (3.7), train_beta 0.88 (0.799), train_top1 0 (8.16), train_top5 0 (12.2), 
train_data_time 3.87 (323), train_batch_time 227 (1160)
[Ep -1/120] (9/40037) train_cross_entropy 7.84 (7.07), train_margin 0.535 (0.422), 
train_loss 4.19 (3.75), train_beta 0.838 (0.803), train_top1 0 (7.34), train_top5 0 (10.9), 
train_data_time 3.71 (291), train_batch_time 204 (1060)
[Ep -1/120] (10/40037) train_cross_entropy 7.55 (7.12), train_margin 0.577 (0.436), 
train_loss 4.07 (3.78), train_beta 0.793 (0.802), train_top1 0 (6.68), train_top5 0 (9.94), 
train_data_time 3.85 (265), train_batch_time 207 (986)
[Ep -1/120] (11/40037) train_cross_entropy 7.59 (7.16), train_margin 0.665 (0.455), 
train_loss 4.13 (3.81), train_beta 0.73 (0.796), train_top1 0 (6.12), train_top5 0 (9.11), 
train_data_time 3.87 (243), train_batch_time 203 (921)

Does this look right? I think the top-1 accuracy should be higher than zero.
