open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

Home Page: https://mmpretrain.readthedocs.io/en/latest/

License: Apache License 2.0


Introduction

MMPreTrain is an open source pre-training toolbox based on PyTorch. It is a part of the OpenMMLab project.

The main branch works with PyTorch 1.8+.

Major features

  • Various backbones and pretrained models
  • Rich training strategies (supervised learning, self-supervised learning, multi-modality learning etc.)
  • Bag of training tricks
  • Large-scale training configs
  • High efficiency and extensibility
  • Powerful toolkits for model analysis and experiments
  • Various out-of-the-box inference tasks
    • Image Classification
    • Image Caption
    • Visual Question Answering
    • Visual Grounding
    • Retrieval (Image-To-Image, Text-To-Image, Image-To-Text)
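Conceptually, the retrieval tasks listed above rank gallery candidates by embedding similarity to a query. A minimal, library-free sketch with synthetic embeddings (the hand-built vectors stand in for real image/text encoder outputs; `retrieve` is an illustrative helper, not an MMPreTrain API):

```python
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=3):
    """Rank gallery items by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = g @ q               # cosine similarity of each gallery item
    order = np.argsort(-scores)  # highest similarity first
    return order[:top_k], scores[order[:top_k]]

# Synthetic 4-dim embeddings; the query is clearly closest to item 2.
gallery = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
])
query = np.array([0.05, 0.02, 0.99, 0.0])
top, scores = retrieve(query, gallery)
print(top[0])  # 2
```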

What's new

🌟 v1.2.0 was released in 04/01/2024

  • Support LLaVA 1.5.
  • Implement RAM with a Gradio interface.

🌟 v1.1.0 was released in 12/10/2023

  • Support Mini-GPT4 training and provide a Chinese model (based on Baichuan-7B)
  • Support zero-shot classification based on CLIP.
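At its core, CLIP-style zero-shot classification compares an image embedding against text embeddings of class prompts and takes a softmax over the similarities. A rough sketch with synthetic embeddings (the encoders are mocked out; this is not the actual CLIP or MMPreTrain API):

```python
import numpy as np

def zero_shot_classify(image_emb, class_embs, temperature=0.01):
    """Softmax over cosine similarities between an image and class-prompt embeddings."""
    img = image_emb / np.linalg.norm(image_emb)
    cls = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    logits = (cls @ img) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()

# Three synthetic class-prompt embeddings; the image matches class 1.
class_embs = np.eye(3)
image_emb = np.array([0.1, 0.9, 0.1])
probs = zero_shot_classify(image_emb, class_embs)
print(probs.argmax())  # 1
```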

🌟 v1.0.0 was released in 04/07/2023

🌟 Upgrade from MMClassification to MMPreTrain

  • Integrated self-supervised learning algorithms from MMSelfSup, such as MAE, BEiT, etc.
  • Support RIFormer, a simple but effective vision backbone that removes the token mixer.
  • Refactor dataset pipeline visualization.
  • Support LeViT, XCiT, ViG, ConvNeXt-V2, EVA, RevViT, EfficientNetV2, CLIP, TinyViT and MixMIM backbones.

This release introduced a brand-new and flexible training & test engine, which is still in progress. You are welcome to try it according to the documentation.

There are also some BC-breaking changes. Please check the migration tutorial.

Please refer to changelog for more details and other release history.

Installation

Below are quick steps for installation:

conda create -n open-mmlab python=3.8 pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -y
conda activate open-mmlab
pip install openmim
git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
mim install -e .

Please refer to installation documentation for more detailed installation and dataset preparation.

For multi-modality models support, please install the extra dependencies by:

mim install -e ".[multimodal]"

User Guides

We provide a series of tutorials on the basic usage of MMPreTrain for new users:

For more information, please refer to our documentation.

Model zoo

Results and models are available in the model zoo.

Overview
  • Supported Backbones
  • Self-supervised Learning
  • Multi-Modality Algorithms
  • Others
  • Image Retrieval Task
  • Training & Test Tips

Contributing

We appreciate all contributions to improve MMPreTrain. Please refer to CONTRIBUTING for the contributing guideline.

Acknowledgement

MMPreTrain is an open source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope the toolbox and benchmark serve the growing research community by providing a flexible toolkit for reimplementing existing methods and supporting users' own academic research.

Citation

If you find this project useful in your research, please consider citing:

@misc{2023mmpretrain,
    title={OpenMMLab's Pre-training Toolbox and Benchmark},
    author={MMPreTrain Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmpretrain}},
    year={2023}
}

License

This project is released under the Apache 2.0 license.

Projects in OpenMMLab

  • MMEngine: OpenMMLab foundational library for training deep learning models.
  • MMCV: OpenMMLab foundational library for computer vision.
  • MIM: MIM installs OpenMMLab packages.
  • MMEval: A unified evaluation library for multiple machine learning libraries.
  • MMPreTrain: OpenMMLab pre-training toolbox and benchmark.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
  • MMYOLO: OpenMMLab YOLO series toolbox and benchmark.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
  • MMPose: OpenMMLab pose estimation toolbox and benchmark.
  • MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
  • MMSelfSup: OpenMMLab self-supervised learning toolbox and benchmark.
  • MMRazor: OpenMMLab model compression toolbox and benchmark.
  • MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMFlow: OpenMMLab optical flow toolbox and benchmark.
  • MMagic: OpenMMLab Advanced, Generative and Intelligent Creation toolbox.
  • MMGeneration: OpenMMLab image and video generative models toolbox.
  • MMDeploy: OpenMMLab model deployment framework.
  • Playground: A central hub for gathering and showcasing amazing projects built upon OpenMMLab.

mmpretrain's People

Contributors

0x4f5da2, bobo0810, congee524, daavoo, ezra-yu, fangyixiao18, fanqino1, hit-cwh, imyhxy, invinciblewyq, kitecats, lxxxxr, mzr1996, okotaku, qingchuanws, qingtian5, techmonsterwang, timerring, tonysy, wangbo-zhao, wangruohui, xiaojieli0903, xiefeifeihu, ycxioooong, yingfhu, yl-1993, yuanliuuuuuu, yyk-wew, zwwwayne, zzc98


mmpretrain's Issues

CUDA error: device-side assert triggered

When training ResNeSt-101, I changed the number of classes from 3 to 4 and got the error above. It does not occur when I train with 3 classes. My config is below. How can I solve it?

2020-11-02 05:19:02,910 - mmcls - INFO - Environment info:

sys.platform: linux
Python: 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.1+cu101
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.1+cu101
OpenCV: 4.1.2
MMCV: 1.1.6
mmcls: 0.1.0+unknown

2020-11-02 05:19:02,912 - mmcls - INFO - Distributed training: False
2020-11-02 05:19:03,235 - mmcls - INFO - Config:
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ResNeSt',
        depth=101,
        num_stages=4,
        stem_channels=128,
        out_indices=(3, ),
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=4,
        in_channels=2048,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 4)))
dataset_type = 'ImageNet'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
albu_train_transforms = [
    dict(type='Cutout', max_h_size=20, max_w_size=20, num_holes=10, p=0.4),
    dict(type='RandomRotate90', p=0.5),
    dict(type='RandomFog', fog_coef_lower=0.2, fog_coef_upper=0.6, p=1),
    dict(type='ToGray', p=0.2),
    dict(type='VerticalFlip', p=0.5),
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        interpolation=1,
        p=0.5),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0),
            dict(type='MotionBlur', p=0.2),
            dict(type='GlassBlur', sigma=0.7, max_delta=4, p=0.2)
        ],
        p=0.2)
]
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=(512, 512)),
    dict(
        type='Albu',
        transforms=[
            dict(
                type='Cutout',
                max_h_size=20,
                max_w_size=20,
                num_holes=10,
                p=0.4),
            dict(type='RandomRotate90', p=0.5),
            dict(
                type='RandomFog', fog_coef_lower=0.2, fog_coef_upper=0.6, p=1),
            dict(type='ToGray', p=0.2),
            dict(type='VerticalFlip', p=0.5),
            dict(
                type='ShiftScaleRotate',
                shift_limit=0.0625,
                scale_limit=0.0,
                rotate_limit=0,
                interpolation=1,
                p=0.5),
            dict(
                type='RandomBrightnessContrast',
                brightness_limit=[0.1, 0.3],
                contrast_limit=[0.1, 0.3],
                p=0.2),
            dict(
                type='OneOf',
                transforms=[
                    dict(type='Blur', blur_limit=3, p=1.0),
                    dict(type='MedianBlur', blur_limit=3, p=1.0),
                    dict(type='MotionBlur', p=0.2),
                    dict(type='GlassBlur', sigma=0.7, max_delta=4, p=0.2)
                ],
                p=0.2)
        ],
        keymap=dict(img='image'),
        update_pad_shape=False),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=(512, 512)),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
]
data = dict(
    samples_per_gpu=12,
    workers_per_gpu=2,
    train=dict(
        type='ImageNet',
        data_prefix='data/imagenet/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', size=(512, 512)),
            dict(
                type='Albu',
                transforms=[
                    dict(
                        type='Cutout',
                        max_h_size=20,
                        max_w_size=20,
                        num_holes=10,
                        p=0.4),
                    dict(type='RandomRotate90', p=0.5),
                    dict(
                        type='RandomFog',
                        fog_coef_lower=0.2,
                        fog_coef_upper=0.6,
                        p=1),
                    dict(type='ToGray', p=0.2),
                    dict(type='VerticalFlip', p=0.5),
                    dict(
                        type='ShiftScaleRotate',
                        shift_limit=0.0625,
                        scale_limit=0.0,
                        rotate_limit=0,
                        interpolation=1,
                        p=0.5),
                    dict(
                        type='RandomBrightnessContrast',
                        brightness_limit=[0.1, 0.3],
                        contrast_limit=[0.1, 0.3],
                        p=0.2),
                    dict(
                        type='OneOf',
                        transforms=[
                            dict(type='Blur', blur_limit=3, p=1.0),
                            dict(type='MedianBlur', blur_limit=3, p=1.0),
                            dict(type='MotionBlur', p=0.2),
                            dict(
                                type='GlassBlur',
                                sigma=0.7,
                                max_delta=4,
                                p=0.2)
                        ],
                        p=0.2)
                ],
                keymap=dict(img='image'),
                update_pad_shape=False),
            dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='ToTensor', keys=['gt_label']),
            dict(type='Collect', keys=['img', 'gt_label'])
        ]),
    val=dict(
        type='ImageNet',
        data_prefix='data/imagenet/val',
        ann_file='data/imagenet/meta/val.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', size=(512, 512)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ]),
    test=dict(
        type='ImageNet',
        data_prefix='data/imagenet/val',
        ann_file='data/imagenet/meta/val.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', size=(512, 512)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ]))
evaluation = dict(interval=1, metric='accuracy')
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='step', step=[20, 60, 90])
total_epochs = 30
checkpoint_config = dict(interval=1)
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = '/content/mmclassification-master/checkpoint/resnest101_converted-032caa52.pth'
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/resnest101'
gpu_ids = range(0, 1)
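A device-side assert during classification training is very often caused by ground-truth labels falling outside [0, num_classes), e.g. labels written as 1..4 instead of 0..3 after re-labeling from 3 to 4 classes. A quick, framework-free sanity check over the annotation labels (illustrative; adapt it to however your labels are stored):

```python
def check_labels(labels, num_classes):
    """Return (index, label) pairs that would trip CUDA's device-side assert
    inside CrossEntropyLoss, i.e. labels outside [0, num_classes)."""
    return [(i, l) for i, l in enumerate(labels) if not (0 <= l < num_classes)]

# Labels written as 1..4 instead of 0..3 trip the check for num_classes=4.
bad = check_labels([0, 1, 2, 3, 4], num_classes=4)
print(bad)  # [(4, 4)]
```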

Train problem

RuntimeError: mat1 dim 1 must match mat2 dim 0

Traceback (most recent call last):
  File "tools/train.py", line 157, in <module>
    main()
  File "tools/train.py", line 153, in main
    meta=meta)
  File "/media/shuozhang/DATA/Processing/OFFICE/deeplearning/mmlab/mmclassification/mmcls/apis/train.py", line 133, in train_model
    runner.run(data_loaders, cfg.workflow)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/media/shuozhang/DATA/Processing/OFFICE/deeplearning/mmlab/mmclassification/mmcls/models/classifiers/base.py", line 140, in train_step
    losses = self(**data)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/shuozhang/DATA/Processing/OFFICE/deeplearning/mmlab/mmclassification/mmcls/models/classifiers/base.py", line 83, in forward
    return self.forward_train(img, **kwargs)
  File "/media/shuozhang/DATA/Processing/OFFICE/deeplearning/mmlab/mmclassification/mmcls/models/classifiers/image.py", line 58, in forward_train
    loss = self.head.forward_train(x, gt_label)
  File "/media/shuozhang/DATA/Processing/OFFICE/deeplearning/mmlab/mmclassification/mmcls/models/heads/linear_head.py", line 54, in forward_train
    cls_score = self.fc(x)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/shuozhang/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 1674, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

Could you tell me what's going on? How can I fix it?
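`mat1 dim 1 must match mat2 dim 0` means the feature vector entering the linear head does not have the width the head's weight matrix expects (here `in_channels=2048`). A library-agnostic illustration with NumPy (the shapes are illustrative, mirroring a batch of pooled backbone features and an fc weight):

```python
import numpy as np

features_ok = np.zeros((12, 2048))   # pooled backbone features, 2048-dim
features_bad = np.zeros((12, 1024))  # a backbone that outputs 1024-dim features
fc_weight = np.zeros((2048, 4))      # LinearClsHead expecting in_channels=2048

print((features_ok @ fc_weight).shape)  # (12, 4): widths match

try:
    features_bad @ fc_weight            # widths 1024 vs 2048 do not match
except ValueError as e:
    print("shape mismatch:", e)
```

In mmcls terms: make sure `head.in_channels` matches the channel count of the backbone stage selected by `out_indices`.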

Pretrained model from model zoo's keys doesn't match mmcls's source state_dict.

Hello, I ran into a problem when fine-tuning ResNet-50.
I downloaded the ResNet-50 pretrained weights from the mmclassification model zoo.
But when I load them in mmcls, the keys of the pretrained state_dict differ from those of the model built from the config (via build_from_cfg).
I checked mmcls/models/backbone/resnet.py and found the model structure is different:
for example, a key in the built model is backbone.conv1.weight, but the corresponding key in the pretrained model is backbone.stem.0.conv.weight...

Have you rewritten the model architecture of mmcls without updating the pretrained models?

always in eval state

I trained ResNet-50 in distributed mode on ImageNet-1k. The training phase runs fine, but in the evaluation phase the process hangs indefinitely.

About FP16 and TTA

Thank you for your contribution. I have two questions: how can I use TTA and how can I use FP16 in my project?
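For mixed-precision training in MMClassification-era configs, adding an `fp16` field to the config enables the FP16 optimizer hook (TTA, by contrast, generally requires a custom test-time pipeline). A config fragment along these lines — the loss scale of 512 is a commonly used value, not a requirement:

```python
# In your config file: enable FP16 training with a static loss scale.
fp16 = dict(loss_scale=512.0)
```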

When I use the Multi-GPU in resnet50, it doesn't seem to work!

command: ./tools/dist_train.sh configs/cifar10/resnet50.py 2

Error: subprocess.CalledProcessError: Command '['/home/xatu/anaconda2/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=14', 'configs/cifar10/resnet50.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

fail to load some latest checkpoints with torch<1.6.

I noticed that some of the latest checkpoints in the model zoo cannot be loaded with torch<1.6.
For example:

from torch.utils import model_zoo

ckpt = model_zoo.load_url('https://openmmlab.oss-cn-hangzhou.aliyuncs.com/mmclassification'
                          '/v0/imagenet/resnest50_converted-1ebf0afe.pth')

PyTorch will report RuntimeError: Only one file(not dir) is allowed in the zipfile.
The probable reason is that the latest checkpoints were saved with torch>=1.6, which uses a different (zip-based) serialization protocol; re-saving them under torch>=1.6 with torch.save(..., _use_new_zipfile_serialization=False) makes them loadable by older versions.

resize mode question

Hi @ycxioooong, regarding the Resize step in the ImageNet config: one option uses cv2.resize and another uses the Pillow implementation. What is the difference between them, and is there any gap in classification accuracy?

Image load error


The following error occurs when loading a non-standard-format image, but the same image reads fine with the PIL library.

Pytorch2onnx

Hello. I wonder which models will be supported for model conversion? Will all the models in configs be supported?

mmcls/models/losses/eval_metrics.py confusion_matrix

confusion_matrix[target_label.long(), pred_label.long()] += 1
I think this line is wrong: indexing with [target_label.long(), pred_label.long()] selects all the coordinates that need +1, but each coordinate is incremented only once even if it appears multiple times.
It should be:
for t, p in zip(target_label, pred_label): confusion_matrix[t.long(), p.long()] += 1
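The reporter is right about the indexing pitfall: with integer-array indexing, `+= 1` is buffered, so a (target, pred) pair that occurs several times is counted only once. NumPy exhibits the same behavior as torch here, and `np.add.at` (or the explicit loop suggested above) performs the unbuffered accumulation:

```python
import numpy as np

target = np.array([0, 0, 1])
pred = np.array([1, 1, 0])  # the pair (0, 1) occurs twice

cm_wrong = np.zeros((2, 2), dtype=int)
cm_wrong[target, pred] += 1             # buffered: (0, 1) counted only once
print(cm_wrong[0, 1])                   # 1, but should be 2

cm_right = np.zeros((2, 2), dtype=int)
np.add.at(cm_right, (target, pred), 1)  # unbuffered accumulation
print(cm_right[0, 1])                   # 2
```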

SeResnet50 pretrained model mismatch

  • Question: I use the SeResNet config with num_classes=7. My pretrained model is se-resnet50_batch256_20200708-657b3c36.pth, downloaded from this project's model zoo. However, I get the following log, which contains lots of unexpected keys in the source state_dict. Why?

  • log:
    size mismatch for head.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([7, 2048]).
    size mismatch for head.fc.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([7]).
    unexpected key in source state_dict: backbone.layer1.0.se_layer.conv1.weight, backbone.layer1.0.se_layer.conv1.bias, backbone.layer1.0.se_layer.conv2.weight, backbone.layer1.0.se_layer.conv2.bias, backbone.layer1.1.se_layer.conv1.weight, backbone.layer1.1.se_layer.conv1.bias, backbone.layer1.1.se_layer.conv2.weight, backbone.layer1.1.se_layer.conv2.bias, backbone.layer1.2.se_layer.conv1.weight, backbone.layer1.2.se_layer.conv1.bias, backbone.layer1.2.se_layer.conv2.weight, backbone.layer1.2.se_layer.conv2.bias, backbone.layer2.0.se_layer.conv1.weight, backbone.layer2.0.se_layer.conv1.bias, backbone.layer2.0.se_layer.conv2.weight, backbone.layer2.0.se_layer.conv2.bias, backbone.layer2.1.se_layer.conv1.weight, backbone.layer2.1.se_layer.conv1.bias, backbone.layer2.1.se_layer.conv2.weight, backbone.layer2.1.se_layer.conv2.bias, backbone.layer2.2.se_layer.conv1.weight, backbone.layer2.2.se_layer.conv1.bias, backbone.layer2.2.se_layer.conv2.weight, backbone.layer2.2.se_layer.conv2.bias, backbone.layer2.3.se_layer.conv1.weight, backbone.layer2.3.se_layer.conv1.bias, backbone.layer2.3.se_layer.conv2.weight, backbone.layer2.3.se_layer.conv2.bias, backbone.layer3.0.se_layer.conv1.weight, backbone.layer3.0.se_layer.conv1.bias, backbone.layer3.0.se_layer.conv2.weight, backbone.layer3.0.se_layer.conv2.bias, backbone.layer3.1.se_layer.conv1.weight, backbone.layer3.1.se_layer.conv1.bias, backbone.layer3.1.se_layer.conv2.weight, backbone.layer3.1.se_layer.conv2.bias, backbone.layer3.2.se_layer.conv1.weight, backbone.layer3.2.se_layer.conv1.bias, backbone.layer3.2.se_layer.conv2.weight, backbone.layer3.2.se_layer.conv2.bias, backbone.layer3.3.se_layer.conv1.weight, backbone.layer3.3.se_layer.conv1.bias, backbone.layer3.3.se_layer.conv2.weight, backbone.layer3.3.se_layer.conv2.bias, backbone.layer3.4.se_layer.conv1.weight, backbone.layer3.4.se_layer.conv1.bias, backbone.layer3.4.se_layer.conv2.weight, backbone.layer3.4.se_layer.conv2.bias, 
backbone.layer3.5.se_layer.conv1.weight, backbone.layer3.5.se_layer.conv1.bias, backbone.layer3.5.se_layer.conv2.weight, backbone.layer3.5.se_layer.conv2.bias, backbone.layer4.0.se_layer.conv1.weight, backbone.layer4.0.se_layer.conv1.bias, backbone.layer4.0.se_layer.conv2.weight, backbone.layer4.0.se_layer.conv2.bias, backbone.layer4.1.se_layer.conv1.weight, backbone.layer4.1.se_layer.conv1.bias, backbone.layer4.1.se_layer.conv2.weight, backbone.layer4.1.se_layer.conv2.bias, backbone.layer4.2.se_layer.conv1.weight, backbone.layer4.2.se_layer.conv1.bias, backbone.layer4.2.se_layer.conv2.weight, backbone.layer4.2.se_layer.conv2.bias
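Size mismatches on `head.fc.*` are expected when fine-tuning with a different `num_classes`; the loader simply skips those weights. To silence the warnings explicitly, you can drop the classifier head from the checkpoint before loading. A sketch using plain dicts (the checkpoint keys are illustrative and `strip_head` is a hypothetical helper, not an mmcls function):

```python
def strip_head(state_dict, prefix="head."):
    """Remove classifier-head weights so only backbone/neck weights are loaded."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}

ckpt = {
    "backbone.conv1.weight": "...",
    "head.fc.weight": "...",  # shaped for 1000 classes, not 7
    "head.fc.bias": "...",
}
print(sorted(strip_head(ckpt)))  # ['backbone.conv1.weight']
```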

TypeError: simple_test() got an unexpected keyword argument 'gt_label'

Hi, we followed the steps in getting_started.md to train a ResNet-50-based classifier on MNIST, but the following issue came up.

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/mnist/train-images-idx3-ubyte.gz
Extracting data/mnist/train-images-idx3-ubyte.gz to data/mnist
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/mnist/train-labels-idx1-ubyte.gz
Extracting data/mnist/train-labels-idx1-ubyte.gz to data/mnist
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/mnist/t10k-images-idx3-ubyte.gz
Extracting data/mnist/t10k-images-idx3-ubyte.gz to data/mnist
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/mnist/t10k-labels-idx1-ubyte.gz
Extracting data/mnist/t10k-labels-idx1-ubyte.gz to data/mnist
/pytorch/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.
2020-10-01 01:15:23,434 - mmcls - INFO - Start running, host: root@047e116c429b, work_dir: /SUANFAZU/SUANFAZU/MMClassification/mmclassification-master/run
2020-10-01 01:15:23,434 - mmcls - INFO - workflow: [('train', 1)], max: 20 epochs
2020-10-01 01:15:25,899 - mmcls - INFO - Epoch [1][100/469] lr: 1.000e-02, eta: 0:03:48, time: 0.025, data_time: 0.022, memory: 43, loss: 1.4555, top-1: 61.9297
2020-10-01 01:15:26,288 - mmcls - INFO - Epoch [1][200/469] lr: 1.000e-02, eta: 0:02:10, time: 0.004, data_time: 0.002, memory: 43, loss: 0.4446, top-1: 88.0938
2020-10-01 01:15:26,676 - mmcls - INFO - Epoch [1][300/469] lr: 1.000e-02, eta: 0:01:38, time: 0.004, data_time: 0.002, memory: 43, loss: 0.3176, top-1: 91.1172
2020-10-01 01:15:27,063 - mmcls - INFO - Epoch [1][400/469] lr: 1.000e-02, eta: 0:01:21, time: 0.004, data_time: 0.002, memory: 43, loss: 0.2701, top-1: 92.3516
2020-10-01 01:15:27,346 - mmcls - INFO - Saving checkpoint at 1 epochs
[ ] 0/10000, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/train.py", line 157, in <module>
    main()
  File "tools/train.py", line 153, in main
    meta=meta)
  File "/SUANFAZU/SUANFAZU/MMClassification/mmclassification-master/mmcls/apis/train.py", line 117, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/SUANFAZU/SUANFAZU/MMClassification/mmclassification-master/mmcls/core/evaluation/eval_hooks.py", line 27, in after_train_epoch
    results = single_gpu_test(runner.model, self.dataloader, show=False)
  File "/SUANFAZU/SUANFAZU/MMClassification/mmclassification-master/mmcls/apis/test.py", line 20, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/SUANFAZU/SUANFAZU/MMClassification/mmclassification-master/mmcls/models/classifiers/base.py", line 83, in forward
    return self.forward_test(img, **kwargs)
  File "/SUANFAZU/SUANFAZU/MMClassification/mmclassification-master/mmcls/models/classifiers/base.py", line 67, in forward_test
    return self.simple_test(imgs[0], **kwargs)
TypeError: simple_test() got an unexpected keyword argument 'gt_label'

It seems the training process runs fine, but something goes wrong right after the checkpoint is saved, when evaluation starts. Any response would be greatly appreciated.

Error caught during inference

Hi, after upgrading to the latest commit on master, when I use slurm_train.sh or slurm_test.sh on multiple nodes, the test process throws some errors.

For evaluation after training an epoch (on 8 machines, each with 8 GPUs), the error says

[>>>>>>>>>>>>>>>>>>>>>>>] 50048/50000, 1028.7 task/s, elapsed: 49s, ETA:     0s
Traceback (most recent call last):
  File "tools/train.py", line 157, in <module>
    main()
  File "tools/train.py", line 153, in main
    meta=meta)
  File "/mnt/lustre/lid/mmclassification/mmcls/apis/train.py", line 117, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/mnt/lustre/lid/mmcv/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/mnt/lustre/lid/mmcv/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/mnt/lustre/lid/mmcv/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/mnt/lustre/lid/mmclassification/mmcls/core/evaluation/eval_hooks.py", line 74, in after_train_epoch
    self.evaluate(runner, results)
  File "/mnt/lustre/lid/mmclassification/mmcls/core/evaluation/eval_hooks.py", line 32, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/mnt/lustre/lid/mmclassification/mmcls/datasets/base_dataset.py", line 86, in evaluate
    assert len(gt_labels) == num_imgs
AssertionError

For inference only on multiple nodes, the error says

[>>>>>>>>>>>>>>>>>>>>>>>] 50000/50000, 260.8 task/s, elapsed: 192s, ETA:     0sTraceback (most recent call last):    [140/166]
Traceback (most recent call last):
  File "tools/test.py", line 122, in <module>
Traceback (most recent call last):
  File "tools/test.py", line 122, in <module>
  File "tools/test.py", line 122, in <module>
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "tools/test.py", line 122, in <module>
Traceback (most recent call last):
  File "tools/test.py", line 122, in <module>
  File "tools/test.py", line 122, in <module>
Traceback (most recent call last):
  File "tools/test.py", line 122, in <module>
  File "tools/test.py", line 122, in <module>
    main()
  File "tools/test.py", line 84, in main
    args.gpu_collect)
  File "/mnt/lustre/lid/mmclassification/mmcls/apis/test.py", line 75, in multi_gpu_test
    results = collect_results_cpu(results, len(dataset), tmpdir)
  File "/mnt/lustre/lid/mmclassification/mmcls/apis/test.py", line 99, in collect_results_cpu
    mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
  File "/mnt/lustre/lid/mmcv/mmcv/fileio/io.py", line 80, in dump
    handler.dump_to_path(obj, file, **kwargs)
  File "/mnt/lustre/lid/mmcv/mmcv/fileio/handlers/pickle_handler.py", line 26, in dump_to_path
    obj, filepath, mode='wb', **kwargs)
  File "/mnt/lustre/lid/mmcv/mmcv/fileio/handlers/base.py", line 24, in dump_to_path
    with open(filepath, mode) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpbarrmwij/part_10.pkl'

(The same traceback is printed by ranks 8-15, with their outputs interleaved; each rank fails on its own '/tmp/tmpbarrmwij/part_{rank}.pkl'.)

Performance does not match

Hi, thanks for your work.

I have just downloaded the pre-trained ResNet-18 model from the model zoo.
However, its top-1 accuracy on the val set is actually 69.28 with 4 GPUs, which does not match your log file.

I am very confused.
Thanks a lot.

A little question about Dataset

I have a small question about the Dataset.
I want to convert my data to the ImageNet format, but I don't know what to put in the "meta" folder. Is it just a "val.txt", or something else?
Thank you!
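For reference, the "meta" folder conventionally holds the annotation lists (train.txt / val.txt) with one "<relative image path> <integer label>" pair per line, which matches what the load_annotations implementations quoted in later issues parse. A minimal sketch; the file names are illustrative assumptions:

```python
# Example contents of meta/val.txt: "<filename> <label>" per line.
lines = [
    "ILSVRC2012_val_00000001.JPEG 65",
    "ILSVRC2012_val_00000002.JPEG 970",
]

def parse_ann_file(lines):
    """Parse annotation lines into (filename, label) pairs,
    mirroring how mmcls-style datasets read them."""
    samples = []
    for line in lines:
        filename, gt_label = line.strip().split(' ')
        samples.append((filename, int(gt_label)))
    return samples

print(parse_ann_file(lines))
```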

demo problem

I have run python demo/image_demo.py demo/demo.JPEG configs/imagenet/vgg19bn.py weights/9c_b7ns_1e_640_ext_15ep_best_fold4.pth.

but got the error below:

Traceback (most recent call last):
  File "demo/image_demo.py", line 29, in <module>
    main()
  File "demo/image_demo.py", line 21, in main
    model = init_model(args.config, args.checkpoint, device=args.device)
  File "/home/byronnar/bigfile/projects/mmclassification/mmcls/apis/inference.py", line 35, in init_model
    checkpoint = load_checkpoint(model, checkpoint, map_location=map_loc)
  File "/home/byronnar/anaconda3/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 259, in load_checkpoint
    state_dict = {k[7:]: v for k, v in checkpoint['state_dict'].items()}
KeyError: 'state_dict'

What should I do?
My env:
torch 1.7.0
torchvision 0.8.1
tornado 6.0.2
tqdm 4.27.0
traitlets 4.3.2
transaction 3.0.0
translationstring 1.3

torch 1.5.1 with torchvision 0.6.1 gives the same error.
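The traceback shows that mmcv's load_checkpoint (at this version) indexes checkpoint['state_dict'], so a checkpoint saved as a bare state dict, or one exported by a different codebase (as the file name in the command suggests), raises this KeyError. A minimal sketch of a tolerant lookup, using plain dicts to stand in for a loaded checkpoint:

```python
def extract_state_dict(checkpoint):
    """Return the model weights whether the checkpoint is a bare
    state dict or a wrapper dict with a 'state_dict' key."""
    if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
        return checkpoint['state_dict']
    return checkpoint

# Toy stand-ins for the two checkpoint layouts.
wrapped = {'state_dict': {'conv1.weight': [0.1]}, 'meta': {'epoch': 12}}
bare = {'conv1.weight': [0.1]}

print(sorted(extract_state_dict(wrapped)))  # → ['conv1.weight']
```

Note that even with such a guard, a checkpoint trained in another project would likely have parameter names that do not match the VGG config used here.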

Test on ImageNet

How do I generate val.txt for testing on the ImageNet dataset?
I generated it from ILSVRC2012_validation_ground_truth.txt and subtracted 1 from each class ID, because 1000 is out of the class range.

This is my val.txt: (screenshot)

The top accuracy is very low when using the pretrained model mobilenet_v2_batch256_20200708-3d2dc3af.pth: (screenshot)
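A likely cause (my assumption, not confirmed in this thread): the class IDs in ILSVRC2012_validation_ground_truth.txt follow the devkit's own numbering, while most pretrained models use indices obtained by sorting the 1000 WNIDs alphabetically, so subtracting 1 is not enough; the IDs must be remapped through the WNIDs. A minimal sketch, with toy stand-ins for the devkit mapping:

```python
# Toy stand-ins: in practice these come from the ILSVRC2012 devkit
# (meta.mat) and from sorting the 1000 WNIDs alphabetically.
devkit_id_to_wnid = {1: 'n02119789', 2: 'n02100735', 3: 'n02110185'}
sorted_wnids = sorted(devkit_id_to_wnid.values())
wnid_to_index = {wnid: i for i, wnid in enumerate(sorted_wnids)}

def remap(devkit_ids):
    """Map devkit class IDs (1-based) to sorted-WNID indices (0-based)."""
    return [wnid_to_index[devkit_id_to_wnid[i]] for i in devkit_ids]

print(remap([1, 2, 3]))  # → [2, 0, 1]
```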

Testing MobileNetV2 on ImageNet (val) only gives 0.02 accuracy

Hi, I tested on the ImageNet val set with the commands below, but could not reproduce the official result.
Attempt 1: run tools/test.py in PyCharm with the arguments configs/imagenet/mobilenet_v2_b32x8.py checkpoint/mobilenet_v2_batch256_20200708-3b2dc3af.pth
Attempt 2:
python tools/test.py configs/imagenet/mobilenet_v2_b32x8.py checkpoint/mobilenet_v2_batch256_20200708-3b2dc3af.pth
Attempt 3:
bash ./tools/dist_test.sh configs/imagenet/mobilenet_v2_b32x8.py checkpoint/mobilenet_v2_batch256_20200708-3b2dc3af.pth 1

All three give an accuracy of 0.02, far from the official result.

About TTA

Hello, do the maintainers have any plans to support TTA (test-time augmentation)?

AttributeError: 'NoneType' object has no attribute 'shape'

I got an error when loading dataset.
Traceback (most recent call last):
  File "tools/train.py", line 157, in <module>
    main()
  File "tools/train.py", line 153, in main
    meta=meta)
  File "/home/jovyan/work/mmclassification/mmcls/apis/train.py", line 117, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 27, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
../../data/imagenet/train/Charles_Taylor/4842.jpg  (filename printed by a worker, interleaved with the traceback)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jovyan/work/mmclassification/mmcls/datasets/base_dataset.py", line 46, in __getitem__
    return self.prepare_data(idx)
  File "/home/jovyan/work/mmclassification/mmcls/datasets/base_dataset.py", line 40, in prepare_data
    return self.pipeline(results)
  File "/home/jovyan/work/mmclassification/mmcls/datasets/pipelines/compose.py", line 32, in __call__
    data = t(data)
  File "/home/jovyan/work/mmclassification/mmcls/datasets/pipelines/loading.py", line 56, in __call__
    results['img_shape'] = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'

It seems that I gave the wrong dataset path and the code can't find these images. So I printed each image name in the loading code (loading.py), but I found that the program gets stuck in different places.

for example:
1st:

......(other image name )
../../data/imagenet/train/Laszlo_Kovacs/7027.jpg

(identical traceback to the one quoted above)

2nd:
......(other image name)
../../data/imagenet/train/Colin_Powell/4132.jpg

Traceback (most recent call last):
  File "tools/train.py", line 157, in <module>
    main()
  File "tools/train.py", line 153, in main
    meta=meta)
  File "/home/jovyan/work/mmclassification/mmcls/apis/train.py", line 117, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 27, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
../../data/imagenet/train/Tim_Henman/8106.jpg  (filename printed by a worker, interleaved with the traceback)
    return self._process_data(data)
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/jovyan/.pyenv/versions/3.7.4/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
(The Original Traceback is identical to the one quoted above, ending in AttributeError: 'NoneType' object has no attribute 'shape'.)

I have tried reading a specific picture with mmcv, and it works well:

filename = "../../data/imagenet/train/Steve_Mariucci/248.jpg"
color_type = 'color'
file_client = mmcv.FileClient()
img_bytes = file_client.get(filename)
img = mmcv.imfrombytes(img_bytes, flag=color_type)
img = img.astype(np.float32)
print(img.shape)

I don't know if I did something wrong. Thank you so much for reading my problem!
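One way to narrow this down (a sketch, not from the thread): scan every path in the annotation file up front and report files that are missing or empty, since image readers like mmcv.imfrombytes return None for unreadable data, which later surfaces as exactly this AttributeError. Paths and file names below are illustrative:

```python
import os
import tempfile

def find_bad_images(ann_file_lines, data_prefix):
    """Return paths from the annotation list that are missing or empty on disk."""
    bad = []
    for line in ann_file_lines:
        filename = line.strip().split(' ')[0]
        path = os.path.join(data_prefix, filename)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(path)
    return bad

# Tiny demo with a temporary directory standing in for the dataset root.
root = tempfile.mkdtemp()
with open(os.path.join(root, 'ok.jpg'), 'wb') as f:
    f.write(b'\xff\xd8fake-jpeg-bytes')
open(os.path.join(root, 'empty.jpg'), 'wb').close()

lines = ['ok.jpg 0', 'empty.jpg 1', 'missing.jpg 2']
print(find_bad_images(lines, root))
```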

Roadmap of MMClassification

We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.

You can either:

  1. Suggest a new feature by leaving a comment.
  2. Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to all feature requests, so vote for the one you want most!)
  3. Tell us that you would like to help implement one of the features in the list or review the PRs. (This is the greatest thing to hear!)

How to enable mixed precision training

Hi @yl-1993 , thanks for the work. Could you show a template for FP16 training?
I added fp16=dict(loss_scale=512.) to the config file, for example:

_base_ = [
    '../_base_/models/resnet50.py', '../_base_/datasets/imagenet_bs32.py',
    '../_base_/schedules/imagenet_bs256.py', '../_base_/default_runtime.py'
]
fp16=dict(loss_scale=512.)

but got the following error:

Traceback (most recent call last):
  File "tools/train.py", line 157, in <module>
    main()
  File "tools/train.py", line 153, in main
    meta=meta)
  File "/mnt/lustre/liduo/mmclassification/mmcls/apis/train.py", line 117, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/mnt/lustre/liduo/.local.pt1.4.0v2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/mnt/lustre/liduo/.local.pt1.4.0v2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in train
    **kwargs)
  File "/mnt/lustre/liduo/.local.pt1.4.0v2/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 36, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/mnt/lustre/liduo/mmclassification/mmcls/models/classifiers/base.py", line 136, in train_step
    losses = self(**data)
  File "/mnt/lustre/share/platform/env/miniconda3.7/envs/pt1.4.0v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/liduo/mmclassification/mmcls/models/classifiers/base.py", line 79, in forward
    return self.forward_train(img, **kwargs)
  File "/mnt/lustre/liduo/mmclassification/mmcls/models/classifiers/image.py", line 55, in forward_train
    x = self.extract_feat(img)
  File "/mnt/lustre/liduo/mmclassification/mmcls/models/classifiers/image.py", line 37, in extract_feat
    x = self.backbone(img)
  File "/mnt/lustre/share/platform/env/miniconda3.7/envs/pt1.4.0v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/liduo/mmclassification/mmcls/models/backbones/resnet.py", line 610, in forward
    x = self.conv1(x)
  File "/mnt/lustre/share/platform/env/miniconda3.7/envs/pt1.4.0v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/share/platform/env/miniconda3.7/envs/pt1.4.0v2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward                                                                                                                  
    return self.conv2d_forward(input, self.weight)
  File "/mnt/lustre/share/platform/env/miniconda3.7/envs/pt1.4.0v2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward                                                                                                           
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same                                                    

KeyError: 'LinearHead is not in the head registry'

Using this config:

model = dict(
    head=dict(
        type='LinearHead',
        num_classes=1000,
        in_channels=2048,
        loss=dict(
            type='LabelSmoothLoss',
            loss_weight=1.0,
            label_smooth_val=0.1,
            num_classes=1000),
    ))

I got this traceback:

Traceback (most recent call last):
  File "/home/code/open_mmlab_codebase/huatian_bump_blur_cls/tools/train.py", line 177, in <module>
    main()
  File "/home/code/open_mmlab_codebase/huatian_bump_blur_cls/tools/train.py", line 151, in main
    model = build_classifier(cfg.model)
  File "/home/code/open_mmlab_codebase/mmclassification/mmcls/models/builder.py", line 38, in build_classifier
    return build(cfg, CLASSIFIERS)
  File "/home/code/open_mmlab_codebase/mmclassification/mmcls/models/builder.py", line 18, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/code/open_mmlab_codebase/mmclassification/mmcls/models/classifiers/image.py", line 18, in __init__
    self.head = build_head(head)
  File "/home/code/open_mmlab_codebase/mmclassification/mmcls/models/builder.py", line 26, in build_head
    return build(cfg, HEADS)
  File "/home/code/open_mmlab_codebase/mmclassification/mmcls/models/builder.py", line 18, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 164, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
KeyError: 'LinearHead is not in the head registry'

I checked mmcls/models/heads/*.py; no LinearHead is registered there.
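For reference, a guess at the intended head (not confirmed in this thread): mmcls registers `LinearClsHead` rather than `LinearHead`, so the config fragment would likely need to be:

```python
# Config fragment; assumes LinearClsHead is the head the author wanted.
model = dict(
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=2048,
        loss=dict(
            type='LabelSmoothLoss',
            loss_weight=1.0,
            label_smooth_val=0.1,
            num_classes=1000),
    ))
```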

Type Error

Hello, I want to know what causes this error? (screenshot)

Add a new loss

I want to add a new loss, but got this problem:
KeyError: 'XXLoss is not in the loss registry'
Could you tell me how to add a new loss?
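A common cause of "X is not in the registry" errors in mmcv-based code (an assumption about this particular case): the module defining the new class is never imported, so its @LOSSES.register_module() decorator never runs. A self-contained sketch of the mechanism, using a toy registry in place of mmcv.utils.Registry:

```python
class Registry:
    """Toy stand-in for mmcv.utils.Registry."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self):
        def _register(cls):
            self._module_dict[cls.__name__] = cls
            return cls
        return _register

    def get(self, key):
        if key not in self._module_dict:
            raise KeyError(f'{key} is not in the {self.name} registry')
        return self._module_dict[key]

LOSSES = Registry('loss')

# Until this class definition executes (i.e. its module is imported,
# e.g. from mmcls/models/losses/__init__.py), the registry has no entry.
@LOSSES.register_module()
class XXLoss:
    pass

print(LOSSES.get('XXLoss'))
```

So besides the decorator, the new loss file must be imported somewhere on the package's import path, typically by adding it to the losses package's `__init__.py`.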

LMDBDataset is not in the dataset registry

Hello, I implemented an LMDB data loading class on top of mmclassification v0.6.0, but when I create an LMDBDataset instance, the code raises:

  • KeyError: 'LMDBDataset is not in the dataset registry'

How do I register a dataset class?

My LMDBDataset class already carries the @DATASETS.register_module() decorator.

import mmcv
import lmdb
import numpy as np

from .builder import DATASETS
from .base_dataset import BaseDataset


@DATASETS.register_module()
class LMDBDataset(BaseDataset):

    def read_txt(self):
        data_infos = []
        with open(self.ann_file) as f:
            samples = [x.strip().split(' ') for x in f.readlines()]
            for filename, gt_label in samples:
                info = {'img_prefix': self.data_prefix}
                info['img_info'] = {'filename': filename}
                info['gt_label'] = np.array(gt_label, dtype=np.int64)
                data_infos.append(info)
            return data_infos

    def read_lmdb(self):
        data_infos = []
        env = lmdb.open(self.ann_file)
        txn = env.begin()
        class_num = int(txn.get('class_num'.encode()).decode())
        img_idx = 0
        for item_class in range(class_num):
            img_num = int(txn.get(f'class#{item_class}'.encode()).decode())
            for item_img in range(img_num):
                img_key = '###'.join(map(str, [img_idx, item_class])).encode()
                data_infos.append({'img_info': {'img_key': img_key},
                                   'gt_label': np.array(item_class, dtype=np.int64)})
                img_idx += 1  # advance the global image index
        return data_infos

    def load_annotations(self):
        assert isinstance(self.ann_file, str)

        if self.ann_file.endswith('lmdb'):
            return self.read_lmdb()
        else:
            return self.read_txt()

Thanks for any help.
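If the decorator is in place, the usual remaining cause is that the module is simply never imported, so the decorator never executes. In mmcls v0.6.x the dataset modules are pulled in by mmcls/datasets/__init__.py, so the new file (its name here is my assumption) needs an entry there; a fragment:

```python
# mmcls/datasets/__init__.py (fragment; 'lmdb_dataset' is an assumed file name)
from .lmdb_dataset import LMDBDataset
```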

Set different learning rate for different layer

How to set different learning rates for different layers?
Such as:

torch.optim.SGD([{'params': model.backbone.parameters(), 'lr': learning_rate * 0.1},
                 {'params': model.clshead.parameters(), 'lr': learning_rate}],
                momentum=0.9, weight_decay=1e-4)
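In mmcv-based configs, per-layer learning rates are usually expressed through the optimizer's paramwise_cfg (supported by recent mmcv optimizer constructors) rather than by building the optimizer by hand; a sketch, where the 'backbone' key is an assumption about the module names in your model:

```python
# Config fragment: backbone at 0.1x the base lr, the rest (e.g. the head) at 1x.
optimizer = dict(
    type='SGD', lr=0.1, momentum=0.9, weight_decay=1e-4,
    paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)}))
```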

An error in tutorials/new_dataset.md

In this markdown, it says:
"We can create a new dataset in mmdet/datasets/filelist.py to load the data."
But this project does not have mmdet; I think it should be mmcls/datasets.

Please update this markdown.
Thanks!

Why mmcls collects gt_label for test_pipeline

I found some inconsistencies between mmcls, mmdet, and mmseg.
One major difference is that mmcls requires the validation dataset to collect gt_label;
see https://github.com/open-mmlab/mmclassification/blob/master/configs/_base_/datasets/imagenet_bs32.py, the test pipeline:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=(256, -1)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]

When performing validation, the forward step computes the accuracy, and the dataset only gathers the per-batch accuracies to compute the final accuracy; see https://github.com/open-mmlab/mmclassification/blob/master/mmcls/datasets/base_dataset.py#L48

mmdet and mmseg compute metrics differently: in the validation step only the network outputs are collected, and accuracy and other metrics are computed in a standalone validation function.

mmcls's test pipeline collecting gt_label also makes inference a little trickier than in mmdet and mmseg.
I wrote inference_model in mmcls/apis/inference.py like this; notice how I modified the test_pipeline:

import torch
from mmcv import image
from mmcv.parallel import collate, scatter

from mmcls.datasets.pipelines import Collect, Compose, ImageToTensor


class LoadImage:
    """A simple pipeline to load image."""

    def __call__(self, results):
        """Call function to load images into results.

        Args:
            results (dict): A result dict contains the file name
                of the image to be read.

        Returns:
            dict: ``results`` will be returned containing loaded image.
        """

        if isinstance(results['img'], str):
            results['filename'] = results['img']
            results['ori_filename'] = results['img']
        else:
            results['filename'] = None
            results['ori_filename'] = None
        img = image.imread(results['img'])
        results['img'] = img
        results['img_shape'] = img.shape
        results['ori_shape'] = img.shape
        return results


def inference_model(model, img):
    """Inference image(s) with the classifier.

        Args:
            model (nn.Module): The loaded classifier.
            imgs (str/ndarray or list[str/ndarray]): Either image files or loaded
                images.

        Returns:
            (list[Tensor]): The classification result.
        """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device
    # build the data pipeline
    test_pipeline = [LoadImage()] + cfg.data.test.pipeline[1:-3] +\
        [ImageToTensor(keys=['img']), Collect(keys=['img'])]
    test_pipeline = Compose(test_pipeline)
    # prepare data
    data = dict(img=img)
    data = test_pipeline(data)
    data = collate([data], samples_per_gpu=1)
    if next(model.parameters()).is_cuda:
        # scatter to specified GPU
        data = scatter(data, [device])[0]
    else:
        if 'img_metas' in data:
            data['img_metas'] = data['img_metas'][0].data

    # forward the model
    with torch.no_grad():
        result = model(return_loss=False, **data)
    return result

difference of the backbones between mmclassification and mmdetection

Hello!
I'm trying to merge mmdetection and mmclassification. The backbones in the two codebases are very similar, so I wanted to use just one of them, and I decided on the mmdetection backbone. I built a classifier from an mmdet backbone (for example, ResNet), a GAP neck, and a LinearClsHead (the latter two from mmclassification). But the loss does not go down during training: no matter what the input samples are, for example [0,0,0,1,1,0,0,0,0] or [1,1,0,0,1,0,0,1,1], the model's output after softmax and argmax is always the same number. When I replace the backbone with the mmclassification one, it works normally. I compared the files carefully but couldn't find the difference. So, where is the difference between the mmdet and mmcls backbones? Thank you!
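A likely explanation (my assumption; not confirmed in this thread): mmdet's ResNet is configured for fine-tuning detectors, not for training a classifier from scratch; typical detection configs freeze the first stage (frozen_stages=1) and keep BatchNorm in eval mode (norm_eval=True), which can make a from-scratch classifier's loss plateau. A config fragment overriding those settings:

```python
# Config fragment for using mmdet's ResNet as a classification backbone.
backbone = dict(
    type='ResNet',
    depth=50,
    frozen_stages=-1,  # freeze nothing
    norm_eval=False)   # let BatchNorm update its running statistics
```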

How to plot ROC curve?

Thank you for your amazing job!
I have a little question about mmcls:
how can I plot an ROC curve and get recall?
Thank you!
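To my knowledge, mmcls does not ship an ROC utility at these versions, but the curve can be computed directly from per-class scores (scikit-learn's roc_curve does the same job). A self-contained sketch for the binary case:

```python
def roc_points(scores, labels):
    """Compute (FPR, TPR) points by sweeping the decision threshold
    over the sorted scores. Assumes at least one positive and one
    negative label; ties in scores are handled naively."""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 1, 0, 1]
print(roc_points(scores, labels))
```

Plotting the returned (FPR, TPR) pairs with any plotting library gives the ROC curve; recall is the TPR at the chosen operating threshold.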

Customize dataset :KeyError: Caught KeyError in DataLoader worker process 0.

Hi, I ran train.py with my customized dataset following Tutorial 2: Adding New Dataset, and I got an error:

2020-11-26 16:41:35,990 - mmcls - INFO - Start running, host: hkuit164@hkuit164-desktop, work_dir: /media/hkuit164/TOSHIBA/mmclassification/tools/work_dirs/restnet18
2020-11-26 16:41:35,990 - mmcls - INFO - workflow: [('train', 1)], max: 200 epochs
Traceback (most recent call last):
  File "/media/hkuit164/TOSHIBA/mmclassification/tools/train.py", line 157, in <module>
    main()
  File "/media/hkuit164/TOSHIBA/mmclassification/tools/train.py", line 153, in main
    meta=meta)
  File "/media/hkuit164/TOSHIBA/mmclassification/mmcls/apis/train.py", line 133, in train_model
    runner.run(data_loaders, cfg.workflow)
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hkuit164/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/hkuit164/TOSHIBA/mmclassification/mmcls/datasets/base_dataset.py", line 86, in __getitem__
    return self.prepare_data(idx)
  File "/media/hkuit164/TOSHIBA/mmclassification/mmcls/datasets/base_dataset.py", line 80, in prepare_data
    return self.pipeline(results)
  File "/media/hkuit164/TOSHIBA/mmclassification/mmcls/datasets/pipelines/compose.py", line 32, in __call__
    data = t(data)
  File "/media/hkuit164/TOSHIBA/mmclassification/mmcls/datasets/pipelines/transforms.py", line 98, in __call__
    img = results[key]
KeyError: 'img'

Process finished with exit code 1

I just added a filelist.py as mmcls/datasets/filelist.py and changed dataset_type to the name of my new class. Here is my filelist.py:

import mmcv
import numpy as np

from .builder import DATASETS
from .base_dataset import BaseDataset


@DATASETS.register_module()
class MyDataset(BaseDataset):

    def load_annotations(self):
        assert isinstance(self.ann_file, str)

        data_infos = []
        with open(self.ann_file) as f:
            samples = [x.strip().split(' ') for x in f.readlines()]
            for filename, gt_label in samples:
                info = {'img_prefix': self.data_prefix}
                info['img_info'] = {'filename': filename}
                info['gt_label'] = np.array(gt_label, dtype=np.int64)
                data_infos.append(info)
            return data_infos
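A guess at the cause (not confirmed here): KeyError: 'img' in a transform means it ran before any image was loaded, i.e. the configured pipeline lacks a LoadImageFromFile step in front. A pipeline fragment for comparison (the transforms after the first are illustrative):

```python
train_pipeline = [
    dict(type='LoadImageFromFile'),  # must come first: it fills results['img']
    dict(type='RandomResizedCrop', size=224),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
```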

bug report

The attribute meta_keys is lost in Collect.

@PIPELINES.register_module()
class Collect(object):
    """
    Collect data from the loader relevant to the specific task.
    This is usually the last stage of the data loader pipeline. Typically keys
    is set to some subset of "img" and "gt_label".
    """

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        data = {}
        for key in self.keys:
            data[key] = results[key]
        return data

    def __repr__(self):
        return self.__class__.__name__ + \
            f'(keys={self.keys}, meta_keys={self.meta_keys})'
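A minimal fix (a sketch of the obvious repair, not the maintainers' actual patch): accept meta_keys in __init__ so __repr__ no longer touches a missing attribute. Note the real mmcls Collect also gathers image metas; this sketch only demonstrates the attribute fix:

```python
class Collect:
    """Collect data from the loader relevant to the specific task."""

    def __init__(self, keys, meta_keys=()):
        self.keys = keys
        self.meta_keys = meta_keys  # previously never set, breaking __repr__

    def __call__(self, results):
        return {key: results[key] for key in self.keys}

    def __repr__(self):
        return (self.__class__.__name__ +
                f'(keys={self.keys}, meta_keys={self.meta_keys})')

c = Collect(keys=['img', 'gt_label'])
print(repr(c))  # → Collect(keys=['img', 'gt_label'], meta_keys=())
```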
