cheerss / crossformer

The official code for the paper: https://openreview.net/forum?id=_PHymLIxuI

License: MIT License

Python 99.48% Shell 0.52%
classification deep-learning instance-segmentation object-detection pytorch semantic-segmentation vision-transformer

crossformer's People

Contributors: cheerss, clarissayl, cwdghh

crossformer's Issues

Questions about the design of LSDA and CEL

Hello, thank you very much for your work. I have a few small questions:

  1. It seems that LDA and SDA are used alternately. Would adjusting the ratio or order of S and L affect the results, e.g., SLS or SSLLL within one stage?
  2. It looks like G is always 7. In the pyramid structure, how would the results change if G were gradually reduced to 7, 5, 3, 1 (vanilla attention)?
  3. Placing a CEL at the very beginning can be understood as extracting multi-scale information. However, as the network deepens, the spatial meaning of the H*W dimension becomes increasingly diluted, so "multi-scale information" extracted there can hardly be interpreted as multi-scale spatial information anymore. Would removing the later [2, 4] CELs have a significant impact?
  4. Setting aside code cleanliness, isn't kernel=32 too large? Replacing it with a stack of kernel=3 convolutions should not hurt, right? (See the sketch below.)
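To make question 4 concrete, here is a toy sketch (my own code, not the repo's) of a cross-scale embedding layer in the spirit of the paper: parallel convolutions with different kernel sizes but a shared stride, concatenated channel-wise. The channel split per kernel is illustrative, not the repo's exact configuration.

import torch
import torch.nn as nn

class ToyCEL(nn.Module):
    """Toy cross-scale embedding layer: parallel convs with different
    kernel sizes but the same stride; outputs are concatenated along
    the channel dimension. Channel splits here are illustrative."""
    def __init__(self, in_ch=3, dims=(48, 24, 12, 12),
                 kernels=(4, 8, 16, 32), stride=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, d, k, stride=stride, padding=(k - stride) // 2)
            for d, k in zip(dims, kernels)
        )

    def forward(self, x):
        # Every branch produces the same spatial size, so concat works.
        return torch.cat([conv(x) for conv in self.convs], dim=1)

x = torch.randn(1, 3, 224, 224)
print(ToyCEL()(x).shape)  # torch.Size([1, 96, 56, 56])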

Thank you very much.

Evaluation question

I would like to know whether the accuracy values printed after each epoch during training are the results obtained by testing on the validation set.

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 1.0 and 2.0 branches for each repo:

Repo             | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch
MMEngine         | -                    | 0.x
MMCV             | 1.x                  | 2.x
MMDetection      | 0.x, 1.x, 2.x        | 3.x
MMAction2        | 0.x                  | 1.x
MMClassification | 0.x                  | 1.x
MMSegmentation   | 0.x                  | 1.x
MMDetection3D    | 0.x                  | 1.x
MMEditing        | 0.x                  | 1.x
MMPose           | 0.x                  | 1.x
MMDeploy         | 0.x                  | 1.x
MMTracking       | 0.x                  | 1.x
MMOCR            | 0.x                  | 1.x
MMRazor          | 0.x                  | 1.x
MMSelfSup        | 0.x                  | 1.x
MMRotate         | 1.x                  | 1.x
MMYOLO           | -                    | 0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Question about the pseudo code of the LSDA.

Thanks for your excellent work!

I have a question about the pseudo code of LSDA, which is implemented in only a few lines using nothing but reshape and permute operations:

if type == "SDA":
    x = x.reshape(H // G, G, W // G, G, D).permute(0, 2, 1, 3, 4)
elif type == "LDA":
    x = x.reshape(G, H // G, G, W // G, D).permute(1, 3, 0, 2, 4)

Although the two branches clearly differ in how they reshape, I still wonder about the reason for this particular design. Can you explain, from another perspective, why these two reshape patterns correspond to the two different attention implementations (long vs. short distance)?
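To make the question concrete, here is a toy sketch (my own, not from the repo) that labels each token of a 4x4 map with its index and prints the resulting groups. It shows that the SDA reshape gathers adjacent G x G windows, while the LDA reshape gathers tokens spaced a fixed interval I = H // G apart:

import torch

H = W = 4   # toy feature-map size
G = 2       # group size
D = 1       # embedding dim, kept at 1 so token ids stay readable

# Give every spatial position a unique id so the grouping is visible.
x = torch.arange(H * W, dtype=torch.float).reshape(H, W, D)

# SDA: each group is an adjacent G x G window.
sda = x.reshape(H // G, G, W // G, G, D).permute(0, 2, 1, 3, 4)
print(sda.reshape(-1, G * G))  # rows: [0,1,4,5], [2,3,6,7], ...

# LDA: each group samples tokens spaced I = H // G apart.
lda = x.reshape(G, H // G, G, W // G, D).permute(1, 3, 0, 2, 4)
print(lda.reshape(-1, G * G))  # rows: [0,2,8,10], [1,3,9,11], ...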

Thanks a lot!

Semantic FPN CrossFormer-S: the weight file seems not to have been uploaded completely

    main()
  File "./test.py", line 127, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 522, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 466, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 243, in load_checkpoint
    return checkpoint_loader(filename, map_location)
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 260, in load_from_local
    checkpoint = torch.load(filename, map_location=map_location)
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 833, in load_tensor
    storage = zip_file.get_storage_from_record(name, size, dtype).storage()
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading file data/2154620144: invalid header or archive is corrupted

question about LDA and SDA for irregular feature maps

Thank you very much for your careful reply about LDA and SDA, but I have another question about LDA and SDA for irregular feature maps.

In the paper, LDA and SDA are applied to regular input sizes such as 224x224 or 384x384, so the group size defaults to 7 and the interval I is set to (8, 4, 2, 1). For Stage 1, I = 8, because G x I must equal the feature-map width/height (56 = 7 x 8).

However, for an irregular feature-map size such as 80 x 134, the group size and interval can no longer be chosen as described in the paper. If the group size is 7, the feature map must be padded to 84x140 before it can be reshaped to [W_nG, G, H_nG, G]; SDA can then run normally (see the padding sketch below). But for the following LDA, how should the interval I be set? Moreover, since the feature map is not square, the interval differs between width and height. How can I choose the interval I reasonably?
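For reference, a minimal sketch of the padding step described above (my own code, assuming right/bottom zero padding):

import torch
import torch.nn.functional as F

def pad_to_multiple(x: torch.Tensor, G: int) -> torch.Tensor:
    """Zero-pad a (B, C, H, W) map on the right/bottom so that
    H and W become multiples of the group size G."""
    _, _, H, W = x.shape
    pad_h = (G - H % G) % G
    pad_w = (G - W % G) % G
    # F.pad pads the last two dims in (left, right, top, bottom) order.
    return F.pad(x, (0, pad_w, 0, pad_h))

x = torch.randn(1, 96, 80, 134)
print(pad_to_multiple(x, G=7).shape)  # torch.Size([1, 96, 84, 140])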

Can you give me some advice on this question? Thanks!

Some questions about your last paper

Dear author, I recently read your article 'Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework' and am particularly interested in it. Do you have a plan to open-source its code? Thank you for your answer.

Something wrong with the pre-trained model crossformer-b.pth

Hi, thanks for your great work. I am using crossformer_base as the backbone network for a downstream tracking task. But when I load your pre-trained model, load_state_dict reports a large number of unexpected key(s). My loading code is as follows:
ckpt = torch.load(ckpt_path, map_location='cpu')
missing_keys, unexpected_keys = backbone.body.load_state_dict(ckpt['model'], strict=False)

The result is as follows:
unexpected keys: ['norm.weight', 'norm.bias', 'head.weight', 'head.bias', 'layers.0.blocks.0.attn.biases', 'layers.0.blocks.0.attn.relative_position_index', 'layers.0.blocks.1.attn.biases', .....
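One possible workaround (my own sketch, not an official fix): the 'norm.*'/'head.*' entries come from the classification head, and the 'attn.biases'/'relative_position_index' entries are position-bias buffers, so keeping only the keys the target backbone actually defines silences the messages. Whether the remaining weights line up with your module still needs checking. Reusing ckpt_path and backbone from the snippet above:

import torch

ckpt = torch.load(ckpt_path, map_location='cpu')
wanted = backbone.body.state_dict().keys()
# Keep only parameters/buffers that the target module actually defines.
filtered = {k: v for k, v in ckpt['model'].items() if k in wanted}
missing_keys, unexpected_keys = backbone.body.load_state_dict(filtered, strict=False)
print(f"dropped {len(ckpt['model']) - len(filtered)} unexpected entries")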

how to train segmentation on Windows 10

Dear author, I have a question: how do I train segmentation on Windows 10?
I used "python train.py configs/fpn_crossformer_s_ade20k_40k.py --cfg-options pretrained/backbone-corssformer-s.pth --work-dir output --launcher pytorch" but got the following error:

Traceback (most recent call last):
  File "train.py", line 152, in <module>
    main()
  File "train.py", line 65, in main
    args = parse_args()
  File "train.py", line 57, in parse_args
    args = parser.parse_args()
  File "C:\Python37\lib\argparse.py", line 1755, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "C:\Python37\lib\argparse.py", line 1787, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "C:\Python37\lib\argparse.py", line 1993, in _parse_known_args
    start_index = consume_optional(start_index)
  File "C:\Python37\lib\argparse.py", line 1933, in consume_optional
    take_action(action, args, option_string)
  File "C:\Python37\lib\argparse.py", line 1861, in take_action
    action(self, namespace, argument_values, option_string)
  File "C:\Python37\lib\site-packages\mmcv\utils\config.py", line 739, in __call__
    key, val = kv.split('=', maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)
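Judging from the traceback, --cfg-options expects KEY=VALUE pairs (mmcv's config.py splits every item on '='), while the command above passes a bare checkpoint path. Something along the lines of --cfg-options model.pretrained=pretrained/backbone-corssformer-s.pth (the exact config key is a guess here, not verified against the repo's config) should at least get past this particular error.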

I also tried to use your shell script (dist_train.sh) directly, but got another error:

$ /bin/sh E:/project_c/crossformer-debug/segmentation/dist_train.sh
NOTE: Redirects are currently not supported in Windows or MacOs.
C:\Python37\lib\site-packages\torch\distributed\launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 564, in determine_local_world_size
    return int(nproc_per_node)
ValueError: invalid literal for int() with base 10: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Python37\lib\site-packages\torch\distributed\launch.py", line 193, in <module>
    main()
  File "C:\Python37\lib\site-packages\torch\distributed\launch.py", line 189, in main
    launch(args)
  File "C:\Python37\lib\site-packages\torch\distributed\launch.py", line 174, in launch
    run(args)
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 709, in run
    config, cmd, cmd_args = config_from_args(args)
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 617, in config_from_args
    nproc_per_node = determine_local_world_size(args.nproc_per_node)
  File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 582, in determine_local_world_size
    raise ValueError(f"Unsupported nproc_per_node value: {nproc_per_node}")
ValueError: Unsupported nproc_per_node value:
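For what it's worth, the empty nproc_per_node value suggests the launcher's GPU-count variable expanded to an empty string, i.e., dist_train.sh was probably invoked without the arguments it expects (typically the config path and the number of GPUs). This is a guess from the traceback, not a verified fix.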

So could you give me some suggestions for a solution?
Thanks.

failed reading file data

Thank you for your contribution to science. I encountered the following issues during the reproduction process:
Traceback (most recent call last):
  File "/tmp/pycharm_project_864/tools/train.py", line 194, in <module>
    main()
  File "/tmp/pycharm_project_864/tools/train.py", line 183, in main
    train_detector(
  File "/tmp/pycharm_project_864/mmdet/apis/train.py", line 185, in train_detector
    runner.load_checkpoint(cfg.load_from)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 349, in load_checkpoint
    return load_checkpoint(
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 627, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 561, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 303, in load_checkpoint
    return checkpoint_loader(filename, map_location)  # type: ignore
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 323, in load_from_local
    checkpoint = torch.load(filename, map_location=map_location)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 1112, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage)._typed_storage()._untyped_storage
RuntimeError: PytorchStreamReader failed reading file data/2237523104: invalid header or archive is corrupted
May I ask whether the uploaded detection weight file is corrupted?
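A quick way to tell whether the local download is truncated, rather than the upload being broken (my own suggestion): checkpoints saved by modern torch.save are zip archives, so a plain zip integrity check works:

import zipfile

# testzip() returns the first corrupted member, or None if intact.
# A truncated download typically raises zipfile.BadZipFile here instead.
with zipfile.ZipFile("path/to/checkpoint.pth") as zf:
    print(zf.testzip())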

Validation accuracy stays at 0.09% during training

Dear authors,

I'm interested in your paper and am training from scratch on ImageNet. However, the validation accuracy stays at "* Acc@1 0.090" throughout training, which is roughly chance level for 1,000 classes.

Do you have any idea why this happens? When I train Swin Transformer with the same setup, it works.

I use PyTorch 1.7.1 and 1.6.0, no mixed precision, 100 epochs:

--amp-opt-level O0 --output ./output --opts TRAIN.EPOCHS 100

Thanks,
Eddie

CrossFormer for small object detection

Hello, thank you for your work. I used CrossFormer for small object detection on my own dataset, and the results were very poor. Is there any way to improve the accuracy of small object detection? Thanks very much!

detection test question

I followed the detection configuration steps; why are the detection results poor after training?

Some questions about your paper and code

Hi, I'm very interested in your work on multi-scale attention in Transformers, but I have some questions:

  1. In Appendix 2 (DPB), why do the parameters i and j range from 0 to 2G-1 instead of 0 to G-1? Besides, the inputs of the DPB module are (1-G+i, 1-G+j). What is the reason for this setting? Why not just use i and j as inputs? (See the sketch at the end of this issue.)

  2. When debugging your code, I changed some parameters because I have only one 3090 with 24 GB of memory, like this:

parser = argparse.ArgumentParser('CrossFormer training and evaluation script', add_help=False)
parser.add_argument('--cfg', type=str, required=True, metavar="FILE",
                    default='/configs/small_patch4_group7_224.yaml', help='path to config file')
parser.add_argument(
    "--opts",
    help="Modify config options by adding 'KEY VALUE' pairs. ",
    default=None,
    nargs='+'
)
# easy config modification
parser.add_argument('--batch-size', type=int, default=32, help="batch size for single GPU")
parser.add_argument('--data-set', type=str, default='flower', help='dataset to use')
parser.add_argument('--data-path', type=str, help='path to dataset', default='/media/data2/huzhen/flower_data')
parser.add_argument('--zip', action='store_true', help='use zipped dataset instead of folder dataset')
parser.add_argument('--cache-mode', type=str, default='part', choices=['no', 'full', 'part'],
                    help='no: no cache, '
                         'full: cache all data, '
                         'part: sharding the dataset into nonoverlapping pieces and only cache one piece')
parser.add_argument('--resume', help='resume from checkpoint', default='')
parser.add_argument('--accumulation-steps', type=int, help="gradient accumulation steps")
parser.add_argument('--use-checkpoint', action='store_true',
                    help="whether to use gradient checkpointing to save memory")
parser.add_argument('--amp-opt-level', type=str, default='native', choices=['native', 'O0', 'O1', 'O2'],
                    help='mixed precision opt level, if O0, no amp is used')
parser.add_argument('--output', default='./Flower_weights', type=str, metavar='PATH',
                    help='root of output folder, the full path is <output>/<model_name>/<tag> (default: output)')
parser.add_argument('--tag', help='tag of experiment')
parser.add_argument('--eval', action='store_true', help='Perform evaluation only')
parser.add_argument('--throughput', action='store_true', help='Test throughput only')
parser.add_argument('--num_workers', type=int, default=8, help="")
parser.add_argument('--mlp_ratio', type=int, default=4, help="")
parser.add_argument('--warmup_epochs', type=int, default=20, help="#epoches for warm up")
parser.add_argument("--local_rank", type=int, required=True, default=0, help='local rank for DistributedDataParallel')
parser.add_argument('--device', default='cuda:2', help='device to use for training / testing')

args, unparsed = parser.parse_known_args()

But it reports an error: "An exception occurred: SystemExit 2".
The above is my parameter setting. Is there a problem?
I sincerely hope to receive your help!
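A hedged note on the SystemExit (my reading of the snippet, not an authors' reply): argparse exits with status 2 whenever a required argument is missing or malformed, and the snippet above marks both --cfg and --local_rank as required=True; running the script directly, rather than through torch.distributed.launch (which supplies --local_rank), would therefore exit exactly this way.

On question 1, a possible intuition, assuming the standard relative-position-bias construction: the bias is indexed by the relative offset between two positions within a group, and for positions in [0, G) the offset i - j spans 2G-1 distinct values, from -(G-1) to G-1; shifting a table index by 1-G maps it onto that signed range. A tiny check:

G = 7
offsets = sorted({i - j for i in range(G) for j in range(G)})
print(offsets)       # [-6, -5, ..., 6]: the values 1-G ... G-1
print(len(offsets))  # 2*G - 1 = 13 distinct relative offsets per axis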

Does CrossFormer require a fixed input size?

Hi there and thanks for the nice work.
I'm currently trying to use CrossFormer_B as my backbone in detection/instance_segmentation.
I've noticed that we need to define the img_size in the backbone configs. However, defining that can be limiting in the sense that we usually use cropping augmentations during training, or multi-scale inference at test time. Is there any way to keep these methods working with the current implementation?

I'll copy the related part of my config file down here:

model = dict(
    type='CascadeRCNN',
    pretrained=None,
    backbone=dict(
        type='CrossFormer',
        img_size=[3840, 1920],
        patch_size=[4, 8, 16, 32],
        in_chans=3,
        num_classes=7,
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        group_size=[7, 7, 7, 7],
        crs_interval=[8, 4, 2, 1],
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        drop_path_rate=0.3,
        patch_norm=True,
        use_checkpoint=False,
        merge_size=[[2, 4], [2, 4], [2, 4]]),

...
...

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize',
        img_scale=[(3840, 1080), (3840, 1560)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.0),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]

...
...

Thanks,
