cheerss / crossformer
The official code for the paper: https://openreview.net/forum?id=_PHymLIxuI
License: MIT License
Hello, and thank you very much for your work. I have a few small questions:
Thank you very much.
I would like to know whether the accuracy values printed after each epoch during training are measured on the validation set.
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the OpenMMLab 2.0 repo branches:
Repo | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
---|---|---|
MMEngine | | 0.x |
MMCV | 1.x | 2.x |
MMDetection | 0.x, 1.x, 2.x | 3.x |
MMAction2 | 0.x | 1.x |
MMClassification | 0.x | 1.x |
MMSegmentation | 0.x | 1.x |
MMDetection3D | 0.x | 1.x |
MMEditing | 0.x | 1.x |
MMPose | 0.x | 1.x |
MMDeploy | 0.x | 1.x |
MMTracking | 0.x | 1.x |
MMOCR | 0.x | 1.x |
MMRazor | 0.x | 1.x |
MMSelfSup | 0.x | 1.x |
MMRotate | 1.x | 1.x |
MMYOLO | | 0.x |
Attention: please create a new virtual environment for OpenMMLab 2.0.
Thanks for your excellent work!
I have a question about the pseudo code of LSDA, which is implemented in only ten lines of code using only reshape and permute operations:
# x: feature map of shape (H, W, D); G: group size
if type == "SDA":
    x = x.reshape(H // G, G, W // G, G, D).permute(0, 2, 1, 3, 4)
elif type == "LDA":
    x = x.reshape(G, H // G, G, W // G, D).permute(1, 3, 0, 2, 4)
The two branches clearly differ in how they reshape, but I still do not understand the reason for this particular design. Can you explain from another perspective why these two reshapes correspond to the two different attention types (long-distance and short-distance)?
Thanks a lot!
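One way to see the difference is to list which positions of the H x W map end up in the same attention group. The sketch below is not the authors' code: it rewrites the two reshape/permute lines as the equivalent index arithmetic in plain Python, assumes H and W are divisible by G, and uses a made-up helper name, group_positions.

```python
def group_positions(H, W, G, kind):
    """Map each group id to the (h, w) positions that attend to each other.

    SDA: reshape(H//G, G, W//G, G) puts row h at (h // G, h % G), and the
         permute moves (h // G, w // G) to the front as the group id.
    LDA: reshape(G, H//G, G, W//G) puts row h at (h // I, h % I) with
         I = H // G, and the permute moves (h % I, w % I) to the front.
    """
    groups = {}
    for h in range(H):
        for w in range(W):
            if kind == "SDA":
                key = (h // G, w // G)              # adjacent G x G block
            elif kind == "LDA":
                key = (h % (H // G), w % (W // G))  # fixed-interval sampling
            else:
                raise ValueError(f"unknown kind: {kind}")
            groups.setdefault(key, []).append((h, w))
    return groups


sda = group_positions(4, 4, 2, "SDA")
lda = group_positions(4, 4, 2, "LDA")
print(sda[(0, 0)])
print(lda[(0, 0)])
```

For a 4x4 map with G = 2, SDA group (0, 0) holds the adjacent positions (0,0), (0,1), (1,0), (1,1), while LDA group (0, 0) holds (0,0), (0,2), (2,0), (2,2), that is, positions sampled at a fixed interval of H // G = 2. Both produce groups of G x G positions; only the spacing between group members differs.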
main()
File "./test.py", line 127, in main
checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 522, in load_checkpoint
checkpoint = _load_checkpoint(filename, map_location, logger)
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 466, in _load_checkpoint
return CheckpointLoader.load_checkpoint(filename, map_location, logger)
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 243, in load_checkpoint
return checkpoint_loader(filename, map_location)
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 260, in load_from_local
checkpoint = torch.load(filename, map_location=map_location)
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 594, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 853, in _load
result = unpickler.load()
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 845, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
File "/home/wangnan/anaconda3/envs/yolo-v5/lib/python3.6/site-packages/torch/serialization.py", line 833, in load_tensor
storage = zip_file.get_storage_from_record(name, size, dtype).storage()
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading file data/2154620144: invalid header or archive is corrupted
Thank you very much for your careful reply about LDA and SDA, but I have another question about LDA and SDA for irregular feature maps.
In your paper, LDA and SDA are used with regular input image sizes, like 224x224 or 384x384. So the group size defaults to 7, and I is set to (8, 4, 2, 1). For Stage-1, I = 8, because G x I must equal the feature map width/height (56x56).
However, for an irregular feature map size, for example 80 x 134, it seems that the group size and interval can no longer be chosen as described in the paper. If the group size is 7, the feature map has to be padded to fit the group size, so it becomes 84x140 and is reshaped to [W_nG, G, H_nG, G]; SDA can then be executed normally. But for the following LDA, how should the interval I be set? Besides, because the feature map is irregular, the interval I differs between width and height. How can I reasonably set the interval I?
Can you give me some advice about this question? Thanks!
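The arithmetic in the question can be written down as follows. This is only a sketch under the assumption that the paper's constraint G x I = feature map size is applied to each axis separately after padding; that reading is not confirmed by the authors, and pad_and_interval is a hypothetical helper name.

```python
import math


def pad_and_interval(size, G):
    """Pad one axis up to a multiple of G and return (padded_size, interval).

    Assumption: the interval on this axis is padded_size // G, so that
    G * interval equals the padded size on that axis.
    """
    padded = math.ceil(size / G) * G
    return padded, padded // G


G = 7
padded_h, interval_h = pad_and_interval(80, G)
padded_w, interval_w = pad_and_interval(134, G)
print(padded_h, interval_h)  # 84 12
print(padded_w, interval_w)  # 140 20
```

Under this assumption the 80 x 134 map is padded to 84 x 140 and the intervals are 12 along the 80-axis and 20 along the 134-axis, so the two axes do get different intervals, as the question anticipates.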
Dear author, I recently read your article 'Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework' and I am particularly interested in it. Do you plan to open-source the code? Thank you for your answer.
Hi, thanks for your great work. I am using your crossformer_base as the backbone network for a downstream tracking task. When I load your pre-trained model, an Unexpected key(s) message appears. My loading code is as follows:
ckpt = torch.load(ckpt_path, map_location='cpu')
missing_keys, unexpected_keys = backbone.body.load_state_dict(ckpt['model'], strict=False)
The result is as follows:
unexpected keys: ['norm.weight', 'norm.bias', 'head.weight', 'head.bias', 'layers.0.blocks.0.attn.biases', 'layers.0.blocks.0.attn.relative_position_index', 'layers.0.blocks.1.attn.biases', .....
Since the input resolution of my task is not 224, can I still load the ImageNet pre-trained weights?
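A common workaround for this kind of mismatch is to drop checkpoint entries that the target model does not have, or whose shapes differ, before calling load_state_dict. Note that strict=False ignores missing and unexpected keys but still raises on shape mismatches. The following is a minimal sketch, assuming the weights live under ckpt['model'] as in the snippet above; filter_state_dict is a hypothetical helper, written against any object with a .shape attribute so that it runs without PyTorch installed.

```python
def filter_state_dict(ckpt_state, model_state):
    """Split checkpoint entries into loadable ones and dropped keys.

    An entry is kept only if the model has the same key with the same shape.
    """
    kept, dropped = {}, []
    for key, value in ckpt_state.items():
        target = model_state.get(key)
        if target is not None and tuple(target.shape) == tuple(value.shape):
            kept[key] = value
        else:
            dropped.append(key)
    return kept, dropped


# Intended use with the names from the question (not executed here):
#   kept, dropped = filter_state_dict(ckpt['model'], backbone.body.state_dict())
#   backbone.body.load_state_dict(kept, strict=False)
#   print("dropped:", dropped)
```

Printing the dropped keys makes it easy to check that only classification-head or resolution-dependent entries were skipped. Whether the remaining weights transfer well to a different input resolution is a separate question that this sketch does not answer.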
$ /bin/sh E:/project_c/crossformer-debug/segmentation/dist_train.sh
NOTE: Redirects are currently not supported in Windows or MacOs.
C:\Python37\lib\site-packages\torch\distributed\launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank
argument to be set, please
change it to read from os.environ['LOCAL_RANK']
instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
Traceback (most recent call last):
File "C:\Python37\lib\site-packages\torch\distributed\run.py", line 564, in determine_local_world_size
return int(nproc_per_node)
ValueError: invalid literal for int() with base 10: ''
During handling of the above exception, another exception occurred:
Can you suggest a solution?
Thanks.
When I run the program, I get this error:
KeyError: "EncoderDecoder: 'CrossFormer_L is not in the backbone registry'"
How can I register this backbone?
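Errors of this form usually mean that the module defining the class was never imported, so its registration decorator never ran, or that the registered name differs from the type string in the config. The toy registry below is not mmcv's implementation, only a sketch of the pattern; Registry, BACKBONES, and the stand-in CrossFormer_L class are simplified for illustration.

```python
class Registry:
    """Toy name-to-class registry (illustrative, not the mmcv class)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Runs at import time of the module that defines `cls`.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        cfg = dict(cfg)
        type_name = cfg.pop("type")
        if type_name not in self._modules:
            raise KeyError(f"{type_name} is not in the {self.name} registry")
        return self._modules[type_name](**cfg)


BACKBONES = Registry("backbone")


@BACKBONES.register_module
class CrossFormer_L:  # stand-in class; the real backbone lives in the repo
    def __init__(self, group_size=7):
        self.group_size = group_size


model = BACKBONES.build({"type": "CrossFormer_L", "group_size": 7})
```

If the file containing the decorated class is never imported, the registry stays empty and build raises the same kind of KeyError as above. In practice that means checking that the backbone file is imported by the package's __init__ and that the type string matches the class name exactly.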
Thank you for your contribution to science. I encountered the following issues during the reproduction process:
Traceback (most recent call last):
File "/tmp/pycharm_project_864/tools/train.py", line 194, in
main()
File "/tmp/pycharm_project_864/tools/train.py", line 183, in main
train_detector(
File "/tmp/pycharm_project_864/mmdet/apis/train.py", line 185, in train_detector
runner.load_checkpoint(cfg.load_from)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 349, in load_checkpoint
return load_checkpoint(
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 627, in load_checkpoint
checkpoint = _load_checkpoint(filename, map_location, logger)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 561, in _load_checkpoint
return CheckpointLoader.load_checkpoint(filename, map_location, logger)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 303, in load_checkpoint
return checkpoint_loader(filename, map_location) # type: ignore
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 323, in load_from_local
checkpoint = torch.load(filename, map_location=map_location)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 1172, in _load
result = unpickler.load()
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/serialization.py", line 1112, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage)._typed_storage()._untyped_storage
RuntimeError: PytorchStreamReader failed reading file data/2237523104: invalid header or archive is corrupted
May I ask whether the detection weight file you released is damaged?
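Before concluding that the published file is damaged, it can help to rule out a truncated download. Checkpoints written by torch.save in PyTorch 1.6 and later are zip archives by default, so the standard zipfile module can check the local copy. This is a minimal sketch; check_checkpoint_archive is a hypothetical helper and it only inspects the archive structure, not the tensors inside.

```python
import zipfile


def check_checkpoint_archive(path):
    """Return (ok, message) after a structural check of a zip-format checkpoint."""
    if not zipfile.is_zipfile(path):
        return False, "not a readable zip archive (truncated or legacy format)"
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()  # first member with a bad CRC/header, else None
    except zipfile.BadZipFile as exc:
        return False, f"bad zip file: {exc}"
    if bad is not None:
        return False, f"corrupted member: {bad}"
    return True, "archive structure looks intact"
```

If the check fails, re-downloading the file and comparing its size with the one shown on the download page is the usual next step.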
Dear authors,
I'm interested in your paper and tried training from scratch on ImageNet. However, the validation accuracy stays at * Acc@1 0.090 throughout training.
Do you have any idea why this happens? When I train Swin Transformer in the same setup, it works.
I tried PyTorch 1.7.1 and 1.6.0, with no mixed precision, for 100 epochs.
--amp-opt-level O0 --output ./output --opts TRAIN.EPOCHS 100
Thanks,
Eddie
Hello, thank you for your work. I used crossformer for small object detection on my own dataset, and the results were very poor. Is there any way to improve the accuracy of small object detection? Thanks very much!
I followed the configuration steps for detection. Why is the detection result poor after training?
Hi, I'm very interested in your work on multi-scale attention in Transformers, but I have some questions about it:
In Appendix 2 (DPB), why do the i and j parameters range from 0 to 2G-1 instead of 0 to G-1? Besides, the input of the DPB module is (1-G+i, 1-G+j). What is the reason for this setting? Why not just use i and j as inputs?
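The counting behind this question can be checked directly. For two positions inside a G x G group, the coordinate difference along one axis runs from -(G-1) to G-1, which is 2G-1 distinct values; shifting an index i by 1-G turns i = 0, ..., 2G-2 into exactly those signed offsets (reading "0 to 2G-1" as an exclusive upper bound). This is a sketch of the arithmetic only, not the authors' explanation, and relative_offsets is a hypothetical helper.

```python
def relative_offsets(G):
    """All distinct coordinate differences between two positions in a group."""
    return sorted({a - b for a in range(G) for b in range(G)})


G = 7
offsets = relative_offsets(G)                    # signed offsets along one axis
shifted = [1 - G + i for i in range(2 * G - 1)]  # the (1 - G + i) inputs
print(len(offsets), offsets[0], offsets[-1])
print(offsets == shifted)
```

With G = 7 there are 13 offsets from -6 to 6, and they coincide with the shifted indices. An index range of 0 to G-1 would cover only 7 of those 13 relative positions.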
When debugging your code, I changed some parameters because I have only one 3090 with 24G of memory, like this:
parser = argparse.ArgumentParser('CrossFormer training and evaluation script', add_help=False)
parser.add_argument('--cfg', type=str, required=True, metavar="FILE",
default='/configs/small_patch4_group7_224.yaml', help='path to config file')
parser.add_argument(
"--opts",
help="Modify config options by adding 'KEY VALUE' pairs. ",
default=None,
nargs='+'
)
# easy config modification
parser.add_argument('--batch-size', type=int, default=32, help="batch size for single GPU")
parser.add_argument('--data-set', type=str, default='flower', help='dataset to use')
parser.add_argument('--data-path', type=str, help='path to dataset', default='/media/data2/huzhen/flower_data')
parser.add_argument('--zip', action='store_true', help='use zipped dataset instead of folder dataset')
parser.add_argument('--cache-mode', type=str, default='part', choices=['no', 'full', 'part'],
help='no: no cache, '
'full: cache all data, '
'part: sharding the dataset into nonoverlapping pieces and only cache one piece')
parser.add_argument('--resume', help='resume from checkpoint', default='')
parser.add_argument('--accumulation-steps', type=int, help="gradient accumulation steps")
parser.add_argument('--use-checkpoint', action='store_true',
help="whether to use gradient checkpointing to save memory")
parser.add_argument('--amp-opt-level', type=str, default='native', choices=['native', 'O0', 'O1', 'O2'],
help='mixed precision opt level, if O0, no amp is used')
parser.add_argument('--output', default='./Flower_weights', type=str, metavar='PATH',
help='root of output folder, the full path is /<model_name>/ (default: output)')
parser.add_argument('--tag', help='tag of experiment')
parser.add_argument('--eval', action='store_true', help='Perform evaluation only')
parser.add_argument('--throughput', action='store_true', help='Test throughput only')
parser.add_argument('--num_workers', type=int, default=8, help="")
parser.add_argument('--mlp_ratio', type=int, default=4, help="")
parser.add_argument('--warmup_epochs', type=int, default=20, help="#epoches for warm up")
parser.add_argument("--local_rank", type=int, required=True, default=0, help='local rank for DistributedDataParallel')
parser.add_argument('--device', default='cuda:2',
help='device to use for training / testing')
args, unparsed = parser.parse_known_args()
But it reports an error: Exception occurred: SystemExit 2
The above is my parameter setting. Is there a problem with it?
I sincerely hope to receive your help!
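One likely cause: argparse does not fall back to default for an option declared with required=True, so omitting --cfg or --local_rank on the command line makes the parser print a usage error and call sys.exit(2), which a debugger reports as SystemExit 2. The sketch below reproduces this with a cut-down parser; build_parser and try_parse are helpers made up for illustration.

```python
import argparse


def build_parser(required):
    parser = argparse.ArgumentParser(add_help=False)
    # Same shape as the --local_rank line in the question.
    parser.add_argument("--local_rank", type=int, required=required, default=0)
    return parser


def try_parse(required, argv):
    """Return the parsed value, or a string describing the SystemExit."""
    try:
        args, _ = build_parser(required).parse_known_args(argv)
        return args.local_rank
    except SystemExit as exc:
        return f"SystemExit {exc.code}"


strict_result = try_parse(True, [])    # required option missing
lenient_result = try_parse(False, [])  # default is used
print(strict_result, lenient_result)
```

With required=True and no arguments the parser exits with code 2; with required=False the default 0 is used. Dropping required=True from the options that are meant to rely on their defaults, or passing them explicitly in the run configuration, avoids the exit.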
Sorry, I have only just looked at the model structure. Would you mind telling me the difference between CrossFormer and swin_transformer?
Hi there and thanks for the nice work.
I'm currently trying to use CrossFormer_B as my backbone in detection/instance_segmentation.
I've noticed that we need to define the img_size in the backbone configs. However, defining that can be limiting in the sense that we usually use cropping augmentations during training, or multi-scale inference at test time. Is there any way to keep these methods working with the current implementation?
I'll copy the related part of my config file down here:
model = dict(
type='CascadeRCNN',
pretrained=None,
backbone=dict(
type='CrossFormer',
img_size=[3840, 1920],
patch_size=[4, 8, 16, 32],
in_chans=3,
num_classes=7,
embed_dim=96,
depths=[2, 2, 18, 2],
num_heads=[3, 6, 12, 24],
group_size=[7, 7, 7, 7],
crs_interval=[8, 4, 2, 1],
mlp_ratio=4,
qkv_bias=True,
qk_scale=None,
drop_rate=0.0,
drop_path_rate=0.3,
patch_norm=True,
use_checkpoint=False,
merge_size=[[2, 4], [2, 4], [2, 4]]),
...
...
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(
type='Resize',
img_scale=[(3840, 1080), (3840, 1560)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.0),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
...
...
Thanks,
I have read your paper and noticed that CEL was used in all stages.
But I found in the code that you use it in the first stage only.
Am I mistaken?