labsaint / spd-conv

Code for ECML PKDD 2022 paper: No More Strided Convolutions or Pooling: A Novel CNN Architecture for Low-Resolution Images and Small Objects

License: MIT License

Python 99.40% Shell 0.43% Dockerfile 0.17%

spd-conv's People

Contributors

raja-sunkara, rajasunkara, tluocs

spd-conv's Issues

Some Questions about the FLOPs?

I measured the complexity of the 's' model from your paper (YOLOv5-SPD-s), and its FLOPs are far higher than those of YOLOv5-s at the same input size. The details are:
YOLOv5-SPD-s: Model Summary: 277 layers, 8771389 parameters, 8771389 gradients, 33.9 GFLOPs
YOLOv5-s: Model Summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.5 GFLOPs
So I wonder what the value of the idea in the paper is, if it improves detection of small targets at the cost of increased computational complexity?
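A rough way to see where extra computation can come from is to count multiply-accumulate operations (MACs) for a single downsampling layer. The sketch below is not taken from the repository: it assumes a 3x3 kernel, a space-to-depth scale of 2, and made-up channel and feature-map sizes, and the helper name `conv_macs` is hypothetical.

```python
def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates for one k x k convolution (bias ignored)."""
    return k * k * c_in * c_out * h_out * w_out


# Illustrative sizes only; they are not read from any model config.
c_in, c_out, k = 64, 128, 3
h, w = 80, 80  # input feature-map height and width

# Strided convolution: stride 2 halves the spatial size of the output.
strided = conv_macs(c_in, c_out, k, h // 2, w // 2)

# Space-to-depth with scale 2 yields 4 * c_in channels at half resolution,
# followed by a stride-1 convolution with the same kernel size.
spd = conv_macs(4 * c_in, c_out, k, h // 2, w // 2)

print(strided, spd, spd / strided)
```

Under these assumptions the replaced layer costs four times as many MACs as the strided one. The whole-model figures quoted above differ by roughly a factor of two (33.9 vs. 16.5 GFLOPs), which would be consistent with only part of the network being affected.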

Question about inference time

I tested SPD-yolov5s and yolov5s on the RK3588, a platform with an NPU, and found that yolov5s ran at 30 FPS while SPD-yolov5s ran at only 7 FPS. What is the reason, and how can it be improved? Thank you!

Runtime error while testing yolov7 on testing image after adding SPD-Conv

!python detect.py --weight runs/train/yolov7-carspd9/weights/best.pt --conf 0.5 --img-size 224 --device 1 --source 9.jpg
File "detect.py", line 196, in
detect()
File "detect.py", line 39, in detect
model = TracedModel(model, device, opt.img_size)
File "/root/code/yolov7/utils/torch_utils.py", line 362, in init
traced_script_module = torch.jit.trace(self.model, rand_example, strict=False)
File "/root/anaconda3/envs/cuda11_1/lib/python3.8/site-packages/torch/jit/_trace.py", line 735, in trace
return trace_module(
File "/root/anaconda3/envs/cuda11_1/lib/python3.8/site-packages/torch/jit/_trace.py", line 952, in trace_module
module._c._create_method_from_trace(
File "/root/anaconda3/envs/cuda11_1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/envs/cuda11_1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
result = self.forward(*input, **kwargs)
File "/root/code/yolov7/models/yolo.py", line 599, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/root/code/yolov7/models/yolo.py", line 625, in forward_once
x = m(x) # run
File "/root/anaconda3/envs/cuda11_1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/envs/cuda11_1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
result = self.forward(*input, **kwargs)
File "/root/code/yolov7/models/common.py", line 282, in forward
return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
RuntimeError: torch.cat(): Sizes of tensors must match except in dimension 1. Got 4 and 3 in dimension 2 (The offending index is 1)
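The sizes in the message ("Got 4 and 3 in dimension 2") are what the `::2` and `1::2` slices produce when the sliced dimension has an odd length of 7. The pure-Python sketch below reproduces that arithmetic. The helper names are hypothetical, and the assumption that the feature map has been halved five times before the failing layer (224 / 32 = 7) is a guess, since the model config is not shown here.

```python
def spd_slice_sizes(n):
    """Lengths of the [::2] and [1::2] slices along a dimension of size n."""
    return len(range(0, n, 2)), len(range(1, n, 2))


def is_spd_safe(h, w):
    """True when the even and odd slices have equal length in both dimensions."""
    even_h, odd_h = spd_slice_sizes(h)
    even_w, odd_w = spd_slice_sizes(w)
    return even_h == odd_h and even_w == odd_w


print(spd_slice_sizes(7))  # (4, 3): the mismatch reported by torch.cat
print(spd_slice_sizes(8))  # (4, 4): slices line up

# Assumption: five halvings happen before the failing layer.
size = 224
for _ in range(5):
    size //= 2
print(size, is_spd_safe(size, size))  # 7 False
```

If that assumption holds, an input size whose feature map is still even at that layer (for example one that gives 8 instead of 7) would avoid the mismatch; padding odd feature maps to an even size before slicing is another option.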

Extending to 3D?

Hi, I think this idea is amazing. If I want to extend it to 3D for problems such as video, should I add a dimension, or just apply it to every feature map in each channel?

About space_to_depth

Hello, thanks for your significant contribution. I have a question about the space_to_depth function.
When space_to_depth is used in a model, the number of channels expands by 4x, so the filters of the next conv become 4x bigger.
My question is: do I need to add a new conv to squeeze the channels to 2x? A stride-2 conv often expands the channels to 2x or more.
Can you give me some advice?
Thanks again!
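One way to weigh the option raised in this question is to compare weight counts. The sketch below is only an illustration with assumed sizes (C = 64 input channels, 128 output channels, a 3x3 kernel, a 1x1 squeeze conv); none of these values come from the repository, and `conv_params` is a hypothetical helper. It says nothing about how accuracy would be affected.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in one k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


# Illustrative sizes only.
c, c_out, k = 64, 128, 3

# Plain stride-2 convolution on the original C channels.
strided = conv_params(c, c_out, k)

# space_to_depth gives 4C channels, fed directly into a k x k convolution.
direct = conv_params(4 * c, c_out, k)

# Same, but with a 1x1 convolution squeezing 4C down to 2C first.
squeezed = conv_params(4 * c, 2 * c, 1) + conv_params(2 * c, c_out, k)

print(strided, direct, squeezed)  # 73728 294912 180224
```

With these numbers the squeeze variant has fewer weights than feeding 4C channels straight into the 3x3 conv, but still more than the original strided conv.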

Requirements

Why is everything listed twice in your requirements file for yolov5?

Equivalent to Conv2D with kernel size S and stride S?

Hi,

It's not clear to me what the difference is between SPD-Conv and a Conv with stride S and kernel size S, i.e. what many transformer papers (e.g. Swin) use as their "patchify" stem.

It's also used in their downsampling layers, although they formulate it without Conv2D (to avoid an NHWC to NCHW reshape); it is conceptually equivalent to a Conv2D with stride 2 and kernel size 2.
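The equivalence described here can be checked on a tiny example. The pure-Python sketch below covers only the special case of a single input channel, scale 2, and a 1x1 convolution after the space-to-depth step; all function names are hypothetical, and the channel order follows the slicing shown in the traceback earlier on this page. Whether this matches the layer the repository actually uses depends on the kernel size of the convolution that follows, which this page does not state.

```python
OFFSETS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row offset, column offset)


def conv2x2_stride2(x, w):
    """Single-channel 2x2 convolution with stride 2 and no padding."""
    rows, cols = len(x) // 2, len(x[0]) // 2
    return [[sum(w[a][b] * x[2 * i + a][2 * j + b]
                 for a in range(2) for b in range(2))
             for j in range(cols)] for i in range(rows)]


def space_to_depth(x):
    """Split one channel into four half-resolution channels."""
    return [[row[dc::2] for row in x[dr::2]] for dr, dc in OFFSETS]


def conv1x1(channels, weights):
    """Pointwise convolution: weighted sum across channels at each position."""
    rows, cols = len(channels[0]), len(channels[0][0])
    return [[sum(wt * ch[i][j] for wt, ch in zip(weights, channels))
             for j in range(cols)] for i in range(rows)]


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
w = [[1, -1],
     [2, 3]]  # integer weights keep the comparison exact

direct = conv2x2_stride2(x, w)
# Reuse the 2x2 kernel entries as 1x1 weights, one per space-to-depth channel.
pointwise = [w[dr][dc] for dr, dc in OFFSETS]
via_spd = conv1x1(space_to_depth(x), pointwise)

print(direct == via_spd)  # True
```

With a larger kernel after the space-to-depth step, each output depends on more than one 2x2 block of the input, so this identity does not carry over directly.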
