deltacnn's Issues
DCBackend: deltacnn, delta_cudnn
What is the difference between the two DCBackend choices, deltacnn and delta_cudnn?
Using sparse conv is slower
There is a module inside my network:
self.network = nn.Sequential(
    nn.Conv2d(7, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 14, 3, padding=1),
    nn.ReLU(),
)
I replaced it with sparse operations:
self.sparsify = dc.DCSparsify(delta_threshold=0.1, dilation=5)
self.densify = dc.DCDensify()
self.network = nn.Sequential(
    dc.DCConv2d(7, 32, 3, padding=1),
    dc.DCActivation(activation="relu"),
    dc.DCConv2d(32, 32, 3, padding=1),
    dc.DCActivation(activation="relu"),
    dc.DCConv2d(32, 32, 3, padding=1),
    dc.DCActivation(activation="relu"),
    dc.DCConv2d(32, 14, 3, padding=1),
    dc.DCActivation(activation="relu"),
)
Then I tested its time cost on a Titan RTX GPU:
# ...
def forward(self, inp):
    torch.cuda.synchronize()
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)
    starter.record()
    encodes = self.densify(self.network(self.sparsify(inp)))
    # encodes = self.network(inp)
    ender.record()
    torch.cuda.synchronize()
    print(starter.elapsed_time(ender))
# ...
The original forward pass takes 8.2-8.5 ms. However, after switching to the sparse operations, the time cost increased to 15 ms. I am pretty sure the input frames have high temporal coherence. How can I fix this?
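A common pitfall when timing CUDA models is counting the first few calls, which pay one-time costs (kernel compilation, buffer allocation), and for delta-based networks the first frame is typically processed densely anyway. Below is a minimal timing helper (a hypothetical sketch, not part of DeltaCNN's API) that discards warm-up runs before averaging; for precise GPU timing the per-call measurement should still use torch.cuda.Event as in the snippet above:

```python
import time

def benchmark(fn, warmup=10, iters=100):
    """Average the runtime of fn over several iterations, discarding
    warm-up runs whose one-time costs would skew the mean."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0  # mean milliseconds per call

# Hypothetical usage with the model above (requires a CUDA device);
# a torch.cuda.synchronize() inside the lambda would ensure the GPU
# work is actually finished before each iteration ends:
# ms = benchmark(lambda: model(inp))
```

Averaging over many frames of the same video also matters here: DeltaCNN's benefit comes from frame-to-frame sparsity, so a single-frame measurement mostly reflects the dense first pass.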
DCConv2d not working for kernel sizes other than 3x3
First of all, thank you for your amazing work.
I am trying to run this example code:
import torch
import deltacnn as dc
model = torch.nn.Sequential(dc.DCSparsify(), dc.DCConv2d(3, 128, (7,7), 2, padding=2, bias=False), dc.DCDensify()).to('cuda')
t = 100*torch.rand(size=(1, 3, 128, 128)).to('cuda')
out = model(t)
For every kernel size other than 3x3, it gives the following error:
RuntimeError: Caught an unknown exception!
Please help!
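As a side note, unrelated to the crash itself: padding=2 is not "same"-style padding for a 7x7 kernel (that would be (7 - 1) // 2 = 3). A quick sketch of the standard convolution output-size formula (plain Python, no DeltaCNN dependency) helps sanity-check the intended shapes:

```python
def conv_out_size(h, kernel, stride, padding):
    """Spatial output size of a convolution along one axis:
    floor((h + 2*padding - kernel) / stride) + 1."""
    return (h + 2 * padding - kernel) // stride + 1

# The example above: 128x128 input, 7x7 kernel, stride 2, padding 2
print(conv_out_size(128, 7, 2, 2))  # 63

# "Same"-style padding for a 7x7 kernel would be (7 - 1) // 2 = 3
print(conv_out_size(128, 7, 2, 3))  # 64
```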
sparse_concatenate(x1, x2): the result doesn't seem right
Thanks for your amazing work!
When running inference on the first frame, I use DCConcatenate to concatenate x1 and x2, but the result is not right.
x1=[3.2004e+00, 7.1649e+00, 4.6880e+00, ..., 1.0017e+00, 2.4762e-01, 1.3532e+00]
x2=[ 2.4562, 1.1336, 2.0847, ..., 2.1969, 2.4067, 3.0504]
result=[3.2004e+00, 7.1649e+00, 0.0000e+00, ..., 1.0017e+00, 0.0000e+00, 0.0000e+00]
Can you give me some advice? @dabeschte
In 16-bit mode, the output after the 4th frame is NaN
I use the following code to run in 16-bit mode:
original_model.to(torch.float16)
dc_model.to(torch.float16)
input_batch = input_batch.to(torch.float16)  # Tensor.to() is not in-place, so the result must be reassigned
However, I get the correct result in 32-bit mode.
Output:
original: 37.60ms, dc: 43.09ms box_diff_mean=0.060
original: 32.77ms, dc: 27.92ms box_diff_mean=0.073
original: 32.87ms, dc: 30.71ms box_diff_mean=0.097
original: 32.24ms, dc: 32.57ms box_diff_mean=nan
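NaNs that appear only after a few frames are consistent with float16 accumulator overflow: delta-based inference keeps accumulating values across frames, and float16's largest finite value is 65504, far below float32's range. A minimal NumPy sketch (assuming nothing about DeltaCNN's internals) of how accumulation overflows to inf and then turns into NaN:

```python
import numpy as np

# float16 saturates at 65504; a running accumulator can exceed it
# after a few frames even when each per-frame delta is small.
acc = np.float16(60000)
acc = np.float16(acc + np.float16(10000))  # exceeds 65504 -> inf
print(np.isinf(acc))   # True

# Any downstream inf - inf (e.g. differencing two saturated values)
# produces NaN, which then propagates through the network.
diff = acc - acc
print(np.isnan(diff))  # True
```

If this is the cause, keeping the accumulation buffers in float32 while running the convolutions in float16 (or rescaling activations) would be the usual mitigation.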
Encountering RuntimeError: CUDA error: an illegal memory access was encountered
I am trying to train a ResNet101 backbone for my Mask R-CNN network. During training, after some iterations (sometimes 2, sometimes 4, mostly 2), the session crashes without any errors. After some debugging, I have narrowed it down to line 1259 of sparse_layers.py in the DCSparsify class:
sparsify(input, self.prev_in, x, self.mask, threshold)
I tried to print all input arguments of this function before and after this line, and I got RuntimeError: CUDA error: an illegal memory access was encountered. Since I don't have much experience programming in CUDA C/C++, I couldn't debug further. Please help.
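Because CUDA launches are asynchronous, an illegal memory access is often reported at a later, unrelated call, which makes prints around the suspect line misleading. A common first debugging step (generic CUDA advice, not specific to DeltaCNN) is to force synchronous launches so the error surfaces at the kernel that actually faulted:

```shell
# Make every CUDA kernel launch synchronous so the reported traceback
# points at the kernel that actually faulted (training will run
# noticeably slower with this enabled).
export CUDA_LAUNCH_BLOCKING=1
# python train.py   # hypothetical training command
```

With blocking launches, the Python traceback usually points at the exact failing op; if it lands inside DCSparsify, checking the shapes, dtypes, and devices of self.prev_in and self.mask on that iteration is a reasonable next step.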
Can DeltaCNN be deployed with TensorRT, or is there any other way?
I am trying to deploy my models with DeltaCNN, but I don't know how to start. I am wondering if there is an easier way to deploy models so that I could directly use C++ for fast inference. Could you give me some advice? Thanks in advance.