deltacnn's Issues
DCBackend: deltacnn, delta_cudnn
What is the difference between the two DCBackend choices, deltacnn and delta_cudnn?
Using sparse conv is slower
There is a module inside my network:
self.network = nn.Sequential(
    nn.Conv2d(7, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 14, 3, padding=1),
    nn.ReLU(),
)
I replaced it with sparse operations:
self.sparsify = dc.DCSparsify(delta_threshold=0.1, dilation=5)
self.densify = dc.DCDensify()
self.network = nn.Sequential(
    dc.DCConv2d(7, 32, 3, padding=1),
    dc.DCActivation(activation="relu"),
    dc.DCConv2d(32, 32, 3, padding=1),
    dc.DCActivation(activation="relu"),
    dc.DCConv2d(32, 32, 3, padding=1),
    dc.DCActivation(activation="relu"),
    dc.DCConv2d(32, 14, 3, padding=1),
    dc.DCActivation(activation="relu"),
)
Then I tested its time cost on a Titan RTX GPU:
# ...
def forward(self, inp):
    torch.cuda.synchronize()
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)
    starter.record()
    encodes = self.densify(self.network(self.sparsify(inp)))
    # encodes = self.network(inp)
    ender.record()
    torch.cuda.synchronize()
    print(starter.elapsed_time(ender))
# ...
The original forward pass takes 8.2-8.5 ms. However, after switching to the sparse operations, the time cost increased to 15 ms. I am pretty sure the input frames have high temporal coherence. How can I fix this?
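A common pitfall when timing CUDA models is counting the first few calls, which pay one-time costs (kernel compilation, buffer allocation), and for delta-based networks the first frame is typically processed densely anyway. Below is a minimal timing helper (a hypothetical sketch, not part of DeltaCNN's API) that discards warm-up runs before averaging; for precise GPU timing the per-call measurement should still use torch.cuda.Event as in the snippet above:

```python
import time

def benchmark(fn, warmup=10, iters=100):
    """Average the runtime of fn over several iterations, discarding
    warm-up runs whose one-time costs would skew the mean."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0  # mean milliseconds per call

# Hypothetical usage with the model above (requires a CUDA device);
# a torch.cuda.synchronize() inside the lambda would ensure the GPU
# work is actually finished before each iteration ends:
# ms = benchmark(lambda: model(inp))
```

Averaging over many frames of the same video also matters here: DeltaCNN's benefit comes from frame-to-frame sparsity, so a single-frame measurement mostly reflects the dense first pass.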
DCConv2d not working for kernel sizes other than 3x3
First of all, thank you for your amazing work.
I am trying to run this example code:
import torch
import deltacnn as dc
model = torch.nn.Sequential(dc.DCSparsify(), dc.DCConv2d(3, 128, (7,7), 2, padding=2, bias=False), dc.DCDensify()).to('cuda')
t = 100*torch.rand(size=(1, 3, 128, 128)).to('cuda')
out = model(t)
For every kernel size other than 3x3, it gives the following error:
RuntimeError: Caught an unknown exception!
Please help!
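As a side note, unrelated to the crash itself: padding=2 is not "same"-style padding for a 7x7 kernel (that would be (7 - 1) // 2 = 3). A quick sketch of the standard convolution output-size formula (plain Python, no DeltaCNN dependency) helps sanity-check the intended shapes:

```python
def conv_out_size(h, kernel, stride, padding):
    """Spatial output size of a convolution along one axis:
    floor((h + 2*padding - kernel) / stride) + 1."""
    return (h + 2 * padding - kernel) // stride + 1

# The example above: 128x128 input, 7x7 kernel, stride 2, padding 2
print(conv_out_size(128, 7, 2, 2))  # 63

# "Same"-style padding for a 7x7 kernel would be (7 - 1) // 2 = 3
print(conv_out_size(128, 7, 2, 3))  # 64
```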
sparse_concatenate(x1, x2): the result doesn't seem right
Thanks for your amazing work!
When running inference on the first frame, I use DCConcatenate to concatenate x1 and x2, but the result is not right.
x1=[3.2004e+00, 7.1649e+00, 4.6880e+00, ..., 1.0017e+00, 2.4762e-01, 1.3532e+00]
x2=[ 2.4562, 1.1336, 2.0847, ..., 2.1969, 2.4067, 3.0504]
result=[3.2004e+00, 7.1649e+00, 0.0000e+00, ..., 1.0017e+00, 0.0000e+00, 0.0000e+00]
Can you give me some advice? @dabeschte
In 16-bit mode, the output after the 4th frame is NaN
I use the following code to run in 16-bit mode:
original_model.to(torch.float16)
dc_model.to(torch.float16)
input_batch = input_batch.to(torch.float16)  # Tensor.to() is not in-place, so the result must be reassigned
However, I get the correct result in 32-bit mode.
Output:
original: 37.60ms, dc: 43.09ms box_diff_mean=0.060
original: 32.77ms, dc: 27.92ms box_diff_mean=0.073
original: 32.87ms, dc: 30.71ms box_diff_mean=0.097
original: 32.24ms, dc: 32.57ms box_diff_mean=nan
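NaNs that appear only after a few frames are consistent with float16 accumulator overflow: delta-based inference keeps accumulating values across frames, and float16's largest finite value is 65504, far below float32's range. A minimal NumPy sketch (assuming nothing about DeltaCNN's internals) of how accumulation overflows to inf and then turns into NaN:

```python
import numpy as np

# float16 saturates at 65504; a running accumulator can exceed it
# after a few frames even when each per-frame delta is small.
acc = np.float16(60000)
acc = np.float16(acc + np.float16(10000))  # exceeds 65504 -> inf
print(np.isinf(acc))   # True

# Any downstream inf - inf (e.g. differencing two saturated values)
# produces NaN, which then propagates through the network.
diff = acc - acc
print(np.isnan(diff))  # True
```

If this is the cause, keeping the accumulation buffers in float32 while running the convolutions in float16 (or rescaling activations) would be the usual mitigation.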
Encountering RuntimeError: CUDA error: an illegal memory access was encountered
I am trying to train a ResNet101 backbone for my Mask R-CNN network. During training, after some iterations (sometimes 2, sometimes 4, mostly 2), the session crashes without any errors. After some debugging, I have narrowed it down to line 1259 of sparse_layers.py in the DCSparsify class:
sparsify(input, self.prev_in, x, self.mask, threshold)
I tried to print all input arguments of this function before and after this line, and I got RuntimeError: CUDA error: an illegal memory access was encountered. Since I don't have much experience programming in CUDA C/C++, I couldn't debug further. Please help.
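Because CUDA launches are asynchronous, an illegal memory access is often reported at a later, unrelated call, which makes prints around the suspect line misleading. A common first debugging step (generic CUDA advice, not specific to DeltaCNN) is to force synchronous launches so the error surfaces at the kernel that actually faulted:

```shell
# Make every CUDA kernel launch synchronous so the reported traceback
# points at the kernel that actually faulted (training will run
# noticeably slower with this enabled).
export CUDA_LAUNCH_BLOCKING=1
# python train.py   # hypothetical training command
```

With blocking launches, the Python traceback usually points at the exact failing op; if it lands inside DCSparsify, checking the shapes, dtypes, and devices of self.prev_in and self.mask on that iteration is a reasonable next step.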
Can DeltaCNN be deployed with TensorRT, or is there any other way?
I am trying to deploy my models with DeltaCNN, but I don't know how to start. I am wondering if there is an easier way to deploy models so that I could directly use C++ for fast inference. Could you give me some advice? Thanks in advance.