matpalm / cached_dilated_causal_convolutions

1D dilated causal convolutions with extreme caching for 5µs inference on an FPGA

Home Page: https://matpalm.com/blog/wavenet_on_fpga/

License: MIT License

C++ 0.74% Jupyter Notebook 91.31% Python 5.27% Makefile 0.24% C 0.40% Shell 0.11% SystemVerilog 1.94%

cached_dilated_causal_convolutions's Issues

incorporate feedback from josh re: verilog design

josh provided this awesome feedback, https://gist.github.com/jwise/5619bf4271fc41f9ae14d31b3b1531f4

need to digest and work through this, will likely result in a large ( and much better ) rewrite

HP filter for discontinuity cases?

a demo with a ramp or square as the core input shows how glitchy it is with a discontinuity
maybe a HP filter would help? can prototype by trying with another euro filter and, if it looks worth it, can add in a module from the eurorack pmod for it.
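
a software prototype of the HP filter idea could be as small as the sketch below. it is a textbook one-pole high-pass, not anything taken from the eurorack pmod code; the function name `one_pole_highpass` and the `alpha` value are assumptions for illustration only.

```python
def one_pole_highpass(xs, alpha):
    """Textbook one-pole high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).

    alpha (hypothetical parameter) is in (0, 1); closer to 1 means a lower cutoff.
    The filter starts from rest (previous input and output are both 0).
    """
    ys = []
    prev_x = 0.0
    prev_y = 0.0
    for x in xs:
        y = alpha * (prev_y + x - prev_x)
        ys.append(y)
        prev_x, prev_y = x, y
    return ys


# A step (the kind of discontinuity a square wave has) passes as a spike and then decays.
step_response = one_pole_highpass([1.0, 1.0, 1.0], alpha=0.5)
```

this only shows what the filter does to a step; whether it actually helps the glitchiness would still need testing on real input.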

generate synthetic training data

the current project builds on a previous one where i was collecting data for another task. but if the goal is just wave shaping, the initial training data can simply be generated synthetically from scratch ( with augmentation noise etc )
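
a minimal sketch of what synthetic generation could look like. everything specific here is an assumption: the function name `make_example`, the sample rate, the sine input, and especially the `tanh` target, which is just a placeholder for whatever wave shape is actually wanted.

```python
import math
import random


def make_example(num_samples, sample_rate, freq_hz, noise_std, rng):
    """Return (noisy_input, clean_target) lists for one synthetic training example.

    Input is a sine wave plus Gaussian augmentation noise.
    Target is a placeholder wave shape (soft clip via tanh) of the clean sine.
    """
    xs, ys = [], []
    for n in range(num_samples):
        phase = 2.0 * math.pi * freq_hz * n / sample_rate
        clean = math.sin(phase)
        xs.append(clean + rng.gauss(0.0, noise_std))  # augmentation noise on the input only
        ys.append(math.tanh(3.0 * clean))             # hypothetical target shape
    return xs, ys


rng = random.Random(0)  # seeded so a dataset can be regenerated exactly
xs, ys = make_example(num_samples=64, sample_rate=48000, freq_hz=440.0, noise_std=0.01, rng=rng)
```

varying `freq_hz` and `noise_std` per example would give the augmentation mentioned above.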

optimise implementation

po2

i'm 90% sure the way i've done the po2 multiply, especially w.r.t. using single bits in memory for is_negative etc, could be done in a much more efficient way. it feels wrong to optimise something to the level of shift operators, but still require a lookup for the value to shift by ( maybe i'm missing something re: compilation etc )
- note: an extreme value of this could be writing specific modules per multiply with the shift and ( possible negation ) logic built in...
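
as a behavioural model only ( python, not verilog ), the two options above could be sketched like this. it assumes each weight is +/- 2^-shift; the names `po2_multiply` and `make_po2_multiplier` are made up for illustration and are not from the repo.

```python
def po2_multiply(x, shift, is_negative):
    """Lookup style: shift amount and sign bit are fetched per weight at run time."""
    y = x >> shift  # Python's >> on ints keeps the sign, like an arithmetic shift
    return -y if is_negative else y


def make_po2_multiplier(shift, is_negative):
    """Specialised style: shift and negation are fixed when the multiplier is built,
    loosely mirroring a module per multiply with the logic baked in."""
    if is_negative:
        return lambda x: -(x >> shift)
    return lambda x: x >> shift


# weight = -2**-3 applied to a fixed-point value of 4096
times_minus_eighth = make_po2_multiplier(shift=3, is_negative=True)
result = times_minus_eighth(4096)
```

both forms give the same values; the difference is only whether the shift amount is data or structure.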

qb

in the qb_ network we have >50% free cycles, but not many free multiple DSPs. so how best to restructure the modules so the same module can be clocked twice with different sets of weights?

reuse inner conv

the v1.0 release code is clumsy and even though we have cycles left we can't add another conv.

but we can reuse one using tied weights

so instead of training `input -> conv0 -> output` we could train `input -> conv0 -> conv0 -> output` by just reusing the qconv layer, which will train with tied weights. then at inference time we just need 1 additional activation cache and the ability to switch in/out for the inner cache ( which i guess is done with registering? )
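
a toy single-channel sketch of the tied weights idea. this is plain python rather than the actual qconv layer, and the weights, the relu activation and the function names are all assumptions for illustration.

```python
def causal_conv1d(xs, weights, dilation=1):
    """y[t] = sum_i weights[i] * x[t - (K-1-i)*dilation], with zeros before t=0."""
    k = len(weights)
    out = []
    for t in range(len(xs)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - (k - 1 - i) * dilation
            if idx >= 0:
                acc += w * xs[idx]
        out.append(acc)
    return out


def relu(xs):
    return [max(0.0, x) for x in xs]


conv0_weights = [0.5, 0.25]  # hypothetical; one weight set shared by both passes


def forward_single(xs):
    return relu(causal_conv1d(xs, conv0_weights))


def forward_tied(xs):
    inner = forward_single(xs)  # first pass; these activations need their own cache
    return relu(causal_conv1d(inner, conv0_weights))  # second pass, same weights


single = forward_single([1.0, 0.0, 0.0])
tied = forward_tied([1.0, 0.0, 0.0])
```

the weights are stored once, but `inner` is a second set of activations, which is the extra cache mentioned above.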

skip next sample when network still running?

current v1.0 behaviour is to reset network each sample_clk
but if the network hasn't finished running, this means we just reset and never emit anything

should we set/check a network_running bit so that if sample_clk occurs we can decide to skip it?
or will this then just be the same as running at half the clock speed?
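
a rough cycle-count simulation of the two behaviours, to get a feel for the question. it is a simplified model, not the verilog: the function name and all the cycle counts are made up, and it only counts how many outputs get emitted.

```python
def simulate(total_cycles, sample_period, network_cycles, skip_when_running):
    """Count outputs emitted over total_cycles.

    sample_clk ticks every sample_period cycles; one network run takes network_cycles.
    skip_when_running=False models resetting the network on every tick;
    True models ignoring a tick while a run is still in progress.
    """
    remaining = 0  # cycles left in the current run; 0 means idle
    emitted = 0
    for cycle in range(total_cycles):
        if cycle % sample_period == 0:  # sample_clk tick
            if remaining == 0 or not skip_when_running:
                remaining = network_cycles  # start, or reset, the run
        if remaining > 0:
            remaining -= 1
            if remaining == 0:
                emitted += 1
    return emitted


always_reset = simulate(100, sample_period=10, network_cycles=15, skip_when_running=False)
skip_if_busy = simulate(100, sample_period=10, network_cycles=15, skip_when_running=True)
half_rate = simulate(100, sample_period=20, network_cycles=15, skip_when_running=False)
```

in this toy model, with a run taking between one and two sample periods, skipping emits exactly as many outputs as halving the sample rate, while always resetting emits none.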

do eurorack pmod calibration properly

couldn't work out the right way to do calibration so ended up with a hack on both the verilog and training side :/

from notes ( see 2023 10 17 )

sending +5V => 20,000    ( that's from the COUNT_PER_VOLT = 4000 ) 

...

just realised that +/-5V => +/- 20_000
could be mapped to +/-5V => +/- 5_000 with  >>> 2;
and then back with << 2 ?

this works. and 5000 is close to 4096...

so do we need to rescale on the way in?

i.e. we want 5V = 16_384    ( since 16384 >> 2 = 4096 )
so we want to scale by 0.8192    ( = 16_384 / 20_000 )

currently 16384 is 4.096V
so we want to divide everything in the data by 

in fact, even simpler....

if we map everything by >>> 2 coming in  ( and by << 2 on way out )
then we have 5V = 5000 = 0x1388 = 1.220703125

the data is based on 8V=1.0, so 5V = 0.625
so we just need to multiply all the data by 1.220703125/(5/8)=1.953125 
during training....

means the max the net will take, or output, is +/- 1.22, but this should be fine and is well within FP4.12
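
the arithmetic in these notes can be checked with a few lines of python. this assumes FP4.12 means 12 fractional bits ( so 4096 = 1.0, which matches the numbers above ); the helper name `volts_to_net_value` is made up.

```python
COUNT_PER_VOLT = 4000  # from the notes: +5V => 20_000
FRACTION_BITS = 12     # assumed reading of FP4.12, so 4096 represents 1.0


def volts_to_net_value(volts):
    """Raw count, shifted right by 2 on the way in, read as a fixed-point value."""
    raw = int(volts * COUNT_PER_VOLT)
    shifted = raw >> 2  # Python's >> keeps the sign, like >>> 2 in the notes
    return shifted / (1 << FRACTION_BITS)


net_value_at_5v = volts_to_net_value(5)    # 5000 / 4096
data_value_at_5v = 5 / 8                   # data is based on 8V = 1.0
training_scale = net_value_at_5v / data_value_at_5v
round_trip = (int(5 * COUNT_PER_VOLT) >> 2) << 2  # back out with << 2
```

all the values involved are exact in binary floating point, so the results match the figures in the notes exactly.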
