Comments (7)
The model you posted has been inspected: its buffers total only 36 KB. So the only way to lower the size of the model further is to rewrite the GRU operation using tfl.UnidirectionalGRU, or with subgraph-related ops like tfl.While and tfl.Call. Also, implementing separated_rnn_gate_calc=False for GRU may help.
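For reference, this is the per-step GRU cell the whole thread is about, in its standard PyTorch formulation (a sketch; the stacked-weight layout follows torch.nn.GRU, and the variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def gru_cell(x, h, w_ih, w_hh, b_ih, b_hh):
    # w_ih stacks W_ir, W_iz, W_in; w_hh stacks W_hr, W_hz, W_hn (PyTorch layout)
    gi = F.linear(x, w_ih, b_ih)
    gh = F.linear(h, w_hh, b_hh)
    i_r, i_z, i_n = gi.chunk(3, dim=-1)
    h_r, h_z, h_n = gh.chunk(3, dim=-1)
    rt = torch.sigmoid(i_r + h_r)
    zt = torch.sigmoid(i_z + h_z)
    nt = torch.tanh(i_n + rt * h_n)  # nt depends on rt, which has no LSTM counterpart
    return (1 - zt) * nt + zt * h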
```
- In contrast to LSTM, since nt relies on rt, it seems unlikely that GRU's weights can all be put together, because you need to split rt out of the output (maybe I'm wrong). Will it be effective? I'm worried about the payoff of optimizing this part.
- It seems that developing tfl.UnidirectionalGRU as a custom op needs another compilation, which means the new integrated op cannot be executed on other people's PCs; that is a heavy burden for me, since we only use TFLite as an intermediate format.
- With the above concerns, what is your suggestion, and what do you think is the fastest way? Looking forward to your reply.
> In contrast to LSTM, since nt relies on rt, it seems unlikely that GRU's weights can all be put together, because you need to split rt out of the output (maybe I'm wrong). Will it be effective? I'm worried about the payoff of optimizing this part.
```
# separated_rnn_gate_calc=False (fused r/z gates)
rzt_left = FC_i{r,z}(x)
rzt_right = FC_h{r,z}(h)
rzt_sum = rzt_left + rzt_right
rzt = sigmoid(rzt_sum)
rt, zt = split(rzt, 2)

# separated_rnn_gate_calc=True (one FC pair per gate)
rt_left = FC_ir(x)
rt_right = FC_hr(h)
rt_sum = rt_left + rt_right
rt = sigmoid(rt_sum)
zt_left = FC_iz(x)
zt_right = FC_hz(h)
zt_sum = zt_left + zt_right
zt = sigmoid(zt_sum)
```
So for each time step the gates go from 8 ops (4 FCs + 2 adds + 2 sigmoids) and 10 tensors down to 5 ops (2 FCs + 1 add + 1 sigmoid + 1 split) and 8 tensors.
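To make the fused path concrete, here is a small check (a sketch, not converter code; the layer names are mine) that stacking the r/z weights offline and splitting after a single sigmoid reproduces the separated computation exactly:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
hidden = 8
x, h = torch.randn(1, hidden), torch.randn(1, hidden)
fc_ir, fc_hr = torch.nn.Linear(hidden, hidden), torch.nn.Linear(hidden, hidden)
fc_iz, fc_hz = torch.nn.Linear(hidden, hidden), torch.nn.Linear(hidden, hidden)

# separated_rnn_gate_calc=True: 4 FCs, 2 adds, 2 sigmoids
rt = torch.sigmoid(fc_ir(x) + fc_hr(h))
zt = torch.sigmoid(fc_iz(x) + fc_hz(h))

# separated_rnn_gate_calc=False: stack the weights once at conversion time,
# then 2 FCs, 1 add, 1 sigmoid, 1 split at runtime
w_i = torch.cat([fc_ir.weight, fc_iz.weight])
b_i = torch.cat([fc_ir.bias, fc_iz.bias])
w_h = torch.cat([fc_hr.weight, fc_hz.weight])
b_h = torch.cat([fc_hr.bias, fc_hz.bias])
rzt = torch.sigmoid(F.linear(x, w_i, b_i) + F.linear(h, w_h, b_h))
rt_fused, zt_fused = rzt.chunk(2, dim=-1)

assert torch.allclose(rt, rt_fused) and torch.allclose(zt, zt_fused)
```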
> It seems that developing tfl.UnidirectionalGRU as a custom op needs another compilation, which means the new integrated op cannot be executed on other people's PCs; that is a heavy burden for me, since we only use TFLite as an intermediate format.
Checking the kernel registry (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/core/kernels/register.cc): unfortunately, it is still a custom op as of now (May 30, 2024).
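Concretely, that is why the custom-op route is a burden: a stock TFLite runtime cannot even allocate such a model. Something like the following (the file name is hypothetical) fails unless the matching kernel was compiled into the interpreter:

```python
import tensorflow as tf

# "gru_custom.tflite" is a hypothetical model containing the custom GRU op.
interpreter = tf.lite.Interpreter(model_path="gru_custom.tflite")

# On a stock runtime this raises a RuntimeError about an unresolved custom op,
# because the matching kernel was never linked into the interpreter binary.
interpreter.allocate_tensors()
```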
> With the above concerns, what is your suggestion, and what do you think is the fastest way? Looking forward to your reply.
I don't know; it depends on your needs. If a target model of around 80-100 KB is what you want, I guess separated_rnn_gate_calc=False should be enough. But if you want something smaller than that, then the subgraph-related approach is your only hope.
> So for each time step the gates go from 8 ops (4 FCs + 2 adds + 2 sigmoids) and 10 tensors down to 5 ops (2 FCs + 1 add + 1 sigmoid + 1 split) and 8 tensors.
Taking a glance at the previous implementation of AtenGRUOperator, it definitely has room for further optimization: from 6 FCs down to 4, or even 2 (presumably the n-gate projections can be stacked as well, since rt only multiplies the hidden-side projection after the FC has already run). And it sounds very easy for me to implement the separated_rnn_gate_calc=False option, because I have implemented it before (though it failed to pass the bidirectional test).
> I don't know; it depends on your needs. If a target model of around 80-100 KB is what you want, I guess separated_rnn_gate_calc=False should be enough. But if you want something smaller than that, then the subgraph-related approach is your only hope.
I'm quite interested in translating such challenging builtin operators. Given my current abilities, may I ask what the procedure would be to support tfl.While? I have no answer right now due to my limited understanding.
Well, it is doable and not hard at all if only GRU is involved, but a better design takes some time.
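For orientation, this is roughly the structure a tfl.While lowering has to produce. TFLite's WHILE builtin references two extra subgraphs (condition and body); as I understand it, the converter writes the flatbuffer directly, so the real task is emitting a WHILE operator plus those two subgraphs. The tf.while_loop form below just illustrates the shape (a sketch; gru_step is a hypothetical single-timestep cell, not existing converter API):

```python
import tensorflow as tf

def gru_as_while(x, h0, seq_len, gru_step):
    # cond becomes the condition subgraph of the WHILE op
    def cond(t, h):
        return t < seq_len

    # body becomes the body subgraph: one GRU step per iteration
    def body(t, h):
        return t + 1, gru_step(x[t], h)

    _, h_final = tf.while_loop(cond, body, [tf.constant(0), h0])
    return h_final
```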
Is one week enough? I'm available 24/7 as long as it can be done incrementally, lol. You can focus on your own business first, and any instruction is helpful whenever you are free.
P.S. Not only the float structure but also quantized GRU with while can be supported.
> Is one week enough? I'm available 24/7 as long as it can be done incrementally, lol. You can focus on your own business first, and any instruction is helpful whenever you are free.
Well, I cannot guarantee that. But I can do QA and guide you throughout the process.
> P.S. Not only the float structure but also quantized GRU with while can be supported.
That is mostly copy-paste, I think. The main difficulty is adding a new subgraph for the GRU operation, so it shouldn't be much of a bother.