ray075hl / attention-ocr-toy-example
License: Apache License 2.0
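The code uses TrainingHelper during training, so the decoder's inputs are the targets. This causes a problem: at test time there is no target, so the training and test conditions differ a lot. In practice I found that accuracy is very high during training but very low at test time. So, during training, can the decoder's own outputs be fed back as its inputs instead of the targets?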
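Feeding the model's own predictions back in during training is usually done gradually, an approach known as scheduled sampling: at each step the decoder gets either the ground-truth previous token or its own previous prediction, with a sampling probability that can be raised over training. TF 1.x contrib has helpers in this spirit (tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper for mixing, GreedyEmbeddingHelper for pure feedback). The framework-free sketch below only illustrates the input-selection rule; `GO_ID`, `build_decoder_inputs`, and the `predict` callback are hypothetical stand-ins, not code from this repo.

```python
import random
from typing import Callable, List, Sequence

GO_ID = 0  # hypothetical start-of-sequence id


def build_decoder_inputs(
    targets: Sequence[int],
    predict: Callable[[int, int], int],
    sampling_prob: float,
    rng: random.Random,
) -> List[int]:
    """Build the decoder input sequence for one training example.

    Step 0 gets GO_ID. Each later step t gets either the ground-truth token
    targets[t-1] (teacher forcing, as with TrainingHelper) or, with
    probability sampling_prob, the token the model predicted at step t-1
    (what happens at test time).
    """
    inputs = [GO_ID]
    for t in range(1, len(targets)):
        # `predict` stands in for one decoder step: (input token, step) -> argmax token
        prev_pred = predict(inputs[t - 1], t - 1)
        if rng.random() < sampling_prob:
            inputs.append(prev_pred)       # feed back the model's own output
        else:
            inputs.append(targets[t - 1])  # feed the ground truth
    return inputs


always_nine = lambda token, step: 9  # dummy "model" that always predicts 9
print(build_decoder_inputs([5, 6, 7], always_nine, 0.0, random.Random(0)))  # [0, 5, 6]
print(build_decoder_inputs([5, 6, 7], always_nine, 1.0, random.Random(0)))  # [0, 9, 9]
```

With `sampling_prob=0.0` this reduces to plain teacher forcing and with `1.0` to greedy feedback; starting near 0 and increasing it during training avoids feeding mostly wrong tokens while the model is still untrained.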
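When generating the attention data, I don't quite understand why train_output and target_output are defined this way. Also, why are Y and YY both initialized to -2 and then both incremented by 3 at the end? Shouldn't an image's label simply be the digits shown in the image? What are Y and YY for? I'm new to attention, so this isn't very clear to me.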
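The repo's exact scheme isn't reproduced here, but a common pattern for attention decoders is to build two sequences from each label: a decoder input that starts with a GO token, and a target shifted one step left that ends with an EOS token. Digit values are then offset so the lowest ids stay free for those special tokens. One plausible (unconfirmed) reading of the +3 is such an offset, mapping digits 0 to 9 onto ids 3 to 12; note that any position still holding -2 becomes -2 + 3 = 1 after the shift. The ids `GO_ID`, `EOS_ID`, `PAD_ID` and the function `encode_label` below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical reserved ids; the repo's actual values may differ.
GO_ID, EOS_ID, PAD_ID = 0, 1, 2
OFFSET = 3  # digits 0-9 become ids 3-12, so they never collide with the reserved ids


def encode_label(digits: str, max_len: int) -> Tuple[List[int], List[int]]:
    """Return (decoder_input, decoder_target) for one image label.

    decoder_input  = GO + digits   (what the decoder is fed)
    decoder_target = digits + EOS  (what it must predict, one step ahead)
    Both are padded with PAD_ID to length max_len + 1.
    """
    ids = [int(ch) + OFFSET for ch in digits]
    dec_in = [GO_ID] + ids
    dec_out = ids + [EOS_ID]
    total = max_len + 1
    dec_in += [PAD_ID] * (total - len(dec_in))
    dec_out += [PAD_ID] * (total - len(dec_out))
    return dec_in, dec_out


print(encode_label("42", max_len=4))  # ([0, 7, 5, 2, 2], [7, 5, 1, 2, 2])
```

The label digits are still "what is on the image"; the offset and the extra GO/EOS positions only exist so the decoder knows where a sequence starts and can learn when to stop.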
Hi, @ray075hl
Have you compared the CTC and attention outputs of the joint CTC-attention model? In the related study, CTC was found to play an auxiliary role.
But in my actual results, the CTC outputs were better than the attention outputs. Do you have any thoughts on this?
P.S.
Related paper: Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning
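For reference, the multi-task objective in that paper interpolates the log-likelihoods of the two branches with a tunable weight λ; written as a loss to minimize:

```latex
\mathcal{L}_{\mathrm{MTL}} = \lambda\,\mathcal{L}_{\mathrm{CTC}} + (1-\lambda)\,\mathcal{L}_{\mathrm{att}},
\qquad
\mathcal{L}_{\mathrm{CTC}} = -\log p_{\mathrm{ctc}}(y \mid x),
\quad
\mathcal{L}_{\mathrm{att}} = -\log p_{\mathrm{att}}(y \mid x),
\quad 0 \le \lambda \le 1
```

λ = 1 trains the CTC branch alone and λ = 0 the attention branch alone, so the value of λ used in training is worth checking before comparing the two branches' outputs.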
Do you mean the length of the input image or the length of the label?
The attention mechanism should use the encoder's hidden states, but in the code enc_state is not returned, and the decoder is not initialized with enc_state either. Is this a problem?
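An encoder RNN produces two different things: an output at every time step and a final state (in TF 1.x, tf.nn.dynamic_rnn returns them as an `(outputs, state)` pair). Attention reads the per-step outputs, passed in as `memory`, so the final state is not needed for attention itself; it is only an optional way to initialize the decoder. The toy, TensorFlow-free sketch below shows both; the cell and every name in it are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def rnn_step(h: Vector, x: Vector, w_h: float = 0.5, w_x: float = 1.0) -> Vector:
    """Toy elementwise RNN cell: h' = tanh(w_h * h + w_x * x)."""
    return [math.tanh(w_h * hi + w_x * xi) for hi, xi in zip(h, x)]


def encode(xs: Sequence[Vector]) -> Tuple[List[Vector], Vector]:
    """Run the toy encoder; return (per-step outputs, final state)."""
    h = [0.0] * len(xs[0])
    outputs: List[Vector] = []
    for x in xs:
        h = rnn_step(h, x)
        outputs.append(h)  # one vector per time step -> attention memory
    return outputs, h      # h after the last step -> optional decoder init


def decoder_initial_state(final_state: Vector, use_encoder_state: bool) -> Vector:
    """Zeros (like zero_state) or a copy of the encoder's final state."""
    return list(final_state) if use_encoder_state else [0.0] * len(final_state)


memory, enc_final = encode([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(len(memory))                                 # 3 memory vectors, one per step
print(decoder_initial_state(enc_final, False))     # [0.0, 0.0]
```

For a plain RNN cell the final state equals the last output; for an LSTM the state also carries the cell memory `c`, which is why it is returned separately.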
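Hello, why is feature_length in CTC_model fixed at 29? When I replace it with the actual length of the label sequence, I get an error instead. Also, should applying np.argmax(project_out, axis = 1) to CTC_model's project_out give the prediction? During training the loss drops quickly, but when I decode this way, only two or three characters appear to be in the correct order.
Looking forward to your explanation, thank you very much!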
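Two points that may help, hedged because the model code isn't shown here. First, in TF 1.x `tf.nn.ctc_loss` the `sequence_length` argument is the length of the logits (the number of encoder time steps), not the label length, so a fixed 29 is probably the width of the feature sequence the CNN produces; passing label lengths instead would mismatch the logits, which may explain the error. Second, a per-frame argmax is only the first step of CTC decoding: repeated symbols must then be merged and the blank removed, and the argmax must run over the class axis (the last axis if the logits are `[max_time, batch, num_classes]`). TF 1.x provides `tf.nn.ctc_greedy_decoder` for this; the pure-Python sketch below shows the same best-path rule for a single sequence. The function name is hypothetical, and the blank index is a parameter because `tf.nn.ctc_loss` uses `num_classes - 1` as the blank.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int) -> List[int]:
    """Best-path CTC decoding for one sequence.

    frame_scores has shape [time, num_classes] (logits or probabilities).
      1. take the argmax class at every time step,
      2. merge consecutive repeats,
      3. drop the blank symbol.
    """
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded: List[int] = []
    prev: Optional[int] = None
    for k in best_path:
        if k != prev and k != blank:  # a blank between two equal symbols keeps both
            decoded.append(k)
        prev = k
    return decoded


scores = [
    [0.1, 0.8, 0.05, 0.05],  # argmax 1
    [0.1, 0.8, 0.05, 0.05],  # argmax 1 (repeat, merged)
    [0.1, 0.1, 0.1, 0.7],    # argmax 3 = blank (dropped)
    [0.2, 0.1, 0.6, 0.1],    # argmax 2
    [0.1, 0.7, 0.1, 0.1],    # argmax 1
]
print(ctc_greedy_decode(scores, blank=3))  # [1, 2, 1]
```

Reading the raw argmax path `[1, 1, 3, 2, 1]` directly would show duplicated characters and blank ids, which can make a well-trained model look mostly wrong.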
What does memory mean in tf.contrib.seq2seq.LuongAttention(num_units=cfg.RNN_UNITS, memory=memory)? From the code, the encoder's output is used as memory. Also, in the following call I changed the initial state from zeros to the encoder's state (shown here after the change):
tf.contrib.seq2seq.BasicDecoder(
    cell=attn_cell, helper=helper,
    initial_state=attn_cell.zero_state(dtype=tf.float32, batch_size=cfg.BATCH_SIZE).clone(cell_state=enc_state[0]),
    output_layer=output_layer)
and the results improved a lot. Why is that?
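In tf.contrib.seq2seq, `memory` is the sequence the attention mechanism attends over, here the encoder outputs, one vector per time step. At each decoder step the current decoder state (the query) is scored against every memory vector, the scores are normalized with a softmax, and the weighted sum of the memory becomes the context vector. The sketch below uses Luong's plain dot score and is an illustration only, not the TensorFlow implementation; `tf.contrib.seq2seq.LuongAttention` additionally passes the memory through a learned dense layer of size `num_units` before the dot product. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention(
    query: Sequence[float], memory: Sequence[Sequence[float]]
) -> Tuple[Vector, Vector]:
    """Luong 'dot' attention for one decoder step.

    query:  decoder hidden state h_t, shape [units]
    memory: encoder outputs h_s for every time step, shape [time, units]
    Returns (alignment weights over time, context vector of shape [units]).
    """
    scores = [sum(q * k for q, k in zip(query, h_s)) for h_s in memory]
    weights = softmax(scores)
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, memory)) for i in range(len(query))
    ]
    return weights, context


memory = [[1.0, 0.0], [0.0, 1.0]]  # two encoder time steps
weights, context = luong_dot_attention([10.0, 0.0], memory)
print(weights)  # almost all weight on the first time step
```

As for the initial state: with `zero_state` the decoder begins step 0 with no summary of the image, whereas `.clone(cell_state=enc_state[0])` hands it the encoder's final state; that extra starting information is a plausible, though unverified, reason for the improvement.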
Hi,
..
File "../model.py", line 228, in _att_decode
att_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode( decoder=decoder, output_time_major=False, impute_finished=True, maximum_iterations=self.params.attention_iteration)
..
ValueError: Shape must be rank 1 but is rank 2 for '..../while/BasicDecoderStep/decoder/attention_wrapper/concat' (op: 'ConcatV2') with input shapes: [64], [64,256], [].
Do you know what causes this problem?
Thanks.
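A possible cause, hedged since the calling code isn't shown: by default the AttentionWrapper builds each cell input by concatenating the step input with the previous attention vector along the last axis, and that only works if both are rank 2. The shapes `[64]` and `[64, 256]` suggest the step input is a batch of 64 scalars, for example raw token ids that never went through an embedding lookup, rather than `[64, embed_dim]` vectors. The framework-free sketch below, with hypothetical names and toy sizes, shows the shape requirement: embed the ids first, then concatenate.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def embed(ids: Sequence[int], table: Sequence[Sequence[float]]) -> Matrix:
    """Embedding lookup: [batch] token ids -> [batch, embed_dim] vectors."""
    return [list(table[i]) for i in ids]


def concat_last_axis(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Concatenate two rank-2 [batch, d] inputs along the last axis."""
    if len(a) != len(b):
        raise ValueError("batch sizes differ")
    return [list(x) + list(y) for x, y in zip(a, b)]


# Toy sizes: batch 2, embed_dim 3, attention depth 4 (the error had batch 64, depth 256).
table = [[0.1 * (r + c) for c in range(3)] for r in range(5)]  # 5-token vocabulary
ids = [1, 4]                                                    # rank 1: shape [2]
attention = [[0.0] * 4, [1.0] * 4]                              # rank 2: shape [2, 4]

# concat_last_axis(ids, attention) would fail: each id is a scalar, not a row.
cell_input = concat_last_axis(embed(ids, table), attention)     # shape [2, 3 + 4]
print(len(cell_input), len(cell_input[0]))  # 2 7
```

In TF 1.x the equivalent fix is to make sure the helper feeds embedded inputs (for example via tf.nn.embedding_lookup, or a helper that takes an embedding) rather than bare ids.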
Hi, I tested the code you provided. Accuracy is fairly high on 3-digit images, but much worse on 4-digit and 5-digit ones.
Is there a good way to solve this?