sml2h3 / dddd_trainer

Training tool for ddddocr

License: Apache License 2.0

Python 100.00%

dddd_trainer's Issues

message print error in cache_data.py

dddd_trainer/utils/cache_data.py

Line 106 in 59e236d

logger.error("val setting vaild!")

Should this be "val setting invalid"?

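Multi-channel image preprocessing bug

Steps to reproduce:
1. Set the training config file to 3 channels (training on color images).
2. After training finishes, run the model with the ddddocr project.

The program first fails with the same error as:
#2

Looking at the source, the error is at
ddddocr/__init__.py:1629
I changed that line to force the dtype:
ort_inputs = {'input1': np.array([image], dtype=np.float32)}

Running the program again raises another error:

INVALID_ARGUMENT : Invalid rank for input: input1 Got: 5 Expected: 4 Please fix either the inputs or the model.

I added a line here to print the shape of input1:

print(ort_inputs['input1'].shape) # prints: (1, 1, 160, 649, 3)

My guess is that the model input is defined for single-channel images, hence (1, 1, 160, 649). But I trained on color images, and the preprocessing for those should differ from the single-channel case.

However, I don't know how the author designed the preprocessing for color images, so I don't know how to fix this part.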
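
The rank mismatch reported here (a 5-D array where the model expects 4-D) can be illustrated with a small NumPy sketch. It assumes the exported model expects a channels-first batch of shape (1, C, H, W); that layout is an assumption, not something confirmed by the ddddocr source, and `to_nchw` is a hypothetical helper name.

```python
import numpy as np


def to_nchw(image_hwc):
    """Turn an (H, W) or (H, W, C) image into a float32 (1, C, H, W) batch."""
    arr = np.asarray(image_hwc, dtype=np.float32)
    if arr.ndim == 2:
        # Grayscale image: add a trailing channel axis first.
        arr = arr[:, :, np.newaxis]
    # Move channels to the front, then add the batch axis.
    chw = np.transpose(arr, (2, 0, 1))
    return chw[np.newaxis, ...]


# A dummy color image with the height and width from the report.
dummy = np.zeros((160, 649, 3), dtype=np.uint8)
batch = to_nchw(dummy)
print(batch.shape, batch.dtype)
```

Wrapping an (H, W, 3) image in extra list or batch axes without moving the channel axis is what yields a rank-5 array like (1, 1, 160, 649, 3).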

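Suggestion: validate the project name when creating a project

When running python app.py create {project_name}, please check whether project_name contains an underscore. If it does, raise an error and refuse to create the project.

Reason:
https://github.com/sml2h3/dddd_trainer/blob/main/utils/train.py#L58

When checkpoints are loaded here, the file name is split on underscores. If the project name itself contains an underscore, loading fails. That is what happened to me: after training a model for several days, the checkpoint could not be loaded when I resumed training.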
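
A minimal sketch of the suggested check, plus a parse that tolerates underscores in the project name. The file-name pattern `<project>_<epoch>_<step>` is assumed for illustration; the real pattern in utils/train.py may differ. Both function names are hypothetical.

```python
def validate_project_name(name):
    """Reject names that would confuse underscore-based parsing."""
    if "_" in name:
        raise ValueError(f"project name {name!r} must not contain an underscore")
    return name


def parse_checkpoint_stem(stem):
    """Split '<project>_<epoch>_<step>' from the right (assumed pattern),
    so underscores inside the project name stay intact."""
    project, epoch, step = stem.rsplit("_", 2)
    return project, int(epoch), int(step)


print(parse_checkpoint_stem("my_test_28_19000"))
```

Either measure alone would avoid the failure: rejecting the name up front is simpler, while splitting from the right keeps existing projects loadable.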

Cannot resume automatically after training was interrupted by a full disk

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

2023-07-26 09:59:13.215 | INFO     | __main__:__init__:12 - 
Hello baby~
2023-07-26 09:59:13.216 | INFO     | __main__:train:26 - 
Start Train ----> images98

2023-07-26 09:59:13.221 | INFO     | utils.train:__init__:41 - 
Taget:
min_Accuracy: 0.97
min_Epoch: 20
max_Loss: 0.05
2023-07-26 09:59:13.221 | INFO     | utils.train:__init__:45 - 
USE GPU ----> 0
2023-07-26 09:59:13.221 | INFO     | utils.train:__init__:52 - 
Search for history checkpoints...
Traceback (most recent call last):
  File "/www/wwwroot/dddd_trainer/app.py", line 33, in <module>
    fire.Fire(App)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 480, in _Fire
    target=component.__name__)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/www/wwwroot/dddd_trainer/app.py", line 27, in train
    trainer = train.Train(project_name)
  File "/www/wwwroot/dddd_trainer/utils/train.py", line 63, in __init__
    os.path.join(self.checkpoints_path, newer_checkpoint), self.device)
  File "/www/wwwroot/dddd_trainer/nets/__init__.py", line 223, in load_checkpoint
    param = torch.load(path, map_location=device)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 600, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
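
The "failed finding central directory" message indicates that the newest checkpoint file is a truncated zip archive. One possible workaround, sketched below, is to skip unreadable files and fall back to an older checkpoint. This assumes checkpoints are written in the zip-based torch.save format (consistent with the _open_zipfile_reader frame in the traceback); `newest_readable_checkpoint` is a hypothetical helper, not part of dddd_trainer.

```python
import os
import zipfile


def newest_readable_checkpoint(directory):
    """Return the most recently modified file that still looks like a zip
    archive, or None if there is none."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    paths = [p for p in paths if os.path.isfile(p)]
    for path in sorted(paths, key=os.path.getmtime, reverse=True):
        # is_zipfile only checks that the end-of-archive record is present;
        # a file cut off mid-write lacks it and is skipped.
        if zipfile.is_zipfile(path):
            return path
    return None
```

The returned path could then be handed to the existing checkpoint loader in place of the newest file name.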

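Can the training dataset be open-sourced?

First of all, thank you very much for such an excellent open-source library. When I train on my own, though, the results are not as good as what the library's bundled training set achieves, so I'd like to ask whether the training set can be open-sourced. For the cases it recognizes poorly, we could then add our own samples and train further.

What does this error mean?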

2023-07-26 04:37:11.717 | INFO     | utils.train:start:110 - [2023-07-26-04_37_11]	Epoch: 35952	Step: 1725700	LastLoss: 0.00010419684986118227	AvgLoss: 0.00012866455923358445	Lr: 2.7344957135266526e-10
2023-07-26 04:37:14.401 | INFO     | utils.train:start:110 - [2023-07-26-04_37_14]	Epoch: 35954	Step: 1725800	LastLoss: 0.00011213675315957516	AvgLoss: 0.00012821589938539547	Lr: 2.7344957135266526e-10
2023-07-26 04:37:17.084 | INFO     | utils.train:start:110 - [2023-07-26-04_37_17]	Epoch: 35956	Step: 1725900	LastLoss: 0.00011715076107066125	AvgLoss: 0.00012866744575148914	Lr: 2.7344957135266526e-10
Traceback (most recent call last):
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 499, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/www/wwwroot/dddd_trainer/app.py", line 33, in <module>
    fire.Fire(App)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 480, in _Fire
    target=component.__name__)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/www/wwwroot/dddd_trainer/app.py", line 28, in train
    trainer.start()
  File "/www/wwwroot/dddd_trainer/utils/train.py", line 120, in start
    "epoch": self.epoch, "step": self.step, "lr": lr})
  File "/www/wwwroot/dddd_trainer/nets/__init__.py", line 188, in save_model
    torch.save(net, path)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 380, in save
    return
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 259, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:300] . unexpected pos 17094848 vs 17094736
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at inline_container.cc:300] . unexpected pos 17094848 vs 17094736
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7fe49ddc3ae7 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2797840 (0x7fe4e3e5c840 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x2792e1c (0x7fe4e3e57e1c in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #3: caffe2::serialize::PyTorchStreamWriter::writeRecord(std::string const&, void const*, unsigned long, bool) + 0xb5 (0x7fe4e3e5fa85 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #4: caffe2::serialize::PyTorchStreamWriter::writeEndOfFile() + 0x173 (0x7fe4e3e5fd73 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #5: caffe2::serialize::PyTorchStreamWriter::~PyTorchStreamWriter() + 0x125 (0x7fe4e3e5ffe5 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0xb43ee3 (0x7fe56504eee3 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x2a73f8 (0x7fe5647b23f8 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x2a86fe (0x7fe5647b36fe in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x480622]
frame #10: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x434697]
frame #11: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x4346a7]
frame #12: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x4346a7]
frame #13: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x4346a7]
frame #14: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x4346a7]
frame #15: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x4346a7]
frame #16: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x4346a7]
frame #17: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x4346a7]
frame #18: PyDict_SetItemString + 0x3b7 (0x4a2647 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3)
frame #19: PyImport_Cleanup + 0x71 (0x565771 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3)
frame #20: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x421d98]
frame #21: Py_Main + 0x640 (0x43b7d0 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3)
frame #22: main + 0x162 (0x41d982 in /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3)
frame #23: __libc_start_main + 0xf5 (0x7fe57390c555 in /lib64/libc.so.6)
frame #24: /www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/bin/python3() [0x41da40]

Aborted
[root@RTX3090 dddd_trainer]#
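
The OSError (errno 28) shows the disk filled up in the middle of torch.save, which leaves a partly written file behind. A common way to keep the previous checkpoint usable is to write to a temporary file and rename it only on success. The sketch below uses a generic `write_fn` callback in place of torch.save; `atomic_save` is a hypothetical name.

```python
import os
import tempfile


def atomic_save(path, write_fn):
    """Call write_fn(tmp_path), then move the result over `path`.
    If writing fails, the existing file at `path` is left untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        # os.replace swaps the file in a single step on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With torch installed, a call might look like `atomic_save(path, lambda p: torch.save(state, p))`, where `state` is whatever the trainer normally saves.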

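Can a 2060 GPU handle this?

I'm a complete beginner and wanted to play around with this. I've been at it for 3 days and still can't get the environment working. If some expert is in a good mood, could you give me some guidance? I'd be very grateful. Ideally a checklist covering the Python version, CUDA version, and cuDNN version. My machine is an HP OMEN (暗影精灵) with a 3050 GPU, 8 GB.

StopIteration problem

As soon as I add one particular image, StopIteration is raised, but I have no idea what is wrong with that image; they were all collected the same way.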

Traceback (most recent call last):
  File "E:\Daz3D Workshop\Enhance_Queue\_dddd_trainer\utils\train.py", line 124, in start
    test_inputs, test_labels, test_labels_length = next(val_iter)
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 560, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Daz3D Workshop\Enhance_Queue\_dddd_trainer\app.py", line 33, in <module>
    fire.Fire(App)
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "E:\Daz3D Workshop\Enhance_Queue\_dddd_trainer\app.py", line 28, in train
    trainer.start()
  File "E:\Daz3D Workshop\Enhance_Queue\_dddd_trainer\utils\train.py", line 128, in start
    test_inputs, test_labels, test_labels_length = next(val_iter)
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 560, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "C:\Users\T1me\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
StopIteration
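
In the traceback, next(val_iter) raises StopIteration at line 124 and the retry at line 128 raises it again, which is what happens when a freshly created iterator yields nothing, for example when the validation loader is empty. Why this particular image triggers it is not clear from the report. The sketch below, with a hypothetical `cycle_batches` helper, restarts a loader when it is exhausted and fails with an explicit message if it yields no batches at all.

```python
from itertools import islice


def cycle_batches(loader):
    """Yield batches forever, restarting the loader after each pass."""
    while True:
        produced = False
        for batch in loader:
            produced = True
            yield batch
        if not produced:
            # An empty loader would otherwise loop forever or surface
            # as a bare StopIteration far from the cause.
            raise ValueError("loader produced no batches; check the validation set size")


# Any re-iterable works for the demonstration; a list stands in for a DataLoader.
print(list(islice(cycle_batches([1, 2]), 5)))  # [1, 2, 1, 2, 1]
```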

Data type error when using a model trained for ddddocr!

The trained model raises a data type error at recognition time.
The code is as follows:
`import ddddocr

ocr = ddddocr.DdddOcr()

ocr = ddddocr.DdddOcr(det=False, ocr=False, import_onnx_path="testocr_1.0_28_19000_2022-03-21-20-44-59.onnx", charsets_path="charsets.json")

with open("0CB7_1644684875.png", 'rb') as f:
    image = f.read()

res = ocr.classification(image)
print(res)`

The error output is as follows:

2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(double)) , expected: (tensor(float))

Traceback (most recent call last):
  File "D:\python\ddddocr\1-验证码识别.py", line 10, in <module>
    res = ocr.classification(image)
  File "C:\Program Files\Python39\lib\site-packages\ddddocr\__init__.py", line 1629, in classification
    ort_outs = self.__ort_session.run(None, ort_inputs)
  File "C:\Users\GCB\AppData\Roaming\Python\Python39\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 195, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(double)) , expected: (tensor(float))
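
The "Actual: (tensor(double))" part means the array passed to ONNX Runtime was float64. NumPy produces float64 whenever integer pixel data is divided by a Python float, so an explicit cast is needed before inference. A minimal sketch, with `normalize` as a hypothetical helper and the 0..1 scaling as an assumption about the preprocessing:

```python
import numpy as np


def normalize(pixels):
    """Scale 8-bit pixel values to 0..1 and return float32."""
    scaled = np.asarray(pixels) / 255.0  # this division yields float64
    return scaled.astype(np.float32)     # ONNX "tensor(float)" is float32


pixels = np.array([[0, 128, 255]], dtype=np.uint8)
print((pixels / 255.0).dtype, normalize(pixels).dtype)
```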

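Does training support multiple GPUs?

I saw GPU_ID in the config file and tried setting it to 0,1,2,3,4, but that doesn't seem to work. Is my configuration wrong, or is multi-GPU parallel training simply not supported?

The export_onnx function in nets/__init__.py has an extra parameter

As the two screenshots above show, the extra "_retain_param_name=False" argument causes an error when the model is exported after training. After removing it, the export succeeded.

How to do incremental training

First of all, dddd is the best.
Second, once a model has been trained, how can I add new data and continue training from it?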

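Why is my acc always 0? I haven't changed the code

I'm using images like the one above. After labeling them I trained for a long time, but the accuracy stays at 0. avgloss and the other metrics have gone down, but acc never moves. I looked at the code in tester: correct_list is always empty. Can anyone tell me why?

After I changed ImageChannel, Acc went up immediately.

May I ask which config settings you changed? I'm fairly new to this.

Like you, I changed the ImageChannel parameter, but the accuracy is still 0. Could it be a problem with my images? The example image shown on the project homepage had no processing applied at all, so I also fed mine in unprocessed. Should I apply denoising or similar processing first?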

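Training on color images: acc still 0.0 after 24 hours

Sample image data (the images have a height of 150; the dataset has 900 items in total):

Config file (the images have a height of 150, so the height is set to 144 and the width to automatic):

PyTorch version: 1.10.2, CUDA version: 10.2, GPU: 2060ti
Training on the test dataset succeeded in 24 minutes; with my own data, acc is still 0.0 after 24 hours.
Screenshot of the run (24 hours, loss keeps decreasing, but acc stays at 0.0; color images):

After multi-channel training, calling ddddocr raises an error. How can I fix it?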

INVALID_ARGUMENT : Invalid rank for input: input1 Got: 5 Expected: 4 Please fix either the inputs or the model

Error after training on the test dataset for a while

After training the model for a while, this appears:
UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.

The packages were installed following the tutorial, and the dataset used is the second test dataset.

The config file is as follows:

Model:
  CharSet: [' ', S, '4', F, X, '9', E, Q, V, U, '1', J, R, '5', '7', Z, H, G, P,
    A, '2', '6', '8', Y, B, I, L, W, K, T, D, C, '3']
  ImageChannel: 1
  ImageHeight: 32
  ImageWidth: -1
  Word: false
System:
  Allow_Ext: [jpg, jpeg, png, bmp]
  GPU: true
  GPU_ID: 0
  Path: images
  Project: my_test
  Val: 0.03
Train:
  BATCH_SIZE: 16
  CNN: {NAME: ddddocr}
  DROPOUT: 0.3
  LR: 0.01
  OPTIMIZER: SGD
  SAVE_CHECKPOINTS_STEP: 2000
  TARGET: {Accuracy: 0.97, Cost: 0.05, Epoch: 20}
  TEST_BATCH_SIZE: 16
  TEST_STEP: 1000

Export fails with: TypeError: export() got an unexpected keyword argument '_retain_param_name'

Training runs normally, but the export fails.

Environment:
GPU: Tesla P100
Ubuntu 22.04.1
Python 3.10.6
torch 2.0.0+cu118
torchaudio 2.0.1+cu118
torchvision 0.15.1+cu118
NVIDIA-SMI 525.85.05 Driver Version: 525.85.05 CUDA Version: 12.0

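During training, checkpoints are saved at intervals, but no ONNX file is generated

A checkpoint archive is generated every 2000 steps,
but the trained ONNX file is not generated. Please advise how that file gets produced.

Are there any reference training times?

Setup 1
OS: MacOS Ventura 13.1
Processor: 2GHz Quad-Core Intel Core i5
Memory: 16 GB 3733MHz LPDDR4X
Dataset: 1700+ images
Trained on CPU for 24x3 hours so far; Acc keeps hovering between 0.2 and 0.3

Setup 2
OS: ubuntu 22.04.2 LTS 64-bit
Processor: 12th Gen Intel(R) Core i5-12400 x 12
Graphic: Nvidia RTX 3060ti 8g
Ram: 32g
CUDA: 12.0

Dataset: 1700+ images
Trained for 11 hours so far; Acc keeps hovering between 0.4 and 0.6

Could anyone share their training times?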

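Escaping problem on Windows

On Windows, when caching from labels.txt, line_list = file.split('\t') in cache_data.py seems to need changing to line_list = file.split(r'\t'); otherwise the separator is not recognized correctly.
By the way, why use \t, such an ambiguous separator, in the first place?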
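
The difference between the two spellings: '\t' is a single tab character, while r'\t' is the two characters backslash and t. Which one matches depends on what labels.txt actually contains. A tolerant parser might accept both, as in this sketch (the `split_label_line` helper and the `name<separator>label` layout are assumptions):

```python
def split_label_line(line):
    """Split one labels.txt line into (file name, label).
    Accepts a real tab or the literal two-character sequence backslash + t."""
    line = line.rstrip("\r\n")
    for sep in ("\t", r"\t"):
        if sep in line:
            name, label = line.split(sep, 1)
            return name, label
    raise ValueError(f"no tab separator found in line: {line!r}")


print(len("\t"), len(r"\t"))  # 1 2
print(split_label_line("0CB7.png\tabcd\n"))
print(split_label_line(r"0CB7.png\tabcd"))
```

If a file only parses with r'\t', it was most likely written with the escape sequence typed out literally rather than with a real tab.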

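How to train on CPU

I changed GPU to false in the config file.
After running python app.py train test, CPU usage did not change noticeably.

Clear hollow (outlined) digits are not recognized

As shown in the image below, hollow text is not recognized. How can I train a model myself to recognize it?

How to prepare training material

These are the raw samples. Two questions: first, how do I prepare the answers for them, can it only be done by hand? Second, this is a click-to-select captcha; how should it be trained, is it fine to train everything together?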

Val Data Number is 0

When I run the cache command, cache.train.tmp is written correctly, but cache.val.tmp is empty.
When I then run the train command, reading cache.val.tmp logs "Read Cache File End! Caches Num is 0."
This finally results in the error: ValueError: num_samples should be a positive integer value, but got num_samples=0
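
An empty cache.val.tmp is what one would expect if the validation count is derived by truncating total * Val and the dataset is small. That formula is an assumption here, not confirmed from cache_data.py. Under that assumption, a guard such as the hypothetical `val_count` below keeps at least one validation sample:

```python
def val_count(total, ratio):
    """Number of validation samples: at least 1, and always fewer than total."""
    if total < 2:
        raise ValueError("need at least 2 samples to split off a validation set")
    return min(total - 1, max(1, int(total * ratio)))


# Compare plain truncation with the guarded count for Val: 0.03.
for total in (20, 122):
    print(total, int(total * 0.03), val_count(total, 0.03))
```

With a ratio of 0.03, plain truncation gives zero validation samples for any dataset of 33 images or fewer.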

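CPU training produces only checkpoints and no model. Why?

Windows 11, default config, and a dataset of only 550 very small images. It has been running for over 100 hours and is still going.

acc is always 0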

    def tester(self, inputs, labels, labels_length):
        predict = self.get_features(inputs)
        pred_decode_labels = []
        labels_list = []
        correct_list = []
        error_list = []
        i = 0
        labels = [int(x) for x in labels.tolist()]
        # labels = labels.tolist()

Here the labels are all floats, so the later comparison

            if label_res == pred_res:
                correct_list.append(ids)
            else:
                error_list.append(ids)

is almost always false.
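
One way float labels can make every comparison fail is when labels and predictions are compared in a textual form, where 3.0 and 3 differ. Whether tester compares them this way is not visible in the excerpt, so the sketch below only illustrates the effect and the int conversion used in the fix; `as_text` is a hypothetical helper.

```python
def as_text(indices):
    """Join class indices into a comparable string."""
    return ",".join(str(i) for i in indices)


float_labels = [3.0, 17.0, 5.0]  # e.g. what tolist() returns for a float tensor
predicted = [3, 17, 5]

# "3.0,17.0,5.0" versus "3,17,5": never equal.
print(as_text(float_labels) == as_text(predicted))

# Converting to int first makes the two forms line up.
int_labels = [int(x) for x in float_labels]
print(as_text(int_labels) == as_text(predicted))
```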

Random hash value

Does "random hash value" just mean any arbitrary value? I don't understand what this random hash value means.

export() got an unexpected keyword argument '_retain_param_name'

Traceback (most recent call last):
  File "D:\dddd_trainer-main\app.py", line 33, in <module>
    fire.Fire(App)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "D:\dddd_trainer-main\app.py", line 28, in train
    trainer.start()
  File "D:\dddd_trainer-main\utils\train.py", line 152, in start
    self.net.export_onnx(self.net, dummy_input,
  File "D:\dddd_trainer-main\nets\__init__.py", line 216, in export_onnx
    torch.onnx.export(net, dummy_input, graph_path, export_params=True, verbose=False,
TypeError: export() got an unexpected keyword argument '_retain_param_name'

What is causing this error?
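
The TypeError appears when the installed torch no longer accepts _retain_param_name in torch.onnx.export. Removing the argument, as reported in another issue on this page, fixes it. A more general sketch is to drop keyword arguments that the target function's signature does not list. `supported_kwargs` is a hypothetical helper, and `fake_export` is a stand-in so the example runs without torch.

```python
import inspect


def supported_kwargs(func, kwargs):
    """Keep only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # func takes **kwargs, so nothing can be ruled out from the signature.
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


def fake_export(model, args, f, export_params=True, verbose=False):
    """Stand-in for an export function that lacks the removed argument."""
    return "exported"


options = {"export_params": True, "verbose": False, "_retain_param_name": False}
safe = supported_kwargs(fake_export, options)
print(sorted(safe))
print(fake_export("net", "dummy_input", "model.onnx", **safe))
```

The same filtering could be applied to the real export call so that one code path works across torch versions, though simply deleting the argument is the smaller change.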

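Error when rerunning the train command after interrupting training with Ctrl+C

The environment is macOS on an M1 Pro, training with mps. Training had already produced more than 10 checkpoints, but after interrupting and rerunning the train command, the following error appears: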

 File "/Users/username/opt/anaconda3/envs/OCR_trainer/lib/python3.8/site-packages/torch/serialization.py", line 267, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage.UntypedStorage (tagged with mps:0)

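How can this be resolved?

Is slider captcha training supported?

Sorry, I've never worked with Python. I found this open-source project online and it really is great.

In my tests the recognition rate for slider captchas is rather low, but the documentation doesn't explain how to organize slider captchas into folders.

Labeling?

Are the steps the same on a Mac? Is switching to CPU training enough?

Could you release the weights trained for the Tonghuashun (同花顺) client?

I see you want to do quantitative trading. I've finished everything up to that point; the only thing missing is Tonghuashun captcha recognition, and the upper/lower case is a headache.

Hoping for an ipynb version that runs on Colab

My GPU is too weak. I hope an ipynb version that runs on Colab can be released.

Resuming training from a checkpoint

The tool seems to save checkpoints during training. Does it support resuming training from a checkpoint?

Are there any plans to support the M1 GPU?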

Training shows LastLoss: nan AvgLoss: nan. What is going on and how do I deal with it? Thanks

122 images in total; the config is as follows:
Model:
  CharSet: [' ', '8', '2', r, v, h, m, b, '5', w, '7', t, k, '6', y, p, '3', l,
    q, x, a, e, f, n, s, '4']
  ImageChannel: 1
  ImageHeight: 64
  ImageWidth: -1
  Word: false
System:
  Allow_Ext: [jpg, jpeg, png, bmp]
  GPU: true
  GPU_ID: 0
  Path: E:\Val\images_login
  Project: mlogin
  Val: 0.03
Train:
  BATCH_SIZE: 32
  CNN: {NAME: ddddocr}
  DROPOUT: 0.3
  LR: 0.01
  OPTIMIZER: SGD
  SAVE_CHECKPOINTS_STEP: 2000
  TARGET: {Accuracy: 0.97, Cost: 0.05, Epoch: 20}
  TEST_BATCH_SIZE: 32
  TEST_STEP: 1000

2022-07-12 00:38:15.250 | INFO | utils.train:start:108 - [2022-07-12-00_38_15] Epoch: 140500 Step: 421500 LastLoss: nan AvgLoss: nan Lr: 0.00015268545525806817

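Model comparison

Is there any comparison data for these models? Which one converges fastest, which gives the best results, and which runs fastest? On a side note, I want to convert the model to a TensorFlow model and then port it to a mobile platform; which model would suit that best?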

ddddocr
effnetv2_l,
effnetv2_m,
effnetv2_xl,
effnetv2_s,
mobilenetv2,
mobilenetv3_s,
mobilenetv3_l

Also, is the ddddocr model an original design, or is it adapted from another model?

Replacing ddddocr with another CNN such as mobilenetv2 results in NaN

utils.train:start:110 - [2023-10-07-17_21_35] Epoch: 1 Step: 100 LastLoss: nan AvgLoss: nan Lr: 0.01
2023-10-07 17:21:39.964 | INFO | utils.train:start:110 - [2023-10-07-17_21_39] Epoch: 2 Step: 200 LastLoss: nan AvgLoss: nan Lr: 0.01

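After a full day of training on alphanumeric captchas, the accuracy is painfully low. Is my computer not good enough?

How to apply a newly trained model to object detection?

After training a new model, calling it for object detection with your sample code raises an error; it can only be used for recognition. Is it possible to use what the new model learned directly in object-detection mode?

Cannot train from separate folders

I have two different sets of captchas, laid out as below. The images are not picked up. If I train on data1 and then on data2, the CharSet may not include the values from data1. How can I solve this?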

Folder:
img/data1
img/data2

cmd:
python app.py cache test_project img

Result:
2024-07-05 09:24:15.662 | INFO | utils.cache_data:__get_label_from_name:36 -
Files number is 2.
0%| | 0/2 [00:00<?, ?it/s]2024-07-05 09:24:15.664 | WARNING | utils.cache_data:__collect_data:88 -
File(form1) has a suffix that is not allowed! We will remove it!
2024-07-05 09:24:15.665 | WARNING | utils.cache_data:__collect_data:88 -
File(form2) has a suffix that is not allowed! We will remove it!
100%|█████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2708.62it/s]
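
The warnings suggest that the subfolders themselves were treated as files and rejected for lacking an allowed extension. A recursive scan would pick up images in nested folders, so that a single cache run covers both sets and the CharSet is built from all labels. This is a sketch only; `collect_images` is hypothetical, and the extension list mirrors the Allow_Ext values shown in configs elsewhere on this page.

```python
import os

ALLOWED_EXT = {"jpg", "jpeg", "png", "bmp"}


def collect_images(root):
    """Return all image paths under root, including nested subfolders."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext in ALLOWED_EXT:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

A simpler alternative that needs no code change is to copy both sets into one flat folder before running the cache command.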

How can the training time be estimated?

I tried with 40 images and it has been almost 36 hours.
