mhliao / masktextspotter


A PyTorch implementation of Mask TextSpotter

Home Page: https://github.com/MhLiao/MaskTextSpotter

Python 90.69% C++ 3.84% Cuda 5.41% Shell 0.06%

scene-text-detection-recognition

masktextspotter's Issues

core dump in testing

A core dump occurred when I ran test.sh with the downloaded model.
Environment: PyTorch 1.2, gcc 4.8, Python 3.6.
What is wrong? Is it the gcc version or the PyTorch version?

pretrain model loading error

Hi @MhLiao,
Thanks for your amazing work.
When I try to load your pretrained model to run the test,
I get the error below:

 File "tools/test_net.py", line 95, in <module>
    main()
  File "tools/test_net.py", line 64, in main
    _ = checkpointer.load(cfg.MODEL.WEIGHT)
  File "/home/pc/MaskTextSpotter/maskrcnn_benchmark/utils/checkpoint.py", line 62, in load
    self._load_model(checkpoint)
  File "/home/pc/MaskTextSpotter/maskrcnn_benchmark/utils/checkpoint.py", line 98, in _load_model
    load_state_dict(self.model, checkpoint.pop("model"))
  File "/home/pc/MaskTextSpotter/maskrcnn_benchmark/utils/model_serialization.py", line 80, in load_state_dict
    model.load_state_dict(model_state_dict)
  File "/home/pc/.conda/envs/mts/lib/python3.7/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN:
        size mismatch for roi_heads.box.feature_extractor.fc6.weight: copying a param with shape torch.Size([1024, 12544]) from checkpoint, the shape in current model is torch.Size([1024, 50176]).

I don't remember changing anything except replacing torch.bool with torch.uint8, because I use PyTorch 1.1,
and in pretrain.yaml I only changed the SOLVER part.
What could cause this problem?
Thanks!
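
An editorial note on the two shapes in the error: 12544 and 50176 differ by exactly a factor of four. That is what one would expect if the first fully connected layer takes a flattened ROI feature of size channels × resolution × resolution and the pooler resolution differs by a factor of two between the checkpoint and the current model. The sketch below only checks that arithmetic. The flattening formula is an assumption about how the box head is built, and `fc6_input_size` is a hypothetical helper, not a function from the repository.

```python
# Sketch: relate the fc6 input width to the ROI pooler resolution.
# Assumption (not confirmed by the issue): fc6 input = channels * resolution^2.

def fc6_input_size(channels: int, pooler_resolution: int) -> int:
    """Flattened feature size fed to the first fully connected layer."""
    return channels * pooler_resolution * pooler_resolution


CHANNELS = 256  # MODEL.BACKBONE.OUT_CHANNELS in the config dumps on this page

for resolution in (7, 14):
    print(resolution, fc6_input_size(CHANNELS, resolution))
# 7 12544
# 14 50176
```

Under that assumption, the checkpoint would match a pooler resolution of 7, while the model being built would match 14 (the value ROI_BOX_HEAD.POOLER_RESOLUTION takes in a config dump from another issue on this page). Comparing that setting between the two environments may be a reasonable first step.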

How to train (to detect my custom objects), and how to make labels?

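Multilingual training question

For a dataset like MLT, which has no character-level annotations, won't inaccurate detection early in training cause recognition problems, as happens with the FOTS algorithm? Is MLT training fine-tuned from the pretrained model?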

Can it run on a single GPU?

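Running sh test.sh reports that a file does not exist

Error message: FileNotFoundError: [Errno 2] No such file or directory: 'datasets/icdar2013/test_gts/gt_img_1.txt'

No such file or directory: 'datasets/icdar2013/test_gts/gt_img_224.txt'

Hello, I got the following error while running sh test.sh. The icdar2013 dataset was downloaded from the netdisk link you provided and extracted into the datasets folder.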

Traceback (most recent call last):
  File "tools/test_net.py", line 95, in <module>
    main()
  File "tools/test_net.py", line 89, in main
    cfg=cfg,
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 380, in inference
    predictions = compute_on_dataset(model, data_loader, device)
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 55, in compute_on_dataset
    for i, batch in tqdm(enumerate(data_loader)):
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/tqdm/std.py", line 1102, in __iter__
    for obj in iterable:
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/data/datasets/icdar.py", line 32, in __getitem__
    words,boxes,charsbbs,segmentations=self.load_gt_from_txt(gt_path,height,width)
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/data/datasets/icdar.py", line 89, in load_gt_from_txt
    lines = open(gt_path).readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/icdar2013/test_gts/gt_img_224.txt'
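
Before running test.sh, it may help to list every ground-truth file the loader would look for but cannot find. The sketch below assumes the naming pattern visible in the error (an image named img_224.jpg maps to gt_img_224.txt). The image folder name `test_images` and the helper `missing_gt_files` are hypothetical and are not taken from the repository.

```python
import os


def missing_gt_files(image_dir, gt_dir):
    """Return the ground-truth paths that are expected but absent.

    Assumes each image <stem>.<ext> pairs with gt_<stem>.txt, as the
    path in the error message suggests.
    """
    missing = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in {".jpg", ".jpeg", ".png"}:
            continue  # skip anything that is not an image
        gt_path = os.path.join(gt_dir, "gt_" + stem + ".txt")
        if not os.path.isfile(gt_path):
            missing.append(gt_path)
    return missing


# Hypothetical layout; adjust both paths to the actual dataset folders.
IMAGE_DIR = "datasets/icdar2013/test_images"
GT_DIR = "datasets/icdar2013/test_gts"

if os.path.isdir(IMAGE_DIR):
    for path in missing_gt_files(IMAGE_DIR, GT_DIR):
        print("missing:", path)
```

If every image is reported as missing, the ground-truth archive was probably not extracted to the expected folder at all; if only a few are, the archive itself may be incomplete.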

total_text and scut label

What do the labels of total_text and scut look like?
Is the total_text label format the same as CTW1500?
And does scut use character labels, where each box has 8 values?

Thanks a lot.

else gts_dir = None

@MhLiao

Add a check in paths_catalog.py for whether the test gts dir is available, and skip it if it is not, using:
else gts_dir = None
Also add code in test_net.py to delete the ./outputs/*/inference folder before running the actual inference test.
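
The two suggestions could look roughly like the sketch below. Both helper names (`resolve_gts_dir`, `clear_inference_dirs`) are hypothetical, and where exactly they would be called from paths_catalog.py and test_net.py is left open, since the issue does not show that code.

```python
import glob
import os
import shutil


def resolve_gts_dir(candidate):
    """Return the ground-truth directory if it exists, otherwise None."""
    return candidate if os.path.isdir(candidate) else None


def clear_inference_dirs(output_root="./outputs"):
    """Delete every <output_root>/*/inference folder; return what was removed."""
    removed = []
    for path in glob.glob(os.path.join(output_root, "*", "inference")):
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Example: a missing directory resolves to None instead of raising later.
print(resolve_gts_dir("datasets/does_not_exist/test_gts"))  # None
```

Note that deleting cached inference results is destructive, so a caller might prefer to make it opt-in behind a command-line flag.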

Training stuck

Thank you for your excellent work!
Training gets stuck, and I cannot tell why.

2019-12-05 16:06:33,983 maskrcnn_benchmark.trainer INFO: Start training tensor(0, device='cuda:0') chars_boxes.shape: 0

Volatile GPU-Util is 0%
Please tell me what is going wrong. Thank you very much.

Vertical Text

Will this project work with vertical text if given a vertical text dataset, or would more changes be needed to get it to work?

How to inference a single image?

Thanks for your great work. Maybe we need a demo for running inference on a single image.

Adding more control parameters for inference

How to train from the beginning

The pretrain code loads an already trained model. How can I train from the beginning without the downloaded model?
The loss seems to be very small.

The PAMI paper mentions that the character segmentation branch is disabled when training on MLT. How exactly is that set in the config file?

failed to use multi-gpu when testing

I have 8 GPUs in my machine, but it seems only one is used when testing.

TEST:
  CHAR_THRESH: 192
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 1
  VIS: True
2020-03-13 22:28:33,880 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2020-03-13 22:28:38,474 maskrcnn_benchmark INFO:
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 440.33.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] torch==1.2.0
[pip3] torchvision==0.4.0
[conda] torch 1.4.0 pypi_0 pypi
[conda] torchvision 0.5.0 pypi_0 pypi
Pillow (7.0.0)

Test result info:
2020-03-13 22:28:43,048 maskrcnn_benchmark.inference INFO: Start evaluation on 233 images
233it [01:19, 2.95it/s]
2020-03-13 22:30:02,229 maskrcnn_benchmark.inference INFO: Total inference time: 0:01:19.180692 (0.339831295954823 s / img per device, on 1 devices)
Is only one device being used?

[ pretrain ] doesn't save the best model

Hi,
When I pretrain on ICDAR 2013, the best model is not saved during training.
Instead, pretraining only saves at the default CHECKPOINT_PERIOD, which is 2500.
Pretraining should continually save the best model.
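
To make the request concrete, here is a minimal sketch of "keep the best checkpoint" bookkeeping. It is not how the repository's trainer works; `BestCheckpointTracker` is a hypothetical class, and the scores in the usage lines are made-up numbers. A real integration would also need a validation metric to compare, which the issue does not specify.

```python
class BestCheckpointTracker:
    """Remember the best metric value seen so far and when it occurred."""

    def __init__(self, higher_is_better=True):
        self.higher_is_better = higher_is_better
        self.best_value = None
        self.best_iteration = None

    def update(self, iteration, value):
        """Return True when `value` beats the best seen so far."""
        if self.best_value is None:
            improved = True
        elif self.higher_is_better:
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            self.best_value = value
            self.best_iteration = iteration
        return improved


# Made-up validation scores, evaluated every CHECKPOINT_PERIOD iterations.
tracker = BestCheckpointTracker(higher_is_better=True)
for iteration, score in [(2500, 0.61), (5000, 0.66), (7500, 0.64)]:
    if tracker.update(iteration, score):
        print("would save best model at iteration", iteration)
```

With these example numbers, the save message is printed for iterations 2500 and 5000 but not for 7500.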

Can you provide the "scut-eng-char_train" dataset for us?

Can you provide the "scut-eng-char_train" dataset for us? We don't know how to select the 1162 images. Thanks.

Training on a Chinese dataset

Thanks to the author for sharing the code. If I add, for example, 50 Chinese characters, what should I modify?

Can the model run at 6.7 FPS as written in the paper?

I ran test.sh and the speed was 1.47s/it...
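
For comparing these numbers, it helps to put them in the same unit. The sketch below converts seconds per image to frames per second, assuming one image per iteration (TEST.IMS_PER_BATCH: 1, as in the configs on this page). The second figure comes from a log line in another issue on this page (233 images in 0:01:19.180692); `frames_per_second` is a hypothetical helper.

```python
def frames_per_second(seconds_per_image):
    """Convert a per-image latency in seconds to throughput in images/s."""
    return 1.0 / seconds_per_image


# 1.47 s/it reported in this issue, assuming one image per iteration.
print(round(frames_per_second(1.47), 2))             # 0.68

# 233 images in 79.180692 s, from the multi-GPU testing issue's log.
print(round(frames_per_second(79.180692 / 233), 2))  # 2.94
```

Both runs are below the 6.7 FPS quoted from the paper, but they come from different machines and settings (for example, one log shows VIS: True), so they are not directly comparable with the paper's setup.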

Hello, when will the test and training results be available? How good are the results?

How to get the segmentation labels in SynthText

Thank you for your work.
Could you please release the SynthText pre-processing code?

Can you release the Total-Text model?

I want to test the results on Total-Text. If you release the Total-Text model, I will be very grateful.

How to train (to detect my custom objects), and how to make labels?

Evaluation code

When will the evaluation code be released?

Typo. It should be 'config_file'

MaskTextSpotter/tools/demo.py

Line 234 in afe2279

parser.add_argument("--config-file", type=str, default='configs/finetune.yaml')
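
For context on the naming: argparse derives the attribute name from the long option by replacing dashes with underscores, so an option declared as --config-file is read back as args.config_file. The sketch below shows only that standard-library behaviour. Whether demo.py needs a change depends on how it reads the value afterwards, which the issue does not show.

```python
import argparse

parser = argparse.ArgumentParser()
# Same declaration as the quoted line from tools/demo.py.
parser.add_argument("--config-file", type=str, default="configs/finetune.yaml")

# Parse an explicit argument list so the example does not depend on sys.argv.
args = parser.parse_args(["--config-file", "configs/pretrain.yaml"])

print(args.config_file)  # configs/pretrain.yaml
print(parser.parse_args([]).config_file)  # configs/finetune.yaml
```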

How do I train only the detection branch?

The network gets stuck while training on icdar2013 dataset

Hi,

The following line:

parts = line.strip().split(',', 8)

should be changed to

parts = line.strip().split(',')

if one wants to train the network on icdar2013. Note that this change will in turn break training on icdar2015.

I will update this issue once I figure out a workaround that covers both. In the meantime, I am reporting the problem.

Training gets stuck because of this: the segmentation mask ends up with zero boxes.

Thanks.
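
The difference between the two calls is easiest to see on sample lines. The sketch below uses made-up annotation lines: the first follows the common "eight coordinates, then transcription" layout, and the second has extra comma-separated fields after the word. The exact icdar2013 label layout used by this repository is not shown in the issue, so treat the second line as an assumption.

```python
def split_fixed(line):
    """At most 8 splits: 8 coordinates plus one trailing field."""
    return line.strip().split(",", 8)


def split_all(line):
    """Split on every comma."""
    return line.strip().split(",")


# Made-up line whose transcription itself contains a comma.
with_comma = "377,117,463,117,465,130,378,130,Hello, world"
print(len(split_fixed(with_comma)), repr(split_fixed(with_comma)[-1]))
# 9 'Hello, world'
print(len(split_all(with_comma)))
# 10

# Made-up line with extra numeric fields after the word.
with_extra = "1,2,3,4,5,6,7,8,word,9,10,11,12"
print(repr(split_fixed(with_extra)[-1]))
# 'word,9,10,11,12'
print(len(split_all(with_extra)))
# 13
```

So `split(',', 8)` keeps a transcription containing commas intact but lumps any trailing fields into the last element, while `split(',')` separates trailing fields but breaks such transcriptions. That trade-off is consistent with one variant working for one dataset and failing for the other.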

CUDA out of memory

@MhLiao Thank you for sharing the code. When I use one K80 (11 GB) to train the model, I need to set IMS_PER_BATCH: 1; otherwise CUDA runs out of memory. Which parameters in finetune.yaml should be modified so that the batch size can be larger without degrading performance?
I look forward to your reply.

test issue with TEST.IMS_PER_BATCH greater than 1

Hi all,
I'm running text spotting on batches of images, with TEST.IMS_PER_BATCH = 16.
An error is raised in the function process_char_mask in text_inference.py: the length of boxes doesn't match char_masks.shape[0].

File "MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 232, in process_char_mask
    box = list(boxes[index])
IndexError: index 3 is out of bounds for axis 0 with size 3

def process_char_mask(char_masks, boxes, threshold=192):
    texts, rec_scores, rec_char_scores, char_polygons = [], [], [], []
    for index in range(char_masks.shape[0]):
        box = list(boxes[index])  # <- the IndexError is raised here

I tried to trace it back, and found that only the first element of the batch is picked out, as shown in this function from text_inference.py:

def compute_on_dataset(model, data_loader, device):
    model.eval()
    results_dict = {}
    cpu_device = torch.device("cpu")
    for i, batch in tqdm(enumerate(data_loader)):
        images, targets, image_paths = batch
        images = images.to(device)
        with torch.no_grad():
            predictions = model(images)
            if predictions is not None:
                global_predictions = predictions[0]
                char_predictions = predictions[1]
                char_mask = char_predictions['char_mask']
                boxes = char_predictions['boxes']
                seq_words = char_predictions['seq_outputs']
                seq_scores = char_predictions['seq_scores']
                detailed_seq_scores = char_predictions['detailed_seq_scores']
                global_predictions = [o.to(cpu_device) for o in global_predictions]
                results_dict.update(
                    # <- only image_paths[0] and global_predictions[0] are stored
                    {image_paths[0]: [global_predictions[0], char_mask, boxes, seq_words, seq_scores, detailed_seq_scores]}
                )
    return results_dict

Is it possible to get all results from the model's predictions? And how can we tell which char_mask, boxes, and words belong to each image in the batch?

Thanks!

lexicon search method code

Hi, I'm confused about where the lexicon search code is. The paper describes an improved edit distance. Thanks.
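
For readers unfamiliar with the idea, a lexicon search replaces a raw recognition result with the closest word from a given word list. The sketch below uses the plain Levenshtein edit distance, not the improved variant the paper is said to describe, and it is not the repository's code. `edit_distance` and `lexicon_search` are hypothetical names, and the lexicon is made up.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance with unit costs (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if ca == cb else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def lexicon_search(word, lexicon):
    """Return the lexicon entry closest to `word`, ignoring case."""
    return min(lexicon, key=lambda entry: edit_distance(word.lower(), entry.lower()))


# Made-up example: a recognizer output with one wrong character.
print(lexicon_search("HOTFL", ["hotel", "motel", "hostel"]))  # hotel
```

A weighted variant would replace the unit substitution cost with a cost derived from the recognizer's per-character scores; the details of how the paper does this are not given in the issue.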

Detectron2

@MhLiao @lvpengyuan Detectron2 was just released,
Do you have plans to upgrade MaskTextSpotter to Detectron2?

How to extend it to Chinese?

Is this a problem? (Would the mask head become too heavy and hard to converge?)

Which Python version is needed? I also hit "CUDA out of memory" on a single GTX 2080 Ti (11 GB). Which GPU did you use?

Configuration to better detect text-lines

@MhLiao
What configuration do you recommend to better detect text-lines?

Evaluation for Total-Text dataset

Thanks for your excellent work!
Are there any plans to release the Total-Text evaluation code?

Getting stuck when trying to train

@MhLiao Thank you for your hard work,

When I try to train using the Pretrain command, an "invalid device ordinal" error is shown, and then training gets stuck after loading the config. The log is attached here: log.txt

My MaskTextSpotter is installed correctly; sh test.sh runs properly without any issue.
My system specification: Ubuntu 18.04/ 16GB RAM/ GTX 1070 ti 8GB/ GTX 960 4GB/ ryzen 2600

The training log:

(mask) home@home-desktop:~/p5/MaskTextSpotter$ python3 -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/pretrain.yaml
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
  File "tools/train_net.py", line 173, in <module>
  File "tools/train_net.py", line 173, in <module>
Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
    main()
  File "tools/train_net.py", line 140, in main
    main()
  File "tools/train_net.py", line 140, in main
    main()
    torch.cuda.set_device(args.local_rank)
  File "tools/train_net.py", line 140, in main
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
  File "tools/train_net.py", line 173, in <module>
    main()
  File "tools/train_net.py", line 140, in main
    torch._C._cuda_setDevice(device)
    torch.cuda.set_device(args.local_rank)
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
  File "tools/train_net.py", line 173, in <module>
    torch.cuda.set_device(args.local_rank)
    torch.cuda.set_device(args.local_rank)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
    main()
  File "tools/train_net.py", line 140, in main
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    main()
  File "tools/train_net.py", line 140, in main
    torch._C._cuda_setDevice(device)
    torch.cuda.set_device(args.local_rank)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    torch._C._cuda_setDevice(device)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    torch.cuda.set_device(args.local_rank)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
2019-10-20 23:27:37,465 maskrcnn_benchmark INFO: Using 8 GPUs
2019-10-20 23:27:37,465 maskrcnn_benchmark INFO: Namespace(config_file='configs/pretrain.yaml', distributed=True, local_rank=0, opts=[], skip_test=False)
2019-10-20 23:27:37,466 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2019-10-20 23:27:38,570 maskrcnn_benchmark INFO: 
PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Linux Mint 19.2 Tina
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: 
GPU 0: GeForce GTX 1070 Ti
GPU 1: GeForce GTX 960

Nvidia driver version: 418.74
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.17.3
[pip] torch==1.3.0
[pip] torchvision==0.4.1a0+d94043a
[conda] blas                      1.0                         mkl  
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py36he904b0f_0  
[conda] mkl_fft                   1.0.14           py36ha843d7b_0  
[conda] mkl_random                1.1.0            py36hd6b4f25_0  
[conda] pytorch                   1.3.0           py3.6_cuda10.0.130_cudnn7.6.3_0    pytorch
[conda] torchvision               0.4.1                py36_cu100    pytorch
        Pillow (6.2.0)
2019-10-20 23:27:38,570 maskrcnn_benchmark INFO: Loaded configuration file configs/pretrain.yaml
2019-10-20 23:27:38,570 maskrcnn_benchmark INFO: 
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  # WEIGHT: "./outputs/synth_pretrain_shrink++/model_0270000.pth"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
    OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: True
    BATCH_SIZE_PER_IMAGE: 512
  ROI_BOX_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 2
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "SeqCharMaskRCNNC4Predictor"
    POOLER_RESOLUTION_H: 16
    POOLER_RESOLUTION_W: 64
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    RESOLUTION_H: 32
    RESOLUTION_W: 128
    SHARE_BOX_FEATURE_EXTRACTOR: False
    CHAR_NUM_CLASSES: 37
    USE_WEIGHTED_CHAR_MASK: True
    MASK_BATCH_SIZE_PER_IM: 64
  MASK_ON: True
  CHAR_MASK_ON: True
SEQUENCE:
  SEQ_ON: False
  NUM_CHAR: 38
  BOS_TOKEN: 0
  MAX_LENGTH: 32
  TEACHER_FORCE_RATIO: 1.0
  TWO_CONV: True
DATASETS:
  TRAIN: ("icdar_2015_train",)
  TEST: ("icdar_2015_test",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
  NUM_WORKERS: 4
  ASPECT_RATIO_GROUPING: False
SOLVER:
  BASE_LR: 0.01 #0.02
  WARMUP_FACTOR: 0.1
  WEIGHT_DECAY: 0.0001
  STEPS: (100000, 160000)
  MAX_ITER: 300000
  IMS_PER_BATCH: 4
OUTPUT_DIR: "./outputs/pretrain"
TEST:
  VIS: False
  CHAR_THRESH: 192
  IMS_PER_BATCH: 1
INPUT:
  MIN_SIZE_TRAIN: (600, 800)
  MAX_SIZE_TRAIN: 2333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333

2019-10-20 23:27:38,571 maskrcnn_benchmark INFO: Running with config:
DATALOADER:
  ASPECT_RATIO_GROUPING: False
  NUM_WORKERS: 4
  SIZE_DIVISIBILITY: 32
DATASETS:
  AUG: False
  RANDOM_CROP_PROB: 0.0
  RATIOS: []
  TEST: ('icdar_2015_test',)
  TRAIN: ('icdar_2015_train',)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 2333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: (600, 800)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  TO_BGR255: True
MODEL:
  BACKBONE:
    CONV_BODY: R-50-FPN
    FREEZE_CONV_BODY_AT: 2
    OUT_CHANNELS: 256
  CHAR_MASK_ON: True
  DEVICE: cuda
  MASK_ON: True
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    NUM_GROUPS: 1
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: True
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 64
  ROI_BOX_HEAD:
    FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 2
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    PREDICTOR: FPNPredictor
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.05
    USE_FPN: True
  ROI_MASK_HEAD:
    CHAR_NUM_CLASSES: 37
    CONV_LAYERS: (256, 256, 256, 256)
    FEATURE_EXTRACTOR: MaskRCNNFPNFeatureExtractor
    MASK_BATCH_SIZE_PER_IM: 64
    MLP_HEAD_DIM: 1024
    POOLER_RESOLUTION: 14
    POOLER_RESOLUTION_H: 16
    POOLER_RESOLUTION_W: 64
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    PREDICTOR: SeqCharMaskRCNNC4Predictor
    RESOLUTION: 28
    RESOLUTION_H: 32
    RESOLUTION_W: 128
    SHARE_BOX_FEATURE_EXTRACTOR: False
    USE_WEIGHTED_CHAR_MASK: True
  RPN:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    PRE_NMS_TOP_N_TRAIN: 2000
    STRADDLE_THRESH: 0
    USE_FPN: True
  RPN_ONLY: False
  WEIGHT: catalog://ImageNetPretrained/MSRA/R-50
OUTPUT_DIR: ./outputs/pretrain
PATHS_CATALOG: /home/home/p5/MaskTextSpotter/maskrcnn_benchmark/config/paths_catalog.py
SEQUENCE:
  BOS_TOKEN: 0
  MAX_LENGTH: 32
  MEAN_SCORE: False
  NUM_CHAR: 38
  SEQ_ON: False
  TEACHER_FORCE_RATIO: 1.0
  TWO_CONV: True
SOLVER:
  BASE_LR: 0.01
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 2500
  GAMMA: 0.1
  IMS_PER_BATCH: 4
  MAX_ITER: 300000
  MOMENTUM: 0.9
  RESUME: True
  STEPS: (100000, 160000)
  USE_ADAM: False
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 500
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0
TEST:
  CHAR_THRESH: 192
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 1
  VIS: False
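
An editorial observation on the log above: the launch command uses --nproc_per_node=8, but the environment report lists only two GPUs (GPU 0 and GPU 1). The traceback shows each process calling torch.cuda.set_device(args.local_rank), so any local rank with no matching device would fail. The sketch below just counts those ranks; `invalid_local_ranks` is a hypothetical helper, and in practice the GPU count would come from something like torch.cuda.device_count().

```python
def invalid_local_ranks(nproc_per_node, visible_gpus):
    """Local ranks that have no GPU with the same index."""
    return [rank for rank in range(nproc_per_node) if rank >= visible_gpus]


print(invalid_local_ranks(8, 2))  # [2, 3, 4, 5, 6, 7]
print(invalid_local_ranks(2, 2))  # []
```

Six failing ranks matches the six "invalid device ordinal" errors in the log. If that reading is right, launching with --nproc_per_node set to the number of available GPUs (2 here, or 1 to use only the larger card) would avoid the error; the remaining processes presumably hang waiting for the ones that crashed.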

detection

How can I use detection only? Thank you.

Finetune [ValueError: Type mismatch] + [returned non-zero exit status 1.]

@MhLiao

When trying to finetune on icdar2015, I get a "returned non-zero exit status 1." error:

python3 -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/finetune.yaml

The terminal log:

(mask) home@home-desktop:~/p5/MaskTextSpotter$ python3 -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/finetune.yaml
Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
    main()
  File "tools/train_net.py", line 145, in main
    cfg.merge_from_file(args.config_file)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 213, in merge_from_file
    self.merge_from_other_cfg(cfg)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 456, in _merge_a_into_b
    v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 513, in _check_and_coerce_cfg_value_type
    original_type, replacement_type, original, replacement, full_key
ValueError: Type mismatch (<class 'tuple'> vs. <class 'str'>) with values (() vs. icdar_2015_train) for config key: DATASETS.TRAIN
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/mask/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/home/anaconda3/envs/mask/bin/python3', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/finetune.yaml']' returned non-zero exit status 1.
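
The ValueError says the config expected a tuple for DATASETS.TRAIN but received the string icdar_2015_train. One plausible cause, offered as an assumption since finetune.yaml is not shown, is a missing trailing comma: in Python literal syntax ("icdar_2015_train") is just a parenthesized string, while ("icdar_2015_train",) is a one-element tuple. As far as I know, yacs decodes string values with Python literal evaluation, which the sketch below imitates with the standard library's ast.literal_eval.

```python
import ast

# Two candidate values as they might appear after "TRAIN:" in a YAML config.
candidates = ['("icdar_2015_train")', '("icdar_2015_train",)']

for text in candidates:
    value = ast.literal_eval(text)
    print(type(value).__name__, value)
# str icdar_2015_train
# tuple ('icdar_2015_train',)
```

The pretrain.yaml printed in another issue on this page uses the form with the trailing comma, TRAIN: ("icdar_2015_train",), so checking that finetune.yaml matches it would be a quick test of this theory.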

Pretrain [returned non-zero exit status 1.]

@MhLiao

When trying to train icdar2015 using the Pretrain command:

python3 -m torch.distributed.launch --nproc_per_node=1 ./tools/train_net.py --config-file ./configs/pretrain.yaml

I get error:

returned non-zero exit status 1.

My Terminal log:
terminal.txt

The pretrain log is attached:
log.txt

Please Help! shape mismatch!

RuntimeError: shape mismatch: value tensor of shape [52, 256, 16, 64] cannot be broadcast to indexing result of shape [52, 256, 16, 16]

ICDAR datasets throw Index out of range error in segmentation_mask.py

Hi @MhLiao,

Thanks so much for providing this excellent code base. My team at the US Geological Survey is using it to recognize characters on historical topographic maps. When training, the ICDAR datasets throw an index out of range error in segmentation_mask.py. If we don't use those datasets (by setting their ratio to zero in the config files), training proceeds normally. We downloaded those datasets from the location you indicate, and have them in the right place with a train_list.txt file as directed. Please advise us if possible.

Thanks so much,

Sam

Detect Numbers

@MhLiao If I want to use it to detect just numbers and '.', what should I do? Thanks

module 'torch.distributed' has no attribute 'deprecated'

pip list

toolz 0.10.0
torch 1.2.0
torchvision 0.4.0a0

finetune.yaml

SOLVER:
  IMS_PER_BATCH: 1

python -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/finetune.yaml

My training command:

python3 -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/pretrain.yaml

INFO:maskrcnn_benchmark.trainer:

loss: nan (nan)  loss_classifier: nan (nan)  loss_box_reg: nan (nan)  loss_mask: nan (nan)  loss_char_mask: 0.0000 (0.0000)  loss_seq: nan (nan)  loss_objectness: nan (nan)  loss_rpn_box_reg: nan (nan)

The errors that I keep getting:

WARNING:root:NaN or Inf found in input tensor.

Use detection only

After I finish training the model, how can I use detection only? Thanks.
