mhliao / masktextspotter

A PyTorch implementation of Mask TextSpotter

Home Page: https://github.com/MhLiao/MaskTextSpotter

Python 90.69% C++ 3.84% Cuda 5.41% Shell 0.06%
scene-text-detection-recognition

masktextspotter's People

Contributors

mhliao, wangqiang1588

masktextspotter's Issues

core dump in testing

A core dump occurred when I ran test.sh with the downloaded model.
Environment: PyTorch 1.2, GCC 4.8, Python 3.6.
What is wrong? Is it the GCC version or the PyTorch version?

pretrain model loading error

Hi @MhLiao,
Thanks for your amazing work.
When I try to load your pretrained model to run the test,
I get the error below:

 File "tools/test_net.py", line 95, in <module>
    main()
  File "tools/test_net.py", line 64, in main
    _ = checkpointer.load(cfg.MODEL.WEIGHT)
  File "/home/pc/MaskTextSpotter/maskrcnn_benchmark/utils/checkpoint.py", line 62, in load
    self._load_model(checkpoint)
  File "/home/pc/MaskTextSpotter/maskrcnn_benchmark/utils/checkpoint.py", line 98, in _load_model
    load_state_dict(self.model, checkpoint.pop("model"))
  File "/home/pc/MaskTextSpotter/maskrcnn_benchmark/utils/model_serialization.py", line 80, in load_state_dict
    model.load_state_dict(model_state_dict)
  File "/home/pc/.conda/envs/mts/lib/python3.7/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN:
        size mismatch for roi_heads.box.feature_extractor.fc6.weight: copying a param with shape torch.Size([1024, 12544]) from checkpoint, the shape in current model is torch.Size([1024, 50176]).

I don't remember changing anything except replacing torch.bool with torch.uint8, because I use PyTorch 1.1,
and in pretrain.yaml I only changed the SOLVER section.
What could cause this problem?
Thanks!
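
The two sizes in the error are consistent with a different pooling resolution in the box head: 12544 = 256 × 7 × 7 and 50176 = 256 × 14 × 14. A minimal sketch of that arithmetic follows, assuming the box-head fc6 layer takes a flattened 256-channel square pooled feature map (256 matches OUT_CHANNELS in the config printed further down this page; the helper name is hypothetical):

```python
import math


def pooler_resolution(fc6_in_features, channels=256):
    """Infer the square pooling resolution implied by fc6's input width.

    Assumes fc6 consumes a flattened (channels, R, R) feature map.
    """
    area, remainder = divmod(fc6_in_features, channels)
    if remainder:
        raise ValueError("in_features is not a multiple of the channel count")
    side = math.isqrt(area)
    if side * side != area:
        raise ValueError("pooled area is not a perfect square")
    return side


checkpoint_res = pooler_resolution(12544)  # shape stored in the checkpoint
model_res = pooler_resolution(50176)       # shape built from the local config
print(checkpoint_res, model_res)           # 7 14
```

If this is the cause, the local MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION would be 14 while the checkpoint was saved with 7, so comparing that key against the config that belongs to the checkpoint is a reasonable first check.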

Multilingual training question

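For a dataset like MLT, which has no character-level annotations, won't you run into the same problem as the FOTS algorithm, where inaccurate detection early in training causes recognition problems? Is the MLT model trained by fine-tuning from the pretrained model?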

No such file or directory: 'datasets/icdar2013/test_gts/gt_img_224.txt'

Hello, while running sh test.sh I hit the error below. I downloaded the icdar2013 dataset from the cloud-drive link you provided and extracted it into the datasets folder.

Traceback (most recent call last):
  File "tools/test_net.py", line 95, in <module>
    main()
  File "tools/test_net.py", line 89, in main
    cfg=cfg,
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 380, in inference
    predictions = compute_on_dataset(model, data_loader, device)
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 55, in compute_on_dataset
    for i, batch in tqdm(enumerate(data_loader)):
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/tqdm/std.py", line 1102, in __iter__
    for obj in iterable:
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/luoyijie/anaconda3/envs/masktextspotter/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/data/datasets/icdar.py", line 32, in __getitem__
    words,boxes,charsbbs,segmentations=self.load_gt_from_txt(gt_path,height,width)
  File "/home/luoyijie/MaskTextSpotter/maskrcnn_benchmark/data/datasets/icdar.py", line 89, in load_gt_from_txt
    lines = open(gt_path).readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/icdar2013/test_gts/gt_img_224.txt'
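
The traceback shows the loader opening one ground-truth file per test image. A quick way to see which files are absent before running test.sh is sketched below. It assumes that an image NAME.jpg expects gt_NAME.txt (inferred from gt_img_224.txt); the image directory name and the helper are hypothetical.

```python
from pathlib import Path


def missing_gt_files(image_dir, gts_dir, image_suffix=".jpg"):
    """Return the ground-truth paths expected for image_dir but not on disk.

    Assumption: image "img_224.jpg" pairs with "gt_img_224.txt".
    """
    gts_dir = Path(gts_dir)
    missing = []
    for image_path in sorted(Path(image_dir).glob(f"*{image_suffix}")):
        gt_path = gts_dir / f"gt_{image_path.stem}.txt"
        if not gt_path.is_file():
            missing.append(gt_path)
    return missing


if __name__ == "__main__":
    # "test_images" is a guess at the folder name; adjust to the real layout.
    for path in missing_gt_files("datasets/icdar2013/test_images",
                                 "datasets/icdar2013/test_gts"):
        print("missing:", path)
```

An empty result means every image found has a matching ground-truth file under the assumed naming scheme.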

total_text and scut label

What do the labels for total_text and scut look like?
Is the total_text label format the same as CTW1500's?
And does scut use character-level labels, with eight values per box?

Thanks a lot.

else gts_dir = None

@MhLiao

  • Add code to paths_catalog.py that checks whether the test gts dir is available and skips it if not, using:
    else gts_dir = None

  • Add code to test_net.py that deletes the ./outputs/*/inference folder before running the actual inference test.
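
Taken on its own, `else gts_dir = None` is not valid Python; the intent reads as a conditional expression. A minimal sketch of both suggestions follows, using hypothetical helper names rather than the repository's actual code in paths_catalog.py or test_net.py:

```python
import glob
import os
import shutil


def resolve_gts_dir(candidate):
    """Return the ground-truth directory if it exists, otherwise None."""
    return candidate if os.path.isdir(candidate) else None


def clear_inference_dirs(output_root="./outputs"):
    """Delete every <output_root>/*/inference folder left by earlier runs."""
    removed = []
    for path in glob.glob(os.path.join(output_root, "*", "inference")):
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Callers of resolve_gts_dir would then need to handle None, for example by skipping evaluation when no ground truth is present.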

Training stuck

Thank you for your excellent work!
Training gets stuck for reasons I cannot identify.

2019-12-05 16:06:33,983 maskrcnn_benchmark.trainer INFO: Start training tensor(0, device='cuda:0') chars_boxes.shape: 0

Volatile GPU-Util is 0%.
Please tell me what is wrong. Thank you very much.

Vertical Text

Will this project work with vertical text if given a vertical-text dataset, or would further changes be needed?

how to train from the beginning

The pretrain code loads an already-trained model. How can I train from the beginning without a downloaded model?
The loss seems to be very small.

failed to use multi-gpu when testing

I have 8 GPUs in my machine, but it seems only one is used when testing.

TEST:
  CHAR_THRESH: 192
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 1
  VIS: True
2020-03-13 22:28:33,880 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2020-03-13 22:28:38,474 maskrcnn_benchmark INFO:
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 440.33.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] torch==1.2.0
[pip3] torchvision==0.4.0
[conda] torch 1.4.0 pypi_0 pypi
[conda] torchvision 0.5.0 pypi_0 pypi
Pillow (7.0.0)

Test result info:
2020-03-13 22:28:43,048 maskrcnn_benchmark.inference INFO: Start evaluation on 233 images
233it [01:19, 2.95it/s]
2020-03-13 22:30:02,229 maskrcnn_benchmark.inference INFO: Total inference time: 0:01:19.180692 (0.339831295954823 s / img per device, on 1 devices)
Is only one device used?

train Chinese dataset

Thanks to the author for sharing the code. If I want to add, for example, 50 Chinese characters, what should I modify?

The network gets stuck while training on icdar2013 dataset

Hi,

The following line

parts = line.strip().split(',', 8)

should be changed to

parts = line.strip().split(',')

if one wants to train the network on icdar2013. Obviously, this change will break training on icdar2015.

I will update this issue once I figure out a workaround. Meanwhile, I am reporting it here.

With the current code, training gets stuck because the segmentation mask ends up with zero boxes.

Thanks.
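
The difference between the two calls is the maxsplit argument: split(',', 8) stops after eight commas, so everything after the eighth comma stays in one field, while split(',') breaks on every comma. The sample lines below are made up for illustration and are not taken from either dataset:

```python
# Eight coordinates followed by a transcription that itself contains a comma.
line_with_comma_text = "10,20,110,20,110,60,10,60,HELLO, WORLD\n"

capped = line_with_comma_text.strip().split(',', 8)
uncapped = line_with_comma_text.strip().split(',')

print(len(capped), repr(capped[-1]))      # 9 'HELLO, WORLD'
print(len(uncapped), repr(uncapped[-1]))  # 10 ' WORLD'

# A line with extra comma-separated fields after the first nine values
# (hypothetical, e.g. per-character annotations).
line_with_extra_fields = "10,20,110,20,110,60,10,60,HI,10,20,55,20,55,60,10,60,H"

print(len(line_with_extra_fields.split(',', 8)))  # 9: extras stay in the last field
print(len(line_with_extra_fields.split(',')))     # 18: every field is separate
```

A loader that has to handle both layouts could choose the call per dataset instead of changing it globally.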

CUDA out of memory

@MhLiao Thank you for sharing the code. When I train the model on a single K80 (11 GB), I have to set IMS_PER_BATCH: 1; otherwise CUDA runs out of memory. Which parameters in finetune.yaml should I modify so that the batch size can be larger without degrading performance?
I look forward to your reply.
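
If the batch size has to stay small for memory reasons, one commonly used heuristic is the linear scaling rule: scale the learning rate in proportion to the batch size and stretch the schedule by the inverse factor. The sketch below only does that arithmetic. Whether it preserves accuracy for this model is not confirmed anywhere in this thread, and the reference values are placeholders, not the contents of finetune.yaml.

```python
def rescale_solver(base_lr, steps, max_iter, ref_batch, new_batch):
    """Apply the linear scaling rule to a solver schedule.

    The learning rate scales with new_batch / ref_batch; iteration counts
    scale with the inverse so the number of images seen stays the same.
    """
    factor = new_batch / ref_batch
    return {
        "BASE_LR": base_lr * factor,
        "STEPS": tuple(round(s / factor) for s in steps),
        "MAX_ITER": round(max_iter / factor),
        "IMS_PER_BATCH": new_batch,
    }


# Placeholder reference schedule: batch size 4 reduced to batch size 1.
print(rescale_solver(0.01, (100000, 160000), 300000, ref_batch=4, new_batch=1))
```

The returned keys mirror the SOLVER entries shown in the config dumps on this page, so the values can be copied into a YAML file by hand.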

test issue with TEST.IMS_PER_BATCH greater than 1

Hi all,
I'm running text spotting on batches of images with TEST.IMS_PER_BATCH = 16.
The function process_char_mask in text_inference.py raises an error because the length of boxes doesn't match char_masks.shape[0]:

MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 232, in process_char_mask
    box = list(boxes[index])
IndexError: index 3 is out of bounds for axis 0 with size 3
def process_char_mask(char_masks, boxes, threshold=192):
    texts, rec_scores, rec_char_scores, char_polygons = [], [], [], []
    for index in range(char_masks.shape[0]):
        box = list(boxes[index])  # <-- the IndexError is raised here

I tried to trace back and found that compute_on_dataset in text_inference.py stores the results only under the first element of the batch, as shown below:

import torch
from tqdm import tqdm


def compute_on_dataset(model, data_loader, device):
    model.eval()
    results_dict = {}
    cpu_device = torch.device("cpu")
    for i, batch in tqdm(enumerate(data_loader)):
        images, targets, image_paths = batch
        images = images.to(device)
        with torch.no_grad():
            predictions = model(images)
            if predictions is not None:
                global_predictions = predictions[0]
                char_predictions = predictions[1]
                char_mask = char_predictions['char_mask']
                boxes = char_predictions['boxes']
                seq_words = char_predictions['seq_outputs']
                seq_scores = char_predictions['seq_scores']
                detailed_seq_scores = char_predictions['detailed_seq_scores']
                global_predictions = [o.to(cpu_device) for o in global_predictions]
                # <-- only image_paths[0] and global_predictions[0] are stored
                results_dict.update(
                    {image_paths[0]: [global_predictions[0], char_mask, boxes, seq_words, seq_scores, detailed_seq_scores]}
                )
    return results_dict

Is it possible to get all results from the model's predictions? And how can we tell which char_mask, boxes, and words in the batch belong to each image?

Thanks!
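
The excerpt stores every output under image_paths[0], so with a batch larger than one the entries from several images end up under a single key. If the per-box outputs are concatenated across the batch and the number of boxes per image is known (both are assumptions about the model's output, not something this thread confirms), they can be regrouped per image. A sketch with a hypothetical helper:

```python
def split_by_counts(items, counts):
    """Split a flat, batch-concatenated sequence into one chunk per image.

    counts[i] is the number of entries belonging to image i.
    """
    if sum(counts) != len(items):
        raise ValueError("counts do not add up to the number of items")
    chunks, start = [], 0
    for count in counts:
        chunks.append(items[start:start + count])
        start += count
    return chunks


# Toy data: three images with 2, 0 and 3 detected words.
image_paths = ["a.jpg", "b.jpg", "c.jpg"]
words = ["w0", "w1", "w2", "w3", "w4"]
per_image = dict(zip(image_paths, split_by_counts(words, [2, 0, 3])))
print(per_image)
# {'a.jpg': ['w0', 'w1'], 'b.jpg': [], 'c.jpg': ['w2', 'w3', 'w4']}
```

The same slicing works on arrays or tensors indexed along their first axis, provided the counts are available from the detection stage.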

lexicon search method code

Hi, I can't find the code for the lexicon search method. According to the paper, it uses an improved edit distance. Thanks.

Getting stuck when trying to train

@MhLiao Thank you for your hard work,

When I try to train using the Pretrain command, the error "invalid device ordinal" is shown, and then training gets stuck after loading the config. The log is attached here: log.txt

MaskTextSpotter is installed correctly; sh test.sh runs without any issue.
My system specification: Ubuntu 18.04 / 16 GB RAM / GTX 1070 Ti 8 GB / GTX 960 4 GB / Ryzen 2600

The training log:

(mask) home@home-desktop:~/p5/MaskTextSpotter$ python3 -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/pretrain.yaml
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp line=37 error=10 : invalid device ordinal
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
  File "tools/train_net.py", line 173, in <module>
  File "tools/train_net.py", line 173, in <module>
Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
    main()
  File "tools/train_net.py", line 140, in main
    main()
  File "tools/train_net.py", line 140, in main
    main()
    torch.cuda.set_device(args.local_rank)
  File "tools/train_net.py", line 140, in main
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
  File "tools/train_net.py", line 173, in <module>
    main()
  File "tools/train_net.py", line 140, in main
    torch._C._cuda_setDevice(device)
    torch.cuda.set_device(args.local_rank)
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
  File "tools/train_net.py", line 173, in <module>
    torch.cuda.set_device(args.local_rank)
    torch.cuda.set_device(args.local_rank)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
    main()
  File "tools/train_net.py", line 140, in main
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    main()
  File "tools/train_net.py", line 140, in main
    torch._C._cuda_setDevice(device)
    torch.cuda.set_device(args.local_rank)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    torch._C._cuda_setDevice(device)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    torch.cuda.set_device(args.local_rank)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1570710853631/work/torch/csrc/cuda/Module.cpp:37
2019-10-20 23:27:37,465 maskrcnn_benchmark INFO: Using 8 GPUs
2019-10-20 23:27:37,465 maskrcnn_benchmark INFO: Namespace(config_file='configs/pretrain.yaml', distributed=True, local_rank=0, opts=[], skip_test=False)
2019-10-20 23:27:37,466 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2019-10-20 23:27:38,570 maskrcnn_benchmark INFO: 
PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Linux Mint 19.2 Tina
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: 
GPU 0: GeForce GTX 1070 Ti
GPU 1: GeForce GTX 960

Nvidia driver version: 418.74
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.17.3
[pip] torch==1.3.0
[pip] torchvision==0.4.1a0+d94043a
[conda] blas                      1.0                         mkl  
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py36he904b0f_0  
[conda] mkl_fft                   1.0.14           py36ha843d7b_0  
[conda] mkl_random                1.1.0            py36hd6b4f25_0  
[conda] pytorch                   1.3.0           py3.6_cuda10.0.130_cudnn7.6.3_0    pytorch
[conda] torchvision               0.4.1                py36_cu100    pytorch
        Pillow (6.2.0)
2019-10-20 23:27:38,570 maskrcnn_benchmark INFO: Loaded configuration file configs/pretrain.yaml
2019-10-20 23:27:38,570 maskrcnn_benchmark INFO: 
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  # WEIGHT: "./outputs/synth_pretrain_shrink++/model_0270000.pth"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
    OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: True
    BATCH_SIZE_PER_IMAGE: 512
  ROI_BOX_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 2
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "SeqCharMaskRCNNC4Predictor"
    POOLER_RESOLUTION_H: 16
    POOLER_RESOLUTION_W: 64
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    RESOLUTION_H: 32
    RESOLUTION_W: 128
    SHARE_BOX_FEATURE_EXTRACTOR: False
    CHAR_NUM_CLASSES: 37
    USE_WEIGHTED_CHAR_MASK: True
    MASK_BATCH_SIZE_PER_IM: 64
  MASK_ON: True
  CHAR_MASK_ON: True
SEQUENCE:
  SEQ_ON: False
  NUM_CHAR: 38
  BOS_TOKEN: 0
  MAX_LENGTH: 32
  TEACHER_FORCE_RATIO: 1.0
  TWO_CONV: True
DATASETS:
  TRAIN: ("icdar_2015_train",)
  TEST: ("icdar_2015_test",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
  NUM_WORKERS: 4
  ASPECT_RATIO_GROUPING: False
SOLVER:
  BASE_LR: 0.01 #0.02
  WARMUP_FACTOR: 0.1
  WEIGHT_DECAY: 0.0001
  STEPS: (100000, 160000)
  MAX_ITER: 300000
  IMS_PER_BATCH: 4
OUTPUT_DIR: "./outputs/pretrain"
TEST:
  VIS: False
  CHAR_THRESH: 192
  IMS_PER_BATCH: 1
INPUT:
  MIN_SIZE_TRAIN: (600, 800)
  MAX_SIZE_TRAIN: 2333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333

2019-10-20 23:27:38,571 maskrcnn_benchmark INFO: Running with config:
DATALOADER:
  ASPECT_RATIO_GROUPING: False
  NUM_WORKERS: 4
  SIZE_DIVISIBILITY: 32
DATASETS:
  AUG: False
  RANDOM_CROP_PROB: 0.0
  RATIOS: []
  TEST: ('icdar_2015_test',)
  TRAIN: ('icdar_2015_train',)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 2333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: (600, 800)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  TO_BGR255: True
MODEL:
  BACKBONE:
    CONV_BODY: R-50-FPN
    FREEZE_CONV_BODY_AT: 2
    OUT_CHANNELS: 256
  CHAR_MASK_ON: True
  DEVICE: cuda
  MASK_ON: True
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    NUM_GROUPS: 1
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: True
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 64
  ROI_BOX_HEAD:
    FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 2
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    PREDICTOR: FPNPredictor
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.05
    USE_FPN: True
  ROI_MASK_HEAD:
    CHAR_NUM_CLASSES: 37
    CONV_LAYERS: (256, 256, 256, 256)
    FEATURE_EXTRACTOR: MaskRCNNFPNFeatureExtractor
    MASK_BATCH_SIZE_PER_IM: 64
    MLP_HEAD_DIM: 1024
    POOLER_RESOLUTION: 14
    POOLER_RESOLUTION_H: 16
    POOLER_RESOLUTION_W: 64
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    PREDICTOR: SeqCharMaskRCNNC4Predictor
    RESOLUTION: 28
    RESOLUTION_H: 32
    RESOLUTION_W: 128
    SHARE_BOX_FEATURE_EXTRACTOR: False
    USE_WEIGHTED_CHAR_MASK: True
  RPN:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    PRE_NMS_TOP_N_TRAIN: 2000
    STRADDLE_THRESH: 0
    USE_FPN: True
  RPN_ONLY: False
  WEIGHT: catalog://ImageNetPretrained/MSRA/R-50
OUTPUT_DIR: ./outputs/pretrain
PATHS_CATALOG: /home/home/p5/MaskTextSpotter/maskrcnn_benchmark/config/paths_catalog.py
SEQUENCE:
  BOS_TOKEN: 0
  MAX_LENGTH: 32
  MEAN_SCORE: False
  NUM_CHAR: 38
  SEQ_ON: False
  TEACHER_FORCE_RATIO: 1.0
  TWO_CONV: True
SOLVER:
  BASE_LR: 0.01
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 2500
  GAMMA: 0.1
  IMS_PER_BATCH: 4
  MAX_ITER: 300000
  MOMENTUM: 0.9
  RESUME: True
  STEPS: (100000, 160000)
  USE_ADAM: False
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 500
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0
TEST:
  CHAR_THRESH: 192
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 1
  VIS: False
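
The log above lists two GPUs, while the command starts eight processes with --nproc_per_node=8. Each process calls torch.cuda.set_device(args.local_rank), and ranks 2 through 7 have no matching device, which is consistent with the six "invalid device ordinal" failures. A small guard illustrating the constraint (the helper is hypothetical and takes the device count as a plain integer so that it runs without a GPU):

```python
def check_local_rank(local_rank, device_count):
    """Raise a readable error when a process rank has no matching GPU."""
    if not 0 <= local_rank < device_count:
        raise ValueError(
            f"local_rank {local_rank} is out of range for {device_count} "
            f"visible GPU(s); launch at most {device_count} processes per node"
        )
    return local_rank


# Eight launched processes on a two-GPU machine: ranks 2..7 are invalid.
invalid = []
for rank in range(8):
    try:
        check_local_rank(rank, device_count=2)
    except ValueError:
        invalid.append(rank)
print(invalid)  # [2, 3, 4, 5, 6, 7]
```

In a real script the count would come from torch.cuda.device_count(), and on this machine the launch flag would be --nproc_per_node=2 (or 1 to use only the GTX 1070 Ti).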

detection

How can I use only detection? Thank you.

Finetune [ValueError: Type mismatch] + [returned non-zero exit status 1.]

@MhLiao

When trying to fine-tune on icdar2015, I get a "returned non-zero exit status 1." error:

python3 -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/finetune.yaml

The terminal log:

(mask) home@home-desktop:~/p5/MaskTextSpotter$ python3 -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/finetune.yaml
Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
    main()
  File "tools/train_net.py", line 145, in main
    cfg.merge_from_file(args.config_file)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 213, in merge_from_file
    self.merge_from_other_cfg(cfg)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 456, in _merge_a_into_b
    v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/yacs/config.py", line 513, in _check_and_coerce_cfg_value_type
    original_type, replacement_type, original, replacement, full_key
ValueError: Type mismatch (<class 'tuple'> vs. <class 'str'>) with values (() vs. icdar_2015_train) for config key: DATASETS.TRAIN
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/mask/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/home/anaconda3/envs/mask/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/home/anaconda3/envs/mask/bin/python3', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/finetune.yaml']' returned non-zero exit status 1.
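
The message says the default for DATASETS.TRAIN is a tuple, but the value from the YAML file was read as a string. A one-element tuple needs a trailing comma; without it the parentheses are only grouping. The sketch below uses ast.literal_eval to show the difference. That yacs evaluates string values in a similar way is stated here as an assumption about its internals.

```python
import ast

with_comma = ast.literal_eval('("icdar_2015_train",)')
without_comma = ast.literal_eval('("icdar_2015_train")')

print(type(with_comma).__name__, with_comma)        # tuple ('icdar_2015_train',)
print(type(without_comma).__name__, without_comma)  # str icdar_2015_train
```

Writing the entry as TRAIN: ("icdar_2015_train",), as in the pretrain.yaml dump earlier on this page, would be expected to satisfy the type check.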

Please Help! shape mismatch!

RuntimeError: shape mismatch: value tensor of shape [52, 256, 16, 64] cannot be broadcast to indexing result of shape [52, 256, 16, 16]

ICDAR datasets throw Index out of range error in segmentation_mask.py

Hi @MhLiao,

Thanks so much for providing this excellent code base. My team at the US Geological Survey is using it to recognize characters on historical topographic maps. When training, the ICDAR datasets throw an index out of range error in segmentation_mask. If we don't use those datasets (by setting their ratio to zero in the config files), training proceeds normally. We downloaded the datasets from the location you indicate and placed them in the right spot with a train_list.txt file, as directed. Please advise us if possible.

Thanks so much,

Sam

Loss Fluctuation

After running for ~1 hour I see the following loss fluctuations. Is this normal? How long until things start to stabilize?

[Image: Screen Shot 2020-02-19 at 3 08 51 PM]

the SCUT datasets

The SCUT dataset, which has 1162 images, was used in the training process. Which datasets does it include? Could you provide a download link? Thanks.

Training datasets format

@MhLiao Hi, could you tell me the format of the training datasets, such as ICDAR13, ICDAR15, and syn, used when fine-tuning the model? Does the training process need ground truth for every character in each word? Thank you!

Question about Chinese recognition quality

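I trained a 4000-class Chinese model with Mask TextSpotter, using 40,000 lines of private curved Chinese text data. Sequence recognition works reasonably well, but character segmentation is poor. Is this because the shrink operation applied to Chinese characters hurts character segmentation, or because the large number of classes makes the task harder? Is this behavior expected?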

Pretrain [WARNING:root:NaN or Inf found in input tensor.]

@MhLiao Thank you for your hard work
My config file is attached: pretrain.yaml.zip

When trying to train on ICDAR2015 using Pretrain, I keep getting NaN errors.

My training command:

python3 -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/pretrain.yaml

INFO:maskrcnn_benchmark.trainer:

loss: nan (nan)  loss_classifier: nan (nan)  loss_box_reg: nan (nan)  loss_mask: nan (nan)  loss_char_mask: 0.0000 (0.0000)  loss_seq: nan (nan)  loss_objectness: nan (nan)  loss_rpn_box_reg: nan (nan)

The errors that I keep getting:

WARNING:root:NaN or Inf found in input tensor.

Use detection only

After I finish training the model, how can I use only detection? Thanks.
