yeungchenwa / ocr-sam

Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detect, recognize, and segment text instances, with several downstream tasks, e.g., Text Removal and Text Inpainting.

Language: Python 100.00%
Topics: mmocr, text-recognition, segment-anything, text-detection, text-inpainting, text-removal

ocr-sam's Introduction

Optical Character Recognition with Segment Anything (OCR-SAM)

🐇 Introduction 🐙

Can SAM be applied to OCR? We make a simple attempt to combine two off-the-shelf OCR models from MMOCR with SAM to build several OCR-related application demos, including SAM for Text, Text Removal, and Text Inpainting. We also provide a Gradio WebUI for easier interaction.
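
The core idea can be sketched in a few lines: MMOCR detects text polygons, and each polygon's bounding box is fed to SAM as a box prompt to get a pixel-accurate mask. This is a hedged sketch, not this repo's exact code; the model aliases, checkpoint path, and input path are illustrative.

import cv2
import numpy as np
from mmocr.apis import MMOCRInferencer
from segment_anything import SamPredictor, sam_model_registry

# Detect and recognize text with off-the-shelf MMOCR models.
mmocr = MMOCRInferencer(det='DBNetpp', rec='ABINet', device='cuda')
img = cv2.imread('demo.jpg')
pred = mmocr(img)['predictions'][0]

# Prompt SAM with each detected box to get a fine-grained text mask.
sam = sam_model_registry['vit_h'](checkpoint='checkpoints/sam/sam_vit_h_4b8939.pth')
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
for polygon in pred['det_polygons']:
    pts = np.array(polygon).reshape(-1, 2)
    box = np.array([pts[:, 0].min(), pts[:, 1].min(),
                    pts[:, 0].max(), pts[:, 1].max()])
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)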

📅 Updates 👀

  • 2023.08.23: 🔥 We created the repo yeungchenwa/Recommendations-Diffusion-Text-Image, a paper collection of recent diffusion models for text-image generation tasks.
  • 2023.04.14: 📣 Our repository was migrated to open-mmlab/playground.
  • 2023.04.12: Repository released.
  • 2023.04.12: Supported Inpainting, combining DBNet++, SAM, and Stable Diffusion.
  • 2023.04.11: Supported Erasing, combining DBNet++, SAM, and Latent Diffusion / Stable Diffusion.
  • 2023.04.10: Supported SAM for Text, combining DBNet++ and SAM.
  • 2023.04.09: How effective is SAM on OCR text images? We discuss it in the Blog.

📸 Demo Zoo 🔥

This project includes demos for SAM for Text, Text Removal (Erasing), and Text Inpainting.

🚧 Installation 🛠️

Prerequisites (Recommended)

  • Linux | Windows
  • Python 3.8
  • PyTorch 1.12
  • CUDA 11.3

Environment Setup

Clone this repo:

git clone https://github.com/yeungchenwa/OCR-SAM.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n ocr-sam python=3.8 -y
conda activate ocr-sam

Step 2: Install the matching PyTorch version following the instructions here.

# Suggested
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

Step 3: Install mmengine, mmcv, mmdet, mmcls, and mmocr.

pip install -U openmim
mim install mmengine
mim install mmocr
# On Windows, replace the single quotes ' with double quotes "
mim install 'mmcv==2.0.0rc4'
mim install 'mmdet==3.0.0rc5'
mim install 'mmcls==1.0.0rc5'


# Install sam
pip install git+https://github.com/facebookresearch/segment-anything.git

# Install required packages
pip install -r requirements.txt

Step 4: Prepare the dependencies for diffusers and latent-diffusion.

# Install Gradio
pip install gradio

# Install the diffusers
pip install diffusers

# Install the pytorch_lightning for ldm
pip install pytorch-lightning==2.0.1.post0
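
After installation, a quick sanity check can save debugging time later. This is a minimal sketch; the expected version strings assume the suggested installs above.

# check_env.py - verify the environment imports cleanly
import torch, mmengine, mmcv, mmdet, mmocr, diffusers, gradio

print('torch:', torch.__version__)            # expect 1.12.1+cu113
print('cuda available:', torch.cuda.is_available())
print('mmcv:', mmcv.__version__)              # expect 2.0.0rc4
print('mmocr:', mmocr.__version__)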

📒 Model checkpoints 🖥

We retrain DBNet++ with Swin Transformer V2 as the backbone on a combination of multiple scene text datasets (e.g., HierText, TextOCR). Checkpoint for DBNet++ on Google Drive (1G).

Then create the checkpoint directories as follows:

mkdir checkpoints
mkdir checkpoints/mmocr
mkdir checkpoints/sam
mkdir checkpoints/ldm
mv db_swin_mix_pretrain.pth checkpoints/mmocr

Download the remaining checkpoints to the corresponding paths (skip any you have already downloaded):

# mmocr recognizer ckpt
wget -O checkpoints/mmocr/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_20e_st-an_mj/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth

# sam ckpt, more details: https://github.com/facebookresearch/segment-anything#model-checkpoints
wget -O checkpoints/sam/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# ldm ckpt
wget -O checkpoints/ldm/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1
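
Optionally, verify that the downloads load before running the demos. This is a minimal sketch; the file names follow the commands above.

import torch
from segment_anything import sam_model_registry

# Building SAM fails loudly if the checkpoint is corrupt or incomplete.
sam = sam_model_registry['vit_h'](checkpoint='checkpoints/sam/sam_vit_h_4b8939.pth')

# MMOCR checkpoints are plain torch files; 'state_dict' should be among the keys.
ckpt = torch.load('checkpoints/mmocr/db_swin_mix_pretrain.pth', map_location='cpu')
print('mmocr ckpt keys:', list(ckpt.keys()))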

🏃🏻‍♂️ Run Demo 🏊‍♂️

SAM for Text🧐

Run the following script:

python mmocr_sam.py \
    --inputs /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda
  • --inputs: the path to your input image.
  • --outdir: the directory for your output.
  • --device: the device used for inference.

Erasing🤓

In this application demo, text is erased with either latent-diffusion inpainting or Stable-Diffusion inpainting guided by a text prompt; choose between the two with the --diffusion_model parameter. You can also choose whether the SAM output mask is used for erasing with the --use_sam parameter. More implementation details are listed here.

Run the following script:

python mmocr_sam_erase.py \
    --inputs /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda \
    --use_sam True \
    --dilate_iteration 2 \
    --diffusion_model latent-diffusion \
    --sd_ckpt None \
    --prompt None \
    --img_size '(512, 512)'
  • --inputs: the path to your input image.
  • --outdir: the directory for your output.
  • --device: the device used for inference.
  • --use_sam: whether to use SAM's mask for segmentation.
  • --dilate_iteration: number of iterations used to dilate SAM's mask (illustrated in the sketch after this list).
  • --diffusion_model: choose 'latent-diffusion' or 'stable-diffusion'.
  • --sd_ckpt: path to the stable-diffusion checkpoint.
  • --prompt: the text prompt used with stable-diffusion; set 'None' to use the default prompt for erasing.
  • --img_size: the image size used by latent-diffusion.
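
For intuition, --dilate_iteration expands the binary SAM mask so the diffusion model inpaints slightly beyond the text boundary. The following is a standalone sketch with a synthetic mask, not the repo's exact code.

import cv2
import numpy as np

# Stand-in for a SAM text mask: a filled rectangle on a black canvas.
mask = np.zeros((512, 512), np.uint8)
cv2.rectangle(mask, (200, 240), (320, 272), 255, -1)

# Each iteration grows the mask by roughly one kernel radius.
kernel = np.ones((3, 3), np.uint8)
dilated = cv2.dilate(mask, kernel, iterations=2)  # --dilate_iteration 2
print(mask.sum(), '->', dilated.sum())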

Run the WebUI: see here

Note: The first run may take a while because the stable-diffusion checkpoint has to be downloaded, so wait patiently 👀.

Inpainting

More implementation details are listed here

Run the following script:

python mmocr_sam_inpainting.py \
    --img_path /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda \
    --prompt YOUR_PROMPT \
    --select_index 0
  • --img_path: the path to your input image.
  • --outdir: the directory for your output.
  • --device: the device used for inference.
  • --prompt: the text prompt.
  • --select_index: the index of the detected text instance to inpaint.
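
Under the hood, this kind of text inpainting follows the standard diffusers inpainting API. Below is a hedged sketch; the model id and file names are illustrative and not necessarily what this repo uses.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    'runwayml/stable-diffusion-inpainting', torch_dtype=torch.float16).to('cuda')

# The mask is white where text should be replaced, black elsewhere.
image = Image.open('input.jpg').convert('RGB').resize((512, 512))
mask = Image.open('text_mask.png').convert('RGB').resize((512, 512))

result = pipe(prompt='YOUR_PROMPT', image=image, mask_image=mask).images[0]
result.save('inpainted.jpg')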

Run WebUI

This repo also provides a WebUI (built with Gradio) covering both Erasing and Inpainting.

Before running the script, you should install the gradio package:

pip install gradio
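
For reference, the apps follow the usual Gradio pattern: an image in, a processed image out. This is a minimal sketch, not the repo's actual app code.

import gradio as gr

def erase_text(image):
    # ... run detection, SAM masking, and diffusion erasing here ...
    return image

demo = gr.Interface(fn=erase_text, inputs=gr.Image(), outputs=gr.Image())
demo.launch()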

Erasing

python mmocr_sam_erase_app.py
  • Example:

Detector and Recognizer WebUI Result

Erasing WebUI Result

In the WebUI, users can interactively choose the SAM output and the diffusion model. In particular, users can choose which text instances to erase.

Inpainting🥸

python mmocr_sam_inpainting_app.py
  • Example:

Inpainting WebUI Result

Note: The web page may take some time to open the first time, so wait patiently 👀.

💗 Acknowledgement

ocr-sam's People

Contributors

flrngel, mountchicken, yeungchenwa


ocr-sam's Issues

Question about the SAM for Text part

Could you share the source of the training data and the rough training details for the pretrained text-detection weights used in SAM for Text?
Thanks in advance for your answer!

ldm ckpt not available

Failed to run `wget -O checkpoints/ldm/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1`. The error says that no matching source is found at the mentioned link.

'mmcls.SwinTransformerV2 is not in the model registry

Trying to run `python ocr_sam.py` with the documented arguments, but I got the error below:

KeyError: "class `DBNet` in mmocr/models/textdet/detectors/dbnet.py: 'mmcls.SwinTransformerV2 is not in the model registry. 
Please check whether the value of `mmcls.SwinTransformerV2` is correct or it was registered as expected. 
More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'"

Could you show me how to solve this issue? Thank you!
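
A hedged suggestion (an assumption, not a confirmed fix for this exact error): mmengine only finds mmcls.SwinTransformerV2 if mmcls's models have been imported and registered, which a config can request via mmengine's custom_imports mechanism.

# At the top level of the detector config file:
custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)

# Or, equivalently, import mmcls's models before building the model in Python:
# import mmcls.models  # noqa: F401  (registers SwinTransformerV2 etc.)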

About the example images

Wonderful work! But I have a question about your test images: are they part of the datasets you used for training? I tried some of your images on another LLM and observed superior performance, so I wonder whether these images are part of classic datasets that the LLM may also have used for training.

Argument parsing issue in `mmocr_sam_erase.py`

Hello,

I tried to run the mmocr_sam_erase.py example as instructed in README.md (with latent-diffusion as pasted below).

python mmocr_sam_erase.py \ 
    --inputs /YOUR/INPUT/IMG_PATH \ 
    --outdir /YOUR/OUTPUT_DIR \ 
    --device cuda \ 
    --use_sam True \ 
    --dilate_iteration 2 \ 
    --diffusion_model latent-diffusion \ 
    --sd_ckpt None \ 
    --prompt None \ 
    --img_size (512, 512) \ 

There are two issues:

  1. It seems the following argument parsing does not function as expected. Specifying type=tuple actually reads the img_size argument as a string and evaluates tuple("(512, 512)"), which results in a tuple of individual characters and causes errors later in the code.

OCR-SAM/mmocr_sam_erase.py

Lines 113 to 118 in e50eeb2

parser.add_argument(
    "--img_size",
    type=tuple,
    default=(512, 512),
    help='If use latetn-diffusion for erasing, set the ldm-inpainting '
    'image size, also if want to use original size, set `None`')

  2. Spaces after a backslash \, and the backslash on the last line, are not allowed (at least in bash), so copying and pasting the instructions from README.md usually does not work. You should probably remove the final backslash and the spaces after every backslash (i.e., change `\ ` to `\`).
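
For the first issue, one possible fix is to parse the string explicitly instead of relying on type=tuple. This is a sketch, not the repo's code; the helper name is made up.

import argparse

def img_size_type(value):
    # Turn "(512, 512)" or "512,512" into a tuple of two ints;
    # keep 'None' to mean "use the original size".
    if value == 'None':
        return None
    w, h = (int(v) for v in value.strip('() ').split(','))
    return (w, h)

parser = argparse.ArgumentParser()
parser.add_argument('--img_size', type=img_size_type, default=(512, 512))
print(parser.parse_args(['--img_size', '(512, 512)']).img_size)  # (512, 512)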

Using direct download link for DBNet++ checkpoints

How about using a direct download link for the following DBNet++ checkpoint? Then it could be pasted into the command line and downloaded without needing a web browser.

We retrain DBNet++ with Swin Transformer V2 as the backbone on a combination of multiple scene text datasets (e.g. HierText, TextOCR). **Checkpoint for DBNet++ on [Google Drive (1G)](https://drive.google.com/file/d/1r3B1xhkyKYcQ9SR7o9hw9zhNJinRiHD-/view?usp=share_link)**.

Case sensitive?

Thank you very much for sharing!
Is the OCR in this project case sensitive?

Dataset question

Hello!

Could you tell me which training data was used for the pretrained weights of the Erasing part (DBNet++ + SAM + Latent Diffusion / Stable Diffusion) in OCR-SAM? I only need to know the source. Thank you!

About the mmocr_inferencer error

When I run to this line in mmocr_sam.py, a NotImplementedError occurs.

# MMOCR inference
result = mmocr_inferencer(
    img, save_vis=True, out_dir=args.outdir)['predictions'][0]

The full error is:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[17], line 3
      1 img = cv2.imread(ori_input)
      2 # MMOCR inference
----> 3 result = mmocr_inferencer(
      4     img, save_vis=True, out_dir=args.outdir)['predictions'][0]

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmocr\apis\inferencers\mmocr_inferencer.py:317, in MMOCRInferencer.__call__(self, inputs, batch_size, det_batch_size, rec_batch_size, kie_batch_size, out_dir, return_vis, save_vis, save_pred, **kwargs)
    315 results = {'predictions': [], 'visualization': []}
    316 for ori_input in track(chunked_inputs, description='Inference'):
--> 317     preds = self.forward(
    318         ori_input,
    319         det_batch_size=det_batch_size,
    320         rec_batch_size=rec_batch_size,
    321         kie_batch_size=kie_batch_size,
    322         **forward_kwargs)
    323     visualization = self.visualize(
    324         ori_input, preds, img_out_dir=img_out_dir, **visualize_kwargs)
    325     batch_res = self.postprocess(
    326         preds,
    327         visualization,
    328         pred_out_dir=pred_out_dir,
    329         **postprocess_kwargs)

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmocr\apis\inferencers\mmocr_inferencer.py:154, in MMOCRInferencer.forward(self, inputs, batch_size, det_batch_size, rec_batch_size, kie_batch_size, **forward_kwargs)
    152     result['rec'] = [[p] for p in predictions]
    153 elif self.mode.startswith('det'):  # 'det'/'det_rec'/'det_rec_kie'
--> 154     result['det'] = self.textdet_inferencer(
    155         inputs,
    156         return_datasamples=True,
    157         batch_size=det_batch_size,
    158         **forward_kwargs)['predictions']
    159     if self.mode.startswith('det_rec'):  # 'det_rec'/'det_rec_kie'
    160         result['rec'] = []

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmocr\apis\inferencers\base_mmocr_inferencer.py:191, in BaseMMOCRInferencer.__call__(self, inputs, return_datasamples, batch_size, progress_bar, return_vis, show, wait_time, draw_pred, pred_score_thr, out_dir, save_vis, save_pred, print_result, **kwargs)
    188 inputs = self.preprocess(
    189     ori_inputs, batch_size=batch_size, **preprocess_kwargs)
    190 results = {'predictions': [], 'visualization': []}
--> 191 for ori_inputs, data in track(
    192         inputs, description='Inference', disable=not progress_bar):
    193     preds = self.forward(data, **forward_kwargs)
    194     visualization = self.visualize(
    195         ori_inputs, preds, img_out_dir=img_out_dir, **visualize_kwargs)

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\rich\progress.py:168, in track(sequence, description, total, auto_refresh, console, transient, get_time, refresh_per_second, style, complete_style, finished_style, pulse_style, update_period, disable, show_speed)
    157 progress = Progress(
    158     *columns,
    159     auto_refresh=auto_refresh,
   (...)
    164     disable=disable,
    165 )
    167 with progress:
--> 168     yield from progress.track(
    169         sequence, total=total, description=description, update_period=update_period
    170     )

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\rich\progress.py:1210, in Progress.track(self, sequence, total, task_id, description, update_period)
   1208 if self.live.auto_refresh:
   1209     with _TrackThread(self, task_id, update_period) as track_thread:
-> 1210         for value in sequence:
   1211             yield value
   1212             track_thread.completed += 1

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmocr\apis\inferencers\base_mmocr_inferencer.py:80, in BaseMMOCRInferencer.preprocess(self, inputs, batch_size, **kwargs)
     70 """Process the inputs into a model-feedable format.
     71
     72 Args:
   (...)
     77     Any: Data processed by the ``pipeline`` and ``collate_fn``.
     78 """
     79 chunked_data = self._get_chunk_data(inputs, batch_size)
---> 80 yield from map(self.collate_fn, chunked_data)

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmocr\apis\inferencers\base_mmocr_inferencer.py:98, in BaseMMOCRInferencer._get_chunk_data(self, inputs, chunk_size)
     96 for _ in range(chunk_size):
     97     inputs_ = next(inputs_iter)
---> 98     pipe_out = self.pipeline(inputs_)
     99     if pipe_out['data_samples'].get('img_path') is None:
    100         pipe_out['data_samples'].set_metainfo(
    101             dict(img_path=f'{self.num_unnamed_imgs}.jpg'))

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmengine\dataset\base_dataset.py:58, in Compose.__call__(self, data)
     49 """Call function to apply transforms sequentially.
     50
     51 Args:
   (...)
     55    dict: Transformed data.
     56 """
     57 for t in self.transforms:
---> 58     data = t(data)
     59     # The transform will return None when it failed to load images or
     60     # cannot find suitable augmentation parameters to augment the data.
     61     # Here we simply return None if the transform returns None and the
     62     # dataset will handle it by randomly selecting another data sample.
     63     if data is None:

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmcv\transforms\base.py:12, in BaseTransform.__call__(self, results)
      9 def __call__(self,
     10              results: Dict) -> Optional[Union[Dict, Tuple[List, List]]]:
---> 12     return self.transform(results)

File c:\Users\CHEN\anaconda3\envs\ocr-sam\lib\site-packages\mmocr\datasets\transforms\loading.py:236, in InferencerLoader.transform(self, single_input)
    234     inputs = single_input
    235 else:
--> 236     raise NotImplementedError
    238 if 'img' in inputs:
    239     return self.from_ndarray(inputs)

NotImplementedError:
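
A hedged guess at the cause (an assumption read off the traceback, not a confirmed diagnosis): InferencerLoader raises NotImplementedError when the input is neither a path, a dict, nor an ndarray, and cv2.imread silently returns None for an unreadable path, so checking the image load is a cheap first step.

import cv2

# If this assertion fires, the NotImplementedError above is a path problem.
img = cv2.imread('input.jpg')
assert img is not None, 'cv2.imread returned None - check the image path'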

Question about deployment

How can the models be deployed on a Mac? I am currently working on this: deploying SAM + Stable Diffusion together with the text detection / text recognition models on macOS.

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:


Best Wishes,

Qiao

errors when loading model

Running python mmocr_sam_erase_app.py gives:

missing keys in source state_dict: backbone.stages.0.blocks.0.attn.w_msa.relative_coords_table, backbone.stages.0.blocks.0.attn.w_msa.relative_position_index, backbone.stages.0.blocks.1.attn.w_msa.relative_coords_table, backbone.stages.0.blocks.1.attn.w_msa.relative_position_index, backbone.stages.1.blocks.0.attn.w_msa.relative_coords_table, backbone.stages.1.blocks.0.attn.w_msa.relative_position_index, backbone.stages.1.blocks.1.attn.w_msa.relative_coords_table, backbone.stages.1.blocks.1.attn.w_msa.relative_position_index, backbone.stages.2.blocks.0.attn.w_msa.relative_coords_table, ........

video pipeline

That looks great! Any ideas on how to build a fast pipeline for video?

The effect is not very good

Thanks for your contribution! I ran the demo on an image (attached), and in the output there are many characters that were not found.

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Inference ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Traceback (most recent call last):
  File "mmocr_sam_erase.py", line 215, in <module>
    result = mmocr_inferencer(
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmocr/apis/inferencers/mmocr_inferencer.py", line 317, in __call__
    preds = self.forward(
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmocr/apis/inferencers/mmocr_inferencer.py", line 154, in forward
    result['det'] = self.textdet_inferencer(
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmocr/apis/inferencers/base_mmocr_inferencer.py", line 193, in __call__
    preds = self.forward(data, **forward_kwargs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmengine/infer/infer.py", line 296, in forward
    return self.model.test_step(inputs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
    results = self(**data, mode=mode)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmocr/models/textdet/detectors/base.py", line 74, in forward
    return self.predict(inputs, data_samples)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmocr/models/textdet/detectors/single_stage_text_detector.py", line 106, in predict
    x = self.extract_feat(inputs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmocr/models/textdet/detectors/single_stage_text_detector.py", line 57, in extract_feat
    inputs = self.backbone(inputs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmcls/models/backbones/swin_transformer_v2.py", line 478, in forward
    x, hw_shape = self.patch_embed(x)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/mmcv/cnn/bricks/transformer.py", line 270, in forward
    x = self.projection(x)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/data/home/winderzhang/.conda/envs/ocr-sam/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
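
A hedged note (common experience with this error class, not a verified fix for this report): CUDNN_STATUS_INTERNAL_ERROR is often a GPU out-of-memory or cuDNN autotuning problem in disguise, so checking free memory and disabling benchmarking are cheap first diagnostic steps.

import torch

torch.backends.cudnn.benchmark = False  # avoid large cuDNN autotuner workspaces
print(torch.cuda.get_device_name(0))
print(torch.cuda.mem_get_info())  # (free_bytes, total_bytes)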

When using the Inpainting demo there is an error


When following these operations:
(ocr-sam) PS D:\Python\text_removal\OCR-SAM> python mmocr_sam_inpainting.py `

--img_path "D:\Python\text_removal\OCR-SAM\image_test\images_in\13196_4.jpg" `
--outdir "D:\Python\text_removal\OCR-SAM\image_test\images_out" `
--device cuda `
--sam_checkpoint "D:\Python\text_removal\OCR-SAM\checkpoints\sam\sam_vit_h_4b8939.pth" `
--prompt "None" `
--select_index 0

an error occurs:
Traceback (most recent call last):
  File "mmocr_sam_inpainting.py", line 87, in <module>
    args.det_config,
AttributeError: 'Namespace' object has no attribute 'det_config'
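
A hedged sketch of a possible fix (the default path below is illustrative, not the repo's actual value): the script reads args.det_config, so its argument parser needs a matching argument.

# In mmocr_sam_inpainting.py's argument parser:
parser.add_argument(
    '--det_config',
    type=str,
    default='path/to/your/dbnetpp_config.py',  # illustrative placeholder
    help='Path to the text detection config file.')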

Difference in performance between mmocr_sam_erase.py and mmocr_sam_erase_app.py

Hello!
First of all, thank you so much for providing such good source code.
I was interested in the text-erasing task among your projects. Of the two '.py' files that erase text, mmocr_sam_erase_app.py (using Gradio) performed much better: it erases the text cleanly, while the result of mmocr_sam_erase.py feels like it changes the shape of the letters rather than erasing them. Of course, I used the same dilate_iteration and img_size for both.

Is this a common phenomenon?
I look forward to hearing from you. Thank you!
