surya-rs

A Rust implementation of surya, a multilingual document OCR toolkit. The implementation is based on a modified version of SegFormer and on OpenCV.

Please refer to the original project for more details on licensing of the weights.

Roadmap

This project is still in development; feel free to star it and check back.

  • model structure, segformer (for inference only)
  • weights loading
  • image input pre-processing
  • heatmap and affinity map
  • bboxes
  • image splitting and stitching
  • text recognition
  • benchmark
  • quantization

How to build and install

Set up the Rust toolchain if you haven't yet:

# visit https://rustup.rs/ for more detailed information
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install LLVM and OpenCV (example on macOS):

brew install llvm opencv

Build and install the binary:

# run this first on a Mac with an M1 chip
export DYLD_FALLBACK_LIBRARY_PATH="$(xcode-select --print-path)/usr/lib/"
# run this first on other Macs
export DYLD_FALLBACK_LIBRARY_PATH="$(xcode-select --print-path)/Toolchains/XcodeDefault.xctoolchain/"
# optionally, you can include features such as accelerate, metal, mkl, etc.
cargo install --path . --features=cli

The built binary does not include the weights file; instead, it downloads the weights via the Hugging Face Hub API. Once downloaded, the weights file is cached in the Hugging Face cache directory.
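To make the caching behaviour easier to inspect, here is a minimal sketch of where a cached model repository would typically live. It assumes the conventional Hugging Face hub layout (`$HF_HOME/hub` if `HF_HOME` is set, otherwise `<home>/.cache/huggingface/hub`, with one `models--<owner>--<name>` directory per repo). The function names `hub_cache_root` and `model_repo_dir` are hypothetical helpers for illustration and are not part of surya-rs.

```rust
use std::path::{Path, PathBuf};

/// Resolve the hub cache root.
/// Assumption: `$HF_HOME/hub` when HF_HOME is set and non-empty,
/// otherwise `<home_dir>/.cache/huggingface/hub`.
fn hub_cache_root(hf_home: Option<&str>, home_dir: &str) -> PathBuf {
    let mut root = match hf_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(home_dir).join(".cache").join("huggingface"),
    };
    root.push("hub");
    root
}

/// Directory that would hold a cached model repo.
/// Assumption: repo ids such as "vikp/line_detector" map to
/// a folder named "models--vikp--line_detector".
fn model_repo_dir(cache_root: &Path, repo_id: &str) -> PathBuf {
    cache_root.join(format!("models--{}", repo_id.replace('/', "--")))
}
```

Passing `std::env::var("HF_HOME").ok().as_deref()` as the first argument would pick up the environment at run time. Deleting the resulting directory should force a fresh download, provided the layout assumption holds.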

Check -h for help:

Surya is a multilingual document OCR toolkit, original implementation in Python and PyTorch

Usage: surya [OPTIONS] <IMAGE>

Arguments:
  <IMAGE>  path to image

Options:
      --batch-size <BATCH_SIZE>
          detection batch size, if not supplied defaults to 2 on CPU and 16 on GPU
      --model-repo <MODEL_REPO>
          detection model's hugging face repo [default: vikp/line_detector]
      --weights-file-name <WEIGHTS_FILE_NAME>
          detection model's weights file name [default: model.safetensors]
      --config-file-name <CONFIG_FILE_NAME>
          detection model's config file name [default: config.json]
      --non-max-suppression-threshold <NON_MAX_SUPPRESSION_THRESHOLD>
          a value between 0.0 and 1.0 to filter low density part of heatmap [default: 0.35]
      --extract-text-threshold <EXTRACT_TEXT_THRESHOLD>
          a value between 0.0 and 1.0 to filter out bbox with low heatmap density [default: 0.6]
      --bbox-area-threshold <BBOX_AREA_THRESHOLD>
          a pixel threshold to filter out small area bbox [default: 10]
      --polygons
          whether to output polygons json file
      --image
          whether to generate bbox image
      --heatmap
          whether to generate heatmap
      --affinity-map
          whether to generate affinity map
      --output-dir <OUTPUT_DIR>
          output directory, under which the input image will be generating a subdirectory [default: ./surya_output]
      --device <DEVICE_TYPE>
          device type, if not specified will try to use GPU or Metal [possible values: cpu, gpu, metal]
      --verbose
          whether to enable verbose mode
  -h, --help
          Print help
  -V, --version
          Print version
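The three threshold options and the batch-size default can be read as a small post-processing pipeline. The sketch below is only an illustration of that reading and is not the code surya-rs actually runs. The `BBox` type, the function names, the use of the mean heatmap value as "density", and the `>=` comparisons are all assumptions. Only the default values (0.35, 0.6, 10, and batch sizes 2 and 16) come from the help text.

```rust
/// Axis-aligned box in heatmap pixel coordinates (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct BBox {
    x: usize,
    y: usize,
    width: usize,
    height: usize,
}

/// Default detection batch size as described by `--batch-size`.
fn default_batch_size(on_gpu: bool) -> usize {
    if on_gpu { 16 } else { 2 }
}

/// Mark pixels at or above the threshold (compare `--non-max-suppression-threshold`).
fn binarize(heatmap: &[f32], threshold: f32) -> Vec<bool> {
    heatmap.iter().map(|&v| v >= threshold).collect()
}

/// Mean heatmap value inside a box. The heatmap is row-major with
/// `map_width` columns; the box is assumed to lie inside the map.
fn mean_density(heatmap: &[f32], map_width: usize, b: &BBox) -> f32 {
    let area = b.width * b.height;
    if area == 0 {
        return 0.0;
    }
    let mut sum = 0.0f32;
    for row in b.y..b.y + b.height {
        let start = row * map_width + b.x;
        sum += heatmap[start..start + b.width].iter().sum::<f32>();
    }
    sum / area as f32
}

/// Drop boxes that are too small (compare `--bbox-area-threshold`) or whose
/// mean density is too low (compare `--extract-text-threshold`).
fn filter_boxes(
    heatmap: &[f32],
    map_width: usize,
    boxes: &[BBox],
    extract_text_threshold: f32,
    bbox_area_threshold: usize,
) -> Vec<BBox> {
    boxes
        .iter()
        .filter(|b| b.width * b.height >= bbox_area_threshold)
        .filter(|b| mean_density(heatmap, map_width, b) >= extract_text_threshold)
        .cloned()
        .collect()
}
```

Under this reading, raising `--extract-text-threshold` or `--bbox-area-threshold` yields fewer, more confident boxes, while lowering `--non-max-suppression-threshold` lets fainter heatmap regions contribute to candidate boxes.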

You can also set the SURYA_LOG environment variable to control the logging level:

export SURYA_LOG=warn # or error, info, debug, etc.
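For readers embedding similar behaviour in their own tools, the following is a minimal, standard-library-only sketch of mapping such a variable to a log level. The `LogLevel` enum, the accepted names, and the `parse_log_level` helper are assumptions for illustration. How surya-rs actually interprets `SURYA_LOG` is not specified here.

```rust
/// Hypothetical set of log levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogLevel {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parse a level name case-insensitively.
/// Returns None when the variable is unset or the name is not recognised,
/// so the caller decides on a default.
fn parse_log_level(value: Option<&str>) -> Option<LogLevel> {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("error") => Some(LogLevel::Error),
        Some("warn") => Some(LogLevel::Warn),
        Some("info") => Some(LogLevel::Info),
        Some("debug") => Some(LogLevel::Debug),
        Some("trace") => Some(LogLevel::Trace),
        _ => None,
    }
}
```

A caller could feed it the environment with `parse_log_level(std::env::var("SURYA_LOG").ok().as_deref())`.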

Library

This library is also published as a crate for other Rust projects to use.
