the-black-knight-01 / tabulo

Table Detection and Extraction Using Deep Learning (built in Python, using Luminoth, TensorFlow<2.0 and Sonnet)

Home Page: https://interviewbubble.com

License: BSD 3-Clause "New" or "Revised" License

Python 97.88% JavaScript 1.04% CSS 0.72% HTML 0.37%
table-detection-using-deep-learning deep-learning table-detection tensorflow luminoth python detection sonnet tabulo faster-r-cnn

tabulo's Issues

tabulo: command not found

When installing Tabulo, I get the output below:

Installing collected packages: tabulo
  Attempting uninstall: tabulo
    Found existing installation: tabulo 0.2.4.dev0
    Uninstalling tabulo-0.2.4.dev0:
      Successfully uninstalled tabulo-0.2.4.dev0
  Running setup.py develop for tabulo
Successfully installed tabulo

After this, I run the following from my CLI:

tabulo --help

I get the error below:

-bash: tabulo: command not found

I am running Python on Mac OS X. What am I doing wrong?
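
One way to narrow this down is to check where the current Python interpreter installs console scripts and whether that directory is on PATH. The sketch below uses only the standard library. The helper name `find_console_script` is hypothetical, and it assumes the script is run with the same interpreter that pip used for the install.

```python
import os
import shutil
import sysconfig


def find_console_script(name):
    """Report where `name` resolves on PATH and where this interpreter installs scripts."""
    scripts_dir = sysconfig.get_path("scripts")  # typically <env>/bin on macOS and Linux
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    return {
        "resolved": shutil.which(name),  # None if the shell would not find it
        "scripts_dir": scripts_dir,
        "scripts_dir_on_path": scripts_dir in path_dirs,
        "exists_in_scripts_dir": os.path.exists(os.path.join(scripts_dir, name)),
    }


for key, value in find_console_script("tabulo").items():
    print(f"{key}: {value}")
```

If `exists_in_scripts_dir` is true but `scripts_dir_on_path` is false, the script was installed but the shell cannot see it.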

Is there any news on support of newer versions of Tensorflow?

I'm a complete newbie when it comes to data science. This project would greatly help my own, but sadly I cannot run it because of the TF version requirement, and as far as I'm aware Apple doesn't support TF1.5 (I'm using an M1 Mac). Any help is greatly appreciated.

Error: Missing image

Hi,
I have installed all the packages properly and I have been trying the following curl command from my MacBook terminal:

curl -X POST http://localhost:5000/api/fasterrcnn/predict/ -H 'Content-Type: application/x-www-form-urlencoded' -H 'Postman-Token: 70478bd2-e1e8-442f-b0bf-ea5ecf7bf4d8' -H 'cache-control: no-cache' -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' -F "image=@/Users/rudra/Desktop/page_12.jpg"
But it keeps returning an error: {"error":"Missing image"}

Can anyone please help me solve this? I also tried -F image=@/Users/rudra/Desktop/page_12.jpg, with no success. I even tried converting the curl request into a Python POST request, but got the same error.

Thanks in advance.
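
A possible cause, not confirmed in this thread, is the hand-written Content-Type headers: a multipart body can only be parsed when the boundary named in the header matches the one used in the body, so it is usually safer to let the client generate both. The sketch below builds such a request with the standard library only. `build_image_upload` is a hypothetical helper; the URL and the `image` field name are taken from the curl command above.

```python
import mimetypes
import os
import uuid
import urllib.request


def build_image_upload(url, image_path, field="image"):
    """Build a multipart/form-data POST whose header and body share one boundary."""
    boundary = uuid.uuid4().hex
    filename = os.path.basename(image_path)
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    with open(image_path, "rb") as f:
        payload = f.read()
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # The only Content-Type header sent is the one carrying the generated boundary.
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Usage (needs the Tabulo server running locally):
# req = build_image_upload("http://localhost:5000/api/fasterrcnn/predict/",
#                          "/Users/rudra/Desktop/page_12.jpg")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

The equivalent experiment with curl would be to keep the -F option and drop the explicit content-type headers.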

Checkpoints not downloading

Running the command "tabulo checkpoint list" gave the response below:

image

The checkpoint 6aac7a1e8a8e does not appear in the list, though it is present in the luminoth/checkpoints folder.
I also tried the command with another checkpoint that does appear in the list (server web --checkpoint aad6912e94d9),
but it says "Checkpoint not present locally" and starts a download that never completes.

image

Please help me with this.

#luminoth #checkpoints #tabulo

Thanks.

What training data used

Great work. May I ask what training data you used for the pre-trained table detection model? Thanks!

clarifications

Hi,

I have a few questions about Tabulo:

  1. I am unable to download the checkpoint that is trained on tables. I can only see the COCO-dataset Faster R-CNN and SSD checkpoints.

  2. How does Tabulo work? I don't see any installation of Tesseract, so how is it able to display the characters from a table image?

Regards
Sekar

Text extraction technique

Hi, great work! I am mostly interested in text extraction. Could you tell me which module is used for text extraction in the pipeline?

Official Docker Image

Thanks for your project.

An official docker image would be great to get started quickly.

Error: No such command "sever".

I get Error: No such command "sever". when I try to run the Tabulo service with tabulo sever web --checkpoint 6aac7a1e8a8e, after downloading and unzipping the pretrained models into the luminoth/utils/pretrained_models directory.

Pretrained models not showing up checkpoint list

Hi, I'm trying to use the pre-trained model. I've downloaded the Google Drive folder and placed it in pretrained_models. When I run tabulo checkpoint list, I do not see the local checkpoint.

Any idea what is going on?

Unable to predict table

I tried tabulo predict on one of the PDF page images from your samples, but it didn't detect anything. Below is the full output.

Found 1 files to predict.
Neither checkpoint not config specified, assuming accurate.
Checkpoint not present locally. Want to download it? [y/N]: y
Downloading checkpoint...
Importing checkpoint... done.
Checkpoint imported successfully.
2020-02-19 17:16:24.245652: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: AVX AVX2
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-02-19 17:16:24.248831: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
Predicting page_8-min.jpg...OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 13264 thread 0 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 12156 thread 1 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 16980 thread 2 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 17288 thread 3 bound to OS proc set 6
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 22584 thread 4 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 18916 thread 5 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 22892 thread 6 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 896 thread 8 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 10468 thread 7 bound to OS proc set 7
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 11156 thread 9 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 20656 thread 10 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 14932 thread 11 bound to OS proc set 6
done.
{"file": "page_8-min.jpg", "objects": []}
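
The last line of the output is the JSON result, and an empty `objects` list means that no tables were detected on the page. When predicting over many files, a small script can flag such pages. This is a minimal sketch that assumes one JSON object per line, as in the output above. `files_without_detections` is a hypothetical helper, not part of Tabulo.

```python
import json


def files_without_detections(lines):
    """Return the file names whose prediction contains an empty `objects` list."""
    empty = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip log lines mixed into the output
        try:
            result = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a complete JSON object, ignore it
        if not result.get("objects"):
            empty.append(result.get("file"))
    return empty


sample = [
    "Predicting page_8-min.jpg... done.",
    '{"file": "page_8-min.jpg", "objects": []}',
]
print(files_without_detections(sample))  # ['page_8-min.jpg']
```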

Failed to create process

I installed Tabulo, and when I try to run tabulo --help, I get an error saying "Failed to create process". Can someone help me solve it?

Tensorflow 2.0 Support

I followed the installation process until the following command :

tabulo checkpoint list

I get the following error after executing the command:

Error:
AttributeError: module 'tensorflow' has no attribute 'contrib'

I am guessing it's because I am using TensorFlow 2.0 and the models don't support it. Is there a quick workaround for this?
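
`tf.contrib` was removed in TensorFlow 2.0, and the project description states TensorFlow<2.0, so checking the installed version first can confirm the cause. Below is a minimal sketch using only the standard library (Python 3.8+). `is_supported_tf` and `installed_tf_version` are hypothetical helper names, and the lookup assumes the distribution is installed under the name `tensorflow`.

```python
from importlib import metadata


def is_supported_tf(version):
    """True if `version` (e.g. '1.15.0') is below 2.0, as the project description requires."""
    major = int(version.split(".")[0])
    return major < 2


def installed_tf_version(dist_name="tensorflow"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_tf_version()
if version is None:
    print("tensorflow is not installed in this environment")
elif is_supported_tf(version):
    print(f"tensorflow {version}: below 2.0")
else:
    print(f"tensorflow {version}: 2.0 or newer, tf.contrib is not available")
```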

slow process

Is it normal that the command tabulo predict takes so long to run (about 30 seconds per image)?

Luminoth archived

Hi guys, Luminoth was archived last month. Do you have plans to migrate to another toolkit? The project is awesome.

Version conflict while installing click for Tabulo

While running tabulo --help, I encounter the following error.
There seems to be a conflict between two libraries, each of which wants a specific version of click. I have downloaded both versions of click, 6.7 and 7.1.2, but I still get the error.

Traceback (most recent call last):
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (click 6.7 (/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages), Requirement.parse('click>=7.1.2'), {'Flask'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gurjot/Tabulo/tabenv/bin/tabulo", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3252, in <module>
    def _initialize_master_working_set():
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
    f(*args, **kwargs)
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 585, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 598, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (click 6.7 (/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages), Requirement.parse('click>=7.1.2'), {'Flask'})
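
The last line of the traceback says that click 6.7 is installed while Flask requires click>=7.1.2. To see every installed package that constrains click, the environment's metadata can be queried. Below is a minimal sketch with the standard library (Python 3.8+). `who_requires` is a hypothetical helper, and its name matching is deliberately simple.

```python
import re
from importlib import metadata


def who_requires(package):
    """Return sorted (distribution name, requirement string) pairs that mention `package`."""
    target = package.lower().replace("_", "-")
    hits = []
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # The requirement name is the leading run of name characters,
            # e.g. "click" in "click>=7.1.2".
            match = re.match(r"[A-Za-z0-9._-]+", req)
            if match and match.group(0).lower().replace("_", "-") == target:
                hits.append((dist.metadata["Name"] or "?", req))
    return sorted(hits)


for name, requirement in who_requires("click"):
    print(f"{name} requires {requirement}")
```

Run inside the affected virtual environment, this lists the constraints side by side, which shows whether a single click version can satisfy all of them.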

Not able to predict default page_8-min.jpg using Postman

I downloaded a JPEG into the Tabulo directory (which I cloned) and tried to send the request from Postman by importing the curl command, with:
Key: 'image'
Value: '2_resized.jpg'

The error is
{
"error": "Missing image"
}

400 BAD REQUEST Error

bug in predict.py script

The predict function doesn't work from the command line but works well in the web app interface.

You should replace:

# Open and read the image to predict.

with tf.gfile.Open(path, 'rb') as f:
    try:
        image = Image.open(f).convert('RGB')
    except (tf.errors.OutOfRangeError, OSError) as e:
        click.echo()
        click.echo('Error while processing {}: {}'.format(path, e))
        return

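with:

# Requires `import cv2` and `import numpy as np` at the top of the file, if not already imported.
# Open and read the image to predict.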

with tf.gfile.Open(path, 'rb') as f:
    try:
        image = Image.open(f).convert('RGB')
        img = np.asarray(image)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        b = cv2.distanceTransform(img, distanceType=cv2.DIST_L2, maskSize=5)
        g = cv2.distanceTransform(img, distanceType=cv2.DIST_L1, maskSize=5)
        r = cv2.distanceTransform(img, distanceType=cv2.DIST_C, maskSize=5)

        # merge the transformed channels back to an image
        transformed_image = cv2.merge((b, g, r))
    except (tf.errors.OutOfRangeError, OSError) as e:
        click.echo()
        click.echo('Error while processing {}: {}'.format(path, e))
        return

to be able to use tabulo predict from the command line.

resnet_v1_101 checkpoint does not load

I placed the 6aac7a1e8a8e checkpoint dir under Tabulo/luminoth/utils/pretrained_models

and ran from Tabulo:
tabulo server web --checkpoint 6aac7a1e8a8e

and it returned:
Checkpoint not found. Check remote repository? [y/N]: y

Retrieving remote index... done.
No changes in remote index.
Checkpoint isn't available in remote repository either.
Traceback (most recent call last):
  File "C:\Users\jay\Anaconda2\envs\py36\Scripts\tabulo-script.py", line 11, in <module>
    load_entry_point('tabulo', 'console_scripts', 'tabulo')()
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\users\jay\documents\ml_projects\luminoth\luminoth\tools\server\web.py", line 83, in web
    config = get_checkpoint_config(checkpoint)
  File "c:\users\jay\documents\ml_projects\luminoth\luminoth\tools\checkpoint\__init__.py", line 193, in get_checkpoint_config
    raise ValueError('Checkpoint not found.')
ValueError: Checkpoint not found.

Additionally, functionality to list existing checkpoints does not work.

I'm running on Windows 10, just cloned the repo this AM.

How to call extract api

Hi, I have installed the project and the predict API is working, but there is no documentation on how to call the extract API. Can you help me with this?

Thanks in advance,
