openfoodfacts / off-nutrition-table-extractor

Important: Please have a look at the higher-level issue in Robotoff: openfoodfacts/robotoff#372. This is an old model and we have made progress since then.

License: GNU Affero General Public License v3.0

Python 0.80% Jupyter Notebook 99.15% C++ 0.01% Shell 0.01% Cuda 0.02% Cython 0.03%

off-nutrition-table-extractor's Issues

Recognizing nutrition labels in Korean

Problem

Recognizing nutrition labels in Korean

Proposed solution

If I manage to get a mapped dataset, is it possible to teach the model from this project to recognize it?

Additional context

It seems to be very common for Korean grocers to publish nutrition data as an image on their sites. I've examined the top five and can share links if required.

Develop a better image preprocessing algorithm.

Currently, we are using the following filters before sending the images for OCR:
RGB -> Grayscale -> GaussianBlur -> Grayscale -> RGB
The problem we are facing is that some of the bold text is not detected by OCR. Also, text in some images with non-black backgrounds is not detected at all.
You can find the algorithm in process.py, in the function preprocess_for_ocr.
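
One direction that is sometimes tried for uneven backgrounds is to binarize the grayscale image with Otsu's threshold and then normalize polarity so that text is always dark on a light background. The sketch below is not the project's preprocess_for_ocr; it is a dependency-free illustration on nested lists, and the function names and the "text covers fewer pixels than background" rule are assumptions.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance ints."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb_rows
    ]


def otsu_threshold(gray_rows):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for row in gray_rows:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        variance = w_dark * w_light * (mean_dark - mean_light) ** 2
        if variance > best_var:
            best_t, best_var = t, variance
    return best_t


def binarize_dark_text(gray_rows):
    """Binarize to 0/255 and make sure the minority (text) class is 0."""
    t = otsu_threshold(gray_rows)
    binary = [[0 if value <= t else 255 for value in row] for row in gray_rows]
    flat = [value for row in binary for value in row]
    # Assumption: text covers fewer pixels than the background.
    if flat.count(0) > len(flat) / 2:
        binary = [[255 - value for value in row] for row in binary]
    return binary
```

With OpenCV the thresholding step corresponds to cv2.threshold with the cv2.THRESH_OTSU flag. Whether this actually helps with the bold-text cases would need to be checked against the test images.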

cvtColor error

Please tell me how to resolve this issue.

cv2.error: OpenCV(3.4.3) /io/opencv/modules/imgproc/src/color.cpp:181: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
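
The assertion !_src.empty() means cvtColor was handed an empty image. A common cause is that cv2.imread returned None because the path was wrong or the file could not be decoded; imread does not raise in that case. A minimal sketch of two guards that turn this into a readable error follows. The helper names are hypothetical and not part of the project.

```python
import os


def check_image_path(path):
    """Return path if it points to a non-empty file, else raise a clear error."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"image not found: {path!r}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"image file is empty: {path!r}")
    return path


def require_loaded(image, path):
    """Fail early when the loader returned None instead of pixel data."""
    if image is None:
        raise ValueError(f"could not decode image: {path!r}")
    return image


# Intended use with OpenCV (not executed here):
#   image = require_loaded(cv2.imread(check_image_path(path)), path)
```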

Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'index_type' not in Op

My system environment is:
Python3
Tensorflow : 1.5.0
Ubuntu 16

When I run the project, it throws an error:

2018-08-21 22:41:14.115372: E tensorflow/core/common_runtime/executor.cc:651] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'index_type' not in Op<name=Fill; signature=dims:int32, value:T -> output:T; attr=T:type>; NodeDef: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta)]]
2018-08-21 22:41:14.148748: E tensorflow/core/common_runtime/executor.cc:651] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'index_type' not in Op<name=Fill; signature=dims:int32, value:T -> output:T; attr=T:type>; NodeDef: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta)]]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'index_type' not in Op<name=Fill; signature=dims:int32, value:T -> output:T; attr=T:type>; NodeDef: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "detection.py", line 110, in <module>
main()
File "detection.py", line 107, in main
print(detect(args.image))
File "detection.py", line 25, in detect
boxes, scores, classes, num = obj.get_classification(image)
File "/home/loop/off-nutrition-table-extractor/detect_table_class.py", line 29, in get_classification
feed_dict={self.image_tensor: img_expanded})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'index_type' not in Op<name=Fill; signature=dims:int32, value:T -> output:T; attr=T:type>; NodeDef: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta)]]

Caused by op 'MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones', defined at:
File "detection.py", line 110, in <module>
main()
File "detection.py", line 107, in main
print(detect(args.image))
File "detection.py", line 22, in detect
obj = NutritionTableDetector()
File "/home/loop/off-nutrition-table-extractor/detect_table_class.py", line 14, in __init__
tf.import_graph_def(od_graph_def, name='')
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 554, in import_graph_def
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): NodeDef mentions attr 'index_type' not in Op<name=Fill; signature=dims:int32, value:T -> output:T; attr=T:type>; NodeDef: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones = Fill[T=DT_INT32, index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/Reshape, Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/range/delta)]]

Can someone help me with this? @sgrpanchal31
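
The log itself hints at the cause: "Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary." In other words, the frozen graph was probably exported with a newer TensorFlow than the 1.5.0 runtime listed above, which does not know the index_type attribute of Fill. The sketch below only compares version strings so the mismatch can be reported before the graph is imported. The helper names are hypothetical, and the export version in the usage note is a placeholder because the issue does not say which version produced the graph.

```python
def parse_version(text):
    """Turn '1.5.0' into (1, 5, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def runtime_is_older(runtime_version, export_version):
    """True when the installed runtime predates the version that wrote the graph."""
    return parse_version(runtime_version) < parse_version(export_version)
```

For example, runtime_is_older(tf.__version__, "1.9.0") could be checked at startup, where "1.9.0" stands in for whatever version the model was actually exported with.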

AttributeError: 'NoneType' object has no attribute 'shape'

Demo for test_images/0044000030667_1.jpg
Traceback (most recent call last):
File "detection.py", line 141, in <module>
main()
File "detection.py", line 138, in main
print(detect(args.image, args.debug))
File "detection.py", line 62, in detect
text_blob_list = text_detection(cropped_image)
File "/content/drive/My Drive/STY/off-nutrition-table-extractor/nutrition_extractor/text_detection.py", line 80, in text_detection
img, scale = resize_im(img, scale=TextLineCfg.SCALE, max_scale=TextLineCfg.MAX_SCALE)
File "/content/drive/My Drive/STY/off-nutrition-table-extractor/nutrition_extractor/text_detection.py", line 29, in resize_im
f = float(scale) / min(im.shape[0], im.shape[1])
AttributeError: 'NoneType' object has no attribute 'shape'
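
The traceback shows that im is None by the time resize_im reads im.shape, so the step that produced cropped_image most likely returned None (for instance because the image could not be read). A hedged sketch of the scale computation with an explicit check is given below. Only the first formula comes from the traceback; the cap on the longer side is an assumption about what max_scale means, and the function name and the numbers in the usage note are hypothetical.

```python
def resize_factor(shape, scale, max_scale=None):
    """Return the factor that makes the shorter image side `scale` pixels long.

    `shape` is (height, width, ...) as on a NumPy image array.
    """
    if shape is None:
        raise ValueError("no image: the previous pipeline step returned None")
    height, width = shape[0], shape[1]
    if min(height, width) <= 0:
        raise ValueError(f"empty image with shape {shape!r}")
    # Same formula as the line shown in the traceback.
    factor = float(scale) / min(height, width)
    # Assumption: max_scale caps the length of the longer side after resizing.
    if max_scale is not None and factor * max(height, width) > max_scale:
        factor = float(max_scale) / max(height, width)
    return factor
```

Calling resize_factor(getattr(img, "shape", None), 600, 1200) before resizing would replace the AttributeError with a message that points at the missing image.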

Develop a flask or a similar backend server for hosting it on the cloud.

Currently, we run and test the pipeline from the command line using the python command. We want to host the pipeline on a server for real-time inference and detection of nutritional tables.
You can use any Python backend of your choice, although Flask is preferred for its simplicity.
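
As a rough starting point, the endpoint could look like the sketch below. It uses the standard library's WSGI interface rather than Flask so that it runs without extra packages; a Flask route would have the same shape. The /detect path, the detect_bytes function, and its return value are all hypothetical stand-ins for the real pipeline.

```python
import json
from wsgiref.simple_server import make_server


def detect_bytes(image_bytes):
    """Hypothetical stand-in for the real pipeline; must return JSON-serializable data."""
    return {"received_bytes": len(image_bytes)}


def app(environ, start_response):
    """Minimal WSGI app: POST raw image bytes to /detect and get JSON back."""
    is_detect = (
        environ.get("REQUEST_METHOD") == "POST"
        and environ.get("PATH_INFO") == "/detect"
    )
    if not is_detect:
        status, payload = "404 Not Found", {"error": "POST an image to /detect"}
    else:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        image_bytes = environ["wsgi.input"].read(length)
        if image_bytes:
            status, payload = "200 OK", detect_bytes(image_bytes)
        else:
            status, payload = "400 Bad Request", {"error": "empty request body"}
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run a development server; blocks until interrupted."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Because app is a plain WSGI callable, it can also be run under a production WSGI server instead of serve().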

Load models when the server starts.

While we were developing the pipeline, the models were loaded as the pipeline proceeded, since that had no effect on the total inference time. When deploying, however, it is better to load the models into RAM when the server starts, which reduces both the response time and the load on the server.
Make sure to load the models in a way that does not break the pipeline when it is run with the python detection.py -i [image-path] command.
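
One way to satisfy both requirements is a cached accessor: the server calls it once at startup to warm the cache, while the command-line path calls the same function and loads the model lazily on first use. This is a minimal sketch under that assumption. StubDetector stands in for the NutritionTableDetector class seen in the tracebacks, and get_detector and warm_up are hypothetical names.

```python
import functools


class StubDetector:
    """Stand-in for NutritionTableDetector; counts how often it is built."""

    loads = 0

    def __init__(self):
        # The real class would read the model weights from disk here.
        StubDetector.loads += 1

    def get_classification(self, image):
        # Placeholder result; the real method returns detection outputs.
        return {"image": image}


@functools.lru_cache(maxsize=None)
def get_detector():
    """Build the detector on first use and reuse the same instance afterwards."""
    return StubDetector()


def warm_up():
    """Call once at server start so the first request does not pay the load cost."""
    get_detector()


def detect(image):
    """Shared by the server and the CLI; loads the model lazily if needed."""
    return get_detector().get_classification(image)
```

The command-line entry point never has to call warm_up(), so its behaviour stays the same apart from reusing the instance within one process.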

Improving the spatial mapping algorithm

As the name suggests, this algorithm maps all the text blobs according to their spatial position and appends the text with the help of that mapping. During GSoC, I was able to develop the algorithm for a single column of nutritional values but failed to extend it to multiple columns.
You can check the code for the algorithm here.
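
For discussion, one possible way to handle several value columns is to group blobs into rows by vertical position and then order each row left to right, so that every blob after the label becomes one column. The sketch below is not the project's algorithm. The Blob layout, the tolerance rule, and the function names are all assumptions.

```python
from collections import namedtuple

# Hypothetical blob layout: bounding box corners plus the OCR text.
Blob = namedtuple("Blob", "x_min y_min x_max y_max text")


def y_center(blob):
    """Vertical midpoint of the blob's bounding box."""
    return (blob.y_min + blob.y_max) / 2


def group_into_rows(blobs, tolerance=0.5):
    """Group blobs whose vertical centres are close, then sort each row by x."""
    rows = []
    for blob in sorted(blobs, key=y_center):
        height = blob.y_max - blob.y_min
        # Same row if the centre lies within `tolerance` blob heights
        # of the centre of the row's first blob.
        if rows and abs(y_center(blob) - y_center(rows[-1][0])) <= tolerance * height:
            rows[-1].append(blob)
        else:
            rows.append([blob])
    return [[b.text for b in sorted(row, key=lambda b: b.x_min)] for row in rows]


def rows_to_table(rows):
    """Treat the first cell as the nutrient label and the rest as value columns."""
    return {row[0]: row[1:] for row in rows if row}
```

Rows with skewed or wrapped text would need a more robust grouping rule than a fixed tolerance.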

AttributeError: 'NoneType' object has no attribute 'shape'

Instructions for updating:
Use tf.gfile.GFile.
Text Weights Loaded!
Demo for test_images/0044000030667_1.jpg
Traceback (most recent call last):
File "detection.py", line 143, in <module>
main()
File "detection.py", line 140, in main
print(detect(args.image, args.debug))
File "detection.py", line 65, in detect
text_blob_list = text_detection(cropped_image)
File "/content/drive/My Drive/STY/off-nutrition-table-extractor/nutrition_extractor/text_detection.py", line 80, in text_detection
img, scale = resize_im(img, scale=TextLineCfg.SCALE, max_scale=TextLineCfg.MAX_SCALE)
File "/content/drive/My Drive/STY/off-nutrition-table-extractor/nutrition_extractor/text_detection.py", line 29, in resize_im
f = float(scale) / min(im.shape[0], im.shape[1])
AttributeError: 'NoneType' object has no attribute 'shape'

Why am I getting this error? Can anyone help me resolve it?
