Comments (10)

stefanklut commented on August 26, 2024

Can you try to visualize the results using visualizer.py? That way we can better see what the results are doing, as well as what the GT looks like. Maybe I can give suggestions based on that. Also, when you say far from ideal, what sort of issues are you running into?

from laypa.

stefanklut commented on August 26, 2024

Thank you very much for the images. I'm not sure why the visualization tool doesn't work; it might be because the Docker container doesn't have a graphical interface. In that case, run with the --save flag (I think; check the -h flag to be certain). It will give images similar to the ones you have sent.
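
A minimal sketch of how a script could avoid the `AttributeError` reported in this thread when no GUI is available. It is not laypa's actual code: `maximize_if_possible` and the two stand-in manager classes are hypothetical, and only the `window.showMaximized()` call is taken from the traceback.

```python
def maximize_if_possible(fig_manager):
    """Maximize the figure window only if the backend actually provides one.

    Returns True when a window was maximized, False on a headless backend.
    """
    window = getattr(fig_manager, "window", None)
    show_maximized = getattr(window, "showMaximized", None)
    if callable(show_maximized):
        show_maximized()
        return True
    return False  # no GUI window: the caller should save the figure instead


class HeadlessManager:
    """Hypothetical stand-in for a figure manager without a GUI window."""


class QtLikeWindow:
    """Hypothetical stand-in for a GUI window that can be maximized."""

    def __init__(self):
        self.maximized = False

    def showMaximized(self):
        self.maximized = True


class QtLikeManager:
    """Hypothetical stand-in for a figure manager backed by a GUI window."""

    def __init__(self):
        self.window = QtLikeWindow()


print(maximize_if_possible(HeadlessManager()))  # False
print(maximize_if_possible(QtLikeManager()))    # True
```

When the function returns False, the sensible fallback is to write the figure to disk rather than trying to show it.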

For the problem of regions being both purple and red: is there anything that distinguishes these classes except the textual data? If the only way to tell them apart is through the text, laypa alone will not be enough, since it doesn't actually read the text and works mainly from the layout. For example, could you tell these regions apart when squinting? If not, then you'll probably need to do something with the text itself. You could, for example, try combining these region classes and then separating them in post-processing. If they are visually distinct, we'll have to look deeper at the problem.

For the problem of the regions not being kept apart, there might be something we can do. First, have a look at the GT to see if a lot of whitespace is labeled as being part of a class. This has proven to be a major reason why whitespace is incorrectly assigned. But also know that this is simply a problem that can occur when pixels are labeled incorrectly. If you know that all your pages will look like this, you should also have a look at instance segmentation. Unfortunately, that is not completely finished in laypa, but it might work very well for separated blocks of text.
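
One rough way to check the GT for labeled whitespace is to measure how many of the pixels assigned to a class are bright background. The sketch below assumes the page is available as rows of grayscale values and the GT as a boolean mask of the same size; `whitespace_fraction` and the brightness threshold of 200 are illustrative choices, not part of laypa.

```python
def whitespace_fraction(gray, mask, threshold=200):
    """Fraction of labeled pixels that are brighter than `threshold`.

    gray: rows of grayscale values (0 = black, 255 = white)
    mask: rows of booleans, True where the GT assigns the region class
    """
    labeled = 0
    bright = 0
    for gray_row, mask_row in zip(gray, mask):
        for value, is_labeled in zip(gray_row, mask_row):
            if is_labeled:
                labeled += 1
                if value > threshold:
                    bright += 1
    return bright / labeled if labeled else 0.0


# Toy 4x4 page: a dark 2x2 block of "text" surrounded by white margin.
page = [
    [255, 255, 255, 255],
    [255, 30, 40, 255],
    [255, 20, 35, 255],
    [255, 255, 255, 255],
]
# A tight GT polygon covers only the text; a loose one covers the whole page.
tight = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
loose = [[True] * 4 for _ in range(4)]

print(whitespace_fraction(page, tight))  # 0.0
print(whitespace_fraction(page, loose))  # 0.75
```

A high fraction for a region class suggests that the GT polygons include a lot of background.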

stefanklut commented on August 26, 2024

Nice to see that kraken performs so well 👍 Maybe it's more suited to your particular problem.

But I don't think the methods are that dissimilar, so why it performs better is something I'm very interested in. But that's for me to figure out 😄

To try this, do I need to change some parameters in the config file?

Yes, this is done using the PREPROCESS.RESIZE and INPUT.RESIZE parameters and their subparameters.

fattynoparents commented on August 26, 2024

Thank you, I will try to use the visualizer.
Here's a simple illustration. Below are a couple of examples of perfect region detection:
image
image
If we have some information in the right margin of the page, the post in the middle should be violet (bibliographic post). If, however, there's nothing in the margins or the information is in the left margin, the post in the middle should be orange.

Then when we have many posts on one page, the model gets confused and we get plenty of various regions in the middle. Moreover, quite often the model fails to recognize different posts and creates one single region for several posts:
image

EDIT: I have tried to run the visualizer, got the following error:

INPUT.SCALING_TEST is not set, inferring from INPUT.SCALING_TRAIN and PREPROCESS.RESIZE.SCALING to be 0.5
Traceback (most recent call last):
  File "/src/laypa/tooling/visualization.py", line 298, in <module>
    main(args)
  File "/src/laypa/tooling/visualization.py", line 211, in main
    fig_manager.window.showMaximized()
    ^^^^^^^^^^^^^^^^^
AttributeError: 'FigureManagerBase' object has no attribute 'window'

fattynoparents commented on August 26, 2024

For the problem of regions being both purple and red: is there anything that distinguishes these classes except the textual data?

Yes, the main thing here is not the textual data, but that some textual data exists in the margins. The text in the middle should have a violet class if there also exists some data in the right margin, like here:
image
or here:
image
Then when there's nothing in the right margin, it should be orange, like here:
image

The model is actually sometimes quite good at predicting these cases and assigns the various classes correctly, so I guess it somehow can detect if there is something in the margins or not.

First, have a look at the GT to see if a lot of whitespace is labeled as being part of a class. This has proven to be a major reason why whitespace is incorrectly assigned.

Thanks for the suggestion. How can I understand if a lot of whitespace is labeled as part of a class? Would this f.ex. be such a case?
image

stefanklut commented on August 26, 2024

Ok, if that is the context, then it seems plausible to me that you can indeed make these predictions without the text information. What you are doing seems correct, so I can't pinpoint something that will definitely improve the model. I have seen this mixing of regions before, but in that case the regions only really differed in the type of text they contained.

Then when we have many posts on one page, the model gets confused and we get plenty of various regions in the middle. Moreover, quite often the model fails to recognize different posts and creates one single region for several posts:

Just so I'm clear what would the GT for this type of data look like?

Would this f.ex. be such a case?

That is quite a lot of whitespace. But considering it is mostly horizontal, I'm not sure how much impact it will have. Also, the boundaries of the box are fairly well defined due to the black border. You could experiment with assigning less whitespace, but considering how much work this might be, we can maybe first try something else.

Another idea I had was to change the scale at which the prediction is done. I think you are currently using 1024 for the smallest side? At least that was the default value. You can try to experiment with this value. Or change the resize mode to be scaling. Either change will affect the context that the model can take into account.
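
To make the difference between the two resize modes concrete, here is a small sketch of the arithmetic involved. The function names are hypothetical and do not mirror laypa's config keys; the value 1024 comes from the comment above and 0.5 from the log line earlier in the thread.

```python
def resize_shortest_edge(width, height, min_size=1024):
    """Scale the page so its shorter side equals `min_size`, keeping the aspect ratio."""
    factor = min_size / min(width, height)
    return round(width * factor), round(height * factor)


def resize_by_scaling(width, height, scaling=0.5):
    """Scale the page by a fixed factor, regardless of its original size."""
    return round(width * scaling), round(height * scaling)


# Two scans of different resolution (sizes are made up for illustration).
for width, height in [(3000, 4000), (1500, 2000)]:
    print(resize_shortest_edge(width, height), resize_by_scaling(width, height))
# (1024, 1365) (1500, 2000)
# (1024, 1365) (750, 1000)
```

With a fixed shortest edge, every page ends up at the same size, so the same amount of the page fits in the model's view. With a fixed scaling factor, larger scans stay larger, which changes how much context each prediction sees.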

fattynoparents commented on August 26, 2024

Just so I'm clear what would the GT for this type of data look like?

Here's what the GT for that picture looks like:
image

Here's what I get with Laypa:
image

Just as a comparison, here's what I get with kraken (I was curious what it could get me, so I tried that project as well):
image

I think you are currently using 1024 for the smallest side? At least that was the default value. You can try to experiment with this value. Or change the resize mode to be scaling.

Thanks for the tips! To try this, do I need to change some parameters in the config file?

fattynoparents commented on August 26, 2024

Hi again, I have now experimented with the RESIZE parameter, which unfortunately didn't influence the result. Will be grateful for any other ideas :)

Some further observations: kraken segmentation performs better in terms of detecting regions, but is considerably worse than Laypa in terms of drawing correct baselines.

stefanklut commented on August 26, 2024

Thank you for your observations.

I'm not sure what to do next to improve results on your side. Maybe it's possible to combine the results from Kraken and Laypa.
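
One possible way to combine the two outputs, sketched under the assumption that the regions (for example from Kraken) are available as polygons and the baselines (for example from Laypa) as lists of points. Neither tool's API is used here; `point_in_polygon` and `assign_baselines` are hypothetical helpers, and the coordinates are made up.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside `polygon` (a list of (x, y))."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_baselines(regions, baselines):
    """Map each region name to the indices of baselines whose centroid it contains."""
    assignment = {name: [] for name in regions}
    assignment["unassigned"] = []
    for index, baseline in enumerate(baselines):
        cx = sum(x for x, _ in baseline) / len(baseline)
        cy = sum(y for _, y in baseline) / len(baseline)
        for name, polygon in regions.items():
            if point_in_polygon((cx, cy), polygon):
                assignment[name].append(index)
                break
        else:
            assignment["unassigned"].append(index)
    return assignment


regions = {
    "post_1": [(0, 0), (100, 0), (100, 50), (0, 50)],
    "post_2": [(0, 60), (100, 60), (100, 120), (0, 120)],
}
baselines = [
    [(10, 20), (90, 20)],
    [(10, 80), (90, 82)],
    [(10, 200), (90, 200)],
]
print(assign_baselines(regions, baselines))
# {'post_1': [0], 'post_2': [1], 'unassigned': [2]}
```

Using the centroid of the baseline points is a simplification; a baseline that spans two regions would need a more careful rule.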

I am gonna try to see if implementing the loss method used in Kraken (multiple BCE loss) might improve the region results of Laypa. However, I'm not sure when I'll have time for this, and when it will be finished (vacation also coming up 😄).
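
For readers unfamiliar with the idea, the sketch below shows a generic per-class binary cross-entropy for a single pixel. It is not Kraken's or Laypa's implementation; `multi_bce_loss` is a hypothetical name, and the point is only that each class gets its own independent sigmoid and loss term.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_bce_loss(logits, targets, eps=1e-12):
    """Mean of independent binary cross-entropy terms, one per class.

    logits:  raw per-class scores for one pixel
    targets: per-class 0/1 labels for the same pixel
    """
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)


# A pixel that belongs to class 0 only, scored confidently right and confidently wrong.
good = multi_bce_loss([4.0, -4.0], [1, 0])
bad = multi_bce_loss([-4.0, 4.0], [1, 0])
print(good < bad)  # True
```

Because every class is scored on its own, a pixel can belong to several classes or to none, which differs from a softmax loss where the class probabilities must sum to one.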

fattynoparents commented on August 26, 2024

Vacation is important :) Thanks for all your help and have a good rest!
