Comments (4)

lquirosd commented on June 12, 2024

The PRImA 2010 schema is too restrictive in how it defines regions, so we use a slightly modified version developed by the Transkribus team. This version adds a new attribute that defines the structure of the document. For example, in the 2010 PRImA schema:
<TextRegion type="page-number" id="region_1489561772109_198">
is now written as:
<TextRegion type="page-number" id="region_1489561772109_198" custom="readingOrder {index:3;} structure {type:page-number;}">

This encoding lets us define any type of TextRegion we need, not only those defined in the PRImA schema.

P2PaLA looks for the "structure" entry in the "custom" attribute, not for the "type" attribute. The message you got is just a warning that a region is going to be ignored because its "type" is unknown.
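To make the lookup concrete, here is a minimal sketch of how a structure type could be read out of a Transkribus-style "custom" string. The function name `structure_type` and the regular expression are assumptions for illustration only; this is not P2PaLA's actual parsing code.

```python
import re

def structure_type(custom):
    """Return the type stored in the 'structure {...}' group, or None if absent."""
    # Look inside the braces that follow the word "structure" for "type:<value>".
    match = re.search(r"structure\s*\{[^}]*?\btype:([^;}]+)", custom)
    return match.group(1).strip() if match else None

custom = "readingOrder {index:3;} structure {type:page-number;}"
print(structure_type(custom))                     # page-number
print(structure_type("readingOrder {index:3;}"))  # None
```

A region whose "custom" string has no "structure" group returns None here, which corresponds to the case where the region would be ignored.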

To make your files compatible, you can use a simple sed command:

sed 's:type="\([^ ]\+\)":type="\1" custom="structure {type\:\1;}":g' in_file > out_file
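If sed is not available, roughly the same rewrite can be sketched in Python. The names `TYPE_ATTR` and `add_structure` are made up for this example, and the pattern uses `[^"]+` instead of the `[^ ]\+` in the sed command so that the match stops at the closing quote. Like the sed command, it touches every `type="..."` it finds and assumes the element does not already carry a "custom" attribute.

```python
import re

# Matches type="value" and captures the value between the quotes.
TYPE_ATTR = re.compile(r'type="([^"]+)"')

def add_structure(xml_text):
    """Append a custom="structure {type:...;}" attribute after each type="..."."""
    return TYPE_ATTR.sub(r'type="\1" custom="structure {type:\1;}"', xml_text)

line = '<TextRegion type="page-number" id="region_1489561772109_198">'
print(add_structure(line))
```

For a whole file, read its contents into a string, pass it through `add_structure`, and write the result to a new file, keeping the original as a backup.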

from p2pala.

lquirosd commented on June 12, 2024

You have to define which regions (header, paragraph, etc.) you want to analyze and the type of each region (TextRegion, ImageRegion, GraphRegion, ...). From the help:

  --regions REGIONS [REGIONS ...]
                        List of regions to be extracted. Format: --regions r1
                        r2 r3 ... (default: ['$tip', '$par', '$not', '$nop',
                        '$pag'])
  --merge_regions MERGE_REGIONS [MERGE_REGIONS ...]
                        Merge regions on PAGE file into a single one. Format
                        --merge_regions r1:r2,r3 r4:r5, then r2 and r3 will be
                        merged into r1 and r5 into r4 (default: {})
  --nontext_regions NONTEXT_REGIONS [NONTEXT_REGIONS ...]
                        List of regions where no text lines are expected.
                        Format: --nontext_regions r1 r2 r3 ... (default: None)
  --region_type REGION_TYPE [REGION_TYPE ...]
                        Type of region on PAGE file. Format --region_type
                        t1:r1,r3 t2:r5, then type t1 will assigned to regions
                        r1 and r3 and type t2 to r5 and so on... (default:
                        None)
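The `key:value1,value2` format used by --merge_regions and --region_type can be read as a mapping from one name to a list of names. The sketch below only illustrates that reading; `parse_mapping` is a hypothetical helper, not the parser P2PaLA uses.

```python
def parse_mapping(items):
    """Turn ['r1:r2,r3', 'r4:r5'] into {'r1': ['r2', 'r3'], 'r4': ['r5']}."""
    mapping = {}
    for item in items:
        # Split once on ':' into the key and its comma-separated values.
        key, _, values = item.partition(":")
        mapping[key] = [v for v in values.split(",") if v]
    return mapping

# --merge_regions r1:r2,r3 r4:r5  -> r2 and r3 are merged into r1, r5 into r4
merge = parse_mapping(["r1:r2,r3", "r4:r5"])
# --region_type t1:r1,r3 t2:r5    -> type t1 for r1 and r3, type t2 for r5
types = parse_mapping(["t1:r1,r3", "t2:r5"])
print(merge)
print(types)
```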

vndee commented on June 12, 2024

I've converted all the XML files to the Transkribus version, and now I get something like:

Element type "header"undefined on color dic, set to default=175

Can you explain how to define the color for a type in the color dic?

vndee commented on June 12, 2024

Ok, thank you.
