Comments (4)
The PRImA 2010 schema is too restrictive in how it defines regions, so we use a slightly modified version developed by the Transkribus team. This version adds a new attribute, custom, that describes the structure of the document. For example:
In the 2010 PRImA schema:
<TextRegion type="page-number" id="region_1489561772109_198">
is now written as:
<TextRegion type="page-number" id="region_1489561772109_198" custom="readingOrder {index:3;} structure {type:page-number;}">
This new encoding lets us define any type of TextRegion we need, not only the ones defined in the PRImA schema.
P2PaLA looks for the "structure" entry, not the "type" attribute. The message you got is just a warning that a region is going to be ignored because its "type" is unknown.
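As a rough illustration of what reading the "structure" entry involves, here is a minimal Python sketch that pulls the structure type out of a custom attribute string like the one shown above. The function name structure_type is hypothetical, and this is not P2PaLA's actual parsing code, only an assumption about how such a string could be read.

```python
import re

# Hypothetical helper, not taken from P2PaLA: extract the value of
# "type" from the "structure {...}" part of a Transkribus-style
# custom attribute.
_STRUCTURE_RE = re.compile(r'structure\s*\{[^}]*?\btype:([^;}]+)')

def structure_type(custom):
    """Return the structure type found in `custom`, or None if absent."""
    match = _STRUCTURE_RE.search(custom)
    return match.group(1).strip() if match else None

custom_attr = "readingOrder {index:3;} structure {type:page-number;}"
print(structure_type(custom_attr))
```

A region whose custom attribute carries no structure entry yields None here, which roughly corresponds to the case where the region would be ignored.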
To make your files compatible, you can use a simple sed command:
sed 's:type="\([^ ]\+\)":type="\1" custom="structure {type\:\1;}":g' in_file > out_file
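If sed is not available, the same conversion could be sketched in Python. This is an assumption-laden alternative rather than part of P2PaLA: the function name add_structure_attr is hypothetical, and unlike the sed command it only touches TextRegion opening tags and skips tags that already carry a custom attribute.

```python
import re

# Hypothetical helper, not part of P2PaLA.
# Assumption: regions are written as <TextRegion ... type="..." ...>.
_REGION_TAG = re.compile(r'<TextRegion\b[^>]*>')
_TYPE_ATTR = re.compile(r'\btype="([^"]+)"')

def add_structure_attr(xml_text):
    """Copy each TextRegion type into a custom="structure {...}" attribute."""
    def fix_tag(match):
        tag = match.group(0)
        if 'custom=' in tag:
            # Already converted (or has its own custom data): leave untouched.
            return tag
        return _TYPE_ATTR.sub(
            lambda m: f'{m.group(0)} custom="structure {{type:{m.group(1)};}}"',
            tag,
            count=1,
        )
    return _REGION_TAG.sub(fix_tag, xml_text)

line = '<TextRegion type="page-number" id="region_1489561772109_198">'
print(add_structure_attr(line))
```

Restricting the rewrite to TextRegion tags is a design choice made for this sketch; the sed command above rewrites every type="..." attribute it finds in the file.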
from p2pala.
You have to define which regions (header, paragraph, etc.) you want to analyze and the type of each region (TextRegion, ImageRegion, GraphRegion, ...). From the help:
  --regions REGIONS [REGIONS ...]
        List of regions to be extracted. Format: --regions r1
        r2 r3 ... (default: ['$tip', '$par', '$not', '$nop',
        '$pag'])
  --merge_regions MERGE_REGIONS [MERGE_REGIONS ...]
        Merge regions on PAGE file into a single one. Format
        --merge_regions r1:r2,r3 r4:r5, then r2 and r3 will be
        merged into r1 and r5 into r4 (default: {})
  --nontext_regions NONTEXT_REGIONS [NONTEXT_REGIONS ...]
        List of regions where no text lines are expected.
        Format: --nontext_regions r1 r2 r3 ... (default: None)
  --region_type REGION_TYPE [REGION_TYPE ...]
        Type of region on PAGE file. Format --region_type
        t1:r1,r3 t2:r5, then type t1 will assigned to regions
        r1 and r3 and type t2 to r5 and so on... (default:
        None)
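To make the key:value,value format used by --merge_regions and --region_type concrete, here is a small Python sketch of how such arguments could be interpreted. The helper names parse_mapping and region_to_type are hypothetical; this shows the format described in the help text, not P2PaLA's own argument handling.

```python
# Hypothetical helpers illustrating the "key:v1,v2" argument format
# described in the help text; not P2PaLA's actual parsing code.

def parse_mapping(specs):
    """Turn ['r1:r2,r3', 'r4:r5'] into {'r1': ['r2', 'r3'], 'r4': ['r5']}."""
    mapping = {}
    for spec in specs:
        key, _, values = spec.partition(":")
        mapping[key] = [v for v in values.split(",") if v]
    return mapping

def region_to_type(specs):
    """Invert ['t1:r1,r3', 't2:r5'] into a region -> type lookup."""
    return {
        region: region_type
        for region_type, regions in parse_mapping(specs).items()
        for region in regions
    }

# --merge_regions r1:r2,r3 r4:r5
print(parse_mapping(["r1:r2,r3", "r4:r5"]))
# --region_type t1:r1,r3 t2:r5
print(region_to_type(["t1:r1,r3", "t2:r5"]))
```

In the first call r2 and r3 are the regions merged into r1; in the second, r1 and r3 are assigned type t1 and r5 is assigned type t2, matching the wording of the help text.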
from p2pala.
I've converted all the XML files to the Transkribus version and now I get something like:
Element type "header"undefined on color dic, set to default=175
Can you explain how to define a type's color in the color dic?
from p2pala.
Ok, thank you.
from p2pala.