Comments (11)
from sparql.anything.
I mean the complete XML structure with sub-elements, not just its text content
Interesting use case. Unfortunately, this can't be done with the XML triplifier. Maybe you can try to pretend it's HTML: I think that triplifier should have an innerHTML property that in principle should return the GML code. See the documentation for details and let us know if it solves your problem.
@enridaga It kind of works, but not quite. See test files at https://gist.github.com/VladimirAlexiev/318ab4924c756ead618ebdded6428509 and compare
- sparql-anything-test-xml.ttl
- sparql-anything-test-html.ttl
I use these prefixes:
prefix x: <http://www.w3.org/1999/xhtml#> # raw data prefix in HTML->RDF, prepended to any other prefix, eg x:gml:featurecollection, x:plu:indicationreference, x:nilreason, x:xsi:nil
prefix html: <https://html.spec.whatwg.org/#> # html:innerHTML, html:innerText
prefix xlink: <http://www.w3.org/1999/xlink#>
prefix gml: <http://www.opengis.net/gml/3.2#>
prefix plu: <http://inspire.ec.europa.eu/schemas/plu/4.0#>
prefix xyz: <http://sparql.xyz/facade-x/data/> # raw prefix in XML->RDF data
prefix fx: <http://sparql.xyz/facade-x/ns/> # fx:properties in SERVICE call, fx:root in output
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> # each XML element becomes rdf:type
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> # rdfs:member
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix xsi: <http://www.w3.org/2001/XMLSchema-instance#> # xsi:nil "true": filter out such elements. xsi:schemaLocation
Problems:
- all elements are lowercased: both inside innerHTML and in RDF
- all elements in RDF are prefixed with `x:`, e.g. `x:gml:featurecollection`, `x:plu:indicationreference`, `x:xsi:nil` (the originals are `gml:FeatureCollection`, `plu:indicationReference`, `xsi:nil`)
I guess I can live with this, but the script will be completely non-portable between XML and HTML input...
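The lowercasing and the loss of namespace handling reported above are typical of HTML parsing in general. The sketch below reproduces both effects with Python's standard library only. It is an analogy, not SPARQL Anything's actual parser, and the sample document and function names are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical miniature GML/PLU document, used only for this illustration.
SAMPLE = (
    '<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2" '
    'xmlns:plu="http://inspire.ec.europa.eu/schemas/plu/4.0">'
    '<plu:indicationReference>demo</plu:indicationReference>'
    '</gml:FeatureCollection>'
)


class TagCollector(HTMLParser):
    """Records start-tag names as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over the tag name already lowercased,
        # and treats "gml:..." as an opaque name, not a namespace prefix.
        self.tags.append(tag)


def html_tag_names(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


def xml_tag_names(text):
    # ElementTree resolves prefixes to "{namespace-uri}LocalName" and keeps case.
    return [element.tag for element in ET.fromstring(text).iter()]


if __name__ == "__main__":
    print(html_tag_names(SAMPLE))
    print(xml_tag_names(SAMPLE))
```

The HTML view yields lowercased names with the prefix glued on, while the XML view keeps the original case and resolves each prefix to its namespace URI. That difference is why a query written against one output does not carry over to the other.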
> I guess I can live with this, but the script will be completely non-portable between XML and HTML input...
Of course, this is a hack: XML is different from HTML.
We can open issues to better support using the HTML triplifier with XML content. To summarise, these seem to be the problems:
- [ ] xmlns is not considered a ns prefix, and the default whatwg namespace is prepended
- [ ] element names are lowercased, while in XML tags are case sensitive
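Until the two problems listed above are addressed, one possible workaround is to post-process the names coming out of the HTML route. The following is a minimal sketch, assuming the original XML is still available: it collects the names exactly as written in the source and uses them as a case-insensitive lookup table. The helper names, the sample document and the `x:` default are assumptions for illustration, not part of SPARQL Anything.

```python
import xml.parsers.expat

# Hypothetical sample input, used only for this illustration.
SAMPLE_XML = (
    '<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2" '
    'xmlns:plu="http://inspire.ec.europa.eu/schemas/plu/4.0" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<plu:indicationReference xsi:nil="true"/>'
    '</gml:FeatureCollection>'
)


def collect_qnames(xml_text):
    """Return element and attribute names exactly as written in the XML source."""
    names = set()
    # Without a namespace separator, expat reports raw prefixed names.
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        names.add(name)
        names.update(attrs)  # attrs is a dict, so this adds the attribute names

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return names


def build_restorer(xml_text, raw_prefix="x:"):
    """Build a function mapping e.g. 'x:gml:featurecollection' to 'gml:FeatureCollection'."""
    lookup = {name.lower(): name for name in collect_qnames(xml_text)}

    def restore(html_name):
        if html_name.startswith(raw_prefix):
            html_name = html_name[len(raw_prefix):]
        # Names never seen in the XML source are returned without the raw prefix.
        return lookup.get(html_name, html_name)

    return restore


if __name__ == "__main__":
    restore = build_restorer(SAMPLE_XML)
    print(restore("x:gml:featurecollection"))
```

The lookup is ambiguous if two names in the source differ only by case, so this is a stopgap rather than a substitute for case-preserving parsing.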
@enridaga Yes!
Note: It's not the whatwg namespace, but the xhtml namespace. Actually I've renamed them in my script, see the comments below as to where they appear:
prefix x: <http://www.w3.org/1999/xhtml#> # raw data prefix in HTML->RDF, prepended to any other prefix, eg x:gml:featurecollection, x:plu:indicationreference, x:nilreason, x:xsi:nil
prefix html: <https://html.spec.whatwg.org/#> # html:innerHTML, html:innerText
My suspicion is that this happens because the HTML parser (beautifulsoup) does not handle prefixes. Will investigate.