rdfhdt / hdt-java
HDT Java library and tools.
License: Other
I implemented support for CONSTRUCT queries in the hdtsparql command line tool (#27). However, the CONSTRUCT results are first loaded into an in-memory Model. This causes problems when the results are too large to fit in memory.
I propose adding a -stream command line argument to hdtsparql that would output result triples immediately instead of loading them into a Model. The downside is that some triples may be duplicated in the output. I'm planning to do a PR implementing this feature.
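The memory trade-off behind a -stream flag can be sketched independently of the HDT and Jena APIs (the triple strings and consumer below are stand-ins, not the actual hdtsparql code): buffering into a Model-like collection deduplicates but holds every triple in memory, while streaming hands each triple to a sink immediately at constant memory, letting duplicates through.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class StreamVsBuffer {
    // Stand-in for the triples a CONSTRUCT query might yield (note the duplicate).
    static final List<String> RESULTS = List.of(
            "<s1> <p> <o1> .",
            "<s2> <p> <o2> .",
            "<s1> <p> <o1> .");   // duplicate

    // Buffered: collect everything into a set first (deduplicates, O(n) memory).
    static List<String> buffered() {
        Set<String> model = new LinkedHashSet<>(RESULTS);
        return new ArrayList<>(model);
    }

    // Streaming: hand each triple to the sink as it arrives (O(1) memory,
    // but duplicates pass straight through).
    static int streamed(Consumer<String> sink) {
        int count = 0;
        for (String t : RESULTS) {
            sink.accept(t);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("buffered: " + buffered().size() + " triples"); // 2, deduplicated
        System.out.println("streamed: " + streamed(t -> {}) + " triples"); // 3, one duplicate
    }
}
```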
Should be:
'last = i - 1;'
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
at java.lang.StringBuilder.<init>(StringBuilder.java:89)
at org.rdfhdt.hdt.hdt.impl.TempHDTImporterOnePass$TripleAppender.processTriple(TempHDTImporterOnePass.java:75)
at org.rdfhdt.hdt.rdf.parsers.RDFParserSimple.doParse(RDFParserSimple.java:80)
at org.rdfhdt.hdt.hdt.impl.TempHDTImporterOnePass.loadFromRDF(TempHDTImporterOnePass.java:100)
at org.rdfhdt.hdt.hdt.HDTManagerImpl.doGenerateHDT(HDTManagerImpl.java:103)
at org.rdfhdt.hdt.hdt.HDTManager.generateHDT(HDTManager.java:129)
at org.rdfhdt.hdt.tools.RDF2HDT.execute(RDF2HDT.java:110)
at org.rdfhdt.hdt.tools.RDF2HDT.main(RDF2HDT.java:175)
I get this error when trying to convert a 1GB .nt graph to HDT using rdf2hdt.sh. My computer has 8GB of RAM, and this is not even the largest graph I need to convert.
How can I convert large graphs to HDT? For example, how would one convert Wikidata, which is ~200GB?
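A common first step for the GC overhead error is to give the JVM more heap. This is a hedged sketch, not an official fix: I believe the launcher scripts read their JVM flags from an environment variable (JAVAOPTIONS in the copies I have seen, set in bin/javaenv.sh), but the exact variable name may differ between versions, so check your local script.

```shell
# Assumption: the rdf2hdt.sh wrapper honors a JAVAOPTIONS environment
# variable for JVM flags (verify against your bin/javaenv.sh).
# With 8GB of physical RAM, giving the importer ~6GB of heap is a
# reasonable starting point for a 1GB N-Triples input.
export JAVAOPTIONS="-Xmx6g"
./rdf2hdt.sh graph.nt graph.hdt
```

For Wikidata-scale inputs a single-machine one-pass import may simply be insufficient; note that hdt-mr, mentioned in another issue here, exists precisely to build HDT at that scale with Hadoop.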
It looks like hdtsparql.sh only returns CSV values. Is it possible to return results in another format, for example JSON or JSON-LD?
Hi,
I tried the following query with the command line tool on the DBLP 2017 dataset from http://www.rdfhdt.org/datasets/:
SELECT DISTINCT ?property_type WHERE {?p a ?property_type . ?s ?p ?o .} LIMIT 10
and this is what I got:
Exception in thread "main" java.lang.IndexOutOfBoundsException
at org.rdfhdt.hdt.compact.sequence.SequenceLog64Map.get(SequenceLog64Map.java:190)
at org.rdfhdt.hdt.triples.impl.PredicateIndexArray.getOccurrence(PredicateIndexArray.java:44)
at org.rdfhdt.hdt.triples.impl.BitmapTriplesIteratorYFOQ.goToStart(BitmapTriplesIteratorYFOQ.java:158)
at org.rdfhdt.hdt.triples.impl.BitmapTriplesIteratorYFOQ.<init>(BitmapTriplesIteratorYFOQ.java:77)
at org.rdfhdt.hdt.triples.impl.BitmapTriples.search(BitmapTriples.java:239)
at org.rdfhdt.hdtjena.solver.StageMatchTripleID.makeNextStage(StageMatchTripleID.java:140)
at org.rdfhdt.hdtjena.solver.StageMatchTripleID.makeNextStage(StageMatchTripleID.java:53)
at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:48)
at org.rdfhdt.hdtjena.util.IterAbortable.hasNext(IterAbortable.java:62)
at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:303)
at org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:53)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterDistinct.getInputNextUnseen(QueryIterDistinct.java:104)
at org.apache.jena.sparql.engine.iterator.QueryIterDistinct.hasNextBinding(QueryIterDistinct.java:70)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59)
at org.apache.jena.sparql.resultset.CSVOutput.format(CSVOutput.java:81)
at org.apache.jena.query.ResultSetFormatter.outputAsCSV(ResultSetFormatter.java:624)
at org.rdfhdt.hdtjena.cmd.HDTSparql.execute(HDTSparql.java:66)
at org.rdfhdt.hdtjena.cmd.HDTSparql.main(HDTSparql.java:131)
I have tried other HDT files generated by myself, and the same error occurs.
Regards
Hi,
I got the following error messages when building the current version with Fuseki 2.3.1:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.1:compile (default-compile) on project hdt-fuseki: Compilation failure: Compilation failure:
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[21,36] error: package org.apache.jena.fuseki does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[21,0] error: static import only from classes and interfaces
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[31,29] error: package org.apache.jena.fuseki does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[32,33] error: package org.apache.jena.fuseki.mgt does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[33,36] error: package org.apache.jena.fuseki.server does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[34,36] error: package org.apache.jena.fuseki.server does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[35,36] error: package org.apache.jena.fuseki.server does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[40,31] error: package org.eclipse.jetty.server does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[46,14] error: package arq.cmd does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[47,18] error: cannot find symbol
[ERROR]
[ERROR] package arq.cmdline
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[51,28] error: package com.hp.hpl.jena.graph does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[52,28] error: package com.hp.hpl.jena.query does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[53,28] error: package com.hp.hpl.jena.query does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[54,34] error: package com.hp.hpl.jena.sparql.core does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[55,34] error: package com.hp.hpl.jena.sparql.core does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[56,34] error: package com.hp.hpl.jena.sparql.core does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[57,26] error: package com.hp.hpl.jena.tdb does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[58,26] error: package com.hp.hpl.jena.tdb does not exist
[ERROR]
[ERROR] /home/fug2/hdt-java/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[59,38] error: package com.hp.hpl.jena.tdb.transaction does not exist
[ERROR]
I recently added an HDTProcessor to Luzzu; however, I am getting a GC overhead limit exceeded error when parsing the HDT version of DBpedia (the index was used as well):
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Long.valueOf(Long.java:840)
at pl.edu.icm.jlargearrays.LongLargeArray.get(LongLargeArray.java:148)
at org.rdfhdt.hdt.compact.sequence.SequenceLog64Big.getField(SequenceLog64Big.java:129)
at org.rdfhdt.hdt.compact.sequence.SequenceLog64Big.get(SequenceLog64Big.java:239)
at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySectionBig.extract(PFCDictionarySectionBig.java:344)
at org.rdfhdt.hdt.dictionary.impl.BaseDictionary.idToString(BaseDictionary.java:219)
at org.rdfhdt.hdtjena.NodeDictionary.getNode(NodeDictionary.java:114)
at io.github.luzzu.io.impl.HDTProcessor.startProcessing(HDTProcessor.java:95)
at io.github.luzzu.io.AbstractIOProcessor.processorWorkFlow(AbstractIOProcessor.java:134)
at io.github.luzzu.communications.resources.v4.AssessmentResource$1.call(AssessmentResource.java:210)
at io.github.luzzu.communications.resources.v4.AssessmentResource$1.call(AssessmentResource.java:206)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I implemented the processor [1] in a streaming fashion, and the error seems to come from this line:
Node o = this.nodeDictionary.getNode(this.hdtDictionary.stringToId(triple.getObject(), TripleComponentRole.OBJECT), TripleComponentRole.OBJECT);
Do you know what might have triggered that exception?
Thanks!
Jeremy
[1] https://gist.github.com/jerdeb/60ce2a8c07413c0a3b6c816124590e57
hdt-java currently uses Jena 3.0.1, which was released in December 2015. Jena 3.2.0 was just released so hdt-java is now three Jena releases behind. There has been a lot of work on Jena, including many fixes to the SPARQL query engine.
I think hdt-java should be upgraded to the newest Jena release. Jena APIs are pretty stable so I don't expect this to be difficult, though I haven't tried (yet).
Hi,
When I convert an HDT file using the hdt-cpp tool hdt2rdf, the generated NT file cannot be read by hdt-mr.
I got the following error:
Error: java.lang.IllegalArgumentException: Unescaped backslash in: "buttpark 63\09-92"@en
at org.rdfhdt.hdt.util.UnicodeEscape.unescapeString(UnicodeEscape.java:225)
at org.rdfhdt.hdt.triples.TripleString.read(TripleString.java:217)
at org.rdfhdt.mrbuilder.dictionary.DictionarySamplerMapper.map(DictionarySamplerMapper.java:40)
at org.rdfhdt.mrbuilder.dictionary.DictionarySamplerMapper.map(DictionarySamplerMapper.java:33)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Shall we make the rules in the class org.rdfhdt.hdt.util.UnicodeEscape less strict?
Best,
Gang
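For context, my reading of the N-Triples grammar (this is about the spec, not the UnicodeEscape class itself): inside a quoted literal, a backslash is only legal as part of an escape sequence, so the bare `\0` in `63\09-92` is indeed invalid, and a conforming writer would emit the backslash as `\\`. A minimal escaper for the characters N-Triples requires to be escaped:

```java
public class NtEscape {
    // Escape the characters that N-Triples requires to be escaped inside
    // a quoted literal: backslash, double quote, newline, carriage return.
    static String escapeLiteral(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal from the error message, written with a valid escape:
        System.out.println("\"" + escapeLiteral("buttpark 63\\09-92") + "\"@en");
        // prints: "buttpark 63\\09-92"@en
    }
}
```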
I noticed that the hdt-java tools cannot handle empty HDT files, i.e. files with zero triples.
Trying to generate an HDT file from an empty N-Triples file fails:
$ touch empty.nt # create an empty N-Triples file
$ rdf2hdt.sh empty.nt empty.hdt
Converting empty.nt to empty.hdt as null
Exception in thread "main" java.lang.IllegalArgumentException: Adjacency list bitmap and array should have the same size
at org.rdfhdt.hdt.compact.bitmap.AdjacencyList.<init>(AdjacencyList.java:50)
at org.rdfhdt.hdt.triples.impl.BitmapTriples.load(BitmapTriples.java:207)
at org.rdfhdt.hdt.triples.impl.BitmapTriples.load(BitmapTriples.java:224)
at org.rdfhdt.hdt.hdt.impl.HDTImpl.loadFromModifiableHDT(HDTImpl.java:377)
at org.rdfhdt.hdt.hdt.HDTManagerImpl.doGenerateHDT(HDTManagerImpl.java:107)
at org.rdfhdt.hdt.hdt.HDTManager.generateHDT(HDTManager.java:129)
at org.rdfhdt.hdt.tools.RDF2HDT.execute(RDF2HDT.java:106)
at org.rdfhdt.hdt.tools.RDF2HDT.main(RDF2HDT.java:167)
Another way of triggering the same exception is to generate the zero-triple HDT file using hdt-cpp (which works) and then attempt to query it using hdtsparql.sh:
$ touch empty.nt # create an empty N-Triples file
$ rdf2hdt empty.nt empty.hdt # make a HDT file out of it using rdf2hdt from the hdt-cpp suite
$ hdtsparql.sh empty.hdt "select * {?s ?p ?o}"
Exception in thread "main" java.lang.IllegalArgumentException: Adjacency list bitmap and array should have the same size
at org.rdfhdt.hdt.compact.bitmap.AdjacencyList.<init>(AdjacencyList.java:50)
at org.rdfhdt.hdt.triples.impl.BitmapTriples.mapFromFile(BitmapTriples.java:372)
at org.rdfhdt.hdt.hdt.impl.HDTImpl.mapFromHDT(HDTImpl.java:260)
at org.rdfhdt.hdt.hdt.HDTManagerImpl.doMapIndexedHDT(HDTManagerImpl.java:62)
at org.rdfhdt.hdt.hdt.HDTManager.mapIndexedHDT(HDTManager.java:93)
at org.rdfhdt.hdtjena.cmd.HDTSparql.main(HDTSparql.java:38)
While one can argue about the usefulness of empty (i.e. zero-triple) HDT files, I don't think this special case should trigger an exception. I noticed this while writing unit tests for my application; the tests exercise some special situations, and one of them happens to generate an empty NT file which is then converted to HDT and queried using hdtsparql.sh.
I have used the HDT package for a while using my own fork, but it would be great if the following could be implemented, if it isn't already present:
When a query is executed, I would like an option to return the results like:
ResultSet results = qe.execSelect();
return results;
such that they can be directly incorporated into other programs. Or is this already possible?
I think this should do it:
HDTGraph graph = new HDTGraph(hdtFile);
Model model = ModelFactory.createModelForGraph(graph);
String query = createQueryFromFile("queries/" + queryFile, args).getQuery().toString();
QueryExecution qe = QueryExecutionFactory.create(query, model);
ResultSet result = qe.execSelect();
// The ResultSet is only valid until qe.close() is called, so the caller
// must consume (or copy) it before closing the QueryExecution.
return result;
Hi,
I'm following this guide http://www.rdfhdt.org/manual-of-hdt-integration-with-jena/#fuseki
to run a Fuseki server on top of HDT files. It works fine, but I have to move my config file
around between several instances, and some graphs (HDT files) might not be available on some
instances. Right now, if one HDT file is not available, Fuseki blows up with a "File not found"
exception and the entire endpoint is unusable because of one missing file. Would it be possible
to add a flag that simply ignores any HDT graph that is defined in
the config file but whose HDT file is missing?
Thank you very much!
I just downloaded the hdt file from http://lod-a-lot.lod.labs.vu.nl/data/LOD_a_lot_v1.hdt
Reading the file the following error appears:
1936 [main] ERROR org.rdfhdt.hdtjena.bindings.BindingHDTNode - get1(?o)
java.lang.NegativeArraySizeException
at org.rdfhdt.hdtjena.cache.DictionaryCacheArray.put(DictionaryCacheArray.java:63)
at org.rdfhdt.hdtjena.NodeDictionary.getNode(NodeDictionary.java:127)
at org.rdfhdt.hdtjena.NodeDictionary.getNode(NodeDictionary.java:110)
at org.rdfhdt.hdtjena.bindings.BindingHDTNode.get1(BindingHDTNode.java:115)
at org.apache.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:121)
at org.rdfhdt.hdtjena.bindings.BindingHDTNode.format(BindingHDTNode.java:133)
at org.apache.jena.sparql.engine.binding.BindingBase.format1(BindingBase.java:163)
at org.apache.jena.sparql.engine.binding.BindingBase.toString(BindingBase.java:138)
at org.apache.jena.sparql.core.ResultBinding.toString(ResultBinding.java:91)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at TestHDT.main(TestHDT.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
java.lang.NullPointerException
at org.rdfhdt.hdtjena.bindings.BindingHDTNode.format(BindingHDTNode.java:135)
at org.apache.jena.sparql.engine.binding.BindingBase.format1(BindingBase.java:163)
at org.apache.jena.sparql.engine.binding.BindingBase.toString(BindingBase.java:138)
at org.apache.jena.sparql.core.ResultBinding.toString(ResultBinding.java:91)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at TestHDT.main(TestHDT.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Also the Jena Model.size() is a negative number.
My code:
public class TestHDT {
    public static void main(String[] args) throws IOException {
        File file = new File("LOD_a_lot_v1.hdt");
        HDT hdt = null;
        try {
            hdt = HDTManager.mapHDT(file.getAbsolutePath(), null);
            HDTGraph graph = new HDTGraph(hdt);
            Model model = new ModelCom(graph);
            String sparql = "select * where {?s ?p ?o} limit 10";
            Query query = QueryFactory.create(sparql);
            QueryExecution qe = QueryExecutionFactory.create(query, model);
            ResultSet results = qe.execSelect();
            String csvName = "unidomains.csv";
            int count = 0;
            System.out.println("Model.size(): " + results.getResourceModel().size());
            while (results.hasNext()) {
                QuerySolution thisRow = results.next();
                System.out.println("Row " + (++count) + ": " + thisRow);
            }
            qe.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (hdt != null) {
                hdt.close();
            }
        }
    }
}
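Both symptoms above (the negative Model.size() and the NegativeArraySizeException inside DictionaryCacheArray) are consistent with a 32-bit overflow: LOD-a-lot contains tens of billions of statements, and narrowing such a count from long to int silently produces a negative value, which then surfaces when it is used as an array size. A self-contained illustration (the 28-billion figure is illustrative, not read from the file):

```java
public class IntOverflow {
    public static void main(String[] args) {
        long entries = 28_000_000_000L;  // far beyond Integer.MAX_VALUE (2147483647)
        int narrowed = (int) entries;    // silent narrowing keeps only the low 32 bits
        System.out.println(narrowed);    // a negative number
        try {
            Object[] cache = new Object[narrowed]; // what an int-sized cache allocation hits
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException, as in the report");
        }
    }
}
```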
I think I've been able to set up a Fuseki endpoint with HDT files. I'm wondering, though, if it's possible to set up a "union graph". According to the HDT documentation, I can choose a default graph like this: ja:defaultGraph <#graph-name> ; but there is no mention of any "union" graph as the union of all HDT files. Any idea? Or do I have to create an additional HDT file containing the combined content of all other HDT files, and use that as ja:defaultGraph?
Cross posting here since it's probably more relevant: rdfhdt/hdt-cpp#142
Hi all,
I managed to convert DBpedia language versions to HDT with the CPP develop branch, see e.g. here:
http://downloads.dbpedia.org/2016-10/tmp/data/ja/
using this commit: rdfhdt/hdt-cpp@b0bb661
The N-Triples are in Unicode, which is fine according to the 1.1 spec. However, the Unicode does not seem to be supported; below is output from the Japanese version.
So I wrote the file with CPP and then read it with Java. I'm not sure where the incompatibility is.
http://wikidata.dbpedia.org/resource/Q11178088 http://xmlns.com/foaf/0.1/name "������������"@ja
http://wikidata.dbpedia.org/resource/Q11178088 http://xmlns.com/foaf/0.1/name "������������������"@ja
http://wikidata.dbpedia.org/resource/Q11178276 http://dbpedia.org/ontology/address "������������������"@ja
Hi
A NegativeArraySizeException gets thrown in org.rdfhdt.hdt.util.io.IOUtil.readBuffer while trying to load the Freebase HDT from http://www.rdfhdt.org/datasets/.
Any ideas?
All the best,
Leon
I compiled the hdt-java library to transform a set of N3 and RDF/XML files to HDT.
When launching the script I get an error:
/hdt-java/hdt-java-package/target/hdt-java-package-2.0-distribution/hdt-java-package-2.0/bin# ./rdf2hdt.sh -rdftype n3 <n3 file> <hdt file>
Converting <n3file> to <hdt> as n3
Exception in thread "main" org.rdfhdt.hdt.exceptions.ParserException
at org.rdfhdt.hdt.rdf.parsers.RDFParserRIOT.doParse(RDFParserRIOT.java:89)
at org.rdfhdt.hdt.hdt.impl.TempHDTImporterOnePass.loadFromRDF(TempHDTImporterOnePass.java:100)
at org.rdfhdt.hdt.hdt.HDTManagerImpl.doGenerateHDT(HDTManagerImpl.java:103)
at org.rdfhdt.hdt.hdt.HDTManager.generateHDT(HDTManager.java:129)
at org.rdfhdt.hdt.tools.RDF2HDT.execute(RDF2HDT.java:106)
at org.rdfhdt.hdt.tools.RDF2HDT.main(RDF2HDT.java:167)
I tried all the shell scripts available after the Maven install; the error persists.
The raw files can be imported into Virtuoso without any errors.
How do I process non-NT files using the Java library?
http://www.rdfhdt.org/manual-of-the-java-hdt-library/#download points to the (archived) Google Code repo; I assume it should point here instead.
I am loading a dataset made of a graph indexed with Lucene (graphb) and another graph (grapha).
I tried to use an HDT file to replace grapha (grapha_hdt).
But as soon as I load grapha_hdt, Lucene queries stop working on graphb.
Even if I create an HDT dataset on another service, Lucene queries stop working.
As soon as I remove the HDT-only dataset or the HDT graph from my combined dataset, Lucene queries work again.
I am using Fuseki 3.8.0 (and updated pom.xml accordingly).
There are no errors logged, and I can query the HDT graph properly, as well as the Lucene-backed graph.
I'm guessing that HDTGraph doesn't really like being in a text dataset. But it doesn't work even when it is in a separate service.
I don't know how else I could query grapha and graphb (I use both in my queries, looking up things from grapha in graphb).
@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix hdt: <http://www.rdfhdt.org/fuseki#> .
hdt:HDTGraph rdfs:subClassOf ja:Graph .
tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
[] ja:loadClass "org.rdfhdt.hdtjena.HDTGraphAssembler" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
<#mixed> rdf:type fuseki:Service ;
rdfs:label "mixed" ;
fuseki:name "mixed" ;
fuseki:serviceQuery "query" ;
fuseki:serviceQuery "sparql" ;
fuseki:serviceReadGraphStore "get" ;
fuseki:dataset :indexed_textset ;
.
:indexed_textset rdf:type text:TextDataset ;
text:dataset :mixed_dataset ;
text:index <#indexLucene> ;
tdb2:unionDefaultGraph true ;
.
:grapha a tdb2:GraphTDB2;
tdb2:location "DB1" .
:grapha_hdt rdfs:label "RDF Graph1 from HDT file" ;
rdf:type hdt:HDTGraph ;
hdt:fileName "grapha.hdt" .
:graphb a tdb2:GraphTDB2;
tdb2:location "DB2" .
:mixed_dataset a ja:RDFDataset;
ja:namedGraph
[ ja:graphName <http://grapha>;
# Switching to grapha makes it work
# ja:graph :grapha ; ];
ja:graph :grapha_hdt ; ];
ja:namedGraph
[ ja:graphName <http://graphb>;
ja:graph :graphb ; ];
tdb2:unionDefaultGraph true;
.
# Text index description
<#indexLucene> a text:TextIndexLucene ;
text:directory <file:DB2_index> ;
text:entityMap <#entMap> ;
text:storeValues true ;
text:analyzer [ a text:StandardAnalyzer ] ;
text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
text:queryParser text:AnalyzingQueryParser ;
text:multilingualSupport true ;
.
<#entMap> a text:EntityMap ;
text:defaultField "label" ;
text:entityField "uri" ;
text:uidField "uid" ;
text:langField "lang" ;
text:graphField "graph" ;
text:map (
[ text:field "label" ;
text:predicate rdfs:label ]
) .
When I run hdt2rdf.sh without parameters I get this:
Usage: hdt2rdf [options] <input RDF> <output HDT>
Options:
-version
Prints the HDT version number
Default: false
The first line is clearly wrong; it should be <input HDT> <output RDF>.
Also, the tool seems to be limited to N-Triples output; the usage message could state this as well, since it wasn't obvious without looking at the source code.
I noticed problems with both hdtSearch (cpp) and hdtSearch.sh (java). Both apparently give incorrect results in some cases when looking for a specific literal value. I have reported the problems with the cpp version in a separate issue.
My test dataset is this NT file with only 3 triples:
<http://example.org/000046085> <http://schema.org/name> "Raamattu" .
<http://example.org/000146854> <http://schema.org/name> "Ajan lyhyt historia" .
<http://example.org/000019643> <http://schema.org/name> "Seitsemän veljestä" .
I converted it to HDT using the Java version of rdf2hdt. Then I query it for the literal values using hdtSearch.sh:
$ rdf2hdt.sh hdt-test.nt hdt-test.hdt
Converting hdt-test.nt to hdt-test.hdt as null
File converted in: 47 ms 744 us 0.0
Total Triples: 3
Different subjects: 3
Different predicates: 1
Different objects: 3
Common Subject/Object:0
HDT saved to file in: 3 ms 441 us
$ hdtSearch.sh hdt-test.hdt
Could not read .hdt.index, Generating a new one.
Predicate Bitmap in 319 us
Count predicates in 26 us
Count Objects in 41 us Max was: 1
Bitmap in 23 us
Object references in 39 us
Sort object sublists in 8 us
Count predicates in 17 us
Index generated in 314 us
>> ? ? "Raamattu"
Query: |?| |?| |"Raamattu"|
http://example.org/000046085 http://schema.org/name "Raamattu"
Iterated 1 triples in 7 ms 384 us
>> ? ? "Ajan lyhyt historia"
Query: |?| |?| |"Ajan lyhyt historia"|
http://example.org/000146854 http://schema.org/name "Ajan lyhyt historia"
Iterated 1 triples in 261 us
>> ? ? "Seitsemän veljestä"
Query: |?| |?| |"Seitsemän veljestä"|
No results found.
As you can see from the above output, the first and second queries (for "Raamattu" and "Ajan lyhyt historia") give the correct result, but the last one gives zero results even though it should match one triple in the data.
Likewise, if I start up Fuseki using hdtEndpoint.sh
and perform this SPARQL query:
SELECT * { ?s ?p "Seitsemän veljestä" }
I get no results, but similar queries for the two other literal values do give the correct result. I tried this query both directly via the Fuseki UI and via YASGUI.org, just in case there would be some problem with character encodings. The query appears as it should in the Fuseki log/console, there are no obvious encoding problems.
I'm not sure whether the problem is in the HDT generation, index file generation, or querying.
Apparently there is a bug in the format synchronization between hdt-java and hdt-cpp. An HDT file created with hdt-java cannot be loaded in hdt-cpp (the inverse works fine).
I was trying to query the LOD-a-lot dataset. The dataset is so big that ints do not suffice as indexes for the data structures used in HDT, so presumably the FourSectionDictionaryBig is used. The problem is that while the HDT file can be loaded, it cannot be queried, because the idToString(int id, TripleComponentRole role) method takes an int for the id. It should take a long, since the file cannot be queried otherwise.
Hi,
Would it be possible to update the release on the Central Repo of Maven? Thanks in advance.
KR
Pieter
Because then I know how to run a SPARQL query against a HDT file...
Hello there,
I am using hdt-jena to query over the GeoNames HDT. I am unable to use a FILTER.
Look at this:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?s ?lat ?long
WHERE {
?s geo:lat ?lat.
?s geo:long ?long
} LIMIT 10
I successfully get:
s,lat,long
_:b0,35.325,-71.085641
_:b0,35.325,-80.00
_:b0,35.325,13.286104
_:b0,35.325,2.15898513793945
_:b0,35.325,2.312922
_:b0,35.325,25.13
_:b0,35.325,9.221907
_:b0,40.44,-71.085641
_:b0,40.44,-80.00
_:b0,40.44,13.286104
Now I try to filter over geo:lat:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?s ?lat ?long
WHERE {
?s geo:lat ?lat.
?s geo:long ?long.
FILTER(?lat > 40)
} LIMIT 10
Unfortunately, there are no more results:
s,lat,long
So, can you tell me if FILTER is beyond HDT's capabilities?
I crawled the code, and I found this line:
// // FIXME: Allow a filter here.
Regards,
The HDT file generated by hdt-java from http://experimental.worldcat.org/fast/download/FASTTitle.nt.zip is incompatible with hdt-cpp. When the file is generated with hdt-cpp instead, it is compatible with both implementations.
I just modified 'fuseki_example.ttl' a little to make it point to two HDT files I had generated, but I got the following error message when I tried to start Fuseki with the config file:
[fug2@virtuosodev11 hdt-fuseki]$ bin/hdtEndpoint.sh --config=fuseki_example.ttl
com.hp.hpl.jena.assembler.exceptions.AssemblerException: caught: Adjacency list bitmap and array should have the same size
doing:
root: file:///home/fug2/hdt-java/hdt-fuseki/fuseki_example.ttl#graph1 with type: http://www.rdfhdt.org/fuseki#HDTGraph assembler class: class org.rdfhdt.hdtjena.HDTGraphAssembler
root: file:///home/fug2/hdt-java/hdt-fuseki/fuseki_example.ttl#dataset with type: http://jena.hpl.hp.com/2005/11/Assembler#RDFDataset assembler class: class com.hp.hpl.jena.sparql.core.assembler.DatasetAssembler
the changed fuseki_example.ttl file is as follow:
<#graph1> rdfs:label "RDF Graph1 from HDT file" ;
rdf:type hdt:HDTGraph ;
hdt:fileName "/export/home/SSD/BIGDATA/hdt/pc_compound_0.hdt" ;
.
<#graph2> rdfs:label "RDF Graph2 from HDT file" ;
rdf:type hdt:HDTGraph ;
hdt:fileName "/export/home/SSD/BIGDATA/hdt/pc_compound_1.hdt" ;
.
I've run mvn install. Then what? Where are the compiled binaries? The bin/ directories only contain a bunch of .bat and .sh scripts, but the README doesn't explain how to use them.
I'd really appreciate any help (or more info in the README) on how to use the CLI tools after compilation.
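For what it's worth, another report on this tracker invokes the tools from the packaged distribution directory, which suggests the scripts land under hdt-java-package/target after the build (the version number below comes from that report and may differ for your checkout):

```shell
# Path taken from another issue in this tracker; adjust the version to match
# what `mvn install` actually produced under hdt-java-package/target/.
cd hdt-java-package/target/hdt-java-package-2.0-distribution/hdt-java-package-2.0/bin
./rdf2hdt.sh input.nt output.hdt
```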
The output formats of this Fuseki server are limited; the following do not work:
output=json-ld
output=json-rdf
output=nt
output=ttl
If we serve it as a REST service with content negotiation, we need another dependent library to change the format, which is not convenient.
Any comments?
Sorry, I understand this is not an issue, but I don't see any HDT mailing list or forum where I can ask this. Basically I'm using HDT with Fuseki, and I'd like to understand how HDT files are mapped into memory. When I start Fuseki, what part of an HDT file is loaded into RAM? Only the index? And once I start submitting queries, are HDT triples loaded into RAM or read from the hard disk? What happens if the HDT file can't be loaded completely into RAM? Will it swap, or will it keep moving new data from disk to RAM? Finally, to improve HDT query performance, would I be better off buying more RAM or a faster disk (SSD), or something else?
Thank you.
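As I understand it (an outsider's sketch, not the actual hdt-java internals): mapHDT memory-maps the file, so the OS pages regions in on first access and can evict them under memory pressure. Cold queries on a file larger than RAM therefore turn into disk reads, which is why a faster disk often matters once the file no longer fits, while extra RAM enlarges the page cache available to the mapping. The mechanism looks like this in plain NIO:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Map the file read-only and fetch one byte. Nothing is copied eagerly:
    // map() only reserves address space, and the OS faults pages in lazily
    // when the buffer is actually read (here, at buf.get()).
    static byte firstByte(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("mmap-sketch", ".bin");
        Files.write(p, new byte[] {72, 68, 84}); // "HDT" bytes as stand-in content
        System.out.println((char) firstByte(p)); // prints: H
        Files.delete(p);
    }
}
```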
For example, 2.12.1... I gave it a go, but ran into Jena API changes around the ReorderTransformationBase class. However, I am not familiar with these APIs, so I cannot update this myself...
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] RDF/HDT
[INFO] HDT API
[INFO] HDT Java Core
[INFO] HDT Java Command line Tools
[INFO] HDT Jena
[INFO] HDT Java Package
[INFO] HDT Fuseki
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building RDF/HDT 2.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-assembly-plugin:3.1.0:single (default-cli) @ hdt-java-parent ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] RDF/HDT ........................................... FAILURE [1.062s]
[INFO] HDT API ........................................... SKIPPED
[INFO] HDT Java Core ..................................... SKIPPED
[INFO] HDT Java Command line Tools ....................... SKIPPED
[INFO] HDT Jena .......................................... SKIPPED
[INFO] HDT Java Package .................................. SKIPPED
[INFO] HDT Fuseki ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.972s
[INFO] Finished at: Sun Nov 11 00:07:06 CET 2018
[INFO] Final Memory: 8M/245M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single (default-cli) on project hdt-java-parent: Error reading assemblies: No assembly descriptors found. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
In RDFParserRIOT, the "parser not found for" exception is hidden by the catches in lines 87-91. Ideally, ParserException should take a Throwable as an argument to its constructor, so that exceptions can be chained.
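A minimal sketch of what such a chaining constructor could look like; the name matches the existing ParserException, but this is an illustrative stand-in, not the library's actual class:

```java
// Hypothetical sketch: a ParserException with a cause-accepting constructor,
// so the original "parser not found" exception survives in the chain.
public class ParserException extends Exception {
    public ParserException(String message) {
        super(message);
    }

    // Added constructor: records the underlying exception instead of hiding it.
    public ParserException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

The catch blocks around lines 87-91 could then rethrow with `throw new ParserException("parser not found", e)`, and callers would still see the root error via getCause().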
Hi All,
I found this feature very helpful in our case, but it is only available for the TDB backend. Can we make it available for HDT files?
This service offers SPARQL query access only to a TDB database. The TDB database can have specific features set, such as making the default graph the union of all named graphs.
<#service3> rdf:type fuseki:Service ;
fuseki:name "tdb" ; # http://host:port/tdb
fuseki:serviceQuery "sparql" ; # SPARQL query service
fuseki:dataset <#dataset> ;
.
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
# Query timeout on this dataset (1s, 1000 milliseconds)
ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
# Make the default graph be the union of all named graphs.
## tdb:unionDefaultGraph true ;
.
Are the .hdt.index.v* files we generate compatible with the hdt-cpp index files for the same value of *?
If not, shouldn't we name them differently?
Hello,
I tried to follow the instructions at https://github.com/rdfhdt/hdt-java/tree/master/hdt-jena
but I get build errors:
$ cd hdt-api/
$ mvn install
...
[INFO] BUILD SUCCESS
...
$ cd ../hdt-java-core
$ mvn install
...
[ERROR] Failed to execute goal on project hdt-java-core: Could not resolve dependencies for project org.rdfhdt:hdt-java-core:jar:2.0-SNAPSHOT: Failed to collect dependencies at org.rdfhdt:hdt-api:jar:2.0-SNAPSHOT: Failed to read artifact descriptor for org.rdfhdt:hdt-api:jar:2.0-SNAPSHOT: Failure to find org.rdfhdt:hdt-java-parent:pom:2.0-SNAPSHOT in https://oss.sonatype.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of sonatype-nexus-snapshots has elapsed or updates are forced -> [Help 1]
Can anyone give me a hint on how to fix this?
Hello,
After generating an HDT file with https://github.com/rdfhdt/hdt-docker and my own Turtle file, I used the HDT file with hdt-fuseki (command: ./bin/hdtEndpoint.sh --hdt ./ALL.hdt /dataset). The SPARQL endpoint works well for the queries I have used so far, but throws an exception for the following SPARQL request:
curl -H "Accept: application/json" http://<myserver>:3030/dataset/sparql --data-urlencode query='PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?u ?l ?p WHERE { VALUES ?u { <fr.mgdis.odata.data.plagesBREfrType> } OPTIONAL { ?u rdfs:label ?l . BIND (rdfs:label AS ?p) } }'
server logs:
15:38:36 INFO [25] Query = PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?u ?l ?p WHERE { VALUES ?u { <fr.mgdis.odata.data.plagesBREfrType> } OPTIONAL { ?u rdfs:label ?l . BIND (rdfs:label AS ?p) } }
15:38:36 WARN [25] RC = 500 : (?u,null)
java.lang.IllegalArgumentException: (?u,null)
at org.rdfhdt.hdtjena.bindings.BindingHDTId.put(BindingHDTId.java:79)
at org.rdfhdt.hdtjena.solver.HDTSolverLib$3.apply(HDTSolverLib.java:202)
at org.rdfhdt.hdtjena.solver.HDTSolverLib$3.apply(HDTSolverLib.java:178)
at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:308)
at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:47)
at org.rdfhdt.hdtjena.util.IterAbortable.hasNext(IterAbortable.java:62)
at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:303)
at org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:53)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:66)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterDefaulting.hasNextBinding(QueryIterDefaulting.java:54)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59)
at org.apache.jena.fuseki.servlets.SPARQL_Query.executeQuery(SPARQL_Query.java:297)
at org.apache.jena.fuseki.servlets.SPARQL_Query.execute(SPARQL_Query.java:252)
at org.apache.jena.fuseki.servlets.SPARQL_Query.executeWithParameter(SPARQL_Query.java:205)
at org.apache.jena.fuseki.servlets.SPARQL_Query.perform(SPARQL_Query.java:100)
at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeLifecycle(SPARQL_ServletBase.java:227)
at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeAction(SPARQL_ServletBase.java:204)
at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.execCommonWorker(SPARQL_ServletBase.java:186)
at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.doCommon(SPARQL_ServletBase.java:79)
at org.apache.jena.fuseki.servlets.SPARQL_Query.doPost(SPARQL_Query.java:60)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:256)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.nio.BlockingChannelConnector$BlockingChannelEndPoint.run(BlockingChannelConnector.java:298)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:748)
15:38:36 INFO [25] 500 (?u,null) (5 ms)
When performing
HDT hdt = HDTManager.mapIndexedHDT(rdfFile.getAbsolutePath(), null);
and then
hdt.close();
the number of open files keeps increasing. As I have 70,000 HDT files to query, I have to run the program multiple times on subsets, as otherwise I end up with too many open files in the OS. I tried the 2.1 branch but was unable to compile the code.
Hi all,
try {
    IteratorTripleString it = hdt.search(identifier, property, "");
    while (it.hasNext()) {
        TripleString ts = it.next();
        ValAgg.put(identifier, ts.getObject().toString(), lang);
    }
} catch (NotFoundException nfe) {
    // intentionally left blank:
    // hdt.search throws NotFoundException instead of returning null
    // or an iterator with hasNext() == false
}
This is what my code looks like. I find it odd that I have to leave the catch block empty, but the code works.
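One way to avoid the empty catch block at every call site is a small helper that turns the "not found" exception into an empty iterator. This is a hypothetical utility sketch, not part of the HDT API; SearchHelper and orEmpty are made-up names, and the catch would target HDT's NotFoundException in real code:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.concurrent.Callable;

public class SearchHelper {
    // Wraps a search call so that a "not found" exception yields an empty
    // iterator instead of forcing an empty catch block at every call site.
    static <T> Iterator<T> orEmpty(Callable<Iterator<T>> search) {
        try {
            return search.call();
        } catch (Exception notFound) { // in real code: catch NotFoundException
            return Collections.emptyIterator();
        }
    }

    public static void main(String[] args) {
        // A search that "fails" simply becomes an empty result.
        Iterator<String> it = orEmpty(() -> { throw new Exception("not found"); });
        System.out.println(it.hasNext()); // prints "false"
    }
}
```

The loop above would then become a plain `Iterator<TripleString> it = orEmpty(() -> hdt.search(identifier, property, ""));` followed by the usual while (it.hasNext()) body, with no catch block at all.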
My example data is this triple:
<http://example.org/> <http://schema.org/name> "Example" .
I converted it to HDT and created an index. Then I executed this SPARQL query using hdtsparql.sh:
$ hdtsparql.sh example.hdt "SELECT * { BIND(<http://example.org/> AS ?s) ?s ?p ?o }"
s,p,o
http://example.org/,http://schema.org/name,Example
So far so good. But when I change the bound URI to something nonexistent, I get an error:
$ hdtsparql.sh example.hdt "SELECT * { BIND(<http://example.org/2> AS ?s) ?s ?p ?o }"
Exception in thread "main" java.lang.IllegalArgumentException: (?s,null)
at org.rdfhdt.hdtjena.bindings.BindingHDTId.put(BindingHDTId.java:79)
at org.rdfhdt.hdtjena.solver.HDTSolverLib$3.apply(HDTSolverLib.java:202)
at org.rdfhdt.hdtjena.solver.HDTSolverLib$3.apply(HDTSolverLib.java:178)
at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:308)
at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:47)
at org.rdfhdt.hdtjena.util.IterAbortable.hasNext(IterAbortable.java:62)
at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:303)
at org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:53)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59)
at org.apache.jena.sparql.resultset.CSVOutput.format(CSVOutput.java:81)
at org.apache.jena.query.ResultSetFormatter.outputAsCSV(ResultSetFormatter.java:624)
at org.rdfhdt.hdtjena.cmd.HDTSparql.main(HDTSparql.java:53)
I also tested via Fuseki and got the same errors.
Note that this variant, where BIND is not used, works fine:
SELECT * { <http://example.org/2> ?p ?o }
However, if I use VALUES, I can get the same error as with BIND:
SELECT * { VALUES ?s { <http://example.org/2> } ?s ?p ?o }
I'm trying to compile hdt-fuseki with Maven, but I'm getting the following error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project hdt-fuseki: Compilation failure
[ERROR] /C:/Users/BS/DBpedia/hdt-java-master/hdt-fuseki/src/main/java/org/rdfhdt/hdt/fuseki/FusekiHDTCmd.java:[331,47] cannot access com.hp.hpl.jena.graph.impl.GraphBase
[ERROR] class file for com.hp.hpl.jena.graph.impl.GraphBase not found
I've changed the pom.xml to work with Jena v3.8.0.
However, Eclipse still reports the following for FusekiHDTCmd.java:
The type com.hp.hpl.jena.graph.impl.GraphBase cannot be resolved. It is indirectly referenced from required .class files.
Can you help me, please?
Thanks
The 2.0 version of this codebase is not available yet on Maven Central, and I don't have the permissions to publish it. @MarioAriasGa or @webdata, can you do that or give me the right access?
This might or might not be the right project for this issue...
I'm trying to query a large dataset (5GB .hdt file, 266 M-Triples) and have a problem searching for untyped literals. SPARQL queries with typed literals or URIs in the object position run fine. Also, when I create a small dataset (13 triples), SPARQL queries for typed literals run fine, so I assume that it's an issue with the hdt file size. The hdt files were created using hdt-cpp.
I have integrated HDT support into Fuseki as described and the service as a whole works fine.
The problem looks like this: I first run a DESCRIBE
in order to get some triples:
PREFIX gndo: <http://d-nb.info/standards/elementset/gnd#>
PREFIX rdau: <http://rdaregistry.info/Elements/u/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterm: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX gnd: <http://d-nb.info/gnd/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dnbt: <http://d-nb.info/standards/elementset/dnb#>
DESCRIBE <http://d-nb.info/1000000354>
FROM <http://d-nb.info/dnb-all>
That query returns
@prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
@prefix dnbt: <http://d-nb.info/standards/elementset/dnb#> .
@prefix rdau: <http://rdaregistry.info/Elements/u/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterm: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix gnd: <http://d-nb.info/gnd/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://d-nb.info/1000000354>
a <http://purl.org/ontology/bibo/Collection> ;
dc:identifier "(OCoLC)723788590" , "(DE-101)1000000354" ;
dc:publisher "A. F. W. Sommer" ;
dc:subject "830"^^dnbt:ddc-subject-category , "B"^^dnbt:ddc-subject-category ;
dc:title "Neuere Gedichte" ;
dcterms:creator gnd:118569317 ;
dcterms:medium <http://rdaregistry.info/termList/RDACarrierType/1044> ;
rdau:P60163 "Wien" ;
rdau:P60327 "August Friedrich Ernst Langbein" ;
rdau:P60333 "Wien : A. F. W. Sommer" ;
rdau:P60493 "1814" ;
rdau:P60539 "30 cm" ;
<http://www.w3.org/2002/07/owl#sameAs>
<http://hub.culturegraph.org/resource/DNB-1000000354> .
When I then try to query for a literal like this:
SELECT ?entity
FROM <http://d-nb.info/dnb-all>
WHERE {
?entity ?p "A. F. W. Sommer"
}
I get zero results.
Adding the datatype xsd:string to the literal also doesn't help:
SELECT ?entity
FROM <http://d-nb.info/dnb-all>
WHERE {
?entity ?p "A. F. W. Sommer"^^<http://www.w3.org/2001/XMLSchema#string>
}
If I inspect the HDT file using hdt-it!, a search for "A. F. W. Sommer"^^http://www.w3.org/2001/XMLSchema#string returns 231 hits, so the data is obviously present.
As a verification, I created a dataset consisting only of this one entity and configured Fuseki to run a separate service with that dataset (15 triples) in a single named graph. With that configuration, SPARQL queries for untyped literals work, so I guess it's a problem with the HDT file size.
Any insights are much appreciated.
Thanks,
Lars
[2018-11-06 01:44:02] PredicateIndexArray INFO Count predicates in 1 hour 16 min 22 sec 104 ms 376 us
for a 2-billion-triple HDT. Could it use the value from my HDT file's header instead?
_:triples http://purl.org/HDT/hdt#triplesnumTriples "2419923078"
It takes only a few milliseconds to get that from the header.
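For reference, the header can be queried in code, I believe via hdt.getHeader().search(...), which returns the same IteratorTripleString interface as hdt.search. The sketch below only shows extracting the number from such a header line treated as plain text; HeaderNumTriples and parseNumTriples are made-up names for illustration, not HDT API:

```java
public class HeaderNumTriples {
    // The HDT header stores the triple count as a literal, e.g.
    //   _:triples <http://purl.org/HDT/hdt#triplesnumTriples> "2419923078"
    // Extract the number between the quotes of such a line.
    static long parseNumTriples(String headerLine) {
        int open = headerLine.indexOf('"');
        int close = headerLine.lastIndexOf('"');
        return Long.parseLong(headerLine.substring(open + 1, close));
    }

    public static void main(String[] args) {
        String line = "_:triples <http://purl.org/HDT/hdt#triplesnumTriples> \"2419923078\"";
        System.out.println(parseNumTriples(line)); // prints 2419923078
    }
}
```

Reading this literal is O(1) per file, which is why initializing the count from the header rather than re-scanning 2 billion triples would save the 76 minutes reported above.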
The following command from the readme file does not work on my system:
$ mvn assembly:single
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (default-cli) on project hdt-java-parent: Error reading assemblies: No assembly descriptors found. -> [Help 1]
I'm on the following Java and Maven versions:
$ java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
$ mvn -version
Apache Maven 3.5.0 (Red Hat 3.5.0-6)
Maven home: /usr/share/maven
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc27.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.14.6-300.fc27.x86_64", arch: "amd64", family: "unix"
The other documented Maven command (mvn install) does work.
Hi all,
I just tried to convert DBpedia to HDT and would like to share my experience. Some things did not work for me, although I am not sure whether it is my fault or just missing features.
Overall, I solved it now by sort -um <(lbzip file1.bz2) .... | gzip2 > core.gz
My question is just whether all of the above is intended and expected behaviour, or whether I should try some of the methods again, as I might have done something wrong.
All the best,
Sebastian
Hi,
I have a setup with Fuseki on top of HDT and data with HTTP URIs (surprise, surprise) in my dataset, and when I query using:
SELECT * WHERE {?s ?p ?o} LIMIT 5
I get results. However, when I add the prefix definition for HTTP like
PREFIX http: <http://www.w3.org/2011/http#>
SELECT * WHERE {?s ?p ?o} LIMIT 5
I don't get results any more. When I change the prefix to something else like
PREFIX htt: <http://www.w3.org/2011/http#>
SELECT * WHERE {?s ?p ?o} LIMIT 5
I have results again.
When I fire a query that does not have HTTP URIs in the results (e.g. only blank nodes), then PREFIX http: <...> does not interfere: I get results with the prefix, without it, and with the amended prefix.
Cheers!