linkeddatafragments / Server.js
A Triple Pattern Fragments server for Node.js
Home Page: http://linkeddatafragments.org/
License: Other
The context used by data.linkeddatafragments.org contains this line:
"": "http://data.linkeddatafragments.org/"
This violates the JSON-LD specification: section 8.1 states that a term MUST NOT be an empty string (""). Using @vocab might be an alternative.
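A compliant alternative, assuming all relative terms should resolve against that base IRI, would use @vocab instead of an empty-string term:

```json
{
  "@context": {
    "@vocab": "http://data.linkeddatafragments.org/"
  }
}
```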
Replacing BufferedIterator with TransformIterator at Server.js/lib/datasources/Datasource.js line 109 (commit f5dff0c), i.e. the destination iterator that is passed to the _executeQuery method at Server.js/lib/datasources/Datasource.js line 125 (commit f5dff0c).
In my case, I'm implementing a custom data source that uses the quadstore module as the backend for an ldf-server. As quadstore is able to return a stream of quads for any given query, I'd like to feed such streams into the destination iterator without having to worry about backpressure management.
As TransformIterator extends BufferedIterator, this appears to be just a matter of replacing the latter with the former. I have briefly tested this locally and it seems to be working fine. I can do some more testing and submit a PR if you're interested.
EDIT: the reason why TransformIterator is better than BufferedIterator for the purpose of feeding an external stream after the iterator has been instantiated is that it supports setting the iterator's source through the .source member.
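The deferred-source pattern the EDIT describes can be illustrated with a minimal stand-in (this toy class only mimics the relevant part of asynciterator's TransformIterator; it is not the real API):

```javascript
// Toy illustration of an iterator whose source can be attached after
// construction, mirroring TransformIterator's `.source` setter.
class DeferredIterator {
  constructor() {
    this._buffer = [];
  }
  // Setting `source` later lets an external stream (e.g. quadstore's
  // quad stream) be plugged in once it becomes available.
  set source(items) {
    for (const item of items) this._buffer.push(item);
  }
  read() {
    return this._buffer.length ? this._buffer.shift() : null;
  }
}

const iterator = new DeferredIterator();
// ...the datasource is instantiated first; the stream arrives later:
iterator.source = ['quad1', 'quad2'];
console.log(iterator.read()); // 'quad1'
```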
Example query: there must be statements with FOAF knows, but the following query gives no results: http://lod-a-lot.lod.labs.vu.nl/?subject=&predicate=foaf%3Aknows&object=
This is because of the sheer size of some of the indices of LOD-a-lot. To accommodate this the HDT library was updated to use 64 bit integers (currently in branch https://github.com/rdfhdt/hdt-cpp/tree/long-dict-id). The Node.js/HDT API probably has to be updated to take these large indices into account.
Metadata of data sources (title, description, license, licenseUrl, copyright, homepage...) needs to be hard-coded in config.json, but data sources should be able to provide their own metadata. Some thoughts:
- The Datasource interface should be extended with an optional method to fetch metadata about the datasource.
- The IndexDatasource class needs to call this method to get additional triples from a datasource.
- The title of a data source is also used on instantiation in bin/ldf-server. It should rather be accessed through an accessor of the created data source instead of (or as a fallback to) its raw configuration.
By the way, this should also allow for VoID files as data sources (new class VoidDatasource):
By the way my use case of this feature is BEACON files as data sources. A BEACON link dump contains a list of links and additional metadata about this data set.
I'm wondering if it's possible to specify multiple HDT files for one datasource in config.json? Say,
"settings": { "file": ["/data/dump1.hdt", "/data/dump2.hdt"] }
but this didn't work for me.
At the moment, the HTTPS configuration is not clear for some edge cases:
Todo: revise and come up with a better design to specify different HTTPS scenarios
Currently, the views path is hardcoded (https://github.com/LinkedDataFragments/Server.js/blob/v2.2.2/lib/views/HtmlView.js#L34). If we make this an option, people can specify their own views.
This would be useful for custom views such as DBpedia's (http://fragments.dbpedia.org/, which I currently implement as a fork), and for other cases such as #61.
The phrase "results of SPARQL queries" in the first sentence of README.md is linked to
Dereferencing that URL results in this error message:
Virtuoso 37000 Error SP030: SPARQL compiler, line 3: Undefined namespace prefix at 'dbpedia-owl' before '}'
SPARQL query:
define output:format "HTTP+TTL text/turtle"
#output-format:text/turtle
define input:default-graph-uri <http://dbpedia.org> CONSTRUCT { ?p a dbpedia-owl:Artist }
WHERE { ?p a dbpedia-owl:Artist }
Can the config.json be used to redirect to another server location as in the description?
Just a small issue I ran into while testing this server.
README.md has an example config file (not config-example.json) which has some trailing commas on lines 9 and 15.
When the parser throws an exception (e.g. in this situation: rdfhdt/hdt-cpp#11), no response status is returned. It would be nice to catch errors like this and return a 500 error message.
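A hedged sketch of the kind of guard this would need (the handler and names are hypothetical, not Server.js's actual API):

```javascript
// Wrap the query execution so that a thrown parser error becomes an
// explicit 500 response instead of a request that never completes.
function respondSafely(request, response, executeQuery) {
  try {
    executeQuery(request, response);
  } catch (error) {
    response.writeHead(500, { 'Content-Type': 'text/plain' });
    response.end('Internal server error: ' + error.message);
  }
}
```

An asynchronous datasource would additionally need an 'error' listener on its result stream, since try/catch only covers synchronous throws.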
There are two issues w.r.t. horizontal scalability (i.e. # datasets):
(both tested with ~ 650.000 datasets. You know which ;))
Please provide extended documentation of the options in config.json. The examples under /config are not sufficient. It's really hard to find config options within the scripts and figure out what effects they have, e.g. "routers", "assetsPath", "blankNodePrefix", etc. A full list with a hint about each option's effect would be great.
Thanks in advance!
Since .gitignore contains a line with "config" (which is by the way not consistent with the github repository, since it includes the config folder), npm doesn't include it in the published package.
When you start the ldf-server, you get an error such as:
Error: ENOENT: no such file or directory, open '/Users/myuser/myproject/node_modules/ldf-server/config/config-defaults.json'
Could you remove "config" from the .gitignore and re-publish? Thanks!
as already done in the Java version of LDF: https://query.wikidata.org/bigdata/ldf .
For the query "SELECT DISTINCT ?p WHERE { ?s ?p ?o . }"
the server returns: Could not parse query: ...
How much of SPARQL 1.1 is already supported by LDF?
Is there a way to expand the metadata that is shown on the index page about the datasets?
Hi Ruben, I'm running v0.10.29 on Debian 8, installed using apt-get, and getting the following error on starting. Searching around, this error appears with EventEmitter in other places; have you come across it?
$ ldf-server config.json 5000 4
/usr/local/lib/node_modules/ldf-server/lib/datasources/Datasource.js:36
Datasource.prototype = new EventEmitter();
^
TypeError: object is not a function
at Object.<anonymous> (/usr/local/lib/node_modules/ldf-server/lib/datasources/Datasource.js:36:24)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.require (module.js:364:17)
at require (module.js:380:17)
at Object.<anonymous> (/usr/local/lib/node_modules/ldf-server/lib/datasources/MemoryDatasource.js:5:18)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
Hello,
Thanks for releasing this project. It seems really promising. I'm looking into using it to give biologists more options for querying the open-source data from our non-profit research group WikiPathways.org.
Right now, the software works great when I query a small subset of our data, but when I try querying a larger dataset, I get a timeout error:
"Error: Error: ETIMEDOUT\n at Request.onResponse as _callback\n at self.callback (/usr/local/share/npm/lib/node_modules/ldf-client/node_modules/request/request.js:129:22)\n at Request.EventEmitter.emit (events.js:95:17)\n at null._onTimeout (/usr/local/share/npm/lib/node_modules/ldf-client/node_modules/request/request.js:591:12)\n at Timer.listOnTimeout as ontimeout"
Since the examples demonstrate querying DBPedia, I know the software should be able to handle my data, which is 24.3 MB in size. It's currently stored as JSON-LD in an online Mongo instance here. (Caution: 24.3 MB JSON file.)
I'm thinking the problem is either
I can run this query when using our pre-production SPARQL endpoint as a datasource, so I'm assuming the main problem is that the software is only intended for small datasets when using JSON-LD as the datasource.
Should I be able to use 24MB of JSON-LD as a datasource, or is that outside the intended usage of the software?
Thanks.
(This was originally posted as an issue with the client code.)
How can we make this happen easily?
When a Server.js instance is configured to use a SPARQL endpoint and that endpoint times out, the error is not caught correctly but is re-thrown, which causes an ldf-client to terminate its execution. The Server.js instance writes the following error message:
events.js:160
throw er; // Unhandled 'error' event
^
Error: Error accessing SPARQL endpoint http://dbpedia.org/sparql: ESOCKETTIMEDOUT
at emitError (/usr/local/lib/node_modules/ldf-server/lib/datasources/SparqlDatasource.js:71:40)
at Request._callback (/usr/local/lib/node_modules/ldf-server/lib/datasources/SparqlDatasource.js:64:7)
at self.callback (/usr/local/lib/node_modules/ldf-server/node_modules/request/request.js:186:22)
at emitOne (events.js:96:13)
at Request.emit (events.js:188:7)
at ClientRequest.<anonymous> (/usr/local/lib/node_modules/ldf-server/node_modules/request/request.js:781:16)
at ClientRequest.g (events.js:291:16)
at emitNone (events.js:86:13)
at ClientRequest.emit (events.js:185:7)
at Socket.emitTimeout (_http_client.js:620:10)
Worker 12477 died with 1. Starting new worker.
Worker 12896 running on http://localhost:8081/.
The client exits with this message:
svensson@ldslab:~$ node --max_old_space_size=4096 -- /usr/local/bin/ldf-client http://10.69.14.96:8081/DBPedia-SPARQL http://10.69.14.96:8081/WikipediaCitationSources http://10.69.14.96:8081/WikipediaCitationsISBN http://10.69.14.96:8081/DNBTitel http://10.69.14.96:8081/GND -f aen-dbpedia.sparql > aen-dbpedia.json
events.js:160
throw er; // Unhandled 'error' event
^
Error: socket hang up
at createHangUpError (_http_client.js:254:15)
at Socket.socketOnEnd (_http_client.js:346:23)
at emitNone (events.js:91:20)
at Socket.emit (events.js:185:7)
at endReadableNT (_stream_readable.js:974:12)
at _combinedTickCallback (internal/process/next_tick.js:74:11)
at process._tickCallback (internal/process/next_tick.js:98:9)
I have an HDT version of the KEGG dataset taken from http://download.bio2rdf.org/current/kegg/kegg.html
Several files have triples like the following:
<http://bio2rdf.org/cpd:C00003> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/pdb-ccd:NAD NAJ> .
Notice the space in the URI.
The N3-to-NT and NT-to-HDT parsers did not encounter any errors during parsing.
However, when I'm querying the endpoint using LDF Client I get an error:
WARNING TriplePatternIterator Unexpected "<http://bio2rdf.org/pdb-ccd:NAD" on line 66.
events.js:160
throw er; // Unhandled 'error' event
^
Error: Unexpected "<http://bio2rdf.org/pdb-ccd:NAD" on line 66.
at N3Lexer._syntaxError (/<...>/node_modules/n3/lib/N3Lexer.js:358:12)
at reportSyntaxError (/<...>/node_modules/n3/lib/N3Lexer.js:325:54)
at N3Lexer._tokenizeToEnd (/<...>/node_modules/n3/lib/N3Lexer.js:311:18)
at TrigFragmentIterator._parseData (/<...>/node_modules/n3/lib/N3Lexer.js:393:16)
at TrigFragmentIterator.TurtleFragmentIterator._transform (/<...>/node_modules/ldf-client/lib/ nts/TurtleFragmentIterator.js:47:8)
at readAndTransform (/<...>/node_modules/asynciterator/asynciterator.js:959:12)
at TrigFragmentIterator.TransformIterator._read (/<...>/node_modules/asynciterator/ 3)
at TrigFragmentIterator.BufferedIterator._fillBuffer (/<...>/node_modules/asynciterator/ 10)
at Immediate.fillBufferAsyncCallback (/<...>/node_modules/asynciterator/asynciterator.js:800:8)
at runCallback (timers.js:639:20)
I know this is a very rare case and somehow violates the URI naming convention (the URI is not percent-encoded), but can it be fixed, or do I have to fix this URI manually in the raw .nt files?
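If fixing the data is acceptable, percent-encoding the offending character yields a valid IRI (assuming "NAD NAJ" is meant as one compound identifier):

```javascript
// N-Triples forbids a raw space inside <...>; encode it as %20.
const raw = 'http://bio2rdf.org/pdb-ccd:NAD NAJ';
const fixed = encodeURI(raw);
console.log(fixed); // 'http://bio2rdf.org/pdb-ccd:NAD%20NAJ'
```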
When serving an HDT file, the server sometimes throws the following error:
Error: Trying to read a LOGArray but data is not LogArray
It is unclear whether this is a problem with the HDT GUI (from http://www.rdfhdt.org), or the linked data fragments server. For instance, the SWDF HDT dataset http://gaia.infor.uva.es/hdt/swdf-2012-11-28.hdt.gz loads fine in the server, but cannot be opened using the GUI.
A test file can be downloaded from: https://www.dropbox.com/s/mumhgkt4fazbk8s/clariah-canada-1901.hdt?dl=0
Current docker image is about 700 MB, a smaller image would be nice for running Server.js on cheap virtual servers / instances.
Testing with alpine didn't seem to work when HDT is needed (after adding a few packages and successfully compiling hdt.node, there are quite a lot of missing symbols at runtime, adding libuv and libstdc++ packages didn't help).
See also comments at bottom of #19 , where it is explained that this could be due to Alpine using musl-libc instead of glibc
I foresee these changes:
this._datasources[query.datasource].datasource.addTriple
Do you think I will encounter other obstacles?
Server configuration chains two composite data sources: composite data source A refers to composite data source B. Composite data source B refers to 1 HDT data source and 2 Turtle data sources.
The following SPARQL query, executed via the ldf-client against data source A, results in a 502 after a certain amount of results have been returned.
SELECT ?s ?n WHERE {?s <http://schema.org/name> ?n}
When executing this query directly against data source B or the HDT data source no error occurs.
Currently, when setting up a datasource using the file:// protocol, the 'file:' part of the link gets stripped and the remainder gets used to find the file (https://github.com/LinkedDataFragments/Server.js/blob/master/lib/datasources/Datasource.js#L148).
The first two slashes should also be stripped, and besides that there are some additional considerations (https://en.wikipedia.org/wiki/File_URI_scheme). And yes, one of the issues is that it's different between UNIX and Windows.
Now we could also just not do that, and use our 'own' file protocol (file:URI as it is now). The reason for this issue is one specific test that uses the file:// protocol: https://github.com/LinkedDataFragments/Server.js/blob/master/test/datasources/Datasource-test.js#L63
This test fails on Windows because the extracted path becomes //C:/my/path. On Unix the extracted path is ///my/path, which is not a problem for the path parser.
So either that test has to change, or the interpretation of the file:// protocol.
Hi there,
Pagination in SparqlDatasource is done by a CONSTRUCT with LIMIT/OFFSET. The comment says:
// Even though the SPARQL spec indicates that
// LIMIT and OFFSET might be meaningless without ORDER BY,
// this doesn't seem a problem in practice.
// Furthermore, sorting can be slow. Therefore, don't sort.
When I use the Blazegraph SPARQL endpoint, it is a problem in practice: order is not guaranteed without ORDER BY. I experienced it by getting different results for the same page.
I use a 4-billion-triple dataset, and the use of an ORDER BY is not possible due to performance considerations.
This is a limitation of the LDF concept over a SPARQL endpoint: it is based on a false SPARQL assumption ("this doesn't seem a problem in practice.").
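For reference, deterministic paging would require a stable sort key, e.g. (at the performance cost described above):

```sparql
CONSTRUCT { ?s <http://schema.org/name> ?n }
WHERE     { ?s <http://schema.org/name> ?n }
ORDER BY ?s ?n
LIMIT 100 OFFSET 200
```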
What are your feelings on this point ?
Bests,
Blaise
Hi,
This could be an error, or it could be my lack of knowledge about Node.js:
npm ERR! Error: failed to fetch from registry: ldf-server
npm ERR! at /usr/share/npm/lib/utils/npm-registry-client/get.js:139:12
npm ERR! at cb (/usr/share/npm/lib/utils/npm-registry-client/request.js:31:9)
npm ERR! at Request._callback (/usr/share/npm/lib/utils/npm-registry-client/request.js:136:18)
npm ERR! at Request.callback (/usr/lib/nodejs/request/main.js:119:22)
npm ERR! at Request. (/usr/lib/nodejs/request/main.js:212:58)
npm ERR! at Request.emit (events.js:88:20)
npm ERR! at ClientRequest. (/usr/lib/nodejs/request/main.js:412:12)
npm ERR! at ClientRequest.emit (events.js:67:17)
npm ERR! at HTTPParser.onIncoming (http.js:1261:11)
npm ERR! at HTTPParser.onHeadersComplete (http.js:102:31)
npm ERR! You may report this log at:
npm ERR! http://bugs.debian.org/npm
npm ERR! or use
npm ERR! reportbug --attach /home/ispace/Documents/programming/nodeJS/npm-debug.log npm
npm ERR!
npm ERR! System Linux 3.13.0-57-generic
npm ERR! command "node" "/usr/bin/npm" "install" "-g" "ldf-server"
npm ERR! cwd /home/ispace/Documents/programming/nodeJS
npm ERR! node -v v0.6.12
npm ERR! npm -v 1.1.4
npm ERR! message failed to fetch from registry: ldf-server
npm ERR! Error: EACCES, permission denied 'npm-debug.log'
npm ERR!
npm ERR! Please try running this command again as root/Administrator.
npm ERR!
npm ERR! System Linux 3.13.0-57-generic
npm ERR! command "node" "/usr/bin/npm" "install" "-g" "ldf-server"
npm ERR! cwd /home/ispace/Documents/programming/nodeJS
npm ERR! node -v v0.6.12
npm ERR! npm -v 1.1.4
npm ERR! path npm-debug.log
npm ERR! code EACCES
npm ERR! message EACCES, permission denied 'npm-debug.log'
npm ERR! errno {}
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR! /home/ispace/Documents/programming/nodeJS/npm-debug.log
The SPARQL datasource takes as argument a single default graph. The SPARQL protocol supports multiple default graphs, though; i.e., you could consider allowing this value to be an array of graphs.
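With an array of graphs, the generated query could include one FROM clause per default graph, e.g.:

```sparql
CONSTRUCT { ?s ?p ?o }
FROM <http://example.org/graph1>
FROM <http://example.org/graph2>
WHERE { ?s ?p ?o }
```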
The current HTML-dialog includes fields for subject, predicate and object. It also should include a field to submit SPARQL queries as in this HTML-dialog: http://client.linkeddatafragments.org/
http://www.downforeveryoneorjustme.com/http://data.linkedddatafragments.org/ says that the example server is down.
Is there a way to configure the Expires header for the results? I edited Controller.js to add the header:
response.setHeader('Expires', new Date(+new Date() + 86400000).toUTCString());
but this would be nice to set in the config.json.
The Expires header is needed for Apache mod_cache.
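A sketch of how a configurable value could replace the hard-coded offset (the expiresInSeconds setting name is hypothetical, not an existing Server.js option):

```javascript
// Compute an Expires header value from a configurable max-age,
// defaulting to one day (86400 seconds) as in the edit quoted above.
function expiresHeader(config, now = Date.now()) {
  const maxAgeMs = (config.expiresInSeconds !== undefined
    ? config.expiresInSeconds : 86400) * 1000;
  return new Date(now + maxAgeMs).toUTCString();
}

// Usage, mirroring the Controller.js edit:
// response.setHeader('Expires', expiresHeader(config));
```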
I have trouble installing the ldf-server due to dependencies. As I am not proficient with Node.js, the error reports and dependencies just look unfamiliar when setting up the environment. It would be helpful to have a Dockerfile. My approach does not work fully:
FROM ubuntu:14.10
RUN apt-get install -y software-properties-common
RUN apt-add-repository ppa:chris-lea/node.js
RUN apt-get update
RUN apt-get install -y nodejs
RUN apt-get install -y python
#RUN npm install hdt
# Bundle app source
ADD . /src
# Install app dependencies
RUN cd /src; npm install -g ldf-server
EXPOSE 5000
WORKDIR /src
CMD ldf-server config.json 5000 4
Publishing empty HDT files crashes the server. Error message:
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc
I know, publishing empty HDT files is not a common thing to do ;). However, when publishing many datasets, situations like this may occur.
Is the ca required (as done in the example), or are the key and cert enough? And are there errors thrown or logs created when the keys/certs are not found?
Hi!
I ran some experiments with the latest version of the LDF server (v2.2.2) and I noticed some unexpected behavior when requesting pages with nonexistent page numbers.
In short, if you request a triple pattern with a page number higher than the maximum page number, triples from the full graph are returned, one triple per page.
I uploaded an example on Amazon AWS: http://ec2-34-208-134-212.us-west-2.compute.amazonaws.com/watDiv_100 with the triple pattern ?s <http://schema.org/contentSize> ?o. I use a 100k-triple version of WatDiv and 100 triples per page in my configuration file.
This triple pattern has 26 pages, so if you access the last page http://ec2-34-208-134-212.us-west-2.compute.amazonaws.com/watDiv_100?predicate=http%3A%2F%2Fschema.org%2FcontentSize&page=26, everything is fine.
However, if you try to access any page above 26, like http://ec2-34-208-134-212.us-west-2.compute.amazonaws.com/watDiv_100?predicate=http%3A%2F%2Fschema.org%2FcontentSize&page=27, unrelated triples are returned with only one triple per page.
Nonetheless, metadata and hypermedia controls are fine, so the classic ldf-client is not affected by this issue.
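The fix presumably amounts to a bounds check on the computed offset, sketched here with hypothetical names (this is not the server's actual paging code):

```javascript
// Return the triple offset for a requested page, or null when the page
// lies beyond the data, so the server can answer with an empty page
// instead of wrapping into unrelated triples.
function pageOffset(page, pageSize, totalCount) {
  const offset = (page - 1) * pageSize;
  return offset >= totalCount ? null : offset;
}

console.log(pageOffset(26, 100, 2600)); // 2500 (last valid page)
console.log(pageOffset(27, 100, 2600)); // null (out of range)
```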
I built a Docker image of the Server and supplied an HdtDatasource in the config.json:
"datasources": {
  "chebi": {
    "title": "ChEBI",
    "type": "HdtDatasource",
    "description": "ChEBI HDT",
    "settings": { "file": "/data/chebi.hdt" }
  }
},
then I run the container and attach volumes:
docker run -it -p 3001:3000 --rm -v /<path>/ChEBI/config.json:/tmp/config.json -v <path>/ChEBI/chebi.hdt:/data/chebi.hdt ldf_hdt_server /tmp/config.json
I get the error:
Master 1 running on http://localhost:3000/.
terminate called after throwing an instance of 'std::out_of_range'
what(): map::at
I didn't manage to trace the error to the js sources, apparently it's an error of the HDT library.
The HDT file is correct, I can query it and get hdtInfo().
Could you please give me a clue about how to fix this?
Hi, I was wondering whether it is possible to "automatically" distinguish between a standard SPARQL endpoint and an LDF endpoint.
Say you have the endpoint URL http://example.com/endpoint but you are not aware whether this is an LDF or a standard SPARQL endpoint. Being able to automatically detect the type of endpoint would help to choose the right client type (for LDF or standard SPARQL). Thanks!
I followed the README directions for installation and configuration of the ldf-server. The server starts up just fine but I'm unable to query data, either the example DBPedia SPARQL instance or a local SPARQL endpoint. All query results say "Dataset index contains no triples that match this pattern." Am I missing a step?
A configuration with a subdirectory does not work. I tried with both a subdomain and a subdirectory.
You can reproduce the problem by trying the following.
- Subdomain (works): baseURL: "http://data.example.com". If I curl for, e.g., the stylesheet on the machine where the server is running, the correct file is returned (curl http://localhost:50000/assets/styles/ldf-server).
- Subdirectory (fails): baseURL: "http://example.com/data/". If I curl for, e.g., the stylesheet on the machine where the server is running, an HTML file is returned that says No resource with URL /assets/styles/ldf-server was found (curl http://localhost:50000/assets/styles/ldf-server), and the access logs show a 404 response.
The example "config.json" in "README.md" refers to "data/dbpedia2014.hdt" but the README does not tell where that file is available.
This would be very useful for what I am trying to do with your awesome product.
Here is the spec. A predicate can be either a URI or a path construct.
I started to implement this here. The Matcher only checks for an asterisk for now.
However, I realized that this is more involved than I had anticipated. A decision tree has to be made to route property-path queries to SELECT and normal ones to CONSTRUCT. And if it is a SELECT, then an Accept header has to be sent to make sure that the response is N3/Turtle.
Thanks!
In my case the configured baseURL is "https://licensedb.org/data/", the generated links look like "http://licensedb.org/data/licensedb#dataset" and "http://licensedb.org/data/licensedb?page=2".
I have set up a 301 redirect from http to https, so it isn't much of a practical problem right now -- but obviously it would be nicer if those links didn't go via http://. A front-end proxy does the SSL termination, so the connection between the browser and the front-end proxy is SSL, but the connection between the proxy and the LDF server is plain http.
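One common approach behind a TLS-terminating proxy is to honor the X-Forwarded-Proto header when building links (a sketch, not Server.js's actual link-generation code):

```javascript
// Rewrite the configured base URL's scheme to match what the proxy
// reports, so generated links use https when the client connected over it.
function linkBase(request, configuredBase) {
  const proto = request.headers['x-forwarded-proto'];
  return proto ? configuredBase.replace(/^https?:/, proto + ':') : configuredBase;
}

console.log(linkBase(
  { headers: { 'x-forwarded-proto': 'https' } },
  'http://licensedb.org/data/'
)); // 'https://licensedb.org/data/'
```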
The config file allows an original base URL to be set using the parameter originalBaseURL. However, when this parameter is not set, the MementoControllerExtension (lines 33-34) chooses the TimeGate base URL. It would be better to either make this parameter mandatory in the config when the server is in Memento mode, or use the baseURL parameter in the config to construct the original URL.
Hi - can the config be modified to support a remote NQUAD file in the config.json?
Using node:4 will select the latest supported Node.js 4.x LTS (currently 4.8.3), which contains some security fixes and lots of patches.
For many people it is not clear that, when visiting an LDF server, you still have to select a datasource to get to the actual data. Most people seem to think that something is wrong because they only see a few triples (the triples that describe the listed datasets).
A default index page in the form of DBpedia's TPF server might make more sense. But maybe there are better solutions.