Comments (16)
I have exactly the same issue. Worked fine yesterday, and today got the same errors as you.
from rdflib-jsonld.
Note: Libraries/applications should be either caching or installing a version of the schema.org JSON-LD context and loading it locally. A language/ecosystem appropriate package manager could then be used to manage updates to the context. This would mitigate this problem and improve performance for users.
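One way to act on this note, as a minimal sketch rather than anything rdflib-jsonld provides: replace a string @context with a copy stored on disk before handing the document to the parser, so nothing is fetched. The helper name inline_local_context and the URL-to-file mapping are hypothetical, and the sketch assumes the local file has the same shape as the published schema.org context (a top-level "@context" key).

```python
import json
from pathlib import Path


def inline_local_context(doc, local_contexts):
    """Return doc with a string @context swapped for a locally stored one.

    local_contexts maps a remote context URL to a local file path.
    Documents whose @context is not listed are returned unchanged.
    """
    ctx = doc.get("@context")
    if isinstance(ctx, str) and ctx in local_contexts:
        text = Path(local_contexts[ctx]).read_text(encoding="utf-8")
        local = json.loads(text)
        # Assumption: the file wraps its definitions in a top-level
        # "@context" key; fall back to the whole object if it does not.
        return {**doc, "@context": local.get("@context", local)}
    return doc
```

The result can be serialized again with json.dumps and passed to the parser as data. Since the mapping is matched on the exact string, both the http and https spellings of a context URL would need their own entry.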
from rdflib-jsonld.
Work-around:
I have an ugly fix for those who need this working quickly. I won't submit a PR because this isn't a fix; it's a hack that (in my case) works:
--- lib/python3.7/site-packages/rdflib_jsonld/context.py~ 2020-05-21 13:57:18.086689949 -0400
+++ lib/python3.7/site-packages/rdflib_jsonld/context.py 2020-05-21 14:21:43.238345583 -0400
@@ -207,6 +207,8 @@
         for source in inputs:
             if isinstance(source, str):
                 source_url = urljoin(base, source)
+                if "/schema.org" in source_url:
+                    source_url = "jsonldcontext.jsonld"
                 if source_url in referenced_contexts:
                     raise errors.RECURSIVE_CONTEXT_INCLUSION
                 referenced_contexts.add(source_url)
Save that as jsonfix.patch, then (on Linux) run patch -p0 < jsonfix.patch
from the base directory. With the patch applied, my short test above works as expected once I remove my (fake) context value. This requires the jsonldcontext.jsonld file to be placed in the current directory.
Improvements are very much appreciated
from rdflib-jsonld.
it seems this issue started at 20:17.49 EDT May 18th and has broken all of our applications using the json-ld plugin. Is it possible that our repeated use of this plugin has caused us to be banned somewhere?
from rdflib-jsonld.
as of 13:21.39 EDT May 19th, it appears to be back online, whatever it was, the very same code as above now produces a graph as expected.
So clearly whatever this is, perhaps I need to prevent this fetch by providing my own local copy?
from rdflib-jsonld.
and my day just keeps getting worse ...
>>> data = '{"@context": "http://schema.org/", "@type": "AggregateRating", "ratingValue": 5, "reviewCount": 1, "bestRating": 5, "@id": "FOO#Rating"}'
>>> g = Graph().parse(data=data, format="json-ld", context="https://www.example.com")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "./lib/python3.7/site-packages/rdflib/graph.py", line 1043, in parse
parser.parse(source, self, **args)
File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 95, in parse
to_rdf(data, conj_sink, base, context_data)
File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 104, in to_rdf
context.load(context_data)
File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 200, in load
self._prep_sources(base, source, sources)
File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 213, in _prep_sources
source = source_to_json(source_url)
File "./lib/python3.7/site-packages/rdflib_jsonld/util.py", line 28, in source_to_json
return json.load(StringIO(stream.read().decode('utf-8')))
File "/usr/lib/python3.7/json/__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
it could be blindness from fatigue, but I'm nearly certain those two lines would have worked before.
from rdflib-jsonld.
it has to be that mystery file that the plugin secretly fetches, some sort of definition spec for json-ld that was lost in their Monday crash and is now corrupted?
I am using
rdflib==4.2.2
rdflib-jsonld==0.5.0
Python 3.7.3 (default, Oct 7 2019, 12:56:13)
[GCC 8.3.0] on linux (ubuntu 19.10)
but none of this information has changed since last week -- strace does not appear to show any fetch of a remote file (that I can see) but the HTTP 500 errors overnight Monday do seem to indicate otherwise?
here is my test script. I inserted json.loads to verify that the string is indeed json-compatible, but also to show that it isn't failing because of the rdflib-jsonld use of the json module; I also changed the context to a nonsense but valid url just in case example.com was triggering something
from rdflib import Graph
from rdflib import plugins # required for json-ld
import json
data = '{"@context": "http://schema.org/", "@type": "AggregateRating", "ratingValue": 5, "reviewCount": 1, "bestRating": 5, "@id": "FOO#Rating"}'
print(json.loads(data))
g = Graph().parse(data=data, format="json-ld", context="https://www.google.com")
for row in g:
    print(row)
This presently outputs the dict as expected, but followed by a failure in the json decoder.py
{'@context': 'http://schema.org/', '@type': 'AggregateRating', 'ratingValue': 5, 'reviewCount': 1, 'bestRating': 5, '@id': 'FOO#Rating'}
Traceback (most recent call last):
File "./jsontest.py", line 9, in <module>
g = Graph().parse(data=data, format="json-ld", context="https://www.google.com")
File "./lib/python3.7/site-packages/rdflib/graph.py", line 1043, in parse
parser.parse(source, self, **args)
File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 95, in parse
to_rdf(data, conj_sink, base, context_data)
File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 104, in to_rdf
context.load(context_data)
File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 200, in load
self._prep_sources(base, source, sources)
File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 213, in _prep_sources
source = source_to_json(source_url)
File "./lib/python3.7/site-packages/rdflib_jsonld/util.py", line 28, in source_to_json
return json.load(StringIO(stream.read().decode('utf-8')))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 9718: invalid continuation byte
position 9718??
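On the position question: the offset in a UnicodeDecodeError counts bytes into whatever was being decoded, which here is the body fetched by source_to_json and not the data string in the script. A small sketch for looking at what sits around such an offset (describe_decode_failure is a hypothetical helper, and the sample bytes are made up for illustration):

```python
def describe_decode_failure(raw, encoding="utf-8", window=20):
    """Report where decoding fails and which bytes surround that offset.

    Returns None when raw decodes cleanly.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        lo = max(err.start - window, 0)
        return {
            "position": err.start,  # byte offset of the offending byte
            "reason": err.reason,
            "nearby_bytes": raw[lo:err.start + window],
        }
    return None


# Made-up sample: 0xe7 is a Latin-1 c-cedilla, not valid UTF-8 in this spot.
sample = b'{"name": "caf\xe7 au lait"}'
print(describe_decode_failure(sample))
```

Running this against the raw bytes of the response should show whether the body is JSON in another encoding or something that is not text at all.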
from rdflib-jsonld.
This is an error on the schema.org site. See: https://tinyurl.com/kk7k5qt (A link to the JSON-LD playground) that uses schema.org as well. Has anyone reported this issue to them???
from rdflib-jsonld.
@hsolbrig I'm curious, how do you know it is an error on the schema.org site? I don't know the internals of this thing very well (but apparently now is the time to learn) -- I couldn't find an easy means to contact json-ld.org, but I am in their IRC channel right now; there are 21 participants, but I fear these days IRC isn't first on anyone's list as a place to check
from rdflib-jsonld.
dlongley at json-ld provides the following:
curl -v -H "accept: application/ld+json" "http://schema.org"
is returning nonsense
from rdflib-jsonld.
They are aware: schemaorg/schemaorg#2578
from rdflib-jsonld.
In a way I am thankful for their downtime overnight Monday because I was unaware that my code was hitting some service for every single json-ld parse job, and in some runs, I can be doing thousands in a very short time. So yes, a Conditional-GET might be in order, done during init, or done once on the first invocation and cached for the lifespan of the process, and if it fails, fall back to a local filesystem cached copy -- after years of using schema.org and rdflib-jsonld, I guess I had never tried to run one on a disconnected machine.
schemaorg/schemaorg posted a merge into master that roughly correlates with when the HTTP 500 errors stopped and the parsing errors began
update at 16:20 EDT: that curl test now returns HTML again (for a while it was returning binary)
from rdflib-jsonld.
as we move into day 3 of this outage, and this is probably a naive question but @dlongley is there any alternate source to obtain that context so we can work towards getting our applications back online?
from rdflib-jsonld.
If you run this:
curl "https://schema.org/docs/jsonldcontext.jsonld"
You should get the latest schema.org context.
from rdflib-jsonld.
thanks again -- I see today on the schema.org mailing list that this change in the accept header was intentional, a consequence of blocking a DoS attack, which was likely those HTTP 500 errors I saw Monday night.
Dan Brickley writes:
We expect to replace the HTTP content negotiation with the use of a Link header as specified in the latest JSON-LD specs
i.e.
Link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
...with a corresponding to CORS too
I'm told there are at least 5 JSON-LD implementations that pass the test for this feature, see https://w3c.github.io/json-ld-api/tests/remote-doc-manifest#tla03
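For anyone wanting to see what following that header involves, here is a simplified sketch that pulls the alternate JSON-LD location out of a Link header value and resolves it against the request URL. The function name find_jsonld_alternate is hypothetical, and the parsing is deliberately naive (it assumes no commas or semicolons inside quoted parameter values), so it is not a full RFC 8288 parser:

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ;param pairs.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of the rel="alternate" JSON-LD link, or None."""
    for target, params in _LINK_RE.findall(link_header):
        attrs = {}
        for part in params.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                attrs[key.strip().lower()] = value.strip().strip('"')
        rels = attrs.get("rel", "").split()  # rel may list several values
        if "alternate" in rels and attrs.get("type") == "application/ld+json":
            return urljoin(base_url, target)
    return None


header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(find_jsonld_alternate(header, "http://schema.org/"))
# prints http://schema.org/docs/jsonldcontext.jsonld
```

A client would only look at this header when the response itself is not JSON-LD, then request the returned URL instead.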
from rdflib-jsonld.
A more general solution may be to support Link headers [1, 2] for resolving the location of external contexts (when not available in a local cache). See also #85.
When parsing JSON-LD, redirect handling actually happens in rdflib, in rdflib.parser.URLInputSource.
The parser uses rdflib.parser.create_input_source() (called from util.source_to_json) to get an input source from a URL (such as the context location URL). That in turn relies on rdflib.parser.URLInputSource(), which returns an open stream. URLInputSource uses urllib to create the request and return the stream (see line 115). urllib handles redirects internally, which worked previously but no longer does now that schema.org is using Link headers.
I made a PR for supporting Link headers (patch available at https://patch-diff.githubusercontent.com/raw/RDFLib/rdflib/pull/1125.patch ) for the specific case of JSON-LD parsing, though this would generally change the behavior of URLInputSource when a Link header is present for the json-ld format. If that is undesirable, a more invasive approach could add an optional parameter to URLInputSource indicating whether to follow Link headers (which could be set when calling parse on remote context URLs).
This change resolved schema.org context parsing for me, and also works with other context documents referenced through link headers.
[1] https://tools.ietf.org/html/rfc8288
[2] https://w3c.github.io/json-ld-api/#remote-document-and-context-retrieval
from rdflib-jsonld.