
Comments (16)

LidaPetr avatar LidaPetr commented on May 27, 2024 1

I have exactly the same issue. Worked fine yesterday, and today got the same errors as you.

from rdflib-jsonld.

dlongley avatar dlongley commented on May 27, 2024 1

Note: Libraries/applications should be either caching or installing a version of the schema.org JSON-LD context and loading it locally. A language/ecosystem appropriate package manager could then be used to manage updates to the context. This would mitigate this problem and improve performance for users.
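One way to picture the "install it and load it locally" advice is a small lookup table from well-known context URLs to files shipped with the application. The sketch below is only an illustration: `LocalContexts`, `register`, and `load` are hypothetical names, not part of rdflib-jsonld, and treating `http`/`https` and trailing-slash variants as the same URL is an assumption made for convenience.

```python
import json
from pathlib import Path


def _normalize(url):
    """Treat http/https and trailing-slash variants as the same key (an assumption)."""
    url = url.strip()
    if url.startswith("http://"):
        url = "https://" + url[len("http://"):]
    return url.rstrip("/")


class LocalContexts:
    """Hypothetical registry mapping remote context URLs to local files."""

    def __init__(self):
        self._paths = {}

    def register(self, url, path):
        """Declare that `url` is served from the local file `path`."""
        self._paths[_normalize(url)] = Path(path)

    def load(self, url):
        """Return the parsed local context, or None if `url` is not registered."""
        path = self._paths.get(_normalize(url))
        if path is None:
            return None
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)
```

A caller would register `https://schema.org` against its installed copy of the context once at startup, and only go to the network when `load` returns `None`.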


teledyn avatar teledyn commented on May 27, 2024 1

Work-around:

I have an ugly workaround for those who need this working quickly. I won't submit a PR because this isn't a fix; it's a hack that (in my case) works:

--- lib/python3.7/site-packages/rdflib_jsonld/context.py~	2020-05-21 13:57:18.086689949 -0400
+++ lib/python3.7/site-packages/rdflib_jsonld/context.py	2020-05-21 14:21:43.238345583 -0400
@@ -207,6 +207,8 @@
         for source in inputs:
             if isinstance(source, str):
                 source_url = urljoin(base, source)
+                if "/schema.org" in source_url:
+                    source_url = "jsonldcontext.jsonld"
                 if source_url in referenced_contexts:
                     raise errors.RECURSIVE_CONTEXT_INCLUSION
                 referenced_contexts.add(source_url)

Save that as jsonfix.patch, then (on Linux) run patch -p0 < jsonfix.patch from your base directory. With the patch applied, my short test script (the one that parses the AggregateRating example) works as expected once I remove my (fake) context value. This requires the jsonldcontext.jsonld file to be in the current directory.

Improvements are very much appreciated
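A possible alternative that avoids patching site-packages is to rewrite the document before parsing, replacing the remote `@context` reference with the contents of the local file so that no fetch is needed. This is a minimal sketch under assumptions: `inline_context` and `SCHEMA_ORG_URLS` are hypothetical names, only the top-level `@context` is handled, and the local file is assumed to have the usual `{"@context": {...}}` shape.

```python
import json

# URL spellings treated as "the schema.org context" (an assumption for this sketch).
SCHEMA_ORG_URLS = frozenset({
    "http://schema.org", "http://schema.org/",
    "https://schema.org", "https://schema.org/",
})


def inline_context(document, context_doc, urls=SCHEMA_ORG_URLS):
    """Return a copy of `document` whose top-level @context no longer points at `urls`.

    `context_doc` is the parsed local context file; if it wraps the definitions
    in an "@context" key, that inner value is what gets inlined.
    """
    inline = context_doc.get("@context", context_doc)
    result = dict(document)  # shallow copy, the caller's dict is left untouched
    ctx = document.get("@context")
    if isinstance(ctx, str) and ctx in urls:
        result["@context"] = inline
    elif isinstance(ctx, list):
        result["@context"] = [
            inline if isinstance(item, str) and item in urls else item
            for item in ctx
        ]
    return result


def inline_context_json(data, context_path):
    """Same as inline_context, but for a JSON string and a context file path."""
    with open(context_path, encoding="utf-8") as fh:
        context_doc = json.load(fh)
    return json.dumps(inline_context(json.loads(data), context_doc))
```

The string returned by `inline_context_json(data, "jsonldcontext.jsonld")` could then be passed as `data=` to `Graph().parse(..., format="json-ld")` in place of the original.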


teledyn avatar teledyn commented on May 27, 2024

it seems this issue started at 20:17.49 EDT May 18th and has broken all of our applications using the json-ld plugin. Is it possible that our repeated use of this plugin has caused us to be banned somewhere?


teledyn avatar teledyn commented on May 27, 2024

as of 13:21.39 EDT May 19th, it appears to be back online, whatever it was, the very same code as above now produces a graph as expected.

So whatever this was, perhaps I need to avoid this fetch by providing my own local copy?


teledyn avatar teledyn commented on May 27, 2024

and my day just keeps getting worse ...

>>> data = '{"@context": "http://schema.org/", "@type": "AggregateRating", "ratingValue": 5, "reviewCount": 1, "bestRating": 5, "@id": "FOO#Rating"}'
>>> g = Graph().parse(data=data, format="json-ld", context="https://www.example.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./lib/python3.7/site-packages/rdflib/graph.py", line 1043, in parse
    parser.parse(source, self, **args)
  File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 95, in parse
    to_rdf(data, conj_sink, base, context_data)
  File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 104, in to_rdf
    context.load(context_data)
  File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 200, in load
    self._prep_sources(base, source, sources)
  File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 213, in _prep_sources
    source = source_to_json(source_url)
  File "./lib/python3.7/site-packages/rdflib_jsonld/util.py", line 28, in source_to_json
    return json.load(StringIO(stream.read().decode('utf-8')))
  File "/usr/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

it could be blindness from fatigue, but I'm nearly certain those two lines would have worked before.


teledyn avatar teledyn commented on May 27, 2024

it has to be that mystery file that the plugin secretly fetches, some sort of definition spec for json-ld that was lost in their Monday crash and is now corrupted?

I am using
rdflib==4.2.2
rdflib-jsonld==0.5.0
Python 3.7.3 (default, Oct 7 2019, 12:56:13)
[GCC 8.3.0] on linux (ubuntu 19.10)

but none of this information has changed since last week -- strace does not appear to show any fetch of a remote file (that I can see) but the HTTP 500 errors overnight Monday do seem to indicate otherwise?

Here is my test script. I inserted json.loads to verify that the string is indeed valid JSON, and also to show that the failure isn't caused by rdflib-jsonld's use of the json module. I also changed the context to a nonsense but valid URL, just in case example.com was triggering something.

from rdflib import Graph
from rdflib import plugins  # required for json-ld
import json

data = '{"@context": "http://schema.org/", "@type": "AggregateRating", "ratingValue": 5, "reviewCount": 1, "bestRating": 5, "@id": "FOO#Rating"}'
print(json.loads(data))
g = Graph().parse(data=data, format="json-ld", context="https://www.google.com")
for row in g:
    print(row)

This currently prints the dict as expected, followed by a decoding failure:

{'@context': 'http://schema.org/', '@type': 'AggregateRating', 'ratingValue': 5, 'reviewCount': 1, 'bestRating': 5, '@id': 'FOO#Rating'}
Traceback (most recent call last):
  File "./jsontest.py", line 9, in <module>
    g = Graph().parse(data=data, format="json-ld", context="https://www.google.com")
  File "./lib/python3.7/site-packages/rdflib/graph.py", line 1043, in parse
    parser.parse(source, self, **args)
  File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 95, in parse
    to_rdf(data, conj_sink, base, context_data)
  File "./lib/python3.7/site-packages/rdflib_jsonld/parser.py", line 104, in to_rdf
    context.load(context_data)
  File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 200, in load
    self._prep_sources(base, source, sources)
  File "./lib/python3.7/site-packages/rdflib_jsonld/context.py", line 213, in _prep_sources
    source = source_to_json(source_url)
  File "./lib/python3.7/site-packages/rdflib_jsonld/util.py", line 28, in source_to_json
    return json.load(StringIO(stream.read().decode('utf-8')))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 9718: invalid continuation byte

position 9718??
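Both tracebacks in this thread end on the same line of util.py, which decodes the fetched body as UTF-8 and then parses it as JSON. "Expecting value: line 1 column 1 (char 0)" is what the json module raises when the text does not start with a JSON value (an HTML page or an empty body, for example), and the UnicodeDecodeError means the body was not valid UTF-8 at that byte offset. The sketch below shows one way to separate the two cases and report what was actually received; `parse_context_bytes` is a hypothetical helper, not a function in rdflib-jsonld.

```python
import json


def parse_context_bytes(raw, source="<unknown>"):
    """Decode and parse a fetched context body, with errors that name the cause."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # The body is binary or uses another encoding; show where and a preview.
        raise ValueError(
            f"{source}: body is not UTF-8 text (bad byte at offset {exc.start}); "
            f"first bytes: {raw[:16]!r}"
        ) from exc
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # The body is text but not JSON, for example an HTML page.
        raise ValueError(
            f"{source}: body is not JSON ({exc.msg} at char {exc.pos}); "
            f"starts with: {text[:40]!r}"
        ) from exc
```

With errors like these, a caller sees immediately whether the remote server returned HTML, binary data, or something else, instead of a bare decoder traceback.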


hsolbrig avatar hsolbrig commented on May 27, 2024

This is an error on the schema.org site. See: https://tinyurl.com/kk7k5qt (A link to the JSON-LD playground) that uses schema.org as well. Has anyone reported this issue to them???


teledyn avatar teledyn commented on May 27, 2024

@hsolbrig I'm curious, how do you know it is an error on the schema.org site? I don't know the internals of this thing very well (but apparently now is the time to learn). I couldn't find an easy means to contact json-ld.org, but I am in their IRC channel right now. There are 21 participants, but I fear IRC isn't the first place anyone checks these days.


teledyn avatar teledyn commented on May 27, 2024

dlongley at json-ld provides the following:

curl -v -H "accept: application/ld+json" "http://schema.org"

is returning nonsense


teledyn avatar teledyn commented on May 27, 2024

They are aware: schemaorg/schemaorg#2578


teledyn avatar teledyn commented on May 27, 2024

In a way I am thankful for their downtime overnight Monday because I was unaware that my code was hitting some service for every single json-ld parse job, and in some runs, I can be doing thousands in a very short time. So yes, a Conditional-GET might be in order, done during init, or done once on the first invocation and cached for the lifespan of the process, and if it fails, fall back to a local filesystem cached copy -- after years of using schema.org and rdflib-jsonld, I guess I had never tried to run one on a disconnected machine.
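The idea above (one conditional GET on first use, an in-memory copy for the lifespan of the process, and a filesystem copy as fallback) could be sketched roughly as follows. `ContextCache` is a hypothetical class, not part of rdflib or rdflib-jsonld. The URL is the one given elsewhere in this thread, and the sketch assumes the server sends an `ETag` header; if it does not, every first call simply downloads the file again.

```python
import json
import os
import urllib.error
import urllib.request


class ContextCache:
    """Fetch a JSON-LD context at most once per process, with a disk fallback."""

    def __init__(self, cache_path, url="https://schema.org/docs/jsonldcontext.jsonld"):
        self.cache_path = cache_path
        self.etag_path = cache_path + ".etag"
        self.url = url
        self._memo = None  # in-memory copy, kept for the life of the process

    def _read_disk(self):
        with open(self.cache_path, encoding="utf-8") as fh:
            return json.load(fh)

    def _fetch(self):
        """Conditional GET. Returns parsed JSON, or None when the server says 304."""
        request = urllib.request.Request(self.url, headers={"Accept": "application/ld+json"})
        if os.path.exists(self.etag_path) and os.path.exists(self.cache_path):
            with open(self.etag_path, encoding="utf-8") as fh:
                request.add_header("If-None-Match", fh.read().strip())
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                body = response.read()
                etag = response.headers.get("ETag")
        except urllib.error.HTTPError as exc:
            if exc.code == 304:  # urllib reports "Not Modified" as an HTTPError
                return None
            raise
        data = json.loads(body.decode("utf-8"))  # validate before touching the cache
        with open(self.cache_path, "wb") as fh:
            fh.write(body)
        if etag:
            with open(self.etag_path, "w", encoding="utf-8") as fh:
                fh.write(etag)
        return data

    def get(self):
        """Return the context, hitting the network only on the first call."""
        if self._memo is None:
            try:
                fresh = self._fetch()
                self._memo = fresh if fresh is not None else self._read_disk()
            except (OSError, ValueError):
                # Network failure or a corrupt body: fall back to the disk copy.
                self._memo = self._read_disk()
        return self._memo
```

Because a corrupt response is rejected before the cache file is written, an outage like this one would leave the last good copy in place, and thousands of parse jobs in one run would share a single `ContextCache("jsonldcontext.jsonld").get()` result.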

schemaorg/schemaorg posted a merge into master that roughly correlates with when the HTTP 500 errors stopped and the parsing errors began

update at 16:20 EDT: that curl test now returns HTML again (for a while it was returning binary)


teledyn avatar teledyn commented on May 27, 2024

As we move into day 3 of this outage: this is probably a naive question, but @dlongley, is there any alternate source for that context so we can work towards getting our applications back online?


dlongley avatar dlongley commented on May 27, 2024

@teledyn,

If you run this:

curl "https://schema.org/docs/jsonldcontext.jsonld"

You should get the latest schema.org context.


teledyn avatar teledyn commented on May 27, 2024

thanks again -- I see today on the schema.org mailing list that this change in the accept header was intentional, a consequence of blocking a DoS attack, which was likely those HTTP 500 errors I saw Monday night.

Dan Brickley writes:

We expect to replace the HTTP content negotiation with the use of a Link header as specified in the latest JSON-LD specs
i.e.
Link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
...with a corresponding to CORS too
I'm told there are at least 5 JSON-LD implementations that pass the test for this feature, see https://w3c.github.io/json-ld-api/tests/remote-doc-manifest#tla03
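To make the Link header approach concrete, here is a rough sketch of how a client might pick the JSON-LD alternate out of such a header and resolve it against the URL it requested. `alternate_jsonld_url` is a hypothetical helper, and the regular expressions are a simplification of RFC 8288 parsing (they assume no `<` or `>` characters inside quoted parameter values).

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by its parameters, up to the next "<".
_LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
# One parameter: ; name="quoted value"  or  ; name=token
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def alternate_jsonld_url(link_header, base_url):
    """Return the absolute URL of the rel="alternate" JSON-LD link, or None."""
    for target, params_text in _LINK_RE.findall(link_header):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(params_text):
            params.setdefault(name.lower(), quoted or bare)
        rels = params.get("rel", "").lower().split()  # rel may list several values
        media_type = params.get("type", "").lower()
        if "alternate" in rels and media_type == "application/ld+json":
            return urljoin(base_url, target)
    return None
```

For the header quoted above and a request to `https://schema.org/`, this resolves to `https://schema.org/docs/jsonldcontext.jsonld`, which the client would then fetch as the context document.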


datadavev avatar datadavev commented on May 27, 2024

A more general solution may be to support Link headers [1, 2] for resolving the location of external contexts (when not available in a local cache). See also #85.

When parsing JSON-LD, redirect handling actually happens in rdflib in rdflib.parser.URLInputSource.

The parser uses rdflib.parser.create_input_source() (called from util.source_to_json) to get an input source from a URL (such as the context location URL). That in turn relies on rdflib.parser.URLInputSource(), which returns an open stream. URLInputSource uses urllib to create the request and return the stream (see line 115). urllib handles redirects internally, which worked previously, but that no longer helps now that schema.org uses Link headers instead.

I made a PR for supporting Link headers (patch available at https://patch-diff.githubusercontent.com/raw/RDFLib/rdflib/pull/1125.patch ) for the specific case of JSON-LD parsing, though this would generally change the behavior of URLInputSource when a Link header is present for the json-ld format. If that is undesirable, a more invasive approach could add an optional parameter to URLInputSource indicating whether to follow Link headers (which could be set when calling parse on remote context URLs).

This change resolved schema.org context parsing for me, and also works with other context documents referenced through link headers.

[1] https://tools.ietf.org/html/rfc8288
[2] https://w3c.github.io/json-ld-api/#remote-document-and-context-retrieval

