eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java

Home Page: https://rdf4j.org/

License: BSD 3-Clause "New" or "Revised" License

HTML 0.42% Java 95.64% Ruby 0.21% CSS 0.14% Shell 0.17% XSLT 0.78% TypeScript 0.31% JavaScript 2.32% Dockerfile 0.01%
semantic-web linked-data java rdf sparql shacl hacktoberfest

rdf4j's Introduction

Welcome to the Eclipse RDF4J repository

RDF4J

This is the main code repository for the Eclipse RDF4J project.

Join the chat at https://gitter.im/eclipse/rdf4j

Visit the project website for news, documentation, and downloadable releases. For support questions, comments, and any ideas for improvements you'd like to discuss, please use our discussion forum. If you have found a bug or have a very specific feature/improvement request, you can also use our issue tracker to report it.

Installation and usage

For installation and usage instructions of the RDF4J Workbench and Server applications, see RDF4J Server and Workbench.

For installation and usage instructions of the RDF4J Java libraries, see Programming with RDF4J.

Building from source

RDF4J is a multi-module maven project. It can be compiled, tested, and installed with the usual maven lifecycle phases from the command line, for example:

  • mvn verify - compiles and runs all tests
  • mvn package - compiles, tests, and packages all modules
  • mvn install - compiles, tests, packages, and installs all artifacts in the local maven repository
  • mvn -Pquick install - compiles, packages and installs everything (skipping test execution)

These commands can be run from the project root to execute on the entire project or (if you're only interested in working with a particular module) from any module's subdirectory.
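For example, to run the tests for just the model module (a sketch; substitute any module subdirectory of your checkout):

 cd core/model
 mvn verify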

To build the full RDF4J project from source, including the onejar and SDK files and the full aggregated Javadoc, run:

 mvn -Passembly package

The SDK and onejar will be available in assembly/target. Individual module jars and wars will be in target/ in their respective modules.

Modern IDEs like Eclipse, IntelliJ IDEA, or Netbeans can of course also be used to build, test, and run (parts of) the project.

Keen to contribute?

We welcome contributions! Whether you have a new feature you want to add, or a bug you want to fix, or a bit of documentation you want to improve, it's all very welcome. Have a look in our issue tracker for any open problems, in particular the ones marked as good first issue or as help wanted. Or feel free to add your own new issue if what you have in mind is not there yet.

To get started on your contribution, please first read our Contributor guidelines.

The short version:

  1. Digitally sign the Eclipse Contributor Agreement (ECA).
  2. Create an issue in the issue tracker that describes your improvement, new feature, or bug fix - or if you're picking up an existing issue, comment on that issue that you intend to provide a solution for it.
  3. Fork the GitHub repository.
  4. Create a new branch (starting from main) for your changes. Name your branch like this: GH-1234-short-description-here where 1234 is the Github issue number.
  5. Make your changes on this branch. Apply the RDF4J code formatting guidelines. Don't forget to include unit tests.
  6. Run mvn verify from the project root to make sure all tests succeed (both your own new tests and the existing ones).
  7. Commit your changes into the branch. Make sure the commit author name and e-mail correspond to what you used to sign the ECA. Use meaningful commit messages. Reference the issue number in each commit message (for example "GH-276: added null check").
  8. Once your fix is complete, put it up for review by opening a Pull Request against the main branch in the central Github repository. If you have a lot of commits on your PR, make sure to squash your commits.

These steps are explained in more detail in the Contributor guidelines.

You can find more detailed information about our development and release processes in the Developer Workflow and Project Management documentation.

rdf4j's People

Contributors

abrokenjester, anqit, ansell, aschwarte10, ate47, barthanssens, damyan-ognyanov, denitsastoianova, dependabot[bot], domkun, edwardsph, erikgb, fkleedorfer, frensjan, heshanjse, hmottestad, jervenbolleman, jgrzebyta, josephw, kenwenzel, knoan, manuelfiorelli, maxstolze, mirzov, nguyenm100, patrickwyler, reckart, redcrusaderjr, tokovach, tpt


rdf4j's Issues

Support BIND in SparqlQueryRenderer

(Migrated from https://openrdf.atlassian.net/browse/SES-2206 )

Several SPARQL 1.1 constructs cannot be rendered via the SparqlQueryRenderer class.

Subtasks

  • support BIND
  • migrate metaphacts internal query renderer to RDF4J (#3012)
  • Integrate experimental new renderer with existing renderer code (#3041)
  • support aggregates (#3000)
  • support subqueries (#3001)

Examples

For example, if I parse this query with QueryParserUtil.parseQuery:

SELECT * WHERE {
  ?s ?p ?o .
  BIND(uri("http://test-graph.com/") AS ?foo) .
}

If I then render the ParsedQuery back out again with SPARQLQueryRenderer, it appears to lose the BIND clause, returning the following string:

select ?g ?s ?p ?o ?g2
where {
  GRAPH ?g {
    ?s ?p ?o.
}}

Looking at renderTupleExpr in SparqlTupleExprRenderer I can see that the following lines are commented out:

    // aRenderer.mProjection = new ArrayList<ProjectionElemList>(mProjection);
    // aRenderer.mDistinct = mDistinct;
    // aRenderer.mReduced = mReduced;
    // aRenderer.mExtensions = new HashMap<String, ValueExpr>(mExtensions);
    // aRenderer.mOrdering = new ArrayList<OrderElem>(mOrdering);
    // aRenderer.mLimit = mLimit;
    // aRenderer.mOffset = mOffset;

With the following commented out in SPARQLQueryRenderer:

                    // SPARQL does not support this, its an artifact of copy and
                    // paste from the serql stuff
                    // aQuery.append(mRenderer.getExtensions().containsKey(aElem.getSourceName())
                    // ?
                    // mRenderer.renderValueExpr(mRenderer.getExtensions().get(aElem.getSourceName()))
                    // : "?"+aElem.getSourceName());
                    //
                    // if (!aElem.getSourceName().equals(aElem.getTargetName()) ||
                    // (mRenderer.getExtensions().containsKey(aElem.getTargetName())
                    // &&
                    // !mRenderer.getExtensions().containsKey(aElem.getSourceName())))
                    // {
                    // aQuery.append(" as ").append(mRenderer.getExtensions().containsKey(aElem.getTargetName())
                    // ?
                    // mRenderer.renderValueExpr(mRenderer.getExtensions().get(aElem.getTargetName()))
                    // : aElem.getTargetName());
                    // }

I believe these lines are commented out in error and that they should be commented back in, in order to be able to round-trip queries from SPARQL text into the AST and back out again.

Other SPARQL 1.1 queries that fail include:

SELECT (COUNT (*) as ?c) WHERE { ?s ?p ?o }

is rendered as

select ?c where { ?s ?p ?o }

and the query

SELECT (?p as ?x) WHERE { ?s ?p ?o }

is rendered as

select ?p WHERE { ?s ?p ?o }
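The round trip can be reproduced in a few lines (a minimal sketch using the current org.eclipse.rdf4j package names; the original report predates the rename from org.openrdf):

    import org.eclipse.rdf4j.query.QueryLanguage;
    import org.eclipse.rdf4j.query.parser.ParsedQuery;
    import org.eclipse.rdf4j.query.parser.QueryParserUtil;
    import org.eclipse.rdf4j.queryrender.sparql.SPARQLQueryRenderer;

    public class BindRoundTrip {
        public static void main(String[] args) throws Exception {
            String query = "SELECT * WHERE { ?s ?p ?o . "
                    + "BIND(uri(\"http://test-graph.com/\") AS ?foo) }";
            // parse the query string into RDF4J's algebra representation
            ParsedQuery parsed = QueryParserUtil.parseQuery(QueryLanguage.SPARQL, query, null);
            // render the algebra back to a query string; the BIND clause is dropped
            System.out.println(new SPARQLQueryRenderer().render(parsed));
        }
    }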

Clean up and simplify Javadoc

The current RDF4J Javadoc is massive and quite hard to navigate. We should simplify it to make it easier to browse. Things to consider:

  • remove AST classes from class overview
  • remove "internal" packages from general overview
  • reorganize so that user-facing interfaces/packages are featured most prominently on the main page

Allow SAIL to inspect/process unparsed query at prepareQuery stage

(Migrated from https://openrdf.atlassian.net/browse/SES-2162 )

The current SAIL interface assumes it gets passed a TupleExpr (that is, an algebra representation of a query). Currently this is handled by SailRepositoryConnection.prepareQuery, which passes the query string to the RDF4J query parser and produces a TupleExpr.

However, some SAIL implementations prefer to do their own parsing and/or not to base their query evaluation on RDF4J's algebra model. To facilitate this, we should pass the query string down at the prepare stage, which allows a SAIL to (optionally) process or wrap the query so that the RDF4J query parser is bypassed and the SAIL implementation can use a completely independent parser and query engine.
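A purely hypothetical sketch of what such a pass-down hook might look like (none of these names exist in the current SAIL API):

    // Hypothetical only: an optional mix-in interface a SAIL could implement
    // to receive the raw query string instead of a pre-parsed TupleExpr.
    public interface UnparsedQueryAware {

        /** Opaque handle for a query prepared by the SAIL's own parser/engine. */
        interface PreparedQueryHandle { }

        /**
         * Called at prepare time with the unparsed query string. Returning a
         * non-null handle bypasses the RDF4J query parser entirely; returning
         * null falls back to the default parse-to-TupleExpr behaviour.
         */
        PreparedQueryHandle prepareUnparsedQuery(String queryLanguage, String queryString, String baseURI);
    }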

design new logo

We need a new logo (and house style) for the rdf4j project, to visually distinguish ourselves from the 'old' Sesame project. This issue can be used to propose and discuss designs.

File format autodetect does not work in 'Add RDF' screen

(Migrated from https://openrdf.atlassian.net/browse/SES-2185 )

When uploading a file through the "Add RDF" screen, the (autodetect) option is supposed to determine the correct format and select the right parser. However, this does not work. In the current system, for any format other than RDF/XML, file upload with autodetect results in an error "Content is not allowed in prolog. [line 1, column 1] "

Only after explicitly selecting the correct format from the dropdown does file upload work.
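For reference, Rio already supports detecting a format from a file name, which is presumably what the upload screen should be doing (a sketch assuming the RDF4J 2.x Rio API, where detection returns an Optional):

    import java.util.Optional;
    import org.eclipse.rdf4j.rio.RDFFormat;
    import org.eclipse.rdf4j.rio.Rio;

    public class DetectFormat {
        public static void main(String[] args) {
            // detect the parser format from the uploaded file's name
            Optional<RDFFormat> format = Rio.getParserFormatForFileName("data.ttl");
            // the reported error suggests the RDF/XML fallback is used unconditionally
            System.out.println(format.orElse(RDFFormat.RDFXML));
        }
    }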

Results of aggregations in service calls are not included in the inner query projections

(Migrated from https://openrdf.atlassian.net/browse/SES-2189 )

This is simple to reproduce. I installed RDF4J into Tomcat and created a new in-memory repository called "test".
Add the following triples:

<http://example.org/a> <http://example.org/value> 1 .
<http://example.org/b> <http://example.org/value> 2 .

Running this query returns the value "3" as expected.

SELECT (SUM(?value) AS ?total) {
  ?s <http://example.org/value> ?value
}

Now, create a second in-memory repository called "test2".
Running this query from that repository returns a blank value.

SELECT ?total {
    SERVICE <http://localhost:8080/openrdf-sesame/repositories/test> {{
        SELECT (SUM(?value) AS ?total)  {
            ?s <http://example.org/value> ?value
        } 
    }}
}

By turning on debug logging I was able to see the query being sent to "test".

[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] path info: /test
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] repositoryID is 'test'
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] queryLn="SPARQL"
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] query="PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX sesame: <http://www.openrdf.org/schema/sesame#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX fn: <http://www.w3.org/2005/xpath-functions#> SELECT  ?s ?value WHERE { {
        SELECT (SUM(?value) AS ?total)  {
            ?s <http://example.org/value> ?value
        }
    } }"
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] infer="true"

Scrolling to the right, you can see that although ?value is included in the projection, ?total from within the aggregation is not. As a workaround, I added an additional inner select to ensure ?total is projected:

SELECT ?total {
  SERVICE <http://localhost:8080/openrdf-sesame/repositories/test> {{
    SELECT ?total {  
      {
        SELECT (SUM(?value) AS ?total)  {
          ?s <http://example.org/value> ?value
        }
      }
    }
  }}
}

Query results download: Provide query results page with a hidden form for sending long query text

(Migrated from https://openrdf.atlassian.net/browse/SES-2229 )

The fix for SES-1995 is less than ideal when dealing with a results page coming directly from the query page POSTing a long query (roughly 1k characters or more). It requires a workaround of saving the long query on the server.

However, the query text is actually present in the cookies along with the other parameters needed to specify the query. These cookies could be copied into a hidden form at page load, then the Download link would perform its request as a form POST, getting around the URL character limit.

SPARQL endpoint implementation should treat update sequences as atomic

The current SPARQL endpoint implementation handles update sequences by sending them down to the underlying Repository. Since the SPARQL protocol does not support transactions, this effectively means that transaction handling is left to the Repository API.

The Repository API handles SPARQL update sequences by treating each operation in the sequence as a separate update, which conforms to the SPARQL 1.1 Update specification (section 3):

Implementations MUST ensure that the operations of a single request are
executed in a fashion that guarantees the same effects as executing them
sequentially in the order they appear in the request.

In effect, the SPARQL endpoint implementation handles an update sequence request as several separate transactions. The SPARQL spec, however, also has the following soft requirement (see section 2.2):

SPARQL 1.1 Update requests are sequences of operations. Each request SHOULD
be treated atomically by a SPARQL 1.1 Update service. The term 'atomically'
means that a single request will result in either no effect or a complete
effect, regardless of the number of operations that may be present in the
request.

While the current implementation does not break the spec, it does deviate from this recommended pattern. To change this, we should add a flag to the RDF4J REST protocol that allows our service implementation to distinguish between requests coming from a SPARQL endpoint client, and requests coming from an RDF4J client. In the former case, the service can choose to explicitly start a transaction before executing the sequence, so that the sequence is treated as an atomic update.
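On the Repository API side, making the sequence atomic amounts to wrapping it in an explicit transaction (a minimal sketch, not the actual endpoint code):

    import org.eclipse.rdf4j.query.QueryLanguage;
    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.RepositoryConnection;

    public class AtomicUpdate {
        static void executeAtomically(Repository repo, String updateSequence) {
            try (RepositoryConnection con = repo.getConnection()) {
                con.begin(); // one transaction for the whole update sequence
                try {
                    con.prepareUpdate(QueryLanguage.SPARQL, updateSequence).execute();
                    con.commit(); // all operations take effect together...
                } catch (RuntimeException e) {
                    con.rollback(); // ...or none of them do
                    throw e;
                }
            }
        }
    }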

Support parallel Java 7-compatible release

We are currently focusing on the Sesame 4 code base as the launch point for RDF4J. However, there are several core users who need to stay on Java 7 for a while longer. We should consider bringing over the Sesame 2.9 code base to RDF4J to live alongside the main branch, so we can do parallel releases for those users who wish to stick with Java 7.

rename OpenRDFException

Possible candidates are RDFException or RDF4JException. I personally prefer the first since it's shorter. OpenRDFException should remain as a deprecated class for backward compatibility.

RDF4J POMs do not play nicely with gradle

(Migrated from https://openrdf.atlassian.net/browse/SES-2168 )

afaict, there's no way for gradle projects to pull down sesame artifacts from maven central.

I am admittedly still new to gradle, so I might have overlooked something obvious, but I think the fact that some dependencies are unversioned and others use variables is problematic for gradle when trying to resolve them.

If you look at http://repo1.maven.org/maven2/org/openrdf/sesame/sesame-model/2.7.14/sesame-model-2.7.14.pom you can see that junit has no scope or version, and that the sesame-util dependency uses variable placeholders.

Trying to grab that artifact via
{code}
compile ("org.openrdf.sesame:sesame-model:2.7.14")
{code}

will yield:

{code}

Could not resolve org.openrdf.sesame:sesame-model:2.7.14.
Required by:
com.complexible.stardog.openrdf-utils:openrdf:2.2.4
Could not parse POM https://repo1.maven.org/maven2/org/openrdf/sesame/sesame-model/2.7.14/sesame-model-2.7.14.pom
> Unable to resolve version for dependency 'junit:junit:jar'
{code}

I'm not a maven guru either, but I thought these, while legal, are not recommended.

As an aside, this works fine using Ivy to resolve the exact same dependency, and I'm assuming it works fine with Maven. So I think only gradle users are affected.

I know Jeen is mucking about with the maven stuff atm, it would be nice if this could be resolved as well.

Set up CI server

Either (temporarily) use ci.rdf4j.net, or look into using an Eclipse-hosted environment.

Decide on version number for initial release

We need to make a decision on what version number to use for the initial RDF4J release. There are, roughly, three options:

  1. continue where Sesame left off: Sesame 4.1.0 -> RDF4J 4.1.0.
  2. jump a minor version after the last Sesame release: Sesame 4.1.0 -> RDF4J 4.2.0.
  3. reset. RDF4J 1.0.0.

Advantage of the first option is that it's a more 'gradual' transition. Potential downside is that it suggests it's not the first Eclipse RDF4J release.

Advantage of the second option is that it is more clear that there may be compatibility problems between the last Sesame release and the first RDF4J release.

Advantage of the last option is that we get to start fresh. Downside is that it's not obvious how this release relates to existing Sesame releases.

No matter what we choose, we will always need to provide accompanying upgrade notes anyway.

Rename datadirs

Sesame datadirs are by default stored in $APP_DIR/Aduna/OpenRDF Sesame or something along those lines. This needs to be changed to something simpler. A preference is to have a root dir $APP_DIR/RDF4J/ with subdirs for the various RDF4J applications: RDF4J/Server, RDF4J/Workbench, etc.

In addition, we should provide a conversion method that allows users to migrate their existing data to the new dir structure. This should either be a separate script (so that users can choose to run it), or an automated one-time migration, with a preference for the former (an automated procedure can cause problems if the datadirs are sufficiently large).

  • Modify data dir structure in code
  • provide method to easily migrate existing datadir

stabilize build

We need to stabilize the build - several things are failing after merging in the final sync with the old Sesame repo.

Review and edit Javadoc

The Javadoc still contains references to 'org.openrdf' and 'sesame' in many places. This needs to be reviewed and edited.

SPIN compliance tests are slow and unstable

The SPIN compliance tests severely slow down the build (just these tests take almost 45 minutes to run on our HIPP), and moreover they are unstable: in several builds the testOrderByQueriesAreInterruptable test intermittently fails.

We should temporarily disable these compliance tests from the normal build process and only (manually) execute them when changes are made to the SPIN modules.

Set up upload of artifacts to maven central

Current maven configuration still relies on old Sesame project settings for syncing with sonatype OSS (and from there syncing to maven central). This needs to be tweaked/reconfigured according to what Eclipse projects do for maven artifact deployment.

BIND with type errors leads to cross joins instead of empty result

(Migrated from https://openrdf.atlassian.net/browse/SES-2250 )

When using a BIND variable in a pattern join, the result is a cross join over the dataset triples when the BIND expression raises a type error.

This query should expose the behavior on any store that contains blank nodes.
{code}
SELECT *
WHERE {
?s ?p ?o .
FILTER(isBlank(?o))
BIND (iri(?o) as ?s2)
?s2 ?p2 ?o2 .
} LIMIT 10
{code}

The join evaluation should normally conclude that both multisets are incompatible, since ?s2 is unbound in the join's left argument, so the query should return no results.

XMLDatatypeUtil.isValidValue() doesn't validate xsd:anyURI

(Migrated from https://openrdf.atlassian.net/browse/SES-2226 )

Example taken from http://www.datypic.com/sc/xsd/t-xsd_anyURI.html

new java.net.URI("http://datypic.com#f% rag")

throws "java.net.URISyntaxException: Malformed escape pair at index 20: http://datypic.com#f% rag"

whereas:

XMLDatatypeUtil.isValidValue("http://datypic.com#f% rag", XMLSchema.ANYURI)

returns true.

Looking at the source for isValidValue, there is no case to validate XMLSchema.ANYURI values. Is this deliberate or simply an omission?
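The discrepancy is easy to demonstrate side by side (a sketch using the current org.eclipse.rdf4j package names):

    import org.eclipse.rdf4j.model.datatypes.XMLDatatypeUtil;
    import org.eclipse.rdf4j.model.vocabulary.XMLSchema;

    public class AnyUriCheck {
        public static void main(String[] args) {
            String value = "http://datypic.com#f% rag";
            try {
                new java.net.URI(value); // rejects the malformed escape pair
            } catch (java.net.URISyntaxException e) {
                System.out.println("java.net.URI: " + e.getMessage());
            }
            // accepted, because isValidValue has no validation case for xsd:anyURI
            System.out.println(XMLDatatypeUtil.isValidValue(value, XMLSchema.ANYURI));
        }
    }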

Add a CONTRIBUTING.md file to display the guidelines when opening issues or pull requests

Eclipse recommends that contributors be shown or directed to the following text when attempting to make a contribution:

Before your contribution can be accepted by the project, you need to create and 
electronically sign the Eclipse Foundation Contributor License Agreement (CLA) and sign 
off on the Eclipse Foundation Certificate of Origin. 

For more information, please visit

http://wiki.eclipse.org/Development_Resources/Contributing_via_Git

This can be done for GitHub issues and pull requests by adding a file to the repository named either CONTRIBUTING or CONTRIBUTING.md:

https://help.github.com/articles/setting-guidelines-for-repository-contributors/

Fix test failures

The Hudson build is currently failing with test failures. We need to get the build stabilized ASAP.

Update project documentation and website

The project documentation and website at http://rdf4j.org/ will need to be updated to reflect the change from Sesame to RDF4J. In particular, we'll need to:

  • remove logo
  • rename "(OpenRDF) Sesame" to "(Eclipse) RDF4J"
  • remove obsolete copyright notices
  • update documentation to use new package names, project names, etc.

possible error parsing long unicode escape sequences

(Migrated from SES-2161)

There's a query parse error with the following query:

{code}
insert data {
<urn:alpha> <urn:beta> """\U0001F61F""" .
}
{code}

I am fairly certain this is a valid query; from what I can grok of the spec, that unicode sequence is correct. ARQ also bombs out on this query, though, which leaves me with some doubt.
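For what it's worth, the escape denotes a code point outside the Basic Multilingual Plane, so in Java it expands to a surrogate pair rather than a single char (illustration only):

{code}
// U+1F61F lies outside the BMP: two UTF-16 code units, one code point
String grimace = new String(Character.toChars(0x1F61F));
System.out.println(grimace.length());                            // 2
System.out.println(grimace.codePointCount(0, grimace.length())); // 1
{code}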

Parser produces non-canonical integer when parsing math '+' expression

(Migrated from https://openrdf.atlassian.net/browse/SES-2234 )

This query:

{code}
PREFIX ex: <ex:>

ASK WHERE {
?this ex:score ?score .
FILTER (!(?score+5 != 0)) .
}
{code}

produces the following algebra expression:

{code}
Slice ( limit=1 )
   Filter
      Not
         Compare (!=)
            MathExpr (+)
               Var (name=score)
               ValueConstant (value="+5"^^http://www.w3.org/2001/XMLSchema#integer)
            ValueConstant (value="0"^^http://www.w3.org/2001/XMLSchema#integer)
      StatementPattern
         Var (name=this)
         Var (name=_const-313ecd0b-uri, value=ex:score, anonymous)
         Var (name=score)
{code}

The value constant representing the integer 5 incorrectly has a '+' sign prepended - presumably because the parser incorrectly processes the + math operator as part of the integer value.

Although this causes no problems in normal operation of the SPARQL engine, it is an issue in work by pulquero on a SPIN engine.

Should we rewrite @since tags?

Should we rewrite @since tags in Javadoc to point to the new versioning scheme, or keep them but rewrite them as "sesame-2.7.0" or similar? An alternative could be to remove all of the @since tags and start afresh.

Incorrect variable bindings with subqueries

(Migrated from https://openrdf.atlassian.net/browse/SES-2248 )

Hi Jeen,

I could reproduce the behaviour for https://openrdf.atlassian.net/browse/SES-2099 with a much simpler query; you'll see that each result binds ct01 with a different bnode.


SELECT * WHERE {
    BIND (bnode() as ?ct01) 
    { SELECT ?s WHERE {
            ?s ?p ?o .
      }
      LIMIT 10
    }
}

If I'm not mistaken, the query should be equivalent to this one, which actually works as expected:

SELECT * WHERE {
    BIND (bnode() as ?ct01) 
    ?s ?p ?o .
}
LIMIT 10

meaning the algebra should first create a SingletonSet, then extend it with the BIND, and only then do the join, so the variable ct01 should be bound to the same bnode for each result of the subquery.

So it seems that evaluating the subquery first (which is indeed required by the recommendation) does not respect the evaluation or join orders of the preceding graph patterns.
