
ruby-rdf / rdf


RDF.rb is a pure-Ruby library for working with Resource Description Framework (RDF) data.

Home Page: http://rubygems.org/gems/rdf

License: The Unlicense

Ruby 100.00%
graph linked-data rdf semantic-web

rdf's People

Contributors

abrisse, artob, bhuga, brixen, cbeer, cjcolvar, conorsheehan1, cpence, danny, devwout, doriantaylor, dwbutler, fumi, gkellogg, janschill, jcoyne, jfieber, jgeiger, jperville, kna, l00mi, mistydemeo, mmn80, nyarly, petervandenabeele, pezra, pius, tomjnixon, ujifgc, ursm


rdf's Issues

RDF::Writer#insert_graph error since RDF.rb 0.3.2

Hello,

Since I updated to rdf-0.3.2, when I run:
require 'rdf'
src = %{
<http://rdf.rubyforge.org/RDF/Writer.html#insert_graph> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> "Writer#insert_graph test" .
}

reader = RDF::Reader.for(:ntriples).new(src)
graph = RDF::Graph.new << reader

RDF::Writer.open("insert_graph.nt") do |writer|
    writer.insert_graph graph
end

It raises:

insert_graph.rb:11: protected method `insert_graph' called for #<RDF::NTriples::Writer:0x1020d2548> (NoMethodError)
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:186:in `call'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:186:in `initialize'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:155:in `new'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:155:in `open'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:154:in `open'
  from insert_graph.rb:10

If I use the method #write_graph instead, it works as expected, but the source code (lib/rdf/writer.rb:284) says:

# @deprecated replace by `RDF::Writable#insert_graph`

Am I missing something?

Thanks!
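For context, a minimal stdlib-only illustration of the Ruby rule behind this error: a `protected` method can only be called with an explicit receiver from inside an instance of the same class (or a subclass), so calling `writer.insert_graph` from the block given to `RDF::Writer.open` raises `NoMethodError`, while a public wrapper such as `#write_graph` works. `SketchWriter` is a hypothetical stand-in, not RDF.rb's writer.

```ruby
# Hypothetical stand-in for a writer class; not RDF.rb's implementation.
class SketchWriter
  # Public entry point: code outside the class may call this.
  def write_graph(graph)
    insert_graph(graph) # implicit receiver inside the class, so this is allowed
  end

  protected

  # Protected: callable only from within SketchWriter (or subclass) instances.
  def insert_graph(graph)
    "inserted #{graph.size} statements"
  end
end

writer = SketchWriter.new
puts writer.write_graph([:a, :b]) # prints "inserted 2 statements"

begin
  writer.insert_graph([:a]) # explicit receiver from outside the class
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```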

Consider graph validation option for RDF.rb's in-memory repository

I just tracked down an issue in Spira that could have been found earlier if we had a repository that validated statements before writing them; a predicate was being saved as a string. It would be useful for testing to have a version of RDF::Repository that performs input validation.

So I am thinking something like this:

RDF::Validating::Repository.new

or

RDF::Repository.new(:validate => true)

Whereupon:

RDF::Repository.new(:validate => true) << RDF::Statement.new(RDF::DC.title, "a string", "another string")
#=> RDF::TypeError: Statement predicate must respond to #to_uri

If I implemented either of these, is that something you'd want to have available in core?
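A minimal sketch of the proposed `:validate` behaviour, assuming validation only checks that the predicate responds to `#to_uri` (the rule in the example error above). `SketchURI`, `SketchStatement` and `SketchRepository` are hypothetical stand-ins for RDF.rb classes, and Ruby's built-in `TypeError` replaces the proposed `RDF::TypeError`.

```ruby
# Hypothetical term class standing in for RDF::URI; only #to_uri matters here.
class SketchURI
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def to_uri
    self
  end
end

SketchStatement = Struct.new(:subject, :predicate, :object)

# Hypothetical repository that optionally validates statements before storing them.
class SketchRepository
  def initialize(validate: false)
    @validate = validate
    @statements = []
  end

  def <<(statement)
    validate!(statement) if @validate
    @statements << statement
    self
  end

  def count
    @statements.size
  end

  private

  # Mirrors the error message proposed in the issue.
  def validate!(statement)
    return if statement.predicate.respond_to?(:to_uri)
    raise TypeError, "Statement predicate must respond to #to_uri"
  end
end

repo = SketchRepository.new(validate: true)
begin
  repo << SketchStatement.new(SketchURI.new("http://example.org/s"), "a string", "another string")
rescue TypeError => e
  puts e.message # prints "Statement predicate must respond to #to_uri"
end
```

Keeping validation behind a constructor flag means the default, non-validating path pays no extra cost in production.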

RuntimeError: can't modify frozen object on Ruby 1.9.2

I just did a local install of the latest checked-in source (0.3.0.pre). A simple vocabulary expansion results in a "can't modify frozen object" error.

[rdf] irb
ruby-1.9.2-p0 > require 'rdf'
 => true 
ruby-1.9.2-p0 > RDF::FOAF.to_uri
RuntimeError: can't modify frozen object
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/util/cache.rb:58:in `define_finalizer'
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/util/cache.rb:58:in `define_finalizer!'
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/util/cache.rb:93:in `[]='
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/model/uri.rb:57:in `intern'
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/vocab.rb:93:in `to_uri'
from (irb):2
from /Users/gregg/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'

Interned URIs should be marked as frozen

Since interned RDF::URI instances are global to a Ruby process, being shared across different threads and varying use cases, they should be immutable in more than just principle.

The way to ensure this is for RDF::URI.intern to call #freeze whenever it constructs a new URI instance. Ruby will then raise a RuntimeError ("can't modify frozen object") if somebody inadvertently tries to modify a returned URI object.
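A minimal sketch of the intern-then-freeze pattern, assuming a plain Hash cache guarded by a Mutex (RDF.rb's own cache is a weak-reference cache; the 1.9.2 backtrace in the previous issue shows `define_finalizer` failing on a frozen object, so a finalizer-based cache would have to register before freezing). `SketchInternedURI` is hypothetical.

```ruby
# Hypothetical stand-in for RDF::URI showing the intern-then-freeze pattern.
class SketchInternedURI
  @cache = {}
  @lock = Mutex.new # interned instances are shared across threads

  # Return the shared, frozen instance for +str+, creating it on first use.
  def self.intern(str)
    key = str.to_s
    @lock.synchronize { @cache[key] ||= new(key).freeze }
  end

  attr_reader :value

  def initialize(value)
    @value = value.dup.freeze # freeze the wrapped string as well
  end

  # A mutator: on a frozen instance, assigning the instance variable raises.
  def value=(new_value)
    @value = new_value
  end
end

uri = SketchInternedURI.intern("http://xmlns.com/foaf/0.1/")
begin
  uri.value = "http://example.org/"
rescue RuntimeError => e # FrozenError is a RuntimeError subclass on Ruby 2.5+
  puts "refused: #{e.class}"
end
```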

RDF::Mutable does not open URIs

RDF::Mutable does not open URIs via load:

RDF::Repository.load('http://datagraph.org/jhacker/foaf.nt')
Errno::ENOENT: No such file or directory - http://datagraph.org/jhacker/foaf.nt
    from /opt/local/lib/ruby/gems/1.8/gems/rdf-0.1.1/lib/rdf/reader.rb:107:in `initialize'
    ...
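A minimal sketch of how a loader could dispatch on the location's scheme, assuming remote locations are the `http`, `https` and `ftp` schemes that open-uri handles and everything else is a local path. `remote_location?` and `open_location` are hypothetical helpers, not RDF.rb API.

```ruby
require 'open-uri'
require 'uri'

# Hypothetical helper: true when +location+ looks like an http(s) or ftp URL.
def remote_location?(location)
  %w[http https ftp].include?(URI.parse(location.to_s).scheme.to_s.downcase)
rescue URI::InvalidURIError
  false # e.g. local paths with spaces or backslashes
end

# Open a local path or a remote URL and yield an IO, choosing the opener by scheme.
def open_location(location, &block)
  if remote_location?(location)
    URI.open(location.to_s, &block) # open-uri (Ruby 2.5+ spelling)
  else
    File.open(location.to_s, "r", &block)
  end
end
```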

Addressable ~> 2.1.2 does not allow 2.2.0

Addressable::URI 2.2.0 adds some important fixes to URI format checking. If another gem includes Addressable 2.2.0, RDF will fail when loading with the following:

RubyGem version error: addressable(2.2.0 not ~> 2.1.2) (Gem::LoadError)
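The pessimistic operator explains the failure: `~> 2.1.2` means `>= 2.1.2` and `< 2.2`, so 2.2.0 is excluded. A quick check with RubyGems' own classes, assuming a wider constraint such as `~> 2.1` (or an explicit range) is acceptable for the project:

```ruby
require 'rubygems'

version = Gem::Version.new("2.2.0")

narrow = Gem::Requirement.new("~> 2.1.2")       # >= 2.1.2, < 2.2
wider  = Gem::Requirement.new("~> 2.1")         # >= 2.1,   < 3.0
range  = Gem::Requirement.new(">= 2.1.2", "< 3") # explicit bounds

puts narrow.satisfied_by?(version) # false
puts wider.satisfied_by?(version)  # true
puts range.satisfied_by?(version)  # true
```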

N-Triples output escaped incorrectly on Ruby 1.9

The following works in 1.8 but not in 1.9 (forgive the invalid N-Triples input):

require 'rdf'
s = RDF::NTriples.unserialize '<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." '
RDF::NTriples.serialize(s)

1.8:

ben:rdf ben$ irb
>>     require 'rdf'
=> true
>>     s = RDF::NTriples.unserialize '<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." '
=> #<RDF::Statement:0x90bbb8(<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." .)>
>>     RDF::NTriples.serialize(s)
=> "<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jh\305\253l\304\201." .\n"

1.9:

ben:rdf ben$ irb1.9
irb(main):001:0>     require 'rdf'
=> true
irb(main):002:0>     s = RDF::NTriples.unserialize '<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." '
=> #<RDF::Statement:0x93f260(<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." .)>
irb(main):003:0>     RDF::NTriples.serialize(s)
=> "<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> \"Jhūlā.\" .\n"
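The 2004 N-Triples format is 7-bit US-ASCII, so non-ASCII characters must be written as `\uXXXX` or `\UXXXXXXXX`; the 1.9 output above emits them raw. On Ruby 1.9+ the escape can be computed from each character's code point with `String#ord`. A minimal sketch, assuming valid UTF-8 input; `ntriples_escape` is a hypothetical helper, not RDF.rb's serializer.

```ruby
# Escape a Ruby string for an N-Triples string literal (2004 rules):
# backslash escapes for \ " \n \r \t, \uXXXX for other control characters
# and U+007F..U+FFFF, \UXXXXXXXX for characters beyond the BMP.
def ntriples_escape(str)
  str.each_char.map do |c|
    cp = c.ord
    case c
    when "\\" then "\\\\"
    when '"'  then "\\\""
    when "\n" then "\\n"
    when "\r" then "\\r"
    when "\t" then "\\t"
    else
      if cp < 0x20 || (cp >= 0x7F && cp <= 0xFFFF)
        format("\\u%04X", cp)
      elsif cp > 0xFFFF
        format("\\U%08X", cp)
      else
        c
      end
    end
  end.join
end

puts ntriples_escape("Jh\u016Bl\u0101.") # prints Jh\u016Bl\u0101.
```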

Time to XSD.time mapping is ambiguous

Ruby's Time class can represent either a datetime or just a time by itself. Currently, however, RDF.rb treats Time instances as if they always straightforwardly mapped to the XSD.time datatype. This is clearly wrong, as the following demonstrates:

>> RDF::Literal.new(Time.parse("2010-12-31T12:34:56Z"))
=> #<RDF::Literal::Time:0x80f9f378("12:34:56Z"^^<http://www.w3.org/2001/XMLSchema#time>)>

We need additional logic in RDF::Literal.new to ensure we correctly map Time instances to the XSD.dateTime datatype when the object in question contains a date component as well.
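A minimal sketch of datatype selection, assuming the simple rule that a Ruby `Time` (which always carries a calendar date) maps to `xsd:dateTime`, a `Date` to `xsd:date`, and a `DateTime` to `xsd:dateTime`. Times are normalised to UTC and fractional seconds are dropped; `xsd_typed` is a hypothetical helper.

```ruby
require 'date'
require 'time'

XSD_NS = "http://www.w3.org/2001/XMLSchema#"

# Return [lexical_form, datatype_iri] for a Ruby date/time value.
def xsd_typed(value)
  case value
  when DateTime # must come before Date: DateTime is a subclass of Date
    [value.new_offset(0).strftime("%Y-%m-%dT%H:%M:%SZ"), XSD_NS + "dateTime"]
  when Date
    [value.strftime("%Y-%m-%d"), XSD_NS + "date"]
  when Time # includes a date component, so xsd:dateTime rather than xsd:time
    [value.getutc.strftime("%Y-%m-%dT%H:%M:%SZ"), XSD_NS + "dateTime"]
  else
    raise ArgumentError, "not a date/time value: #{value.inspect}"
  end
end

p xsd_typed(Time.parse("2010-12-31T12:34:56Z"))
```

Representing a bare time of day would then need an explicit opt-in (for example a dedicated wrapper type), since nothing in a `Time` value says the date should be ignored.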

Reader/Writer#prefix value should not be a URI

The current implementation of Reader/Writer#prefix takes an optional uri to associate with the prefix. In fact, this value may not be a URI at all; the only requirement is that, when the prefix value is attached to a suffix, the result is a URI. Consider these rules from RDF/XML, used for creating the prefix mappings required for defining predicate relationships:

An XML namespace-qualified name (QName) has restrictions on the legal characters such that not all property URIs can be expressed as these names. It is recommended that implementors of RDF serializers, in order to break a URI into a namespace name and a local name, split it after the last XML non-NCName character, ensuring that the first character of the name is a Letter or '_'. If the URI ends in a non-NCName character then throw a "this graph cannot be serialized in RDF/XML" exception or error.

One of the RDFa tests verifies that, without prefix mappings, dc:title will be treated as a URI, not a CURIE. It is, in fact, a valid URI. Following the process outlined above, you come up with a prefix mapping of "dc:", which, when applied to the suffix "title", regenerates the original URI "dc:title".

The change needed to #prefix would be to not cast the uri parameter to an RDF::URI, but simply to intern it as a symbol:

def prefix(name, uri = nil)
  name = name.to_s.empty? ? nil : (name.respond_to?(:to_sym) ? name.to_sym : name.to_s.to_sym)
  uri.nil? ? prefixes[name] : prefixes[name] = (uri.respond_to?(:to_sym) ? uri.to_sym : uri.to_s.to_sym)
end
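The RDF/XML splitting rule quoted above can be sketched as a single anchored regular expression: the leftmost match of "a letter or `_`, then NCName characters, up to the end" starts after the last non-NCName character and skips any leading digits, `.` or `-`. This assumes an ASCII-only approximation of NCName (real NCNames also allow many non-ASCII letters); `split_for_qname` is hypothetical.

```ruby
# ASCII-only approximation of the XML NCName rules (an assumption).
LOCAL_NAME_RE = /[A-Za-z_][A-Za-z0-9_.\-]*\z/

# Split +uri+ into [namespace, local_name] per the RDF/XML recommendation.
def split_for_qname(uri)
  match = LOCAL_NAME_RE.match(uri)
  raise ArgumentError, "this graph cannot be serialized in RDF/XML: #{uri}" unless match
  [uri[0...match.begin(0)], match[0]]
end

p split_for_qname("http://xmlns.com/foaf/0.1/firstName") # ["http://xmlns.com/foaf/0.1/", "firstName"]
p split_for_qname("dc:title")                            # ["dc:", "title"]
```

The second call shows why the prefix value cannot always be an RDF::URI: the namespace part "dc:" is not itself a meaningful URI, yet prefix + suffix regenerates the original.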

RDF vocabulary

The RDF vocabulary is defined and usable but not actually documented.

Literal subclasses must ensure datatype is a URI

Consider the following:

RDF::Literal.new("10", :datatype => "http://www.w3.org/2001/XMLSchema#integer").datatype.inspect

Note that this is a String, not a URI. This is because Literal.new does its case comparison after first typecasting the datatype to a URI, but does not use that typecast value when instantiating the subclass.
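A minimal sketch of the fix described above: coerce the datatype once and pass the coerced value, not the original argument, to whichever subclass is chosen. Ruby's stdlib `URI` stands in for `RDF::URI`, and `SketchLiteral`/`SketchIntegerLiteral` are hypothetical.

```ruby
require 'uri'

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical literal hierarchy.
class SketchLiteral
  attr_reader :value, :datatype

  def initialize(value, datatype: nil)
    @value = value
    @datatype = datatype
  end

  def self.build(value, datatype: nil)
    dt = datatype && URI(datatype.to_s) # coerce once...
    klass = (dt && dt.to_s == XSD_INTEGER) ? SketchIntegerLiteral : SketchLiteral
    klass.new(value, datatype: dt)      # ...and hand the coerced value on
  end
end

class SketchIntegerLiteral < SketchLiteral; end

lit = SketchLiteral.build("10", datatype: XSD_INTEGER)
p lit.class            # SketchIntegerLiteral
p lit.datatype.class   # URI::HTTP, not String
```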

HTTP proxy support

open-uri has a :proxy option; we currently can't use RDF.rb for a client because their internal network uses a proxy to get out (yes, they're consuming their own data...).
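A minimal sketch of threading a proxy through to open-uri, whose options hash treats symbol keys such as `:proxy` as options and string keys as HTTP request headers (by default open-uri also honours the `http_proxy` environment variable). `open_uri_options` and `fetch_rdf` are hypothetical helpers; `fetch_rdf` is not called here because it needs network access.

```ruby
require 'open-uri'

# Hypothetical helper: build the options hash handed to open-uri.
def open_uri_options(proxy: nil, accept: nil)
  opts = {}
  opts[:proxy] = proxy if proxy         # e.g. "http://proxy.example:3128"
  opts["Accept"] = accept if accept     # string keys become request headers
  opts
end

# Hypothetical fetch routed through an explicit proxy.
def fetch_rdf(url, proxy: nil)
  URI.open(url, open_uri_options(proxy: proxy, accept: "text/plain")) { |io| io.read }
end
```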

More flexible literal implementation

The current implementation of RDF::Literal has some default handling for dates, floats, and so forth, but it's somewhat inflexible and not extensible. The system ought to provide a way for different XSD types to do different things with different Ruby classes, so that one could, for example, get an XSD.float as a Rational, or an RDF.XMLLiteral as a parsed Nokogiri object.
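One extensible shape for this is a registry from datatype IRI to conversion block. A minimal sketch, assuming unregistered datatypes fall back to the lexical String; `SketchLiteralRegistry` is hypothetical. Note that `Rational("INF")` raises, so a real xsd:float converter would need to special-case the INF/NaN lexical forms.

```ruby
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical per-datatype conversion registry.
class SketchLiteralRegistry
  def initialize
    @converters = {}
  end

  def register(datatype, &converter)
    @converters[datatype] = converter
    self
  end

  def to_ruby(lexical, datatype)
    converter = @converters[datatype]
    converter ? converter.call(lexical) : lexical
  end
end

registry = SketchLiteralRegistry.new
registry.register(XSD + "float") { |s| Rational(s) }   # exact Rational
registry.register(XSD + "integer") { |s| Integer(s, 10) }

p registry.to_ruby("1.5", XSD + "float") # (3/2)
```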

Non-linear performance curve in graph traversal

The attached code runs the same test three times, each time on a larger source file. The test consists of: create a new graph, load the source document into the graph, identify a list of concept resources, and query for the rdfs:label of each concept resource. The time taken by the last step grows out of proportion to the size of the input document.

Here's the output I get on my machine:

ian@rowan-15 $ ruby rdf_misc_tests.rb 
Loaded suite rdf_misc_tests
Started
Initializing with account-code.ttl
 ... parsing complete in 1.1s producing 4711 triples
 ... got code list root, now indexing 
 ... got 587 concepts to index in 0.1s
 ... collected names in 17.3s.
4241.37 triples/sec parsing, 5579.79 resources/sec query, collected 34.00 names/sec
.Initializing with programme-object-group-code.ttl
 ... parsing complete in 3.1s producing 15895 triples
 ... got code list root, now indexing 
 ... got 1985 concepts to index in 0.4s
 ... collected names in 207.1s.
5086.36 triples/sec parsing, 5476.99 resources/sec query, collected 9.59 names/sec
.Initializing with programme-object-code.ttl
 ... parsing complete in 16.7s producing 38855 triples
 ... got code list root, now indexing 
 ... got 4855 concepts to index in 0.9s
 ... collected names in 1286.2s.
2333.01 triples/sec parsing, 5188.34 resources/sec query, collected 3.77 names/sec
.

Finished in 1533.469101951 seconds.

3 tests, 0 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed

Note before running the test that the last step takes over 20 minutes. For reference, I'm using Ruby 1.9.1 on a four-core 64-bit Linux machine with 8 GB of memory. ruby -v says:

ian@rowan-15 $ ruby -v
ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
~/workspace/coins/ruby/bugrep

I'm using the following version of RDF.rb:

ian@rowan-15 $ gem list --local | grep rdf
rdf (0.2.1)
rdf-raptor (0.4.0)
rdf_context (0.5.6)

Ah. Just realised that I can't attach a file to this issue report (unless I'm missing something on github). Code is here: http://iandickinson.me.uk/download/rdf-ruby-perftest.tar
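The report does not say where the time goes; one plausible cause (an assumption) is that each per-concept label query scans every statement, so the step grows with concepts × triples. The numbers above fit that: concepts × triples grows about 68-fold between the first and third runs (587 × 4,711 to 4,855 × 38,855), while name-collection time grows about 74-fold (17.3 s to 1286.2 s). A sketch contrasting a scan with a one-off (subject, predicate) index; all names are hypothetical.

```ruby
SketchTriple = Struct.new(:subject, :predicate, :object)
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Scan every triple for every subject: O(subjects * triples).
def labels_by_scan(triples, subjects)
  subjects.map do |s|
    hit = triples.find { |t| t.subject == s && t.predicate == RDFS_LABEL }
    hit && hit.object
  end
end

# Index once, then look up each subject: O(triples + subjects).
def labels_by_index(triples, subjects)
  index = {}
  triples.each { |t| (index[[t.subject, t.predicate]] ||= []) << t.object }
  subjects.map { |s| (index[[s, RDFS_LABEL]] || []).first }
end
```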

Expensive URI#qname could cache vocabulary

Consider allowing a vocabulary to be assigned to a URI, as might happen with uri = RDF::FOAF.name, which could have the side effect of setting uri.vocab to RDF::FOAF. This would avoid the expensive lookup that scans every known vocabulary to find the URI's vocabulary. A URI#vocab method would also be useful for determining the assigned vocabulary of a given URI.
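A minimal sketch of the proposal: terms minted through a vocabulary remember it, and `qname` only falls back to scanning the registered vocabularies when no vocabulary was recorded. `SketchVocab` and `SketchTerm` are hypothetical, with plain Strings standing in for URIs.

```ruby
# Hypothetical vocabulary registry.
class SketchVocab
  ALL = []

  attr_reader :prefix, :base

  def initialize(prefix, base)
    @prefix, @base = prefix, base
    ALL << self
  end

  # Terms minted here record their vocabulary (the "side effect" in the issue).
  def [](local)
    SketchTerm.new(base + local.to_s, self)
  end
end

class SketchTerm
  attr_reader :value, :vocab

  def initialize(value, vocab = nil)
    @value, @vocab = value, vocab
  end

  # Use the recorded vocabulary when present; otherwise scan all vocabularies.
  def qname
    v = vocab || SketchVocab::ALL.find { |cand| value.start_with?(cand.base) }
    v && [v.prefix, value[v.base.length..-1]]
  end
end
```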

RDF::URI join method doesn't work for URIs ending with a hash

$ irb -rrdf
>> p = RDF::URI('http://www.w3.org/ns/rdfa#')
=> #<RDF::URI:0x810d2150(http://www.w3.org/ns/rdfa#)>
>> p.join('term')
=> #<RDF::URI:0x810d0670(http://www.w3.org/ns/rdfa/term)>

I would expect that the result would be:

http://www.w3.org/ns/rdfa#term
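Note that standard relative-reference resolution (RFC 3986, as implemented by Ruby's stdlib `URI.join`) produces neither result: a plain `term` replaces the last path segment, and only a fragment-only reference keeps `rdfa`. The expected `rdfa#term` is vocabulary-style concatenation. A sketch contrasting the two; `vocab_term` is a hypothetical helper.

```ruby
require 'uri'

base = "http://www.w3.org/ns/rdfa#"

puts URI.join(base, "term").to_s  # http://www.w3.org/ns/term
puts URI.join(base, "#term").to_s # http://www.w3.org/ns/rdfa#term

# Hypothetical helper: append a local name to a hash- or slash-terminated namespace.
def vocab_term(namespace, local)
  unless namespace.end_with?("#", "/")
    raise ArgumentError, "namespace should end in '#' or '/': #{namespace}"
  end
  namespace + local.to_s
end

puts vocab_term(base, "term")     # http://www.w3.org/ns/rdfa#term
```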

RDF literal escaping/unescaping

Consider using the String#rdf_escape and String#rdf_unescape monkey patches. They properly deal with going from UTF-8 to escaped ASCII and back, somewhat based on JSON utf8_to_json.

# coding: utf-8
require 'iconv'

class String
  #private
  # "Borrowed" from JSON utf8_to_json
  RDF_MAP = {
    "\x0" => '\u0000',
    "\x1" => '\u0001',
    "\x2" => '\u0002',
    "\x3" => '\u0003',
    "\x4" => '\u0004',
    "\x5" => '\u0005',
    "\x6" => '\u0006',
    "\x7" => '\u0007',
    "\b"  =>  '\b',
    "\t"  =>  '\t',
    "\n"  =>  '\n',
    "\xb" => '\u000B',
    "\f"  =>  '\f',
    "\r"  =>  '\r',
    "\xe" => '\u000E',
    "\xf" => '\u000F',
    "\x10" => '\u0010',
    "\x11" => '\u0011',
    "\x12" => '\u0012',
    "\x13" => '\u0013',
    "\x14" => '\u0014',
    "\x15" => '\u0015',
    "\x16" => '\u0016',
    "\x17" => '\u0017',
    "\x18" => '\u0018',
    "\x19" => '\u0019',
    "\x1a" => '\u001A',
    "\x1b" => '\u001B',
    "\x1c" => '\u001C',
    "\x1d" => '\u001D',
    "\x1e" => '\u001E',
    "\x1f" => '\u001F',
    '"'   =>  '\"',
    '\\'  =>  '\\\\',
    '/'   =>  '/',
  } # :nodoc:

  if defined?(::Encoding)
    # Funky way to define constant, but if parsed in 1.8 it generates an 'invalid regular expression' error otherwise
    eval %(ESCAPE_RE = %r([\u{80}-\u{10ffff}]))
  else
    ESCAPE_RE = %r(
                    [\xc2-\xdf][\x80-\xbf]    |
                    [\xe0-\xef][\x80-\xbf]{2} |
                    [\xf0-\xf4][\x80-\xbf]{3}
                  )nx
  end

  # Convert a UTF-8 encoded Ruby string to an escaped ASCII string, writing
  # non-ASCII characters as \uXXXX or \U00XXXXXX escapes, and return it.
  #
  # \\:: Backslash
  # \':: Single quote
  # \":: Double quote
  # \n:: ASCII Linefeed
  # \r:: ASCII Carriage Return
  # \t:: ASCII Horizontal Tab
  # \uhhhh:: character in BMP with Unicode value U+hhhh
  # \U00hhhhhh:: character in plane 1-16 with Unicode value U+hhhhhh
  def rdf_escape
    string = self + '' # XXX workaround: avoid buffer sharing
    string.gsub!(/["\\\/\x0-\x1f]/) { RDF_MAP[$&] }
    if defined?(::Encoding)
      string.force_encoding(Encoding::UTF_8)
      string.gsub!(ESCAPE_RE) { |c|
                      s = c.ord.to_s(16).upcase # hex digits of the code point (String#dump output varies by Ruby version)
                      (s.length <= 4 ? "\\u0000"[0,6-s.length] : "\\U00000000"[0,10-s.length]) + s
                    }
      string.force_encoding(Encoding::ASCII_8BIT)
    else
      string.gsub!(ESCAPE_RE) { |c|
                      s = Iconv.new('utf-16be', 'utf-8').iconv(c).unpack('H*').first.upcase
                      "\\u" + s
                    }
    end
    string
  end

  # Unescape characters in strings.
  RDF_UNESCAPE_MAP = Hash.new { |h, k| h[k] = k.chr }
  RDF_UNESCAPE_MAP.update({
    ?"  => '"',
    ?\\ => '\\',
    ?/  => '/',
    ?b  => "\b",
    ?f  => "\f",
    ?n  => "\n",
    ?r  => "\r",
    ?t  => "\t",
    ?u  => nil, 
  })

  if defined?(::Encoding)
    UNESCAPE_RE = %r(
      (?:\\[\\bfnrt"/])   # Escaped control characters, " and /
      |(?:\\U00\h{6})     # 6 byte escaped Unicode
      |(?:\\u\h{4})       # 4 byte escaped Unicode
    )x
  else
    UNESCAPE_RE = %r((?:\\[\\bfnrt"/]|(?:\\u(?:[A-Fa-f\d]{4}))+|\\[\x20-\xff]))n
  end

  # Reverse operation of escape
  # From JSON parser
  def rdf_unescape
    return '' if self.empty?
    string = self.gsub(UNESCAPE_RE) do |c|
      case c[1,1]
      when 'U'
        raise RdfException, "Long Unicode escapes not supported in Ruby 1.8" unless defined?(::Encoding)
        eval(c.sub(/\\U00(\h+)/, '"\u{\1}"'))
      when 'u'
        bytes = [c[2, 2].to_i(16), c[4, 2].to_i(16)]
        Iconv.new('utf-8', 'utf-16be').iconv(bytes.pack("C*")) # \uXXXX bytes are big-endian UTF-16
      else
        RDF_UNESCAPE_MAP[c[1]]
      end
    end
    string.force_encoding(Encoding::UTF_8) if defined?(::Encoding)
    string
  rescue Iconv::Failure => e
    raise RdfException, "Caught #{e.class}: #{e}"
  end
end
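On Ruby 1.9 and later the unescape direction does not need Iconv (which was removed from the standard library in Ruby 2.0): `Array#pack("U")` turns a code point into a UTF-8 character. A minimal sketch covering `\uXXXX`, `\UXXXXXXXX` and the simple escapes; `ntriples_unescape` is hypothetical, and unknown escapes are deliberately left untouched.

```ruby
# Single-character escapes used by N-Triples string literals.
SIMPLE_UNESCAPES = { "t" => "\t", "n" => "\n", "r" => "\r", '"' => '"', "\\" => "\\" }.freeze

def ntriples_unescape(str)
  str.gsub(/\\(?:U(\h{8})|u(\h{4})|(.))/m) do
    hex = $1 || $2
    other = $3
    if hex
      [hex.hex].pack("U") # code point -> UTF-8 character
    else
      SIMPLE_UNESCAPES.fetch(other) { "\\" + other }
    end
  end
end

puts ntriples_unescape("Jh\\u016Bl\\u0101.")
```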

Please pass :base_uri to readers

Please can you pass the URI being loaded as :base_uri to readers, so that it is possible to write:

graph = RDF::Graph.load('http://rdfa.digitalbazaar.com/test-suite/test-cases/xhtml1/0001.xhtml')

Instead of:

graph = RDF::Graph.load('http://rdfa.digitalbazaar.com/test-suite/test-cases/xhtml1/0001.xhtml', :base_uri => 'http://rdfa.digitalbazaar.com/test-suite/test-cases/xhtml1/0001.xhtml')
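A minimal sketch of the requested default, assuming the loader builds a reader options hash: the loaded location becomes `:base_uri` unless the caller supplies one. `reader_options_for` is a hypothetical helper, not RDF.rb API.

```ruby
# The location being loaded is the default :base_uri; an explicit one still wins.
def reader_options_for(location, options = {})
  { base_uri: location.to_s }.merge(options)
end

p reader_options_for("http://rdfa.digitalbazaar.com/test-suite/test-cases/xhtml1/0001.xhtml")
```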

RDF::Literal equality for non-canonical literals intended?

This is the current behavior for non-canonical literals in HEAD:

irb(main):024:0* x = RDF::Literal.new("001", :datatype => RDF::XSD.integer)
=> #<RDF::Literal::Integer:0xb97094("001"^^<http://www.w3.org/2001/XMLSchema#integer>)>
irb(main):025:0> y = x.canonicalize
=> #<RDF::Literal::Integer:0xb96356("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>
irb(main):026:0> y == x
=> true
irb(main):027:0> y.eql? x
=> true

Is this intended? I noticed this behavior while implementing the canonicalize option for rdf-isomorphic; if it is intended, that option isn't needed.
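Under RDF Concepts, two literals are equal as terms only if their lexical forms and datatypes match, so "001" and "1" are distinct terms that share a value. A sketch keeping the two notions apart, assuming term equality for `==`/`eql?` and a separate value comparison (only for xsd:integer here); `SketchTypedLiteral` is hypothetical.

```ruby
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

# Struct supplies term equality (==, eql?, hash) over lexical form and datatype.
SketchTypedLiteral = Struct.new(:lexical, :datatype) do
  # Value-space comparison, sketched for xsd:integer only.
  def same_value?(other)
    datatype == XSD_INT && other.datatype == XSD_INT &&
      Integer(lexical, 10) == Integer(other.lexical, 10)
  end

  def canonicalize
    datatype == XSD_INT ? self.class.new(Integer(lexical, 10).to_s, datatype) : self
  end
end

x = SketchTypedLiteral.new("001", XSD_INT)
y = x.canonicalize
p [x == y, x.same_value?(y)] # [false, true]
```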

Vocabulary.new does not allow vocabulary to be enumerated

Create a new ad-hoc vocabulary such as the following:

foo = RDF::Vocabulary.new("http://foo.com#")

Running Vocabulary.each(&:to_s) should include the newly created vocabulary. This is necessary if you want to use it for URI#qname, for example. Note that if you assign the anonymous class to a constant, as in

RDF::FOO = Class.new(Vocabulary.new("http://foo.com#"))

it will be enumerated. Perhaps add a #name= method, or some other way to give the ad-hoc vocabulary a name. As the counterpart to ActiveSupport's String#constantize, Module#const_set can assign the name:

RDF.const_set(:FOO, Class.new(RDF::Vocabulary.new("http://foo.com#")))

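An alternative to relying on constant names is to record every vocabulary when it is created. A minimal sketch, assuming a class-level registry populated from `initialize`; `SketchVocabulary` is hypothetical and unrelated to RDF.rb's class-based vocabularies.

```ruby
# Hypothetical vocabulary class that registers each instance on creation.
class SketchVocabulary
  @registry = []

  class << self
    def each(&block)
      return enum_for(:each) unless block
      @registry.each(&block)
    end

    def register(vocab)
      @registry << vocab
    end
  end

  def initialize(uri)
    @uri = uri
    SketchVocabulary.register(self) # enumerable even if never named
  end

  def to_s
    @uri
  end
end

foo = SketchVocabulary.new("http://foo.com#")
p SketchVocabulary.each.map(&:to_s)
```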
RDF::NTriples::Writer#format_uri should escape value

Just as literals must be escaped to be represented as valid RDF strings, URIs must also be escaped.

Consider making the following change:

def format_uri(uri, options = {})
  "<%s>" % escaped(uri_for(uri))
end

Here are specs I've used:

describe "utf-8 escaped" do
  {
    %(http://a/D%C3%BCrst)                => %(<http://a/D%C3%BCrst>),
    %(http://a/D\u00FCrst)                => %(<http://a/D\\u00FCrst>),
    %(http://b/Dürst)                     => %(<http://b/D\\u00FCrst>),
    %(http://a/\u{15678}another)          => %(<http://a/\\U00015678another>),
  }.each_pair do |uri, dump|
    it "should dump #{uri} as #{dump}" do
      RDF::URI.new(uri).to_ntriples.should == dump
    end
  end
end
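A minimal sketch of an escaping formatter that satisfies the specs above, assuming characters outside printable ASCII become `\uXXXX` (BMP) or `\UXXXXXXXX` (beyond the BMP) while percent-encoded bytes pass through untouched. `ntriples_format_uri` is a hypothetical stand-in for `format_uri`.

```ruby
# Escape non-ASCII (and control) characters in an IRI and wrap it in <...>.
def ntriples_format_uri(uri)
  escaped = uri.to_s.each_char.map do |c|
    cp = c.ord
    if cp > 0xFFFF
      format("\\U%08X", cp)
    elsif cp > 0x7E || cp < 0x20
      format("\\u%04X", cp)
    else
      c
    end
  end.join
  "<#{escaped}>"
end

puts ntriples_format_uri("http://b/D\u00FCrst") # <http://b/D\u00FCrst>
```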

Enable Ruby-idiomatic aliases for camelCased property names

Instead of contaminating our Ruby code with camelCased monstrosities such as:

FOAF.firstName  #=> RDF::URI("http://xmlns.com/foaf/0.1/firstName")
RDFS.seeAlso    #=> RDF::URI("http://www.w3.org/2000/01/rdf-schema#seeAlso")  
OWL.sameAs      #=> RDF::URI("http://www.w3.org/2002/07/owl#sameAs")
XSD.dateTime    #=> RDF::URI("http://www.w3.org/2001/XMLSchema#dateTime")

...we ought to be able to stick with Ruby conventions and say:

FOAF.first_name #=> RDF::URI("http://xmlns.com/foaf/0.1/firstName") 
RDFS.see_also   #=> RDF::URI("http://www.w3.org/2000/01/rdf-schema#seeAlso")
OWL.same_as     #=> RDF::URI("http://www.w3.org/2002/07/owl#sameAs")
XSD.date_time   #=> RDF::URI("http://www.w3.org/2001/XMLSchema#dateTime")

There's no reason we can't transparently support both naming conventions.
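A minimal sketch of transparent support via `method_missing`, assuming snake_case names are converted to camelCase and camelCase names pass through unchanged. `SnakeFriendlyVocabulary` is hypothetical and returns Strings in place of `RDF::URI`. One design caveat: a vocabulary whose local names genuinely contain underscores would be mangled by this conversion.

```ruby
# Hypothetical vocabulary answering to both camelCase and snake_case names.
class SnakeFriendlyVocabulary
  TERM_NAME = /\A[a-z][A-Za-z0-9_]*\z/

  def initialize(base)
    @base = base
  end

  # "first_name" -> "firstName"; names without underscores are unchanged.
  def self.camelize(name)
    name.to_s.gsub(/_([a-z])/) { Regexp.last_match(1).upcase }
  end

  def method_missing(name, *args, &block)
    return super unless args.empty? && term_name?(name)
    @base + self.class.camelize(name)
  end

  def respond_to_missing?(name, include_private = false)
    term_name?(name) || super
  end

  private

  # Leave conversion hooks (to_str, to_ary, ...) alone.
  def term_name?(name)
    TERM_NAME.match?(name.to_s) && !name.to_s.start_with?("to_")
  end
end

foaf = SnakeFriendlyVocabulary.new("http://xmlns.com/foaf/0.1/")
puts foaf.first_name # http://xmlns.com/foaf/0.1/firstName
```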

XSD.string is a curious special case

The recent round of RDF::Literal updates left XSD.string in a strange place. Strings are an implicit default type. Thus, currently, RDF::Literal handles language directly, which shouldn't be the case, as it's only defined on strings.

I'd like to factor out Strings into their own RDF::Literal::String class, and further, to return for the Ruby version of the literal not an instance of String but of a subclass thereof, which contains language data. This will make round-tripping easier and let me cleanly solve Spira issue 15 at http://github.com/datagraph/spira/issues/#issue/15.

If I do this, will you merge it, or is there a reason that Strings are the way they are?
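A minimal sketch of the String-subclass idea, assuming the language tag is stored as an attribute alongside the string content. `LangString` is hypothetical. A caveat for round-tripping: since Ruby 3.0, methods such as `#upcase` called on a subclass instance return a plain String, so derived values lose the tag.

```ruby
# Hypothetical String subclass carrying an RDF language tag.
class LangString < String
  attr_reader :language

  def initialize(str = "", language = nil)
    super(str)
    @language = language
  end
end

s = LangString.new("chat", :fr)
p [s == "chat", s.language] # [true, :fr]
```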

Using the RDF::RDF vocabulary

I'm having a problem accessing the RDF::RDF vocabulary. The following program fails:

require 'rdf'
puts "#{RDF::RDF.first}"

with:

ian@rowan-15 $ ruby rdf-ns-2.rb
rdf-ns-2.rb:4:in `<main>': uninitialized constant RDF::RDF (NameError)

I think this is because the autoload isn't being triggered for RDF::RDF. If I manually force a load of the RDF vocabulary:

require 'rdf'
require 'rdf/vocab/rdf'

puts "#{RDF::RDF.first}"

then other things break:

ian@rowan-15 $ ruby rdf-ns-2.rb
/var/lib/gems/1.9.1/gems/rdf-0.2.1/lib/rdf/vocab.rb:83: warning: toplevel constant URI referenced by RDF::RDF::URI
/var/lib/gems/1.9.1/gems/rdf-0.2.1/lib/rdf/vocab.rb:83:in `[]': undefined method `intern' for URI:Module (NoMethodError)
    from /var/lib/gems/1.9.1/gems/rdf-0.2.1/lib/rdf/vocab.rb:74:in `block in property'
    from rdf-ns-2.rb:4:in `<main>'

I'm pretty sure I'm doing something wrong, but for the time being I've resorted to defining my own RDF Namespace object to avoid having to touch RDF::RDF.

Inconsistent handling of context (quads)

Take this RDF_Mutable spec:

it "should not insert a statement twice" do
  @repository.insert(@statements.first)
  @repository.insert(@statements.first)
  @repository.count.should == 1
end

That is fine and good. But if I alter the second insert by adding (or changing) the context of the Statement object, I would expect @repository.count.should == 2. Yes, it is the same s-p-o, but in two different contexts. But with the RDF::Repository base implementation, the answer is still 1. Drilling down, that is because the == operator for Statement objects throws away the context.

There are a variety of fixes for this, and some of them are certainly wrong, so I combed through RDF.rb to pick out behaviors of note around context handling and offer them up here with my thought on what a correct fix would be.

First off, Statement objects behave explicitly as triples with these methods:

  • ==
  • []
  • to_a, :to_ary
  • to_hash

And they behave as quads with these methods, closely related to those above:

  • eql?
  • ===
  • []=
  • to_s

I gather from the rdf-spec that the equality methods are intentionally defined as they are, though I disagree with their current behavior. I think a Statement should always be treated as a quad, and the meaning of the context slot should be refined. I see two conflated uses of the context in the API: "I have a context" or "I don't have a context", versus "I don't care about the context". The current behavior of the == method is a problem because it injects the I-don't-care semantics into places where I-do-or-I-don't-have-a-context needs to be faithfully preserved, such as adding the same s-p-o into two different contexts of an RDF::Repository. The I-don't-care case shows up mostly in query-style APIs, such as Enumerable.has_statement?, and should be handled explicitly there.

My proposal would be to make all the Statement methods listed above under triple-like behavior quad-like, to introduce a default context value of boolean false for statements with no defined context, and to keep the explicit value nil for the I-don't-care case, consistent with the use of nil as a wildcard for s-p-o in various other query-oriented parts of the API. I'm fairly certain this will break some existing downstream code, so I'm putting it out for feedback and counter-proposals.
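A minimal sketch of the proposed semantics, assuming statements are full quads, stored statements use false for "no context", and nil in a query pattern means "don't care". `SketchQuad` and `SketchQuadStore` are hypothetical and only illustrate the proposal.

```ruby
require 'set'

# Quad with a context slot: an IRI string, or false for "no context".
SketchQuad = Struct.new(:subject, :predicate, :object, :context)

class SketchQuadStore
  def initialize
    @quads = Set.new
  end

  # Struct#eql?/#hash cover all four slots, so the same triple in two contexts is kept twice.
  def insert(quad)
    stored = quad.dup
    stored.context = false if stored.context.nil? # stored quads never hold the wildcard
    @quads << stored
    self
  end

  # nil in the pattern matches anything; false matches only context-less quads.
  def query(pattern)
    @quads.select do |q|
      SketchQuad.members.all? { |m| pattern[m].nil? || pattern[m] == q[m] }
    end
  end

  def delete(pattern)
    query(pattern).each { |q| @quads.delete(q) }
    self
  end

  def count
    @quads.size
  end
end
```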

So, on to some specific observations...

RDF::Mutable

Mutable.insert --- Rejects statements for which Statement.valid? is false. Statement.valid? admits statements without a context, which creates a problem for Mutable.delete.

Mutable.delete --- Context is currently treated as a wildcard if not supplied. The problem: A statement without a context is valid to insert, but you cannot isolate it to delete it without also taking the same triple out of other contexts. If statements with no context have a distinct value for the context, say the boolean false, they could be distinguished from an explicit "don't care" value of nil.

Mutable.update --- Implies a delete, so must behave consistently. Current behavior tosses the context on the delete, which is certainly a bug.

RDF::Enumerable

Enumerable.has_statement? --- The base class implementation is Enumerable#include?, so its meaning is dictated by the == method of Statement, which currently discards the context. It behaves the same as Enumerable.has_triple?, which is not what I'd expect if I supply a Statement with an explicit context. Like Mutable.delete, this method should be able to verify both the existence of a statement with a specific context and of a triple with no context (context == false), and, given an explicit nil context, behave like a wildcard.

Enumerable.triples, Enumerable.each_triple --- If we cast away the context, the same triple may appear more than once. Is that a problem?

RDF::Graph --- The has_statement?, insert_statement and delete_statement implementations all depend on the Statement.== method, which throws away the context. This happens to work for Graph because the context is coerced to the same value for all statements going in, so they would also match if == were a quad match.

RDF::Repository --- Like Graph, the base class implementation depends on the Statement.== method and makes the latent bugs in Graph actual bugs.

Repository.has_statement? --- See Enumerable.has_statement?.

Repository.insert_statement --- The duplicate check discards the context, so only one context can contain a given triple, which is a bug (and, incidentally, what led me into investigating all this).

Feedback welcome.

Add dump method to RDF::Enumerable

What do you think about adding the following method to RDF::Enumerable? Makes it super easy to serialise something...

def dump(*args)
  RDF::Writer.for(*args).dump(self)
end
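The same mixin idea can be sketched without RDF.rb, assuming a hypothetical writer registry (a lambda per format that turns `[s, p, o]` triples into a String) in place of `RDF::Writer.for`. `SKETCH_WRITERS`, `SketchDumpable` and `SketchGraph` are illustrative only, and the toy N-Triples writer treats every object as an IRI.

```ruby
# Hypothetical writer registry standing in for RDF::Writer.for.
SKETCH_WRITERS = {
  ntriples: ->(triples) { triples.map { |s, p, o| "<#{s}> <#{p}> <#{o}> .\n" }.join }
}.freeze

# Mixin providing #dump for anything that implements #each_triple.
module SketchDumpable
  def dump(format)
    writer = SKETCH_WRITERS.fetch(format) { raise ArgumentError, "no writer for #{format.inspect}" }
    writer.call(enum_for(:each_triple).to_a)
  end
end

class SketchGraph
  include SketchDumpable

  def initialize(triples)
    @triples = triples
  end

  def each_triple(&block)
    @triples.each(&block)
  end
end

print SketchGraph.new([["http://ex/s", "http://ex/p", "http://ex/o"]]).dump(:ntriples)
```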

XMLLiteral canonicalization

XMLLiterals need to be treated differently from other literals. In particular, XML and RDF readers need to add namespace definitions to XMLLiterals. Also, equivalence tests expect two semantically equivalent but textually different XMLLiterals to compare equal; this is best handled by canonicalizing XMLLiterals.

Requirements are defined more specifically for RDFa [1], but should apply to all readers. Many tests look for equivalence of XMLLiterals that are defined somewhat differently, so the real thing to do is to perform an exclusive canonicalization [2]. See also RDF Concepts [3].

In rdf-rdfxml this is handled incompletely by transferring namespaces and performing a partial rewrite of the XML; see Literal.xmlliteral in rdf-rdfxml. A more complete solution would use the c14n module from libxml2, which is not exposed through the standard Ruby bindings (it is implemented in [4]).

RdfContext deals with this by performing a partial transformation (namespace transfer and minimal rewriting) and putting the burden on literal comparison (which could be done in rdf-isomorphic): each XML Literal is turned into a hash using ActiveSupport::XmlMini.parse, and the hashes are compared.

[1] http://www.w3.org/TR/rdfa-core/#s_xml_literals
[2] http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/
[3] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
[4] http://rubygems.org/gems/coupa-libxml-ruby

Make it easier to enumerate serialisers

Hello,

Please can you make it easier to enumerate the available serialisers. It is currently quite difficult to get the name, extensions and mime-type for each of the serialisers.

<link rel="alternate" type="application/rdf+xml" href="http://dbpedia.org/data/Oxford.rdf" title="Structured Descriptor Document (RDF/XML format)" />
<link rel="alternate" type="text/rdf+n3" href="http://dbpedia.org/data/Oxford.n3" title="Structured Descriptor Document (N3/Turtle format)" />
<link rel="alternate" type="application/json+rdf" href="http://dbpedia.org/data/Oxford.jrdf" title="Structured Descriptor Document (RDF/JSON format)" />
<link rel="alternate" type="application/json" href="http://dbpedia.org/data/Oxford.json" title="Structured Descriptor Document (RDF/JSON format)" />

It would be great to be able to do this:
>> f = RDF::Format.for(:ntriples)
=> RDF::NTriples::Format
>> f.name
=> "N-Triples"
>> f.content_types.first
=> "text/plain"
>> f.file_extensions.first
=> "nt"

nick.
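A minimal sketch of the metadata the issue asks for, assuming each format registers a name, content types and file extensions, which is enough to generate the `<link rel="alternate">` tags shown above. `SketchFormat`, `register_format` and `alternate_links` are hypothetical; the N-Triples values come from the wished-for output above, and attribute values are not HTML-escaped in this sketch.

```ruby
# Hypothetical format metadata and registry.
SketchFormat = Struct.new(:symbol, :name, :content_types, :file_extensions)

SKETCH_FORMATS = {}

def register_format(fmt)
  SKETCH_FORMATS[fmt.symbol] = fmt
end

# One <link rel="alternate"> tag per registered format for a resource.
def alternate_links(resource_base)
  SKETCH_FORMATS.values.map do |f|
    %(<link rel="alternate" type="#{f.content_types.first}" href="#{resource_base}.#{f.file_extensions.first}" title="#{f.name}" />)
  end
end

register_format(SketchFormat.new(:ntriples, "N-Triples", ["text/plain"], ["nt"]))

puts alternate_links("http://dbpedia.org/data/Oxford")
```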

Implement RDF::List support

Support for the other RDF collection types can wait until someone actually needs them, but RDF::List is pretty crucial. Dealing with rdf:List structures in the form of blank nodes is just painful.

We laid the groundwork for collection support earlier in ensuring that we always first check that an object responds to #each_statement before we check for #each, which becomes important with containers that return non-statements from #each. Let's build from there.
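The core of list support is walking the `rdf:first`/`rdf:rest` chain from a head node until `rdf:nil`. A minimal sketch over plain `[s, p, o]` triples with `_:` strings for blank nodes, raising on a missing link or a cycle; `list_members` is hypothetical.

```ruby
RDF_NS    = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF_NS + "first"
RDF_REST  = RDF_NS + "rest"
RDF_NIL   = RDF_NS + "nil"

# Collect the members of an rdf:List starting at +head+.
def list_members(triples, head)
  index = {}
  triples.each { |s, p, o| index[[s, p]] = o }
  members = []
  seen = {}
  node = head
  until node == RDF_NIL
    raise ArgumentError, "cyclic list at #{node}" if seen[node]
    seen[node] = true
    members << index.fetch([node, RDF_FIRST]) { raise ArgumentError, "#{node} has no rdf:first" }
    node = index.fetch([node, RDF_REST]) { raise ArgumentError, "#{node} has no rdf:rest" }
  end
  members
end
```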

Literals should allow for validation and normalization

RDF places limitations on the lexical value of typed literals [1]. Values must belong to the lexical space of the relevant datatype. XML Schema defines the lexical and value spaces of the various primitive datatypes [2].

RDF::Literal should implement a #valid? method to verify the validity of typed literals.

Specs for various different datatypes are implemented in RdfContext, the relevant mapping information is included here.

xsd:decimal:

  "1"                              => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "-1"                             => %("-1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1."                             => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.0"                            => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.00"                           => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "+001.00"                        => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "123.456"                        => %("123.456"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.345"                          => %("2.345"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.000000000"                    => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.3"                            => %("2.3"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.234000005"                    => %("2.234000005"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.2340000000000005"             => %("2.2340000000000005"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.23400000000000005"            => %("2.234"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.23400000000000000000005"      => %("2.234"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.2345678901234567890123457890" => %("1.2345678901234567"^^<http://www.w3.org/2001/XMLSchema#decimal>),

xsd:boolean

  "true"  => %("true"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "false" => %("false"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "tRuE"  => %("true"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "FaLsE" => %("false"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "1"     => %("true"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "0"     => %("false"^^<http://www.w3.org/2001/XMLSchema#boolean>),
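A small sketch of the boolean rows, with hypothetical helper names. Note that the table is more lenient than XML Schema: the xsd:boolean lexical space is exactly "true", "false", "1" and "0" (case-sensitive), so mixed-case input such as "tRuE" is accepted by the canonicalizer below but rejected by the strict validity check.

```ruby
# Hypothetical helpers, not RDF.rb API.
# Strict XSD check: only these four lexical forms are valid.
def valid_boolean?(lexical)
  %w(true false 1 0).include?(lexical.to_s)
end

# Lenient canonicalizer following the table above (case-insensitive).
def canonical_boolean(lexical)
  case lexical.to_s.downcase
  when "true", "1" then "true"
  when "false", "0" then "false"
  else raise ArgumentError, "invalid xsd:boolean: #{lexical.inspect}"
  end
end

canonical_boolean("tRuE")  # => "true"
valid_boolean?("tRuE")     # => false
```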

xsd:integer:

  "01" => %("1"^^<http://www.w3.org/2001/XMLSchema#integer>),
  "1"  => %("1"^^<http://www.w3.org/2001/XMLSchema#integer>),
  "-1" => %("-1"^^<http://www.w3.org/2001/XMLSchema#integer>),
  "+1" => %("1"^^<http://www.w3.org/2001/XMLSchema#integer>),
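The integer rows can be sketched in a few lines, again with a hypothetical helper name. Passing an explicit base of 10 to `Integer()` keeps a leading zero from being read as an octal prefix, and Ruby integers are arbitrary-precision, which fits the unbounded value space of xsd:integer.

```ruby
# Lexical space of xsd:integer: optional sign followed by decimal digits.
INTEGER = /\A[+-]?\d+\z/

# Hypothetical helper, not RDF.rb API.
def canonical_integer(lexical)
  lexical = lexical.to_s
  raise ArgumentError, "invalid xsd:integer: #{lexical.inspect}" unless INTEGER.match?(lexical)
  Integer(lexical, 10).to_s # base 10: "08" is eight, not an octal error
end

canonical_integer("+1")  # => "1"
canonical_integer("01")  # => "1"
```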

xsd:double:

  "1"         => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "-1"        => %("-1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "+01.000"   => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1."        => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1.0"       => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "123.456"   => %("1.23456E2"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1.0e+1"    => %("1.0E1"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1.0e-10"   => %("1.0E-10"^^<http://www.w3.org/2001/XMLSchema#double>),
  "123.456e4" => %("1.23456E6"^^<http://www.w3.org/2001/XMLSchema#double>),
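A digit-string sketch of the xsd:double canonical form used in the table (one non-zero digit before the point, at least one digit after it, then "E" and the exponent). It works on the lexical digits instead of going through Float, so no rounding is involved, and it reproduces every row above. `canonical_double` is a hypothetical name; the special values INF, -INF and NaN are passed through unchanged.

```ruby
# Sign, integer digits, optional fraction, optional exponent.
DOUBLE = /\A([+-]?)(\d*)(?:\.(\d*))?(?:[eE]([+-]?\d+))?\z/

# Hypothetical helper, not RDF.rb API.
def canonical_double(lexical)
  lexical = lexical.to_s
  return lexical if %w(INF -INF NaN).include?(lexical)
  m = DOUBLE.match(lexical)
  unless m && !(m[2] + m[3].to_s).empty?
    raise ArgumentError, "invalid xsd:double: #{lexical.inspect}"
  end
  sign = m[1] == "-" ? "-" : ""
  int_digits = m[2]
  digits = int_digits + m[3].to_s
  leading = digits[/\A0*/].length             # zeros before the first significant digit
  significant = digits[leading..-1]
  return "#{sign}0.0E0" if significant.empty? # every digit was zero
  # Decimal exponent of the first significant digit.
  exponent = m[4].to_i + int_digits.length - 1 - leading
  fraction = significant[1..-1].sub(/0+\z/, "")
  fraction = "0" if fraction.empty?
  "#{sign}#{significant[0]}.#{fraction}E#{exponent}"
end

canonical_double("123.456e4")  # => "1.23456E6"
```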

xsd:date, xsd:dateTime and xsd:time are implemented as follows:

    contents.is_a?(Time) ? contents.strftime("%H:%M:%S%Z").sub(/\+00:00|UTC/, "Z") : contents.to_s
    contents.is_a?(DateTime) ? contents.strftime("%Y-%m-%dT%H:%M:%S%Z").sub(/\+00:00|UTC/, "Z") : contents.to_s
    contents.is_a?(Date) ? contents.strftime("%Y-%m-%d%Z").sub(/\+00:00|UTC/, "Z") : contents.to_s
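The three one-liners above can be folded into one method. A sketch, assuming a modern Ruby whose `strftime` supports `%:z`; `xsd_lexical` is a hypothetical name. Two details differ from the original on purpose: DateTime is tested before Date, because DateTime is a subclass of Date (so `contents.is_a?(Date)` is also true for a DateTime), and `%:z` (a numeric offset such as +05:00) is used instead of `%Z`, which for a local Time gives a zone abbreviation like "EST" rather than an XSD offset. Like the original, a zero offset is written as "Z".

```ruby
require "date"

# Hypothetical helper: XSD lexical form for Ruby temporal values.
def xsd_lexical(value)
  case value
  when DateTime # xsd:dateTime; must come before Date because DateTime < Date
    value.strftime("%Y-%m-%dT%H:%M:%S%:z").sub(/\+00:00\z/, "Z")
  when Date     # xsd:date
    value.strftime("%Y-%m-%d%:z").sub(/\+00:00\z/, "Z")
  when Time     # xsd:time (time of day only, as in the original)
    value.strftime("%H:%M:%S%:z").sub(/\+00:00\z/, "Z")
  else
    value.to_s
  end
end

xsd_lexical(Time.utc(2010, 1, 2, 3, 4, 5))  # => "03:04:05Z"
```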

RdfContext also implements a Duration class that transforms integer milliseconds and floating point seconds into XSD format: [+1]PYYYYMMDDTHHMMSS.MMM
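A sketch of the milliseconds direction, assuming the standard xsd:duration lexical form (-)PnDTnHnMnS rather than the fixed-width layout above. Years and months are left out because they have no fixed length in milliseconds. `xsd_duration_from_ms` is a hypothetical name; floating-point seconds could be fed in as `(seconds * 1000).round`.

```ruby
# Hypothetical helper: convert signed integer milliseconds to xsd:duration.
def xsd_duration_from_ms(ms)
  sign = ms < 0 ? "-" : ""
  days, rest = ms.abs.divmod(86_400_000)
  hours, rest = rest.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  seconds, millis = rest.divmod(1_000)

  time = []
  time << "#{hours}H" if hours > 0
  time << "#{minutes}M" if minutes > 0
  if millis > 0
    # "1.500" becomes "1.5": drop trailing zeros of the fractional seconds
    time << format("%d.%03d", seconds, millis).sub(/0+\z/, "") + "S"
  elsif seconds > 0
    time << "#{seconds}S"
  end

  return "PT0S" if days.zero? && time.empty?
  date = days > 0 ? "#{days}D" : ""
  "#{sign}P#{date}#{time.empty? ? "" : "T#{time.join}"}"
end

xsd_duration_from_ms(90_061_500)  # => "P1DT1H1M1.5S"
```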

[1] http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
[2] http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#built-in-primitive-datatypes

N-Triples serializer sometimes serializes nodes invalidly

The N-Triples spec says that a blank node is written as '_:' followed by a name matching [A-Za-z][A-Za-z0-9]*. However, with Ruby 1.8.7 from a recent Ubuntu distro, Node.new creates identifiers containing a dash, which the N-Triples serializer incorrectly passes through to the output file, e.g.:

_:g-605660708 <http://www.w3.org/2000/01/rdf-schema#label> "Movie Tickets" .

This is kinda nasty, since rapper will reject them, thus breaking any serialization to other formats, too.

This is RDF.rb 0.2.0.1.
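One way a writer could avoid emitting invalid identifiers is to relabel blank nodes on output. The sketch below assumes nothing about RDF.rb internals: `BlankNodeLabeler` is a hypothetical class that maps each distinct internal identifier (such as "g-605660708") to a fresh label "b1", "b2", and so on, each matching [A-Za-z][A-Za-z0-9]*. Relabeling, rather than just stripping the dash, keeps two different identifiers (say "g-1" and "g1") from collapsing into the same label.

```ruby
# Hypothetical helper (not part of the RDF.rb API): gives every distinct
# blank-node identifier a fresh label that satisfies the N-Triples
# production name ::= [A-Za-z][A-Za-z0-9]*.
class BlankNodeLabeler
  NAME = /\A[A-Za-z][A-Za-z0-9]*\z/

  def initialize
    @labels = {}
  end

  # Returns the same label for the same input id, e.g. "g-605660708" => "b1".
  def label(id)
    @labels[id.to_s] ||= "b#{@labels.size + 1}"
  end

  # Formats a blank node reference as it would appear in N-Triples output.
  def to_ntriples(id)
    "_:#{label(id)}"
  end
end

labeler = BlankNodeLabeler.new
puts labeler.to_ntriples("g-605660708")  # prints "_:b1"
puts labeler.to_ntriples("g-605660708")  # prints "_:b1" again
puts labeler.to_ntriples("g-123")        # prints "_:b2"
```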

private method `puts' called for "spec/data/output.nt":String (NoMethodError)

When I run

require 'rdf'

graph = RDF::Graph.new

s = RDF::URI.new("http://gemcutter.org/gems/rdf")
p = RDF::DC.creator
o = RDF::URI.new("http://ar.to/#self")

graph << RDF::Statement.new(s, p, o)

graph.each do |elem|
  puts elem.inspect
end

RDF::Writer.for(:ntriples).new("spec/data/output.nt") do |writer|
  graph.each_statement do |statement|
    writer << statement
  end
end

I got this:

c:/ruby/lib/ruby/gems/1.8/gems/rdf-0.0.9/lib/rdf/writer.rb:248:in `puts': private method `puts' called for "spec/data/output.nt":String (NoMethodError)
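The error comes from the writer calling `puts` on its output target, which here is the path String rather than an open file. Every Ruby object inherits `Kernel#puts` as a private method, so an explicit call like `"spec/data/output.nt".puts(...)` raises NoMethodError; only IO-like objects (IO, File, StringIO) have a public `#puts`. The sketch below shows that distinction with the standard library only. `with_output` is a hypothetical helper, not RDF.rb API.

```ruby
require "stringio"

# Only IO-like objects expose a public #puts; respond_to? ignores private
# methods by default, so a String path answers false.
def io_like?(target)
  target.respond_to?(:puts)
end

# Hypothetical helper (not RDF.rb API): yield an IO for either an IO-like
# object or a file path, opening and closing the file for a path.
def with_output(target)
  if io_like?(target)
    yield target
  else
    File.open(target.to_s, "w") { |file| yield file }
  end
end

buffer = StringIO.new
with_output(buffer) do |out|
  out.puts "<http://gemcutter.org/gems/rdf> <http://purl.org/dc/terms/creator> <http://ar.to/#self> ."
end
```

With RDF.rb itself, the same idea means handing the writer an open file instead of its path, for example by creating the writer inside `File.open("spec/data/output.nt", "w") { |file| ... }` and passing `file`.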

URI#join and normalization issues

URI joining and normalization are not well documented, but the expected behavior can be inferred from various W3C tests. It is best described in RFC 3986, section 5.2 [1]. Much of this is handled by Addressable::URI#join.

The following specs were created when developing RdfContext to ensure proper normalization of joined URIs:

describe "normalization" do
  {
    %w(http://foo ) =>  "http://foo/",
    %w(http://foo a) => "http://foo/a",
    %w(http://foo /a) => "http://foo/a",
    %w(http://foo #a) => "http://foo/#a",

    %w(http://foo/ ) =>  "http://foo/",
    %w(http://foo/ a) => "http://foo/a",
    %w(http://foo/ /a) => "http://foo/a",
    %w(http://foo/ #a) => "http://foo/#a",

    %w(http://foo# ) =>  "http://foo/", # Special case for Addressable
    %w(http://foo# a) => "http://foo/a",
    %w(http://foo# /a) => "http://foo/a",
    %w(http://foo# #a) => "http://foo/#a",

    %w(http://foo/bar ) =>  "http://foo/bar",
    %w(http://foo/bar a) => "http://foo/a",
    %w(http://foo/bar /a) => "http://foo/a",
    %w(http://foo/bar #a) => "http://foo/bar#a",

    %w(http://foo/bar/ ) =>  "http://foo/bar/",
    %w(http://foo/bar/ a) => "http://foo/bar/a",
    %w(http://foo/bar/ /a) => "http://foo/a",
    %w(http://foo/bar/ #a) => "http://foo/bar/#a",

    %w(http://foo/bar# ) =>  "http://foo/bar",
    %w(http://foo/bar# a) => "http://foo/a",
    %w(http://foo/bar# /a) => "http://foo/a",
    %w(http://foo/bar# #a) => "http://foo/bar#a",

    %w(http://foo/bar# #D%C3%BCrst) => "http://foo/bar#D%C3%BCrst",
    %w(http://foo/bar# #Dürst) => "http://foo/bar#D%C3%BCrst",
  }.each_pair do |input, result|
    it "should create <#{result}> from <#{input[0]}> and '#{input[1]}'" do
      RDF::URI.new(input[0]).join(input[1].to_s).normalize.to_s.should == result
    end
  end
end
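For comparison, a subset of the spec cases above can be checked against Ruby's standard-library `URI.join`, which performs RFC 3986 reference resolution. This is a sketch using stdlib `URI`, not RDF::URI#join or Addressable; rows that depend on normalization (empty references, "http://foo#" bases, non-ASCII fragments) are deliberately left out, and `JOIN_CASES` is just an illustrative name.

```ruby
require "uri"

# Subset of the spec table: [base, reference] => expected joined URI.
JOIN_CASES = {
  %w(http://foo a)       => "http://foo/a",
  %w(http://foo /a)      => "http://foo/a",
  %w(http://foo/ a)      => "http://foo/a",
  %w(http://foo/ #a)     => "http://foo/#a",
  %w(http://foo/bar a)   => "http://foo/a",
  %w(http://foo/bar /a)  => "http://foo/a",
  %w(http://foo/bar #a)  => "http://foo/bar#a",
  %w(http://foo/bar/ a)  => "http://foo/bar/a",
  %w(http://foo/bar/ /a) => "http://foo/a",
  %w(http://foo/bar/ #a) => "http://foo/bar/#a",
}

# Print one line per case, flagging any mismatch.
JOIN_CASES.each do |(base, ref), expected|
  actual = URI.join(base, ref).to_s
  puts "#{actual == expected ? 'ok  ' : 'FAIL'} <#{base}> + '#{ref}' => <#{actual}>"
end
```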

Note that the rules for URIs are different from the rules for namespace declarations. A URI can (and should) be canonicalized (e.g. http://foo.com => http://foo.com/), but a namespace should not be (e.g. @prefix foo: <http://foo.com#> . foo:a foo:b foo:c . => <http://foo.com#a> <http://foo.com#b> <http://foo.com#c>).
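A short sketch of the difference: expanding a prefixed name is plain string concatenation of the namespace and the local name, while resolving the same local name as a relative reference goes through RFC 3986 resolution and produces a different URI. `expand_curie` is a hypothetical helper; the resolution side uses Ruby's stdlib `URI.join`.

```ruby
require "uri"

# Hypothetical helper: apply a prefix declaration by concatenation,
# with no URI resolution or canonicalization.
def expand_curie(prefixes, curie)
  prefix, local = curie.split(":", 2)
  prefixes.fetch(prefix) + local
end

prefixes = { "foo" => "http://foo.com#" }
%w(foo:a foo:b foo:c).map { |curie| expand_curie(prefixes, curie) }
# => ["http://foo.com#a", "http://foo.com#b", "http://foo.com#c"]

# Resolving "a" against the namespace instead drops the fragment and
# adds the root path, giving a different URI:
URI.join("http://foo.com#", "a").to_s  # => "http://foo.com/a"
```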

[1] http://tools.ietf.org/html/rfc3986#page-30
W3C rdfcore xmlbase tests: http://www.w3.org/2000/10/rdf-tests/rdfcore/xmlbase/

Running a SPARQL query on a Sesame-based RDF store

Hi,

I am trying to run a basic SPARQL query on a Sesame-based RDF store.

I can connect to an RDF store on Sesame and print out all of the results, but that's about it. I'm having some trouble with the documentation for doing more advanced things (I'm pretty new to Ruby, but not to coding).

Here is my simple query:

SELECT ?title
WHERE
{
  <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title .
}

So far I have the following:

puts "Trying a different method for test"
  urlTest = "http://localhost:8080/openrdf-sesame/repositories/test" 

The above works, but when I append the phrase below to it, I get nothing.

Not sure if you're supposed to do it this way anyway.

?query=SELECT+%3Ftitle+WHERE+{+http://example.org/book/book1+http://purl.org/dc/elements/1.1/title+%3Ftitle+.+}"

repositoryTest = RDF::Sesame::Repository.new(urlTest)
repositoryTest.each {|x| puts x} #(&block)
puts "run a query:"

Not sure if this is the right way to set up a query:

queryTest = RDF::Query.new( urlTest ) 
puts "New query instantiated"
query.select(:title)
puts "Title selected from query"
query.each {|x| puts x} #(&block)
puts "Query results printed out"

Thanks in advance,

Bryan
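A minimal sketch of sending the query above over HTTP with only the Ruby standard library. It assumes that the repository URL accepts a SPARQL-protocol GET with a URL-encoded `query` parameter and can return `application/sparql-results+json`; check your Sesame version before relying on either. The helpers `sparql_get_uri` and `sparql_request` are hypothetical and not part of RDF.rb or RDF::Sesame. The request is built but not sent. Letting `URI.encode_www_form` do the encoding also escapes characters such as `{` and `}` that the hand-written query string leaves unescaped.

```ruby
require "net/http"
require "uri"

# Assumption: this endpoint speaks the SPARQL protocol.
REPOSITORY = "http://localhost:8080/openrdf-sesame/repositories/test"

QUERY = <<~SPARQL
  SELECT ?title
  WHERE {
    <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title .
  }
SPARQL

# Build the request URI; the stdlib percent-encodes "?", "{", "<" and so on.
def sparql_get_uri(endpoint, query)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: query)
  uri
end

# Build (but do not send) a GET request asking for SPARQL JSON results.
def sparql_request(endpoint, query)
  request = Net::HTTP::Get.new(sparql_get_uri(endpoint, query))
  request["Accept"] = "application/sparql-results+json"
  request
end

# Usage against a running server (not executed here):
#   uri = sparql_get_uri(REPOSITORY, QUERY)
#   response = Net::HTTP.start(uri.host, uri.port) do |http|
#     http.request(sparql_request(REPOSITORY, QUERY))
#   end
#   puts response.body
```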

RDF::Literal, RDF::Graph do not support #anonymous?

gkellogg noticed that RDF::Literal does not support #anonymous? or #unlabeled?, which are currently defined only on RDF::URI and RDF::Node.

I implemented #anonymous? on RDF::Literal and RDF::Graph and sent a pull request. Not sure you'll agree with the semantics for Graph, but I think it's what we want.

Problem in N-Triples writer

A serious bug slipped through to the 0.1.0 release's N-Triples writer implementation:

NameError: undefined local variable or method `node' for #<RDF::NTriples::Writer:0x1023c6628>
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:36:in `format_node'
    rdf-0.1.0/lib/rdf/writer.rb:226:in `format_value'
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:26:in `write_triple'
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:26:in `map'
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:26:in `write_triple'
    rdf-0.1.0/lib/rdf/writer.rb:199:in `write_statement'
    rdf-0.1.0/lib/rdf/writer.rb:163:in `<<'

This affects the serialization of any statements that contain blank nodes. Fix coming up ASAP.

Enumerators on Ruby 1.8/1.9

Prompted by a recent contribution to fix Ruby 1.9 enumerator compatibility (to be included in RDF.rb 0.1.8), I'm investigating what it will take to ensure that our use of enumerators is safe and compatible with all Ruby baseline versions that we wish to support (that is, 1.8.2+ and 1.9.x).
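For reference, a common Ruby idiom for enumerator-friendly iteration is to yield when a block is given and otherwise return an Enumerator from `enum_for`. This is a sketch of the idiom, not necessarily what RDF.rb adopted; `StatementList` is a hypothetical class. `enum_for` is built in from Ruby 1.8.7 on, while earlier 1.8 releases need `require "enumerator"` first.

```ruby
# Hypothetical collection class illustrating the block-or-Enumerator idiom.
class StatementList
  include Enumerable

  def initialize(*statements)
    @statements = statements
  end

  # Yields each statement when given a block; otherwise returns an
  # Enumerator over the same method.
  def each_statement
    return enum_for(:each_statement) unless block_given?
    @statements.each { |statement| yield statement }
    self
  end
  alias_method :each, :each_statement
end

list = StatementList.new(:a, :b, :c)
list.each_statement.to_a  # => [:a, :b, :c]
list.map(&:to_s)          # => ["a", "b", "c"]
```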

RDF::Literal#canonicalize should downcase the language tag

Literal#language is currently converted to a Symbol. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-plain-literal indicates that a plain literal may have a language tag as defined in RFC 3066, normalized to lower case. This includes tags with a primary subtag and a subtag, such as "en-us". Converting with options[:language].to_sym makes such tags awkward, because :en-us is not a valid Symbol literal (it has to be written as :"en-us").

Also, note that normalization should force the language tag to lower case.
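A minimal sketch of the canonicalization being asked for, assuming the tag is kept as a String: validate it against the RFC 3066 grammar (a primary subtag of 1 to 8 letters, followed by optional subtags of 1 to 8 letters or digits, separated by "-"), then lower-case it. `canonical_language` is a hypothetical helper, not RDF.rb API.

```ruby
# RFC 3066: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT).
LANGUAGE_TAG = /\A[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*\z/

# Hypothetical helper: validate and lower-case a language tag, keeping it
# as a String so "en-US" and "en-us" compare equal after canonicalization.
def canonical_language(tag)
  return nil if tag.nil?
  tag = tag.to_s
  raise ArgumentError, "invalid language tag: #{tag.inspect}" unless LANGUAGE_TAG.match?(tag)
  tag.downcase
end

canonical_language("en-US")  # => "en-us"
canonical_language(:EN)      # => "en"
```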
