pacer's Introduction

Pacer

Pacer is a JRuby library that enables very expressive graph traversals.

It currently supports all of the major graph databases including OrientDB, Neo4j and Dex thanks to the Tinkerpop graphdb stack. Plus there's a very convenient in-memory graph called TinkerGraph which is part of Blueprints.

Pacer allows you to create, modify and traverse graphs using very fast and memory-efficient stream processing thanks to the very cool Pipes library. That also means that almost all processing is done in pure Java, so when it comes to the usual Ruby expressiveness vs. speed problem, you can have your cake and eat it too: it's very fast!

Documentation

Check out the Pacer docs for detailed explanations of many of Pacer's features.

Feel free to contribute to it, by submitting a pull-request to the gh-pages branch of this repo, or by opening issues.

Pacer is also documented with a comprehensive RSpec test suite and thorough YARD documentation. Dig in!

If you like, you can also use the documentation locally via

  gem install yard
  yard server

Installation

Install dependencies:

  • JRuby 1.7.x
    Recommended: Use RVM to install and manage all Ruby (and JRuby) versions on your machine.
  • RubyGems

Install Pacer:

  gem install pacer

Graph Database Support

Pacer can work with any Blueprints-enabled graph, such as Neo4j, OrientDB, TinkerGraph and more.

See the docs for more details.

Example traversals

Friend recommendation algorithm expressed in basic traversal functions:

    friends = person.out_e(:friend).in_v(:type => 'person')
    friends.out_e(:friend).in_v(:type => 'person').except(friends).except(person).most_frequent(0...10)

or using Pacer's route extensions to create your own query methods:

    person.friends.friends.except(person.friends).except(person).most_frequent(0...10)

or to take it one step further:

    person.recommended_friends
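
For readers new to graph traversals, the logic of the first form can be sketched in plain Ruby without Pacer. This is only an illustration: the FRIENDS hash and the recommended_friends helper are hypothetical stand-ins for a real graph, and the sketch assumes that most_frequent(0...10) yields the ten most frequently occurring elements.

```ruby
# Hypothetical adjacency list standing in for :friend edges in a graph.
FRIENDS = {
  'alice' => %w[bob carol],
  'bob'   => %w[alice dave erin],
  'carol' => %w[alice dave],
  'dave'  => %w[bob carol],
  'erin'  => %w[bob]
}.freeze

# Friends of friends, minus existing friends and the person themselves,
# ordered by how often each candidate was reached.
def recommended_friends(person, limit = 10)
  friends = FRIENDS.fetch(person, [])
  candidates = friends.flat_map { |friend| FRIENDS.fetch(friend, []) }
  candidates -= friends   # like .except(friends)
  candidates -= [person]  # like .except(person)

  counts = Hash.new(0)
  candidates.each { |name| counts[name] += 1 }
  counts.sort_by { |name, count| [-count, name] }.first(limit).map(&:first)
end

p recommended_friends('alice') # => ["dave", "erin"]
```

Unlike this sketch, which builds intermediate arrays, a Pacer route streams elements through the pipeline.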

You can use the Quick Start guide to get a feel for how Pacer queries (aka traversals) work.

Design Philosophy

I want Pacer and its ecosystem to become a repository for real implementations of ideas, best practices and techniques for streaming data manipulation. I've got lots of ideas that I'd like to add, and although Pacer seems to be quite rock solid right now -- and I am using it in limited production environments -- it is still in flux. If we find a better way to do something, we're going to do it that way even if that means breaking changes from one release to another.

Once Pacer matures further, a decision will be made to 'lock it down' at least a little more. Hopefully there will be a community in place by then to help determine the right time for that to happen!

Pluggable Architecture

Pacer is meant to be extensible and is built on a very modular architecture. Nearly every chainable route method is actually implemented in an independent module that is plugged into the route only when it's in use. That allows great flexibility in adding methods to routes without clogging up the method namespace. It also makes it natural to make pacer plugin gems.

There are lots of examples of route extensions right inside Pacer. Have a look at the lib/pacer/filter, side_effect and transform folders to see the modules that are built into Pacer. They vary widely in complexity, so take a look around.

If you want to add a traversal technique to Pacer, you can fork Pacer and send me a pull request or just create your own pacer-<feature name> plugin! To see how to build your own Pacer plugin, see my example pacer-bloomfilter plugin which also has a readme file that goes into considerable detail on the process of creating plugins and provides some additional usage examples as well.

As a side note, don't worry about any magic happening behind the scenes to discover or automatically load pacer plugins; there is none of that! If you want to use a pacer plugin, treat it like any other gem: add it to your Gemfile (if that's what you use) and require the gem as normal if you need it.
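
To make the "plugged in only when in use" idea concrete, here is a minimal plain-Ruby sketch of the general technique using Object#extend. It is not Pacer's actual implementation: the Route class, its with method and the Evens module are hypothetical names invented for this illustration.

```ruby
# Hypothetical, much simplified route: just a wrapper around an array.
class Route
  attr_reader :elements

  def initialize(elements)
    @elements = elements
  end

  # Return a copy of this route with the extension module mixed into that
  # single object, leaving the Route class itself untouched.
  def with(extension)
    self.class.new(elements).extend(extension)
  end
end

# Hypothetical extension module adding one chainable method.
module Evens
  def evens
    self.class.new(elements.select(&:even?))
  end
end

plain    = Route.new([1, 2, 3, 4])
extended = plain.with(Evens)

p extended.evens.elements     # => [2, 4]
p plain.respond_to?(:evens)   # => false
```

Because the module is mixed into one object rather than into the class, routes that don't use the extension never see its methods, which is the namespace benefit described above.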

Gremlin

If you're already familiar with Gremlin, please look at my Introducing Pacer post for a simple introduction and explanation of how Pacer is at once similar to and quite different from Gremlin, the project that inspired it. That post is a little out of date at this point since it refers to the original version of Gremlin. Groovy Gremlin is the latest version, inspired in turn by Pacer!

A great introduction to the underlying concept of pipes can be found in Marko Rodriguez' post On the Nature of Pipes.

Test Coverage

I'm aiming for 100% test coverage in Pacer and am currently nearly there in the core classes, but there is a way to go with the filter, transform and side effect route modules. Open coverage/index.html to see the current state of test coverage. And of course contributions would be much appreciated.

Style Guide

Please follow GitHub's Ruby style guide when contributing to make your patches more likely to be accepted!

YourKit Profiler

One of the advantages of building Pacer on JRuby is that we can leverage the incredible tools that exist in the JVM ecosystem. YourKit is a tool that I found through a glowing recommendation, and it has been very useful in profiling the performance of Pacer.

YourKit is kindly supporting the Pacer open source project with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products:

YourKit Java Profiler and YourKit .NET Profiler.

pacer's People

Contributors

andrewvc, dcolebatch, dstuebe, jayniz, joeyfreund, mccarvell, pangloss, purbon, rafaelrosafu, shyndman, stevepereira, tazsingh

pacer's Issues

Loop step

Hi,
I just discovered your library! Great work.
I just thought I'd ask how to implement the Loop step?
I can't seem to find it.
Furthermore, is it possible to employ a simple flow rank pattern calculation like so:

  g.v(12).as('x').out.groupCount(m){it.name}.loop('x'){c++ < 1000}.iterate()

This is on the gremlin page:
https://github.com/tinkerpop/gremlin/wiki/Flow-Rank-Pattern

Thanks again,
Bhargav.

Documentation with examples

Hi Darrick

I'm thinking of throwing away my neo4j-embedded traversal api and recommending that users use pacer instead in the new neo4j/neo4j-core 3.0 version. It would be really nice if you had a tutorial on a wiki page with examples of how to use pacer. I think it's a pity that more people don't use your excellent gem.

Cheers

Make the loop pipe actually build a chain internally

Because pipe instances sometimes are stateful, their behaviour within a data loop can be difficult to predict and counterintuitive. For instance, the range filter:

graph.v.loop { |v| v.out[0] }.while { :loop_and_emit }

It would make sense for this loop to be equivalent to:

graph.v.out[0] + graph.v.out[0].out[0] + graph.v.out[0].out[0].out[0] + ... 

but in fact it is equivalent to:

graph.v.out[0]

because the range filter is not reset on each loop. The elements that are looped are mixed with the starting elements and fed back through the same out[0] pipe fragment which allows only the first element through and then stops emitting.

Another problem with this approach is that the order of iteration is uncontrollable because all emitted elements are mixed together in one queue that competes with the elements that are initially fed into the loop.

Instead, it would make more sense to dynamically generate a pipe fragment for each depth of recursion within the loop, so the same definition would produce numerous out[0] pipes, each with its own queue that can be iterated in either depth-first or breadth-first order.
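
The difference between the two behaviours can be sketched in plain Ruby. This is a toy model only, not Pacer or Pipes code: GRAPH, first_only_filter, shared_loop and per_depth_loop are hypothetical names, and the stateful lambda merely stands in for a range filter such as out[0] that is never reset.

```ruby
# Toy graph: each vertex maps to its outgoing neighbours.
GRAPH = { a: [:b, :c], b: [:d], c: [:e], d: [], e: [] }.freeze

# A stateful filter that lets exactly one element through over its whole
# lifetime, like a single range-filter pipe instance that is never reset.
def first_only_filter
  passed = false
  lambda do |_element|
    return false if passed
    passed = true
  end
end

# Current behaviour (as described above): one filter instance is shared
# by every trip around the loop.
def shared_loop(starts)
  filter = first_only_filter
  queue = starts.dup
  emitted = []
  until queue.empty?
    GRAPH.fetch(queue.shift).each do |neighbour|
      next unless filter.call(neighbour)
      emitted << neighbour
      queue << neighbour
    end
  end
  emitted
end

# Proposed behaviour: a fresh filter is built for each depth.
def per_depth_loop(starts)
  emitted = []
  current = starts
  until current.empty?
    filter = first_only_filter
    current = current.flat_map { |v| GRAPH.fetch(v) }.select { |n| filter.call(n) }
    emitted.concat(current)
  end
  emitted
end

p shared_loop([:a])    # => [:b]
p per_depth_loop([:a]) # => [:b, :d]
```

The shared version stops after the first element, which matches the graph.v.out[0] result described above, while the per-depth version keeps descending.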

Because the block generating the pipe fragment would be called once for each depth, it seems sensible to feed that block some context such as its current depth and the previous pipeline fragments. That would also make it possible to do things like aggregation, something like these bits of pseudo code:

graph.v.
  loop { |v, context| v.out.aggregate(context.first? ? LinkedList.new : context.prev.pipe.getSideEffect) }.
  while { :emit_and_loop }.
  cap

graph.v.
  loop do |v, context|
    context.data = LinkedList.new
    if context.first?
      v.out.aggregate(context.data)
    else
      v.out.except(context.prev.data).aggregate(context.data)
    end
  end.
  while { |element, path, context| context.data.length < 10 }

graph.v.loop { |v, context| v.out.limit(context.depth * 2) }.while { true }

#subgraph fails if using .both_v

When traversing a route with a .both_v, it will fail to build the subgraph because 50% of the time only one vertex from the edge is visible in the route, so it fails when building that edge:

>> g.v.both_e.both_v.limit(2).paths
[#<TinkerGraph>, #<V[338]>, #<E[6347]:21-followed_by-338>, #<V[338]>]
[#<TinkerGraph>, #<V[338]>, #<E[6347]:21-followed_by-338>, #<V[21]>] 

The first path above does not actually contain V[21] so it can't build the 21-followed_by-338 edge.
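
The failure condition can be illustrated with a small plain-Ruby check. This is a sketch only: Vertex, Edge and buildable_edges are hypothetical names, and it assumes the edge notation 21-followed_by-338 reads as out vertex, label, in vertex.

```ruby
# Hypothetical stand-ins for graph elements appearing in a path.
Vertex = Struct.new(:id)
Edge   = Struct.new(:id, :out_id, :in_id, :label)

# An edge can only be rebuilt in the subgraph if both of its endpoints
# are present as vertices in the same path.
def buildable_edges(path)
  vertex_ids = path.grep(Vertex).map(&:id)
  path.grep(Edge).select do |edge|
    vertex_ids.include?(edge.out_id) && vertex_ids.include?(edge.in_id)
  end
end

edge = Edge.new(6347, 21, 338, :followed_by)
first_path  = [Vertex.new(338), edge, Vertex.new(338)]
second_path = [Vertex.new(338), edge, Vertex.new(21)]

p buildable_edges(first_path).map(&:id)  # => []
p buildable_edges(second_path).map(&:id) # => [6347]
```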

Building a pacer adapter for graph db's

Hi,
I was wondering where to start if I wanted to build a pacer adapter for a Tinkerpop-based graph database implementation that is not currently supported by pacer?
I looked at the Titan-pacer gem and the neo4j-pacer gem, but I couldn't make much sense of what was going on. Could you point me in the right direction?
I need some kind of map to follow; I have arango, accumulo and flux graph in mind.
Many Thanks,
Bhargav.

On OSX OpenJDK 1.7.0, Dex crashes the JVM

This happens regardless of JRuby version.

w:jruby-head:(develop↑⚡) pacer $ rspec spec/pacer/blueprints/dex_spec.rb 
Using JRuby 1.7.0.dev in 1.9.3 mode.
Testing all graphs.
No examples matched {:focus=>true}. Running all.
/Volumes/xn_dev/xn/pacer/lib/pacer/core/graph/graph_index_route.rb:36 warning: instance vars on non-persistent Java type Java::ComTinkerpopBlueprintsPgmImplsDex::DexGraph (http://wiki.jruby.org/Persistence)
..**** CRITICAL ERROR (SIGNAL NUM 10)
------- Begin of call stack ------
1   libSystem.B.dylib                   0x00007fff830401ba _sigtramp + 26
2   ???                                 0x00000000fffecc00 0x0 + 4294888448
-------- End of call stack -------
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x00007fff82fdfd7a, pid=58416, tid=140735077993664
#
# JRE version: 7.0
# Java VM: OpenJDK 64-Bit Server VM (23.0-b16 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libSystem.B.dylib+0xd7a]  mach_msg_trap+0xa
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Volumes/xn_dev/xn/pacer/hs_err_pid58416.log
.[thread 4594876416 also had an error]
^C**** CRITICAL ERROR (SIGNAL NUM 6)
------- Begin of call stack ------
1   libSystem.B.dylib                   0x00007fff830401ba _sigtramp + 26
2   ???                                 0x0000000000000000 0x0 + 0
3   libjvm.dylib                        0x0000000101b8c2db _ZN7Monitor28lock_without_safepoint_checkEv + 39
4   libjvm.dylib                        0x000000010196d50e _ZN9CodeCache18largest_free_blockEv + 66
5   libjvm.dylib                        0x000000010196d5e0 _ZN9CodeCache12print_boundsEP12outputStream + 58
6   libjvm.dylib                        0x0000000101c972bd _ZN7VMError6reportEP12outputStream + 3555
7   libjvm.dylib                        0x0000000101c97b2b _ZN7VMError14report_and_dieEv + 1483
8   libjvm.dylib                        0x0000000101bada5b JVM_handle_bsd_signal + 1047
9   libSystem.B.dylib                   0x00007fff830401ba _sigtramp + 26
10  ???                                 0x00007fff5fc45df8 0x0 + 140734800092664
11  CoreFoundation                      0x00007fff82428902 __CFRunLoopRun + 1698
12  CoreFoundation                      0x00007fff82427d8f CFRunLoopRunSpecific + 575
13  java                                0x000000011e7dd49c CreateExecutionEnvironment + 871
14  java                                0x000000011e7d7c8c JLI_Launch + 1952
15  java                                0x000000011e7dd800 main + 108
16  java                                0x000000011e7d74e4 start + 52
17  ???                                 0x0000000000000012 0x0 + 18
-------- End of call stack -------
Abort trap

Sections not working properly inside lookahead block

I was trying to create a friends-of-friends query (2nd degree friends only, excluding the source user and any of their 1st degree friends).

I was expecting the following query to work:

jruby-1.7.18 :052 > g.v(name: 'Alice').as(:alice)
    .out_e.in_v
    .out_e.in_v
    .is_not(:alice)
    .neg_lookahead {|u| u.out_e.in_v.is(:alice)}

#<V[2]> #<V[3]> #<V[1]>
Total: 3

 => #<GraphV -> V-Property(name=="Alice") -> V-Section(:alice) -> outE -> inV -> outE -> inV -> is_not(:alice) -> V-Future(#<V -> outE -> inV -> is(:alice) -> negate>)>

The result above is not what I expected to see, as the following query shows:

jruby-1.7.18 :053 > g.v(name: 'Alice').as(:alice)
    .out_e.in_v
    .out_e.in_v
    .is_not(:alice)
    .neg_lookahead {|u| u.out_e.in_v(name: 'Alice')}

#<V[2]>
Total: 1

 => #<GraphV -> V-Property(name=="Alice") -> V-Section(:alice) -> outE -> inV -> outE -> inV -> is_not(:alice) -> V-Future(#<V -> outE -> inV -> V-Property(name=="Alice") -> negate>)>

Neo4j specs failing

Neo4j specs are failing in the person count method, probably because of the Blueprints usage.

Comment from the maintainer, pangloss:

Actually, that test is failing. This failure arose when we upgraded to the
latest versions of Blueprints and Neo4j a couple of months ago. I believe
the problem is that Neo's index doesn't get the count right before it's
committed, but I don't recall for certain.

spec output

  1) neo4j indexed NeoSpec::Person count 
     Failure/Error: its(:count) { should == 2 }
       expected: 2
            got: 0 (using ==)
     # ./spec/pacer/blueprints/neo4j_spec.rb:43:in `NeoSpec'

JRuby warning about ambiguous Java method

Here is the message I am seeing:

/home/vagrant/.rvm/gems/jruby-1.7.18/gems/pacer-2.0.12-java/lib/pacer/core/route.rb:519 warning: ambiguous Java methods found, using setStarts(java.util.Iterator)

Here is a description of a possible solution.

Line 519 should probably change from

start.setStarts src

to

start.java_send :setStarts, [java.util.Iterator], src

Rescue from `iter.next` in route.rb's `each`

Hey,

we have some bigger traversals that break completely with this message:

NativeException: org.neo4j.graphdb.NotFoundException: Unable to
load one or more relationships from Node[532772]. This usually happens
when relationships are deleted by someone else just as we are about to
load them. Please try again.

So how about rescuing from a NotFoundException at this place (https://github.com/pangloss/pacer/blob/develop/lib/pacer/core/route.rb#L130) and just not yielding that particular element, but continuing with the each:

  def each
    iter = iterator
    iter = configure_iterator(iter)
    if block_given?
      while true
        begin
          yield iter.next
        rescue org.neo4j.graphdb.NotFoundException
        end
      end
    else
      iter
    end
  rescue Pacer::EmptyPipe, java.util.NoSuchElementException
    self
  end

The reason that this is not a pull request is that I suspect you designed it like this on purpose - I'd like to discuss how to handle this situation from a pacer-user's pov.
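
To discuss the trade-off without a running Neo4j instance, the same skip-and-continue idea can be sketched in plain Ruby. Everything here is a hypothetical stand-in: ElementNotFound plays the role of org.neo4j.graphdb.NotFoundException, FlakyIterator plays the role of the pipe iterator, and StopIteration plays the role of the end-of-stream exceptions.

```ruby
# Stand-in for the database exception raised when an element vanishes.
class ElementNotFound < StandardError; end

# Stand-in iterator: raises ElementNotFound for :deleted entries and
# StopIteration when exhausted.
class FlakyIterator
  def initialize(items)
    @items = items.dup
  end

  def next
    raise StopIteration if @items.empty?
    item = @items.shift
    raise ElementNotFound, 'element vanished' if item == :deleted
    item
  end
end

# Skip elements that disappear mid-traversal, but keep iterating.
# Kernel#loop ends quietly when StopIteration is raised.
def each_skipping_missing(iter)
  loop do
    begin
      element = iter.next
    rescue ElementNotFound
      next
    end
    # Yield outside the begin/rescue so that an ElementNotFound raised
    # by the caller's own block is not silently swallowed.
    yield element
  end
end

seen = []
each_skipping_missing(FlakyIterator.new([1, :deleted, 2, 3])) { |x| seen << x }
p seen # => [1, 2, 3]
```

One design point worth discussing: in the proposal above, yield sits inside the begin block, so an exception of the same class raised by the caller's block would also be rescued. The sketch keeps the yield outside for that reason.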

Calling first on route is slow

Seeing a pathological performance issue when calling first on a route.
Can anyone help me understand what I am seeing here?

Without First:

irb(main):007:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134063') }}
       user     system      total        real
   0.000000   0.000000   0.000000 (  0.002000)
=> [#<Benchmark::Tms:0x79c4f23b @stime=0.0, @label="", @cstime=0.0, @real=0.002000093460083008, @total=0.0, @cutime=0.0, @utime=0.0>]
irb(main):008:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134063') }}
       user     system      total        real
   0.010000   0.000000   0.010000 (  0.004000)
=> [#<Benchmark::Tms:0x9147ba2 @stime=0.0, @label="", @cstime=0.0, @real=0.003999948501586914, @total=0.00999999999999801, @cutime=0.0, @utime=0.00999999999999801>]
irb(main):009:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134063') }}
       user     system      total        real
   0.010000   0.000000   0.010000 (  0.003000)
=> [#<Benchmark::Tms:0x5341641d @stime=0.0, @label="", @cstime=0.0, @real=0.003000020980834961, @total=0.00999999999999801, @cutime=0.0, @utime=0.00999999999999801>]
irb(main):010:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134063') }}
       user     system      total        real
   0.010000   0.000000   0.010000 (  0.005000)
=> [#<Benchmark::Tms:0x24018c8b @stime=0.0, @label="", @cstime=0.0, @real=0.005000114440917969, @total=0.00999999999999801, @cutime=0.0, @utime=0.00999999999999801>]
irb(main):011:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134063') }}
       user     system      total        real
   0.040000   0.000000   0.040000 (  0.019000)
=> [#<Benchmark::Tms:0x5bde6148 @stime=0.0, @label="", @cstime=0.0, @real=0.01900005340576172, @total=0.03999999999999915, @cutime=0.0, @utime=0.03999999999999915>]
irb(main):012:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134063') }}
       user     system      total        real
   0.010000   0.000000   0.010000 (  0.003000)
=> [#<Benchmark::Tms:0x16fc5622 @stime=0.0, @label="", @cstime=0.0, @real=0.0029997825622558594, @total=0.010000000000005116, @cutime=0.0, @utime=0.010000000000005116>]
irb(main):013:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134061') }}
       user     system      total        real
   0.000000   0.000000   0.000000 (  0.003000)

VS

With First:

irb(main):016:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134061').first }}
       user     system      total        real
   1.310000   0.080000   1.390000 (  0.922000)
=> [#<Benchmark::Tms:0x75e4fe3c @stime=0.08000000000000007, @label="", @cstime=0.0, @real=0.9219999313354492, @total=1.3899999999999952, @cutime=0.0, @utime=1.3099999999999952>]
irb(main):017:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134061').first }}
       user     system      total        real
   0.060000   0.000000   0.060000 (  0.033000)
=> [#<Benchmark::Tms:0x13f4048e @stime=0.0, @label="", @cstime=0.0, @real=0.03299999237060547, @total=0.060000000000002274, @cutime=0.0, @utime=0.060000000000002274>]
irb(main):018:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134061').first }}
       user     system      total        real
   0.040000   0.000000   0.040000 (  0.025000)
=> [#<Benchmark::Tms:0x638977e0 @stime=0.0, @label="", @cstime=0.0, @real=0.02500009536743164, @total=0.04000000000000625, @cutime=0.0, @utime=0.04000000000000625>]
irb(main):019:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134061').first }}
       user     system      total        real
   0.060000   0.000000   0.060000 (  0.034000)
=> [#<Benchmark::Tms:0x8cb7185 @stime=0.0, @label="", @cstime=0.0, @real=0.03399991989135742, @total=0.060000000000002274, @cutime=0.0, @utime=0.060000000000002274>]
irb(main):020:0> Benchmark.bm { |bm| bm.report { store.tickets(date: '2015-01-10', ticket_id: '134061').first }}
       user     system      total        real
   0.050000   0.000000   0.050000 (  0.041000)

Here is the query execution plan:

[24] pry(main)> store.tickets(date: '2015-01-10', ticket_id: '134063')
=> #<V[9021833224]>
Total: 1
#<V-TitanQuery ([["label", "store"], ["store_pretty_url", "beach-chalet-brewery-and-restaurant-san-francisco"]]) -> V -> outE(:tickets) -> E-Property(date=="2015-01-10", ticket_id=="134063") -> inV -> V-Property(Guestly::Extensions::Ticket)>

The Store route extensions tickets method is:

        def tickets(opts = {})
          out_e(:tickets, opts).in_v(Guestly::Extensions::Ticket)
        end

We are running:

  • Using pacer-titan 0.0.7
  • Using pacer 2.0.22

YamlEncoder.dump doesn't handle DateTime (or Date) properly

jruby-1.7.18 :040 > Pacer::YamlEncoder.encode_property [Time.new, DateTime.new, Date.new]
 => " ---\n- 2015-05-13 12:23:42.625000000 -04:00\n- !ruby/object:DateTime '-4713-12-31 18:42:28.000000000 -05:17'\n- -4712-01-01\n"

Notice that the values are all wrong for DateTime and Date.

Important: This is not a bug in Pacer, it is a bug in the YAML library. We'll just have to find a workaround (or see if this bug has been fixed in a more recent version of the YAML library).
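
One possible workaround, sketched here in plain Ruby, is to convert Date and DateTime values to tagged ISO 8601 strings before handing them to YAML and to convert them back after loading. This is an assumption-laden sketch, not what Pacer::YamlEncoder does: encode_temporal and decode_temporal are hypothetical helpers, and the 'type'/'value' hash layout is invented for this example.

```ruby
require 'date'
require 'yaml'

# Replace temporal values with a tagged hash holding an ISO 8601 string.
# DateTime is checked first because it is a subclass of Date.
def encode_temporal(value)
  case value
  when DateTime then { 'type' => 'datetime', 'value' => value.iso8601 }
  when Date     then { 'type' => 'date', 'value' => value.iso8601 }
  else value
  end
end

# Reverse of encode_temporal; anything untagged passes through unchanged.
def decode_temporal(value)
  return value unless value.is_a?(Hash) && value.key?('type')

  case value['type']
  when 'datetime' then DateTime.iso8601(value['value'])
  when 'date'     then Date.iso8601(value['value'])
  else value
  end
end

original = [DateTime.new(2015, 5, 13, 12, 23, 42), Date.new(2015, 5, 13), 'plain']
yaml     = YAML.dump(original.map { |v| encode_temporal(v) })
restored = YAML.load(yaml).map { |v| decode_temporal(v) }

p restored == original
```

Because only strings, hashes and arrays reach the YAML library, the round trip no longer depends on how that library serializes Date and DateTime objects.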

NameError: undefined field 'pathEnabled' for class 'Pacer::Pipes::RubyPipe'

Just upgraded jruby (to 1.7.2) and pacer (to 1.1.1, with pacer-neo4j-2.1.0) and now get this error when trying to start my app:

NameError: undefined field 'pathEnabled' for class 'Pacer::Pipes::RubyPipe'
            field_reader at org/jruby/java/proxies/JavaProxy.java:287
                RubyPipe at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer/pipe/ruby_pipe.rb:3
                   Pipes at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer/pipe/ruby_pipe.rb:2
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer/pipe/ruby_pipe.rb:1
                 require at org/jruby/RubyKernel.java:1027
  require_with_backports at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/backports-2.6.4/lib/backports/tools.rb:314
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer/pipes.rb:1
                 require at org/jruby/RubyKernel.java:1027
  require_with_backports at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/backports-2.6.4/lib/backports/tools.rb:314
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer/pipes.rb:33
                 require at org/jruby/RubyKernel.java:1027
  require_with_backports at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/backports-2.6.4/lib/backports/tools.rb:314
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer/loader.rb:1
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer/loader.rb:19
                 require at org/jruby/RubyKernel.java:1027
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer.rb:1
                    each at org/jruby/RubyArray.java:1613
                   Pacer at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer.rb:42
                    each at org/jruby/RubyArray.java:1613
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@sheldon/gems/pacer-1.1.1-java/lib/pacer.rb:25
                  (root) at /Users/jannis/.rvm/gems/jruby-1.7.2@global/gems/bundler-1.2.3/lib/bundler/runtime.rb:1
                  (root) at ./console:10

This is with an embedded neo4j version 1.8.1 and the ruby gem neo4j.rb version 2.2.0

concurrency

I'm doing a few experiments and keep running into ConcurrentModificationException. What strikes me as odd is that access such as g.v(...).first causes problems.

I'm using TinkerGraph; perhaps that is causing its own problems that may be solved with a non-memory backend?

What I'd like to do is parse a CSV file into a graph. Some items need to be looked up first. create_vertex seems to work fine with Thread or the Parallel gem. Routes do not, and add_edges_to also seems problematic. I've experimented with bulk_job, but this seems very slow for a single vertex result.

basic example

(1..1000).each do |n|
  Thread.new {
    new_node = g.create_vertex({thing: n})
    name = Random.rand(1000)

    # Blows up
    # v = g.v(name: name).first

    # Works
    v = g.vertex(name)
    if v == nil
      raise "bad! #{name}"
    end
    v.add_edges_to(:unf, new_node)
  }
end

In this test I can grab the vertex by the raw ID; in my project I have another key and do not use it for the vertex ID. Should I keep some index mapping my IDs to the pacer IDs to avoid creating a "route"? Why does reading a route raise an exception? Is TinkerGraph and the lack of transactions the problem?

Any help appreciated.
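
One generic approach to this kind of problem is to serialize every lookup-then-write step behind a Mutex, so that no thread iterates the store while another is modifying it. The sketch below is plain Ruby and makes no claim about TinkerGraph's thread safety: TinyGraph, find_first and add_edge are hypothetical stand-ins, not Pacer calls.

```ruby
# Hypothetical in-memory store: lookups by property iterate over all
# vertices, which is the kind of read that clashes with concurrent writes.
class TinyGraph
  attr_reader :vertices, :edges

  def initialize
    @vertices = {}
    @edges = []
    @next_id = 0
  end

  def create_vertex(props)
    id = (@next_id += 1)
    @vertices[id] = props
    id
  end

  def find_first(key, value)
    @vertices.each { |id, props| return id if props[key] == value }
    nil
  end

  def add_edge(from, to, label)
    @edges << [from, label, to]
  end
end

graph = TinyGraph.new
lock  = Mutex.new
10.times { |i| graph.create_vertex(name: i) }

threads = (1..100).map do |n|
  Thread.new do
    # Only one thread at a time may read or modify the graph.
    lock.synchronize do
      new_node = graph.create_vertex(thing: n)
      existing = graph.find_first(:name, n % 10)
      graph.add_edge(existing, new_node, :unf)
    end
  end
end
threads.each(&:join) # wait for the workers before using the results

puts "#{graph.vertices.size} vertices, #{graph.edges.size} edges"
```

Note that the sketch joins its threads before reading the results; the example above starts its threads but never waits for them.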

Enhance the subgraph method

Turns out it would be useful to be able to filter paths before creating a subgraph as well as to transform the subgraph based on the path data that was used to create it.

    route.subgraph(filter_path: proc { |path| [path[0],path[2]] }) do |subgraph,paths|
      # Do additional work on the subgraph here...
      # (return value ignored)
    end
    #=> <subgraph>

It would be best if the paths were cached before the subgraph were created so that there would be no chance that the source data change before being processed by the block.

Another option to consider would look more like the following and should be more efficient, though it has the potential disadvantage that the entire subgraph would not be created before the subgraph transform happens:

route.subgraph(
  filter_path: proc { |path| [path[0], path[2]] },
  after_create: proc { |subgraph, created_elements, source_elements|
    # modify the subgraph here
  }
)

Tinkerpop and Pacer Versioning

Hi,

I commented on a pull request about upgrading to tinkerpop 2.3.0 (and my trouble upgrading to 2.4.0-SNAPSHOT). I've also previously had a pull request rejected (for GOOD cause!) that separated the jars from the Pacer repo.

I've been thinking a lot about how pacer could make this easier on those of us trying (or having to, because of bugs) to remain on the bleeding edge.

I have three ideas:

  1. Keep the JARS in Pacer, but add a unless defined?() before the require, that way in development, or even in production, I could include my own jars before Pacer and have Pacer use those.
  2. Version Pacer in line with tinkerpop, so the version number at release of pacer would match tinkerpop and you know what you're getting. If there's a pacer-specific update, tack it on at the end. And use .pre for the snapshot releases, so right now there would probably be a pacer 2.2.0, a pacer 2.3.0 and a pacer 2.4.0.pre. And if pacer needed a bug fix, it'd get 2.2.0.1, 2.3.0.1 and 2.4.0.pre2.
  3. This one you have already rejected, but putting the jars into a different gem and having pacer have only a loose dependency on them... like the zk gem does: https://github.com/slyphon/zookeeper_jar That way, I would only fork the jars repo and update there.
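
As a quick sanity check on idea 2, here is how RubyGems itself would order the version numbers proposed above. The version strings are the hypothetical ones from this post, not actual Pacer releases.

```ruby
require 'rubygems'

# Hypothetical release numbers following the proposed scheme.
versions = %w[2.4.0.pre 2.3.0.1 2.2.0 2.3.0 2.4.0.pre2 2.2.0.1]

sorted = versions.map { |v| Gem::Version.new(v) }.sort.map(&:to_s)
puts sorted

# Versions containing letters are treated as prereleases and sort
# before the corresponding final release.
puts Gem::Version.new('2.4.0.pre').prerelease?
puts Gem::Version.new('2.4.0.pre') < Gem::Version.new('2.4.0')
```

So a fourth segment sorts after the matching tinkerpop release, and the .pre releases stay below the eventual 2.4.0.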

I, personally, lean towards doing 1 and 2. What are your thoughts? I'm happy to submit a pull request with any/all of this or any better ideas. Right now, we have had to fork pacer to use updated tinkerpop and that poses serious challenges because pacer has a build step... so I can't just add a git: git@mypacerepo to my Gemfile... In fact, I'm at a little bit of a loss as to how to use my forked gem easily in prod without publishing it to rubygems.

So that's a few ideas and the why of allowing easier swapping of tinkerpop. I'm open to any and all suggestions.

Component 'org.neo4j.kernel.extension.KernelExtensions@1eb458fd' failed to initialize.

I'm unable to follow your readme example on Interoperation with the neo4j gem:

>> require 'neo4j'
=> true
>> require 'pacer-neo4j'
=> true
>> Neo4j.db.start
I, [2013-01-29T16:51:19.924000 #12485]  INFO -- : Starting local Neo4j using db    /home/ettober/code/neo4j/serv1/db using Java::OrgNeo4jKernel::EmbeddedGraphDatabase
Java::JavaLang::RuntimeException: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.extension.KernelExtensions@1eb458fd' failed to initialize. Please see attached cause exception.
    from org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:258)
    from org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:88)
    from org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:75)

The issue does not appear if I only require the neo4j gem and start the db.

Context:
jruby 1.7.2
neo4j 2.2.3
pacer 1.1.1
pacer-neo4j 2.1.0

Update Pacer to use latest Tinkerpop stack

I'm filing this just in case anyone is wondering if/when this is going to happen.

The answer is yes but I am not certain when because my current startups are keeping me rather busy. I am using Pacer on a daily basis for both of them without any problems.

My primary motivation when I do upgrade is likely just to be able to use the most recent version of Neo4j (currently neo4j-1.5.M01), which is progressing rapidly and adding useful new features that I would like to have access to.

If anyone using Pacer would also like to see an update or has any other compelling reasons for upgrading they'd like to share, please comment here or +1. If you're considering using Pacer but this (or any other issue) is making you hesitate, please also share. Feedback, if any, will definitely be considered in prioritizing this task.

Darrick

Cannot use pacer with jruby 9000

Hi,

Pacer doesn't run in jruby 9000 (jruby-head) now.

This may be because jruby-head reports RUBY_VERSION 2.2.0 and Pacer treats any Ruby that is not 1.9.x as Ruby 1.8 in lib/pacer.rb.

Would you consider adding support for jruby-head?

 jruby-head :001 > RUBY_VERSION
 => "2.2.0" 

`merge` results in an incorrect route type

See the IRB snippet below ...

jruby-1.7.18 :177 > g.v.branch {|r| r.out_e.limit(1)} .branch {|r| r.limit(1) } 
#<E[6]:3-flies_to-1> #<V[3]>             
Total: 2

 => #<GraphV -> Elem-Branch> 
jruby-1.7.18 :178 > g.v.branch {|r| r.out_e.limit(1)} .branch {|r| r.limit(1) } .merge
#<V[6]> #<V[3]>
Total: 2
 => #<GraphV -> V-Branch -> V-MergedBranch>

Notice that, when merging the branches (i.e. creating a mixed route), the edge with id 6 is printed as #<V[6]>.

Next, the following snippet implies that, when merging the routes, we get a vertex route (instead of a mixed route, as expected)

jruby-1.7.18 :204 > g.v.branch {|r| r.out_e.limit(1)} .branch {|r| r.limit(1) } .e
#<E[6]:3-flies_to-1>
Total: 1
 => #<GraphV -> Elem-Branch -> E(Type(Java::ComTinkerpopBlueprints::Edge))> 

jruby-1.7.18 :205 > g.v.branch {|r| r.out_e.limit(1)} .branch {|r| r.limit(1) } .merge .e
Pacer::UnsupportedOperation: Can't get edges for this route type.
    from /home/ubuntu/.rvm/gems/jruby-1.7.18/gems/pacer-2.0.8-java/lib/pacer/core/graph/element_route.rb:26:in `e'
    from (irb):205:in `evaluate'
    from org/jruby/RubyKernel.java:1107:in `eval'
    from org/jruby/RubyKernel.java:1507:in `loop'
    from org/jruby/RubyKernel.java:1270:in `catch'
    from org/jruby/RubyKernel.java:1270:in `catch'
    from /home/ubuntu/.rvm/rubies/jruby-1.7.18/bin/irb:13:in `(root)'

And, just to verify ...

jruby-1.7.18 :260 > g.v.branch {|r| r.out_e.limit(1)} .branch {|r| r.limit(1) } .vertices_route?
 => false 

jruby-1.7.18 :261 > g.v.branch {|r| r.out_e.limit(1)} .branch {|r| r.limit(1) } .merge .vertices_route?
 => true

String typed property values ending with a blank space cannot be found.

Given a node with a property (e.g. "name") whose value is "Pangloss " (with a trailing space), the query

my_graph.v(:name => "Pangloss ")

will return no results.

Given another node whose name is "Pacer" (without a trailing space), the query

my_graph.v(:name => "Pacer                      ")

will return that "Pacer" node.

I think this is not only counter-intuitive but also harms usability. In other words, what is the purpose of this line:

value = value.strip

Can it be removed?

Many thanks in advance!
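The behaviour described above can be reproduced without a graph. This is a minimal sketch, assuming the lookup strips the query value before comparing it with the stored value; the hash `index` and both helper methods are hypothetical stand-ins, not Pacer APIs.

```ruby
# Hypothetical stand-in for a property index: stored value => element id.
index = { 'Pangloss ' => 1, 'Pacer' => 2 }

# Mimics a lookup that strips the query value before matching.
def find_stripped(index, value)
  index[value.strip]
end

# Matches the query value exactly as given.
def find_exact(index, value)
  index[value]
end

p find_stripped(index, 'Pangloss ')  # => nil (stored value keeps its trailing space)
p find_stripped(index, 'Pacer     ') # => 2   (padding is silently ignored)
p find_exact(index, 'Pangloss ')     # => 1
p find_exact(index, 'Pacer     ')    # => nil
```

Stripping only the query side means a stored value with trailing whitespace can never match, which is the asymmetry reported here.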

Create a not_unique pipe

Yield only elements that are not unique. The implementation could probably be similar to the DuplicateFilterPipe. However, it may be better to implement it with a Map rather than a Set and keep a count of each element (similar to GroupCount). That would allow the user to specify that an element must appear in the stream a number of times that falls within a given range.

Once an element is discovered to be a duplicate, it would be emitted. I'm not sure how the data should be emitted. Some possibilities are below:

  original data: [a b c a' b' a'']

possible results:
  only the duplicate elements: [a' b' a'']
  original and duplicates:     [a a' b b' a'']
  all so far:                  [ [a a'] [b b'] [a a' a''] ]
  first and last:              [ [a a'] [b b'] [a a''] ]
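The count-based idea can be sketched in plain Ruby. This is only an illustration of the first option ("only the duplicate elements") over an ordinary Enumerable, not a Pipes implementation; the method name `not_unique` and its `range` parameter are assumptions.

```ruby
# Hypothetical sketch: lazily emits an element each time its running count
# falls within `range`. The default range (2 or more) emits only the repeats.
def not_unique(enum, range = (2..Float::INFINITY))
  Enumerator.new do |out|
    counts = Hash.new(0) # element => number of times seen so far
    enum.each do |element|
      counts[element] += 1
      out << element if range.cover?(counts[element])
    end
  end
end

p not_unique(%w[a b c a b a]).to_a       # => ["a", "b", "a"]
p not_unique(%w[a b c a b a], 2..2).to_a # => ["a", "b"]
```

Keeping a count per element (rather than a set of seen elements) is what makes the range restriction possible, for example emitting only the second occurrence with `2..2`.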

Enumerable + function and nokogiri

Hi Darrick,

It seems Pacer's extensions to the Enumerable module can cause problems - I've been getting errors with Nokogiri when running cucumber tests as some of Nokogiri's node selectors inherit Enumerable and + tries to send them to a MultiPipe.

I've not quite worked out where exactly the Enumerable module fits in with the Pacer ecosystem, perhaps you could test for object class like you did in Array's + function to make it a bit more selective?

Cheers,
Ilya
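The class test suggested above might look roughly like the following. This is a sketch under stated assumptions rather than Pacer's code: `RouteLike`, `GuardedPlus`, `FakeRoute` and `PlainCollection` are hypothetical names, and the array returned from `+` is only a placeholder for whatever combined route Pacer would build.

```ruby
# Hypothetical marker module standing in for "this object is a route".
module RouteLike; end

# Hypothetical mixin: `+` combines only route-like operands and otherwise
# defers to any `+` defined further up the ancestor chain.
module GuardedPlus
  def +(other)
    if is_a?(RouteLike) && other.is_a?(RouteLike)
      [:combined, self, other] # placeholder for building a combined route
    else
      super # raises NoMethodError when no other `+` exists
    end
  end
end

class FakeRoute
  include Enumerable
  include RouteLike
  include GuardedPlus
  def each; end
end

class PlainCollection
  include Enumerable
  include GuardedPlus
  def each; end
end

p((FakeRoute.new + FakeRoute.new).first) # => :combined
```

With a guard like this, an Enumerable that is not a route is never handed to the route-combining code path.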

bad gemspec

Unfortunately, the gem pacer (0.8.5) has an invalid gemspec. As a result, Bundler cannot install this Gemfile. Please ask the gem author to yank the bad version to fix this issue. For more information, see http://bit.ly/syck-defaultkey.
