
nabu's Introduction

Nabu

About

Nabu is a program for loading data graphs into triplestores. Its main goal is to synchronize a collection of RDF graphs (for example, JSON-LD documents) in an object store (MinIO, S3, etc.) into a graph database / triplestore such as Blazegraph or Jena.

Further information can be found in the documentation directory.

nabu's People

Contributors: fils, valentinedwv

nabu's Issues

Massive update

@valentinedwv this is a massive revision of the code.

I removed much and updated much. Really sorry to push this much out in one fell swoop. I'll try not to do this again, but it felt needed. I tried to keep some notes (below). If you feel up to taking a look at it, great.

Change Log

  • Removed several options that were experiments of mine, like
    txtai, tika, etc., and all the related pkg, pkg/cli and internal
    files associated with them. Basically, removing experiments, which
    I will bring back as proper branches for testing and review
    rather than mixed into dev or master directly.
  • Removed several functions that were no longer being used.
    Basically, anything not referenced was removed.
  • Some functions were exported that didn't need to be, i.e. they
    were only used internal to a given package. They were unexported,
    which in Go simply means making them start with a lower-case
    letter, and refactored accordingly.
  • One or two files had several functions in them. I split these into
    individual files so, for the most part, any exported function is in
    its own file. It might have a few unexported support functions in it
    as well.
  • Documentation updates including some d2 diagrams to better describe
    the flow of events
  • Updated root.go to deal with the --prefix flag correctly. So the
    var passed is a proper []string now.
  • Reduced some log chatter. Lots of unneeded log lines were
    removed; some were trapped behind error catches or moved from Info
    to Trace.
  • Created several testing and inspection SPARQL functions in docs/sparql
  • URN pattern resolved. This was done in several places (because I am
    really a terrible programmer), so I made a single function for it
    at internal/graph/mintURN.go. This is now an ADR entry.
  • Added decisions directory for ADRs with URN pattern as first one
    at decisions/0001-URN-decision.md
  • Renamed jena to the more generic bulk
  • Added an endpointBulk to the config file since this is needed for bulk
    loading vs the SPARQL update calls.
  • Centralize JSON-LD proc and options into ldproc.go
  • TESTING: Meili service reads server address from environment variable

TODO

  • add an input for the location of the schema file, pipelined into ldproc.go already
  • Align meili pipeJS2Array with object/pipecopy.go so that there is common
    code use. just use pipecopy then mod the JSON to address meili format?
  • Make a framing pipeline for marqo "pipeframe"

Working status and supporting documentation for the following

[x] prefix loading works with blaze (sparql update)
[x] prune loading works with blaze (sparql update)

[x] prefix loading works with jena (sparql update)
[x] prune works with jena (sparql update)

[x] prefix loading works with graphdb (sparql update)
[ ] prune loading works with graphdb (sparql update)

[x] bulk loading works with jena
[x] bulk loading works with graphdb

graph URNs

There is an issue on the connection between Gleaner operation and Nabu.

The selection of namespaces for the "graph first approach" in Gleaner impacts the config file for Nabu with respect to where it looks for loading the Gleaner output. It would be better if the selection between raw summoned files and processed milled files were not a "convention" based on the source type (sitemap vs sitegraph).

I had a note that this could impact queries too; however, that doesn't seem likely, as the graph quad holds prov relations/references and is never part of the discovery aspect of the query.

prune Bad endpoint, still runs

Checking to see if passing a SPARQL endpoint still worked.
Fed it a bad SPARQL endpoint and it still ran... in fact it acts like the prune query worked.

SPARQL: SELECT DISTINCT ?g WHERE {GRAPH ?g {?s ?p ?o} }

endpoint

endpoints:
  - service: ec_blazegraph
    baseurl: https://graph.geocodes-aws-dev.earthcube.org/blazegraph/namespace/blah
    type: blazegraph
    authenticate: false
    username: admin
    password: jfpwd
    modes:
      - action: sparql
        suffix: /sparql
        accept: application/sparql-results+json
        method: GET
      - action: update
        suffix: /sparql
        accept: application/sparql-update
        method: POST
      - action: bulk
        suffix: /sparql
        accept: text/x-nquads
        method: POST
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/objects/objectlist.go:44","func":"github.com/gleanerio/nabu/internal/objects.ObjectList","level":"info","msg":"test:summoned/geocodes_demo_datasets object count: 18\n","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphList.go:18","func":"github.com/gleanerio/nabu/internal/prune.graphList","level":"info","msg":"Getting list of named graphs","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphList.go:44","func":"github.com/gleanerio/nabu/internal/prune.graphList","level":"info","msg":"Pattern: urn:gleaner.io:TEST:geocodes_demo_datasets:data\n","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphList.go:45","func":"github.com/gleanerio/nabu/internal/prune.graphList","level":"info","msg":"SPARQL: SELECT DISTINCT ?g WHERE {GRAPH ?g {?s ?p ?o} }\n","time":"2023-10-05T10:34:38-07:00"}
Current graph items: 0  Cuurent object items: 18
Orphaned items to remove: 0
Missing items to add: 18
{"difference":0,"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:73","func":"github.com/gleanerio/nabu/internal/prune.Snip","graph items":0,"level":"info","missing":18,"msg":"Nabu Prune","object items":18,"prefix":"summoned/geocodes_demo_datasets","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:105","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"uploading missing %n objects18","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/graph/insert.go:47","func":"github.com/gleanerio/nabu/internal/graph.Insert","level":"info","msg":"response Status: 404 Not Found","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/graph/insert.go:48","func":"github.com/gleanerio/nabu/internal/graph.Insert","level":"info","msg":"response Headers: map[Access-Control-Allow-Credentials:[true] Access-Control-Allow-Headers:[Authorization,Origin,Content-Type,Accept] Access-Control-Allow-Origin:[*] Content-Type:[text/plain] Date:[Thu, 05 Oct 2023 17:34:39 GMT] Server:[Jetty(9.4.z-SNAPSHOT)] Vary:[Origin] X-Frame-Options:[SAMEORIGIN]]","time":"2023-10-05T10:34:38-07:00"}
   5% |██████                     

Push to multiple endpoints

Would it be a good idea to allow pushing to multiple endpoints?

poor man's clustering
dev testing

urn:repo:id or urn:bucket:repo:id or urn:community:repo:id

urn:repo:id

or

urn:bucket:repo:id

noticing that the summoned|milled got dropped, which is good.

Do we want to go one step further?

EC backend code only uses the last two parts of the URN, so it does not matter; but if a user replicates an S3 bucket to a different bucket and a programmer assumes the bucket is part of the URN, then that would not be good.

Might go with

urn:{community}:repo:id

so a URN might be identified as part of a community, to avoid naming clashes
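To make the pattern concrete, here is a minimal sketch of a URN-minting helper in the spirit of internal/graph/mintURN.go. The function name, signature, and exact id rules are assumptions for illustration, not the actual code; it builds urn:{community}:{repo}:{id} and deliberately drops the summoned|milled segment (and the bucket), so replicas in other buckets mint the same URN.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mintURN builds a URN of the form urn:{community}:{repo}:{id} from an
// object-store key such as "summoned/iris/abc123.jsonld". The summoned|milled
// prefix (and the bucket) are intentionally not part of the URN, so a bucket
// replicated elsewhere mints identical URNs.
// NOTE: hypothetical sketch; the real mintURN.go may differ.
func mintURN(community, objectKey string) string {
	parts := strings.Split(objectKey, "/")
	// Drop a leading "summoned" or "milled" segment if present.
	if parts[0] == "summoned" || parts[0] == "milled" {
		parts = parts[1:]
	}
	repo := "unknown"
	if len(parts) > 1 {
		repo = parts[0]
	}
	// The id is the object name without its extension.
	base := parts[len(parts)-1]
	id := strings.TrimSuffix(base, path.Ext(base))
	return fmt.Sprintf("urn:%s:%s:%s", community, repo, id)
}

func main() {
	fmt.Println(mintURN("gleaner.io", "summoned/iris/abc123.jsonld"))
}
```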

output quad file

Gleaner outputs triples to runX.
Nabu should take on this effort: the ability to dump quads to a file and to an S3 bucket.


Right now the runX outputs are concatenations of .nt files,
so they are in a state that does not allow for use when the quad is needed,
and also cannot be used as-is to reconstruct them, as that info is back
in the pre-concatenated stage.
I heard there might be, or at least could be, a flag to get these .nq files to contain quads.

This would really help me, as I like to do quick post-processing on the bulk quad files
and then upload them, separate from Nabu.

Filter on a list of valid TYPE values on load

Need the ability to filter out (or only allow) certain TYPE values.

This is needed in cases where things like BREADCRUMB types are indexed but we don't want them in the graph.

A utility script to remove these from an object store might help for Gleaner architecture with the object store. However, it would be good to still have this in Nabu as well.

Somewhat related to: gleanerio/gleaner#128
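A hedged sketch of what such a filter could look like: decode the JSON-LD document, read its @type, and check it against an allow list. The function name and approach are assumptions for illustration, not existing Nabu code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeAllowed reports whether a JSON-LD document's @type is in the allow
// list. It handles @type given as a single string or as an array of strings.
// NOTE: hypothetical sketch, not existing Nabu code.
func typeAllowed(doc []byte, allowed map[string]bool) (bool, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(doc, &m); err != nil {
		return false, err
	}
	switch t := m["@type"].(type) {
	case string:
		return allowed[t], nil
	case []interface{}:
		// Allow the document if any of its types is allowed.
		for _, v := range t {
			if s, ok := v.(string); ok && allowed[s] {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	allowed := map[string]bool{"Dataset": true}
	ok, _ := typeAllowed([]byte(`{"@type":"BreadcrumbList"}`), allowed)
	fmt.Println(ok) // a BreadcrumbList document would be filtered out
}
```

An inverse deny-list check would work the same way, just negating the lookup.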

Memory Leak

Is there a slow leak in Nabu? ssiodp is using 2.3 GB of memory... up from 2.2 an hour and a half ago?

Mem: 14870184K used, 1512776K free, 43792K shrd, 33876K buff, 829740K cached
CPU:  42% usr  20% sys   0% nic  10% idle  25% io   0% irq   0% sirq
Load average: 5.45 5.89 6.02 7/1082 39
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     S    2814m  16%   0   0% /nabu/nabu --cfg /nabu/nabuconfig.yaml prefix --prefix prov/ssdbiodp
   33     0 root     S     1672   0%   0   0% sh
   39    33 root     R     1600   0%   1   0% top

I'd put my money on GetS3Bytes...
I think it's missing a defer.

Data Loading Issue

https://graph.geocodes.earthcube.org/blazegraph/namespace/ecrr
Data loading is an issue with ecrr, which was frequently updated:
30627902 triples
delete and reload: 30512 triples
push a second time using nabu: 53676 triples
third time: 76840
prune: 76840 (still)

glcon config directory
ecrr.zip

WikiMedia Updater logic:
https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/eu#runUpdate.sh

https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/Updater.java

add sparql endpoint check

sparql:
  endpoint: https://graph.geocodes.ncsa.illinois.edu/blazegraph/iris_nabu/

It looked like it ran fine, but there were something like five progress bars, and zero triples uploaded.

Need to add a check that the SPARQL endpoint works.
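A minimal sketch of such a check, assuming nothing about Nabu internals: issue a cheap ASK query before starting a run, so a typo'd endpoint fails fast instead of after several progress bars. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// sparqlCheckURL builds the GET URL for a minimal ASK query, the cheapest
// round trip that proves the endpoint actually answers SPARQL.
func sparqlCheckURL(endpoint string) string {
	return endpoint + "?query=" + url.QueryEscape("ASK {}")
}

// checkSPARQL issues the probe and fails on any non-200 answer.
// NOTE: hypothetical sketch, not existing Nabu code.
func checkSPARQL(endpoint string) error {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(sparqlCheckURL(endpoint))
	if err != nil {
		return fmt.Errorf("SPARQL endpoint unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("SPARQL endpoint returned %s", resp.Status)
	}
	return nil
}

func main() {
	fmt.Println(sparqlCheckURL("http://localhost:9999/blazegraph/sparql"))
}
```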

Error in prune not seen in prefix

See gleanerio/scheduler#10

{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:56",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "JSONLDToNQ err: %sunexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/gets3bytes.go:21",
  "func": "github.com/gleanerio/nabu/internal/objects.GetS3Bytes",
  "level": "info",
  "msg": "Issue with reading an object:  gleaner.oih/summoned/africaioc/ffb59b01cf1d2de175c66576d2b69c7940dda8a5.jsonld",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:41",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "gets3Bytes %v\\nThe specified key does not exist.",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/graph/jsonldToNQ.go:17",
  "func": "github.com/gleanerio/nabu/internal/graph.JSONLDToNQ",
  "level": "info",
  "msg": "Error when transforming JSON-LD document to interface: unexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}


Add skolemize

Need a function to:

JSON-LD -> n-triples (quads)

n-triples (quads) ------> skolemized blank nodes --------> n-triples (quads)

via https://gleaner.io/id/genid/HASHVALUEHERE

The goal being a function that rolls up a bucket prefix into a single RDF graph for bulk loading into things like Jena via the "Data" HTTP endpoints.

options include:

Though a bit of tech debt, the nanoid is documented and, at least in terms of round tripping, is not really an issue.

ref: https://www.w3.org/TR/rdf11-concepts/#section-skolemization
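A minimal sketch of the skolemization step over N-Triples/N-Quads text, using the genid IRI pattern from above. The per-document scoping key and the label-rewriting approach are assumptions for illustration; the real implementation would likely mint hashes or nanoids.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var bnodeRe = regexp.MustCompile(`_:[A-Za-z0-9]+`)

// skolemize rewrites every blank node label in a block of N-Triples/N-Quads
// into a genid IRI, scoped by a per-document key so labels from different
// documents can never collide after concatenation into one bulk file.
// NOTE: hypothetical sketch, not the actual Nabu code.
func skolemize(nquads, docKey string) string {
	return bnodeRe.ReplaceAllStringFunc(nquads, func(label string) string {
		id := strings.TrimPrefix(label, "_:")
		return fmt.Sprintf("<https://gleaner.io/id/genid/%s-%s>", docKey, id)
	})
}

func main() {
	in := `_:b0 <http://schema.org/name> "IRIS Data Products" <urn:g:iris:3d18> .`
	fmt.Println(skolemize(in, "3d18"))
}
```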

Release graph is triples not quads

The release graph is generating triples, not quads. I really want quads as a sort of "nabu prov" statement. However, since the triples output is there now, make this an option; I think quads should be the default and triples the option.

prov release files

When we do a release, can we also generate a release file for the 'prov'?
This would save a fully separate second call for the 'prov'
and speed loading.

prune... what is happening

Pruning with orgs: is it looking for orgs in summoned/org?

and urn pattern looks broken:
Pattern: urn:gleanermilled/iris:
Graph items: 0 Object items: 28 difference: 0
Missing item count: 28
{"Graph items":0,"Missing item count":28,"Object items":28,"difference":0,"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:67","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"Nabu Prune","prefix":"milled/iris","time":"2023-02-10T14:56:24-08:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:87","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"uploading missing %n objects28","time":"2023-02-10T14:56:24-08:00"}
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (28/28, 1 it/s)
Exiting.

minio:
    address: oss.geocodes.ncsa.illinois.edu
    port: 443
    ssl: true
    accesskey: accesskey
    secretkey: secretkey
    bucket: gleaner
objects:
    bucket: gleaner
    domain: us-east-1
    prefix:
        - summoned/iris
        - org
    prefixoff: []
sparql:
    endpoint: https://graph.geocodes.ncsa.illinois.edu/blazegraph/namespace/iris_nabu/sparql
    authenticate: false
    username: ""
    password: ""
txtaipkg:
    endpoint: http://0.0.0.0:8000

nabu prune --cfg ../gleaner/configs/summarize/nabu --prefix orgs --prefix milled/iris

or

nabu prune --cfg ../gleaner/configs/summarize/nabu --prefix org --prefix milled/iris

{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/objectlist.go:44","func":"github.com/gleanerio/nabu/internal/prune.ObjectList","level":"info","msg":"gleaner:orgs object count: 10\n","time":"2023-02-10T14:49:35-08:00"}
Pattern: urn:gleanerorgs:
Graph items: 0  Object items: 10  difference: 0
Missing item count: 10
{"Graph items":0,"Missing item count":10,"Object items":10,"difference":0,"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:67","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"Nabu Prune","prefix":"orgs","time":"2023-02-10T14:49:35-08:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:87","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"uploading missing %n objects10","time":"2023-02-10T14:49:35-08:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/objects/gets3bytes.go:21","func":"github.com/gleanerio/nabu/internal/objects.GetS3Bytes","level":"info","msg":"Issue with reading an object:  gleaner/summoned/orgs/aquadocs.jsonld","time":"2023-02-10T14:49:35-08:00"}

release failing

Failed to create a release.
And I think that if one release fails, any additional releases do not get generated.

@geocodes:~/indexing$ ./glcon nabu release --cfgName aws_geocodes
Using nabu config file: /home/ubuntu/indexing/configs/aws_geocodes/nabu
nabu release called

2023/05/13 13:07:20 Error decoding triples: syntax error: bad IRI: disallowed character ' '

Files are in the opencore bucket on oss.geocodes-dev.earthcube.org;
org is empty.

GOROOT=/usr/local/opt/go/libexec #gosetup
GOPATH=/Users/valentin/go #gosetup
/usr/local/opt/go/libexec/bin/go build -ldflags -X main.VERSION=testline -o /private/var/folders/t2/t39bprkn16dg9nr4v6c18v0w0000gn/T/GoLand/___nabu_release_opencore /Users/valentin/development/dev_earthcube/gleanerio/gleaner/cmd/glcon/main.go #gosetup
/private/var/folders/t2/t39bprkn16dg9nr4v6c18v0w0000gn/T/GoLand/___nabu_release_opencore nabu release --cfg configs/opencore_nabu
Using nabu config file: configs/opencore_nabu
nabu release called
2023/05/03 07:06:22 Error decoding triples: 1:195 unexpected 3 as dot (.)

Process finished with the exit code 0
minio:
    address: oss.geocodes-dev.earthcube.org
    port: 443
    ssl: true
    accesskey:
    secretkey:
    bucket: opencore
objects:
    bucket: opencore
    domain: us-west-2
    prefix:
        - summoned/opencoredata
        - org
    prefixoff: []
sparql:
    endpoint: https://graph.geocodes-dev.earthcube.org/blazegraph/namespace/opencore/sparql
    authenticate: false
    username: ""
    password: ""
txtaipkg:
    endpoint: http://0.0.0.0:800

nabu prune all fail on lost connection

From a run... I think I restarted a frozen graph server.
The run stopped rather than continuing after the issue resolved itself.

0% |                          | (2048/301912, 14 it/min) [56m17s:349h30m11s]panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x8f66ed]
goroutine 1 [running]:
github.com/gleanerio/nabu/internal/graph.Drop(0xc00017f0b1?, {0xc00233f440, 0x3b})
	/home/runner/work/nabu/nabu/internal/graph/drop.go:35 +0x38d
github.com/gleanerio/nabu/internal/objects.PipeLoad(0xc0004d4380?, 0x0?, {0xc00002603d, 0xa}, {0xc00017f080, 0x38}, {0xc00002e010, 0x4d})
	/home/runner/work/nabu/nabu/internal/objects/pipeload.go:66 +0x414
github.com/gleanerio/nabu/internal/objects.ObjectAssembly(0xc0001777a0?, 0x4?)
	/home/runner/work/nabu/nabu/internal/objects/objectAssembly.go:45 +0x6c8
github.com/gleanerio/nabu/pkg.Prefix(0xbe8840?, 0xc000012018?)
	/home/runner/work/nabu/nabu/pkg/prefix.go:13 +0x74
github.com/gleanerio/nabu/pkg/cli.glob..func5(0x1022b00?, {0xaee538?, 0x4?, 0x4?})
	/home/runner/work/nabu/nabu/pkg/cli/prefix.go:21 +0x69
github.com/spf13/cobra.(*Command).execute(0x1022b00, {0xc000103c80, 0x4, 0x4})
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x663
github.com/spf13/cobra.(*Command).ExecuteC(0x1023280)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
github.com/gleanerio/nabu/pkg/cli.Execute()
	/home/runner/work/nabu/nabu/pkg/cli/root.go:41 +0x25
main.main()
	/home/runner/work/nabu/nabu/cmd/nabu/main.go:13 +0x17
2023-08-09T23:53:07Z ERR | file=/home/runner/work/nabu/nabu/internal/graph/drop.go:33 func=github.com/gleanerio/nabu/internal/graph.Drop msg=Post "https://graph.geocodes-aws-dev.earthcube.org/blazegraph/namespace/test/sparql": dial tcp: lookup graph.geocodes-aws-dev.earthcube.org: i/o timeout 

config file entry for context files

Need to make a config file entry for the context file maps. This is already done in Gleaner, so I need to bring the same approach over to Nabu.

@valentinedwv This is related to our discussion today; I just tagged you in it.

I need this since, when Nabu is running as a container launched by Portainer, it needs a more explicit path to the context files; otherwise it's a bit of a mess (a hard-coded mess). So I just want to make this a config file entry. The default should be just the schema.org context file in the same directory as the executable.

https://github.com/gleanerio/nabu/blob/ce97e03918e53c46375de28603912f313b8c9f32/internal/graph/ldproc.go#LL39C2-L51C3

A cli entry for stats/report separate from prune / snip

I wonder if prune couldn't be two things:

collect|validate|stats, then pass the object to prune/Snip.
Then add a cli for stats/report, to just dump file info from collect|validate|stats, with options for a brief summary or a detailed report with all the URN information.

We could do the same thing in gleaner for the sitemaps, etc.
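At its core, the collect step is two set differences between named graphs in the triplestore and objects in the store; everything else (validate, stats, report) is presentation over these sets. A minimal sketch (function name assumed):

```go
package main

import "fmt"

// diff returns the members of a that are not in b, using URN strings as keys.
func diff(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, x := range b {
		seen[x] = true
	}
	var out []string
	for _, x := range a {
		if !seen[x] {
			out = append(out, x)
		}
	}
	return out
}

func main() {
	graphs := []string{"urn:t:a", "urn:t:b"}
	objects := []string{"urn:t:b", "urn:t:c"}
	// Orphaned: in the graph but no longer in the object store (to drop).
	fmt.Println("orphaned:", diff(graphs, objects))
	// Missing: in the object store but not yet in the graph (to load).
	fmt.Println("missing:", diff(objects, graphs))
}
```

A stats/report command would serialize these two sets instead of acting on them.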

Release generating http://schema.org

are we using http or https?

<https://ds.iris.edu/ds/products/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/DataCatalog> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/> <http://schema.org/url> "https://ds.iris.edu/ds/products/" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/alternateName> "EQEnergy" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/author> <https://gleaner.io/xid/genid/cglb8mlpv00ll85v7nh0> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/author> <https://gleaner.io/xid/genid/cglb8mlpv00ll85v7nhg> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/author> <https://gleaner.io/xid/genid/cglb8mlpv00ll85v7ni0> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/dateCreated> "2013-06-09T23:22:04.030" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/dateModified> "2017-04-28T16:35:54.824" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/description> "Earthquake energy and rupture durations are estimated following all earthquakes with initial magnitude above M<sub>w</sub> 6.0 and a GCMT moment tensor.  The method follows Convers and Newman, 2011.  These are fully automated and not reviewed by a human." <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/image> "https://ds.iris.edu/media/product/eqenergy/images/eqEnergyLogo_1.png" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/includedInDataCatalog> <https://ds.iris.edu/ds/products/> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/keywords> "EQEnergy,geophysics,seismic,seismology" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/name> "Earthquake energy & rupture duration" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/url> "https://ds.iris.edu/ds/products/eqenergy/" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .

nq builder

Need a process that changes nt or nq to nq with a context defined by the URN-to-object-path mapping used in the Gleaner data architecture.

Take the current nqtontc function and modify it (is it being used somewhere now?) with the object path / name for the RI I use.
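The mechanical part of this is appending a graph term before each statement's final dot. A sketch under those assumptions (the function name is hypothetical, not the existing nqtontc; real input would need proper N-Triples parsing, e.g. dots inside literals):

```go
package main

import (
	"fmt"
	"strings"
)

// ntToNQ appends a graph IRI to each N-Triples statement, producing N-Quads
// whose context is the URN minted from the object path.
// NOTE: naive line-based sketch; a real converter should parse statements
// properly rather than trimming the trailing dot.
func ntToNQ(nt, graphURN string) string {
	var out []string
	for _, line := range strings.Split(strings.TrimSpace(nt), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Replace the trailing "." with "<graph> ."
		stmt := strings.TrimSuffix(line, ".")
		out = append(out, fmt.Sprintf("%s<%s> .", stmt, graphURN))
	}
	return strings.Join(out, "\n")
}

func main() {
	nt := `<https://ex.org/a> <https://ex.org/p> "v" .`
	fmt.Println(ntToNQ(nt, "urn:gleaner.io:test:abc"))
}
```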

bulk blank nodes incorrect?

Blank nodes are all _:b0, _:b1, etc., so not unique in the file.

Or maybe it does not matter, since they are in their own graph,

but would it be good for them to be unique?

<https://ds.iris.edu/ds/products/noise-toolkit/> <http://schema.org/name> "The IRIS DMC Noise Toolkit" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
<https://ds.iris.edu/ds/products/noise-toolkit/> <http://schema.org/url> "https://ds.iris.edu/ds/products/noise-toolkit/" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
_:b0 <http://schema.org/name> "IRIS Data Products" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
<https://ds.iris.edu/ds/products/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/DataCatalog> <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
<https://ds.iris.edu/ds/products/> <http://schema.org/url> "https://ds.iris.edu/ds/products/" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> <urn:gleaner-wf:iris:3d1828ea2585a565b3d38386eb9303488bc437e7> .
_:b0 <http://schema.org/name> "IRIS Data Products" <urn:gleaner-wf:iris:3d1828ea2585a565b3d38386eb9303488bc437e7> .

prov IDs, buckets and paths

ProvIDs look good.

Should these be modified to new pattern?

		"prov:generated": {
		  "@id": "urn:geocodes:milled:r2r:9be39508f20e4d11d224227572e565e9d64ac488"
		},

"urn:gleaner.io:geocodes:r2r:9be39508f20e4d11d224227572e565e9d64ac488"

and
urn:gleaner.io/id/collection/9be39508f20e4d11d224227572e565e9d64ac488"
become:
urn:gleaner.io/id/collection/{source}/9be39508f20e4d11d224227572e565e9d64ac488"

improve minio connection check

There is a connection check in pkg/cli/root.go which, when there was an error, returned "bucket not found".

Improve the message: it's not that the bucket wasn't found, it's an issue with the configuration; check minio address, port, ssl.
