
nabu's Introduction

Nabu

About

Nabu is a program for loading data graphs into triplestores. Its main goal is to synchronize a collection of RDF graphs (for example, JSON-LD documents) in an object store (MinIO, S3, etc.) into a graph database / triplestore such as Blazegraph or Jena.

Further information can be found in the documentation directory.

nabu's People

Contributors: fils, valentinedwv

nabu's Issues

Massive update

@valentinedwv this is a massive revision of the code.

I removed much and updated much. Really sorry to push this much out in one fell swoop. I'll try not to do this again, but it felt needed. I tried to keep some notes (below). If you feel up to taking a look at it, great.

Change Log

  • Removed several options that were experiments of mine, like
    txtai, tika, etc., and all the related pkg, pkg/cli and internal
    files associated with them. Basically, removing experiments, which
    I will bring back as proper branches for testing and review
    rather than mixed into dev or master directly.
  • Removed several functions that were no longer being used.
    Basically, anything not referenced was removed.
  • Some functions were exported that didn't need to be, i.e. they
    were only used internal to a given package. They were unexported,
    which in Go simply means making them start with a lower-case
    letter, and refactored accordingly.
  • One or two files had several functions in them. I split these into
    individual files so, for the most part, any exported function is in
    its own file. It might have a few unexported support functions in it
    as well.
  • Documentation updates including some d2 diagrams to better describe
    the flow of events
  • Updated root.go to deal with the --prefix flag correctly. So the
    var passed is a proper []string now.
  • Reduced some log chatter. Lots of unneeded log lines were
    removed; some were trapped behind error catches or moved from Info
    to Trace.
  • Created several testing and inspection SPARQL functions in docs/sparql
  • URN pattern resolved. This was done in several places (because I am
    really a terrible programmer), so I made a single function for it
    at internal/graph/mintURN.go. This is now an ADR entry.
  • Added decisions directory for ADRs with URN pattern as first one
    at decisions/0001-URN-decision.md
  • Renamed jena to the more generic bulk
  • Added an endpointBulk to the config file since this is needed for bulk
    loading vs the SPARQL update calls.
  • Centralize JSON-LD proc and options into ldproc.go
  • TESTING: Meili service reads server address from environment variable

TODO

  • add an input for the location of the schema file, pipelined into ldproc.go already
  • Align meili pipeJS2Array with object/pipecopy.go so that there is common
    code use. just use pipecopy then mod the JSON to address meili format?
  • Make a framing pipeline for marqo "pipeframe"

Working status and supporting documentation for the following

[x] prefix loading works with blaze (sparql update)
[x] prune loading works with blaze (sparql update)

[x] prefix loading works with jena (sparql update)
[x] prune works with jena (sparql update)

[x] prefix loading works with graphdb (sparql update)
[ ] prune loading works with graphdb (sparql update)

[x] bulk loading works with jena
[x] bulk loading works with graphdb

graph URNs

There is an issue on the connection between Gleaner operation and Nabu.

The selection of namespaces for the "graph first approach" in Gleaner impacts the config file for Nabu with respect to where it looks for loading the Gleaner output. It would be better if the selection between raw summoned files and processed milled files were not a "convention" based on the source type (sitemap vs sitegraph).

I had a note that this could impact queries too; however, that doesn't seem likely, as the graph quad holds prov relations/references and is never part of the discovery aspect of the query.

prune Bad endpoint, still runs

Checking to see if passing a SPARQL endpoint still worked.
Fed it a bad SPARQL endpoint and it still ran... in fact it acts like the prune query worked.

SPARQL: SELECT DISTINCT ?g WHERE {GRAPH ?g {?s ?p ?o} }

endpoint

endpoints:
  - service: ec_blazegraph
    baseurl: https://graph.geocodes-aws-dev.earthcube.org/blazegraph/namespace/blah
    type: blazegraph
    authenticate: false
    username: admin
    password: jfpwd
    modes:
      - action: sparql
        suffix: /sparql
        accept: application/sparql-results+json
        method: GET
      - action: update
        suffix: /sparql
        accept: application/sparql-update
        method: POST
      - action: bulk
        suffix: /sparql
        accept: text/x-nquads
        method: POST
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/objects/objectlist.go:44","func":"github.com/gleanerio/nabu/internal/objects.ObjectList","level":"info","msg":"test:summoned/geocodes_demo_datasets object count: 18\n","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphList.go:18","func":"github.com/gleanerio/nabu/internal/prune.graphList","level":"info","msg":"Getting list of named graphs","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphList.go:44","func":"github.com/gleanerio/nabu/internal/prune.graphList","level":"info","msg":"Pattern: urn:gleaner.io:TEST:geocodes_demo_datasets:data\n","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphList.go:45","func":"github.com/gleanerio/nabu/internal/prune.graphList","level":"info","msg":"SPARQL: SELECT DISTINCT ?g WHERE {GRAPH ?g {?s ?p ?o} }\n","time":"2023-10-05T10:34:38-07:00"}
Current graph items: 0  Cuurent object items: 18
Orphaned items to remove: 0
Missing items to add: 18
{"difference":0,"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:73","func":"github.com/gleanerio/nabu/internal/prune.Snip","graph items":0,"level":"info","missing":18,"msg":"Nabu Prune","object items":18,"prefix":"summoned/geocodes_demo_datasets","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:105","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"uploading missing %n objects18","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/graph/insert.go:47","func":"github.com/gleanerio/nabu/internal/graph.Insert","level":"info","msg":"response Status: 404 Not Found","time":"2023-10-05T10:34:38-07:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/graph/insert.go:48","func":"github.com/gleanerio/nabu/internal/graph.Insert","level":"info","msg":"response Headers: map[Access-Control-Allow-Credentials:[true] Access-Control-Allow-Headers:[Authorization,Origin,Content-Type,Accept] Access-Control-Allow-Origin:[*] Content-Type:[text/plain] Date:[Thu, 05 Oct 2023 17:34:39 GMT] Server:[Jetty(9.4.z-SNAPSHOT)] Vary:[Origin] X-Frame-Options:[SAMEORIGIN]]","time":"2023-10-05T10:34:38-07:00"}
   5% |██████                     

Push to multiple endpoints

Would it be a good idea to allow pushing to multiple endpoints?

poor man's clustering
dev testing

urn:repo:id or urn:bucket:repo:id or urn:community:repo:id

urn:repo:id

or

urn:bucket:repo:id

noticing that the summoned|milled got dropped, which is good.

Do we want to go one step further?

EC backend code only uses the last two parts of the URN, so it does not matter; but if a user replicates an S3 bucket to a different bucket and a programmer assumes the bucket is part of the URN, then that would not be good.

Might go with

urn:{community}:repo:id

so a URN might be identified as part of a community, to avoid naming clashes
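To make the pattern concrete, here is a minimal sketch of a URN-minting helper in the spirit of internal/graph/mintURN.go. The function name, signature, and exact id rules are assumptions for illustration, not the actual code; it builds urn:{community}:{repo}:{id} and deliberately drops the summoned|milled segment (and the bucket), so replicas in other buckets mint the same URN.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mintURN builds a URN of the form urn:{community}:{repo}:{id} from an
// object-store key such as "summoned/iris/abc123.jsonld". The summoned|milled
// prefix (and the bucket) are intentionally not part of the URN, so a bucket
// replicated elsewhere mints identical URNs.
// NOTE: hypothetical sketch; the real mintURN.go may differ.
func mintURN(community, objectKey string) string {
	parts := strings.Split(objectKey, "/")
	// Drop a leading "summoned" or "milled" segment if present.
	if parts[0] == "summoned" || parts[0] == "milled" {
		parts = parts[1:]
	}
	repo := "unknown"
	if len(parts) > 1 {
		repo = parts[0]
	}
	// The id is the object name without its extension.
	base := parts[len(parts)-1]
	id := strings.TrimSuffix(base, path.Ext(base))
	return fmt.Sprintf("urn:%s:%s:%s", community, repo, id)
}

func main() {
	fmt.Println(mintURN("gleaner.io", "summoned/iris/abc123.jsonld"))
}
```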

output quad file

Gleaner outputs triples to runX.
Nabu should take on this effort: the ability to dump quads to a file and to an S3 bucket.


Right now the runX outputs are concatenations of .nt files,
so they are in a state that does not allow for use when the quad is needed,
and also cannot be used as-is to reconstruct them, as that info is back
in the pre-concatenated stage.
I heard there might be, or at least could be, a flag to get these .nq files to contain quads.

This would really help me, as I like to do quick post-processing on the bulk quad files
and then upload them, separate from Nabu.

Filter on a list of valid TYPE values on load

Need the ability to filter out (or only allow) certain TYPE values.

This is needed in cases where things like BREADCRUMB types are indexed but we don't want them in the graph.

A utility script to remove these from an object store might help for Gleaner architecture with the object store. However, it would be good to still have this in Nabu as well.

Somewhat related to: gleanerio/gleaner#128
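A hedged sketch of what such a filter could look like: decode the JSON-LD document, read its @type, and check it against an allow list. The function name and approach are assumptions for illustration, not existing Nabu code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeAllowed reports whether a JSON-LD document's @type is in the allow
// list. It handles @type given as a single string or as an array of strings.
// NOTE: hypothetical sketch, not existing Nabu code.
func typeAllowed(doc []byte, allowed map[string]bool) (bool, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(doc, &m); err != nil {
		return false, err
	}
	switch t := m["@type"].(type) {
	case string:
		return allowed[t], nil
	case []interface{}:
		// Allow the document if any of its types is allowed.
		for _, v := range t {
			if s, ok := v.(string); ok && allowed[s] {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	allowed := map[string]bool{"Dataset": true}
	ok, _ := typeAllowed([]byte(`{"@type":"BreadcrumbList"}`), allowed)
	fmt.Println(ok) // a BreadcrumbList document would be filtered out
}
```

An inverse deny-list check would work the same way, just negating the lookup.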

Memory Leak

Is there a slow leak in Nabu? ssiodp is using 2.3 GB of memory... up from 2.2 an hour and a half ago?

Mem: 14870184K used, 1512776K free, 43792K shrd, 33876K buff, 829740K cached
CPU:  42% usr  20% sys   0% nic  10% idle  25% io   0% irq   0% sirq
Load average: 5.45 5.89 6.02 7/1082 39
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     S    2814m  16%   0   0% /nabu/nabu --cfg /nabu/nabuconfig.yaml prefix --prefix prov/ssdbiodp
   33     0 root     S     1672   0%   0   0% sh
   39    33 root     R     1600   0%   1   0% top

I'd put my money on GetS3Bytes...
I think it's missing a defer.

Data Loading Issue

https://graph.geocodes.earthcube.org/blazegraph/namespace/ecrr
Data loading is an issue with ecrr, which was frequently updated:
30627902 triples
delete and reload: 30512 triples
push a second time using nabu: 53676 triples
third time: 76840
prune: 76840 (still)

glcon config directory
ecrr.zip

WikiMedia Updater logic:
https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/eu#runUpdate.sh

https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/Updater.java

add sparql endpoint check

sparql:
  endpoint: https://graph.geocodes.ncsa.illinois.edu/blazegraph/iris_nabu/

It looked like it ran fine, but there were something like five progress bars, and zero triples uploaded.

Need to add a check that the SPARQL endpoint works.
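A minimal sketch of such a check, assuming nothing about Nabu internals: issue a cheap ASK query before starting a run, so a typo'd endpoint fails fast instead of after several progress bars. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// sparqlCheckURL builds the GET URL for a minimal ASK query, the cheapest
// round trip that proves the endpoint actually answers SPARQL.
func sparqlCheckURL(endpoint string) string {
	return endpoint + "?query=" + url.QueryEscape("ASK {}")
}

// checkSPARQL issues the probe and fails on any non-200 answer.
// NOTE: hypothetical sketch, not existing Nabu code.
func checkSPARQL(endpoint string) error {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(sparqlCheckURL(endpoint))
	if err != nil {
		return fmt.Errorf("SPARQL endpoint unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("SPARQL endpoint returned %s", resp.Status)
	}
	return nil
}

func main() {
	fmt.Println(sparqlCheckURL("http://localhost:9999/blazegraph/sparql"))
}
```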

Error in prune not seen in prefix

See gleanerio/scheduler#10

{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:56",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "JSONLDToNQ err: %sunexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/gets3bytes.go:21",
  "func": "github.com/gleanerio/nabu/internal/objects.GetS3Bytes",
  "level": "info",
  "msg": "Issue with reading an object:  gleaner.oih/summoned/africaioc/ffb59b01cf1d2de175c66576d2b69c7940dda8a5.jsonld",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:41",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "gets3Bytes %v\\nThe specified key does not exist.",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/graph/jsonldToNQ.go:17",
  "func": "github.com/gleanerio/nabu/internal/graph.JSONLDToNQ",
  "level": "info",
  "msg": "Error when transforming JSON-LD document to interface: unexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}


Add skolemize

Need a function to:

JSON-LD -> n-triples (quads)

n-triples (quads) ------> skolemized blank nodes --------> n-triples (quads)

via https://gleaner.io/id/genid/HASHVALUEHERE

The goal being a function that rolls up a bucket prefix into a single RDF graph for bulk loading into things like Jena via the "Data" HTTP endpoints.

options include:

Though a bit of tech debt, the nanoid is documented and, at least in terms of round tripping, is not really an issue.

ref: https://www.w3.org/TR/rdf11-concepts/#section-skolemization
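A minimal sketch of the skolemization step over N-Triples/N-Quads text, using the genid IRI pattern from above. The per-document scoping key and the label-rewriting approach are assumptions for illustration; the real implementation would likely mint hashes or nanoids.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var bnodeRe = regexp.MustCompile(`_:[A-Za-z0-9]+`)

// skolemize rewrites every blank node label in a block of N-Triples/N-Quads
// into a genid IRI, scoped by a per-document key so labels from different
// documents can never collide after concatenation into one bulk file.
// NOTE: hypothetical sketch, not the actual Nabu code.
func skolemize(nquads, docKey string) string {
	return bnodeRe.ReplaceAllStringFunc(nquads, func(label string) string {
		id := strings.TrimPrefix(label, "_:")
		return fmt.Sprintf("<https://gleaner.io/id/genid/%s-%s>", docKey, id)
	})
}

func main() {
	in := `_:b0 <http://schema.org/name> "IRIS Data Products" <urn:g:iris:3d18> .`
	fmt.Println(skolemize(in, "3d18"))
}
```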

Release graph is triples not quads

The release graph is generating triples, not quads. I really want quads as a sort of "nabu prov" statement. However, since the triples output is there now, make this an option; I think quads should be the default and triples the option.

prov release files

When we do a release, can we also generate a release file for the 'prov'?
This would save a fully separate second call for the 'prov'
and speed loading.

prune... what is happening

Pruning with orgs: is it looking for orgs in summoned/org?

and urn pattern looks broken:
Pattern: urn:gleanermilled/iris:
Graph items: 0 Object items: 28 difference: 0
Missing item count: 28
{"Graph items":0,"Missing item count":28,"Object items":28,"difference":0,"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:67","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"Nabu Prune","prefix":"milled/iris","time":"2023-02-10T14:56:24-08:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:87","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"uploading missing %n objects28","time":"2023-02-10T14:56:24-08:00"}
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (28/28, 1 it/s)
Exiting.

minio:
    address: oss.geocodes.ncsa.illinois.edu
    port: 443
    ssl: true
    accesskey: accesskey
    secretkey: secretkey
    bucket: gleaner
objects:
    bucket: gleaner
    domain: us-east-1
    prefix:
        - summoned/iris
        - org
    prefixoff: []
sparql:
    endpoint: https://graph.geocodes.ncsa.illinois.edu/blazegraph/namespace/iris_nabu/sparql
    authenticate: false
    username: ""
    password: ""
txtaipkg:
    endpoint: http://0.0.0.0:8000

nabu prune --cfg ../gleaner/configs/summarize/nabu --prefix orgs --prefix milled/iris

or

nabu prune --cfg ../gleaner/configs/summarize/nabu --prefix org --prefix milled/iris

{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/objectlist.go:44","func":"github.com/gleanerio/nabu/internal/prune.ObjectList","level":"info","msg":"gleaner:orgs object count: 10\n","time":"2023-02-10T14:49:35-08:00"}
Pattern: urn:gleanerorgs:
Graph items: 0  Object items: 10  difference: 0
Missing item count: 10
{"Graph items":0,"Missing item count":10,"Object items":10,"difference":0,"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:67","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"Nabu Prune","prefix":"orgs","time":"2023-02-10T14:49:35-08:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/prune/graphprune.go:87","func":"github.com/gleanerio/nabu/internal/prune.Snip","level":"info","msg":"uploading missing %n objects10","time":"2023-02-10T14:49:35-08:00"}
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/nabu/internal/objects/gets3bytes.go:21","func":"github.com/gleanerio/nabu/internal/objects.GetS3Bytes","level":"info","msg":"Issue with reading an object:  gleaner/summoned/orgs/aquadocs.jsonld","time":"2023-02-10T14:49:35-08:00"}

release failing

Failed to create a release.
And I think that if one release fails, any additional releases do not get generated.

@geocodes:~/indexing$ ./glcon nabu release --cfgName aws_geocodes
Using nabu config file: /home/ubuntu/indexing/configs/aws_geocodes/nabu
nabu release called

2023/05/13 13:07:20 Error decoding triples: syntax error: bad IRI: disallowed character ' '

Files are in the opencore bucket on oss.geocodes-dev.earthcube.org;
org is empty.

GOROOT=/usr/local/opt/go/libexec #gosetup
GOPATH=/Users/valentin/go #gosetup
/usr/local/opt/go/libexec/bin/go build -ldflags -X main.VERSION=testline -o /private/var/folders/t2/t39bprkn16dg9nr4v6c18v0w0000gn/T/GoLand/___nabu_release_opencore /Users/valentin/development/dev_earthcube/gleanerio/gleaner/cmd/glcon/main.go #gosetup
/private/var/folders/t2/t39bprkn16dg9nr4v6c18v0w0000gn/T/GoLand/___nabu_release_opencore nabu release --cfg configs/opencore_nabu
Using nabu config file: configs/opencore_nabu
nabu release called
2023/05/03 07:06:22 Error decoding triples: 1:195 unexpected 3 as dot (.)

Process finished with the exit code 0
minio:
    address: oss.geocodes-dev.earthcube.org
    port: 443
    ssl: true
    accesskey:
    secretkey:
    bucket: opencore
objects:
    bucket: opencore
    domain: us-west-2
    prefix:
        - summoned/opencoredata
        - org
    prefixoff: []
sparql:
    endpoint: https://graph.geocodes-dev.earthcube.org/blazegraph/namespace/opencore/sparql
    authenticate: false
    username: ""
    password: ""
txtaipkg:
    endpoint: http://0.0.0.0:800

nabu prune all fail on lost connection

From a run... I think I restarted a frozen graph server.
The run stopped rather than continuing after the issue resolved itself.

0% |                          | (2048/301912, 14 it/min) [56m17s:349h30m11s]panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x8f66ed]
goroutine 1 [running]:
github.com/gleanerio/nabu/internal/graph.Drop(0xc00017f0b1?, {0xc00233f440, 0x3b})
	/home/runner/work/nabu/nabu/internal/graph/drop.go:35 +0x38d
github.com/gleanerio/nabu/internal/objects.PipeLoad(0xc0004d4380?, 0x0?, {0xc00002603d, 0xa}, {0xc00017f080, 0x38}, {0xc00002e010, 0x4d})
	/home/runner/work/nabu/nabu/internal/objects/pipeload.go:66 +0x414
github.com/gleanerio/nabu/internal/objects.ObjectAssembly(0xc0001777a0?, 0x4?)
	/home/runner/work/nabu/nabu/internal/objects/objectAssembly.go:45 +0x6c8
github.com/gleanerio/nabu/pkg.Prefix(0xbe8840?, 0xc000012018?)
	/home/runner/work/nabu/nabu/pkg/prefix.go:13 +0x74
github.com/gleanerio/nabu/pkg/cli.glob..func5(0x1022b00?, {0xaee538?, 0x4?, 0x4?})
	/home/runner/work/nabu/nabu/pkg/cli/prefix.go:21 +0x69
github.com/spf13/cobra.(*Command).execute(0x1022b00, {0xc000103c80, 0x4, 0x4})
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x663
github.com/spf13/cobra.(*Command).ExecuteC(0x1023280)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
github.com/gleanerio/nabu/pkg/cli.Execute()
	/home/runner/work/nabu/nabu/pkg/cli/root.go:41 +0x25
main.main()
	/home/runner/work/nabu/nabu/cmd/nabu/main.go:13 +0x17
2023-08-09T23:53:07Z ERR | file=/home/runner/work/nabu/nabu/internal/graph/drop.go:33 func=github.com/gleanerio/nabu/internal/graph.Drop msg=Post "https://graph.geocodes-aws-dev.earthcube.org/blazegraph/namespace/test/sparql": dial tcp: lookup graph.geocodes-aws-dev.earthcube.org: i/o timeout 

config file entry for context files

Need to make a config file entry for the context file maps. This is already done in Gleaner, so I need to bring the same approach over to Nabu.

@valentinedwv This is related to our discussion today; I just tagged you in it.

I need this since, when Nabu is running as a container launched by Portainer, it needs a more explicit path to the context files; otherwise it's a bit of a mess (a hard-coded mess). So I just want to make this a config file entry. The default should be just the schema.org context file in the same directory as the executable.

https://github.com/gleanerio/nabu/blob/ce97e03918e53c46375de28603912f313b8c9f32/internal/graph/ldproc.go#LL39C2-L51C3

A cli entry for stats/report separate from prune / snip

I wonder if prune couldn't be two things:

collect|validate|stats, then pass the object to prune/Snip.
Then add a cli for stats/report, to just dump file info from collect|validate|stats, with options for a brief summary or a detailed report with all the URN information.

We could do the same thing in gleaner for the sitemaps, etc.
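At its core, the collect step is two set differences between named graphs in the triplestore and objects in the store; everything else (validate, stats, report) is presentation over these sets. A minimal sketch (function name assumed):

```go
package main

import "fmt"

// diff returns the members of a that are not in b, using URN strings as keys.
func diff(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, x := range b {
		seen[x] = true
	}
	var out []string
	for _, x := range a {
		if !seen[x] {
			out = append(out, x)
		}
	}
	return out
}

func main() {
	graphs := []string{"urn:t:a", "urn:t:b"}
	objects := []string{"urn:t:b", "urn:t:c"}
	// Orphaned: in the graph but no longer in the object store (to drop).
	fmt.Println("orphaned:", diff(graphs, objects))
	// Missing: in the object store but not yet in the graph (to load).
	fmt.Println("missing:", diff(objects, graphs))
}
```

A stats/report command would serialize these two sets instead of acting on them.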

Release generating http://schema.org

are we using http or https?

<https://ds.iris.edu/ds/products/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/DataCatalog> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/> <http://schema.org/url> "https://ds.iris.edu/ds/products/" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/alternateName> "EQEnergy" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/author> <https://gleaner.io/xid/genid/cglb8mlpv00ll85v7nh0> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/author> <https://gleaner.io/xid/genid/cglb8mlpv00ll85v7nhg> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/author> <https://gleaner.io/xid/genid/cglb8mlpv00ll85v7ni0> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/dateCreated> "2013-06-09T23:22:04.030" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/dateModified> "2017-04-28T16:35:54.824" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/description> "Earthquake energy and rupture durations are estimated following all earthquakes with initial magnitude above M<sub>w</sub> 6.0 and a GCMT moment tensor.  The method follows Convers and Newman, 2011.  These are fully automated and not reviewed by a human." <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/image> "https://ds.iris.edu/media/product/eqenergy/images/eqEnergyLogo_1.png" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/includedInDataCatalog> <https://ds.iris.edu/ds/products/> <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/keywords> "EQEnergy,geophysics,seismic,seismology" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/name> "Earthquake energy & rupture duration" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .
<https://ds.iris.edu/ds/products/eqenergy/> <http://schema.org/url> "https://ds.iris.edu/ds/products/eqenergy/" <urn:gleaner-wf:iris:157135237fd57e12d23a4582ce49f2b7a7fff9ca> .

nq builder

Need a process that changes nt or nq to nq with a context defined by the URN-to-object-path mapping used in the Gleaner data architecture.

Take the current nqtontc function and modify it (is it being used somewhere now?) with the object path / name for the RI I use.
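The mechanical part of this is appending a graph term before each statement's final dot. A sketch under those assumptions (the function name is hypothetical, not the existing nqtontc; real input would need proper N-Triples parsing, e.g. dots inside literals):

```go
package main

import (
	"fmt"
	"strings"
)

// ntToNQ appends a graph IRI to each N-Triples statement, producing N-Quads
// whose context is the URN minted from the object path.
// NOTE: naive line-based sketch; a real converter should parse statements
// properly rather than trimming the trailing dot.
func ntToNQ(nt, graphURN string) string {
	var out []string
	for _, line := range strings.Split(strings.TrimSpace(nt), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Replace the trailing "." with "<graph> ."
		stmt := strings.TrimSuffix(line, ".")
		out = append(out, fmt.Sprintf("%s<%s> .", stmt, graphURN))
	}
	return strings.Join(out, "\n")
}

func main() {
	nt := `<https://ex.org/a> <https://ex.org/p> "v" .`
	fmt.Println(ntToNQ(nt, "urn:gleaner.io:test:abc"))
}
```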

bulk blank nodes incorrect?

Blank nodes are all _:b0, _:b1, etc., so not unique in the file.

Or maybe it does not matter, since they are in their own graph,

but would it be good for them to be unique?

<https://ds.iris.edu/ds/products/noise-toolkit/> <http://schema.org/name> "The IRIS DMC Noise Toolkit" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
<https://ds.iris.edu/ds/products/noise-toolkit/> <http://schema.org/url> "https://ds.iris.edu/ds/products/noise-toolkit/" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
_:b0 <http://schema.org/name> "IRIS Data Products" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
<https://ds.iris.edu/ds/products/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/DataCatalog> <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
<https://ds.iris.edu/ds/products/> <http://schema.org/url> "https://ds.iris.edu/ds/products/" <urn:gleaner-wf:iris:3b6b1e71db1f017b9f2c446c2af707505e69edb6> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> <urn:gleaner-wf:iris:3d1828ea2585a565b3d38386eb9303488bc437e7> .
_:b0 <http://schema.org/name> "IRIS Data Products" <urn:gleaner-wf:iris:3d1828ea2585a565b3d38386eb9303488bc437e7> .

prov IDs, buckets and paths

ProvIDs look good.

Should these be modified to new pattern?

		"prov:generated": {
		  "@id": "urn:geocodes:milled:r2r:9be39508f20e4d11d224227572e565e9d64ac488"
		},

"urn:gleaner.io:geocodes:r2r:9be39508f20e4d11d224227572e565e9d64ac488"

and
urn:gleaner.io/id/collection/9be39508f20e4d11d224227572e565e9d64ac488"
become:
urn:gleaner.io/id/collection/{source}/9be39508f20e4d11d224227572e565e9d64ac488"

improve minio connection check

There is a connection check in pkg/cli/root.go which, when there was an error, returned "bucket not found".

Improve the message: it's not that the bucket wasn't found, it's an issue with the configuration; check minio address, port, ssl.
