This is just the repo for the web site at gleaner.io. It is a shameless copy of the Tailwind Toolbox landing page template.
It's deployed out to a Google object store served via GROW.
Scheduling approaches related to gleaner tooling
License: Apache License 2.0
Need to automate the build of the dagster code when a configuration file updates.
Need to document and automate the whole build process when a config file changes. It would be good to do this all the way to docker containers.
Need to diagram this flow out better in the documents.
If there is an error in a run, a container will be left behind.
We probably need a method that wraps the call to a container (in case we use another one), to be sure it is created, and removed if there is an error.
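A minimal sketch of such a wrapper, assuming the Docker SDK for Python; the image and container name below are placeholders, and the real call would pass through the gleaner command and mounts:

```python
import docker
from contextlib import contextmanager

@contextmanager
def managed_container(image, name, **kwargs):
    # create the container up front and guarantee removal, even on error
    client = docker.from_env()
    container = client.containers.create(image, name=name, **kwargs)
    try:
        yield container
    finally:
        container.remove(force=True)

# usage sketch (image/name are assumptions, not the real values):
# with managed_container("fils/gleaner:v3.0.11", "gleaner01_opentopography") as c:
#     c.start()
#     c.wait()
```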
For testing, it looks like the @graph needs to be moved out of the file with all the ops.
Something like this works with the @graph removed:
from implnet_ops_geocodes_demo_datasets import geocodes_demo_datasets_gleaner

def test_geocodes_demo_datasets_gleaner():
    # execute the graph in process for testing
    res = geocodes_demo_datasets_gleaner.execute_in_process()
    assert res.success
    # placeholder assertion from the Dagster tutorial; replace with a real node name
    assert res.output_for_node("find_highest_protein_cereal") == "Special K"
Maybe we can use the same set of @ops with parameters/context passed.
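A sketch of what that could look like, with a single op that gets the source name via run config (the op and job names here are invented):

```python
from dagster import job, op

@op(config_schema={"source": str})
def gleaner_run(context):
    # one shared op; the source to harvest arrives via run config
    source = context.op_config["source"]
    context.log.info(f"harvesting {source}")
    return source

@job
def harvest_job():
    gleaner_run()

# e.g. harvest_job.execute_in_process(
#     run_config={"ops": {"gleaner_run": {"config": {"source": "opentopography"}}}}
# )
```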
There are now separate runstats and repo_{name}_{loaded|issue}.log files.
Should these be uploaded?
Also, should we capture them to directories?
Option 1: put each source's files in its own directory.
Option 2: put them all in one place.
Can the latest runstat be an artifact? That way we would not need to dig too far after a run.
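One possible way to surface the latest runstat in Dagit is to log an AssetMaterialization from the op that writes it; a sketch, where the asset key and file path are assumptions:

```python
from dagster import AssetMaterialization, MetadataValue, op

@op
def report_runstat(context):
    # hypothetical path to wherever the run writes its runstats file
    runstat_path = "/tmp/runstats/latest.json"
    context.log_event(
        AssetMaterialization(
            asset_key="latest_runstat",
            description="Run statistics from the most recent run",
            metadata={"path": MetadataValue.path(runstat_path)},
        )
    )
```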
The following is being seen in the nabu prune call but not in the prefix call:
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:56",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "JSONLDToNQ err: %sunexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/gets3bytes.go:21",
  "func": "github.com/gleanerio/nabu/internal/objects.GetS3Bytes",
  "level": "info",
  "msg": "Issue with reading an object: gleaner.oih/summoned/africaioc/ffb59b01cf1d2de175c66576d2b69c7940dda8a5.jsonld",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:41",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "gets3Bytes %v\\nThe specified key does not exist.",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/graph/jsonldToNQ.go:17",
  "func": "github.com/gleanerio/nabu/internal/graph.JSONLDToNQ",
  "level": "info",
  "msg": "Error when transforming JSON-LD document to interface: unexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}
Then failure, because a container with that name already exists:
urllib.error.HTTPError: HTTP Error 409: Conflict
returned_value = gleanerio(("gleaner"), "opentopography")
File "/usr/src/app/./ops/implnet_ops_opentopography.py", line 177, in gleanerio
docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1.e9c742fnsem50khimxyxni0as
4a3e056a973f fils/gleaner:v3.0.11-development-df "/gleaner/gleaner --…" 5 hours ago Exited (1) 5 hours ago gleaner01_opentopography
f22d332a1701 fils/gleaner:v3.0.11-development-df "/gleaner/gleaner --…" 10 hours ago Exited (1) 10 hours ago gleaner01_geocodes_demo_datasets
earthcube@ip-172-31-2-108:~$ docker container rm f22d332a1701
f22d332a1701
earthcube@ip-172-31-2-108:~$ docker container rm 4a3e056a973f
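Until a proper wrapper exists, a pre-run cleanup along these lines (Docker SDK for Python; the name prefix is an assumption) would keep exited gleaner containers from blocking the next run:

```python
import docker

def remove_stale_containers(prefix="gleaner01_"):
    # remove exited containers left behind by earlier failed runs
    client = docker.from_env()
    for c in client.containers.list(all=True, filters={"name": prefix}):
        if c.status == "exited":
            c.remove()
```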
Wonder if, for headless, we can't spin up a container dedicated to that process.
Need some logic/documentation/ideas
While we can run locally, the configs need to get up to the PORTAINER_URL.
So if PORTAINER_URL is not equal to the Endpoints.docker.Host, toss cookies ;) (that is, fail loudly).
It looks like we will need to use docker contexts to set the endpoint for the docker scripts.
(venv) valentin@MacBook-Pro deployment % docker context show
desktop-linux
(venv) valentin@MacBook-Pro deployment % docker context inspect desktop-linux
[
{
"Name": "desktop-linux",
"Metadata": {},
"Endpoints": {
"docker": {
"Host": "unix:///Users/valentin/.docker/run/docker.sock",
"SkipTLSVerify": false
}
},
"TLSMaterial": {},
"Storage": {
"MetadataPath": "/Users/valentin/.docker/contexts/meta/fe9c6bd7a66301f49ca9b6a70b217107cd1284598bfc254700c989b916da791e",
"TLSPath": "/Users/valentin/.docker/contexts/tls/fe9c6bd7a66301f49ca9b6a70b217107cd1284598bfc254700c989b916da791e"
}
}
]
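A hedged sketch of that check, shelling out to docker context inspect and comparing against a PORTAINER_URL environment variable (the variable name and the fail-loudly behaviour are assumptions):

```python
import json
import os
import subprocess

def check_docker_endpoint():
    # compare the active docker context's endpoint with PORTAINER_URL
    out = subprocess.check_output(["docker", "context", "inspect"])
    host = json.loads(out)[0]["Endpoints"]["docker"]["Host"]
    portainer = os.environ.get("PORTAINER_URL")
    if portainer and portainer != host:
        raise RuntimeError(f"PORTAINER_URL ({portainer}) != docker endpoint ({host})")
```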
It would be good to let people know this is not the only way to do this.
Airflow, Prefect, and newer ones like Dagu (https://github.com/yohamta/dagu) are all valid alternatives. Indeed, a rather large amount of this could be done via cron too.
Need to be able to cancel a job, and not have the container hang around.
Could a list of sources be the first asset?
Then could sitemaps and information from sitemaps be the next asset to drive the system?
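A rough sketch of that asset graph in Dagster; the asset names and example sources are invented:

```python
from dagster import asset

@asset
def sources():
    # first asset: the list of sources, e.g. read from the gleaner config
    return ["opentopography", "geocodes_demo_datasets"]

@asset
def sitemap_urls(sources):
    # next asset: a sitemap URL per source, which could then drive the
    # per-source harvest jobs
    return {s: f"https://example.org/{s}/sitemap.xml" for s in sources}
```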
The iow generator is not doing the time distribution.
Need to take the generator code and make one version of it with arguments for running. Encode these into the Makefile too, and rename the Makefile entries for command-line completion.
The GLEANER_ variables are not getting utilized in a run.
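One way to make sure they reach the run, assuming the container is started through the Docker SDK, is to forward everything with the GLEANER_ prefix explicitly (a sketch, not the current code):

```python
import os

# collect all GLEANER_* variables from the scheduler's environment
gleaner_env = {k: v for k, v in os.environ.items() if k.startswith("GLEANER_")}

# then pass them along when the container is created, e.g.
# client.containers.create(image, environment=gleaner_env, ...)
```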
At present I need to review the "ops templates" and review the loading of triples. I think currently just the summoned triples are loaded.
Need to review and, if needed, add the prov triples as a new step in the loading done by Nabu.
Added headless to dagster stack.
If the gleaner container is connected manually to the network, then we can connect using
Some other changes are on the dev_eco branch: https://github.com/earthcube/scheduler/tree/dev_eco
Need to remove the passwords from the gleaner/nabu config file.
https://github.com/gitleaks/gitleaks/blob/master/scripts/pre-commit.py
http://eloquentcode.com/prevent-committing-secrets-with-a-pre-commit-hook
The sitemap is returned as text/plain, whereas application/xml is expected: ERROR reading sitemap XML.
(related to BeBOP-OBON/odis-interface#1)
Given:
https://geoconnex.us/sitemap.xml
and
https://geoconnex.us/sitemap/usgs/monitoring-location/nwisgw/nwisgw__20.xml
use nwisgw__20 for the name
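A small helper along these lines (the function name is made up) captures that rule:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def name_from_sitemap_url(url: str) -> str:
    # take the last path segment of the URL and drop the .xml extension
    return PurePosixPath(urlparse(url).path).stem

assert name_from_sitemap_url(
    "https://geoconnex.us/sitemap/usgs/monitoring-location/nwisgw/nwisgw__20.xml"
) == "nwisgw__20"
```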
So the development process for this across the three implementers was hideous...
see title
Figure out how to get the proper internal networking to make the Docker headless work with an http://headless:9000/ call.
Or do we do a stack for each run with a headless container?
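If we do not go the stack-per-run route, one option is to attach the gleaner container to the network the headless service already sits on; a Docker SDK sketch where the network and container names are assumptions:

```python
import docker

client = docker.from_env()
# attach the gleaner container to the same user-defined network as the
# headless service, so that http://headless:9000/ resolves inside it
network = client.networks.get("dagster_headless_net")
network.connect("gleaner01_opentopography")
```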
It looks like I can make the sitemap op, job, and schedule into one file and then append into the arrays for jobs and schedules in the repo file.
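A sketch of how the repo file could then stay generic while the generated per-source files are appended in; the module and function names follow the implnet_* pattern but are assumptions:

```python
from dagster import repository

# each generated per-source module exposes its job and schedule
from implnet_jobs_opentopography import implnet_job_opentopography
from implnet_sched_opentopography import implnet_sched_opentopography

jobs = [implnet_job_opentopography]
schedules = [implnet_sched_opentopography]

@repository
def gleaner_repo():
    # the generator only needs to append to the two lists above
    return jobs + schedules
```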
Run the load summary reports:
They should be installed when you install earthcube utilities ;)