gleanerio / scheduler

Scheduling approaches related to gleaner tooling

License: Apache License 2.0

Python 95.69% Shell 0.16% Jupyter Notebook 4.07% Makefile 0.06% Dockerfile 0.01%

scheduler's People

Contributors

fils, jmckenna, valentinedwv


scheduler's Issues

Build on commit

Need to automate the build of the dagster code when a configuration file updates.

Need to document and automate the whole build process when a config file changes. It would be good to do this all the way to Docker containers.

Need to diagram this flow out better in the documentation.

Containers... use try/finally to remove containers

If there is an error in a run, a container will be left behind.
We probably need a method that wraps the call to a container (in case we use another one),
to be sure it is created, and removed if there is an error.
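
A minimal sketch of such a wrapper, assuming the docker Python SDK (docker-py) is used; the image, name, and command arguments are placeholders for whatever the generated gleanerio call passes today:

import docker

def run_in_container(image, name, command):
    # Hypothetical wrapper: create the container, run it, and always clean up.
    client = docker.from_env()
    container = client.containers.create(image, command=command, name=name)
    try:
        container.start()
        status = container.wait()      # blocks until the container exits
        logs = container.logs()
        return status, logs
    finally:
        # Runs whether or not start()/wait() raised, so no container is left behind.
        container.remove(force=True)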

Testing/Refactoring

For testing, it looks like the @graph needs to be moved out of the file with all the ops.

Something like this works with @graph removed:

from implnet_ops_geocodes_demo_datasets import geocodes_demo_datasets_gleaner
def test_geocodes_demo_datasets_gleaner():
    res = geocodes_demo_datasets_gleaner()
    assert res.success
    assert res.output_for_node("find_highest_protein_cereal") == "Special K"

Maybe we can use the same set of @ops with parameters/context passed.
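
If the @graph decorator stays on the function, a hedged alternative is dagster's execute_in_process(), which returns the same res.success / output_for_node style of result. This is only a sketch, assuming the graph can be imported without pulling in the rest of the repository:

from implnet_ops_geocodes_demo_datasets import geocodes_demo_datasets_gleaner

def test_geocodes_demo_datasets_gleaner_in_process():
    # execute_in_process runs the graph with the in-process executor;
    # run_config (omitted here) would be where per-source parameters get passed.
    res = geocodes_demo_datasets_gleaner.execute_in_process()
    assert res.success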

Logs: capture runstats and repository logs; capture to directories?

There are now separate runstats and repo_{name}_{loaded|issue}.log files.

Should these be uploaded?
Also, should we capture to directories?

Option 1: just put everything for each source under its source directory, all in one place:

  • source/

Option 2 (sketched after the list):

  • run (gleaner/nabu) or run/source
  • runstats (place for just the stats, and also the ec utilities stats)
  • loaded/source (place for the repo_{loaded|issue} files)

Can the latest runstat be an artifact? That way we would not need to dig too far after a run.
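
A small sketch of what the Option 2 layout could look like when pushing the files to object storage, assuming the MinIO Python client; the endpoint, credentials, and bucket name are placeholders:

from minio import Minio

# Placeholder endpoint/credentials/bucket; not the deployment's real values.
client = Minio("minio.example.org", access_key="...", secret_key="...", secure=False)
bucket = "gleaner-logs"

def upload_run_logs(source, run_log, runstats_file, loaded_file, issue_file):
    # Option 2: run/<source>, runstats/, and loaded/<source> prefixes.
    client.fput_object(bucket, f"run/{source}/{run_log}", run_log)
    client.fput_object(bucket, f"runstats/{runstats_file}", runstats_file)
    client.fput_object(bucket, f"loaded/{source}/{loaded_file}", loaded_file)
    client.fput_object(bucket, f"loaded/{source}/{issue_file}", issue_file)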

Error in prune but not prefix

The following is being seen in the prune call but not in prefix for nabu:

{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:56",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "JSONLDToNQ err: %sunexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/gets3bytes.go:21",
  "func": "github.com/gleanerio/nabu/internal/objects.GetS3Bytes",
  "level": "info",
  "msg": "Issue with reading an object:  gleaner.oih/summoned/africaioc/ffb59b01cf1d2de175c66576d2b69c7940dda8a5.jsonld",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/objects/pipeload.go:41",
  "func": "github.com/gleanerio/nabu/internal/objects.PipeLoad",
  "level": "error",
  "msg": "gets3Bytes %v\\nThe specified key does not exist.",
  "time": "2023-02-21T00:26:26Z"
}
{
  "file": "/home/fils/src/Projects/gleaner.io/nabu/internal/graph/jsonldToNQ.go:17",
  "func": "github.com/gleanerio/nabu/internal/graph.JSONLDToNQ",
  "level": "info",
  "msg": "Error when transforming JSON-LD document to interface: unexpected end of JSON input",
  "time": "2023-02-21T00:26:26Z"
}


Remove container on conflict 409

Info: https://portainer.geocodes-aws-dev.earthcube.org/api/endpoints/2/docker/containers/create?name=gleaner01_opentopography

Then it fails, because one already exists:

urllib.error.HTTPError: HTTP Error 409: Conflict
  
    returned_value = gleanerio(("gleaner"), "opentopography")
  File "/usr/src/app/./ops/implnet_ops_opentopography.py", line 177, in gleanerio

docker container ls -a

CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS                      PORTS                                                                      NAMES

1.e9c742fnsem50khimxyxni0as
4a3e056a973f   fils/gleaner:v3.0.11-development-df        "/gleaner/gleaner --…"   5 hours ago      Exited (1) 5 hours ago                                                                                 gleaner01_opentopography
f22d332a1701   fils/gleaner:v3.0.11-development-df        "/gleaner/gleaner --…"   10 hours ago     Exited (1) 10 hours ago                                                                                gleaner01_geocodes_demo_datasets


earthcube@ip-172-31-2-108:~$ docker container rm f22d332a1701
f22d332a1701
earthcube@ip-172-31-2-108:~$ docker container rm 4a3e056a973f
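
One way the generated gleanerio() call could handle the 409 is to force-remove the name and retry the create. This is only a sketch, assuming the Portainer docker proxy accepts a DELETE on the container name and that an API key is used for auth; the create_container() helper and the URL/key values are hypothetical:

import urllib.error
import urllib.request

def remove_existing_container(portainer_url, api_key, name):
    # Portainer proxies the Docker Engine API, so DELETE .../containers/<name>?force=true
    # removes the leftover container that caused the name conflict.
    url = f"{portainer_url}/api/endpoints/2/docker/containers/{name}?force=true"
    req = urllib.request.Request(url, method="DELETE", headers={"X-API-Key": api_key})
    urllib.request.urlopen(req)

def create_with_conflict_retry(create_container, portainer_url, api_key, name):
    try:
        return create_container(name)
    except urllib.error.HTTPError as err:
        if err.code != 409:
            raise
        remove_existing_container(portainer_url, api_key, name)
        return create_container(name)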

Headless

Wonder if, for a headless run, we can spin up a container dedicated to that process.

Use Docker Contexts to set docker endpoints in scripts

Need some logic/documentation/ideas

While we can run locally, the configs need to get up to the PORTAINER_URL.

So if PORTAINER_URL is not equal to the Endpoints.docker.Host, toss cookies ;)

It looks like we will need to use docker contexts to set the endpoint for the docker scripts.

(venv) valentin@MacBook-Pro deployment % docker context show                 
desktop-linux
(venv) valentin@MacBook-Pro deployment % docker context inspect desktop-linux
[
    {
        "Name": "desktop-linux",
        "Metadata": {},
        "Endpoints": {
            "docker": {
                "Host": "unix:///Users/valentin/.docker/run/docker.sock",
                "SkipTLSVerify": false
            }
        },
        "TLSMaterial": {},
        "Storage": {
            "MetadataPath": "/Users/valentin/.docker/contexts/meta/fe9c6bd7a66301f49ca9b6a70b217107cd1284598bfc254700c989b916da791e",
            "TLSPath": "/Users/valentin/.docker/contexts/tls/fe9c6bd7a66301f49ca9b6a70b217107cd1284598bfc254700c989b916da791e"
        }
    }
]
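
A small sketch of the check itself, assuming we shell out to docker context inspect (as above) and compare the reported Host against PORTAINER_URL; everything beyond those two names is illustrative:

import json
import os
import subprocess

def check_docker_endpoint():
    # Inspect the active docker context and pull out Endpoints.docker.Host.
    out = subprocess.run(["docker", "context", "inspect"],
                         capture_output=True, text=True, check=True)
    host = json.loads(out.stdout)[0]["Endpoints"]["docker"]["Host"]
    portainer_url = os.environ.get("PORTAINER_URL", "")
    if portainer_url and portainer_url != host:
        # "toss cookies": refuse to run against the wrong endpoint.
        raise RuntimeError(f"PORTAINER_URL {portainer_url} != docker context host {host}")
    return host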

List of sources as an asset

Could a list of sources be the first asset?
Then could sitemaps and information from sitemaps be the next asset to drive the system?
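
A hedged sketch of that shape with dagster assets; the source names and the idea of reading them from the gleaner config are assumptions, and fetch_sitemap_urls is a placeholder:

from dagster import asset

def fetch_sitemap_urls(source):
    # Placeholder: the real version would download and parse the source's sitemap.
    return []

@asset
def source_list():
    # First asset: in practice this would come from the gleaner configuration file.
    return ["geocodes_demo_datasets", "opentopography"]

@asset
def sitemaps(source_list):
    # Next asset: sitemap information per source, driving the rest of the system.
    return {source: fetch_sitemap_urls(source) for source in source_list}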

Time schedule distribution code not in the IoW generator?

The IoW generator is not doing the time distribution.

Need to take the generator code and make one version of it with arguments for running. Encode these into the Makefile too, and rename the Makefile entries for bash command-line completion.
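
A minimal shape for the single-version generator with arguments; the flag names here are made up, not the current generator's interface:

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Generate dagster ops/jobs/schedules")
    parser.add_argument("--config", required=True, help="gleaner config file to read sources from")
    parser.add_argument("--output", default="ops", help="directory for generated files")
    parser.add_argument("--window-days", type=int, default=7,
                        help="spread the generated schedules across this many days")
    return parser.parse_args()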

Duplication

So the development process for this across the three implementers was hideous...

  • generator script should be one file (and really should be Go or Python and use real template files); see the sketch after this list
  • The repository directory and file are not used; remove them
  • At present the template files are all the same; we don't need them three times
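
A sketch of the single-file, real-template-file approach in Python with Jinja2; the template filename and sources list are placeholders:

from jinja2 import Environment, FileSystemLoader

def generate(sources, template_dir="templates", output_dir="ops"):
    # One generator, one real template file, rendered once per source.
    env = Environment(loader=FileSystemLoader(template_dir))
    template = env.get_template("implnet_ops.py.j2")
    for source in sources:
        with open(f"{output_dir}/implnet_ops_{source}.py", "w") as out:
            out.write(template.render(source=source))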

sitemap indexing

It looks like I can make the sitemap op, job, and schedule into one file and then append into the arrays for jobs and schedules in the repo file.
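
A rough sketch of that append, assuming the repo file builds plain lists of jobs and schedules before returning them from @repository; the sitemap module and names are placeholders:

from dagster import repository

# Hypothetical single file holding the sitemap op, job, and schedule.
from implnet_sitemap import sitemap_job, sitemap_schedule

jobs = []        # the generated per-source jobs would already be appended here
schedules = []   # likewise for the generated schedules

jobs.append(sitemap_job)
schedules.append(sitemap_schedule)

@repository
def gleaner_repository():
    return jobs + schedules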
