igsn / igsn-json

Test schema repo for IGSN 2040 Architecture sprint

Home Page: https://igsn.github.io/

License: MIT License

Python 100.00%
json-ld jsonschema samples web-architecture persistent-identifiers metadata

igsn-json's Introduction

IGSN JSON development for the IGSN 2040 Sprint 1 (May/June 2020)

Welcome to the test schema repo for the IGSN 2040 Architecture Sprint!

Here's a list of important things in this repository to get you going quickly:

Otherwise feel free to dive in and get your hands dirty. We're all making it up as we go along so don't feel like you've got to know what you're saying before asking a question.

Developing schemas and JSON-LD contexts

We're going to build out the shared schemas and contexts in this repo - all of the references in the schema are to the raw versions of these documents on the master branch of this repo.

So if you want to add new terms etc, then open an issue and we can work on a pull request. If you're not sure about the code side then just open an issue in this repo and we'll find some people that know what to do to help you out.

Meeting schedule

We're planning on having a few video-to-video meetings over the course of the sprint. Please email Sarah Ramdeen for the meeting links.

As we're spread all around the world, finding a time that suits everyone is a bit challenging. If you can't make a meeting, we will post the recordings here:

| Date | Time (UTC) | Resources |
| --- | --- | --- |
| Monday May 25 | 10am | Video, Slides, Notes |
| Tuesday June 2nd | 10am | Video, Notes |
| Friday June 5th | 10am | Video, Notes |

Getting help

If something's not clear, raise an issue in the issue tracker.

If you're not sure where something goes or you'd rather talk to a human than Octocat then please get in touch with one of these lovely people: Jens Klump (in Perth/Oceania time zones, [email protected]), Doug Fils (in central US time zones, [email protected]) or Anusuriya Devaraju (in European time zones, [email protected]).

Context

The current implementation of IGSN is an excellent combination of lean centralised functions that are supported by federated services. This has given IGSN the ability to adapt to requirements arising from new communities joining the system. Accommodating a more diverse community of users and a much larger number of sample registrations requires a number of changes.

In particular, there are two new roles within the IGSN architecture: that of an allocating agent, who publishes data in a minimal and cost-effective way, and that of the data aggregators, who provide services that consume and republish that data in more powerful ways (but which potentially come with higher support and service costs). Both of these roles are currently carried out by allocating agents, but forcing agents to also provide aggregated data services raises the resources required to become an agent and makes the role less sustainable in the longer term.

Under the new scheme, we propose that agents simply publish JSON documents on their landing pages which aggregators can crawl to uncover new data, along with a sitemap pointing to all the landing pages which contain sample data. Agents would not be responsible for providing high-frequency or high-volume query support against this data; that would be the role of the aggregator. Aggregators would have a responsibility to their end-users to provide services that are performant and scientifically useful, removing this burden from the agents. For the relationship between agents and aggregators to work effectively, we need to outline the contract governing the relationship. In the long run, we want these roles to be as decoupled as possible, but while we are developing recommendations it would be good to have both in the room to ensure that we are balancing the needs of the two roles effectively.
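To make the agent/aggregator split a little more concrete, here is a minimal sketch of the aggregator side using only the Python standard library. It assumes that an agent embeds its JSON document in a `<script type="application/ld+json">` element on the landing page; that embedding convention, the `agent.example.org` URLs and the function names are illustrative assumptions, not something the sprint has agreed on.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by the sitemaps.org sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def landing_pages(sitemap_xml):
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_json_documents(html):
    """Return every embedded JSON document found in a landing page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical agent output, inlined so the sketch runs without network access.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://agent.example.org/sample/XX0001</loc></url>
  <url><loc>https://agent.example.org/sample/XX0002</loc></url>
</urlset>"""

PAGE = """<html><head>
<script type="application/ld+json">{"@id": "https://agent.example.org/sample/XX0001", "name": "hypothetical sample"}</script>
</head><body>Sample landing page</body></html>"""
```

A real aggregator would fetch each URL returned by `landing_pages` and pass the response body to `extract_json_documents`; fetching, rate limiting and error handling are left out here.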

In this sprint we want to test and evaluate the implementation of sharing and aggregating IGSN metadata between IGSN Agents and Metadata Aggregators.

Aims of the sprint

  • Determine how difficult (or not) it will be for agents to make the required changes to their landing pages to conform to the new requirements and provide crawler guidance in robots.txt and sitemap.xml files.
  • Determine how difficult (or not) it will be to develop new web crawlers for aggregators to aggregate data.
  • Uncover any new ways of using aggregated data that might be of interest to the community.
  • Determine what services IGSN e.V. needs to provide to agents to support their publication role (e.g. publication of JSON Schema, JSON-LD contexts etc, authentication, role-based access etc).
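As a rough illustration of the crawler guidance mentioned in the first aim, the sketch below reads a robots.txt file with Python's standard `urllib.robotparser` module and reports whether a crawler may fetch a URL and which sitemaps are advertised. The robots.txt content, the `igsn-aggregator` user agent and the `crawler_guidance` helper are made up for the example; `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that an agent might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /sample/
Sitemap: https://agent.example.org/sitemap.xml
"""


def crawler_guidance(robots_txt, user_agent, url):
    """Return (may_fetch, sitemap_urls) for a crawler and a target URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.can_fetch(user_agent, url), parser.site_maps() or []
```

An aggregator could call this once per agent, skip any URL for which the first value is `False`, and start its crawl from the sitemap URLs in the second value.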

Developing schemas and running tests

We're using some lightweight checks with pytest as a testing harness. Take a look at the Python files in the tests folder for how these work. Basically we just fire a bunch of JSON fragments at our schemas and check that each one validates (or fails to validate) as expected. This has the bonus of checking that all our JSON references etc are correct.
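For a feel of what such a check can look like, here is a small sketch that pairs JSON fragments with the expected validation outcome and runs them through pytest. It assumes the `jsonschema` and `pytest` packages are installed; the schema, the fragments and the names `SCHEMA`, `CASES` and `is_valid` are stand-ins invented for the example, not the contents of the tests folder.

```python
import jsonschema
import pytest

# Hypothetical stand-in schema; the real schemas live in this repository.
SCHEMA = {
    "type": "object",
    "properties": {"igsn": {"type": "string"}},
    "required": ["igsn"],
}

# Each case pairs a JSON fragment with whether it should validate.
CASES = [
    ({"igsn": "XX0001"}, True),
    ({"igsn": 42}, False),  # wrong type
    ({}, False),            # missing required property
]


def is_valid(obj, schema=SCHEMA):
    """Return True if obj validates against schema, False otherwise."""
    try:
        jsonschema.validate(instance=obj, schema=schema)
    except jsonschema.ValidationError:
        return False
    return True


@pytest.mark.parametrize("obj,expected", CASES)
def test_registration(obj, expected):
    assert is_valid(obj) == expected
```

Parametrizing over `(obj, expected)` pairs means pytest reports each fragment as its own test, so a single bad fragment is easy to spot.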

We're using pipenv to manage the Python environment and dependencies. To install pipenv just do

pip install pipenv

and then you can install the environment with

pipenv install --dev

Pipenv creates an isolated environment for you - you will need to run pytest inside of this. To do this, just do

$ cd /path/to/igsn_json
$ pipenv run python -m pytest
================================== test session starts ===================================
platform win32 -- Python 3.7.5, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- c:\users\jesse\.virtualenvs\igsn_json-uu6qpojl\scripts\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\jesse\OneDrive\Documents\IGSN\igsn_json, inifile: setup.cfg
collected 10 items

tests/test_registration_schema.py::test_igsn_registration[obj0-True] PASSED         [ 10%]
tests/test_registration_schema.py::test_igsn_registration[obj1-False] PASSED        [ 20%]
tests/test_registration_schema.py::test_igsn_registration[obj2-False] PASSED        [ 30%]
tests/test_registration_schema.py::test_igsn_registration[obj3-False] PASSED        [ 40%]
tests/test_registration_schema.py::test_igsn_registration[obj4-True] PASSED         [ 50%]
tests/test_registration_schema.py::test_igsn_registration[obj5-True] PASSED         [ 60%]
tests/test_registration_schema.py::test_igsn_registration[obj6-True] PASSED         [ 70%]
tests/test_registration_schema.py::test_igsn_registration[obj7-False] PASSED        [ 80%]
tests/test_registration_schema.py::test_igsn_registration[obj8-True] PASSED         [ 90%]
tests/test_registration_schema.py::test_igsn_registration[obj9-True] PASSED         [100%]

- generated xml file: C:\Users\jesse\OneDrive\Documents\IGSN\igsn_json\tests\reports\test-output.junit.xml -
=================================== 10 passed in 1.30s ===================================

in the root folder of the repository.

We're also using Travis (need to update the Travis link once public) to automatically check pull requests, so you may get asked to clean things up if your tests are failing before we can merge your contribution. If you've got any questions about this just ping @jesserobertson.

igsn-json's People

Contributors

fils, jklump, kitchenprinzessin3880, lulin-song, ramdeensarah

Forkers

jesserobertson

igsn-json's Issues

Design vocabulary using GitHub dev tools

I think it would be worthwhile to have a crack at a GitHub-centered approach to developing vocabularies with the community. We've had some pretty good discussions from tech and non-tech people using the GitHub discussion boards, and I think we could maintain schemas and contexts alongside this, coupled with a more in-depth use of GitHub's code review tools.

I'd suggest the following groundwork needs to be laid:

  • A good contributing.md doc that lays out how you could use pull requests and code reviews to make changes to central contexts and schemas
  • A better explanation of what goes where in the repo
  • A way of linking issues to new branches (have installed
  • [maybe] agreeing on a workflow for linking issues to our project boards so that you can see what bits of the vocab are being worked on

Then we'd have to do the hard work of getting people to actually engage but at least there might be clear guidelines.

@jklump @dr-shorthair @fils @ramdeensarah thoughts?

CI/CD to validate documents in repo

We probably need to have some way of validating and linting the JSON docs so that they stay valid when we have a bunch of people editing them...
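One lightweight starting point, sketched here with only the standard library, is a script that walks the repository and reports every JSON file that fails to parse. The function name `find_invalid_json` and the file patterns are assumptions for the example; checking documents against the schemas would be a further step on top of this.

```python
import json
from pathlib import Path


def find_invalid_json(root, patterns=("*.json", "*.jsonld")):
    """Return (path, error message) pairs for files that fail to parse."""
    failures = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as err:
                failures.append((path, str(err)))
    return failures
```

A CI job could call `find_invalid_json(".")` and fail the build if the returned list is not empty.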

Merge in context.jsonld to registration schema

It would be good to have the JSON-LD context that @fils has worked up sitting alongside the JSON Schema definitions, plus some tests to ensure that these are in sync.

To do:

  • Copy over the existing work from @fils in the context folder into the schema.igsn.org tree
  • Build a set of unit tests which use pyld to build the JSON-LD representation and check that everything is valid
  • Add the @id tags to the JSONSchema (@fils is there anything else we need in the straight schema checks to make sure that things are valid on the graph side?)
  • Finish updating examples/examples.md with the @id tags
  • Get @fils and @dr-shorthair to review...

Update URLs in schemas to point to raw Github links

Currently the schemas point to bogus schema.igsn.org links but these aren't actually published anywhere yet.

We could point these to the raw GitHub links on master so that they work when you don't have the entire schema present.
