o2r-project / erc-spec

Executable Research Compendium specification and guides

Home Page: https://o2r.info/erc-spec/

License: Creative Commons Zero v1.0 Universal

erc-spec's People

Contributors

diegosiqueir4, eftyk, fmazin, nuest, simonwaldherr

erc-spec's Issues

Add progress communication to specification

Well-defined log messages could provide progress information during ERC execution, which tools can parse and communicate to the user.

This could initially be based on the default progress percentage of rmarkdown/knitr.

Authors could add progress information in their scripts, too (e.g. by calling R functions, or by adding well-defined comments).

This requires an extension of the specification.
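
A minimal sketch of how a tool might parse such progress messages. It assumes that the execution log contains knitr-style progress bars such as `  |=====     |  45%`; the exact log format is not defined by the specification yet, and the function names are hypothetical.

```python
import re

# Assumed log format: knitr-style progress bars such as "  |=====     |  45%".
PROGRESS_PATTERN = re.compile(r"\|[^|]*\|\s*(\d{1,3})%")


def parse_progress(line):
    """Return the progress percentage found in a log line, or None."""
    match = PROGRESS_PATTERN.search(line)
    if match is None:
        return None
    value = int(match.group(1))
    return value if 0 <= value <= 100 else None


def latest_progress(log_lines):
    """Return the last progress percentage seen in an iterable of log lines."""
    latest = None
    for line in log_lines:
        value = parse_progress(line)
        if value is not None:
            latest = value
    return latest
```

A service could call `latest_progress` on the lines received so far and forward the value to the user interface. Author-defined progress comments would need their own pattern once the format is specified.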

ERC metadata as part of Rmd header

RMarkdown headers are YAML and the erc.yml file is YAML, so why not think about a variant of the ERC where the ERC metadata is actually in the header of the main document?
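
A first step for such a variant could be to separate the YAML header from the document body. The sketch below only splits the text and does not parse the YAML; where the ERC metadata would live inside the header (for example under a hypothetical `erc` key) is an open question.

```python
def split_front_matter(text):
    """Split an R Markdown document into (yaml_header, body).

    Returns (None, text) when the document does not start with a
    '---' delimited header.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text
    for index in range(1, len(lines)):
        # Pandoc-style YAML headers may close with '---' or '...'.
        if lines[index].strip() in ("---", "..."):
            header = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return header, body
    return None, text
```

The returned header string could then be handed to any YAML parser to read the same fields that erc.yml holds today.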

Allow minimal ERC with only one plot

I like the idea of allowing an ERC to contain only code for one plot. I.e. an R-script plot.R and an output document plot{.png,.pdf}. The former could be the "main document" and the latter the "view document".

I suggest making sure that the wording still allows this, i.e. that the main document CAN be a script instead of a literate programming file.

If this should be possible, it would be nice to have an example of a minimal ERC like this; I think it would be easy for people to understand.

@7048730 Thoughts on this?

Manual UI bindings creation

[ outsourced from #31 ]

Users want to write UI bindings directly into RMarkdown themselves, not only create them in a (browser-based) UI-based workflow.

Do we want to work on this now?

How can UI bindings be embedded into RMarkdown?

Validation instructions

There are two major aspects to validation:

  1. Validation of proper reproduction of the contents of the ERC.
  2. Validation of the archival-related integrity of all data.

This issue deals with 2.

Tasks for the User guide part of the spec:

  • add detailed instructions on how to create a BagIt bag with custom properties (such as an erc-version tag in bag-info.txt) with a standard tool like the Library of Congress (LoC) Java Bagger
  • add instructions on how to manually validate the bag correctly with a standard tool like the LoC Java Bagger

edit: will add this in branch https://github.com/o2r-project/erc-spec/tree/update-eval_1 as it was also part of first feedback
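
For readers without a bagging tool at hand, the core of the integrity check can be sketched with the standard library. This is a partial check under stated assumptions: it only compares the files listed in `manifest-<algorithm>.txt` with their checksums and does not look at `bagit.txt`, tag manifests, or payload files missing from the manifest. The function name is hypothetical.

```python
import hashlib
from pathlib import Path


def check_bag_manifest(bag_dir, algorithm="sha256"):
    """Compare payload files with the checksums in manifest-<algorithm>.txt.

    Returns a list of relative paths whose file is missing or whose
    checksum differs. An empty list means every listed file matched.
    """
    bag_path = Path(bag_dir)
    manifest = bag_path / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to bag root>".
        expected, relative = line.split(maxsplit=1)
        relative = relative.strip()
        target = bag_path / relative
        if not target.is_file():
            failures.append(relative)
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual.lower() != expected.lower():
            failures.append(relative)
    return failures
```

A complete validation should still be done with a standard BagIt tool; the sketch only illustrates what the manifest comparison involves.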

Drop extensions

Move all contents from extensions into the spec. Do not remove any content.

Add a user guide on manual examination (without Docker)

We should add a user guide on how to examine an ERC without Docker and without the reproducibility service and platform/UI.
This could mitigate issues about "What if Docker is not available anymore?" and demonstrate that the information is still there and accessible using the structure required by the spec alone.

  • unpack the archive
  • on the structure of Docker exports (maybe a side note on image squashing)
  • where to find the ERC metadata files
  • how to extract the ERC payload from /erc
  • how to check bag validity on the extracted payload
  • how to find the main and view files (default names, erc.yml)
  • blog post (examining a Docker image without Docker)
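
The steps above can be sketched without Docker installed. The example assumes an image archive as written by `docker save`, i.e. a tarball with a top-level `manifest.json` that names the layer tarballs in order. It lists the files stored below `erc/` and ignores whiteout entries (files deleted in a later layer), so it is an illustration rather than a faithful reconstruction of the container file system. The function name is hypothetical.

```python
import io
import json
import tarfile


def list_erc_files(image_tar, prefix="erc/"):
    """List regular files below `prefix` found in the layers of a saved image.

    Assumes the archive layout written by `docker save`: a top-level
    manifest.json whose entries name the layer tarballs in order.
    Whiteout entries (deleted files) are not handled in this sketch.
    """
    found = set()
    with tarfile.open(image_tar) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        for layer_name in manifest[0]["Layers"]:
            layer_bytes = image.extractfile(layer_name).read()
            with tarfile.open(fileobj=io.BytesIO(layer_bytes)) as layer:
                for member in layer.getmembers():
                    name = member.name
                    if name.startswith("./"):
                        name = name[2:]
                    if member.isfile() and name.startswith(prefix):
                        found.add(name)
    return sorted(found)
```

The same loop could extract the members instead of listing them, after which the bag validity check can be run on the extracted payload.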

Resources

Plain OCI/Docker bundle

Explore how the label mechanisms of Docker and OCI (especially the latter) allow merging the inner and outer container.

  • How can files in the container be accessed easily? (extract tarball, then make sense of the layers? does squashing help?)
  • How can metadata be accessed? (docker inspect ... command to access the erc.yml)
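
As a starting point for the metadata question, labels can be read from a saved image without a Docker daemon. The sketch assumes a `docker save` archive in which `manifest.json` points to the image configuration JSON and labels are stored under `config.Labels`; which label keys an ERC would use is not decided, so any key shown in usage is hypothetical.

```python
import json
import tarfile


def read_image_labels(image_tar):
    """Return the label dictionary stored in the config of a saved image.

    Assumes a `docker save` archive: manifest.json points to the image
    config JSON, which keeps labels under config.Labels.
    """
    with tarfile.open(image_tar) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        config = json.load(image.extractfile(manifest[0]["Config"]))
    return (config.get("config") or {}).get("Labels") or {}
```

If the ERC metadata were stored in labels, a call like `read_image_labels("image.tar")` would return it as a plain dictionary, which is roughly the information `docker inspect` shows for a loaded image.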

ERC as RO

Let's package an ERC as a Research Object.

We can re-use a lot of their (meta)data model, especially the added semantics and see how the ERC "(old) simple tools and manual is possible" approach relates to the world of Linked Open Data.

Added concepts: "one click", nested containers, "offline"/self-consistency

See also the disambiguation in the paper

First Comments on spec

  1. "These typically consist of data, code and libraries in executable form which are needed to re-do an analysis, and the outputs of the original analysis." - not an easy sentence
  2. required fields for erc.yml do not match minimal example
  3. "Default command statements of implementing tools" - "for" instead of "of"?
  4. is time zone a MUST?
  5. link to o2r-metadata schema?
  6. Example configuration file at the end? Or a link?
  7. Example docker file?
  8. Is it possible that I can avoid the entire validation process by using the .ercignore file? Should that be possible?
  9. Validation of research results is missing in Validation, right?

I am not sure if I understood each point in sufficient detail. I will re-read it on a later occasion.

Add FAQ/developer note on Singularity

ERCs could just as well use Singularity instead of Docker, and (cf. C4RR workshop) it might be very well suited for reproducible research. Add a statement to the developer guide.

Empty Affiliation Structure

Shouldn't the structure of an empty affiliation be an empty array instead of null as defined here?

This should be according to our definition and would provide a more consistent structure

Current list of minor formatting errors, typos, necessary changes

Fixes:

  • Displayfile frame does not render (404) at erc-spec/user-guide/minimal/
  • template download table is broken at erc-spec/user-guide/template/ although correct md
  • admonition box !!! tip "Example" in the BagIt example is broken at erc-spec/spec/#bagit-outer-container
  • Indicate external links with an icon (e.g. 🔗), e.g. at erc-spec/glossary/#discover

Add section on manipulation

The "manipulation extension" draft was removed, content preserved here:

## UI bindings

How is the user interface defined?

## Using other data

Define in `erc.yml` which files are potential input data which can be exchanged.

```yml
id: adcd
manipulate:
    input_data:
        - filename: are.json
          format: geojson
        - filename: rs.tiff
          format: geotiff
```

Then: How is external data mounted into the container and where to (what are the paths)?

## Validation

How are UI bindings validated/checked?

Update and clarify Docker

Re-check that the export and import can use the ERC identifier as it is in the spec.

  • remove URI as a variant for ERC id and make UUID a must
  • extend core spec that UUID must be created properly during ERC creation
  • to Docker extension, add code statement how to tag the image to section "Docker container", rephrase section to "MUST use these commands to create the image file"
  • image.tar name is a MUST not a SHOULD (also in "Docker container")
  • remove "Docker container" section, we actually want to save the image not export the container
  • "Default control statements" potentially better named "Docker container control"
  • add example for valid but not the default statements (with my_image.tar in load and myimages:latest as tag in the run statement)
  • add example command/implementation hint for extracting the workspace from a container that has completed, e.g. docker cp erc:.../erc /host/path, which must be done so that an ERC can be validated
  • add sentence to Validation in core spec: "Prior to validation the output of an ERC execution must be extracted from the nested runtime environment."
  • add example to "Runtime manifest" in core spec, "e.g. a Dockerfile".
  • add section to Developer Guide: Add section "Implementation hints" and add "How to implement validation" and connect the docker cp stuff with Validation
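
The UUID requirement from the first two items could be covered by a small helper like the following. It is a sketch that assumes random (version 4) UUIDs in canonical lowercase form; the issue does not prescribe a UUID version, and the function names are hypothetical.

```python
import uuid


def new_erc_id():
    """Create a random (version 4) UUID string for a new ERC."""
    return str(uuid.uuid4())


def is_valid_erc_id(candidate):
    """Return True if `candidate` is a canonical lowercase UUID string."""
    try:
        parsed = uuid.UUID(candidate)
    except (ValueError, AttributeError, TypeError):
        return False
    return str(parsed) == candidate
```

With this check, a URI or a short identifier such as the `adcd` used in earlier examples would be rejected at ERC creation time.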

Integrating expert feedback

[These comments are based on notes and transcripts from a discussion of the ERC specification with publishing domain experts.]

  • dropping extensions is an excellent idea #36
  • a plain R solution is much smaller, consequently less burden for collaboration, but higher burden for preservation
  • need and option to download just main file and data (cf. o2r-project/o2r-platform#26); minimize footprint for specific usages
    • what would be the "dev version" of an ERC? everything but the runtime environment image? The Dockerfile, the RMarkdown document, and the data (explicitly)?
  • community support is more important than technical support for people to be using the specification
    • noted the meta-idea of a discussion forum, e.g. discourse
    • not relevant for the spec
    • add section to user guide (add email, add discussion forum)
  • authors should not read the ERC spec, which is developer material; authors should be confronted with a very simple system within the submission experience; alternative: guide of required steps
  • system must be as simple as possible, understandable by users if need be - current state is really good
  • automated learning should be considered for user experience, i.e. after an upload give feedback "you have 80% complete, consider these things", comparable to "profile completeness score" > taken into consideration in UI only, not part of spec, rather an application of the spec
  • badges should be considered
  • two possibilities for UI bindings: RMarkdown + UI and write directly into RMarkdown; not being able to do the latter would be a drawback, "scientists don't like interactions" > deferred, see #32
  • evolutionary approach is favoured, i.e. not replacing the article completely but having one/multiple ERC as supplemental material

Spec updates to be done

  • clarify the target audience for spec (devs, not authors) and guide (authors)
  • add reasoning against plain R solution to dev guide
  • how to handle ERC as supplemental material instead of the main published item? what metadata is (not) needed?
  • write concept for completeness score
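
As input for the completeness score concept, the idea of "you have 80% complete, consider these things" can be sketched as a simple ratio of filled metadata fields. The list of expected fields is an assumption for illustration and is not taken from the spec.

```python
def completeness_score(metadata, expected_fields):
    """Return (percentage, missing_fields) for a metadata dictionary.

    A field counts as complete when it is present and not empty.
    """
    missing = [
        field for field in expected_fields
        if metadata.get(field) in (None, "", [], {})
    ]
    if not expected_fields:
        return 100, missing
    done = len(expected_fields) - len(missing)
    return round(100 * done / len(expected_fields)), missing
```

The list of missing fields is what a UI would turn into the "consider these things" hints. A real concept would probably weight fields differently instead of counting them equally.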

.erc file extension

Since BagIt is the desired outer container, an ERC can readily be zipped as a single file. Consider .erc as a file extension for the ZIP archive (instead of .zip).
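
A minimal sketch of the packaging step, assuming the bag is a plain directory and the target file lies outside of it. Only the file name changes with the proposed extension; the archive stays a regular ZIP file. The function name is hypothetical.

```python
import zipfile
from pathlib import Path


def pack_erc(bag_dir, target):
    """Write the contents of a bag directory into a ZIP archive at `target`."""
    bag_path = Path(bag_dir)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(bag_path.rglob("*")):
            if path.is_file():
                # Store paths relative to the bag root, with forward slashes.
                archive.write(path, path.relative_to(bag_path).as_posix())
    return Path(target)
```

Calling `pack_erc("my_bag", "my_research.erc")` would produce a file that any ZIP tool can still open after renaming it to `.zip`.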

Description of the "Extension" concept and the other concepts

It might be unclear to anyone new to ERC how the extensions work and what is left if no extension is used.
The spec documentation should be updated to include a paragraph on this ("the extension concept") and reflect in its structure what is "base" and what is "extension".

Mechanism for intermediate results

Is there a possibility for a transparent mechanism to handle intermediate (calculation intensive) results?

Check with other R packages for handling workflows...

Add single file HTML and MD output of spec and store it in each ERC

use https://github.com/jgrassler/mkdocs-pandoc to create single file, then https://github.com/jgrassler/mkdocs-pandoc#usage-example and http://pandoc.org/demos.html to create

CLI tool

Create a tool that allows running an ERC from a command-line interface:

erc create /directory /erc-dir
erc package /erc-dir my_research.zip
erc reproduce my_research.zip

Do it with golang :-).
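
To make the proposed interface concrete, here is a sketch of the three subcommands as an argument parser. It is written in Python rather than Go purely for illustration, the argument names are assumptions, and it only parses the command line without creating, packaging, or reproducing anything.

```python
import argparse


def build_parser():
    """Build a parser with the three subcommands named in the issue."""
    parser = argparse.ArgumentParser(prog="erc")
    commands = parser.add_subparsers(dest="command", required=True)

    create = commands.add_parser("create", help="create an ERC from a workspace")
    create.add_argument("directory")
    create.add_argument("erc_dir")

    package = commands.add_parser("package", help="package an ERC directory")
    package.add_argument("erc_dir")
    package.add_argument("archive")

    reproduce = commands.add_parser("reproduce", help="re-run a packaged ERC")
    reproduce.add_argument("archive")
    return parser
```

For example, `build_parser().parse_args(["reproduce", "my_research.zip"])` yields a namespace with `command` set to `reproduce` and `archive` set to `my_research.zip`, which a dispatcher could map to the actual implementation.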

Add possibility to "sign" ERCs

An ERC must be subject to human inspection.

How can we model and trace the involved people in a way that is open to scrutiny?

Examples: reviewers "sign" an ERC which they examined and evaluated, librarians add their signature on receiving and checking a submission to an archive, an author does a self-check and confirms "to his best knowledge" the ERC is OK. Could possibly be done by storing files in the .erc directory.

And blockchains are supposed to be good for this stuff, too...

Make use of Discover, Examine, Create on top level

We just discussed:

  • Discover
  • Examine
    -- Check
    -- Inspect
    -- Manipulate
    -- Substitute
  • Create

as top level interactions and hence items for the spec.

Tasks for dev branch:

  • add terms to glossary
  • use check instead of validate in spec
  • replace "validation" with "bag validation" whenever this is meant, to avoid confusion

OCI support

The Open Container Initiative (OCI) develops open specifications for container runtimes and images, see https://github.com/opencontainers/

The image spec (currently) has flexible annotations, which we can use.

Must analyse the actual differences to Docker first.
