o2r-project / erc-spec
Executable Research Compendium specification and guides
Home Page: https://o2r.info/erc-spec/
License: Creative Commons Zero v1.0 Universal
Well-defined log messages could provide progress information during ERC execution, which tools can parse and communicate to the user.
This could initially be based on the default progress percentage of rmarkdown/knitr.
Authors could add progress information in their scripts, too (e.g. by calling R functions, or by adding well-defined comments).
This requires an extension of the specification.
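A minimal sketch of how a tool might parse such progress messages. The log format is an assumption here (a percentage such as `25%` somewhere in the line, as in a text progress bar), and the function names are hypothetical; a real specification extension would have to define the message format.

```python
import re

# Assumption: a progress line contains a percentage, e.g. "  |.....     |  25%".
PERCENT = re.compile(r"(\d{1,3})%")


def parse_progress(line):
    """Return the progress percentage (0-100) found in a log line, or None."""
    match = PERCENT.search(line)
    if match is None:
        return None
    value = int(match.group(1))
    # Ignore values outside the valid range instead of guessing.
    return value if 0 <= value <= 100 else None


def latest_progress(log_lines):
    """Return the last percentage reported in an iterable of log lines."""
    latest = None
    for line in log_lines:
        value = parse_progress(line)
        if value is not None:
            latest = value
    return latest
```

A service could call `latest_progress` on the execution log collected so far and forward the value to the user interface.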
Should an ERC be able to announce what resources (cores, RAM) it needs, or within which limits it should work?
https://peerj.com/articles/cs-112/
Krewinkel A, Winkler R. (2017) Formatting Open Science: agilely creating multiple document formats for academic manuscripts with Pandoc Scholar. PeerJ Computer Science 3:e112 https://doi.org/10.7717/peerj-cs.112
RMarkdown headers are YAML, the erc.yml file is YAML - why not think about a variant of the ERC where the ERC metadata is actually in the header of the main document?
From the Open Data community comes the Data Package specification: https://specs.frictionlessdata.io/data-package/
It's quite simple with some metadata and a list of resources in a JSON file.
StatTag lets users create reproducible docx documents: http://sites.northwestern.edu/stattag/
OOXML being an open format, we should evaluate if a StatTag-based workflow is supported by the current ERC specification.
See presentation 2017-03-17 CWL @ HTS-CSRS "BioCompute Object" Workshop at https://docs.google.com/presentation/d/1a-iQYhu52F5L0-UaD-5mGCpWIJCxdVEH9a1i4Rx8BOA/edit#slide=id.g15b9625092_0_337
CWL also defines runtimes, so there is probably something to learn from that: http://www.commonwl.org/draft-3/CommandLineTool.html#Runtime_environment
I like the idea of allowing an ERC to contain only the code for one plot, i.e. an R script plot.R and an output document plot{.png,.pdf}. The former could be the "main document" and the latter the "view document".
I suggest making sure that the wording still allows this, i.e. that the main document CAN be a script instead of a literate programming file.
If this should be possible, it would be nice to have a minimal example ERC like this; I think it would be easy for people to understand.
@7048730 Thoughts on this?
We could mention the packages freezr and recordr, which capture data provenance for R scripts and console commands without the need to modify existing R code.
Via https://discuss.ropensci.org/t/track-fast-evolving-custom-r-scripts-via-freezr/903
This is merely related, but interesting if one of the user guides develops into recommendations of day-to-day habits.
[ outsourced from #31 ]
Users want to write UI bindings directly into RMarkdown themselves, not only create them in a (browser-based) UI-based workflow.
Do we want to work on this now?
How can UI bindings be embedded into RMarkdown?
There are two major aspects to validation:
This issue deals with 2.
Tasks for the User guide part of the spec:
erc-version tag in bag-info.txt) with a standard tool like loc java bagger
edit: will add this in branch https://github.com/o2r-project/erc-spec/tree/update-eval_1 as it was also part of first feedback
minimal draft, that can be extended in the future.
Move all contents from extensions into the spec. Do not remove any content.
Use LABEL in Dockerfiles and give an example of using docker inspect to see core metadata.
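As a rough illustration of the `docker inspect` idea, the following sketch extracts labels from the JSON that `docker inspect` prints (a JSON array whose entries carry the labels under `Config.Labels`). The label prefix `info.o2r.` and the function name are hypothetical; the spec does not define label names here.

```python
import json


def erc_labels(inspect_output, prefix="info.o2r."):
    """Collect labels starting with `prefix` from `docker inspect` JSON output.

    Assumption: `prefix` is a made-up namespace for ERC metadata labels.
    """
    labels = {}
    for entry in json.loads(inspect_output):
        config = entry.get("Config") or {}
        # "Labels" may be missing or null when no LABEL was set.
        for key, value in (config.get("Labels") or {}).items():
            if key.startswith(prefix):
                labels[key] = value
    return labels
```

The input string would be the captured standard output of `docker inspect` for the image or container in question.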
We should add a user guide on how to examine an ERC without Docker and without the reproducibility service and platform/UI.
This could mitigate issues about "What if Docker is not available anymore?" and demonstrate that the information is still there and accessible using the structure required by the spec alone.
/erc
Explore how the label mechanisms of Docker and OCI (especially the latter) allow merging the inner and outer container.
docker inspect ... command to access the erc.yml)
Ideas (not necessarily conflicting):
data/erc.yml, if it exists, it's an ERC
Executable-Research-Compendium: Yes to bag-info.txt
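Both ideas can be checked with very little code. The sketch below is only an illustration under the assumptions stated in the ideas above: a bag counts as an ERC if `data/erc.yml` exists or if `bag-info.txt` carries the tag `Executable-Research-Compendium: Yes`. The function name is hypothetical.

```python
from pathlib import Path


def looks_like_erc(bag_dir):
    """Return True if the bag directory matches one of the two proposed markers."""
    bag = Path(bag_dir)

    # Idea 1: the presence of data/erc.yml identifies an ERC.
    if (bag / "data" / "erc.yml").is_file():
        return True

    # Idea 2: a dedicated tag in bag-info.txt identifies an ERC.
    bag_info = bag / "bag-info.txt"
    if bag_info.is_file():
        for line in bag_info.read_text(encoding="utf-8").splitlines():
            key, _, value = line.partition(":")
            if (key.strip() == "Executable-Research-Compendium"
                    and value.strip().lower() == "yes"):
                return True
    return False
```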
Let's package an ERC as a Research Object.
We can re-use a lot of their (meta)data model, especially the added semantics and see how the ERC "(old) simple tools and manual is possible" approach relates to the world of Linked Open Data.
Added concepts: "one click", nested containers, "offline"/self-consistency
See also the disambiguation in the paper
I am not sure if I understood each point in sufficient detail. I will re-read it on a later occasion.
add metadata to connect which metadata file uses which schema and which contained file is the actual schema
A core point of containers is not to include the kernel but to use that of the host. This means that for complete metadata we must include the kernel version in erc.yml.
What other things are not captured by the container?
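For the kernel version, a minimal sketch of how a creation tool might collect the information on the host. The metadata key names (`kernel`, `system`, `release`, `version`) are hypothetical and not part of the specification.

```python
import platform


def host_kernel_metadata():
    """Return kernel details of the host as a dict ready to be merged into metadata.

    Assumption: the key names are placeholders, not defined by the spec.
    """
    info = platform.uname()
    return {
        "kernel": {
            "system": info.system,    # e.g. "Linux"
            "release": info.release,  # kernel release string
            "version": info.version,  # kernel build information
        }
    }
```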
relatedIdentifiers that refers to the main publications and possibly other ERC supplements with persistent identifiers.
ERCs could just as well use Singularity instead of Docker, and (cf. C4RR workshop) it might be very well suited for reproducible research. Add a statement to the developer guide.
Something along the lines of "this is how you can check before uploading if we can create a runtime manifest (using containerit) and what kind of metadata we will be able to extract (with o2r-meta extract)".
Shouldn't the structure of an empty affiliation be an empty array instead of null as defined here?
This would be in line with our definition and would provide a more consistent structure.
Fixes:
!!! tip “Example in bagit example is broken at erc-spec/spec/#bagit-outer-container
), e. g. at erc-spec/glossary/#discover
The "manipulation extension" draft was removed, content preserved here:
## UI bindings
How is the user interface defined?
## Using other data
Define in `erc.yml` which files are potential input data which can be exchanged.
```yml
id: adcd
manipulate:
  input_data:
    - filename: are.json
      format: geojson
    - filename: rs.tiff
      format: geotiff
```

Then: How is external data mounted into the container and where to (what are the paths)?
How are UI bindings validated/checked?
Re-check that the export and import can use the ERC identifier as it is in the spec.
id and make UUID a must
UUID must be created properly during ERC creation
image.tar name is a MUST not a SHOULD (also in "Docker container")
my_image.tar in load and myimages:latest as tag in the run statement)
docker cp erc:.../erc /host/path, which must be done so that an ERC can be validated
docker cp stuff with Validation
Describe how to handle large data files in an ERC using git and how this could be integrated with https://github.com/mjordan/GitBags
[These comments are based on notes and transcripts from a discussion of the ERC specification with publishing domain experts.]
In the long run, publish the repo, e.g. on Zenodo, and include a DOI for the spec on the pages.
Try out udocker to run an ERC, and if it works, write a short guide and discuss pros & cons.
TBD
Since BagIt is the desired outer container, an ERC can readily be zipped as a single file. Consider .erc as a file extension for the ZIP archive (instead of .zip).
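To illustrate the packaging step, here is a minimal sketch that zips a bag directory into a single file with the proposed `.erc` extension. The function name and the choice to store paths relative to the bag root are assumptions, not requirements from the spec.

```python
import zipfile
from pathlib import Path


def package_erc(bag_dir, target):
    """Write all files of a bag directory into a ZIP archive at `target`."""
    bag = Path(bag_dir)
    target = Path(target)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(bag.rglob("*")):
            # Skip directories and, if it lives inside the bag, the archive itself.
            if path.is_file() and path.resolve() != target.resolve():
                archive.write(path, path.relative_to(bag).as_posix())
    return target
```

For example, `package_erc("my_bag", "my_research.erc")` would produce an archive that any ZIP tool can still open, since only the file extension differs.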
Prefer checksums from cryptographic hash functions that have not yet been broken by collisions.
As soon as supported by the BagIt standard and implementations, we should go for sha3. BagIt is likely to support multiple hash functions and not require this high-quality one itself, see also LibraryOfCongress/bagit-python#86
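A minimal sketch of what SHA-3 payload checksums could look like, using Python's standard `hashlib`. It produces lines in the usual manifest layout of checksum followed by the file path relative to the bag root; whether and under which manifest file name BagIt tools would accept SHA-3 is left open, and the function name is hypothetical.

```python
import hashlib
from pathlib import Path


def manifest_lines(bag_dir, algorithm="sha3_256"):
    """Return 'checksum path' lines for every file below the bag's data/ directory."""
    bag = Path(bag_dir)
    lines = []
    for path in sorted((bag / "data").rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch, not for large data.
            digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
            lines.append(f"{digest} {path.relative_to(bag).as_posix()}")
    return lines
```

Passing a different `algorithm` name (for example `sha512`) keeps the same code usable while SHA-3 support in BagIt implementations is still pending.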
It might be unclear to anyone new to ERC how the extensions work and what is left if no extension is used.
The spec documentation should be updated to include a paragraph on this ("the extension concept") and reflect in its structure what is "base" and what is "extension".
Is there a possibility for a transparent mechanism to handle intermediate (computationally intensive) results?
Check with other R packages for handling workflows...
use https://github.com/jgrassler/mkdocs-pandoc to create single file, then https://github.com/jgrassler/mkdocs-pandoc#usage-example and http://pandoc.org/demos.html to create
spec_version!) > #10
.erc/erc_raw.yml
@7048730
Create a tool that allows running an ERC from a command-line interface:
erc create /directory /erc-dir
erc package /erc-dir my_research.zip
erc reproduce my_research.zip
Do it with golang :-).
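The three commands above could be sketched as a command-line interface as follows. This is only an illustration of the interface shape, written in Python rather than Go; the argument names and help texts are assumptions, and no command is actually implemented.

```python
import argparse


def build_parser():
    """Build an argument parser for the proposed create/package/reproduce commands."""
    parser = argparse.ArgumentParser(prog="erc")
    commands = parser.add_subparsers(dest="command", required=True)

    create = commands.add_parser("create", help="create an ERC from a directory")
    create.add_argument("directory")
    create.add_argument("erc_dir")

    package = commands.add_parser("package", help="package an ERC directory as an archive")
    package.add_argument("erc_dir")
    package.add_argument("archive")

    reproduce = commands.add_parser("reproduce", help="re-run the analysis of a packaged ERC")
    reproduce.add_argument("archive")
    return parser
```

Calling `build_parser().parse_args(["reproduce", "my_research.zip"])` yields a namespace with `command` and `archive`, which a dispatcher could map to the actual implementation.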
An ERC must be subject to human inspection.
How can we model and trace the involved people in a way that is open to scrutiny?
Examples: reviewers "sign" an ERC which they examined and evaluated, librarians add their signature on receiving and checking a submission to an archive, an author does a self-check and confirms "to his best knowledge" the ERC is OK. Could possibly be done by storing files in the .erc directory.
And blockchains are supposed to be good for this stuff, too...
We just discussed:
as top level interactions and hence items for the spec.
Tasks for dev branch:
The Open Container Initiative (OCI) develops an open specification for container runtime and image, see https://github.com/opencontainers/
The image spec (currently) has flexible annotations, which we can use.
Must analyse the actual differences to Docker first.