biocompute-objects / bco_documentation

Repository for documentation to support the IEEE 2791-2020 standard. Please see our home page for communications/publications:

Home Page: http://biocomputeobject.org/

License: BSD 3-Clause "New" or "Revised" License

Python 16.60% Shell 3.01% HTML 54.86% CSS 23.08% JavaScript 2.45%
standardization bioinformatics workflow science-communication hts-computations

bco_documentation's People

Contributors

corburn, fochtmanb, hadleyking, james-r-jones, jeet-vora, jpat1546, kee007ney, kkaragiannis, ktaletsk, mattheweber, mr-c, nanxstats, rajamazumder, rykahsay, stain, tianywan819, tiwa1154

bco_documentation's Issues

Are arbitrary extra keys allowed?

In the example for structured_name the arbitrary key taxonomy is introduced. (See #14)

Several other examples also use extra keys.

It might be powerful to allow BCO extensions to add more fields, but I thought we already had extension_domain for that.

It must be declared if arbitrary extra keys are allowed or not (and where). If they are allowed I would recommend they are namespaced so that they are not in conflict across vendors or future BCO versions.

Recommend UUIDs for BCO_id?

The field BCO_id is defined as

A unique identifier that should be applied to each BCO instance. These can be assigned by a BCO database engine. IDs should be URIs (expressed as a URN or URL). IDs should never be reused.

Hiroki Morizono suggested that we recommend using UUIDs (sometimes called GUIDs) as they are easy to generate and also to keep unique.

UUIDs can be URNs, e.g. urn:uuid:2bf8397b-9aa8-47f2-80a7-235653e8e824 (which are then not resolvable) or be used as part of an in-house identifier, http://repo.example.com/bco/2bf8397b-9aa8-47f2-80a7-235653e8e824

I don't think we should mandate which way, although I prefer the second form as it can be resolvable (e.g. click the hyperlink). We should probably only have a soft recommendation for UUIDs, something like:

It is RECOMMENDED that the BCO identifier is based on a UUID to ensure uniqueness, either as a location-independent URN (e.g. urn:uuid:2bf8397b-9aa8-47f2-80a7-235653e8e824) or as part of an identifier permalink, (e.g. http://repo.example.com/bco/2bf8397b-9aa8-47f2-80a7-235653e8e824)

A related question would be if a change of provenance_domain/version means the BCO_id should be changed or not.
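
A minimal sketch of the two identifier forms discussed above, using Python's standard `uuid` module. The base URL `http://repo.example.com/bco/` is the placeholder from the example, and the helper names `new_bco_urn` and `new_bco_permalink` are hypothetical, not part of any BCO tooling.

```python
import uuid

# Placeholder repository base taken from the example above; not a real service.
REPO_BASE = "http://repo.example.com/bco/"


def new_bco_urn() -> str:
    """Return a location-independent identifier of the form urn:uuid:<uuid>."""
    return uuid.uuid4().urn


def new_bco_permalink(base: str = REPO_BASE) -> str:
    """Return a resolvable identifier that embeds a fresh random UUID."""
    return base + str(uuid.uuid4())


urn_id = new_bco_urn()
url_id = new_bco_permalink()
print(urn_id)
print(url_id)
```

Either form keeps the "never reused" property as long as a new UUID is drawn for every BCO instance; only the permalink form can be dereferenced.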

Update environment_variables schema

@HadleyKing

environment_variables schema:

    "environment_variables": {
      "type": "object",
      "description": "Environmental parameters that are useful to configure the execution environment on the target platform.",
      "additionalProperties": false,
      "patternProperties": {
        "^[a-zA-Z_]+[a-zA-Z0-9_]*$": {
          "type": "string"
        }
      }
    }

The regex is based on the following:

http://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap08.html
Environment variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names. Uppercase and lowercase letters shall retain their unique identities and shall not be folded together. The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities.
Note:
Other applications may have difficulty dealing with environment variable names that start with a digit. For this reason, use of such names is not recommended anywhere.
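
As a rough illustration of what the `patternProperties` regex accepts, the sketch below applies the same pattern with Python's `re` module. The function name and the sample variables are made up for the example; a real validator would also check that every value is a string.

```python
import re

# Same pattern as in the schema fragment above.
ENV_NAME = re.compile(r"^[a-zA-Z_]+[a-zA-Z0-9_]*$")


def invalid_env_names(environment_variables: dict) -> list:
    """Return the keys that the patternProperties regex would reject."""
    return [name for name in environment_variables if not ENV_NAME.match(name)]


# Hypothetical example values.
env = {"HOSTTYPE": "x86_64-linux", "EDITOR": "vim", "2FAST": "no", "MY-VAR": "x"}
print(invalid_env_names(env))
```

Names starting with a digit or containing characters outside letters, digits and underscore are reported; lowercase names pass, in line with the quoted text reserving them for applications.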

script and script_access_type

I would drop script_access_type and define script simply as a URI:

https://tools.ietf.org/html/rfc8089
file:///path/to/rfc8089.txt

"script": {
    "$ref": "#/definitions/uri"
}

Allow relative URI in input_list/output_list "address"

input_list (and output_list) uses keys address and access_time which are not explained.

The text says:

...expressed as a URN or URL

However Joseph Nooraga comments:

Needs clarity. Is this indicating that all data being used needs to be addressable via HTTP, and should remain so for the life of the BCO?

@rajamazumder responded:

"or a unique location in a file system"? I forgot what the discussion around this was.

I think we do need to permit relative references here, see for example Dataset_BCO_example that uses relative URIs:

"input_list":[
  "human_protein_position_pmid_id_aminoacid_glytoucan_2018_09_04_07_51_27.txt"
],

To avoid BCO parsers having to second-guess if h:/file.txt is a URI or a file location we should say that this must be an absolute URI or a relative URI reference. If we say it is always like that it means for instance that spaces in filenames are always URI escaped and have / forward slashes:

```json
"input_list":[
   "nested%20folder/file_with_50%25_percent.txt"
],
```

It must be made clear that relative URIs are resolved relative to the location of the BCO JSON file and that file names must be assumed to be case-sensitive.

If this is found in D:\Submissions\bco15\bco.json then this would mean the file D:\Submissions\bco15\nested folder\file_with_50%_percent.txt, or file:///d:/Submissions/bco15/nested%20folder/file_with_50%25_percent.txt as an absolute (but local) file: URI.

This issue relates to packaging and distribution of BCOs which is currently undefined.
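
The resolution rule proposed in this issue can be sketched with the standard `urllib.parse` module. This assumes the location of the BCO JSON file is itself known as a URI; `resolve_address` is a hypothetical helper name.

```python
from urllib.parse import unquote, urljoin, urlsplit


def resolve_address(bco_location: str, address: str) -> str:
    """Resolve an absolute URI or relative URI reference against the BCO file's URI."""
    return urljoin(bco_location, address)


# Location of the BCO JSON file from the example above, written as a file: URI.
bco_uri = "file:///d:/Submissions/bco15/bco.json"
address = "nested%20folder/file_with_50%25_percent.txt"

absolute = resolve_address(bco_uri, address)
print(absolute)
# Percent-decoding the path recovers the on-disk name, space and '%' included.
print(unquote(urlsplit(absolute).path))
```

An address that is already an absolute URI passes through unchanged, so a parser needs only this one code path for both cases.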

UVP-BCO@c33e365 feedback

@HadleyKing following up on biocompute-objects/UVP-BCO@c33e365#commitcomment-31591403

pipeline_steps validation

Inserting "additionalProperties": false, into the pipeline_steps definition would catch errors such as the following prerequisites typo:

jsonschema.exceptions.ValidationError: Additional properties are not allowed ('prerequisites' was unexpected)
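
To show the idea without depending on a validator library, here is a simplified stand-in for the `"additionalProperties": false` check. It is not the `jsonschema` implementation; the set of declared keys is limited to fields named in this document and the real schema may declare more.

```python
# Keys taken from fields mentioned in this document; the real schema may declare more.
DECLARED_STEP_KEYS = {"step_number", "prerequisite", "input_list", "output_list"}


def unexpected_keys(step: dict, declared: set = DECLARED_STEP_KEYS) -> list:
    """Mimic "additionalProperties": false by listing keys the schema does not declare."""
    return sorted(key for key in step if key not in declared)


step = {
    "step_number": 1,
    "prerequisites": [],   # typo: the declared key is "prerequisite"
    "input_list": [],
    "output_list": [],
}
print(unexpected_keys(step))
```

Without such a check the misspelled key would be silently ignored during validation, which is the failure mode described above.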

uri-in-uri stutter

Fields such as the following could be flattened using the JSON Schema allOf property:

"software_prerequisites": [{
    "name": "BEDtools",
    "version": "2.17.0",
    "uri": {
        "uri": "http://example.com/example"
    }
}]

"software_prerequisites": {
  "type": "array",
  "description": "Minimal necessary prerequisites, library, tool versions needed to successfully run the script to produce BCO.",
  "items": {
    "allOf": [
      { "$ref": "biocomputeobject.json#/definitions/uri" },
      {
        "type": "object",
        "description": "A necessary prerequisite, library, or tool version.",
        "required": [ "name", "version" ],
        "additionalProperties": false,
        "properties": {
          "name": {
            "type": "string",
            "description": "Names of software prerequisites",
            "examples": [ "HIVE-hexagon" ]
          },
          "version": {
            "type": "string",
            "description": "Versions of the software prerequisites",
            "examples": [ "babajanian.1" ]
          }
        }
      }
    ]
  }
},

"software_prerequisites": [{
    "name": "BEDtools",
    "version": "2.17.0",
    "uri": "http://example.com/example"
}]

What is a derived_from objectId

derived_from says:

If the object is derived from another, this field will specify the parent object, in the form of the ‘objectid’. If the object inherits only from the base BioCompute Object or a type definition than the value here is null.

It is unclear what an objectid is. Is this different from the BCO_id of the other BCO (which we said was a URI)?

The example is shown with null - rather this should be shown with a value.

See also #9

Define keys of review object

The review object is only shown by example.

The sub-keys reviewer_comment and reviewer must be further defined. For instance, why is reviewer_comment a list?

The possible values for status should be defined as a bullet-point list rather than inline in the text, so that it is clear what the only possible values are.

Clarify script_access_type vs script

script_access_type feels a bit contrived, as it is defining the type of script key, making it hard to validate/use alone.

The script key however does not explain at all how an inline script can be used.

Other parts of BCO use sub-objects like

 "source": {
    "address": "http://example.com/file.txt"
}

It might make sense to do a similar approach here, where one can provide either address for URI (potentially relative #23 for files) or value for inline scripts (but preferably not both!) - thus one can remove script_access_type
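
A small sketch of the "either address or value, but not both" rule suggested above. The keys `address` and `value` come from this proposal; the function name and sample scripts are hypothetical.

```python
def script_kind(script: dict) -> str:
    """Classify a script object as 'address' (URI) or 'value' (inline); reject anything else."""
    has_address = "address" in script
    has_value = "value" in script
    if has_address == has_value:
        # Either both keys are present or neither is.
        raise ValueError("script needs exactly one of 'address' or 'value'")
    return "address" if has_address else "value"


by_reference = {"address": "http://example.com/workflow.cwl"}   # hypothetical URI
inline = {"value": "#!/bin/sh\necho hello\n"}                    # hypothetical inline script
print(script_kind(by_reference), script_kind(inline))
```

Because the kind is derived from which key is present, a separate script_access_type field carries no extra information.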

remove 'additionalProperties: false'

TODO: remove 'additionalProperties: false' after handling inherited types is resolved. In the meantime it was set to avoid forgetting to write a schema for a new property because it is silently ignored during validation

This should be done after resolving #11

script_driver values not clearly defined

script_driver defines a couple of example values:

hive, cwl-runner, shell.

The text needs to be refined to clearly define the possible values (and eventual extensions), rather than listing these as examples out of thin air.

This sentence feels out of place:

It is noteworthy to mention that scripts and script drivers by themselves can be objects. These objects can exist in internal (BCO) or external databases and be publicly or privately accessible.

What does this mean? We can instead use a { json } object here with undefined keys? Suggest to remove.

Recommend Semantic Versioning for version field?

The version field defines briefly what constitutes a change of a BCO.

We should recommend using https://semver.org/ (e.g. 1.2.0) so that there are also clear semantics for how to compare version numbers to determine which BCO is "newer" and in what way.

As pointed out in #8 we also need to be clear if such a change should constitute a change of BCO_id or not.
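
To illustrate why defined comparison semantics matter, the sketch below compares plain MAJOR.MINOR.PATCH strings numerically. It deliberately ignores the pre-release and build-metadata parts of Semantic Versioning, and the function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Split a plain MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_newer(candidate: str, reference: str) -> bool:
    """True if candidate has higher precedence than reference."""
    return parse_version(candidate) > parse_version(reference)


# Numeric comparison per component; a plain string comparison would disagree here.
print(is_newer("1.10.0", "1.2.0"))
```

With free-text version strings no such ordering can be computed, so a parser could not tell which of two BCOs is the later revision.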

Explain template language

The usability_domain seems to use some kind of template language with examples like [SO:0000694].

I think this is what is alluded to in external-references - but it must be made clear for each field where such [expansions] are allowed/expected or not, as well as defining their syntax.
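
One possible reading of the bracket syntax, sketched only to make the question concrete: treat `[namespace:identifier]` as a reference and expand it to an identifiers.org URL. The split on the first colon, the lowercasing of the namespace and the URL layout are all assumptions, since the specification does not define them.

```python
import re

# Assumed syntax: [namespace:identifier], where the namespace is everything before
# the first colon. The specification does not define this; it is a guess for illustration.
REFERENCE = re.compile(r"\[([A-Za-z0-9_.]+):([^\]\s]+)\]")


def expand_references(text: str) -> list:
    """Return an identifiers.org URL for every bracketed reference found in text."""
    return [
        "http://identifiers.org/{}/{}".format(ns.lower(), ident)
        for ns, ident in REFERENCE.findall(text)
    ]


example = "HCV1a [taxonomy:31646] ledipasvir [pubchem.compound:67505836] resistance"
print(expand_references(example))
```

Under this reading `[SO:0000694]` and `[so:SO:0000667]` expand differently (the first loses the `SO:` prefix in the identifier part), which is why the syntax needs to be pinned down per field.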

Execution Domain

Issues from execution-domain

  • Maybe script_access_type and script properties can be tied together (#26)
  • script.uri
    URI values require regex
  • pipeline_version should not be required (is listed as required in schema)
    (maybe it is not necessary here)
  • platform maybe not needed in this domain

In general, this domain needs more refinement

  • domain_prerequisites should be renamed (maybe "data_endpoint")
    domain_prerequisites.name should also be renamed
    regex for domain_prerequisites.url
  • env_parameters should be a simple dictionary instead of a list of dictionaries
    Should avoid saying "Environmental parameters" --> should be environment variables

How are templates expanded in structured_name?

structured_name seems to define a simple template system:

This field can refer to other fields within the same or other objects. For example, a string like "HCV1a [taxonomy:$taxonomy] mutation detection"

The syntax of the $magic variable is not defined, nor is it stated whether lookups are restricted to the direct neighbouring keys of provenance_domain.

The example also seems to imply that adding arbitrary non-namespaced keys like taxonomy is allowed, but nowhere else is this clarified.

This should be clarified or removed.

License of the BCO Specification repository?

What should be the license of this BCO_Specification repository? Presumably we want this to be Open Access as it's on GitHub and our IEEE Open Source Pilot part will reference this repo?

Normally for documents Creative Commons BY 4.0 is a good choice, however it is not recommended for software.

A technical specification with various JSON examples is somewhere in between a Document and Software. The line moves more towards Software when we introduce formal schemas that implementers might want to copy into their code.

So our license should presumably be something that is easy to integrate into other commercial and open source projects, like the Apache License v2.0, which conveniently also covers contributions and protects against patent traps.

Note that for relicensing we should ideally ask for permission from every BCO copyright holder as the previous Google Docs document never had a license or Intellectual Property section. In reality only those who contributed "substantial work" (e.g. a paragraph) would own copyright.

Conflict of interest

Note that I am probably biased above. I am both an Apache Software Foundation member and on the Common Workflow Language leadership team, which uses the Apache License for the CWL specifications.

Rephrase/remove confusing Seven Bridges script reference

script says:

This may be a reference to Galaxy Project or Seven Bridges Genomics pipeline, a Common Workflow Language (CWL) object in GitHub, a High-performance Integrated Virtual Environment (HIVE) computational service or any other type of script.

In a comment @mr-c says:

SBG exclusively used CWL, so that is redundant. This sentence confuses platform providers with workflow technologies/standards.

Suggestion is to remove or rephrase reference to Seven Bridges Genomics pipeline as a platform provider is different from a workflow technology.

(It might still make sense to link to the pipeline in SBG platform, but this link will most likely not be directly to the script and might need a different key)

ECO - Evidence and Conclusion Ontology

The error_domain should use ECO to describe the results.
This would mean updating the text as well as creating a good example. @rajamazumder and @openbox-bio are currently working on an example that may be a good test case.

Other suggestions/comments welcome.
Currently this is only described as follows:

The empirical error subdomain contains the limits of detectability, false positives, false negatives, statistical confidence of outcomes, etc. This can be measured by running the algorithm on multiple data samples of the usability domain or in carefully designed in-silico spiked data. For example, a set of spiked, well-characterized samples can be run through the algorithm to determine the false positives, negatives and limits of detection.
The algorithmic subdomain is descriptive of errors that originated by fuzziness of the algorithms, driven by stochastic processes, in dynamically parallelized multi-threaded executions, or in machine learning methodologies where the state of the machine can affect the outcome. This can be measured in repeatability experiments of multiple runs or using some rigorous mathematical modeling of the accumulated errors. For example: bootstrapping is frequently used with stochastic simulation based algorithms to accumulate sets of outcomes and estimate statistically significant variability for the results.

Maybe we could incorporate this elsewhere too?

digital signature: needs work

How to generate using MD5? (can't include a signature inside itself)

  • I recommend another field to specify the algorithm
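
One common way around the "signature inside itself" problem is to compute the digest over the BCO with the signature fields left out. The sketch below assumes `digital_signature` is a top-level key, uses a hypothetical `signature_algorithm` field for the recommended algorithm name, and uses sorted-key JSON as a stand-in for a proper canonical form. It produces a checksum, not a cryptographic signature tied to a key.

```python
import hashlib
import json


def bco_digest(bco: dict, algorithm: str = "sha256") -> str:
    """Hash the BCO content with the signature fields left out."""
    # "digital_signature" is the field discussed above; "signature_algorithm" is a
    # hypothetical name for the extra field recommended in this issue.
    content = {k: v for k, v in bco.items()
               if k not in ("digital_signature", "signature_algorithm")}
    # Sorted keys and fixed separators give the same bytes for the same content.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.new(algorithm, canonical.encode("utf-8")).hexdigest()


# Hypothetical, heavily shortened BCO.
bco = {"BCO_id": "urn:uuid:2bf8397b-9aa8-47f2-80a7-235653e8e824", "name": "example"}
bco["signature_algorithm"] = "sha256"
bco["digital_signature"] = bco_digest(bco)
print(bco["digital_signature"] == bco_digest(bco))
```

Recording the algorithm name next to the digest lets a verifier recompute the value without guessing, and allows MD5 to be replaced later without changing the field layout.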

Explain keys of prerequisite

prerequisite has inconsistent definition:

A list of text values to indicate any packages or prerequisites for running the tool used.

Yet it is shown as example with sub-keys that are not defined anywhere:

                    "prerequisite": [
                        {
                            "name": "Hepatitis C virus genotype 1", 
                            "source": {
                                "address": "http://www.ncbi.nlm.nih.gov/nuccore/22129792",
                                "access_time": "2017-01-24T09:40:17-0500"
                            }
                        }, 
                        {
                            "name": "Hepatitis C virus type 1b complete genome", 
                            "source": {
                                "address": "http://www.ncbi.nlm.nih.gov/nuccore/5420376",
                                "access_time": "2017-01-24T09:40:17-0500"
                            }
                        }, 

The text needs to explain the meaning of name, source, address and access_time. Some examples use other keys like uri, version and sha1_chksum, which are not defined anywhere.

If prerequisite is a free-for-all in terms of keys this should be defined, although it can be argued over how useful that will be.

Description domain

Issues from description_domain.md

  • xref.access_time
    regex for datetime, but this may already be validated in the first level
  • pipeline_steps.tools
    schema says it is required, spec text should say the same thing
  • pipeline_steps.tool
    is a redundant property (pipeline_steps should be a flat list of
    step objects)
  • pipeline_steps.step_number
    is an integer in the schema but the spec text has it as a string
  • pipeline_steps.prerequisite.uri.access, pipeline_steps.prerequisite.uri.address
    regex required
  • pipeline_steps.input_list,pipeline_steps.output_list
    regex for url values of these arrays

Develop a spec release protocol

We need to develop a spec release protocol so we can update in an orderly fashion. A few suggestions for what it should contain to start:

Also, where should these policies be stated? In the README, in a new user guide, or in a new document?

Avoid null fields, but declare if optional or required

Several fields are documented with null in their examples.

I think in general these fields should rather be optional, as otherwise we have some kind of distinction of a field missing vs. it being present and having null as value. Sometimes this is appropriate (e.g. unknown vs. nothing), but I don't think that is the intention here.

If some fields can be null and/or missing, then we need to be explicit about that, as parsers would need to handle that the value might not be there.
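
The distinction a parser has to handle can be shown in a few lines. `derived_from` is used here only because the specification shows it with null; the helper name is hypothetical.

```python
import json

# "derived_from" is used only as an example of a field shown with null in the spec.
with_null = json.loads('{"derived_from": null}')
missing = json.loads('{}')


def field_state(bco: dict, key: str) -> str:
    """Report whether a key is absent, present with null, or present with a value."""
    if key not in bco:
        return "missing"
    return "null" if bco[key] is None else "value"


print(field_state(with_null, "derived_from"), field_state(missing, "derived_from"))
# dict.get() hides the difference: both calls below return None.
print(with_null.get("derived_from"), missing.get("derived_from"))
```

If the specification does not intend the two states to mean different things, declaring the field optional and never emitting null removes the ambiguity.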

Fix inconsistencies in HCV1a.json example

The example HCV1a.json includes some keys not defined elsewhere (uri, sha1_chksum):

            {
                "name": "HIVE-heptagon", 
                "version": "albinoni.2",
                "uri": {
                    "address": "https://hive.biochemistry.gwu.edu/dna.cgi?cmd=dna-heptagon&cmdMode=-",
                    "access_time": "2017-01-24T09:40:17-0500",
                    "sha1_chksum": null
}

While the BCO does permit arbitrary keys for software_prerequisites, the example should only use values defined in the spec.

One error_domain is listed twice, with and without spaces (and different values!):

     "false positive mutation calls discovery": "<0.0005", 
     "false_positive_mutation_calls_discovery": "<0.00005", 

Access to FTP is used without hostname, but this behavior is not defined in domain_prerequisites

            {
                "name": "access to ftp", 
                "url": "ftp://:22/"
}, 

Similarly, this abstract example should be removed, as this "concrete" example does not want to access the "protocol" protocol:

            {
                "name": "generic name",
                "url": "protocol://domain:port/application/path"
            }

Access to HIVE should presumably extend beyond the login page:

            {
                "name": "HIVE", 
                "url": "https://hive.biochemistry.gwu.edu/dna.cgi?cmd=login"
}, 

so here the URL should be chopped at the first /

The script_access_type is text, yet a URI is provided for script:

        "script_access_type": "text",
        "script": ["https://example.com/workflows/antiviral_resistance_detection_hive.py"],

The script driver manual is undefined:

"script_driver": "manual",

The input/output URI examples have invalid hostname hive.biochemistry.gwu.edudata. These should either be neutral on http://example.com/ or actually work.

 "input_list": [
                        {
                            "address": "https://hive.biochemistry.gwu.edudata/514769/dnaAccessionBased.csv",
                            "access_time": "2017-01-24T09:40:17-0500"
                        }
], 

Some of the Sequence Ontology examples are missing SO: and thus don't work with http://identifiers.org/so/ according to external references expansion.

 "structured_name": "HCV1a [taxonomy:31646] ledipasvir 
       [pubchem.compound:67505836] resistance SNP 
       [so:0000694] detection",

"name": "Sequence Ontology",
"ids": ["0000048"], 

  "usability_domain": [
        "Identify baseline single nucleotide polymorphisms SNPs [SO:0000694], insertions [so:SO:0000667], and deletions [so:SO:0000045] that correlate with reduced ledipasvir [pubchem.compound:67505836] antiviral drug efficacy in Hepatitis C virus subtype 1 [taxonomy:31646]", 
],

Extension domain issues

Issues from extension_domain-fhir.md

  • fhir_endpoint regex for URL (since format is specified, maybe it is already being validated in the first level)
  • fhir_version regex [0-9]

Issues from extension_domain-scm.md

  • scm_extension.scm_repository, scm_extension.scm_preview
    regex for URL (maybe this is already being validated at the first level)
  • scm_extension.scm_preview is listed as required in schema but it
    should not be required
  • scm_extension.scm_path should not be uri-reference (as described in schema)

Remove reference to "BCO Server"

digital_signature introduces a term "BCO Server" which is not explained elsewhere:

The BCO server can provide an API validating the signature versus BCO content, allowing users to validate the signature "offline" on their own. The server will also must provide a reference to the signature creation algorithm, facilitating for greater interoperability.

This is very confusing as "BCO Server" is a new term here, and the specification does not elsewhere talk about how BCO APIs are meant to work.

Suggestion to remove that paragraph or to make a new top-level section about how BCOs are resolved/transferred.

platform: What are possible values?

platform is defined as:

The multi-value reference to a particular deployment of an existing platform where this BCO can be reproduced. A platform can be a bioinformatic platform such as Galaxy or HIVE or it can be a software package such as CASAVA or apps that includes multiple algorithms and software.

"platform": "HIVE"

  1. The example is not actually multi-value

  2. Where do the values for platform come from? This seems to be free-text, so are we OK with variable values like "Galaxy", "UseGalaxy" and "Galaxy Platform"?

provenance_domain obsolete or obsolete_after

The specification uses obsolete, but the schema uses obsolete_after:

./base_type_BioCompute.json:89: "obsolete_after" : {
./provenance-domain.md:17: "obsolete" : "2118-09-26T14:43:43-0400",
./provenance-domain.md:105:### 2.1.6 Obsolescence "obsolete"
./provenance-domain.md:110:"obsolete" : "2118-09-26T14:43:43-0400"
./HCV1a.json:33: "obsolete" : "2118-09-26T14:43:43-0400",
./user_guide.md:242: "obsolete_after" : {
./user_guide.md:461: "obsolete" : "2118-09-26T14:43:43-0400",

Refine github extension keys

The github extension is only explained by example.

The two keys github_repository and github_URI are not explained and seem to be partially overlapping. The camel_Case is also inconsistent.

Given that these are URLs I think the extension should support any source control repository, not just github.com, perhaps something like:

"extension_domain":{
  "scm_extension": {
    "scm_repository": "https://github.com/example/repo1",
    "scm_type": "git",
    "scm_branch": "c9ffea0b60fa3bcf8e138af7c99ca141a6b8fb21",
    "scm_path": "workflow/hive-viral-mutation-detection.cwl",
    "scm_preview": "https://github.com/example/repo1/blob/c9ffea0b60fa3bcf8e138af7c99ca141a6b8fb21/workflow/hive-viral-mutation-detection.cwl"
  }
}

Here's how Maven defines its scm metadata.

Parametric Domain

Issues from parametric_domain.md

  • 1) parametric_domain is not used for running/reproducing a bco e.g. not used by execution_domain
  • 2) the parameters exposed are NOT default values.
  • 3) automatically generated
  • 4) human readable
  • 5) Value HAS to be resolved before being populated (elaborate on this @Mazumder)
  • 6) Defined as:
"parametric_domain": [
    {"param": "name_of_parameter", "value": "value_of_parameter", "step": "step#"},
    {"param": "name_of_parameter", "value": "value_of_parameter", "step": "step#"}
]

Explain FHIR extension

The fhir extension is only shown by example; the keys are not explained. The camel_Casing is a bit inconsistent.

James Jones suggested:

Can address the camel_Case issues and describe the keys as follows:

FHIR_endpoint is a string containing the URL of the endpoint of the FHIR server containing the resource.

FHIR_resource is a string containing the type of resource used. A full list of permitted FHIR resources is available at http://hl7.org/fhir/resourcelist.html.

FHIR_ID is a string containing the server-specific identifier for the resource instance.

The full URL of each referenced FHIR object is the combined address of the form: FHIR_endpoint/FHIR_resource/FHIR_ID
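
The combination rule above is simple enough to sketch directly. The key names follow the suggestion; the endpoint, resource type and id values are invented for illustration, and stripping a trailing slash from the endpoint is an assumption about how implementations might want to behave.

```python
def fhir_url(extension: dict) -> str:
    """Join endpoint, resource type and id as FHIR_endpoint/FHIR_resource/FHIR_ID."""
    return "/".join([
        extension["FHIR_endpoint"].rstrip("/"),   # tolerate a trailing slash
        extension["FHIR_resource"],
        extension["FHIR_ID"],
    ])


# Hypothetical values for illustration only.
fhir_extension = {
    "FHIR_endpoint": "http://fhir.example.com/base/",
    "FHIR_resource": "Patient",
    "FHIR_ID": "21376",
}
print(fhir_url(fhir_extension))
```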

Avoid type mentions in parametric_domain

parametric_domain says:

All BCOs should inherit from the fundamental BioCompute data type and as such inherit all of the core fields described in document. Specific BioCompute types introduce specific fields designed to customize the use of pipelines for a particular use pattern. Please refer to documentation of individual scripts and specific BCO descriptions for details.

It is very unclear what these types are talking about and how such specific fields can be defined. Does this relate to #11?

Suggestion is to remove this paragraph and rather explain how parametric_domain reflects the configuration of other parts of the BCO; presumably the keys here relate to the names of individual pipeline_steps?

(Perhaps other keys are allowed? Some workflow systems like KNIME also have workflow-wide parameters)

Define BCO in JSON Schema?

Instead of the custom BCO data type schema with _type etc. (which itself does not have any documentation), we should use something like JSON Schema, which has multiple tools and validators.

Note that JSON Schema itself is working towards RFC so there might be finer details here changing, but I would argue it is still a more mature schema language for defining expected JSON types than the one we blindly use in primitives.json

Refine SCM extension keys

When opening issue #21 @stain said:

Given that these are URLs I think the extension should support any source control repository, not just github.com, perhaps something like:

"extension_domain":{
  "scm_extension": {
    "scm_repository": "https://github.com/example/repo1",
    "scm_type": "git",
    "scm_branch": "c9ffea0b60fa3bcf8e138af7c99ca141a6b8fb21",
    "scm_path": "workflow/hive-viral-mutation-detection.cwl",
    "scm_preview": "https://github.com/example/repo1/blob/c9ffea0b60fa3bcf8e138af7c99ca141a6b8fb21/workflow/hive-viral-mutation-detection.cwl"
  }
}

Here's how Maven defines its scm metadata.

As we have now changed the extension to scm_extension I felt the discussion should be continued on another thread.
The wording in extension-scm.md has been updated, and so has antiviral_resistance_detectionTypeDef.json, but only on the most superficial level. Each of the fields is simply described as a string.

Should we have a more comprehensive definition here?

How are BCOs packaged or transferred?

Relates a bit to #23 - how are BCOs serialized and transferred?

Is there a conventional file name for the BCO JSON? (bco.json springs to mind)

Is there a conventional path structure to contain a BCO and its sub-resources? (e.g. data/ ? )

Are BCO sub-resources (scripts, inputs, outputs) webby or contained in some kind of package? (alternative: snapshots of the webby resources)

It has been briefly mentioned that BCOs are meant to be submitted to FDA in the form of physical hard-drives. What form does this take? (We can rule it out of scope for this spec)

Should BCO packaging reuse existing standards like bagit or Research Object?

`BCO_id` -> why not `bco_id`

Taken from a closed issue, #30

* BCO_id in top level (why not bco_id ? )

I think that distinguishing the defining field in the BCO is important, and as such the CAPS are appropriate. If anyone disagrees, please convince me.

"bco_spec_version" as a URL?

Should the "bco_spec_version" field be expressed as a URL to the RELEASE of the version used to draft it? For example:

"bco_spec_version": "v1.1-draft1" 

Becomes:

"bco_spec_version": "https://github.com/biocompute-objects/BCO_Specification/releases/tag/v1.2"

What is a type?

type is explained as if one is already meant to know how types are declared.

As any object of type 'type,' it has its own fields: _type, _id, _inherits, name, title and description. Type of this JSON object is "antiviral_resistance_detection"

The meaning of these fields is nowhere explained, and neither is the typing system.

Suggestion is to remove the type field or to add a section that explains the typing system. data-typing.md might be an early attempt at this, but it has no technical information.

Keywords should not be nested

The keywords field is for some reason nested as a list of key/value maps whose values are lists.

      "keywords": [
            {
                "key": "search terms",
                "value": [
                    "HCV1a", 
                    "Ledipasvir", 
                    "antiviral resistance", 
                    "SNP", 
                    "amino acid substitutions"
                ]
            }
        ]

It is unclear what is the meaning of nestings like search terms and where such keys should be defined.

Keywords are normally not structured, so I would change this to a flat listing:

      "keywords": [
                    "HCV1a", 
                    "Ledipasvir", 
                    "antiviral resistance", 
                    "SNP", 
                    "amino acid substitutions"
        ]
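
For existing BCOs that already use the nested form, the migration to a flat list is mechanical. A sketch, assuming the nested entries always carry their terms under `value` as in the example above (the function name is hypothetical):

```python
def flatten_keywords(keywords: list) -> list:
    """Turn [{"key": ..., "value": [...]}, ...] into one flat list, keeping order."""
    flat = []
    for entry in keywords:
        if isinstance(entry, dict):
            flat.extend(entry.get("value", []))
        else:
            flat.append(entry)   # already a plain keyword
    return flat


nested = [{"key": "search terms", "value": ["HCV1a", "Ledipasvir", "SNP"]}]
print(flatten_keywords(nested))
```

The `key` labels such as "search terms" are dropped, which matches the argument above that their meaning is undefined.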
