nfdi4plants / arc-validate
Home of all the tools and libraries to create and run validation of ARCs
Home Page: https://nfdi4plants.github.io/arc-validate/
License: MIT License
I'm currently working on a few ARCs that keep running into failing CQC pipelines because ARC Commander (v0.5) and ARCitect (v≤0.07, based on ARCtrl) handle ISA structure, naming and referencing differently.
Collecting the cases here. Feel free to move this to a more relevant repo or discussion.
studies/<studyName>/isa.study.xlsx
<assayName>/isa.assay.xlsx
arc export, arc a list, arc a register
We want to standardize our error messages. To do so, an OBO-based ontology would be useful. This ontology should contain all errors that we use, structured into classes and linked via relations.
As there are currently some issues on GitLab CI with hooks etc., let's ignore most of the tests for now and only test whether the .git folder is present.
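A minimal sketch of this reduced check, assuming the ARC root directory is known. hasGitFolder is a hypothetical name, not an existing arc-validate function; it returns a short message naming the missing folder instead of an exception.

```fsharp
open System.IO

/// Hypothetical reduced check: succeeds only if <arcRoot>/.git exists as a directory.
let hasGitFolder (arcRoot: string) : Result<unit, string> =
    let gitPath = Path.Combine(arcRoot, ".git")
    if Directory.Exists gitPath then Ok ()
    else Error (sprintf "Folder was not present: \"%s\"" gitPath)
```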
When building an ARCGraph with a token list containing at least one User Comment, the following occurs:
System.Collections.Generic.KeyNotFoundException: The given key 'User Comment' was not present in the dictionary.
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at ARCExpect.ARCGraph.isPartOfHeader(IParam header, Dictionary`2 ontoGraph, IParam ip) in C:\Repos\nfdi4plants\arc-validate\src\ARCExpect\ARCGraph.fs:line 353
at [email protected](IParam ip) in C:\Repos\nfdi4plants\arc-validate\src\ARCExpect\ARCGraph.fs:line 383
at Microsoft.FSharp.Collections.SeqModule.Exists[T](FSharpFunc`2 predicate, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 671
at [email protected](Int32 i, Tuple`2 tupledArg) in C:\Repos\nfdi4plants\arc-validate\src\ARCExpect\ARCGraph.fs:line 383
at Microsoft.FSharp.Collections.Internal.IEnumerator.mapi@145.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 150
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
at Microsoft.FSharp.Collections.SeqModule.ToList[T](IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 1003
at ARCExpect.ARCGraph.constructIntermediateMetadataSubgraph(Dictionary`2 ontoGraph, IEnumerable`1 ips) in C:\Repos\nfdi4plants\arc-validate\src\ARCExpect\ARCGraph.fs:line 440
at [email protected](IEnumerable`1 ips)
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 130
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 130
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 130
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 130
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
at Microsoft.FSharp.Core.CompilerServices.RuntimeHelpers.takeOuter@319[T,TResult](ConcatEnumerator`2 x, Unit unitVar0) in D:\a\_work\1\s\src\FSharp.Core\seqcore.fs:line 320
at Microsoft.FSharp.Core.CompilerServices.RuntimeHelpers.takeOuter@319[T,TResult](ConcatEnumerator`2 x, Unit unitVar0) in D:\a\_work\1\s\src\FSharp.Core\seqcore.fs:line 320
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 130
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
at Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc`2 action, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 631
at <StartupCode$FSI_0010>.$FSI_0010.main@() in C:\Repos\nfdi4plants\arc-validate\playgrounds\qcPackage_prototypes\pride_prototype_v0.1.0.fsx:line 53
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
Stopped due to error
The problem probably lies in the isPartOfHeader function, which cannot find the User Comment term in the given structural ontology.
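Dictionary's indexer throws KeyNotFoundException for a missing key, while TryGetValue reports the miss instead. The sketch below treats unknown terms such as User Comment as "not part of the header" rather than crashing. The ontology is modelled as a hypothetical Dictionary<string, string list> (term name to parent terms); the real isPartOfHeader works on IParam values and an OBO-based graph, so this only illustrates the lookup pattern.

```fsharp
open System.Collections.Generic

/// Hypothetical stand-in for the structural ontology: term name -> names of its parent terms.
type OntoGraph = Dictionary<string, string list>

/// True if `term` is (transitively) part of `header`.
/// Unknown terms yield false instead of throwing KeyNotFoundException.
/// Assumes the graph is acyclic; a cycle would recurse forever.
let rec isPartOfHeaderSafe (header: string) (onto: OntoGraph) (term: string) : bool =
    match onto.TryGetValue term with
    | true, parents ->
        parents |> List.exists (fun p -> p = header || isPartOfHeaderSafe header onto p)
    | false, _ -> false
```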
https://github.com/kMutagene/AnyBadge.NET
This eliminates the need for python in the container.
Additionally, since we create the badge in the same environment in which the tests run, we have much more fine-grained control over the information available at badge creation.
The ORCID identifier must follow the scheme xxxx-xxxx-xxxx-xxxx. Otherwise, no valid record can be created in Invenio. This must be checked; otherwise, Archigator will create a "broken" Invenio record.
The current example is the ARCInvenioTesting repository, where "orcid1", "orcid2", etc. are entered as identifiers in the ISA file. Creating an Invenio record fails here.
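A minimal sketch of such a check, assuming a regex for the xxxx-xxxx-xxxx-xxxx scheme plus ORCID's ISO 7064 MOD 11-2 check character (the last character may be X). The function names are hypothetical, and Invenio's own validation may differ in detail.

```fsharp
open System.Text.RegularExpressions

// Hypothetical helpers, not part of arc-validate.
// Scheme xxxx-xxxx-xxxx-xxxx: 15 digits plus a check character (digit or 'X').
let orcidPattern = Regex(@"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")

/// ISO 7064 MOD 11-2 check character, computed over the first 15 digits.
let orcidCheckChar (first15: string) : char =
    let total = first15 |> Seq.fold (fun acc c -> (acc + int c - int '0') * 2) 0
    let result = (12 - total % 11) % 11
    if result = 10 then 'X' else char (int '0' + result)

/// True if the string matches the scheme and its check character is correct.
let isValidOrcid (orcid: string) : bool =
    if not (orcidPattern.IsMatch(orcid)) then false
    else
        let digits = orcid.Replace("-", "")
        orcidCheckChar (digits.Substring(0, 15)) = digits.[15]
```

Placeholders such as "orcid1" fail the pattern, and a typo in any digit is caught by the check character.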
This way, we can offload YAML parsing in CI jobs to the tool.
This will be needed for our data publication workflow, since all authors will get confirmation emails.
Currently, person is tested as one module, returning either success or a general error for the complete person.
In this case, we lose the information about which field of person is faulty.
We should change this behaviour to test all fields individually, such as person.validateName and person.validateEmail.
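A sketch of field-level validation that collects one message per faulty field instead of a single error for the whole person. The Person record and the validator bodies are hypothetical simplifications; only the names validateName and validateEmail come from the issue.

```fsharp
open System

/// Hypothetical, simplified person record; the real ISA person has more fields.
type Person = { LastName: string option; FirstName: string option; Email: string option }

/// Each field check reports its own error.
let validateName (p: Person) =
    match p.LastName, p.FirstName with
    | Some l, Some f when not (String.IsNullOrWhiteSpace l || String.IsNullOrWhiteSpace f) -> Ok ()
    | _ -> Error "Person: first and last name must be present"

let validateEmail (p: Person) =
    match p.Email with
    | Some e when e.Contains("@") -> Ok ()
    | _ -> Error "Person: email missing or malformed"

/// Runs all field checks and keeps every failure, so no information is lost.
let validatePerson (p: Person) : string list =
    [ validateName; validateEmail ]
    |> List.choose (fun check ->
        match check p with
        | Ok () -> None
        | Error msg -> Some msg)
```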
This is needed so that people can author validation packages in https://github.com/nfdi4plants/arc-validate-packages.
Also related: #38
Ideally, a validation package would look something like this:
#r "nuget: ARCValidation"
open ARCValidation
//<create a list of validation tests created via the API>
//eg:
let validationTests = [
ValidationTest.create(<API logic here>)
ValidationTest.create(<API logic here>)
]
//<end the script with an API function that applies the tests to an arc root path provided via cli args>
// This function would, for example, take a list of tests and apply them to the first CLI arg, which is checked to be a path
ARCValidation.runValidationTestsWithCLIArgs validationTests
Note that a top-level API like this can also be used for the internal default tests that will always be performed by the tool, e.g. structural metadata file validation
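One possible shape for such a top-level API, sketched under the assumption that a test is just a name plus a check against the ARC root path. ValidationTest and runValidationTestsWithCLIArgs below are hypothetical and do not reflect an existing ARCValidation package.

```fsharp
open System.IO

/// Hypothetical test type: a name plus a check against the ARC root path.
type ValidationTest =
    { Name: string
      Check: string -> Result<unit, string> }
    static member create (name: string, check: string -> Result<unit, string>) =
        { Name = name; Check = check }

/// Hypothetical runner: treats the first argument as the ARC root and applies all tests.
/// In a script, pass fsi.CommandLineArgs |> Array.tail (index 0 is the script path).
let runValidationTestsWithCLIArgs (tests: ValidationTest list) (argv: string[]) =
    match Array.tryHead argv with
    | Some root when Directory.Exists root ->
        tests |> List.map (fun t -> t.Name, t.Check root)
    | Some root -> failwithf "Not a directory: %s" root
    | None -> failwith "Usage: <script> <arc-root-path>"
```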
Error output that ends up in the JUnit file is too verbose. It contains the full exception message with stack trace. If the goal is to provide helpful insights into what is wrong in the ARC, I suggest cleaning up the error messages.
Here is an example of how this gets displayed (rendered JUnit file on GitLab):
Talking about an "entity" here is weird.
There is way too much text
Ideally, the error message should be short and simple, e.g.: Folder was not present: "./.git/hooks"
As a future improvement: maybe replace the "." signifying the ARC root with the path supplied via ARC_PATH.
Format error messages
Do not show exception text
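A sketch of a formatter producing the short message suggested above, with the optional ARC_PATH replacement. formatMissingFolder is hypothetical; the caller would obtain the root, e.g. via Option.ofObj (System.Environment.GetEnvironmentVariable "ARC_PATH").

```fsharp
/// Hypothetical formatter: a short, human-readable message instead of exception text.
/// If an ARC root path is known, it replaces the leading "." that denotes the ARC root.
let formatMissingFolder (arcPath: string option) (relativePath: string) : string =
    let shown =
        match arcPath with
        | Some root when relativePath.StartsWith("./") ->
            root.TrimEnd('/') + relativePath.Substring(1)
        | _ -> relativePath
    sprintf "Folder was not present: \"%s\"" shown
```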
Atm., the pipeline only checks whether persons have an email address. But since Invenio cannot parse emails in incorrect formats (e.g. "myName[at]server.domain") or text that is not an email at all, we must implement a check for a valid email format (i.e. containing "@" and ".").
CC: @Zerskk
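A minimal plausibility check along the lines described (exactly one "@", no whitespace, a "." in the domain part). The pattern is an assumption and deliberately much looser than RFC 5322; it only needs to reject the cases Invenio cannot parse.

```fsharp
open System.Text.RegularExpressions

/// Hypothetical check: local part, "@", domain containing at least one ".".
let emailPattern = Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$")

/// Leading/trailing whitespace is ignored before matching.
let isPlausibleEmail (s: string) : bool = emailPattern.IsMatch(s.Trim())
```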
Once the transition to the new API is done and stable, we should release them as NuGet packages.
ARC validation fails, both in the HUB and locally, whenever I add a person to a study using ARC Commander v0.5.0, as shown below. In these cases, arc-validate-results.xml is not written, whereas validation runs smoothly without the person added to the study.
STUDY CONTACTS | |
---|---|
Study Person Last Name | Schrader |
Study Person First Name | Andrea |
Study Person Mid Initials | |
Study Person Email | [email protected] |
Study Person Phone | |
Study Person Fax | |
Study Person Address | 50674 Cologne, Germany |
Study Person Affiliation | Data Science and Management |
Study Person Roles | contact person; data-curator; leads investigation; designs experiments; supervisor; performs experiments; analyses data |
Study Person Roles Term Accession Number | ;;;;;; |
Study Person Roles Term Source REF | SCORO; SCORO; SCORO; SCORO; SCORO; SCORO; SCORO |
Comment[] | 0000-0002-3879-7057 |
Atm., only ORCID numbers are allowed in validation. Expand this to also incorporate URLs to the ORCID website.
Also, try to detect (and ignore) leading and trailing whitespace.
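A sketch of a normalizer that trims whitespace and strips an orcid.org URL prefix before checking the xxxx-xxxx-xxxx-xxxx format (format only, no checksum). The accepted prefixes and the name tryNormalizeOrcid are assumptions.

```fsharp
open System
open System.Text.RegularExpressions

let orcidIdPattern = Regex(@"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")

/// Hypothetical normalizer: returns the bare identifier if the input, after
/// trimming and removing an assumed orcid.org URL prefix, matches the scheme.
let tryNormalizeOrcid (raw: string) : string option =
    let s = raw.Trim()
    let prefixes = [ "https://orcid.org/"; "http://orcid.org/"; "orcid.org/" ]
    let bare =
        prefixes
        |> List.tryFind (fun p -> s.StartsWith(p, StringComparison.OrdinalIgnoreCase))
        |> Option.map (fun p -> s.Substring(p.Length))
        |> Option.defaultValue s
    if orcidIdPattern.IsMatch(bare) then Some bare else None
```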
Would be nice to have them. Maybe when some time is left...
We need that functionality as part of a package that can be referenced in validation package scripts; therefore, it cannot stay in the tool exclusively.
Code is here:
arc-validate/src/arc-validate/APIs/ValidateAPI.fs
Lines 17 to 91 in 1cf4d0a
Make ORCID great again! valid if present.
Invenio is able to parse ORCIDs only if they are valid. But Invenio also allows ORCID fields to be empty. Thus, IF an ORCID is given, it MUST be valid.
CC: @Zerskk
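The "valid if present" rule can be sketched as follows: an absent or blank ORCID passes, and a given one must match the scheme (format check only). validateOptionalOrcid is a hypothetical name.

```fsharp
open System
open System.Text.RegularExpressions

let orcidFormat = Regex(@"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")

/// Empty is allowed (Invenio accepts empty ORCID fields); anything given must be valid.
let validateOptionalOrcid (orcid: string option) : Result<unit, string> =
    match orcid with
    | None -> Ok ()
    | Some s when String.IsNullOrWhiteSpace s -> Ok ()
    | Some s when orcidFormat.IsMatch(s.Trim()) -> Ok ()
    | Some s -> Error (sprintf "Invalid ORCID: \"%s\"" s)
```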
signature:
// performs action (e.g. ARCExpect.isValid) for each CvParam with the given term
// on class ByTerm:
static member forall (
term: CvTerm,
action: CvParam -> unit,
cvp: seq<CvParam>) = ...
reminder @omaus
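A sketch of how forall could be implemented against the signature above. CvTerm and CvParam here are hypothetical minimal records; the real types used via ARCTokenization are richer, and term equality there may be defined differently.

```fsharp
/// Hypothetical minimal stand-ins for the real CvTerm/CvParam types.
type CvTerm = { Accession: string; Name: string }
type CvParam = { Term: CvTerm; Value: string }

type ByTerm =
    /// Performs `action` (e.g. an ARCExpect check) for each CvParam whose term equals `term`.
    static member forall (term: CvTerm, action: CvParam -> unit, cvp: seq<CvParam>) =
        cvp
        |> Seq.filter (fun p -> p.Term = term)
        |> Seq.iter action
```

If action throws on a failed check, iteration stops at the first failing CvParam.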
A hard requirement for v2 is the consumption and execution of validation packages from the validation package index.
Basically, it has to perform 2 functions:
dotnet fsi
programmatically.
This API should subsequently be added as commands to the CLI tool.
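Running dotnet fsi programmatically can be done with System.Diagnostics.Process. A sketch, assuming a package is a single .fsx script that receives its arguments after the script path; fsiStartInfo and runFsi are hypothetical names (ProcessStartInfo.ArgumentList needs .NET Core 2.1 or later).

```fsharp
open System.Diagnostics

/// Builds the process description for `dotnet fsi <script> <args...>`.
let fsiStartInfo (scriptPath: string) (args: string list) =
    let psi = ProcessStartInfo("dotnet")
    psi.ArgumentList.Add("fsi")
    psi.ArgumentList.Add(scriptPath)
    for a in args do psi.ArgumentList.Add(a)
    psi.RedirectStandardOutput <- true
    psi.RedirectStandardError <- true
    psi.UseShellExecute <- false
    psi

/// Runs the script and returns exit code, stdout and stderr.
let runFsi (scriptPath: string) (args: string list) =
    use proc = Process.Start(fsiStartInfo scriptPath args)
    // Read stderr asynchronously so a full stderr buffer cannot block stdout reading.
    let errTask = proc.StandardError.ReadToEndAsync()
    let out = proc.StandardOutput.ReadToEnd()
    proc.WaitForExit()
    proc.ExitCode, out, errTask.Result
```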
This is more of a support request than an actual Issue, but I hope you can point me into the right direction.
I'm interested in incorporating the arc-validate pipeline into the arc-manager and re-use this for the Plantmicrobe project. Therefore I wanted to test your pipeline locally, but I'm facing some issues.
I pulled the latest Docker image from the repo and started the container locally. Then I copied this data into the container and executed the arc-validate validate command on it. However, no XML file or badge is generated, and there is no error message either. In an earlier version, where I had to build the image from source, this worked and I was able to generate a badge for the same example data.
This should be resolved by adding top-level API tests for the ARCValidationPackages package, which should closely mirror what the CLI tests do, but in a more observable manner.
While additional tests shall be provided via ARCValidationPackages, the core library shall provide basic validation tests that cover the ARC specification.
The command
bash arc-validate.sh -p /arc/cmqtl_val1_arc -v
run inside the Docker container gives me this error message:
+ arc-validate -p /arc/cmqtl_val1_arc -v
Internal Error:
The option value was None (Parameter 'option')
" at Microsoft.FSharp.Core.OptionModule.GetValue[T](FSharpOption`1 option) in D:\a\_work\1\s\src\FSharp.Core\option.fs:line 13
at ArcValidation.CvTokenHelperFunctions.CvContainer.isPartOfInvestigation@79-2.Invoke(ICvBase x)
at Microsoft.FSharp.Collections.SeqModule.Exists[T](FSharpFunc`2 predicate, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 641
at Microsoft.FSharp.Collections.SeqModule.Exists[T](FSharpFunc`2 predicate, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 641
at ArcValidation.InformationExtraction.Investigation.getContactsContainer@60.Invoke(CvContainer cv) in /opt/arc-validate/src/ArcValidation/InformationExtraction.fs:line 60
at Microsoft.FSharp.Collections.Internal.IEnumerator.next@246[T](FSharpFunc`2 f, IEnumerator`1 e, FSharpRef`1 started, Unit unitVar0) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 248
at Microsoft.FSharp.Collections.Internal.IEnumerator.filter@236.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 250
at Microsoft.FSharp.Collections.SeqModule.ToList[T](IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 966
at ArcValidation.TestGeneration.Critical.Arc.ISA.generateISATests(ArcConfig arcConfig) in /opt/arc-validate/src/ArcValidation/TestGeneration/Critical/ArcISA.fs:line 22
at ARCValidate.main(String[] argv) in /opt/arc-validate/src/arc-validate/Program.fs:line 29"
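The trace ends in Option.get, which throws an ArgumentException with exactly this message when called on None. Without seeing isPartOfInvestigation itself, the usual remedy can be sketched: use Option.exists (or a pattern match) so that a missing value means "no" instead of a crash. The Token type and its fields are hypothetical.

```fsharp
/// Hypothetical token whose enclosing section may be unknown.
type Token = { Name: string; Section: string option }

/// Crashes with "The option value was None" for tokens without a section.
let isPartOfInvestigationUnsafe (t: Token) = Option.get t.Section = "INVESTIGATION"

/// Total version: tokens without a section are simply not part of the investigation.
let isPartOfInvestigation (t: Token) =
    t.Section |> Option.exists (fun s -> s = "INVESTIGATION")
```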
ArcCommander currently uses $appdata$/DataPLANT
See source code:
let tryGetGlobalConfigPath () =
    let log = Logging.createLogger "IniDataTryGetGlobalConfigPathLog"
    // most of this part only remains for legacy reasons. Config file should not be downloaded and placed by the user (as before) but installed by the ArcCommander itself.
    let getFolderPath specialFolder inOwnFolder inCompanyFolder newName =
        Environment.GetFolderPath(specialFolder, Environment.SpecialFolderOption.DoNotVerify)
        |> fun x ->
            if inOwnFolder then
                Path.Combine(x, "ArcCommander", "config")
            elif inCompanyFolder && (not newName) then
                Path.Combine(x, "DataPLANT", "ArcCommander", "config")
            elif inCompanyFolder && newName then
                Path.Combine(x, "DataPLANT", "ArcCommander", "ArcCommander.config")
            else
                Path.Combine(x, "config")
Every InvestigationContact must have an Affiliation in order to have a publishable ARC.
A critical test that checks that the Affiliation field of every InvestigationContact is filled must be implemented urgently.
Every InvestigationContact should have an ORCID in order to have a higher-quality ARC.
A non-critical test that checks that the ORCID comment field of every InvestigationContact is filled should be implemented.
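A sketch of both checks with explicit severities: a missing Affiliation is critical and blocks publication, a missing ORCID is only reported. The Contact record and the Severity type are hypothetical simplifications of the ISA contact data.

```fsharp
open System

type Severity = Critical | NonCritical

/// Hypothetical, simplified contact; the ORCID would come from an ORCID comment field.
type Contact = { LastName: string; Affiliation: string option; Orcid: string option }

let isFilled (v: string option) = v |> Option.exists (fun s -> not (String.IsNullOrWhiteSpace s))

/// One (severity, message) entry per contact and missing field.
let checkContacts (contacts: Contact list) =
    [ for c in contacts do
        if not (isFilled c.Affiliation) then
            yield Critical, sprintf "%s: Affiliation is missing" c.LastName
        if not (isFilled c.Orcid) then
            yield NonCritical, sprintf "%s: ORCID is missing" c.LastName ]

/// Publishable only if there is no critical finding.
let isPublishable contacts =
    checkContacts contacts |> List.forall (fun (sev, _) -> sev <> Critical)
```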
OboGraph is a nice module that allows for creating FGraphs from OboOntologies. It's too nice to not have this in FsObo(Parser). Move it from here to there.
See also CSBiology/OBO.NET#17
Adding this specific case here as well.
This public ARC constantly fails the validation pipeline, as no XML file is written.
It contains extensive ISA metadata and has already been inspected more closely by @omaus.
It turned out that the ORCID entries were not the cause, as I had assumed; instead, parsing one cell of the isa.investigation.xlsx file failed. This did not cause a problem in the ARCPrototype, which only had a different value in that cell.
Maybe @omaus can specify this here, as it might be the cause for others as well when the validation pipeline fails.
The confusion of "tests" with "unit tests", when we are actually talking about validation cases, is way too prominent. This helps address that.
We need a function that returns a Regex pattern for lower and upper character limits of a given string.
(This is needed in the validation of some information that PRIDE validation package needs)
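A sketch of such a function. It uses [\s\S] so that line breaks count as characters, and \A/\z anchors because $ would also match before a trailing newline. The name lengthPattern is hypothetical, and "characters" here means UTF-16 code units, as usual in .NET.

```fsharp
open System.Text.RegularExpressions

/// Returns a pattern matching strings of minLen to maxLen characters (inclusive).
let lengthPattern (minLen: int) (maxLen: int) : string =
    if minLen < 0 || maxLen < minLen then
        invalidArg "maxLen" "expected 0 <= minLen <= maxLen"
    sprintf "\\A[\\s\\S]{%d,%d}\\z" minLen maxLen
```

Usage: Regex.IsMatch(title, lengthPattern 1 255) checks a field without building the pattern by hand each time.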
The functionality to use ARCs as a graph representation is very useful, not only in validation contexts. Therefore, it is advised to move it to a more fitting library like ARCTokenization.
See also nfdi4plants/ARCTokenization#34
Description is required as of nfdi4plants/archigator-backend#5
Title is required as of nfdi4plants/archigator-backend#5 and not present though expected!
Implement ASAP!
For easy access to metadata sheet information, the parsed CvParams should be assembled into a graph that represents the ARC structure.
In the same vein as #38, we should distribute the arc-validate CLI tool v2 as a binary and as a dotnet tool, in addition to the container.
Let's be consistent with ARCtrl and ARCTokenization, especially since this project has no nuget package released so far.
While testing corner cases of the ARC validation pipeline, the following error occurred. The test was done with a repository containing only a README.md (to test what happens if people create empty repos before pushing an ARC into them). This shows the output of both the arc-validate tool and the create-badge.py script:
+ arc-validate
Internal Error:
The option value was None (Parameter 'option')
" at Microsoft.FSharp.Core.OptionModule.GetValue[T](FSharpOption`1 option) in D:\a\_work\1\s\src\FSharp.Core\option.fs:line 13
at ArcValidation.CvTokenHelperFunctions.CvContainer.isPartOfInvestigation@79-2.Invoke(ICvBase x)
at Microsoft.FSharp.Collections.SeqModule.Exists[T](FSharpFunc`2 predicate, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 641
at Microsoft.FSharp.Collections.SeqModule.Exists[T](FSharpFunc`2 predicate, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 641
at ArcValidation.InformationExtraction.Investigation.getContactsContainer@60.Invoke(CvContainer cv) in /opt/arc-validate/src/ArcValidation/InformationExtraction.fs:line 60
at Microsoft.FSharp.Collections.Internal.IEnumerator.next@246[T](FSharpFunc`2 f, IEnumerator`1 e, FSharpRef`1 started, Unit unitVar0) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 248
at Microsoft.FSharp.Collections.Internal.IEnumerator.filter@236.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 250
at Microsoft.FSharp.Collections.SeqModule.ToList[T](IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 966
at ArcValidation.TestGeneration.Critical.Arc.ISA.generateISATests(ArcConfig arcConfig) in /opt/arc-validate/src/ArcValidation/TestGeneration/Critical/ArcISA.fs:line 22
at ARCValidate.main(String[] argv) in /opt/arc-validate/src/arc-validate/Program.fs:line 29"
$ /opt/arc-validate/create-badge.py
Traceback (most recent call last):
File "/opt/arc-validate/create-badge.py", line 9, in <module>
xml = JUnitXml.fromfile(xml_path)
File "/usr/local/lib/python3.9/dist-packages/junitparser/junitparser.py", line 751, in fromfile
tree = etree.parse(filepath) # nosec
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1229, in parse
tree.parse(source, parser)
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 569, in parse
source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './arc-validate-results.xml'
$ exit 0
Uploading artifacts for successful job 00:00
Uploading artifacts...
WARNING: arc-validate-results.xml: no matching files. Ensure that the artifact path is relative to the working directory (/builds/doniparthi1/Facultative-CAM-in-Talinum)
WARNING: arc-quality.svg: no matching files. Ensure that the artifact path is relative to the working directory (/builds/doniparthi1/Facultative-CAM-in-Talinum)
ERROR: No files to upload
Sounds like just an uncaught exception? The tool should always generate the arc-validate-results.xml so that the create-badge.py script can create a 0/x badge.
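A sketch of a wrapper that guarantees a results file: if validation throws, a minimal JUnit-style report with one failed test case is written instead. runAlwaysReporting and the report layout (testsuites/testsuite/testcase/failure) are assumptions based on the common JUnit XML shape, not the tool's actual output.

```fsharp
open System.IO
open System.Xml.Linq

/// Minimal JUnit-style report with a single failed test case carrying the error message.
let writeInternalErrorReport (path: string) (message: string) =
    let xn (s: string) = XName.Get s
    let failure = XElement(xn "failure", XAttribute(xn "message", message))
    let testcase = XElement(xn "testcase", XAttribute(xn "name", "internal error"), failure)
    let suite =
        XElement(xn "testsuite", XAttribute(xn "name", "arc-validate"),
                 XAttribute(xn "tests", 1), XAttribute(xn "failures", 1), testcase)
    let root = XElement(xn "testsuites", suite)
    root.Save(path)

/// Hypothetical wrapper around the entry point: on success the validation writes
/// its own results; on any exception a fallback report is written, so a badge can still be made.
let runAlwaysReporting (resultsPath: string) (validate: unit -> int) : int =
    try
        validate ()
    with ex ->
        writeInternalErrorReport resultsPath ex.Message
        1
```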
Atm., the V2 branch (which will be the basis for the next big validation remake) is filled with outdated and deprecated functionality. Time to get rid of that.
Installing and validating against multiple packages in bulk might be a nice addition.
Not sure about the naming, though. Reminder @omaus, this is the section:
Atm., metadata files are parsed via ArcGraphModel.IOs parsing functions. However, we use the ones that return ICvBase lists consisting of plain CvParams as well as CvParams that are already aggregated into CvContainers.
These aggregations hinder flat CvParam evaluation and make information extraction much more complicated than necessary. Therefore, it may be best to use small, newly implemented parsing functions that only return plain CvParams.
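To illustrate the difference, a sketch that flattens aggregated containers back into a plain CvParam list. CvParam and CvBase here are hypothetical stand-ins for the real CvParam/CvContainer/ICvBase types, which are classes rather than a discriminated union.

```fsharp
/// Hypothetical stand-ins: a leaf parameter or a container aggregating further items.
type CvParam = { Name: string; Value: string }

type CvBase =
    | Param of CvParam
    | Container of name: string * children: CvBase list

/// Recursively flattens containers so every CvParam can be evaluated on its own.
let rec flatten (items: CvBase list) : CvParam list =
    items
    |> List.collect (function
        | Param p -> [ p ]
        | Container (_, children) -> flatten children)
```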