
nfdi4plants / arctokenization


Definition of controlled vocabulary tokens and library to tokenize ARC metadata into these tokens

Home Page: https://nfdi4plants.github.io/ARCTokenization/

License: MIT License

Batchfile 0.03% Shell 0.05% F# 93.89% GLSL 6.03%

arctokenization's People

Contributors

freymaurer, hlweil, kmutagene, muehlhaus, omaus

Forkers

freymaurer

arctokenization's Issues

Add integration tests

We definitely need more tests, especially integration tests: some functionality is tested at the most atomic level, but this does not help us if the pipeline functions as a whole cannot parse, e.g., an Investigation file.

Check how missing values are handled

example:

We need some way of grouping the flat token list into column-based objects, e.g. persons.

A first idea would be to use cell address information.
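
The cell-address idea could look roughly like the sketch below. It is only an illustration: the `Token` record and the `groupByColumn` function are hypothetical and not part of the library, and it assumes every token already carries the row and column of the cell it was read from.

```fsharp
// Hypothetical token shape; the library's real token type is not shown in this issue.
type Token =
    { Name: string     // key of the metadata row, e.g. "Investigation Person Last Name"
      Value: string    // cell content
      Row: int         // row of the source cell (assumed to be available)
      Column: int }    // column of the source cell (assumed to be available)

/// Groups a flat token list into one token list per column,
/// ordered by column index and, within a column, by row index.
let groupByColumn (tokens: Token list) : (int * Token list) list =
    tokens
    |> List.groupBy (fun t -> t.Column)
    |> List.sortBy fst
    |> List.map (fun (col, ts) -> col, List.sortBy (fun t -> t.Row) ts)

// Made-up example data: two persons stored in columns 2 and 3.
let tokens =
    [ { Name = "Investigation Person Last Name"; Value = "Doe"; Row = 1; Column = 2 }
      { Name = "Investigation Person Last Name"; Value = "Roe"; Row = 1; Column = 3 }
      { Name = "Investigation Person First Name"; Value = "Jane"; Row = 2; Column = 2 }
      { Name = "Investigation Person First Name"; Value = "Richard"; Row = 2; Column = 3 } ]

let persons = groupByColumn tokens
```

Each element of `persons` then holds all tokens that belong to one column, which is the unit a person object would be built from.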

Improve parsers to cover more CvTerms

The changes seem fine for getting just a CvParam list. But don't forget to extend the list of KeyMappers if you want to transform all fields to CvParams. I didn't finish it up completely.
https://github.com/nfdi4plants/ArcGraphModel/blob/74c57e61aa5cd798cf0702f9daab053b14af32b2/src/ArcGraphModel.IO/ISA/KeyParser.fs#L38

Originally posted by @HLWeil in #17 (comment)

Edit by kMutagene

Things that need to happen for this (for structural terms in metadata sheet sections):

  • Port obo parser to separate library, reference it (done, see https://github.com/CSBiology/FsOboParser)
  • Create structural ontologies for
    • Investigation metadata sheets
    • Study metadata sheets
    • Assay metadata sheets
  • Create CvParams from all terms in a structural ontology
  • Add these terms to the matching logic that creates the flat token lists for metadata sheets
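
The last two steps could be sketched as follows. All names here (`Term`, `Param`, `termsByName`, `tokenize`) and the `EX:` term IDs are hypothetical placeholders; they are not taken from FsOboParser or from this library, and the real matching logic may well differ.

```fsharp
// Hypothetical stand-ins for an ontology term and for the two kinds of params.
type Term = { Id: string; Name: string }

type Param =
    | CvParam of id: string * name: string * value: string
    | UserParam of name: string * value: string

/// Builds a lookup from term name to term for all terms of a structural ontology.
let termsByName (terms: Term list) : Map<string, Term> =
    terms |> List.map (fun t -> t.Name, t) |> Map.ofList

/// Turns a (key, value) pair from a metadata sheet into a CvParam when the key
/// matches a term name, and into a UserParam otherwise.
let tokenize (lookup: Map<string, Term>) (key: string, value: string) : Param =
    match Map.tryFind key lookup with
    | Some t -> CvParam (t.Id, t.Name, value)
    | None -> UserParam (key, value)

// Made-up terms with placeholder IDs, standing in for a structural ontology.
let investigationTerms =
    [ { Id = "EX:0000001"; Name = "Investigation Title" }
      { Id = "EX:0000002"; Name = "Investigation Person Last Name" } ]

let lookup = termsByName investigationTerms

let flatTokens =
    [ "Investigation Title", "My investigation"
      "Some Unmapped Key", "some value" ]
    |> List.map (tokenize lookup)
```

With a lookup like this, a key only ends up as a UserParam when the structural ontology has no term for it, so missing terms become visible in the token list.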

[BUG] Comment <Investigation Person ORCID> is parsed as UserParam and has no CvTerm

It would definitely be better if it were parsed as a CvParam. Maybe allow Comments at all?

Also, ORCID is missing a CvTerm in the respective ontology.

The workaround:

let cvparams = 
    params      // this is an IParam list, parsed from an ISA.XLSX file
    |> List.map (
        fun p -> 
            match CvParam.tryCvParam p with
            | Some cvp -> cvp
            | None -> CvParam(p.ID, p.Name, p.RefUri, p.Value, p :?> CvAttributeCollection)
    )

[BUG] `parseTableColumn` and `parseColumn` naming is interchanged

https://github.com/nfdi4plants/ArcGraphModel/blob/74c57e61aa5cd798cf0702f9daab053b14af32b2/src/ArcGraphModel.IO/ISA/Worksheet.fs#L24-L37

The name is parseColumn but it parses the columns of the worksheet's first table while

https://github.com/nfdi4plants/ArcGraphModel/blob/74c57e61aa5cd798cf0702f9daab053b14af32b2/src/ArcGraphModel.IO/ISA/Worksheet.fs#L39-L52

parses the columns of the whole worksheet, ignoring tables, even though the name is parseTableColumns.

In other words, the two names are swapped.

Unravel tokenization

At the moment, much of the tokenization already packs some information into the CvAttribute collection, among other things. There are also some bugs in key parsing associated with this.
We need to disentangle this structural muddle and add more tests to make it more stable than it is now.
