Comments (9)

mpkocher commented on August 26, 2024

Here are some thoughts on a potential model. I believe it addresses the items you've outlined.

https://gist.github.com/mpkocher/79333f8e1f9059914b8bdcdee3f17cea

Let me know if you have any feedback.

from pbcore.

natechols commented on August 26, 2024

I agree that we should have a JSON file cataloging everything. The rest looks reasonable to me as well.

mpkocher commented on August 26, 2024

@mdsmith and @pezmaster31, any comments or feedback?

mdsmith commented on August 26, 2024

Sounds good to me, especially with API and CLI accessors. I support using JSON, but we may still see a proliferation of non-standardized names that encode the characteristics making certain datasets useful for boundary-case testing, e.g. 'lambda', 'lambda-two-movie', 'aligned-lambda-one-zmw', 'empty-bam', 'empty-aligned-bam', 'bam-missing-pbi', 'aligned-bam-missing-pbi', 'rs-cmph5', 'rs-bam', etc. We can either:

  1. accept that as an inevitability, because it isn't too bad now;
  2. have a separate "internal-test" file repo with these arbitrary combinations;
  3. disallow boundary-case test files, entirely or within reason, and force users to develop their own or generate them at test time; or
  4. develop a set of characteristics and allow for more advanced indexing (something like the pbi recarray).

We'll probably go with option 1, as it is what we have already and seems to work, but it is something to think about.
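
As a rough illustration of what option 4 could look like, the sketch below tags each file with a set of characteristics and selects files by tag rather than by name. The catalog layout, the `id` and `tags` field names, and the `find_by_tags` helper are all hypothetical; none of them come from pbcore or the gist.

```python
import json

# Hypothetical catalog: the "id" and "tags" fields are assumptions made for
# this sketch, not an existing schema. The ids reuse names from the list above.
CATALOG_JSON = """
[
  {"id": "lambda",                  "tags": ["lambda"]},
  {"id": "aligned-lambda-one-zmw",  "tags": ["lambda", "aligned", "one-zmw"]},
  {"id": "empty-bam",               "tags": ["empty"]},
  {"id": "empty-aligned-bam",       "tags": ["empty", "aligned"]},
  {"id": "aligned-bam-missing-pbi", "tags": ["aligned", "missing-pbi"]}
]
"""


def find_by_tags(catalog, *required):
    """Return the ids of entries that carry every requested tag."""
    wanted = set(required)
    return [entry["id"] for entry in catalog if wanted <= set(entry["tags"])]


catalog = json.loads(CATALOG_JSON)
print(find_by_tags(catalog, "aligned"))
print(find_by_tags(catalog, "empty", "aligned"))
```

With tags like these, a test could ask for "any empty, aligned file" without depending on a naming convention, at the cost of having to maintain the tag vocabulary.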

natechols commented on August 26, 2024

I mostly agree, with the caveat that we should restrict this repo to files that adhere to our specs. I think that includes the empty BAM, which is used in a lot of different tests anyway, but not, for example, an AlignmentSet with a missing pbi (although a BAM missing a pbi is potentially valid). We might want to clearly distinguish the corner cases from the canonical examples, for example by putting them in subdirectories (so AlignmentSet/other/empty_aligned.subreads.bam, etc.).
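
A minimal sketch of how a consumer might separate the two groups under that layout, assuming corner cases always sit under a directory named `other`. The `is_corner_case` helper and the second path are hypothetical, used only for illustration.

```python
from pathlib import PurePosixPath

CORNER_CASE_DIR = "other"  # directory name taken from the example path above


def is_corner_case(relative_path):
    """True if the file sits somewhere under an 'other' subdirectory."""
    return CORNER_CASE_DIR in PurePosixPath(relative_path).parent.parts


paths = [
    "AlignmentSet/other/empty_aligned.subreads.bam",
    "AlignmentSet/lambda.alignmentset.xml",  # hypothetical canonical file
]
canonical = [p for p in paths if not is_corner_case(p)]
corner_cases = [p for p in paths if is_corner_case(p)]
print(canonical)
print(corner_cases)
```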

mpkocher commented on August 26, 2024

Would adding a "description" field help address this issue?

natechols commented on August 26, 2024

Yes, we probably need that anyway.

mpkocher commented on August 26, 2024

In version 1, the general idea I was going for was to group files, within a bundle JSON file, by whatever metric or grouping makes sense for your application or use case.

For example, you could group by size or by "bad files", using huge.json and bad-alignmentsets.json.

However, I am open to other approaches to group within a bundle.
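
To make the bundle idea concrete, here is a hedged sketch of what a bundle file with a per-file "description" might look like and how it could be loaded. Every field name, the example path, and the `load_bundle` helper are assumptions for illustration; the gist has the actual proposal.

```python
import json

# Hypothetical content of a bundle file such as bad-alignmentsets.json.
# Field names and the path are assumptions, not the schema from the gist.
BUNDLE_JSON = """
{
  "bundle": "bad-alignmentsets",
  "files": [
    {
      "id": "aligned-bam-missing-pbi",
      "path": "AlignmentSet/other/aligned_missing_pbi.bam",
      "description": "Aligned BAM without its .pbi index"
    }
  ]
}
"""

REQUIRED_FIELDS = ("id", "path", "description")


def load_bundle(text):
    """Parse a bundle and index its files by id, rejecting incomplete entries."""
    bundle = json.loads(text)
    files = {}
    for entry in bundle["files"]:
        missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
        if missing:
            raise ValueError(f"{entry.get('id', '<no id>')}: missing {missing}")
        files[entry["id"]] = entry
    return files


files = load_bundle(BUNDLE_JSON)
print(files["aligned-bam-missing-pbi"]["description"])
```

Requiring a description at load time is one way to keep oddly named boundary-case files self-explanatory.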

mpkocher commented on August 26, 2024

Updated with a "description" field: https://gist.github.com/mpkocher/79333f8e1f9059914b8bdcdee3f17cea
