
Comments (2)

dalexander commented on August 26, 2024

TL;DR: don't touch the behavior for basic file types; negotiate what you want, for datasets.

For basic biofx file types, I consider a file to consist of "headers/metadata" (which can include, for example, the movie table in the cmp.h5 format, or the text headers in a GFF) and "records".

A file can be empty in two possible ways: 1) it is truly empty (zero length), or 2) it contains no records. For file types like FASTA that carry no metadata, the two cases coincide, but the distinction is important for file types that do.

A reader for a file type that requires metadata will rightly generate an error if metadata is not found--the file is ill-formed. But in any format, the mere absence of records is not intrinsically an error. It may be an error in the context of an application, but not at the library level.
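To make the distinction concrete, here is a minimal sketch of a reader for a GFF-style text file. It is not pbcore's reader: `read_gff_like` and `MissingMetadataError` are hypothetical names, and the rule "lines starting with `##` are headers" is a simplification made for illustration. The point is only that missing metadata raises, while zero records does not.

```python
import io


class MissingMetadataError(ValueError):
    """Hypothetical error: required header lines are absent (ill-formed file)."""


def read_gff_like(stream):
    """Return (headers, records) from a GFF-style text stream.

    Simplifying assumption: lines starting with '##' are headers and every
    other non-blank line is a tab-separated record.
    """
    headers, records = [], []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            headers.append(line)
        else:
            records.append(line.split("\t"))
    if not headers:
        # Truly empty (or header-less) input: the file is ill-formed.
        raise MissingMetadataError("no header lines found")
    # Zero records is not an error at the library level.
    return headers, records


# Headers but no records: valid, just empty of records.
headers, records = read_gff_like(io.StringIO("##gff-version 3\n"))
print(len(headers), len(records))
```

Whether an empty `records` list is acceptable is then left to the calling application.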

Pbcore's basic readers mostly adhere to this principle. The one exception is the cmp.h5 reader, which throws an exception on trying to open an empty file. The reason is that our ecosystem doesn't get empty cmp.h5 files right: when I've seen empty cmp.h5 files, they have invariably lacked important metadata tables. Ideally this would have been fixed upstream, but the cmp.h5 ship has already sailed.

Since the dataset concept bridges the application and data file levels, you are welcome to say that "by spec", a dataset of such and such type may not be empty, and then you can have the dataset reader throw an exception on finding such a file. But honestly, the problem is then at the writer level: such an invalid file should never be written. Adding the checks on the reader side is then just good defensive programming.

from pbcore.

dalexander commented on August 26, 2024

I will suggest one other possibility. We need to spell out the reader API that we intend client code to use (openAlignmentDataset, etc.). We can then add API methods like openNonemptyAlignmentDataset to centralize the error checking for client code that requires a nonempty dataset.
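A rough sketch of what such a centralized check could look like follows. The signature of openAlignmentDataset is not spelled out in this thread, so the sketch takes the opener as a parameter; `open_nonempty` and `EmptyDatasetError` are hypothetical names, and it assumes the returned dataset supports `len()`.

```python
class EmptyDatasetError(ValueError):
    """Hypothetical error: a dataset required to be nonempty has no records."""


def open_nonempty(opener, path, *args, **kwargs):
    """Open a dataset with `opener` and raise if it holds no records.

    Assumption: the object returned by `opener` supports len().
    """
    dataset = opener(path, *args, **kwargs)
    if len(dataset) == 0:
        # Release the underlying file, if the dataset exposes close().
        close = getattr(dataset, "close", None)
        if callable(close):
            close()
        raise EmptyDatasetError(f"dataset {path!r} contains no records")
    return dataset


# Stand-in opener for demonstration only: maps a path to a list of records.
def fake_opener(path):
    return {"full.xml": ["rec1", "rec2"], "empty.xml": []}[path]


print(len(open_nonempty(fake_opener, "full.xml")))
```

An openNonemptyAlignmentDataset could then be a thin wrapper that passes openAlignmentDataset as the opener, so client code that tolerates empty datasets keeps calling the plain opener unchanged.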

