This project forked from pacificbiosciences/pacbiotestdata

PacBioTestData

Small datasets for testing

Usage

This repository provides a selection of small-ish representative files produced by PacBio systems and software, suitable for running simple tests. To download and install as a Python module/cmdline tool:

  $ git clone https://github.com/PacificBiosciences/PacBioTestData.git
  $ cd PacBioTestData
  $ make install

You can then access the pbtestdata Python module programmatically:

  $ python
  >>> import pbtestdata
  >>> pbtestdata.get_file("subreads-bam")
  '/path/to/movie.subreads.bam'

Or use the command-line tool:

  $ pbdata --help
  usage: pbdata [-h] [-v] {show,get,validate} ...

  Utility and API for accessing representative PacBio data files for testing.
  Run 'pbdata show [--verbose]' to display a list of files sorted by type.

  positional arguments:
    {show,get,validate}

  optional arguments:
    -h, --help           show this help message and exit
    -v, --version        show program's version number and exit
  $ pbdata get subreads-bam
  /path/to/movie.subreads.bam

Other codebases may implement their own accessors by reading files.json, which can be captured in an environment variable:

  export PB_TEST_DATA_FILES="`pbdata path`"
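
As a rough sketch of such an accessor, the Python below reads the location of files.json from PB_TEST_DATA_FILES and looks up one entry by ID. The layout of files.json is not described in this README, so the code assumes a JSON list of objects with "id" and "path" keys, with each path relative to the directory that holds files.json. The function name get_test_file is hypothetical. Check the actual file before relying on any of this.

```python
import json
import os


def get_test_file(file_id, env_var="PB_TEST_DATA_FILES"):
    """Return the absolute path registered for file_id.

    ASSUMPTION: files.json is a JSON list of objects with "id" and "path"
    keys, and each path is relative to the directory holding files.json.
    """
    index_path = os.environ[env_var]  # raises KeyError if the variable is unset
    with open(index_path) as handle:
        entries = json.load(handle)
    base_dir = os.path.dirname(os.path.abspath(index_path))
    for entry in entries:
        if entry["id"] == file_id:
            return os.path.join(base_dir, entry["path"])
    raise KeyError("no entry with id %r in %s" % (file_id, index_path))
```

With the environment variable exported as above, get_test_file("subreads-bam") would return the full path to the registered file, provided the assumed layout matches.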

Adding data

This repo should only be used for relatively compact (< 100KB), commonly used files in officially supported formats. When adding files, please follow these guidelines:

  • Any file that needs to be accessed directly should have an entry in data/files.json. This should not include index files; however, in some cases both the DataSet XML and the underlying BAM or FASTA files may be retrievable.
  • Accessor IDs should be simple and hyphen-separated; see data/files.json for examples.
  • All BAM, FASTA, and DataSet XML files should be compliant with the PacBio file format specifications; use pbvalidate to check compliance.
  • PacBio DataSet XML should always be generated with relative paths.
  • The dataset name should match the accessor ID in files.json.
  • BAM files should always have an accompanying PacBio index (.pbi file).
  • BAM indices created by samtools (.bai files) are optional.
  • FASTA files should always have an accompanying .fai index (from samtools).
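
Some of these guidelines can be checked mechanically before committing. The sketch below is illustrative only and is not part of this repository: the helper names are hypothetical, the regular expression is one possible reading of "simple and hyphen-separated", and index files are assumed to be named by appending .pbi or .fai to the data file name. It does not replace pbvalidate, which checks format compliance.

```python
import os
import re

# ASSUMPTION: "simple and hyphen-separated" is read here as lowercase
# alphanumeric words joined by single hyphens.
ID_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

# Required companion index for each data file extension.
# ASSUMPTION: only the .bam and .fasta extensions are handled.
REQUIRED_INDEX = {".bam": ".pbi", ".fasta": ".fai"}


def is_valid_id(file_id):
    """Return True if file_id looks like a hyphen-separated accessor ID."""
    return ID_PATTERN.match(file_id) is not None


def missing_index(path):
    """Return the expected index path if it is absent, else None."""
    for ext, index_ext in REQUIRED_INDEX.items():
        if path.endswith(ext):
            index_path = path + index_ext
            return None if os.path.exists(index_path) else index_path
    return None  # no index required for this file type
```

For example, missing_index("movie.subreads.bam") returns "movie.subreads.bam.pbi" when that index file does not exist, and is_valid_id("subreads_bam") returns False because of the underscore.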

Disclaimer

THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.

Contributors

natechols, mdsmith
