HDF5 is for Lovers

Bio

Anthony Scopatz is a computational nuclear engineer/physicist and postdoctoral scholar at the FLASH Center at the University of Chicago. His first workshop teaching experience came from instructing bootcamps for The Hacker Within, a peer-led teaching organization at the University of Wisconsin. Out of this grew a collaboration with Greg Wilson teaching Software Carpentry bootcamps. During his tenure at Enthought, Inc., Anthony taught many week-long courses (roughly one per month) on scientific computing in Python.

Track

This tutorial was conceived as an advanced-track tutorial. However, it could be recast as an introductory one if the program committee desires.

Description

HDF5 is a hierarchical, binary database format that has become a de facto standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays), it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations. Moreover, HDF5 bindings exist for almost every language, including two Python libraries (PyTables and h5py).
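
Several of the features listed above map directly onto keyword arguments in h5py, one of the two Python bindings named here. The sketch below assumes a modern Python 3 environment with h5py and NumPy installed (newer than the Python 2.7 / PyTables 2.3 stack listed under Packages Required); the file name, dataset name, chunk shape, and gzip level are arbitrary illustrative choices. It shows a chunked, compressed, extensible dataset and a partial selection that reads back a single column.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative throwaway file; any writable path would do.
    path = os.path.join(tmp, "example.h5")

    with h5py.File(path, "w") as f:
        # Chunked, gzip-compressed dataset; maxshape=(None, 3) lets axis 0 grow.
        dset = f.create_dataset(
            "measurements",
            shape=(0, 3),
            maxshape=(None, 3),
            chunks=(1024, 3),
            dtype="f8",
            compression="gzip",
            compression_opts=4,
        )
        # Append a batch: grow the dataset first, then write into the new rows.
        batch = np.arange(30, dtype="f8").reshape(10, 3)
        n = dset.shape[0]
        dset.resize(n + len(batch), axis=0)
        dset[n:] = batch

    with h5py.File(path, "r") as f:
        dset = f["measurements"]
        shape, chunks, compression = dset.shape, dset.chunks, dset.compression
        # A selection reads only the requested column, not the whole dataset.
        middle_column = dset[:, 1]

print(shape, chunks, compression)
print(middle_column)
```

Because axis 0 is unlimited, the same resize-then-write pattern can be repeated for every new batch without rewriting existing data.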

This tutorial will discuss tools, strategies, and hacks for squeezing every ounce of performance out of HDF5 in new or existing projects. It will also cover fundamental limitations of the specification and present creative, subtle strategies for working around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application, making both the code and the data faster and smaller. With such powerful features at the developer's disposal, what is not to love?!

This tutorial is targeted at an advanced audience with prior knowledge of Python and NumPy. Knowledge of C or C++ and basic HDF5 is recommended but not required.

Outline

  • Meaning in layout (20 min)

    • Tips for choosing your hierarchy
  • Advanced datatypes (20 min)

    • Tables
    • Nested types
    • Tricks with malloc() and byte-counting
  • Exercise on above topics (20 min)

  • Chunking (20 min)

    • How it works
    • How to properly select your chunksize
  • Queries and Selections (20 min)

    • In-core vs out-of-core calculations
    • PyTables Table.where()
    • Datasets vs Dataspaces
  • Exercise on above topics (20 min)

  • The Starving CPU Problem (1 hr)

    • Why you should always use compression
    • Compression algorithms available
    • Choosing the correct one
    • Exercise
  • Integration with other databases (1 hr)

    • Migrating to/from SQL
    • HDF5 in other databases (JSON example)
    • Other databases in HDF5 (JSON example)
    • Exercise
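
To make the chunking item ("How to properly select your chunksize") concrete, one option is a simple byte-budget heuristic. The function below is a hypothetical sketch, not the method this tutorial teaches and not any library's built-in algorithm: it starts from the full dataset shape and repeatedly halves the largest axis until a single chunk fits within an assumed 64 KiB target. The resulting tuple could be passed as `chunks=` to h5py's `create_dataset` or as `chunkshape=` in PyTables.

```python
import math


def guess_chunk_shape(shape, itemsize, target_bytes=64 * 1024):
    """Hypothetical heuristic: halve the largest axis until one chunk fits.

    shape        -- dataset shape; for an extensible axis, pass the expected length
    itemsize     -- bytes per element (e.g. 8 for float64)
    target_bytes -- assumed per-chunk byte budget (64 KiB is an arbitrary choice)
    """
    # Zero-length axes (e.g. an empty extensible dataset) still need chunk size >= 1.
    chunk = [max(1, d) for d in shape]
    while math.prod(chunk) * itemsize > target_bytes and max(chunk) > 1:
        i = chunk.index(max(chunk))      # first axis with the largest extent
        chunk[i] = (chunk[i] + 1) // 2   # halve it, rounding up
    return tuple(chunk)


print(guess_chunk_shape((1_000_000, 3), 8))
print(guess_chunk_shape((1000, 1000), 8))
```

For a million rows of three float64 values this yields `(1954, 3)`, about 46 KB per chunk. A real choice should also weigh the expected access pattern, especially with compression, since a compressed chunk must be decompressed in full to read any element of it.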

Packages Required

This tutorial will require Python 2.7, IPython 0.12+, NumPy 1.5+, and PyTables 2.3+. ViTables and Matplotlib are also recommended. These packages are available from most Linux package managers, as well as through EPD (the Enthought Python Distribution) or easy_install, although ViTables may need to be installed separately.
