cloud-optimized-icesat2's People

Contributors

andypbarrett, asteiker, betolink

Forkers

whigg

cloud-optimized-icesat2's Issues

Develop summary write up

Include the following components:

  • Summary of previous work
    • Why working with HDF5 in the cloud is complicated
      • Brief History of HDF
      • Latency in the cloud
      • Python drivers and I/O-bound tasks
  • Working with HDF5 in the cloud using open source tools
    • GDAL
    • h5py
    • H5Coro
    • Kerchunk
  • Performance considerations for HDF5 in the cloud and paged aggregation of nested metadata
    • HDF5 and nested metadata
    • h5repack and chunk sizes
    • Simple benchmarking
  • Potential Cloud Optimized Formats for HDF5 datasets, the ATL03 case.
    • Raster data and N-dimensional: clear path forward
    • Point Cloud and hybrid datasets: ?
      • Zarr
      • Columnar formats for point cloud data: GeoParquet/Arrow
      • Cloud Native Data formats for PCD: https://copc.io/
    • Lessons learned
  • Benchmarking results
  • Downstream processing pipeline considerations
  • Target audience:
    • ESDIS, DAAC management
    • ICESat-2 Science Team
    • Consider providing to NISAR community
  • ATL14/15 (COG)? Provide recommendations to help conversion in future release?
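
The "Latency in the cloud" and "paged aggregation" items above can be illustrated with a rough back-of-the-envelope model. This is only a sketch: it assumes requests are issued one after another, that each request pays a fixed round-trip latency, and that total time is roughly the number of requests times latency plus transfer time. The function name and all numbers below are hypothetical and are not measurements from this project.

```python
import math


def estimated_read_seconds(total_bytes, request_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate for sequential reads over the network.

    Each request pays one round trip (latency_s); the payload itself is
    transferred at bandwidth_bytes_per_s.
    """
    n_requests = math.ceil(total_bytes / request_bytes)
    return n_requests * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical values, chosen only to show the shape of the problem.
metadata_bytes = 8 * 1024 * 1024      # assumed amount of metadata in a file
latency_s = 0.05                      # assumed round-trip latency per request
bandwidth = 100 * 1024 * 1024         # assumed throughput in bytes per second

# Metadata scattered through the file and fetched in many small requests.
small_reads = estimated_read_seconds(metadata_bytes, 4 * 1024, latency_s, bandwidth)

# Metadata aggregated into one contiguous block and fetched in one request.
paged_reads = estimated_read_seconds(metadata_bytes, 8 * 1024 * 1024, latency_s, bandwidth)

print(f"many small requests: {small_reads:.2f} s")
print(f"one aggregated request: {paged_reads:.2f} s")
```

Under these assumptions the per-request latency, not the bandwidth, dominates the scattered case, which is the motivation for aggregating metadata and choosing larger chunk sizes.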

Develop benchmarking criteria for consistent comparison across format options

Candidate criteria:

  • Formats / chunking schemes to compare
    • Re-chunked HDF5
    • Cloud-optimized HDF5
    • GeoParquet
    • Zarr
    • Kerchunk JSON
    • h5coro
  • Environment
    • CryoCloud - Small instance
    • Assume we'll store all example files in CryoCloud (i.e. Sync or shared_public)
  • Libraries or clients used to open/read data
  • For each format option:
    • Dataset(s)
      • Based on community feedback/discussion, initial focus on ATL03
    • Files
      • Single and multiple? File sizes can vary by several GB; ideally produce and test 10 files
    • Variable(s)
    • Spatial subset(s)
    • Temporal subset(s)
    • Aggregation
    • End-to-end wall clock time
      • Time to re-chunk or reformat
      • Time to open/read file
        • Multiple tools/libraries/clients to compare per format option?
          • Geopandas, xarray
          • Should we consider Dask DataFrame?
    • Compute cost
    • Do we include a real-world example?
      • Time series of 60-day repeat cycle
      • Real-world example tie-in: Jakobshavn surface height
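
One way the "time to open/read file" criterion could be measured consistently across format options is a small timing harness like the sketch below. It uses only the Python standard library. The names `time_call` and `run_benchmark`, the format labels, and the placeholder readers are hypothetical; real readers would wrap whichever library is under test (for example an xarray or GeoPandas open call) in a zero-argument function.

```python
import statistics
import time


def time_call(func, repeats=3):
    """Return wall-clock durations in seconds for `repeats` calls of func()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def run_benchmark(readers, repeats=3):
    """Time each reader; `readers` maps a format label to a zero-arg callable."""
    results = {}
    for label, reader in readers.items():
        durations = time_call(reader, repeats)
        results[label] = {
            "min_s": min(durations),
            "median_s": statistics.median(durations),
            "runs": len(durations),
        }
    return results


# Placeholder readers standing in for real open/read calls (assumption:
# each real reader would open one test file and load the chosen variable).
def read_rechunked_hdf5():
    time.sleep(0.01)


def read_geoparquet():
    time.sleep(0.005)


results = run_benchmark(
    {"rechunked-hdf5": read_rechunked_hdf5, "geoparquet": read_geoparquet},
    repeats=3,
)
for label, stats in results.items():
    print(f"{label}: median {stats['median_s']:.4f} s over {stats['runs']} runs")
```

Reporting both the minimum and the median over several runs helps separate cold-cache effects from steady-state read times; the time to re-chunk or reformat would be measured separately and added for an end-to-end figure.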