gwlucastrig / gridfour

Tools for raster data including geophysical applications and digital elevation models

License: MIT License

Language: Java (100%)
Topics: raster-data, java, bathymetry, compression-algorithm, digital-elevation-model, data-compression

gridfour's Introduction

The Gridfour Software Project

Tools for raster data including scientific and geophysical applications.

Documentation

We have two main documentation pages:

  1. The Gridfour Project Notes give information on the underlying concepts and algorithms used by this project. The Notes page isn't just about Gridfour. It covers ideas and topics related to raster data processing in general.

  2. The Gridfour Wiki gives lots of helpful information on using Gridfour software including our Gridfour Virtual Raster Store (GVRS). It also gives information about our project goals and roadmap.

Background

Although there are many tools for image processing and gridded-data applications, the Gridfour Project believes that there is still a need for general-purpose software utilities for the processing of raster (grid) products. Potential applications in these areas run the gamut from rendering and data compression to contouring, surface analysis, and other operations aimed at analyzing and understanding data stored in raster form.

Our Inspiration

Recently, there has been a lot of news about the Seabed 2030 Project. That ambitious undertaking aims to map 100 percent of the ocean floor by 2030. To put that in perspective, the project organizers estimate that, today, only about 20 percent of the world's oceans are fully mapped (see the Seabed 2030 FAQ). So there's a lot of work to be done in the next decade.

One thing is certain: the existence of projects like Seabed 2030 will result in massive collections of grid-based (raster) data sets. Seabed 2030 itself will include about 7.6 billion grid points [1]. There is a need for software libraries that can assist in the processing of that data. In particular, we see a need for more efficient techniques for storage and data compression for grid data. That need inspired us to create Gridfour.

An Old Idea Made New

Gridfour/GEBCO 2019 shaded-relief rendering of Oahu

The first module created for the Gridfour Software Project is the Gridfour Virtual Raster Store (GVRS), a grid-based data compression and file management system. The GVRS module helps Java applications manage raster (grid) data in situations where the size of the data exceeds what could reasonably be kept in memory. It also provides a file-based utility for the persistent storage of data between application sessions and for long-term archiving. And, finally, it includes custom data compression that significantly reduces the storage size required for raster data.

Some of the algorithms used in GVRS have been around for a long time. Our data compression techniques were originally developed for a project named Gem93 that was completed in 1993. Gem93 included a number of tools for working with raster data, among them a data compression technique inspired by the work of Kidner and Smith (1992). You can read more about them at our project documentation page Gridfour Raster Data Compression Algorithms.

Of course, the state of the art has advanced quite a bit since 1993. And although the foundation for GVRS comes from old ideas, we hope you find that our API provides a fresh take on their implementation. We intend our GVRS library to provide a convenient tool for investigators developing new techniques for compressing geophysical and scientific data in raster form. GVRS makes it very easy to extend the Gridfour code and add new data compression capabilities (to see just how easy it is, visit our wiki page on Custom Data Compressors). Our hope is that by providing GVRS, we will help investigators focus on their own research and leave the details of memory and file management to the Gridfour API.

Help Wanted

We are finishing up the initial implementation of GVRS. We are looking for developers interested in porting it to additional languages (such as C#, C++, Rust) and creating Python bindings. We are also looking for users with ideas for new features and how to apply the library.

Things to Come

The Gridfour Software Project is still in its infancy. There is a lot of opportunity for new ideas and new software development. In the future we hope to add implementations of contouring, statistical analysis, and physical modeling logic to our collection. We are also building tools to simplify access to data from the Shuttle Radar Topography Mission (SRTM) and the U.S. Geological Survey's high-resolution 3D Elevation Program.

In the meantime, you are welcome to visit our companion Tinfour Software Project at https://github.com/gwlucastrig/Tinfour

Finally, we end with a picture that was created using GVRS and a set of elevation and bathymetry data taken from the GEBCO_2019 global data set. The picture shows a shaded-relief rendering of the Island of Hokkaido, Japan. GEBCO_2019 was one of the data sets used for the GVRS pilot project and a good example of the potential of systems like it. Color-coding was based on elevation obtained from a GVRS file, and shading was computed using the surface normal obtained with Gridfour's B-Spline raster interpolation class. The GVRS data compression reduces this data set to about 17.2 percent of its original size (see GVRS Performance for more details). Future work may bring about more improvements.

Gridfour shaded-relief rendering of Hokkaido, Japan

References

General Bathymetric Chart of the Oceans [GEBCO], 2019. GEBCO Gridded Bathymetry Data. Accessed December 2019 from https://www.gebco.net/data_and_products/gridded_bathymetry_data/

Kidner, D.B. and Smith, D.H. (1992). Compression of digital elevation models by Huffman coding, Computers and Geosciences, 18(8), 1013-1034.

National Oceanic and Atmospheric Administration [NOAA], 2019. ETOPO1 Global Relief Model. Accessed December 2019 from https://www.ngdc.noaa.gov/mgg/global/

Notes

[1] Point count estimated using the survey-resolution table given in Seabed 2030.

gridfour's People

Contributors

gwlucastrig, kinow


gridfour's Issues

Add multi-threaded logic for compressing data

The current GVRS API is based on a single thread of execution. There are a few operations related to storing data that could be conducted in a multi-threaded manner.

For example, when I tested the PackageData application today storing ETOPO1 data, the process required 4.5 seconds. But when I turned on data compression, it required 68 seconds. When I activated the advanced LSOP option, it required 101 seconds. GVRS compression works using a generate-and-test scheme where it will try different combinations of predictors (Differencing, Triangle, LSOP, etc.) and compressors (Deflate, Huffman). Essentially, it tries a bunch of things and goes with the one that produces the best results. Since these "trials" don't share any writable memory resources, they could be conducted in parallel.

This afternoon, I did a quick hack with the CodecMaster class and made it use a ThreadPoolExecutor to process these compressors in separate threads. The 101 seconds required for the full-compression suite (including LSOP) was reduced to 65 seconds. The 68 seconds required for the standard predictors was reduced to 35.
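
For illustration, here is a minimal sketch of that generate-and-test parallelization using standard java.util.concurrent classes. The names are hypothetical and this is not the actual CodecMaster code; it only shows the pattern of submitting independent trials to a pool and keeping the smallest result.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

    // Hypothetical sketch: each "trial" is one predictor/compressor combination.
    class ParallelTrialsSketch {
      static byte[] compressBest(List<Callable<byte[]>> trials, ExecutorService pool)
          throws InterruptedException, ExecutionException {
        List<Future<byte[]>> futures = new ArrayList<>();
        for (Callable<byte[]> trial : trials) {
          futures.add(pool.submit(trial));   // trials share no writable state
        }
        byte[] best = null;
        for (Future<byte[]> f : futures) {
          byte[] candidate = f.get();        // wait for each trial to finish
          if (candidate != null && (best == null || candidate.length < best.length)) {
            best = candidate;                // keep the smallest packing
          }
        }
        return best;
      }
    }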

So I propose to investigate multi-threaded implementations as a way of expediting data compression.

Notes:

In general, if the different compressors run in parallel, then the time for compression would simply be that of the one that takes the longest time to work.

With multiple compressors running in parallel, the application would consume more memory, but the overall CPU usage over the entire program execution would not be significantly increased. The program would consume more CPU while running, but would run for a shorter time.

Further refinement may be possible, though I want to avoid the temptation to create unduly complex code to save a few seconds of run time. For example, the LSOP predictor required 12.3 seconds to compute its internal coefficients over the raster grid. If we partitioned the grid into 4 pieces, we could process each in parallel. The processing time would be reduced to about a quarter of what it was, or about 3.5 seconds. So that wouldn't be worth the effort. However, it might be possible to integrate multiple threads for some of the other parts of the LSOP process. I think I would leave that for a future effort.

Expand Color Palette Table (CPT) Support

I have identified a requirement to expand the Color Palette Table (CPT) support classes and to provide a demo application to show how to use them. In particular, the CPT classes should be capable of parsing the sample files given in the Generic Mapping Tools project space at https://github.com/GenericMappingTools/gmt/tree/master/share/cpt.

Work is in progress. Preliminary code is available in the current development space. A demo application is available at org.gridfour.demo.utils.palette.ColorPaletteTableRender.java

Here's an example output from the Generic Mapping Tools files:

ColorPaletteTableRender

Translate G93 to new programming language

The Gridfour project is looking for a developer interested in translating G93 to some language besides Java. Candidates include .Net (C#), C/C++, or Rust.

We are particularly interested in a language that would support Python bindings.

The G93 API is a file-based tool for managing very large raster data sets or for storing raster data sets between application run-time sessions. Candidate raster data types include geophysical, scientific, and engineering data. G93 also provides a tool for investigating data-compression techniques for raster data. You may read more about G93 and the Gridfour project in general at our wiki https://github.com/gwlucastrig/gridfour/wiki

Expedite read operations by using multiple threads

This issue proposes to use a multi-threaded approach to improve the speed of reading data. It applies to files that are stored with data compression.

When GVRS reads data from a file that uses data compression, there are two cost factors:

  1. Access times for reading data from file.
  2. Processing times for decompressing the data.

It turns out that decompression is a significant contributor to access times. For example, reading the entire set of raw data from the uncompressed version of the ETOPO1 global elevation and depth data set (233 million points) requires 0.277 seconds just for file access. The compressed version requires 3.34 seconds for combined file access and decompression.

The Gridfour team is currently investigating an approach to reading data from a file using multiple-threads to perform the decompression operation.

Recall that a GVRS file is organized in tiles. If an application accesses tiles in a random order, there’s not much that additional threads can do to expedite data access. But if the application accesses tiles in a predictable order, the GVRS library can predict the next tile that the application will require and read and decompress it ahead of time using a supporting thread.
In our initial experiments, access time for compressed ETOPO1 was reduced from 3.34 seconds to 1.88 seconds.
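
Here is a minimal sketch of the read-ahead idea under the assumption of sequential tile access. The Tile and TileStore types are hypothetical stand-ins, not the actual GVRS classes; the point is simply that the next tile is decompressed on a background thread while the caller works with the current one.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical stand-ins for the GVRS internals.
    interface Tile { }

    interface TileStore {
      Tile readAndDecompress(int tileIndex) throws Exception;
    }

    class TileReadAheadSketch {
      private final ExecutorService worker = Executors.newSingleThreadExecutor();
      private Future<Tile> pending;        // tile being decompressed in advance
      private int pendingIndex = -1;       // index of that prefetched tile

      Tile getTile(TileStore store, int tileIndex) throws Exception {
        Tile tile;
        if (pending != null && pendingIndex == tileIndex) {
          tile = pending.get();            // prediction hit: reuse the prefetched tile
        } else {
          tile = store.readAndDecompress(tileIndex);   // miss: read synchronously
        }
        // Assume sequential access and start decompressing the next tile now.
        final int next = tileIndex + 1;
        pendingIndex = next;
        Callable<Tile> prefetch = () -> store.readAndDecompress(next);
        pending = worker.submit(prefetch);
        return tile;                       // worker should be shut down when the file is closed
      }
    }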

The GVRS API also includes an enhanced data compression technique known as LSOP that improves compression ratios but requires more processing time than the standard technique. In our experiments with the LSOP version of ETOPO1, reading time was reduced from 8.22 seconds to 4.36 seconds.

We also tested with the much larger GEBCO 2020 data set (3.7 billion points). Time to read the entire data set was reduced from 66.4 seconds to 37.2 seconds.

Remaining tasks for this issue include the creation of JUnit tests, code inspections, and documentation.

Document G93 file format

Now that the G93 file format is relatively stable, it would be helpful to potential developers if it were documented. Therefore, I propose that the Gridfour project create a file-format specification or some kind of Interface Control Document (ICD).

A good format specification would be particularly useful to developers translating G93 into a language other than Java.

The layout of the file could, of course, be deduced from the code. But good documentation should also cover some of the rationale for the design choices for the format as well as some of the business rules for accessing it.

For example, each file is assigned an arbitrary UUID. This value is shared with the associated index file so that it is always possible to confirm that a particular index is the correct one for the main G93 file. It would save developers who are studying the file format a lot of time if this idea were documented somewhere.

Expedite Huffman decoder using alternate tree representation

During recent testing, I observed that the Huffman decoder is significantly slower than the corresponding Deflate operation. In a way, that is not surprising, since Deflate uses a class from the standard Java API that is probably implemented in a very low-level language, while the Huffman decoder is written in Java as part of the Gridfour library. On the other hand, Huffman decoding is conceptually quite simple (just a tree traversal), and the fact that it is so slow is a bit unexpected.

To create test data for investigating this matter, I built two test versions of the ETOPO1 library, one using Huffman coding and one using Deflate. This could be accomplished in the PackageData.java demo application by removing one of the codecs from the GvrsFileSpecification. The example snippet below removes Deflate but leaves Huffman in place for processing (the opposite would also work):

    GvrsFileSpecification spec = // set up in application code
    spec.setDataCompressionEnabled(compressionEnabled);
    spec.removeCompressionCodec(GvrsCodecType.GvrsDeflate.name());

Compression using Deflate took somewhat longer than the simpler Huffman, but the decompression timing raised some concerns:

   Time to read whole file, Deflate:     3.09 seconds
   Time to read whole file, Huffman:     6.26 seconds

I performed some code tweaks and was able to reduce that to 5.93 seconds.

Since I wasn't satisfied with that improvement, I tried something entirely different. The Huffman code is stored in a tree. The tree consists of instances of a class called SymbolNode. The traversal is just a matter of using bits collected from an input source to control how the tree is traversed (a zero bit means traverse left, a one bit means traverse right):

   for (int i = 0; i < nSymbols; i++) {
      SymbolNode node = root;
      while (!node.isLeaf) {
        if (input.getBit() == 0) {
          node = node.left;
        } else {
          node = node.right;
        }
      }
      symbols[i] = (byte) node.symbol;
    }

As an experiment, instead of constructing the tree as a set of objects, I implemented it in an array of integers. That is really old school programming. The last time I did something like that, I was working in Fortran 77 (an older version of Fortran that pre-dated structured data types). Anyway, here's a snippet of code showing the same loop:

    int []nodeIndex = decodeTree(...); // gets the Huffman tree as an array of integers
    for (int i = 0; i < nSymbols; i++) {
      int offset = nodeIndex[1 + input.getBit()]; // start from the root node
      // branch nodes are indicated by values of -1, leaf nodes by values >= 0
      while (nodeIndex[offset] == -1) {
        offset = nodeIndex[offset + 1 + input.getBit()];
      }
      symbols[i] = (byte) nodeIndex[offset];
    }

A detailed explanation of the code is included in HuffmanDecoder.java. But the odd thing is that this reduces the time to process the ETOPO1 data set down to 4.55 seconds. Still not as fast as Deflate, but better than it was.

The thing is, I don't really understand why this implementation is so much faster than the conventional tree traversal. I welcome any insights that people can offer. I have revised Gridfour's HuffmanDecoder.java class to use the approach shown above and have posted the code. The original implementation was renamed HuffmanDecoderReference.java and retained in the code base for documentation and future study.

Finally, I close by urging people to use Gridfour as much as they possibly can... I spent many hours refining the code to save 2 seconds per run. So I want to make sure it is used enough times to save users as much time as I put into it [grin].

GridFour as a framework for raster processing

A community member, @micycle1, pointed us to the TinFour project. We are studying it to replace the Poly2Tri library in H2GIS. We also discovered the GridFour library, which seems very promising for dealing with rasters.
Many years ago, we developed the Grap library (https://github.com/orbisgis/grap), which is intended to offer IO and algorithms to process raster data (ASCII grids, world-file images). Grap is based on ImageJ. It works well for small rasters but poorly with large data (the raster is processed in memory).
Some time ago, we looked into improving the Grap library. There are several directions: the JAI framework (a bit hard to code algorithms against, and it seems abandoned), ImgLib2 + Bio-Formats (also hard, with many dependencies), GridCoverage and/or JAI-EXT (nice support for nodata values, but based on JAI), Apache Commons Imaging (easy to use, but the TIFF driver is limited and there is no tile/block system)...
If I understand the GridFour objectives correctly, you plan to share with the community a library to store, manage, and process large raster data with a unified API and I/O for reading and writing data.
Do you think that GridFour can be used as a framework to write raster algorithms and process large data (GeoTIFF or ASCII) with a limited memory impact? If yes, we would be interested in contributing, testing, and porting some algorithms we have.

Best

Prep for release 1.0.2

The changes for release 1.0.2 of Gridfour are now in final testing. I anticipate the next release of the software within the next few weeks.

Let me know if you have any questions or have encountered issues that need to be addressed.

Thanks.

Gary

Improvements to floating point compression following examples of HDF5 and S-102 data

The GVRS compression implementation for floating point data usually does better than the standard format supported by HDF5. Recently, I was working with some S-102 format bathymetry products that did not compress as well when I transcribed their data to GVRS (HDF5 is the underlying format used for data in an S-102 product).

Based on preliminary inspection, I believe that HDF5 compressed better than GVRS for the particular data sets I examined because the bathymetry products contained a large number of "discontinuities" between adjacent cells in their raster fields. The existing GVRS compressor assumes that neighboring points tend to have values that are close together, and it is less effective when this assumption does not apply. The S-102 bathymetry products used the value 1000000 to indicate "no data" or "land". So, when the data transitioned from water to land, there would be a sudden jump in data value. This configuration was not consistent with the expectations of the GVRS compressor. Consequently, the output from the GVRS compressor tended to be about 15 percent larger than the output from the HDF5 compressor.

My proposal is to extend the floating-point compressor currently implemented:

  1. HDF5 splits the data into separate groups of bytes before compressing it using the Deflate compressor. All the high-order bytes from each floating point value are grouped together, then the next-highest bytes, et cetera, until the low-order bytes are grouped together (a sketch of this byte-shuffling idea is shown after this list).
  2. The GVRS compressor will be extended to test both its current compressor and the HDF5-format. The smallest result will be used.
  3. Currently, the second byte of the compressed packing for floating point values is a "reserved-for-future-use value" that is always set to zero. This value will be re-interpreted to indicate the data format used for compression. The new format will be indicated by the code value of 1.
  4. The GVRS data format document is going to need a significant update to its description of the floating-point compression format.
  5. The floating-point compressor will need to be extended somewhat to collect and report compression statistics for the alternate compressor. This approach will be similar to that used for the integer-based compressors.
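
The following is a minimal sketch of the byte-shuffling idea described in item 1, using only standard Java classes. It is illustrative rather than the GVRS or HDF5 implementation: the bytes of each float are regrouped by significance before being passed to Deflate, which tends to compress better when the high-order bytes of neighboring values are similar.

    import java.io.ByteArrayOutputStream;
    import java.nio.ByteBuffer;
    import java.util.zip.Deflater;

    class FloatShuffleSketch {
      // Group byte 0 of every float together, then byte 1, and so on.
      static byte[] shuffle(float[] values) {
        int n = values.length;
        byte[] raw = new byte[n * 4];
        ByteBuffer.wrap(raw).asFloatBuffer().put(values);
        byte[] shuffled = new byte[n * 4];
        for (int i = 0; i < n; i++) {
          for (int b = 0; b < 4; b++) {
            shuffled[b * n + i] = raw[i * 4 + b];
          }
        }
        return shuffled;
      }

      // Compress the shuffled bytes with the standard Deflate implementation.
      static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
          int len = deflater.deflate(buffer);
          out.write(buffer, 0, len);
        }
        deflater.end();
        return out.toByteArray();
      }
    }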

Revise GVRS file format, is backward compatibility required?

I am considering a series of changes to clean up the file format for the Gridfour Virtual Raster Store (GVRS, formerly G93). If I make these changes, they will break backward compatibility with existing files. I am trying to assess the impact this change will have on existing users.

As far as I know, GVRS has not yet established a user base. While this situation is a personal disappointment, it has the advantage that I can proceed with format changes without a negative impact on anyone else's work (something about which I am very concerned).

If you have been working with GVRS files, please let me know so that the Gridfour project can continue to support you.

Thanks,

Gary

P.S. This topic was mentioned in an earlier issue at

#26

Refactor GVRS to improve metadata and API

I am working on a significant revision to the GVRS API and file-format. I plan to submit changes by the end of 2021. The changes include:

  1. Better support for multi-variable data (such as wind vectors and ocean currents). Support for multiple data types within a single file.
  2. Much better support for metadata. Ability to completely support TIFF tags.
  3. Introduction of checksums for internal error detection in data.
  4. Better consistency and predictability for the API.
  5. More thorough unit tests.

Unfortunately, these changes will be incompatible with earlier versions of GVRS. In particular, older GVRS files will be inaccessible. If you have built up a collection of GVRS files, please let me know so that we can figure out the easiest way to transition to the new format.

Ordinarily, I try to avoid breaking compatibility across revisions. But since GVRS is still in pre-alpha development, it seems like the most efficient way to move forward.

Corrections for real-valued coordinates

I recently realized that the definition of the raster cell corner and center coordinates is not quite right. I propose to do the following

  1. Rename the GvrsGeometryType enumeration to RasterSpaceType which I think better reflects what it does. I will also be moving it to Gridfour's "coordinates" package. I am doing that because I think the specification applies to raster data sets in general, not just the GVRS API. So it makes sense to move it to a Java package that has more general applicability.
  2. Modify the interpretation of coordinates based on the raster-space type to be consistent with samples of source data I have investigated.
  3. Update the documentation in GvrsFileSpecification to reflect a more accurate interpretation of behaviors
  4. Extend my coordinate-transformation JUnit tests to provide more coverage of coordinates
  5. Post a short article on the Gridfour wiki describing the coordinate systems in more detail.

Update GeoTIFF code to use Apache Commons Imaging Alpha-3 API

Last week, the Apache Commons Imaging project released a new version of their software library. The new version features a number of improvements, but it also changes some of the methods used by the Gridfour applications that access data from GeoTIFF files. Therefore, I will be updating the Gridfour code to use the new API.

I have already been using version Alpha 3 to access elevation data from NASA's Shuttle Radar Topography Mission (SRTM) and other sources. The new API is much cleaner when it comes to specifying data-read options such as accessing a subset of a product.

I am also working on a demonstration application with accompanying web article that describes how to access metadata from a GeoTIFF. I've used this approach with Gridfour to create data products that were geographically referenced and suitable for display in Google Earth and mainstream Geographic Information System (GIS) programs.

Implement Catmull-Rom Interpolator

The core gridfour module includes a 2D grid-based interpolation class using the B-Spline curve fitting algorithm. There would be advantages to implementing a companion interpolation class based on the Catmull-Rom interpolation method.

A new class should implement the interface IRasterInterpolator.
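
For reference, here is a minimal sketch of the Catmull-Rom arithmetic on a 4x4 neighborhood of grid samples. It shows only the interpolation math; the real class would implement IRasterInterpolator, whose method signatures are not reproduced here.

    // Illustrative sketch of Catmull-Rom (bicubic) interpolation.
    class CatmullRomSketch {
      // Standard Catmull-Rom spline through p1..p2, with t in [0, 1].
      static double catmullRom(double p0, double p1, double p2, double p3, double t) {
        return 0.5 * ((2 * p1)
            + (-p0 + p2) * t
            + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
            + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t);
      }

      // Interpolate each of the 4 rows in x, then the resulting column in y.
      // g is a 4x4 block of raster values surrounding the interpolation point;
      // tx and ty are the fractional offsets within the central cell.
      static double interpolate(double[][] g, double tx, double ty) {
        double[] col = new double[4];
        for (int row = 0; row < 4; row++) {
          col[row] = catmullRom(g[row][0], g[row][1], g[row][2], g[row][3], tx);
        }
        return catmullRom(col[0], col[1], col[2], col[3], ty);
      }
    }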

As a reminder, the current policy for the gridfour core module is that it not have any external dependencies on 3rd-party APIs other than the standard Java API. While we may be willing to yield a bit on this rule in the future, implementations would need to make a convincing case for doing so.

File corruption can occur when closing file

There is a bug in the way that GvrsFile writes out the free-space list when authoring a data file. This bug may lead to a corrupt output file.

Free space records are variable-length records that represent unused sections of a GVRS file. They occur in applications that are using data compression and performing multiple changes to tiles in the GVRS data file. Non-compressed tile records have fixed sizes. But the size of compressed records can change based on their content. Therefore, situations occur when the space that was previously used to store a tile is no longer large enough. The old space is put on the free list. New space is allocated.
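
As a rough illustration of that bookkeeping (the class and field names here are hypothetical, not the GVRS implementation), a free-space list simply records unused byte ranges so they can be recycled when a re-compressed tile no longer fits in its old location:

    import java.util.ArrayList;
    import java.util.List;

    class FreeSpaceSketch {
      // A free node describes one unused byte range in the file.
      static class FreeNode {
        long fileOffset;
        int size;
        FreeNode(long fileOffset, int size) {
          this.fileOffset = fileOffset;
          this.size = size;
        }
      }

      private final List<FreeNode> freeList = new ArrayList<>();

      // When a tile outgrows its previous allocation, its old space is released.
      void release(long offset, int size) {
        freeList.add(new FreeNode(offset, size));
      }

      // First-fit search for a block large enough for the new tile record;
      // returns -1 if nothing fits and the file must be extended instead.
      long allocate(int size) {
        for (int i = 0; i < freeList.size(); i++) {
          FreeNode node = freeList.get(i);
          if (node.size >= size) {
            freeList.remove(i);
            return node.fileOffset;
          }
        }
        return -1;
      }
    }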

The free space information needs to be preserved when a file is closed so that it can be re-used if the file is opened for future data-writing operations. Unfortunately, a bug occurred in the logic that writes this information.

The bug would not occur when the optional checksums were enabled. It only occurred when checksums were shut off.

Document the GVRS file format, introduce some revisions

I am currently writing a specification document for the GVRS File Format. To document the GVRS format, I am converting my hand-written notes into a Word document and adding text to clarify the underlying concepts.

Changes to the GVRS format
One of the consequences of this kind of review is that it has revealed a few shortcomings in the existing specification. Therefore, I am also making changes to some aspects of the format. The revised format will be designated version 1.4. The Java-based API will be changing to write data in the new format. So far, I have been able to maintain the ability for the API to read files written in the 1.2 format. Frankly, I am not really sure that maintaining backwards compatibility is worth the effort. So far, the GVRS user base remains very small and I am unaware of anyone who has invested effort in creating GVRS data products.

I welcome feedback regarding this issue. If you have any questions about the GVRS format, please let me know.

As a preview, I've attached one of the diagrams from the document to illustrate the overall structure of the GVRS format.
Layout1a

Move coordinate-related classes to a dedicated package

The previous release of Gridfour, 1.0.1, introduced a set of coordinate-transform classes. At the time, I put them in the gvrs package. But I think that they have potential uses for grid-based applications that are unrelated to GVRS. Therefore, I am planning on moving them to their own package, which I will be naming "coordinates".

As a reminder, the long-term goal of Gridfour is to provide tools for general raster applications. The GVRS API is only one of these. For example, the Gridfour project already implemented surface interpolation classes based on B-Splines. And I am currently working on a contouring package similar to the Tinfour Contour package, which was developed for our companion project Tinfour.

I anticipate that some of this future work will require access to coordinate transformations, so I will be moving the classes and, in some cases, renaming them. Affected classes include IGridPoint, IModelPoint, IGeoPoint, GridPoint, ModelPoint, and GeoPoint. The existing classes GvrsGeoPoint and GvrsGridPoint will be renamed to a more generic form.

Add direct support for image data into GVRS API

Discussed in #12

Originally posted by gwlucastrig December 9, 2021

I propose to extend the GVRS API to provide direct support for imagery (aerial photographs, conventional pictures, etc.). To do so, the API would be expanded to include a new GvrsElement class customized for the representation of pixel data. I am considering one of the following two class names (please indicate if you have a preference):

  • GvrsElementPixel
  • GvrsElementRGB

Background
The GVRS file format and Java API is intended to simplify the processing of large raster data sets. One of the ideas behind GVRS is that an application can use it for high-speed processing of large data sets and then export the results to whatever data format is best suited for a user’s needs.

Currently, GVRS focuses on numerical information. Of course, there is another kind of raster information that a lot of people care about: images. I was wondering what everyone thought about implementing some sort of support for image formats in GVRS.

When I designed GVRS, I did not consider image data as being something for which GVRS could provide much “value added”. There are plenty of good graphics formats and supporting APIs out there already. Recently, however, I’ve been playing with some sample data that includes both aerial photographs and surface elevation samples. This effort involves working with two disjoint data products using separate APIs and two separate sets of data objects. It occurs to me that having limited built-in support for imagery might be a more convenient way to work. It might be an attractive feature for some users.

I spent a few evenings going through the PNG and TIFF format specifications, and have also taken a look at a less well-known format known as NITF. Image processing is an important topic, and these products all feature more complexity than might be obvious at first glance. For GVRS, I would deliberately focus on a narrow subset of these features; I list some basic features below. I would consider more advanced features, such as CMYK and CIELAB color spaces or ICC profiles, only if users identify them as having the most utility and the highest priority. Image processing is a vast topic, and it would be easy to spend a lot of time adding specialized features only to come up with a complex implementation that nobody wanted to use. I'd like to avoid that.

Better support for GeoTIFF Tags

Although the VariableLengthRecord (VLR) class is designed to preserve metadata such as that found in GeoTIFF tags, it doesn't provide adequate access methods. Nor does it carry information about data type and array dimensions (where appropriate). Expand the class to fully support TIFF tags.

Note that doing so will change the G93 file definition slightly. Fortunately, this change will only affect VLR storage.

Enhance PackageData application to support Cartesian coordinates

The PackageData application demonstrates how to store NetCDF files that give global elevation and bathymetry data. Currently, it is limited to products with geographic coordinate systems. Products such as the International Bathymetric Chart of the Arctic Ocean (IBCAO) use Cartesian coordinate systems (coordinate systems based on map projections). Enhance PackageData to support Cartesian coordinates.

Port the GVRS API to Rust

I am looking for a developer who would be interested in porting the GVRS API to the Rust programming language.

The GVRS API offers four capabilities that may be useful for the Rust community:

  1. It provides a “virtual raster” that would assist Rust programs processing very large gridded data products, especially those that might be too large to be conveniently kept in memory.
  2. GVRS provides a testbed for developers who are experimenting with data compression techniques for raster data products (that was, in fact, the original reason I wrote it).
  3. GVRS provides a persistent data store for raster products. It is particularly well suited to geophysical data.
  4. A GVRS port would provide Rust programs with access to the global elevation and bathymetry data sets that already exist for the Java API.

Now, before I go on, I have to qualify claims 3 and 4 by pointing out that there are many raster data products out there and virtually all of them have wider user bases than GVRS (NetCDF and HDF5, for example). In terms of sheer availability of data, those other products have distinct advantages over GVRS. So I don’t want to oversell my project.

On the other hand, I wrote GVRS with the idea that the code would be ported to other languages, and I tried to organize it in such a way that a port could be executed quickly and well. If you want to port GVRS to Rust, my attitude is that it would be your project and I would try not to interfere in the design or direction of the porting effort. Of course, I am highly motivated to have someone succeed in porting GVRS to Rust. So I would be available to answer questions, explain concepts, and help smooth over any code incompatibilities that might arise between the Rust and Java implementations.

To learn more about GVRS, visit our Project Wiki or read our Frequently Asked Questions page.

I recently posted a preliminary draft of the GVRS file format at https://gwlucastrig.github.io/GridfourDocs/notes/GvrsFileFormat_1_04.pdf

Changing GVRS API, need comments

As I was writing the "getting started" pages for the Gridfour wiki, I realized that I'd made a mistake in the way I created some of the coordinate-transform methods in the GVRS API.

As a solution, I am preparing for release 1.0.1 in the near future.

I'd like to get some user comments on this issue. In particular, I'd like to get comments on my proposed approach to making this change. I would prefer to remove the old methods and replace them with the new. Ordinarily, my approach to this issue would be to deprecate the old methods and remove them in the future. But since GVRS is so new, and so few users have adopted it (so far), I think it would be cleaner to just make the change. So I would like to get some user feedback on how much of an inconvenience this would be.

The Proposed Change

The existing code includes methods in the form:

double []gridPoint = spec.mapGeographicToGrid(latitude, longitude);
double []geoPoint  = spec.mapGridToGeographic(gridPoint[0], gridPoint[1]);

I think that the use of arrays was a mistake. First off, they are error prone. It's easy to get confused as
to which array index refers to which coordinate. Is geoPoint[0] a latitude or a longitude? Unless you know the API pretty well, you may be unsure.

I propose to replace these methods with an alternate approach:

GvrsGridPoint gridPoint = spec.mapGeographicToGridPoint(latitude, longitude);
GvrsGeoPoint  geoPoint  = spec.mapGridToGeoPoint(gridPoint);
System.out.println("row:    "+gridPoint.getRow());
System.out.println("column: "+gridPoint.getColumn());

I've tested these changes for performance, and the new class-based technique is only a tiny bit slower than the array-based approach (on average, only about 1 part in 2397). In fact, in 18 out of 40 trials, the class-based technique was actually faster than the array-based form. So any differences are probably in the noise.

Post Gridfour Core to Maven Central

Version 1.0 of Gridfour Core is now ready for posting to Maven Central.

The following tasks are required:

  1. Post Gridfour Core to Maven Central
  2. Post a new Wiki page giving a getting started guide
  3. Create an S3 bucket on Amazon AWS (the cloud) to provide samples of the ETOPO1 data set in GVRS format. This file is about 120 MB. Posting it on the cloud will give interested developers a set of data that they can use while testing the GVRS API.

Improve terminology for management classes and methods

There are a number of places in the GVRS API where classes associate keys of some sort with an object, file position, or other data element. In the original implementation, these entities were referred to as "indices". But this terminology was confusing because the term "index" was also used for integer indices into arrays or grids. For example, the index of a tile is its position within the raster.

tile_index = tile_row * n_columns_of_tiles + tile_column.

But, at the same time, we also had an entity called the "tile index" that would map the tile_index to its position in the file. This usage was confusing.

For less ambiguous and less confusing terminology, we are now using the term "map". In computer programming, the term "map" is commonly used to indicate a set of associations between keys and values. In Java there is an interface called Map, and in C# there is a class called Dictionary that implements the same idea. See Wikipedia: Associative array. In mathematics, the term map is used in set theory or as a generalization of the concept of a function. See Wikipedia: Map.
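
As a tiny illustration of the naming idea (the types shown are arbitrary stand-ins, not the actual GVRS fields), a tile-position map simply associates a tile's integer index with the byte offset of its record in the file:

    import java.util.HashMap;
    import java.util.Map;

    class TilePositionMapSketch {
      private final Map<Integer, Long> tilePositionMap = new HashMap<>();

      // The tile index is the tile's position within the raster, as above.
      void recordTilePosition(int tileRow, int tileColumn, int nColumnsOfTiles, long fileOffset) {
        int tileIndex = tileRow * nColumnsOfTiles + tileColumn;
        tilePositionMap.put(tileIndex, fileOffset);
      }

      Long lookupFilePosition(int tileIndex) {
        return tilePositionMap.get(tileIndex);   // null if the tile has not been written
      }
    }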

A few classes will be renamed. TilePositionIndex.java becomes TilePositionMap. Some methods will also be renamed. Most of the affected elements have package-level or protected scope, and are visible only in the org.gridfour.gvrs package.
