glaciohack / geoutils
Analysis of georeferenced rasters and vectors
Home Page: https://geoutils.readthedocs.io/
License: BSD 3-Clause "New" or "Revised" License
This is advance warning that once Hack number 2 is over there will be some administrative changes:
dev renamed to master
main renamed to stable
[...] master.
master will probably mandate "squash and merge" commits (I may need to discuss further with @fmaussion on this)

What does this mean for you?
[...] geoutils [...]
[...] geoutils repo to your personal account
git clone your fork
git push back to your own fork
[...] master.

Amongst other things, this will simplify the main repository's history, and bring us more into line with "best practices" adopted elsewhere.
At the moment, an epsg integer can be provided via the crs keyword. Would probably be clearer to have them both as kwargs.
A raster with no data is plotted with no data values masked.
E.g. for use in notebooks, it would be good to streamline opening the demo datasets and locating their paths. These can be the same as those used in testing, but we shouldn't need to rely on the test suite to tell us where the demo data is.
If the user wants one DEM to fit another, they would presumably use the .reproject() method directly. If the DEMs already fit, however, the reprojection is still done. This adds unnecessary processing time (or even causes failure in the case of gigantic DEMs), and could be avoided with a sanity check within the .reproject() method.
To exemplify what I mean:
import time
import geoutils as gu
import rasterio as rio
# Load a KH-9 DEM sent by Amaury which is quite big
dem = gu.georaster.Raster("DZB1208-500032L004006/DZB1208-500032L004006-DEM.tif")
# I reproject the DEM to its own exact bounds using cubic spline
# This is a proxy for if a DEM with the exact same bounds as a reference DEM should (not) be reprojected.
time_before = time.time()
dem2 = dem.reproject(dem, resampling=rio.warp.Resampling.cubic_spline)
time_after = time.time()
print(f"{time_after - time_before:.2f} s")
Which returned:
4.98 s
I would argue that this is a shortcoming or even a bug in either georaster or rasterio. Why would resampling be done if the source and destination bounds/crs/resolution are the same?
Something along these lines could be added:
if dst_ref.bounds == self.bounds and dst_ref.res == self.res and dst_ref.crs == self.crs and dst_ref.nodata == self.nodata:
    return dst_ref
or the same with shorter lines:
if all([
    dst_ref.bounds == self.bounds,
    dst_ref.res == self.res,
    dst_ref.crs == self.crs,
    dst_ref.nodata == self.nodata
]):
    return dst_ref
This is not a huge issue, but I think the check is a valuable addition!
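The check above can be tried out in isolation. The following is a minimal sketch that uses a hypothetical `Grid` stand-in instead of the real Raster class. The attribute names `bounds`, `res`, `crs` and `nodata` follow the snippet above, while `same_grid` and `reproject_if_needed` are illustrative names that do not exist in geoutils.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Grid:
    """Hypothetical stand-in for the georeferencing metadata of a Raster."""
    bounds: Tuple[float, float, float, float]  # (left, bottom, right, top)
    res: Tuple[float, float]                   # (x resolution, y resolution)
    crs: str                                   # e.g. "EPSG:32633"
    nodata: Optional[float] = None


def same_grid(src: Grid, dst: Grid) -> bool:
    """Return True if reprojecting src onto dst would change nothing."""
    return (
        src.bounds == dst.bounds
        and src.res == dst.res
        and src.crs == dst.crs
        and src.nodata == dst.nodata
    )


def reproject_if_needed(src: Grid, dst: Grid) -> str:
    """Skip the expensive resampling when both grids already match."""
    if same_grid(src, dst):
        return "skipped"
    # The real (costly) reprojection would happen here.
    return "reprojected"


a = Grid((0.0, 0.0, 100.0, 100.0), (10.0, 10.0), "EPSG:32633", -9999.0)
b = Grid((0.0, 0.0, 100.0, 100.0), (10.0, 10.0), "EPSG:32633", -9999.0)
c = Grid((0.0, 0.0, 100.0, 100.0), (5.0, 5.0), "EPSG:32633", -9999.0)

print(reproject_if_needed(a, b))  # grids match
print(reproject_if_needed(a, c))  # resolution differs
```

One point that may be worth discussing: when the grids match, returning the source raster (or a copy of it) is probably closer to the intent than returning dst_ref, since dst_ref carries the reference raster's data rather than the source's.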
The crop() method notionally supports cropping of an image not yet loaded into memory (i.e. self.isLoaded logic). However, it fails if no image data have been loaded.
To do:
This is because of an improper use of the Raster.crop method, which does not return a dataset but updates the object in place.
https://github.com/GlacioHack/GeoUtils/blob/68542e008d685fb9a01ae57d4e4f2befdf2120ae/geoutils/georaster.py#L425
This is currently missing in some instances. For example: if I use crop(), I will lose my original Raster as it operates on the original one. Might want to copy before this operation.
By default, there are at least two links to the underlying rasterio reader saved within a Raster object: .ds and .dataset_mask. The .memfile attribute may also count.
There are some edge cases where this is a problem. The one I've identified is that Rasters with these readers still open cannot be pickled when they are being distributed to compute nodes in multiprocessing situations.
Manually calling a method to close these links once the Raster has been created is a possible solution to this problem.
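To illustrate the problem and the proposed fix, here is a minimal sketch that uses an ordinary open file handle in place of the rasterio reader. `LazyRaster` and `close_links` are hypothetical names chosen for this example and are not part of geoutils.

```python
import os
import pickle
import tempfile


class LazyRaster:
    """Hypothetical stand-in for a Raster that keeps a file reader open."""

    def __init__(self, path: str):
        self.path = path
        self.ds = open(path, "rb")  # plays the role of the rasterio reader

    def close_links(self) -> None:
        """Close and drop the open reader so the object can be pickled."""
        if self.ds is not None:
            self.ds.close()
            self.ds = None


# Create a small temporary file to open.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"dummy raster bytes")

raster = LazyRaster(tmp.name)

try:
    pickle.dumps(raster)
    picklable_while_open = True
except TypeError:
    # Open file objects cannot be pickled.
    picklable_while_open = False

# After closing the link, the object round-trips through pickle.
raster.close_links()
restored = pickle.loads(pickle.dumps(raster))

print(picklable_while_open, restored.path == raster.path)

os.remove(tmp.name)
```

A real implementation would also need a way to reopen the reader on demand after unpickling, for example from the stored path.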
The opposite does not make sense...
@rhugonnet have you looked at my earlier comment? I'm concerned that using rasterio's plotting function might load the data a second time, which is not ideal for large datasets...
Originally posted by @adehecq in #37 (comment)
We would like to be able to support operations like this:
im1 = Raster(...)
im2 = Raster(...)
combined = im1 + im2
At the moment, this requires a fairly unsemantic approach:
combined = im1.copy()
combined.data = combined.data + im2.data
Proposal: add operator overloading methods to the Raster class, e.g. __add__, __mul__. See e.g. https://stackabuse.com/overloading-functions-and-operators-in-python/ for more info.
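A minimal sketch of what such overloading could look like, using a hypothetical `MiniRaster` class that only wraps a NumPy array. The helper `_binary_op` is an illustrative name, and a real implementation would presumably also check that the CRS and transform of both rasters match.

```python
import numpy as np


class MiniRaster:
    """Hypothetical, heavily simplified stand-in for the Raster class."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def copy(self) -> "MiniRaster":
        return MiniRaster(self.data.copy())

    def _binary_op(self, other, op):
        # Accept another MiniRaster or a plain number/array.
        if isinstance(other, MiniRaster):
            if other.data.shape != self.data.shape:
                raise ValueError("Rasters must have the same shape.")
            other_data = other.data
        else:
            other_data = other
        out = self.copy()
        out.data = op(self.data, other_data)
        return out

    def __add__(self, other):
        return self._binary_op(other, np.add)

    def __sub__(self, other):
        return self._binary_op(other, np.subtract)

    def __mul__(self, other):
        return self._binary_op(other, np.multiply)


im1 = MiniRaster(np.array([[1.0, 2.0], [3.0, 4.0]]))
im2 = MiniRaster(np.array([[10.0, 20.0], [30.0, 40.0]]))

combined = im1 + im2  # element-wise sum, im1 and im2 are left untouched
scaled = im1 * 2      # scalars work too
print(combined.data)
print(scaled.data)
```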
e.g. georaster.coordinates(), potentially geoimg.xy()
Similar to georaster's interp method: https://github.com/GeoUtils/georaster/blob/0df19bfd91fd10e92beb54308428e212e1d19c1a/georaster/georaster.py#L859
This is different from value_at_coords, which extracts only exact pixel values. This should be used to interpolate at any position.
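The difference between the two lookups can be sketched with plain NumPy. This example works in fractional pixel indices and leaves out the conversion from map coordinates via the raster transform. The function names `value_at_pixel` and `bilinear_at` are hypothetical, and bilinear interpolation is only one of several possible schemes.

```python
import numpy as np


def value_at_pixel(data: np.ndarray, row: float, col: float) -> float:
    """Return the value of the pixel nearest to (row, col): no interpolation."""
    return float(data[int(round(row)), int(round(col))])


def bilinear_at(data: np.ndarray, row: float, col: float) -> float:
    """Bilinearly interpolate data at fractional pixel indices (row, col)."""
    r0 = int(np.floor(row))
    c0 = int(np.floor(col))
    # Clamp the upper neighbours so positions on the last row/column work.
    r1 = min(r0 + 1, data.shape[0] - 1)
    c1 = min(c0 + 1, data.shape[1] - 1)
    fr, fc = row - r0, col - c0
    top = (1 - fc) * data[r0, c0] + fc * data[r0, c1]
    bottom = (1 - fc) * data[r1, c0] + fc * data[r1, c1]
    return float((1 - fr) * top + fr * bottom)


grid = np.array([[0.0, 10.0],
                 [20.0, 30.0]])

print(bilinear_at(grid, 0.5, 0.5))       # centre of the four pixels
print(value_at_pixel(grid, 0.25, 0.25))  # snaps to the nearest pixel
```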
Convenience wrapper, any coordinates, with optional associated crs if coordinates not in Raster crs, to look up value using rasterio functionality for this. like georaster.value_at_coords().
I have a weird CRS issue with using Vectors and Rasters together:
I am trying to use the Vector exclusion_mask
to create a mask for the Raster reference_raster
:
mask_array = exclusion_mask.create_mask(reference_raster, crs=reference_raster.crs)
but this raises an error:
Traceback (most recent call last):
File "/home/erik/Projects/ETH/GlacioHack/DemUtils/DemUtils/coreg.py", line 703, in <module>
test_icp_coregistration()
File "/home/erik/Projects/ETH/GlacioHack/DemUtils/DemUtils/coreg.py", line 683, in test_icp_coregistration
aligned_raster, error = coregistration.run(reference_raster, to_be_aligned_raster, glacier_mask)
File "/home/erik/Projects/ETH/GlacioHack/DemUtils/DemUtils/coreg.py", line 654, in run
mask_array = exclusion_mask.create_mask(reference_raster, crs=reference_raster.crs)
File "/home/erik/.local/share/conda/demutils/lib/python3.9/site-packages/geoutils/geovector.py", line 151, in create_mask
vect = self.ds.to_crs(crs)
File "/home/erik/.local/share/conda/demutils/lib/python3.9/site-packages/geopandas/geodataframe.py", line 816, in to_crs
geom = df.geometry.to_crs(crs=crs, epsg=epsg)
File "/home/erik/.local/share/conda/demutils/lib/python3.9/site-packages/geopandas/geoseries.py", line 541, in to_crs
transformer = Transformer.from_crs(self.crs, crs, always_xy=True)
File "/home/erik/.local/share/conda/demutils/lib/python3.9/site-packages/pyproj/transformer.py", line 368, in from_crs
_Transformer.from_crs(
File "pyproj/_transformer.pyx", line 349, in pyproj._transformer._Transformer.from_crs
pyproj.exceptions.ProjError: Error creating Transformer from CRS
I have found a workaround which is very ugly but might aid in debugging:
import pyproj

# Extract the string version of the CRS
crs_string = reference_raster.crs.__repr__()
# Find the last EPSG entry (assuming there is one..)
crs_epsg = crs_string[crs_string.rindex("EPSG") + 7: crs_string.rindex("\"]]")]
reference_raster.crs = pyproj.crs.CRS.from_epsg(crs_epsg)
# Now this works
mask_array = exclusion_mask.create_mask(reference_raster)
The Vector CRS is ETRS89 UTM 33N and the Raster CRS is WGS84 UTM 33N
E.g. contributors page, issues template...
Existing test method for Raster.save() does not check the GCP export capability.
For example:
from geoutils import georaster as geor
img = geor.Raster('tests/data/LE71400412000304SGS00_B4_crop.TIF', load_data=False)
img.shift(20,10)
raises an AttributeError:
AttributeError Traceback (most recent call last)
in
----> 1 img.shift(20,10)

~/development/GlacioHack/GeoUtils/geoutils/georaster.py in shift(self, xoff, yoff)
575 meta.update({'transform': rio.transform.Affine(dx, b, xmin + xoff,
576 d, dy, ymax + yoff)})
--> 577 self._update(metadata=meta)
578
579 def set_ndv(self, ndv, update_array=False):

~/development/GlacioHack/GeoUtils/geoutils/georaster.py in _update(self, imgdata, metadata, vrt_to_driver)
251
252 with memfile.open(**metadata) as ds:
--> 253 ds.write(imgdata)
254
255 self.memfile = memfile

rasterio/_io.pyx in rasterio._io.DatasetWriterBase.write()

AttributeError: 'NoneType' object has no attribute 'shape'
rasterio's show method does not allow adding a color bar. Find a solution to this...
Add a kwarg to the reproject method which allows another Raster object to be provided, from which the destination CRS, transform and bounds are taken.
At the moment, the raster data (if loaded) are stored in Raster.data. This is a publicly available np.array.
The problem with this is that data can be arbitrarily overwritten without any sanity check of whether the dimensions match the Raster. It could also cause problems with nodata values.
I propose that the raw np.array is moved to Raster._data. We keep the Raster.data accessor, but overload it with set/get methods.
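A minimal sketch of the proposed accessor, using a hypothetical `GuardedRaster` class. Only the shape check is shown here. Validation of nodata values or dtypes is left out and would have to be designed separately.

```python
import numpy as np


class GuardedRaster:
    """Hypothetical stand-in showing a validated `data` accessor."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def data(self) -> np.ndarray:
        return self._data

    @data.setter
    def data(self, new_data) -> None:
        new_data = np.asarray(new_data)
        # Reject arrays whose dimensions do not match the raster.
        if new_data.shape != self._data.shape:
            raise ValueError(
                f"New data shape {new_data.shape} does not match "
                f"raster shape {self._data.shape}."
            )
        self._data = new_data


r = GuardedRaster(np.zeros((1, 3, 3)))
r.data = np.ones((1, 3, 3))  # accepted: same shape

try:
    r.data = np.ones((2, 2))  # rejected: wrong shape
    rejected = False
except ValueError:
    rejected = True

print(r.data.shape, rejected)
```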
This line should be modified to also accept numpy numbers:
https://github.com/GlacioHack/GeoUtils/blob/19e1fd521a0899529a8eef23b5ea0ae5258bc1c7/geoutils/georaster.py#L588
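Assuming the linked line is a type check against Python's built-in int and float (an assumption, since the line itself is not quoted here), one possible way to widen it is sketched below. `is_number` is a hypothetical helper name.

```python
import numpy as np


def is_number(value) -> bool:
    """Return True for Python and NumPy integer or floating-point scalars."""
    # Booleans are excluded explicitly because bool is a subclass of int.
    if isinstance(value, (bool, np.bool_)):
        return False
    return isinstance(value, (int, float, np.integer, np.floating))


# np.int64 is not a subclass of Python's int, so a plain
# isinstance(value, (int, float)) check rejects it.
print(isinstance(np.int64(3), (int, float)))
print(is_number(np.int64(3)), is_number(np.float32(1.5)), is_number("5"))
```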
Rather than creating an individual file for each executable 'tool' that we want, we could use this: https://github.com/garenchan/pycli
Currently, the clip method is empty.
Under the CI testing approach, all tests need asserts so that they can be checked automatically. We are no longer relying on manually checking the output of print statements.
When loading a single band with load (or self.ds.read), the output array has 2 dimensions, instead of 3 in all other cases. This causes issues later on, for example in set_ndv.
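One possible fix is to normalise the array shape right after reading. The sketch below uses plain NumPy arrays in place of the values returned by the reader, and `ensure_3d` is a hypothetical helper name.

```python
import numpy as np


def ensure_3d(array: np.ndarray) -> np.ndarray:
    """Return array with shape (bands, rows, cols), adding a band axis if needed."""
    if array.ndim == 2:
        return array[np.newaxis, :, :]
    if array.ndim == 3:
        return array
    raise ValueError(f"Expected a 2D or 3D array, got {array.ndim}D.")


single_band = np.zeros((4, 5))   # shape obtained when reading one band by index
all_bands = np.zeros((3, 4, 5))  # shape obtained when reading every band

print(ensure_3d(single_band).shape)
print(ensure_3d(all_bands).shape)
```

With rasterio itself, passing the band index as a list (e.g. `ds.read([1])`) should also give a 3D array directly.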
water is too hot
Go through David's tools and see what useful functionalities could be implemented
https://github.com/dshean/pygeotools
It would be great to add an implementation of the Savitzky-Golay algorithm to the Raster class. Very useful to smooth the data and calculate derivatives (e.g. for slope, curvature etc).
See an example here: https://scipy.github.io/old-wiki/pages/Cookbook/SavitzkyGolay
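For orientation, here is a minimal NumPy-only sketch of the Savitzky-Golay idea along a single 1D profile with unit sample spacing. The function names are hypothetical, edge samples are not handled specially (np.convolve zero-pads, so only interior values are meaningful), and the 2D raster case is left out. SciPy also ships a ready-made `scipy.signal.savgol_filter`, which would likely be the more practical starting point.

```python
import math

import numpy as np


def savgol_weights(half_window: int, polyorder: int, deriv: int = 0) -> np.ndarray:
    """Least-squares weights for a window of 2 * half_window + 1 samples."""
    offsets = np.arange(-half_window, half_window + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted polynomial coefficient
    # of that order at the window centre; times deriv! gives the derivative.
    return np.linalg.pinv(design)[deriv] * math.factorial(deriv)


def savgol_1d(values, half_window: int = 2, polyorder: int = 2, deriv: int = 0):
    """Smooth (deriv=0) or differentiate (deriv>0) a 1D profile."""
    weights = savgol_weights(half_window, polyorder, deriv)
    # np.convolve flips its kernel, so reverse the weights to apply them as-is.
    return np.convolve(values, weights[::-1], mode="same")


x = np.arange(10, dtype=float)
profile = x ** 2

smoothed = savgol_1d(profile)       # a quadratic is reproduced in the interior
slope = savgol_1d(profile, deriv=1) # first derivative per sample step

print(smoothed[2:-2])
print(slope[2:-2])
```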
The plot shown here is unexpected. Should be checked.
https://github.com/GlacioHack/GeoUtils/blob/ff8d2579085c9c29da1711ec6370febb033f987c/tests/test_georaster.py#L78
This line will fail if self._data is not an array, e.g. is None:
https://github.com/GlacioHack/GeoUtils/blob/ff8d2579085c9c29da1711ec6370febb033f987c/geoutils/georaster.py#L223
.plot() / .display()
When trying to open a VRT file with georaster.Raster, fails with 'RasterioIOError: Read or write failed. No such file or directory'.
Not sure exactly where the error is, because the traceback does not make sense.
from geoutils import projtools -> "ModuleNotFoundError: No module named 'shapely.ops.transform'; 'shapely.ops' is not a package"