Comments (4)
Since posting this issue, I have discovered that the `drift` method gets me closer to what I want, as I can pass in multiple lat/lons as well as multiple times, and avoid the multiple NetCDF reads:
```python
import pandas as pd
from pyTMD import compute_tide_corrections

# Input data (multiple times per point)
example_times = pd.date_range("2022-01-01", "2022-01-30", freq="1D")
point1_df = pd.DataFrame({"lat": -32, "lon": 155, "time": example_times})
point2_df = pd.DataFrame({"lat": -33, "lon": 157, "time": example_times})
point3_df = pd.DataFrame({"lat": -34, "lon": 161, "time": example_times})

# Combine into a single dataframe
points_df = pd.concat([point1_df, point2_df, point3_df])

# Model tide heights using 'drift'
out = compute_tide_corrections(
    x=points_df.lon,
    y=points_df.lat,
    delta_time=points_df.time.values,
    DIRECTORY="FES2014",
    MODEL="FES2014",
    EPSG=4326,
    TYPE="drift",
    TIME="datetime",
    METHOD="bilinear",
)

# Add modelled tide heights back into the dataframe
points_df["tide_height"] = out
```
However, because I have static points with multiple timesteps at each, `drift` still ends up being less efficient than I want: it assumes each time also has a unique lat/lon, so the spatial interpolation step is run for every individual lat/lon/time pair, rather than being interpolated once per unique point location and then re-used for each time (given that the point coordinates are the same for all times).
I think for this application (many timesteps for a smaller set of static modelling point locations), the most efficient processing flow might be something like this?
- Only once per entire analysis: read NetCDF tide model data
- Only once per static modelling point: extract and interpolate constants based on lat/lon
- For every time: model tide heights based on the extracted constants at each point
(or alternatively, perhaps some method to detect duplicate/repeated lat/lons, then batch those together to reduce the number of required interpolations...)
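The duplicate-detection idea in the last bullet could be sketched with `numpy.unique`. This is a minimal illustration of the batching logic only; the "interpolation" here is a placeholder sum, not a real tide model call:

```python
import numpy as np

# Example: three static points, each observed at 30 timesteps, flattened
# into parallel lat/lon arrays (matching the dataframe layout above)
lats = np.repeat([-32.0, -33.0, -34.0], 30)
lons = np.repeat([155.0, 157.0, 161.0], 30)

# Find unique point locations, plus an index mapping every row back to them
points = np.column_stack([lons, lats])
unique_points, inverse = np.unique(points, axis=0, return_inverse=True)

# The expensive spatial interpolation now only needs to run once per unique
# point (3 times) instead of once per row (90 times); the sum here is just
# a placeholder for the real interpolated value
interpolated = unique_points.sum(axis=1)

# Broadcast the per-point results back out to every lat/lon/time row
# (ravel guards against inverse-shape differences between NumPy versions)
per_row = interpolated[inverse.ravel()]
assert per_row.shape == (90,)
```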
from pytmd.
@robbibt still thinking about the best way to enact these changes. One idea I've been floating is to cache the interpolation objects for each constituent so that the reads won't have to be repeated. I'm worried about this being a bit memory intensive, though, so I need to put in some tests.
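A minimal sketch of what per-constituent caching could look like, assuming a `functools.lru_cache` around a hypothetical loader. The grids here are fabricated so the example is self-contained; this is not pyTMD's actual internals:

```python
from functools import lru_cache

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical grid axes standing in for a tide model's lat/lon coordinates
LAT = np.linspace(-90, 90, 181)
LON = np.linspace(0, 360, 361)

@lru_cache(maxsize=8)  # bound the cache to limit memory use
def get_interpolator(constituent):
    # In real code this would read the constituent's NetCDF grid; here we
    # fabricate a deterministic surface so the sketch is runnable
    rng = np.random.default_rng(len(constituent))
    grid = rng.standard_normal((LAT.size, LON.size))
    return RegularGridInterpolator((LAT, LON), grid)

# The first call builds the interpolator; repeat calls reuse the same object
interp_m2 = get_interpolator("m2")
assert get_interpolator("m2") is interp_m2

# Interpolate at lat=-33, lon=155
value = interp_m2([(-33.0, 155.0)])
assert value.shape == (1,)
```

A bounded `maxsize` (or an explicit dict cache with eviction) would address the memory concern, at the cost of occasionally re-reading an evicted constituent.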
I've also been reorganizing the code structure lately in #132 and #135. Everything should still be backwards compatible, just with some additional warnings.
Hey @tsutterley, I am doing some further optimisations of our tide modelling code as we move towards a multi-tide modelling system, where we choose the best tide model locally based on comparisons with our satellite data. Because of this, our modelling now takes a lot longer than previously, so I'm looking into parallelising some of the underlying pyTMD code to improve performance.
Our two big bottlenecks are:
- Loading the tide constituent NetCDF files (which we have largely addressed by clipping the files to a bounding box around Australia)
- Extracting tide constituents from the NetCDFs
For number 2, I've been able to get a big speed up by parallelising entire `pyTMD.io.*.extract_constants` calls across smaller chunks of lat/lon points using `concurrent.futures`. However, I think there are still gains to be made, as `pyTMD.io.*.extract_constants` includes the slow NetCDF read step itself, so we're effectively wasting time in each parallel run by loading the same data multiple times.
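The chunking approach described above looks roughly like this, with the `extract_constants` call replaced by a placeholder `model_chunk` function (a sketch of the parallelisation pattern only, not pyTMD's actual API):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def model_chunk(chunk):
    # Placeholder for a call like pyTMD.io.FES.extract_constants(lon, lat, ...);
    # note each real call would also repeat the slow NetCDF read
    lons, lats = chunk
    return np.hypot(lons, lats)

# 10,000 points split into chunks of 1,000
rng = np.random.default_rng(42)
lons = rng.uniform(110, 155, 10_000)
lats = rng.uniform(-45, -10, 10_000)
chunks = [(lons[i:i + 1_000], lats[i:i + 1_000]) for i in range(0, 10_000, 1_000)]

# map() preserves chunk order, so the results concatenate back cleanly
with ThreadPoolExecutor() as pool:
    results = list(pool.map(model_chunk, chunks))
out = np.concatenate(results)
assert out.shape == (10_000,)
```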
I know you made some changes to address this last year when I first posted this issue, but I wanted to double check: are the newer `pyTMD.io.*.read_constants` and `pyTMD.io.*.interpolate_constants` functions intended to completely replicate the existing functionality in `pyTMD.io.*.extract_constants`? Or is there any functionality I'd lose by running those two functions instead of `pyTMD.io.*.extract_constants`?
Ideally, I'd love to do something like this:
- Run once: `pyTMD.io.*.read_constants`
- Run many times in parallel, using previously-loaded constituents: `pyTMD.io.*.interpolate_constants`
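Roughly the pattern I'm after, using stand-in functions to illustrate the call shape (the real `read_constants`/`interpolate_constants` signatures may differ):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for pyTMD.io.*.read_constants / interpolate_constants, purely
# to illustrate the read-once / interpolate-many call pattern
def read_constants():
    # Expensive NetCDF read: run exactly once per analysis
    return {"m2": np.full((10, 10), 1.0), "s2": np.full((10, 10), 0.5)}

def interpolate_constants(lon, lat, constituents):
    # Cheap per-point lookup against the already-loaded grids
    return {name: float(grid.mean()) for name, grid in constituents.items()}

constituents = read_constants()  # run once

# Run many times (here in parallel threads), reusing the loaded constituents
points = [(155.0, -32.0), (157.0, -33.0), (161.0, -34.0)]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda p: interpolate_constants(*p, constituents), points))
assert results[0]["m2"] == 1.0
```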
Hey @robbibt, basically yes, that was the plan. The new functions can completely replicate the prior functionality. The difference is that the new read-and-interpolate method keeps all of the constituent data in memory. In some cases this may be slower, such as running on a small (possibly distributed) machine, so I've kept both methods.
In cases where you want to run for multiple points with the same data, there is a potential speed up with the new method since (as you mentioned) there's the IO bottleneck.
I've thought about switching to `dask` arrays (probably using `xarray`) but need to do some testing. I'm completely open to suggestions for eking out performance.