respec / HSPsquared

Hydrologic Simulation Program Python (HSPsquared)

License: GNU Affero General Public License v3.0

Python 3.26% Jupyter Notebook 61.19% HTML 35.55% R 0.01%

hspsquared's People

Contributors

aufdenkampe, bcous, htaolimno, jasonloverespec, jdkleiner, jlkittle, mishranurag, pauldudarespec, phobson, pjwickman, ptomasula, rburghol, rheaphy, sjordan29, steveskrip, timcera, tongzhai, tredder75


hspsquared's Issues

Write WDM?

Wondering if there is code in this suite for writing WDM files? I still use wdimex, written in Fortran, but if there is a modern implementation in this package I may adopt it.

UCI import with 15min timestep

Hi,
I have a UCI file that runs at a 15-min timestep. A couple of issues have come up running the import checks, but I'm still unable to reproduce HSPF results.

Issues:

  1. The EXT SOURCES evap record is 15-min with a 0.75 multiplier in HSPF. The MFACTOR comes into HSP2 OK, but needs to be divided by 96 (= 4 * 24). After revising this MFACTOR, the IMPLND and portions of the PERLND are OK.

  2. The PERLND results in HSP2 do not reproduce HSPF. I've traced the source to the groundwater storage bin (AGWS/AGWO). Everything above that is nearly identical; the AGWI is nearly identical for HSP2 vs. HSPF.
    (plot: perlnd_p211_pwater_agwi)

However, the AGWO, AGWS, and GWVS all vary between HSP2 and HSPF. I checked all the PERLND parameters and states and they are the same. Evap is off on the groundwater bin (AGWET=BASET=0).

(plot: perlnd_p211_pwater_agwo)
(plot: perlnd_p211_pwater_agws)

Trying to figure out what to try next. KGW appears to be 0 (AGWRC = 0.997, DELT60 = 0.25 for 15 min), and perhaps HSP2 is treating this case differently than HSPF? Is something else different about the groundwater routine? Perhaps the 15-min time step is an issue here, as it was with item 1.
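For reference, assuming the usual HSPF PWATER relation KGW = 1.0 - AGWRC ** (DELT60 / 24.0) (my reading of the HSPF documentation, so treat it as an assumption), KGW is tiny but nonzero at this time step:

    AGWRC = 0.997    # daily groundwater recession constant
    DELT60 = 0.25    # time step in hours (15 min)
    KGW = 1.0 - AGWRC ** (DELT60 / 24.0)
    print(KGW)       # ~3.1e-05, small enough to print as 0 at low display precision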
Any ideas are appreciated.
Thanks.

Improve git-tracking for Jupyter notebooks

As we were working on the testing system (#31), @rheaphy, @steveskrip, @ptomasula and I discussed on a call the challenges of git-tracking Jupyter notebooks, because output cells are updated every time they are run even if the Python code or markdown doesn't change.

Let's work out a good way to save and track Jupyter notebooks.

Here's @ptomasula's HSP2 Potential Solutions for Python Notebooks in GitHub email with background and options:

I did a bit of digging into potential solutions for dealing with merging Python notebooks in git. I found three potential solutions, which are outlined below, but I'm personally leaning towards option 2. Option 2 is extremely easy to implement and solves the immediate merging challenges Bob was describing, and while there is a slight potential for issues around resolving merge conflicts between binary files, those can be largely avoided by coordinating our efforts (which we've done very well thus far). I'd be curious to hear thoughts from the rest of the group. I'd also note that whichever option we choose isn't necessarily set in stone. If we find the approach isn't working for us, we can change it down the road. We can also deviate from these options if anyone has a great suggestion for a solution.

Background/Problem
Python notebooks are stored as JSON, which provides for source tracking. However, when a notebook cell is run, certain cell attributes (output, execution count, etc.) are updated. This causes a number of impacts, including:

  • Added difficulty managing merge conflicts (a manual line-by-line process)
  • Larger and somewhat unruly commits
  • More difficult reviews: important changes in code and less critical changes (trivial for the purposes of source tracking?) resulting from running a cell both appear in a diff.

Option 1 - Strip output block out prior to commit
Python notebooks are stored as JSON. This makes it fairly easy to read and programmatically strip out the pieces of the document that are causing the issues described above. We could either write or use an existing script to accomplish this.

  • Pros
  • Cons
    • We lose the ability to share the output directly in the notebook, which may be of some value to users.
    • Extra step before committing code: need to run a conversion tool prior to commit. (We might be able to get around this using a gitattributes filter, but I haven't any experience using that.) See the sketch below.
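As a sketch of that conversion step (not a settled tool choice; the notebook name is a placeholder), nbconvert can strip outputs in place:

    jupyter nbconvert --clear-output --inplace MyNotebook.ipynb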

Option 2 - Enable notebooks to be handled as binary
Utilize git attributes to flag all or select notebook files as binary (see the sketch after this option's pros/cons). This would overwrite the entire file upon commit, and reduce conflict resolution to choosing which version to use (instead of manually resolving lines).

  • Pros
    • Easy to implement
    • Allows for the output from the notebooks to be shared
  • Cons
    • Slight potential to overwrite code changes because of how merge conflicts are handled between binary files (i.e. must use either my file or their file)
    • Less explicit tracking of code changes, but we could still tease them out by comparing versions
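For illustration, the binary flag would be a one-line .gitattributes entry (the path pattern here is a placeholder; we could also scope it to specific folders):

    # .gitattributes -- treat notebooks as binary for diff/merge purposes
    *.ipynb binary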

Option 3 – Use a merge management tool
Use a tool to ease the merge process. The most promising one I've come across is nbdime (https://github.com/jupyter/nbdime), which was developed by the Jupyter team.

  • Pros
    • Potential for a best-of-both-worlds approach: easier to manage conflicts while still retaining change tracking on a line-by-line basis
  • Cons
    • Appears to be console only (at least nbdime does, but there may be other tools out there)
    • Doesn't solve unruly commits when looking at them on GitHub
    • Still requires some level of manual conflict resolution (it’s just drastically reduced)
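If we pursue option 3, enabling nbdime as git's diff/merge driver appears to be two commands (a sketch from the nbdime docs; worth verifying on our setups):

    pip install nbdime
    nbdime config-git --enable --global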

Do you support an Uncertainty Analysis module?

Hello !
I've just joined this website. And I found your organization.
I am a Ph.D. candidate working on hydrologic & water quality modeling under climate change scenarios.
For my studies I want to perform uncertainty analysis with respect to various variables.

However, from some conference material I found that your organization developed modules to link HSPF input and output files to the DAKOTA program...

Do you support that program or those modules?

Sincerely yours,
Jiheon Lee.

Consolidate and update tutorials for HSP2>0.9.3

Presently the HSP2 repo contains a wide variety of tutorials and demos as Jupyter Notebooks, which were developed at different times for different versions of HSP2, and which have overlapping content that does not clearly build from one tutorial to the next.

Our objective is to consolidate and update all Jupyter notebooks, to build a clear progression of hands-on tutorials that demonstrate increasing complexity of running and using HSP2.

Calculation of IVOL

This is not regarding an issue in code but a question while trying to understand code.
IVOL accounts for the sum of inflows into a RCHRES from PLS and impervious surfaces. I suppose this is calculated from PERO and SURO, as specified in the MASS-LINK block of the UCI. I am trying to understand how this code calculates IVOL. In the file hrchhyd.py, the IVOL array is set to zeros for the case when there is no inflow to the RCHRES. But for cases when there is inflow from either PERLND or IMPLND into the RCHRES, which part of the HSP2 code calculates IVOL?
I would be thankful for a hint.

Improve performance of HDF5 read/write

@rheaphy emailed with his 2020-05-24 "HSP2 status update":

Most of the last 2 weeks was spent investigating the much slower run times of HSP2 compared with HSPF. Prior to the "new" HSP2, the old HSP2 was 1.6 times slower than HSPF. I had expected this difference to be much less with the new design. Instead it started out almost 4 times slower! Since Python, Pandas, Numpy and Numba had all changed significantly, it is very hard to understand where the slowdown occurred. With yesterday's update, I had cut this to a bit above 2 times slower (depending on whether the HDF5 file was already created or not). Using faster write methods in HDF5 seemed to really speed things up, but caused warning messages. I never found any problem in either the HDF5 file or the run results when the warnings were visible. Since warning messages would bother our users, I rejected using the faster write methods to improve the run time. (I still keep the option of the faster write methods with the warning messages disabled as a last resort.) I believe the only difference between the fast and slow writing methods is whether they flush() after every write or not.

Basically, I started using BLOSC compression on all timeseries and computed results when storing into the HDF5 file. This cut the HDF5 file size almost in half as well. Since the newer HDF5 library keeps the HDF5 file from growing significantly with additional runs, this is great. (The old HDF5 library would really let the HDF5 file grow to huge sizes!) And no warnings. I did not compress the UCI-like tables, so that other tools like HDFView would display them properly. While I could compress everything and still use HDFView if I registered the compression library with the HDF library, I don't want to make our users do this themselves. So this is a compromise for now.
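For reference, pandas exposes BLOSC through to_hdf; a minimal sketch (the key and compression level here are illustrative, not necessarily what HSP2 uses):

    import pandas as pd

    df = pd.DataFrame({'flow': [1.0, 2.0, 3.0]})    # stand-in timeseries
    # BLOSC compression roughly halves file size for timeseries tables (per the note above)
    df.to_hdf('results.h5', '/RESULTS/demo', complib='blosc', complevel=9)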

I suspect that the changes to Numba are primarily responsible since I now need to copy information from the regular Python dictionary containing UCI information to the Numba typed dictionary in order to transfer the data into the hydrology codes. I spent time reviewing the Numba team meeting notes and issues and found a related issue concerning the new Numba typed lists. The contributors to the discussion indicated this could impact the typed dictionary as well. The Numba team is investigating the issue, so I will wait for more information before I address this improvement direction. I will do other profiling tests to look for other possible places for the slow execution.

I still think I can make HSP2 nearly as fast as HSPF, but it will take more time. At least, it is still fast enough to use again. I remember the early days when calleg took over 40 minutes to run instead of a little over a minute. (HSPF takes 32.2 seconds on my machine; worst-case HSP2 runs now take 1 minute 23 seconds with a clean start after restarting the Python kernel and creating a new HDF5 file, or 1 minute 19 seconds if the kernel had previously run HSP2.) Without Numba, HSP2 takes 13 minutes 25 seconds, so Numba does help a lot!

I see a lot of profiling in my future.

Some of the recent commits related to this are cca2b0c, d154e55, and e92c035.

Abstract I/O & storage beyond HDF5 for flexibility, performance, & cloud

This high-level issue pulls together several past, current, and near-future efforts (and more granular issues).

The tight coupling of model input/output (I/O) with the Hierarchical Data Format v5 (HDF5) during the HSP2 runtime limits both performance (see #36) and interoperability with other data storage formats, such as the cloud-optimized Parquet and Zarr storage formats (see Pangeo's Data in the Cloud article), that are tightly coupled with high-performance data structures from the foundational PyData libraries Pandas, Dask DataFrames, and Xarray.

Abstracting I/O using a class-based approach would also unlock capabilities for within-timestep coupling of HSP2 with other models. Specifically, HSP2 could provide upstream, time-varying boundary conditions for higher-resolution models of reaches, reservoirs, and the groundwater-surface water interface.

Our overall plan was first outlined and discussed in LimnoTech#27 (Refactor I/O to rely on DataFrames & provide storage options). In brief, we would refactor to:

  • Run HSP2 by interacting with a dictionary of Pandas dataframes or Dask DataFrames in memory
    • presently the model reads/writes to HDF5 during the model execution
  • Reading/writing to storage from the dictionary of dataframes is done with a separate set of functions.
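A minimal sketch of the abstraction (class and method names are illustrative, not the actual HSP2 API):

    from abc import ABC, abstractmethod
    import pandas as pd

    class IOAdapter(ABC):
        """Storage interface; HDF5, Parquet, or Zarr backends would subclass this."""
        @abstractmethod
        def read_ts(self, path: str) -> pd.DataFrame: ...
        @abstractmethod
        def write_ts(self, path: str, df: pd.DataFrame) -> None: ...

    class MemoryIO(IOAdapter):
        """Dictionary of DataFrames; the model runtime only ever touches this."""
        def __init__(self):
            self.data = {}
        def read_ts(self, path):
            return self.data[path]
        def write_ts(self, path, df):
            self.data[path] = df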

cc: @PaulDudaRESPEC, @ptomasula

HDF5 Class can leave open connections in IPython

During our workshop we noticed that instances of the HDF5 class can leave connections open to the HDF5 file that persist even when the del method is called on the instance. This behavior was only observed in an IPython environment (e.g. Jupyter Lab). With some additional testing, I determined that this occurs when the instance of the object is referenced on the last line of a cell. This typically just triggers the display feature in IPython, but for this particular class it also appears to cause the instance to somehow become referenced by the cell. The result is that the open _store attribute keeps the connection to the HDF5 file alive even after the instance has been deleted.

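This is consistent with IPython's output caching: the value of a cell's last expression is stored in the Out dict (and in _), which keeps a live reference. A quick way to reproduce (a sketch; the constructor call is hypothetical):

    h5 = HDF5('results.h5')    # hypothetical instantiation
    h5                         # displayed, so also cached in Out[n]
    del h5                     # the name is gone, but Out[n] still references the instance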

show folder for missing file

If a file is not found, show the path that was used in addition to the filename

Current code from main.py (lines 11-13):

    if not os.path.exists(hdfname):
        print(hdfname + ' HDF5 File Not Found, QUITTING')
        return
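A possible fix (a sketch) is to print the resolved absolute path instead:

    import os

    # inside main(), as in the fragment above
    if not os.path.exists(hdfname):
        print(os.path.abspath(hdfname) + ' HDF5 File Not Found, QUITTING')
        return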

Alternate CLI library to `mando`?

With PR #65 & #75, @timcera introduced some nice Command Line Interface (CLI) features to HSP2. To do so, he used the Mando library.

Unfortunately, mando hasn't been updated since 2017 or tested on Python 3.9, and we'll be wanting to migrate to Python 3.9 in the next round of work. We'll likely want to select and implement an alternative Python CLI library.

Here are a few posts that I found on the topic:
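For comparison, the standard library's argparse can express the same subcommand pattern with no third-party dependency (a sketch, not a decision; the command names mirror the existing CLI):

    import argparse

    parser = argparse.ArgumentParser(prog='hsp2')
    sub = parser.add_subparsers(dest='command', required=True)

    run = sub.add_parser('run', help='run a model from an HDF5 file')
    run.add_argument('hdfname')

    imp = sub.add_parser('import_uci', help='import a UCI into an HDF5 file')
    imp.add_argument('ucifile')
    imp.add_argument('hdfname')

    args = parser.parse_args()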

Read access to repo?

I would like to submit pull requests for simple stuff like fixing deprecation warnings in the tutorials. To do that I need to keep my fork synced with the main repo, and to do THAT I need to be able to read from the repo (see https://help.github.com/articles/syncing-a-fork/). Looks like clone/read rights are locked down at the moment.

Are you guys accepting updates/changes/fixes right now?

Expand & automate testing system

As we discussed for our RESPEC-LimnoTech Collaborative Work Plan during our workshop (March 24-25, 2020), expanding and automating the testing of HSP2 vs. HSPF is an immediate priority.

Our objective for testing is to ensure that HSP2 provides the same results as HSPF for:

  • all HSP2 releases,
  • several relevant operating systems and software environments, and
  • a selected group of watershed models that have been calibrated and examined for real-world water management applications and that represent a range of watershed properties.

We decided that:

  • HSPF “reference” model runs should be added to the repo and considered static/stable
  • HSP2 outputs will continually evolve, expanding as new process modules are implemented
  • Comparisons will be point-by-point for major output time series, to "byte-precision" of about 3 significant figures to allow for rounding errors
    • We can't do traditional unit testing of individual routines because HSPF doesn't save that data.
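A point-by-point comparison at ~3 significant figures could look like this sketch (the function name and tolerance are illustrative):

    import numpy as np
    import pandas as pd

    def assert_close(hspf: pd.Series, hsp2: pd.Series, rtol=1e-3):
        """Compare aligned output time series to roughly 3 significant figures."""
        assert np.allclose(hspf.values, hsp2.values, rtol=rtol), \
            'HSP2 output diverges from HSPF reference'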

RESPEC has two test models to contribute:

  • Test10
  • Calleg

LimnoTech will add additional models:

  • We selected 2 watersheds that we’ve recently modeled in HSPF, and selected a single sub-watershed (to simplify running)
    • Grant River, MI. Relatively simple
    • Zumbro River, MN. More complicated. Full water quality suite.
  • Hydrological Response Unit (HRU) testing
    • 5-10 micro watersheds (1 HRU + a few stream reaches)

Let's use this issue to track progress on all the smaller tasks required to complete this.
We have already added some reference models and testing code with 49c71f3, 60378a7, and LimnoTech@130bef2.

cc: @rheaphy, @PaulDudaRESPEC, @steveskrip, @ptomasula

Implement GENER and related operations

The GENER module is used extensively to compute (within the model) constituent loads based on input flow and concentration, with one or both being a time series.

These capabilities are used extensively in several of our HSP2 use cases.

@ptomasula is getting started on this.

HSP2 unhandled error for standard Test10

When running Test10 (slightly modified to export output data to an HBN file), I can successfully create an .h5 file using readUCI and readWDM. Then, when running main, I get the following error message while processing the PSTEMP module:


--------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-ff67a3b4c34f> in <module>
----> 1 main(hdfname, saveall=True)

c:\Users\sskripnik\Documents\GitHub\HSPsquared\HSP2\main.py in main(hdfname, saveall)
     71 
     72                 ############ calls activity function like snow() ##############
---> 73                 errors, errmessages = function(store, siminfo, ui, ts)
     74                 ###############################################################
     75 

c:\Users\sskripnik\Documents\GitHub\HSPsquared\HSP2\PSTEMP.py in pstemp(store, siminfo, uci, ts)
     30 
     31         ui = make_numba_dict(uci)
---> 32         TSOPFG = ui['TSOPFG']
     33         AIRTFG = int(ui['AIRTFG'])
     34 

~\Anaconda3\envs\hsp2_py37\lib\site-packages\numba\typed\typeddict.py in __getitem__(self, key)
    146             raise KeyError(key)
    147         else:
--> 148             return _getitem(self, key)
    149 
    150     def __setitem__(self, key, value):

~\Anaconda3\envs\hsp2_py37\lib\site-packages\numba\dictobject.py in impl()
    736         ix, val = _dict_lookup(d, castedkey, hash(castedkey))
    737         if ix == DKIX.EMPTY:
--> 738             raise KeyError()
    739         elif ix < DKIX.EMPTY:
    740             raise AssertionError("internal dict error during lookup")

KeyError: 

Files and jupyter notebook are located here: https://github.com/LimnoTech/HSPsquared/tree/develop-WaterQuality/tests/test10b/HSP2results

@aufdenkampe

Errors and Warnings in reading WDM File

The Test10 UCI and WDM files are in the DataSources folder. They should be in the TutorialData folder.

Once I move these files to the right places, I get the following warnings and errors.

C:\Dev\HSPsquared\HSP2tools\uciReader.py:447: FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  df = df.convert_objects(convert_numeric=True)
C:\Users\Anurag.Mishra\AppData\Local\Continuum\Anaconda3\envs\Python2\lib\site-packages\IPython\core\interactiveshell.py:2717: FutureWarning: get_store is deprecated and be removed in a future version
HDFStore(path, **kwargs) is the replacement
  interactivity=interactivity, compiler=compiler, result=result)
    uciReader is Done
Processing WDM file TutorialData/TEST.WDM

AttributeError                            Traceback (most recent call last)
<ipython-input-4-75fb1fef8ffe> in <module>()
      1 HSP2tools.makeH5()
      2 HSP2tools.readUCI(uciname, unpackedhdfname)
----> 3 HSP2tools.ReadWDM(wdmname, unpackedhdfname)
      4 get_ipython().system(u'ptrepack {unpackedhdfname}  TutorialData\\tutorial.h5')

C:\Dev\HSPsquared\HSP2tools\wdmReader.pyc in ReadWDM(wdmfile, hdffile, **options)
     43         m = re.search(pat, row.SVOLNO)
     44         key = int(m.group(2))
---> 45         if not WDM.exists_dsn(wdmname, key):
     46             continue
     47 

AttributeError: 'WDM' object has no attribute 'exists_dsn'

UCI import

I am attempting to import a UCI from an HSPF model and run it using HSP2 as described in the Preview notebook. The UCI and WDM files import without error. However, when I try to run the model, I get the following error:

KeyError: 'No object named /RESULTS/RCHRES_R500/ROFLOW in the file'

Looking back at the UCI, the only place where ROFLOW is referenced is in the Mass-Link Block:
RCHRES ROFLOW ROVOL 1 RCHRES INFLOW IVOL 1

My understanding is that this line instructs HSPF how to handle the passing of time series from one RCHRES to another, not to specify any output. Since they are the same module, they use the same units, and there are no conversions necessary. The actual linkages are set up in the SCHEMATIC Block.

Installing and Testing pip hsp2 execution on Linux

I wanted to document my testing of the new pip-based install (an alternative to Anaconda) developed by @timcera. The install seems fine, but testing is now a challenge. I would love any pointers, specifically on managing my expectations about what I should be seeing.

  • Install pip and package dependencies
  • Install runnable hsp2 command line tool via pip
  • Run a test UCI
  • Examine UCI test run output in hd5 file
  • Compare output to previous (maybe in ./tests/test10/HSPFresults/ ?)

Installation of pip and package

  • Get the test branch (made by @timcera; will replace with master after it is merged):
    • git clone -b develop https://github.com/timcera/HSPsquared
  • install hdf5
    • The necessity of this step on Ubuntu is uncertain. There are apt packages for hdf5-tools, and there were existing executables named h5cc on the system prior to unpacking and installing this package; however, this did not seem sufficient to run test10, so I added these steps.
    • download hdf5-1.12.1.tar.gz (need to go through their web sign-up, so no direct download link to use with wget)
    • tar -xvf hdf5-1.12.1.tar.gz
    • cd hdf5-1.12.1
    • make all
    • sudo make install
    • sudo apt install hdf5
  • Install pip:
    • sudo apt-get install pip
  • Install Python dependencies (I had a brand new Ubuntu 20.04/python and had to install some dependencies via pip first)
    • pip install pandas
    • pip install tables==3.6.1
      • Needed the previous version of tables because the newest tables breaks numba
    • pip install numba
  • Create hsp2 executable
    • pip install .
  • Log out and log back in to get the python/pip path updated to be able to run hsp2
  • Test hsp2 executable (see Testing below)

Testing

  • I used /opt/model/HSPsquared/HSP2notebooks/Data, though the run did not appear to run to completion.
    • mkdir test; cd test
    • cp /HSPsquared/HSP2notebooks/Data/* ./
    • hsp2 import_uci test10.uci test10.h5
    • hsp2 run test10.h5
  • Comments below will be used to debug and/or verify the steps above

ViewRCHRES - change 'get_store' to 'HDFStore'

In ViewRCHRES, the statement

    with pd.get_store(hdfname, mode='r') as store:

results in the following message:

    sys:1: FutureWarning: get_store is deprecated and be removed in a future version
    HDFStore(path, **kwargs) is the replacement

Also, remove mode='r' from the argument list, as the HDF5 file may already be open.
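A minimal sketch of the replacement (dropping mode='r' as suggested):

    with pd.HDFStore(hdfname) as store:
        ...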

Working environment for HSPsquared ?

What is the working environment for HSPsquared?
Does it only work under Python 2.7?
Actually, I have installed Python 3.7.3, and when applying the HSPsquared code as presented in the YouTube material, it continuously shows errors.

Could you help me figure out what the problem is?
Attached is a screenshot :)

Thanks.


Add Doc strings from HSPF descriptions

HSP2 is presently functionally identical to HSPF, with identical names for modules, sections, routines, and variables, and with identical water simulation algorithms. This provides both credibility and access to the excellent HSPF documentation.

A goal for HSP2 is coupling with other models, which would be facilitated by adding detailed documentation strings to the code base, including long-name aliases for the current <8 character Fortran names. In addition, HSP2 can be run from a variety of new interfaces not previously available to HSPF users.

The long-term goal is to develop a complete suite of HSP2 manuals and tutorials, through automated document generation using Sphinx or a similar library, and through tutorials and example code implemented in interactive Jupyter Notebooks.

This task is a first step toward those documentation goals, by copying long names and descriptions from the HSPF manual into HSP2 doc strings.
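As an illustrative sketch (the long names and units below are paraphrased from the HSPF manual, so verify before copying):

    def pwater(store, siminfo, uci, ts):
        """PERLND PWATER: simulate the water budget for a pervious land segment.

        Selected parameters (HSPF name -- long-name alias, units):
          LZSN  -- lower_zone_nominal_storage, inches
          AGWRC -- basic_groundwater_recession_rate, per day
        """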

Error in Preview.ipynb, Step 15

The Preview Jupyter notebook throws an error on Cell 15:

df = HSP2.readPLTGEN(pltname)
df.head()

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-1f9059be498a> in <module>()
----> 1 df = HSP2.readPLTGEN(pltname)
2 df.head()

AttributeError: 'module' object has no attribute 'readPLTGEN'

Conversion error from WDM to HDF5 file format with own dataset.

Hello !
Currently, I am trying to apply your code to my dataset.
So for now, I am trying to convert a WDM file to HDF5 format.

After applying my own dataset with your ReadWDM module (as a test I used the BASINS4 sample sediment dataset, which at 56 time series is larger than your test01 dataset with 19 time series), I got the error shown in the screenshot below.

(error screenshot)

Point-1

  • Do you have any specific rules for developing a WDM file that your HDF5 converter can handle?

Point-2

  • Now I am reviewing your ReadWDM code.
  • But so far I could not find any troubled line.
  • Some lines are questionable because of the numbers you use in the code, but for now I do not have a detailed WDM DB specification for the variables.
  • For reference, could you share the original WDM database table or column specifications for the variables? That would make this an easier task, I think.

If you know why this error is coming up, please let me know.
Thank you for developing this package and reading this question.

Jiheon Lee.

HSP2_CLI.py Parsing UCIs with WDM files indicated as "WDM1", "WDM2", ...

Maybe this is only relevant to the branch created by @timcera to allow Linux command line execution, but it also appears related to #53.

Summary: WDM files described in the FILES block and EXT SOURCES/TARGETS with a numerical suffix are not added to the hdf5 file. Example:

FILES
<FILE>  <UN#>***<----FILE NAME------------------------------------------------->
WDM1       21   ../../../input/scenario/climate/met/nldas1121/met_A51800.wdm
WDM2       22   ../../../input/scenario/climate/prad/p20211221/prad_A51800.wdm
WDM4       24   forA51800.wdm
MESSU      25   forA51800.ech
           26   forA51800.out
END FILES

My sense is that this may simply be a convention, not an actual part of the definition that HSPF expects, but it nevertheless appears to be somewhat common (I have seen it in the CBP files, as well as in TMDL UCIs from 2 different modeling groups).

The error arises in the code in HSP2_CLI.py, which skips these elements because the string test expects an exact match on "WDM", not "WDMn":

        if nline[:10].strip() == "WDM":

This ends up skipping the WDM in question; then, later, when running the model, it fails because it cannot find the DSN (error: ).
Changing the conditional to test only the first 3 characters of the stripped string allows things to proceed (my CBP UCI executes and appears to produce reasonable output; more on that later):

        if (nline[:10].strip())[:3] == "WDM":

Pull request submitted in #81

Handle irregular time series input

Presently, HSP2 cannot handle irregular time series as inputs.

Although irregular time series inputs are not common for HSPF, @bcous has found a historical set of WDM files where the input time series started at 1 hour intervals and then switched to 15 minute intervals. HSPF resamples all inputs to the model time step immediately prior to a run, so everything works fine in HSPF.

With the recent successful rewrite of readWDM.py to read by data group & block (#21), we can properly read these and all other tested WDMs. However, as @ptomasula commented (LimnoTech#21 (comment)), running HSP2 on those inputs will throw an error if tsfreq == None:

~/Documents/Python/limno.HSPsquared/HSP2/utilities.py in transform(ts, name, how, siminfo)
     78         pass
     79     elif tsfreq == None:     # Sparse time base, frequency not defined
---> 80         ts = ts.reindex(siminfo['tbase']).ffill().bfill()
     81     elif how == 'SAME':
     82         ts = ts.resample(freq).ffill()  # tsfreq >= freq assumed, or bad user choice

KeyError: 'tbase'

We have several options to fix this:

  1. Drop higher frequency data
  2. Fill entire time series to highest frequency, but how?
    • Fill options: NaN (or -999.0), previous value, interpolated value
  3. Split into two time series
  4. Modify the HSP2.utilities.py code to handle it

We'll do option 1 in the short term (probably "manually"), but option 4 is probably the best long-term fix; see the sketch below.

This issue will track progress on option 4.
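For option 4, one possible approach is to reindex the irregular series onto the model's regular time base and forward-fill (a sketch under that assumption; the function name and frequency are illustrative):

    import pandas as pd

    def to_model_timestep(ts: pd.Series, start, end, freq='15min') -> pd.Series:
        """Reindex an irregular series onto a regular time base, forward-filling gaps."""
        tbase = pd.date_range(start, end, freq=freq)
        # union keeps original observations so values between ticks carry forward
        return ts.reindex(tbase.union(ts.index)).ffill().reindex(tbase)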

Potential next step...

Thanks for accepting the PR!

The next idea I have is big enough to warrant discussion before I implement something. What I would like to do is remove all of the vendored libraries, '/hspfbintoolbox*', '/mando*', '/tstoolbox*', and '/wdmtoolbox*', and list them in the 'install_requires' variable in 'setup.py'.

This isn't a must; I just think it is a bit cleaner as an approach. You do have to trust that I don't break something, but you can still pin to an earlier tested version if needed.

Kindest regards,
Tim

Question on the code line in ReadWDM file.

Hello !~

Several days ago, I raised a question about converting a WDM file to HDF5,
and PaulDuda helped me tackle the issue.

But I have one more question related to that issue, specifically about the ReadWDM code.
In ReadWDM, you use the number 512 on code line 37. (I have attached the related pic with a red line marker below.)

  • It seems like you are using 512 when identifying the time series starting index. For instance, the first time series starts at index no. = 512, the second one starts at 512*n, and so on. But here is the question:
    *** Where does the number 512 come from?

  • It seems like the number comes from the WDM DB/table specification, which defines the specifications (ID, DSN, etc.) for WDM DB variables. If that is the case, could you send me a copy, or let me know its source?

  • Sorry for bothering you again. Even though I have resolved the issue this time, I would like to understand exactly what the code lines say, for further application.

I appreciate your help.
Thank you for reading the comments again !~

Best regards,
Jiheon Lee


Implement collaborative feature branch workflows

During the RESPEC-LimnoTech workshop to kick off HSP2 collaboration (March 24-25, 2020), we decided to use a Feature Branch workflow similar to:
(diagram: feature-branch workflow)

Our related decisions can be summarized as:

  • RESPEC's master branch will be considered the official public-facing release branch
    • Only RESPEC will have control of this branch
  • RESPEC’s develop branch is the key development branch for all new features
  • LimnoTech pulls updates to their fork at LimnoTech/HSPsquared
  • LimnoTech works in their forked repo, and will generally issue Pull Requests to /respec/HSPsquared/develop (and potentially other feature branches).
    • LimnoTech will be responsible for merge conflicts generated by any Pull Request
    • LimnoTech may issue a Pull Request to Respec's master branch for updates to documentation
  • RESPEC & LimnoTech will work on new features in Feature branches created from develop

We set up some of this workflow with #29 and associated branch creation.

We also decided to create GitHub Issues in RESPEC's repo for tracking shared objectives and tasks, to maintain an archive of progress & solutions. LimnoTech will use their own issue tracker only for smaller, granular task tracking that doesn't add value to cumulative documentation.

cc: @rheaphy, @PaulDudaRESPEC, @steveskrip, @ptomasula

Fixed limits on operations all solved in HSP2?

As far as I know, the Fortran version of HSPF has a limited number of operations.
Therefore, there was a limitation on the application of LULC or reaches.
According to your program specification, there are no more fixed limits on operations:

" Restructure for maintainability, to remove fixed limits (e.g., operations, land use,
   parameters), and to maintain or improve execution time."

Then we don't need to construct the model separately
due to that functional limitation when applying HSP2.
Right?

Best regards.
Jiheon Lee.

Warning Message while processing RCHRES 'extrapolation of rchtab will take place' vanishes when repeating the run.

I am running a for loop over the HSP2.run method; the problem is, it shows these warning messages:

2018-04-03 18:44:14.34 Message count 2 Message HYDR: extrapolation of rchtab will take place
2018-04-03 18:44:14.34 Message count 1 Message HYDR: Solve did not converge

But when I rerun the same file, these messages vanish. What is worse, sometimes the warning messages only vanish on the third run rather than the second. I am using the following code to run the HDF5 file.

import numpy as np
import pandas as pd
from math import isnan, isinf

import HSP2


def ROR8D(hdfname):      # getting calculated RO from HDF
    bf_p1 = pd.read_hdf(hdfname, 'RESULTS/RCHRES_R009/HYDR')['RO']
    outflow = bf_p1    # multiply by 0.028316847 to convert cfs to cms
    OutflowR8D = outflow.resample('D').mean()
    #print('now total outflow is {}'.format(sum(OutflowR8D)))
    if isinf(sum(OutflowR8D)):         # check if any value of calculated RO is infinity
        print(sum(OutflowR8D[1:]))
        print('infinity encountered')
        #sys.exit()
        OutflowR8D = OutflowR8D.replace([np.inf, -np.inf], 0.0)  # replace infinity values with 0.0
        print(sum(OutflowR8D))
    return OutflowR8D

def ObjectiveFunction(xx):    # xx is a list of input parameters
                              # hdfname is defined globally by the calling script
    # changing parameters
    df2 = pd.read_hdf(hdfname, '/PERLND/PWATER/PARAMETERS')
    print('sum of all input parameters is {0:10.7f}'.format(sum(xx)))
    df2.LZSN   = xx[0]
    df2.INFILT = xx[1]
    df2.KVARY  = xx[2]
    df2.AGWRC  = xx[3]
    df2.DEEPFR = xx[4]
    df2.BASETP = xx[5]
    df2.AGWETP = xx[6]
    df2.CEPSC  = xx[7]
    df2.UZSN   = xx[8]
    df2.INTFW  = xx[9]
    df2.IRC    = xx[10]
    df2.LZETP  = xx[11]
    df2.NSUR   = xx[12]

    df2.to_hdf(hdfname, '/PERLND/PWATER/PARAMETERS')

    HSP2.run(hdfname, saveall=True)

    OutflowR8D = ROR8D(hdfname)
    TotalFlow = sum(OutflowR8D)
    #print('\n total flow is {} '.format(TotalFlow))

    while isnan(TotalFlow):                     # rerun if the first run gave errors
        print('             ...........Rerunning............. ')
        HSP2.run(hdfname, saveall=True)

        OutflowR8D = ROR8D(hdfname)
        TotalFlow = sum(OutflowR8D)

KeyError Handling needed for IOManager

@sjordan29's testing in PR #73 revealed that requesting a timeseries that does not exist from the IOManager class results in a KeyError, causing the application to stop executing. I think the behavior should be to return an empty pandas.DataFrame, possibly raising a warning, but not stopping execution.
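A sketch of that behavior (the function here is a placeholder, not the actual IOManager API):

    import warnings
    import pandas as pd

    def read_ts(store, path):
        """Return the requested timeseries, or an empty DataFrame with a warning."""
        try:
            return store[path]
        except KeyError:
            warnings.warn(f'timeseries {path} not found; returning empty DataFrame')
            return pd.DataFrame()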

Complete OXRX, NUTRX, PLANK & PHCARB modules w/ class-based code

Fully porting and testing these detailed WQ sections for the surface water module (RCHRES) would complete the port of surface water quality capabilities from HSPF.

These highly integrated modules for dissolved oxygen, nutrients, plankton, and the carbonate buffering system are challenging because of many interdependencies within a timestep.

To facilitate the passing of attributes and values among these modules, @PaulDudaRESPEC, @tredder75, @ptomasula, and I propose to migrate these code sections from functional-programming coding structures inherited from HSPF to object-oriented class structures used in modern programming, to enable inheritance of attributes and methods by objects and the passing of attributes among objects.

@tredder75 has begun this process with LimnoTech#38 and LimnoTech#39.

@tredder75 is also numba-fying these classes as he develops them, adding Numba just-in-time compiling to low-level functions that run many, many times during an HSP2 model run. This also helps toward our overall performance goals (#36).

We hope that the new class-based coding approach and specific WQ classes that we are implementing here will effectively serve as templates for converting the other modules to class-based code.

@jlkittle, this will get us moving toward the ability to call attributes and methods using the Python "dot" syntax!

Linux install issue

I am installing hsp2 on a Linux machine (CentOS v. 7) using the conda installation procedure with Python 3.8. I created a custom conda environment, and ran the conda-develop command to add the path to the directory with the HSPsquared download. However, when I try to run hsp2 within the custom environment, it's not finding the command. I checked in the site-packages folder, and the conda.pth file was created with the path. Any ideas on how to resolve this?

Create `setup.py` for `pip install HSPsquared`

I know everybody and their uncle wants me to use conda, but on Linux conda will overlay the libraries that it installs in place of system libraries. That isn't the way I want things handled. I use package management for the system libraries (not conda) and pip for Python, and they don't overlap.

Pip and conda can coexist, because all that pip needs is a setup.py. I had contributed a setup.py to HSPsquared some time ago and it was deleted. Would a resurrected setup.py be accepted as a pull request?

Another advantage to having a setup.py is that it is also used to upload packages to the Python Package Index (https://pypi.org/) where HSPsquared could be available to anyone in the world by using "pip install HSPsquared".

Kind regards,
Tim

error in calculating water balance using hperwat function

I am using the hperwat function to simulate processes in a simple PERLND and calculate the water balance, using the following lines of code:

import pandas as pd
import numpy as np

np.random.seed(seed=77)

from perwat import pwater

tindex = pd.date_range('20110101', '20111210', freq='H')

general = {
    'sim_len': len(tindex),
    'tindex': tindex,
    'sim_delt':60  # number of minutes in time step
}

ui = {
    'CSNOFG': 0,
    'RTOPFG': 1,  #
    'UZFG': 0,   # algorithm used with `0` causes a lot of surface runoff (thus higher pero) at the expense of uzet
    'VCSFG': 0,
    'VLEFG': 0,
    'IFFCFG': 0,
    'IFRDFG': 0,

    'FOREST': 0.0,
    'LZSN': 0.25,
    'INFILT': 1.0,  #  index to the infiltration capacity of the soil.
    'LSUR': 1.0,
    'SLSUR': 1.5,
    'KVARY': 1.0, # affects the behavior of groundwater recession flow, enabling it to be non-exponential in its decay with time.
    'AGWRC': 0.4, # basic groundwater recession rate if KVARY is zero and there is no inflow to groundwater; AGWRC is defined as the rate of flow today divided by the rate of flow yesterday.

    'PETMAX': 4.0,
    'PETMIN': 1.7,
    'INFEXP': 2.0,
    'INFILD': 2.0,
    'DEEPFR': 0.1,  # fraction of groundwater inflow which will enter deep (inactive) groundwater, and, thus, be lost from the system as it is defined in HSPF
    'BASETP': 0.7,  # fraction of remaining potential E-T which can be satisfied from baseflow (groundwater outflow), if enough is available.
    'AGWETP': 0.5,

    'FZG': 0.0394,
    'FZGL': 0.1,
    'LTINFw': 1.0,
    'LTINFb': 0.0,

    'CEPS': 0.0,
    'CEPSC': 0.0,  # interception storage capacity for this canopy layer. This parameter is only used if VCSFG = 0
    'SURS': 0.0,  # initial surface (overland flow) storage
    'UZS': 0.0,
    'UZSN': 0.025,
    'IFWS': 0.0,  # initial interflow storage.
    'LZS': 0.025,  # initial lower zone storage
    'AGWS': 0.0,
    'GWVS': 0.0,

    'NSUR': 0.1,
    'INTFW': 0.0,
    'IRC': 0.25,
    'LZETP': 0.0,

    'VIFWFG': 0,
    'VIRCFG': 0.1,
    'VNNFG': 0,
    'VUZFG': 0,

    'HWTFG': False

}

prec = np.random.random(len(tindex))
ts = {
    'PREC': prec,
    'PETINP': np.add(prec, 0.5)   # Input potential E-T
}

errorV, errorM = pwater(general, ui, ts)



ifwi = np.sum(ts['IFWI'])
deepfr = np.sum(ts['DEEPFR'])
pet = np.sum(ts['PET'])



# Input
_in = np.sum(ts['SUPY'])

# evapotranspiration
cepe = np.sum(ts['CEPE'])
uzet = np.sum(ts['UZET'])
lzet = np.sum(ts['LZET'])
agwet = np.sum(ts['AGWET'])
baset = np.sum(ts['BASET'])
_taet = cepe +  uzet + lzet + agwet + baset
taet = np.sum(ts['TAET'])
d_et = taet-_taet
if d_et < -1e-4 or d_et > 1e-4:
    print('Problem in ET balance')
print('{:<10} {:<10} {:<10} {:<10} {:<10} {:<10}'.format('cepe', 'uzet', 'lzet', 'agwet', 'baset', 'taet'))
print('{:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f}'.format(cepe, uzet, lzet, agwet, baset, taet))

# Outflow
suro = np.sum(ts['SURO'])
ifwo = np.sum(ts['IFWO'])
agwo = np.sum(ts['AGWO'])
igwi = np.sum(ts['IGWI'])
pero = np.sum(ts['PERO'])
_pero = ifwo + agwo + igwi + suro
d_pero = pero-_pero
print('')
if -1e-4 > d_pero or d_pero>1e-4:
    print('Problem in balance of outflow from PERLND')
print('{:<10.6} {:<10.6} {:<10.6} {:<10.6} {:<10.6}'.format('ifwo', 'agwo', 'igwi', 'suro', 'pero'))
print('{:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f}'.format(ifwo, agwo, igwi, suro, pero))

# Total storage
ceps = np.sum(ts['CEPS'])   # Interception storage (for each
surs = np.sum(ts['SURS'])   # Surface (overland flow) storage
uzs = np.sum(ts['UZS'])
ifws = np.sum(ts['IFWS'])
lzs = np.sum(ts['LZS'])
agws = np.sum(ts['AGWS'])   # Active groundwater storage
tgws = np.sum(ts['TGWS'])   # Total groundwater storage
pers = np.sum(ts['PERS'])   # Total water stored in the PLS
_pers = ceps + surs + uzs + ifws + lzs + agws + tgws
d_pers = pers-_pers
print('')
if d_pers < -5 or d_pers > 5:
    print('Problem in Storage balance')
print('{:<10.6} {:<10.6} {:<10.6} {:<10.6} {:<10.6} {:<10.6} {:<10.6} {:<10.6}'
      .format('ceps', 'surs', 'uzs', 'ifws', 'lzs', 'agws', 'tgws', 'pers'))
print('{:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} {:<10.3f} '
      .format(ceps, surs, uzs, ifws, lzs, agws, tgws, pers))

print('\nTOTAL WATER BALANCE')
print('{:<18} {:<18} {:<18}'.format('Precipitation', 'Evapotranspiration', 'Total Outflow'))
print('{:<18.3f} {:<18.3f} {:<18.3f} '.format(_in, taet, pero))

I am getting following outputs from this code

cepe       uzet       lzet       agwet      baset      taet      
0.000      465.502    0.000      1018.299   0.000      1483.801  
Problem in balance of outflow from PERLND
ifwo       agwo       igwi       suro       pero      
0.000      0.000      53.595     3073.925   3073.925  
ceps       surs       uzs        ifws       lzs        agws       tgws       pers      
0.000      0.000      0.044      0.000      7885.728   0.000      0.000      7885.772   

TOTAL WATER BALANCE
Precipitation      Evapotranspiration Total Outflow     
4076.324           1483.801           3073.925  

which is not correct. Can you tell me where I am making a mistake, or is this an error in the code?

Running HSP2 from console in Linux

I am trying to explore the possibility of running hsp2 from the console in Linux -- has anyone done this? I am thinking of something similar to how the old HSPF version would be run, where one supplies a UCI name and execution flows from there.

Reading Multiple WDM Files Containing Timeseries with Conflicting DSN Assignments

@PaulDudaRESPEC @TongZhai @aufdenkampe @bcous
In trying to get Brendan's more complex WQ model to run we identified an issue with how UCI and WDM Readers handle multiple files that have timeseries with overlapping DSN values. This model contains 4 separate WDM files, some of which have conflicting DSNs between the timeseries. Presently the WDM reader overwrites timeseries with the same DSN with the most recently read timeseries.

HSPF appears to get around this issue by using the UCI file and its FILES specification. Later in the EXT SOURCES specification those file names (i.e. WDM1, WDM2) are used to distinguish between timeseries with the same DSN in different files. The UCI reader and WDM reader need to be expanded to capture and support this file naming.

It actually looks like this might have originally been supported. Notably, the parseD function in ReadUCI reads the file name and returns it as part of the dictionary under the SVOL key. However, it looks like line 416 then overrides that filename specification with an asterisk. I think removing that override should restore support in ReadUCI. Also, the main.get_timeseries function already looks to have logic to read timeseries when the SVOL parameter is populated.

I propose the following as a path to restore support for multiple WDMs:

  1. Remove the override of SVOL in ReadUCI
  2. Add logic in ReadUCI which also reads and writes the FILES specification to the ReadUCI
  3. Modify ReadWDM to read the file specification table, and if the input WDM filename matches a file in that specification write the timeseries to the SVOL/TS### key to be consistent with the get_timeseries function.

This approach will still allow users to read WDM files independently of the UCI file, so we would retain the ability to read and view WDM files without running the model. However, for model execution we'd now need to read the UCI file first and then read the WDM file(s). I'm curious if you all have any thoughts or alternative suggestions on how to best address these conflicting DSN values. If we are comfortable with the proposed solution, I can start working on it in a feature branch.

Small differences in numeric precision lead to test run showing benthic algae differences

For a particular test case, in the benthic algae component of PLANK, the computed benthic algae density increases with each time step until it crosses the threshold set by input parameter MBAL in table BENAL-PARM, as expected. Crossing this threshold occurs one time step later in HSP2 than it did in HSPF. After closer examination, it is apparent the differences in benthic algae density are within the expected numeric precision, but the small difference is enough to affect the occurrence of overcrowding by one time step.

This issue highlights a possible shortcoming of the testing code, where comparing each value at each time step can give misleading indications of non-matching results.

Improve readUCI & readWDM for a broader range of valid files

This spring @steveskrip noticed that many UCI files successfully used by LimnoTech with HSPF (and created by LimnoTech's WinModel package) would not import with readUCI.

@rheaphy also noted that there might be time issues in UCI files, because HSPF doesn't really manage time correctly, and for HSP2 we're using ISO time standards that track leap seconds and time zones.

Let's use this issue thread to track @rheaphy's work to improve readUCI, and our results with testing it.

Error running intro script

I am trying to run through the Intro_toHSP2.ipynb notebook. I am able to link all directories/paths, and I am able to process the HSPF inputs successfully:

readUCI(input_uci_path, output_hdf5_path)....runs successfully
readWDM(input_wdm_path, output_hdf5_path)...runs successfully

However, when I try to run the HSP2 simulation for Test10:
main(output_hdf5_path, saveall=True)

I get: AttributeError: 'PosixPath' object has no attribute 'read_uci'

Any idea how to resolve this error?

Parse UCI without SCHEMATIC and NETWORK blocks

I am testing with a UCI that lacks both SCHEMATIC & NETWORK blocks (it is from a version of the Chesapeake Bay model land simulation, which simply routes output via the EXT TARGETS block to be run later in a separate river-only UCI). This fails on UCI import when calling the pandas function concat, since both net and sc remain None. I can get the UCI import to complete by testing for if not ((net is None) and (sc is None)): before writing the linkage output to the HDF, but I'm wondering if anyone sees a problem with this before I go forward with forking and adding a pull request to the new code.

Basically, I swap out the original (near line 160 in readUCI.py):

        linkage = concat((net, sc), ignore_index=True, sort=True)
        for cname in colnames:
            if cname not in linkage.columns:
                linkage[cname] = ''
        linkage = linkage.sort_values(by=['TVOLNO']).replace('na','')
        linkage.to_hdf(store, '/CONTROL/LINKS', data_columns=True)

For this:

        if not ((net is None) and (sc is None)):
            linkage = concat((net, sc), ignore_index=True, sort=True)
            for cname in colnames:
                if cname not in linkage.columns:
                    linkage[cname] = ''
            linkage = linkage.sort_values(by=['TVOLNO']).replace('na','')
            linkage.to_hdf(store, '/CONTROL/LINKS', data_columns=True)

And so, in sum, no /CONTROL/LINKS gets written to the h5.

GQUAL PHFLAG = 1 and PHYTFG = 1 not available for computed timeseries

In GQUAL, the flags PHFLAG and PHYTFG can be used to indicate that input data to the module should come from timeseries computed in RQUAL modules PHYTO and PHCARB. In HSPF, the values of these state variables from the previous time step are available and used. However, because of differences in the time looping in HSP2, those computed time series are not yet available when needed.

A possible solution to this issue would be to change the sequence of simulated sections, so that RQUAL sections are computed before GQUAL.

possible bug in demand function

There is a possible bug in the demand function in hrchhyd.py. Try running the demand function without jit with the following input:

vol = 0.0
rowFT = np.array([0.,   0.01, 0.,   0.,  ])
funct = (1,)
nexits = 1
delts = 3600.0
convf = 1.0
colind = [4.2]
outdgt = [2.1]
ODGTF = (0,)

and it will throw this error:

  File "E:/debug/test.py", line 51, in demand2
    od[i] = _od1 + diff * (_od1 - rowFT[icol]) * convf                #$2356 

IndexError: index 4 is out of bounds for axis 0 with size 4

The problem with numba is that when we try to access a non-existent index in an array, it just returns a junk value like 1e-313 instead of raising an IndexError.

Colby sand load method in SEDTRN returns NaN in Anaconda python 3.9.7

In the Colby sand load method down in SEDTRN, the log10 function returns NaN in this code when using Anaconda Python 3.9.7. I've verified that it returns a good value in Python 3.7, and it even returns a good number when I comment out the 'njit' declaration. It's just the Numba-compiled version that returns the NaN. To make it even more perplexing, when I add the print statement below (which I did as I was trying to debug), it works just fine in all versions, even with Numba.

(screenshot: SEDTRN code with the added print statement)

Units not displayed when outputting a section

For example, I used the following commands.

FlowReach20=pd.read_hdf(OutHDF5, '/RESULTS/RCHRES_R020/HYDR')
FlowReach20

The output looks like this.
(screenshot: HYDR output table without units)

The labels of the values should include units to avoid confusion.
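One lightweight convention would be to append units to the column labels on read (a sketch; the units mapping below is illustrative, not taken from HSP2):

    import pandas as pd

    units = {'RO': 'RO [cfs]', 'OVOL': 'OVOL [ac-ft]'}    # illustrative units
    FlowReach20 = pd.read_hdf(OutHDF5, '/RESULTS/RCHRES_R020/HYDR').rename(columns=units)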
