mroberge / hydrofunctions
A suite of convenience functions for working with hydrology data in an interactive Python session.
License: MIT License
There are two changes to setup.py that need to be made:
If you make an unusually large request for data, or if the NWIS takes a long time to respond for some other reason, the user will not see a message until there is a response. The URL should be printed immediately, and say "Requesting data from ..." instead of "Requested data from ..."
In get_nwis, swap the order of the print statement and the call to Requests.
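A minimal sketch of the suggested ordering, using a simplified stand-in rather than the library's actual get_nwis:

```python
from urllib.parse import urlencode

def get_nwis(base_url, params, fetch=None):
    # Simplified stand-in for hydrofunctions.get_nwis(): build the full URL
    # first so it can be announced immediately, BEFORE the potentially slow
    # network call -- and say "Requesting", not "Requested".
    url = base_url + "?" + urlencode(params)
    print("Requesting data from " + url)
    if fetch is None:
        import requests  # the HTTP library hydrofunctions uses
        return requests.get(url)
    return fetch(url)  # injectable fetcher, handy for testing offline
```

With this ordering, even a request that hangs for minutes shows the user which URL is being queried.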
Requesting data from a non-existent site returns a status code of 200, so you would think everything is fine. But instead of an RDB table, you get a webpage with an error.
This link is for a non-existent site, 01581000:
https://waterdata.usgs.gov/pa/nwis/measurements?site_no=01581000&agency_cd=USGS&format=rdb_expanded
hf.field_meas('01581000')
Create a system for requesting data from the USGS that also handles errors.
Check for non-200 errors
Check the text in 200 codes for a <title>USGS NwisWeb error message</title>
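Both checks could be combined in a small helper along these lines (the helper name and structure are hypothetical):

```python
ERROR_TITLE = "<title>USGS NwisWeb error message</title>"

def nwis_request_failed(status_code, text):
    # Sketch of the two checks listed above.
    # 1. Any non-200 status code is a failure.
    if status_code != 200:
        return True
    # 2. A 200 response can still be an HTML error page instead of an RDB table.
    return ERROR_TITLE in text
```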
hf.__version__
Out[46]: '0.2.0'
Python 3.7.10
Date parsing error when downloading peak data for USGS gage no. 06813500
hf.peaks('06813500')
Retrieving annual peak discharges for site # 06813500 from https://nwis.waterdata.usgs.gov/nwis/peak?site_no=06813500&agency_cd=USGS&format=rdb
Traceback (most recent call last):
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 655, in parse
ret = self._build_naive(res, default)
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1241, in _build_naive
naive = default.replace(**repl)
ValueError: month must be in 1..12
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "pandas\_libs\tslib.pyx", line 514, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslibs\parsing.pyx", line 243, in pandas._libs.tslibs.parsing.parse_datetime_string
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 657, in parse
six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)
File "<string>", line 3, in raise_from
ParserError: month must be in 1..12: 1881-00-00
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas\_libs\tslib.pyx", line 525, in pandas._libs.tslib.array_to_datetime
TypeError: invalid string coercion to datetime
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 655, in parse
ret = self._build_naive(res, default)
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1241, in _build_naive
naive = default.replace(**repl)
ValueError: month must be in 1..12
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<ipython-input-45-74616d452e0e>", line 1, in <module>
hf.peaks('06813500')
File "C:\Anaconda3\envs\py37\lib\site-packages\hydrofunctions\usgs_rdb.py", line 384, in peaks
outputDF.peak_dt = pd.to_datetime(outputDF.peak_dt)
File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\tools\datetimes.py", line 805, in to_datetime
values = convert_listlike(arg._values, format)
File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\tools\datetimes.py", line 472, in _convert_listlike_datetimes
allow_object=True,
File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\arrays\datetimes.py", line 2090, in objects_to_datetime64ns
raise e
File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\arrays\datetimes.py", line 2081, in objects_to_datetime64ns
require_iso8601=require_iso8601,
File "pandas\_libs\tslib.pyx", line 364, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslib.pyx", line 591, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslib.pyx", line 726, in pandas._libs.tslib.array_to_datetime_object
File "pandas\_libs\tslib.pyx", line 717, in pandas._libs.tslib.array_to_datetime_object
File "pandas\_libs\tslibs\parsing.pyx", line 243, in pandas._libs.tslibs.parsing.parse_datetime_string
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 657, in parse
six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)
File "<string>", line 3, in raise_from
ParserError: month must be in 1..12: 1881-00-00
Should .get_data() fetch the data and store it in something like NWIS.df? Depending on how the chain of calls ends, herring is a different type of object, and it is not immediately clear what is going on:
herring = hf.NWIS('0158520', period='P10D').get_data().df() # herring is a dataframe
herring = hf.NWIS('0158520', period='P10D').get_data() # herring is an instance of NWIS
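A toy model of the two behaviors (not the real class) makes the distinction explicit:

```python
class NWIS:
    # Toy model of the pattern under discussion, not the real NWIS class:
    # get_data() returns self, so only an explicit .df() yields the data.
    def get_data(self):
        self._data = {"discharge": [1.0, 2.0]}  # stand-in for the request
        return self  # returning self is what makes .get_data().df() chain

    def df(self):
        return self._data  # the real method returns a pandas DataFrame
```

So `NWIS().get_data()` is the NWIS instance itself, while `NWIS().get_data().df()` is the data.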
When making a general data request using the NWIS function, I could not differentiate between sensor/parameter code combinations, and was ultimately unable to retrieve a full dataset for a location. More specifically, I am looking at a site with multiple instances of a parameter code, and the NWIS request cannot distinguish between them. For example, at USGS site 444306122144600 the measurements of a parameter (say Turbidity / 63680) are taken from different sensors. When querying with the parameter code '63680' there is no way to get either or both instances.
Here is a basic query using the above mentioned specifications:
data = hf.NWIS(site = '444306122144600', service = 'iv', parameterCd = '63680', start_date = '2020-03-01', end_date = '2020-10-15')
Here is a link to the same query via the web interface:
https://nwis.waterdata.usgs.gov/nwis/uv?cb_32295=on&cb_32316=on&cb_32319=on&cb_63680=on&format=gif_stats&site_no=444306122144600&period=&begin_date=2020-03-01&end_date=2020-07-20
Note the multiple instances of the WQ parameters.
I wrote some code for hydrofunctions a year ago that produced processed dataframes. I ran it again today and found that the code no longer works. The problem lies in the extract_nwis_df function: it used to return just a dataframe, but now it returns a tuple with a dataframe and a dictionary. In one instance it also returned 4 more columns than I asked for; this might have been a separate issue. I found a workaround by subsetting the tuple with [0]. Is there a more elegant way to fix this workflow?
def create_df(site, start, end):
# YOUR CODE HERE
"""Creates a Pandas DataFrame with data
downloaded from NWIS using hydrofunctions.
Renames columns containing discharge and
qualification code information to "discharge" and
"flag", respectively. Creates "siteName", "latitude",
and "longitude" columns. Outputs the new dataframe.
Parameters
----------
site : str
The stream gauge site number.
start : str
The start date as (YYYY-MM-DD) of time period of interest.
end : str
The end date as (YYYY-MM-DD) of time period of interest.
Returns
-------
discharge : Pandas DataFrame
Returns a dataframe containing date, discharge, qualification
codes, site name, and latitude and longitude data
"""
# Response from site
parameterCd = ["00065", "00060"]
resp = hf.get_nwis(site, "dv", start, end).json()
# Extract values to a pandas dataframe
discharge = hf.extract_nwis_df(resp)
# Rename columns
discharge.columns = ["discharge", "flag", 'stage', 'flag']
# Create sitename column
site_name = hf.get_nwis_property(resp, key="siteName")[0]
discharge['siteName'] = site_name
# Create lat and long column
geoloc = hf.get_nwis_property(resp, key="geoLocation")[0]["geogLocation"]
lat = geoloc["latitude"]
long = geoloc["longitude"]
discharge["latitude"] = lat
discharge["longitude"] = long
return discharge
site = ["06479215","06479438","06479500","06479525","06479770","06480000"]
start = "2018-01-01"
end = "2020-12-01"
temp_list = []
for i in site:
df = create_df(i, start, end)
temp_list.append(df)
stream_gage_df = pd.concat(temp_list)
stream_gage_df
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/4087259266.py in <module>
5
6 for i in site:
----> 7 df = create_df(i, start, end)
8 temp_list.append(df)
9
C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/3569674067.py in create_df(site, start, end)
36
37 # Rename columns
---> 38 discharge.columns = ["discharge", "flag", 'stage', 'flag']
39
40 # Create sitename column
AttributeError: 'tuple' object has no attribute 'columns'
If you don't specify a start date or a period, then you will request only the most recent reading, like so:
hf.NWIS('015412000', 'dv')
This only requests the most recent daily mean value. Since there is only one row of data, and only one time-stamp, calc_freq will have trouble figuring out the frequency, and will assign it a freq of 15 minutes.
Instead, the method of last resort for calc_freq should check to see if the data are daily mean or instantaneous. You can get this information in two ways:
It might also be possible to check the meta data at the start of the json file.
Create a method for NWIS that allows user to save the dataframe as a local file. The current internal data structure could be saved as a parquet file, which does a great job compressing the data and working quickly.
Additionally, you could have the NWIS.get_data() method check for a file with the correct name before submitting a request to the NWIS. This would act as a cache, so that Jupyter notebooks that get re-run multiple times don't keep submitting the same request over and over.
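The cache lookup itself is simple. Here is a sketch with the request and the parquet I/O abstracted into callables (all names hypothetical; the real save/load would be df.to_parquet and pd.read_parquet):

```python
from pathlib import Path

def get_data_cached(path, fetch, save, load):
    # Sketch of the caching idea: fetch/save/load stand in for the NWIS
    # request and the parquet I/O.
    p = Path(path)
    if p.exists():
        return load(p)      # cache hit: skip the NWIS request entirely
    data = fetch()          # cache miss: request once...
    save(data, p)           # ...then store for the next notebook re-run
    return data
```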
Get DataFrame of discharge data from a gage. Many gages without discharge return a valid DataFrame. They should raise a warning/exception if one is trying to get discharge from a gage without that type of record.
import hydrofunctions as hf
start, end = '1984-01-01', '2020-12-31'
sid = '05015500'
nwis = hf.NWIS(sid, 'dv', start_date=start, end_date=end)
df = nwis.df('discharge')
print(list(df.columns))
['USGS:05015500:62614:00003', 'USGS:05015500:62614:32400']
The columns are not discharge, but 62614 data (reservoir surface elevation). This happens with many non-discharge parameters at many gages.
I needed to add a test for discharge data:
if not np.any(['00060' in x for x in list(df.columns)]):
print('no q data at {}'.format(sid))
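Wrapped as a warning, per the suggestion above, this check might look like the following (the helper name is hypothetical):

```python
import warnings

def check_columns(columns, parameterCd="00060"):
    # Hypothetical helper: warn when no returned column contains the
    # requested parameter code (e.g. '00060' for discharge).
    if not any(parameterCd in col for col in columns):
        warnings.warn(
            "no %s data in returned columns: %s" % (parameterCd, list(columns))
        )
        return False
    return True
```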
If you don't specify a start date or a period, the NWIS will return the most recent data for your sites. This produces two problems:
new = hf.NWIS(stateCd='MD')
new.df()
The most important parameters that define the data request from the NWIS should be stored as properties of the NWIS object. These are:
These values should be parsed and stored as a string or a list of strings, and converted to a comma-delimited list when .get_data() is performed, or .get_nwis() runs.
Use the @property
decorator to create getter methods and make this read-only.
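A sketch of the pattern, using an illustrative class rather than the library's actual NWIS object:

```python
class NWISRequest:
    # Illustrative class showing the @property pattern: parameters are
    # stored once, read-only afterwards, and joined into a comma-delimited
    # list only when the request URL is built.
    def __init__(self, sites):
        # Accept a single ID or a list; normalize to a list of strings.
        self._sites = [sites] if isinstance(sites, str) else list(sites)

    @property
    def sites(self):
        # No setter is defined, so assigning to .sites raises AttributeError.
        return self._sites

    def _url_fragment(self):
        # What .get_data()/get_nwis() would actually send.
        return "sites=" + ",".join(self.sites)
```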
If you request a USGS data table for a site that doesn't exist, you get a nonsense error that is hard for a beginner to figure out.
hf.rating_curve('666')
NWIS.df('discharge', 'stage')
should return both a discharge and a stage column. Instead, it only returns stage.
NWIS.df('siteA', 'siteB')
should return columns for site A and B, but it only returns site B.
NWIS.df('data', 'flags')
should return both data & the qualifier flags, but it doesn't. It only returns the flags.
the sites, parameters, and metadata should each use an "OR" to process each new argument, so that the new set of columns returns everything that we are currently searching for, plus the latest argument.
I feel like I am not explaining this very well.
I'm going to add a test for the behavior that I want, and then try out my idea for how to solve this.
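A minimal sketch of the intended OR behavior, using a hypothetical helper:

```python
def select_columns(all_columns, *requests):
    # Sketch of the OR behavior described above: each argument ADDS its
    # matching columns to the result instead of narrowing the previous match.
    selected = []
    for req in requests:
        for col in all_columns:
            if req in col and col not in selected:
                selected.append(col)
    return selected
```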
In response to this issue DOI-USGS/dataretrieval-python#8 and comments from @emiliom @jkreft-usgs @DanCodigaMWRA
There are several open source software projects that allow you to request, parse, and analyze hydrology data from the USGS NWIS. Why are there overlapping projects? My guess is that it is a combination of scientists writing code to meet their very specific research needs, people creating projects without searching for what already exists, and because sometimes people feel uncomfortable trying to work with people they don't know yet. I'd like to work with the maintainers of other projects to eliminate some of the redundancy and improve the cooperation.
My name is Martin Roberge, and I'm the author of hydrofunctions. I do research on stream hydrology and I'm an educator. I made hydrofunctions to meet my specific needs: I download lots of stream gauge data from the USGS which I then analyze inside Pandas dataframes in a Jupyter notebook. Since most of my students do not come from programming backgrounds, I have spent most of my time trying to make hydrofunctions easy for beginners to use.
My main goals for hydrofunctions are:
The problem is that I am just one person, and every hour I spend adding functionality to hydrofunctions is an hour I could have spent measuring how fast flood waves travel down a river, or whatever I'm up to that day. I would love to collaborate with someone else.
Other projects that work with NWIS data are:
ulmo.usgs.nwis.get_site_data()
is the function that requests stream gauge data. Ulmo processes the original WaterML and returns a dictionary that needs further processing to use the data in a dataframe. It can be finicky when you are requesting stream gauge data, and I can't always figure out what is wrong with my requests. Emilio Mayorga @emiliom is the lead developer now.
Please let me know if anyone thinks that I have mischaracterized their project.
I would love to hear your opinion about how these different projects could collaborate or how we could 'stake out ground' so that we don't replicate functionality. Why re-invent the wheel?
Sometimes requests to NWIS caused an error in the parsing of response.json into dataframes, I think. My guess is that the NWIS returned a response.json of [] or None, which resulted in parsing errors. The one time that I managed to capture the response and examine it, I had received a status code of 400: a bad request.
The problem is that I'm not sure what caused the early errors, because I couldn't or didn't capture the response, and the error seemed to happen intermittently.
I can now get the error every time by doing something like making the start date occur after the end date. Most of the other errors get caught by the typing module and functions.
In response, I changed hf.get_nwis() to return the response if it gets a status code of 200, and to raise an error for every other status_code after saving the response to NWIS.response.
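The described behavior amounts to something like this sketch (the exception name is a placeholder, and `response` only needs a status_code attribute here):

```python
class HydroError(Exception):
    # Placeholder exception type; the real one in hydrofunctions may differ.
    pass

def check_response(response):
    # Return the response on HTTP 200, raise on anything else. The caller
    # saves the response to NWIS.response first, so it can still be
    # inspected after the error.
    if response.status_code == 200:
        return response
    raise HydroError("NWIS returned status %s" % response.status_code)
```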
Please revisit this Issue if the errors happen again!
hf.map() is on the front-page README, but it is not one of the functions in the package.
Either:
This request:
request = hf.NWIS(['01585200', '01646502'], 'iv', parameterCd=['00065','00060'],period='P1D').get_data()
results in this URL:
https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01585200%2C01646502¶meterCd=00065¶meterCd=00060&period=P1D
Note that 'parameterCd' is listed twice.
This results in an error from the USGS.
I'm trying to import NWIS data, just as in the sample code, and an error is returned from the extract_nwis_df function. Any ideas what the problem might be?
import hydrofunctions as hf
site = '01646000'
start_date = '2017-10-01'
end_date = '2018-10-01'
response = hf.get_nwis(site,'iv', start_date,end_date,parameterCd='00060')
response.json()
hf.extract_nwis_df(response)
=========================================
Traceback (most recent call last):
File "<ipython-input-1-01f3848c24e4>", line 15, in <module>
hf.extract_nwis_df(response)
File "C:\Users\dgbli\Anaconda3\lib\site-packages\hydrofunctions\hydrofunctions.py", line 277, in extract_nwis_df
ts = nwis_dict['value']['timeSeries']
TypeError: 'Response' object is not subscriptable
A hydrofunctions_testing.log is created when importing hydrofunctions in a Jupyter cell or Python script. What is the source of this log? Can it be suppressed?
In jupyter cell:
import hydrofunctions
From terminal:
(hydro) mlr@mint-box:~/git/hydro-tools$ ls | grep hydro*.log
hydrofunctions_testing.log
%matplotlib inline
resulted in a
ModuleNotFoundError: No module named 'matplotlib'
Add matplotlib to the list of requirements. Perhaps add seaborn and bokeh too.
If you make a request without specifying a period or a date, you will get the most recent data. Sometimes, the most recent data is very old. Sometimes, one parameter is very old, and another is very recent.
Hydrofunctions has no way of figuring out the frequency of a table that only has one row, so it assumes that the frequency is 15 minutes.
It would be better to assume the frequency is 1 day for 'dv', and 15 minutes for 'iv'.
Better still, it shouldn't assign a frequency to these types of request. If it does, and then attempts to interpolate between two different dates, you can get some crazy results.
problem = hf.NWIS('01541200')
problem.df().shape
>>>(2017441, 4)
This crazy site collected temperature for the last time in 1961, and it also has stream discharge for right now. So hydrofunctions set the frequency to 15 minutes and created a dataframe with a row for every 15 minutes from 1961 to 2019.
Sigh.
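A last-resort fallback along the lines suggested above might look like this hypothetical function, where returning None means "assign no frequency and skip interpolation":

```python
from datetime import timedelta

def fallback_freq(service, n_rows):
    # Sketch of a better last resort for calc_freq: with only one row,
    # infer the frequency from the service instead of assuming 15 minutes.
    if n_rows > 1:
        return None  # enough rows to compute the real frequency instead
    if service == "dv":
        return timedelta(days=1)      # daily mean values
    if service == "iv":
        return timedelta(minutes=15)  # typical instantaneous interval
    return None  # unknown service: assign no frequency at all
```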
The current data format puts everything into a single large dataframe, with each station getting two columns: one for discharge, one for data flags. Odd columns contain data from the different stations; even columns contain the corresponding data flags for that site.
The PROBLEM with combining all of these dataframes into a single large dataframe using pd.concat is that sites that collect less frequently get padded with NaNs for all of the time indices that they don't have data for.
A second problem is that every other column holds data, and the remaining columns hold flags. There is no simple way to select only the data columns except to take the odd-numbered columns.
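With the current layout, the only generic selection is positional. A sketch, assuming data sits at even positions (0, 2, ...) and the corresponding flags at odd positions:

```python
def split_data_and_flags(columns):
    # Positional selection: this is exactly the fragile "take every other
    # column" workaround that the alternating layout forces on users.
    return list(columns[0::2]), list(columns[1::2])
```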
A POSSIBLE SOLUTION: create a data structure that is composed of stacked dataframes. Each data frame will correspond to a single site. The first column will correspond to discharge, the second to flags, and any others can be derived values like baseflow or other measured parameters. The dataframes will be stacked, and be part of an object that allows you to select by a range of dates, by sites, and by the type of column. In this respect, it might be similar to XArray, except that package requires their n-dimensional structures to all be the same datatype.
extract_nwis_df(response_obj) in hydrofunctions.py is where the USGS JSON gets processed. Correcting this would be a relatively simple fix: duplicate this function and have it collect all of the dataframes into an array instead of doing a pd.concat() at the end with each new dataframe.
I tried to get the streamflow data for PA but when I tried to make a dataframe I got the following error:
Shape of passed values is (368, 546), indices imply (366, 546)
It works fine for other states; only PA fails.
import hydrofunctions as hf
start = "2017-01-01"
end = "2017-12-31"
request = hf.NWIS(None, "dv", start, end, stateCd='PA').get_data()
request.df()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-f4a9c8304bcf> in <module>
1 request = hf.NWIS(None, "dv", start, end, stateCd='PA').get_data()
----> 2 request.df()
~/anaconda/envs/hydro/lib/python3.7/site-packages/hydrofunctions/station.py in <lambda>()
165 self.json = lambda: self.response.json()
166 # set self.df without calling it.
--> 167 self.df = lambda: hf.extract_nwis_df(self.json())
168
169 # Another option might be to do this:
~/anaconda/envs/hydro/lib/python3.7/site-packages/hydrofunctions/hydrofunctions.py in extract_nwis_df(nwis_dict)
362 # except that package requires their n-dimensional structures to all be
363 # the same datatype.
--> 364 DF = pd.concat([DF, dfa], axis=1)
365
366 # replace missing values in the dataframe
~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
227 verify_integrity=verify_integrity,
228 copy=copy, sort=sort)
--> 229 return op.get_result()
230
231
~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
424 new_data = concatenate_block_managers(
425 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 426 copy=self.copy)
427 if not self.copy:
428 new_data._consolidate_inplace()
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
2063 blocks.append(b)
2064
-> 2065 return BlockManager(blocks, axes)
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in __init__(self, blocks, axes, do_integrity_check)
112
113 if do_integrity_check:
--> 114 self._verify_integrity()
115
116 self._consolidate_check()
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in _verify_integrity(self)
309 for block in self.blocks:
310 if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 311 construction_error(tot_items, block.shape[1:], self.axes)
312 if len(self.items) != tot_items:
313 raise AssertionError('Number of manager items must equal union of '
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in construction_error(tot_items, block_shape, axes, e)
1689 raise ValueError("Empty data passed with indices specified.")
1690 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 1691 passed, implied))
1692
1693
ValueError: Shape of passed values is (368, 546), indices imply (366, 546)
The USGS provides several additional services, including:
These services are described here: https://waterwatch.usgs.gov/webservices/
The most useful service seems to be flood stage. This could be used to annotate flow duration charts or other figures.
Hydrofunction's built-in charts should have the ability to set a title or create a legend.
Add new parameters legend and title.
The legend default could be False; otherwise you could provide a value for legend. Add a legend_loc parameter too.
title could be set to False or to text.
Use **kwargs and pass these on.
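Sketch of the proposed keywords (the names come from the suggestions above, not an existing API); `ax` stands for a matplotlib Axes:

```python
def finish_chart(ax, legend=False, legend_loc="best", title=False):
    # Hypothetical helper for hydrofunctions' built-in charts. legend=False
    # and title=False are the defaults, so current behavior is unchanged.
    if legend:
        ax.legend(loc=legend_loc)
    if title:
        ax.set_title(title)
    return ax
```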
What does Pandas do??
Travis CI has been changing, making my Continuous Integration system less convenient.
I would like to:
The docstring for hydrofunctions no longer works because the example stream gauge now collects different data.
help(hydrofunctions)
This gives us an example that uses the Harrisburg stream gauge: '01570500'.
>>> import hydrofunctions as hf
>>> site = '01570500'
>>> harrisburg = hf.NWIS(site, 'iv', period='P10D')
>>> harrisburg
USGS:01570500: Susquehanna River at Harrisburg, PA
00045: <30 * Minutes> Precipitation, total, inches
00060: <30 * Minutes> Discharge, cubic feet per second
00065: <30 * Minutes> Gage height, feet
Start: 2019-04-06 00:30:00+00:00
End: 2019-04-15 23:00:00+00:00
The example requests the last 10 days of IV data, returning precipitation, discharge, and stage, each with 30 minute sampling.
As of a few days ago (as far as I can tell), if you do this request again, you'll also get some data that gets collected every 15 minutes, leading to an 'upsampling' warning.
One way to avoid this is to either use a different gauge in the example, or simply request data from a range of dates that only have the 30 minute data, like back in 2019 when the example was first run.
harrisburg = hf.NWIS(site, 'iv', start_date='2019-04-06', end_date='2019-04-15')
>>> site = '01542500'
>>> karthaus = hf.NWIS(site, 'iv', period='P10D')
Apparently the NWIS Parameter Codes can be supplemented with additional numbers in a way that breaks my current selection method for NWIS.df().
For example, USGS site #12010000: NASELLE RIVER NEAR NASELLE, WA records stage in two different ways, and then calculates discharge for each of these methods, resulting in two stage numbers and two discharge numbers. To keep these parameters separate, they have the normal parameter code 00060 along with a dash and a six-digit number that doesn't show up in the parameter code listing: https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?parm_nm_cd=%25discharge%25&fmt=html
This completely breaks hf.NWIS.df('00060')!
>>> site_w_long_parameter_name = hf.NWIS('12010000')
Requested data from https://waterservices.usgs.gov/nwis/dv/?format=json%2C1.1&sites=12010000
>>> site_w_long_parameter_name
USGS:12010000: NASELLE RIVER NEAR NASELLE, WA
00010: <0 * Minutes> Temperature, water, degrees Celsius
00060-148364: <0 * Minutes> Discharge, cubic feet per second STILLING WELL
00060-148368: <0 * Minutes> Discharge, cubic feet per second
00065-148365: <0 * Minutes> Gage height, feet STILLING WELL
00065-148369: <0 * Minutes> Gage height, feet
Start: 1973-09-29 00:00:00+00:00
End: 2021-07-01 00:00:00+00:00
USGS:12042800: BOGACHIEL RIVER NEAR FORKS, WA
00060-148506: <0 * Minutes> Discharge, cubic feet per second [(2)]
00060-243490: <0 * Minutes> Discharge, cubic feet per second
When I use bBox to find a station, I get a warning about the frequency being set to zero. I think this warning is not relevant for this type of query.
On another note, I noticed that if I use a number with more than 7 decimal figures for the lat and long, say -121.74788222 instead of -121.74788 in the following example, it throws an error (JSONDecodeError: Expecting value: line 1 column 1 (char 0)). It is a limitation of the query that is sent to the USGS service. So I think it would be a good idea to round the lat and long to 7 decimals before sending the query.
A small note on the error, there is a space missing between frequency and for (frequencyfor).
import hydrofunctions as hf
hf.NWIS(bBox=[-121.74788, 47.38594, -121.54788, 47.58594])
....hydrofunctions/hydrofunctions/hydrofunctions.py:100: HydroUserWarning: It is not possible to determine the frequencyfor one of the datasets in this request.This dataset will be set to a frequency of 0 minutes
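The rounding guard might look like this hypothetical helper:

```python
def format_bbox(bBox, precision=7):
    # Round bounding-box coordinates before building the query string,
    # since NWIS reportedly rejects more than 7 decimal places.
    return ",".join(str(round(float(v), precision)) for v in bBox)
```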
When the user tries to use .get_data(), a warning should be raised instead of printing something.
import hydrofunctions as hf
test = hf.NWIS('01542500', period='P5D').get_data()
Instead, the code should:
import warnings
...
def get_data(self):
    warnings.warn("don't do this anymore", FutureWarning)
    return self
It would be nice to be able to request site information for USGS stream gauges.
It is possible to get:
from waterdata.usgs.gov. The StreamStats station data site pulls drainage area from this source.
https://waterdata.usgs.gov/nwis/inventory?search_site_no=01541200&format=sitefile_output&sitefile_output_format=rdb
returns:
#
#
# US Geological Survey
# retrieved: 2020-04-17 14:38:55 EDT
# URL: https://nwis.waterdata.usgs.gov/nwis/inventory
#
# The Site File stores location and general information about groundwater,
# surface water, and meteorological sites
# for sites in USA.
#
# The following selected fields are included in this output:
#
# agency_cd -- Agency
# site_no -- Site identification number
# station_nm -- Site name
# state_cd -- State code
# county_cd -- County code
# huc_cd -- Hydrologic unit code
# lat_va -- DMS latitude
# long_va -- DMS longitude
# coord_acy_cd -- Latitude-longitude accuracy
# coord_datum_cd -- Latitude-longitude datum
# alt_va -- Altitude of Gage/land surface
# alt_acy_va -- Altitude accuracy
# alt_datum_cd -- Altitude datum
# drain_area_va -- Drainage area
# contrib_drain_area_va -- Contributing drainage area
#
#
# query started 2020-04-17 14:38:55 EDT
#
# there are 1 sites matching the search criteria.
#
#
agency_cd site_no station_nm state_cd county_cd huc_cd lat_va long_va coord_acy_cd coord_datum_cd alt_va alt_acy_va alt_datum_cd drain_area_va contrib_drain_area_va
5s 15s 50s 2s 3s 16s 11s 12s 1s 10s 8s 3s 10s 8s 8s
USGS 01541200 WB Susquehanna River near Curwensville, PA 42 033 02050201 405741 0783110 S NAD27 1124.66 .01 NGVD29 367
Not an issue, just a recommendation! Recently, hydrofunctions' extract_nwis_df changed to accept the JSON of a response instead of the raw response object. The line to turn the response into JSON is still in the code, but commented out. It would possibly be easier to use the function if it took either a JSON or a raw response. A simple type check at the top should make that doable. Something like:
if not isinstance(nwis_dict, dict):
    nwis_dict = nwis_dict.json()
Just a thought!
I ran a hydrofunctions.NWIS query to get all temperature data in the state of California. When converting to a dataframe, there was a division by zero error since the data frequency was miscalculated.
I adjusted the source code so that in the extract_nwis_df at lines 571 and 572 of hydrofunctions.py (version on github), I replaced the variable 'freqs' with 'freqs2', which removes the zeros and prevents the division by zero error.
The new dataframe format has the data qualifiers adjacent to the data, so that different columns have different types of data placed together. Now when you try to plot using: my_dataframe.plot()
it generates an error.
import hydrofunctions as hf
%matplotlib inline
sites = ['01581830', '01589330']
data = hf.NWIS(sites, start_date='2002-01-01', end_date='2005-01-01').get_data()
data.ok
data.df().plot()
-error-
Pandas has no way of converting what is in the column into something it can plot.
Create different functions for creating different kinds of dataframe layout. There is more than one way to organize the data into a dataframe. Create a method for outputting a dataframe that can be plotted automatically.
Perhaps put the qualifiers into an identical dataframe that matches cell for cell with the data.
It should be possible to install Hydrofunctions using only pip. The .travis.yml basically does it this way, despite the difficulties of installing Numpy and Pandas. This process was sped up considerably by using the --only-binary
option. It might be a good idea to work this into the regular requirements.txt file as well as the test requirements and the requirements-dev.txt files.
conda create -n myenv python=3.4
conda activate myenv
pip install --only-binary=numpy,scipy,pandas,Ipython numpy scipy pandas ipython
pip install hydrofunctions
python
import hydrofunctions as hf
dir(hf)
works!
Obviously, if I used conda to create the environment and set it up with Python 3.4, then I could use it to install numpy! The POINT however, is that it is possible to install the whole complicated set of dependencies now without relying on Anaconda.
My goals are to:
get the --only-binary options to fit inside of requirements.txt or something, so that a simple pip install hydrofunctions will work from scratch.
I just installed Hydrofunctions and tried the "Basic Usage" commands here: https://github.com/mroberge/hydrofunctions:
INPUT
import hydrofunctions as hf
%matplotlib inline
herring = hf.NWIS('01585200', 'iv', period='P55D')
OUTPUT
TypeError Traceback (most recent call last)
in ()
----> 1 herring = hf.NWIS('01585200', 'iv', period='P55D')
~\AppData\Roaming\Python\Python36\site-packages\hydrofunctions\station.py in __init__(self, site, service, start_date, end_date, stateCd, countyCd, bBox, parameterCd, period, file)
132 try:
133 self.json = self.response.json()
--> 134 self._dataframe, self.meta = hf.extract_nwis_df(self.json)
135 self.ok = self.response.ok
136 if file is not None:
~\AppData\Roaming\Python\Python36\site-packages\hydrofunctions\hydrofunctions.py in extract_nwis_df(nwis_dict, interpolate)
511 )
512 DF = DF[~DF.index.duplicated(keep="first")]
--> 513 if local_freq > to_offset("0min"):
514 local_clean_index = pd.date_range(
515 start=local_start, end=local_end, freq=local_freq, tz="UTC"
pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas._Timedelta.__richcmp__()
TypeError: Cannot compare type 'Timedelta' with type 'Minute'
I imagine this is a Pandas issue and I'll attempt to resolve, but I thought you may have run into this already.
Travis can take 2-3 minutes to download and build my dependencies, especially Pandas.
It might be possible to use conda to build everything. See this:
Hi,
I am wondering if the monthly statistics data can also be downloaded through the function? Thanks!
Krystal
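For what it's worth, the underlying NWIS statistics service does accept a monthly report type. Here is a sketch of the request URL built by hand; whether hf.stats() exposes this directly is an open question, but the statReportType parameter itself is part of the waterservices stat endpoint:

```python
from urllib.parse import urlencode

base = "https://waterservices.usgs.gov/nwis/stat/"
params = {
    "format": "rdb",
    "sites": "06813500",
    "statReportType": "monthly",  # 'daily' (default), 'monthly', or 'annual'
    "parameterCd": "00060",
}
url = base + "?" + urlencode(params)
```

Pasting the resulting URL into a browser returns a monthly statistics RDB table, so the data is at least reachable even if the convenience function doesn't cover it yet.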
Travis-ci suddenly exits during the first test with a segmentation fault. This does not affect Python 3.4 or 3.6.
python setup.py test
...
running build_ext
test_charts_cycleplot_exists (tests.test_charts.TestCyclePlot) ... /home/travis/.travis/job_stages: line 57: 1794 Segmentation fault (core dumped) python setup.py test
When I added the 'statYearType' parameter to a request for annual statistics, it printed the wrong URL but still returned something.
hf.stats('01542500', 'annual', parameterCd='00060', statYearType='water')
Retrieved annual statistics for site #01542500 from https://waterservices.usgs.gov/nwis/stat//
Although it prints the wrong URL, it still seems to return the correct data.
This request came in from Jibin Joseph:
Hello Professor Roberge:
I am creating a tool in Jupyter Notebook Appmode and I wanted to avoid the extra line from hydrofunctions ("Requested data from https://waterservices.usgs.gov/nwis/....." ). Is there a way to avoid printing in the output? Is it okay to do that?
-Jibin Joseph
The simplest solution for now is to create a verbose=True keyword and a line like:

    if verbose:
        print(f"Requesting data from {url}...")
This will print under most cases, but if someone wants to cancel printing for some other application, it is still possible.
To make this work, it will be necessary to modify hydrofunctions.py get_nwis() and also station.py NWIS.
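A sketch of how the gating could look inside get_nwis() (the signature is simplified and the actual Requests call is elided; only the verbose logic is the point here):

```python
def get_nwis(url, verbose=True):
    """Request data from the NWIS, optionally announcing the URL first."""
    # Print *before* making the request, so the user sees the message
    # immediately, even while a slow request is still pending.
    if verbose:
        print(f"Requesting data from {url}...")
    # response = requests.get(url)  # actual request omitted in this sketch
    # return response
    return url
```

Callers like Jibin's Appmode notebook would then pass verbose=False to suppress the line, while interactive users keep the default behavior.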
When you run the peaks(), rating_curve(), field_meas(), or stats() functions, you get an RDB object back. This can be confusing:
type(peaks('01542500'))
will return the type RDB, which doesn't immediately make you realize that you are dealing with something from hydrofunctions. Maybe the name should be changed to hydroRDB or something like that.
When you get the return value for peaks() printed on your screen, you get the header and a dataframe. I did this so that users would see the warnings in the header, but it means that you can't immediately use the output.
Could I simply set RDB.__repr__() to return the dataframe, and then let advanced users request to see the header manually?
Would this allow you to operate on the RDB the same as you would a dataframe?
Could you go:
my_rdb = peaks('01451200')
my_rdb.loc['2006']
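One way to sketch the repr idea (hydroRDB is the proposed rename; the attribute names here are assumptions, not the current implementation):

```python
class hydroRDB:
    """Hold an RDB header and its parsed table together."""

    def __init__(self, header, table):
        self.header = header  # the comment lines from the RDB file
        self.table = table    # typically a pandas DataFrame

    def __repr__(self):
        # Show only the table by default; advanced users can still
        # inspect my_rdb.header explicitly when they want the warnings.
        return repr(self.table)
```

Note that changing __repr__ alone would not make my_rdb.loc['2006'] work; for that you would either delegate attribute access to the table (e.g. via __getattr__) or ask users to write my_rdb.table.loc['2006'].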
When I switched to using pytest in TravisCI, it started finding tests/test_online_resources.py, which isn't meant to be run every time you run the tests.
Change .travis.yml to use unittest for the tests.
Here is a list of things that could be improved in the documentation:
I never really thought about this until now, but I've made it so that the repr gets printed every time you use a method.
I set it up so that NWIS has a NWIS.__repr__() method that prints a nice listing of the variables in the dataframe every time you type in the object's name.
I also set it up so that you can chain the methods in NWIS by returning the NWIS object.
But now you will trigger .__repr__() every time you call .save() or whatever.
>>> herring = hf.NWIS('01585200', 'iv', period='P2D')
>>> herring.save('delete_me.parq')
USGS:01585200: WEST BRANCH HERRING RUN AT IDLEWYLDE, MD
00060: <5 * Minutes> Discharge, cubic feet per second
00065: <5 * Minutes> Gage height, feet
Start: 2019-04-17 22:25:00+00:00
End: 2019-04-19 22:05:00+00:00
NWIS('01585200', 'iv', period='P2D')
Maybe I should create an NWIS.info() method that returns the stuff I have in the __repr__ now. What does Pandas do?
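For comparison, the Pandas convention is that methods which mutate in place return None, and the interactive shell never echoes a None result; only expressions that return an object trigger __repr__. A toy illustration of that trade-off (not hydrofunctions code):

```python
class Chainable:
    """Toy object showing the None-return convention for side-effect methods."""

    def __init__(self, data):
        self.data = data

    def save(self, path):
        # Returning None means the interactive shell prints nothing after
        # obj.save(...), at the cost of breaking method chaining.
        print(f"saved to {path}")
        return None

    def __repr__(self):
        return f"Chainable({self.data!r})"
```

So the choice is between chaining (return self, and accept the repr echo) and quiet side-effect methods (return None, as Pandas does for inplace operations).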
PyArrow is a relatively exotic dependency that doesn't have a pure Python wheel. As a result, it is often the first dependency to cause import trouble with cloud-based kernels or in-browser kernels (for jupyter-lite). Pandas seems to handle PyArrow as an optional dependency too.
Is it possible to make PyArrow an optional dependency that won't prevent hydrofunctions from installing?
Pandas has this nice system that warns when an optional dependency isn't available:
    import warnings

    def import_lzma():
        """
        Importing the `lzma` module.

        Warns
        -----
        When the `lzma` module is not available.
        """
        try:
            import lzma
            return lzma
        except ImportError:
            msg = (
                "Could not import the lzma module. Your installed Python is incomplete. "
                "Attempting to use lzma compression will result in a RuntimeError."
            )
            warnings.warn(msg)
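The same pattern could wrap PyArrow. A sketch (the helper name and warning text are mine; importlib is used so the same helper covers any optional dependency):

```python
import importlib
import warnings


def import_optional(name):
    """Return the named module, or None with a warning if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn(
            f"Could not import the {name} module. "
            f"Features that rely on {name} will be unavailable."
        )
        return None


# At the top of the module that needs it:
# pa = import_optional("pyarrow")
```

Call sites would then check for None before using the module, so a missing PyArrow degrades features instead of breaking the import of hydrofunctions itself.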
In one of my client environments, certain python libraries like siphon produce SSL certification errors. Siphon has a session manager that can either bypass this or add a certificate. Is there a similar kwarg or capability for hydrofunctions?
Commands: current_test = hf.NWIS(['03352988'], 'iv', start_date=begin, end_date=today)
Traceback: requests.exceptions.SSLError: HTTPSConnectionPool(host='waterservices.usgs.gov', port=443): Max retries exceeded with url: /nwis/iv/?format=json%2C1.1&sites=03352988&startDT=2021-06-07&endDT=2021-07-08 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
Thank you!
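As far as I know, hydrofunctions does not currently expose a session or verify kwarg. But because it uses Requests under the hood, the standard Requests environment variable should work without any code changes; Requests reads it for every HTTPS request. The certificate path below is hypothetical:

```python
import os

# Point Requests at the corporate CA bundle before calling hydrofunctions.
# Requests consults REQUESTS_CA_BUNDLE when verifying HTTPS certificates.
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/corporate-ca.pem"
```

The variable can also be set in the shell profile instead of in Python. Adding a proper session/verify pass-through to get_nwis() would be a cleaner long-term fix.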
If you specify a parameterCd in your request, the NWIS will return data only for that parameter. Instead, make the default be parameterCd=None, so that the default request will return all of the parameters collected at the site.
If I set parameterCd=None, will the request add parameterCd to the url, or does it drop parameters that are set to None?
[* edit mcr 2/11/19] ==> This works. If you don't specify a parameterCd now, NWIS will return every parameter.
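To answer the question above directly: Requests silently drops params whose value is None, so parameterCd never reaches the URL. A quick check:

```python
import requests

req = requests.Request(
    "GET",
    "https://waterservices.usgs.gov/nwis/iv/",
    params={"sites": "01585200", "parameterCd": None},
)
url = req.prepare().url
# parameterCd is omitted from the prepared URL because its value is None,
# while sites=01585200 is kept.
```

This is why the parameterCd=None default works cleanly: no special-casing is needed before passing the params dict to Requests.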