gclunies / py_noaa

Python package to fetch data from NOAA Tides & Currents API

License: MIT License
py_noaa's Introduction

py_noaa

Build Status PyPI PyPI - Python Version

NOTE: THIS PACKAGE HAS BEEN REPLACED BY noaa_coops. NO FURTHER DEVELOPMENT IS PLANNED.

py_noaa is a Python package that wraps around the NOAA CO-OPS Tides & Currents API and returns data in convenient formats (e.g., a pandas DataFrame) for further analysis in Python. Analysis of the data is left up to the end user.

NOTE:

This package is under development; additional functionality will be added over time.

Installation

pip install py_noaa

You can update py_noaa using:

pip install py_noaa --upgrade

NOAA CO-OPS Tides & Currents

NOAA records tides, currents, and other meteorological observations at various locations across the United States and the Great Lakes region. Predictions are also available for tides and currents.

py_noaa accesses data following the NOAA CO-OPS API documentation.

Available Data

A list of available data products is provided in the API documentation.

CO-OPS module basics

  1. Get the station ID for your station of interest. A summary of available stations, by data type, can be found in the NOAA CO-OPS station listings.

  2. Read the station info if available! Useful station info is typically available based on the data type recorded at a station. Station info for current stations is NOT the same as for water level and tide stations (see examples below).

  3. Fetch data using the coops.get_data() function. The currently supported data products are:

    • Currents
    • Observed water levels
    • Observed daily high and low water levels (use product="high_low")
    • Predicted water levels
    • Predicted high and low water levels
    • Winds
    • Air pressure
    • Air temperature
    • Water temperature

Compatibility with other data products listed on the NOAA CO-OPS API may exist, but is not guaranteed at this time.

Example data requests are shown below:

Observed Currents

>>> from py_noaa import coops
>>> df_currents = coops.get_data(
...     begin_date="20150727",
...     end_date="20150910",
...     stationid="PUG1515",
...     product="currents",
...     bin_num=1,
...     units="metric",
...     time_zone="gmt")
...
>>> df_currents.head() # doctest: +NORMALIZE_WHITESPACE
                     bin  direction  speed
date_time
2015-07-27 20:06:00  1.0      255.0   32.1
2015-07-27 20:12:00  1.0      255.0   30.1
2015-07-27 20:18:00  1.0      261.0   29.3
2015-07-27 20:24:00  1.0      260.0   27.3
2015-07-27 20:30:00  1.0      261.0   23.0

Observed Water Levels

>>> from py_noaa import coops
>>> df_water_levels = coops.get_data(
...     begin_date="20150101",
...     end_date="20150331",
...     stationid="9447130",
...     product="water_level",
...     datum="MLLW",
...     units="metric",
...     time_zone="gmt")
...
>>> df_water_levels.head() # doctest: +NORMALIZE_WHITESPACE
                       flags QC  sigma  water_level
date_time
2015-01-01 00:00:00  0,0,0,0  v  0.023        1.799
2015-01-01 01:00:00  0,0,0,0  v  0.014        0.977
2015-01-01 02:00:00  0,0,0,0  v  0.009        0.284
2015-01-01 03:00:00  0,0,0,0  v  0.010       -0.126
2015-01-01 04:00:00  0,0,0,0  v  0.013       -0.161

Predicted Water Levels (Tides)

Note the use of the interval parameter to specify that only hourly data be returned. The interval parameter works with water level, currents, predictions, and meteorological data types.

>>> from py_noaa import coops
>>> df_predictions = coops.get_data(
...     begin_date="20121115",
...     end_date="20121217",
...     stationid="9447130",
...     product="predictions",
...     datum="MLLW",
...     interval="h",
...     units="metric",
...     time_zone="gmt")
...
>>> df_predictions.head() # doctest: +NORMALIZE_WHITESPACE
                     predicted_wl
date_time
2012-11-15 00:00:00         3.660
2012-11-15 01:00:00         3.431
2012-11-15 02:00:00         2.842
2012-11-15 03:00:00         1.974
2012-11-15 04:00:00         0.953

The interval parameter also accepts the hilo value, which returns high and low tide predictions.

>>> from py_noaa import coops
>>> df_predictions = coops.get_data(
...     begin_date="20121115",
...     end_date="20121217",
...     stationid="9447130",
...     product="predictions",
...     datum="MLLW",
...     interval="hilo",
...     units="metric",
...     time_zone="gmt")
...
>>> df_predictions.head() # doctest: +NORMALIZE_WHITESPACE
                    hi_lo  predicted_wl
date_time
2012-11-15 06:57:00     L        -1.046
2012-11-15 14:11:00     H         3.813
2012-11-15 19:36:00     L         2.037
2012-11-16 00:39:00     H         3.573
2012-11-16 07:44:00     L        -1.049

Filtering Data by Date

All data is returned as a pandas dataframe with a DatetimeIndex, which allows for easy filtering of the data by dates.

>>> from py_noaa import coops
>>> df_predictions = coops.get_data(
...     begin_date="20121115",
...     end_date="20121217",
...     stationid="9447130",
...     product="predictions",
...     datum="MLLW",
...     interval="h",
...     units="metric",
...     time_zone="gmt")
...
>>> df_predictions['201211150000':'201211151200'] # doctest: +NORMALIZE_WHITESPACE
                     predicted_wl
date_time
2012-11-15 00:00:00         3.660
2012-11-15 01:00:00         3.431
2012-11-15 02:00:00         2.842
2012-11-15 03:00:00         1.974
2012-11-15 04:00:00         0.953
2012-11-15 05:00:00        -0.047
2012-11-15 06:00:00        -0.787
2012-11-15 07:00:00        -1.045
2012-11-15 08:00:00        -0.740
2012-11-15 09:00:00         0.027
2012-11-15 10:00:00         1.053
2012-11-15 11:00:00         2.114
2012-11-15 12:00:00         3.006

Exporting Data


Since data is returned in a pandas dataframe, exporting the data is simple using the .to_csv method on the returned dataframe. This requires the pandas package, which is installed automatically if you installed py_noaa with pip.

>>> df_currents = coops.get_data(
...     begin_date="20150727",
...     end_date="20150910",
...     stationid="PUG1515",
...     product="currents",
...     bin_num=1,
...     units="metric",
...     time_zone="gmt")
...
>>> df_currents.to_csv(
...     'example.csv',
...     sep='\t',
...     encoding='utf-8')

As shown above, you can set the delimiter type using the sep= argument in the .to_csv method and control the file encoding using the encoding= argument.
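To round-trip the exported file, the same sep and encoding arguments can be passed to pandas.read_csv, along with index_col and parse_dates to restore the DatetimeIndex. A minimal sketch, using a toy dataframe in place of an actual API response:

```python
import pandas as pd

# Toy dataframe standing in for a coops.get_data() result
df_currents = pd.DataFrame(
    {"bin": [1.0, 1.0], "direction": [255.0, 261.0], "speed": [32.1, 29.3]},
    index=pd.to_datetime(["2015-07-27 20:06:00", "2015-07-27 20:12:00"]))
df_currents.index.name = "date_time"

# Export with a tab delimiter, then read it back,
# restoring the DatetimeIndex from the date_time column
df_currents.to_csv("example.csv", sep="\t", encoding="utf-8")
df_roundtrip = pd.read_csv(
    "example.csv", sep="\t", encoding="utf-8",
    index_col="date_time", parse_dates=True)
```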

Requirements

For use:

  • requests
  • numpy
  • pandas

Suggested for development/contributions:

  • pytest
  • pytest-cov

TODO

See the issues page for a list of open items and to add issues of your own.

Contribution

All contributions are welcome; feel free to submit a pull request if you have a valuable addition to the package or constructive feedback.

The development of py_noaa was originally intended to help me (@GClunies) learn Python packaging, git, and GitHub while also helping to alleviate the pain of downloading NOAA Tides and Currents data as part of my day job as a coastal engineer.

As this project started as a learning exercise, please be patient and willing to teach/learn.

Many thanks to the following contributors!

py_noaa's People

Contributors

craigharter, delgadom, fabaff, gclunies, jcconnell


py_noaa's Issues

Would be nice to have a progress bar during download

Could recommend embedding this function in the loop:

def update_progress(progress):
    import sys

    barLength = 10  # Modify this to change the length of the progress bar
    status = ""
    if isinstance(progress, int):
        progress = float(progress)
    if not isinstance(progress, float):
        progress = 0
        status = "error: progress var must be float\r\n"
    if progress < 0:
        progress = 0
        status = "Halt...\r\n"
    if progress >= 1:
        progress = 1
        status = "Done...\r\n"
    block = int(round(barLength * progress))
    text = "\rPercent: [{0}] {1:4.1f}% {2}".format(
        "#" * block + "-" * (barLength - block), progress * 100, status)
    sys.stdout.write(text)
    sys.stdout.flush()

Connection times out

Attempted to connect from simple example:

from pprint import pprint
import noaa_coops as nc
seattle = nc.Station(9447130)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='tidesandcurrents.noaa.gov', port=80): Max retries exceeded with url: /mdapi/v1.0/webapi/stations/9447130.json?expand=details,sensors,products,disclaimers,notices,datums,harcon,tidepredoffets,benchmarks,nearby,bins,deployments,currentpredictionoffsets,floodlevels?units=metric (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000212435D1BB0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))


Same error with two other local station ids.

Cannot get hourly data for water level and current observations, 6-min only

Using the interval="h" argument for product="water_level" or product="currents" when calling coops.get_data() does not return hourly water level or currents data.

According to the NOAA API documentation:

"The hourly interval is supported for Met data and Predictions data only."

It would be useful to be able to return water_level and currents data at hourly temporal resolution, as 6-min is often overkill.
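Until the API supports hourly intervals for these products, one client-side workaround is to request the 6-min data and downsample it with pandas. A sketch using a toy series in place of the API response (the filtering and resampling calls are the relevant part):

```python
import pandas as pd

# Toy 6-minute water level series standing in for a coops.get_data() result
idx = pd.date_range("2015-01-01 00:00", periods=20, freq="6min")
df = pd.DataFrame({"water_level": range(20)}, index=idx)
df.index.name = "date_time"

# Option 1: keep only the observations that fall exactly on the hour
hourly = df[df.index.minute == 0]

# Option 2: average each hour of 6-minute samples
hourly_mean = df.resample("60min").mean()
```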

Large requests with data gap between begin_date and end_date may throw errors

Background:
NOAA Tides & Currents limits data request sizes (to either 31 days or 365 days) based on the product type and the interval at which the user has requested the data.

As a result, when a large request is made (months to years of data), coops.get_data() handles it by looping, with each iteration making a separate request for a "block" of data. As coops.get_data() loops through, each block's begin_date and end_date are adjusted accordingly.

Issue:
When making large requests (e.g. 16 years as in example below), if there happens to be a "break" in the data record which is longer than the "block" size in each loop... then a ValueError is thrown since at least one of the dates falls inside the break, making the request invalid.

Example of Issue:
The example below throws an error. Requesting wind data from 2000 to 2016 (missing data somewhere between 2010 and 2012). Right now this needs to be split into two separate requests:

In [1]: from py_noaa import coops

In [2]: df_winds_KIP = coops.get_data(
   ...: begin_date="20000101",
   ...: end_date="20160101",
   ...: stationid="8632200",
   ...: product="wind",
   ...: interval="h",
   ...: units="english")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-652ff01d4272> in <module>()
      5 product="wind",
      6 interval="h",
----> 7 units="english")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\py_noaa\coops.py in get_data(begin_date, end_date, stationid, product, datum, bin_num, interval, units, time_zone)
    287                 stationid, product, datum, bin_num, interval, units, time_zone)
    288
--> 289             df_new = url2pandas(data_url, product)  # Get dataframe for block
    290             df = df.append(df_new)  # Append to existing dataframe
    291

~\AppData\Local\Continuum\anaconda3\lib\site-packages\py_noaa\coops.py in url2pandas(data_url, product)
    157     if 'error' in json_dict:
    158         raise ValueError(
--> 159             json_dict['error'].get('message', 'Error retrieving data'))
    160
    161     if product == 'predictions':

ValueError: No data was found. This product may not be offered at this station at the requested time.
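A manual workaround is to split the request around the gap and combine the results with pandas. A sketch with toy frames standing in for the two separate coops.get_data() calls (the dates and values are illustrative):

```python
import pandas as pd

# Toy frames standing in for two separate coops.get_data() calls,
# one ending before the data gap and one starting after it
df_before_gap = pd.DataFrame(
    {"wind_speed": [5.0, 6.0]},
    index=pd.to_datetime(["2000-01-01", "2000-01-02"]))
df_after_gap = pd.DataFrame(
    {"wind_speed": [7.0, 8.0]},
    index=pd.to_datetime(["2012-06-01", "2012-06-02"]))

# Combine the two requests into a single, time-ordered dataframe
df_winds = pd.concat([df_before_gap, df_after_gap]).sort_index()
```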

Predicted water levels/tides not included in coops.get_data()

Cannot pull tidal prediction data from the NOAA API since coops.get_data() provides no interval parameter.

The existing parameters for coops.get_data() are shown below:

def get_data(begin_date,
             end_date,
             stationid,
             product,
             datum=None,
             bin_num=None,
             units='metric',
             time_zone='gmt'):

See the NOAA documentation; the interval parameter options are:

  • h: Hourly Met data and predictions data will be returned
  • hilo: High/Low tide predictions for subordinate stations

hourly_height requests do not work

As identified by my colleague Craig Harter (not on GitHub), I have documented the issue below:

Requesting hourly_height as the product results in an error stating that no datum is provided, even when a datum IS provided.

This happens because coops.py is missing some code to handle the hourly_height product correctly, similar to how water_level, predictions, and currents are handled with specific parameters required for each.

hourly_height data can actually be requested in 365-day blocks, not 31-day blocks like most other data, so this needs to be handled as well for optimization.

Fortunately, Craig has generously fixed this issue locally and provided the code fix. I will submit the updated code as a PR on his behalf.

Code Example:

from py_noaa import coops

df_currents = coops.get_data(
     begin_date="20180520",
     end_date="20180521",
     stationid="9447130",
     product="hourly_height",
     datum="NAVD88",
     units="metric",
     time_zone="gmt")

Resulting Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-001d91a78aaf> in <module>()
      6      datum="NAVD88",
      7      units="metric",
----> 8      time_zone="gmt")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\py_noaa\coops.py in get_data(begin_date, end_date, stationid, product, datum, bin_num, interval, units, time_zone)
    191                                    time_zone)
    192 
--> 193         df = url2pandas(data_url, product)
    194 
    195     # If the length of the user specified data request is greater than 31 days,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\py_noaa\coops.py in url2pandas(data_url, product)
    133     if 'error' in json_dict:
    134         raise ValueError(
--> 135             json_dict['error'].get('message', 'Error retrieving data'))
    136 
    137     if product == 'predictions':

ValueError:  Wrong Datum: Datum cannot be null or empty  ***station=9447130

for parameter "hourly_height," concatenating subsequent years duplicates data from the first day in the defined series

If I submit the following command:

df = coops.get_data(
    begin_date='20070101',
    end_date='20090101',
    stationid='8737048',
    product='hourly_height',
    datum='msl',
    units='english',
    time_zone='lst')

then the day of 2008-01-01 will be duplicated in the resulting dataframe. This is because the first API batch call is for a one year period (ending on 2008-01-01) and the subsequent batch call will begin on 2008-01-01. Thus that date is duplicated in the resulting time series.
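Until the block boundaries are adjusted, the duplicated rows can be dropped after the fact by de-duplicating the index. A sketch on a toy frame that mimics the overlapping boundary:

```python
import pandas as pd

# Toy frame with an overlapping boundary row, as produced when two
# year-long hourly_height blocks both include 2008-01-01 00:00
idx = pd.to_datetime([
    "2007-12-31 23:00", "2008-01-01 00:00",  # end of first block
    "2008-01-01 00:00", "2008-01-01 01:00",  # start of second block
])
df = pd.DataFrame({"water_level": [1.1, 1.2, 1.2, 1.3]}, index=idx)

# Keep only the first of any rows sharing a timestamp
df = df[~df.index.duplicated(keep="first")]
```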

Travis CI builds failing - Pytest Issue?

@delgadom,

I am wondering if you are able to shed some light on why these builds are failing? The only thing that has changed in the latest commit is some of the documentation in README.md; there is no change to the code itself or the tests. I even re-ran some previously passing builds (e.g. build #80) and they now fail as well. It looks like it's an issue when Travis CI tries to run pytest?

You can see the build logs for the latest commit's build here build #81

On another note, I plan on replacing this package with noaa_coops, which will have a Station() class and support metadata from NOAA (https://tidesandcurrents.noaa.gov/mdapi/latest/). Early days, but I suspect I may run across the same issue as I develop there and begin to include testing and Travis CI.

predictions with interval="hilo" return values as string

Requesting tide predictions with interval="hilo" returns the values as strings. This creates issues for plotting (out-of-order y-axis values; see the Stack Overflow answer for an explanation).

Example code that reproduces the issue:

# Get data from NOAA CO-OPS API
df_hl = coops.get_data(
    begin_date="20181224",
    end_date="20190107",
    stationid="9447717",
    product="predictions",
    datum="NAVD",
    interval="hilo",
    units="english",
    time_zone="lst_ldt")
df_hl.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 54 entries, 2018-12-24 06:52:00 to 2019-01-06 23:32:00
Data columns (total 2 columns):
hi_lo           54 non-null object
predicted_wl    54 non-null object
dtypes: object(2)
memory usage: 1.3+ KB

The datatype for predicted_wl should be float.

This issue should be checked for other products as well.
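As a workaround until the conversion is handled inside the package, the column can be coerced to float with pandas.to_numeric. A sketch on a toy frame that mimics the hilo output:

```python
import pandas as pd

# Toy frame mimicking hilo predictions, where predicted_wl comes back
# as strings instead of floats
df_hl = pd.DataFrame({
    "hi_lo": ["L", "H", "L"],
    "predicted_wl": ["-1.046", "3.813", "2.037"],
})

# Coerce the water level column to float
df_hl["predicted_wl"] = pd.to_numeric(df_hl["predicted_wl"])
```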

Credit where credit's due

As this is turning into a proper package, it deserves some badges. These can be added to the readme to signify build status, pypi distribution, and other features.

Usually, these are added to the very top of the readme, all on one line (e.g. see https://github.com/pydata/xarray)

travis build status

Build Status
[![Build Status](https://travis-ci.org/GClunies/py_noaa.svg?branch=master)](https://travis-ci.org/GClunies/py_noaa)

pypi version

PyPI
[![PyPI](https://img.shields.io/pypi/v/py_noaa.svg)](https://pypi.python.org/pypi/py-noaa)

python versions

PyPI - Python Version
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/py_noaa.svg)](https://pypi.python.org/pypi/py-noaa)

other services

there's tons of other stuff, but these are some of the ones I regularly use.

other badges

http://shields.io/

Column names are not descriptive, data not numeric, for requests < 31 days

Any request of less than 31 days does not return a pandas dataframe with descriptive column names, and all of the columns are still "non-null object", not numeric data. Example request of less than 31 days:

In [12]: df_obs_less_than_31day = coops.get_data(begin_date="20121215",
    ...:     end_date="20121217", stationid="9447130", product="water_level",
    ...:     datum="mllw", units="english", time_zone="lst_ldt")

In [13]: df_obs_less_than_31day.head()
Out[13]:
         f  q      s                 t       v
0  0,0,0,0  v  0.016  2012-12-15 00:00  -2.863
1  0,0,0,0  v  0.020  2012-12-15 00:06  -2.774
2  0,0,0,0  v  0.026  2012-12-15 00:12  -2.610
3  0,0,0,0  v  0.039  2012-12-15 00:18  -2.459
4  0,0,0,0  v  0.046  2012-12-15 00:24  -2.288

In [14]: df_obs_less_than_31day.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 720 entries, 0 to 719
Data columns (total 5 columns):
f    720 non-null object
q    720 non-null object
s    720 non-null object
t    720 non-null object
v    720 non-null object
dtypes: object(5)
memory usage: 28.2+ KB

Similar request bigger than 31 days results in:

In [4]: df_obs_more_than_31days = coops.get_data(begin_date="20121115",
   ...:     end_date="20121217", stationid="9447130", product="water_level",
   ...:     datum="mllw", units="english", time_zone="lst_ldt")

In [5]: df_obs_more_than_31days.head()
Out[5]:
     flags  QC  sigma           date_time  water_level
0  0,0,0,0 NaN  0.062 2012-11-15 00:00:00       -2.429
1  0,0,0,0 NaN  0.043 2012-11-15 00:06:00       -2.206
2  0,0,0,0 NaN  0.033 2012-11-15 00:12:00       -1.996
3  0,0,0,0 NaN  0.043 2012-11-15 00:18:00       -1.793
4  0,0,0,0 NaN  0.039 2012-11-15 00:24:00       -1.537

In [6]: df_obs_more_than_31days.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 8160 entries, 0 to 479
Data columns (total 5 columns):
flags          8160 non-null object
QC             0 non-null float64
sigma          8155 non-null float64
date_time      8160 non-null datetime64[ns]
water_level    8160 non-null float64
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 382.5+ KB

Output format should be consistent regardless of request length.
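As a client-side workaround, the single-letter columns from a short request can be renamed and coerced to match the long-request format. A sketch on a toy frame; the letter-to-name mapping mirrors the columns shown in the two examples above:

```python
import pandas as pd

# Toy frame mimicking a < 31 day water_level response with terse columns
df = pd.DataFrame({
    "f": ["0,0,0,0"], "q": ["v"], "s": ["0.016"],
    "t": ["2012-12-15 00:00"], "v": ["-2.863"],
})

# Rename to the descriptive names used for > 31 day requests
df = df.rename(columns={
    "f": "flags", "q": "QC", "s": "sigma",
    "t": "date_time", "v": "water_level",
})

# Coerce types: timestamps to datetime, measurements to float
df["date_time"] = pd.to_datetime(df["date_time"])
df[["sigma", "water_level"]] = df[["sigma", "water_level"]].apply(pd.to_numeric)
```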

fix readme

Woot! py_noaa is now up and running on Travis!

Aaaaaaand it's failing :)

https://travis-ci.org/GClunies/py_noaa

Looks like your latest commit beat me to the punch! I'll create a PR to fix the doctests in a bit.

Now that the repo is being tested automatically, you should be able to make sure that master always passes tests by doing two things:

  1. when making changes, run pytest frequently to make sure you haven't broken anything
  2. as a best practice, create a branch, push that branch to github, then merge changes into master once you know tests are passing.

And if you really want to venture into best practice land, you can switch to test-driven development. In this realm, you write tests that assert the behavior you want to observe, make sure that the package in its current state fails those tests, then make the fewest changes you can to pass tests. In practice it's difficult to hold to this tough standard unless you're working on a huge project like pandas or something, but it's a good goal post to keep in mind.

Water level predictions throws error if no interval is specified

According to the NOAA API, the default time interval for predicted water levels is 6 minutes, which does not require interval to be specified in a call to the API.

Currently, coops.get_data() requires that an interval be supplied; if no interval is supplied (see the available options), it throws an error.

ValueError: No interval specified for water level predictions. See https://tidesandcurrents.noaa.gov/api/#interval for list of available intervals

Code used to generate error:

df_pred = coops.get_data(begin_date="20150101",
                         end_date="20180101",
                         stationid="9447130",
                         product="predictions",
                         datum="MLLW",
                         units="english",
                         time_zone="lst_ldt")
