tporkka / create-your-own-weather-database

Script for extracting historical weather data in bulk using the NOAA API

Python 100.00%
noaa-api weather python3 etl

create-your-own-weather-database's Introduction

Why bother?

Weather can have a significant impact on business outcomes and other facets of life. Unfortunately, it can be difficult to analyze this impact without accurate historical data in a structured format. Storing historical weather data as clean time series allows you to quantify this impact and, more importantly, build models that better predict future outcomes.

NOAA_Historical_Weather_Extraction.py

This is a script for extracting historical weather data in bulk using the NOAA API. The end result is a series of comma-delimited files that can easily be loaded into a relational database.

Dependencies

Python 3 (third-party packages: requests, pandas; standard library modules: datetime, json, os, math, time)

How to use this script

  1. Install all dependencies on the machine where you will run this script.
  2. Download all files to your local directory.
  3. Add your NOAA API Token to the config.json file. You can get a token here: https://www.ncdc.noaa.gov/cdo-web/token
  4. Specify the weather stations, metrics, start date, and end date in the config.json file.
     {
       "noaa-config": {
         "token": "YOUR_NOAA_API_TOKEN",
         "base_url": "https://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&stationid=GHCND:"
       },
       "params": {
         "weather_stations": ["USW00012916", "USW00094982"],
         "weather_features": ["station_dt_key", "date", "station", "PRCP", "SNOW", "SNWD", "AWND", "TMAX", "TMIN"],
         "start_date": "2016-01-01",
         "end_date": "2018-12-31"
       }
     }
  5. Navigate to the appropriate directory and run the script.
$ cd directory
$ python3 NOAA_weather_extraction.py

This runs the script and writes one comma-delimited file per weather station to the WeatherExtracts folder, which the script creates. See the example_output.txt file for a sample.
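To make the role of the config values concrete, here is a minimal sketch of how a request for one station might be assembled from them. It is not the script's actual code: `build_request` is a hypothetical helper, the config is inlined as a string so the example is self-contained, and the `startdate`, `enddate`, and `limit` query parameters and the `token` header are assumed from the public NOAA CDO v2 API.

```python
import json

# Inlined copy of the relevant config.json keys (assumption: the real script
# reads these from the config.json file on disk).
CONFIG_TEXT = """
{
  "noaa-config": {
    "token": "YOUR_NOAA_API_TOKEN",
    "base_url": "https://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&stationid=GHCND:"
  },
  "params": {
    "weather_stations": ["USW00012916", "USW00094982"],
    "start_date": "2016-01-01",
    "end_date": "2018-12-31"
  }
}
"""


def build_request(config, station, start_date, end_date, limit=1000):
    """Hypothetical helper: return (url, headers) for one station and date range."""
    noaa = config["noaa-config"]
    # base_url already ends with "stationid=GHCND:", so the station ID is appended directly.
    url = (
        f"{noaa['base_url']}{station}"
        f"&startdate={start_date}&enddate={end_date}&limit={limit}"
    )
    # The API token travels in a request header rather than in the URL.
    headers = {"token": noaa["token"]}
    return url, headers


config = json.loads(CONFIG_TEXT)
url, headers = build_request(config, "USW00012916", "2016-01-01", "2016-12-31")
```

The resulting pair could then be passed to `requests.get(url, headers=headers)`; the response handling is omitted here because it depends on how the script parses the results.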

Other notes

  • station_dt_key serves as a primary key which can be useful for applying update/insert logic in a database.
  • The script breaks down large queries into smaller requests to avoid going over the rate limit.
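
One way to break a large query into smaller requests is to split the date range into fixed-size windows. The sketch below illustrates that idea only. `split_date_range` and the 365-day window size are assumptions for illustration, not the script's actual chunking logic.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days=365):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive.

    max_days is an assumed window size; adjust it to whatever the API allows.
    """
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    chunk_start = start
    while chunk_start <= end:
        # Each chunk spans at most max_days days, clipped to the overall end date.
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        # The next chunk begins the day after this one ends, so chunks never overlap.
        chunk_start = chunk_end + timedelta(days=1)


chunks = list(split_date_range(date(2016, 1, 1), date(2018, 12, 31)))
```

Each pair can be turned into one request. Pausing briefly between requests (for example with `time.sleep`) is a simple way to stay further under the rate limit.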

Todos

  • Improve error handling and communication
  • Add the ability to pass dynamic dates (e.g. end_date = datetime.datetime.now().date()) when specified in the config file.
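
The dynamic-date todo could be approached along these lines. This is only a sketch of one possible convention: the `"today"` keyword and the `resolve_date` helper are hypothetical and do not exist in the script or its config format.

```python
from datetime import date


def resolve_date(value, today=None):
    """Hypothetical helper: turn a config date string into a date object.

    "today" (an assumed keyword) resolves to the current date; anything else
    must be an ISO date such as "2018-12-31".
    """
    today = today or date.today()
    if value.strip().lower() == "today":
        return today
    return date.fromisoformat(value)  # requires Python 3.7+


fixed = resolve_date("2018-12-31")
dynamic = resolve_date("today", today=date(2020, 1, 15))
```

Passing `today` explicitly keeps the function easy to test. In normal use it would be omitted so the current date is used.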

Some suggestions for creating your own weather database

This is the first part of an effort to create and maintain a database of historical weather data. To create a fully automated, up-to-date table or set of tables, you will need to set up a pipeline that stores new or updated records as they become available.

I recommend running this script once to gather bulk historical data (set the start and end dates in the config file to your liking), then changing the script to gather only the last week of data on each run. A one-week window is useful because some weather stations are updated less frequently than others, so each run picks up new or updated records that you can use to update your database tables.

I automated this entire workflow using Microsoft Azure: the script runs as an Azure Function and drops the resulting text file into Azure Blob storage, and an Azure Data Factory pipeline copies the data from Blob storage to a table in Azure Data Warehouse, where a stored procedure handles the update/insert logic. A similar process can be followed regardless of which tools you're using.
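
As a tool-agnostic illustration of the update/insert step, the sketch below uses Python's built-in sqlite3 module in place of Azure Data Warehouse. Everything in it is an assumption made for the example: the `weather` table layout, the `rolling_window` and `upsert_rows` helpers, the format of the station_dt_key values, and the temperature readings, which are made up.

```python
import sqlite3
from datetime import date, timedelta


def rolling_window(today, days=7):
    """Return (start, end) ISO date strings covering the last `days` days."""
    return (today - timedelta(days=days)).isoformat(), today.isoformat()


def upsert_rows(conn, rows):
    """Insert new rows and overwrite existing ones that share a station_dt_key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "station_dt_key TEXT PRIMARY KEY, date TEXT, station TEXT, "
        "TMAX REAL, TMIN REAL)"
    )
    # INSERT OR REPLACE relies on the primary key to decide between insert and update.
    conn.executemany("INSERT OR REPLACE INTO weather VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")

# Initial bulk load (made-up values; the key format is hypothetical).
upsert_rows(conn, [("USW00012916_2018-12-30", "2018-12-30", "USW00012916", 15.0, 8.3)])

# A later weekly run returns one corrected record and one new record.
upsert_rows(conn, [
    ("USW00012916_2018-12-30", "2018-12-30", "USW00012916", 15.6, 8.3),
    ("USW00012916_2018-12-31", "2018-12-31", "USW00012916", 17.2, 9.4),
])

stored = conn.execute(
    "SELECT station_dt_key, TMAX FROM weather ORDER BY station_dt_key"
).fetchall()
window = rolling_window(date(2019, 1, 8))
```

After the second call the table holds two rows, with the corrected reading replacing the original one. The same pattern maps onto a MERGE statement or stored procedure in a data warehouse.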
