
This project forked from ryanmark1867/manning


DEPRECATED repo for Manning book Deep Learning with Structured Data - please see https://github.com/ryanmark1867/deep_learning_for_structured_data for the current repo


manning

PLEASE NOTE - this is the DEPRECATED repo for the Manning book Deep Learning with Structured Data https://www.manning.com/books/deep-learning-with-structured-data. It is no longer maintained and remains available only for readers who have been following it since earlier versions of the MEAP. Please use the current repo https://github.com/ryanmark1867/deep_learning_for_structured_data instead.

Note:

You can find an improved repo for the book, including many updates to the code and a simpler structure, at https://github.com/ryanmark1867/deep_learning_for_structured_data. The old repo (whose readme you are reading now) will remain available, but all further updates to the code will be made to the new, rationalized repo.

Prerequisites

To run the code in this repo you will need access to an environment (such as Paperspace https://www.paperspace.com/ or Watson Studio Cloud https://cloud.ibm.com/catalog/services/watson-studio ) that supports Jupyter notebooks.

Background

Directory structure

  • CSV_XLS CSV and XLS data
  • pickled_pandas_dataframes pickled versions of intermediate Pandas dataframes. See the Intermediate datasets section below for details
  • sql files related to the section in Chapter 5 about how to perform standard SQL operations in Pandas
  • root directory contains notebook files for the code examples in the book - see the Code section below for details

Intermediate datasets (in pickled_pandas_dataframes directory)

  • 2014_2018.pkl pickled dataframe containing data from the original dataset 2014 to November 2018
  • 2014_2018_df_cleaned_keep_bad_loc_geocoded_SCtrimmed_apr26.pkl pickled dataframe with cleanup from streetcar_data_preparation.ipynb applied and latitude and longitude added for Location values (results of calling Geocoding API in streetcar_data_preparation-geocode.ipynb)

Code

  • chapter2.ipynb code snippets associated with introductory code in chapter 2
  • chapter5.ipynb code snippets associated with SQL / Pandas examples
  • streetcar_data_preparation_refactored.ipynb load original dataset into a single dataframe and perform basic cleanup on values. By default takes 2014_2019_upto_june.pkl as input
  • streetcar_data_exploration.ipynb basic data exploration
  • streetcar_time_series.ipynb additional data exploration using time series forecasting techniques
  • streetcar_data_preparation-geocode-public.ipynb use Google Geocoding API to get latitude and longitude values from Location values. NOTE: to run the code in this file, you will need to get your own API Key from Google Cloud Platform https://developers.google.com/maps/documentation/embed/get-api-key
  • streetcar_DL_train-trimmed.ipynb train Keras deep learning model. By default takes 2014_2018_df_cleaned_keep_bad_loc_geocoded_SCtrimmed_apr26.pkl as input
  • streetcar_DL_refactored.ipynb train Keras deep learning model using refactored dataset. Create dataframe with entries for every date, hour, route, direction combination since Jan 1 2014. Join with dataframe containing the original dataset and generate target column that is 1 if there was a delay in that date / hour / route / direction combination and 0 otherwise. Use this refactored dataset to train Keras model. Focusing on this approach based on feedback received from presentation on this project at AI Beijing conference https://ai.oreilly.com.cn/ai-cn/public/schedule/detail/75461?locale=en
  • streetcar_data-geocode-get-boundaries.ipynb code in development to get geo bounding boxes around each route so that DL_refactored can be augmented with subsets of the route - add a feature to that approach defined by the geographic subset of the route that the user is interested in.
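
The refactoring described for streetcar_DL_refactored.ipynb (one row per date / hour / route / direction combination, with a target of 1 where a delay occurred and 0 otherwise) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the notebook's actual code: the column names (`date`, `hour`, `route`, `direction`), the sample values and the helper `build_refactored` are all hypothetical.

```python
import itertools

import pandas as pd

KEYS = ["date", "hour", "route", "direction"]  # assumed column names


def build_refactored(delays, dates, hours, routes, directions):
    """Return one row per (date, hour, route, direction) with a 0/1 target."""
    # Every possible combination, whether or not a delay occurred.
    grid = pd.DataFrame(
        list(itertools.product(dates, hours, routes, directions)), columns=KEYS
    )
    # Reduce the delay records to the distinct combinations that had a delay.
    delayed = delays[KEYS].drop_duplicates().assign(target=1)
    # A left join keeps every combination; unmatched rows become target 0.
    out = grid.merge(delayed, on=KEYS, how="left")
    out["target"] = out["target"].fillna(0).astype(int)
    return out


# Hypothetical sample: two delay records that fall in the same hour.
delays = pd.DataFrame(
    {
        "date": ["2014-01-01", "2014-01-01"],
        "hour": [8, 8],
        "route": ["501", "501"],
        "direction": ["e", "e"],
    }
)
refactored = build_refactored(
    delays,
    dates=["2014-01-01", "2014-01-02"],
    hours=range(24),
    routes=["501", "504"],
    directions=["e", "w"],
)
```

Because most combinations have no delay, a dataset built this way is heavily imbalanced toward target 0, which is worth keeping in mind when evaluating the trained model.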

NOTE: All code assumes that datasets (raw or pickled) are in a directory called "data" that is a sibling of the directory containing the notebook.
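
A minimal sketch of how a notebook might resolve that sibling "data" directory. The helper name `data_path` and its `notebook_dir` parameter are assumptions made for illustration; the notebooks may build their paths differently.

```python
from pathlib import Path


def data_path(filename, notebook_dir=None):
    """Return the path of `filename` inside the "data" directory that is a
    sibling of the directory containing the notebook."""
    # Assumption: when no directory is given, the notebook runs from its own
    # directory, so the current working directory stands in for it.
    notebook_dir = Path(notebook_dir) if notebook_dir else Path.cwd()
    return notebook_dir.parent / "data" / filename
```

A pickled dataframe could then be loaded with, for example, `pd.read_pickle(data_path("2014_2018.pkl"))`, provided that file has been copied into the "data" directory.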

To re-run the entire flow from scratch:

  1. copy all the XLS files from the original dataset (https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#e8f359f0-2f47-3058-bf64-6ec488de52da) into a directory called "data" that is a sibling to the directory containing your notebook files
  2. run streetcar_data_preparation_refactored.ipynb with the following updates:
     • load_from_scratch set to True
     • pickled_output_dataframe set to the name you want for the output pickle file containing the cleansed dataframe
  3. run streetcar_data_preparation-geocode-public.ipynb with the following updates:
  4. run streetcar_DL_train-trimmed.ipynb with the following updates:
     • pickled_dataframe set to the pickle file you specified as output in step 3
