
tsdataclinic / newerhoods


A Data Clinic project that aggregates NYC Open Data at the tract-level and uses Machine Learning techniques to re-imagine neighborhood boundaries.

License: Apache License 2.0

R 2.61% HTML 97.02% CSS 0.29% JavaScript 0.07% Dockerfile 0.01% Shell 0.01%
rshiny neighborhoods clustering machine-learning spatial-analysis

newerhoods's Introduction

NewerHoods

New York City’s (NYC’s) neighborhoods are a driving force in the lives of New Yorkers—their identities are closely intertwined and a source of pride. However, the history and evolution of NYC’s neighborhoods don’t follow the rigid, cold lines of statistical and administrative boundaries. Instead, the neighborhoods we live and work in are the result of a more organic confluence of factors.

Data Clinic developed NewerHoods with the goal of helping individuals and organizations better advocate for their communities by enabling them to tailor insights to meet their specific needs. NewerHoods is an interactive web-app that uses open data to generate localized features at the census tract-level and machine learning to create homogeneous clusters. Users are able to select characteristics of interest (currently open data on housing, crime, and 311 complaints), visualize NewerHood clusters on an interactive map, find similar neighborhoods, and compare them against existing administrative boundaries. The tool is designed to enable users without in-depth data expertise to compare and incorporate these redefined neighborhoods into their work and life.

The application is live and available to use here.

Getting Started

The steps below will help you set up and run NewerHoods locally. Since this is an RShiny application, install RStudio on your machine from here if you haven't already.

Directory Structure

newerhoods/clean_data contains just the cleaned/transformed data sets used directly by the Shiny App.

/src contains all the code to merge and clean the data sets, extract features from it, and cluster the features.

/newerhoods contains the code for the RShiny WebApp.

Running the App

First, the R environment needs to be set up with all the necessary packages.

source("newerhoods/setup.R")

The project uses several APIs: those developed by the NYC Developer Portal for loading data, and Mapbox for the underlying map visualization in the Shiny App. Tokens for all of these are free; sign up here and here. Follow the instructions in the settings.R file in the newerhoods folder, then source your local version of the file to store all the tokens in the environment. You will have to source this settings file every time you start a new session.

Note: If you intend to run only the RShiny App, filling in just the MapBox API Token would suffice.

source("newerhoods/settings_local.R")
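
A quick check before launching can catch a missing token early. The sketch below assumes the tokens end up as environment variables (for example via Sys.setenv) and that the Mapbox one is called MAPBOX_ACCESS_TOKEN; both are assumptions for illustration, so use whatever names settings.R actually defines.

```r
# Returns the names of required tokens that are missing or empty.
# The variable names passed in are hypothetical; check settings.R for the real ones.
missing_tokens <- function(required) {
  values <- Sys.getenv(required, unset = "")
  required[!nzchar(values)]
}

# Illustrative check with a placeholder value (not a real token).
Sys.setenv(MAPBOX_ACCESS_TOKEN = "placeholder-token")
token_check <- missing_tokens(c("MAPBOX_ACCESS_TOKEN", "NEWERHOODS_EXAMPLE_UNSET_TOKEN"))
print(token_check)
```

An empty result means every listed token is present in the current session.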

Run the App

library(shiny)
runApp("newerhoods")

Alternatively, you can run the application in Docker. To build the Docker image, run

docker build -t newerhoods . 

Then, to start the Docker container, run

docker run -it --rm -p 3000:3000 -v $(pwd):/app newerhoods 

Any changes you make to the code should trigger an application reload, so all you should need to do is refresh your browser to see them.

Contributing to NewerHoods

We invite feedback on the tool and encourage users to contribute. Check out this page for some ways in which you can contribute. To contact Data Clinic about NewerHoods, please email us at [email protected].

Data Sources

  1. NYC Annualized Property Sales Data (2012-2017)
  2. MapPLUTO (18v1)
  3. Geoclient API v1.1
  4. Property Assessment Roll Archives
  5. NYPD Complaint Data Historic
  6. 311 Service Requests from 2010 to Present
  7. NYC 2010 Census Tracts

References

  1. ClustGeo: an R package for hierarchical clustering with spatial constraints
  2. Making Neighborhoods - Understanding New York City Transitions 2000-2010

License

This project is licensed under the Apache 2.0 License. See the LICENSE.md file for details.

newerhoods's People

Contributors

kaushik12, rachaelwr, santind, stuartlynn

newerhoods's Issues

Performance improvements

Look into:

  • shinytest, to generate test cases for the app and automate testing.
  • shinyloadtest, to check how the app performs under different loads and see where improvements can be made.
  • The Shiny webinar and resources on the golem package for Shiny app development.

UI/UX: Display Optimal clusters

Optimal clusters for the chosen data are computed internally but not yet displayed in the UI. Modifying the slider seems a bit hard. Need to explore alternative solutions.

Needs:
Display the optimal number of clusters (4 options, one for each range: 0-50, 50-100, 100-150, 150-200)
Allow the user to click on these to automatically set the number of clusters and recompute

Ideal Solution:
Display small pointers above/below the slider that a user can click on to set the slider to the corresponding value.

Insights

What do insights from the clustering look like?
Can we provide users with succinct insights from the results?

Ideas:

  • Display where in the distribution the cluster statistic lies (i.e. percentiles in the labels)
  • Display trends (is this going up or down) [Harder to do with uploaded data]
  • Summary reports (% overlap with different boundaries)
  • Highest matching cluster-administrative unit pair
  • Reports for a Community District
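
The first idea above (percentiles in the labels) could be prototyped roughly as follows. This is a minimal sketch in base R, assuming one numeric statistic per tract and a cluster label per tract; the function and column names are made up for illustration and are not taken from the app.

```r
# Percentile of each cluster's mean within the distribution of tract-level values.
cluster_percentiles <- function(values, cluster) {
  tract_ecdf <- ecdf(values)                      # empirical CDF over all tracts
  cluster_means <- tapply(values, cluster, mean)  # one mean per cluster
  setNames(round(100 * tract_ecdf(as.numeric(cluster_means))), names(cluster_means))
}

# Toy data: 10 tracts with a made-up statistic, assigned to 2 clusters.
values  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
cluster <- c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b")
pct <- cluster_percentiles(values, cluster)
print(pct)
```

A label could then read something like "higher than 80% of tracts" for a cluster at the 80th percentile.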

Use Case: Finding your neighborhood

A simple use case of this tool could be to allow a user to find a region based on the data and their inputs. This would get richer as more datasets are integrated. For example, a user could select a set of criteria (range of housing prices, access to schools, parks, etc.) and, possibly, their importance, resulting in "areas" (clusters) that best match the criteria.

Features from Facilities Database

We want to integrate features from the Facilities database shapefile on NYC Open Data. These are points or shapes that can't simply be joined to a tract to get local features.
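
One possible way to turn point data into tract-level features is to count the facilities within a fixed distance of each tract centroid. The sketch below is only an illustration of that idea in base R: it assumes planar (projected) coordinates in columns named x and y, and the radius is arbitrary. A real implementation would more likely use a spatial package and the actual tract geometries.

```r
# Count how many points fall within `radius` of each centroid.
# Assumes planar coordinates (same units as `radius`) in columns x and y.
count_nearby <- function(centroids, points, radius) {
  vapply(seq_len(nrow(centroids)), function(i) {
    d <- sqrt((points$x - centroids$x[i])^2 + (points$y - centroids$y[i])^2)
    sum(d <= radius)
  }, integer(1))
}

# Toy data: two tract centroids and four facilities on a line.
centroids  <- data.frame(x = c(0, 10), y = c(0, 0))
facilities <- data.frame(x = c(1, 2, 9, 20), y = c(0, 0, 0, 0))
nearby <- count_nearby(centroids, facilities, radius = 3)
print(nearby)
```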

Upload-data: Tooltips/how to for upload

Add tooltips for info on geographic identifier requirements.
Add a general "How to" with screenshots for an overview of the process.
Perhaps add sample CSVs for people to download.

Upload-data: Column selection

  • Ensure row numbers are excluded
  • Remove a column from feature selection if selected as geographic identifier
  • Basic version of autofill for geographic identifier
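
The three items above could look roughly like this in base R. It is a sketch under assumptions: the name patterns used to guess the identifier column and the helper names are invented for illustration, and the rule for spotting row-number columns (a numeric column equal to 1..n) is a heuristic.

```r
# Drop columns that look like exported row numbers (numeric and exactly 1..n).
drop_row_number_cols <- function(df) {
  is_row_index <- vapply(df, function(col) {
    is.numeric(col) && length(col) > 0 && isTRUE(all(col == seq_along(col)))
  }, logical(1))
  df[, !is_row_index, drop = FALSE]
}

# Guess the geographic identifier column from common name patterns (a heuristic).
guess_geo_id <- function(df) {
  hits <- grep("geoid|tract|fips", names(df), ignore.case = TRUE, value = TRUE)
  if (length(hits) > 0) hits[1] else NA_character_
}

# Feature choices exclude whichever column is chosen as the identifier.
feature_choices <- function(df, geo_id) setdiff(names(df), geo_id)

# Toy upload with a stray row-number column named X.
uploaded <- data.frame(
  X = 1:3,
  GEOID = c("36061000100", "36061000201", "36061000202"),
  rent = c(1500, 2100, 1800),
  stringsAsFactors = FALSE
)
cleaned  <- drop_row_number_cols(uploaded)
geo_id   <- guess_geo_id(cleaned)
features <- feature_choices(cleaned, geo_id)
print(features)
```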

Fix debounce / cancel

Should only send one request at a time. Cancel previous fetch when starting a new one.

Real-estate price growth features

Currently, we only use mean and sd of real-estate prices. Growth in prices over time would be easy to add and could add some value.
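
As a rough illustration, growth could be expressed as a compound annual growth rate between the first and last year available for each tract. The sketch assumes one row per tract and year with columns named tract, year, and mean_price; those names are placeholders rather than the project's actual schema.

```r
# Compound annual growth rate of mean sale price per tract.
# Expects one row per tract-year with columns: tract, year, mean_price.
price_growth <- function(sales) {
  by_tract <- split(sales, sales$tract)
  vapply(by_tract, function(d) {
    d <- d[order(d$year), ]
    n_years <- d$year[nrow(d)] - d$year[1]
    if (n_years <= 0) return(NA_real_)  # a single year gives no growth estimate
    (d$mean_price[nrow(d)] / d$mean_price[1])^(1 / n_years) - 1
  }, numeric(1))
}

# Toy data: tract A grows 10% a year, tract B is flat.
sales <- data.frame(
  tract = c("A", "A", "A", "B", "B"),
  year = c(2015, 2016, 2017, 2015, 2017),
  mean_price = c(100, 110, 121, 200, 200)
)
growth <- price_growth(sales)
print(growth)
```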

Recommend number of clusters

Is your feature request related to a problem? Please describe.
Currently, the choice of number of clusters is entirely up to the user. Ideally, we would like to recommend a few options for the number of clusters based on the data.

Describe the solution you'd like
A set of choices for the number of clusters in different ranges (0-50, 50-100, etc.) available for the user to select from.
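
A minimal sketch of picking one recommendation per range, assuming some quality score has already been computed for every candidate number of clusters (higher is better). How that score is computed is left open here, and the function name is hypothetical.

```r
# Pick the best-scoring number of clusters inside each range.
# scores[k] is a quality score for k clusters (higher is better).
recommend_k <- function(scores, breaks = c(0, 50, 100, 150, 200)) {
  k <- seq_along(scores)
  # Ranges are (0, 50], (50, 100], ... so each k falls in exactly one range.
  bins <- cut(k, breaks = breaks)
  best <- tapply(k, bins, function(idx) idx[which.max(scores[idx])])
  best[!is.na(best)]
}

# Toy scores with an obvious best value planted in each range.
set.seed(1)
scores <- runif(200)
scores[c(12, 75, 130, 180)] <- 2
recommended <- recommend_k(scores)
print(recommended)
```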

Download-data: UI

The button should ideally move to the top-right of the map. Not sure if this is possible. If not, @YuanyuanMaggie feel free to modify the design as appropriate.

Major Update: Migrate to React from RShiny

Not an immediate need, but the general outlook is to try to migrate the project to React.

Why?

  • Greater control and reduced dependency on RShiny modules & shinyapps.io
  • Process user data on client side vs. server side
  • Align with overall Data Clinic plan for app development

Missing data?

There seem to be a number of areas with missing data where there shouldn't be any. Just checking that nothing is being dropped in the data pipeline. Or are these just areas with a lack of residential housing?

Upload-data: User feature inputs

This is currently a dropdown input which is different from the checkbox inputs. I think the code would work similarly if the input is simply changed to CheckboxInput.

Refresh Housing data

Housing data is currently from 2013-2017. The 2018 annualized sales data has since been released. Add it to the data and streamline the data pre-processing. Also look into including the Rolling Sales data, which updates more regularly.

Download/Share map

Allow users to download/share the results of the clustering. shinyurl can be used to create custom-links.

Major Update: expand to other cities with Open Data

A good second city would be Chicago, which has a wealth of Open Data and an interesting socio-economic structure. Working on this would also be a good time to think through the best data structure for making it easier to include more cities, and possibly to migrate the data to an actual database.

Some files don't upload, throwing a "stopped prematurely" error

Describe the bug
Upload stops prematurely when uploading certain files.

Additional context
RStudio Support confirmed that this is a known issue and has to do with file encoding. UTF-8 files are less likely to fail. Alternatively, uploading a zip file and unzipping it on the server side could be a solution.
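
If encoding is the culprit, re-encoding text to UTF-8 is one thing to try. The sketch below shows the idea on a character vector in base R; the source encoding (Latin-1 here) is an assumption, since real uploads may use something else.

```r
# Re-encode strings to UTF-8 when they are not already valid UTF-8.
# The source encoding ("latin1" here) is an assumption; real uploads may differ.
to_utf8 <- function(x, from = "latin1") {
  bad <- !validUTF8(x)
  x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
  x
}

# "café" encoded as Latin-1 bytes, which is not valid UTF-8.
latin1_text <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
fixed <- to_utf8(c("plain", latin1_text))
print(validUTF8(fixed))
```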

Improve Help/How tos

Is your feature request related to a problem? Please describe.
Currently, a user doesn't have clear info on how to use the App. We assume the UI is straightforward enough for users to figure out. This may not be the case as we add more features.

Describe the solution you'd like
A help flow which takes a user through each input/control and highlights the process and purpose of each. rintrojs allows us to do this for a shinyapp. Need to implement.

Finding similar neighborhoods

We want to show neighborhoods that are similar to each other. Things to address before that:

  • What definition of similarity?
  • How can we quantify this?
  • How do we present this information in a way that is easy to interpret?
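
To make the first two questions concrete, here is one candidate definition, offered only as a starting point: Euclidean distance between clusters on standardized features, with smaller distances meaning more similar. The feature names and values are invented.

```r
# Rank clusters by similarity to a target cluster, using Euclidean distance
# on standardized features. This is one possible definition, not the app's.
most_similar <- function(features, target, n = 3) {
  scaled <- scale(features)       # z-score each feature column
  d <- as.matrix(dist(scaled))    # pairwise Euclidean distances
  others <- setdiff(rownames(d), target)
  head(sort(d[target, others]), n)
}

# Toy cluster-level features.
features <- data.frame(
  median_price = c(500, 520, 900, 300),
  complaints   = c(40, 42, 10, 80),
  row.names    = c("c1", "c2", "c3", "c4")
)
similar <- most_similar(features, "c1", n = 2)
print(similar)
```

Standardizing first keeps a feature with large units (such as price) from dominating the distance.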

Upload-data: Upload button

The button to open the upload input modal is now just a link. A button like the 'Apply' button with an upload icon might be better, as discussed.

Temporal data & clustering analysis

The idea of observing clusters change and evolve over time is appealing. Rather than just a snapshot, ideally, a user can observe the patterns in data over time as well.

With some tweaks to the underlying data structure, we could ideally generate features on the fly through basic aggregations up to a certain point in time which is then clustered.
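
The "aggregate up to a point in time" idea could be sketched like this, assuming an event-level table with a tract identifier and a date; the column names are placeholders. Each snapshot could then be fed to the clustering step.

```r
# Aggregate event counts per tract using only records up to a cutoff date.
features_up_to <- function(events, cutoff) {
  kept <- events[events$date <= as.Date(cutoff), , drop = FALSE]
  counts <- table(kept$tract)
  data.frame(tract = names(counts), n_events = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy events for two tracts.
events <- data.frame(
  tract = c("A", "A", "B", "B", "B"),
  date  = as.Date(c("2015-03-01", "2017-06-15", "2014-01-10",
                    "2016-09-30", "2018-02-02")),
  stringsAsFactors = FALSE
)
snap_2016 <- features_up_to(events, "2016-12-31")
snap_2018 <- features_up_to(events, "2018-12-31")
print(snap_2016)
```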

Errors & validations

Show validation error messages in the upload flow where applicable. Currently, the server crashes if there's an error.
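
A minimal sketch of collecting problems as messages instead of letting an error stop the server. The specific checks and the function name are illustrative guesses at what the upload flow needs; in the Shiny server, the returned messages could be surfaced with validate() and need().

```r
# Collect human-readable problems with an uploaded table instead of stopping.
validate_upload <- function(df, geo_id) {
  if (!is.data.frame(df) || nrow(df) == 0) {
    return("The uploaded file has no rows.")
  }
  problems <- character(0)
  if (!geo_id %in% names(df)) {
    problems <- c(problems, sprintf("Column '%s' was not found.", geo_id))
  } else if (anyDuplicated(df[[geo_id]]) > 0) {
    problems <- c(problems, "The geographic identifier contains duplicate values.")
  }
  feature_cols <- setdiff(names(df), geo_id)
  if (!any(vapply(df[feature_cols], is.numeric, logical(1)))) {
    problems <- c(problems, "No numeric feature columns were found.")
  }
  problems
}

good <- data.frame(GEOID = c("a", "b"), rent = c(1, 2))
bad  <- data.frame(GEOID = c("a", "a"), name = c("x", "y"))
print(validate_upload(good, "GEOID"))  # no problems
print(validate_upload(bad, "GEOID"))   # duplicates and no numeric features
```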

Control spatial homogeneity

The clustering right now uses an alpha of 0.15, but this might not be optimal for user-uploaded data. Two possible solutions:

  1. Have a slider, similar to the one for number of clusters, that allows users to control this parameter, ranging from perhaps 0.1 to 0.9 so they can't go to extremes in either direction.
  2. Quickly calculate the level of disjointedness in the clustering and incrementally increase alpha based on it, up to a predetermined maximum.
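
The second option might be prototyped along these lines. Everything here is an assumption made for illustration: disjointedness is measured as the number of clusters split into more than one connected piece, and the blending of feature and geographic distances is a simplified stand-in in base R, not ClustGeo's implementation.

```r
# Number of clusters that are split into more than one connected piece,
# given a tract adjacency matrix (TRUE where two tracts touch).
n_disjoint_clusters <- function(labels, adjacency) {
  sum(vapply(unique(labels), function(cl) {
    members <- which(labels == cl)
    visited <- members[1]
    frontier <- members[1]
    while (length(frontier) > 0) {
      nbrs <- which(colSums(adjacency[frontier, , drop = FALSE]) > 0)
      frontier <- setdiff(intersect(nbrs, members), visited)
      visited <- c(visited, frontier)
    }
    length(visited) < length(members)
  }, logical(1)))
}

# Blend normalized feature and geographic dissimilarities, then cluster.
cluster_with_alpha <- function(d_feat, d_geo, alpha, k) {
  mixed <- (1 - alpha) * d_feat / max(d_feat) + alpha * d_geo / max(d_geo)
  cutree(hclust(mixed, method = "ward.D2"), k = k)
}

# Raise alpha step by step until no cluster is split, or a cap is reached.
tune_alpha <- function(d_feat, d_geo, adjacency, k,
                       start = 0.1, max_alpha = 0.9, step = 0.1) {
  alpha <- start
  repeat {
    labels <- cluster_with_alpha(d_feat, d_geo, alpha, k)
    done <- n_disjoint_clusters(labels, adjacency) == 0
    if (done || alpha + step > max_alpha + 1e-9) break
    alpha <- alpha + step
  }
  list(alpha = alpha, labels = labels)
}

# Toy example: six tracts in a row whose feature alternates between 0 and 10.
position  <- 1:6
d_feat    <- dist(c(0, 10, 0, 10, 0, 10))
d_geo     <- dist(position)
adjacency <- abs(outer(position, position, "-")) == 1
low_alpha <- cluster_with_alpha(d_feat, d_geo, alpha = 0.1, k = 2)
tuned     <- tune_alpha(d_feat, d_geo, adjacency, k = 2)
print(tuned$alpha)
```

With a low alpha the toy clusters follow the alternating feature and are scattered along the row; the loop stops at the first alpha that yields contiguous clusters.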

Any other ideas or thoughts on this, @stuartlynn ?

Features from reduction in rent-controlled housing

Reduction in rent-controlled housing is a major issue in the city. Adding this as a feature alongside the existing housing features would help find neighborhoods that are becoming less affordable/accessible.

Clustering within a sub-region

The current version only allows all of New York City to be clustered. It might be interesting for someone to cluster just one of the boroughs, for example.
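
Restricting the input to one borough could be as simple as filtering tracts before clustering. The sketch assumes each tract carries an 11-digit census GEOID in a column named GEOID (state, county, tract), which may not match the identifier the app actually uses; the county FIPS codes for the five boroughs are standard.

```r
# County FIPS codes for the five boroughs (digits 3-5 of an 11-digit tract GEOID).
borough_fips <- c(Bronx = "005", Brooklyn = "047", Manhattan = "061",
                  Queens = "081", `Staten Island` = "085")

# Keep only the tracts whose GEOID falls in the requested borough.
tracts_in_borough <- function(tracts, borough, geoid_col = "GEOID") {
  stopifnot(borough %in% names(borough_fips))
  county <- substr(tracts[[geoid_col]], 3, 5)
  tracts[county == borough_fips[[borough]], , drop = FALSE]
}

# Toy tract table with made-up feature values.
tracts <- data.frame(
  GEOID = c("36061000100", "36047000100", "36061000201"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)
manhattan <- tracts_in_borough(tracts, "Manhattan")
print(manhattan)
```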

Download results as PNG for heatmap

Currently, when you download results as a PNG, you get just the cluster map. We might want to provide both the cluster map and the heat map, either as individual files or zipped together. Alternatively, respond with the file that matches the user's selected view, which might be the easier and more logical thing to do.
