dcactionforchildren / dcaction

Free and open-source mapping tools and data workflow for visualizing neighborhood data

Home Page: http://datatools.dcactionforchildren.org/

License: MIT License

CSS 4.59% Ruby 4.52% HTML 16.64% JavaScript 64.78% Python 9.47%

dcaction's Introduction

DC Action for Children Data Tools 2.0

DC Action for Children, the Kids Count grantee for the District of Columbia, in partnership with DataKind and dozens of volunteers, created this map of measures of child well-being. This tool allows residents, program providers, and others to easily access and view this data. Users select from a list of available data layers to display the data for a given "neighborhood cluster," a geographic type used in DC, as well as the neighborhood's demographic composition.

The tool currently uses DC's geographies and data which can be repurposed for other Kids Count grantees, or other projects and purposes. The code that powers the tool is free and open source, meaning others can copy, make changes, and redeploy it. DC Action for Children worked with DataKind DC to develop the map. Across the US and around the world, there are communities of volunteers who can support efforts to create a "fork" of this tool specific to your city, state, or region. Read about examples from around the globe.

How it works
Setting up your own version
- What does it take to set up my own version of something similar?
- What features can I reuse for my version?
Pulling together the data

How it works

The Data Tools are an online map of neighborhood data that is powered by two easy-to-edit documents:

A map file, called a GeoJSON file, that describes the shapes of the neighborhoods
A spreadsheet file with one row for each neighborhood unit in the GeoJSON file and one column for each neighborhood dataset measure you want to display as an area Layer in the map. There are also spreadsheet files for points (like schools).

For the DC Action Data Tools deployment, the map file has the 39 DC neighborhood clusters and the spreadsheet file has 65 columns, half of them from the US Census Bureau and half from other local sources. The data from these primary sources is "crosswalked", or recalculated, from the original geographic areas (some are ZIP codes, some are census tracts, some are points) to match the neighborhood cluster areas chosen for map layers in this deployment. There are also a few points files for schools, hospitals, and libraries that display as individual points on the map instead of as shaded geographic areas. Finally, there are two configuration files, called fields and sources, that can be updated as described in the "Pulling together the data" section below.
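Because the map joins each GeoJSON feature to one spreadsheet row, a mismatch between the two files is an easy mistake to make when repurposing the tool. The Python sketch below is a hypothetical pre-flight check, not part of the repository: `check_join` and its `geo_key`/`csv_key` arguments are illustrative names, and you would pass whichever GeoJSON property and CSV column actually hold your neighborhood IDs.

```python
import csv
import io
import json


def check_join(geojson_text, csv_text, geo_key, csv_key):
    """Compare neighborhood IDs in a GeoJSON FeatureCollection with CSV rows.

    Hypothetical helper, not part of the dcaction repository.
    geo_key: feature property holding the neighborhood ID (your choice).
    csv_key: CSV column holding the same ID (your choice).
    Returns (IDs missing from the CSV, IDs missing from the GeoJSON,
    IDs that appear on more than one CSV row).
    """
    features = json.loads(geojson_text)["features"]
    geo_ids = {str(f["properties"][geo_key]) for f in features}

    rows = list(csv.DictReader(io.StringIO(csv_text)))
    csv_ids = [row[csv_key] for row in rows]
    duplicates = {i for i in csv_ids if csv_ids.count(i) > 1}

    return geo_ids - set(csv_ids), set(csv_ids) - geo_ids, duplicates
```

For example, `check_join(open(path_to_geojson).read(), open("data/neighborhoods.csv").read(), "your_id_property", "your_id_column")` should return three empty sets when every neighborhood has exactly one row.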

Setting up your own version

This section talks about how to set up your own version.

The first part steps you through the process of deploying your own. The second part covers the key features that may be useful to you depending on where you are in the world.

What does it take to set up my own version of something similar?

What category best describes you?

A. I am a web wizard. HTML, JavaScript, and CSS crumble before me.
B. I am a neighborhood data champion! I know how to work with Excel.
C. I am a community mover and shaker! I spend most of my time connecting with people in the community and finding ways to help.

If you chose A or B, here's how you get started:

Sign up for GitHub
Run the application on your local machine
- Get access to the dcaction repo (consider forking, to make it your own).
- Clone the repository to your local machine (e.g., git clone git@github.com:DCActionforChildren/dcaction.git).
- Start a local server so that you can view the site in a web browser.
  - On Mac or Linux, you can open a terminal in the repository folder and run a Python simple server by entering python -m SimpleHTTPServer (Python 2) or python3 -m http.server (Python 3).
  - On Windows, you'll need an additional tool. One possibility is to install Cygwin and then run the above command.
- Open a browser. If you use the options suggested above, the site will appear at http://localhost:8000
Update the GeoJSON file with the neighborhood geography you want to use
Update the data you want to use (see "Updating layer data" in the "Pulling together the data" section below)
You can edit index.html to customize the title, description, and more.
Reposition the map to your area of interest
Review the new application on your local machine once again
If you like what you see, push to the web!

If you chose C, here's how you get started:

Get in contact with your local civic hacking community (Code for America Brigades are great).
If no one is local, reach out to the wider online community. Ask around until you find some folks who can help you through the steps above.
You can do this!

What features can I reuse for my version?

Where are you in the world?

If you are anywhere in the world, you can:

Deploy the core D3 map and visualization engine powered by GeoJSON and CSV files
Link directly to a layer
- DC Action tweets a "Monday Map" every week. To let them link directly to a data layer, we check the URL for a hash containing a layer ID on page load. As a result, you can link directly to a layer by using a URL in the format: http://datatools.dcactionforchildren.org/#population_under_18_val
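The site does this check in the browser with JavaScript on page load. As an illustration of the same logic only, here is a minimal Python sketch; `layer_from_url` is a hypothetical name, and the set of valid layer IDs is assumed to come from something like the fields sheet.

```python
from urllib.parse import urlparse


def layer_from_url(url, layer_ids, default=None):
    """Return the layer ID in the URL hash if it names a known layer.

    Hypothetical helper mirroring the page-load check described above.
    Unknown or empty hashes fall back to `default`.
    """
    fragment = urlparse(url).fragment  # text after "#", or "" if none
    return fragment if fragment in layer_ids else default
```

Validating the hash against the known IDs means a mistyped link simply falls back to the default view instead of selecting a layer that does not exist.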

If you are in the US, you can also use:

The code that extracts and transforms key census data indicators

If you are also in DC, you can also use:

Any of the existing data from the current tool.

Pulling together the data

This section talks about the data that powers the visualization.

Below, you can learn more about:

The current DC Data Tool data sources and methodology
The process for retrieving the census data
The process for updating layer data
The process for updating point data
How crosswalking from one geography to another works
How to use the crosswalking scripts

Data sources and methodology

The DC Action Data Tool has posted the data sources on its website: https://www.dcactionforchildren.org/dc-kids-count-data-tools-methodology

The status of each of these sources is tracked in this Google Spreadsheet: https://docs.google.com/spreadsheets/d/1uF2nm5CS4tgrx9owv59VBaLnPGYVfXBkRhxT5auQ-3k

Retrieving Census data

There are two functionally identical scripts to retrieve Census data, one written in Ruby and the other in Python. Check out the documentation for the Ruby script.

Updating layer data

To make the tool easier to maintain, all data is stored and updated in spreadsheets. The instructions below pertain to the DC Action data, which can be found in a Google Spreadsheet, but the tool could also be powered by an Excel spreadsheet or any other tool that can output CSVs in the existing format.

We're going to be updating the sheets in this Google spreadsheet. Each of the sheets corresponds to a CSV that is stored in the data/ directory. The neighborhoods sheet contains all of the layer data. This is where we'll be making changes. The fields sheet lists all of the available layers; this populates the navigation. The sources sheet shows the source of each layer; this appears in the bottom left of the map when a layer is selected.
We will add our new data into a scratch sheet. We're expecting that it will have one row per neighborhood cluster, and that it'll have a label that matches the format of columns A, B, or D in the neighborhoods sheet.
If we need to calculate a per capita rate, we can use a VLOOKUP to grab the population from the neighborhoods sheet and calculate a new per capita column.
Finally, we will merge this back into the neighborhoods sheet by means of a VLOOKUP, overwriting the existing columns (if there are any). Before we're done, we'll want to ensure that we have copied and pasted values, so that there are no longer formulas connected to our scratch sheet.
Ensure that the fields and sources sheets include the same column name and have up-to-date information (i.e., the source contains the correct date).
Download the three sheets as CSVs, and replace the ones in the data folder of the GitHub repository.
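If you prefer a script to spreadsheet formulas, the VLOOKUP merge and per capita step above can be sketched in Python with only the standard library. This is an illustration under assumptions, not the project's workflow: the key column (`cluster` in the test data), the value column, and the per-1,000 scale are hypothetical choices.

```python
import csv
import io


def merge_layer(neighborhoods_csv, scratch_csv, key, column,
                population_column=None, per=1000):
    """Copy `column` from a scratch sheet into the neighborhoods sheet.

    Rows are matched on `key`, like a VLOOKUP. If `population_column` is given,
    the value is turned into a rate per `per` residents. Existing values in
    `column` are overwritten, neighborhoods missing from the scratch sheet get
    a blank, and the result is plain CSV text (values only, no formulas).
    """
    scratch = {row[key]: row[column]
               for row in csv.DictReader(io.StringIO(scratch_csv))}
    reader = csv.DictReader(io.StringIO(neighborhoods_csv))
    fieldnames = list(reader.fieldnames)
    if column not in fieldnames:
        fieldnames.append(column)  # new layer: add a column at the end

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = scratch.get(row[key], "")
        if value != "" and population_column:
            population = float(row[population_column])
            # Per capita rate; blank if the population is zero.
            value = round(float(value) * per / population, 2) if population else ""
        row[column] = value
        writer.writerow(row)
    return out.getvalue()
```

Writing plain values out of the script has the same effect as the "paste values" step: nothing in the output still depends on the scratch sheet.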

Updating points data

In addition to the data described in the Updating Layer Data section above, which colors neighborhoods according to their value, we also have the ability to add "points" to the map. These are points of interest like schools, libraries, and hospitals.

Each points layer is stored in a separate CSV file, named with the ID of the layer. 'dcps.csv' in the 'data/' folder is a great example.
All layers must have the columns 'name', 'lat', and 'long'. Schools also need the additional fields shown in 'dcps.csv'.
The CSVs should be saved in the data directory, and we need to make sure any changes are reflected in the fields and sources sheets (see above).
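A quick check like the following could catch a points file that is missing a required column or has a non-numeric coordinate before it reaches the map. It is a hypothetical helper written against the column names listed above; the extra school fields in dcps.csv are not checked.

```python
import csv
import io

REQUIRED_COLUMNS = ("name", "lat", "long")


def check_points(csv_text, required=REQUIRED_COLUMNS):
    """Return a list of problems in a points CSV; an empty list means it looks OK.

    Hypothetical helper. Line numbers assume no line breaks inside quoted fields.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            lat, lon = float(row["lat"]), float(row["long"])
        except (TypeError, ValueError):  # blank, short, or non-numeric field
            problems.append(f"line {line_no}: lat/long is not a number")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"line {line_no}: lat/long out of range")
    return problems
```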

Crosswalking geographies

A crosswalk is a means of translating data that is aggregated at one geographic level, such as census tract, to another, such as a neighborhood cluster. We do this by using a crosswalk table, which shows the relationships (overlap) between the two. In the case of the above example, this table would contain columns with 1) the census tract ID, 2) the neighborhood cluster ID, and 3) the proportion of the census tract that is contained in the neighborhood.
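In symbols, writing $y_t$ for a count in tract $t$ and $w_{t,c}$ for the proportion of tract $t$ contained in neighborhood cluster $c$ (the third column of the crosswalk table), the crosswalked cluster value is a weighted sum. The second condition holds when the clusters together cover each tract completely.

```latex
\hat{y}_c = \sum_{t} w_{t,c}\, y_t,
\qquad
\sum_{c} w_{t,c} = 1 \quad \text{for every tract } t
```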

Crosswalks are commonly used to express relationships between different levels of geography. We can simply and reliably crosswalk county-level data to the state level, for example, because counties nest within states and we know the population characteristics at each level. We have to make more assumptions when crosswalking data from tract to neighborhood, because neighborhood boundaries do not follow census boundary lines and we do not have benchmark estimates of population characteristics at the neighborhood level.

The first assumption we make is that the population is uniformly distributed across the tract. For most tracts, this assumption is not relevant because the entire tract is contained within one neighborhood cluster. It is only important for tracts that cross neighborhood boundaries. In order to allocate the population across those boundaries, we use the proportion of the tract's land area that overlaps each neighborhood as the apportioning factor. The tract level population is apportioned into the two neighborhoods according to the proportion of its land area that is covered by each neighborhood.

The next assumption we make is that tracts are socioeconomically integrated. When we apportion tract-level data on characteristics such as poverty, we apply the same land-area apportioning factor used for the total population. In other words, we assume that tracts do not contain concentrated pockets of poverty, for example.
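The apportioning described in the last two paragraphs can be sketched in Python as follows. This illustrates the method and is not the repository's crosswalk.py (which uses pandas and has its own interface): every count in a tract is multiplied by the same land-area weight, and a rate such as the poverty rate is then recomputed from the apportioned numerator and denominator rather than apportioned directly. The function name and data shapes are assumptions.

```python
from collections import defaultdict


def crosswalk(tract_values, crosswalk_rows):
    """Apportion tract-level counts to neighborhood clusters.

    tract_values: {tract_id: {measure_name: count, ...}}
    crosswalk_rows: iterable of (tract_id, cluster_id, weight), where weight is
        the share of the tract's land area inside the cluster. Tract IDs must
        match the keys of tract_values exactly (watch for str vs int).
    Every measure uses the same weight, which encodes both the uniform
    distribution and the socioeconomic integration assumptions.
    """
    clusters = defaultdict(lambda: defaultdict(float))
    for tract_id, cluster_id, weight in crosswalk_rows:
        for measure, count in tract_values[tract_id].items():
            clusters[cluster_id][measure] += weight * count
    return {cluster: dict(measures) for cluster, measures in clusters.items()}
```

For instance, with tract A entirely in cluster 1 and tract B split 40/60 between clusters 1 and 2, a cluster's poverty rate would be computed afterwards as `out[cluster]["poor"] / out[cluster]["pop"]`.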

The wiki contains more information on the crosswalks DC Action applied to the data.

Using the crosswalk script

We need Python, pandas, and NumPy.

Mac: Install Homebrew, then from the terminal, run brew install python and then pip install pandas and pip install numpy
Windows: Install Cygwin, then run pip install pandas and pip install numpy from the terminal.
Linux: You know what to do.

In the terminal, change to the data/scripts/ directory, and run python crosswalk.py.
An app will open with options for the data file that needs to be crosswalked, the crosswalk file, the geographies to convert to and from, the column(s) that contain the weighted overlap, and the name of the file to which to output the result.


dcaction's Issues

Create infocard interaction for point-based visualization

While the 1.0 e-Databook populated the sidebar with information for neighborhoods as the user rolled over, we'd like the sidebar information for point based data to allow multiple schools/child care centers to be compared.

When the user rolls over the points, it will populate the sidebar with information about that school or child care center. However, if they click, they will collect info cards from multiple points, however many they click, so they may compare multiple schools.

This interaction is partly working on the D3 skeleton now, although not all data points are being called into the infocards and the design is not especially elegant.

Convert the navigation to a layer select UI

Instead of dropdown menus, we'd like the navigation UI to function in a way that allows users to select multiple visualization layers to mix and match data. The visualization would allow users to plot a choropleth layer and a point-based layer (either schools or child care centers). Right now the nav is simple links but ideally this navigation would function in a way similar to radio buttons.

Include source information for each layer

We need to add descriptions of our sourcing for each layer when a user clicks each layer. Each layer's sourcing is described in our variable validation spreadsheet:

https://www.dropbox.com/s/z32f54jsbuqlfxq/Indicator%20Tracking%20Updated%202014-01-08.xlsx

Where possible we’d like to link back to these sources (to the Census, for example).

Fix duplicate school display.

Clicking on a school always adds a new info box to the page.

Not sure if the best behavior is to only ever have one displayed, or only ever have one for a single school displayed? Maybe limit it to X amount at a time?

Draft narrative analysis based on new data

Validate new data

Display neighborhood data on click

Add selected metric to sidebar that dynamically changes depending on which layer is selected

The primary metric in the sidebar should be the measure visualized on the currently selected layer. As the user switches layers, this measure should change to reflect the chosen variable.
As demonstrated here:

Fix number display for data points in sidebar

Right now, selecting neighborhoods will display data from our CSV but it is not adding commas to figures in the thousands or correcting percentages to display as percentages and not decimals. For these relevant data points, we'd like to transform the display of these numerals.
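One way to handle both cases is a small formatting function keyed on whether a measure is a percentage. The sketch below is in Python for illustration (the sidebar itself is JavaScript), and the `is_percent` flag is an assumption: the tool would need some per-layer setting, for example a column in the fields sheet, to supply it.

```python
def format_value(value, is_percent):
    """Format a raw CSV value for display in the sidebar.

    Percent measures are assumed to be stored as decimals (0.355 -> "35.5%");
    other measures get thousands separators (12345 -> "12,345").
    Blank values are shown as "n/a" rather than as zero.
    """
    if value is None or str(value).strip() == "":
        return "n/a"
    number = float(value)
    if is_percent:
        return f"{number * 100:.1f}%"
    # Whole numbers without decimals, everything else to one decimal place.
    return f"{number:,.0f}" if number.is_integer() else f"{number:,.1f}"
```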

Integrate neighborhood layer switching into main JS file

Right now there are two JS files with code we've worked on: one with the point interactions Aaron was working on, and another with the work Emily and I did on Oct. 27. It shouldn't be too hard to bring these together, but in case I stepped on any toes I kept them separate. We just need someone to merge them once things are working right.

Bring hovered neighborhood to front

Right now, the active state of the hover shows a black border, but it gets cut off if the path is below other paths. SVG draws elements in document order and ignores CSS z-index, so ideally we bring that path to the top by re-appending it to its parent (the d3 equivalent of z-index).

Create the #bubble visualization toggle detailed in the bubble interface storyboard.

Similar to the #graph interface issue, pulling this off centers around finding a competent way to convert the lat/long geographic plotting transformation into a collection of bubbles sorted by relative size. We'd like a visualization similar to other popular D3 graphics, such as the NYTimes 2013 Obama budget graphic (http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html). See the attached image from the storyboard for an idea of what we'd like this to look like:

Verify Census Data Pull

We are pulling tract level data from the Census API and crosswalking it from tracts to clusters. The data pulling script or crosswalking program could be wrong.

To verify the data is correct, we should add all of the clusters back together and compare the totals with the tract-level dataset before crosswalking.

I can perform this verification; I just need the clusters summed up and put into a dataset.
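Since each tract's crosswalk weights should add up to 1, summing any count across all clusters ought to reproduce the city-wide tract total. A minimal sketch of that check follows; `verify_totals` is a hypothetical helper, and it only makes sense for counts, since percentages and medians do not add up across areas.

```python
def verify_totals(tract_values, cluster_values, tolerance=1e-6):
    """Compare city-wide totals before and after crosswalking.

    Both arguments map an area ID to {measure_name: count}.
    Returns {measure: (tract_total, cluster_total)} for every measure whose
    totals differ by more than `tolerance`; an empty dict means they agree.
    """
    measures = {m for values in tract_values.values() for m in values}
    mismatches = {}
    for m in measures:
        before = sum(v.get(m, 0) for v in tract_values.values())
        after = sum(v.get(m, 0) for v in cluster_values.values())
        if abs(before - after) > tolerance:
            mismatches[m] = (before, after)
    return mismatches
```

A mismatch points either at weights that do not sum to 1 for some tract or at tracts missing from the crosswalk table.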

Update fetch_acs.rb script to download 50 variables at a time.

The Census API only allows us to download 50 variables at a time. The script needs to be updated to break the download into chunks to get around this limit.
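A sketch of the chunking in Python (the project's script is Ruby, so this only illustrates the idea). The base URL assumes the 2012 ACS 5-year endpoint and DC's state FIPS code 11; check both, and whether you need an API key, against the Census API documentation before relying on it. `chunked_acs_urls` is a hypothetical name.

```python
from urllib.parse import urlencode

ACS_BASE = "https://api.census.gov/data/2012/acs5"  # assumed dataset and year
MAX_VARS = 50  # per-request variable limit described in this issue


def chunked_acs_urls(variables, geography="tract:*", within="state:11",
                     base=ACS_BASE, size=MAX_VARS):
    """Build one query URL per chunk of at most `size` variables."""
    urls = []
    for start in range(0, len(variables), size):
        chunk = variables[start:start + size]
        query = urlencode(
            {"get": ",".join(chunk), "for": geography, "in": within},
            safe=",:*",  # keep the separators readable in the URL
        )
        urls.append(f"{base}?{query}")
    return urls
```

Each chunk comes back as its own table, so the results still have to be joined on the geography columns before crosswalking.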

Populate data to school-point layers

For the schools, the points we’d want to display on click are grades served as of 12-13, total enrollment, DC CAS reading and math proficiency rates, and graduation rates if applicable.

Blank out health data for n<30

For the health datasets (those coming from Medicaid) where neighborhood estimates are less than 30, the values need to display as blank. This was a data request from the agency supplying the information.
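A minimal sketch of the suppression rule, assuming the values arrive as CSV strings (as csv.DictReader yields them) and that the health columns are known by name; `blank_small_counts` and the column name in the test data are hypothetical.

```python
def blank_small_counts(rows, columns, threshold=30):
    """Blank out values below `threshold` in the given columns.

    rows: list of dicts, e.g. from csv.DictReader. Copies are returned and the
    input rows are left unchanged. Already-blank values stay blank.
    """
    result = []
    for row in rows:
        row = dict(row)
        for col in columns:
            value = row.get(col, "")
            if value != "" and float(value) < threshold:
                row[col] = ""
        result.append(row)
    return result
```

If a rate is derived from a suppressed count, the rate probably needs to be blanked as well, or it would reveal the small number.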

Fix avg graduation rate display

I think it's rounding this value when it should not be. I'll have to take a look at it.

Creating an interaction with the neighborhoods nav menu that toggles and transitions between different columns in the data/neighborhoods.csv file.

Verify formula for "No HS Degree (18-24)" in the dcation_datacolumns spreadsheet

The formula is a number added to a ratio: a + (b + c)/(x + y).
I want to confirm this is correct.

Visualize race breakdown for each neighborhood on click

Show population (black, hisp, white, other). Check with Nick, but I think we also need to visualize the breakdown for population under 18.

@ajschumacher Had a great suggestion on how we visualize this, I suggest we proceed with his idea. Here is his blog post on the subject:

http://planspace.org/2013/10/06/redesigning-a-double-pie-chart/

Update ACS data to 2012, check data validity

Clear menu when neighborhood is rolled upon or selected

Right now, when a layer is selected and the user begins to explore the neighborhood information, it's not possible to see the sidebar information that displays data values.

On rollover or click, it would be nice if the menu of layers would slide back left to leave an unobstructed view of the whole graphic.

0 vs. 0% vs. n/a on map

Make sure when a value is blank, the map shows the neighborhood in gray, not as the lowest value of green. Also, in non-% variables, make sure the value displays as 0, not 0%.
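Both rules can be captured in the color lookup itself. The Python sketch below mirrors how a D3 threshold scale maps a value to a color (a right bisection of the breakpoint domain), using the 'math' breakpoints quoted in the legend issue; the gray used for missing data is an assumed value, and `fill_color` is a hypothetical name.

```python
from bisect import bisect_right

MATH_SCALE = {
    "domain": [0.14, 0.355, 0.54, 0.75],
    "range": ["#e5ffc7", "#d9fcb9", "#bbef8e", "#9ad363", "#6eb43f"],
}
NO_DATA_COLOR = "#cccccc"  # assumed gray; match it to the site's palette


def fill_color(value, scale=MATH_SCALE):
    """Return the fill for a neighborhood, using gray when the value is blank.

    A real 0 is a value and gets the lowest color; only blank/None is gray.
    """
    if value is None or value == "":
        return NO_DATA_COLOR
    return scale["range"][bisect_right(scale["domain"], float(value))]
```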

Make d3 map responsive

Map resizes on browser resize.

Create legends

We need to forward the breakpoints expressed in the visualization.js file to this legend box. Where breakpoints are percentages, the legend box should express them as percentages. Below is an example of how the code lists these breakpoints:

'math' : {
  'domain' : [0.14, 0.355, 0.54, 0.75],
  'range' : ["#e5ffc7", "#d9fcb9", "#bbef8e", "#9ad363", "#6eb43f"]
},

The legend in this instance would read like:

0-14%
14%-35.5%
35.5%-54%
54%-75%
75% and up
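Generating these labels from the same domain array would keep the legend and the map in sync. A Python sketch follows (the production legend would be JavaScript); `legend_labels` is a hypothetical name, and the output format copies the example above.

```python
def legend_labels(domain, percent=True):
    """Turn threshold breakpoints into legend labels, e.g. 0.14 -> "0-14%"."""
    def fmt(v):
        if percent:
            # Round away float noise such as 0.14 * 100 == 14.000000000000002.
            return f"{round(v * 100, 4):g}%"
        return f"{v:,}"

    labels = [f"0-{fmt(domain[0])}"]
    labels += [f"{fmt(lo)}-{fmt(hi)}" for lo, hi in zip(domain, domain[1:])]
    labels.append(f"{fmt(domain[-1])} and up")
    return labels
```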

Zoom out to full view of DC, or just add zoom buttons

Currently, if you click outside of the DC boundaries in a zoomed-in view, it zooms back out. It would make more sense to have a button or "zoom out to DC" icon that is a more explicit indication of how to view all of DC.

Get rid of neighborhood names on text of map

Get rid of text on map layer.

Update DC Action logo/branding

See correct DC Kids Count logo with no periods, above.

Add zoom in and zoom out buttons

Clicking on a neighborhood will zoom to the neighborhood but it's been pointed out that this is not intuitive.

Integrate school and neighborhood info cards with visualization layers

Right now, we only have basic information for school points popping up in the sidebar. It's possible to display information from multiple schools, but this information has not been styled and not all of the requested data displays.

We'd like for the sidebar to populate with data from the neighborhood shapes and school/child care center points. The data reflected should appear similar to the school and neighborhood storyboards.

Correctly aggregate margins of error

Fix projection of DC

Create the hooks necessary for query UI to highlight points based on text input, neighborhood, ward, or grade

The intended function of the visualization-wide query tool is detailed in the bubble storyboard:

We need to make sure that we're exposing the right variables in the D3 graphic that will allow inputs on this query widget to highlight the corresponding points.

Ideally, matched points would be highlighted as the user narrows their query. Unmatched points would still appear on the map, although they would be grayed out or translucent.

Hide #schools box until a point is selected

I added a new UI element to receive school data instead of the details box. It would be great if this box remained hidden until a user selects one of the schools on the schools layer. I referenced this issue in #32.

Clean up choropleth layer swapping code

dc.html provides for basic layer switching for DC choropleth maps, but the code needs to be simplified. There needs to be an array that defines each layer and threshold. The map needs to toggle between these maps based on a key term to identify each layer.

Add charter school data as a point layer

We would like to add a charter school layer. Similar to the public school layer, it would need to behave as described in issue #39.

I will provide a CSV file for this in the data folder, it will be named charter.csv and will be structured the same way as the public school point layer.

Add Leaflet background to D3 visualization

Right now we have our polygons floating on white, with no roads or landmarks to show users what they are clicking on. Adding a tile layer backdrop for our translucent polygons to float over will provide more context for each neighborhood.

Peter suggests doing something like this:

http://bost.ocks.org/mike/leaflet/

This would give us real map detail in the background.

We could use Open Street Map as a backdrop.

Creating choropleth maps that reflect data in the data/neighborhoods.csv file.

The choropleth maps should reflect the existing 1.0 data visualizations in the 2012 e-Databook (http://www.dcactionforchildren.org/kids-count/dc-kids-count-data-tools)

Display neighborhood name and data on hover

Right now data and neighborhood names will only appear on click. We need the neighborhood names and data to appear on hover as well. The difference between hover and click should be that, when clicked, the hover interaction will cease and the graphic will zoom to the selected neighborhood. Clicking again will zoom out and resume the hover interaction.

Fix display of filters in flyout

Right now, it runs longer than the browser height. Ideally, the flyout would have an overflow-y scroll and have a height sized to the browser.

Also open to using a dropdown if interested, but this seems more UX friendly across devices, IMO.

Clear schools

Give some option to clear selected school data entirely.

Making underpopulated neighborhood clusters appear blank on the map

The Observatory Circle, Walter Reed, and Rock Creek Park neighborhood clusters have fewer than 25 residents each; these should appear as blank outlines on the map to avoid confusion. Bolling AFB is a borderline case, but it has a few hundred residents, so it can stay.

Check Child Poverty

Child poverty numbers being pulled from the census look way off - need to retrace steps.

Fix projection of DC

Allow charter school and public school data to appear together on the same map (using the packMetros method)

We'd like the logic of the point layers to work a bit differently than the choropleth layers. Whereas choropleth layers cannot be placed on the map together (only one at a time), we'd like the public and charter school points to appear on the map together, if the user chooses.

Right now we have a function applied to the public schools that allows each circle to clear its immediate area and "push" schools out of the way, so it is easy to distinguish between points in an area where schools are clustered.

Obviously, adding charter schools to the map will greatly clutter it, so we'd like the same function applied when this layer is added. Ideally, both layers will push each other out of the way so that no public school or charter school points sit on top of one another.

We adapted the packMetros method from one of the D3 examples listed on Mike Bostock's site: http://trends.truliablog.com/vis/metro-movers/
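For reference, the simplest form of this idea is pairwise relaxation: repeatedly find circles that overlap and push each pair apart along the line between their centers. The sketch below is that generic technique in Python, not the packMetros code the project adapted, and it leaves out the pull back toward each school's true location that a force-based layout would add. `separate_points` is a hypothetical name, and treating public and charter points as one combined list is an assumption about how the two layers would share space.

```python
import math


def separate_points(points, radius, iterations=200, tolerance=1e-9):
    """Nudge overlapping circles apart so centers end up >= 2 * radius apart.

    points: list of (x, y) screen coordinates; a new list of tuples is returned.
    Each overlapping pair is pushed apart along the line joining the centers,
    with each circle moving half of the overlap. Stops early once nothing moves.
    """
    pts = [list(p) for p in points]
    min_dist = 2 * radius
    for _ in range(iterations):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                dist = math.hypot(dx, dy)
                overlap = min_dist - dist
                if overlap <= tolerance:
                    continue
                if dist == 0:  # identical points: pick an arbitrary direction
                    dx, dy, dist = 1.0, 0.0, 1.0
                shift_x = dx / dist * overlap / 2
                shift_y = dy / dist * overlap / 2
                pts[i][0] -= shift_x
                pts[i][1] -= shift_y
                pts[j][0] += shift_x
                pts[j][1] += shift_y
                moved = True
        if not moved:
            break
    return [tuple(p) for p in pts]
```

Combining both school layers into one list before calling the function is what would let public and charter points push each other, rather than only points within the same layer.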

Create the #graph visualization toggle detailed in the graph interface storyboard.

The key to pulling off this D3 interaction centers around finding a competent way to convert the lat/long geographic plotting transformation into a single-axis graph scale for plotting points. See the attached image from the storyboard for an idea of how we'd like this to appear.