datadesk / census-data-aggregator

Combine U.S. census data responsibly

License: MIT License

Python 100.00%
python census statistics math news journalism data-journalism margin-of-error mapping-la-pipeline

census-data-aggregator's People

Contributors

dependabot[bot], irisslee, nkrishnaswami, palewire, sastoudt

census-data-aggregator's Issues

Correct handling of jam values in median approximation

Thanks to some clarification from our Census friends:

The jam value represents a result from a median calculation when the median can't actually be calculated because it lies in the lowest or highest bin. The jam value is not used in the median calculation itself as a lower or upper bound for the end bins.

This information doesn't impact the calculations of the examples we have now (we've treated the jam value as a bound), but we need to update the median function to handle the scenario where the lower and upper bins don't have concrete bounds (plus add examples of this scenario).

We may want to include an optional input jam_value to use in the case that the median occurs in the highest/lowest bin.
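A minimal sketch of what that could look like, assuming linear interpolation inside closed bins. The function name `median_with_open_bins` and the parameters `jam_lower` and `jam_upper` are hypothetical and are not part of the library's current API; an open-ended bin is marked with `None` for its missing bound.

```python
def median_with_open_bins(bins, jam_lower=None, jam_upper=None):
    """Approximate a median from binned counts.

    `bins` is a list of dicts with keys "min", "max" and "n", sorted by range.
    The first bin may have min=None and the last bin may have max=None to
    mark an open-ended range. If the median falls in an open-ended bin, the
    matching jam value is returned instead of an interpolated number.
    """
    total = sum(b["n"] for b in bins)
    if total <= 0:
        raise ValueError("bins must contain at least one observation")

    midpoint = total / 2.0
    cumulative = 0
    for b in bins:
        if cumulative + b["n"] >= midpoint:
            # This bin contains the median.
            if b["min"] is None:
                return jam_lower  # median cannot be computed: lowest open bin
            if b["max"] is None:
                return jam_upper  # median cannot be computed: highest open bin
            # Linear interpolation within a bin that has concrete bounds.
            fraction = (midpoint - cumulative) / b["n"]
            return b["min"] + fraction * (b["max"] - b["min"])
        cumulative += b["n"]
```

The jam value is only ever returned as the result; it is never used as a bound in the interpolation, which matches the clarification above.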

An "aggregation" tool

Accept a list of values and margins of error and, using the approximation methods in this library, return the combined value with its estimated margin of error.
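For the simplest case, a sum, this might look like the sketch below. It assumes the root-sum-of-squares rule that the Census Bureau documents for the margin of error of a sum of ACS estimates. The name `aggregate_sum` is hypothetical, and the sketch does not reproduce any special-case handling the library's own functions may apply.

```python
import math


def aggregate_sum(pairs):
    """Combine (value, margin_of_error) pairs into one total and one margin.

    The combined margin is the square root of the sum of squared margins.
    """
    total = sum(value for value, _ in pairs)
    moe = math.sqrt(sum(margin ** 2 for _, margin in pairs))
    return total, moe


# Two estimates with their margins of error
combined_value, combined_moe = aggregate_sum([(300, 3), (400, 4)])
```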

optional MOE input for approximate_median

We may want an optional moe input field for approximate_median to handle the case when the n values are estimates themselves (e.g. outputs of approximate_sum). The approximate_median would then need a simulation aspect to account for the n values' uncertainty.
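One way the simulation aspect could work, as a rough sketch only: draw each bin count from a normal distribution centered on its estimate, recompute the interpolated median for every draw, and summarize the spread. Everything here is an assumption for illustration, including the names `interpolated_median` and `simulate_median`, the use of 1.645 to convert a 90 percent margin of error into a standard error, and the choice to report the 5th and 95th percentiles.

```python
import numpy as np


def interpolated_median(bounds, counts):
    """Linear-interpolation median for closed bins given as (min, max) pairs."""
    counts = np.asarray(counts, dtype=float)
    midpoint = counts.sum() / 2
    cumulative = np.cumsum(counts)
    # First bin whose cumulative count reaches the midpoint
    idx = int(np.searchsorted(cumulative, midpoint))
    below = cumulative[idx] - counts[idx]
    lo, hi = bounds[idx]
    return lo + (midpoint - below) / counts[idx] * (hi - lo)


def simulate_median(bounds, counts, moes, n_sims=2000, seed=None):
    """Return (median, lower, upper) from simulated bin counts."""
    rng = np.random.default_rng(seed)
    se = np.asarray(moes, dtype=float) / 1.645  # 90% MOE to standard error
    draws = rng.normal(loc=counts, scale=se, size=(n_sims, len(counts)))
    draws = np.maximum(draws, 0)  # counts cannot be negative
    # Skip the degenerate case where every simulated count is zero
    medians = [interpolated_median(bounds, row) for row in draws if row.sum() > 0]
    lower, mid, upper = np.percentile(medians, [5, 50, 95])
    return mid, lower, upper
```

With all margins set to zero, every draw equals the input counts and the result collapses to the ordinary interpolated median.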

negative values from numpy.random.normal

For small values or large margins of error, the numpy.random.normal call in approximate_mean may return a negative number, which won't make sense in context. We should probably just use max(0, simulated_value) instead.
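A small sketch of the clipping idea, applied to a whole array of draws at once. The function name `simulate_nonnegative` is hypothetical, and the sketch uses NumPy's `Generator.normal` rather than the `numpy.random.normal` call mentioned above. It also assumes the margin of error is a 90 percent margin, hence the division by 1.645.

```python
import numpy as np


def simulate_nonnegative(estimate, moe, size=1000, seed=None):
    """Draw simulated values around an estimate and clip them at zero."""
    rng = np.random.default_rng(seed)
    standard_error = moe / 1.645  # assumes a 90 percent margin of error
    draws = rng.normal(loc=estimate, scale=standard_error, size=size)
    # Elementwise equivalent of max(0, simulated_value)
    return np.maximum(draws, 0)
```

Note that clipping piles the negative draws up at exactly zero, so the mean of the clipped draws will sit above the original estimate when the margin is large relative to the value.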

deal with annotations

If using the aggregator outside of the downloader, the aggregator needs to know what to do with annotated values.

Source data for approximating median household income

Just to make sure I understand this correctly: to calculate median household income for an aggregate geography using the ACS, as shown in this example, would I use data from a table like ACS table B19001 to get the n (household counts) and the min/max incomes for the ranges?

It looks like the wording of the top range of that table is "$200,000 or more". Should I just set an artificial upper bound for that? It looks like in the example and the linked PDF, they use $250,001.

Off the top of my head, this seems like it would be correct for many (most?) cases, but incorrect for very high income areas?

provide check that spatial aggregation doesn't induce spurious patterns

From this paper:

"one can induce geographic patterns in the aggregate data that do not exist in the input data"

Create a diagnostic to check for this (equations 2 and 3 in paper):

"The statistic S_j measures whether the region-level estimates for a given variable are within the margins of error of their constituent tracts. If a region-level estimate is within the margin of error of all its constituent tracts, then there is no information lost through aggregation; information loss increases as the 90 percent confidence intervals of more and more tract-level estimates do not overlap with the region’s estimate."
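As a starting point, here is a simplified reading of the quoted description, not the paper's equations 2 and 3: for one region, compute the share of constituent tracts whose interval (estimate plus or minus margin of error) contains the region-level estimate. The function name `share_of_tracts_covering_region` is hypothetical, and the margins are assumed to be the published 90 percent margins.

```python
def share_of_tracts_covering_region(region_estimate, tracts):
    """Share of tracts whose estimate +/- margin contains the region estimate.

    `tracts` is a list of (estimate, margin_of_error) pairs. A result of 1.0
    means the region estimate falls inside every tract's interval; lower
    values suggest more information was lost through aggregation.
    """
    if not tracts:
        raise ValueError("at least one tract is required")
    covered = sum(
        1
        for estimate, moe in tracts
        if estimate - moe <= region_estimate <= estimate + moe
    )
    return covered / len(tracts)
```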

disaggregation functions

Functions for breaking geographic units into different geographic units and recalculating quantities of interest [with and without margin of error].

  • sums
  • medians
  • means
