datadesk / census-data-aggregator
Combine U.S. census data responsibly
License: MIT License
Thanks to some clarification from our Census friends:
The jam value represents a result from a median calculation when the median can't actually be calculated because it lies in the lowest or highest bin. The jam value is not used in the median calculation itself as a lower or upper bound for the end bins.
This information doesn't impact the calculations of the examples we have now (we've treated the jam value as a bound), but we need to update the median function to handle the scenario where the lower and upper bins don't have concrete bounds (plus add examples of this scenario).
We may want to include an optional input jam_value to use in the case that the median occurs in the highest or lowest bin.
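A sketch of how a hypothetical jam_value parameter could work. This is a simplified illustration, not the library's actual implementation: the margin-of-error calculation is omitted, and the parameter name is the one proposed above.

```python
def approximate_median(range_list, jam_value=None):
    """Estimate a median from binned data (simplified sketch, MOE omitted).

    jam_value (hypothetical parameter) is returned directly when the
    median falls in the first or last bin, mirroring how the Census
    Bureau reports a jam value instead of computing a median there.
    """
    ranges = sorted(range_list, key=lambda d: d["min"])
    total = sum(d["n"] for d in ranges)
    # Walk the bins until we reach the one containing the 50th percentile
    running = 0
    for i, bucket in enumerate(ranges):
        running += bucket["n"]
        if running >= total / 2.0:
            break
    if i in (0, len(ranges) - 1):
        # Median lies in an open-ended end bin with no concrete bounds,
        # so fall back to the caller-supplied jam value (if any).
        return jam_value
    # Linear interpolation within the median bin
    below = running - bucket["n"]
    width = bucket["max"] - bucket["min"]
    return bucket["min"] + (total / 2.0 - below) / bucket["n"] * width
```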
Ensure that the output of census-data-downloader plays nice as an input to census-data-aggregator.
Accept a list of values and margins and, using the approximation methods in this library, return the combined value with its estimated margin of error.
We may want an optional moe input field for approximate_median to handle the case when the n values are estimates themselves (e.g. outputs of approximate_sum). approximate_median would then need a simulation aspect to account for the uncertainty in the n values.
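One way the simulation aspect could look, as a hedged sketch (the function and field names here are hypothetical, not the library's API): draw simulated counts for each bin from its estimate and MOE, recompute the interpolated median each time, and summarize the spread.

```python
import numpy as np

def _interpolated_median(ranges):
    """Linear interpolation of the median within binned data."""
    total = sum(b["n"] for b in ranges)
    running = 0.0
    for b in ranges:
        if running + b["n"] >= total / 2.0:
            # The median falls in this bin
            return b["min"] + (total / 2.0 - running) / b["n"] * (b["max"] - b["min"])
        running += b["n"]

def simulate_median(range_list, sims=1000, seed=42):
    """Hypothetical sketch of the proposed moe input: each bin carries
    an estimated count "n" plus its 90%-level margin of error "moe".
    Draw simulated counts, recompute the interpolated median each
    time, and report the median and spread of the simulations."""
    rng = np.random.default_rng(seed)
    medians = []
    for _ in range(sims):
        simulated = [
            {
                "min": b["min"],
                "max": b["max"],
                # MOE / 1.645 converts a 90% MOE to a standard error;
                # clamp at zero so simulated counts stay non-negative.
                "n": max(0.0, rng.normal(b["n"], b["moe"] / 1.645)),
            }
            for b in range_list
        ]
        medians.append(_interpolated_median(simulated))
    return float(np.median(medians)), float(np.std(medians))
```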
Testing the command with the examples listed yields a different result. I'm guessing the denominator was supposed to be 630,498, per the ACS document linked?
For small values or large margins of error, the numpy.random.normal call in approximate_mean may return a negative number, which doesn't make sense in context. We should probably just use max(0, simulated_value) instead.
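A quick demonstration of the problem and the proposed clamp (the variable names here are illustrative, not the library's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
# A small estimate with a wide standard error: normal draws can dip
# below zero, which is meaningless for a count or dollar amount.
estimate, standard_error = 5.0, 10.0
raw = rng.normal(estimate, standard_error, size=10_000)
# The proposed fix: clamp each simulated value at zero
clamped = np.maximum(0.0, raw)
```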
From this paper (page 3):
"There is no CV level that is universally accepted as “too high,” but a comprehensive report on the ACS [5] describes a range of 0.10 to 0.12 as a “reasonable standard of precision for an estimate” (p. 64)"
CV = (MOE / 1.645) / estimate
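In code, using a 90%-level ACS margin of error (the 1.645 divisor converts the MOE back to a standard error):

```python
def coefficient_of_variation(estimate, moe):
    """CV from a 90%-level margin of error: (MOE / 1.645) / estimate."""
    return (moe / 1.645) / estimate

# Per the quoted report, a CV above roughly 0.10-0.12 suggests the
# estimate may be too imprecise to publish on its own.
cv = coefficient_of_variation(10_000, 1_645)
```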
Some folks at @argo-marketplace were working on a fork of census_area to aggregate census data to arbitrary geographies: datamade/census_area#6
If using the aggregator outside of the downloader, the aggregator needs to know what to do with annotated values.
For the calculation of SE(50 percent) there are different numerators used depending on the particular reference.
page 2 step A uses 99.
page 22 in Example 3 uses 95.
There are also different denominators used (B and 5B respectively).
We currently use 99 and B.
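Assuming the design-factor form SE(50 percent) = DF × sqrt(numerator / denominator × 50²) — our reading of the referenced documents, with parameter names of our own — a small sketch shows how much the two constant sets diverge:

```python
import math

def se_50_percent(base, design_factor=1.0, numerator=99, denominator_multiple=1):
    """SE of a 50 percent proportion, assuming the design-factor form
    DF * sqrt((numerator / (denominator_multiple * B)) * 50**2).

    numerator=99, denominator_multiple=1 -> the 99/B variant (current);
    numerator=95, denominator_multiple=5 -> the 95/5B variant.
    """
    return design_factor * math.sqrt((numerator / (denominator_multiple * base)) * 50**2)
```

For B = 1000 with a design factor of 1, the 99/B variant gives about 15.7 while the 95/5B variant gives about 6.9, so the choice of constants matters materially.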
Just to make sure I understand this correctly, to calculate median household income for an aggregate geography using the ACS, as shown in this example, would I use data from a table like ACS table B19001 to get the n (household counts), and min/max incomes for the ranges?
It looks like the wording of the top range of that table is "$200,000 or more". Should I just set an artificial upper bound for that? It looks like in the example and the linked PDF, they use $250,001.
Off the top of my head, this seems like it would be correct for many (most?) cases, but incorrect for very high income areas?
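For illustration, the binned input built from a table like B19001 might look like this, with the open-ended top bracket closed at the artificial $250,001 bound mentioned above (the counts here are invented):

```python
# Hypothetical excerpt of household-count bins from ACS table B19001.
# The table's top bracket reads "$200,000 or more", so it has no true
# upper bound; following the linked ACS example, we close it at $250,001.
income_ranges = [
    {"min": 0, "max": 9_999, "n": 100},
    # ... intermediate brackets omitted ...
    {"min": 200_000, "max": 250_001, "n": 25},  # artificially closed top bin
]
```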
To estimate the margin of error when summing values.
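The standard ACS approximation for the margin of error of a sum is the square root of the sum of the squared input MOEs; a minimal sketch (the function name is ours, not the library's):

```python
import math

def sum_with_moe(pairs):
    """Combine (estimate, moe) pairs: the summed estimate's MOE is the
    root-sum-of-squares of the input MOEs (the standard ACS
    approximation for sums of independent estimates)."""
    total = sum(est for est, _ in pairs)
    moe = math.sqrt(sum(m ** 2 for _, m in pairs))
    return total, moe
```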
From this paper:
"one can induce geographic patterns in the aggregate data that do not
exist in the input data"
Create a diagnostic to check for this (equations 2 and 3 in paper):
"The statistic S_j measures whether the region-level estimates for a given variable are within the margins of error of their constituent tracts. If a region-level estimate is within the margin of error of all its constituent tracts, then there is no information lost through aggregation; information loss increases as the 90 percent confidence intervals of more and more tract-level estimates do not overlap with the region’s estimate."
Functions for breaking geographic units into different geographic units and recalculating quantities of interest [with and without margin of error].
To calculate more exact margins of error for aggregated values where possible.
There is an example in the ACS handbook.
If another method is needed in those cases, develop it.