bdecon / econ_data

Python 3 examples of using economic data APIs and working with economic microdata. Includes bd CPS.

Jupyter Notebook 94.95% Python 5.05%
econ-data cps microdata python acs econ census-bureau economic-data

econ_data's People

Contributors

bdecon

econ_data's Issues

bd CPS: Consistent March CPS - Basic HH Income variable

Build a time-consistent version of the March (ASEC) CPS. I'd like to have some basic geographic information at the household level along with info on the household structure. Eventually, the dataset should note income from various sources, but to start just total household income.

bd CPS: new variable RWKWAGE

Create new variable RWKWAGE equal to the real weekly wage, adjusted by the NSA CPI-U.

Again, make sure the variable doesn't generate an error in the period between the release of the CPS microdata and the BLS estimate of the CPI-U.
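
A minimal sketch of the deflation step, under stated assumptions: the column names `YEAR`, `MONTH`, `WKWAGE`, and `CPI` and the function name are hypothetical, and the CPI table is assumed to hold one row per published month. A left merge keeps every CPS row, so months without a published CPI-U value get a missing `RWKWAGE` rather than raising an error.

```python
import pandas as pd


def add_real_weekly_wage(cps, cpi, base_year, base_month):
    """Return a copy of cps with RWKWAGE expressed in base-period dollars.

    cps: DataFrame with (hypothetical) columns YEAR, MONTH, WKWAGE (nominal).
    cpi: DataFrame with (hypothetical) columns YEAR, MONTH, CPI (NSA CPI-U).
    """
    # CPI level in the base period, used as the numerator of the deflator
    is_base = (cpi["YEAR"] == base_year) & (cpi["MONTH"] == base_month)
    base_cpi = cpi.loc[is_base, "CPI"].iloc[0]

    # Left merge: CPS months with no published CPI get CPI = NaN
    out = cps.merge(cpi[["YEAR", "MONTH", "CPI"]],
                    on=["YEAR", "MONTH"], how="left")

    # NaN CPI propagates to NaN RWKWAGE instead of raising an error
    out["RWKWAGE"] = out["WKWAGE"] * base_cpi / out["CPI"]
    return out.drop(columns="CPI")
```

The same pattern would apply to an hourly wage column.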

bd CPS grapher: stacked bar/area plot

Create a sample stacked area or stacked bar chart from bd CPS data. Data generation should take place in one bd CPS call.

An example might be the age decomposition over time of "new-entrants" among unemployed, 16-19, 20-24, 25+.

bd CPS - new variable for labor force category

Each person age 16 or older should be assigned to one labor force category. The categories should be set so that around 5-20% of the population is in a given group at a given point in time, and also so that groups cover all people.

The groups are:

  1. Employed FT - Government
  2. Employed FT - Private - Goods producing industry
  3. Employed FT - Private - Service providing industry
  4. Employed FT - Self employed
  5. Employed PT or unpaid
  6. Unemployed
  7. Not in Labor Force (NILF) - Discouraged
  8. NILF - In school
  9. NILF - Retired
  10. NILF - Disabled or Ill
  11. NILF - Caring for others or home
  12. NILF - Other

bd CPS: Create folder README.md

Create a new read me file for the bd CPS. This is part of the documentation. Specify contact information and overview of what is involved. Give links to CPS microdata files and data dictionaries.

bd CPS organize files

Clean up the files related to the bd CPS project. Delete old sections of old files. Make sure the gitignore captures any large files. Upload everything to github.

bd CPS: new variable UNEMPDUR

Create new variable for unemployment duration, either with the number of weeks unemployed or based on categories like those used by BLS.
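
A sketch of the categorical option, assuming the raw duration is available in weeks and that negative values mean "not in universe". The bins follow the duration groups BLS publishes (less than 5 weeks, 5 to 14, 15 to 26, 27 and over); the function and constant names are hypothetical.

```python
import numpy as np
import pandas as pd

# Left-closed bin edges: [0, 5), [5, 15), [15, 27), [27, inf)
DURATION_BINS = [0, 5, 15, 27, np.inf]
DURATION_LABELS = [
    "Less than 5 weeks",
    "5 to 14 weeks",
    "15 to 26 weeks",
    "27 weeks and over",
]


def unemployment_duration_category(weeks):
    """Map weeks unemployed to a BLS-style duration category.

    Values outside the bins (for example -1 for people who are not
    unemployed) fall in no interval and come back as missing.
    """
    return pd.cut(weeks, bins=DURATION_BINS, labels=DURATION_LABELS,
                  right=False)
```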

bd CPS grapher - allow for demographic adjustment

Allow bd CPS grapher to make demographic adjustments (and to adjust for changes to the education level) and to show the adjusted series alongside the adjustment. It is perhaps not important that this works with multiple query strings, but it is important that it works with a decomposition, such as the decomposition of those not in the labor force by reason.

Not sure about the best way to carry out the demographic adjustment, but the easiest one, in my experience, rescales each individual's sample weight so that the population shares by age, sex, and educational group match those of a given base period.
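
One way this re-weighting could look, as a sketch rather than a settled method: within each period, every demographic cell's weights are scaled by the ratio of the cell's base-period share to its current share. The function name and all column names passed in are hypothetical.

```python
import pandas as pd


def reweight_to_base(df, cells, weight, period, base_period):
    """Return adjusted weights whose cell shares match the base period.

    df: one row per person; cells: list of columns defining demographic
    cells (for example age group, sex, education); weight: sample weight
    column; period: column identifying the time period.
    """
    # Current share of each cell within its own period
    cell_wt = df.groupby([period] + cells)[weight].transform("sum")
    period_wt = df.groupby(period)[weight].transform("sum")
    share = cell_wt / period_wt

    # Share of each cell in the base period
    base = df[df[period] == base_period]
    base_share = (base.groupby(cells)[weight].sum() / base[weight].sum())
    base_share = base_share.rename("_target").reset_index()

    # Left merge keeps row order; cells absent from the base period get NaN
    target = df[cells].merge(base_share, on=cells, how="left")["_target"]

    return df[weight] * target.to_numpy() / share
```

Because the target shares sum to one, the adjusted weights keep each period's total weight unchanged as long as every cell seen in that period also appears in the base period.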

bd CPS: separate notebook to retrieve CPI regional data from BLS API

Clean up and store the CPI retrieval code in its own notebook.

The bd CPS has variables that are in US dollars that have already been adjusted for changes in prices. I've decided to use the CPI-U regional series for the four regions of the US: Northeast, Midwest, South, and West. There is very little difference between overall results from using the regional CPI instead of the nationwide CPI, but at a more local level, there are some minor differences that should more accurately reflect the local price level.

bd CPS: new variable CERT

Capture the relatively new variable PECERT, with new binary variable CERT equal to 1 if person has a professional certification, 0 if they are employed without a certification, and np.nan if not employed.
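
A sketch of the recode, assuming the raw variable uses 1 for "yes" (an assumption to check against the data dictionary) and that a boolean employed flag is already available. The function name and argument names are hypothetical.

```python
import numpy as np
import pandas as pd


def make_cert(pecert, employed, yes_code=1):
    """Return CERT: 1 if certified, 0 if employed without, NaN otherwise.

    pecert: Series holding the raw certification variable.
    employed: boolean Series, True for employed persons.
    yes_code: raw value meaning "has a certification" (assumed to be 1).
    """
    has_cert = np.where(pecert == yes_code, 1.0, 0.0)
    # Following the description above, CERT is defined only for the employed
    cert = np.where(employed, has_cert, np.nan)
    return pd.Series(cert, index=pecert.index, name="CERT")
```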

bd CPS documentation

Generate some basic notes, and eventually a pdf, with information on what is contained in the bd CPS and how variable values are associated with CPS responses.

Extend bd CPS back to 1989

Basic monthly CPS files from 1989 to December 1993 are available for download and are in some cases directly comparable with later files, despite the major revision of the CPS in 1994. I'd like to work back to 1989 using the CPS microdata files hosted on the NBER CPS page.

bd CPS: new variable IND

edit: This issue now refers only to the task to create a time-consistent major industry recode variable IND.


The various industry and occupation codes in the CPS microdata change several times during the period from 1994 to present. Looking at how others (IPUMS, CEPR, etc.) have done this, I'd like to add one code that groups industries into no more than 9 categories, and one that groups occupations into no more than 9 categories.

The proposed variable names are IND and OCC. I may want to store the individual entries as categorical text, such as "Construction" and "Manufacturing", but am not sure yet.

bd CPS: new variable PTREASON

New categorical variable indicating the reason that someone is part-time. Would like to ID both economic reasons and things like needing to care for children or adult dependents.

Update CPS grapher to use new feather files

The new feather files have slightly different variable names and formats, so they require an update of programs that use them, and specifically of the CPS grapher. This is part of the bd_CPS series.

bd CPS parameterize the percentile in wage calculations

Currently the only wage calculation available in the bd CPS grapher is the median. If the percentile used in the calculation were a parameter set outside the calculation, calculating the interquartile range would be easy.
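
A sketch of what a parameterized calculation could look like, using a simple weighted percentile: the smallest value at which the cumulative weight share reaches the requested percentile. This is not necessarily how the grapher computes its median, and the function name is hypothetical.

```python
import numpy as np


def weighted_percentile(values, weights, pct=50):
    """Smallest value whose cumulative weight share is at least pct percent."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort observations by value and accumulate their weights
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    cum_share = cum / cum[-1]

    # First position where the cumulative share reaches the target
    idx = np.searchsorted(cum_share, pct / 100.0)
    return values[min(idx, len(values) - 1)]
```

With the percentile exposed as `pct`, the 25th and 75th percentiles come from two calls to the same function, and the default of 50 keeps the median behavior.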

bd CPS grapher: adjust graph parameters and settings

The graphs produced by the bd CPS grapher are uniform because of matplotlib parameters set in the header of the notebook and also because of settings in the function that turns the data into the graph. These settings all need to be cleaned up and fine-tuned, with some documentation added.

Some examples include: larger title, fonts can be nicer, source shouldn't be centered, add a twitter logo and bd_econ, etc.

bd CPS grapher: flows example

The longitudinal component of the CPS is underutilized but very useful. Create a sample line plot showing matched CPS observations that flow from one labor force category to another. For example, flows from "not in the labor force - disabled or ill" to "employed".

bd CPS: new variable RHRWAGE (real hourly wage)

Create a new variable, RHRWAGE, equal to the real hourly wage, as adjusted by the NSA CPI-U.

Make sure it can be missing for latest month, for cases when CPS microdata are available before the CPI-U estimate from BLS.

bd CPS grapher - horizontal bar chart

Create a working example of efficiently converting bd CPS raw data into a horizontal grouped bar chart. One example might be the union membership rate by industry comparing the twelve months ending September 2018 with the twelve months ending September 2016.

Allow multiple query strings in bd CPS grapher

Allow up to three series to be created during the data retrieval process. The three series can then be plotted together. At present, each series has to be created separately and then combined.

The solution for this one is not too bad if a new categorical variable can be created showing membership in each query group, and this variable can be split later using the same groupby that splits the annual file by month.
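
A sketch of the membership variable, assuming each series is defined by a pandas query string and that the data frame has a unique index. The function name is hypothetical. One limitation of a single categorical column is that a row can belong to only one group, so here the first matching query wins.

```python
import numpy as np
import pandas as pd


def assign_query_groups(df, queries):
    """Label each row with the first query it satisfies.

    queries: dict mapping a series label to a pandas query string.
    Rows matching no query are left missing, so a later groupby on the
    result drops them by default.
    """
    group = pd.Series(np.nan, index=df.index, dtype=object)
    for label, query in queries.items():
        matched = df.query(query).index
        unassigned = group.isna()
        # Only label rows that no earlier query has claimed
        group.loc[unassigned & df.index.isin(matched)] = label
    return group
```

The returned column can then be added to the keys of the existing monthly groupby, so that one pass over each annual file yields every series.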

bd CPS grapher: sample line plot with two lines

Create efficient sample code to produce a line plot from two separate series (therefore with two lines). At this stage, it's ok (but not ideal) to duplicate the CPS call (reading of each annual feather file). Later, the multiple series line plot should be generated with one CPS call.

An example might be the employment to population ratios for Tennessee and Kentucky (combined) compared to Indiana and Ohio (combined).

bd CPS grapher separate calculation from smoothing

The bd CPS grapher's main function currently repeats the code for smoothing out the calculated results with a 12 month moving average. I would like to have the type of smoothing set by the user's input to the function. The 12 month moving average can be the default for now, but eventually, the better option is to apply whatever calculation to 12 months of actual CPS data at once, in a rolling window. This will be covered in a separate issue.
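
A sketch of how smoothing could be pulled out of the main function and chosen by the caller. The function name and the `method` values are hypothetical. The trailing 12-month moving average stays the default, and a callable leaves room for other approaches later.

```python
import pandas as pd


def smooth(series, method="ma12"):
    """Apply the requested smoothing to a monthly series.

    method: "ma12" for a trailing 12-month moving average (default),
    None for no smoothing, or any callable taking and returning a Series.
    """
    if method is None:
        return series
    if method == "ma12":
        # The first 11 months have no full window and come back as NaN
        return series.rolling(12).mean()
    if callable(method):
        return method(series)
    raise ValueError(f"Unknown smoothing method: {method!r}")
```

The main function would then compute the monthly results once and call `smooth` on them as a final step.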

Calculate BLS-style binned CPS wages

Because wages tend to clump around round numbers, like $12.00, estimates of how wages move over time tend to jump from round number to round number rather than move smoothly.

BLS and Census use a technique to address this, and I want to replicate the technique. More info to follow.

bd CPS: new variable FORBORN

Create new binary variable FORBORN equal to 1 if born outside the US, and 0 if born in the US (np.nan if unknown).

bd CPS remove extra variables

Remove raw CPS variables that have already been translated in whole or part to bd CPS variables. Goal is to have around 40 variables max, and for each variable to be meaningful, consistent, and well-documented.

bd CPS grapher: recession bar start and end fade

The way a recession month enters and stays in the data is different in most cases in the bd CPS grapher because the series have a 12-month moving average applied.

One option for visually conveying this point is to fade in the recession bars in the background, so that the full shading applies only to months where the data shown in the series come from 12 months that were entirely recession months. Continuing the example, if only 3 months (out of 12) have a recession, then the recession bar would be shaded 25% (3/12) of its maximum alpha value.
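
The fade can be sketched as a rolling share of recession months, assuming a monthly 0/1 (or boolean) recession indicator. The function name and the default `max_alpha` are hypothetical. Months before the start of the indicator are treated as non-recession months.

```python
import pandas as pd


def recession_alpha(is_recession, window=12, max_alpha=0.3):
    """Alpha for the recession shading in each month.

    The alpha is max_alpha times the share of the trailing `window`
    months that were recession months, so 3 recession months out of 12
    give 25% of max_alpha.
    """
    flags = is_recession.astype(float)
    # Always divide by the full window so early months are not overstated
    share = flags.rolling(window, min_periods=1).sum() / window
    return share * max_alpha
```

Each month's bar could then be drawn separately with its own alpha, for example with one matplotlib `axvspan` call per month.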

This issue is a placeholder to see what it looks like.
