bdecon / econ_data

Python 3 examples of using economic data APIs and working with economic microdata. Includes bd CPS.

Jupyter Notebook 94.95% Python 5.05%

econ-data cps microdata python acs econ census-bureau economic-data

econ_data's Issues

bd CPS: Consistent March CPS - Basic HH Income variable

Build a time-consistent version of the March (ASEC) CPS. I'd like to have some basic geographic information at the household level along with info on the household structure. Eventually, the dataset should note income from various sources, but to start just total household income.

APIs - BEA API - better example (GDP components?)

The BEA API example is broken and also confusing. Update it with a new example if necessary.

bd CPS: new variable RWKWAGE

Create new variable RWKWAGE equal to the real weekly wage, adjusted by the NSA CPI-U.

Again, make sure the variable doesn't generate an error in the period between the release of the CPS microdata and the BLS estimate of the CPI-U.
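A minimal sketch of one way to keep the adjustment from failing when the CPI-U lags the microdata. The column names `WKWAGE`, `CPI`, and `PRICE_ADJ`, and all values, are made up for illustration. A left merge leaves the price adjustment missing for the unpublished month, and the missing value carries through to RWKWAGE without raising an error.

```python
import pandas as pd

# Hypothetical CPS extract: nominal weekly wage by survey month.
cps = pd.DataFrame({
    "YEAR": [2018, 2018, 2018],
    "MONTH": [8, 9, 10],
    "WKWAGE": [800.0, 820.0, 810.0],
})

# Made-up CPI-U values; the October value has not been published yet.
cpi = pd.DataFrame({
    "YEAR": [2018, 2018],
    "MONTH": [8, 9],
    "CPI": [250.0, 252.5],
})

# Price adjustment relative to the latest available CPI value.
cpi["PRICE_ADJ"] = cpi["CPI"].iloc[-1] / cpi["CPI"]

# A left merge keeps every CPS row; months without a CPI value get NaN.
cps = cps.merge(cpi[["YEAR", "MONTH", "PRICE_ADJ"]],
                on=["YEAR", "MONTH"], how="left")

# NaN propagates through the multiplication, so no error is raised.
cps["RWKWAGE"] = cps["WKWAGE"] * cps["PRICE_ADJ"]
```

The same pattern would apply to any other price-adjusted variable.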

bd CPS grapher: stacked bar/area plot

Create a sample stacked area or stacked bar chart from bd CPS data. Data generation should take place in one bd CPS call.

An example might be the age decomposition over time of "new-entrants" among unemployed, 16-19, 20-24, 25+.

bd CPS - new variable for labor force category

Each person age 16 or older should be assigned to one labor force category. The categories should be set so that around 5-20% of the population is in a given group at a given point in time, and also so that groups cover all people.

The groups are:

Employed FT - Government
Employed FT - Private - Goods producing industry
Employed FT - Private - Service providing industry
Employed FT - Self employed
Employed PT or unpaid
Unemployed
Not in Labor Force (NILF) - Discouraged
NILF - In school
NILF - Retired
NILF - Disabled or Ill
NILF - Caring for others or home
NILF - Other

bd CPS: Create folder README.md

Create a new README file for the bd CPS. This is part of the documentation. Include contact information and an overview of what is involved. Give links to CPS microdata files and data dictionaries.

bd CPS organize files

Clean up the files related to the bd CPS project. Delete old sections of old files. Make sure the gitignore captures any large files. Upload everything to github.

bd CPS: new variable UNEMPDUR

Create new variable for unemployment duration, either with the number of weeks unemployed or based on categories like those used by BLS.

Latest value markers/annotation in bd CPS grapher: automatically relocate if crowded

If there is not enough space between the latest values of bd CPS grapher series, then move the annotations until they do not overlap.

bd CPS: new variable VETERAN

New binary variable VETERAN equal to 1 if veteran and otherwise 0.

APIs - Comtrade API example is broken

Fix Comtrade API example.

CPS grapher allow for aggregate microdata in moving average

Rather than calculate the monthly CPS summary statistic and then average the individual monthly values, the summary statistic should be calculated from the aggregate 12-months of microdata.
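To show why the two approaches can differ, here is a small sketch with made-up data. The column names `EMPLOYED` and `WEIGHT` are assumptions, and two months stand in for twelve. When the months carry different total weight, the average of monthly statistics and the statistic from pooled microdata do not match.

```python
import numpy as np
import pandas as pd

# Hypothetical microdata: one row per person-month with a sample weight.
df = pd.DataFrame({
    "MONTH": [1, 1, 1, 2, 2],
    "EMPLOYED": [1, 0, 1, 1, 0],
    "WEIGHT": [100.0, 100.0, 200.0, 300.0, 300.0],
})

def weighted_mean(data, value="EMPLOYED", weight="WEIGHT"):
    """Weighted mean of `value` using `weight`."""
    return np.average(data[value], weights=data[weight])

# Approach 1: calculate the statistic each month, then average the results.
monthly = {month: weighted_mean(g) for month, g in df.groupby("MONTH")}
avg_of_monthly = sum(monthly.values()) / len(monthly)

# Approach 2: pool the months of microdata and calculate the statistic once.
pooled = weighted_mean(df)

print(avg_of_monthly, pooled)
```

The gap matters more for medians and other percentiles, where averaging monthly values is not equivalent to any single calculation on the pooled sample.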

Replace bd CPS state ID with two letter value

Switch the two digit state ID, GESTFIPS, with the common two letter state id.
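A possible approach, sketched with a partial lookup table. `STATE` is a hypothetical name for the new column, and only four FIPS codes are included here.

```python
import pandas as pd

# Partial FIPS-to-postal lookup for illustration; a full table covers
# all states and DC.
FIPS_TO_STATE = {18: "IN", 21: "KY", 39: "OH", 47: "TN"}

df = pd.DataFrame({"GESTFIPS": [47, 21, 18, 39, 47]})

# Map the numeric code to the two letter ID and store it as a categorical.
df["STATE"] = df["GESTFIPS"].map(FIPS_TO_STATE).astype("category")
df = df.drop(columns="GESTFIPS")
```

Codes missing from the lookup become NaN rather than raising an error, so an incomplete table fails quietly; checking `df["STATE"].isna().sum()` after the mapping is a cheap safeguard.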

bd CPS grapher - allow for demographic adjustment

Allow bd CPS grapher to make demographic adjustments (and to adjust for changes to the education level) and to show the adjusted series alongside the adjustment. Perhaps not important that this works with multiple query strings, but important that it works with a decomposition, such as the decomposition of those not in the labor force by reason.

Not sure about the best way to carry out the demographic adjustment, but the easiest one, in my experience, re-weights the sample weight for individuals based on what share are various ages, male or female, and in various educational groups, in a given base period.
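The re-weighting idea can be sketched as follows. This is only an illustration under simplifying assumptions: a single demographic cell variable (`AGEGRP`, hypothetical) stands in for the age, sex, and education cells, and the function names are made up.

```python
import pandas as pd

# Hypothetical microdata with one demographic cell variable.
df = pd.DataFrame({
    "YEAR": [2000, 2000, 2018, 2018],
    "AGEGRP": ["25-54", "55+", "25-54", "55+"],
    "EMPLOYED": [1, 0, 1, 0],
    "WEIGHT": [80.0, 20.0, 60.0, 40.0],
})

def cell_shares(data, cell, weight="WEIGHT"):
    """Weighted population share of each demographic cell."""
    totals = data.groupby(cell)[weight].sum()
    return totals / totals.sum()

def reweight(data, base, cell, weight="WEIGHT"):
    """Scale weights so that cell shares match those of the base period."""
    ratio = cell_shares(base, cell, weight) / cell_shares(data, cell, weight)
    return data[weight] * data[cell].map(ratio)

base = df[df["YEAR"] == 2000]
current = df[df["YEAR"] == 2018].copy()
current["ADJ_WEIGHT"] = reweight(current, base, "AGEGRP")

unadjusted = ((current["EMPLOYED"] * current["WEIGHT"]).sum()
              / current["WEIGHT"].sum())
adjusted = ((current["EMPLOYED"] * current["ADJ_WEIGHT"]).sum()
            / current["ADJ_WEIGHT"].sum())
```

With several demographic variables, one option is to combine them into a single cell label first, so the same functions apply unchanged.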

bd CPS: new variable SCHOOLENR

New variable SCHOOLENR to capture enrollment in school (college or high school, and full-time or part-time).

bd CPS: separate notebook to retrieve CPI regional data from BLS API

Clean up and store the CPI retrieval code in its own notebook.

The bd CPS has variables that are in US dollars that have already been adjusted for changes in prices. I've decided to use the CPI-U regional series for the four regions of the US: Northeast, Midwest, South, and West. There is very little difference between overall results from using the regional CPI instead of the nationwide CPI, but at a more local level, there are some minor differences that should more accurately reflect the local price level.

bd CPS: new variable CERT

Capture the relatively new variable PECERT, with new binary variable CERT equal to 1 if person has a professional certification, 0 if they are employed without a certification, and np.nan if not employed.

bd CPS documentation

Generate some basic notes, and eventually a pdf, with information on what is contained in the bd CPS and how variable values are associated with CPS responses.

Extend bd CPS back to 1989

Basic monthly CPS files from 1989 to December 1993 are available for download and in some cases are directly comparable with files from later years, despite the major revision of the CPS in 1994. I'd like to work back to 1989 using the CPS microdata files hosted on the NBER CPS page.

bd CPS: new variable IND

edit: This issue now refers only to the task to create a time-consistent major industry recode variable IND.

The various industry and occupation codes in the CPS microdata change several times during the period from 1994 to present. Looking at how others (IPUMS, CEPR, etc.) have done this, I'd like to add one code that groups industries into no more than 9 categories, and one that groups occupations into no more than 9 categories.

The proposed variable names are IND and OCC. I may want to store the individual entries as categorical text, such as "Construction" and "Manufacturing", but am not sure yet.

bd CPS: new variable PTREASON

New categorical variable indicating the reason that someone is part-time. Would like to ID both economic reasons and things like needing to care for children or adult dependents.

bd CPS replace CBSA with NaN if not available

In many cases, there is no CBSA code for an individual. The current value for not available is -31676, and it should be NaN or None.
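A short sketch of the replacement, using made-up rows. Note that the column becomes a float once it holds NaN.

```python
import pandas as pd

# Hypothetical rows; -31676 is the current "not available" value.
df = pd.DataFrame({"CBSA": [35620, -31676, 16980, -31676]})

# Keep valid codes and set the sentinel to NaN (integers become floats).
df["CBSA"] = df["CBSA"].where(df["CBSA"] != -31676)
```

If integer codes are preferred, pandas' nullable `Int64` dtype is one option, assuming the feather round trip handles it in the versions in use.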

Update CPS grapher to use new feather files

The new feather files have slightly different variable names and formats, so they require an update of programs that use them, and specifically of the CPS grapher. This is part of the bd_CPS series.

bd CPS parameterize the percentile in wage calculations

Currently the only wage calculation available in the bd CPS grapher is the median. If the percentile used in the calculation were a parameter set outside the calculation, calculating the interquartile median would be easy.
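One way the percentile could be passed in as a parameter is sketched below. The function name is hypothetical, and it uses a simple step-function definition (no interpolation), which is not necessarily how the grapher calculates the median today.

```python
import numpy as np

def weighted_percentile(values, weights, percentile=50):
    """Smallest value at which the cumulative weight share reaches `percentile`."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum_share = np.cumsum(weights) / weights.sum()
    idx = np.searchsorted(cum_share, percentile / 100.0)
    # Guard against floating point error at the 100th percentile.
    return values[min(idx, len(values) - 1)]

# Made-up hourly wages with equal sample weights.
wages = [10.0, 12.0, 15.0, 20.0, 40.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

p25, p50, p75 = (weighted_percentile(wages, weights, p) for p in (25, 50, 75))
```

The median remains the default, and the 25th and 75th percentiles come from the same function with a different argument.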

bd CPS: bug - median hourly real wage giving odd results

The median hourly real wage calculation is returning very noisy results. It seemed to work previously.

bd CPS grapher: adjust graph parameters and settings

The graphs produced by the bd CPS grapher are uniform because of matplotlib parameters set in the header of the notebook and also because of settings in the function that turns the data into the graph. These settings all need to be cleaned up and fine-tuned, with some documentation added.

Some examples include: larger title, fonts can be nicer, source shouldn't be centered, add a twitter logo and bd_econ, etc.

bd CPS grapher: flows example

The longitudinal component of the CPS is underutilized but very useful. Create a sample line plot showing matched CPS observations that flow from one labor force category to another. For example, flows from "not in the labor force - disabled or ill" to "employed".
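A rough sketch of the matching step for a single pair of months. `PERSONID` is a hypothetical identifier; building a reliable person ID from the CPS household and line number variables is a separate task and is not shown. `LFS` and `WEIGHT` are also assumed names.

```python
import pandas as pd

# Hypothetical two-month panel with a ready-made person identifier.
panel = pd.DataFrame({
    "PERSONID": [1, 2, 3, 1, 2, 3],
    "MONTH": [1, 1, 1, 2, 2, 2],
    "LFS": ["NILF - Disabled or Ill", "NILF - Disabled or Ill", "Employed",
            "Employed", "NILF - Disabled or Ill", "Employed"],
    "WEIGHT": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
})

prev = panel[panel["MONTH"] == 1]
curr = panel[panel["MONTH"] == 2]

# Inner merge keeps only people observed in both months.
matched = prev.merge(curr, on="PERSONID", suffixes=("_PREV", "_CURR"))

# Share of the origin group that moved to employment, using the
# first-month weight (a simplifying choice).
origin = matched[matched["LFS_PREV"] == "NILF - Disabled or Ill"]
moved = origin.loc[origin["LFS_CURR"] == "Employed", "WEIGHT_PREV"].sum()
flow_rate = moved / origin["WEIGHT_PREV"].sum()
```

Repeating this for each pair of adjacent months would give the series for the line plot.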

bd CPS: new variable RHRWAGE (real hourly wage)

Create a new variable, RHRWAGE, equal to the real hourly wage, as adjusted by the NSA CPI-U.

Make sure it can be missing for latest month, for cases when CPS microdata are available before the CPI-U estimate from BLS.

bd CPS: new variable OTC

New variable to capture overtime, tips, and commissions.

bd CPS grapher - horizontal bar chart

Create a working example of efficiently converting bd CPS raw data into a horizontal grouped bar chart. One example might be the union membership rate by industry comparing the twelve months ending September 2018 with the twelve months ending September 2016.

bd CPS: new variable AREATYPE

New categorical variable AREATYPE equal to either Urban, Suburban, or Rural.

bd CPS: new variable OCC

Create a time-consistent major occupation recode variable OCC.

More details to follow.

bd CPS: new variable EMPLOYED

Create new binary variable equal to 1 if employed and otherwise 0.

bd CPS: new variable MARRIED

Create a new binary variable, MARRIED, equal to 1 if married and otherwise 0.

bd CPS new variable - female

Replace the PESEX variable with a binary variable called FEMALE and equal to 1 if female and otherwise 0.

bd CPS: new variable MJH (multiple jobholder)

Create a new binary variable MJH equal to 1 if the person has more than one job, and otherwise equal to zero if they have one job, and np.nan if they have no job.
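The three-way coding (1, 0, or missing) could be sketched with `np.select`. `NUMJOBS` is a hypothetical input column; the raw CPS variables that identify employment and multiple jobholding would need to be mapped to it first.

```python
import numpy as np
import pandas as pd

# Hypothetical count of jobs held by each person.
df = pd.DataFrame({"NUMJOBS": [0, 1, 2, 3]})

df["MJH"] = np.select(
    [df["NUMJOBS"] > 1,    # more than one job
     df["NUMJOBS"] == 1],  # exactly one job
    [1.0, 0.0],
    default=np.nan,        # no job
)
```

The same pattern would fit other variables that are 1 or 0 for one group and missing for everyone else.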

Allow multiple query strings in bd CPS grapher

Allow up to three series to be created during the data retrieval process. The three series can then be plotted together. At present, each series has to be created separately and then combined.

The solution for this one is not too bad if a new categorical variable can be created showing membership in each query group, and this variable can be split later using the same groupby that splits the annual file by month.
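A sketch of the categorical-variable idea with made-up data. The column names (`STATE`, `EMPLOYED`, `WEIGHT`, `GROUP`) and the query strings are assumptions, and the query groups are assumed not to overlap; if they did, later queries would overwrite earlier ones.

```python
import pandas as pd

# Hypothetical microdata covering two months.
df = pd.DataFrame({
    "MONTH": [1, 1, 1, 1, 2, 2, 2, 2],
    "STATE": ["TN", "KY", "IN", "OH", "TN", "KY", "IN", "OH"],
    "EMPLOYED": [1, 0, 1, 1, 1, 1, 0, 1],
    "WEIGHT": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
})

# Named query strings, one per series to plot.
queries = {
    "TN and KY": "STATE in ['TN', 'KY']",
    "IN and OH": "STATE in ['IN', 'OH']",
}

# Label each row with the query group it belongs to.
df["GROUP"] = None
for name, query in queries.items():
    df.loc[df.query(query).index, "GROUP"] = name

# One groupby splits by month and by query group at the same time.
df["EMP_WT"] = df["EMPLOYED"] * df["WEIGHT"]
sums = df.groupby(["MONTH", "GROUP"])[["EMP_WT", "WEIGHT"]].sum()
result = (sums["EMP_WT"] / sums["WEIGHT"]).unstack("GROUP")
```

`result` has one row per month and one column per query group, so it can go straight to a line plot.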

bd CPS rename PRTAGE to AGE

The variable name for the person's age variable, PRTAGE, can be shortened to just AGE.

APIs - OECD API example is broken

Clean up the OECD API example; it is currently broken.

bd CPS grapher: sample line plot with two lines

Create efficient sample code to produce a line plot from two separate series (therefore with two lines). At this stage, it's ok (but not ideal) to duplicate the CPS call (reading of each annual feather file). Later, the multiple series line plot should be generated with one CPS call.

An example might be the employment to population ratios for Tennessee and Kentucky (combined) compared to Indiana and Ohio (combined).

bd CPS reader: method chaining on monthly CPS dataframes

Refactor some sections of code in the bd CPS reader so that the monthly dataframes utilize method chaining for clarity of code and ease of adding new variables.

bd CPS: new variable KIDSNO

New variable KIDSNO or CHILDNO equal to the number of own children under age 18.

bd CPS grapher separate calculation from smoothing

The bd CPS grapher's main function currently repeats the code for smoothing out the calculated results with a 12 month moving average. I would like to have the type of smoothing set by the user's input to the function. The 12 month moving average can be the default for now, but eventually, the better option is to apply whatever calculation to 12 months of actual CPS data at once, in a rolling window. This will be covered in a separate issue.
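The separation might look something like this. The function names are hypothetical and the monthly values are made up; the point is only that the smoothing step is passed in rather than repeated inside the calculation.

```python
import pandas as pd

def moving_average(series, window=12):
    """Trailing moving average; the first window - 1 values are NaN."""
    return series.rolling(window).mean()

def no_smoothing(series):
    """Return the calculated values unchanged."""
    return series

def cps_series(monthly_values, smoother=moving_average):
    """Apply the caller's choice of smoothing to already-calculated results."""
    return smoother(monthly_values)

# Made-up monthly results standing in for a calculated CPS statistic.
monthly = pd.Series(
    range(1, 25),
    index=pd.period_range("2017-01", periods=24, freq="M"),
    dtype=float,
)

smoothed = cps_series(monthly)                    # default: 12-month average
raw = cps_series(monthly, smoother=no_smoothing)  # user opts out
```

A rolling-window calculation on pooled microdata could later be added as another option without touching the calculation code.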

Calculate BLS-style binned CPS wages

Because wages tend to clump around round numbers, like $12.00, estimates of how wages move over time tend to jump from round number to round number rather than move smoothly.

BLS and Census use a technique to address this, and I want to replicate the technique. More info to follow.

bd CPS: new variable FORBORN

Create new binary variable FORBORN equal to 1 if born outside the US, and 0 if born in the US (np.nan if unknown).

bd CPS: new variable WBHAOM

New categorical variable WBHAOM (White, Black, Hispanic, Asian, Other, Mixed) for race groups.

bd CPS remove extra variables

Remove raw CPS variables that have already been translated in whole or part to bd CPS variables. Goal is to have around 40 variables max, and for each variable to be meaningful, consistent, and well-documented.

bd CPS: new variable UNEMPTYPE

Create new variable with category of unemployment: Job leavers, Job losers, Re-entrants, and New entrants.

bd CPS grapher - interquartile median line/area plot

Create a special category of graph that shows the median wage on top of the shaded values for the interquartile median (25th to 75th percentile).

bd CPS grapher: recession bar start and end fade

The way a recession month enters and stays in the data is different in most cases in the bd CPS grapher because the series have a 12-month moving average applied.

One option for visually conveying this point is to fade in the recession bars in the background: the full shading would only apply to months where the data shown in the series come from 12 months that were entirely recession months. Continuing the example, if only 3 months (out of 12) have a recession, then the recession bar would be shaded 25% (3/12) of its maximum alpha value.
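The shading rule can be sketched as a rolling share of recession months. The recession dates and `MAX_ALPHA` below are made up for illustration; real dates would come from the NBER chronology.

```python
import pandas as pd

MAX_ALPHA = 0.3  # hypothetical alpha for a window made up entirely of recession months

months = pd.period_range("2000-01", periods=36, freq="M")

# Hypothetical recession lasting 14 months (positions 12 through 25).
in_recession = pd.Series(0.0, index=months)
in_recession.iloc[12:26] = 1.0

# Share of the trailing 12 months, including the current one, in recession.
share = in_recession.rolling(12).mean()

# Alpha for the recession bar drawn behind each month.
alpha = share * MAX_ALPHA
```

The third recession month gets 25% of the maximum alpha, full shading starts once twelve consecutive recession months are in the window, and the bar fades out over the eleven months after the recession ends.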

This issue is a placeholder to see what it looks like.
