bdecon / econ_data
Python 3 examples of using economic data APIs and working with economic microdata. Includes bd CPS.
Build a time-consistent version of the March (ASEC) CPS. I'd like to have some basic geographic information at the household level along with info on the household structure. Eventually, the dataset should note income from various sources, but to start just total household income.
The BEA API example is broken and also confusing. Update it with a new example if necessary.
Create new variable RWKWAGE equal to the real weekly wage, adjusted by the NSA CPI-U.
Again, make sure the variable doesn't generate an error in the period between the release of the CPS microdata and the BLS estimate of the CPI-U.
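A minimal sketch of the deflation step, assuming the CPI-U values arrive as a pandas Series indexed by month and that the nominal wage lives in a column called WKWAGE. Both names, the MONTH column, and all numbers are hypothetical and are not taken from the bd CPS code. Months with no CPI value yet simply produce NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical CPI-U index values by month (illustrative numbers, not BLS data).
cpi = pd.Series({"2018-08": 250.0, "2018-09": 252.0})
base_cpi = cpi.iloc[-1]  # express wages in dollars of the latest CPI month

# Hypothetical microdata: one row per person, nominal weekly wage in WKWAGE.
df = pd.DataFrame({
    "MONTH": ["2018-08", "2018-09", "2018-10"],
    "WKWAGE": [500.0, 630.0, 700.0],
})

# Look up each record's CPI value; months with no CPI yet map to NaN.
month_cpi = df["MONTH"].map(cpi)

# Deflate. NaN propagates, so the newest month is missing instead of raising.
df["RWKWAGE"] = df["WKWAGE"] * base_cpi / month_cpi
```

Because the lookup uses `Series.map` rather than direct indexing, a microdata month that is ahead of the CPI release does not raise a KeyError.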
Create a sample stacked area or stacked bar chart from bd CPS data. Data generation should take place in one bd CPS call.
An example might be the age decomposition over time of "new-entrants" among unemployed, 16-19, 20-24, 25+.
Each person age 16 or older should be assigned to one labor force category. The categories should be set so that around 5-20% of the population is in a given group at a given point in time, and also so that groups cover all people.
The groups are:
Create a new read me file for the bd CPS. This is part of the documentation. Specify contact information and overview of what is involved. Give links to CPS microdata files and data dictionaries.
Clean up the files related to the bd CPS project. Delete old sections of old files. Make sure the gitignore captures any large files. Upload everything to github.
Create new variable for unemployment duration, either with the number of weeks unemployed or based on categories like those used by BLS.
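One possible sketch of the categorical option, using the duration groups BLS publishes (less than 5 weeks, 5 to 14, 15 to 26, 27 and over). The input series stands in for whatever raw weeks-unemployed variable is used, and the output name `unempdur` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical weeks-unemployed values; NaN for people who are not unemployed.
weeks = pd.Series([2, 5, 14, 15, 26, 27, 80, np.nan])

# Left-closed bins: [0, 5), [5, 15), [15, 27), [27, inf)
bins = [0, 5, 15, 27, np.inf]
labels = ["Less than 5 weeks", "5 to 14 weeks", "15 to 26 weeks", "27 weeks and over"]

# People with no duration stay missing in the categorical result.
unempdur = pd.cut(weeks, bins=bins, labels=labels, right=False)
```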
If there is not enough space between the latest values of bd CPS grapher series, then move the annotations until they do not overlap.
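A rough sketch of one way to separate the labels: sort the latest values, then push any label upward until it clears the one below it by a minimum gap. The function name `spread_labels` and the gap value are hypothetical, and the sketch only moves labels up. A fuller version might center the adjusted group around the original values.

```python
def spread_labels(values, min_gap):
    """Return label y-positions at least min_gap apart, in the input order."""
    # Visit the values from lowest to highest while remembering input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = list(values)
    for prev, cur in zip(order, order[1:]):
        # Push a label up only if it sits too close to the one below it.
        if positions[cur] - positions[prev] < min_gap:
            positions[cur] = positions[prev] + min_gap
    return positions


latest = [62.1, 62.3, 70.0]           # hypothetical latest values of three series
label_y = spread_labels(latest, 1.0)  # second label moves up, third is untouched
```

The returned positions can be passed as the y coordinate of each annotation while the plotted lines keep their true values.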
New binary variable VETERAN equal to 1 if veteran and otherwise 0.
Fix Comtrade API example.
Rather than calculate the monthly CPS summary statistic and then average the individual monthly values, the summary statistic should be calculated once from the pooled 12 months of microdata.
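The difference between the two approaches can be seen in a small sketch. It uses two months rather than twelve to stay short, and the column names EMP and WEIGHT and all values are hypothetical. When monthly sample sizes differ, the average of monthly rates and the pooled rate disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical microdata: two months with different sample sizes.
df = pd.DataFrame({
    "MONTH":  [1, 1, 1, 1, 2, 2],
    "EMP":    [1, 1, 1, 0, 0, 1],   # 1 if employed
    "WEIGHT": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
})


def weighted_rate(d):
    """Weighted share of records with EMP equal to 1."""
    return np.average(d["EMP"], weights=d["WEIGHT"])


# Average of monthly values: each month counts equally, whatever its size.
avg_of_months = np.mean([weighted_rate(g) for _, g in df.groupby("MONTH")])

# Pooled: a single calculation over every record in the window.
pooled = weighted_rate(df)
```

Here the monthly rates are 0.75 and 0.5, so their average is 0.625, while the pooled rate is 4/6.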
Replace the two-digit state FIPS code, GESTFIPS, with the common two-letter state abbreviation.
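A small sketch of the lookup, assuming a plain dictionary from FIPS code to postal abbreviation. Only four states are listed here for illustration, and the output column name STATE is hypothetical. A real table would need every state and DC.

```python
import pandas as pd

# Partial FIPS-to-postal lookup for illustration only.
FIPS_TO_STATE = {18: "IN", 21: "KY", 39: "OH", 47: "TN"}

df = pd.DataFrame({"GESTFIPS": [47, 21, 18, 39, 99]})

# Codes missing from the lookup become NaN rather than raising an error.
df["STATE"] = df["GESTFIPS"].map(FIPS_TO_STATE)
```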
Allow bd CPS grapher to make demographic adjustments (and to adjust for changes to the education level) and to show the adjusted series alongside the adjustment. Perhaps not important that this works with multiple query strings, but important that it work with a decomposition, such as the decomposition of those not in the labor force by reason.
Not sure about the best way to carry out the demographic adjustment, but the easiest one, in my experience, re-weights each individual's sample weight so that the shares of the sample by age, sex, and educational group match those of a given base period.
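The re-weighting idea can be sketched as follows, assuming each record already carries a demographic cell label (in practice a combination of age group, sex, and education). The column names PERIOD, CELL, WEIGHT, and ADJ_WEIGHT and all values are hypothetical. Each weight is scaled by the ratio of the cell's base-period share to its share in the record's own period.

```python
import pandas as pd

# Hypothetical microdata: a period, one demographic cell, and a sample weight.
df = pd.DataFrame({
    "PERIOD": ["base", "base", "later", "later"],
    "CELL":   ["young", "old", "young", "old"],
    "WEIGHT": [60.0, 40.0, 30.0, 70.0],
})

# Share of total weight held by each record's cell within its own period.
cell_wgt = df.groupby(["PERIOD", "CELL"])["WEIGHT"].transform("sum")
period_wgt = df.groupby("PERIOD")["WEIGHT"].transform("sum")
share = cell_wgt / period_wgt

# Each cell's share of total weight in the base period.
base = df[df["PERIOD"] == "base"]
base_share = base.groupby("CELL")["WEIGHT"].sum() / base["WEIGHT"].sum()

# Scale weights so every period has the base-period cell composition.
df["ADJ_WEIGHT"] = df["WEIGHT"] * df["CELL"].map(base_share) / share
```

Any statistic computed with ADJ_WEIGHT then holds the demographic mix fixed at its base-period value, while the total weight in each period is unchanged.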
New variable SCHOOLENR to capture enrollment in school (college or HS and ft or pt).
Clean up and store the CPS retrieval code in its own notebook.
The bd CPS has variables that are in US dollars that have already been adjusted for changes in prices. I've decided to use the CPI-U regional series for the four regions of the US: Northeast, Midwest, South, and West. There is very little difference between overall results from using the regional CPI instead of the nationwide CPI, but at a more local level, there are some minor differences that should more accurately reflect the local price level.
Capture the relatively new variable PECERT, with new binary variable CERT equal to 1 if person has a professional certification, 0 if they are employed without a certification, and np.nan if not employed.
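A minimal sketch of the three-valued recode. The coding of PECERT (1 for yes, 2 for no, -1 for not in universe) and the EMP indicator are assumptions made for illustration and should be checked against the data dictionary.

```python
import numpy as np
import pandas as pd

# Assumed coding: PECERT 1 = yes, 2 = no, -1 = not in universe; EMP 1 = employed.
df = pd.DataFrame({"PECERT": [1, 2, -1, 1], "EMP": [1, 1, 0, 0]})

# Missing unless employed; among the employed, 1 with a certification, else 0.
df["CERT"] = np.where(df["EMP"] == 1, (df["PECERT"] == 1).astype(float), np.nan)
```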
Generate some basic notes, and eventually a pdf, with information on what is contained in the bd CPS and how variable values are associated with CPS responses.
Basic monthly CPS files from 1989 to December 1993 are available for download and in some cases are directly comparable with files from later years, despite the major revision of the CPS in 1994. I'd like to work back to 1989 using the CPS microdata files hosted on the NBER CPS page.
edit: This issue now refers only to the task to create a time-consistent major industry recode variable IND.
The various industry and occupation codes in the CPS microdata change several times during the period from 1994 to present. Looking at how others (IPUMS, CEPR, etc.) have done this, I'd like to add one code that groups industries into no more than 9 categories, and one that groups occupations into no more than 9 categories.
The proposed variable names are IND and OCC. I may want to store the individual entries as categorical text, such as "Construction" and "Manufacturing", but am not sure yet.
New categorical variable indicating the reason that someone is part-time. Would like to ID both economic reasons and things like needing to care for children or adult dependents.
In many cases, there is no CBSA code for an individual. The current value for not available is -31676, and it should be NaN or None.
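A short sketch of the fix, assuming the column is named CBSA. The two valid codes are illustrative. `Series.where` keeps values where the condition holds and fills the rest with NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"CBSA": [35620, -31676, 31080]})  # hypothetical records

# Keep real codes and turn the sentinel into NaN; the column becomes float
# because NaN is a float value.
df["CBSA"] = df["CBSA"].where(df["CBSA"] != -31676)
```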
The new feather files have slightly different variable names and formats, so they require an update of programs that use them, and specifically of the CPS grapher. This is part of the bd_CPS series.
Currently the only wage calculation available in the bd CPS grapher is the median. If the percentile used in the calculation were a parameter set outside the calculation, calculating the interquartile range would be easy.
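A sketch of a weighted percentile with the percentile exposed as a parameter. The function name `weighted_percentile` is hypothetical and this is not the grapher's actual calculation. It uses a simple definition (the smallest value at which cumulative weight reaches the requested share) with no interpolation or wage binning.

```python
import numpy as np


def weighted_percentile(values, weights, pct=50):
    """Smallest value at which cumulative weight reaches pct percent of total."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum_share = np.cumsum(weights) / weights.sum()
    idx = np.searchsorted(cum_share, pct / 100.0)
    # Guard against an index one past the end from floating-point rounding.
    return values[min(idx, len(values) - 1)]


wages = [10.0, 12.0, 15.0, 20.0, 40.0]  # hypothetical hourly wages
wgts = [1.0, 1.0, 1.0, 1.0, 1.0]        # hypothetical sample weights
p25, p50, p75 = (weighted_percentile(wages, wgts, p) for p in (25, 50, 75))
```

With the percentile as an argument, the 25th and 75th percentiles come from the same code path as the median.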
The median hourly real wage calculation is returning very noisy results. It seemed to work previously.
The graphs produced by the bd CPS grapher are uniform because of matplotlib parameters set in the header of the notebook and also because of settings in the function that turns the data into the graph. These settings all need to be cleaned up and fine-tuned, with some documentation added.
Some examples include: larger title, fonts can be nicer, source shouldn't be centered, add a twitter logo and bd_econ, etc.
The longitudinal component of the CPS is underutilized but very useful. Create a sample line plot showing matched CPS observations that flow from one labor force category to another. For example, flows from "not in the labor force - disabled or ill" to "employed".
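A rough sketch of the matching step, assuming a person identifier that is stable across two monthly files. The names PERSONID, LFS, and WEIGHT and the category labels are hypothetical, and building a reliable identifier from the CPS household and person variables is a separate task not shown here.

```python
import pandas as pd

# Hypothetical records for the same people in two consecutive months.
m1 = pd.DataFrame({
    "PERSONID": [1, 2, 3, 4],
    "LFS": ["NILF-disabled", "NILF-disabled", "Employed", "Unemployed"],
    "WEIGHT": [1.0, 1.0, 1.0, 1.0],
})
m2 = pd.DataFrame({
    "PERSONID": [1, 2, 3, 5],
    "LFS": ["Employed", "NILF-disabled", "Employed", "Employed"],
})

# Inner merge keeps only people observed in both months.
matched = m1.merge(m2, on="PERSONID", suffixes=("_PREV", "_CUR"))

# Weighted share of last month's "NILF-disabled" who are employed this month.
start = matched[matched["LFS_PREV"] == "NILF-disabled"]
to_emp = start.loc[start["LFS_CUR"] == "Employed", "WEIGHT"].sum()
flow_rate = to_emp / start["WEIGHT"].sum()
```

Repeating this for each pair of adjacent months gives one point per month for the line plot.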
Create a new variable, RHRWAGE, equal to the real hourly wage, as adjusted by the NSA CPI-U.
Make sure it can be missing for latest month, for cases when CPS microdata are available before the CPI-U estimate from BLS.
New variable to capture overtime, tips, and commissions.
Create a working example of efficiently converting bd CPS raw data into a horizontal grouped bar chart. One example might be the union membership rate by industry comparing the twelve months ending September 2018 with the twelve months ending September 2016.
New categorical variable AREATYPE equal to either Urban, Suburban, or Rural.
Create a time-consistent major occupation recode variable OCC.
More details to follow.
Create new binary variable equal to 1 if employed and otherwise 0.
Create a new binary variable, married, equal to 1 if married and otherwise 0.
Replace the PESEX variable with a binary variable called FEMALE and equal to 1 if female and otherwise 0.
Create a new binary variable MJH equal to 1 if the person has more than one job, and otherwise equal to zero if they have one job, and np.nan if they have no job.
Allow up to three series to be created during the data retrieval process. The three series can then be plotted together. At present, each series has to be created separately and then combined.
The solution for this one is not too bad if a new categorical variable can be created showing membership in each query group, and this variable can be split later using the same groupby that splits the annual file by month.
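The group-membership idea might look like the sketch below, which uses the two-state groupings from another issue in this list as the queries. Column names and values are hypothetical, and the approach assumes the query groups do not overlap, since each record can hold only one label.

```python
import numpy as np
import pandas as pd

# Hypothetical microdata with illustrative values.
df = pd.DataFrame({
    "MONTH":  [1, 1, 1, 1, 1],
    "STATE":  ["TN", "KY", "IN", "OH", "CA"],
    "EMP":    [1, 0, 1, 1, 1],
    "WEIGHT": [1.0, 1.0, 1.0, 1.0, 1.0],
})

# One pass assigns each record to a query group; unmatched records get "none".
conditions = [df["STATE"].isin(["TN", "KY"]), df["STATE"].isin(["IN", "OH"])]
df["GROUP"] = np.select(conditions, ["TN+KY", "IN+OH"], default="none")

# The groupby that splits by month can now split by group as well.
sel = df[df["GROUP"] != "none"]
keys = [sel["MONTH"], sel["GROUP"]]
emp_wgt = (sel["EMP"] * sel["WEIGHT"]).groupby(keys).sum()
tot_wgt = sel["WEIGHT"].groupby(keys).sum()
epop = emp_wgt / tot_wgt  # employment-to-population ratio by month and group
```

Calling `epop.unstack("GROUP")` would give one column per series, ready to plot together.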
The variable name for the person's age variable, PRTAGE, can be shortened to just AGE.
Clean up the OECD API example, it is currently broken.
Create efficient sample code to produce a line plot from two separate series (therefore with two lines). At this stage, it's ok (but not ideal) to duplicate the CPS call (reading of each annual feather file). Later, the multiple series line plot should be generated with one CPS call.
An example might be the employment to population ratios for Tennessee and Kentucky (combined) compared to Indiana and Ohio (combined).
Refactor some sections of code in the bd CPS reader so that the monthly dataframes utilize method chaining for clarity of code and ease of adding new variables.
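A small sketch of what a chained monthly pipeline could look like, combining a few of the recodes requested in other issues. The helper `keep_adults` is hypothetical, and the assumption that PESEX is coded 2 for female should be checked against the data dictionary.

```python
import pandas as pd

# Hypothetical raw monthly extract using raw CPS variable names.
raw = pd.DataFrame({
    "PESEX": [1, 2, 2],
    "PRTAGE": [34, 51, 15],
    "GESTFIPS": [47, 21, 18],
})


def keep_adults(d, min_age=16):
    """Keep people at or above min_age."""
    return d[d["AGE"] >= min_age]


month = (
    raw
    .rename(columns={"PRTAGE": "AGE"})
    # Assumes PESEX is coded 2 for female.
    .assign(FEMALE=lambda d: (d["PESEX"] == 2).astype(int))
    .drop(columns=["PESEX"])
    .pipe(keep_adults)
    .reset_index(drop=True)
)
```

Each new variable becomes one more `.assign` or `.pipe` step, so additions do not disturb the existing steps.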
New variable KIDSNO or CHILDNO equal to the number of own children under age 18.
The bd CPS grapher's main function currently repeats the code for smoothing out the calculated results with a 12 month moving average. I would like to have the type of smoothing set by the user's input to the function. The 12 month moving average can be the default for now, but eventually, the better option is to apply whatever calculation to 12 months of actual CPS data at once, in a rolling window. This will be covered in a separate issue.
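One way to make the smoothing a user choice is a small dispatcher with the 12-month moving average as the default. The function name `smooth` and the method labels are hypothetical and are not the grapher's existing interface.

```python
import pandas as pd


def smooth(series, method="ma12"):
    """Apply the smoothing chosen by the caller; 'ma12' is the default."""
    if method is None or method == "none":
        return series
    if method == "ma12":
        # Trailing 12-month mean; the first 11 values are NaN.
        return series.rolling(12).mean()
    raise ValueError(f"Unknown smoothing method: {method!r}")


monthly = pd.Series(range(1, 25), dtype=float)  # hypothetical monthly results
smoothed = smooth(monthly)
unsmoothed = smooth(monthly, method="none")
```

Keeping the smoothing in one function removes the repeated code and leaves room for further methods later.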
Because wages tend to clump around round numbers, like $12.00, estimates of how wages move over time tend to jump from one round number to the next rather than move smoothly.
BLS and Census use a technique to address this, and I want to replicate the technique. More info to follow.
Create new binary variable FORBORN equal to 1 if born outside the US, and 0 if born in the US (np.nan if unknown).
New categorical variable WBHAOM (White, Black, Hispanic, Asian, Other, Mixed) for race groups.
Remove raw CPS variables that have already been translated in whole or part to bd CPS variables. Goal is to have around 40 variables max, and for each variable to be meaningful, consistent, and well-documented.
Create new variable with category of unemployment: Job leavers, Job losers, Re-entrants, and New entrants.
Create a special category of graph that shows the median wage on top of a shaded band for the interquartile range (25th to 75th percentile).
The way a recession month enters and stays in the data is different in most cases in the bd CPS grapher because the series have a 12-month moving average applied.
One option for visually conveying this point is to fade in the recession bars in the background. The full shading would apply only to months where the data shown in the series come from 12 months that were entirely recession months. Continuing the example, if only 3 months (out of 12) have a recession, then the recession bar would be shaded at 25% (3/12) of its maximum alpha value.
This issue is a placeholder to see what it looks like.
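The fade described above reduces to a rolling share, sketched here under the assumption of a monthly 0/1 recession indicator. The indicator values and the `max_alpha` setting are hypothetical.

```python
import pandas as pd

# Hypothetical monthly indicator: 1 in recession months, 0 otherwise.
recession = pd.Series([0] * 12 + [1] * 3 + [0] * 3, dtype=float)

max_alpha = 0.4  # assumed alpha when all 12 months in the window are recession

# Share of the trailing 12 months that were recession months, scaled to alpha.
shade_alpha = recession.rolling(12).mean() * max_alpha
```

In this example the third recession month has 3 of its trailing 12 months in recession, so its bar would be drawn at 25% of `max_alpha`.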