andersonfrailey / cpstaxunits

Create a tax unit file for use in microsimulation models from the Current Population Survey

Python 70.81% Jupyter Notebook 27.81% R 1.37%

cpstaxunits's People

Contributors

andersonfrailey

cpstaxunits's Issues

Vectorize CPS data creation for speed

While looking to import the 2015 CPS ASEC, I saw cpsmar_2015.py here, which seems to be the only Python script available for importing this file. I think it could be sped up by reading the file as a pandas Series, extracting the fields with vectorized string functions, and then merging the person, family, and household features on their common identifiers, rather than processing one record at a time.

I made a first attempt in this notebook, starting with the person fields only, e.g.:

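import pandas as pd

# Read each fixed-width record as one string. header=None keeps the first
# record as data; dtype=str stops pandas from parsing lines as numbers.
dats = pd.read_csv('http://thedataweb.rm.census.gov/pub/cps/march/' +
                   'asec2015_pubuse.dat.gz', compression='gzip',
                   header=None, dtype=str).iloc[:, 0]
rec_type = dats.str[0]  # the first character identifies the record type
p = dats[rec_type == '3']  # person records
df_p = pd.DataFrame()
df_p['precord'] = p.str[0]
df_p['ph_seq'] = p.str[1:6]
# And so on...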

This takes about 5 minutes to run, so I'd expect the full import, including the family and household records and the merges, to take under 15 minutes (compared to 2 hours as-is).
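To make the merge step concrete, here is a minimal sketch on synthetic records rather than the real file. Only the '3' person record type and the ph_seq position come from the snippet above. The '1' household record type, the h_seq, h_attr, and age names, and their positions are assumptions made up for illustration.

```python
import pandas as pd

# Synthetic fixed-width records, NOT real CPS data. The first character is the
# record type ('3' = person, as in the snippet above; '1' = household is an
# assumption). Characters 1-5 hold the household sequence number.
raw = pd.Series([
    "100001A",   # household 00001, made-up attribute "A"
    "30000125",  # person in household 00001, made-up age 25
    "30000140",  # person in household 00001, made-up age 40
    "100002B",   # household 00002, made-up attribute "B"
    "30000231",  # person in household 00002, made-up age 31
])

rec_type = raw.str[0]
h = raw[rec_type == "1"]
p = raw[rec_type == "3"]

# Slice every record of a type at once instead of looping over lines.
df_h = pd.DataFrame({"h_seq": h.str[1:6], "h_attr": h.str[6:7]})
df_p = pd.DataFrame({"ph_seq": p.str[1:6], "age": p.str[6:8].astype(int)})

# Attach household fields to each person via the shared sequence number.
# validate raises if a household sequence number is not unique in df_h.
merged = df_p.merge(df_h, left_on="ph_seq", right_on="h_seq",
                    how="left", validate="many_to_one")
```

Family records would presumably join the same way, on whichever identifiers they share with person records.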

Curious what you think. I'm happy to see it through and check that its output matches the current script's. At some point it might even be cool to build a Python version of the ipumsr R package (GitHub), which parses the data dictionary to avoid hard-coding string positions.
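As a possible intermediate step before parsing a data dictionary, the positions could live in a single layout table instead of being scattered across assignments. The sketch below assumes a hypothetical PERSON_LAYOUT dict and extract_fields helper. Only the precord and ph_seq positions come from the snippet above; the age field is made up.

```python
import pandas as pd

# Hypothetical layout: field name -> (start, end) as 0-based, end-exclusive
# slices. precord and ph_seq follow the snippet above; "age" is made up.
PERSON_LAYOUT = {
    "precord": (0, 1),
    "ph_seq": (1, 6),
    "age": (6, 8),
}


def extract_fields(records: pd.Series, layout: dict) -> pd.DataFrame:
    """Slice each field out of a Series of fixed-width strings."""
    return pd.DataFrame(
        {name: records.str[start:end] for name, (start, end) in layout.items()}
    )


records = pd.Series(["30000125", "30000231"])  # synthetic, not real CPS data
df_p = extract_fields(records, PERSON_LAYOUT)
```

A parser for the data dictionary would then only need to produce a dict of this shape, leaving the extraction code unchanged.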
