
bipartitepandas's Introduction

Thibaut Lamadon Github profile

Please see my personal page at lamadon.com.

I will update this page soon!

bipartitepandas's People

Contributors

adamoppenheimer, tlamadon

Forkers

santiagohermo

bipartitepandas's Issues

Stack overflow when trying to identify leave-one-out-connected set

Dear developers,

First of all, thank you for supplying the great BipartitePandas and PyTwoWay packages.

I have an issue identifying the leave-one-out-connected set using BipartitePandas.

Everything works well on small simulated and test data. However, when I apply the data cleaning to real administrative data (more than 21 million worker-year observations), Python crashes with the error ‘Windows fatal exception: stack overflow’. A window pops up saying “Python has stopped working”, and the Python kernel in Spyder restarts.

The server runs a 64-bit Windows OS with a 2.1 GHz processor and 256 GB of RAM. Python uses at most about 10 GB of RAM while my program is running. The installed Python version is 3.8.8, and BipartitePandas is version 1.0.35.

I tried increasing both the stack size and recursion depth (see code in txt file). This did not resolve the problem though.
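The attached txt file is not reproduced on this page, so the following is only a generic sketch of how both limits are commonly raised in Python, not the code from the report. The helper name `run_with_big_stack` and the toy function `depth` are made up for illustration. The stack size of a thread has to be set before the thread is created, which is why the work runs in a fresh thread. This only helps when the overflow comes from Python-level recursion; deep recursion inside compiled code may not be affected by it.

```python
import sys
import threading

def run_with_big_stack(func, stack_mb=256, recursion_limit=1_000_000):
    """Run func() in a new thread that has a larger stack and recursion limit."""
    result = {}

    def target():
        try:
            result["value"] = func()
        except BaseException as exc:  # hand any error back to the caller
            result["error"] = exc

    old_limit = sys.getrecursionlimit()
    # threading.stack_size returns the previous setting (0 means platform default)
    old_stack = threading.stack_size(stack_mb * 1024 * 1024)
    sys.setrecursionlimit(recursion_limit)
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        sys.setrecursionlimit(old_limit)
        threading.stack_size(old_stack)

    if "error" in result:
        raise result["error"]
    return result["value"]

def depth(n):
    """Toy recursive function standing in for a deeply recursive routine."""
    return 0 if n == 0 else 1 + depth(n - 1)

deep = run_with_big_stack(lambda: depth(20_000))
```

In a real script, the call to the cleaning step would go where `depth(20_000)` is called here.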

I would be grateful if you have any idea what may cause the stack overflow and how to fix it. Please find my code and associated output from the Spyder console below. (I redacted some paths from these files for security reasons.)

Best
Martin

22_8_22_bp_tw_code.txt
output_console.txt

AttributeError when cleaning simulated data

Hi,

I am just starting to use the package with simulated data from the example code, but I cannot get beyond cleaning. When I simulate data using the example and the example cleaning parameters, I get the following error:

>>>bdf=bdf.clean(clean_params)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how='returners')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ssb/bruker/sml/.local/lib/python3.6/site-packages/bipartitepandas/bipartitelong.py", line 100, in clean
    frame = super().clean(params)
  File "/ssb/bruker/sml/.local/lib/python3.6/site-packages/bipartitepandas/bipartitelongbase.py", line 154, in clean
    frame = frame._drop_returns(how=drop_returns, is_sorted=True, reset_index=True, copy=False)
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute '_drop_returns'
(4 lines skipped)

When I instead load the fake dataset from leed2way, the error occurs one step earlier: AttributeError: 'DataFrame' object has no attribute 'gen_m'.
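The traceback shows a method being looked up on a plain pandas DataFrame rather than on the package's own class. Whether the following is the cause here is not established, but one general pandas mechanism that produces this kind of error is a DataFrame subclass whose operations return plain DataFrames because the `_constructor` property is not picked up. The sketch below uses two hypothetical classes (`NoConstructor`, `WithConstructor`) and a made-up method name; it does not reproduce bipartitepandas code.

```python
import pandas as pd

class NoConstructor(pd.DataFrame):
    """Subclass that does not override _constructor (hypothetical example)."""
    def _drop_returns_demo(self):
        return self

class WithConstructor(pd.DataFrame):
    """Subclass that tells pandas to rebuild results as the same class."""
    @property
    def _constructor(self):
        return WithConstructor

    def _drop_returns_demo(self):
        return self

data = {"i": [2, 1], "j": [0, 1]}

# sort_values builds a new object; its class depends on _constructor
lost = NoConstructor(data).sort_values("i")
kept = WithConstructor(data).sort_values("i")

lost_has_method = hasattr(lost, "_drop_returns_demo")
kept_has_method = hasattr(kept, "_drop_returns_demo")
print(type(lost).__name__, lost_has_method)
print(type(kept).__name__, kept_has_method)
```

Since the paths in the traceback point to Python 3.6, comparing the installed Python and pandas versions against a current environment would be a reasonable first check.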

Better error messages when constructing data sets

If one of the 4 required columns is missing or has the wrong type, we should provide a message that says exactly what the issue is. Right now, it just throws a wrong-type error if you name the wage column w instead of y.

I think we should also add a construction method that simply takes the 4 required columns as input arguments, i.e.

Bipartite(i=I, j=J, t=T, y=Y)

where I, J, T and Y are simply NumPy vectors.
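A minimal sketch of what this could look like, under stated assumptions: the function names `check_required_columns` and `from_vectors` are hypothetical, the column meanings in `REQUIRED` are inferred from this page, and the only type check shown is that y is numeric. It returns a plain pandas DataFrame rather than the package's own class.

```python
import numpy as np
import pandas as pd

# Assumed meanings of the 4 required columns (labels are illustrative)
REQUIRED = {"i": "worker id", "j": "firm id", "t": "time period", "y": "wage"}

def check_required_columns(df):
    """Return a list of readable problems; the list is empty if none are found."""
    problems = []
    for col, meaning in REQUIRED.items():
        if col not in df.columns:
            problems.append(f"missing required column '{col}' ({meaning})")
        elif col == "y" and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"column 'y' must be numeric, got dtype {df[col].dtype}")
    return problems

def from_vectors(i, j, t, y):
    """Build a frame from 4 equal-length vectors, failing with a precise message."""
    arrays = {"i": np.asarray(i), "j": np.asarray(j),
              "t": np.asarray(t), "y": np.asarray(y)}
    lengths = {name: len(values) for name, values in arrays.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"input vectors differ in length: {lengths}")
    df = pd.DataFrame(arrays)
    problems = check_required_columns(df)
    if problems:
        raise ValueError("; ".join(problems))
    return df

# Wage column named w instead of y: the message names the missing column
bad = pd.DataFrame({"i": [1, 2], "j": [1, 1], "t": [1, 1], "w": [3.0, 4.0]})
bad_problems = check_required_columns(bad)
good = from_vectors([1, 2], [1, 1], [1, 1], [3.0, 4.0])
```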

Group observations based on windows of time

Add method to group observations based on windows of time.

This should work for long format, and maybe collapsed long format.

This could look like .windows(length, start=None, end=None).

This will group observations together over windows spanning length periods.

If start and end are None, it will specify these windows based on when the worker worked.

If start and/or end are specified, drop observations that are from before start or after end. Then, share the windows between all workers. It will need to handle cases where workers don't have observations for some windows - either drop those workers, fill in their missing data, or leave their data as-is.
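The proposal above can be sketched on a long-format pandas DataFrame as follows. This is an illustration under assumptions, not a design decision: the function name `assign_windows` is hypothetical, t is assumed to be an integer period, end is treated as inclusive, and the sketch only labels each row with a window index. How to aggregate within a window and how to treat workers with missing windows are left open, as in the description.

```python
import pandas as pd

def assign_windows(df, length, start=None, end=None):
    """Add a 'window' column that groups periods into blocks of `length`."""
    keep = pd.Series(True, index=df.index)
    if start is not None:
        keep &= df["t"] >= start   # drop observations before start
    if end is not None:
        keep &= df["t"] <= end     # drop observations after end (inclusive, assumed)
    out = df.loc[keep]

    if start is None and end is None:
        # Worker-specific windows: count from each worker's first period
        origin = out.groupby("i")["t"].transform("min")
    else:
        # Shared windows: count from start, or from the earliest kept period
        origin = start if start is not None else out["t"].min()

    return out.assign(window=(out["t"] - origin) // length)

df = pd.DataFrame({
    "i": [1, 1, 1, 2, 2],
    "j": [10, 10, 11, 20, 20],
    "t": [2000, 2001, 2002, 2001, 2003],
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
})
per_worker = assign_windows(df, length=2)
shared = assign_windows(df, length=2, start=2001)
```

A subsequent `groupby(["i", "j", "window"])` would then be one way to collapse observations within each window.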

Partialling out controls

Add option to partial out controls.

This is probably only feasible for long format data.

General idea: add method .partial(outcome, continuous_controls, categorical_controls, controls_to_partial). Then it runs the regression outcome = controls_not_to_partial @ beta_1 + controls_to_partial @ beta_2. Then update outcome to become outcome - controls_to_partial @ beta_2.

If the columns i and j are included, I think the easiest way to run the regression would be to use the class FEControlEstimator - but to make the method more flexible will require specifying the regression inside the method.
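For the simple case without worker or firm fixed effects, the general idea can be sketched with ordinary least squares in NumPy. The function name `partial_out` and its arguments are hypothetical, controls are assumed to be numeric arrays already (categorical controls would need to be expanded into dummies first), and this does not use FEControlEstimator.

```python
import numpy as np

def partial_out(outcome, controls_keep, controls_partial):
    """Regress outcome on all controls, then subtract the fitted
    contribution of controls_partial only."""
    y = np.asarray(outcome, dtype=float)
    X1 = np.asarray(controls_keep, dtype=float).reshape(len(y), -1)
    X2 = np.asarray(controls_partial, dtype=float).reshape(len(y), -1)

    # outcome = X1 @ beta_1 + X2 @ beta_2 + residual
    X = np.hstack([X1, X2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_2 = beta[X1.shape[1]:]

    # Updated outcome: outcome - controls_to_partial @ beta_2
    return y - X2 @ beta_2

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2          # exact linear relationship, no noise

X_keep = np.column_stack([np.ones(n), x1])  # intercept and x1 stay in the outcome
y_new = partial_out(y, X_keep, x2)          # the x2 contribution is removed
```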

Specify periods to keep

Add an option to force all observations to be in a particular set of periods.

If a worker doesn't have observations for those periods, either fill them in or drop the worker.
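A possible shape for this option, sketched on a long-format pandas DataFrame. The function name `keep_periods` and the `how` argument are hypothetical. The sketch assumes at most one row per worker and period (as after cleaning), and "filling in" here only inserts rows with missing values, since the issue does not say how the data should be filled.

```python
import pandas as pd

def keep_periods(df, periods, how="drop"):
    """Restrict df to `periods`; drop or fill workers who lack some of them."""
    periods = sorted(set(periods))
    out = df[df["t"].isin(periods)]

    if how == "drop":
        # Keep only workers observed in every requested period
        n_seen = out.groupby("i")["t"].transform("nunique")
        return out[n_seen == len(periods)]

    if how == "fill":
        # Insert a row for every (worker, period) pair; new rows hold NaN.
        # Requires unique (i, t) pairs, otherwise reindex raises an error.
        full = pd.MultiIndex.from_product(
            [out["i"].unique(), periods], names=["i", "t"]
        )
        return out.set_index(["i", "t"]).reindex(full).reset_index()

    raise ValueError(f"how must be 'drop' or 'fill', got {how!r}")

df = pd.DataFrame({
    "i": [1, 1, 2],
    "j": [10, 10, 20],
    "t": [2000, 2001, 2000],
    "y": [1.0, 2.0, 3.0],
})
dropped = keep_periods(df, [2000, 2001], how="drop")
filled = keep_periods(df, [2000, 2001], how="fill")
```

Note that under `how="fill"` a worker with no observation in any requested period still disappears, because there is no row left to start from.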
