policyengine / policyengine-core

Core microsimulation engine for PolicyEngine models. Forked from OpenFisca-Core.

Home Page: https://policyengine.github.io/policyengine-core

License: GNU Affero General Public License v3.0

Languages: Python 99.38%, Makefile 0.11%, HTML 0.51%

Topics: benefit, framework, microsimulation, tax

policyengine-core's People

Contributors

abhcs, alexiseidelman, anna-livia, anth-volk, benjello, benoit-cty, bonjourmauko, bouvard, bouxtehouve, br3nda, cbenz, clems, dependabot-preview[bot], dependabot-support, eraviart, fpagnoux, guillett, haekadi, jsantoul, mattisg, maxghenis, michaelsnowden, morendil, nikhilwoodruff, pavelmakarchuk, pblayo, pigo86, ramparameswaran, sandcha, sarahdi

policyengine-core's Issues

Bug in Dataset constructor

When trying to instantiate a Dataset object (in the same way I was successfully doing before the model-name-change process was undertaken), I get the following error message:

Traceback (most recent call last):
.
.
.
  File "//Users/mrh/anaconda3/envs/policyengine-us/lib/python3.9/site-packages/policyengine_core/data/dataset.py", line 41, in __init__
    if not self.folder_path.exists():
AttributeError: 'str' object has no attribute 'exists'

The error message says it all: a string (folder_path) does not have an exists method.

If you are trying to check the existence of the folder_path, then try something like this:

if not os.path.exists(self.folder_path):
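
Since the traceback shows folder_path arriving as a plain str, another option is to convert it to a pathlib.Path once in the constructor, so that the existing .exists() call works unchanged. A minimal sketch, assuming a simplified stand-in class (DatasetSketch and folder_exists are hypothetical names, not the real policyengine_core Dataset API):

```python
from pathlib import Path
from typing import Union


class DatasetSketch:
    """Hypothetical stand-in for a dataset class; not the real Dataset API."""

    def __init__(self, folder_path: Union[str, Path]):
        # Accept either a str or a Path and normalise to Path once.
        self.folder_path = Path(folder_path)

    def folder_exists(self) -> bool:
        # Path.exists() is a method, so this works for both input types.
        return self.folder_path.exists()
```

Normalising at the boundary keeps the rest of the class free of str-versus-Path checks.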

Add type hints to all variables

Type hints, as well as being good practice, enable IDEs like Visual Studio Code to provide relevant information about properties at the time of writing. We should annotate all non-inferable types in the codebase to aid this.

FutureWarning regarding bool checking in enum.py

The following warning is generated when running tests on GitHub:

policyengine_us/tests/microsimulation/test_microsim.py: 3 warnings
  /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/policyengine_core/enums/enum.py:74: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
    if isinstance(array == 0, bool):

Code:

if isinstance(array == 0, bool):
    array = array.astype(str)

Add custom time periods

We should allow country packages to define their own time periods. For example, the UK fiscal year begins on the 1st of April. I think the cleanest implementation would involve removing period-specific features from the Period class, and adding subclasses Day, Month, Year and Eternity that inherit from Period.
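
To make the proposal concrete, here is a rough sketch of what a generic Period with subclasses could look like. Everything here is an assumption for illustration: the start/end representation, the contains method, and the UKFiscalYear class are hypothetical and do not reflect Core's current Period implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Period:
    """Generic period: start date (inclusive) to end date (exclusive)."""

    start: date
    end: date

    def contains(self, day: date) -> bool:
        return self.start <= day < self.end


class Year(Period):
    """Calendar year starting on 1 January."""

    def __init__(self, year: int):
        super().__init__(date(year, 1, 1), date(year + 1, 1, 1))


class UKFiscalYear(Period):
    """Hypothetical country-defined period running from 1 April to 31 March."""

    def __init__(self, start_year: int):
        super().__init__(date(start_year, 4, 1), date(start_year + 1, 4, 1))
```

With this shape, a country package would only need to subclass Period and supply its own start and end dates.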

Support scale parameter `.calc()` based on `threshold_variable` in parameter file

If a scale parameter has a threshold_variable metadata field (#83), we could automatically calculate scale parameters.

For example, if the tax.mtr_schedule parameter had a threshold_variable: taxable_income metadata field, we could condense this formula:

taxable_income = tax_unit("taxable_income", period)
tax = parameters(period).tax.mtr_schedule.calc(taxable_income)

into this:

tax = parameters(period).tax.mtr_schedule.calc()

This further condenses the proposal in #84.
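
As a rough illustration of the idea, the sketch below shows a scale whose calc() falls back to its threshold_variable when no base is passed. This is a scalar, pure-Python toy under stated assumptions: MarginalRateScale and the lookup callable are hypothetical, and how Core would actually resolve the entity and period is left open.

```python
from typing import Callable, Optional, Sequence


class MarginalRateScale:
    """Hypothetical scale with an optional threshold_variable; not Core's API."""

    def __init__(
        self,
        thresholds: Sequence[float],
        rates: Sequence[float],
        threshold_variable: Optional[str] = None,
    ):
        self.thresholds = list(thresholds)
        self.rates = list(rates)
        self.threshold_variable = threshold_variable

    def calc(
        self,
        base: Optional[float] = None,
        lookup: Optional[Callable[[str], float]] = None,
    ) -> float:
        if base is None:
            # Fall back to the variable named in the parameter metadata.
            if self.threshold_variable is None or lookup is None:
                raise ValueError("No base given and no threshold_variable to use")
            base = lookup(self.threshold_variable)
        # Each bracket taxes the slice of base between its threshold and the next.
        uppers = self.thresholds[1:] + [float("inf")]
        tax = 0.0
        for lower, upper, rate in zip(self.thresholds, uppers, self.rates):
            tax += rate * max(0.0, min(base, upper) - lower)
        return tax
```

Here lookup stands in for whatever mechanism would fetch the variable's value, e.g. lookup=lambda name: values[name].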

Assume `input: {}` in yaml tests

Currently, yaml tests without an input throw a cryptic error message:

AttributeError: 'NoneType' object has no attribute 'items'

We could throw a more informative error message, but I think assuming input: {} when no input is provided will be easier.
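
A minimal sketch of the defaulting behaviour, assuming each test has already been parsed from YAML into a dict (get_test_inputs is a hypothetical helper name, not an existing Core function):

```python
from typing import Any, Dict


def get_test_inputs(test: Dict[str, Any]) -> Dict[str, Any]:
    """Return the test's inputs, treating a missing or empty `input` as {}."""
    # A bare `input:` key with no value parses to None in YAML,
    # so `or {}` covers both the missing-key and the None case.
    return test.get("input") or {}
```

Callers can then iterate over the result with .items() without a None check.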

Add venv to .gitignore

Some users may want to create a venv (and we may want to suggest this), so ignoring its contents will avoid git issues there.

For posterity, the steps would be something like:

pyenv shell 3.9.16
python -m venv venv

cc @ibushong

Add microsimulation interface

The Simulation class in Core is useful as a base class, but there is a special case of Simulation which deserves its own interface: simulations on weighted survey microdata. We have an implementation in OpenFisca-Tools which we should port over to Core.

Pre-install steps

Upgrading wheel, setuptools, and pip before installing other requirements can avert some environment issues. We can do this with a separate requirements.txt file that does something like pip install -U wheel setuptools pip (but with pins), and by adding something like make setup as a pre-install step.

Thanks @ibushong for the tip here!

Support `adds/subtracts` with empty parameter list

For example, suppose we're using adds to create a variable representing a state's refundable credits based on a policy parameter, and the state only has one refundable credit in some years, zero in others. It currently breaks when trying to sum from the empty list. @PavelMakarchuk
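
One way to handle the empty case is to start the sum from an explicit array of zeros rather than from the first listed variable. A minimal sketch, using plain lists in place of the arrays Core works with (add_variables and its arguments are hypothetical names for illustration):

```python
from typing import Dict, List, Sequence


def add_variables(
    values: Dict[str, List[float]], names: Sequence[str], count: int
) -> List[float]:
    """Sum the named variables elementwise; an empty `names` yields zeros."""
    # Starting from zeros means there is always something to return,
    # even when no variables are listed for the period.
    total = [0.0] * count
    for name in names:
        total = [t + v for t, v in zip(total, values[name])]
    return total
```

With this shape, a year in which the state has zero refundable credits simply produces a credit of 0 for every entity.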

Compiled formulas

Magic methods are what enable machine learning libraries like TensorFlow or PyTorch to compile Python expressions of Tensors into optimised instructions. Can we do the same with PolicyEngine country packages?

`threshold_variable` metadata field specifying the variable that parameter scale thresholds are based on

To enable this:

description: NYC provides a household credit of this fixed amount to single filers, based on recomputed federal AGI.
metadata:
  label: NYC Household Credit Fixed Amount for single filers
  type: single_amount
  threshold_unit: currency-USD
  amount_unit: currency-USD
  threshold_variable: adjusted_gross_income # PROPOSAL
  reference:
    - title: Instructions for Form IT-201
      href: https://www.tax.ny.gov/pdf/2022/printable-pdfs/inc/it201i-2022.pdf#page=17
values:
  - threshold:
      2022-01-01: 10_000
    amount:
      2022-01-01: 15
  - threshold:
      2022-01-01: 12_500
    amount:
      2022-01-01: 10
  - threshold:
      2022-01-01: 12_501
    amount:
      2022-01-01: 0

cc @leogoldman

CLI datasets list error `no attribute 'years'`

Following these instructions: https://policyengine.github.io/policyengine-core/usage/cli.html#data

(policyengine) maxghenis@MacBook-Air-3 policyengine-us % policyengine-core data datasets list -c policyengine_us
Traceback (most recent call last):
  File "/Users/maxghenis/miniconda3/envs/policyengine/bin/policyengine-core", line 8, in <module>
    sys.exit(main())
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/policyengine_command.py", line 128, in main
    return sys.exit(main(parser))
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/run_data.py", line 36, in main
    print(dataset_summary(datasets))
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/run_data.py", line 12, in dataset_summary
    years = list(sorted(list(set(sum([ds.years for ds in datasets], [])))))
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/run_data.py", line 12, in <listcomp>
    years = list(sorted(list(set(sum([ds.years for ds in datasets], [])))))
AttributeError: type object 'CPS_2020' has no attribute 'years'
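
Until every dataset class defines years, the summary could tolerate its absence. The sketch below is a defensive variant of the failing line, not the actual fix adopted in Core; collect_years is a hypothetical helper, and whether silently skipping such datasets is the right behaviour is a call for the maintainers.

```python
from typing import Iterable, List


def collect_years(datasets: Iterable[type]) -> List[int]:
    """Gather the sorted, de-duplicated years across dataset classes.

    Datasets without a `years` attribute contribute nothing instead of
    raising AttributeError.
    """
    years = set()
    for dataset in datasets:
        years.update(getattr(dataset, "years", ()))
    return sorted(years)
```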

Support Python 3.10

Not high priority and IIUC it's blocked by a pytest issue (#32), but dropping here for the future

Bug in Dataset method save_dataset()

TWO THINGS: what seems to be a bug and a feature request:

BUG:
In the Dataset class, exists is a property, not a method.
So should the error-causing line instead be the following?

with h5py.File(file, "a" if file.exists else "w") as f:

ERROR MESSAGE:

vdset.save_dataset(vdset)
  File "//Users/mrh/anaconda3/envs/policyengine-us/lib/python3.9/site-packages/policyengine_core/data/dataset.py", line 172, in save_dataset
    with h5py.File(file, "a" if file.exists() else "w") as f:
AttributeError: 'str' object has no attribute 'exists'

FEATURE REQUEST:
Also, could you add an optional argument to the save_dataset method that allows use of a file path that differs from self.file_path?
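
Note that the traceback reports a 'str' object, so file.exists would fail on a plain string just as file.exists() does; converting to pathlib.Path first sidesteps both. The sketch below combines that with the requested optional path argument. resolve_save_target is a hypothetical helper that only works out the target path and the file mode, leaving the h5py call itself out.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

PathLike = Union[str, Path]


def resolve_save_target(
    default_path: PathLike, file_path: Optional[PathLike] = None
) -> Tuple[Path, str]:
    """Return the path to save to and the file mode to open it with."""
    # Use the override when given, otherwise fall back to the default path.
    target = Path(file_path if file_path is not None else default_path)
    # "a" appends to an existing file; "w" creates a new one.
    mode = "a" if target.exists() else "w"
    return target, mode
```

The returned pair could then be passed straight to the file-opening call, e.g. h5py.File(target, mode).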

Reform impact explanations

A common task analysts need to carry out is to understand why reforms impact particular subsections of society. It would be really helpful to have a chatbot (e.g. Streamlit-based) that can answer questions about a reform, for example:

Pull out a household that benefits from the reform in the code box above.

OK, I picked household #2230.

Explain why they benefited?

Their net income increased by $X. It looks like this was because of their benefits, which in turn was because of their SNAP.

Slim down package

Looking through the package contents, there's a lot of legacy code that is unused in the latest version and exists as a result of previous evolutions of the codebase. One example is the Scenario interface, which was succeeded by SimulationBuilder. I think removing these features will make the package more maintainable and help to reduce bugs.

Add branched simulations

We're seeing a common pattern in modelling legislation, where rules take the form:

variable X = what you would get if parameter Y were instead Z

Inside variable formulas, we should be able to 'branch off' a new simulation, modifying parameters and calculating variables. We already have an implementation in OpenFisca-Tools we can port over, though we should investigate whether there's a cleaner way now that we can edit Core directly in this fork.
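
To illustrate the pattern, here is a toy sketch of branching inside a formula. It is not the OpenFisca-Tools implementation: SimulationSketch, get_branch and the formula names are all hypothetical, and periods and entities are left out entirely.

```python
from typing import Callable, Dict


class SimulationSketch:
    """Hypothetical, heavily simplified simulation; not Core's Simulation API."""

    def __init__(
        self,
        parameters: Dict[str, float],
        inputs: Dict[str, float],
        formulas: Dict[str, Callable[["SimulationSketch"], float]],
    ):
        self.parameters = dict(parameters)
        self.inputs = dict(inputs)
        self.formulas = formulas
        self._cache: Dict[str, float] = {}

    def calculate(self, name: str) -> float:
        if name in self.inputs:
            return self.inputs[name]
        if name not in self._cache:
            self._cache[name] = self.formulas[name](self)
        return self._cache[name]

    def get_branch(self, **parameter_overrides: float) -> "SimulationSketch":
        # A new object with its own cache, so results computed under the
        # modified parameters never leak back into the parent simulation.
        return SimulationSketch(
            {**self.parameters, **parameter_overrides}, self.inputs, self.formulas
        )


def income_tax(sim: SimulationSketch) -> float:
    return sim.parameters["rate"] * sim.calculate("income")


def income_tax_at_half_rate(sim: SimulationSketch) -> float:
    # Variable X = what income_tax would be if `rate` were instead 0.5.
    return sim.get_branch(rate=0.5).calculate("income_tax")


FORMULAS = {
    "income_tax": income_tax,
    "income_tax_at_half_rate": income_tax_at_half_rate,
}
```

The key design point is that the branch shares inputs with its parent but has separate parameters and a separate cache.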

Formalise references

Currently, references are a union of different formats: strings, tuples and lists. We should define a concrete Reference class, with a defined JSON translation, to make detailed references machine-friendly.
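
A possible starting point, sketched under assumptions: the title and href fields mirror the reference entries used in parameter YAML files, while the class shape and the to_json/from_json methods are hypothetical and not an agreed design.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    """Hypothetical concrete reference with a fixed JSON representation."""

    title: str
    href: Optional[str] = None

    def to_json(self) -> str:
        # sort_keys gives a stable, machine-friendly serialisation.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Reference":
        return cls(**json.loads(text))
```

Converters from the existing string, tuple and list formats could then be added as further classmethods.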

Soft definition periods

Definition periods in PolicyEngine Core are fixed: each variable must have exactly one time period for which it can have distinct values. Policy in country models often involves mid-year parameter changes or other cases where more flexible time period management would be useful.

Core has some shortcuts for this (e.g. set_input_divide, options=[ADD, DIVIDE]), but a more elegant solution that reduces the development burden on country models would be to fold all of this into more autonomous Simulation behaviour. For example, if I say my employment income is £30,000 per year and ask for my income tax in March, we shouldn't need any country-specific logic describing how the time periods interact. The only thing we need to specify at the country level is whether each variable is a stock or a flow, because each implies different correct behaviour:

  • wealth in 2022 is 450k => wealth in 2022-01 is 450k
  • income in 2022 is 30k => income in 2022-01 is 2.5k
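
The two rules above can be sketched as a single conversion function. This is only an illustration of the stock/flow distinction: the Quantity enum and year_to_month are hypothetical names, and the even twelve-way split for flows is an assumption.

```python
from enum import Enum


class Quantity(Enum):
    STOCK = "stock"  # a level at a point in time, e.g. wealth
    FLOW = "flow"    # an amount accumulated over a period, e.g. income


def year_to_month(annual_value: float, quantity: Quantity) -> float:
    """Convert a yearly value to the value for one month of that year."""
    if quantity is Quantity.STOCK:
        # A stock holds the same level in every sub-period.
        return annual_value
    # A flow is split evenly across the twelve months.
    return annual_value / 12
```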
