policyengine / policyengine-core



Core microsimulation engine for PolicyEngine models. Forked from OpenFisca-Core.

Home Page: https://policyengine.github.io/policyengine-core

License: GNU Affero General Public License v3.0

Languages: Python 99.38%, Makefile 0.11%, HTML 0.51%

Topics: benefit, framework, microsimulation, tax

policyengine-core's Issues

Bug in Dataset constructor

When trying to instantiate a Dataset object (in the same way I was successfully doing before the model-name-change process was undertaken), I get the following error message:

Traceback (most recent call last):
.
.
.
  File "//Users/mrh/anaconda3/envs/policyengine-us/lib/python3.9/site-packages/policyengine_core/data/dataset.py", line 41, in __init__
    if not self.folder_path.exists():
AttributeError: 'str' object has no attribute 'exists'

The error message says it all: a string (folder_path) does not have an exists method.

If you are trying to check the existence of the folder_path, then try something like this:

if not os.path.exists(self.folder_path):
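The traceback shows that `folder_path` arrives as a `str`. An alternative to the `os.path.exists` suggestion above is to coerce the argument to a `pathlib.Path`, so `.exists()` works whichever type the caller passes. This is a minimal sketch: `DatasetSketch` is a hypothetical stand-in, not the real `policyengine_core` `Dataset`, and creating a missing folder is an assumption about the desired behaviour.

```python
import tempfile
from pathlib import Path
from typing import Union


class DatasetSketch:
    """Hypothetical stand-in for Dataset; only the path handling is shown."""

    def __init__(self, folder_path: Union[str, Path]):
        # Coerce to Path so .exists() works whether a str or a Path was passed.
        self.folder_path = Path(folder_path)
        if not self.folder_path.exists():
            # Assumption: a missing folder should be created rather than rejected.
            self.folder_path.mkdir(parents=True)


# Usage: a plain str, as in the traceback, no longer raises AttributeError.
with tempfile.TemporaryDirectory() as tmp:
    sketch = DatasetSketch(f"{tmp}/data")
    created = sketch.folder_path.exists()
```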

Update numpy to >=1.24.2

We are on this pin (policyengine-core/setup.py, line 14 at e41f19a):

"numpy>=1.21,<1.22",

openfisca-core is now on numpy >=1.24.2, <1.25. They updated to >=1.24.2 in openfisca/openfisca-core#1181 (which has many changes due to code reformatting), and limited to <1.25 in openfisca/openfisca-core#1184.

Add type hints to all variables

Type hints, as well as being good practice, enable IDEs like Visual Studio Code to provide relevant information about properties at the time of writing. We should annotate all non-inferable types in the codebase to aid this.

Support scale parameter that interpolates over a specified range rather than a rate

For example, PolicyEngine/policyengine-canada#313 (review) refers to a tax credit with a maximum amount that partially phases out over a dollar range, as does the Guaranteed Income for the 21st Century.
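One way such a parameter could be evaluated is linear interpolation between two points. A minimal sketch, assuming the parameter supplies a maximum amount, a floor it phases down to, and the income range over which it phases out; all four names are hypothetical, not a proposed YAML schema. `np.interp` holds the endpoint values outside the range, which gives the full amount below the range and the floor above it.

```python
import numpy as np


def phased_amount(income, max_amount, floor_amount, phase_out_start, phase_out_end):
    """Hypothetical helper: phase a credit down linearly over a dollar range."""
    # Below phase_out_start -> max_amount; above phase_out_end -> floor_amount;
    # in between, a straight line joining the two points.
    return np.interp(
        income,
        [phase_out_start, phase_out_end],
        [max_amount, floor_amount],
    )


amounts = phased_amount(
    np.array([0, 15_000, 20_000, 30_000]),
    max_amount=1_000,
    floor_amount=200,
    phase_out_start=10_000,
    phase_out_end=20_000,
)
```

Setting `floor_amount=0` gives a full phase-out, so a single parameter type could cover both cases.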

Remove `aggr`

We can use `add` instead.

Throw descriptive error when `adds` and `subtracts` encounter non-existent variable names

cc @MaxGhenis
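A minimal sketch of the check, assuming the set of defined variable names is available when `adds`/`subtracts` are resolved. `check_component_names` is hypothetical, not an existing Core function; `difflib.get_close_matches` supplies a "did you mean" hint for likely typos.

```python
import difflib
from typing import Iterable


def check_component_names(variable: str, components: Iterable[str], known: Iterable[str]) -> None:
    """Hypothetical check: raise a descriptive error for unknown adds/subtracts entries."""
    known_set = set(known)
    known_sorted = sorted(known_set)  # deterministic order for suggestions
    problems = []
    for name in components:
        if name in known_set:
            continue
        # Suggest the closest existing variable name, if one is similar enough.
        guess = difflib.get_close_matches(name, known_sorted, n=1)
        hint = f" (did you mean '{guess[0]}'?)" if guess else ""
        problems.append(f"'{name}'{hint}")
    if problems:
        raise ValueError(
            f"Variable '{variable}' refers to unknown variables in adds/subtracts: "
            + ", ".join(problems)
        )
```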

FutureWarning regarding bool checking in enum.py

The following warning is generated when running tests on GitHub:

policyengine_us/tests/microsimulation/test_microsim.py: 3 warnings
  /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/policyengine_core/enums/enum.py:74: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
    if isinstance(array == 0, bool):

Code (policyengine-core/policyengine_core/enums/enum.py, lines 74 to 75 at e41f19a):

if isinstance(array == 0, bool):
    array = array.astype(str)
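The comparison `array == 0` appears to be used only to detect string arrays: for those, the elementwise comparison fails and a scalar is returned. The warning itself says future NumPy versions will compare elementwise, at which point the `isinstance` check would stop detecting string arrays. Checking the dtype avoids the comparison and the warning. A minimal sketch, assuming unicode (`"U"`) and bytes (`"S"`) arrays are the only cases lines 74 to 75 are meant to catch; both function names are hypothetical.

```python
import numpy as np


def is_string_array(array: np.ndarray) -> bool:
    # dtype kind "U" is unicode text and "S" is bytes; no comparison, so no FutureWarning.
    return array.dtype.kind in ("U", "S")


def normalise_strings(array: np.ndarray) -> np.ndarray:
    # Same effect as lines 74 to 75, keyed on dtype instead of a failed comparison.
    if is_string_array(array):
        array = array.astype(str)
    return array
```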

Add custom time periods

We should allow country packages to define their own time periods. For example, the UK fiscal year begins on 1 April. I think the cleanest implementation would remove the period-specific features from the Period class and add Day, Month, Year and Eternity as subclasses that inherit from Period.
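A minimal sketch of that hierarchy, assuming periods are half-open date intervals. Only `Year` is shown; `Day`, `Month` and `Eternity` would follow the same pattern. `UKFiscalYear` and the `start_month` attribute are hypothetical illustrations of a country-defined period, not existing Core classes.

```python
from datetime import date


class Period:
    """Hypothetical base class: the half-open interval [start, stop)."""

    def __init__(self, start: date):
        self.start = start

    @property
    def stop(self) -> date:
        raise NotImplementedError

    def __contains__(self, day: date) -> bool:
        return self.start <= day < self.stop


class Year(Period):
    # Calendar year by default; a country package can subclass and shift the start.
    start_month = 1

    def __init__(self, year: int):
        super().__init__(date(year, self.start_month, 1))

    @property
    def stop(self) -> date:
        return date(self.start.year + 1, self.start.month, 1)


class UKFiscalYear(Year):
    # Defined by a country package: the fiscal year starts on 1 April.
    start_month = 4
```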

New changes break parametric reforms in US model

PRs #127 and/or #128 fixed bugs that affected UK microsimulation accuracy with respect to uprating and reforms, but caused the US one to break entirely (PolicyEngine/policyengine-us#3311).

Support Python 3.11

This is now the latest stable version of Python

Test that .pdf references in parameters have a page number

I'd like CI to enforce that every PDF reference in parameters includes a page number via a .pdf#page=x URL.
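A minimal sketch of such a check, assuming references are available as `href` strings. `pdf_reference_has_page` is hypothetical; a CI job would call it on every parameter reference and fail on any `False`.

```python
import re
from urllib.parse import urlparse

# Matches "page=N" as the whole fragment or as one "&"-separated part of it.
PAGE_FRAGMENT = re.compile(r"(?:^|&)page=\d+(?:&|$)")


def pdf_reference_has_page(href: str) -> bool:
    """Hypothetical CI check: PDF links must carry a #page=N fragment."""
    parsed = urlparse(href)
    if not parsed.path.lower().endswith(".pdf"):
        return True  # only PDF references are checked
    return bool(PAGE_FRAGMENT.search(parsed.fragment))
```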

Bug when using `map_to` in `Microsimulation.calculate_dataframe`

Doesn't occur when reinitialising from a new MicroDataFrame. Found in this notebook.

Support scale parameter `.calc()` based on `threshold_variable` in parameter file

If a scale parameter has a threshold_variable metadata field (#83), we could automatically calculate scale parameters.

For example, if the tax.mtr_schedule parameter had a threshold_variable: taxable_income metadata field, we could condense this formula:

taxable_income = tax_unit("taxable_income", period)
tax = parameters(period).tax.mtr_schedule.calc(taxable_income)

into this:

tax = parameters(period).tax.mtr_schedule.calc()

This further condenses the proposal in #84.
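A minimal sketch of the fallback, assuming the scale can look up variable values for the current entity and period. `MarginalRateScaleSketch` and its `lookup` callable (standing in for `tax_unit(name, period)`) are hypothetical; Core's real scale parameter classes are not modelled here.

```python
from typing import Callable, Optional, Sequence

import numpy as np


class MarginalRateScaleSketch:
    """Hypothetical stand-in for a marginal rate scale parameter."""

    def __init__(
        self,
        thresholds: Sequence[float],
        rates: Sequence[float],
        threshold_variable: Optional[str] = None,
        lookup: Optional[Callable[[str], np.ndarray]] = None,
    ):
        self.thresholds = np.asarray(thresholds, dtype=float)
        self.rates = np.asarray(rates, dtype=float)
        # From the proposed `threshold_variable` metadata field.
        self.threshold_variable = threshold_variable
        # Stand-in for tax_unit(name, period) inside a formula.
        self.lookup = lookup

    def calc(self, base: Optional[np.ndarray] = None) -> np.ndarray:
        if base is None:
            if self.threshold_variable is None or self.lookup is None:
                raise ValueError("No base given and no threshold_variable to fall back on")
            base = self.lookup(self.threshold_variable)
        base = np.atleast_1d(np.asarray(base, dtype=float))
        # Each bracket runs from its threshold to the next; the last is unbounded.
        uppers = np.append(self.thresholds[1:], np.inf)
        # Income falling inside each bracket, shape (n, number of brackets).
        in_bracket = np.clip(base[:, None] - self.thresholds, 0, uppers - self.thresholds)
        return in_bracket @ self.rates
```

Passing an explicit base still works, so existing `.calc(taxable_income)` calls would be unaffected.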

Assume `input: {}` in yaml tests

Currently, yaml tests without an input throw a cryptic error message:

AttributeError: 'NoneType' object has no attribute 'items'

We could throw a more informative error message, but I think assuming input: {} when no input is provided will be easier.
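A minimal sketch, assuming each YAML test case has been parsed into a dictionary before its inputs are read. `get_test_inputs` is a hypothetical helper, not the name of Core's test runner function.

```python
from typing import Any, Dict


def get_test_inputs(test_case: Dict[str, Any]) -> Dict[str, Any]:
    """Return a YAML test's inputs, treating a missing or empty `input` as {}."""
    # A missing key and a bare `input:` line (which YAML parses as None) both
    # become an empty dict, so a later `.items()` call no longer fails.
    return test_case.get("input") or {}
```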

Subtracts function does not support adds of a list parameter

Currently, the model raises an error.

Add venv to .gitignore

Some users may want to make a venv (and we may want to suggest this), so ignoring its contents will avoid git issues.

For posterity, the steps would be something like:

pyenv shell 3.9.16 
python -m venv venv

cc @ibushong

Test for valid yaml in CI

For example, PolicyEngine/policyengine-us#3409 introduced invalid parameter YAML (see PolicyEngine/policyengine-us#3417) but didn't break CI.

The API did catch it though: https://github.com/PolicyEngine/policyengine-api/actions/runs/7162451862/job/19499442307?pr=964

Replace `adds = "gov.x"` with `adds = "gov['x']"` to address keyword issues

For example see PolicyEngine/policyengine-us#2815 (comment), where

adds = "gov.states.in.tax.income.credits.refundable"

breaks because `in` is a Python keyword.

Until paths containing these keywords are supported in general, we could rewrite every .x as ['x'] when parsing paths in adds.
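Two options are sketched below, assuming the `adds` path is currently evaluated as Python source (which would explain why `in` breaks it). `to_subscript_path` performs the rewrite proposed in the title; `resolve_path` instead walks the path with `getattr`, which accepts keyword names and avoids evaluating strings altogether. Both helpers are hypothetical, and `SimpleNamespace` stands in for the real parameter tree.

```python
from functools import reduce
from types import SimpleNamespace


def to_subscript_path(path: str) -> str:
    # "gov.states.in.tax" -> "gov['states']['in']['tax']"
    root, *keys = path.split(".")
    return root + "".join(f"['{key}']" for key in keys)


def resolve_path(root: object, path: str) -> object:
    # Drop the leading root name ("gov"), then apply getattr key by key.
    _, *keys = path.split(".")
    return reduce(getattr, keys, root)


# "in" cannot follow a dot in source code, but getattr handles it fine.
gov = SimpleNamespace(states=SimpleNamespace(**{"in": SimpleNamespace(rate=0.05)}))
```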

Fix Enum mapping issues

Currently, Enum variables often lose their encodings when mapping between entities.

Pin numpy==1.22.4

As a step toward #37, per #37 (comment)

Support scale parameter `.calc("threshold_variable")`

To condense this:

taxable_income = tax_unit("taxable_income", period)
tax = parameters(period).tax.mtr_schedule.calc(taxable_income)

to this:

tax = parameters(period).tax.mtr_schedule.calc("taxable_income")

Add microsimulation interface

The Simulation class in Core is useful as a base class, but there is a special case of Simulation which deserves its own interface: simulations on weighted survey microdata. We have an implementation in OpenFisca-Tools which we should port over to Core.

Pre-install steps

Upgrading wheel, setuptools, and pip before installing other requirements can avert some environment issues. We can do this with a separate requirements.txt file listing wheel, setuptools, and pip (with pins), upgraded via something like pip install -U wheel setuptools pip, and by adding something like make setup as a pre-install step.

Thanks @ibushong for the tip here!

Support `adds/subtracts` with empty parameter list

For example, suppose we're using adds to create a variable representing a state's refundable credits based on a policy parameter, and the state has only one refundable credit in some years and none in others. It currently breaks when trying to sum over the empty list. @PavelMakarchuk

Compiled formulas

Magic methods are what enable machine learning libraries like TensorFlow or PyTorch to compile Python expressions of Tensors into optimised instructions. Can we do the same with PolicyEngine country packages?

Drop Python 3.6 support

Python 3.6 is at end of life

Boolean variable naming convention

Boolean variable names should answer the question "Is Y X?", and this should be documented as best practice.

Remove `name` parameter metadata field

Support `adds = "float_or_int_policy_parameter"`

For example, PolicyEngine/policyengine-us#2306 involves a nh_total_exemptions variable which sums several individual exemption variables using adds=[], and one of those is a nh_base_exemption which is a flat amount. It would be simpler to avoid writing a formula just to return a parameter value.
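A minimal sketch of how `adds` entries could be summed if numeric parameter values were allowed alongside variable names. `sum_adds` and its `compute` callable (standing in for calculating a variable on the entity) are hypothetical. Starting from an array of zeros also means an empty list, as in the empty-parameter-list issue above, returns zeros instead of failing.

```python
from numbers import Number
from typing import Callable, Sequence, Union

import numpy as np


def sum_adds(
    items: Sequence[Union[str, float, int]],
    compute: Callable[[str], np.ndarray],
    size: int,
) -> np.ndarray:
    """Hypothetical helper: sum variables and flat parameter amounts."""
    # Zeros of the entity's length, so an empty list still yields a valid array.
    total = np.zeros(size)
    for item in items:
        if isinstance(item, Number):
            # A flat amount (e.g. a base exemption) applies to every entity.
            total = total + item
        else:
            total = total + compute(item)
    return total
```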

`threshold_variable` metadata field specifying the variable that parameter scale thresholds are based on

To enable this:

description: NYC provides a household credit of this fixed amount to single filers, based on recomputed federal AGI.
metadata:
  label: NYC Household Credit Fixed Amount for single filers
  type: single_amount
  threshold_unit: currency-USD
  amount_unit: currency-USD
  threshold_variable: adjusted_gross_income # PROPOSAL
  reference:
    - title: Instructions for Form IT-201
      href: https://www.tax.ny.gov/pdf/2022/printable-pdfs/inc/it201i-2022.pdf#page=17
values:
  - threshold:
      2022-01-01: 10_000
    amount:
      2022-01-01: 15
  - threshold:
      2022-01-01: 12_500
    amount:
      2022-01-01: 10
  - threshold:
      2022-01-01: 12_501
    amount:
      2022-01-01: 0

cc @leogoldman

Accept parameters in `adds` and `subtracts`

Like sum_of_variables does.

CLI datasets list error `no attribute 'years'`

Following these instructions: https://policyengine.github.io/policyengine-core/usage/cli.html#data

(policyengine) maxghenis@MacBook-Air-3 policyengine-us % policyengine-core data datasets list -c policyengine_us
Traceback (most recent call last):
  File "/Users/maxghenis/miniconda3/envs/policyengine/bin/policyengine-core", line 8, in <module>
    sys.exit(main())
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/policyengine_command.py", line 128, in main
    return sys.exit(main(parser))
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/run_data.py", line 36, in main
    print(dataset_summary(datasets))
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/run_data.py", line 12, in dataset_summary
    years = list(sorted(list(set(sum([ds.years for ds in datasets], [])))))
  File "/Users/maxghenis/miniconda3/envs/policyengine/lib/python3.9/site-packages/policyengine_core/scripts/run_data.py", line 12, in <listcomp>
    years = list(sorted(list(set(sum([ds.years for ds in datasets], [])))))
AttributeError: type object 'CPS_2020' has no attribute 'years'
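The failing line assumes every registered dataset has a `years` attribute, but `CPS_2020` does not. A minimal sketch of a more tolerant summary, assuming datasets without `years` should simply contribute none; the attribute name comes from the traceback, while the fallback behaviour and the `dataset_years` name are assumptions. Using a set also avoids the repeated list concatenation of `sum(..., [])`.

```python
from typing import Iterable, List


def dataset_years(datasets: Iterable[object]) -> List[int]:
    """Collect the years covered by any dataset, skipping ones without `years`."""
    years = set()
    for dataset in datasets:
        # getattr with a default avoids the AttributeError seen above.
        years.update(getattr(dataset, "years", None) or [])
    return sorted(years)
```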

Warn when a threshold is missing a rate or amount

To avoid issues like in this PR, which had a missing amount that went undiagnosed.
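A minimal sketch of the check, assuming brackets are read from YAML into dictionaries shaped like the `values` list in the NYC example above (a `threshold` plus a `rate` or `amount`). `warn_incomplete_brackets` is hypothetical; in CI the warning could be escalated to an error.

```python
import warnings
from typing import Any, Dict, List


def warn_incomplete_brackets(name: str, brackets: List[Dict[str, Any]]) -> List[int]:
    """Warn about brackets that have a threshold but neither a rate nor an amount."""
    incomplete = []
    for index, bracket in enumerate(brackets):
        if "threshold" in bracket and not (bracket.keys() & {"rate", "amount"}):
            warnings.warn(f"{name}: bracket {index} has a threshold but no rate or amount")
            incomplete.append(index)
    return incomplete
```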

Add check for variables having labels to CI

Implement defined-for logic

We should port the defined-for logic from OpenFisca-Tools, making improvements where we can.

Relax pytest version requirements

Per PolicyEngine/policyengine-uk#645, pytest version requirements are causing conflicts.

We restrict the version range in a way that mirrors openfisca-core. Can we uncap the version?

Support Python 3.10

Not high priority and IIUC it's blocked by a pytest issue (#32), but dropping here for the future

Deprecate `sum_of_variables`

Pending #68, which would bring the same functionality via adds

Pin all package versions

Bug in Dataset method save_dataset()

TWO THINGS: what seems to be a bug and a feature request:

BUG:
In the Dataset class, exists is a property, not a method.
So, should the error-causing line be:

with h5py.File(file, "a" if file.exists else "w") as f:

ERROR MESSAGE:

vdset.save_dataset(vdset)
  File "//Users/mrh/anaconda3/envs/policyengine-us/lib/python3.9/site-packages/policyengine_core/data/dataset.py", line 172, in save_dataset
    with h5py.File(file, "a" if file.exists() else "w") as f:
AttributeError: 'str' object has no attribute 'exists'

FEATURE REQUEST:
Also, could you add an optional argument to the save_dataset method that allows use of a file path that differs from self.file_path?
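The traceback shows that `file` is a `str`, so `file.exists` without parentheses would raise the same `AttributeError`; converting the path to a `pathlib.Path` addresses the bug. A minimal sketch combining that with the requested optional argument, where `resolve_save_target` is a hypothetical helper and `file_path` is the proposed new parameter:

```python
from pathlib import Path
from typing import Optional, Tuple, Union

PathLike = Union[str, Path]


def resolve_save_target(default_path: PathLike, file_path: Optional[PathLike] = None) -> Tuple[Path, str]:
    """Hypothetical helper for save_dataset: choose the file and the h5py mode."""
    # Use the caller's path if given (the requested feature), else the dataset's own.
    target = Path(file_path if file_path is not None else default_path)
    # Append to an existing file, otherwise create it; .exists() is valid on a Path.
    mode = "a" if target.exists() else "w"
    return target, mode
```

Inside `save_dataset`, the result would then be used as `with h5py.File(str(target), mode) as f:`.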

Avoid recomputing unnecessary subtrees

Based on the cached baseline

cc @MichaelSnowden @ibushong

Reform impact explanations

A common task for analysts is understanding why reforms affect particular subsections of society. It would be really helpful for a chatbot (e.g. one built with Streamlit) to be able to answer questions about a reform, for example:

Pull out a household that benefits from the reform in the code box above.

OK, I picked household #2230.

Explain why they benefited?

Their net income increased by $X. It looks like this was because of their benefits, which in turn was because of their SNAP.

Slim down package

Looking through the package contents, there's a lot of legacy code that is unused in the latest version and exists as a result of previous evolutions of the codebase. For example, the Scenario interface that was succeeded by SimulationBuilder. I think removing these features will make the package more maintainable and help to reduce bugs.

YAML tests don't fill `Simulation.input_variables`

The property is empty but should contain all variables which were specified in the test.

Add branched simulations

We're seeing a common pattern in modelling legislation: rules that take the form:

variable X = what you would get if parameter Y were instead Z

Inside variable formulas, we should be able to 'branch off' a new simulation, modifying parameters and calculating variables. We already have an implementation in OpenFisca Tools we can port over, though we should investigate to see if there's a cleaner way now that we can edit Core directly in this fork.
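A minimal sketch of the idea, assuming parameters can be represented as a nested dictionary and a formula as a function of them; Core's parameter tree and `Simulation` API are not modelled, and `branch` is hypothetical. The branch works on a deep copy, so the original parameters are left untouched.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, Sequence


def branch(
    parameters: Dict[str, Any],
    path: Sequence[str],
    new_value: Any,
    formula: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Evaluate `formula` as if the parameter at `path` were `new_value`."""
    branched = deepcopy(parameters)
    node = branched
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = new_value
    return formula(branched)


def credit(params: Dict[str, Any]) -> float:
    return 100 * params["gov"]["credit"]["rate"]


parameters = {"gov": {"credit": {"rate": 0.2}}}
# "What you would get if the rate were instead 0.5"
counterfactual = branch(parameters, ["gov", "credit", "rate"], 0.5, credit)
```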

Propagate references but not labels

References often apply to children, but labels do not. Propagation should distinguish between the two.

Formalise references

Currently, references are a union of different formats: strings, tuples and lists. We should define a concrete Reference class, with a defined JSON translation, to make detailed references machine-friendly.
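A minimal sketch, assuming the dictionary form used in parameter YAML (`title` and `href`, as in the NYC example above) becomes the canonical one. How existing strings and tuples map onto it is an assumption: here a string becomes a title and a tuple a `(title, href)` pair. `Reference` and `normalise_references` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Reference:
    """Hypothetical concrete reference type with a fixed JSON shape."""

    title: str
    href: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def normalise_references(raw: Any) -> List[Reference]:
    """Convert the mixed formats into a list of Reference objects."""
    if raw is None:
        return []
    if isinstance(raw, list):
        # A list holds several references, each normalised in turn.
        return [ref for item in raw for ref in normalise_references(item)]
    if isinstance(raw, dict):
        return [Reference(title=raw["title"], href=raw.get("href"))]
    if isinstance(raw, tuple):
        # Assumption: a tuple is a (title, href) pair.
        title, href = raw
        return [Reference(title=title, href=href)]
    if isinstance(raw, str):
        return [Reference(title=raw)]
    raise TypeError(f"Unsupported reference format: {raw!r}")
```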

Test that country packages don't have formulas with entity mismatches

We've had several undetected crashes arising from formulas that mix up entity types, resulting in this type of error:

ValueError: operands could not be broadcast together with shapes (100000,) (279932,)

For example, PolicyEngine/policyengine-us#3298

Let's test for this centrally.

@martinholmer could you share the command you executed in that issue to throw the error?
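A minimal sketch of a central test, assuming it can enumerate each variable with its entity, knows how many rows each entity has in a test dataset, and can calculate variables through some callable (`compute` here; all names are hypothetical). Broadcasting failures like the one above surface as `ValueError`, and a formula that silently returns another entity's values shows up as a shape mismatch.

```python
from typing import Callable, Dict, Mapping

import numpy as np


def find_entity_mismatches(
    variable_entities: Mapping[str, str],
    entity_counts: Mapping[str, int],
    compute: Callable[[str], np.ndarray],
) -> Dict[str, str]:
    """Return a description of every variable whose formula mixes up entities."""
    problems = {}
    for name, entity in variable_entities.items():
        try:
            values = np.asarray(compute(name))
        except ValueError as error:
            # e.g. "operands could not be broadcast together with shapes ..."
            problems[name] = f"formula failed: {error}"
            continue
        expected = (entity_counts[entity],)
        if values.shape != expected:
            problems[name] = f"returned shape {values.shape}, expected {expected} for {entity}"
    return problems
```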

Update to numpy 2

numpy is now on version 1.23, but we cap it, following openfisca-core:

policyengine-core/setup.py, line 14 at d0a69ac:

"numpy>=1.11,<1.21",

Soft definition periods

Definition periods in PolicyEngine Core are fixed: each variable must have exactly one time period for which it can have distinct values. Policy in country models often involves mid-year parameter changes or other ways that more flexible time period management might be useful.

Core has some shortcuts for this (e.g. set_input_divide, options=[ADD, DIVIDE]), but a more elegant solution that reduces the development burden on country models would be to fold all of this into more autonomous Simulation behaviour. For example, if I say my employment income is £30,000 per year and ask for my income tax in March, we really shouldn't need any country-specific logic describing how the time periods interact. The only thing we need to specify at the country level is whether each variable is a stock or a flow, because each implies different correct behaviour:

wealth in 2022 is 450k => wealth in 2022-01 is 450k
income in 2022 is 30k => income in 2022-01 is 2.5k
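A minimal sketch of the stock/flow distinction that reproduces the two examples above. `Quantity` and `convert` are hypothetical, and spreading flows evenly across months is an assumption.

```python
from enum import Enum


class Quantity(Enum):
    STOCK = "stock"  # a level at a point in time, e.g. wealth
    FLOW = "flow"  # an amount accrued over a period, e.g. income


def convert(value: float, quantity: Quantity, from_months: int, to_months: int) -> float:
    """Re-express a value known over `from_months` months over `to_months` months."""
    if quantity is Quantity.STOCK:
        # Stocks do not depend on the length of the period.
        return value
    # Flows are assumed to accrue evenly across the period.
    return value * to_months / from_months
```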
