
synthesized-io / fairlens

Identify bias and measure fairness of your data

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
Topics: bias, data, data-analysis, data-science, fairness, pandas, python, statistics

fairlens's People

Contributors

bogdansurdu, dependabot[bot], hilly12, pre-commit-ci[bot], rob-tay, simonhkswan, tonbadal


fairlens's Issues

Simple statistical comparisons of sensitive groups

Is your feature request related to a problem? Please describe.

Biases are currently identified by comparing the distributions of a target variable across sensitive groups, and this comparison is achieved by calculating a statistical distance metric between the distributions, e.g. the earth mover's distance. Although this method can identify differences between target distributions, the numeric result is hard to interpret and it isn't clear what the difference actually is.

To provide further interpretation of a potential bias, I think it would be useful to be able to calculate simpler statistical measures (e.g. central moments such as the variance) of the target variable distributions and compare them across sensitive groups. Statistical tests can then be performed to determine the significance of these differences.

Additionally, it may be useful to perform and report different hypothesis tests that compare the distributions. For example, the Brunner-Munzel test may be appropriate here and more powerful at detecting differences between the distributions.

Describe the solution you'd like
New metrics that can be used to calculate and show simpler statistical properties of target variable distributions, plus corresponding hypothesis tests. Implementations of non-parametric hypothesis tests for comparing distributions and discovering significant biases (e.g. Brunner-Munzel). These metrics and test p-values can then be reported in the FairnessScorer.
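A minimal sketch of such a comparison, assuming a DataFrame with a sensitive column and a numeric target column; scipy.stats.brunnermunzel is SciPy's implementation of the Brunner-Munzel test, while the function name and column arguments here are placeholders:

import pandas as pd
from scipy.stats import brunnermunzel

def compare_groups(df: pd.DataFrame, sensitive_attr: str, target_attr: str) -> pd.DataFrame:
    """Compare simple statistics of the target across sensitive groups and
    test each group against the rest of the population."""
    rows = []
    for group, values in df.groupby(sensitive_attr)[target_attr]:
        rest = df.loc[df[sensitive_attr] != group, target_attr]
        # Non-parametric test for stochastic equality of the two samples
        _, p_value = brunnermunzel(values, rest)
        rows.append({
            "group": group,
            "mean": values.mean(),
            "variance": values.var(),
            "skew": values.skew(),
            "p_value": p_value,
        })
    return pd.DataFrame(rows)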

Describe alternatives you've considered

Additional context

Add version number to docs

Is your feature request related to a problem? Please describe.
There's no indication of which fairlens version the docs were built from. Read the Docs only shows "latest", which isn't very informative.

Describe the solution you'd like
Add the fairlens version to the docs index
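A common Sphinx pattern for this, sketched under the assumption that fairlens exposes a __version__ attribute:

# docs/conf.py
import fairlens

release = fairlens.__version__                 # full version, e.g. "0.1.0"
version = ".".join(release.split(".")[:2])     # short X.Y shown in the docs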

Describe alternatives you've considered

Additional context

Mitigate biases that are detected in datasets.

Is your feature request related to a problem? Please describe.
Once a bias has been measured in a dataset, it would be nice to still be able to use the dataset without having to worry about the bias.

Describe the solution you'd like

  • For a given metric, provide a way to improve the measured bias in a dataset (see the sketch below).
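One hypothetical mitigation along these lines, purely as a sketch: resample the data so that no sensitive group dominates, here by downsampling every group to the size of the smallest one.

import pandas as pd

def balance_groups(df: pd.DataFrame, sensitive_attr: str, seed: int = 0) -> pd.DataFrame:
    # Downsample each sensitive group to the size of the smallest group
    min_count = df[sensitive_attr].value_counts().min()
    return (
        df.groupby(sensitive_attr, group_keys=False)
          .apply(lambda g: g.sample(min_count, random_state=seed))
          .reset_index(drop=True)
    )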

Describe alternatives you've considered

Additional context

Bottleneck in deep search

The fine-grained deep search in sensitive.detection._deep_search performs poorly on large datasets and bottlenecks sensitive attribute detection in the fairness scorer. Currently, we compute the str_distance between every value in the dataset and the expected values for each sensitive attribute. Only considering the unique values would speed this up considerably. Additionally, we may want to limit the deep search to a sample of the unique values in large datasets, since we don't necessarily need to check all of them.
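A sketch of the proposed speed-up; str_similarity stands in for the actual string distance used in _deep_search (difflib's SequenceMatcher happens to implement the same Ratcliff-Obershelp approach), and the alias list and thresholds are placeholders:

import difflib
import pandas as pd

def str_similarity(a: str, b: str) -> float:
    # difflib's SequenceMatcher implements the Ratcliff-Obershelp algorithm
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matches_sensitive(series: pd.Series, aliases: list, threshold: float = 0.75,
                      max_unique: int = 1000) -> bool:
    # Compare only the unique values rather than every row in the column ...
    unique_values = pd.Series(series.dropna().unique())
    # ... and only a sample of them for very high-cardinality columns.
    if len(unique_values) > max_unique:
        unique_values = unique_values.sample(max_unique, random_state=0)
    return any(str_similarity(str(v), alias) > threshold
               for v in unique_values for alias in aliases)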

Roadmap

I've made a start on creating a roadmap in the projects page.

Publishing

I believe the aim here is to publish the package onto pypi. @rob-tay do you know much about how to do that?

Brand Assets

Visual assets needed for fairlens:

  • documentation logo
  • readme logo
  • example screenshots

Add user guides

The docs need detailed user guides for bias measurement, sensitive attribute detection and visualization. Worth looking into using the IPython extension for Sphinx.
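For reference, enabling the IPython directive is a small conf.py change; the extension names below are the standard ones shipped with IPython, though the exact setup for our docs is still to be decided:

# docs/conf.py
extensions = [
    "sphinx.ext.autodoc",
    "IPython.sphinxext.ipython_console_highlighting",
    "IPython.sphinxext.ipython_directive",
]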

COMPAS dataset needs to be separated

ProPublica's website indicates that the repeats in the dataset are due to people receiving different COMPAS assessments, i.e. one for Risk of Violence, one for Risk of Recidivism, and one for Risk of Failure to Appear. It seems that different algorithms are used by COMPAS for each of these cases. It therefore makes sense to split the dataset into separate parts, i.e. one for each type of assessment.
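A sketch of the split, assuming the dataset has a column identifying the assessment type (the column name "assessment_type" here is an assumption, not the actual COMPAS schema):

import pandas as pd

df = pd.read_csv("compas.csv")
# One DataFrame per assessment type, e.g. "Risk of Recidivism", "Risk of Violence"
assessments = {
    name: group.reset_index(drop=True)
    for name, group in df.groupby("assessment_type")
}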

[ISSUE] Templates for Pull Requests

Is there an existing issue for this?

  • I have searched the existing issues

Issue Type

Improvement

Describe the issue

At the moment, Fairlens does not have a general template for pull requests and would benefit from a standardized format for them.

Describe a solution you would like

Add a subfolder PULL_REQUEST_TEMPLATE in .github containing markdown files with the different sections and required information or context for a new Pull Request.

Describe alternatives you have considered

It might be possible to use YAML files as with the issue forms, but other repositories do not seem to use them and this approach isn't documented by GitHub either.

Additional context

No response

Test Issue

Test Description

import numpy as np

x = np.array([0, 1, 1, 3])

Review current fairness measurement packages.

We'd like to provide a comprehensive solution that stacks up well against similar packages, so it's important to know what the current solutions in this space are. This issue will be resolved once we have a comprehensive Confluence page that gives a good idea of the current space of fairness packages. In particular, we're interested in what measures of fairness they use and the motivation behind those choices.

Report generation in fairness scorer

The fairness scorer should be able to produce a report which aggregates the demographic score, hidden correlations, and any other metrics. Would be useful to have a single value representing the bias of the dataset.
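One possible aggregation into a single value, purely as a sketch: a count-weighted mean of the per-demographic distances (the "Distance" and "Counts" column names are assumptions about the score table's layout).

import pandas as pd

def aggregate_score(score_df: pd.DataFrame) -> float:
    # Weight each demographic's distance by its sample count
    return (score_df["Distance"] * score_df["Counts"]).sum() / score_df["Counts"].sum()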

Implement fairness scorer

The fairness scorer is a module which combines the features of the bias and sensitive packages to generate a report, figures, etc., analysing potential biases in the dataset.

Add p value testing

Need to add methods to compute the p-value for each metric using bootstrapping, permutation tests, etc. These should live in a separate module and use a wrapper.
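A sketch of what a permutation-test wrapper could look like; the metric signature is an assumption:

import numpy as np

def permutation_p_value(metric, x: np.ndarray, y: np.ndarray,
                        n_perm: int = 1000, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    observed = metric(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled samples and re-split into two groups
        rng.shuffle(pooled)
        if metric(pooled[:len(x)], pooled[len(x):]) >= observed:
            count += 1
    # Add-one smoothing so the p-value is never exactly zero
    return (count + 1) / (n_perm + 1)

For example, permutation_p_value(lambda a, b: abs(a.mean() - b.mean()), x, y) would test a difference in means.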

Refactor bias metrics

Decide on whether to use abstract classes or functions for distance metrics and refactor accordingly.

Updated README and documentation

What things do we have left to do here?

Updated:

  • Write 2-3 short tutorials based on the COMPAS, German Credit, Adult, or LSAC datasets.
  • Include a fairness scorer use case in the README.
  • Polish the overview and quickstart.
  • Include contribution guides in the docs.

Plot scaling

Is your feature request related to a problem? Please describe.
I think the y-axis of distribution plots needs to be scaled similarly for data from the same column.

[screenshots: two distribution plots of the same column with inconsistent y-axis scales]

Describe the solution you'd like
We could set the upper y limit of the plot to some constant times the maximum value in the target column.

plt.ylim(0, df[target_attr].max() * 1.2)

Integrating insight into fairlens

Discussed in #108

We have to sync fairlens and insight and start using insight's methods in fairlens. The desired structure for metrics in fairlens is the following:
[diagram: desired structure for metrics in fairlens]

Pairwise distance computation in the fairness scorer

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem?

At the moment, the fairness scorer compares the distribution of a variable in a sensitive sub-group to the overall distribution. This works well for symmetric statistical distance metrics, but it would be useful to have a way of using asymmetric metrics, such as disparate impact, to produce a similar table that instead compares distributions of different pairings of subgroups.

Describe the solution you would like

A method similar to fairlens.FairnessScorer.distribution_score, which iterates through pairs of the subgroups instead.
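A sketch of the pairwise iteration; metric stands in for any (possibly asymmetric) statistical distance such as disparate impact:

import itertools
import pandas as pd

def pairwise_scores(df: pd.DataFrame, sensitive_attr: str, target_attr: str,
                    metric) -> pd.DataFrame:
    rows = []
    groups = df[sensitive_attr].dropna().unique()
    # Use permutations rather than combinations since the metric may be
    # asymmetric, i.e. metric(a, b) != metric(b, a)
    for a, b in itertools.permutations(groups, 2):
        dist = metric(df.loc[df[sensitive_attr] == a, target_attr],
                      df.loc[df[sensitive_attr] == b, target_attr])
        rows.append({"group_a": a, "group_b": b, "distance": dist})
    return pd.DataFrame(rows)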

Setting up CI + packaging

We need to automate the building and packaging of fairlens so that v1.0 can be built when we open source. I suppose we'll want to publish on PyPI too. We should also have a think about future release goals.

  • Linting
  • Packaging
  • #64
  • Build Docs on main #74 #83

Heatmap of interactions between sensitive and non-sensitive columns

Datasets often have proxies for sensitive attributes, i.e. non-sensitive columns highly correlated with sensitive columns. The fairness scorer should be able to detect these and plot 2D correlation heat maps for all pairs of columns. This would require integrating some of the work on correlations done in the sensitive package with the fairness scorer, and adding additional correlation metrics to account for all types of columns.
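A sketch of the plotting side for numeric columns only (categorical columns would need a different association measure, which is the additional work mentioned above):

import matplotlib.pyplot as plt
import seaborn as sns

def correlation_heatmap(df, sensitive_cols, other_cols):
    # Pearson correlations between each sensitive and non-sensitive column
    corr = df[sensitive_cols + other_cols].corr().loc[sensitive_cols, other_cols]
    sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    plt.show()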

Script to walk files and generate docs using the automodule directive

Some popular repositories such as pandas seem to have custom scripts for API doc generation. It could be useful to have a script that walks through all the files in src/fairlens and builds rst files for them. Nice use cases include having a separate html file for each metric, method, etc. Alternatively, we could use templates for sphinx-apidoc.
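A sketch of such a script; the paths and rst layout are assumptions:

from pathlib import Path

SRC = Path("src/fairlens")
OUT = Path("docs/reference")

for py_file in SRC.rglob("*.py"):
    if py_file.name.startswith("_"):
        continue  # skip private modules and __init__.py
    # e.g. src/fairlens/bias/metrics.py -> "fairlens.bias.metrics"
    module = ".".join(py_file.relative_to(SRC.parent).with_suffix("").parts)
    rst = OUT / f"{module}.rst"
    rst.parent.mkdir(parents=True, exist_ok=True)
    rst.write_text(
        f"{module}\n{'=' * len(module)}\n\n"
        f".. automodule:: {module}\n"
        "   :members:\n"
    )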

Cross reference API reference in docs

Is your feature request related to a problem? Please describe.
The documentation references object names in the user guides and tutorials, but they are not cross-referenced to the respective entries in the API reference. It would be nicer to be able to click on a name and be taken to the detailed API reference for that object.

Describe the solution you'd like
Links to the respective API reference entries for any named objects in the documentation.

Describe alternatives you've considered

Additional context

Detecting sensitive attributes using word vectors

Deep search currently uses the Ratcliff-Obershelp algorithm to match strings in a column against potential aliases to determine whether the attribute corresponds to a sensitive attribute. Using word vectors would remove the need to match against aliases and might be faster.
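A hedged sketch of the idea using spaCy word vectors; the model name and any threshold are assumptions, and in practice a column would likely be scored on a sample of its values:

import spacy

nlp = spacy.load("en_core_web_md")  # a model that ships with word vectors

def similarity_to_category(value: str, category: str) -> float:
    # Cosine similarity between word vectors, instead of string matching
    # against a fixed list of aliases
    return nlp(value).similarity(nlp(category))

# e.g. compare similarity_to_category("female", "gender") against a threshold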

Continuous data is handled incorrectly in emd metric

The emd metric function builds a metric space from all the unique values in the data, and uses pd.Series.value_counts to create the histogram.

if counts is None:
    if group1 is None:
        raise InsufficientParamError()

    # Find the predicates for the two groups
    pred1, pred2 = utils.get_predicates(df, group1, group2)

    # Compute the histogram / counts for each group
    g1_counts = df[pred1][target_attr].value_counts().to_dict()
    g2_counts = df[pred2][target_attr].value_counts().to_dict()
    counts = g1_counts, g2_counts

space = df[target_attr].unique()

For categorical attributes this is fine, but for continuous data it won't work as intended: the metric space blows up and the distance isn't calculated between meaningful distributions. Instead, the continuous data either needs to be binned before calling pd.Series.value_counts, or the raw samples should be passed to pyemd.emd_samples, which does the binning itself.
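A sketch of the binned approach, using SciPy's one-dimensional Wasserstein (earth mover's) distance as a stand-in for the pyemd-based metric:

import numpy as np
from scipy.stats import wasserstein_distance

def continuous_distance(x: np.ndarray, y: np.ndarray, bins: int = 50) -> float:
    # Build a shared metric space from bin centres, not raw unique values
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2
    x_hist, _ = np.histogram(x, bins=edges, density=True)
    y_hist, _ = np.histogram(y, bins=edges, density=True)
    return wasserstein_distance(centres, centres, x_hist, y_hist)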

Create a summary of fairness metrics.

For the first stage of this project, we'd like to have a solution that measures fairness in a few common ways. To start with it would be helpful to create some documentation summarising the current common ways to measure fairness and bias.

Distance computation for all demographics in the fairness scorer

The fairness scorer needs a fast, efficient (multi-threaded) method to compute the statistical distance between the distribution of a demographic and the distribution of the population without that demographic, for all combinations of demographics in the selected (sensitive) attributes. Efficiently mapping distance metrics across a groupby object might involve handling some cases differently; for instance, with categorical distance metrics it would be much faster to compute the bin edges for the entire column beforehand rather than calling np.histogram_bin_edges once for each group.
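A sketch of the bin-edge optimisation: compute the edges once for the whole column, then histogram each demographic against them.

import numpy as np
import pandas as pd

def group_histograms(df: pd.DataFrame, target_attr: str, sensitive_attr: str) -> dict:
    # One call to histogram_bin_edges for the whole column ...
    edges = np.histogram_bin_edges(df[target_attr].dropna(), bins="auto")
    # ... reused for every demographic instead of recomputed per group.
    return {
        group: np.histogram(values.dropna(), bins=edges)[0]
        for group, values in df.groupby(sensitive_attr)[target_attr]
    }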

Sample architecture:

def distribution_score(
    self,
    mode: str = "auto",
    alpha: float = 0.05,
    min_dist: Optional[float] = None,
    min_count: Optional[int] = 50,
    weighted: bool = True,
    max_comb: Optional[int] = 3
) -> pd.DataFrame:
    """Returns the biases and fairness score by analyzing the distribution difference between sensitive
    variables and the target variable.
    Args:
        mode (str, optional):
            Choose a different metric to use. Defaults to automatically chosen metric depending on
            the distribution of the target variable.
        alpha (float, optional):
            Maximum p-value to accept a bias. Defaults to 0.05.
        min_dist (Optional[float], optional):
            If set, any bias with smaller distance than min_dist will be ignored. Defaults to None.
        min_count (Optional[int], optional):
            If set, any bias with fewer samples than min_count will be ignored. Defaults to 50.
        weighted (bool, optional):
            Whether to weight the overall score by the size of each subgroup. Defaults to True.
        max_comb (Optional[int], optional):
            Max number of combinations of sensitive attributes to be considered. Defaults to 3.
    """

Sample output on COMPAS:

Demographic              Distance    P-Value    Counts
African-American Male    ...         ...        ...
Caucasian Male           ...         ...        ...
...

Add public datasets

Would be good to have a collection of relevant public datasets referenced somewhere in the repo. Could be just in the README or also in a dedicated data folder, e.g. COMPAS.

Binary distribution plots are difficult to interpret

Is your feature request related to a problem? Please describe.
Distribution plots of binary target variables are arguably difficult to interpret, since the overlaid distributions often cover up relevant information.

[screenshot: overlaid distribution plot of a binary target]

Describe the solution you'd like
Ideally by using distr_plot the user would be able to clearly see the disparity between positive and negative classes for each pair of values.

Describe alternatives you've considered
A simple solution would be to make the bars appear side by side, as in the sketch below. Open to suggestions for alternative plots.
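A sketch of the side-by-side alternative as a plain grouped bar chart, assuming a binary target (the actual change to distr_plot would live in the plotting module):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def binary_distr_plot(df: pd.DataFrame, target_attr: str, sensitive_attr: str):
    # Proportion of each target class within each sensitive group
    props = pd.crosstab(df[sensitive_attr], df[target_attr], normalize="index")
    x = np.arange(len(props))
    width = 0.35
    plt.bar(x - width / 2, props.iloc[:, 0], width, label=str(props.columns[0]))
    plt.bar(x + width / 2, props.iloc[:, 1], width, label=str(props.columns[1]))
    plt.xticks(x, props.index)
    plt.legend(title=target_attr)
    plt.show()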
