dssg / aequitas
Bias Auditing & Fair ML Toolkit
Home Page: http://www.datasciencepublicpolicy.org/aequitas/
License: MIT License
Determine desired behavior of Aequitas methods including visualization for when group metrics equal zero and when they are NA
Edit documentation to clarify that rank_pct input values must be floats between 0 and 1
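A minimal illustration of the intended input format (the threshold values below are arbitrary):

```python
# rank_pct thresholds are fractions of the population in (0, 1], not percentages.
score_thresholds = {
    'rank_abs': [150, 300],          # absolute top-k cutoffs
    'rank_pct': [0.01, 0.05, 0.10],  # top 1%, 5%, 10% -- floats between 0 and 1
}

# A common mistake is passing percent-style values such as [1.0, 5.0, 10.0];
# converting them explicitly avoids the ambiguity:
percents = [1.0, 5.0, 10.0]
as_fractions = [p / 100.0 for p in percents]
```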
Hi,
the documentation link provided in the Data Formatting section (on preprocessing input data for Aequitas) does not work. Could you please update it? Thanks.
fairness.get_fairness_measures_supported(self, input_df)
Make it more flexible and less hardcoded. In fact, the whole Fairness class should be agnostic of the actual column names.
I'm working through using Aequitas for a bias audit and am noticing a few issues in the documentation:
The Python API preprocessing import statement is incorrect; it should be from aequitas.preprocessing import preprocess_input_df, i.e. the import should not include a call to preprocess_input_df
The documentation says that attribute columns can be categorical or continuous; this is confusing in terms of when cleaning functions are called in the preprocessing step. For instance, I made age categories using the cut function, which returned a categorical datatype; this caused the discretize cleaning function to be called, which then threw an error. It should be clear that categorical columns must be converted to strings.
Import statements are missing for the Plot and Bias modules; the docs should include from aequitas.bias import Bias and from aequitas.plotting import Plot in those example calls.
The call for the visualization of the disparity treemaps doesn't work as is; it should be j = p.plot_disparity_all(... rather than j = aqp.plot_disparity_all(... to match the earlier use of Plot().
https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
Under "Levels of recidivism" you write label_by_age = sns.countplot(x="sex", hue="label_value", data=df, palette=aq_palette) and then label_by_sex = sns.countplot(x="age_cat", hue="label_value", data=df, palette=aq_palette). It seems like the variable names were switched.
Also I can make a pull request for the issues I am filing.
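The categorical-dtype pitfall described above can be reproduced and avoided in a few lines (a sketch assuming pandas; the bins are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'age': [22, 37, 51, 64]})

# pd.cut returns a Categorical column, which triggers the discretize
# cleaning function during Aequitas preprocessing:
df['age_cat'] = pd.cut(df['age'], bins=[0, 30, 45, 99])

# Converting the attribute column to strings avoids the error:
df['age_cat'] = df['age_cat'].astype(str)
```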
There is a warning associated with copying dataframes. See the bottom of the CLI output.
CLI OUTPUT:
`############################################################################
############################################################################
___ _ __
/ | ___ ____ ___ __(_) /_____ ______
/ /| |/ _ \/ __ `/ / / / / __/ __ `/ ___/
/ ___ / __/ /_/ / /_/ / / /_/ /_/ (__ )
/_/ |_\___/\__, /\__,_/_/\__/\__,_/____/
/_/
Bias and Fairness Audit Tool
/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/psycopg2/init.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
Welcome to Aequitas-Audit
Fairness measures requested: Statistical Parity,Impact Parity,FPR Parity,FDR Parity
model_id, score_thresholds 1 {'rank_abs': [150, 300, 500, 1000], 'rank_pct': [1.0, 2.0, 5.0, 10.0]}
COUNTS::: gender
F 82601
M 310418
dtype: int64
COUNTS::: race
Black 90769
Hispanic 206528
Other 42483
White 53239
dtype: int64
COUNTS::: age
18-25 138154
26-35 129956
36-45 75097
46-55 34431
55 15381
dtype: int64
audit: df shape from the crosstabs: (88, 26)
get_disparity_predefined_group()
Any NaN?: True
bias_df shape: (88, 46)
Fairness Threshold: 0.8
Fairness Measures: ['Statistical Parity', 'Impact Parity', 'FPR Parity', 'FDR Parity']
get_group_value_fairness: No Parity measure input found on bias_df
{'Unsupervised Fairness': False, 'Supervised Fairness': False, 'Overall Fairness': False}
****************** 0     0.21
1     0.79
2     0.21
...
86    0.09
87    0.04
Name: group_size_pct, Length: 88, dtype: object
/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/pandas-0.21.0-py3.4-linux-x86_64.egg/pandas/core/indexing.py:194: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/aequitas-0.23.0-py3.4.egg/aequitas_cli/utils/report.py:126: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
aux_df.loc[idx, col] = 'Ref'`
Issue with writing new rows to a table when previous rows in the column had null values: Postgres expects double precision where the column should be boolean, per the error message below.
File "/home/ubuntu/.local/lib/python3.6/site-packages/ohio/ext/pandas.py", line 96, in to_sql_method_pg_copy_to
cursor.copy_expert(sql, csv_buffer)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type double precision: "True"
CONTEXT: COPY aequitas_group, line 1, column FOR Parity: "True"
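The dtype drift behind this error can be seen in pandas alone (a sketch; the column name is illustrative, only the error text above is from the real run): a first batch of all-null rows is written as NaN floats, so Postgres types the column as double precision, and a later batch of True/False values then fails to parse.

```python
import pandas as pd

# First batch: the boolean column is entirely null, so pandas stores NaN floats;
# a COPY of this batch leads Postgres to create the column as double precision.
batch1 = pd.DataFrame({'for_parity': [None, None]}).astype(float)

# Second batch: real booleans arrive, but the table column is already numeric,
# which surfaces as InvalidTextRepresentation ("invalid input syntax for type
# double precision: 'True'") on the next COPY.
batch2 = pd.DataFrame({'for_parity': [True, False]})
```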
We can copy this from the documentation, but it would be good for people to see at first glance how the tool can be used (command line, web demo, Python).
About https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
In the part "Visualizing a single absolute group metric across all population groups" you say "We can see from the longer bars that across 'age_cat', 'sex', and 'race' attributes, the groups COMPAS incorrectly predicts as 'low' or 'medium' risk most often are 25-45, Male, and African American.".
You display the FNR, which, as you say, evaluates which groups have incorrectly been assigned 'low' or 'medium' scores; but what you say about the bar lengths doesn't add up. Are you talking about the absolute numbers instead of the bar lengths, or am I missing something?
Hi everyone,
I am a CS student and I am trying out your Aequitas COMPAS analysis example.
When I run https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb on my local machine (Python 3) and try to execute the cell with the code
bdf = b.get_disparity_predefined_groups(xtab, original_df=df,
ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'},
alpha=0.05, check_significance=True,
mask_significance=True)
bdf.style
I get the error
TypeError: Input must be Index or array-like
With the following details:
get_disparity_predefined_group()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-46-a7be4e5ab5de> in <module>
2 ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'},
3 alpha=0.05, check_significance=True,
----> 4 mask_significance=True)
5 bdf.style
C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in get_disparity_predefined_groups(self, df, original_df, ref_groups_dict, key_columns, input_group_metrics, fill_divbyzero, check_significance, alpha, mask_significance)
370 # for predefined groups, use the largest of the predefined groups as
371 # ref group for score and label value
--> 372 check_significance = df_cols.intersection(check_significance).tolist()
373
374 # compile dictionary of reference groups based on bias-augmented crosstab
C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in intersection(self, other, sort)
2391 """
2392 self._validate_sort_keyword(sort)
-> 2393 self._assert_can_do_setop(other)
2394 other = ensure_index(other)
2395
C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _assert_can_do_setop(self, other)
2590 def _assert_can_do_setop(self, other):
2591 if not is_list_like(other):
-> 2592 raise TypeError('Input must be Index or array-like')
2593 return True
2594
Does this run on your own Notebook? If so, any guesses why it doesn't work for me?
EDIT:
The problem seems to be that when I installed Aequitas using pip, it did not install the up-to-date version. My version is 38.0 and apparently there is a newer version 38.1.
EDIT2:
Yup, it works now, updating the version of aequitas was it :)
If multiple 'score_thresholds' are configured in group crosstabs, they get squashed if passed to 'get_disparity_min_metric'.
from aequitas.preprocessing import preprocess_input_df
from aequitas.bias import Bias
from aequitas.group import Group
import pandas as pd
protected_df = pd.read_csv('compas_for_aequitas.csv')
g = Group()
score_thresholds = {'rank_abs': [25], 'rank_pct': [50]}
df, attr_cols = preprocess_input_df(protected_df)
groups_model, attr_cols = g.get_crosstabs(df, score_thresholds=score_thresholds, model_id=45, attr_cols=attr_cols)
bias = Bias()
bias_df = bias.get_disparity_min_metric(groups_model, df)
bias_df
The bias_df in this case does end up with more rows than if we were to only give it one score threshold, but they all have the same resulting score_threshold and k values.
This problem does not seem to apply to get_disparity_major_group.
When trying to specify the 'for' and 'fpr' columns for the check_significance argument of the get_disparity_predefined_groups method of the Bias class, I run into a number of key errors for columns that have not been included ('fpr', 'precision', and 'tpr', to start). It seems that the user cannot specify a small number of columns for the significance check.
A basic installation of the aequitas distribution should only satisfy the requirements of the aequitas package:
pip install aequitas
Having done the above, import aequitas_cli
should raise an ImportError
(either because the package is not installed or because it raises this error due to unsatisfied dependencies).
To install the CLI (etc.):
pip install aequitas[cli]
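One way to express that split is with an extras_require entry in setup.py (a sketch; the dependency lists are assumed, not taken from the actual setup.py):

```python
from setuptools import setup, find_packages

setup(
    name='aequitas',
    packages=find_packages(),
    # core dependencies only (illustrative)
    install_requires=['pandas'],
    extras_require={
        # installed only via `pip install aequitas[cli]` (deps assumed)
        'cli': ['click', 'psycopg2-binary'],
    },
)
```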
This issue is about creating a new class, maybe named "Individual", that implements individual notions of fairness based on label differences (impurities) among similar individuals. Each method of the class just needs a list of dataframes as input (consider that in the future we might want to compare the labels of multiple train/test sets), finds similar data points, and then looks at the label distribution of each pair/cluster.
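A toy sketch of the idea (the function name and distance threshold are hypothetical, not part of Aequitas): pair up points whose features lie within eps of each other and measure how often their labels disagree.

```python
from math import dist

def similar_pair_impurity(points, labels, eps=0.1):
    """points: list of feature tuples; labels: parallel list of class labels.
    Returns the fraction of similar pairs whose labels disagree."""
    pairs = disagreements = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:  # "similar" = close in feature space
                pairs += 1
                disagreements += labels[i] != labels[j]
    return disagreements / pairs if pairs else 0.0
```

For example, two nearly identical points with different labels yield an impurity of 1.0 for that pair.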
On the fairness tree, it asks whether we trust the labels.
Could we use this for regression tasks? I thought discrete class labels are for classification tasks.
Kind regards,
Alex
Add methods for calculating statistical significance of scores, labels, TP, TN, FP, FN
In https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
Under "Visualizing default absolute group metrics across all population groups"
Under "Default absolute group metrics"
You say "We can also see that the model is equally likely to predict a woman as 'high' risk as it is for a man (false positive rate FPR of 0.32 for both Male and Female)."
As far as I understood, the rate that measures the likelihood of predicting someone as 'high' risk is the PPR (Predicted Positive Rate), while the FPR measures how likely it is that someone was incorrectly predicted as 'high' risk. Did you mean to say "equally likely to incorrectly predict"?
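The distinction can be made concrete with made-up confusion counts for one group (chosen so the FPR matches the 0.32 quoted above):

```python
# Made-up counts for one group:
tp, fp, tn, fn = 40, 32, 68, 60

# FPR: of the people who are truly negative, how many were flagged 'high'?
fpr = fp / (fp + tn)                                   # 32 / 100 = 0.32

# Overall rate of 'high' predictions for the group:
predicted_high_rate = (tp + fp) / (tp + fp + tn + fn)  # 72 / 200 = 0.36
```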
Hi everyone,
I am analyzing whether an API performs equally well for all groups. So I have a data set with continuous "score" values between 0 and 1. They are the API's predictions of its result being correct. Then there is the "label_value", which is either 0 (meaning that the API result was incorrect) or 1 (meaning that the API result was correct).
Now if I calculate the true positives by hand, for instance for women, I do:
f_tp = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] >= t))]
where t is my threshold of 0.8. So I consider a score of at least 0.8 to be a prediction of being correct.
I do similar stuff to calculate tn, fp, fn:
f_tn = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] < t))]
f_fp = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] >= t))]
f_fn = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] < t))]
And I obtain:
Now I would like to make these calculations automatically using Aequitas.
g = Group()
xtab, _ = g.get_crosstabs(df, attr_cols=["sex"], score_thresholds= {'score': [0.8]})
In the result, I get values similar to those I got by hand, but they are switched around:
Here, tp is what I considered to be fn and fn is what I considered to be tp. What I considered to be tn is fp and what I considered to be fp is tn.
I can't quite wrap my mind around this. What am I missing? I think I might be confusing something? I'd be so happy if you could give me a hint.
Thanks!
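The four by-hand selections above can be condensed into one plain-Python helper, useful for sanity-checking crosstab output independently of Aequitas:

```python
def confusion_counts(rows, t=0.8):
    """rows: iterable of (score, label_value) pairs; scores >= t count as positive."""
    tp = sum(1 for s, y in rows if y == 1 and s >= t)  # correct, predicted positive
    fn = sum(1 for s, y in rows if y == 1 and s < t)   # correct, predicted negative
    fp = sum(1 for s, y in rows if y == 0 and s >= t)  # incorrect, predicted positive
    tn = sum(1 for s, y in rows if y == 0 and s < t)   # incorrect, predicted negative
    return {'tp': tp, 'fp': fp, 'tn': tn, 'fn': fn}
```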
May want to have an option to return information based on statistical significance parameters and outputs such as:
According to Verma & Rubin 2018, PPV + FDR = 1, and thus if there is PPV Parity there is also FDR Parity.
However, Aequitas says there is no FDR Parity, but there is PPV Parity:
I understand that with Aequitas it looks like there is no FDR Parity because Aequitas looks at relative disparities, while Verma & Rubin talk about absolute differences. Why does Aequitas prefer relative disparities?
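A small numeric illustration (the PPV values are made up) of how ratio-based disparities can flag FDR while passing PPV:

```python
ppv = {'ref': 0.90, 'other': 0.80}
fdr = {g: 1 - v for g, v in ppv.items()}   # PPV + FDR = 1 within each group

# Absolute differences are identical in magnitude:
ppv_diff = ppv['ref'] - ppv['other']       # 0.10
fdr_diff = fdr['other'] - fdr['ref']       # 0.10

# Ratios are not: under an 80%-rule band, PPV passes but FDR fails.
ppv_disparity = ppv['other'] / ppv['ref']  # about 0.89
fdr_disparity = fdr['other'] / fdr['ref']  # 2.0
```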
Aequitas CLI config file currently requires the DB credentials in the file. Should be updated to reference an external secrets file.
Test: test_all_0_scores_4 fails.
The reason is that group.get_crosstabs has a line calling df['score'].value_counts()[1.0]; however, the values are all 0s, so there is no 1.0 key and the lookup fails.
Not sure if the fix is to change the test or the check in group.
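If the check in group is the thing to change, a defensive alternative to the failing lookup would be pandas' Series.get with a default (a sketch assuming pandas):

```python
import pandas as pd

scores = pd.Series([0.0, 0.0, 0.0, 0.0])

# Raises KeyError, since no score equals 1.0:
# n_pos = scores.value_counts()[1.0]

# Returns 0 instead of raising:
n_pos = scores.value_counts().get(1.0, 0)
```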
Here: https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
Under "How do I assess model fairness?" > "Pairities Calcuated" it says

| Predicted Positive Ratio_k Parity | Statistical Parity |
| --- | --- |

According to your definitions, Predicted Positive Ratio_k is PPR and Predicted Positive Ratio_g is PPrev. So Aequitas defines Statistical Parity as PPR Parity. That is not the definition according to Verma & Rubin 2018, who say Statistical Parity is PPrev Parity. Then again, if Statistical Parity is taken as synonymous with Demographic Parity, your definition of it as PPR Parity makes more sense, I think.
So, any thoughts on this?
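The two ratios in question, computed from made-up counts for two groups:

```python
pp = {'A': 300, 'B': 100}     # predicted positives per group (made up)
size = {'A': 600, 'B': 400}   # group sizes (made up)
k = sum(pp.values())          # total predicted positives across groups

# PPR: each group's share of all predicted positives.
ppr = {g: pp[g] / k for g in pp}          # A: 0.75, B: 0.25

# PPrev: predicted prevalence within each group.
pprev = {g: pp[g] / size[g] for g in pp}  # A: 0.50, B: 0.25
```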
attrs_cols = ['age']
score_col = 'predict_proba'
label_col = 'label'
Hi, sorry to be bothering you again!
I am currently looking at src/aequitas/bias.py to find out how the significance is calculated. Here is what I think I understood:
I have two questions:
1. Is this correct?
I didn't understand what you meant by "sample" group at first. Now I understand that you mean the lists of binary-encoded values that say, for each entry of each group, whether the entry belongs to whichever measure is relevant, fpr or fnr. It can also be a list of scores, right?
2. Do we not check whether the ref. group is normally distributed? If not, why not? If we do, where?
I found it! It is in an if condition (line 521 in a61ef33).
I try executing the following code:
bdf = b.get_disparity_predefined_groups(xtab, original_df=df,
ref_groups_dict={'race':'Caucasian'},
alpha=0.05, check_significance=True,
mask_significance=False)
bdf.style
but it raises an Attribute Error with the following details:
get_disparity_predefined_group()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-8a5ae26f1e35> in <module>
2 ref_groups_dict={'race':'Caucasian'},
3 alpha=0.05, check_significance=True,
----> 4 mask_significance=False)
5 bdf.style
C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in get_disparity_predefined_groups(self, df, original_df, ref_groups_dict, key_columns, input_group_metrics, fill_divbyzero, check_significance, alpha, mask_significance, selected_significance)
439 self._get_statistical_significance(
440 original_df, df, ref_dict=full_ref_dict, score_thresholds=None,
--> 441 attr_cols=None, alpha=5e-2, selected_significance=selected_significance)
442
443 # if specified, apply T/F mask to significance columns
C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in _get_statistical_significance(cls, original_df, disparity_df, ref_dict, score_thresholds, attr_cols, alpha, selected_significance)
745 for name, func in binary_col_functions.items():
746 func = func(thres_unit, 'label_value', thres_val)
--> 747 original_df.loc[:, name] = original_df.apply(func, axis=1)
748
749 # add columns for error-based significance
C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
6485 args=args,
6486 kwds=kwds)
-> 6487 return op.get_result()
6488
6489 def applymap(self, func):
C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
149 return self.apply_raw()
150
--> 151 return self.apply_standard()
152
153 def apply_empty_result(self):
C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
255
256 # compute the result using the series generator
--> 257 self.apply_series_generator()
258
259 # wrap results
C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
284 try:
285 for i, v in enumerate(series_gen):
--> 286 results[i] = self.f(v)
287 keys.append(v.name)
288 except Exception as e:
C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in <lambda>(x)
734
735 binary_score = lambda rank_col, label_col, thres: lambda x: (
--> 736 x[rank_col] <= thres).astype(int)
737
738 binary_col_functions = {'binary_score': binary_score,
AttributeError: ("'bool' object has no attribute 'astype'", 'occurred at index 0')
It works if I set check_significance=False
.
My data frame:
entity_id int64
race object
score float64
label_value float64
rank_abs int32
rank_pct float64
dtype: object
Any ideas why this is? This time I have the up-to-date Aequitas version.
Every data point with score > rank_val will be considered a 1.
_assemble_ref_groups needs to create a list of reference groups per model_id; otherwise it crashes when there are multiple model_ids in the bias df.
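A hypothetical sketch of the fix: key the assembled reference groups by model_id so that entries for different models don't collide (the function name and row shape are illustrative, not the actual aequitas internals):

```python
from collections import defaultdict

def assemble_ref_groups(rows):
    """rows: iterable of (model_id, attribute_name, ref_group_value) tuples,
    e.g. drawn from the bias df. Returns {model_id: {attribute: ref_group}}."""
    ref_groups = defaultdict(dict)
    for model_id, attribute, ref_group in rows:
        # Nesting under model_id keeps one model's reference groups from
        # overwriting another's.
        ref_groups[model_id][attribute] = ref_group
    return dict(ref_groups)
```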