viadee / javaanchoradapters

Getting the Anchors Explainer to work in Different Settings

License: BSD 3-Clause "New" or "Revised" License

Java 100.00%

ai java anchors machine-learning explainability

javaanchoradapters's Introduction

JavaAnchorAdapters

Adapter [/əˈdaptə/] noun, a device for connecting pieces of equipment that cannot be connected directly.

This is a collection of tools that make the Java implementation of the Anchors algorithm easier to use. The algorithm (as introduced by Marco Tulio Ribeiro et al., 2018) is model-agnostic, but the nature of the dataset needs to be considered.

This repository covers methodological aspects, i.e. default approaches for applying the algorithm to tabular data in typical use cases (such as bpmn.ai), to images, or to texts, as well as technical aspects, such as running Anchors explanations on Apache Spark.

This project is to be considered research-in-progress.

JavaAnchorAlgorithm Repository

For more information on Anchors and this implementation, see the main repository.

Exemplary Use / Tutorial

Examples of using the Anchors implementation and its various adapters are provided within the XAI Examples repository. Please refer to this project for tutorials and easy-to-run applications.

Collaboration

The project is operated and further developed by the viadee Consulting AG in Münster, Westphalia. Results from theses at the WWU Münster and the FH Münster have been incorporated.

Further theses are planned; the contact person is Dr. Frank Köhne from viadee. Community contributions to the project are welcome: please open GitHub issues with suggestions (or pull requests), which we can then work on as a team. For general discussions, please refer to the main repository.
We are looking for further partners who have interesting process data to refine our tooling as well as partners that are simply interested in a discussion about AI in the context of business process automation and explainability.

javaanchoradapters's Issues

Random Search not working for all global explainers

Currently, random search only works with the Coverage Pick global explanations. For the future inclusion of Magix, it needs to be applicable to all global explainers.

Random search number of executions as termination condition

Anchors rules with transformed value and not discretized value

When running Anchors, the rules it provides always contain the discretized value instead of the transformed value. For understandability, the rules should use the transformed value instead. Example:
Instance:
Sex='male'
Survived='TRUE'
Rule:
IF Sex='male'
THEN PREDICT 1

Here, TRUE is discretized as 1, but the rule should state TRUE instead of 1.
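
One way to picture the requested behaviour is a lookup from the discretized value back to the transformed label at the moment the rule is rendered. The following is only a minimal sketch under that assumption: the class RuleLabelSketch, the method render and the label map are hypothetical names for illustration and are not part of the Anchors API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the Anchors API.
class RuleLabelSketch {

    /**
     * Renders a rule and translates the discretized prediction back to its
     * transformed label. Falls back to the raw number if no label is known.
     */
    static String render(Map<String, String> conditions,
                         int discretizedPrediction,
                         Map<Integer, String> labelByDiscretizedValue) {
        String body = conditions.entrySet().stream()
                .map(e -> e.getKey() + "='" + e.getValue() + "'")
                .collect(Collectors.joining(" AND "));
        String label = labelByDiscretizedValue.getOrDefault(
                discretizedPrediction, String.valueOf(discretizedPrediction));
        return "IF " + body + " THEN PREDICT " + label;
    }

    public static void main(String[] args) {
        Map<String, String> conditions = new LinkedHashMap<>();
        conditions.put("Sex", "male");
        Map<Integer, String> labels = new LinkedHashMap<>();
        labels.put(0, "FALSE");
        labels.put(1, "TRUE");
        // prints: IF Sex='male' THEN PREDICT TRUE
        System.out.println(render(conditions, 1, labels));
    }
}
```

The fallback to the raw number keeps the output usable for columns that have no transformation registered.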

Titanic tutorial?

The readme file would really benefit from a well-known example including at least one reasonable anchor (and the steps required to re-create it).

Log Random Search iteration results in csv file

Discretization bug of 0-1 distribution

Wrong discretization of distributions between 0 and 1
Endless loop bug when min of relation is max bound of other

Implement unsupervised & non-parameterized discretizer

It would be nice to have an unsupervised discretizer without any parameters or user input, because this would be easier to understand for the average user. If such a discretizer already exists, it should be called; otherwise it should be implemented.

New Performance Measures

Random Search for Hyperparameter Optimization

Enable Supervised Discretization

Supervised Discretization might enable better classifications. In the current implementation the discretizers only have the continuous variable as a parameter. Supervised discretization additionally needs the target column.

Time conditional termination of Random Search

Terminate Random Search once a given time budget has been reached.
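
A time budget can be combined with a cap on the number of executions in a single loop condition. The sketch below is a generic random search loop, not the project's implementation: the class name, the sampler and score parameters, and the maxExecutions argument are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; names and signature are not taken from the project.
class TimedRandomSearchSketch {

    /**
     * Samples and scores candidates until the time budget or the execution
     * cap is reached. At least one candidate is always evaluated.
     */
    static <T> T search(Supplier<T> sampler, ToDoubleFunction<T> score,
                        Duration budget, int maxExecutions) {
        long start = System.nanoTime();
        long budgetNanos = budget.toNanos();
        T best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int executions = 0;
        do {
            T candidate = sampler.get();
            double s = score.applyAsDouble(candidate);
            if (best == null || s > bestScore) {
                best = candidate;
                bestScore = s;
            }
            executions++;
            // The budget is only checked between evaluations, so one slow
            // evaluation can still overrun it.
        } while (executions < maxExecutions
                && System.nanoTime() - start < budgetNanos);
        return best;
    }
}
```

System.nanoTime() is used instead of wall-clock time because it is monotonic and therefore unaffected by clock adjustments during a long search.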

Invalid POM

de.viadee.xai.anchor:DefaultConfigsAdapter has issues in its POM that eliminate transitive dependencies.

Option to start RS with default values for hyperparameters

Automate tabular data initialization

detect column data type
create AnchorTabular.Builder with all columns
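
The first step could be approached by trying progressively more permissive parsers on the raw string values of a column. The following is a rough sketch of that idea only; the ColumnType enum and the detect method are hypothetical, and wiring the result into AnchorTabular.Builder is deliberately left out.

```java
import java.util.List;

// Hypothetical sketch; the enum and method are not part of the adapters.
class ColumnTypeSketch {

    enum ColumnType { INTEGER, DOUBLE, CATEGORICAL }

    /** Guesses a column type from raw string values; blanks are skipped. */
    static ColumnType detect(List<String> values) {
        boolean allInteger = true;
        boolean allDouble = true;
        boolean sawValue = false;
        for (String raw : values) {
            if (raw == null || raw.trim().isEmpty()) {
                continue; // treat as missing value
            }
            sawValue = true;
            String v = raw.trim();
            if (allInteger && !isInteger(v)) {
                allInteger = false;
            }
            if (allDouble && !isDouble(v)) {
                allDouble = false;
            }
        }
        if (!sawValue) {
            return ColumnType.CATEGORICAL;
        }
        if (allInteger) {
            return ColumnType.INTEGER;
        }
        return allDouble ? ColumnType.DOUBLE : ColumnType.CATEGORICAL;
    }

    private static boolean isInteger(String v) {
        try {
            Long.parseLong(v);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Note: Double.parseDouble is lenient and also accepts e.g. "NaN" or "1e3".
    private static boolean isDouble(String v) {
        try {
            Double.parseDouble(v);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Integer-coded categories (for example a passenger class of 1, 2 or 3) would be reported as INTEGER by this heuristic, so a user override would probably still be needed.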

Tabular adapter is saving data in rows

The tabular adapter saves all data in rows. It might be more efficient to save the data per column.

Coverage is skewed towards local instances

Coverage is calculated based upon the perturbation function.
The tabular perturbation function changes the non-fixed features randomly, given a predefined probability. As soon as this probability is < 1, non-representative instances are generated, because the explained instance's values appear statistically more often.
This results in a coverage that is not representative of the whole dataset.
If we, however, change the probability to 1 for generating coverage perturbations, we might violate Anchors' theoretical description but make rules more intuitive.
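
The size of the skew can be estimated with a small calculation. The sketch below assumes a simplified model that is not taken from the implementation: a non-fixed feature is kept with probability 1 - p and otherwise redrawn from the dataset, where the explained instance's value occurs with relative frequency f. The class and method names are hypothetical.

```java
// Hypothetical sketch of the simplified model described above.
class CoverageSkewSketch {

    /**
     * Expected share of perturbed samples that carry the explained
     * instance's value for one non-fixed feature.
     *
     * @param changeProbability probability p that the feature is redrawn
     * @param datasetFrequency  relative frequency f of the value in the dataset
     */
    static double expectedShare(double changeProbability, double datasetFrequency) {
        // Kept unchanged with probability (1 - p); otherwise redrawn, where
        // it matches the instance's value with frequency f.
        return (1.0 - changeProbability) + changeProbability * datasetFrequency;
    }

    public static void main(String[] args) {
        System.out.println(expectedShare(0.5, 0.2)); // over-represented share
        System.out.println(expectedShare(1.0, 0.2)); // matches the dataset
    }
}
```

Under this model, a value found in 20% of the dataset shows up in about 60% of the perturbations at p = 0.5, and only p = 1 reproduces the dataset frequency.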

Performance measure of rules does not work after refactor of discretization

The new discretization resulted in bugs in the performance measurement of a given rule (or rule set).

Tabular perturbation is inaccurate when using discretization

The default tabular perturbation function currently takes a random instance and replaces the non-fixed feature values of the perturbed instance with the values of that other instance.
The fixed values remain unchanged.
This is inaccurate when using discretization: even the fixed values should change randomly within their discretized class.
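
One possible reading of "within their discretized class" is to redraw a fixed numeric value from the observed values that fall into the same bin. The sketch below illustrates only that reading; the class, the method and the closed interval [lower, upper] used to describe a bin are assumptions and do not mirror the adapter's discretizer types.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; does not use the adapter's discretizer classes.
class BinPerturbationSketch {

    /**
     * Redraws a fixed feature value from the observed values inside the same
     * bin [lower, upper]. Returns the original value if the bin is empty.
     */
    static double resampleWithinBin(double[] observed, double lower, double upper,
                                    double original, Random rnd) {
        double[] inBin = Arrays.stream(observed)
                .filter(v -> v >= lower && v <= upper)
                .toArray();
        return inBin.length == 0 ? original : inBin[rnd.nextInt(inBin.length)];
    }
}
```

Drawing from observed values rather than uniformly from the interval keeps the perturbed value on the empirical distribution of the column, which seems closer to how the non-fixed features are treated.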

Overview of tabular preprocessing

As more operations such as transformations and discretizations are being automated, it gets harder for the user to comprehend how the dataset is actually being preprocessed.
There should be some kind of overview (e.g. R summary style) after the preprocessing steps to facilitate data understanding.

Hyperparameter spaces not immutable

The best hyperparameter space found by random search always points to the same parameter configuration rather than to the configuration of the best hyperparameter space object. The list of parameters needs to be cloned.
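
The aliasing described here is typically avoided with a defensive copy taken when the best configuration is recorded. The following is a minimal sketch of that pattern; HyperparameterSnapshotSketch and its list of plain Double values are hypothetical stand-ins for the project's hyperparameter space classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a hyperparameter space holding its parameters.
class HyperparameterSnapshotSketch {

    private final List<Double> parameters;

    HyperparameterSnapshotSketch(List<Double> parameters) {
        // Copy first, then wrap: later changes to the caller's list are not
        // visible here, and the snapshot itself cannot be modified.
        this.parameters = Collections.unmodifiableList(new ArrayList<>(parameters));
    }

    List<Double> parameters() {
        return parameters;
    }
}
```

A shallow copy like this is sufficient only while the list elements are themselves immutable; mutable parameter objects would need to be cloned individually as well.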