
Benchmarking continual learning techniques for Human Activity Recognition data. We offer interesting insights on how the performance of these techniques varies in a domain other than images.

Home Page: https://arxiv.org/abs/2007.03032

License: MIT License

Topics: lifelong-machine-learning, continual-learning, incremental-learning, pytorch, tsne, icml-2020


Continual Learning Benchmark

This repo contains the code for reproducing the results of the following papers (done as part of my Master's thesis at St Andrews):

  1. Benchmarking Continual Learning in Sensor-based Human Activity Recognition: an Empirical Analysis [Accepted to Information Sciences (April 2021)]
  2. Continual Learning in Human Activity Recognition (HAR): An Empirical Analysis of Regularization [ICML workshop on Continual Learning (July 2020)]

Incremental learning

In total, 11 recent continual learning techniques have been implemented on a component-wise basis (a sketch of the shared knowledge-distillation component follows the list):

  1. Maintaining Discrimination and Fairness in Class Incremental Learning (WA-MDF) [Paper]
  2. Adjusting Decision Boundary for Class Imbalanced Learning (WA-ADB) [Paper]
  3. Large Scale Incremental Learning (BiC) [Paper]
  4. Learning a Unified Classifier Incrementally via Rebalancing (LUCIR) [Paper]
  5. Incremental Learning in Online Scenario (ILOS) [Paper]
  6. Gradient Episodic Memory for Continual Learning (GEM) [Paper]
  7. Efficient Lifelong Learning with A-GEM [Paper]
  8. Elastic Weight Consolidation (EWC) [Paper]
  9. Rotated Elastic Weight Consolidation (R-EWC) [Paper]
  10. Learning without Forgetting (LwF) [Paper]
  11. Memory Aware Synapses (MAS) [Paper]
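
Several of the methods above (e.g. LwF, BiC, WA-MDF) share a knowledge-distillation component, which the kd_kldiv prefix in the --method values further below refers to. The sketch here shows the standard temperature-scaled KL-divergence formulation that component is based on; it illustrates the general idea and is not necessarily this repo's exact implementation:

import torch.nn.functional as F

def kd_kldiv_loss(new_logits, old_logits, temperature=2.0):
    # Soften both distributions with the temperature, then match the current
    # model's predictions on old classes to those of the frozen previous model.
    log_p_new = F.log_softmax(new_logits / temperature, dim=1)
    p_old = F.softmax(old_logits / temperature, dim=1)
    # 'batchmean' matches the definition of KL divergence; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_new, p_old, reduction='batchmean') * temperature ** 2

# Hypothetical total loss: cross-entropy on the new task plus the distillation term, e.g.
# F.cross_entropy(logits, targets) + lambda_kd * kd_kldiv_loss(logits[:, :n_old], old_logits)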

Additionally, the following six exemplar-selection techniques are available for memory rehearsal (a sketch of one of them follows the list):

  1. Herding from iCaRL [Paper]
  2. Frank-Wolfe Sparse Regression (FWSR) [Paper]
  3. K-means sampling
  4. DPP sampling
  5. Boundary-based sampling [Paper]
  6. Sensitivity-based sampling [Paper]
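
As a flavour of how one of these works, the sketch below approximates K-means sampling (technique 3): cluster a class's feature vectors and keep the real sample closest to each centroid. The helper is hypothetical, not the repo's exact code:

import numpy as np
from sklearn.cluster import KMeans

def kmeans_exemplars(features, n_exemplars):
    # features: (n_samples, feature_dim) array for a single class.
    km = KMeans(n_clusters=n_exemplars, n_init=10).fit(features)
    chosen = []
    for centre in km.cluster_centers_:
        # Keep the real sample closest to each cluster centre.
        chosen.append(int(np.argmin(np.linalg.norm(features - centre, axis=1))))
    return chosen  # indices of the selected exemplars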

Running the code

For training, please execute the runner.sh script, which creates all the directories required for logging the outputs. Similar commands can be added to the script to run further experiments.

For instance, training on ARUBA dataset with FWSR-styled exemplar selection:

>>> python runner.py --dataset 'aruba' --total_classes 11 --base_classes 2 --new_classes 2 --epochs 160 --method 'kd_kldiv_wa1' --exemplar 'fwsr' # e.g. for FWSR-styled exemplar selection
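
If you prefer not to use the bash script, its gist can be reproduced along the lines of the Python sketch below, assuming logs go under output_reports/[dataname] (see the experimental protocol section): create the log directory, then launch one training run. The actual runner.sh may create additional directories:

import os
import subprocess

dataset = 'aruba'
# Create the per-dataset log directory that the outputs are written to.
os.makedirs(os.path.join('output_reports', dataset), exist_ok=True)

# Launch one training run (same arguments as the command above).
subprocess.run(['python', 'runner.py',
                '--dataset', dataset, '--total_classes', '11',
                '--base_classes', '2', '--new_classes', '2',
                '--epochs', '160', '--method', 'kd_kldiv_wa1',
                '--exemplar', 'fwsr'],
               check=True)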

Proposed Forgetting Score

The existing forgetting measure [1] suffers from self-relativeness, i.e., the forgetting score remains low throughout training if the model did not learn much about a class in the first place. Class-imbalance scenarios (as in our case) further amplify its ramifications [2]. Code for our correction to the forgetting score can be found here.
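
For reference, the forgetting measure of [1] for a class (or task) j after incremental step k is, in our notation,

f_j^k = \max_{l \in \{1, \dots, k-1\}} a_{l,j} \; - \; a_{k,j},
\qquad
F_k = \frac{1}{k-1} \sum_{j=1}^{k-1} f_j^k,

where a_{l,j} is the accuracy on j after training step l. Since f_j^k is measured relative to the model's own best past accuracy on j, a class that was never learned well yields a small score even though little is retained about it.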

Datasets

The experiments were performed on 8 publicly available HAR datasets. These can be downloaded from the drive link in datasets/.

Experimental protocol

The experiments for each dataset and for each train set / exemplar size were performed on 30 random sequences of tasks. The logs in output_reports/[dataname] (created after executing the bash script) contain the performance of each individual task sequence as incremental learning progresses. The final accuracy is then reported as the average over the 30 runs (see instructions below for evaluation).
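
As an illustration, the final aggregation amounts to something like the snippet below; the file names and log layout here are hypothetical, not the repo's actual format:

import glob
import numpy as np

# Hypothetical layout: one log file per task-sequence run, final accuracy on the last line.
final_accs = []
for path in sorted(glob.glob('output_reports/aruba/run_*.txt')):
    with open(path) as f:
        final_accs.append(float(f.read().splitlines()[-1]))

print(f'mean final accuracy over {len(final_accs)} runs: {np.mean(final_accs):.3f}')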

Evaluating the logs

For evaluation, please uncomment the lines as instructed in runner.py. This can be used to measure forgetting scores [2], base-new-old accuracies, and average reports by holdout size.

Combination of techniques

The component-wise implementation makes it straightforward to combine two or more techniques by tweaking the --method argument. The list below maps each combination to its --method value:

  • Knowledge distillation with margin ranking loss (KD_MR): kd_kldiv_mr
  • KD_MR with WA-MDF: kd_kldiv_mr_wa1
  • KD_MR with WA-ADB: kd_kldiv_mr_wa2
  • KD_MR with less-forget constraint loss (KD_LFC_MR): kd_kldiv_lfc_mr
  • KD_LFC_MR with WA-MDF: kd_kldiv_lfc_mr_wa1
  • KD_LFC_MR with WA-ADB: kd_kldiv_lfc_mr_wa2
  • Cosine normalisation with knowledge distillation: cn_kd_kldiv
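
For instance, to train with KD_LFC_MR plus WA-MDF, reuse the earlier training command and swap the --method value:

>>> python runner.py --dataset 'aruba' --total_classes 11 --base_classes 2 --new_classes 2 --epochs 160 --method 'kd_kldiv_lfc_mr_wa1' --exemplar 'fwsr'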

Furthermore, the logits-replacement tweak of ILOS and the weight initialisation from LUCIR can be used with any of the above methods by simply setting the following arguments:

  • ILOS logits replacement (with any of the above): --replace_new_logits = True
  • LUCIR-styled weight initialisation (with any of the above): --wt_init = True
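
For example, adding both tweaks on top of the previous command looks roughly as follows (the exact boolean syntax depends on how runner.py parses these arguments):

>>> python runner.py --dataset 'aruba' --total_classes 11 --base_classes 2 --new_classes 2 --epochs 160 --method 'kd_kldiv_lfc_mr_wa1' --exemplar 'fwsr' --replace_new_logits True --wt_init True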

Please feel free to play around with these. We would be interested in knowing if the combinations deliver better results for you!

Notes on incremental classes

  • All the experiments in our papers used 2 base classes and 2 incremental classes. To replicate this, set --base_classes = 2 and --new_classes = 2.

  • For offline learning (i.e., without incremental training), set --base_classes to the total number of classes in the dataset and --new_classes = 0.

  • For experiments with permuted datasets, set both --base_classes and --new_classes to the total number of classes in the dataset (example settings below).
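
Concretely, for a dataset with 11 classes such as aruba, these three settings translate to (other arguments as in the earlier training command):

  • Incremental (as in the papers): --base_classes 2 --new_classes 2
  • Offline: --base_classes 11 --new_classes 0
  • Permuted: --base_classes 11 --new_classes 11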

Verification

The implementations have been verified through runs on Split-MNIST and Permuted-MNIST, both also available for download in datasets/.

Acknowledgement

Special thanks to sairin1202's implementation of BiC and Electronic Tomato's implementation of GEM/AGEM/EWC/MAS.

References

[1] Chaudhry, A., Dokania, P.K., Ajanthan, T., & Torr, P.H. (2018). Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. ECCV.

[2] Kim, C. D., Jeong, J., & Kim, G. (2020). Imbalanced Continual Learning with Partitioning Reservoir Sampling. ECCV.

Cite

If you found this repo useful in your work, please feel free to cite us:

@article{jha2021continual,
  title={Continual Learning in Sensor-based Human Activity Recognition: an Empirical Benchmark Analysis},
  author={Jha, Saurav and Schiemer, Martin and Zambonelli, Franco and Ye, Juan},
  journal={Information Sciences},
  year={2021},
  publisher={Elsevier}
}
@article{jha2020continual,
  title={Continual learning in human activity recognition: an empirical analysis of regularization},
  author={Jha, Saurav and Schiemer, Martin and Ye, Juan},
  journal={Proceedings of Machine Learning Research},
  year={2020}
}
