Machine Learning Reproducibility Challenge (MLRC) of PT4AL

▶ Project Objective

  • Review the reproducibility of previously published papers.
  • We extended the selected paper's official code with a dataset it did not originally support and added the code required for an additional pretext task.

▶ Selected Paper and Official Code

▶ Team Member

MinSeok Yoon Sohee Bae Seonyoung Yoon

Overview

PT4AL

  • This paper proposes an active learning method that uses self-supervised pretext tasks.
  • First, a pretext task is trained on the unlabeled data to learn representations, which are used to initialize the main task model.
  • The pretext task loss is then used to select difficult and representative samples for active learning.

Environment Setting

  • Required Computer Specifications

    GPU             CUDA   CUDNN
    GTX-2080TI * 8  11.4   8.2
  • Python version is 3.9.

  • Installing all the requirements may take some time. After installation, you can run the codes.

  • Please note that we used PyTorch with the device type set to 'GPU'.

  • We utilized 8 GPUs in our implementation. If the number of GPUs differs, please adjust the code accordingly based on the specific situation.

  • The [requirements.txt] file is required to set up the virtual environment for running the program. It lists all the libraries needed to run the program, along with their versions.

    In Anaconda Environment,

    $ conda create -n [your virtual environment name] python=3.9
    
    $ conda activate [your virtual environment name]
    
    $ pip install -r requirements.txt
    
  • Create your own virtual environment.

  • Activate your Anaconda virtual environment where you want to install the package. If your virtual environment is named 'test', you can type conda activate test.

  • Use the command pip install -r requirements.txt to install libraries.

Prerequisites

To generate train and test dataset:

python make_data.py    
  • Creates the Cifar10, Imbalanced_Cifar10, and Caltech101 folders required for the experiment.
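For reference, an imbalanced CIFAR-10 split is commonly built by subsampling classes with exponentially decaying counts. The sketch below only illustrates that idea; the function names, the decay ratio, and the per-class counts are assumptions, not taken from the repository's make_data.py:

```python
import numpy as np

def imbalanced_counts(n_classes=10, n_max=5000, ratio=0.1):
    """Exponentially decaying per-class sample counts: class 0 keeps
    n_max images, the last class keeps n_max * ratio images."""
    return [int(n_max * ratio ** (c / (n_classes - 1)))
            for c in range(n_classes)]

def subsample_indices(labels, counts, seed=0):
    """Pick counts[c] random indices for each class label c."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for c, n in enumerate(counts):
        idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(idx, size=n, replace=False))
    return sorted(keep)
```

The selected indices can then be used to copy only the kept images into the Imbalanced_Cifar10 folder.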

To train a pretext task (rotation, colorization) on the unlabeled set:

python rotation.py --task rotation --dataset Cifar10
python colorization.py --task colorization --dataset Cifar10
  • The dataset and pretext task are set as arguments. The available datasets are Cifar10, Imbalanced_Cifar10, and Caltech101; the available pretext tasks are rotation and colorization.
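The rotation pretext task trains a 4-way classifier to predict which rotation (0/90/180/270 degrees) was applied to each image. A minimal sketch of the input/label construction, assuming NCHW arrays (an illustration, not the repository's implementation):

```python
import numpy as np

def rotate_batch(images):
    """images: array of shape (N, C, H, W). Returns the four rotated
    copies (0/90/180/270 degrees) stacked along the batch axis, plus
    the 4-way rotation labels the pretext model is trained to predict
    with ordinary cross-entropy."""
    rotated = np.concatenate([np.rot90(images, k, axes=(2, 3))
                              for k in range(4)], axis=0)
    labels = np.repeat(np.arange(4), len(images))
    return rotated, labels
```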

To extract pretext task losses and create batches:

python make_batches.py --task rotation --dataset Caltech101
  • The dataset and the pretext task are set as arguments; the script extracts the pretext task losses and divides the data into the batches required for training.

  • The available datasets are Cifar10, Imbalanced_Cifar10, Caltech101, and the available pretext tasks are rotation, colorization.

  • Sort the loss of the pretext task of each data in ascending order, then divide it into 10 batches and save it as a text file.

  • loss_{dataset}_{task} folder: contains the individual batch files.

  • {task}_loss_{dataset}.txt: contains the pretext task loss for each sample of the dataset.

    ├── loss_{dataset}_{task}
    │   ├── batch_0.txt
    │   │    ...
    │   ├── batch_9.txt
  • Each text file lists, in order, the paths of the corresponding data. An excerpt from an example file is shown below (e.g., loss_Cifar10_rotation/batch_0.txt):

    ./Cifar10/DATA/train/5/38343.png
    ./Cifar10/DATA/train/0/29348.png
    ./Cifar10/DATA/train/5/22390.png
    ...
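The sort-and-split step above can be sketched as follows. The in-memory (path, loss) pair format and the function name are assumptions, but the output mirrors the batch_i.txt files described above:

```python
import os

def make_batches(losses, out_dir, n_batches=10):
    """losses: list of (image_path, pretext_loss) pairs.

    Sorts by loss in ascending order, splits the sorted paths into
    n_batches equal chunks, and writes each chunk to
    out_dir/batch_i.txt, one path per line."""
    os.makedirs(out_dir, exist_ok=True)
    paths = [p for p, _ in sorted(losses, key=lambda x: x[1])]
    size = len(paths) // n_batches
    for i in range(n_batches):
        chunk = paths[i * size:(i + 1) * size]
        with open(os.path.join(out_dir, f"batch_{i}.txt"), "w") as f:
            f.write("\n".join(chunk))
```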

Running the Code

To evaluate on PT4AL task:

python main.py --dataset Cifar10 --task rotation
  • Runs the PT4AL experiment proposed in the paper.
  • The initial samples are drawn from the precomputed pretext loss batches; each subsequent cycle continues active learning by sampling from the next loss batch.
  • The available datasets are Cifar10, Imbalanced_Cifar10, Caltech101, and the available pretext tasks are rotation, colorization.
  • Results for the experiment are found in main_best_{dataset}_{task}.txt.
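The in-batch selection for the initial cycle can be sketched as uniform sampling over the loss-sorted batch. This selection rule is an assumption for illustration; later PT4AL cycles rank candidates by main-task uncertainty instead:

```python
def select_from_batch(batch_paths, budget):
    """Pick `budget` samples evenly spaced across a loss-sorted batch,
    so the selection spans easy through hard examples."""
    step = len(batch_paths) / budget
    return [batch_paths[int(i * step)] for i in range(budget)]
```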

To evaluate on random sampling active learning task:

python main_random.py --dataset Cifar10
  • Experiment on random sampling for comparison with the proposed method.
  • Both the initial samples and the data drawn in each subsequent cycle are selected uniformly at random.
  • The available datasets are Cifar10, Imbalanced_Cifar10, Caltech101.
  • Results for the experiment are found in main_best_{dataset}_random.txt.

Correlation Plot

To extract a classification loss with supervised learning:

python main_supervised.py --dataset Cifar10
  • To verify the paper's hypothesis 'H1: pretext task loss is correlated with the main task loss', we extract the loss of the downstream (main) task.
  • The available datasets are Cifar10, Imbalanced_Cifar10, Caltech101.

To determine the correlation between main task loss and pretext task loss:

python correlation_plot.py --dataset Cifar10 --task rotation
  • Plots the correlation between the main task (classification) loss and each pretext task loss, both for the entire dataset and for a random sample of 1,000 points.
  • The available datasets are Cifar10, Imbalanced_Cifar10, Caltech101, and the available pretext tasks are rotation, colorization.
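The ρ values reported below are rank correlations. A minimal NumPy sketch of Spearman's ρ on a random 1,000-point subsample, as used for the plots (function names are illustrative, and tie correction is omitted for brevity):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the
    rank-transformed values (no tie correction)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def sampled_rho(pretext_loss, main_loss, n=1000, seed=0):
    """Correlation on a random subsample of n points, matching the
    1,000-point plots described above."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pretext_loss),
                     size=min(n, len(pretext_loss)), replace=False)
    return spearman_rho(np.asarray(pretext_loss)[idx],
                        np.asarray(main_loss)[idx])
```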

Result

Active learning result

  • For the CIFAR10 dataset, both PT4AL methods began with about 30% accuracy and reached approximately 90% accuracy by the 10th cycle. Colorization performed better than rotation. The random sampling method started with 30% accuracy and reached 70% by the 10th cycle, but performed worse than the PT4AL methods.

  • For the Caltech101 dataset, both PT4AL methods started with about 10% accuracy and reached around 90% accuracy by the 10th cycle, with no significant difference between the two methods. The random sampling method started with 20% accuracy but showed lower performance than the PT4AL methods in the final cycle.

  • For the imbalanced CIFAR10 dataset, both PT4AL methods started at about 30% accuracy and reached approximately 80% in the end. Colorization generally showed higher performance than rotation.

Correlation result

rotation

  • The plots were generated by sampling 1,000 data points, as was done in the original paper, to visualize these correlations.

  • The correlation between the rotation pretext task loss and the main task loss was substantially lower than in the original paper across all datasets tested: CIFAR10 ($\rho = 0.44$), Caltech101 ($\rho = 0.17$), and Imbalanced CIFAR10 ($\rho = 0.42$). The original paper reported higher correlations: CIFAR10 ($\rho = 0.79$) and Caltech101 ($\rho = 0.78$).

colorization

  • The correlation between the pretext task loss for colorization and the main task loss was very low across all datasets tested: CIFAR10 ($\rho = -0.06$), Caltech101 ($\rho = 0.00$), and Imbalanced Cifar10 ($\rho = -0.14$).

Citation

@inproceedings{yi2022using,
  title = {Using Self-Supervised Pretext Tasks for Active Learning},
  author = {Yi, John Seon Keun and Seo, Minseok and Park, Jongchan and Choi, Dong-Geol},
  booktitle = {Proc. ECCV},
  year = {2022},
}
