
ECLARE: Extreme Classification with Label Graph Correlations

License: MIT License

Languages: Python 96.84%, Perl 0.43%, Shell 2.73%

Topics: extreme-classification, multi-label-classification, label-text, graph-correlation, label-correlations, gcn, meta-data, python, machine-learning, deeplearning

eclare's Introduction

ECLARE

@InProceedings{Mittal21b,
    author    = "Mittal, A. and Sachdeva, N. and Agrawal, S. and Agarwal, S. and Kar, P. and Varma, M.",
    title     = "ECLARE: Extreme classification with label graph correlations",
    booktitle = "Proceedings of The ACM International World Wide Web Conference",
    month     = "April",
    year      = "2021",
}

SETUP WORKSPACE

mkdir -p ${HOME}/scratch/XC/data 
mkdir -p ${HOME}/scratch/XC/programs

SETUP ECLARE

cd ${HOME}/scratch/XC/programs
git clone https://github.com/Extreme-classification/ECLARE.git
conda env create -f ECLARE/eclare_env.yml
conda activate eclare
git clone https://github.com/kunaldahiya/pyxclib.git
cd pyxclib
python setup.py install
cd ../ECLARE
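
To sanity-check the installation before moving on, a quick import can be run (a minimal check only, not part of the official setup; pyxclib installs under the package name xclib):

# Minimal install check: if this import fails, the pyxclib
# setup step above did not complete.
from xclib.data import data_utils
print("pyxclib OK:", data_utils.__name__)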

DOWNLOAD DATASET

cd ${HOME}/scratch/XC/data
gdown --id <dataset id>
unzip *.zip
dataset                     dataset id
LF-AmazonTitles-131K        1VlfcdJKJA99223fLEawRmrXhXpwjwJKn
LF-WikiSeeAlsoTitles-320K   1edWtizAFBbUzxo9Z2wipGSEA9bfy5mdX
LF-AmazonTitles-1.3M        1Davc6BIfoTIAS3mP1mUY5EGcGr2zN2pO
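
Once unzipped, a dataset's ground-truth label matrix can be inspected with pyxclib. A minimal sketch, assuming the archive contains trn_X_Y.txt in the standard XC sparse format (a "num_rows num_cols" header line, then one "label:value" list per document); the exact reader signature may vary across pyxclib versions:

from xclib.data import data_utils

# Path assumed from the dataset layout; adjust to your unzip location.
trn_X_Y = data_utils.read_sparse_file("LF-AmazonTitles-131K/trn_X_Y.txt")
print("documents x labels:", trn_X_Y.shape)
print("avg labels per document: %.2f" % (trn_X_Y.nnz / trn_X_Y.shape[0]))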

RUNNING ECLARE

cd ${HOME}/scratch/XC/programs/ECLARE
chmod +x run_ECLARE.sh
./run_ECLARE.sh <gpu_id> <ECLARE TYPE> <dataset> <folder name>
e.g.
./run_ECLARE.sh 0 ECLARE LF-AmazonTitles-131K ECLARE_RUN


eclare's People

Contributors

anshumitts, deepaksaini119


eclare's Issues

Application to long document XC

Thanks for this amazing work.

I am curious to hear your thoughts on whether and how this method could be applied to long-document classification (>10K tokens, e.g. patent applications) instead of short text. At first sight, the simple composition of token embeddings in the document embedding module seems like it might be problematic and might require at least something like a convolutional layer. Are there other components of the model architecture that you suspect would be a problem? Have you tried the method on documents longer than product titles? I would be grateful for any insights you can share.

Cheers!

Input data format

Hi,

I am trying to apply your method to my own dataset and am working out the input data format by reading the code and looking at the example datasets (LF-AmazonTitles-131K). Documentation for this would be greatly appreciated, but I can mostly figure it out; only the filter_labels_train.txt and filter_labels_test.txt matrices are unclear to me. If you could briefly explain what these matrices are and why they are needed, that would be great.

Thanks in advance!
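
(A note for readers with the same question: in the label-features XC benchmarks, these filter files conventionally list "trivial" document-label pairs, e.g. a document paired with the label corresponding to the document itself, which are removed before evaluation. A minimal sketch of how such a filter is typically applied, assuming each line of filter_labels_test.txt holds a "doc_index label_index" pair; this is a generic helper, not code from the repo:)

import numpy as np
from scipy.sparse import csr_matrix

def apply_filter(predictions: csr_matrix, filter_file: str) -> csr_matrix:
    # Zero out the document-label pairs listed in the filter file
    # (assumed format: one "doc_index label_index" pair per line).
    pairs = np.loadtxt(filter_file, dtype=np.int64, ndmin=2)
    predictions = predictions.tolil()
    for doc, lbl in pairs:
        predictions[doc, lbl] = 0
    return predictions.tocsr()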

About the one-vs-all label classifier Wl

Hi! ECLARE is absolutely amazing work!
After reading the paper I am a little confused about the one-vs-all label classifier Wl. As mentioned in the paper, Wl is a vector generated from z1, z2, and the refinement vector z3.
So I was wondering how it can accomplish the classification task (i.e. take a sentence as input and judge whether it belongs to a specific label), when it is just a label's feature vector?
Thank you so much! :D
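
(A clarifying sketch for this question: in generic one-vs-all extreme classification, each label l owns a classifier vector w_l, and a document is never "fed into" the label; instead the document embedding x is scored against every w_l by an inner product, and the predicted label set is read off the scores. Toy numpy illustration, with all names and sizes hypothetical:)

import numpy as np

rng = np.random.default_rng(0)
num_labels, dim = 5, 8                   # toy sizes, purely illustrative
W = rng.normal(size=(num_labels, dim))   # one classifier vector w_l per label
x = rng.normal(size=dim)                 # embedding of an input sentence

# One-vs-all scoring: sigmoid(w_l . x) estimates the probability that the
# document carries label l; thresholding gives the predicted label set.
scores = 1.0 / (1.0 + np.exp(-(W @ x)))
predicted = np.where(scores > 0.5)[0]
print(scores, predicted)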

How to obtain trn_X_Xf.txt, trn_X_Y.txt, tst_X_Xf.txt, tst_X_Y.txt

Hi,

To run run_ECLARE.sh, it seems four additional data files (trn_X_Xf.txt, trn_X_Y.txt, tst_X_Xf.txt, tst_X_Y.txt) are needed. However, I cannot find them in the provided download links for any of the three datasets (nor on the dataset webpage http://manikvarma.org/downloads/XC/XMLRepository.html). Could you please point me to where these data files are, or explain how I can generate them?

Also, line 32 of run_ECLARE.sh calls json.load(open('${model_type}/$dataset.json')). May I ask where I can find this JSON file?

Thanks a lot!
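
(For anyone generating such files for a custom dataset: these are plain-text sparse matrices in the format used across the XC repository, a "num_rows num_cols" header line followed by one space-separated "index:value" list per row. A minimal writer sketch under that assumption; this is a generic helper, not code from the repo:)

from scipy.sparse import csr_matrix

def write_sparse_file(mat: csr_matrix, path: str) -> None:
    # Write a sparse matrix in the XC plain-text format:
    # first line "num_rows num_cols", then one "col:val ..." line per row.
    mat = mat.tocsr()
    with open(path, "w") as f:
        f.write(f"{mat.shape[0]} {mat.shape[1]}\n")
        for i in range(mat.shape[0]):
            row = mat.getrow(i)
            f.write(" ".join(f"{c}:{v:g}" for c, v in zip(row.indices, row.data)))
            f.write("\n")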

What if #labels in the training set differs from #labels in the test set?

Hi,

I noticed that the datasets used for XMC models have the same number of labels in both the training and test sets. I am wondering what happens if they differ. I tried to run the ECLARE model on a custom dataset where #labels_train != #labels_test, and a dimension-mismatch issue crops up.

Why must #labels_train and #labels_test be equal? Do I need to tweak the code to resolve this problem?
Many thanks,
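
(A note for the same situation: one-vs-all models allocate one classifier per label, so both splits must index the same label vocabulary, and a label absent from training has no classifier to predict it with. A common workaround is to build both label matrices over the union of labels, e.g. by padding the narrower matrix with empty columns once the column ids are aligned. A hedged sketch:)

from scipy.sparse import csr_matrix, hstack

def pad_labels(mat: csr_matrix, num_labels: int) -> csr_matrix:
    # Pad a document-label matrix with empty columns so that both
    # splits share one label space (columns already aligned by label id).
    extra = num_labels - mat.shape[1]
    if extra <= 0:
        return mat
    pad = csr_matrix((mat.shape[0], extra), dtype=mat.dtype)
    return hstack([mat, pad]).tocsr()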

Error Dimensions

Hello,

I tried the ECLARE model on a custom dataset for a multilabel classification task and encountered the following error when running the code:

Traceback (most recent call last):
  File "/root/scratch/XC/programs/ECLARE/ECLARE/main.py", line 213, in <module>
    main(args.params)
  File "/root/scratch/XC/programs/ECLARE/ECLARE/main.py", line 194, in main
    train(model, params)
  File "/root/scratch/XC/programs/ECLARE/ECLARE/main.py", line 58, in train
    model.fit(
  File "/root/scratch/XC/programs/ECLARE/ECLARE/libs/model_base.py", line 373, in fit
    self._fit(train_dataset, valid, model_dir, result_dir, validate_after)
  File "/root/scratch/XC/programs/ECLARE/ECLARE/libs/model_base.py", line 305, in _fit
    self._train_depth(train_ds, valid_ds, model_dir,
  File "/root/scratch/XC/programs/ECLARE/ECLARE/libs/model_base.py", line 288, in _train_depth
    tr_avg_loss = self._step(train_dl)
  File "/root/scratch/XC/programs/ECLARE/ECLARE/libs/model_base.py", line 196, in _step
    loss = self._compute_loss(out_ans, batch_data)
  File "/root/scratch/XC/programs/ECLARE/ECLARE/libs/model_base.py", line 183, in _compute_loss
    return self.criterion(out_ans, _true).to(device)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/loss.py", line 713, in forward
    return F.binary_cross_entropy_with_logits(input, target,
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 3130, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))

I have no idea why the target size would mismatch the input size.
Many thanks in advance.
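
(The ValueError itself is generic PyTorch behaviour: binary_cross_entropy_with_logits requires the target tensor to have exactly the same shape as the input logits, so the model's output width, i.e. the number of label classifiers, must match the number of columns in the ground-truth label matrix. A tiny self-contained reproduction:)

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)   # batch of 4 documents, scores for 10 labels
target = torch.zeros(4, 10)   # multi-hot ground truth: same shape, so this works
print(F.binary_cross_entropy_with_logits(logits, target))

bad_target = torch.zeros(4, 8)  # 8 label columns instead of 10
try:
    F.binary_cross_entropy_with_logits(logits, bad_target)  # raises the error above
except ValueError as e:
    print(e)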
