TagRec

##Towards A Standardized Tag Recommender Benchmarking Framework

Description

The aim of this work is to provide the community with a simple to use, generic tag-recommender framework written in Java to evaluate novel tag-recommender algorithms with a set of well-known std. IR metrics such as MAP, MRR, P@k, R@k, F1@k and folksonomy datasets such as BibSonomy, CiteULike, LastFM or Delicious and to benchmark the developed approaches against state-of-the-art tag-recommender algorithms such as MP, MP_r, MP_u, MP_u,r, CF, APR, FR, GIRP, GIRPTM, etc.

Furthermore, it contains algorithms to process datasets (e.g., p-core pruning, leave-one-out splitting and LDA topic creation).

The software already contains three novel tag-recommender approaches based on cognitive science theory. The first one (3Layers) uses topic information and is based on the ALCOVE theory (Krutschke et al., 1992). The second one (BLL+C) uses time information is based on the ACT-R theory (Anderson et al., 2004). The third one (3LT) is a combination of the former two approaches and integrates the time component on the level of tags and topics.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details. You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Please cite the papers if you use this software in one of your publications.

Download

The source-code can be directly checked-out through this repository. It contains an Eclipse project to edit and build it and an already deployed .jar file for direct execution. Furthermore, the folder structure that is provided in the repository is needed, where csv is the input directory and metrics is the output directory in the data folder. Both of these directories contain subdirectories for the different datasets:

bib_core for BibSonomy
cul_core for CiteULike
flickr_core for Flickr
wiki_core for Wikipedia (based on bookmarks from Delicious)

How-to-use

The tagrecommender .jar uses three parameters: First the algorithm:

3layers for 3Layers (based on ALCOVE theory) (Seitlinger et al., 2013)
3LT for the time-based 3Layers on the levels of tags and topics (Kowald et al., 2014a)
bll_c for BLL and BLL+C (based on ACT-R theory) (Kowald et al., 2014b)
lda for Latent Dirichlet Allocation (Krestel et al., 2009)
cf for Collaborative Filtering (Jäschke et al., 2007)
fr for Adapted PageRank and FolkRank (Hotho et al., 2006)
girptm for GIRP and GIRPTM (Zhang et al., 2012)
mp for MostPopular tags
mp_u_r for MostPopular tags by user and/or resource (Jäschke et al., 2007)
core for calculating p-cores on a dataset
split for splitting a dataset into training and test-sets using a leave-one-out method
lda_samples for creating LDA topics for the resources in a dataset

, second the dataset(-directory):

bib for BibSonomy
cul for CiteULike
flickr for Flickr
wiki for Wikipedia (based on bookmarks from Delicious)

and third the filename (without file extension)

Example: java -jar tagrecommender.jar bll_c bib bib_sample

Input format

The input-files have to be placed in the corresponding subdirectory and are in csv-format (file extension: .txt) with 5 columns (quotation marks are mandatory):

User
Resource
Timestamp in seconds
List of tags
List of categories (optional)

Example: "0";"13830";"986470059";"deri,web2.0,tutorial,www,conference";""

There are three files needed:

one file for training (with _train suffix)
one file for testing (with _test suffix)
one file that first contains the training-set and then the test-set (no suffix - is used for generating indices for the calculations)

Example: bib_sample_train.txt, bib_sample_test.txt, bib_sample.txt (combination of train and test file)

Output format

The output-file is generated in the corresponding subdirectory and is in csv-format with 5 columns:

Recall
Precision
F1-score
Mean Reciprocal Rank
Mean Average Precision

for k = 1 to 10 (each line is one k)

Example: 0,5212146123336273;0,16408544726301685;0,22663857529082376;0,26345775109372344;0,3242776089324113

References

D. Kowald, P. Seitinger, C. Trattner, and T. Ley.: Forgetting the Words but Remembering the Meaning: Modeling Forgetting in a Verbal and Semantic Tag Recommender, 2014a. (under review)
D. Kowald, P. Seitlinger, C. Trattner, and T. Ley. Long Time No See: The Probability of Reusing Tags as a Function of Frequency and Recency. In Proceedings of the 23rd international conference on World Wide Web, WWW '14, Seoul, Korea, 2014b. ACM.
P. Seitinger, D. Kowald, C. Trattner, and T. Ley.: Recommending Tags with a Model of Human Categorization. In The ACM International Conference on Information and Knowledge Management (CIKM 2013), ACM, New York, NY, USA, 2013.
A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In The semantic web: research and applications, pages 411–426. Springer, 2006.
L. Zhang, J. Tang, and M. Zhang. Integrating temporal usage pattern into personalized tag prediction. In Web Technologies and Applications, pages 354–365. Springer, 2012.
R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Knowledge Discovery in Databases: PKDD 2007, pages 506–514. Springer, 2007.
R. Krestel, P. Fankhauser, and W. Nejdl. Latent dirichlet allocation for tag recommendation. In Proceedings of the third ACM conference on Recommender systems, pages 61–68. ACM, 2009.
J. R. Anderson, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1050, 2004.
J. K. Kruschke et al. Alcove: An exemplar-based connectionist model of category learning. Psychological review, 99(1):22–44, 1992.

Contact

Dominik Kowald, Know-Center, Graz University of Technology, [email protected]
Christoph Trattner, Know-Center, Graz University of Technology, [email protected]

ctrattner / tagrec Goto Github PK

tagrec's Introduction

TagRec

Description

Download

How-to-use

Input format

Output format

References

Contact

tagrec's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs