guoguibing / librec

LibRec: A Leading Java Library for Recommender Systems

Home Page: https://www.librec.net/

License: Other

Java 96.51% Shell 0.13% Batchfile 0.19% Python 0.06% Scala 3.12%
recommender-systems recommendation-algorithms collaborative-filtering matrix-factorization tensor-factorization probabilistic-graphical-models recommender systems factorization matrix

librec's Introduction

LibRec (https://guoguibing.github.io/librec/index.html) is a Java library for recommender systems (Java version 1.7 or higher required). It implements a suite of state-of-the-art recommendation algorithms, aiming to resolve two classic recommendation tasks: rating prediction and item ranking.

Join the chat at https://gitter.im/librec/Lobby

LibRec Demo

A movie recommender system is designed and available here.

Documentation

Please refer to the LibRec Documentation and the API Documentation.

Author's Words about the NEW Version

It has been a year since the last version was released. In this year, many changes have been made to the LibRec project, and the most significant one is the formation of the LibRec team. The team pushes forward the development of LibRec with the wisdom of many experts and the collaboration of experienced and enthusiastic contributors. Without their great effort and hard work, it would have been impossible to reach a state that a single developer could only dream of.

LibRec 2.0 is not the end of our teamwork, but just the beginning of greater objectives. We aim to continuously provide NEXT versions with better experience and performance. There are many directions and goals in the plan, and we will do our best to make them happen. It is always exciting to receive code contributions, suggestions and comments from all our LibRec users.

We hope you enjoy the new version!

PS: Follow us on WeChat to have first-hand and up-to-date information about LibRec.

Features

  • Rich Algorithms: More than 70 recommendation algorithms have been implemented, and more are on the way.
  • High Modularity: Six main components: data split, data conversion, similarity, algorithms, evaluators and filters.
  • Great Performance: More efficient implementations than counterparts, while producing comparable accuracy.
  • Flexible Configuration: Low coupling and flexible configuration, either via external text files or via the internal API.
  • Simple Usage: Runs in a few lines of code, and a number of demos are provided for an easy start.
  • Easy Expansion: A set of recommendation interfaces makes it easy to implement new recommenders.

The procedure of LibRec is illustrated as follows.

Download

by maven

<dependency>
    <groupId>net.librec</groupId>
    <artifactId>librec-core</artifactId>
    <version>2.0.0</version>
</dependency>

by packages

Execution

You can run LibRec with configuration options passed as command-line arguments:

librec rec -exec -D rec.recommender.class=itemcluster -D rec.pgm.number=10 -D rec.iterator.maximum=20

or from a configuration file:

librec rec -exec -conf itemcluster-test.properties

Code Snippet

You can use LibRec as part of your own project; the following code runs a recommender.

public static void main(String[] args) throws Exception {
	
	// recommender configuration
	Configuration conf = new Configuration();
	Resource resource = new Resource("rec/cf/userknn-test.properties");
	conf.addResource(resource);

	// build data model
	DataModel dataModel = new TextDataModel(conf);
	dataModel.buildDataModel();
	
	// set recommendation context
	RecommenderContext context = new RecommenderContext(conf, dataModel);
	RecommenderSimilarity similarity = new PCCSimilarity();
	similarity.buildSimilarityMatrix(dataModel, true);
	context.setSimilarity(similarity);

	// training
	Recommender recommender = new UserKNNRecommender();
	recommender.recommend(context);

	// evaluation
	RecommenderEvaluator evaluator = new MAEEvaluator();
	recommender.evaluate(evaluator);

	// recommendation results
	List<RecommendedItem> recommendedItemList = recommender.getRecommendedList();
	RecommendedFilter filter = new GenericRecommendedFilter();
	recommendedItemList = filter.filter(recommendedItemList);
}

News Report

Acknowledgement

We would like to express our appreciation to the following people for contributing source code to LibRec, including Prof. Robin Burke, Bin Wu, Diego Monti, Ge Zhou, Li Wenxi, Marco Mera, Ran Locar, Shawn Rutledge, ShuLong Chen, Tao Lian, Takuya Kitazawa, Zhaohua Hong, Tan Jiale, Daniel Velten, Qian Shaofeng, etc. We gratefully thank Mr. Lijun Dai for designing and contributing the logo of LibRec, and also many thanks to Mr. Jianbin Zhang for implementing and sharing a LibRec demo.

We also appreciate many others for reporting bugs and issues, and for providing valuable suggestions and support.

Publications

Please cite the following papers if LibRec is helpful to your research.

  1. G. Guo, J. Zhang, Z. Sun and N. Yorke-Smith, LibRec: A Java Library for Recommender Systems, in Posters, Demos, Late-breaking Results and Workshop Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP), 2015.
  2. G. Guo, J. Zhang and N. Yorke-Smith, TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings, in Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), 2015, 123-129.
  3. Z. Sun, G. Guo and J. Zhang, Exploiting Implicit Item Relationships for Recommender Systems, in Proceedings of the 23rd International Conference on User Modeling, Adaptation and Personalization (UMAP), 2015.

GPL License

LibRec is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. LibRec is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with LibRec. If not, see http://www.gnu.org/licenses/.


librec's Issues

edge case of Mean Squared Difference similarity

Line 103 sets the similarity value to the inverse of the mean squared difference. This is in line with the description on page 214 of the article by Shardanand and Maes [1995], which says "the weights are inverse proportional to the dissimilarity".

Line 105 sets it to 1.0 if the inverse of the mean squared difference is infinity, i.e., when the two vectors are exactly the same, which is reasonable. But the value is left unchanged if it is larger than 1.0 but not positive infinity. In this case the similarity will be larger than 1.0 (the similarity value when the two vectors are exactly the same), which seems irrational.
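A minimal fix, assuming the similarity should be capped at the identical-vector value, is to clamp the inverse MSD at 1.0. The standalone sketch below is illustrative only; the class and method names are not LibRec's actual MSD similarity code.

```java
// Illustrative sketch of a clamped mean-squared-difference similarity.
// Names here are hypothetical, not LibRec's actual API.
public class MsdSimilarity {

    // similarity = min(1, 1 / MSD); identical vectors map to exactly 1.0,
    // so no pair of distinct vectors can score higher than identity
    public static double similarity(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        double msd = sum / a.length;
        if (msd == 0.0) {
            return 1.0;                       // identical vectors
        }
        return Math.min(1.0, 1.0 / msd);      // clamp the edge case
    }
}
```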

Usage of TimeSVD when a prediction for the current (present) day needs to be calculated

How would a system use a model trained with TimeSVD to predict a user's rating at a given moment (for example, when a request comes in and a list of items needs to be recommended), when the parameters for the current day will always be unknown? The items, users and factors will not have learned parameters associated with the current (present) day and, as I saw in the predict code, zero will be used for those values. So what is the benefit of using the algorithm when the temporal features will have zero value?

LLORMA performance issue

Hi, all.

First of all, I really appreciate the developers who made LibRec.
It is really helpful for various experiments in my work.

I have an issue: LLORMA's performance on the MovieLens-1M data gets worse over iterations, as follows:

[DEBUG] 2016-11-09 22:36:11,789 -- With Specs: {Users, Items, Ratings} = {6040, 3815, 20000}, Scale = {1.0,2.0,3.0,4.0,5.0}
[DEBUG] 2016-11-09 22:36:16,645 -- LLORMA: [factors, lRate, maxLRate, regB, regU, regI, iters, boldDriver] = [10, 0.001, -1.0, 0.001, 0.001, 0.001, 100, true]
[DEBUG] 2016-11-09 22:36:42,528 -- LLORMA has written rating predictions to .\Results\LLORMA-rating-predictions.txt
[DEBUG] 2016-11-09 22:36:42,530 -- LLORMA iter 5:[MAE,MPE,NMAE,Perplexity,RMSE,R_MAE,R_RMSE] [1.261908,1.000000,Infinity,0.000000,1.556156,3.535900,3.719341]
[DEBUG] 2016-11-09 22:36:56,349 -- LLORMA has written rating predictions to .\Results\LLORMA-rating-predictions.txt
[DEBUG] 2016-11-09 22:36:56,350 -- LLORMA iter 10:[MAE,MPE,NMAE,Perplexity,RMSE,R_MAE,R_RMSE] [2.524851,2.000000,Infinity,0.000000,2.201272,7.071800,5.259943]
[DEBUG] 2016-11-09 22:37:08,754 -- LLORMA has written rating predictions to .\Results\LLORMA-rating-predictions.txt
[DEBUG] 2016-11-09 22:37:08,755 -- LLORMA iter 15:[MAE,MPE,NMAE,Perplexity,RMSE,R_MAE,R_RMSE] [3.796789,3.000000,Infinity,0.000000,2.702933,10.607700,6.442088]
[DEBUG] 2016-11-09 22:37:21,414 -- LLORMA has written rating predictions to .\Results\LLORMA-rating-predictions.txt
[DEBUG] 2016-11-09 22:37:21,415 -- LLORMA iter 20:[MAE,MPE,NMAE,Perplexity,RMSE,R_MAE,R_RMSE] [5.070666,4.000000,Infinity,0.000000,3.126489,14.143600,7.438683]

Can you give me any advice?

Thanks.

Extending LibRec

Some access control decisions seem to make LibRec harder to extend than necessary. Making the core recommendation methods (predict, ranking) protected means that it is not possible to build a hybrid recommender (for example, one with other Recommender classes as components) without an ugly work-around.

See the attached pair of classes. I am trying to build a UserKNN recommender that uses MostPopular as a fall-back when there are not enough neighbors. What I would like to do is to create a subclass of UserKNN with a MostPopular component as a member and just call its ranking method, but I can't. Also, I can't access the superclass's userCorrs variable, so I have to compute and store the correlations twice.

There is also no easy way to add to the list of available recommenders or the set of available evaluation measures.

I'm curious if there is any work underway to make libRec more extensible.

Archive.zip

Has anyone tried LRMF?

Has anyone tried LRMF? I did, but I think it does not reproduce the results of the paper. I want to hear from Mr. Guo or anyone else who has tried LRMF.

Thanks.

PMF Issue - Not getting the example results and Getting Error Instead

Hello,

It is a very nice Java package and very useful. Thank you.
I have an issue. I am trying to run the package with the default dataset, just to test it. I set the parameters as they are given on the example page, but not only do I not get the same results, I also get an error. Here is my config file for PMF:

dataset.ratings.wins=.\\demo\\Datasets\\FilmTrust\\ratings.txt
dataset.ratings.lins=./demo/Datasets/FilmTrust/ratings.txt

ratings.setup=-columns 0 1 2 -threshold -1

recommender=PMF
evaluation.setup=cv -k 5 -p on --rand-seed 1 --test-view all 
item.ranking=off -topN -1 -ignore -1

num.factors=10
num.max.iter=1000

learn.rate=0.0030 -momentum 0.8 -moment 0.8 -bold-driver
reg.lambda=0.1 -u 0.1 -i 0.1 -b 0.1 -s 0.001

output.setup=on -dir ./demo/Results/

Here are the example configurations:
factors=10, reg=0.1, learn.rate=30, momentum=0.8, max.iter=200

and here is what I get:

Dataset: ...demo/Datasets/FilmTrust/ratings.txt
With Specs: {Users, Items, Ratings} = {1508, 2071, 35497}, Scale = {0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0}
With Setup: cv -k 5 -p on --rand-seed 1 --test-view all
Fold [1]: training amount: 28395, test amount: 7099
PMF: [factors, lRate, maxLRate, regB, regU, regI, iters, boldDriver] = [10, 30.0, -1.0, 0.1, 0.1, 0.1, 200, true]
PMF fold [1] iter 1: loss = NaN, delta_loss = NaN, learn_rate = 30.0
Loss = NaN or Infinity: current settings does not fit the recommender! Change the settings and try again!

The point is that if I set the learning rate to some small number, e.g. 0.003, it doesn't give an error. Can anyone help me? I am using the latest version of the package.

Rating ratio implementation

I am writing small-scale data sets for students to use as test cases, and this exercise has brought to light what I think is a bug in how ratings are divided for the rating ratio split.

I have five items rated by five users. My expectation is that an 80% split by users would mean that exactly 1 (randomly chosen) rating for each user is omitted. That is not what happens: 2 users have 2 ratings omitted and other users none.

I understand why this happens: there is a Bernoulli process and each rating is included or excluded on that basis, so in a small data set a user might get 5 low random draws in a row. But I don't think this is what people conventionally mean by an 80% split in evaluation.

I would prefer an implementation that draws exactly k ratings for the training data at random, where k = user profile size * ratio. I've written a simple implementation of this idea using Randoms.randInts(), which I can submit. I don't know if you want to have this as a separate configuration option ("userratiofixed", maybe) so that the current (more efficient, but less precise) behavior is still available.
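The exact-k idea described above could be sketched as follows. This is a hypothetical standalone illustration, not the Randoms.randInts()-based implementation mentioned in the issue: shuffle each user's rating indices and keep the first k = round(profileSize * ratio) for training.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of an exact-ratio split: every user contributes
// exactly round(n * ratio) training ratings, instead of each rating
// passing an independent Bernoulli draw.
public class ExactRatioSplit {

    public static List<Integer> trainIndices(int numRatings, double ratio, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numRatings; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int k = (int) Math.round(numRatings * ratio);  // exact training size per user
        return indices.subList(0, k);
    }
}
```

With five ratings per user and ratio 0.8, every user ends up with exactly four training ratings and one held-out rating.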

Cold-start bug

There is a bug in ItemKNN.predict(). If the user id exceeds the size of the current matrix, an exception is thrown. This may happen in cold start situations where there are users in the test data not in the training data, so I think it is more appropriate to treat this as a "missing user" rather than an error. I have a fix that I will submit as a pull request.
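The "missing user" treatment described above can be sketched as a guard at the top of predict(). The names below are illustrative; the actual pull request may differ.

```java
// Illustrative cold-start guard: treat users absent from the training
// data as "missing" (fall back to the global mean) instead of letting
// matrix indexing throw an exception.
public class ColdStartGuard {

    public static double predict(int userId, int numTrainUsers,
                                 double knnPrediction, double globalMean) {
        if (userId < 0 || userId >= numTrainUsers) {
            return globalMean;   // user not seen during training
        }
        return knnPrediction;    // normal KNN prediction path
    }
}
```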

Installing LibRec on a server machine

Hi

I want to install LibRec on my server machine without using Eclipse. (I don't have GUI)

Is it possible? I tried to install using "javac LibRec.java" without success.

FYI, I am trying this because I want to test with Netflix dataset for which you do not provide results.

I installed maven and tried to build librec as follows

mvn compile
mvn package
java -cp target/librec-1.2.jar librec.main.LibRec

then, I got the following error...

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Table
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Table
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more

Any idea how to execute librec without eclipse?

Thanks

Question on SVD Recommender

Hi All,

I am just trying out a few SVD recommender libraries using the MovieLens data set.

Considering the MovieLens dataset, I have a question about the LibRec recommendation system.

My rating data set looks like this: userid:itemid:rating

Now I want to enter a userid and recommend items similar to the user's other ratings.

I found the predict and ranking methods in the recommender class. There I have to enter both the userid and the itemid for the prediction. Is there any API that returns the top 5 recommendations given only the userid? With the existing APIs I might have to write my own logic, iterating over all itemids for the user and returning the top k.

cheers,

Saurav
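The iterate-and-sort logic the question describes could be sketched like this. The predict function here is a placeholder supplied by the caller; in LibRec 2.0 the getRecommendedList() call shown in the README's code snippet covers this use case directly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical top-N helper: score every item for a user with a
// caller-supplied predict function and keep the N highest-scoring ids.
public class TopN {

    public static List<Integer> recommend(int userId, int numItems, int n,
                                          BiFunction<Integer, Integer, Double> predict) {
        List<Integer> items = new ArrayList<>();
        for (int itemId = 0; itemId < numItems; itemId++) {
            items.add(itemId);
        }
        // sort item ids by predicted score, highest first
        items.sort(Comparator.comparingDouble(
                (Integer itemId) -> predict.apply(userId, itemId)).reversed());
        return items.subList(0, Math.min(n, items.size()));
    }
}
```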

Publish via Maven?

First of all, thanks for the awesome library! Is it possible to get it distributed via Maven, so that we can update more easily?

Why can't I compile on the command line?

I followed this:
Compiling on Command Line

Download or check out the Source Code from GitHub.
Unzip files into a local directory.
Run the next command to compile the source codes.

javac librec/main/LibRec.java

The dataset path is ok, but the command does not work.
Thank you for your help.
Thank you for your help.

librec ignores -headline on dataset w/ Unix line endings on Windows 10

So I have a dataset on Windows with Unix line endings. It's a CSV with a headline so I want to of course remove the headline. Here is my line to determine how to parse the file:
ratings.setup=-columns 0 1 2 4 -threshold -1 --time-unit SECONDS -headline
However, it keeps attempting to read the headline even though I've told it to ignore it. I am not sure if this has to do with the fact that the file uses Unix line endings on a Windows system, but it is rather annoying that I have to copy a 5 GB file just to remove one line. :( Regardless, if this parameter has changed, the Wiki documentation needs to be updated. Otherwise, thank you so much for this awesome recommendation engine.

Also, as a side question: only some of the data in my dataset has timestamps. Can I ask librec to use only the data with timestamps, or to use timestamps only when they do not equal 0 (their uninitialized value)?

When I set trust-degree parameters in librec.conf, I run into a problem.

When I set the trust-degree parameters in librec.conf to run experiments, I encountered the following problem:
2015-03-25 17:04:16,065 [INFO] Training: F:\dataset\Epinions\ratings.txt, ratio: 0.8, trust-degree [1, 5]
2015-03-25 17:04:19,125 [DEBUG] Dataset: {Users, Items, Ratings} = {40163, 139738, 664824}, Scale = {1.0, 2.0, 3.0, 4.0, 5.0}
2015-03-25 17:04:19,736 [DEBUG] training amount: 531982, test amount: 132842
2015-03-25 17:04:19,736 [DEBUG] Social dataset: F:\dataset\Epinions\trust.txt
2015-03-25 17:04:21,216 [DEBUG] Dataset: {Users, Users, Links} = {49289, 49289, 487183}, Scale = {1.0}
2015-03-25 17:04:22,173 [DEBUG] TrustSVD: 0.01,-1.0,0.9,0.9,0.9,0.5,10,100,true
2015-03-25 17:04:45,640 [DEBUG] TrustSVD iter 1: errs = 345709.44, delta_errs = -345709.44, loss = 885255.7, delta_loss = -885255.7, learn_rate = 0.01
2015-03-25 17:05:08,851 [DEBUG] TrustSVD iter 2: errs = 320324.3, delta_errs = 25385.13, loss = 5924772.0, delta_loss = -5039516.5, learn_rate = 0.01
2015-03-25 17:05:32,294 [DEBUG] TrustSVD iter 3: errs = NaN, delta_errs = NaN, loss = NaN, delta_loss = NaN, learn_rate = 0.005
2015-03-25 17:05:32,294 [ERROR] Loss = NaN or Infinity: current settings cannot train the recommender! Try other settings instead!

The main configuration parameters in librec.conf that I changed are as follows:
dataset.training.wins=F:\dataset\Epinions\ratings.txt
dataset.social.wins=F:\dataset\Epinions\trust.txt
min.trust.degree=1
max.trust.degree=5
rating.pred.view=trust-degree
val.reg.user=0.9
val.reg.item=0.9
val.reg.bias=0.9
val.reg.social=0.5
num.factors=10
recommender=TrustSVD

I have encountered the NaN or Infinity problem before in my experiments. That time, I resolved it by sorting the records in ratings.txt and trust.txt of Epinions. Why did I get the problem again? Are the parameters in librec.conf wrong?

What do these measures mean? e.g. ASYMM, MPE

/**
* Recommendation measures
*
*/
public enum Measure {
MAE, RMSE, NMAE, ASYMM, MPE, D5, D10, Pre5, Pre10, Rec5, Rec10, MAP, MRR, NDCG, AUC, TrainTime, TestTime
}

Except for the first three, I don't know what the others are short for. Would anybody give a clear explanation?

Bug in Class DataDAO

Today, when I tried to test my algorithm for trust propagation by breadth-first search, I found a bug in the DataDAO class. The following code in DataDAO has a bug:
// inner id starting from 0
int row = userIds.containsKey(user) ? userIds.get(user) : userIds.size();
userIds.put(user, row);

int col = itemIds.containsKey(item) ? itemIds.get(item) : itemIds.size();
itemIds.put(item, col);

When I try to read the trust network information recorded in trust.txt on my disk through the method readData(int[] cols, double binThold) in class DataDAO, I find the data in memory is inconsistent with the data in trust.txt.
The following data is in trust.txt
1 2 1
1 3 1
1 6 1
1 8 1
2 3 1
2 4 1
2 6 1
3 1 1
3 7 1
4 2 1
4 5 1
4 6 1
5 2 1
6 7 1
7 3 1
8 1 1
8 7 1
The following data is in memory
1 1 1.000000
1 2 1.000000
1 3 1.000000
1 4 1.000000
2 2 1.000000
2 3 1.000000
2 5 1.000000
3 6 1.000000
3 7 1.000000
4 1 1.000000
4 3 1.000000
4 8 1.000000
5 1 1.000000
6 7 1.000000
7 2 1.000000
8 6 1.000000
8 7 1.000000
The reason for this behavior is that the ids in trust.txt do not run consecutively from 1: inner ids are assigned in order of first appearance, so the in-memory ids differ from the raw ids. When the ids do run consecutively from 1, no mismatch is found.
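The order-of-appearance mapping performed by the DataDAO snippet above can be illustrated with a standalone sketch (hypothetical class name, mirroring the ternary logic in the quoted code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of DataDAO-style inner-id assignment: each raw id gets the
// next free index the first time it is seen, so inner ids reflect the
// order of appearance in the file, not the raw numeric values.
public class InnerIdMap {

    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // inner id starting from 0, same pattern as the quoted DataDAO code
    public int innerId(String rawId) {
        Integer id = ids.get(rawId);
        if (id == null) {
            id = ids.size();
            ids.put(rawId, id);
        }
        return id;
    }
}
```

For the trust.txt sample above, raw user 2 is the second distinct id seen in the trustee column, so its inner id differs from 2, which produces exactly the kind of in-memory shift shown.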

Unable to import project into eclipse

Hi,
I followed the guide and tried to import the project into Eclipse (Kepler first, then Luna), but after selecting the unzipped folder librec-v1.1 or librec-v1.0 it says that no project has been detected.
Nor is it possible to run the jar file from the command line, since it gives the following error:

Exception in thread "main" java.lang.NullPointerException
at happy.coding.io.Configer.getPath(Configer.java:41)
at librec.main.LibRec.debugInfo(LibRec.java:248)
at librec.main.LibRec.main(LibRec.java:70)

If I'm doing something wrong, please tell me!
Thanks for your patience
AF

Bug in RankALS, Q-Step

I used your RankALS to debug my own RankALS implementation, and I think I found a bug in your implementation.

If I see correctly, your \bar{A} in the Q-step, which you named sum_cpp, is equal to P^T.dot(P), but it should be P^T_{U_i}.dot(P_{U_i}). You should move it into the if-condition on r_{ui} as well:

if (rui > 0) {
    sum_cpp = sum_cpp.add(pp);
    sum_cpr = sum_cpr.add(pu.scale(rui));
    sum_c_sr_p = sum_c_sr_p.add(pu.scale(m_sum_sr.get(u)));
    sum_p_r_c = sum_p_r_c.add(pu.scale(rui * m_sum_c.get(u)));
}

Bugs in CLiMF: compute sgds for items rated by user u

According to the paper, the gradient of the objective for user i with respect to Vj is:
[equation image omitted]
This equation differs from the one in the paper because the author missed a Ui. From this equation it is clear that the gradient for items which have not been rated by user i is 0. However, the code given in this project is:

// compute sgds for items rated by user u
Map<Integer, List<Double>> itemSgds = new HashMap<>();
// for (int j : items) {
for (int j = 0; j < numItems; j++) {

    double fuj = predict(u, j);
    List<Double> jSgds = new ArrayList<>();
    for (int f = 0; f < numFactors; f++) {
        double puf = P.get(u, f);
        double qjf = Q.get(j, f);

        double yuj = uv.contains(j) ? 1.0 : 0.0;
        double sgd = yuj * g(-fuj) * puf - regI * qjf;

        for (int k : uv.getIndex()) {
            if (k == j)
                continue;

            double fuk = predict(u, k);
            double x = fuk - fuj;

            sgd += gd(-x) * (1.0 / (1 - g(x)) - 1.0 / (1 - g(-x))) * puf;
        }
        jSgds.add(sgd);
    }

    itemSgds.put(j, jSgds);
}

I think the first for loop should be written as: for (int j : uv.getIndex()).

How to use Librec other than for testing purpose using librec.conf ?

I wonder if LibRec can be used out of the box in batch or online mode with an appropriate librec.conf (i.e., without coding a dedicated Java layer to test it). So, is there any option to:

  • resume/load a saved model in the librec.conf ?
  • run Librec in daemon mode and send/receive data via stdin/stdout or any communication layer ?

Thank you,
Xavier

Can we get the model output in plain text?

First of all, thanks for providing such a great tool!

I was reading the manual and found the --save-model command, but it turns out to be saved in Java's binary format. As I am not a Java programmer but still trying to make some post-processing on the latent factors, a "save to plain text" option is definitely preferred.

I tried to read the binary files from R/python but failed, so do you think it is possible to have it added? Thank you very much.

Why do I get an Infinity problem when running the SoReg algorithm?

According to your suggestion, I set the parameters for the SoReg algorithm as follows:
SoReg.beta=0.001
val.reg.user=0.001
val.reg.item=0.001
in librec.conf
However, when I run the algorithm on the Ciao data set, I get the following problem:
2015-04-29 14:53:26,384 [INFO] Training: F:\dataset\CiaoDVDs\ratings.txt, ratio: 0.8
2015-04-29 14:53:26,868 [DEBUG] Dataset: {Users, Items, Ratings} = {17615, 16121, 72665}, Scale = {1.0, 2.0, 3.0, 4.0, 5.0}
2015-04-29 14:53:27,040 [DEBUG] training amount: 57929, test amount: 14416
2015-04-29 14:53:27,040 [DEBUG] Social dataset: F:\dataset\CiaoDVDs\trust.txt
2015-04-29 14:53:27,196 [DEBUG] Dataset: {Users, Users, Links} = {19533, 19533, 40133}, Scale = {1.0}
2015-04-29 14:53:27,291 [DEBUG] SoReg: 0.01,-1.0,0.001,0.001,0.001,0.001,10,100,true
2015-04-29 14:53:27,525 [DEBUG] SoReg iter 1: errs = 121772.8, delta_errs = -121772.8, loss = 121965.305, delta_loss = -121965.305, learn_rate = 0.01
2015-04-29 14:53:27,649 [DEBUG] SoReg iter 2: errs = 949440.1, delta_errs = -827667.3, loss = 950113.56, delta_loss = -828148.25, learn_rate = 0.01
2015-04-29 14:53:27,728 [DEBUG] SoReg iter 3: errs = 1.42163802E9, delta_errs = -1.42068851E9, loss = 1.42168947E9, delta_loss = -1.42073933E9, learn_rate = 0.005
2015-04-29 14:53:27,839 [DEBUG] SoReg iter 4: errs = 3.23917665E18, delta_errs = -3.23917665E18, loss = 3.23917665E18, delta_loss = -3.23917665E18, learn_rate = 0.0025
2015-04-29 14:53:27,901 [DEBUG] SoReg iter 5: errs = Infinity, delta_errs = -Infinity, loss = Infinity, delta_loss = -Infinity, learn_rate = 0.00125
2015-04-29 14:53:27,996 [DEBUG] SoReg iter 6: errs = Infinity, delta_errs = -Infinity, loss = Infinity, delta_loss = -Infinity, learn_rate = 6.25E-4
2015-04-29 14:53:28,058 [DEBUG] SoReg iter 7: errs = Infinity, delta_errs = -Infinity, loss = Infinity, delta_loss = -Infinity, learn_rate = 3.125E-4
2015-04-29 14:53:28,058 [ERROR] Loss = NaN or Infinity: current settings cannot train the recommender! Try other settings instead!

Could you tell me why I get this problem?

Testing TrustSVD with a separate TestSet

When I run TrustSVD with the following conf setup:

dataset.ratings.wins=.\demo\Datasets\FilmTrust\ratings.txt
dataset.ratings.lins=/Users/sam/Downloads/librec/librec/demo/Datasets/FilmTrust/pets_pets_train.txt

dataset.social.wins=.\demo\Datasets\FilmTrust\trust.txt
dataset.social.lins=/Users/sam/Downloads/librec/librec/demo/Datasets/FilmTrust/pets_pets_user_user_con1.txt

ratings.setup=-columns 0 1 2 -threshold -1

recommender=TrustSVD
evaluation.setup=cv -k 5 -p on --rand-seed 1 --test-view all
item.ranking=off -topN -1 -ignore -1

num.factors=10
num.max.iter=100

learn.rate=0.001 -max -1 -bold-driver
reg.lambda=0.1 -u 1.2 -i 1.2 -b 1.2 -s 0.9

output.setup=on -dir ./demo/Results/

it runs fine.
However, when I replace the evaluation setup with
evaluation.setup = test-set -f /Users/sam/Downloads/librec/librec/demo/Datasets/FilmTrust/pets_pets_test.txt

it gives the following error:

java.lang.ArrayIndexOutOfBoundsException: 1137
at librec.data.SparseMatrix.columnSize(SparseMatrix.java:563)
at librec.rating.TrustSVD.initModel(TrustSVD.java:83)
at librec.intf.Recommender.execute(Recommender.java:318)
at librec.main.LibRec.runAlgorithm(LibRec.java:485)
at librec.main.LibRec.run(LibRec.java:399)
at librec.main.LibRec.execute(LibRec.java:148)
at librec.main.Demo.execute(Demo.java:197)
at librec.main.Demo.main(Demo.java:40)

I can't figure out what went wrong here.

Why does the Ciao dataset with PMF get RMSE 2.823082?

Hi, glad to encounter such a nice Java RS project! I have run into some problems with the version 1.2 code or datasets, and find it difficult to track down the cause.

I tried the MovieLens-1M and Ciao datasets, both of which have PMF results on the example page. My results under the configurations from the example page are as follows:
Movielens-1M:
MAE RMSE
PMF 0.746997 0.926199 (nearly the same with the comparison data [0.747 0.926] in the example page)

Configurations:
PMF:
factors=10, reg=0.1, learn.rate=50, momentum=0.8, max.iter=100

Ciao:
MAE RMSE
PMF 2.498656 2.823082 (while the example gives [0.822 1.078] under the same configuration)

Configurations:
PMF:
factors=10, reg=0.4, learn.rate=50, momentum=0.8, max.iter=200

What's more, the Ciao dataset with SVD++ gets 0.702264 0.947451, which is much better than the PMF score. I cannot find the reason. Could you help me with this result? Thank you very much!

Configurations:
SVD++:
factors=10, reg=0.1, learn.rate=0.01, max.iter=100

Question about updating P, Q and W in TrustSVD algorithm

Hi,
I am currently learning the TrustSVD algorithm, but I have some trouble understanding the code that updates the parameters P, Q and W in TrustSVD.java.

Here is my question:
Updates to the user feature matrix P and the user trust matrix W are cached in PS and WS during one pass over the rating and social data, so P and W do not change during one pass over the whole data. However, the item features Q, the user bias vector and the item bias vector are updated for every rating sample using the old user features P instead of the updated features in PS. This does not seem to be what is described in the TrustSVD paper. Why? Am I misunderstanding the TrustSVD algorithm?

Thanks~

Error when Running Librec

Hi. I have copied all the classes into my NetBeans project, but when I try to run the library it throws a null pointer exception. Any ideas what I should do?

Potential bug in BPR

In the (very?) unlikely event that a user has rated ALL items, the loop searching for j

                do {
                    j = Randoms.uniform(numItems);
                } while (pu.contains(j));

would actually become an infinite loop.
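A defensive variant can bound the problem by checking the rated-item count up front. This is only a sketch with hypothetical names, not LibRec's actual code:

```java
import java.util.Random;
import java.util.Set;

/** Hypothetical guarded negative sampler for BPR-style training. */
public class GuardedSampler {
    private static final Random RAND = new Random();

    /**
     * Samples an item index in [0, numItems) that is not in ratedItems.
     * Returns -1 when the user has rated every item, instead of looping forever.
     */
    public static int sampleNegative(Set<Integer> ratedItems, int numItems) {
        if (ratedItems.size() >= numItems)
            return -1; // user rated everything: no negative item exists, skip this user
        int j;
        do {
            j = RAND.nextInt(numItems);
        } while (ratedItems.contains(j));
        return j;
    }
}
```

The caller would then skip the sampled triple (or the user) when -1 comes back, which is cheap since such users contribute no preference pairs anyway.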

Question about large-scale data

First of all, thank you for this nice library; it is very interesting.

When I use BiasedMF with about 600 MB of training data, the system runs very slowly, and after half an hour it throws an OutOfMemoryError. The JVM configuration I set was:

HEAP_OPTS="-Xmx2048m -XX:PermSize=64m -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit"

Could I just increase the -Xmx memory to work around this problem, or is the system only suitable for small amounts of data? I ask because there are several backups of the training data.

Can't run item ranking recommender

Sorry to disturb you, but when I turn 'item.ranking' on, I can't get information about precision, recall, and so on.

Below is what I get:
[INFO ] 2016-11-27 11:46:41,431 -- Metrics: MAE,MPE,NMAE,Perplexity,RMSE,R_MAE,R_RMSE,TestTime,TrainTime
LDA,0.000000,0.000000,NaN,-1.000000,0.000000,0.000000,0.000000,27.600000,27363.600000,,10, 2.0, 0.5, 1000, 300, 10,'00:27','00:00'

Question about BPMF

First of all, thank you for this nice library, it is very interesting, and I am planning to support it through RiVal in the near future.

My question is about the BPMF code. I cannot find where Equation 10 from the paper is being applied. I have the same problem with the PREA implementation; however, in the Matlab code provided by the authors the average seems to be there (when probe_rat_all is updated, if I am correct).

Please, let me know whether such operation is actually used in your code or why it is not needed.
Thank you in advance.

Regards,
Alejandro
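For reference, Equation 10 in the BPMF paper approximates the predictive distribution by averaging the predictions produced by successive Gibbs samples, which matches the running update of probe_rat_all in the authors' MATLAB code. A minimal sketch of that running average (hypothetical names, not LibRec code):

```java
/** Running average of per-sample predictions, as in BPMF's Monte Carlo estimate. */
public class SampleAverager {
    private double mean = 0.0;
    private int count = 0;

    /** Folds one Gibbs sample's prediction into the running mean and returns it. */
    public double add(double prediction) {
        count++;
        mean += (prediction - mean) / count;
        return mean;
    }

    public double mean() {
        return mean;
    }
}
```

In a Gibbs-sampled recommender, one such accumulator per (user, item) pair in the probe set yields the averaged prediction; reporting only the last sample's prediction would skip that step.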

Is it possible to publish it on JitPack or Maven?

Hi,

I want to write a Clojure wrapper for your library, but it's difficult to set this project up as a non-Java developer using Leiningen; it seems it doesn't find all the dependencies, especially happy.coding.utils.

Is it possible to release 1.3 on JitPack or Maven?

Error at running BPMFTestCase

net.librec.common.LibrecException: m.numColumns should equal to n.numRows
at net.librec.math.structure.DenseMatrix.product(DenseMatrix.java:324)
at net.librec.recommender.cf.rating.BNPoissMFRecommender.predict(BNPoissMFRecommender.java:230)
at net.librec.recommender.AbstractRecommender.predict(AbstractRecommender.java:321)
at net.librec.recommender.AbstractRecommender.recommendRating(AbstractRecommender.java:287)
at net.librec.recommender.AbstractRecommender.recommend(AbstractRecommender.java:237)
at net.librec.recommender.AbstractRecommender.recommend(AbstractRecommender.java:221)
at net.librec.job.RecommenderJob.executeRecommenderJob(RecommenderJob.java:140)
at net.librec.job.RecommenderJob.runJob(RecommenderJob.java:118)
at net.librec.recommender.cf.rating.BNPoissMFTestCase.testRecommender(BNPoissMFTestCase.java:55)

Train with one file and query predictions from another file?

Hi!
I was wondering if it's possible to create a config file such that LibRec reads ratings from one file, say training.txt, and I then give it another file, say queries.txt, containing only user IDs and item IDs, and LibRec generates rating predictions for those users over those items (i.e., answers those queries). As far as I understand, LibRec reads a single file and partitions it into a training set and a test set, but in my case I would like to use the whole file for training and then issue queries from another file. Note that I don't want to "test" with the other file, I just want to answer queries (in queries.txt I don't actually know the rating that user x would give to item y, so testing doesn't make sense in this context).

Looking forward to your response.

Thanks in advance.
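One approach suggested by the 1.x configuration keys quoted elsewhere on this page is to point the training and testing entries at separate files. This is only a sketch under the assumption that those keys behave as their names imply (`.wins`/`.lins` appear to be the Windows/Linux path variants); the queries file would need placeholder rating values, and the resulting "test" metrics would be meaningless and should be ignored:

```
# train on one file, predict on another (LibRec 1.x style keys; sketch only)
dataset.training.wins=training.txt
dataset.training.lins=-1

# queries.txt: user_id item_id placeholder_rating
dataset.testing.wins=queries.txt
dataset.testing.lins=-1
```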

I can't load the movielens_1m dataset.

Hi,
Why does the MovieLens-1M dataset fail to train? Hoping for your answer. The error is:

E:\rec_alg\librec-v1.1> java -jar librec.jar
[INFO ] Training: ratings.dat, kFold: 5 [Parallel]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at librec.data.DataDAO.readData(DataDAO.java:161)
at librec.data.DataDAO.readData(DataDAO.java:134)
at librec.data.DataDAO.readData(DataDAO.java:126)
at librec.main.LibRec.main(LibRec.java:102)

Cannot find AoBPR algorithm

Hi,

when I try to use the AoBPR algorithm with librec v1.3, it gives me this error:
java.lang.Exception: No recommender is specified!
at librec.main.LibRec.getRecommender(LibRec.j
at librec.main.LibRec.runCrossValidation(LibR
at librec.main.LibRec.runAlgorithm(LibRec.jav
at librec.main.LibRec.execute(LibRec.java:147
at librec.main.LibRec.main(LibRec.java:223)

As I went through the source code at "https://github.com/guoguibing/librec/blob/master/librec/src/main/java/librec/main/LibRec.java",
it seems that AoBPR is not included there. Is that so?

Alpha 2.0 KCVDataSplitter

Attempting to use the KCVDataSplitter (data.model.splitter=kcv) generates a run-time error.

Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: net.librec.data.splitter.KCVDataSplitter.&lt;init&gt;()
at net.librec.util.ReflectionUtil.newInstance(ReflectionUtil.java:72)
at net.librec.util.ReflectionUtil.newInstance(ReflectionUtil.java:55)
at net.librec.data.model.TextDataModel.buildDataModel(TextDataModel.java:61)
at net.librec.job.RecommenderJob.executeRecommenderJob(RecommenderJob.java:134)
at net.librec.job.RecommenderJob.runJob(RecommenderJob.java:108)

It seems that this functionality is not implemented. I realize many parts of the system are like this. It would be helpful if these non-functional classes were commented out in the driver.classes.props file. That would be a good indicator of the current state of the various components.
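The stack trace points at reflective instantiation failing because the class lacks a no-argument constructor. The general mechanism (a generic Java illustration, not LibRec's ReflectionUtil) looks like this:

```java
import java.lang.reflect.Constructor;

/** Illustrates why reflective instantiation throws NoSuchMethodException. */
public class ReflectionSketch {
    /** A class with only a parameterized constructor, like the failing splitter. */
    static class NoDefaultCtor {
        NoDefaultCtor(int x) { }
    }

    /** Returns true iff clazz declares a no-argument constructor. */
    public static boolean hasNoArgConstructor(Class<?> clazz) {
        try {
            Constructor<?> c = clazz.getDeclaredConstructor(); // throws if absent
            return c != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Frameworks that build components via `Class.newInstance()` or `getDeclaredConstructor().newInstance()` can only instantiate classes with a no-arg constructor, which is presumably why this splitter fails while others work.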

Issue loading dataset without getting IndexOutOfBoundsException

I'm using the latest compiled executable found on the website (1.2), and I'm loading a dataset of this type:

3034,244,1
[...]
14234,90424,1
[...]
325535,1000,1
[...]

I then try to run librec through the jar with a conf set to something like this (some details omitted; it's basically just the default config with the recommender and dataset values changed):

dataset.training.wins=trainData.csv
dataset.training.lins=-1

dataset.social.wins=-1
dataset.social.lins=-1

# in case you use separate testing files
dataset.testing.wins=testData.csv
dataset.testing.lins=-1

recommender=ItemKNN

and so on. It ends with the following exception:

[INFO ] 2015-02-24 20:11:04,707 -- Training: trainData.csv, Testing:: testData.csv.
[DEBUG] 2015-02-24 20:11:09,711 -- Dataset: {Users, Items, Ratings} = {462685, 17319, 1508677}, Scale = {1.0}
[DEBUG] 2015-02-24 20:11:10,914 -- Dataset: {Users, Items, Ratings} = {462685, 17471, 205340}, Scale = {1.0}
[DEBUG] 2015-02-24 20:11:11,112 -- Build item similarity matrix ...
[ERROR] 2015-02-24 20:11:11,446 -- 17320
java.lang.ArrayIndexOutOfBoundsException: 17320
        at librec.data.SparseMatrix.column(SparseMatrix.java:579)
        at librec.intf.Recommender.buildCorrs(Recommender.java:309)
        at librec.rating.ItemKNN.initModel(ItemKNN.java:70)
        at librec.intf.Recommender.execute(Recommender.java:209)
        at librec.main.LibRec.runTestFile(LibRec.java:285)
        at librec.main.LibRec.runAlgorithm(LibRec.java:196)
        at librec.main.LibRec.main(LibRec.java:125)

Would it be possible to get more informative error messages? I think this might be because the test or training data has keys that are not found in the other, but I can't be sure.
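The numbers in the log support that guess: training has 17319 items, the test set has 17471, and the failing index is 17320, which suggests the test file contains items never seen in training. A pre-filtering sketch in plain Java (hypothetical names, not LibRec code) that drops such rows before evaluation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Drops test ratings whose item never appears in the training data. */
public class TestFilter {
    /** Each rating row is {userId, itemId, rating}; keeps only rows with known items. */
    public static List<int[]> keepKnownItems(List<int[]> testRatings, Set<Integer> trainItems) {
        List<int[]> kept = new ArrayList<>();
        for (int[] row : testRatings) {
            if (trainItems.contains(row[1]))
                kept.add(row);
        }
        return kept;
    }
}
```

Running the test file through such a filter (or the symmetric one for unknown users) before handing it to the recommender would turn the ArrayIndexOutOfBoundsException into a well-defined, smaller test set.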
