
test.fm's Introduction


Introduction

Test.fm is (yet another) testing framework for collaborative filtering models. It uses pandas as its default data manipulation library and gives you an easy way to investigate how well your models perform and why. You can build a model using okapi and then check how it performs on the testing data. Or, if you have only a small dataset, you can use Test.fm directly.

Example of using the Test.fm framework

	import pandas as pd
	import testfm
	from testfm.models.baseline_model import Popularity, RandomModel
	from testfm.models.tensorcofi import TensorCoFi
	from testfm.evaluation.evaluator import Evaluator
	
	evaluator = Evaluator()

	# Prepare the data
	df = pd.read_csv(..., names=["user", "item", "rating", "date", "title"])
	training, testing = testfm.split.holdoutByRandom(df, 0.9)

	# Tell me what models we want to evaluate
	models = [
	    RandomModel(),
	    Popularity(),
	    TensorCoFi()
	    ]
	
	# Evaluate
	items = training.item.unique()
	for m in models:
		m.fit(training)
		print m.getName().ljust(50),
		print evaluator.evaluate_model(m, testing, all_items=items)

See other examples here...
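
For readers wondering what a random holdout split such as `testfm.split.holdoutByRandom(df, 0.9)` does conceptually, here is a minimal pandas sketch. It is not Test.fm's implementation: the function name `holdout_by_random`, the `seed` parameter, and the toy ratings are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def holdout_by_random(df, fraction, seed=None):
    """Split rows at random: `fraction` go to training, the rest to testing.

    Hypothetical helper for illustration; not part of Test.fm.
    """
    rng = np.random.RandomState(seed)
    # Shuffle the row positions, then cut at the requested fraction.
    positions = rng.permutation(len(df))
    cut = int(len(df) * fraction)
    training = df.iloc[positions[:cut]]
    testing = df.iloc[positions[cut:]]
    return training, testing


# Toy ratings table (made-up values).
ratings = pd.DataFrame({
    "user": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "item": [10, 11, 10, 12, 11, 13, 12, 14, 13, 14],
    "rating": [5, 3, 4, 2, 5, 1, 4, 4, 3, 5],
})
training, testing = holdout_by_random(ratings, 0.9, seed=0)
```

Every row ends up in exactly one of the two frames, so a model never sees a testing row during `fit`.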

Installation

You can check the official documentation here.

  1. Download and extract the sources.
  2. Check the dependencies in conf/requirements.txt.
  3. Run sudo python setup.py install.
  4. If you are a developer of test.fm, run python setup.py develop instead.
  5. Enjoy and contribute.
  6. Check travis for the latest builds...
  7. Check yaml for the build script.

Nosetests

$ nosetests -w src/ -vv --with-cover --cover-tests --cover-erase --cover-html --cover-package=testfm --with-doctest --doctest-tests tests testfm/evaluation testfm/models testfm/fmio testfm/splitter

Build Documentation

$ sphinx-build -b html source_folder doc_folder

Similar Projects

  1. mrec from Mendeley. Good at building models (python, ?).
  2. okapi from Telefonica Research. Good at distributed model building using Apache Giraph (java, giraph, apache2).
  3. graphlab from CMU. Probably the richest library of modern algorithms (c++, apache2).
  4. mymedialite from Uni Hildesheim. Has ranking implementations (c#, GPL).
  5. mahout from Apache. Uses hadoop to build the models (java, hadoop, apache2).
  6. lenskit from GroupLens (java, GPL2.1).

test.fm's People

Contributors

alexkz, baltrunasl, joaonrb


test.fm's Issues

SVDpp Minor Error

AttributeError: 'SVDpp' object has no attribute '_lamb'

Apparently this model cannot print its string form because it references a nonexistent attribute.
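
A minimal sketch of the usual way to avoid this kind of error: assign every attribute that the string form reads inside `__init__`. The class below is a hypothetical stand-in, not test.fm's `SVDpp`; the names `SketchModel`, `dim`, and `lamb` are assumptions.

```python
class SketchModel(object):
    """Hypothetical model used only for illustration; not test.fm's SVDpp."""

    def __init__(self, dim=10, lamb=0.05):
        # Assign every attribute that get_name() reads, so the string form
        # is usable right after construction.
        self._dim = dim
        self._lamb = lamb

    def get_name(self):
        return "SketchModel (dim={}, lamb={})".format(self._dim, self._lamb)

    def __str__(self):
        return self.get_name()


name = str(SketchModel(dim=5, lamb=0.1))
```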

Install fails: "Blas library is not detected in the system"

Version 2.7.13 works, but it runs into other problems.
ERROR: Command errored out with exit status 1:
command: 'c:\python36\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\lenovo\AppData\Local\Temp\pip-req-build-cvsi0be3\setup.py'"'"'; __file__='"'"'C:\Users\lenovo\AppData\Local\Temp\pip-req-build-cvsi0be3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\lenovo\AppData\Local\Temp\pip-req-build-cvsi0be3\pip-egg-info'
cwd: C:\Users\lenovo\AppData\Local\Temp\pip-req-build-cvsi0be3
Complete output (7 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\lenovo\AppData\Local\Temp\pip-req-build-cvsi0be3\setup.py", line 11, in <module>
from compile import ext_modules, build_ext
File "C:\Users\lenovo\AppData\Local\Temp\pip-req-build-cvsi0be3\compile.py", line 59, in <module>
raise EnvironmentError("Blas library is not detected in the system")
OSError: Blas library is not detected in the system

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output
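
Before retrying the install, it may help to check whether Python can locate any BLAS-like shared library on the machine at all. The sketch below uses the standard-library `ctypes.util.find_library`. It is not how test.fm's compile.py performs its detection (that logic is not shown here), and the candidate library names are assumptions.

```python
from ctypes.util import find_library


def detect_blas(candidates=("blas", "openblas", "cblas", "atlas")):
    """Return (name, location) for the first library found, else None.

    The candidate names are guesses at common BLAS builds.
    """
    for name in candidates:
        location = find_library(name)
        if location is not None:
            return name, location
    return None


found = detect_blas()
if found is None:
    print("No BLAS-like library found on the default search path")
else:
    print("Found %s at %s" % found)
```

A `None` result only means nothing was found on the default search path; a library installed in a non-standard location can still be missed.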

C implementation of TensorCoFi

The C implementation of TensorCoFi occasionally builds models whose results differ slightly from those of the standard implementation. The results are still much better than Random.

PyTensorCoFi parameter bug

When you set parameters on PyTensorCoFi, the internal structures are not updated, which crashes the system.

remove graphchi files

Currently, each graphchi training run creates a lot of files in /tmp. We need to clean them up, or use pipes directly.
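
One possible cleanup approach, sketched under assumptions: give each training run its own scratch directory and remove it in a `finally` block. The helper `train_with_scratch_dir` and the stand-in `fake_training` are hypothetical; this does not show how test.fm actually invokes graphchi.

```python
import os
import shutil
import tempfile


def train_with_scratch_dir(train_fn):
    """Run train_fn(workdir) in a private temp directory and always remove it."""
    workdir = tempfile.mkdtemp(prefix="graphchi_")
    try:
        return train_fn(workdir)
    finally:
        # Remove the directory and everything written into it,
        # even if training raised an exception.
        shutil.rmtree(workdir, ignore_errors=True)


def fake_training(workdir):
    # Hypothetical stand-in for a graphchi run that writes intermediate files.
    path = os.path.join(workdir, "ratings.mm")
    with open(path, "w") as handle:
        handle.write("1 1 5\n")
    return workdir


used_dir = train_with_scratch_dir(fake_training)
```

Because everything lives under one directory per run, a single `shutil.rmtree` call is enough and concurrent runs do not collide.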

Error 'ValueError: sample larger than population' when running the models

Hi all,

I ran libfm on Ubuntu using the dataset detailed below.
The Random model runs OK.
However, when I try the other models in my list (I followed the models_example.py example provided here), i.e. BPR, TFIDFModel, Popularity, TensorCoFi, ..., every one of them triggers the error "ValueError: sample larger than population".

Does anyone know what could be causing this? Any suggestions?
There are many posts online about this error in Python, but the answers and potential causes they describe don't seem to apply to this case, so it is still unclear to me.
By the way, the dataset has more than 5K rows...

Thanks in advance,
regards,
R.
------------------ test:

python modeltest2.py
user item rating time title
0 1123 0 2 838985046 NameFilm
1 1107 0 1 838985046 NameFilm
2 1107 0 1 838985046 NameFilm
3 1107 0 2 838985046 NameFilm
4 1107 1 1 838985046 NameFilm
0:00:00.083082 Random [0.262394934911661]
0:00:09.887563 BPR (dim=10,iter=15,reg=0.0001,eta=0.001)

Traceback (most recent call last):
File "modeltest2.py", line 57, in <module>
print evaluator.evaluate_model(m, testing, all_items=items,)
File "build/bdist.linux-x86_64/egg/testfm/evaluation/evaluator.py", line 83, in evaluate_model
File "build/bdist.linux-x86_64/egg/testfm/evaluation/evaluator.py", line 30, in partial_measure
File "/usr/lib/python2.7/random.py", line 321, in sample
raise ValueError("sample larger than population")
ValueError: sample larger than population
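
The traceback ends in Python's `random.sample`, which raises this error whenever it is asked for more elements than the population contains. One plausible cause, stated here as an assumption and not confirmed from the evaluator's source, is that the evaluation samples a fixed number of items from `all_items` and the dataset has fewer distinct items than that number, regardless of how many rows it has. The sketch below reproduces the error and shows one way to guard against it; `sample_negatives` and its parameters are hypothetical names.

```python
import random


def sample_negatives(all_items, relevant_items, n_samples, seed=None):
    """Sample up to n_samples items that are not in relevant_items."""
    rng = random.Random(seed)
    candidates = sorted(set(all_items) - set(relevant_items))
    # random.sample raises ValueError when k exceeds the population size,
    # so cap k at the number of available candidates.
    k = min(n_samples, len(candidates))
    return rng.sample(candidates, k)


items = [0, 1]  # hypothetical catalogue with very few distinct items

# Asking for more items than exist reproduces the error.
try:
    random.sample(items, 100)
    raised = False
except ValueError:
    raised = True

# The guarded version returns what is available instead of failing.
safe = sample_negatives(items, relevant_items=[0], n_samples=100, seed=0)
```

Checking `len(training.item.unique())` against the number of items the evaluation samples is a quick way to test this hypothesis.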

Installation on Ubuntu12.04

$> sudo pip install test.fm-1.0.tar.gz
.....(output omitted)....
error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/testfm/evaluation/cutil/measures.c -o build/temp.linux-x86_64-2.7/src/testfm/evaluation/cutil/measures.o" failed with exit status 1


Command /usr/bin/python -c "import setuptools;file='/tmp/pip-LXq7fY-build/setup.py';exec(compile(open(file).read().replace('\r\n', '\n'), file, 'exec'))" install --single-version-externally-managed --record /tmp/pip-q437OF-record/install-record.txt failed with error code 1

$> sudo python setup.py install
.....(output omitted)....
cythoning src/testfm/evaluation/cutil/measures.pyx to src/testfm/evaluation/cutil/measures.c

Error compiling Cython file:

...
cdef int i
for i in range(list_size):
#printf(">>>%f %f\n", ranked_list[i], ranked_list[i+1])
if ranked_list[i*2] == 1.:
relevant += 1.
map_measure += (relevant / (i+1.))

^

src/testfm/evaluation/cutil/measures.pyx:48:41: Pythonic division not allowed without gil, consider using cython.cdivision(True)

Error compiling Cython file:

...
for i in range(list_size):
#printf(">>>%f %f\n", ranked_list[i], ranked_list[i+1])
if ranked_list[i*2] == 1.:
relevant += 1.
map_measure += (relevant / (i+1.))
return 0.0 if relevant == 0. else (map_measure/relevant)

^

src/testfm/evaluation/cutil/measures.pyx:49:54: Pythonic division not allowed without gil, consider using cython.cdivision(True)
building 'testfm.evaluation.cutil.measures' extension
C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC

compile options: '-I/usr/include/python2.7 -c'
gcc: src/testfm/evaluation/cutil/measures.c
src/testfm/evaluation/cutil/measures.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
src/testfm/evaluation/cutil/measures.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/testfm/evaluation/cutil/measures.c -o build/temp.linux-x86_64-2.7/src/testfm/evaluation/cutil/measures.o" failed with exit status 1

Setup doesn't work

The setup no longer installs; it complains about the requirements. Special characters in a URL requirement may be corrupting the requirement string.

Pyjnius

This module causes too many problems. Remove it.

Error: undefined symbol: clapack_sgesv

Hi all,
I have installed and compiled LAPACK + ATLAS (atlas3.10.3) + test.fm-1.0
under:

  • Environment: Linux dev-host-01 4.2.0-27-generic #32~14.04.1-Ubuntu SMP
  • Python 2.7.6

Running one of the available tests in src/tests now fails:

sysadmin@dev-host-01:~/testfm/test.fm-1.0/src/tests$ python test_models.py

	     Traceback (most recent call last):
	       File "test_models.py", line 10, in <module>
	         from testfm.models.graphchi_models import SVDpp
	       File "build/bdist.linux-x86_64/egg/testfm/models/graphchi_models.py", line 9, in <module>
	       File "build/bdist.linux-x86_64/egg/testfm/models/cutil/interface.py", line 7, in <module>
	       File "build/bdist.linux-x86_64/egg/testfm/models/cutil/interface.py", line 6, in __bootstrap__
	       File "src/testfm/models/cutil/float_matrix.pxd", line 10, in init testfm.models.cutil.interface (src/testfm/models/cutil/interface.c:8464)
	       File "build/bdist.linux-x86_64/egg/testfm/models/cutil/float_matrix.py", line 7, in <module>
	       File "build/bdist.linux-x86_64/egg/testfm/models/cutil/float_matrix.py", line 6, in __bootstrap__
	     **ImportError: /home/sysadmin/.cache/Python-Eggs/testfm-1.0-py2.7-linux-x86_64.egg-tmp/testfm/models/cutil/float_matrix.so: undefined symbol: clapack_sgesv**

Any ideas? I have tried many options and read many forum threads about this error, but none of them fixed the problem.
Thanks in advance for your help,
regards,
@rheras

build error on mac

If I build on mac, I get:
$python setup.py develop
building 'testfm.evaluation.cutil.measures' extension
/usr/bin/clang -fno-strict-aliasing -fno-common -dynamic -pipe -Os -fwrapv -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/testfm/evaluation/cutil/measures.c -o build/temp.macosx-10.9-x86_64-2.7/src/testfm/evaluation/cutil/measures.o
clang: error: no such file or directory: 'src/testfm/evaluation/cutil/measures.c'
clang: error: no input files
error: command '/usr/bin/clang' failed with exit status 1

Could it be that measures.c is missing?

Test for splits fails

If you run the nose tests, this one fails:
/Users/linas/devel/test.fm/src/tests/test_fm.py

BPR performance

The performance of BPR is close to random.
Alex, could you try to fix it? If you run examples/models_example.py, the performance should be similar to Popularity, or at least better than Random :)

BTW, I use 1M movielens to check the performance, with this data loading:

# prepare the data
df = pd.read_csv(DATAPATH+"/1M_movielens/ratings.dat",
                 sep=" ", header=None, names=["user", "item", "rating", "date"])
print df.head()
training, testing = testfm.split.holdoutByRandom(df, 0.8)
