nyagato-00 / predictor

Fast and efficient recommendations and predictions using Redis

License: MIT License

Ruby 100.00%

predictor's Issues

respond_to? needs to take an optional second argument

Hi, I added Delayed::Job to my project to handle updating the matrices, and DJ calls respond_to? with both arguments here:
https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/performable_method.rb#L10

And it looks like Ruby's respond_to? now takes two arguments, so the method here:
https://github.com/Pathgather/predictor/blob/c1690699186a44a0be15dc6dc678acd459131ec8/lib/predictor/base.rb#L79

should use the second argument as well.
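A minimal sketch of one way to fix this, assuming the class answers for dynamically named matrices through `method_missing` (the `MatrixHolder` class and matrix names are hypothetical, not predictor's actual code). Overriding `respond_to_missing?` instead of `respond_to?` lets Ruby's built-in `respond_to?(name, include_all = false)` handle the optional second argument:

```ruby
# Hypothetical stand-in for a class that exposes input matrices as methods
# via method_missing, as the issue describes.
class MatrixHolder
  def initialize(*matrix_names)
    @matrices = matrix_names.map { |name| [name, "matrix:#{name}"] }.to_h
  end

  def method_missing(name, *args, &block)
    return @matrices[name] if @matrices.key?(name)

    super
  end

  # Kernel#respond_to? calls this with (name, include_all), so callers such as
  # Delayed::Job can pass the second argument without an ArgumentError.
  def respond_to_missing?(name, include_private = false)
    @matrices.key?(name) || super
  end
end

holder = MatrixHolder.new(:users, :tags)
holder.respond_to?(:users)       # => true
holder.respond_to?(:users, true) # => true
holder.respond_to?(:ratings)     # => false
```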

Add support for a stored exclusion set in Redis

Each public method has support to manually pass in an exclusion set, but it'd be nice if there was support for this within Redis. This would allow users to essentially ignore specific recommendations permanently, which would prevent that item from ever being recommended to the user again.
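One way this could look, sketched in plain Ruby: a hash of `Set`s stands in for one Redis set per user (the storage layout and the `ExclusionStore` / `filter_recommendations` names are hypothetical). The stored exclusions are merged with any ad-hoc `exclusion_set` before filtering:

```ruby
require "set"

# Hypothetical store of permanently excluded items per user. In Redis this
# could be one set per user; a Hash of Sets stands in for it here.
class ExclusionStore
  def initialize
    @sets = Hash.new { |hash, user| hash[user] = Set.new }
  end

  def exclude!(user, *items)
    @sets[user].merge(items)
  end

  def excluded_for(user)
    @sets[user]
  end
end

# Drop both the stored exclusions and the ad-hoc exclusion_set from the
# ranked recommendations, preserving their order.
def filter_recommendations(recommendations, store, user, exclusion_set: [])
  excluded = store.excluded_for(user) | exclusion_set
  recommendations.reject { |item| excluded.include?(item) }
end

store = ExclusionStore.new
store.exclude!("user-1", "course-3")
filter_recommendations(%w[course-1 course-3 course-4], store, "user-1",
                       exclusion_set: ["course-4"])
# => ["course-1"]
```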

What to do with different matrix-structures?

Hi guys,

I love this gem; it shows surprising new relations in my data. But I do wonder what the limits are for different data sets in Redis. I think I actually need another matrix: can I add labels with a different structure to the same class, or do I need to build a new class for that? So far (only for "projects") it works fine, but I'm still a newbie with Redis and don't know whether the following is possible:

I want to get similarities and predictions for projects and products. I therefore want to create the following (double?) matrix:

projects:
"user-1" -> "project-1", "project-2"
"product-1" -> "project-1", "project-2"

products:
"user-1" -> "product-1", "product-2"
"project-1" -> "product-1", "product-2"
"benefits-1" -> "product-1", "product-2"

What to do?

Minimum set length for similarity calculation?

Hello, thanks a lot for predictor, it's a great library. I'd quite like to add a feature, and wanted your thoughts on it.

We have some items in lists, and we calculate similarity based on how often items appear on lists together. We have lots of items that only appear on one list together, and we'd like to remove those from the prediction engine, forcing the distance to 0.

I can see that I can probably add a check in Distance for this, but I wanted to know if this was something you have an opinion about the design of, before I add a pull request.
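A plain-Ruby sketch of the proposed check, assuming item similarity is the Jaccard index over the lists each item appears on. The `min_overlap` parameter is hypothetical: pairs that share fewer lists than that get a similarity of 0:

```ruby
require "set"

# Jaccard similarity between two items, given the lists each appears on.
# Pairs that co-occur on fewer than min_overlap lists score 0.
def jaccard_with_min_overlap(lists_a, lists_b, min_overlap: 2)
  shared = (lists_a & lists_b).size
  return 0.0 if shared < min_overlap

  shared.to_f / (lists_a | lists_b).size
end

a = Set["list-1", "list-2", "list-3"]
b = Set["list-2", "list-3"]
c = Set["list-3", "list-9"]

jaccard_with_min_overlap(a, b) # => 0.6666666666666666 (2 shared of 3 total)
jaccard_with_min_overlap(a, c) # => 0.0 (only 1 shared list)
```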

Redis::CommandError (BUSY Redis is busy running a script.

I am encountering an error processing a large data set using processing_technique :lua. I first used add_to_matrix for the dataset (20k items, with up to 1k matches each, using limit_similarities_to 1000). All the items are added, but I receive the following error some time after calling process!:

Redis::CommandError (BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.)

I am trying to figure out if I should be processing along the way, if I need to change the Redis config, or if I should revert to the Ruby processor.

Ability to get Prediction Score for one object

This seems like it should be possible, I just may not know the right method to use. Otherwise this would fall under a feature request.

Is it possible to pull the prediction score of a specific course/object for a user?

For instance something like: recommender.predictions_for("user-1", matrix_label: :users, on: "course-2", with_scores: true)

I would like to assess the likelihood a user will like something. On a related note, if I did have access to the score: is there a general threshold for the scores over or under which the item is recommended, neutral or not recommended or is it relative to the set?
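Until a dedicated method exists, one workaround is to request the full scored list and look the item up. This sketch assumes `predictions_for(..., with_scores: true)` returns `[item, score]` pairs (check the return format for your gem version); `score_for` is a hypothetical helper and the sample data is made up:

```ruby
# Return the prediction score for one item, or nil when the item was not
# predicted at all.
def score_for(scored_predictions, item)
  scored_predictions.to_h[item]
end

# Stand-in for recommender.predictions_for("user-1", matrix_label: :users,
# with_scores: true)
scored = [["course-4", 3.5], ["course-2", 1.25], ["course-7", 0.4]]

score_for(scored, "course-2") # => 1.25
score_for(scored, "course-9") # => nil
```

Since the scale of the scores depends on the matrix weights and the data, comparing an item's score or rank against the rest of the same list is probably safer than a fixed threshold.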

Love this gem. Thanks!!

Question: Similarities vs predictions

Is it expected that similarities_for("item-1") returns the same as predictions_for(item_set: ["item-1"])?
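A toy model of why the two would match, assuming `predictions_for(item_set: ...)` sums each set item's similarity scores and then drops the set's own items (a sketch of the idea, not predictor's actual code; the table is made up). With a one-item set there is nothing to sum, so the ranking equals that item's similarities:

```ruby
# Toy similarity table: item => { other_item => similarity }.
SIMILARITIES = {
  "item-1" => { "item-2" => 0.5, "item-3" => 0.25 },
  "item-2" => { "item-1" => 0.5, "item-3" => 0.75 },
  "item-3" => { "item-1" => 0.25, "item-2" => 0.75 }
}.freeze

def similarities_for(item)
  SIMILARITIES.fetch(item, {}).sort_by { |_, score| -score }
end

# Sum the similarity rows of every item in the set, then remove the set itself.
def predictions_for(item_set)
  totals = Hash.new(0.0)
  item_set.each do |item|
    SIMILARITIES.fetch(item, {}).each { |other, score| totals[other] += score }
  end
  item_set.each { |item| totals.delete(item) }
  totals.sort_by { |_, score| -score }
end

predictions_for(["item-1"]) == similarities_for("item-1") # => true
predictions_for(["item-1", "item-2"])                     # => [["item-3", 1.0]]
```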

A couple of newcomer questions

Hello! First thanks for this gem, I'm about to use it on a freelancing client's app and it seems to provide fairly relevant results in an efficient manner!

I have a couple of newcomer questions and could not find a mailing list, so I assumed it would be OK to just group the questions here.

I dived a bit into the code, but maybe some of these are worth adding to the readme? Let me know and I'll prepare pull-requests if you want.

Is add_to_matrix idempotent?

It looks like add_to_matrix is idempotent - I believe if I call r.add_to_matrix(:users, 'user-5', 'feed-800') many times, it won't change the similarities.

Is that correct?

Are these calls equivalent in terms of resulting similarities?

Is this:

r.add_to_matrix(:users, 'user-5', 'feed-800', 'feed-801')
r.add_to_matrix(:users, 'user-5', 'feed-802', 'feed-801')

Giving the same output as:

r.add_to_matrix(:users, 'user-5', 'feed-800', 'feed-801', 'feed-802')
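Both behaviours would follow if each matrix row is stored as a set (Redis `SADD`-style semantics), which is my understanding of predictor's input matrices but worth confirming in the source. Ruby's `Set` models the same properties: re-adding a member is a no-op, and two partial adds union to the same set as one combined add:

```ruby
require "set"

# Model of one matrix row: the set "user-5" in the :users matrix.
single_call = Set.new
single_call.merge(%w[feed-800 feed-801 feed-802])

two_calls = Set.new
two_calls.merge(%w[feed-800 feed-801])
two_calls.merge(%w[feed-802 feed-801]) # feed-801 is already present

repeated = Set.new
3.times { repeated.merge(%w[feed-800]) } # idempotent: still one member

single_call == two_calls # => true
repeated.size            # => 1
```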

Is there a symmetric call to add_to_matrix?

Instead of doing:

r.add_to_matrix(:users, 'user-5', 'feed-800', 'feed-801')
r.add_to_matrix(:users, 'user-6', 'feed-800')

Is there something to do:

r.add_to_item('feed-800', :users, 'user-5', 'user-6')

It could be useful depending on how you iterate initially.
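If the gem has no such call, a small wrapper can invert the loop. This sketch assumes the `add_to_matrix(matrix, set, *items)` signature used in the examples above; `add_to_item` is a hypothetical helper, and a recording stub stands in for the recommender so the snippet runs on its own:

```ruby
# Hypothetical helper: add one item to many sets by calling
# add_to_matrix(matrix, set, item) once per set.
def add_to_item(recommender, item, matrix, *sets)
  sets.each { |set| recommender.add_to_matrix(matrix, set, item) }
end

# Stub that records calls instead of writing to Redis.
class RecordingRecommender
  attr_reader :calls

  def initialize
    @calls = []
  end

  def add_to_matrix(matrix, set, *items)
    @calls << [matrix, set, items]
  end
end

r = RecordingRecommender.new
add_to_item(r, "feed-800", :users, "user-5", "user-6")
r.calls
# => [[:users, "user-5", ["feed-800"]], [:users, "user-6", ["feed-800"]]]
```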

Full reindexing without deletion

For our first implementation, we just do:

clean!
add_to_matrix
add_to_matrix
process!

(We will later move to sidekiq-based sync for real-time indexing).

I believe that when using this technique, during the processing, if similarities_for is requested, an empty result will be returned.

Could this be used instead to make sure deleted items are deleted from the index, without necessarily having a period where empty results are returned ?

r.all_items.each do |item|
  r.delete_item!(item) if does_not_exist_anymore_in_our_database
end

# then iterate, call add_to_matrix, and call process!

Would that work as I assume?
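The deletion step could be driven by a set difference between what the recommender knows about and what still exists in the database. `all_items` and `delete_item!` are the calls from the question; the stub class and `existing_ids` are hypothetical stand-ins for the recommender and a database query:

```ruby
require "set"

# Stub exposing the two recommender calls used in the question.
class StubRecommender
  def initialize(items)
    @items = items.dup
  end

  def all_items
    @items.dup
  end

  def delete_item!(item)
    @items.delete(item)
  end
end

# Delete only the items that no longer exist; the remaining items are left
# untouched until the next add_to_matrix / process! pass.
def prune_stale_items(recommender, existing_ids)
  existing = existing_ids.to_set
  stale = recommender.all_items.reject { |item| existing.include?(item) }
  stale.each { |item| recommender.delete_item!(item) }
  stale
end

r = StubRecommender.new(%w[feed-1 feed-2 feed-3])
prune_stale_items(r, %w[feed-1 feed-3]) # => ["feed-2"]
r.all_items                             # => ["feed-1", "feed-3"]
```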

Thanks!

Default similarity limit to 128 to take advantage of Redis's memory optimization for sorted sets

See http://redis.io/topics/memory-optimization

Reference to recommendation algorithm

It would be useful to have a reference (maybe a link to a paper) for the recommendation algorithm(s) that predictor uses.

Thanks, predictor is a really interesting project.

Will predictor support Ruby 1.9?

For now, it uses some Ruby 2.0 syntax like this.

NoMethodError (delete_pair_from_matrix!)

Following the instructions for deleting a single pair from a matrix results in calling a method (delete_pair_from_matrix!) that no longer seems to exist.

Any idea what happened to this method since it was merged in #37? Is there a newer way to go about this?

Newbie: wasn't able to install it, help!

Hi,

I'm a newbie in Ruby on Rails who just set up his first full web project in Ruby on Rails. I think Predictor would be a perfect fit for my final project, but I'm embarrassed to say that I'm not even able to install it :(

I added redis, hiredis and predictor to the gemfile and did a bundle install. After that I tried to install predictor by using rails g predictor:install, but that didn't give me anything.

Do I need to generate my own model, or do I need to do something else? I don't have the predictor model or initializer file mentioned in your examples.

So sorry for the typical newbie-question, but I'm stuck :)

Cheers,
Sebastian

Recommend similar users.

Hi, I'm developing a social network where users can download things. I would also like to recommend that users follow similar users based on their download activity.

For example

User 1 Downloads Thing1 and Thing2
User 2 Downloads Thing1 and Thing2
User 3 Downloads Thing9 and Thing4

Recommend User1 to follow User2

How can I achieve that?
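The underlying idea is to compare users by the things they downloaded. With predictor, one plausible (untested) approach is to invert the matrix so each thing is a set and users are its items, e.g. `add_to_matrix(:downloads, "thing-1", "user-1", "user-2")`, so that `similarities_for("user-1")` returns users. The plain-Ruby sketch below only illustrates the Jaccard comparison on the example data; `similar_users` is a hypothetical helper:

```ruby
require "set"

DOWNLOADS = {
  "user-1" => Set["thing-1", "thing-2"],
  "user-2" => Set["thing-1", "thing-2"],
  "user-3" => Set["thing-9", "thing-4"]
}.freeze

def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Rank the other users by download overlap, dropping users with no overlap.
def similar_users(user)
  mine = DOWNLOADS.fetch(user)
  DOWNLOADS.reject { |other, _| other == user }
           .map { |other, things| [other, jaccard(mine, things)] }
           .select { |_, score| score > 0 }
           .sort_by { |_, score| -score }
end

similar_users("user-1") # => [["user-2", 1.0]]
```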

Question: including additional attributes in predictions

In the example for recommending a course, it looks at users who have taken the course, tags, and topics. What is the recommended way to use additional information about the users who have taken a course to recommend related courses? E.g., a user from North America takes one course and a user in South Africa takes a different course. I'd like future users in North America to be recommended the first course.

users k-nearest neighbors

I just wanted to ask whether a matrix of k-nearest neighbors is kept for every user, and if so, how I could access it. I browsed the code and it seemed to me that there is no such thing; instead there are item-item matrices. But I just wanted to make sure.

Thx, Christoph

Would love a bit more info about usage

Hi, I am starting to play with predictor and right away I have a few questions I hoped to get answers to in the wiki or readme.

Difference between Sorensen-Dice vs. Jaccard?
A quick look at Wikipedia said something about the triangle inequality; I had no time to study it in depth. A simple one- or two-sentence explanation, or a guide on when to use which, would be most helpful.
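A short comparison using the standard definitions: Jaccard is |A∩B| / |A∪B| and Sørensen-Dice is 2|A∩B| / (|A| + |B|). They are related by D = 2J / (1 + J), so on a single matrix they rank pairs identically and Dice simply gives higher values; the Jaccard distance 1 - J satisfies the triangle inequality, while the Dice dissimilarity 1 - D does not. When weighted scores from several matrices are added together, this nonlinear mapping can change the combined order, so the choice probably matters most there:

```ruby
require "set"

def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

def sorensen_dice(a, b)
  total = a.size + b.size
  total.zero? ? 0.0 : 2.0 * (a & b).size / total
end

a = Set["tag-1", "tag-2", "tag-3"]
b = Set["tag-2", "tag-3", "tag-4"]

j = jaccard(a, b)       # => 0.5 (2 shared / 4 distinct)
d = sorensen_dice(a, b) # => 0.6666666666666666 (2 * 2 / 6)
(2 * j / (1 + j) - d).abs < 1e-12 # => true
```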

Limiting similarities to lower number
I am experimenting with weights and I process the whole dataset each time. Will limiting similarities to say 32 make the process much faster? Will it decrease accuracy of predictions?

Is it costly to use more matrices? Is it better to group some?
in my dataset each item has something like these: "tags", "authors", "publisher", "topics", "themes", "industries". I can feed them separately, or I can group tags, industries, topics and themes into one matrix, will grouping be much slower? Will it yield better predictions?

Predicting different topics
Say that I read papers about cold fusion, high-performance cycling and metalwork. Each is a hobby of mine, and I hope to get recommendations where each hobby is represented in proportion to how much time I dedicate to it, rather than the top 10 papers by sheer relevance to all of my hobbies, which might lead to recommending only papers on cold fusion, or to finding a paper that fits all of them but is garbage reading.

I guess I could use boost and keep track of what are my hobbies based on what I read. Is there a way to extract different topics from a set of articles I've read?

How would you go about it?
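One possible approach (a suggestion, not a predictor feature): keep a separate ranked list per hobby, for example from `predictions_for` on the papers read for that hobby, then fill the result slots in proportion to time spent, using largest-remainder rounding. `proportional_mix`, the hobby names, weights, and lists are all hypothetical:

```ruby
# Split `slots` recommendation slots across topics in proportion to their
# weights (largest-remainder rounding), then take that many items per topic.
def proportional_mix(ranked_by_topic, weights, slots)
  total = weights.values.sum.to_f
  quotas = weights.transform_values { |w| slots * w / total }
  counts = quotas.transform_values(&:floor)
  leftover = slots - counts.values.sum
  # Give the remaining slots to the topics with the largest fractional parts.
  quotas.sort_by { |_, q| -(q - q.floor) }.first(leftover).each do |topic, _|
    counts[topic] += 1
  end
  counts.flat_map { |topic, n| ranked_by_topic.fetch(topic, []).first(n) }
end

ranked = {
  cold_fusion: %w[cf-1 cf-2 cf-3 cf-4 cf-5],
  cycling: %w[cy-1 cy-2 cy-3],
  metalwork: %w[mw-1 mw-2 mw-3]
}
weights = { cold_fusion: 5, cycling: 3, metalwork: 2 } # e.g. hours per week

proportional_mix(ranked, weights, 6)
# => ["cf-1", "cf-2", "cf-3", "cy-1", "cy-2", "mw-1"]
```

The result is grouped by topic; interleaving the groups afterwards is a presentation choice.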

Hopefully answers to these questions will be interesting for others as well.

Cosine Similarity?

Is there any implementation for getting the cosine similarity between users rather than the Jaccard or Sorensen-Dice coefficient?
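For binary (set) data, cosine similarity reduces to |A∩B| / sqrt(|A| * |B|), also known as the Ochiai coefficient. The sketch below is standalone Ruby with a hypothetical `set_cosine` helper, not a predictor option:

```ruby
require "set"

# Cosine similarity between two users represented as sets of items
# (binary vectors): |A & B| / sqrt(|A| * |B|).
def set_cosine(a, b)
  return 0.0 if a.empty? || b.empty?

  (a & b).size / Math.sqrt(a.size * b.size)
end

user_1 = Set["item-1", "item-2", "item-3", "item-4"]
user_2 = Set["item-2", "item-3"]

set_cosine(user_1, user_2) # roughly 0.707 (2 shared / sqrt(4 * 2))
```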

TODO: Try moving processing logic to a Lua script.

Maybe try to run the script in batches so that a long-running Lua script doesn't tie up Redis for other uses. Also need some benchmarks for all this.

Rebuilding Index

I have this code to rebuild my index

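    # Build the recommender once instead of once per book.
    recommender = Books_recommender.new

    Book.where(is_public: true).order(id: :desc).find_each do |book|
      # Check `book`, not `@books`, which is unrelated to the current record.
      if book.try(:bookstats).present?
        recommender.add_to_matrix(:readability, book.bookstats.readability, book.id)
      else
        recommender.add_to_matrix(:readability, "50", book.id)
      end
      if book.downloadcount.present?
        recommender.add_to_matrix(:popularity, book.downloadcount, book.id)
      end
      book.tags.each do |tag|
        recommender.add_to_matrix(:tags, tag, book.id)
      end
    end

    # add_to_matrix (no bang) only stores the data; the bang version also
    # processes the items on every call. Process everything once at the end.
    recommender.process!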

But I feel like this isn't the most efficient method: there are only 70,000 books, I am using the Lua method, and it has taken over 12 hours so far and still isn't done. Even if it is the most efficient method, I feel like rebuilding the index should be part of the core gem, like the variety of reindexing options I have in sunspot / solr: https://github.com/sunspot/sunspot#reindexing-objects

get links, names, images and etc..

I would like to know if I can get the images and links from similarities.
For example, I have a predictor for products:
@recommender.add_to_matrix!(:products, "impression-1", "product-4")

and in the view:

<% ProductRecommender.instance.similarities_for("product-3").each do |item| %>
<%= link_to product_path(item) do %>
<%= image_tag item.image.url(:thumb).to_s %>
<%= item.name%>
<% end%>
<% end %>

because Rails is spitting out: undefined method `name' for "product-6":String

Thanks!
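The error happens because `similarities_for` returns the identifier strings that were added to the matrix (here "product-6"), not model objects, so they need to be mapped back to records. A sketch, assuming ids follow the "product-<id>" pattern; `products_for` is a hypothetical helper and the `PRODUCTS` hash stands in for a query such as `Product.where(id: ids).index_by(&:id)`:

```ruby
Product = Struct.new(:id, :name)

# Stand-in for the products table.
PRODUCTS = {
  4 => Product.new(4, "Lamp"),
  6 => Product.new(6, "Chair"),
  7 => Product.new(7, "Desk")
}.freeze

# Turn ["product-6", "product-4"] into Product records in the same
# (similarity) order, skipping ids that no longer exist.
def products_for(similar_ids)
  ids = similar_ids.map { |key| Integer(key.delete_prefix("product-")) }
  records = PRODUCTS.slice(*ids)
  ids.map { |id| records[id] }.compact
end

products_for(["product-6", "product-9", "product-4"]).map(&:name)
# => ["Chair", "Lamp"]
```

The view would then iterate over `products_for(...)` so that `item.name` and `item.image` are called on records rather than strings.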

Could you publish the 2.4.0 gem?

Is this gem maintained?

I was hoping to integrate this library into a new rails project I am working on but it seems like the project has been mostly abandoned.

Are there any updates on the status of this project, considering the last commit was in 2015?

Users to follow

Hello!
I'm completely new to recommendation engines and related topics, so I'm not sure I understand it all. I'm trying to create a user recommendation engine. Each user can follow any other user, and I want to get a list of suggested users for a given user based on the users they follow. Is this how it should work?

Recommender class

class UserRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0
end

Init

rec = UserRecommender.new

Workflow

# Record that `follower` follows `following`. The set is named after the
# followed user and its items are followers, so the follower is the item
# whose similarities change and need processing.
def follow(rec, follower, following)
  rec.users.add_to_set(following, follower)
  rec.process_items!(follower)
end

# John clicks a "Follow" button on Mike's profile
follow(rec, 'john', 'mike')

# Jake clicks a "Follow" button on John's profile
follow(rec, 'jake', 'john')

# Jake wants to know some user recommendations for him based on
# users he follows
rec.similarities_for('jake')

It's a bit of pseudo-code, but I'm not sure how else to show how I'm imagining that workflow. Is that how it should look? Which parts should be handled by background workers: add_to_set, process_items!, or both?

It would be super awesome to add more real-life examples for basic scenarios to the README or the wiki; this could be one of them once it's resolved :)

I'm really desperate and excited to use this gem, so it would be great if anyone could explain this to me :) Thanks in advance!

Deleting a set

First of all, thanks for the library, it's great!
I've noticed that the current API doesn't provide a way to delete a set.
For example, say we have an input matrix for articles favoured by users. If a user removes an article from their favourites, we want to reset their set. At the moment, we can either add new articles to their set or remove articles from all sets.
Is this something you plan to implement, or would you accept a PR for this feature?
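A sketch of what such a feature would need to do, modelled on plain Ruby hashes rather than Redis and assuming the matrix keeps both a set-to-items and an item-to-sets index. `TinyMatrix` and `delete_set` are hypothetical names. Deleting a set removes it from both indexes and returns every item it contained, since those items' similarities would need reprocessing (e.g. with `process_items!`); resetting a user's set is then a delete followed by a fresh add:

```ruby
require "set"

# Hypothetical in-memory input matrix with both lookup directions.
class TinyMatrix
  def initialize
    @items_by_set = Hash.new { |h, k| h[k] = Set.new }
    @sets_by_item = Hash.new { |h, k| h[k] = Set.new }
  end

  def add_to_set(set, *items)
    items.each do |item|
      @items_by_set[set] << item
      @sets_by_item[item] << set
    end
  end

  def sets_for(item)
    @sets_by_item.fetch(item, Set.new)
  end

  # Remove a whole set and return the items whose similarities need reprocessing.
  def delete_set(set)
    items = @items_by_set.delete(set) || Set.new
    items.each { |item| @sets_by_item[item].delete(set) }
    items.to_a
  end
end

m = TinyMatrix.new
m.add_to_set("user-1", "article-1", "article-2")
m.add_to_set("user-2", "article-2")
m.delete_set("user-1")        # => ["article-1", "article-2"]
m.sets_for("article-2").to_a  # => ["user-2"]
```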

Document :measure option and Sorensen coefficient support in README

Best approach for updating recommendations?

I've got a website that has lessons, and I want to keep fresh predictions for my users. I base the recommendations on various things. Likes, users watching, related tags, etc and was wondering what is the recommended way to process recommendations to keep them fresh? When should I run recommender.process! vs updating individual lessons on their own? I'm using Heroku, and currently run process! once a day, but the docs don't give any advice in this regard.

Thank you in advance.

Support partial updates of item data

See previous discussion in #20.

Redis::CommandError: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.

Hi, we're using this gem in combination with hiredis v0.6.0. We've imported about 100k items previously using add_to_matrix (no bang), and now we're trying to process the entire dataset. With the lua processor, the code runs, then after a minute or so crashes:

[7] pry(main)> recommender.process!
Redis::CommandError: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.

Now interestingly, I tried chunking the processing, and it still seems to happen:

[7] pry(main)> recommender.all_items.each_slice(100) {|r| recommender.process_items!(*r); puts "chunk!" }
chunk!
chunk!
Redis::CommandError: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.

Possibly related to some sort of a timeout in the redis client?

Excluding a product-set of my current_user when showing similar products.

Hi guys,

I'm trying to exclude a product-set of my current_user in a similarity set.

products:
"user-1" -> "product-1", "product-2"
"project-1" -> "product-1", "product-2"
"benefits-1" -> "product-1", "product-2"

For the above matrix, I want to show similar products from other users who like the same product as the current_user ("users who like this also like..."), but the current_user's other products shouldn't be displayed. I tried to use the following exclusion set, but it doesn't seem to work.

recommender.similarities_for("product-#{@product.id}", exclusion_set: ["user-#{current_user.id}"])

Any ideas on how to solve it?
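The exclusion set appears to be matched against the items being returned (product ids), so passing the set name `"user-#{current_user.id}"` filters nothing. A sketch of the likely fix is to exclude the current user's own products instead: read the items of the user's set from the products matrix (the exact accessor for that is an assumption, so check the gem's InputMatrix API) and pass them as `exclusion_set:`. The filtering itself, with made-up data:

```ruby
# Stand-in for recommender.similarities_for("product-#{@product.id}").
similar = %w[product-2 product-5 product-8 product-9]

# Stand-in for the items in the "user-#{current_user.id}" set of the
# products matrix.
own_products = %w[product-1 product-2 product-9]

# Passing own_products as exclusion_set: should have the same effect.
recommendations = similar - own_products # => ["product-5", "product-8"]
```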

How does the lua processing technique "block" Redis?

Hi everyone,

The README states that the lua processing technique is "substantially faster than the :ruby technique, but blocks the Redis server while each set of calculations are run".

How does it "block" Redis? For example, I have a bunch of Sidekiq processes that are always accessing my Redis instance; will the calculations interfere with them?

Thanks