Hi, thanks for stepping in to solve one of the major issues in personalized content.


Can I use only click data? about metarank HOT 3 CLOSED

metarank commented on June 24, 2024

Can I use only click data?


Comments (3)

shuttie commented on June 24, 2024

The reason ranking/impression data is required is that we use it to teach the underlying ML model which items are relevant and which are not. Generally speaking, the Learn-to-Rank approach is about training some sort of binary classifier, which is then asked the question "given items A and B, which one of them should be ranked higher?"

If you have impression data showing that items A-B-C-D were displayed and item C was clicked afterwards, then you can assume that items A+B+C were examined by the visitor, but only C is actually relevant in this group. So you teach the model that C should be ranked higher than A and B.
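As a minimal sketch of that idea (not Metarank's actual implementation), the preference pairs can be derived from one displayed list and its clicks. The function name `pairwise_preferences` is hypothetical, and the rule "everything up to the last click was examined" is the assumption described above:

```python
def pairwise_preferences(shown, clicked):
    """Return (preferred, less_preferred) pairs for one displayed list.

    Assumption: every item up to and including the last clicked
    position was examined; items below it were never seen.
    """
    clicked_set = set(clicked)
    click_positions = [i for i, item in enumerate(shown) if item in clicked_set]
    if not click_positions:
        return []  # no click, so nothing can be inferred from this list
    examined = shown[: max(click_positions) + 1]
    return [
        (c, s)
        for c in examined if c in clicked_set
        for s in examined if s not in clicked_set
    ]


pairs = pairwise_preferences(["A", "B", "C", "D"], ["C"])
print(pairs)
```

Item D produces no pair here, because under this assumption the visitor never got that far down the list.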

The other important reason why ranking data is essential is position information: people tend to click on the first items in a list much more frequently. In e-commerce, around 50% of clicks go to the top-5 results. So a click on position 1 does not really tell you much about whether the item was relevant at all. But if a visitor scrolled to the bottom of the list and only there found something relevant enough to click, then that click is pure gold in terms of value.

And the last point: we have quite a lot of important feature extractors that use the impression information, like the rate one, which computes per-item CTR/conversion rates. Which item will be clicked, the one with 100 clicks and 100k impressions, or the one with 50 clicks and 100 impressions? If you only count clicks, it's not that clear anymore, as everything looks relevant.
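To make the arithmetic concrete, here is a small sketch using the two items from the example above. The helper `ctr` is hypothetical and is not the code of Metarank's rate extractor:

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions (0.0 if never shown)."""
    return clicks / impressions if impressions else 0.0


popular = ctr(100, 100_000)  # many clicks, but shown very often
niche = ctr(50, 100)         # fewer clicks, but clicked half the time it is shown
print(popular, niche)
```

By raw click count the first item wins (100 vs 50), but once impressions are taken into account the second one is clicked far more often per display.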

In recommender systems theory there is an approach for dealing with this type of problem when there is no negative feedback: you can sample some random non-clicked items from the inventory and treat them as your negative samples. But it usually gives sub-par quality: it is still synthetic data, and saying that this particular pair of socks is less relevant than a teapot will probably lead to bad ranking results.
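A rough sketch of that negative-sampling workaround, assuming you only have a list of clicked items and the full inventory. The function name `sample_negatives` and the fixed seed are illustrative choices, not part of Metarank:

```python
import random


def sample_negatives(inventory, clicked, k, seed=0):
    """Pick up to k random non-clicked items to act as synthetic negatives."""
    clicked_set = set(clicked)
    candidates = [item for item in inventory if item not in clicked_set]
    rng = random.Random(seed)  # seeded only to make the example repeatable
    return rng.sample(candidates, min(k, len(candidates)))


inventory = ["socks", "teapot", "lamp", "mug", "scarf"]
negatives = sample_negatives(inventory, clicked=["socks"], k=2)
print(negatives)
```

Nothing here says the sampled items were ever shown to the visitor, which is exactly why the resulting labels are weaker than real impressions.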

I'm wondering why you have this problem. Is it more about collecting historical data to train the model? In our demo (the one on demo.metarank.ai and in RanklensTest) we use only around 5k user sessions, and that is more than enough to get a real impact on the ranking, so I guess you don't need to wait years to collect enough data for initial training. From the JSON format perspective, these ranking events are the actual request bodies you send to the Metarank API to do the reranking itself, so you only need to log them and wait for some time.


laxmimerit commented on June 24, 2024

Thank you so much for such a detailed explanation!
I have the impression data along with the relative position where each item was shown. I can't find a position or any related variable in your movie dataset. In the ranking event, there is a list of items with zero relevancy for all of them. In the interaction event, there is just item_id and other session-related info.

How will the algorithm know the position of the click?

If it takes the index position from the ranking event, then I don't think that is the correct way to do it. For a single session it is fine, but when data is shown across multiple sessions (which is the usual case in e-commerce), it is very likely that the impression list order will vary a lot. In that case, it would be nearly impossible to relate a click to an impression position. I would also like to extend my question to relevancy: what is it used for, and why is it zero?


shuttie commented on June 24, 2024

There is an event schema doc describing the format of the events, and there are a couple of important points from there:

all events have a unique identifier, the id field
each interaction event has an explicit impression field, pointing to the identifier of the corresponding parent ranking event, so the two can be joined together later. We actually join each ranking with all the interactions that happened later into a single click-through. So for each ranking we can clearly see which items were clicked, and which were examined and ignored.
in the ranking event there is an items field with the ordering of the items that were actually displayed to the visitor. This ordering is important, as the position is taken from there.
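The join described in the points above can be sketched roughly as follows. The `id`, `impression` and `items` fields come from the description in this comment; the `event` and `item` keys, and the function name `build_clickthroughs`, are assumptions made for illustration, so check the event schema doc for the exact names:

```python
def build_clickthroughs(events):
    """Join each ranking with its later interactions via the impression field.

    Returns {ranking_id: [(position, item_id, was_clicked), ...]},
    where position is 1-based and taken from the order of `items`.
    """
    rankings = {}  # ranking id -> item ids in display order
    clicks = {}    # ranking id -> item ids clicked in that ranking
    for e in events:
        if e["event"] == "ranking":
            rankings[e["id"]] = [it["id"] for it in e["items"]]
        elif e["event"] == "interaction":
            clicks.setdefault(e["impression"], []).append(e["item"])
    result = {}
    for ranking_id, items in rankings.items():
        clicked = set(clicks.get(ranking_id, []))
        result[ranking_id] = [
            (pos, item, item in clicked)
            for pos, item in enumerate(items, start=1)
        ]
    return result


events = [
    {"event": "ranking", "id": "r1",
     "items": [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]},
    {"event": "interaction", "id": "i1", "impression": "r1", "item": "C"},
]
clickthroughs = build_clickthroughs(events)
print(clickthroughs["r1"])
```

Because every click carries the id of the exact ranking it came from, the click position is always resolved against the list that visitor actually saw, even if the same item appears at different positions in other rankings.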

So there is no single constant ranking of items needed in Metarank. Each time you present a listing (for example, search results) to a visitor, you should emit it as an event downstream. And each time a visitor clicks on an item in this particular ranking, you should also send another event with the clicked item id AND the parent ranking that resulted in this click.
