Can I use only click data? (metarank, closed)

metarank commented on June 24, 2024
Can I use only click data?

from metarank.

Comments (3)

shuttie commented on June 24, 2024

The reason ranking/impression data is required is that we use it to teach the underlying ML model which items are relevant and which are not. Generally speaking, the Learn-to-Rank approach is about training some sort of binary classifier, which is then asked the question "given items A and B, which one of them should be ranked higher?"

If you have impression data showing that items A-B-C-D were displayed and item C was clicked afterwards, you can assume that items A, B and C were examined by the visitor, but only C is actually relevant in this group. So you teach the model that C should be ranked higher than A and B.
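The A-B-C-D example can be sketched in a few lines. This is only an illustration of the idea, not Metarank's actual implementation; the function name `pairwise_preferences` and the rule "everything up to the last click was examined" are assumptions made for the example.

```python
from typing import List, Set, Tuple


def pairwise_preferences(shown: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, other) pairs derived from one impression list.

    Assumption: the visitor examined every item up to and including the
    last clicked position; items below it are treated as unseen.
    """
    clicked_positions = [i for i, item in enumerate(shown) if item in clicked]
    if not clicked_positions:
        return []  # no click, so no preference signal under this assumption
    examined = shown[: max(clicked_positions) + 1]
    return [
        (winner, loser)
        for winner in examined if winner in clicked
        for loser in examined if loser not in clicked
    ]


pairs = pairwise_preferences(["A", "B", "C", "D"], {"C"})
print(pairs)  # [('C', 'A'), ('C', 'B')]
```

Item D produces no pair at all: it was shown below the click, so under this assumption we cannot tell whether the visitor ever looked at it.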

The other important reason ranking data is essential is position information: people tend to click on the first items in a list much more frequently. In ecommerce, around 50% of clicks go to the top-5 results. So a click on position 1 does not tell you much about whether the item was relevant at all. But if a visitor scrolled to the bottom of the list and only there found something relevant enough to click, then that click is pure gold in terms of value.

And the last point: quite a lot of our important feature extractors use the impression information, like the rate one, which computes per-item CTR/conversion rates. Which item will be clicked: the one with 100 clicks and 100k impressions, or the one with 50 clicks and 100 impressions? If you only count clicks, it's not that clear anymore, as everything looks relevant.
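To make the arithmetic concrete, here is a minimal sketch of a click-through rate computed from the two counts in the example. The helper `ctr` is hypothetical and only shows why impressions are needed as the denominator; it is not how Metarank's extractor is implemented.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate; returns 0.0 when the item was never shown."""
    return clicks / impressions if impressions else 0.0


popular = ctr(100, 100_000)  # many clicks, but shown very often
niche = ctr(50, 100)         # fewer clicks, but shown rarely

print(f"{popular:.3%} vs {niche:.0%}")  # 0.100% vs 50%
```

By raw click count the first item wins (100 vs 50), but per impression the second item is clicked about 500 times more often.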

In recommender systems theory there is an approach for dealing with this kind of problem when there is no negative feedback: you can sample some random non-clicked items from the inventory and treat them as your negative samples. But it usually gives sub-par quality: it is still synthetic data, and saying that this particular pair of socks is less relevant than a teapot will probably lead to bad ranking results.
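For reference, random negative sampling can look roughly like the sketch below. The function `sample_negatives` and the toy inventory are made up for illustration; Metarank itself expects real impression data rather than this workaround.

```python
import random
from typing import List, Sequence, Set


def sample_negatives(inventory: Sequence[str], clicked: Set[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k random non-clicked items to act as synthetic negatives."""
    rng = random.Random(seed)  # fixed seed only to keep the example repeatable
    candidates = [item for item in inventory if item not in clicked]
    return rng.sample(candidates, min(k, len(candidates)))


inventory = ["socks", "teapot", "mug", "scarf", "kettle"]
negatives = sample_negatives(inventory, clicked={"socks"}, k=2)
print(negatives)
```

Nothing here knows whether the visitor ever saw the sampled items, which is exactly why the resulting "socks beat teapot" labels are weak evidence.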

I'm wondering why you have this problem. Is it more about collecting historical data to train the model? In our demo (the one on demo.metarank.ai and in RanklensTest) we use only around 5k user sessions, and that is more than enough to get a real impact on the ranking, so I guess you don't need to wait years to collect enough data for initial training. From the JSON format perspective, these ranking events are the actual request bodies you send to the Metarank API to do the reranking itself, so you only need to log them and wait for some time.

laxmimerit commented on June 24, 2024

Thank you so much for such a detailed explanation!
I have impression data with the relative position at which each item was shown. I can't find any position or other related variable in your movie dataset. In the ranking event, there is a list of items with zero relevancy for all items. In the interaction event, there is just item_id and other session-related info.

How will the algorithm know the position of the click?

If it takes the index position from the ranking event, then I don't think that is the correct way to do it. For a single session it is fine, but when data is shown over multiple sessions (which is the typical case in e-commerce), it is very likely that the order of the impression list will vary a lot. In that case, it would be nearly impossible to relate a click to its impression position. I would also like to extend my question to relevancy: what is it used for, and why is it zero?

shuttie commented on June 24, 2024

There is an event schema doc describing the format of the events, and there are a couple of important points from there:

  • all events have a unique identifier, the id field
  • each interaction event has an explicit impression field, pointing to the identifier of the corresponding parent ranking event, so the two can be joined together later. We actually join each ranking with all the interactions that happened later into a single click-through. So for each ranking we can clearly see which items were clicked, and which were examined and ignored.
  • in the ranking event there is an items field with the ordering of the items that were actually displayed to the visitor. Their ordering is important, as position is taken from there.

So there is no single constant ranking of items needed in Metarank. Each time you present a listing (for example, search results) to a visitor, you should emit it as an event downstream. And each time the visitor clicks on an item in this particular ranking, you should also send another event with the clicked item id AND the parent ranking that resulted in this click.
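A rough sketch of that join is shown below. The event dicts are heavily simplified and their key names (`id`, `impression`, `item`, `items`) are assumptions for illustration only; the event schema doc is the authoritative source for the real JSON format. The function `join_click_through` is likewise hypothetical.

```python
from typing import List


def join_click_through(ranking: dict, interactions: List[dict]) -> List[dict]:
    """Attach click labels and positions to the items of one ranking event.

    Only interactions whose parent identifier matches this ranking are used,
    so positions always come from the list the visitor actually saw.
    """
    clicked = {e["item"] for e in interactions if e["impression"] == ranking["id"]}
    return [
        {"item": item, "position": pos, "clicked": item in clicked}
        for pos, item in enumerate(ranking["items"])
    ]


ranking = {"id": "r1", "items": ["A", "B", "C", "D"]}
interactions = [
    {"id": "i1", "impression": "r1", "item": "C"},
    {"id": "i2", "impression": "r2", "item": "A"},  # belongs to another ranking
]

rows = join_click_through(ranking, interactions)
for row in rows:
    print(row)
```

Because the click on A refers to a different ranking, it does not affect this click-through: the same item can sit at different positions in different listings without any ambiguity.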
