The current method of pagination does not take score sorting into account. News items

Even if the Mongo feature <a class="user-mention notranslate" data-hovercard-type="use

Continuing from <a class="user-mention notranslate" data-hovercard-type="user" data-ho

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Score Aware Pagination about pullup HOT 13 CLOSED

larvalabs commented on August 27, 2024

Score Aware Pagination

from pullup.

Comments (13)

marcofiset commented on August 27, 2024

Do you have any idea how we could achieve this without loading the whole database every time?

I can't think of something other than limiting results to posts of the last X days or less. But then a post submitted X + 1 days ago that still has a lot of interest won't show up anymore.

Perhaps store the score in the database so we can sort on it? And on every vote, update the score, or something like that. But score is also based on time so that will require more thinking.

from pullup.

dfjones commented on August 27, 2024

Sure, here are some thoughts.

I think we will need to persist the score information somewhere. Right now the database is the obvious choice, but a cache or something like Redis with sorted sets could be options as well.

Assuming we go with the database, I think we will want to store the score either on the news item or perhaps in its own collection. When a vote is written, we should update the score as well. Since the score is time based, I think we will need a background tasks that updates the scores in the database at some frequency (every minute, for example). To keep this from writing to all news items, I think we will want to define a policy to limit the number of documents that are updated. For example, news items with a score < 1 will no longer be updated. This way, scores will naturally become frozen as an item ages. To choose the right value, I think we will want to do some analysis of the score function and what values it emits as an item ages. We don't want to prevent fresh items from being updated.

from pullup.

treygriffith commented on August 27, 2024

As far as I'm aware, there is no good way of doing this in MongoDB, although there is a planned (but not scheduled) feature supporting it: https://jira.mongodb.org/browse/SERVER-153

Constantly updating the database with new scores as time passes seems like a pretty limited solution.

My vote is to pull all the records from the database and perform the sort and slice in the application until it becomes a problem (which I don't expect until we have a lot more users than we currently do), at which point we can either do some caching (in Redis or similar) or determine the age of the set we should be fetching from the database to perform the sort and slice. (i.e. determine X in @marco-fiset's answer with a high level of confidence).

from pullup.

dfjones commented on August 27, 2024

Even if the Mongo feature @treygriffith mentioned was implemented, running any sort of custom JavaScript in Mongo isn't a long-term scalable solution either.

I think every solution to this problem will have some sort of tradeoffs associated with them. What @treygriffith suggested will work, but won't scale long term. However, I think if we limited the query to something like 1000 news items, the query & sorting will probably be fast enough for a long time.

from pullup.

treygriffith commented on August 27, 2024

Totally true that it's not scalable. It's basically a punt on this issue until we have more data about which trade offs are most appropriate given the application's real world usage.

That said, doing a 1000 item limit would do effectively for the first page. I'm not sure how that would work for the nth page though, since you can't do a .skip
—
Sent from Mailbox for iPhone

On Tue, Feb 18, 2014 at 6:58 PM, Doug Jones [email protected]
wrote:

Even if the Mongo feature @treygriffith mentioned was implemented, running any sort of custom JavaScript in Mongo isn't a long-term scalable solution either.

I think every solution to this problem will have some sort of tradeoffs associated with them. What @treygriffith suggested will work, but won't scale long term. However, I think if we limited the query to something like 1000 news items, the query & sorting will probably be fast enough for a long time.

Reply to this email directly or view it on GitHub:
#223 (comment)

from pullup.

dfjones commented on August 27, 2024

I think if we are going to load 1000 items at a time, you'd use a query like the one we already have and I've copied in the original description. Then, sort all 1000 items by score, then select page N of the items from the sorted array. So, the most pages we could support would be 1000/M where M is the number of items on a page.

The big downside is that every page view has to query for and sort 1000 items.

from pullup.

chall8908 commented on August 27, 2024

Continuing from @dfjones, posts beyond this 1000 limit could be "archived" and viewable from a separate, differently sorted (e.g. only post time) list view.

from pullup.

treygriffith commented on August 27, 2024

@dfjones I see what you're saying now.

I don't think that querying for 1000 records and sorting is actually a big problem - I imagine that most of the time spent in the query is overhead in the actual request itself, and doesn't vary much with the size of the result set (at least for 1000 results). The sort itself is very fast (I'd bet less than 2 tenths of a millisecond for 1000 items).

from pullup.

dfjones commented on August 27, 2024

Something to keep in mind is that for each news item we have to perform a query to join in the item's vote count, comment count, and author info. So we are seemingly in the neighborhood of 3000 queries to build 1 page. (Although, Mongoose might be batching or doing something to reduce query overhead. I don't know it well enough to say one way or the other.)

This probably won't be an issue for the immediate future. But when the day comes to optimize, this probably should be the first place we look at :-)

from pullup.

dfjones commented on August 27, 2024

Also, we should probably close this issue. I don't think I have the rights to do it. @megamattron ?

from pullup.

treygriffith commented on August 27, 2024

@dfjones The queries for vote count and comment count are batched together, so that's only one extra query each. I'm also pretty sure that Mongoose batches populate queries, so that's one more query for author info. For those queries, we should be doing pretty well (4 queries total, regardless of the number of news items).

I did just notice that a recent PR to get the latest comment time performs a separate query for each news item, so that's 899 unnecessary queries that can be eliminated.

But I agree that this issue is resolved.

from pullup.

treygriffith commented on August 27, 2024

I take back that bit about the comment counts - those are separate queries. I'm going to see what I can do about the comment counts as well as last comment time queries - Performing ~2000 queries to build the front page is really not a great idea.

from pullup.

megamattron commented on August 27, 2024

Looks like we've got a reasonable compromise in there now, so I'll close this.

from pullup.

Score Aware Pagination about pullup HOT 13 CLOSED

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs