GithubHelp home page GithubHelp logo

Score Aware Pagination about pullup HOT 13 CLOSED

larvalabs avatar larvalabs commented on August 27, 2024
Score Aware Pagination

from pullup.

Comments (13)

marcofiset avatar marcofiset commented on August 27, 2024

Do you have any idea how we could achieve this without loading the whole database every time?

I can't think of something other than limiting results to posts of the last X days or less. But then a post submitted X + 1 days ago that still has a lot of interest won't show up anymore.

Perhaps store the score in the database so we can sort on it? And on every vote, update the score, or something like that. But score is also based on time so that will require more thinking.

from pullup.

dfjones avatar dfjones commented on August 27, 2024

Sure, here are some thoughts.

I think we will need to persist the score information somewhere. Right now the database is the obvious choice, but a cache or something like Redis with sorted sets could be options as well.

Assuming we go with the database, I think we will want to store the score either on the news item or perhaps in its own collection. When a vote is written, we should update the score as well. Since the score is time based, I think we will need a background tasks that updates the scores in the database at some frequency (every minute, for example). To keep this from writing to all news items, I think we will want to define a policy to limit the number of documents that are updated. For example, news items with a score < 1 will no longer be updated. This way, scores will naturally become frozen as an item ages. To choose the right value, I think we will want to do some analysis of the score function and what values it emits as an item ages. We don't want to prevent fresh items from being updated.

from pullup.

treygriffith avatar treygriffith commented on August 27, 2024

As far as I'm aware, there is no good way of doing this in MongoDB, although there is a planned (but not scheduled) feature supporting it: https://jira.mongodb.org/browse/SERVER-153

Constantly updating the database with new scores as time passes seems like a pretty limited solution.

My vote is to pull all the records from the database and perform the sort and slice in the application until it becomes a problem (which I don't expect until we have a lot more users than we currently do), at which point we can either do some caching (in Redis or similar) or determine the age of the set we should be fetching from the database to perform the sort and slice. (i.e. determine X in @marco-fiset's answer with a high level of confidence).

from pullup.

dfjones avatar dfjones commented on August 27, 2024

Even if the Mongo feature @treygriffith mentioned was implemented, running any sort of custom JavaScript in Mongo isn't a long-term scalable solution either.

I think every solution to this problem will have some sort of tradeoffs associated with them. What @treygriffith suggested will work, but won't scale long term. However, I think if we limited the query to something like 1000 news items, the query & sorting will probably be fast enough for a long time.

from pullup.

treygriffith avatar treygriffith commented on August 27, 2024

Totally true that it's not scalable. It's basically a punt on this issue until we have more data about which trade offs are most appropriate given the application's real world usage.

That said, doing a 1000 item limit would  do effectively for the first page. I'm not sure how that would work for the nth page though, since you can't do a .skip

Sent from Mailbox for iPhone

On Tue, Feb 18, 2014 at 6:58 PM, Doug Jones [email protected]
wrote:

Even if the Mongo feature @treygriffith mentioned was implemented, running any sort of custom JavaScript in Mongo isn't a long-term scalable solution either.

I think every solution to this problem will have some sort of tradeoffs associated with them. What @treygriffith suggested will work, but won't scale long term. However, I think if we limited the query to something like 1000 news items, the query & sorting will probably be fast enough for a long time.

Reply to this email directly or view it on GitHub:
#223 (comment)

from pullup.

dfjones avatar dfjones commented on August 27, 2024

I think if we are going to load 1000 items at a time, you'd use a query like the one we already have and I've copied in the original description. Then, sort all 1000 items by score, then select page N of the items from the sorted array. So, the most pages we could support would be 1000/M where M is the number of items on a page.

The big downside is that every page view has to query for and sort 1000 items.

from pullup.

chall8908 avatar chall8908 commented on August 27, 2024

Continuing from @dfjones, posts beyond this 1000 limit could be "archived" and viewable from a separate, differently sorted (e.g. only post time) list view.

from pullup.

treygriffith avatar treygriffith commented on August 27, 2024

@dfjones I see what you're saying now.

I don't think that querying for 1000 records and sorting is actually a big problem - I imagine that most of the time spent in the query is overhead in the actual request itself, and doesn't vary much with the size of the result set (at least for 1000 results). The sort itself is very fast (I'd bet less than 2 tenths of a millisecond for 1000 items).

from pullup.

dfjones avatar dfjones commented on August 27, 2024

Something to keep in mind is that for each news item we have to perform a query to join in the item's vote count, comment count, and author info. So we are seemingly in the neighborhood of 3000 queries to build 1 page. (Although, Mongoose might be batching or doing something to reduce query overhead. I don't know it well enough to say one way or the other.)

This probably won't be an issue for the immediate future. But when the day comes to optimize, this probably should be the first place we look at :-)

from pullup.

dfjones avatar dfjones commented on August 27, 2024

Also, we should probably close this issue. I don't think I have the rights to do it. @megamattron ?

from pullup.

treygriffith avatar treygriffith commented on August 27, 2024

@dfjones The queries for vote count and comment count are batched together, so that's only one extra query each. I'm also pretty sure that Mongoose batches populate queries, so that's one more query for author info. For those queries, we should be doing pretty well (4 queries total, regardless of the number of news items).

I did just notice that a recent PR to get the latest comment time performs a separate query for each news item, so that's 899 unnecessary queries that can be eliminated.

But I agree that this issue is resolved.

from pullup.

treygriffith avatar treygriffith commented on August 27, 2024

I take back that bit about the comment counts - those are separate queries. I'm going to see what I can do about the comment counts as well as last comment time queries - Performing ~2000 queries to build the front page is really not a great idea.

from pullup.

megamattron avatar megamattron commented on August 27, 2024

Looks like we've got a reasonable compromise in there now, so I'll close this.

from pullup.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.