strideynet commented on June 19, 2024
SELECT
    cp.*
FROM
    candidate_posts cp
        INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
        INNER JOIN post_hotness ph
                   ON ph.post_uri = cp.uri AND ph.algo = @algo AND
                      ph.generated_at = @generated_at
WHERE
      cp.is_hidden = false
  AND cp.deleted_at IS NULL
  AND ca.status = 'approved'
  AND (@require_tags::TEXT[] = '{}' OR @require_tags::TEXT[] <@ cp.tags)
  AND (@exclude_tags::TEXT[] = '{}' OR NOT (@exclude_tags::TEXT[] && cp.tags))
  AND (ph.hotness < @hotness_cursor)
ORDER BY
    ph.hotness DESC
LIMIT @_limit;
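The two tag predicates in the query above use Postgres array operators. `a <@ b` is true when every element of `a` is also in `b`, and `a && b` is true when the two arrays share at least one element. The following is a rough illustration of what those two lines filter: a Python sketch that uses sets in place of `TEXT[]` values. `passes_tag_filters` and the sample posts are hypothetical.

```python
def passes_tag_filters(post_tags, require_tags, exclude_tags):
    """Mirror the two SQL tag predicates for a single post.

    require_tags: the post must carry every one of these (SQL: require <@ tags).
    exclude_tags: the post must carry none of these (SQL: NOT (exclude && tags)).
    An empty list disables that filter, like the `= '{}'` checks in the query.
    """
    tags = set(post_tags)
    # Non-empty require list: it must be a subset of the post's tags.
    if require_tags and not set(require_tags) <= tags:
        return False
    # Non-empty exclude list: it must not overlap the post's tags at all.
    if exclude_tags and set(exclude_tags) & tags:
        return False
    return True


# Hypothetical sample posts: uri -> tags
posts = {
    "post-a": ["art", "nsfw"],
    "post-b": ["art"],
    "post-c": [],
}
selected = [uri for uri, tags in posts.items()
            if passes_tag_filters(tags, ["art"], ["nsfw"])]
print(selected)  # ['post-b']
```

The sketch treats a missing tag list as empty. In SQL, a NULL `cp.tags` makes a non-empty filter's predicate NULL, so the row is dropped instead.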

Following a discussion with Tolf: we like the idea of a background generation process that periodically writes scores to a table, which the feed query can then join against.

from bsky-furry-feed.

itstolf commented on June 19, 2024

Still writing up some design notes for this, but what do you think about using different tables for different algorithms rather than putting them all in the same table? My feeling is that because each algorithm is semantically distinct, it might not make a lot of sense to put them in the same table and have the "hotness" value mean something very different per algorithm, but I don't have a super strong opinion either way!

strideynet commented on June 19, 2024

I see the argument from a semantic side, since hotness scores from different algos won't be comparable, but I think splitting the tables will be more pain than it's worth. It'll reduce our ability to introduce new algos dynamically in future, and general housekeeping tasks will be more complex (e.g. the background task that cleans out old post hotness scores).

I'm also unsure how well sqlc and other parts of our toolchain will play with this.

If we wanted to track what went into the hotness score for debugging purposes, we could probably just add a JSONB field for this, especially as I doubt we'll ever search by it and it'd mostly be for debugging.
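Here is a minimal sketch of the JSONB idea in Python. It assumes a hypothetical extra column, not present in any schema in this thread, that stores the inputs to each score as JSON. The score uses the `classic` formula given later in this thread (`timebase = 2`, `gravity = 1.85`). The function name and payload keys are made up for illustration.

```python
import json


def classic_score_with_debug(likes, age_hours, timebase=2.0, gravity=1.85):
    """Return (score, debug_json) for one post.

    debug_json is a JSON string that could be written to a hypothetical
    JSONB column next to the score, purely to explain the score later.
    """
    score = likes / (age_hours + timebase) ** gravity
    debug = {
        "likes": likes,
        "age_hours": age_hours,
        "timebase": timebase,
        "gravity": gravity,
    }
    # sort_keys keeps the payload stable, which makes rows easier to compare.
    return score, json.dumps(debug, sort_keys=True)


score, debug_json = classic_score_with_debug(likes=12, age_hours=3.5)
print(round(score, 3), debug_json)
```

Because the payload is only written and never filtered on, it doesn't need an index. This fits the point that it would mostly serve debugging.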

itstolf commented on June 19, 2024

leaving this here for now until it finds a better home:

schema

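CREATE TABLE post_hotness (
    uri TEXT NOT NULL,
    alg TEXT NOT NULL,
    score REAL NOT NULL,
    generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- each 5-minute run inserts a fresh row per post and algorithm, and rows
    -- are kept for 30 minutes, so uri alone cannot be unique
    PRIMARY KEY (uri, alg, generated_at)
);
-- the selection query filters on alg and generated_at, then orders by score
CREATE INDEX post_hotness_score_idx ON post_hotness (alg, generated_at, score);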

formula

timebase = 2  # hours
gravity = 1.85
# t = age of the post in hours, likes = number of non-deleted likes
score = likes / (t + timebase) ** gravity
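Here is the formula as a small Python function, to make the roles of `t`, `timebase`, and `gravity` concrete. This is a sketch. `classic_hotness` and its parameter names are hypothetical, and `likes` is assumed to be the non-deleted like count that the materializing query counts.

```python
def classic_hotness(likes, age_hours, timebase=2.0, gravity=1.85):
    """Time-decayed score: likes / (age + timebase) ** gravity.

    likes:     number of (non-deleted) likes on the post
    age_hours: hours since the post was created (t in the formula)
    timebase:  offset in hours so brand-new posts don't divide by ~0
    gravity:   how fast scores decay with age; higher means faster decay
    """
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    return likes / (age_hours + timebase) ** gravity


fresh = classic_hotness(likes=5, age_hours=1)    # a small, recent post
older = classic_hotness(likes=50, age_hours=24)  # 10x the likes, a day old
print(fresh > older)  # True
```

The comparison at the end shows the intended trade-off. With `gravity = 1.85`, a one-hour-old post with 5 likes still outranks a day-old post with 50.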

materializing query (every 5 minutes)

BEGIN;

DELETE FROM post_hotness
WHERE generated_at < NOW() - INTERVAL '30 minutes';

INSERT INTO post_hotness (uri, alg, score)
SELECT
    cp.uri,
    'classic',
    (SELECT COUNT(*) FROM candidate_likes cl WHERE cl.subject_uri = cp.uri AND cl.deleted_at IS NULL) /
        (EXTRACT(EPOCH FROM NOW() - cp.created_at) / (60 * 60) + 2) ^
        1.85
FROM candidate_posts cp
WHERE
    cp.deleted_at IS NULL AND
    cp.created_at >= NOW() - INTERVAL '48 hours';  -- only compute score over last 48 hours

COMMIT;
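The following in-memory Python sketch shows what one run of the materializing job does. A list of dicts stands in for `post_hotness`, like counts are assumed to be precomputed per post, and the names `materialize`, `WINDOW`, and `RETENTION` are hypothetical. The sketch mirrors the SQL step by step: it deletes rows older than 30 minutes, then inserts a `classic` score for every non-deleted post from the last 48 hours. In Postgres, `NOW()` returns the start time of the current transaction, so every row inserted inside the `BEGIN`/`COMMIT` block gets the same `generated_at`. The sketch reproduces that by passing a single `now`.

```python
from datetime import datetime, timedelta, timezone

TIMEBASE_HOURS = 2.0
GRAVITY = 1.85
WINDOW = timedelta(hours=48)       # only score posts from the last 48 hours
RETENTION = timedelta(minutes=30)  # drop score rows older than this


def materialize(table, posts, now):
    """One run of the materializing job against an in-memory 'table'.

    table: list of dicts with keys uri, alg, score, generated_at
    posts: list of dicts with keys uri, created_at, deleted, likes
    Every row written in one run shares the same generated_at, which is
    what lets the selection query pin a single snapshot.
    """
    # DELETE ... WHERE generated_at < NOW() - INTERVAL '30 minutes'
    table[:] = [row for row in table if row["generated_at"] >= now - RETENTION]
    # INSERT ... SELECT over recent, non-deleted posts
    for post in posts:
        if post["deleted"] or post["created_at"] < now - WINDOW:
            continue
        age_hours = (now - post["created_at"]).total_seconds() / 3600
        table.append({
            "uri": post["uri"],
            "alg": "classic",
            "score": post["likes"] / (age_hours + TIMEBASE_HOURS) ** GRAVITY,
            "generated_at": now,
        })
    return now  # the generated_at value a caller would pin


now = datetime(2024, 6, 19, 12, 0, tzinfo=timezone.utc)
posts = [
    {"uri": "at://a", "created_at": now - timedelta(hours=1), "deleted": False, "likes": 5},
    {"uri": "at://b", "created_at": now - timedelta(hours=72), "deleted": False, "likes": 99},
    {"uri": "at://c", "created_at": now - timedelta(hours=2), "deleted": True, "likes": 7},
]
table = [{"uri": "at://old", "alg": "classic", "score": 1.0,
          "generated_at": now - timedelta(hours=1)}]
snapshot = materialize(table, posts, now)
print([row["uri"] for row in table])  # ['at://a']
```

The job runs every 5 minutes and keeps 30 minutes of history, so a post can have several score rows at once (one per run). That is why `uri` on its own can't serve as the table's primary key.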

selection query

SELECT
    cp.*
FROM
    candidate_posts cp
INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
INNER JOIN post_hotness ph
            ON ph.uri = cp.uri AND ph.alg = @alg AND
                ph.generated_at = @generated_at
WHERE
      cp.is_hidden = false
  AND ca.status = 'approved'
  AND (COALESCE($1::TEXT[], '{}') = '{}' OR $1::TEXT[] && cp.hashtags)
  AND ($2::BOOLEAN IS NULL OR COALESCE(cp.has_media, false) = $2)
  AND ($3::BOOLEAN IS NULL OR (ARRAY['nsfw', 'mursuit', 'murrsuit'] && cp.hashtags) = $3)
  AND (cp.indexed_at < $4)
  AND cp.deleted_at IS NULL
  AND (ph.score < @hotness_cursor)
ORDER BY
    ph.score DESC
LIMIT @_limit;
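Here is a minimal Python sketch of what the selection query does with its `alg`, `generated_at`, and `hotness_cursor` parameters. It runs over an in-memory list that stands in for `post_hotness`. The function `select_page` is hypothetical, and the joins and other filters are omitted. One thing the sketch makes visible: because the cursor comparison is strict, a post whose score exactly equals the cursor value is skipped. One common fix is to add a tie-breaker such as `uri` to both the `ORDER BY` and the cursor, but that is an assumption and not something discussed in this thread.

```python
def select_page(table, alg, generated_at, hotness_cursor, limit):
    """Return (uris, next_cursor) for one page, hottest first.

    Only rows from a single materialization run (generated_at) are used,
    so scores don't shift between pages while a client scrolls.
    hotness_cursor is the score of the last row on the previous page
    (use float('inf') for the first page).
    """
    rows = [
        r for r in table
        if r["alg"] == alg
        and r["generated_at"] == generated_at
        and r["score"] < hotness_cursor  # strict: equal scores are skipped
    ]
    rows.sort(key=lambda r: r["score"], reverse=True)
    page = rows[:limit]
    next_cursor = page[-1]["score"] if page else None
    return [r["uri"] for r in page], next_cursor


snap = "run-1"  # hypothetical stand-in for a generated_at timestamp
table = [
    {"uri": "a", "alg": "classic", "score": 3.0, "generated_at": snap},
    {"uri": "b", "alg": "classic", "score": 2.0, "generated_at": snap},
    {"uri": "c", "alg": "classic", "score": 1.0, "generated_at": snap},
    {"uri": "x", "alg": "classic", "score": 9.0, "generated_at": "run-0"},
]
page1, cur = select_page(table, "classic", snap, float("inf"), 2)
page2, cur2 = select_page(table, "classic", snap, cur, 2)
print(page1, page2)  # ['a', 'b'] ['c']
```

Row `x` belongs to an older run, so it never appears, even though its score is the highest. Pinning `generated_at` is what keeps pages consistent across background refreshes.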

strideynet commented on June 19, 2024

Completed by #127
