```sql
SELECT
    cp.*
FROM
    candidate_posts cp
    INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
    INNER JOIN post_hotness ph
        ON ph.post_uri = cp.uri
        AND ph.algo = @algo
        AND ph.generated_at = @generated_at
WHERE
    cp.is_hidden = false
    AND cp.deleted_at IS NULL
    AND ca.status = 'approved'
    AND (@require_tags::TEXT[] = '{}' OR @require_tags::TEXT[] <@ cp.tags)
    AND (@exclude_tags::TEXT[] = '{}' OR NOT (@exclude_tags::TEXT[] && cp.tags))
    AND (ph.hotness < @hotness_cursor)
ORDER BY
    ph.hotness DESC
LIMIT @_limit;
```
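The two tag predicates use Postgres array operators. `<@` ("is contained by") requires every required tag to be present on the post. `&&` ("overlaps") is true if the arrays share any element, so `NOT (... && ...)` rejects a post that carries any excluded tag. The Go sketch below reproduces those semantics. It could be useful for unit-testing feed rules outside the database. `matchesTags` is a hypothetical helper and is not part of the codebase.

```go
package main

import "fmt"

// matchesTags mirrors the two WHERE clauses on cp.tags (hypothetical helper):
//   - every tag in require must appear in postTags (Postgres `require <@ tags`)
//   - no tag in exclude may appear in postTags (Postgres `NOT (exclude && tags)`)
//
// An empty require or exclude list disables that check, like the `= '{}'` guards.
func matchesTags(postTags, require, exclude []string) bool {
	have := make(map[string]bool, len(postTags))
	for _, t := range postTags {
		have[t] = true
	}
	for _, t := range require {
		if !have[t] {
			return false // a required tag is missing
		}
	}
	for _, t := range exclude {
		if have[t] {
			return false // an excluded tag is present
		}
	}
	return true
}

func main() {
	post := []string{"art", "nsfw"}
	fmt.Println(matchesTags(post, []string{"art"}, nil))
	fmt.Println(matchesTags(post, []string{"art"}, []string{"nsfw"}))
	fmt.Println(matchesTags(post, []string{"art", "fursuit"}, nil))
}
```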
Following a discussion with Tolf, we like the idea of a background process that generates scores and writes them to a table that the feed queries can join against.
from bsky-furry-feed.
still writing up some design notes for this, but what do you think about using different tables for different algorithms rather than putting them all in the same table? my feeling is that because each algorithm is semantically distinct, it might not make a lot of sense to put them in one table and have the "hotness" value mean something very different per algorithm, but i don't have a super strong opinion either way!
I see the argument from a semantic side, since hotness scores from different algos won't be comparable, but I do think that splitting them into separate tables will be more pain than it's worth. It'll reduce our ability to introduce new algos dynamically in the future, and general housekeeping tasks will be more complex (e.g. the background task that cleans out old post hotness scores).
I'm also unsure how well sqlc and other parts of our toolchain will play with this.
If we wanted to track what went into the hotness score for debugging purposes, we could probably just add a JSONB field for it (especially as I doubt we'll ever search by it; it'd mostly be for debugging).
leaving this here for now until it finds a better home:
schema
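```sql
CREATE TABLE post_hotness (
    uri TEXT NOT NULL,
    alg TEXT NOT NULL,
    score REAL NOT NULL,
    generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- several generations (and algorithms) of scores coexist for the same post,
    -- so the URI alone cannot be the primary key
    PRIMARY KEY (uri, alg, generated_at)
);
CREATE INDEX post_hotness_score_idx ON post_hotness (alg, score);
```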
formula
```
timebase = 2  # hours
gravity = 1.85
# t = age of the post in hours; likes = number of non-deleted likes on the post
score = likes / (t + timebase) ** gravity
```
materializing query (every 5 minutes)
```sql
BEGIN;

-- drop generations older than 30 minutes
DELETE FROM post_hotness
WHERE generated_at < NOW() - INTERVAL '30 minutes';

-- generated_at defaults to NOW(), which is fixed for the whole transaction,
-- so every row in this batch shares one generated_at value
INSERT INTO post_hotness (uri, alg, score)
SELECT
    cp.uri,
    'classic',
    -- likes / (age in hours + 2) ^ 1.85
    (
        SELECT COUNT(*)
        FROM candidate_likes cl
        WHERE cl.subject_uri = cp.uri AND cl.deleted_at IS NULL
    ) / (EXTRACT(EPOCH FROM NOW() - cp.created_at) / (60 * 60) + 2) ^ 1.85
FROM candidate_posts cp
WHERE
    cp.deleted_at IS NULL
    AND cp.created_at >= NOW() - INTERVAL '48 hours'; -- only compute scores over the last 48 hours

COMMIT;
```
selection query
```sql
-- column names follow the post_hotness schema above (uri, alg, score);
-- sqlc does not allow mixing positional ($1) and named (@name) parameters in one query
SELECT
    cp.*
FROM
    candidate_posts cp
    INNER JOIN candidate_actors ca ON cp.actor_did = ca.did
    INNER JOIN post_hotness ph
        ON ph.uri = cp.uri
        AND ph.alg = @alg
        AND ph.generated_at = @generated_at
WHERE
    cp.is_hidden = false
    AND ca.status = 'approved'
    AND (COALESCE(@hashtags::TEXT[], '{}') = '{}' OR @hashtags::TEXT[] && cp.hashtags)
    AND (@has_media::BOOLEAN IS NULL OR COALESCE(cp.has_media, false) = @has_media)
    AND (@is_nsfw::BOOLEAN IS NULL OR (ARRAY['nsfw', 'mursuit', 'murrsuit'] && cp.hashtags) = @is_nsfw)
    AND (cp.indexed_at < @cursor_timestamp)
    AND cp.deleted_at IS NULL
    AND (ph.score < @hotness_cursor)
ORDER BY
    ph.score DESC
LIMIT @_limit;
```
Completed by #127