September 17th, 2024

A Simple Spaced Repetition Algo (In Ugly SQL)

A spaced-repetition software algorithm for a podcasting platform was developed in SQL, focusing on user feedback, simplicity, and cost-effectiveness while optimizing learning through timestamps and performance adjustments.

Read original articleLink Icon
A Simple Spaced Repetition Algo (In Ugly SQL)

The article discusses the development of a spaced-repetition software (SRS) algorithm implemented in SQL for a podcasting platform called flashcasts. The author aimed to create a system that is optional, debuggable, and cost-effective. The algorithm allows users to provide feedback on their memory of flashcards, adjusting future episodes based on their input without penalizing them for inconsistent usage. The author highlights the importance of simplicity in the algorithm, referencing the SM-2 algorithm as a more complex alternative. The proposed algorithm sorts cards based on timestamps, penalizes them by episode intervals, and incorporates a learning curve based on card length. It also includes a feedback loop that adjusts the sorting based on user scores. The final SQL code combines these elements to generate new episodes for each feed, ensuring that the system remains economical and stable. The author expresses a desire for feedback on simplifying complex processes in software development.

- The SRS algorithm is designed to be optional and user-friendly, accommodating inconsistent usage.

- It incorporates a feedback mechanism that adjusts future episodes based on user performance.

- The algorithm prioritizes simplicity and debuggability, contrasting with more complex alternatives.

- SQL is chosen for its cost-effectiveness and efficiency in handling the algorithm's requirements.

- The final implementation includes various factors like card timestamps, length, and user scores to optimize learning.

Link Icon 3 comments
By @sivers - 4 months
So weird how some minds think alike in more than one aspect.

I already loved your writing, but I never would have predicted that anyone else (besides me) was trying to put spaced repetition algorithms into SQL.

I've been in touch with the guys making Open Spaced Repetition: https://github.com/open-spaced-repetition

... specifically the Free Spaced Repetition Scheduler (FSRS), of which they recommended if we're going to copy one of the implementations it should be the TypeScript one, since that's the best documented: https://github.com/open-spaced-repetition/ts-fsrs + https://open-spaced-repetition.github.io/ts-fsrs/

I've been working on it the last few days and was working on it this morning, then I saw this/your post on HN. Jaw dropped.

I always enjoy making things from scratch as a way of getting to know them, but in this case it seems wise (and a bit of a new experience for me) to copy someone else's open source work and try to do a real implementation of the FSRS-5 scheduler in PostgreSQL PL/pgSQL functions.

Then cards and their data-logic can stay together in one place, and anyone can use whatever language they want for UI/API.

By @paveworld - 4 months
Why wouldn’t you use something like this instead?

WITH episode_ AS (...), -- stale episodes s_ AS ( SELECT e.feed_id, s.card_id, MAX(e.created_at) AS last_created_at, AVG(COALESCE(s.score::INT::REAL, 0.5)) AS _avg, COUNT(s.score) AS _count FROM score s INNER JOIN episode e ON s.episode_id = e.episode_id INNER JOIN feed f ON e.feed_id = f.feed_id WHERE f.usr_id IS NOT NULL GROUP BY e.feed_id, s.card_id ) INSERT INTO episode (episode_id, feed_id, card_ids) SELECT LEFT(MD5(RANDOM()::TEXT), 16), feed_id, ARRAY_AGG(card_id) FROM ( SELECT feed_id, card_id FROM ( SELECT feed_id, card_id, per_episode, ROW_NUMBER() OVER ( PARTITION BY feed_id ORDER BY COALESCE(s_.last_created_at, c.created_at, NOW()) + f.every * RANDOM() * COALESCE(fd.chaos, 0.3) + f.every * COALESCE(s_._count, 7) ^ COALESCE(fd.deceleration, 0.7) * s_._avg ^ COALESCE(fd.feedback, 1.2) + f.every * LN(1 + c.bytes_b) * COALESCE(fd.ease, 1.0), RANDOM() ) AS n FROM episode_ e_ INNER JOIN feed f USING (feed_id) INNER JOIN feed_deck fd USING (feed_id) INNER JOIN card c USING (deck_id) LEFT JOIN s_ ON s_.feed_id = e_.feed_id AND s_.card_id = c.card_id WHERE NOT c.card_id = ANY(fd.hidden_card_ids) ) a WHERE per_episode >= n ) b GROUP BY feed_id;

By @thraxil - 4 months
I built an SRS app in Django about 16 years ago: https://github.com/thraxil/sebastian/tree/main and have been using it ever since. The core part of the business logic is here: https://github.com/thraxil/sebastian/blob/main/sebastian/lei...

It's not as terse as the SQL, but I think it's pretty easy to understand and it's proven itself over 16 years of regular use (currently around 12k cards learned). For each card, I track the rung, which more or less determines the interval, set a due date on the card after it's tested mostly just using the intervals from Sebastian Leitner's original research. Then, I can always just select the card with the closest due date < today.