Imagine you are given one task: every time a user opens the app, show them the tweets they will find most valuable - out of billions of possibilities - in under 200 milliseconds.
This is one of the hardest recommendation problems in industry. Let us design it from scratch.
The journey of a tweet from creation to your timeline
The Problem at Scale
Twitter (now X) handles roughly 500 million tweets per day. Each of its 350+ million monthly active users expects a feed that feels personally curated - relevant, timely, and engaging.
A chronological feed does not work at this scale. If you follow 500 accounts, you might have 10,000 unread tweets. Most of them are irrelevant. The AI feed's job is to surface the 50 that matter most to you.
💡
This is not just a ranking problem - it is a multi-stage system involving candidate generation, feature engineering, real-time scoring, content safety, and massive-scale serving.
Stage 1: Candidate Generation
You cannot score all 500 million daily tweets for every user. Instead, you narrow the field first.
Sources of candidates:
In-network: Tweets from accounts you follow (highest recall, moderate relevance).
Out-of-network: Tweets liked or retweeted by people you follow, or trending in your interest graph.
Topic-based: Tweets matching topics you have engaged with recently.
How it works:
A lightweight retrieval model (often embedding-based) generates a candidate set of roughly 1,000–5,000 tweets per user. This runs on a schedule or is triggered when the user opens the app.
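The retrieval step can be sketched as a nearest-neighbour lookup over embeddings. Below is a minimal NumPy sketch assuming pre-computed 64-dimensional user and tweet embeddings; real systems use approximate nearest-neighbour indexes (e.g. Faiss or ScaNN) instead of this brute-force scan, and the dimensions and candidate counts here are illustrative only:

```python
import numpy as np

def top_k_candidates(user_vec, tweet_matrix, k=5):
    """Score every candidate tweet by cosine similarity to the user
    embedding and return the indices of the top-k matches."""
    # Normalise so the dot product equals cosine similarity.
    user = user_vec / np.linalg.norm(user_vec)
    tweets = tweet_matrix / np.linalg.norm(tweet_matrix, axis=1, keepdims=True)
    scores = tweets @ user
    # argpartition avoids a full sort; only the top-k need exact ordering.
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

rng = np.random.default_rng(0)
user_vec = rng.normal(size=64)
tweet_matrix = rng.normal(size=(1000, 64))  # 1,000 candidate tweet embeddings
print(top_k_candidates(user_vec, tweet_matrix, k=5))
```

The same pattern scales to millions of tweets once the linear scan is replaced by an ANN index; the ranking of the returned candidates is unchanged.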
🧠 Quick Check
Why does Twitter use candidate generation instead of scoring every tweet for every user?
Stage 2: The Ranking Model
Once you have a few thousand candidates, a heavy ranking model scores each one. This is where the real intelligence lives.
Features the model uses:
User features:
Interests inferred from past engagement (likes, retweets, replies, dwell time).
User-author affinity: how often you engage with this author.
User-topic affinity: does the tweet topic match your interest profile?
Social proof: have people similar to you engaged with this tweet?
Model architecture:
A deep neural network (often a multi-task model) predicts multiple engagement types simultaneously:
Probability of like.
Probability of retweet.
Probability of reply.
Probability of extended dwell time (reading for more than two seconds).
The final score is a weighted combination of these predictions, tuned to optimise a platform-level objective like "healthy engagement".
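The weighted combination can be sketched in a few lines. The weight values below are hypothetical; in practice they are tuned offline (and per cohort) against the platform-level objective, not hard-coded:

```python
# Hypothetical weights: replies and retweets often weigh more than likes
# because they signal deeper engagement and content spread.
WEIGHTS = {
    "p_like": 1.0,
    "p_retweet": 2.0,
    "p_reply": 4.0,
    "p_dwell": 0.5,   # probability of > 2 seconds of reading
}

def final_score(predictions: dict) -> float:
    """Collapse multi-task engagement probabilities into one ranking score."""
    return sum(WEIGHTS[name] * p for name, p in predictions.items())

print(final_score({"p_like": 0.3, "p_retweet": 0.1,
                   "p_reply": 0.05, "p_dwell": 0.6}))  # 1.0
```

Because the combination is a simple linear layer over the model's heads, the weights can be retuned without retraining the neural network itself.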
🤔
Think about it:
If you optimise purely for clicks and likes, you risk promoting sensational or outrage-driven content. How would you design the scoring formula to balance engagement with content quality?
Stage 3: Real-Time Features
Static features are not enough. The feed must respond to what is happening right now.
Recency decay: A tweet from 30 seconds ago gets a boost over one from three hours ago.
Trending topics: If a major event is unfolding, surface related tweets even if they are from outside the user's usual interests.
Session context: If a user has been scrolling for ten minutes, diversify the content to reduce fatigue.
Real-time engagement signals: A tweet going viral in the last five minutes gets a temporary boost.
These features require a streaming infrastructure - often Apache Kafka feeding into a feature store like Redis or Feast - that updates in near real-time.
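Recency decay is often modelled as an exponential multiplier on the ranking score. A minimal sketch, assuming a one-hour half-life (the actual decay curve and half-life are tuning choices, not published values):

```python
import math

def recency_multiplier(age_seconds: float, half_life_seconds: float = 3600) -> float:
    """Exponential decay: a tweet's score multiplier halves every half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)

print(recency_multiplier(30))        # ~0.994: a 30-second-old tweet keeps almost full score
print(recency_multiplier(3 * 3600))  # 0.125: a three-hour-old tweet is heavily discounted
```

Multiplying the model's score by this factor reproduces the behaviour described above: very fresh tweets get an effective boost over older ones without a separate boosting rule.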
🤯
Twitter's engineering team reported that adding real-time features to their ranking model increased user engagement by 8 per cent - a massive improvement at their scale, translating to millions of additional daily interactions.
The "For You" Algorithm: Explore vs Exploit
The feed faces a classic exploration-exploitation dilemma:
Exploit: Show content similar to what the user has engaged with before. Safe, predictable, high short-term engagement.
Explore: Introduce new topics, new authors, and new formats. Riskier, but essential for long-term retention and content diversity.
Twitter's approach blends both:
Roughly 80 per cent of the feed is exploitation - content the model is confident the user will enjoy.
Roughly 20 per cent is exploration - new content injected to test the user's evolving interests.
The exploration ratio is itself a parameter tuned by A/B testing across millions of users.
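The 80/20 blend can be sketched as an injection step after ranking: fill most of the feed with the model's confident picks, then scatter exploration candidates into random slots. Everything here (ratio, feed size, item names) is illustrative:

```python
import random

def blend_feed(exploit, explore, feed_size=50, explore_ratio=0.2, seed=0):
    """Fill ~explore_ratio of the feed with exploration candidates,
    inserted at random positions among the exploitation picks."""
    rng = random.Random(seed)
    n_explore = round(feed_size * explore_ratio)
    feed = list(exploit[:feed_size - n_explore])
    for item in explore[:n_explore]:
        feed.insert(rng.randrange(len(feed) + 1), item)
    return feed

feed = blend_feed([f"exploit_{i}" for i in range(100)],
                  [f"explore_{i}" for i in range(20)])
print(len(feed))  # 50 items, 10 of them exploratory
```

Because `explore_ratio` is just a parameter, A/B tests can tune it per user cohort without touching the ranking model.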
🧠 Quick Check
What is the main risk of a feed that only 'exploits' and never 'explores'?
Content Moderation and Safety
Before any tweet reaches the feed, it passes through a safety layer:
Pre-ranking filter: Known spam, harassment, and policy-violating content is removed before ranking.
Post-ranking filter: After ranking, a secondary check catches edge cases - for example, a tweet that is individually acceptable but harmful when clustered with similar content.
User-level controls: Users can mute topics, block accounts, and mark content as "not interested", all of which feed back into their personalisation model.
The safety layer must be fast (adding no more than 10–20ms of latency) and conservative (when in doubt, filter it out and route to human review).
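The pre-filter / rank / post-filter flow, including the "when in doubt, route to human review" rule, can be sketched as a small pipeline. The filters and scoring here are toy stand-ins for real classifiers:

```python
def safety_pipeline(candidates, pre_filters, rank, post_filters):
    """Pre-filter, rank, then post-filter.
    When a post-filter is unsure (returns None), the tweet is held
    for human review rather than shown - conservative by design."""
    survivors = [t for t in candidates if all(f(t) for f in pre_filters)]
    ranked = rank(survivors)
    shown, held_for_review = [], []
    for tweet in ranked:
        verdicts = [f(tweet) for f in post_filters]
        if all(v is True for v in verdicts):
            shown.append(tweet)
        elif any(v is False for v in verdicts):
            continue  # definite violation: drop silently
        else:
            held_for_review.append(tweet)  # uncertain: route to humans
    return shown, held_for_review

is_not_spam = lambda t: not t.get("spam", False)
quality = lambda t: True if t["score"] > 0.5 else (False if t["score"] < 0.1 else None)
tweets = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.05},
          {"id": 3, "score": 0.3}, {"id": 4, "score": 0.8, "spam": True}]
shown, held = safety_pipeline(tweets, [is_not_spam],
                              lambda ts: sorted(ts, key=lambda t: -t["score"]),
                              [quality])
print([t["id"] for t in shown], [t["id"] for t in held])  # [1] [3]
```

The three-way verdict (pass / fail / unsure) is what keeps the automated layer fast while still deferring genuinely ambiguous cases to people.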
💡
Content moderation is not a separate system bolted on afterwards. It is deeply integrated into the ranking pipeline, influencing scores and filtering at every stage.
Scaling: Fan-Out on Read vs Fan-Out on Write
This is the architectural decision that defines Twitter's infrastructure.
Fan-out on write:
When a user tweets, push that tweet into the feed cache of every follower. Fast reads, but expensive writes - especially for accounts with millions of followers.
Fan-out on read:
When a user opens the app, pull the latest tweets from all accounts they follow and rank them on the fly. Cheap writes, but expensive reads.
Twitter's hybrid approach:
For most users (fewer than 10,000 followers): fan-out on write. Pre-compute and cache the feed.
For celebrity accounts (millions of followers): fan-out on read. Compute the feed at request time to avoid flooding millions of caches on every tweet.
This hybrid strategy balances write amplification against read latency, handling both a normal user's tweet and a pop star's viral post efficiently.
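The hybrid decision lives on the write path: fan out to follower caches for normal accounts, defer to read time for celebrities. A minimal in-memory sketch (the 10,000-follower threshold matches the figure above, but the real cut-off and storage are internal details):

```python
CELEBRITY_THRESHOLD = 10_000  # followers

feed_cache = {}        # follower -> pre-computed feed (fan-out on write)
celebrity_tweets = {}  # author -> recent tweets (pulled at read time)

def on_tweet(author, followers, tweet_id):
    """Write path: fan out to follower caches only for non-celebrities."""
    if len(followers) < CELEBRITY_THRESHOLD:
        for follower in followers:  # cheap: few cache insertions
            feed_cache.setdefault(follower, []).append(tweet_id)
    else:
        celebrity_tweets.setdefault(author, []).append(tweet_id)

def read_feed(user, followed_celebrities):
    """Read path: merge the cached feed with celebrity tweets pulled live."""
    merged = list(feed_cache.get(user, []))
    for celeb in followed_celebrities:  # fan-out on read
        merged.extend(celebrity_tweets.get(celeb, []))
    return merged

on_tweet("alice", ["bob", "carol"], "t1")                      # normal user
on_tweet("popstar", [f"u{i}" for i in range(50_000)], "t2")    # celebrity
print(read_feed("bob", ["popstar"]))  # ['t1', 't2']
```

Note that the celebrity tweet triggers exactly one write, not 50,000; followers pay a small merge cost at read time instead.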
🤯
When a celebrity with 50 million followers tweets, fan-out on write would require 50 million cache insertions - taking minutes and consuming enormous bandwidth. The hybrid approach handles this in milliseconds by deferring the work to read time.
Putting It All Together
The full pipeline for a single feed request:
User opens app → request hits the feed service.
Candidate generation retrieves ~3,000 tweets from in-network and out-of-network sources.
Feature store provides real-time and batch features for each candidate.
Ranking model scores all candidates in a single batched inference call.
Safety filters remove policy-violating content.
Exploration logic injects 20 per cent diverse content.
Final ranked list is returned to the client in under 200ms.
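The numbered steps above can be mirrored in a single orchestration function. Every helper here is a hypothetical stand-in (random scores instead of a model, a trivial safety filter), so the sketch shows only the shape of the request flow:

```python
import random

rng = random.Random(0)

def candidate_generation(user_id, n=3000):
    return [f"tweet_{i}" for i in range(n)]

def lookup_features(user_id, candidates):
    # Stand-in: one scalar feature per candidate from the "feature store".
    return {t: rng.random() for t in candidates}

def batch_score(features):
    # A real ranking model would run one batched inference call here.
    return dict(features)

def safety_filter(tweet, scores):
    return scores[tweet] > 0.01  # drop a sliver as "policy-violating"

def inject_exploration(ranked, ratio=0.2):
    n_explore = round(len(ranked) * ratio)
    feed = list(ranked[:len(ranked) - n_explore])
    for item in (f"explore_{i}" for i in range(n_explore)):
        feed.insert(rng.randrange(len(feed) + 1), item)
    return feed

def serve_feed(user_id, feed_size=50):
    candidates = candidate_generation(user_id)            # step 2
    features = lookup_features(user_id, candidates)       # step 3
    scores = batch_score(features)                        # step 4
    ranked = sorted(candidates, key=lambda t: -scores[t])
    safe = [t for t in ranked if safety_filter(t, scores)]  # step 5
    return inject_exploration(safe[:feed_size])           # step 6

print(len(serve_feed("user_42")))  # 50
```

Keeping each stage behind its own function is also how the real system stays testable: any stage can be swapped or A/B-tested without touching the rest of the request path.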
🤔
Think about it:
You are asked in an interview: "If you could add one new signal to improve the feed, what would it be and why?" Consider signals that are not commonly discussed - perhaps something related to user wellbeing or content credibility.
🧠 Quick Check
In Twitter's hybrid fan-out strategy, how are tweets from celebrity accounts handled?