AI Educademy
🐦
AI Masterpiece • Advanced • ⏱️ 30 min read

Design Twitter's AI Feed

Imagine you are given one task: every time a user opens the app, show them the tweets they will find most valuable - out of billions of possibilities - in under 200 milliseconds.

This is one of the hardest recommendation problems in industry. Let us design it from scratch.

[Figure: pipeline diagram showing tweet candidates flowing through ranking and filtering stages to produce a personalised feed. Caption: The journey of a tweet from creation to your timeline.]

The Problem at Scale

Twitter (now X) handles roughly 500 million tweets per day. Each of its 350+ million monthly active users expects a feed that feels personally curated - relevant, timely, and engaging.

A chronological feed does not work at this scale. If you follow 500 accounts, you might have 10,000 unread tweets. Most of them are irrelevant. The AI feed's job is to surface the 50 that matter most to you.

💡

This is not just a ranking problem - it is a multi-stage system involving candidate generation, feature engineering, real-time scoring, content safety, and massive-scale serving.

Stage 1: Candidate Generation

You cannot score all 500 million daily tweets for every user. Instead, you narrow the field first.

Sources of candidates:

  • In-network: Tweets from accounts you follow (highest recall, moderate relevance).
  • Out-of-network: Tweets liked or retweeted by people you follow, or trending in your interest graph.
  • Topic-based: Tweets matching topics you have engaged with recently.

How it works:

A lightweight retrieval model (often embedding-based) generates a candidate set of roughly 1,000–5,000 tweets per user. This runs on a schedule or is triggered when the user opens the app.
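The retrieval step described above can be sketched as a nearest-neighbour search over embeddings. This is a minimal illustration, not Twitter's actual code: the function name, the embedding dimension, and the use of plain NumPy (rather than an approximate-nearest-neighbour index like those used in production) are all assumptions for clarity.

```python
import numpy as np

def retrieve_candidates(user_embedding, tweet_embeddings, k=3):
    """Return indices of the k tweets whose embeddings are most
    similar (by cosine similarity) to the user's interest embedding."""
    # Normalise so a plain dot product equals cosine similarity.
    user = user_embedding / np.linalg.norm(user_embedding)
    tweets = tweet_embeddings / np.linalg.norm(
        tweet_embeddings, axis=1, keepdims=True
    )
    scores = tweets @ user
    # argpartition finds the top-k without fully sorting all scores,
    # which matters when the pool has millions of tweets.
    top_k = np.argpartition(-scores, k - 1)[:k]
    return top_k[np.argsort(-scores[top_k])]

# Toy example: 5 candidate tweets in a 4-dimensional embedding space.
rng = np.random.default_rng(0)
user = rng.normal(size=4)
tweets = rng.normal(size=(5, 4))
print(retrieve_candidates(user, tweets, k=3))
```

In production the brute-force dot product is replaced by an approximate index, but the contract is the same: a cheap model narrows millions of tweets down to a few thousand candidates.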

🧠 Quick Check

Why does Twitter use candidate generation instead of scoring every tweet for every user?

Stage 2: The Ranking Model

Once you have a few thousand candidates, a heavy ranking model scores each one. This is where the real intelligence lives.

Features the model uses:

User features:

  • Interests inferred from past engagement (likes, retweets, replies, dwell time).
  • Social graph: who you interact with most.
  • Demographics and language preferences.

Tweet features:

  • Content: text embeddings, media type, hashtags.
  • Author: follower count, engagement rate, trust score.
  • Freshness: age of the tweet in minutes.

Cross features:

  • User-author affinity: how often you engage with this author.
  • User-topic affinity: does the tweet topic match your interest profile?
  • Social proof: have people similar to you engaged with this tweet?

Model architecture:

A deep neural network (often a multi-task model) predicts multiple engagement types simultaneously:

  • Probability of like.
  • Probability of retweet.
  • Probability of reply.
  • Probability of extended dwell time (reading for more than two seconds).

The final score is a weighted combination of these predictions, tuned to optimise a platform-level objective like "healthy engagement".
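The weighted combination can be made concrete with a small sketch. The weights below are purely illustrative, not Twitter's real values; the point is that weighting replies and dwell time above likes is one way to encode a "healthy engagement" objective.

```python
def feed_score(p_like, p_retweet, p_reply, p_dwell,
               weights=(0.5, 1.0, 2.0, 1.5)):
    """Combine the multi-task model's engagement probabilities into a
    single ranking score. Illustrative weights favour replies and dwell
    time over likes, nudging the feed away from cheap-click content."""
    w_like, w_rt, w_reply, w_dwell = weights
    return (w_like * p_like + w_rt * p_retweet
            + w_reply * p_reply + w_dwell * p_dwell)

# A tweet likely to spark conversation outranks one that only draws likes.
chatty = feed_score(p_like=0.10, p_retweet=0.02, p_reply=0.20, p_dwell=0.30)
clicky = feed_score(p_like=0.40, p_retweet=0.05, p_reply=0.01, p_dwell=0.05)
print(chatty > clicky)  # True
```

Tuning these weights is itself an optimisation problem, typically driven by large-scale A/B tests against platform-level metrics.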

🤔
Think about it:

If you optimise purely for clicks and likes, you risk promoting sensational or outrage-driven content. How would you design the scoring formula to balance engagement with content quality?

Stage 3: Real-Time Features

Static features are not enough. The feed must respond to what is happening right now.

  • Recency decay: A tweet from 30 seconds ago gets a boost over one from three hours ago.
  • Trending topics: If a major event is unfolding, surface related tweets even if they are from outside the user's usual interests.
  • Session context: If a user has been scrolling for ten minutes, diversify the content to reduce fatigue.
  • Real-time engagement signals: A tweet going viral in the last five minutes gets a temporary boost.

These features require a streaming infrastructure - often Apache Kafka feeding into a feature store like Redis or Feast - that updates in near real-time.
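Recency decay, the first of the real-time signals above, is often modelled as an exponential half-life. The one-hour half-life below is an assumption chosen for illustration; a real system would tune it per surface and content type.

```python
def recency_multiplier(age_minutes, half_life_minutes=60.0):
    """Exponential decay: a tweet's score is halved every
    `half_life_minutes`. The half-life here is illustrative."""
    return 0.5 ** (age_minutes / half_life_minutes)

# A 30-second-old tweet keeps almost all of its score...
print(round(recency_multiplier(0.5), 3))   # ~0.994
# ...while a three-hour-old tweet keeps only an eighth of it.
print(round(recency_multiplier(180), 3))   # 0.125
```

Multiplying the ranking model's score by this factor is one simple way to combine a batch-computed score with a feature that must be fresh at request time.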

🤯

Twitter's engineering team reported that adding real-time features to their ranking model increased user engagement by 8 per cent - a massive improvement at their scale, translating to millions of additional daily interactions.

The "For You" Algorithm: Explore vs Exploit

The feed faces a classic exploration-exploitation dilemma:

  • Exploit: Show content similar to what the user has engaged with before. Safe, predictable, high short-term engagement.
  • Explore: Introduce new topics, new authors, and new formats. Riskier, but essential for long-term retention and content diversity.

Twitter's approach blends both:

  • Roughly 80 per cent of the feed is exploitation - content the model is confident the user will enjoy.
  • Roughly 20 per cent is exploration - new content injected to test the user's evolving interests.

The exploration ratio is itself a parameter tuned by A/B testing across millions of users.
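The 80/20 blend can be sketched as a simple slot-allocation step after ranking. This is a deliberately naive version: real systems interleave exploration more carefully (and score the exploratory items too), and the pool names and ratio here are assumptions.

```python
import random

def blend_feed(exploit_pool, explore_pool, feed_size=10,
               explore_ratio=0.2, seed=None):
    """Fill ~80% of the feed from high-confidence picks and ~20% from
    exploratory content, then shuffle so exploration is spread out
    rather than clumped at the bottom."""
    n_explore = int(feed_size * explore_ratio)
    n_exploit = feed_size - n_explore
    feed = exploit_pool[:n_exploit] + explore_pool[:n_explore]
    random.Random(seed).shuffle(feed)
    return feed

exploit = [f"safe_{i}" for i in range(20)]   # model is confident about these
explore = [f"new_{i}" for i in range(20)]    # new topics/authors to test
feed = blend_feed(exploit, explore, feed_size=10, explore_ratio=0.2, seed=42)
print(sum(item.startswith("new_") for item in feed))  # 2 of 10 slots explore
```

Because `explore_ratio` is just a parameter, it is easy to vary per user cohort in an A/B test, which is exactly how the ratio itself gets tuned.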

🧠 Quick Check

What is the main risk of a feed that only 'exploits' and never 'explores'?

Content Moderation and Safety

Before any tweet reaches the feed, it passes through a safety layer:

  1. Pre-ranking filter: Known spam, harassment, and policy-violating content is removed before ranking.
  2. Post-ranking filter: After ranking, a secondary check catches edge cases - for example, a tweet that is individually acceptable but harmful when clustered with similar content.
  3. User-level controls: Users can mute topics, block accounts, and mark content as "not interested", all of which feed back into their personalisation model.

The safety layer must be fast (adding no more than 10–20ms of latency) and conservative (when in doubt, filter it out and route to human review).

💡

Content moderation is not a separate system bolted on afterwards. It is deeply integrated into the ranking pipeline, influencing scores and filtering at every stage.
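The post-ranking "cluster" check from step 2 above can be illustrated with a per-topic cap: each tweet may be acceptable alone, but a feed saturated with one sensitive topic is not. The topic labels and the cap of two are hypothetical; a real filter would use learned classifiers rather than exact topic matches.

```python
from collections import Counter

def post_ranking_filter(ranked_tweets, max_per_topic=2):
    """Secondary safety pass over an already-ranked list: cap how many
    tweets on the same topic may appear, dropping the surplus."""
    seen = Counter()
    kept = []
    for tweet_id, topic in ranked_tweets:
        if seen[topic] >= max_per_topic:
            continue  # surplus items are dropped from this feed
        seen[topic] += 1
        kept.append((tweet_id, topic))
    return kept

ranked = [(1, "news"), (2, "news"), (3, "news"), (4, "sports"), (5, "news")]
print(post_ranking_filter(ranked))  # at most two "news" tweets survive
```

Because this pass only scans an already-short ranked list, it fits comfortably inside the 10-20ms latency budget mentioned above.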

Scaling: Fan-Out on Read vs Fan-Out on Write

This is the architectural decision that defines Twitter's infrastructure.

Fan-out on write:

When a user tweets, push that tweet into the feed cache of every follower. Fast reads, but expensive writes - especially for accounts with millions of followers.

Fan-out on read:

When a user opens the app, pull the latest tweets from all accounts they follow and rank them on the fly. Cheap writes, but expensive reads.

Twitter's hybrid approach:

  • For most users (fewer than 10,000 followers): fan-out on write. Pre-compute and cache the feed.
  • For celebrity accounts (millions of followers): fan-out on read. Compute the feed at request time to avoid flooding millions of caches on every tweet.

This hybrid strategy balances write amplification against read latency, handling both a normal user's tweet and a pop star's viral post efficiently.
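The write-path decision can be sketched in a few lines. The 10,000-follower threshold comes from the description above; the in-memory dict standing in for the feed cache, and the function names, are simplifications (production would use a distributed cache such as Redis).

```python
CELEBRITY_THRESHOLD = 10_000  # above this, defer work to read time

feed_cache = {}  # follower_id -> list of cached tweet_ids

def on_tweet(tweet_id, follower_ids):
    """Write path of the hybrid strategy: push to every follower's
    cached feed, unless the author is large enough that the write
    amplification would be prohibitive."""
    if len(follower_ids) >= CELEBRITY_THRESHOLD:
        return "fan-out-on-read"   # no cache writes; merged at read time
    for follower in follower_ids:
        feed_cache.setdefault(follower, []).append(tweet_id)
    return "fan-out-on-write"

print(on_tweet("t1", follower_ids=[101, 102, 103]))    # normal account
print(on_tweet("t2", follower_ids=range(50_000_000)))  # celebrity account
```

The read path is the mirror image: pull the pre-computed cache, then merge in recent tweets from any celebrity accounts the user follows before ranking.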

🤯

When a celebrity with 50 million followers tweets, fan-out on write would require 50 million cache insertions - taking minutes and consuming enormous bandwidth. The hybrid approach handles this in milliseconds by deferring the work to read time.

Putting It All Together

The full pipeline for a single feed request:

  1. User opens app → request hits the feed service.
  2. Candidate generation retrieves ~3,000 tweets from in-network and out-of-network sources.
  3. Feature store provides real-time and batch features for each candidate.
  4. Ranking model scores all candidates in a single batched inference call.
  5. Safety filters remove policy-violating content.
  6. Exploration logic injects 20 per cent diverse content.
  7. Final ranked list is returned to the client in under 200ms.
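The seven steps above can be wired together in one request handler. Everything here is a skeleton under stated assumptions: the dependencies are injected as plain functions, the toy wiring below stands in for real services, and in production steps 2-5 run as batched, parallel calls rather than Python loops.

```python
def serve_feed(user, now, *, retrieve, get_features, rank, is_safe,
               explore_pool):
    """End-to-end feed request mirroring the seven pipeline steps.
    All dependencies are injected; names are illustrative."""
    candidates = retrieve(user)                                # step 2
    featured = [(t, get_features(user, t, now)) for t in candidates]  # step 3
    scored = rank(featured)                                    # step 4
    safe = [t for t, _score in scored if is_safe(t)]           # step 5
    n_explore = max(1, len(safe) // 5)                         # step 6: ~20%
    feed = safe[:-n_explore] + explore_pool[:n_explore]
    return feed[:50]                                           # step 7

# Toy wiring to show the flow end to end.
feed = serve_feed(
    user="u1", now=0,
    retrieve=lambda u: ["t1", "t2", "t3", "t4", "t5"],
    get_features=lambda u, t, now: {},
    rank=lambda fs: sorted(((t, len(t)) for t, _ in fs), key=lambda x: -x[1]),
    is_safe=lambda t: t != "t3",   # pretend "t3" violates policy
    explore_pool=["x1"],
)
print(feed)
```

Keeping each stage behind a narrow interface like this is also what makes the 200ms budget tractable: every step can be measured, cached, and scaled independently.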
🤔
Think about it:

You are asked in an interview: "If you could add one new signal to improve the feed, what would it be and why?" Consider signals that are not commonly discussed - perhaps something related to user wellbeing or content credibility.

🧠 Quick Check

In Twitter's hybrid fan-out strategy, how are tweets from celebrity accounts handled?
