AI Masterpiece • Advanced • ⏱️ 30 min read

Design Twitter's AI Feed — End-to-End Case Study


This is a capstone case study — you'll combine everything from AI Sketch through AI Polish to design a complete AI-powered social media feed.


The Problem Statement

Design a system that serves personalised tweets to 500M+ users in real-time, ranking content by relevance, recency, and engagement potential.

What makes this hard:

  • Scale: 500M daily active users, 500M+ tweets/day
  • Latency: Feed must load in under 200ms
  • Freshness: New tweets appear in seconds
  • Personalisation: Each user sees a unique feed
  • Cold start: New users with no history
💡

This is the exact type of question asked at senior+ interviews at Meta, Google, Twitter/X, and LinkedIn. It tests DSA, system design, AND ML knowledge simultaneously.


Phase 1: Data Modelling (DSA Foundations)

Core Entities

User { id, name, followers[], following[], interests[] }
Tweet { id, authorId, content, media[], timestamp, metrics }
Engagement { userId, tweetId, type: view|like|retweet|reply, timestamp }

Key Data Structures

| Structure | Purpose | Why This One? |
|-----------|---------|---------------|
| Hash Map | User → profile lookup | O(1) access by userId |
| Sorted Set | Timeline ranking | O(log n) insert + range queries |
| Bloom Filter | "Already seen" dedup | Space-efficient set membership |
| Inverted Index | Content search | Maps tokens → tweet IDs |
| Graph (adjacency list) | Social connections | Follow/follower relationships |
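To make the sorted-set row concrete, here is a toy per-user timeline kept ordered by timestamp with binary-search insertion (the `Timeline` class and its methods are illustrative stand-ins, not Twitter's actual code — a production system would use a Redis sorted set):

```python
import bisect

class Timeline:
    """Per-user timeline ordered by timestamp (ascending)."""

    def __init__(self):
        self._entries = []  # list of (timestamp, tweet_id), kept sorted

    def add(self, timestamp, tweet_id):
        # O(log n) to find the slot (plus O(n) list shift here;
        # Redis sorted sets do the whole insert in O(log n) via a skip list)
        bisect.insort(self._entries, (timestamp, tweet_id))

    def latest(self, n):
        """Return the n most recent tweet IDs, newest first."""
        return [tid for _, tid in self._entries[-n:]][::-1]
```

The key property is the same one Redis exploits: because entries stay sorted by score (timestamp), "give me the newest N" is a cheap range read rather than a full sort.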

🤔
Think about it:

Why use a Bloom Filter instead of a regular HashSet for "already seen" detection? With 500M users each seeing 100+ tweets/day, a HashSet would consume terabytes. A Bloom Filter uses ~1% of that memory with an acceptable false positive rate (~1%).
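A minimal Bloom filter sketch, to show where the space savings come from: k hash positions per item over a shared bit array, so each "already seen" record costs a few bits instead of a full stored key. Sizes and hash scheme here are illustrative:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False → definitely never added; True → probably added
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

The trade-off is exactly the one described above: a `might_contain` hit can be a false positive (a tweet wrongly treated as seen), but a miss is always correct, which is the safe direction for dedup.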


Phase 2: System Architecture

High-Level Components

┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│  Tweet       │────▶│  Fan-out     │────▶│  Timeline Cache  │
│  Ingestion   │     │  Service     │     │  (Redis Sorted   │
└──────────────┘     └──────────────┘     │   Sets)          │
                                          └────────┬─────────┘
                                                   │
┌──────────────┐     ┌──────────────┐              ▼
│  User        │────▶│  Ranking     │     ┌──────────────────┐
│  Request     │     │  Service     │────▶│  Feed Response   │
└──────────────┘     └──────────────┘     └──────────────────┘
                           │
                    ┌──────┴──────┐
                    │  ML Model   │
                    │  Service    │
                    └─────────────┘

Fan-Out Strategy: Hybrid Approach

Fan-out on write for users with fewer than 10K followers:

  • When a user tweets, push to all follower timelines
  • Low-latency reads (timeline is pre-computed)
  • O(followers) writes per tweet

Fan-out on read for celebrities (> 10K followers):

  • Don't push — merge at read time
  • Avoids writing to millions of timelines
  • Slightly higher read latency

function handleNewTweet(tweet, author):
    if author.followerCount < CELEBRITY_THRESHOLD:
        // Fan-out on write
        for follower in author.followers:
            timelineCache.add(follower.id, tweet, score=tweet.timestamp)
    else:
        // Fan-out on read — store in celebrity tweets pool
        celebrityTweets.add(author.id, tweet)
⚠️

The hybrid approach is what Twitter actually uses. Pure fan-out-on-write doesn't scale for accounts with millions of followers (one tweet = millions of cache writes).
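The hybrid fan-out pseudocode can be sketched as runnable Python, with plain dictionaries standing in for the Redis timeline cache and the celebrity pool (the data shapes here are illustrative):

```python
CELEBRITY_THRESHOLD = 10_000

timeline_cache = {}    # follower_id -> list of (timestamp, tweet_id)
celebrity_tweets = {}  # author_id   -> list of (timestamp, tweet_id)

def handle_new_tweet(tweet, author):
    """Hybrid fan-out: push to follower timelines, unless the author
    is a celebrity, in which case tweets are merged in at read time."""
    entry = (tweet["timestamp"], tweet["id"])
    if author["follower_count"] < CELEBRITY_THRESHOLD:
        # Fan-out on write: one cache write per follower
        for follower_id in author["followers"]:
            timeline_cache.setdefault(follower_id, []).append(entry)
    else:
        # Fan-out on read: park in the celebrity pool, no per-follower writes
        celebrity_tweets.setdefault(author["id"], []).append(entry)
```

Note that the celebrity branch does a single write regardless of audience size, which is exactly what makes the hybrid scheme scale past the millions-of-followers case.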


Phase 3: ML Ranking Model

Feature Engineering

The ranking model predicts: P(user engages with tweet)

| Feature Category | Examples |
|-----------------|----------|
| User features | interests, past engagement patterns, active hours |
| Tweet features | age, media type, length, hashtags, author popularity |
| Cross features | user-author interaction history, topic overlap |
| Context features | time of day, device type, connection speed |
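One way to turn the table above into model input is a flat numeric feature vector. The field names below are hypothetical (not Twitter's real schema); the point is that each row of the table contributes one or more numbers:

```python
import math

def build_features(user, tweet, now):
    """Flatten user / tweet / cross / context signals into one vector."""
    age_hours = (now - tweet["timestamp"]) / 3600
    return [
        # Cross feature: topic overlap between user interests and hashtags
        len(set(user["interests"]) & set(tweet["hashtags"])),
        # Cross feature: how often this user engaged with this author before
        user["past_engagements"].get(tweet["author_id"], 0),
        # Tweet feature: author popularity, log-compressed
        math.log1p(tweet["author_followers"]),
        # Tweet feature: media presence as a binary flag
        1.0 if tweet["has_media"] else 0.0,
        # Tweet feature: freshness via exponential decay (24h time constant)
        math.exp(-age_hours / 24),
    ]
```

Log-compressing follower counts and decaying age are common normalisation tricks so that one wide-ranging raw signal doesn't dominate the model.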

Two-Stage Ranking

Stage 1 — Candidate Generation (fast, broad):

  • Retrieve ~1000 candidates from timeline cache + celebrity pool
  • Use lightweight model (logistic regression) for initial scoring
  • Latency budget: 50ms

Stage 2 — Fine Ranking (slow, precise):

  • Re-rank top 200 candidates
  • Deep neural network with attention mechanism
  • Consider cross-features and sequence patterns
  • Latency budget: 100ms

function rankFeed(userId, candidates):
    // Stage 1: Coarse ranking
    scored = candidates.map(tweet =>
        ({tweet, score: lightweightModel.predict(userId, tweet)})
    )
    top200 = scored.sortBy(s => -s.score).slice(0, 200)

    // Stage 2: Fine ranking
    reranked = deepModel.batchPredict(userId, top200)

    // Stage 3: Business rules (diversity, freshness boost)
    return applyBusinessRules(reranked)
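The two-stage structure in the pseudocode above can be captured as a small runnable sketch, with the models passed in as scoring functions so the cheap-prune / expensive-rerank shape is visible without any real ML (all names and cutoffs here are illustrative):

```python
def rank_feed(user_id, candidates, cheap_score, deep_score,
              coarse_keep=200, feed_size=50):
    """Two-stage ranking: a cheap model prunes the candidate pool,
    an expensive model re-ranks only the survivors."""
    # Stage 1: coarse scoring of every candidate (fast, broad)
    coarse = sorted(candidates, key=lambda t: -cheap_score(user_id, t))
    shortlist = coarse[:coarse_keep]
    # Stage 2: fine re-ranking of the shortlist only (slow, precise)
    fine = sorted(shortlist, key=lambda t: -deep_score(user_id, t))
    return fine[:feed_size]
```

The cost argument is the whole design: the deep model's latency is paid on `coarse_keep` items instead of the full ~1000-candidate pool.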

Diversity & Freshness

The ranking model alone would show only viral content. Apply post-ranking rules:

  • Topic diversity: No more than 3 consecutive tweets on the same topic
  • Author diversity: Spread out tweets from the same author
  • Freshness boost: Logarithmic time decay — newer tweets get bonus score
  • Exploration: 5% of feed slots show content from outside the user's bubble
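A minimal sketch of the topic-diversity rule (no more than 3 consecutive same-topic tweets), assuming each ranked tweet carries a `topic` label. Tweets that would extend a run past the limit are deferred rather than dropped:

```python
def enforce_topic_diversity(ranked, max_run=3):
    """Post-ranking pass: break up runs of same-topic tweets.
    Deferred tweets are appended at the end (a fuller version
    would retry them at later slots instead)."""
    feed, deferred = [], []
    run_topic, run_len = None, 0
    for tweet in ranked:
        if tweet["topic"] == run_topic and run_len >= max_run:
            deferred.append(tweet)  # would extend the run — push it down
            continue
        if tweet["topic"] == run_topic:
            run_len += 1
        else:
            run_topic, run_len = tweet["topic"], 1
        feed.append(tweet)
    return feed + deferred
```

Author diversity works the same way with `author_id` in place of `topic`; both run as cheap O(n) passes after the ML scores are fixed.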

Phase 4: Handling Edge Cases

Cold Start Problem

New users have no engagement history. Solutions:

  1. Onboarding interests: Ask users to pick 3-5 topics during signup
  2. Popular content fallback: Show trending tweets in selected topics
  3. Explore-exploit: Aggressively explore in first 48 hours, then exploit
  4. Demographic priors: Use location, language, device for initial signals
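The "explore aggressively, then exploit" idea in step 3 is often implemented as an ε-greedy policy whose exploration rate decays with account age. The constants and function names below are a hypothetical sketch, not a documented Twitter policy:

```python
def exploration_rate(account_age_hours, initial=0.5, floor=0.05, half_life=48):
    """Start exploring half the time; decay toward a 5% floor,
    halving the excess every `half_life` hours."""
    decay = 0.5 ** (account_age_hours / half_life)
    return floor + (initial - floor) * decay

def pick_tweet(candidates, trending, account_age_hours, rng):
    """ε-greedy slot fill: explore with probability ε, else exploit."""
    if rng.random() < exploration_rate(account_age_hours):
        return rng.choice(trending)  # explore: outside the user's bubble
    return candidates[0]             # exploit: top-ranked personalised pick
```

The 5% floor matches the exploration slot share mentioned in the ranking section, so a mature account still gets some out-of-bubble content.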

Real-Time Updates

// WebSocket connection for live updates
connection.onTweet(tweet =>
    if isHighPriority(tweet):    // Breaking news, close friend
        injectToFeed(tweet, position=TOP)
    else:
        count += 1
        showNewTweetsButton(count)
)

Abuse & Safety

  • Content moderation: ML classifier flags harmful content before ranking
  • Rate limiting: Token bucket per user (see AI Craft — Rate Limiter)
  • Spam detection: Graph-based analysis of follow patterns + content similarity
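The rate limiter referenced above is typically a token bucket: each user's bucket refills at a steady rate and each action spends one token, so short bursts are allowed but the sustained rate is capped. A minimal single-user sketch with a clock injected for testability (capacity and refill values are illustrative):

```python
import time

class TokenBucket:
    """Per-user token bucket: burst up to `capacity`, sustained
    rate capped at `refill_per_sec` actions per second."""

    def __init__(self, capacity=300, refill_per_sec=1.0, now=None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = now if now is not None else time.monotonic()

    def allow(self, now=None):
        now = now if now is not None else time.monotonic()
        # Refill lazily based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In production the same logic usually lives in Redis (one bucket per user key) so all API servers share a single view of each user's budget.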

Phase 5: Behavioural Discussion

In a real interview, expect follow-up questions:

🤔
Think about it:
  1. "How would you measure if the new ranking model is better?" → A/B testing with engagement metrics (CTR, time spent, DAU retention)
  2. "What happens when the ML model goes down?" → Graceful degradation to chronological timeline
  3. "How do you handle political content bias?" → Transparency reports, user controls, diverse committee review
  4. "Walk me through a production incident with this system" → Use the STAR framework from AI Polish

Summary: What You Demonstrated

| Skill Area | Applied Here |
|-----------|-------------|
| DSA (AI Sketch) | Hash maps, sorted sets, bloom filters, graphs |
| Patterns (AI Chisel) | Two pointers (sliding window for time ranges), BFS (social graph) |
| System Design (AI Craft) | Distributed architecture, caching, fan-out strategies |
| Behavioural (AI Polish) | Trade-off discussion, incident handling, ethical considerations |

This is what a senior engineer answer looks like — it connects algorithms to architecture to real-world constraints.


Practice Exercises

  1. Estimate storage: How much Redis memory for 500M users × 800 tweets per timeline?
  2. Design the A/B testing system for rolling out a new ranking model safely
  3. Write pseudocode for the diversity algorithm that prevents topic clustering
  4. Prepare a 5-minute STAR story about a time you debugged a caching issue at scale