AI 匠心 • Advanced • ⏱️ 23 min read

Design a Distributed Cache - Redis Architecture from Scratch

Caching is one of the most effective techniques for improving system performance. A well-designed distributed cache can reduce database load by 90% or more and cut response times from hundreds of milliseconds to under a millisecond.

Step 1 - Requirements Gathering

Functional Requirements

  • Store key-value pairs with optional TTL (time-to-live)
  • Support common operations: GET, SET, DELETE, and bulk operations
  • Distribute data across multiple nodes for horizontal scaling

Non-Functional Requirements

  • Low latency - sub-millisecond reads for cache hits
  • High throughput - 100K+ operations per second per node
  • High availability - cache failures shouldn't bring down the system
  • Scalability - add nodes without downtime

When to Cache

  • Read-heavy workloads - when reads vastly outnumber writes
  • Expensive computations - ML model inference results, complex aggregations
  • Frequently accessed data - user sessions, product catalogues, configuration

Step 2 - Caching Strategies

Cache-Aside (Lazy Loading)

The most common pattern. Application checks the cache first; on a miss, it reads from the database, then populates the cache:

GET user:123 → Cache HIT → return cached data
GET user:456 → Cache MISS → query DB → SET cache → return data

Pros: only caches what's actually requested. Cons: cache misses incur latency; data can become stale.
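The cache-aside flow above can be sketched in Python. `CacheAside` here is a minimal in-process stand-in for a real cache node such as Redis; the class name, the dict-backed store, and `get_user` are illustrative, not a real client API:

```python
import time

class CacheAside:
    """Minimal in-process stand-in for a cache node such as Redis."""

    def __init__(self):
        self._store = {}                      # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._store[key]              # lazy expiry on read
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl is not None else None
        self._store[key] = (value, expires_at)

def get_user(cache, db, user_id, ttl=300):
    """Cache-aside read: check cache, fall back to the DB, then populate."""
    key = f"user:{user_id}"
    user = cache.get(key)                     # 1. check the cache first
    if user is None:                          # 2. miss: pay the DB round trip
        user = db[user_id]                    #    (db is a dict standing in for SQL)
        cache.set(key, user, ttl=ttl)         # 3. populate for later readers
    return user
```

Note the staleness trade-off: after the DB row changes, readers keep seeing the cached value until the TTL runs out.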

Read-Through

The cache itself handles database reads on a miss. The application only talks to the cache layer, which abstracts the data source.

Write-Through

Every write goes to both the cache and the database synchronously. Ensures cache is always consistent but adds write latency.

Write-Behind (Write-Back)

Writes go to the cache immediately; the cache asynchronously flushes to the database in batches. Lower write latency but risks data loss if the cache node crashes before flushing.

| Strategy | Consistency | Read Latency | Write Latency | Use Case |
|----------|-------------|--------------|---------------|----------|
| Cache-Aside | Eventual | Miss penalty | N/A | General purpose |
| Read-Through | Eventual | Miss penalty | N/A | Simplified reads |
| Write-Through | Strong | Fast | Slower | Consistency-critical |
| Write-Behind | Eventual | Fast | Fast | Write-heavy workloads |

🧠 Quiz

An e-commerce site needs to cache product prices that change infrequently but must be accurate within 5 minutes. Which strategy is most appropriate?

Step 3 - Eviction Policies

When the cache is full, which entries do we remove?

  • LRU (Least Recently Used) - evicts the entry that hasn't been accessed for the longest time. Best general-purpose policy
  • LFU (Least Frequently Used) - evicts the entry accessed least often. Better for skewed access patterns
  • TTL-based - entries expire after a set time regardless of access. Essential for data freshness
  • Random - surprisingly effective and very simple to implement
🤯
Redis uses an approximated LRU algorithm that samples a small number of keys (default: 5) and evicts the least recently used among them. This is nearly as accurate as true LRU but requires zero additional memory overhead for tracking access order.
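A true (non-approximated) LRU cache is straightforward to sketch with Python's OrderedDict; the `LRUCache` class and its capacity handling are illustrative:

```python
from collections import OrderedDict

class LRUCache:
    """True LRU eviction: the least recently used entry is dropped first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()            # oldest entry at the front

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)           # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)    # evict the least recently used
```

Unlike Redis's sampled approximation, this tracks exact access order, which is why it pays the extra memory cost of the ordered structure.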

Step 4 - Consistent Hashing for Distribution

The Problem with Simple Hashing

With hash(key) % N nodes, adding or removing a node remaps nearly all keys, causing a massive cache miss storm.

Consistent Hashing with Virtual Nodes

Consistent hashing maps both keys and nodes onto a ring. Each key is assigned to the next node clockwise on the ring:

Hash Ring (0 to 2^32):
     Node A (3 virtual nodes: A1, A2, A3)
     Node B (3 virtual nodes: B1, B2, B3)
     Node C (3 virtual nodes: C1, C2, C3)

Key "user:123" → hash → lands between B2 and A1 → assigned to Node A

Virtual nodes solve the uneven distribution problem. Each physical node gets multiple positions on the ring, ensuring balanced load even with few physical nodes.

When a node is added or removed, only keys adjacent to it on the ring are remapped - approximately K/N keys (where K is total keys and N is node count), not all of them.
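The ring can be sketched with hashlib and bisect. The `ConsistentHashRing` class, the MD5 hash choice, and the default of 3 virtual nodes per physical node are illustrative assumptions, not a production design:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; a key goes to the next node clockwise."""

    def __init__(self, nodes, vnodes=3):
        self._ring = []                       # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                pos = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (pos, node))

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        pos = self._hash(key)
        # First ring position at or after the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (pos, "")) % len(self._ring)
        return self._ring[idx][1]
```

Because virtual-node positions depend only on node names, removing one node leaves every key on the surviving nodes untouched; only keys whose clockwise successor was on the removed node move.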

[Figure: consistent hashing ring showing physical nodes mapped to virtual nodes, with keys assigned to the nearest clockwise node. Adding or removing a node only remaps a small fraction of keys, preventing cache stampedes.]
🧠 Quiz

A cache cluster has 4 nodes. Node 3 fails. With simple modulo hashing (key % 4 → key % 3), what percentage of keys get remapped?

Step 5 - Cache Stampede Prevention

A cache stampede occurs when a popular cached entry expires and hundreds of concurrent requests simultaneously miss the cache and hit the database.

Solutions

Mutex/Locking: When a cache miss occurs, acquire a distributed lock. Only one request fetches from the database; others wait for the cache to be repopulated.
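A minimal single-process sketch of this pattern, using threading.Lock as a stand-in for a distributed lock (a real deployment might use a Redis SET NX with a timeout); `get_with_lock` and the per-key lock table are illustrative names:

```python
import threading

_key_locks: dict = {}        # per-key locks; a real system would use a distributed lock
_guard = threading.Lock()    # protects the lock table itself

def get_with_lock(cache: dict, key: str, loader):
    """Cache-aside read where only one caller repopulates a missing key."""
    value = cache.get(key)
    if value is not None:
        return value                          # fast path: cache hit
    with _guard:                              # get-or-create this key's lock
        lock = _key_locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)                # re-check: a waiter may find it filled
        if value is None:
            value = loader()                  # exactly one caller hits the database
            cache[key] = value
    return value
```

The double-check after acquiring the lock is the essential step: waiters that queued behind the first loader find the cache already repopulated and never touch the database.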

Probabilistic Early Expiry: Each request has a small random chance of refreshing the cache before the TTL actually expires. This spreads refresh load over time rather than creating a thundering herd at expiry.
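One well-known formulation of this idea (sometimes called "XFetch") makes an early refresh exponentially more likely as expiry approaches. In this sketch, `delta` approximates how long a recompute takes and `beta` tunes eagerness; both names are from that formulation, not from any specific library:

```python
import math
import random
import time

def should_refresh_early(expires_at: float, delta: float, beta: float = 1.0) -> bool:
    """Return True if this request should refresh the entry before its TTL expires.

    The -log(random) term is small for most requests but occasionally large,
    so refreshes are spread over the window before expiry instead of all
    landing at the expiry instant.
    """
    return time.time() - delta * beta * math.log(random.random()) >= expires_at
```

Callers that get True recompute the value and re-set the entry; everyone else keeps serving the still-valid cached copy.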

Background Refresh: A background process refreshes popular cache entries before they expire. The cache is never empty for hot keys.

🤔
Think about it: Consider a flash sale where a product page receives 100,000 requests per second. The product details cache entry expires. Without stampede prevention, all 100K requests hit the database simultaneously. Which prevention strategy would you choose, and why?

Step 6 - Redis Cluster Architecture

Redis Cluster is the most widely deployed distributed cache:

  • 16,384 hash slots distributed across master nodes
  • Each key maps to a slot via CRC16(key) % 16384
  • Replication: each master has one or more replicas for failover
  • Automatic failover: if a master fails, its replica is promoted within seconds
Slots 0-5460:     Master A ←→ Replica A'
Slots 5461-10922: Master B ←→ Replica B'
Slots 10923-16383: Master C ←→ Replica C'
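The slot mapping can be sketched directly. Redis Cluster specifies the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0); this standalone sketch omits hash-tag ({...}) handling, which real Redis applies before hashing:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM, the variant used by Redis Cluster for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                      # fold the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster hash slots.

    Real Redis also honors hash tags: if the key contains {...}, only that
    substring is hashed. Omitted here for brevity.
    """
    return crc16_xmodem(key.encode()) % 16384
```

The standard check value for CRC-16/XMODEM is 0x31C3 for the input "123456789", which makes the implementation easy to verify.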

Monitoring Key Metrics

  • Hit ratio - target 95%+; below 80% indicates a caching problem
  • Memory usage - stay below 80% to avoid evictions
  • Eviction rate - high evictions mean the cache is undersized
  • Latency percentiles - p99 should be under 1ms for cache hits

Step 7 - Real-World Caching Patterns

| Pattern | Implementation | TTL |
|---------|----------------|-----|
| Session store | SET session:{id} {data} | 30 minutes |
| API response cache | SET api:/users/123 {json} | 5 minutes |
| Database query cache | SET query:{hash} {result} | 1 minute |
| Rate limiter counter | INCR ratelimit:{user}:{window} | Window duration |
| Feature flags | HSET features {flag} {value} | 10 seconds |
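The rate-limiter row can be sketched as a fixed-window counter. Here a plain dict stands in for Redis INCR plus an EXPIRE equal to the window; `allow_request` and its parameters are illustrative:

```python
import time

def allow_request(cache: dict, user: str, limit: int = 100, window: int = 60) -> bool:
    """Fixed-window rate limiter: at most `limit` requests per `window` seconds."""
    bucket = int(time.time()) // window       # current window number
    key = f"ratelimit:{user}:{bucket}"
    cache[key] = cache.get(key, 0) + 1        # Redis equivalent: INCR + EXPIRE window
    return cache[key] <= limit
```

Fixed windows are simple but allow up to 2x the limit across a window boundary; sliding-window or token-bucket variants trade extra bookkeeping for smoother enforcement.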

Cache Invalidation

The hardest problem in computer science (alongside naming things):

  • TTL-based - simple, tolerates staleness for the TTL duration
  • Event-driven - publish invalidation events when data changes (via Kafka or Redis Pub/Sub)
  • Version-based - include a version in the cache key; increment version on writes to effectively invalidate old entries
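The version-based approach can be sketched with a plain dict standing in for the cache; `versioned_key` and `invalidate` are illustrative names:

```python
def versioned_key(cache: dict, entity: str, entity_id: int) -> str:
    """Build a cache key that embeds the entity's current version.

    Bumping the version on write makes all old keys unreachable; the stale
    entries are never read again and simply age out via their TTL.
    """
    version = cache.get(f"version:{entity}:{entity_id}", 0)
    return f"{entity}:{entity_id}:v{version}"

def invalidate(cache: dict, entity: str, entity_id: int) -> None:
    """Invalidate by incrementing the version (INCR in Redis terms)."""
    vkey = f"version:{entity}:{entity_id}"
    cache[vkey] = cache.get(vkey, 0) + 1
```

The appeal of this scheme is that invalidation is a single cheap write: no scan for matching keys, no explicit DELETE per entry.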
🧠 Quiz

Which cache invalidation approach provides the strongest consistency guarantee?

🤔
Think about it: Phil Karlton famously said: "There are only two hard things in Computer Science - cache invalidation and naming things." Why is cache invalidation so difficult in distributed systems? Think about what happens when a write to the database succeeds but the cache invalidation message is lost.

📚 Further Reading

  • Consistent Hashing - ByteByteGo - visual explanation of consistent hashing and virtual nodes
  • Redis Cluster Specification - official Redis Cluster architecture documentation
  • Caching Strategies - AWS - practical caching patterns for cloud applications
Lesson 10 of 10