AI Sketch • Intermediate ⏱️ 15 min read

Sorting and Searching

Finding Things Fast

When you search Google, results appear in under a second - ranked from most to least relevant. When Netflix recommends films, it sorts thousands of titles by how likely you are to enjoy them. Behind every fast lookup and every ranked list sits a sorting or searching algorithm.

Why Sorting Matters

Sorted data is powerful data. Once a list is in order, you can:

  • Search it efficiently using binary search (we'll get to this shortly).
  • Find duplicates - they'll be sitting right next to each other.
  • Identify the top-N results - just grab the first N items.
  • Merge datasets - combining two sorted lists is much faster than combining unsorted ones.

[Figure: an unsorted array being transformed into a sorted array, with a magnifying glass highlighting binary search. Caption: Sorting transforms chaotic data into something searchable and structured.]
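
To make the "duplicates sit next to each other" point concrete, here is a minimal Python sketch. The function name `find_duplicates` and the sample list are hypothetical, chosen only for illustration.

```python
def find_duplicates(items):
    """Return the values that appear more than once, using a sort.

    After sorting, equal values sit next to each other, so one pass
    comparing each item with its neighbour is enough.
    """
    ordered = sorted(items)          # O(n log n)
    duplicates = []
    for previous, current in zip(ordered, ordered[1:]):
        # Record each duplicated value once, even if it appears 3+ times.
        if previous == current and (not duplicates or duplicates[-1] != current):
            duplicates.append(current)
    return duplicates


print(find_duplicates([7, 3, 9, 3, 1, 7, 7]))  # prints [3, 7]
```

Without the sort, a naive approach would compare every item against every other item.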

Bubble Sort - Simple but Slow

Bubble sort repeatedly steps through the list, comparing adjacent items and swapping them if they're in the wrong order. Larger values "bubble" to the end.

[5, 3, 8, 1, 2]
 ↕
[3, 5, 8, 1, 2]  → swapped 5 and 3
[3, 5, 1, 8, 2]  → swapped 8 and 1
[3, 5, 1, 2, 8]  → swapped 8 and 2
... keep going until no swaps needed

Time complexity: O(n²) - for each of n items, you might compare against every other item. With 1,000 items, that's on the order of 1,000,000 comparisons (n(n-1)/2, about 500,000, in the worst case). With 1,000,000 items? On the order of a trillion comparisons. Not practical for AI workloads.
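
A minimal Python sketch of the procedure, assuming ascending order. The `bubble_sort` name and the comparison counter are additions for illustration; the counter is there only to make the quadratic growth visible.

```python
def bubble_sort(items):
    """Sort a list in ascending order and count the comparisons made."""
    data = list(items)               # work on a copy, leave the input untouched
    comparisons = 0
    n = len(data)
    for end in range(n - 1, 0, -1):  # the unsorted region shrinks each pass
        swapped = False
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:              # no swaps means the list is already sorted
            break
    return data, comparisons


print(bubble_sort([5, 3, 8, 1, 2]))  # prints ([1, 2, 3, 5, 8], 10)
```

Five items need 10 comparisons here, which is 5 × 4 / 2. The count grows with the square of the list length.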

🤔
Think about it:

If bubble sort takes roughly n² comparisons, how much slower would it be to sort a million items compared to a thousand? Think about the ratio: (1,000,000)² vs (1,000)². That's a million times slower - just for having a thousand times more data.

Merge Sort - Divide and Conquer

Merge sort takes a cleverer approach: split the list in half, sort each half, then merge the two sorted halves together.

[5, 3, 8, 1, 2, 7, 4, 6]
         split
[5, 3, 8, 1]   [2, 7, 4, 6]
    split            split
[5, 3] [8, 1]  [2, 7] [4, 6]
  ↓       ↓       ↓       ↓
[3, 5] [1, 8]  [2, 7] [4, 6]
    merge            merge
[1, 3, 5, 8]   [2, 4, 6, 7]
         merge
[1, 2, 3, 4, 5, 6, 7, 8]

Time complexity: O(n log n) - dramatically faster. For a million items, that's about 20 million comparisons instead of a trillion. This is the kind of algorithm that makes real AI systems possible.
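
The split-and-merge idea can be sketched in Python as follows. This is one straightforward way to write it, not a tuned implementation; the helper names `merge` and `merge_sort` are chosen for this example.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" keeps equal items in their original order (a stable sort).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; whatever remains on the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new sorted list using divide and conquer."""
    if len(items) <= 1:              # a list of 0 or 1 items is already sorted
        return list(items)
    middle = len(items) // 2
    return merge(merge_sort(items[:middle]), merge_sort(items[middle:]))


print(merge_sort([5, 3, 8, 1, 2, 7, 4, 6]))  # prints [1, 2, 3, 4, 5, 6, 7, 8]
```

Each level of splitting does about n work in total when merging, and there are about log₂ n levels, which is where O(n log n) comes from.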

🤯

Python's built-in sort uses Timsort - a hybrid algorithm that combines merge sort with insertion sort. It was invented by Tim Peters in 2002 and is now used in Python, Java (for sorting objects), and Android. It's specifically designed to perform well on real-world data that's often partially sorted already.
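
In everyday Python you rarely write a sort yourself; you call the built-in one. A short sketch, where the `titles` list and its scores are made-up sample data:

```python
titles = [("Inception", 0.91), ("Up", 0.78), ("Alien", 0.91), ("Heat", 0.64)]

# sorted() returns a new list; key picks the value to compare,
# reverse=True puts the highest score first.
ranked = sorted(titles, key=lambda pair: pair[1], reverse=True)
print(ranked)
# prints [('Inception', 0.91), ('Alien', 0.91), ('Up', 0.78), ('Heat', 0.64)]

# list.sort() does the same job in place and returns None.
titles.sort(key=lambda pair: pair[0])
print(titles)
# prints [('Alien', 0.91), ('Heat', 0.64), ('Inception', 0.91), ('Up', 0.78)]
```

The sort is stable: "Inception" and "Alien" tie on score, so they keep their original relative order in `ranked`.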

Bubble Sort vs Merge Sort at Scale

| Items | Bubble Sort (O(n²)) | Merge Sort (O(n log n)) |
|-------|---------------------|-------------------------|
| 100 | 10,000 ops | ~700 ops |
| 10,000 | 100,000,000 ops | ~130,000 ops |
| 1,000,000 | 1,000,000,000,000 ops | ~20,000,000 ops |

The difference isn't academic - it's the difference between "finishes in a second" and "finishes next week."
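
The figures in the table can be reproduced with a few lines of Python. This sketch assumes "ops" means n² for bubble sort and n × log₂ n for merge sort, ignoring constant factors; `operation_estimates` is a hypothetical helper name.

```python
import math


def operation_estimates(n):
    """Return rough operation counts (n^2, n * log2(n)) for n items."""
    return n ** 2, round(n * math.log2(n))


for n in (100, 10_000, 1_000_000):
    quadratic, linearithmic = operation_estimates(n)
    print(f"{n:>9,} items: n^2 = {quadratic:,}   n*log2(n) = {linearithmic:,}")
```

These are estimates of growth, not measured running times; real timings also depend on hardware and implementation.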

🧠 Quiz

Why is merge sort preferred over bubble sort for large datasets in AI applications?

Binary Search - The Phone Book Trick

Imagine looking up "Smith" in a phone book. You wouldn't start at page one and read every name. You'd open the book roughly in the middle, see where you are, and jump to the correct half. Then repeat.

That's binary search - and it only works on sorted data.

sorted_list = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
target = 23

Step 1: Middle = 16 → 23 > 16, search right half
Step 2: Middle = 56 → 23 < 56, search left half
Step 3: Middle = 23 → Found it!

Time complexity: O(log n). In a sorted list of one million items, binary search finds any item in at most 20 steps. A linear search would take up to one million steps.
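
A minimal Python sketch of binary search on the list above. The `binary_search` name and the step counter are additions for illustration. It uses floor division to pick the middle, so the values it inspects for a target of 23 are 16, then 56, then 23.

```python
def binary_search(sorted_list, target):
    """Return (index, steps) for target, or (-1, steps) if it is absent."""
    low, high = 0, len(sorted_list) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_list[middle] == target:
            return middle, steps
        if sorted_list[middle] < target:
            low = middle + 1         # target can only be in the right half
        else:
            high = middle - 1        # target can only be in the left half
    return -1, steps


sorted_list = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(sorted_list, 23))  # prints (5, 3)
```

In practice, Python's standard `bisect` module provides the same halving search ready-made.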

🧠 Quiz

A sorted database contains 1,000,000 records. How many comparisons does binary search need in the worst case?

How AI Uses Sorting and Searching

Ranking Search Results

When Google processes your query, it scores every relevant page and sorts them by relevance. The top 10 results appear on page one. Without efficient sorting, this would take minutes instead of milliseconds.

Recommendation Systems

Netflix calculates a "match score" for thousands of titles based on your viewing history, then sorts them to show you the best matches first. The sorting algorithm directly affects what you see on your home screen.

K-Nearest Neighbours

This classic AI algorithm finds the K most similar items to a given input. It calculates distances to every item, then partially sorts to find the K smallest distances. Efficient sorting makes this practical for millions of data points.
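
A small sketch of the neighbour-finding step, assuming points are coordinate tuples and similarity is Euclidean distance. The function name `k_nearest` and the sample points are hypothetical; a real system would typically use a library and often an index structure.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Return the k points closest to query by Euclidean distance.

    heapq.nsmallest selects the k smallest without fully sorting
    every distance.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


points = [(1, 1), (5, 5), (2, 1), (9, 0), (1, 3)]
print(k_nearest(points, query=(1, 2), k=3))  # prints [(1, 1), (1, 3), (2, 1)]
```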

💡

You don't always need to fully sort. If you only need the top 10 results from a million items, a partial sort or heap can find them in O(n log k) time - much faster than sorting everything.
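
One way to get the O(n log k) behaviour is a min-heap that never grows beyond k items. The sketch below is illustrative; `top_k` and the sample scores are made up for this example.

```python
import heapq


def top_k(scores, k):
    """Return the k largest scores, highest first, in O(n log k) time."""
    if k <= 0:
        return []
    heap = []                        # min-heap holding the best k seen so far
    for score in scores:
        if len(heap) < k:
            heapq.heappush(heap, score)
        elif score > heap[0]:        # beats the weakest of the current top k
            heapq.heapreplace(heap, score)
    return sorted(heap, reverse=True)


print(top_k([0.42, 0.91, 0.13, 0.77, 0.58, 0.99, 0.05], k=3))  # prints [0.99, 0.91, 0.77]
```

The standard library's `heapq.nlargest(k, scores)` gives the same result; the loop is spelled out here only to show why each item costs at most O(log k).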

Training Data Preparation

Before training, AI practitioners often sort data to create balanced batches - ensuring each training batch contains a mix of easy and hard examples, or a balanced distribution of categories.

When to Sort vs When to Use a Hash Map

This is a crucial design decision:

| Scenario | Best Choice | Why |
|----------|-------------|-----|
| Find one item by key | Hash map | O(1) lookup |
| Find the top-10 items | Sort | Need ordered results |
| Check if item exists | Hash map | O(1) vs O(log n) |
| Get items in order | Sort | Hash maps have no order |
| Range queries (items between A and B) | Sorted array + binary search | Hash maps can't do ranges |
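
To see the two choices side by side, here is a short Python sketch using a `dict` as the hash map and the standard `bisect` module for a range query on a sorted list. The user names, ages, and the `values_in_range` helper are hypothetical sample material.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: user id -> age.
ages_by_user = {"ana": 31, "ben": 24, "chi": 45, "dev": 38, "eli": 29}

# Hash map (dict): direct lookup and membership test by key, O(1) on average.
print(ages_by_user["chi"])      # prints 45
print("zoe" in ages_by_user)    # prints False


def values_in_range(sorted_values, low, high):
    """Return every value v with low <= v <= high from a sorted list."""
    start = bisect_left(sorted_values, low)    # first index with value >= low
    stop = bisect_right(sorted_values, high)   # first index with value > high
    return sorted_values[start:stop]


sorted_ages = sorted(ages_by_user.values())    # [24, 29, 31, 38, 45]
print(values_in_range(sorted_ages, 25, 40))    # prints [29, 31, 38]
```

The dictionary answers "what is chi's age?" directly, but it has no notion of "between 25 and 40"; the sorted list answers the range question with two binary searches.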

🤔
Think about it:

A music streaming service needs to show your "Top 50 most played songs." Would you sort your entire listening history, or maintain a data structure that always knows the top 50? What are the trade-offs of each approach?

🤯

Google is estimated to process over 8.5 billion searches per day. Each search involves sorting and ranking hundreds of potential results in milliseconds. At that scale, the efficiency of sorting algorithms directly affects how much electricity Google's data centres consume - better algorithms save real power.

🧠 Quiz

When would binary search NOT be appropriate?

Key Takeaways

  • Sorting transforms chaotic data into structured, searchable data - essential for ranking and recommendations.
  • O(n²) algorithms like bubble sort are educational but impractical at scale; O(n log n) algorithms like merge sort power real systems.
  • Binary search is extraordinarily efficient on sorted data - 20 steps to search a million items.
  • Choose between sorting and hash maps based on whether you need ordered results or instant lookups.
  • Every search result, recommendation, and ranked list you see online relies on these fundamental algorithms.