🔍

AI Sketch • Intermediate ⏱️ 15 min read

Sorting and Searching

Finding Things Fast

When you search Google, results appear in under a second - ranked from most to least relevant. When Netflix recommends films, it sorts thousands of titles by how likely you are to enjoy them. Behind every fast lookup and every ranked list sits a sorting or searching algorithm.

Why Sorting Matters

Sorted data is powerful data. Once a list is in order, you can:

Search it efficiently using binary search (we'll get to this shortly).
Find duplicates - they'll be sitting right next to each other.
Identify the top-N results - just grab the first N items.
Merge datasets - combining two sorted lists is much faster than combining unsorted ones.

Sorting transforms chaotic data into something searchable and structured.
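As a rough illustration of the points above, the following Python sketch uses a small made-up list (the values and variable names are hypothetical, not from any real dataset). It shows duplicates becoming adjacent, top-N selection by slicing, and merging two sorted lists with the standard library's `heapq.merge`.

```python
import heapq

# Hypothetical sample data, chosen only for illustration.
scores = [42, 7, 19, 7, 88, 42, 3]

ordered = sorted(scores)  # [3, 7, 7, 19, 42, 42, 88]

# Duplicates sit next to each other, so one pass over neighbouring pairs finds them.
duplicates = sorted({a for a, b in zip(ordered, ordered[1:]) if a == b})

# Top-N: sort in descending order and take the first N items.
top_3 = sorted(scores, reverse=True)[:3]

# Merging two already-sorted lists needs only a single pass over both.
merged = list(heapq.merge([1, 4, 9], [2, 3, 10]))

print(duplicates)  # [7, 42]
print(top_3)       # [88, 42, 42]
print(merged)      # [1, 2, 3, 4, 9, 10]
```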

Bubble Sort - Simple but Slow

Bubble sort repeatedly steps through the list, comparing adjacent items and swapping them if they're in the wrong order. Larger values "bubble" to the end.

[5, 3, 8, 1, 2]
 ↕
[3, 5, 8, 1, 2]  → swapped 5 and 3
[3, 5, 1, 8, 2]  → swapped 8 and 1
[3, 5, 1, 2, 8]  → swapped 8 and 2
... keep going until no swaps needed

Time complexity: O(n²) - in the worst case, each of the n items may be compared against most of the others. With 1,000 items, that's on the order of 1,000,000 comparisons. With 1,000,000 items? On the order of a trillion comparisons. Not practical for AI workloads.
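The procedure described above can be sketched in Python roughly as follows. This is a minimal teaching version, not a production sort; the function name `bubble_sort` and the early-exit flag are choices made for this example.

```python
def bubble_sort(items):
    """Return a sorted copy of items using bubble sort."""
    result = list(items)  # work on a copy so the input is left untouched
    n = len(result)
    for end in range(n - 1, 0, -1):
        swapped = False
        # Compare each adjacent pair up to the last unsorted position.
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:  # a full pass with no swaps means the list is in order
            break
    return result


print(bubble_sort([5, 3, 8, 1, 2]))  # [1, 2, 3, 5, 8]
```

The two nested loops are where the O(n²) cost comes from: the outer loop runs up to n - 1 times, and each pass compares up to n - 1 adjacent pairs.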

🤔

Think about it:

If bubble sort takes roughly n² comparisons, how much slower would it be to sort a million items compared to a thousand? Think about the ratio: (1,000,000)² vs (1,000)². That's a million times slower - just for having a thousand times more data.

Merge Sort - Divide and Conquer

Merge sort takes a cleverer approach: split the list in half, sort each half, then merge the two sorted halves together.

[5, 3, 8, 1, 2, 7, 4, 6]
         split
[5, 3, 8, 1]   [2, 7, 4, 6]
    split            split
[5, 3] [8, 1]  [2, 7] [4, 6]
  ↓       ↓       ↓       ↓
[3, 5] [1, 8]  [2, 7] [4, 6]
    merge            merge
[1, 3, 5, 8]   [2, 4, 6, 7]
         merge
[1, 2, 3, 4, 5, 6, 7, 8]

Time complexity: O(n log n) - dramatically faster. For a million items, that's about 20 million comparisons instead of a trillion. This is the kind of algorithm that makes real AI systems possible.
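A minimal Python sketch of the split-and-merge idea is shown below. The function names `merge` and `merge_sort` are chosen for this example, and the version favours readability over memory efficiency (it creates new lists at each level).

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal items in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; whatever remains on the other side is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a sorted copy of items using merge sort."""
    if len(items) <= 1:  # a list of zero or one item is already sorted
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([5, 3, 8, 1, 2, 7, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The list is halved about log n times, and each level of merging touches all n items once, which is where O(n log n) comes from.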

🤯

Python's built-in sort uses Timsort - a hybrid algorithm that combines merge sort with insertion sort. It was invented by Tim Peters in 2002 and is now used in Python, Java, and Android. It's specifically designed to perform well on real-world data that's often partially sorted already.
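In everyday Python code you rarely write a sort yourself; you call the built-in `sorted()` function or the `list.sort()` method. The short sketch below uses a made-up list of film records (the titles and ratings are hypothetical) to show sorting by a key. Python's sort is stable, so items with equal keys keep their original relative order.

```python
# Hypothetical (title, rating) records, used only for illustration.
films = [("Alpha", 4.5), ("Bravo", 3.9), ("Charlie", 4.5), ("Delta", 4.8)]

# sorted() returns a new list; key picks the value to sort by.
by_rating = sorted(films, key=lambda film: film[1], reverse=True)
print(by_rating)
# [('Delta', 4.8), ('Alpha', 4.5), ('Charlie', 4.5), ('Bravo', 3.9)]

# list.sort() sorts in place and returns None.
titles = [title for title, _ in films]
titles.sort(reverse=True)
print(titles)  # ['Delta', 'Charlie', 'Bravo', 'Alpha']
```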

Bubble Sort vs Merge Sort at Scale

| Items | Bubble Sort (O(n²)) | Merge Sort (O(n log n)) |
|-------|---------------------|-------------------------|
| 100 | 10,000 ops | ~700 ops |
| 10,000 | 100,000,000 ops | ~130,000 ops |
| 1,000,000 | 1,000,000,000,000 ops | ~20,000,000 ops |

The difference isn't academic - at a million items it's the difference between "finishes in about a second" and "finishes hours or days later," depending on the hardware and language.
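The figures in the comparison can be reproduced with a few lines of Python. This sketch treats n² and n·log₂(n) as rough operation counts, which is an idealisation: real implementations differ by constant factors. The helper name `estimate_ops` is made up for this example.

```python
import math


def estimate_ops(n):
    """Return rough operation counts (n squared, n * log2(n)) for n items."""
    return n ** 2, round(n * math.log2(n))


for n in (100, 10_000, 1_000_000):
    quadratic, linearithmic = estimate_ops(n)
    print(f"{n:>9,} items: {quadratic:>17,} vs {linearithmic:>10,}")
```

The merge sort column in the table is rounded; the sketch computes the unrounded values.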

🧠 Quiz

Why is merge sort preferred over bubble sort for large datasets in AI applications?

Binary Search - The Phone Book Trick

Imagine looking up "Smith" in a phone book. You wouldn't start at page one and read every name. You'd open the book roughly in the middle, see where you are, and jump to the correct half. Then repeat.

That's binary search - and it only works on sorted data.

sorted_list = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
target = 23

Step 1: Middle = 16 → 23 > 16, search right half
Step 2: Middle = 56 → 23 < 56, search left half
Step 3: Middle = 23 → Found it!

Time complexity: O(log n). In a sorted list of one million items, binary search finds any item in at most 20 steps. A linear search would take up to one million steps.
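A minimal Python sketch of binary search follows. It returns both the index and the number of steps taken so the step count can be checked; the function name `binary_search` and the step counter are additions for this example, not part of any standard API.

```python
def binary_search(sorted_items, target):
    """Return (index, steps) for target, or (-1, steps) if it is absent."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2  # lower middle of the current range
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1, steps


sorted_list = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(sorted_list, 23))  # (5, 3)

# A million sorted numbers: the last one takes 20 steps to find.
print(binary_search(range(1_000_000), 999_999))  # (999999, 20)
```

In practice, Python's standard `bisect` module offers `bisect_left` and `bisect_right` for the same job on sorted lists.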

🧠 Quiz

A sorted database contains 1,000,000 records. How many comparisons does binary search need in the worst case?

How AI Uses Sorting and Searching

Ranking Search Results

When Google processes your query, it scores every relevant page and sorts them by relevance. The top 10 results appear on page one. Without efficient sorting, this would take minutes instead of milliseconds.
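The score-then-sort pattern can be sketched in a few lines of Python. The page names and relevance scores below are entirely made up, and the helper `top_results` is hypothetical; a real search engine's ranking pipeline is far more involved.

```python
# Hypothetical relevance scores for a single query (higher is more relevant).
page_scores = {
    "page_a": 0.42,
    "page_b": 0.91,
    "page_c": 0.77,
    "page_d": 0.15,
}


def top_results(scores, limit=10):
    """Return page names ordered from most to least relevant, capped at limit."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [page for page, _ in ranked[:limit]]


print(top_results(page_scores, limit=3))  # ['page_b', 'page_c', 'page_a']
```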

Recommendation Systems

Netflix calculates a "match score" for thousands of titles based on your viewing history, then sorts them to show you the best matches first. The sorting algorithm directly affects what you see on your home screen.

K-Nearest Neighbours

This classic AI algorithm finds the K most similar items to a given input. It calculates distances to every item, then partially sorts to find the K smallest distances. Efficient sorting makes this practical for millions of data points.
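The neighbour-finding step can be sketched with the standard library alone. This example assumes points are tuples of numbers and uses Euclidean distance via `math.dist` (Python 3.8+); the function name `k_nearest` and the sample points are made up for illustration.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Return the k points closest to query by Euclidean distance."""
    # nsmallest keeps only k candidates at a time instead of sorting everything.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


points = [(1, 1), (5, 5), (2, 2), (9, 0), (0, 3)]
print(k_nearest(points, (0, 0), 2))  # [(1, 1), (2, 2)]
```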

💡

You don't always need to fully sort. If you only need the top 10 results from a million items, a partial sort or heap can find them in O(n log k) time - much faster than sorting everything.
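One way to get the O(n log k) behaviour is to keep a min-heap that never holds more than k items. The sketch below is a hand-rolled version for illustration (the function name `top_k` is made up); Python's `heapq.nlargest` provides the same result out of the box.

```python
import heapq


def top_k(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # heap[0] is always the smallest of the current top k
    for value in values:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Drop the smallest kept value and add the new one in a single step.
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


print(top_k([15, 3, 99, 42, 8, 77, 23], 3))  # [99, 77, 42]
```

Each of the n values costs at most one heap operation on a heap of size k, so the total work is about n log k rather than n log n.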

Training Data Preparation

Before training, AI practitioners often sort data to create balanced batches - ensuring each training batch contains a mix of easy and hard examples, or a balanced distribution of categories.
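One possible way to get a mix of easy and hard examples in every batch is to sort by a difficulty score and then deal the examples out round-robin, like dealing cards. The sketch below assumes each example carries a hypothetical `difficulty` field; both the field and the helper `balanced_batches` are made up for illustration, and real training pipelines vary widely.

```python
def balanced_batches(examples, batch_count):
    """Sort by difficulty, then deal examples round-robin into batch_count batches."""
    ordered = sorted(examples, key=lambda ex: ex["difficulty"])
    # Batch i takes every batch_count-th example starting at position i,
    # so each batch spans the full easy-to-hard range.
    return [ordered[i::batch_count] for i in range(batch_count)]


# Hypothetical examples with made-up difficulty scores.
difficulties = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
examples = [{"id": i, "difficulty": d} for i, d in enumerate(difficulties)]

batches = balanced_batches(examples, 2)
batch_ids = [[ex["id"] for ex in batch] for batch in batches]
print(batch_ids)  # [[1, 3, 4], [5, 2, 0]]
```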

When to Sort vs When to Use a Hash Map

This is a crucial design decision:

| Scenario | Best Choice | Why |
|----------|------------|-----|
| Find one item by key | Hash map | O(1) lookup |
| Find the top-10 items | Sort | Need ordered results |
| Check if item exists | Hash map | O(1) vs O(log n) |
| Get items in order | Sort | Hash maps have no order |
| Range queries (items between A and B) | Sorted array + binary search | Hash maps can't do ranges |
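The contrast can be seen in a short Python sketch. A `dict` plays the role of the hash map, and the standard `bisect` module handles the range query on a sorted list. The product names and prices are made up for this example.

```python
import bisect

# Hypothetical product prices, keyed by name.
prices = {"apple": 120, "bread": 250, "milk": 99, "rice": 310, "tea": 180}

# Hash map: direct lookup by key and membership checks.
bread_price = prices["bread"]  # 250
has_jam = "jam" in prices      # False

# Sorted list + binary search: find all prices between 100 and 250 inclusive.
sorted_prices = sorted(prices.values())             # [99, 120, 180, 250, 310]
start = bisect.bisect_left(sorted_prices, 100)      # first position with value >= 100
stop = bisect.bisect_right(sorted_prices, 250)      # first position with value > 250
in_range = sorted_prices[start:stop]

print(bread_price, has_jam)  # 250 False
print(in_range)              # [120, 180, 250]
```

The dictionary answers "what is the price of bread?" directly, but it has no notion of "between 100 and 250"; the sorted list answers the range question with two binary searches.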

🤔

Think about it:

A music streaming service needs to show your "Top 50 most played songs." Would you sort your entire listening history, or maintain a data structure that always knows the top 50? What are the trade-offs of each approach?

🤯

Google is estimated to process over 8.5 billion searches per day. Each search involves sorting and ranking hundreds of potential results in milliseconds. At that scale, the efficiency of sorting algorithms directly affects how much electricity Google's data centres consume - better algorithms save real power.

🧠 Quiz

When would binary search NOT be appropriate?

Key Takeaways

Sorting transforms chaotic data into structured, searchable data - essential for ranking and recommendations.
O(n²) algorithms like bubble sort are educational but impractical at scale; O(n log n) algorithms like merge sort power real systems.
Binary search is extraordinarily efficient on sorted data - 20 steps to search a million items.
Choose between sorting and hash maps based on whether you need ordered results or instant lookups.
Every search result, recommendation, and ranked list you see online relies on these fundamental algorithms.
