Every AI system โ from recommendation engines to search indexes โ is built on two primitives: arrays and hash maps. Master these and you master the foundation.
An array stores elements in contiguous memory โ each slot is one step from the next.
Key characteristics:
Arrays are the backbone of tensors in machine learning. A 2D array is a matrix, a 3D array is a tensor โ the core data structure of every neural network.
A hash map converts a key into an index using a hash function, then stores the value in that bucket.
The process:
"apple" (key)hash("apple") % size โ bucket 00 = {apple: 3}Key characteristics:
Why can't you use a list as a dictionary key in Python? Because lists are mutable โ if you change the list after hashing, the hash value changes but the bucket doesn't, breaking the lookup.
Given an array and a target, find two numbers that add up to the target.
Brute force: Check every pair โ O(nยฒ)
Hash map approach: For each number, check if target - num exists โ O(n)
Step by step with nums = [2, 7, 11, 15], target = 9:
| Step | Current | Need (9 - current) | In Map? | Action |
|------|---------|---------------------|---------|--------|
| 1 | 2 | 7 | No | Store {2: 0} |
| 2 | 7 | 2 | Yes! | Return [0, 1] |
This pattern โ "store what you've seen, check what you need" โ appears in dozens of interview problems.
Count how often each element appears.
def frequency_count(nums):
freq = {}
for num in nums:
freq[num] = freq.get(num, 0) + 1
return freq
# [1, 2, 2, 3, 3, 3] โ {1: 1, 2: 2, 3: 3}
Used in:
Group anagrams:
["eat","tea","tan","ate","nat","bat"]
Strategy: Sort each word โ use sorted version as hash key.
| Original | Sorted (Key) | Group | |----------|-------------|-------| | "eat" | "aet" | ["eat", "tea", "ate"] | | "tan" | "ant" | ["tan", "nat"] | | "bat" | "abt" | ["bat"] |
In production ML systems, feature stores use hash maps at their core:
Netflix's recommendation engine performs millions of hash map lookups per second to match users with content. Each user's viewing history is stored as a hash map of {movie_id: rating}, enabling O(1) feature retrieval during model inference.
| Operation | Array | Hash Map | |-----------|-------|----------| | Access by index | O(1) | โ | | Search by value | O(n) | O(1) avg | | Insert at end | O(1)* | O(1) avg | | Insert at start | O(n) | O(1) avg | | Delete | O(n) | O(1) avg | | Space | O(n) | O(n) |
*amortised for dynamic arrays
When would you choose an array over a hash map? When you need ordered data, cache performance, or minimal memory overhead. Hash maps trade space for speed โ each entry has overhead for the hash, key, and pointer.