Every time you type a search query, send a message, or ask an AI a question โ strings are being processed. Understanding how they work under the hood unlocks both interview success and AI intuition.
A string is just a character array with special rules.
Key facts:
In Python, s = "hello" then s[0] = "H" throws an error. Strings are immutable. Use s = "H" + s[1:] or list(s) to modify.
Reverse a string in-place using two pointers.
Visual walkthrough โ "HELLO":
| Step | L ptr | R ptr | Action | Result | |------|-------|-------|--------|--------| | 1 | H (0) | O (4) | Swap | OELLH | | 2 | E (1) | L (3) | Swap | OLLEH | | 3 | L=L | โ | Done | OLLEH |
def reverse_string(s: list[str]) -> None:
left, right = 0, len(s) - 1
while left < right:
s[left], s[right] = s[right], s[left]
left += 1
right -= 1
Time: O(n) | Space: O(1) โ no extra memory needed.
Is the string the same forwards and backwards?
Two-pointer approach โ compare from both ends, move inward:
left = 0, right = len - 1s[left] != s[right] โ not a palindromedef is_palindrome(s: str) -> bool:
left, right = 0, len(s) - 1
while left < right:
if s[left] != s[right]:
return False
left += 1
right -= 1
return True
How would you handle "A man, a plan, a canal: Panama"? You need to skip non-alphanumeric characters and ignore case. Add while left < right and not s[left].isalnum(): left += 1 before each comparison.
Are two strings anagrams? (Same characters, different order)
Hash map approach โ count character frequencies:
def is_anagram(s: str, t: str) -> bool:
if len(s) != len(t):
return False
count = {}
for c in s:
count[c] = count.get(c, 0) + 1
for c in t:
count[c] = count.get(c, 0) - 1
if count[c] < 0:
return False
return True
| | "listen" | "silent" | |---|----------|----------| | l | 1 | 1 | | i | 1 | 1 | | s | 1 | 1 | | t | 1 | 1 | | e | 1 | 1 | | n | 1 | 1 |
All counts zero โ anagram confirmed.
Find the longest substring without repeating characters.
The technique: Maintain a "window" of valid characters. Expand right, shrink left when a duplicate is found.
Walkthrough with "ABCABCBB":
| Step | Window | Seen | Length | |------|--------|------|--------| | 1 | A | {A} | 1 | | 2 | AB | {A,B} | 2 | | 3 | ABC | {A,B,C} | 3 | | 4 | A is duplicate โ shrink โ BCA | {B,C,A} | 3 | | 5 | B is duplicate โ shrink โ CAB | {C,A,B} | 3 |
Answer: 3 (any of "ABC", "BCA", or "CAB")
def length_of_longest_substring(s: str) -> int:
seen = {}
left = max_len = 0
for right, char in enumerate(s):
if char in seen and seen[char] >= left:
left = seen[char] + 1
seen[char] = right
max_len = max(max_len, right - left + 1)
return max_len
The sliding window pattern solves dozens of substring problems. Key insight: if expanding the window breaks a condition, shrink from the left until the condition holds again.
Before any NLP model (GPT, BERT, LLaMA) processes text, it tokenizes โ splits text into sub-word pieces using techniques built on the patterns above.
How BPE (Byte Pair Encoding) works:
["h", "e", "l", "l", "o"]("l", "l")["h", "e", "ll", "o"]This uses:
GPT-4's tokenizer has ~100,000 tokens. The word "indescribable" might become ["in", "desc", "rib", "able"] โ 4 tokens. Common words like "the" are a single token. This is why AI models "think" in tokens, not words!
| Pattern | Time | Space | Key Technique | |---------|------|-------|---------------| | String reversal | O(n) | O(1) | Two pointers | | Palindrome check | O(n) | O(1) | Two pointers | | Anagram detection | O(n) | O(1)* | Hash map counting | | Sliding window | O(n) | O(k) | Hash set + two pointers |
*O(1) because character set is fixed (26 letters)
Why is sliding window O(n) and not O(nยฒ)? Because each character is added and removed from the window at most once. The left pointer only moves forward, never backwards โ so the total work across all steps is linear.