Commit 921dd82

⚡️ Speed up function word_frequency by 22%
The optimized code achieves a **21% speedup** by replacing the manual dictionary-building loop with Python's built-in `Counter` class from the `collections` module.

**Key optimization applied:**
- **Eliminated the manual loop**: The original code iterates over each word, checks whether it is already in the dictionary (`if word in frequency`), and then either increments or initializes the count. Every word costs a membership test plus a read-modify-write or an insert, all executed as Python bytecode.
- **Used Counter's optimized C implementation**: In CPython, `Counter` does its counting in a C helper, which avoids the overhead of running the loop in interpreted Python.

**Why this leads to speedup:**
Both versions are O(n) in the number of words; the gain is a constant factor. The line profiler shows that 64.4% of the total time (33.1% + 31.3%) is spent on the loop iteration and the dictionary membership checks. `Counter` performs the same per-word work inside C code, so most of that per-word interpreter overhead is removed.

**Performance characteristics by test case type:**
- **Small inputs (< 10 words)**: The optimized version is actually **50-76% slower**, because the fixed cost of constructing a `Counter` and converting it to a `dict` outweighs the savings on so few words.
- **Large inputs (500+ words)**: The optimized version shows a **12-70% speedup**, with the greatest gains on highly repetitive data (for example, `test_large_repeated_words` is 69.9% faster).
- **Medium repetitive datasets**: The gains are largest when the same words appear many times, since the increment path that `Counter` handles in C replaces the original's repeated dictionary lookups.

The optimization trades a fixed initialization overhead for a cheaper per-word cost, so it is most effective on larger datasets with word repetition.
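The trade-off described above can be checked locally with a small timing harness. This is a minimal sketch, not the benchmark used for the commit: the names `word_frequency_loop`, `word_frequency_counter`, and `compare`, as well as the sample inputs, are hypothetical, and the measured ratios will vary with the Python version and the machine.

```python
import timeit
from collections import Counter


def word_frequency_loop(text: str) -> dict[str, int]:
    """Pre-commit version: count words with an explicit loop."""
    frequency: dict[str, int] = {}
    for word in text.lower().split():
        if word in frequency:
            frequency[word] += 1
        else:
            frequency[word] = 1
    return frequency


def word_frequency_counter(text: str) -> dict[str, int]:
    """Post-commit version: delegate counting to collections.Counter."""
    return dict(Counter(text.lower().split()))


def compare(text: str, number: int = 2000) -> tuple[float, float]:
    """Return (loop_seconds, counter_seconds) for `number` calls each."""
    loop_s = timeit.timeit(lambda: word_frequency_loop(text), number=number)
    counter_s = timeit.timeit(lambda: word_frequency_counter(text), number=number)
    return loop_s, counter_s


if __name__ == "__main__":
    # Hypothetical inputs: a short sentence and a long, repetitive text.
    samples = {
        "small (4 words)": "the quick brown fox",
        "large (3000 words, repetitive)": "alpha beta gamma " * 1000,
    }
    for label, text in samples.items():
        loop_s, counter_s = compare(text)
        print(f"{label}: loop={loop_s:.4f}s counter={counter_s:.4f}s")
```

Running the script prints one line per sample, which makes it easy to see where the fixed `Counter` setup cost stops dominating on a given machine.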
1 parent 9b951ff commit 921dd82

1 file changed, +3 -8 lines changed

src/dsa/various.py

Lines changed: 3 additions & 8 deletions
@@ -1,4 +1,5 @@
 import re
+from collections import Counter


 class Graph:
@@ -78,14 +79,8 @@ def is_palindrome(text: str) -> bool:


 def word_frequency(text: str) -> dict[str, int]:
-    words = text.lower().split()
-    frequency = {}
-    for word in words:
-        if word in frequency:
-            frequency[word] += 1
-        else:
-            frequency[word] = 1
-    return frequency
+    # Use Counter for faster word counting
+    return dict(Counter(text.lower().split()))


 class PathFinder:
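Since the change swaps the implementation but keeps the signature, it is worth confirming that the observable behavior is unchanged. The following is a small sketch under that assumption: the function body is copied from the added lines of the diff, while the sample sentence and the assertions are illustrative and not part of the commit's test suite.

```python
from collections import Counter


def word_frequency(text: str) -> dict[str, int]:
    # Same body as the post-commit function in the diff above.
    return dict(Counter(text.lower().split()))


result = word_frequency("The cat and the hat")

# Lowercasing merges "The" and "the"; the return value is a plain dict, not a Counter.
assert result == {"the": 2, "cat": 1, "and": 1, "hat": 1}
assert type(result) is dict

# Keys appear in first-seen order, as they did with the manual loop.
assert list(result) == ["the", "cat", "and", "hat"]

# An empty or whitespace-only string yields an empty dict.
assert word_frequency("") == {}
assert word_frequency("  \n\t ") == {}
```

Wrapping the `Counter` in `dict(...)` keeps the declared return type `dict[str, int]` exact, so callers that compare types or serialize the result see the same kind of object as before.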
