Photo by VERPEX
The Knuth-Morris-Pratt (KMP) algorithm is an efficient pattern matching algorithm that searches for occurrences of a substring (or “pattern”) within a main string. It achieves this by preprocessing the pattern to create a longest prefix-suffix (LPS) array that helps avoid redundant comparisons, making it faster than the brute-force approach.
Here’s a Python implementation of the KMP algorithm:
KMP Algorithm in Python
def compute_lps(pattern):
"""
Computes the Longest Prefix Suffix (LPS) array for the given pattern.
The LPS array is used to skip characters while matching.
"""
length = 0 # length of the previous longest prefix suffix
i = 1
lps = [0] * len(pattern)
# Loop to calculate lps[i] for i = 1 to len(pattern) - 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
# Try the previous lps value
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
def kmp_search(text, pattern):
"""
KMP algorithm for pattern searching in the text.
Returns a list of starting indices where the pattern is found in the text.
"""
n = len(text)
m = len(pattern)
# Preprocess the pattern (compute LPS array)
lps = compute_lps(pattern)
result = []
i = 0 # index for text
j = 0 # index for pattern
while i < n:
if pattern[j] == text[i]:
i += 1
j += 1
if j == m:
# Pattern found at index i - j
result.append(i - j)
j = lps[j - 1] # Reset j to check for more occurrences
elif i < n and pattern[j] != text[i]:
# Mismatch after j matches
if j != 0:
j = lps[j - 1] # Use lps to avoid unnecessary comparisons
else:
i += 1
return result
# Example usage:
if __name__ == "__main__":
text = "ababcabcabababd"
pattern = "ababd"
result = kmp_search(text, pattern)
if result:
print(f"Pattern found at indices: {result}")
else:
print("Pattern not found")
Explanation:
- LPS Array (Longest Prefix Suffix):
- This array stores the length of the longest proper prefix of the pattern that is also a suffix for every position in the pattern.
- It helps to avoid redundant comparisons in the main string.
- compute_lps(pattern) Function:
- It calculates the LPS array for the given pattern, which helps optimize the matching process by allowing jumps in the pattern rather than scanning every character.
- kmp_search(text, pattern) Function:
- This function searches for occurrences of the pattern in the given text using the LPS array to skip unnecessary comparisons.
- It returns a list of starting indices where the pattern is found.
Example Output:
For the input text “ababcabcabababd” and pattern “ababd”, the output will be:
Pattern found at indices: [10]
How the KMP Algorithm Works:
- The algorithm begins by comparing the pattern with the text.
- When a mismatch occurs, instead of moving the pattern by one character as in the brute-force approach, the algorithm uses the LPS array to determine the next position to continue the comparison.
- This reduces the number of comparisons significantly.
Time Complexity:
- Preprocessing (LPS Array): O(m), where m is the length of the pattern.
- Searching: O(n), where n is the length of the text.
- The overall time complexity is O(n + m), which is much more efficient than the brute-force O(n * m) for substring search.
Leave a Reply