Suffix Array
#string
- 2 minutes read - 239 words
Suffix Array
- Given a String
- List down all suffixes starting from 0 to N (full string to empty string)
- Create an integer array that stores the indices of these suffixes (0, N)
- sort the list of suffizes and the corresponding integer array
- that interger array is the suffix array
Example
String: “CBA”
Suffixes: List: [“CBA”, “BA”, “A” “"] Index Array: [0, 1, 2, 3]
Sort: List: ['', ‘A’, ‘BA’, ‘CBA’, “"] Index Array: [3, 2, 1, 0]
Suffix Array : [3, 2, 1, 0]
Application
- Finding a substring in a string
The task is to find a string s inside some text t online - we know the text t beforehand, but not the string s. We can create the suffix array for the text t in O(|t|log|t|) time. Now we can look for the substring s in the following way. The occurrence of s must be a prefix of some suffix from t. Since we sorted all the suffixes we can perform a binary search for s in p. Comparing the current suffix and the substring s within the binary search can be done in O(|s|) time, therefore the complexity for finding the substring is O(|s|log|t|). Also notice that if the substring occurs multiple times in t, then all occurrences will be next to each other in p. Therefore the number of occurrences can be found with a second binary search, and all occurrences can be printed easily.