Substring Search 方法
Rabin-Karp Substring Search: O(n + m)
Key idea: if two strings are the same, they must have the same hash value. (The converse, however, is not true.)
为了做到linear time search,我们需要使用一个好的hash funtion来帮助我们避免不必要的collide。
比如说,我们可以使用rolling hash function。
hash('doe') = code('d') * 128 ^ 2 + code('o') * 128 + code('e')
hash('oea') = (hash('doe') - code('d') * 128 ^ 2) * 128 + code('a')
This will considerably cut down on the number of false matches.
Worst case: O(nm), when all hash values are equal.
Things we need to worry about: Overflow. Hashed value can be too big to be represented by 64 bits.
// String A, String B
int PRIME = 31;
int BASE = 1000000;
int aHash = 0;
for(int i = 0; i < A.length(); i++) {
aHash = (aHash * PRIME + A.charAt(i)) % BASE;
if(aHash < 0) {
aHash += BASE;
}
}
int power = 1;
for(int i = 0; i < A.length(); i++) {
power = (power * PRIME) % BASE;
}
int bHash = 0;
for(int i = 0; i < B.length(); i++) {
bHash = (bHash * PRIME + B.charAt(i)) % BASE;
if(i < A.length() - 1) continue;
if(i >= A.length()) {
bHash = (bHash - power * B.charAt(i - A.length())) % BASE;
if(bHash < 0) {
bHash += BASE;
}
}
if(aHash == bHash && A.equals(B.substring(i - A.length() + 1,i + 1))) {
...
}
}
尝试A中以B为长度的子字符串的所有可能性。
public int repeatedStringMatch(String A, String B) {
int count = 0;
StringBuilder sb = new StringBuilder();
while (sb.length() < B.length()) {
sb.append(A);
count += 1;
}
if (sb.indexOf(B) >= 0) {
return count;
}
sb.append(A);
if (sb.indexOf(B) >= 0) {
return count + 1;
}
return -1;
}