Substring Search

Substring Search 方法

Rabin-Karp Substring Search: O(n + m)

Key idea: if two strings are the same, they must have the same hash value. (The converse, however, is not true.)

为了做到linear time search,我们需要使用一个好的hash funtion来帮助我们避免不必要的collide。

比如说,我们可以使用rolling hash function。

hash('doe') = code('d') * 128 ^ 2 + code('o') * 128 + code('e')

hash('oea') = (hash('doe') - code('d') * 128 ^ 2) * 128 + code('a')

This will considerably cut down on the number of false matches.

Worst case: O(nm), when all hash values are equal.

Things we need to worry about: Overflow. Hashed value can be too big to be represented by 64 bits.

// String A, String B 

int PRIME = 31; 
int BASE = 1000000; 

int aHash = 0; 
for(int i = 0; i < A.length(); i++) { 
    aHash = (aHash * PRIME + A.charAt(i)) % BASE; 
    if(aHash < 0) { 
        aHash += BASE; 
    } 
} 

int power = 1; 
for(int i = 0; i < A.length(); i++) { 
    power = (power * PRIME) % BASE; 
} 

int bHash = 0; 
for(int i = 0; i < B.length(); i++) { 
    bHash = (bHash * PRIME + B.charAt(i)) % BASE; 
    if(i < A.length() - 1) continue; 
    if(i >= A.length()) { 
        bHash = (bHash - power * B.charAt(i - A.length())) % BASE; 
        if(bHash < 0) { 
            bHash += BASE; 
        } 
    } 
    if(aHash == bHash && A.equals(B.substring(i - A.length() + 1,i + 1))) { 
        ... 
    } 
}

686. Repeated String Match

尝试A中以B为长度的子字符串的所有可能性。

public int repeatedStringMatch(String A, String B) {
    int count = 0;
    StringBuilder sb = new StringBuilder();
    while (sb.length() < B.length()) {
        sb.append(A);
        count += 1;
    }
    if (sb.indexOf(B) >= 0) {
        return count;
    } 
    sb.append(A);
    if (sb.indexOf(B) >= 0) {
        return count + 1;
    }
    return -1;
}

Last updated