String Matching Algorithms String Searching The context of the problem is to find out whether one string (called "pattern") is contained in another string. This problem correspond to a part of more general one, called "pattern recognition". The strings considered are sequences of symbols, and symbols are defined by an alphabet. The size and other features of the alphabet are important factors in the design of string-processing algorithms. Brute-force algorithm: The obvious method for pattern matching is just to check, for each possible position in the text at which the pattern could match, whether it does in fact match. The following program searches in this way for the first occurrence of a pattern string p in a text string a: or function brutesearch: integer; var i, j: integer; begin i:= 1; j:= 1; repeat if a[i] = p[j] then begin i:= i+1; j:= j+1 end else begin i:= i - j +2; j:= 1 end; until ( j > M ) or ( i > N ); if j > M then brutesearch:= i - M else brutesearch:= i end; Property: brute-force string searching can require about M N character comparisons. Knuth-Morris-Pratt Algorithm The basic idea behind the algorithm is this: when a mismatch is detected, our "false start" consists of characters that we know in advance (since they're in the pattern). Somehow we should be able to take advantage of this information instead of backing up the i pointer over all those known characters. A simple example of the pattern of 10000000. The idea is worth thinking about and the Knuth-Morris-Pratt algorithm is a generalization of it. Searching for pattern 10100111 in the string 1010100111 we detect mismatch at 5th position, but we back up at 3rd position. Surprisingly, it is always possible to arrange things so that the i pointer is never decremented. It depends entirely on the pattern: Restart positions for KMP search The array next[M] will be used to determine how far to go back up when a mismatch is detected. By the defini