1 Ch32 String Matching ? Introduction ? Na ? ve Algorithm ? Rabin-Karp Algorithm ? String Matching using Finite Automata ? Knuth-Morris-Pratt (KMP) Algorithm ? Literature – Cormen, Leiserson, Rivest, “ Introduction to Algorithms ”, chapter 32, string matching, The MIT Press, 2001, 906-932 . 2 Introduction ? What is string matching ? – Finding all occurrences of a pattern in a given text (or body of text )? Many applications – While using editor/word processor/browser – Login name & password checking – Virus detection – Header analysis in munications – DNA sequence analysis 3 History of String Search ? The brute force algorithm: – invented in the dawn puter history – re-invented many times, still common ? Knuth & Pratt invented a better one in 1970 – published 1976 as “ Knuth-Morris-Pratt ”? Boyer & Moore found a better one before 1976 – Published 1977 ? Karp & Rabin found a “ better ” one in 1980 – Published 1987 4 String-Matching Problem ? The text is in an array T [1.. n] of length n ? The pattern is in an array P [1.. m] of length m ? Elements of T and P are characters from a finite alphabet ?– ., ? = {0,1} or ? = {a, b, …, z} ? Usually T and P are called strings of characters 5 String-Matching Problem … contd ? We say that pattern P occurs with shift s in text T if: a) 0 ≤s≤n-m and b) T [(s +1)..( s+m )] = P [1.. m]? If P occurs with shift s in T , then s is a valid shift , otherwise s is an invalid shift ? String-matching problem: finding all valid shifts for a given T and P 6 Example 1 abcabaabcabac abaa text T pattern P s = 3 shift s = 3 is a valid shift (n=13, m=4 and 0 ≤ s ≤ n-m holds) 123456789 10 11 12 13 12347 Example 2 abcabaabcabaa abaa text T pattern P s = 3abaa abaa s = 9 123456789 10 11 12 13 12348 Na ? ve String-Matching Algorithm Input : Text strings T [1.. n ] and P [1.. m] Result : All valid shifts displayed NA ? VE-STRING-MATCHER (T, P) n← length [T]m← length