Here are the notes for Chapter 2. These are just the first lines; the whole document can be found HERE.




The original problem in string matching pattern recognition is so easy to state that even a child could understand it. Two strings of characters are given; one is called the pattern, and the other the text. The problem is then to find all the occurrences of the pattern in the text. Usually,  the text is longer than the pattern. We will call m the length of the pattern and n that of the text. From the beginning of the era of digital computing this problem was identified as a fundamental one in computer science. However,  the real applications of string matching were scant, basically text-finding in word processors. For some time the true interest of this problem lied in solving it in linear time, that is, in O(n+m) time. Morris  and Pratt were the first to succeed in such an endeavor.

As computational methods have been pervading almost every field of knowledge, applications of string matching techniques became ubiquitous and gained in importance. In the late 80's the need of intensive computing to perform analysis of biological data brought forth the hot field of Bioinformatics. In this field a number of string-matching techniques are used for sequence analysis, gene finding, evolutionary biology studies, analysis of protein expression, just to name a few. Other fields such as Music Technology, Computational Linguistics, Artificial Intelligence, Artificial Vision, among others, have been using string matching algorithm as part of their arsenal of theoretical and practical tools. New problems in string matching appeared as a result of such continuous, exhaustive use, which in turn were promptly solved by the computer scientists. This situation has culminated in productive, ceaseless feedback between string matching theorists and its practitioners.

Go to top