Here they are the notes for the first lesson. This are just the first lines; the whole document can be found HERE.

 


There are many fundamental problems in String Pattern Recognition, which have arisen either out of theoretical considerations or  practical problems, the latter  chiefly originated in Bioinformatics. We start by introducing some definitions prior to presenting these problems.

Typically, a string pattern recognition problem is defined by specifying an object, the pattern, to be searched in a larger object, the text, under certain conditions. In principle, both the pattern and the text will be arrays of symbols. We shall denote the pattern by P=P[1..m] and the text by T=T[1..n], where m and n are their lengths, respectively. Traditionally, in pattern recognition symbols are just called characters. Characters in P and T are drawn from a common finite alphabet Ʃ. Typical alphabets are {0, 1}, {a, b,..., z}, the ASCII code  (in computer science applications), {C, G, T, A} (in Bioinformatics applications), sets of integers or real numbers (in musical applications and others) and, in general, finite sets of symbols. The number of characters in Ʃ will be denoted by d.

The first fundamental problem in string pattern recognition, the string-matching existence problem (SME problem), can be stated as follows.

The string-matching existence problem: Given a pattern P and a text T,  determine whether there is an occurrence of P in T.

The immediate generalization of the existence problem is the string-matching computation problem (SMC), namely, reporting how many times a pattern occur in a text.

The string-matching computation problem: Given a pattern P and a text T, determine all the occurrences of P in T

.................................................................................................................................................................................

.................................................................................................................................................................................