A formal language is a set of finite-length words (or "strings") over some finite alphabet. A typical alphabet would be {a, b}, a typical string over that alphabet would be "ababba", and a typical language over that alphabet containing that string would be the set of all strings which contain the same number of a's as b's.
The empty word is allowed and is usually denoted by e, ε or λ. Note that while the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings.
Some examples of formal languages:
- the set of all words over {a, b},
- the set { a^{n} | n is a prime number },
- the set of syntactically correct programs in some programming language, or
- the set of inputs upon which a certain Turing machine halts.
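The first examples above can be made concrete with a short Python sketch. It assumes words are represented as ordinary Python strings over the alphabet {a, b}; the function names (all_words, same_number_of_as_and_bs, in_prime_power_language) are illustrative choices, not standard terminology. A language is modelled here as a membership test, since the set of its members may be infinite.

```python
from itertools import count, islice, product
from math import isqrt

ALPHABET = ("a", "b")  # the example alphabet {a, b}


def all_words(alphabet=ALPHABET):
    """Yield every word over the alphabet, shortest first.

    The generator never terminates: the language is infinite,
    although each individual word has finite length.
    """
    for n in count(0):  # n = 0 yields the empty word
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def same_number_of_as_and_bs(word):
    """Membership test: words over {a, b} with as many a's as b's."""
    return set(word) <= set(ALPHABET) and word.count("a") == word.count("b")


def is_prime(n):
    """Trial division; adequate for the short words used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def in_prime_power_language(word):
    """Membership test for { a^n | n is a prime number }."""
    return set(word) <= {"a"} and is_prime(len(word))


first_words = list(islice(all_words(), 7))
```

Here first_words holds the seven shortest words over {a, b}, beginning with the empty word, and same_number_of_as_and_bs("ababba") is true because the string contains three a's and three b's.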
A formal language can be specified in a great variety of ways.
Several operations can be used to produce new languages from given ones. Suppose L_{1} and L_{2} are languages over some common alphabet.
- The concatenation L_{1}L_{2} consists of all strings of the form vw where v is a string from L_{1} and w is a string from L_{2}.
- The intersection of L_{1} and L_{2} consists of all strings which are contained in L_{1} and also in L_{2}.
- The union of L_{1} and L_{2} consists of all strings which are contained in L_{1} or in L_{2}.
- The complement of the language L_{1} consists of all strings over the alphabet which are not contained in L_{1}.
- The right quotient L_{1}/L_{2} of L_{1} by L_{2} consists of all strings v for which there exists a string w in L_{2} such that vw is in L_{1}.
- The Kleene star L_{1}* consists of all strings which can be written in the form w_{1}w_{2}...w_{n} with strings w_{i} in L_{1} and n ≥ 0. Note that this includes the empty string ε because n = 0 is allowed.
- The reverse L_{1}^{R} contains the reversed versions of all the strings in L_{1}.
- The shuffle of L_{1} and L_{2} consists of all strings which can be written in the form v_{1}w_{1}v_{2}w_{2}...v_{n}w_{n} where n ≥ 1 and v_{1},...,v_{n} are strings such that the concatenation v_{1}...v_{n} is in L_{1} and w_{1},...,w_{n} are strings such that w_{1}...w_{n} is in L_{2}.
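For finite languages, the operations listed above can be sketched in Python by representing a language as a set of strings. This is only an illustration under that assumption: the function names are hypothetical, and the complement and the Kleene star are truncated to a maximum word length because the full results are infinite in general. Union and intersection are simply Python's | and & on sets.

```python
from itertools import product


def concatenation(l1, l2):
    """All strings vw with v in l1 and w in l2."""
    return {v + w for v in l1 for w in l2}


def right_quotient(l1, l2):
    """All strings v such that vw is in l1 for some w in l2."""
    return {x[:len(x) - len(w)] for x in l1 for w in l2 if x.endswith(w)}


def reverse(l1):
    """The reversed versions of all strings in l1."""
    return {w[::-1] for w in l1}


def complement_up_to(l1, alphabet, max_len):
    """Strings over the alphabet of length <= max_len that are not in l1."""
    universe = {"".join(p)
                for n in range(max_len + 1)
                for p in product(alphabet, repeat=n)}
    return universe - set(l1)


def kleene_star_up_to(l1, max_len):
    """Strings of the Kleene star of l1 whose length is <= max_len."""
    result = {""}      # n = 0 contributes the empty string
    frontier = {""}
    while frontier:
        # Extend the newest words by one more factor from l1.
        frontier = {u + w for u in frontier for w in l1
                    if len(u + w) <= max_len} - result
        result |= frontier
    return result


def shuffle_words(v, w):
    """All interleavings of v and w that keep each word's letter order."""
    if not v or not w:
        return {v + w}
    return ({v[0] + s for s in shuffle_words(v[1:], w)}
            | {w[0] + s for s in shuffle_words(v, w[1:])})


def shuffle(l1, l2):
    """Shuffle of two finite languages."""
    return {s for v in l1 for w in l2 for s in shuffle_words(v, w)}


L1 = {"a", "ab"}
L2 = {"b", ""}
```

With these two sample languages, concatenation(L1, L2) is {"a", "ab", "abb"} and right_quotient(L1, L2) is {"a", "ab"}. Interleaving single letters, as shuffle_words does, gives the same result as the block-wise definition above when the blocks v_{i} and w_{i} are allowed to be empty.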
A typical question asked about a formal language is how difficult it is to decide whether a given word belongs to the language.
This is the domain of computability theory and complexity theory.