To formalize the above definition of complexity, one has to specify exactly what types of programs are allowed. Fortunately, it doesn't really matter: one could take a particular notation for Turing machines, or LISP programs, or Pascal programs, or Java virtual machine bytecode. If we agree to measure the lengths of all objects consistently in bits, then the resulting notions of complexity will differ by at most a constant factor and an additive constant: if I1(s) and I2(s) are the complexities of the string s according to two different programming languages L1 and L2, then there are constants C and D (which only depend on the languages chosen, but not on s) such that I1(s) ≤ C · I2(s) + D.
In the following, we will fix one definition and simply write I(s) for the complexity of the string s.
The first surprising result is that I(s) cannot be computed: there is no general algorithm which takes a string s as input and produces the number I(s) as output. The proof is a formalization of the amusing Berry paradox: "Let n be the smallest number that cannot be defined in less than twenty English words. Well, I just defined it in less than twenty English words."
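The shape of that argument can be sketched in code. The sketch below assumes a hypothetical computable function `complexity`, passed in as a parameter (no such function exists for the true I(s)), and shows the search procedure whose existence would lead to the contradiction. The names `binary_strings` and `first_string_more_complex_than` are illustrative only, and `len` is used purely as a stand-in so that the search can actually run.

```python
from itertools import count, product
from typing import Callable, Iterator


def binary_strings() -> Iterator[str]:
    """Yield all bit strings in order of increasing length: '', '0', '1', '00', ..."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def first_string_more_complex_than(bound: int, complexity: Callable[[str], int]) -> str:
    """Return the first string s (in the order above) with complexity(s) > bound.

    If `complexity` were the true I(s) and computable, this function together
    with the number `bound` would be a program of roughly log2(bound) + constant
    bits that outputs a string of complexity greater than `bound`, which is a
    contradiction once `bound` is large enough.
    """
    for s in binary_strings():
        if complexity(s) > bound:
            return s
    raise AssertionError("unreachable: the enumeration never ends")


# Stand-in only: len(s) is NOT the real complexity, it just lets the search run.
print(first_string_more_complex_than(3, complexity=len))  # prints 0000
```

The search plays the role of "the smallest number that cannot be defined in less than twenty English words": a short description of an object that, by construction, should have no short description.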
It is however straightforward to compute upper bounds for I(s): simply compress the string s with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the resulting string's length.
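As a minimal sketch of this recipe, the following uses Python's standard `zlib` module as the compression method. The size of the decompressor is not something the text specifies, so `ASSUMED_DECOMPRESSOR_BYTES` is a placeholder assumption; the real figure depends on the language chosen for the definition of complexity.

```python
import zlib

# Assumption for illustration: the decompressor, written in the chosen
# language, occupies this many bytes. The real figure depends on the language.
ASSUMED_DECOMPRESSOR_BYTES = 1024


def complexity_upper_bound_bits(
    s: bytes, decompressor_bytes: int = ASSUMED_DECOMPRESSOR_BYTES
) -> int:
    """Upper bound on I(s) in bits: length of decompressor plus compressed data."""
    compressed = zlib.compress(s, 9)  # 9 = strongest zlib compression level
    return 8 * (decompressor_bytes + len(compressed))


regular = b"ab" * 50_000  # a highly regular 100,000-byte string
print(complexity_upper_bound_bits(regular), "bits, versus", 8 * len(regular), "bits raw")
```

Any other compressor could be substituted; a better one only tightens the upper bound, and no choice yields I(s) itself.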
The next important result is about the randomness of strings. Most strings are complex in the sense that they cannot be significantly compressed: I(s) is not much smaller than |s|, the length of s in bits. The precise statement is as follows: there is a constant K (which depends only on the particular specification of "program" used in the definition of complexity) such that for every n, the probability that a random string s has complexity less than |s| - n is smaller than K · 2^(-n). The proof is a counting argument: there are far fewer short programs than there are strings of a given length, so only a small fraction of strings can have a short program. This theorem is the justification for Mike Goldman's challenge in the comp.compression FAQ (http://www.faqs.org/faqs/compression-faq/).
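The counting argument can be made concrete in a simplified setting. The sketch below assumes that programs are themselves bit strings and that each program outputs at most one string, which corresponds to taking K = 1; the function names are illustrative. Among the 2^m strings of length m, at most as many can have complexity below m - n as there are programs shorter than m - n bits.

```python
from fractions import Fraction


def programs_shorter_than(k: int) -> int:
    """Count the bit strings of length 0, 1, ..., k - 1; there are 2**k - 1 of them."""
    return 2**k - 1 if k > 0 else 0


def compressible_fraction_bound(m: int, n: int) -> Fraction:
    """Upper bound on the fraction of m-bit strings with a program shorter than m - n bits."""
    return Fraction(programs_shorter_than(m - n), 2**m)


# The bound stays below 2**(-n) regardless of the string length m.
for n in (1, 8, 16):
    bound = compressible_fraction_bound(64, n)
    print(n, float(bound), bound < Fraction(1, 2**n))
```

Under these assumptions, fewer than one string in 256 can be shortened by more than 8 bits, and fewer than one in 65,536 by more than 16 bits.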
Now for Chaitin's incompleteness result: though we know that most strings are complex in the above sense, the fact that a specific string is complex can never be proven (if the string's length is above a certain threshold). The precise formalization is as follows. Suppose we fix a particular consistent axiomatic system for the natural numbers, say Peano's axioms. Then there exists a constant L (which only depends on the particular axiomatic system and the choice of definition of complexity) such that there is no string s for which the statement I(s) ≥ L can be proven within the axiomatic system.
Similar ideas are used to prove the properties of Chaitin's constant.