# Descriptions and Regularity

Objects in the world can be described by data. Such descriptions can come in different forms: picture and sound files, data base tables representing addresses or measurements, maps or databases representing them (think of navigation devices), descriptive texts and so on. If the structure of an object is complex, this will be mirrored in the complexity of descriptions of it.

One way to define the complexity of an object is to look at the amount of information needed to store a description of it. If the object has some inherent regularity, the amount of information is small. Take the following example, taken from Wikipedia[1], comparing two strings:

abababababababababababababababab

and

4c1j5b2p0cv4w1x8rx2y39umgw5q85s7

The first one can be described in English language as “ab, 16 times”. For the second, such a short description is not possible. It does not contain any obvious regularity. Probably that string itself is the shortest description of itself, it is not compressible. On the other hand, the description of the first string is shorter than that string itself. In this sense, the second string is a more complex object than the first one.

This idea can be generalized to arbitrary objects or data structures. We can try to exploit any regularity they contain in order to arrive at shorter descriptions. This is actually what data compression programs are doing. We can now define the complexity of an object as the length of the shortest complete description of the object. For the description to be complete, it must be possible to reconstruct the original Data from the description, i.e. decompress it.

Obviously, the length of the shortest description depends of the language used. Mathematicians have introduced such a measure of complexity called the Kolmogorov complexity (named after one of the mathematicians who came up with this idea in the 1960s). What is needed in order to define such a measure is a universal description language. It turns out that programming languages can be used for this purpose. You can describe any kind of data (like a picture, a text, a sound file, a web page, a table of numbers, a data set with addresses or physical measurement, a program etc.) by a program that generates that data (with no further input). Such a program might contain loops that represent repetition in the output (like in the first example above) or just contain a copy of the object to be produced as a constant. Which way ever, any kind of data can be represented by a program and any object from the physical world can be described by some data structure. In this way, it is possible to use programming languages as universal description languages. For example, there are some programs written in some programming language underlying all the different types of data on your computers or other devices as well as the internet servers we are using. For each file format containing a certain type of data, there are programs, like viewers and the like, that describe how to decode the information in those formats.

There are several different programming languages but it does not really matter which of them you are using. The exact numbers you come up with as the shortest length of a description will be different, but it is possible to show that when you switch from one programming language to another, the descriptions will, at most, become longer by a fixed value.[2]

The really important result about this type of complexity measure, however, is that one can prove that it is not Turing-computable (see the above-mentioned Wikipedia-article for more technical details and references to the literature). In a nutshell this means that there is no algorithm to find the shortest possible description for arbitrary data. Note that the data might be infinite structures, e.g. infinite strings. You can, for example write a program that produces a string, one character after the other. The question is: is there another, shorter program that produces the same output? What is the length of the shortest program that does? We might have an algorithm that answers this question for some cases, but there will always be cases for which it does not produce the correct answer. We might extend that algorithm to cover more cases, but the resulting procedure will again be incomplete.

To find a shorter program or at least to calculate that there must be one means that we discover another instance of regularity in the data. The result that the Kolmogorov-complexity is not Turing-computable means that there is no algorithm that can find all the regularity present in arbitrary data. All algorithms able to discover regularity are incomplete and special.

Put another way, this means that in the general case, to find order in data we need creativity.

From this result we can derive interesting philosophical consequences:

a)      If cognitive processes of humans can be described completely by an algorithm, as some researchers in the areas of cognitive science and especially of “artificial intelligence” seem to assume, then our ability to understand the world and to discover the regularity and order contained in it would be limited. Our ability to gain knowledge would not be universal. We would be limited like a fly bumping against a window pane again and again, lacking the ability to understand that there is glass. There could be instances of order in the world that we would be unable to understand because of the structure of our minds. Conversely (and that is the possibility I am favoring), if our ability to discover order is universal, we must be creative in the sense that our minds cannot be described as algorithms[3].

b)      Basically the same consideration applies to science. If science is able to discover arbitrary instances of order in the world, its method must contain an element of creativity. This means that it would be impossible to describe the scientific method as an algorithm. Any formal description of what scientists are doing would have to be incomplete, or else science cannot be universal. So there cannot be an a-priori scientific method that is fixed once and for all. Instead, the scientific method has to develop together with science, and must be extended; or any description of it will be incomplete, not completely explicit or vague in some places.

c)       Both cognition in general and science in particular contain a risk of error. There cannot be any procedure that guarantees that no errors are made. If such a procedure existed, one could find the shortest program describing the data by simply enumerating all programs, starting with the shortest, and checking the question if they describe the data correctly, using the error-checker procedure. As a result, the Kolmogorov complexity would be computable, which it isn’t. So the correctness of descriptions is not decidable. Truth of our knowledge cannot be guaranteed and we must always take the possibility of error into consideration. We may be able to guarantee the truth of descriptions of finite objects described with a finite resolution, but not of descriptions of arbitrary aspects of reality. Put another way, inductive inferences always have a risk of error.

d)      As a result of this, cognition in general and science in particular, must be open to criticism and self-reflection. Science must be public and open to public debate or else we run the risk of pathological science where groups of people decide among themselves what they regard as true, who is getting which position, who can publish what in which journal etc.

e)      Science is a process that cannot be described completely and explicitly. It is a historical process with methods that are historically developing and changing. In order to get a proper understanding of science itself, scientists must come to terms with the situation that not everything in reality is working according to fixed, unchangeable laws. Science itself is an example of a process where this is not the case. As a result, human culture in general, of which science is a part, is such a historical object as well. We have to understand that a complete understanding of such “objects” as science, culture and human beings is impossible and that any description of them will necessarily be incomplete.

f)       The attempt to extend the scientific method, as developed in physics, to such areas, is naïve and bound to fail. Scientists may react to this in two ways. One way is to stick with the limited definition of science and accept that science, looking for general laws and excluding historically changing systems, is limited (not as much as the fly bumping against the window pane, but in an analogous way, i.e. science is not universal). They would then accept that what there are other aspects of reality that they have to leave to the philosophers and the humanities. The other way would be to extend the meaning of the word science and integrate those areas. There would still be different areas and necessarily a multitude of methods and partial approaches, but the borders would become more fluid, making it easier to tackle complex problems in an “interdisciplinary” way.

(The picture is from (http://en.wikipedia.org/wiki/File:Mandelpart2_red.png). See the Wikipedia article linked in the foot note for an explanation).

[2] See the Wikipedia article referenced in the foot note for details on this “invariance theorem”. The reason the choice of programming language does not really matter is that you can use any programming language to write an “interpreter” for another programming language, so you can implement one in terms of the other.

[3]It might still be possible to model our minds in terms of computer programs. This is so because it has been shown that computers are not limited to what Turing-machines can do (see https://creativisticphilosophy.wordpress.com/2013/02/06/an-e%EF%AC%80ective-procedure-for-computing-uncomputable-functions/) and that instead it should be possible to implement creative system in computers, i.e. systems for which no description in terms of an algorithm exists.