Occasionally in this series of articles, and in this blog, I have been writing about “formal theories”. I don’t intend to introduce a formal definition of what a formal theory is here. I only want to provide an intuitive understanding. So this article is not meant for mathematicians or logicians or computer scientists, who might find it by far too imprecise. But to get these things really formally exact, I would have to write a book about formal logic, mathematics, and programming. I don’t want to do that. Excellent books on these topics exist already and I do not need to write another one. For my purposes here, an intuitive, though maybe slightly vague, understanding of the terms “formal theory” and “formal system” will do.
One can think of a formal theory as a set of rules to produce (or to justify) propositions or statements. In most cases, there are some propositions to start with, called “axioms” and some rules of inference. By applying rules of inference to the axioms, additional statements can be “derived”. By applying rules of inference to axioms or statements derived already, still further statements can be derived. Mathematicians and logicians have come up with variants of this general scheme, but it is not necessary to go into these details here.
The statements that can be derived inside a given formal theory are called “theorems” or “formulas” of that theory. The sequence of derivational steps leading to a theorem (starting from the axioms) can be viewed as a “proof” of the theorem within the theory. Typically, rules of formal logic are part of such systems and systems of formal logic can be viewed as examples of formal theories. Formal theories are called consistent if you cannot derive a contradiction inside them. The application of a rule of inference to derive statements must be computable, so one can think of these rules as little programs and the statements as sequences of characters manipulated by those programs. However, you can also think about inference rules and proofs in a non-procedural way, as justifications for theorems.
Logicians have come up with several methods of assigning interpretations or meaning to the statements derived within a formal theory (i.e. there is something like a “semantics” of formal theories), but I don’t want to deal with that rather complex topic here. The derivation process does not rely on the semantics; it can be viewed as manipulations of strings of characters, only depending on the form, or syntax, of the statements. This is the reason such theories are called “formal”.
I have sometimes been using the terms “algorithm” and “formal theory” more or less interchangeably. The reason is that for an algorithm calculating a function (let’s call it “f”), one can construct a corresponding formal theory. If the inputs of the function, say a, b, c … (any kind of data, e.g. natural numbers) yield values i, j, k, the corresponding formal theory would produce statements of the kind “f(a) = i”, “f(b) = j”, “f(c) = k”, and so on. These statements mean that “the function f assigns the value i to the input value a” and so on.
On the other hand, if you have a formal theory, you can write an algorithm that enumerates all the theorems that are derivable in that formal theory. So the two notions, formal theories and algorithms, can be viewed as different ways to express the same thing. Formal theories are another way to define what is computable.
In a wider sense, there is a concept of “formal systems” in which derivational rules (which must be computable) are used to produce strings of characters from other strings of characters. In formal theories such strings are interpreted as statements about something. In the more general notion of formal systems, they can be interpreted as representing arbitrary data. Again, this is another way to define what is computable. Algorithms and formal theories can both be viewed as “formal systems” in a general sense of the term.
Roughly speaking, when you read “algorithm” you may think of a box where you put some data in at one side and some data comes out at the other. When you read “formal system” you may think of “rules” that can be applied to data. Each of these can be thought of as a little “algorithm box”. In formal theories, the data consists of statements or propositions and the rules represent general rules of logic and/or some more specific kind of knowledge (e.g. mathematical knowledge or knowledge about laws of physics).
In all cases, formal systems each contain only a finite amount of information. They can be represented by finite strings of characters written using some system of notation (a “formal language” that itself is defined in terms of some formal system, e.g. a “formal grammar”). One can think of this information as knowledge that describes how to do some type of calculations.
One point is important: The expressions or theorems derivable in formal systems and formal theories are enumerable. One can systematically apply the rules of inference to generate all the theorems or expressions derivable in a formal system. They can be brought into a unique sequence based on their derivation in the formal system. It is then possible to write a program that takes natural numbers as inputs and that for each number n entered produces the nth expression of the formal system or the nth formula or theorem produced. One can replace these expressions or theorems by their Gödel numbers, so for a formal system, we can construct a corresponding computable total function having natural numbers as inputs and outputs. The proof sketched here that the set of total computable functions of natural numbers is not enumerable can therefore be generalized to arbitrary formal systems and formal theories.
(The picture is from https://commons.wikimedia.org/wiki/File:P._Oxy._I_29.jpg. It is showing an early fragment of Euclids “Elements”, representing an early stage in the development of the ideas that lead to notions of mathematics, formal logic and formal systems in general.)