In my previous article, I argued that cognition does not have a fixed, unchanging structure. Instead, I think it starts with what I have called a “developmental core”. This core may be viewed as a body of initial knowledge that is pre-programmed into the structure of the neuronal networks. It is then extended and changed in subsequent development. Some parts of it, especially those concerned with low-level sensory processing and with the motor neurons controlling the muscles, might not change much; they might only be fine-tuned by learning processes. Other parts have high plasticity and are restructured during learning.
How much information does the genetically pre-programmed core contain?
If the brain’s structure is encoded in some genes, the information content of these structures cannot be larger than the information content of these genes. The generation of the neuronal network under the control of these genes can be viewed as a process of information transformation that cannot generate new information that is not already contained or encoded in some form in the genes. Therefore, the information content of these genes can give us an estimate for an upper bound on the information needed to describe the genetically determined neuronal networks and hence an upper bound on the amount of innate information.
Since there are many more neurons than genes and coding triplets in those genes, a lot of their connectivity must be repetitive. Indeed, the brain’s cortex consists of neuronal columns that show a lot of repetitive structure. Likewise, other parts of the brain contain highly repetitive structures.
If one were to describe the innate brain circuitry, i.e. that found in a newborn (or developing in the small child through the subsequent maturation of genetically determined structures), and compress that description to the smallest possible size to determine its information content, that information content could not be larger than the information content of the genes involved in its generation.
The brain structure might contain random elements (i.e. new information created by random processes during its development) and also information taken up from the environment through processes of perception, experimentation and learning, but this additional information is, by definition, not part of the innate, genetically determined structures.
So the complexity of the innate structures or the innate knowledge, i.e. the complexity of the innate developmental core of cognition, must be limited by the information content of the genes involved in generating the brain.
The following is a very crude estimate of the informational complexity of the innate knowledge of human beings. To be more exact, it is a crude estimate of an upper limit to the information content of this knowledge. It might be off by an order of magnitude or so. So this is a “back of an envelope” or “back of a napkin” kind of calculation. It merely indicates the direction in which a more accurate calculation could go. To get a more exact number, the parameters put in here as mere estimates must be determined with more precision. This can be done by researching the relevant literature (for which I lack the time) and, where the required information has not yet been determined by science, by doing additional genetic and neurological research (for which I am not qualified). So I am leaving it to others to follow this approach to get a more exact result.
According to the Human Protein Atlas (http://www.proteinatlas.org/humanproteome/brain), about 67 % of the human genes are expressed in the brain. Most of these genes are also expressed in other parts of the body, so they probably form part of the general biochemical machinery of all cells. However, 1223 genes have been found that have an elevated level of expression in the brain. In one way or another, the brain-specific structures must be encoded in these genes, or mainly in these genes, or in a subset of them. Some of these genes are probably not directly involved in determining the distribution and connectivity of neurons and might instead form part of the brain-specific cellular infrastructure underlying those networks, but in one way or another, the innate knowledge must be encoded in these genes, whose combined activity somehow leads to the development of the brain. One source of error might be that the study did not look at genes active in the fetus or in the small child’s developing brain (I don’t know), so it is possible that some more genes are involved here, but for the sake of this estimate, I assume that the innate information is somehow represented in these genes.
There are about 20,000 genes in the human genome. So the 1223 genes make up about 6.115 % of our genes (by number); in other words, about 6.115 % of our genes are brain-specific. We probably share many of these with primates and other animals, like rodents, so the really human-specific part of the brain-specific genes, or the human-specific part of their sequences, might be much smaller. However, I am only interested here in an order-of-magnitude result for an upper limit.
I have no information about the total length of these brain-specific genes, so I can only assume that they have average length.
According to https://en.wikipedia.org/wiki/Human_genome, the human genome has 3,095,693,981 base pairs (of course, there is variation here, and information on Wikipedia always has to be taken with some caution). According to the same source, only about 2 % of this is coding DNA. There is also some non-coding DNA that has a function (in regulation, or in the production of some types of RNA), but let us assume that the functional part of the genome is maybe 3 %. That makes something on the order of 92 to 93 million base pairs with a function (probably fewer). That corresponds to 30 to 31 million triplets (remember that base pairs work in groups of three, each group coding for an amino acid or acting as a start or stop signal for translation). If the brain genes have average length, 6.115 % of this would be brain-specific. That is something like 1.89 million triplets.
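The chain of multiplications in this step is easy to lose track of, so here is a minimal Python sketch of it. The input numbers are the ones quoted in the text; the function and constant names are my own and purely illustrative, and the 3 % functional fraction is an assumption, not a measured value.

```python
# Inputs quoted in the text; all of them are rough estimates.
GENOME_BASE_PAIRS = 3_095_693_981  # total base pairs (figure quoted from Wikipedia)
FUNCTIONAL_FRACTION = 0.03         # ASSUMPTION: share of the genome with a function
TOTAL_GENES = 20_000               # approximate number of human genes
BRAIN_ELEVATED_GENES = 1223        # genes with elevated expression in the brain


def brain_specific_triplets(
    genome_bp=GENOME_BASE_PAIRS,
    functional_fraction=FUNCTIONAL_FRACTION,
    brain_genes=BRAIN_ELEVATED_GENES,
    total_genes=TOTAL_GENES,
):
    """Estimate the number of triplets in brain-specific genes.

    Assumes (as the text does) that brain-specific genes have average length,
    so their share of the functional sequence equals their share by gene count.
    """
    functional_bp = genome_bp * functional_fraction  # base pairs with a function
    triplets = functional_bp / 3                     # three base pairs per triplet
    brain_share = brain_genes / total_genes          # about 0.06115
    return triplets * brain_share


if __name__ == "__main__":
    print(f"brain-specific triplets: {brain_specific_triplets():,.0f}")
```

Changing `FUNCTIONAL_FRACTION` or the gene counts shows directly how sensitive the result is to each guess; the outcome scales linearly with every one of them.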
The different triplets code for 20 different amino acids. There are also start and stop signals. The exact information content of a triplet depends on how often it appears, and triplets are definitely not equally distributed, but let us assume that each of them codes for one out of 20 equally likely possibilities (calculating the exact information content of a triplet would require more sophisticated reasoning and specific information about the frequency distribution of triplets and hence of amino acids, but for our purposes this is enough). Since a uniform distribution maximizes the information per symbol, this assumption errs on the side of an overestimate, which is acceptable for an upper bound. The information content of a triplet can then be estimated as the binary (base-2) logarithm of 20. You need 4 bits to encode 16 possibilities and 5 bits to encode 32 possibilities, so this should be between 4 and 5 bits. A more exact value for the binary logarithm of 20 is 4.322. Multiplying this by the number of triplets gives about 8.2 million bits. This is a little over one million bytes, or roughly a megabyte (something like 130 times the length of this blog article, comparable to the information content of a typical book). These genes might contain a lot of redundancy, in the sense that it might be possible to compress a complete description of these sequences (i.e. to “zip” them) to a smaller amount of information.
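The conversion from triplets to bits and bytes can be sketched in a few lines of Python. The triplet count of 1.89 million is the rounded figure from the text. The helper names are my own, and the skewed distribution at the end is made up purely for illustration (it is not real codon or amino acid frequency data); it only demonstrates that any non-uniform distribution yields fewer bits per triplet than the uniform assumption.

```python
import math


def information_content(triplets, alternatives=20):
    """Return (bits, bytes) assuming `alternatives` equally likely symbols."""
    bits = triplets * math.log2(alternatives)  # log2(20) is about 4.322
    return bits, bits / 8


def shannon_entropy_bits(probabilities):
    """Shannon entropy in bits per symbol for a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    bits, n_bytes = information_content(1.89e6)  # rounded triplet count from the text
    print(f"{bits:,.0f} bits, {n_bytes:,.0f} bytes")

    uniform = [1 / 20] * 20
    # HYPOTHETICAL skewed distribution: one symbol takes half the probability mass.
    skewed = [0.5] + [0.5 / 19] * 19
    print(f"uniform: {shannon_entropy_bits(uniform):.3f} bits per triplet")
    print(f"skewed:  {shannon_entropy_bits(skewed):.3f} bits per triplet")
```

With real frequency data in place of the made-up `skewed` list, the same entropy function would give a tighter per-triplet figure than the uniform 4.322 bits.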
So the information content of the brain-coding genes that determine the structure of the brain is on the order of a megabyte (much smaller than many pieces of commercial software), and possibly much smaller. The structure of the brain is somehow generated out of the information contained in these genes. This is probably an overestimate, because many of these genes might not be involved in encoding the connective pattern of the neurons but, for example, in the glial immune system of the brain or other brain-specific, “non-neuronal” stuff, and many of them might be the same or nearly the same in apes, monkeys and rodents, so the human-specific part could be much smaller still.
This also means that the minimum complexity of an artificial intelligent system capable of reaching human-type general intelligence cannot be larger than that.
We should note, however, that human beings who learn and develop their intelligence are embedded in a world they can interact with through their bodies and senses, and that they are embedded in societies. Most of the knowledge encoded in the brain of an adult who is part of modern society comes not from the genetic core but from our cultures. These societies are the carriers of cultures whose information content is larger by many orders of magnitude. The question is whether it is possible to embed an artificial intelligent system into a world and a culture or society in a way that enables it to reach human-like intelligence. This also raises ethical questions that are beyond the scope of this particular article. It might be possible to embed it into the internet, but the kind of cognitive development it would undergo as a result could be very different from that of a human being.
In any case, if this line of thought is correct, the total complexity of the innate knowledge of humans can hardly exceed the amount of information contained in an average book, and is probably much smaller. It cannot be very sophisticated or complex.
(The picture, displaying the genetic code, is from https://commons.wikimedia.org/wiki/File:GeneticCode21-version-2.svg. I have published a draft version of this article before here: https://denkblasen.wordpress.com/2015/12/18/estimating-the-complexity-of-innate-knowledge/).