Towards mathematical understanding of intelligence

# Summary

Research in artificial intelligence dates back at least to Alan Turing's ideas in the 1950s. However, there is still no accepted definition of what artificial intelligence is. Turing's own definition, the Turing test, is empirical rather than mathematical: it is a procedure involving a human examiner, and it sheds no light on intelligence unless one accepts the concept of "human examiner" as given.

It is not obvious that there should be a definition of intelligence in the rigorous mathematical sense. In fact, most everyday objects do not and cannot have such a definition: for example, one would be hard pressed to provide a mathematical definition of "table". However, it is my personal conviction that intelligence is a fundamental concept which should be mathematical or at least metamathematical. It also seems natural to seek this definition in the domain of theoretical computer science, since intelligence appears to be a kind of computational process.

To arrive at a definition, it is sometimes useful to examine the principal properties of the concept. I suggest considering three such properties of intelligence:

• The ability of intelligent agents to recognize patterns in their observations, or the laws governing the universe they inhabit
• Their ability to continue such patterns, or form predictions, i.e. to guess future observations given past observations
• Their ability to solve problems of very general form. This stands in stark contrast to artificial machines and algorithms, which are always limited to solving a very narrow range of problems

The notion of pattern recognition can be formalized via the theory of Kolmogorov complexity. In this language, finding the pattern in given data (e.g. a string of bits) means finding the shortest possible program which produces this data as output. For example, a random string of bits carries no pattern, so essentially the only way to produce it is by hard-coding it into the program. On the other hand, a string of the form 010101010101…01 can be produced by outputting 01 in a loop of appropriate length. This is a much more compact description, which reflects the fact that this string follows a simple pattern. Unfortunately, finding the shortest program is in general an uncomputable problem: there is no algorithm which solves it, a fact closely related to the Berry paradox.
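Kolmogorov complexity itself is uncomputable, but a sense of the idea can be had from an off-the-shelf compressor, whose output size is a crude, computable upper bound on description length. A minimal sketch (using zlib purely as a stand-in for "shortest program"):

```python
import zlib
import random

def description_length(s: bytes) -> int:
    # Compressed size serves as a crude, computable upper bound on
    # Kolmogorov complexity (the true quantity is uncomputable).
    return len(zlib.compress(s, 9))

patterned = b"01" * 500  # "0101...01": follows a simple pattern
random.seed(0)
noise = bytes(random.getrandbits(8) for _ in range(1000))  # no pattern

print(description_length(patterned))  # tiny: the loop-like pattern compresses well
print(description_length(noise))      # about 1000: essentially incompressible
```

The patterned string compresses to a handful of bytes, mirroring the "output 01 in a loop" description; the random string does not compress at all, mirroring the need to hard-code it.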

An "ideal" theory of prediction was suggested by Solomonoff in 1964. This theory suggests using Bayesian inference where our prior assumption on the "universe" we're trying to predict is, more or less, that it's generated by a random program. The notion of a random program is formalized using a universal prefix-free computer, as in the definition of Chaitin's omega. In particular, the probability of a given program falls exponentially with its length, so Solomonoff inference is a mathematical formalization of Occam's razor: the simpler an explanation, the more probable it is. It is easy to prove that making predictions with this prior allows one to predict any computable sequence, given enough bits from the beginning of the sequence. Unfortunately, this procedure is also uncomputable.
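The Bayesian mechanics can be illustrated on a toy scale. The sketch below replaces "all programs" with a tiny hypothesis class (short bit-strings repeated forever), keeping the Solomonoff-style prior weight 2^(-length); everything else about the class is my own illustrative assumption:

```python
from fractions import Fraction

def hypotheses(max_len=4):
    # Toy stand-in for "all programs": each hypothesis is a short seed
    # repeated forever, with prior weight 2^(-length) as in the
    # Solomonoff prior's exponential penalty on program length.
    for n in range(1, max_len + 1):
        for k in range(2 ** n):
            yield format(k, f"0{n}b"), Fraction(1, 2 ** n)

def predict_next(observed: str) -> str:
    # Bayes rule: keep hypotheses consistent with the observations,
    # then take a prior-weighted vote on the next bit.
    votes = {"0": Fraction(0), "1": Fraction(0)}
    for seed, prior in hypotheses():
        generated = seed * (len(observed) // len(seed) + 2)
        if generated.startswith(observed):
            votes[generated[len(observed)]] += prior
    return "0" if votes["0"] > votes["1"] else "1"

print(predict_next("010101"))  # → "0": the simple alternating pattern dominates
```

The shortest consistent seed ("01") carries the largest prior weight, so the prediction follows the simplest explanation, exactly the Occam's-razor behavior described above.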

In 2006 Shane Legg published an article discussing general-purpose prediction algorithms. He shows that it is impossible to construct a universal predictor, i.e. a predictor which can predict any computable sequence given enough bits from its beginning. This is easily seen by constructing a "screw-you" sequence which runs a given predictor and outputs the opposite of its prediction. Nevertheless, it is possible to construct a predictor applicable to all sequences with Kolmogorov complexity below some fixed bound K. The minimal Kolmogorov complexity of such a predictor turns out to be roughly equal to K. However, given any fixed system of axioms (e.g. ZFC) and for sufficiently high K, it is impossible to construct such a predictor whose predicting ability is provable in this system of axioms. Thus powerful predictors are not only complex but also unprovable.
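The diagonalization behind the "screw-you" sequence is simple enough to write down directly. A minimal sketch (the example predictor is my own choice; any computable predictor would be defeated the same way):

```python
def adversarial_sequence(predictor, length):
    # The "screw-you" construction: run the predictor on the bits so far
    # and emit the opposite of its guess, so it is wrong at every step.
    # Note the sequence is itself computable whenever the predictor is.
    bits = []
    for _ in range(length):
        guess = predictor(bits)
        bits.append(1 - guess)
    return bits

def majority_predictor(history):
    # An arbitrary computable predictor: guess the majority bit so far.
    return 1 if sum(history) * 2 > len(history) else 0

seq = adversarial_sequence(majority_predictor, 10)
errors = sum(majority_predictor(seq[:i]) != seq[i] for i in range(10))
print(errors)  # → 10: the predictor errs on every single bit
```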

Arguably, even intelligent agents can only solve problems whose solutions can be efficiently verified; otherwise, how can the agent know it has solved the problem? For example, proving theorems is a formidable task, but verifying formal proofs can easily be done by an efficient algorithm. In complexity theory, the notion of a problem with an efficiently verifiable solution is formalized via non-deterministic algorithms. The universal problem-solving ability of intelligent agents can thus be naively formalized as the ability to efficiently find solutions to problems whose solutions can be efficiently verified. This is exactly the most famous open question in computer science: is P = NP? Most scientists think the answer is negative; thus, there are no universal problem solvers in this sense.
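The verification/search asymmetry is concrete in the case of Boolean satisfiability, the canonical NP-complete problem. A minimal sketch (the formula is a made-up example): checking a candidate assignment takes time linear in the formula, while the only known general way to find one is exhaustive search.

```python
from itertools import product

# A CNF formula: a list of clauses, each literal a (variable, polarity) pair.
# This example encodes (x0 or not x1) and (x1 or x2) and (not x0 or x2).
formula = [[(0, True), (1, False)], [(1, True), (2, True)], [(0, False), (2, True)]]

def verify(formula, assignment):
    # Efficient verification: one pass over the formula.
    return all(any(assignment[v] == p for v, p in clause) for clause in formula)

def solve(formula, n_vars):
    # Search: no polynomial-time general method is known,
    # so we fall back on trying all 2^n assignments.
    for bits in product([False, True], repeat=n_vars):
        if verify(formula, bits):
            return bits
    return None

print(solve(formula, 3))
```

P = NP would mean, roughly, that `solve` never has to be exponentially slower than `verify`.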

Shane Legg & Marcus Hutter published an article in 2007 in which they provide what is, to my knowledge, the only attempt so far at an actual mathematical definition. Their definition is quantitative rather than qualitative, i.e. it does not aim to separate intelligent "things" from unintelligent "things" but rather to measure the quantity of intelligence: a sort of abstract IQ test. The framework they consider is "black boxes" with abstract input & output, a special part of the input being the utility function the agent is supposed to be maximizing. The intelligence of the agent is then defined to be its expected total utility when placed in a universe generated by a random program. In my opinion this approach has two weaknesses. One is its purely quantitative nature, which I feel is insufficient. Another is the lack of complexity considerations: the black box is purely abstract, without any computational model. The latter is acknowledged by the authors, but they do not consider it a conceptual problem. Finally, the authors define the most intelligent possible agent, called AIXI (actually constructed earlier by Hutter), but it is again uncomputable.
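From memory of the paper, their universal intelligence measure takes roughly the following form (the exact notation here is my own reconstruction and should be checked against the original):

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}
```

where $\pi$ is the agent, $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the agent's expected total reward in $\mu$. The $2^{-K(\mu)}$ weighting is the same Occam-style prior that appears in Solomonoff induction: simple universes dominate the score.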

Another interesting attempt at modeling intelligence is Juergen Schmidhuber's Goedel machine, published in 2003. It is not so much a general mathematical definition as an example of an intelligent machine. The Goedel machine uses a similar input/output/utility framework, but it is an actual Turing machine, with an instruction set allowing it to reprogram itself in arbitrary ways. The machine runs a (Levin algorithm) proof search which looks for proofs of theorems of the form "if I am reprogrammed in way X my expected utility will increase". Once it finds such a proof, it performs the reprogramming. Since everything is reprogrammable, including the reprogramming algorithm, the Goedel machine is a metalearner: not only does it learn how to maximize its utility function, it also learns how to learn, learns how to learn how to learn, etc. In order to reason formally, it requires a specific axiom system, in particular some prior assumption about the universe. This can be e.g. the Solomonoff prior or the speed prior invented by Schmidhuber. The limitations of this construction, in my view, are its inability to transgress the axiom system it started with & the fact that for certain universes we might get boring behavior. The latter seems to me to imply that this model of intelligence is unsatisfactory unless supplemented with a characterization of "interesting" universes. Of course the construction is completely impractical: the proof search is very slow. Given time it will reprogram itself into something faster, but it would take a zillion years.
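The core loop can be caricatured in a few lines of pseudocode. This is a highly schematic, non-runnable sketch; the names and the shape of the loop are my own illustrative assumptions, not Schmidhuber's actual construction:

```python
# Schematic pseudocode only -- not runnable, not the real Goedel machine.
def goedel_machine(initial_policy, axioms):
    policy = initial_policy
    while True:
        # Levin-style systematic proof search over the axiom system,
        # which includes a prior assumption about the universe.
        theorem = prove_next_theorem(axioms)
        # Act only on theorems of the form:
        # "rewriting my code to new_policy increases expected utility".
        if theorem.kind == "self-improvement":
            policy = theorem.new_policy  # global self-rewrite,
            # including the proof searcher itself -- hence metalearning
        policy.step()  # meanwhile, keep interacting with the environment
```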

All of the uncomputable problems mentioned above are computable in a weaker sense, namely in the sense of asymptotic computation. That is, we can construct programs which generate a sequence of "improving guesses" about the answer which eventually converges (but the program doesn't know it has already converged). I think this is a realistic model of intelligence: when looking for a pattern or predicting, we can never be sure we won't find a better answer by thinking for a longer time. The question is then whether these problems can be asymptotically solved efficiently, i.e. with time that is polynomial in the execution time of the model (the program representing the pattern or the "Occam" model we use for prediction). It turns out the answer is positive if we allow an NP oracle: a magical black box which can solve some NP-complete problem in polynomial time. Thus we meet the P = NP question again. This leads me to speculate that maybe the P = NP problem is so hard *because* it is key to understanding the nature of intelligence.
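The "improving guesses" structure can be sketched concretely. The toy below searches for ever-shorter descriptions of a string within a restricted class (repeated seeds, my own simplification); in this restricted class the search actually terminates, but the anytime shape of the computation, where each guess is at least as good as the last and no finite prefix certifies convergence, mirrors the general asymptotic scheme:

```python
def improving_guesses(data: str):
    # Anytime search: yield ever-better "descriptions" of the data.
    # Each yielded seed, repeated, reproduces the data; later guesses
    # are never worse. In the general Kolmogorov setting the stream
    # converges, but the consumer can never be sure it has converged.
    best = data  # trivial description: hard-code the data itself
    yield best
    for n in range(1, len(data)):
        seed = data[:n]
        if (seed * (len(data) // n + 1))[:len(data)] == data and n < len(best):
            best = seed
            yield best

for guess in improving_guesses("010101010101"):
    print(len(guess))  # description lengths only ever shrink
```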

To summarize, there is no satisfactory definition yet; however, there are strong connections between properties of intelligence and natural concepts in theoretical computer science. Solomonoff inference and the works of Hutter, Schmidhuber and Legg provide important pieces of the puzzle. My personal impression is that further progress will come from the P vs. NP question.