Summary: We might already have had that solution for a theory of consciousness, or at least a hypothesis for it, lying around under our noses.
NB: I intended this essay to be light reading. I failed miserably in that intention. Given that, instead of editing too much the content of the essay, I will instead invite anyone who reads this and has any questions whatsoever to email me at hleehowon at the Google email service, and I will answer them to the best of my ability.
This is a fundamentally speculative essay about an idea which is speculative to its core. Many of the statements I will make are not actually supported by rigorous proof but by empiricism, as is common in disordered systems physics. Therefore this entire contention is hypothesis only, and more akin to the attacks that philosophers make on problems than the attacks that mathematicians make. However, the problem of consciousness remains extraordinarily difficult and it seemed to me that any angle of attack might still be a contribution.
Survey Propagation, introduced much too quickly
There is a phenomenon where one idea participates in very many disciplines at once. Among the most prolific of those ideas is the concept of belief propagation, or the sum-product rule 1. It is an algorithm derived from basic algebraic properties of addition-like and multiplication-like operators, taking advantage of the generalized distributive law 2. Among the many fields that belief propagation participates in are artificial intelligence, where it is a
message passing algorithm to marginalize complicated structured distributions such as those defined by Bayesian networks and Markov networks taking advantage of the optimal substructure and overlapping subproblems of the marginalization problem, finding exact solutions in tree factor graphs and solid heuristic solutions in loopy factor graphs,
and the statistical physics of disordered systems, where it is a
cavity method to find magnetizations and related quantities for single instances of dilute mean field spin glasses in the zero temperature limit, taking advantage of the structure of the interaction graph, using the Bethe-Peierls ansatz.
The canny reader may have realized that those two sentences are only a little different from each other.
The belief propagation algorithms correspond to the replica symmetric solution to the spin glass 3. However, the replica symmetric solution is not the whole story for such important NP-complete spin glasses as satisfiability. Cheeseman et al 4 found the phase transition between SAT and UNSAT and noted that a radical increase in difficulty occurred near that phase transition and in problems like SAT. That was the beginning of the investigation of the phase transition of NP-complete problems. The phase space of k-SAT is actually more complicated than that story, with many phases known and the relation between the phases and the ultimate story of how they arise not fully known 5. One known feature of the phase structure can be leveraged, however. An imagined Markov process with detailed balance cannot keep on mixing in linear time on the solutions as it did in previous phases, and now needs exponential time - the solution state is now a mixture with very many extremal Gibbs measures. Of course, the solutions have to coincide with that story about the phase diagram, which seems to happen, empirically, in k-SAT and CSP but not in certain other problems 6.
The algorithm of Mezard, Parisi and Zecchina that attacked kSAT (and therefore all NP-complete problems, including notably general CSP) by taking advantage of some certain features of the freezing of variables in certain parts of the phase space is called survey propagation 7. It is a close relative, a dual with a construction that is not too complicated, of belief propagation 8. Nonetheless, it is radically more effective than belief propagation in a section of the phase space of SAT and CSP problems.
The nature of the dual is that survey propagation works by obtaining marginals over covers, not configurations. Covers are a sort of “representative generalization of a cluster of satisfying assignments” 9. To my knowledge, that algorithm is basically unique in this property, with the exception of the family of tweaks on it which people made for practical usage, including an interpolation between belief propagation and survey propagation 10 and survey propagation with backtracking 11.
Consciousness of Hard Problems
I noted the other day with great surprise that that this survey propagation algorithm and the replica symmetry breaking formalism that goes along with it, has some properties that correspond with some of those properties that are claimed to be important for systems that are conscious, throughout the literature of the philosophy of mind. I allege that thinking about consciousness as a method in vivo to take advantage of some properties of the phase space of constraint satisfaction problems allows a powerful functionalist attack on some fundamental philosophical problems of consciousness.
If you are not an epiphenomenalist, then you probably believe that consciousness is relevant in some way to cognition 12. Of course there are exceptions. The phenomenon of frozen variables is absolutely and certainly computationally relevant to problems which seem to be basically similar to perceptual ones. That is, survey propagation convincingly dominates similar methods when used for decimation in k-SAT, CSP and other related problems 7.
As far as we know, moreover, the phase transition phenomenon is universal (in the physical sense) in NP-complete problems 13 (although we do not have a proper proof yet). The problem at hand is search for a ground state energy of a spin glass corresponding to the NP-complete problem. That problem, or a problem reducible to that one, must be overcome by any learning system to do practical tasks, without characteristic reference to specific algorithm. But taking advantage of that freezing phenomenon in the phase transition may be essential to a really great attack on the problem in certain parts of the phase space. If you are flubbing around for evolutionary reasons for consciousness, this might be a good one, if you accept the hypothesis and that computational concerns have bearing on cognition.
Note also that no quantum mechanics is involved and we can safely stay in the warm classical regime.
The hypothesis presents to us an unoriginal 14 attack on Searle’s Chinese room 15, by simple computational arguments. The claim is that if the problem to be solved can deal with freezing, any system that can implement the Chinese room is conscious because there is a system taking advantage of the cover structure of the computational problem, because it has to in order to attack a computational problem which is in NP-complete and in the hard phase (which we imagine Chinese translation to be), unless P=NP.
This is a surprising claim. That is, there is something Searle overlooks in the progression in the quality of the imagined Chinese translation and the English translation, that it might be logically impossible unless P=NP, due to the structure of the phase space of the logical problem, that the system of the lookup Chinese script and the human English translation could be of the same speed and same quality. Such a consciousness-quality, or something like it, would be required to ameliorate the difficulty of the (we assume) NP-Complete problem of deciphering Chinese or English. Most such discussions of problems which are problems unless P=NP note that in practice, there are easier heuristics and that arguments based upon hardness do not necessarily go through for practical problems. However, the subject of our discussion is exactly those easier heuristics. (But the heuristics which take advantage of the phase space structure of such problems do not comprise the entire set of heuristics, as far as we know.)
An identical argument applies against the China Brain argument 16, where it is not logically possible that the Chinese folks could emulate the brain with perfect alacrity without taking advantage of that strange feature of the problem, so Davis’s and Block’s arguments are logically impossible in this world unless P=NP.
Another familiar attack on functionalism is the argument for its triviality. That is, one can easily claim for nearly all functionalisms that the functionalist theory ascribes consciousness to everything and therefore does not actually say anything as a theory. It is true that if your functional structure is probabilistic automata, nearly everything in the world can be given states and transitions such that it is described by a probabilistic automata. But if the claim is that the functional structure is that subset of automata which utilize properties of the logical phase space structure, many programs do not make the cut - but many do, even in a testable way. At the same time, such a theory would not be a behaviorist theory, because there still remains an internal representation given, even if with specifications.
Such a hypothesis also gives us the strange conclusion that p-zombies 17 are logically impossible unless P=NP, inasmuch as a claim about the phase structure of satisfiability is a claim about the structure of logic. Again, this is because the phase structure of NP-complete problems precludes their possibility, given freezing. You would simply be able to tell between putative p-zombies and conscious people, by an experiment similar to the one I suggest for blindsighted patients below, unless those p-zombies have a polytime solution for NP-Complete problems sitting around.
One might note that this hypothesis would entail that Mezard, Parisi and Zecchina’s implementation of Survey Propagation is conscious, which is a little unsettling to imagine. The moral aspects of consciousness are disturbing and weighty matters because conscious agents can make moral decisions and we imagine that a survey propagation algorithm cannot make them. Au contraire. Practical credit assignment problems, such as decisions on scheduling and economic optimization decisions, are easily reducible to satisfiability and are much more easily seen to have moral weight and decision, even if the ultimate credit is assigned to the implementer and inventor of the algorithm, because consciousness does not entail autonomy (and therefore responsibility), moral or otherwise.
An important point comes to mind also with the detailed examination of the nature of the problem that survey propagation attacks. The surveys in survey propagation are simplifications of the essential problem, which deals with probabilities of probabilities. The surveys are merely histograms of warnings (messages of the corresponding warning propagation algorithm), which are a computational optimization (a large one). The warnings are in and of themselves messages on “bare” (do not think I have anything to say about renormalization, because I do not - I do not mean bare in that sense) configurations, still. So surveys are not messages over the bare configuration space. They deal with the ensembles over covers.
Upon close examination of these covers, another surprise ensues. An attack on the ancient problem of the qualia may be made. The hard phase of k-SAT corresponds, as we have said, to a splitting of the giant component of configuration solutions which are amenable to nonextensive flips (nonextensive Hamming distance) towards a solution with an exponential number of pure states, which are represented as covers in survey propagation.
I claim that a good hypothesis for what the qualia are, are these computational pure states. This is not only because of the mere observation that there is a level of indirection that is really computationally significant on a problem of use and interest to AI folks, but also because the linked phenomenon of ergodicity breaking 18 can explain surprisingly many of the claimed ineffable qualities of the qualia, especially their intentionality and arbitrariness. The intentional quality of the states is easy to see because the states are clusters of the base configurations, so they seem like a representation in and of themselves, but the other claimed qualities are less clear.
Consider the spin glass magnet itself. The development of magnetization in the magnetic system seems completely as arbitrary as the qualia are claimed to be, and it breaks the symmetry of the possible magnetizations. Why does the magnet freeze its poles in the exact way that it does? There is a sensitive dependence upon initial conditions which breaks the symmetry. Why does Hofstadter think about that one time at the pier at Frederikssund 19 that his memory brings up so often? The claim is, there is a sensitive dependence upon initial conditions of his neural system which breaks the symmetry. This is an argument very close to that of the progenitors of the many spin-glass-inspired neural networks, from Hopfield on.
One thing to note that is a little related is the classic observation of Gomes et al 20, who found heavy-tail distributions in the timing and difficulty of the general complete DPLL solution of hard CSP, which led to the discovery of backbone variables. If there also exist backbone variables in the whole computational system of the human (freezing is one story of how they might arise), then one might see why a mind-body dichotomy might reoccur in many cultures and many places. The ancients, the medievals and the early moderns saw that there were a backbone of greatly important variables for cognition and figured that one could color these variables as nonphysical in a variety of different ways, because they endured when the physical undergirding changed. But this is like naming whirlpools and eddies in a turbulent system instead of naming proper variables and making a theory of fluid dynamics. Compare to Kolmogorov’s power law heuristic on turbulence and, of course, to the statement from BB Mandelbrot:
Faced with phenomena I view as self-affine, other students take an extremely different tack. Most economists, scientists and engineers from diverse fields begin by subdividing time into alternating periods of quiescence and activity. Examples are provided by the following contrasts: between turbulent flow and its laminar inserts, between error prone periods in communication and error-free periods, and between periods of orderly and agitated (“quiet” and “turbulent”) Stock Market activity. Such subdivisions must be natural to human thinking, since they are widely accepted with no obvious mutual consultation. Rene Descartes endorsed them by recommending that every difficulty be decomposed into parts to be handled separately. Such subdivisions were very successful in the past, but this does not guarantee their continuing success. Past investigations only tackled variability and randomness that are mild, hence, local. In every field where variability / randomness is wild, my view is that such subdivisions are powerless. They can only hide the important facts, and cannot provide understanding. My alternative is to move to the above-mentioned apparatus centered on scaling. 21
I do not and probably never will have the resources to test the hypothesis on actual humans. However, it is easy to suggest an experimental procedure. Human performance on the travelling salesman problem (TSP) is excellent 22, although I have not found an extension to decision TSP. Decision TSP is the task to test, since it and not normal TSP has the same basic phase diagram as CSP and kSAT23. The claim is that decision TSP requires perception throughout the phase space but conscious perception only in the hard phase, because freezing has to be taken advantage of only in the hard phase. A prerequisite would be to find that the excellent human performance found in TSP extends to the case of decision TSP in the hard phase, after some practice.
So what you could do, if you had the resources, is to take patients with blindsight, get them to decide on some decision TSP problems with their blindsight, and compare their results to normally sighted controls as you vary the order parameter. It should be the case that the normally sighted controls would be able to correctly decide on the decision TSP problem far into the hard phase, almost into the un-TSP phase, but the blindsight group should be able to correctly decide on the decision TSP only up to the end of the easy phase. You would not even have to do any brain surgery. If the null hypothesis for that question is rejected, investigation of animal consciousness would be as simple: find a problem analogous to decision TSP in NP-complete where you do not have to explain it to the subject, and lesion the experimental group brains at any regions putatively necessary for consciousness.
Of course, that only tells of the sufficiency of consciousness for helping computation in that specific way. It does this in an experimental and therefore imperfect way. Unfortunately disordered systems is an area of statistical physics where proofs are seldom forthcoming, but that does not rule out the possibility of a proof of the necessary direction, or otherwise: it is quite well known, for example, that the dynamic portion of the 1RSB ansatz is not necessarily the onset of a hard phase.24
Machine investigation is easier. A Procrustean summary of the connectionist program is that the connectionists claim that the human system, or at least the most important computational part of it, is an online CSP solving system. Also see 25. However, at least there are many constraint-satisfaction interpretations of the most important of the connectionists’ contributions to the literature, including backpropagation with dynamic programming, although the most salient one that comes to mind remains the Boltzmann machine. (More directly, the proof of the NP-completeness of the problem of neural net learning comes by a simple reduction to satisfiability 26.) And it is pretty straightforward to have a transformation between the layered restricted Boltzmann machines and backpropagation nets: those were the now-superseded first non-convolutional backpropagation deep nets.
Survey propagation is actually not often used in shipped satisfiability products, because it often falls apart in problems of high k for k-SAT specifically. But this may not be the case for these specific problems, which are more alike to CSP.
If you want to ship a production system, most people do that nowadays with neural nets. This is because, in practice, the construction of a complicated bayes net or markov net is not at all a solved task (see: Clippy). But there are fundamental connections between belief propagation and the straightforward way to create and ship neural networks. Backpropagation net representations can also be used for graphical model inference 27. It is also the case that backpropagation itself can be used as a replacement for the Baum-Welch algorithm in hidden Markov models, with a simple generalization to the rest of belief propagation, as outlined in the same paper. This is a strange way to look at it, given the knowledge that the backpropagation representation is indeed exponentially smaller in parameter size than the factor graph (HMM in this case) representation 28, but it is a way to look at it.
Interestingly, Dauwels notes 29 that gradient descent, including neural backpropagation, fits quite well in the message passing regime. He probably intended this to hybridize EM methods with backpropagation, but you could also use this as justification to try to find a dual that would be analogous to the dual applied to belief propagation to create survey propagation, in the regime of neural backpropagation. Unlike the suggested medical experiment, I can easily give this a try, so this will occupy most of my next few months’ worth of weekends and evenings.