Two of the biggest names in AI and machine learning, Noam Chomsky--who needs no introduction--and Peter Norvig--Director of Research at Google and well-known author of AI textbooks--have been debating their favored approaches with increasing acrimony, as detailed in the linked article ("Norvig vs. Chomsky and the Fight for the Future of AI" by Kevin Gold, tor.com, 11 Jun 2011). To oversimplify, their positions are:
- Chomsky has spent half a century building an ever more complex universal grammar for how languages are put together, as a mechanism for "understanding" language, and argues that humans (at least) have a mechanism for utilizing this grammar to enable language and learning: being able to map the parts of the language onto concepts and information.
- Norvig has--quite successfully--shown that you can ignore grammar entirely and, using neural network technology along with massive amounts of data (like what Google hoovers up on the internet), map anything in one language into any other. This approach has been used not only for language translation, but also to build IBM's quite impressive Jeopardy-playing contraption "Watson" and many other technologies that seek to allow machine "understanding". (A toy sketch of the idea follows this list.)
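To make that concrete, here is a toy sketch in Python of the purely data-driven idea: an IBM Model 1-style EM loop that learns word translations from nothing but co-occurrence across sentence pairs. The four-sentence English/Spanish corpus is invented for illustration, and this is of course nothing like what Google actually runs--the point is only that no grammar appears anywhere.

```python
from collections import defaultdict

# Toy "learn translation from data alone" sketch: an IBM Model 1-style EM loop
# that infers word translations from which words co-occur across sentence pairs.
# The corpus below is invented purely for illustration.
corpus = [
    ("the house".split(), "la casa".split()),
    ("the book".split(),  "el libro".split()),
    ("a house".split(),   "una casa".split()),
    ("a book".split(),    "un libro".split()),
]

e_vocab = {e for e_sent, _ in corpus for e in e_sent}
f_vocab = {f for _, f_sent in corpus for f in f_sent}

# t[(f, e)] = estimated probability that English word e translates to Spanish word f.
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(20):                          # EM iterations
    counts = defaultdict(float)              # expected co-occurrence counts
    totals = defaultdict(float)
    for e_sent, f_sent in corpus:
        for f in f_sent:
            norm = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                frac = t[(f, e)] / norm      # how much of f gets "credited" to e
                counts[(f, e)] += frac
                totals[e] += frac
    for f, e in t:                           # M-step: re-normalize per English word
        t[(f, e)] = counts[(f, e)] / totals[e]

for e_word in ["house", "book", "the", "a"]:
    best = max(f_vocab, key=lambda f: t[(f, e_word)])
    print(e_word, "->", best, round(t[(best, e_word)], 2))
# Content words resolve cleanly even from four sentences; "the" and "a" stay
# split between their two Spanish counterparts, as they should.
```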
The crux of this debate and analysis of "who's winning" has really come down to the apparent success of the Neural Network/Statistical Learning approaches versus the general lack thereof from grammar-based approaches, which have been in development for a lot longer. To quote the article:
Kevin Gold said:
What occurred to me in reading this is that the two sides are really talking right past one another, mainly because they lack agreement on what the *goal* of all this is. Chomsky is closer to understanding this (which is no surprise to me, because he's a cognitive scientist, not just a computer scientist):
Kevin Gold said:
Steven Kass,"Unthinking Machines", MIT Technology Review 4May2011 said:
Norvig is obviously arguing for "going with what works":
Kevin Gold said:
In his essay, Norvig argues that there are ways of doing statistical reasoning that are more sophisticated than looking at just the previous one or two words, even if they aren’t applied as often in practice. But his fundamental stance, which he calls the “algorithmic modeling culture,” is to believe that “nature’s black box cannot necessarily be described by a simple model.” He likens Chomsky’s quest for a more beautiful model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his lack of satisfaction with answers that work. (emphasis Buffy)
When I think about how to translate these two points of view, I see a huge difference in the two combatants' goals:
- Chomsky is looking for a way to get a base of code to logically describe knowledge about the world: something that you can actually learn in an abstract way and then apply *precisely* because it is described in an abstract manner. That is, the software actually "understands" how things work and can "use" that knowledge in new and creative ways.
- Norvig is looking for a way to gather together enough data that programs produce "correct results" while "understanding" is really irrelevant: no matter what you want the software to do, there's an analogue out there in the data that's close enough that you'll get the right result some percentage of the time, a percentage proportional to the amount of modelling data you can get your hands on. (The sketch after this list tries to make the contrast concrete.)
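Here is a deliberately silly sketch of those two goals side by side (all names, facts, and questions are invented): an abstract rule applied to a tiny knowledge base, versus a lookup over previously seen question/answer pairs.

```python
# Chomsky-flavored: a small knowledge base plus one abstract, reusable rule.
parent_of = {"bob": "alice", "carol": "bob", "dave": "carol"}   # child -> parent

def is_ancestor(x, y):
    """Abstract rule: x is an ancestor of y if x is y's parent, or an ancestor of y's parent."""
    if y not in parent_of:
        return False
    return parent_of[y] == x or is_ancestor(x, parent_of[y])

# Norvig-flavored: no rule at all, just previously observed question/answer pairs,
# answered by whichever stored question shares the most words with the query.
observed = {
    "is alice an ancestor of bob": True,
    "is carol an ancestor of bob": False,
}

def statistical_answer(question):
    def overlap(stored):
        return len(set(stored.split()) & set(question.split()))
    return observed[max(observed, key=overlap)]

print(is_ancestor("alice", "dave"))        # True, derived even though never stated anywhere
print(is_ancestor("carol", "dave"))        # True (carol is dave's parent)
print(statistical_answer("is carol an ancestor of dave"))
# False here: the closest stored example happens to point the wrong way.
```

The rule derives answers for combinations nobody ever wrote down; the lookup can only lean on whatever happens to be "close" in its data.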
That sounds to me like two guys who are living on different planets. As a fellow computer scientist, I certainly appreciate what Norvig and others are doing so successfully in getting computers to perform useful jobs that do indeed seem "intelligent". But while Chomsky is being derided as "old guard" promoting solutions that "don't work", in my mind those critics are completely missing the point of what Chomsky is trying to do: understand the nature of "intelligence" in the sense of being able to make leaps in adaptation, leaps that might indeed be achievable by munching through enough data, but that he wants to achieve in a way that is fundamentally more efficient.
When I think about Norvig's argument I can't help but think of the old saw that if you have enough monkeys typing on enough typewriters, eventually one of them will type the entire works of Shakespeare; the problem is the definition of the word "enough". Norvig's logic really depends on--to paraphrase Shakespeare--there being (almost) nothing new under the sun. Unless you have something "close" to a desired result in your learning set, your neural network is unlikely to produce that result.
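Here is that dependence in a few lines of Python, using a nearest-neighbor lookup as a stand-in for any purely example-driven learner (the data is invented):

```python
# A learner that can only echo stored examples: fine near its data,
# hopeless the moment the query has no close analogue in the training set.
train_x = [i / 10 for i in range(31)]        # inputs 0.0 .. 3.0
train_y = [x * x for x in train_x]           # the "world" being modeled: y = x^2

def predict(x):
    """Answer with the output of the closest input we have ever seen."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

for query in [1.25, 2.8, 10.0]:
    print(query, predict(query), "truth:", query * query)
# Near the data (1.25, 2.8) the stored examples are close enough; at 10.0 the
# best available analogue is x = 3.0, so it answers 9.0 instead of 100.0.
```

The learner has no notion that the pattern continues; it can only answer with the closest thing it has already seen.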
This really points at the more limited definition of "intelligence" that we get from Turing: if an observer cannot distinguish between a human and a computer, then we can call the computer "intelligent". Watson playing Jeopardy was an excellent example of this, and we all marveled at how human Watson could seem *in real time*. But given enough time spent observing, enough of those hilariously off answers would creep in that the observer would eventually fail Watson. All that Norvig's approach of "just get more data to learn from" does is increase the amount of observing time needed before the hilarious failure shows up.
Moreover, it cannot be overemphasized that projects like Watson require huge amounts of time and resources to solve an *extremely limited problem set*. Yes, Ken Jennings no longer even has to think because of all the money he won on Jeopardy, but just knowing how to play Jeopardy would not allow him to write his autobiography or produce another game show. It is exactly that inability to *use* all that "knowledge" anywhere else that is the limitation of the statistical approach.
Now the other important point here is that it's not that Norvig's preferred technology is any more of a dead end than Chomsky's: remember that the brain is a huge neural network, and it *does indeed* implement the more sophisticated form of intelligence that Chomsky is seeking to harness. But in the brain, the neural network is used to *implement* a conceptual framework for that intelligence. Once you start to think of silicon-ware vs. wet-ware as a *platform issue*, you realize the platform has nothing to do with the program implemented on top of it (in either silicon or neurons) to provide "generalized intelligence." Norvig is using the neuron model directly to sift through data, simply to ensure that within some well-defined problem set "reasonable" answers are obtained. Going outside the problem set runs into the same problems that the grammar/logic folks ran into decades ago, when they recognized the need for "world knowledge" to achieve "generalized intelligence".
That is to say, having neural networks is not sufficient to achieve such generalized intelligence; there has to be something programmed into the network that actually implements a system for dealing with abstract concepts. Since a brain can do it, it could indeed all be neural nets, but it might take 200 million years of trials to develop, just as it did to get to our brains. Chomsky is in essence arguing that if we can figure out what that "system for dealing with abstract concepts" is, we could short-circuit the process and maybe get it done in our lifetimes, and deal with a nagging hole in the "pure neural network" approach that is coming not from the computer or cognitive science fields, but from that other favorite topic of Chomsky's: public policy, which I will get to in a second.
Having spent quite a bit of time with both technologies, I have to say I get really tired of the debate, because as I see it, any true generalized intelligence is going to require BOTH approaches. Neural networks are excellent for tuning and optimizing behavior using real-world feedback loops to implement solutions for limited problem sets. But if you're going to put the big pieces together, you absolutely are going to need logical/semantic programming.
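For what it's worth, here is a minimal sketch of what I mean--emphatically not anyone's real system: a symbolic layer supplies the structure (which action is even appropriate), while a learned knob inside that structure gets tuned by a crude feedback loop. The scenario, numbers, and update rule are all invented.

```python
import random

random.seed(0)

def choose_action(obstacle_distance):
    """Symbolic layer: explicit, inspectable rules decide WHAT to do."""
    return "brake" if obstacle_distance < 20.0 else "cruise"

def braking_error(gain):
    """Pretend feedback from the world: how far the chosen gain is from the
    value (unknown to the program) that actually stops the car smoothly."""
    return (gain - 1.4) ** 2

brake_gain = 0.5                        # the learned part starts out badly tuned
for _ in range(500):
    distance = random.uniform(5.0, 40.0)
    if choose_action(distance) != "brake":
        continue                        # the rule says cruise; nothing to tune
    candidate = brake_gain + random.uniform(-0.05, 0.05)
    if braking_error(candidate) < braking_error(brake_gain):
        brake_gain = candidate          # keep perturbations that the feedback rewards

print("tuned brake_gain:", round(brake_gain, 2))   # creeps toward the rewarded value
```

The interesting design point is the division of labor: the rules stay inspectable, and the feedback loop only ever optimizes within the box the rules define.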
The public policy issue that has flared up recently has to do with how we deal with "robots" that are autonomous. The two most notable examples are self-driving cars and military drones. With both there is an increasing desire to have them operate without human intervention, either because the human cannot be trusted (e.g. a driver who's had too many to drink) or because human intervention is increasingly impractical (military drones needing to operate without human input due to communications delays). The question becomes: as a matter of law, morality, and outcomes, when can we ENTIRELY trust the computer to "do the right thing"? Isaac Asimov famously posited that we needed to logically program in his Laws of Robotics, but it's not entirely clear how it's possible to merge such logic into a black-box neural network with any assurance that the logic would be obeyed, when the network could have some data that simply avoids all the tests for adherence to that logic.
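To illustrate why that worries me--and this is only my sketch, not a proposal from either camp--the obvious workaround is to keep the law *outside* the black box, as an explicit check that can veto its output. The fake policy, the rule, and the scenario below are all invented.

```python
import random

def black_box_policy(sensor_reading):
    """Stand-in for an opaque learned model: we cannot explain why it chooses."""
    rng = random.Random(sensor_reading)          # deterministic but inscrutable
    return rng.choice(["proceed", "proceed", "engage", "hold"])

def violates_first_law(action, humans_detected):
    """Explicit, auditable rule in the spirit of Asimov's First Law."""
    return action == "engage" and humans_detected

def supervised_act(sensor_reading, humans_detected):
    action = black_box_policy(sensor_reading)
    if violates_first_law(action, humans_detected):
        return "hold"                            # the logical layer overrides the network
    return action

for reading in range(5):
    print(reading, supervised_act(reading, humans_detected=True))
```

Of course, a wrapper like that only guarantees the properties it can check from the outside, which is exactly why a genuine merger of the two frameworks still feels out of reach.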
Unfortunately, it seems to me that that breakthrough of merging logical/conceptual frameworks with statistically based modules is what we really need before we have "real artificial intelligence."
People locked into such scientific battles like to hear "you're both right" even less than "he's right and you're wrong." But let's hope for (and lobby for!) just that sort of change in thinking.
Colorless green ideas sleep furiously,