Bill Katz

My Brain

An occasionally updated repository of thoughts, past work, and links. Topics include programming, web ventures, and writing.

AI: The Power of the Masses

It was approximately two decades ago that Doug Lenat went down to Texas to work on Cyc, his vision of how to solve the common sense problem inherent in Artificial Intelligence. Computers are pretty good at some tasks, like playing chess in a very inhuman way, but they fail when asked general questions that touch on life. For example, if a patient told a doctor that he had chest pains after the Dallas Cowboys scored a touchdown against the Washington Redskins, a computer would have a hard time asking the context-laden questions a doctor would: Did the patient have money riding on the game? Did he get upset because he's a die-hard Redskins fan, and did the resulting stress precipitate some angina? These are very tangential ideas, and ones that computers have failed to handle in the past. It's why we still have a Turing Test for artificial intelligence.

Lenat started Cyc with the idea of gathering a massive amount of data from newspapers, magazines, etc. Gather little bits of knowledge about the human condition, link them all up, and hopefully some reasoning system can make use of it and produce human-like behavior. Since you probably haven't heard of Cyc, you can guess that the technology hasn't reached its original goal. But the more I think about the web and companies like Google, the more I think that the basic idea behind Cyc has come of age. The web is being used to solve tough problems by intelligently processing huge amounts of data, and the best case study for this is Google.

I came across an old blog entry on GooOS, the Google Operating System (via Jon Aquino's excellent Rails Day entry, yubnub.com). If you listen to Google tech heads talk about what they're doing with their uber-computer, you'll hear how they are harvesting real-world knowledge by analyzing web documents. Because there's so much stuff out there, they can be choosy about which sentences they extract knowledge from: what the reputation of the source is, how easily a particular sentence can be parsed, how likely it is to yield a nugget of the author's "truth," etc. You can see the results whenever you search for "JEK assassination" and Google helpfully asks if you really meant "JFK."
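To make the "JEK" example a little more concrete, here is a minimal sketch of how sheer volume of text can drive a spelling suggestion. This is not Google's actual method, which I don't know; it's the classic edit-distance-plus-word-frequency approach (the one Peter Norvig walks through in his spelling corrector essay). The function names `build_counts`, `edits1`, and `suggest` and the tiny sample corpus are hypothetical, chosen just for illustration. The idea: count words in a big pile of text, and when a query word is unknown, propose the most frequent known word that is one typo away.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def build_counts(corpus: str) -> Counter:
    """Hypothetical helper: count how often each lowercase word appears in the corpus."""
    return Counter(re.findall(r"[a-z]+", corpus.lower()))

def edits1(word: str) -> set:
    """All strings one deletion, transposition, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def suggest(word: str, counts: Counter) -> str:
    """Return the word if the corpus knows it; otherwise the most frequent known
    word one edit away; otherwise the word unchanged. Ties are broken arbitrarily."""
    w = word.lower()
    if w in counts:
        return w
    candidates = [c for c in edits1(w) if c in counts]
    return max(candidates, key=lambda c: counts[c]) if candidates else w

# Illustrative corpus only; a real system would count words across billions of pages.
counts = build_counts("JFK was assassinated in Dallas. The JFK assassination shocked the nation.")
print(suggest("JEK", counts))
```

With this toy corpus, `suggest("JEK", counts)` returns `"jfk"`, because swapping the "e" for an "f" lands on a word the corpus has seen. The design choice worth noticing is that there's no grammar or hand-built knowledge here at all; the more text you count, the better the guesses get, which is exactly the "power of the masses" argument.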

Language translation, one of those extremely difficult AI problems, can also benefit from processing a large number of documents by reputable authors. Example #2: Google's use of United Nations documents to power a translation system. It's amazing how researchers are able to tap into the knowledge embedded in the web, and now into knowledge that lives off the web. Aside from UN documents, Google will also digitize a vast collection of library books, which should give its knowledge extraction systems high-quality historical data. It makes me wonder whether the system that passes the Turing Test will be suckled on the milk of repurposed human books and documents.