What is Watson?

Ed. Note: This is the second installment in a three-part series exploring IBM’s multi-year project to create the ultimate Big Data patient diagnostic and treatment plan tool using its unique Watson natural language cognitive system. The first part defined the scope of the problem. This part will look at what Watson is and what makes it a new generation of computing system and arguably the ultimate Big Data query engine. The third part will cover IBM’s vision for Watson in healthcare.

On Jeopardy, Watson was billed as a supercomputer because it ran on a very large Power system, consumed a large amount of processing capacity, and used a tremendous amount of memory, says Watson CTO Robert High, an IBM Fellow and vice president. “But in reality Watson is at its heart a piece of software designed to answer questions against a knowledge base with incredibly high accuracy. So we refer to it as a cognitive system, not as a supercomputer or mainframe, because the level of fidelity and accuracy we can get from answering natural language questions is uncanny with respect to how much it captures the capabilities human beings have. So as a cognitive system, it is very much a natural language processing system that does what we refer to as deep natural language processing.”

Watson can take a question that someone literally asks out loud, run it against hundreds of millions of pages of documents, and return a highly accurate answer. This is much more complicated than it sounds.

“For humans, the process of asking a question, understanding the question, and understanding the body of knowledge we are reading from to answer that question, is second nature,” High says. “We don’t think about disambiguating the question. In reality there is a tremendous amount of ambiguity.”

Contextualizing intelligence


Suppose, for instance, Ken Jennings had read the clue “Jodie Foster took this home for her role in Little Man Tate.” A person would first realize that Jennings wants to be told what Foster took home, which means it must be something easily transportable and valuable. That implies the clue refers to an award of some sort. Since Jodie Foster is a movie actress and director, the person would quickly think of motion picture awards and the Oscars. With at most an Internet search, that person would quickly connect Jodie Foster, Little Man Tate, and the Academy Awards. And the person would realize that the Ken Jennings reference means this is a Jeopardy clue, so the correct reply must be phrased as a question: “What is an Academy Award?”

Getting a computer to realize all that from a simple statement, however, is a major artificial intelligence challenge, High says. First, consider that Jennings made a statement, not a question. Then consider the many things that people take home – groceries, colds, babies, pets, frustrations from work. Then consider the things that people win that they do not take home – a contract, a corner office, a court case. The simple approach might be to create a rule that associates the phrase “take something home” with winning an award, but then you would need similar rules for all the other things that people take home, which quickly becomes an impossible task. Nor could any set of rules cover all the things people might do with things they win.

“So rather than building rules, we tried to follow the same processes that we as human beings take when we resolve ambiguities,” High says. “That is, we add context. There is no particular rule to it.” But in the context of Jodie Foster, who directed Little Man Tate, one association would be an Academy Award, whose statuette is of a size and value that winners would want to take home. “So we as human beings disambiguate by adding a lot of context. That’s essentially what Watson does, too. It takes a question like this and teases it apart, and then it goes out and seeks as much context as it can to associate with the temporal, spatial, and juxtapositional aspects of the language used to ask the question, and, by the way, the language we use to find answers. And in the case of Jeopardy we had to do that in three seconds or less, because on average you have three seconds to determine that you have an answer and be the first to buzz in. And we had to do that against about 60 million pages of documentation.”
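The process High describes can be thought of as candidate generation plus evidence scoring: break the clue apart, collect passages that mention a candidate answer, and count how much of the clue's context each candidate's passages share. The sketch below is a deliberately tiny illustration of that idea in Python; it is not Watson's actual pipeline, and the passages, clue, and candidates are hypothetical stand-ins for a real document collection.

```python
# Toy sketch of disambiguation by context (illustrative only; not
# Watson's real algorithm). Each candidate answer is scored by how
# many clue terms co-occur with it inside the same evidence passage.

def tokenize(text):
    """Lowercase the text and split it into bare word tokens."""
    return [w.strip(".,?!\"'").lower() for w in text.split()]

def score_candidate(candidate, clue, passages):
    """Count clue terms appearing in passages that mention the candidate."""
    clue_terms = set(tokenize(clue))
    cand_terms = set(tokenize(candidate))
    score = 0
    for passage in passages:
        terms = set(tokenize(passage))
        if cand_terms <= terms:               # passage mentions the candidate
            score += len(clue_terms & terms)  # shared context terms add evidence
    return score

# Hypothetical mini-collection standing in for the 60 million pages.
passages = [
    "Jodie Foster directed the film Little Man Tate.",
    "An Academy Award statuette is small enough to take home.",
    "Jodie Foster has won the Academy Award twice.",
    "People often take groceries home from the store.",
]

clue = "Jodie Foster took this home for her role in Little Man Tate"
candidates = ["Academy Award", "groceries", "a contract"]

best = max(candidates, key=lambda c: score_candidate(c, clue, passages))
print(best)  # the context-rich candidate wins: Academy Award
```

Note that no hand-written rule links “took home” to “award”; “Academy Award” wins simply because the passages mentioning it share the most context with the clue, which is the essence of the approach High contrasts with rule-building.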

Developing Watson


In the two years since Watson’s Jeopardy debut, High says, the development team has done a lot of work on engineering Watson implementations. Today it is much more compact, and it is about 240X faster. This is another reason that IBM does not refer to Watson as a supercomputer. Depending on the size of the problem, Watson can run on systems of a variety of sizes. In fact, in its recent announcement of the new smaller Power7 RISC computer systems for SMBs, IBM said that the environment running on them includes elements of Watson technology. So if you have ever dreamed of having that computer from Star Trek in your office, that dream may now be achievable.

In a sense, High says, this is software-led computing. However, the software has a very strong synergy with the IBM Power7 RISC systems, which “are uniquely architected to optimally run the kinds of workloads that are inherent in a Watson and other analytic systems. We have engineered Watson to exploit that unique systems architecture – machine instruction set, cache organization, bus-channel management, etc. We maintain a very strong relationship between the software and hardware to drive even more optimization around these kinds of systems architecture elements.

“In terms of go-to-market, we have been concentrating Watson in spaces where we can make an enormous difference with the kind of capability we just described. Which is why we’re in the healthcare industry.”