Q&A with Tableau Software Chief Scientist and Co-Founder Pat Hanrahan [Part 1]
Recently I had the opportunity to interview Pat Hanrahan, chief scientist and co-founder at Tableau Software, a leader in the data visualization field. In part one of our two-part Q&A, Hanrahan discusses how data visualization has changed, the hindrances people face and the rise of data analytics technologies.
Here’s an excerpt from the interview:
Question: How would you say data visualization has changed in the past year?
It is becoming a little more mainstream and more widely accepted in the business community. It is driven by the amount of data that people have. You hear about big data, and now you are hearing about a lot of analysis. Somewhere along the line you have to present that analysis to people, and you need visualization technology for that. That is driving incredible adoption of the technology.
Question: What are the hindrances people are having with data visualization?
Designing good visualizations is challenging. A lot of people go in assuming any visualization will be good, and they end up creating lousy visualizations. There is a lot of bad visualization out there, and bad visualization doesn't help anyone. You don't get anywhere. Because we've had this problem with bad visualization, people undervalue the technology.
People are getting better educated, but it is still a huge problem. People are not visually literate, especially analysts.
The second big problem is ease of use. Do you have to get a programmer involved? If you do, that's a huge barrier to actually using visualization. What we do at Tableau is have people compose a picture of the problem. One of the things we have done is make it possible for people to ask a question of a database by composing a picture of what they want to see, with the query generated automatically and the visualization produced as a side effect.
You push out the visualization to the users. That lets people who are not programmers use visualization.
Question: What do you mean by composing a picture?
Suppose you want to show sales by state. You have a database with a bunch of sales information. You can click on the state, double-click, and make a map with every state that has sales on it. You can think of it as: I am going to make a picture of a map with sales on it, but in the process I am formulating a query. In SQL lingo, I am creating a query grouped by state that computes the sum of sales.
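To make the "composing a picture" idea concrete, here is a minimal Python sketch of how a chart specification with one dimension (state) and one measure (sales) could be translated into the GROUP BY query Hanrahan describes. It is not Tableau's implementation; the `spec_to_sql` helper, the `sales` table and its `state` and `amount` columns are hypothetical, and an in-memory SQLite database stands in for the real data source.

```python
import sqlite3


def spec_to_sql(table: str, dimension: str, measure: str) -> str:
    """Hypothetical helper: turn a minimal 'picture' spec (one dimension,
    one summed measure) into a GROUP BY query.

    Assumes the identifiers are trusted column/table names; a real tool
    would validate and quote them.
    """
    return (
        f"SELECT {dimension}, SUM({measure}) AS total_{measure} "
        f"FROM {table} GROUP BY {dimension} ORDER BY {dimension}"
    )


# Made-up sales records in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("CA", 100.0), ("WA", 40.0), ("CA", 60.0), ("OR", 25.0)],
)

# "Placing state on the map and sales on the marks" becomes this query.
query = spec_to_sql("sales", "state", "amount")
rows = conn.execute(query).fetchall()

for state, total in rows:
    print(state, total)  # one row per state, ready to be drawn on a map
```

In a sketch like this, adding a second field to the picture would simply add another column to the GROUP BY clause, which is the point: the picture determines the query, and the user never writes SQL.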
Question: Isn’t the challenge though getting the data in there in the first place?
That has classically been the challenge: how to collect the data and how to store it. What is changing now is that people have spent a lot of money collecting and storing data, but not any money accessing and presenting it. You see all this information trapped in databases, with no way to get at it. That is why data visualization is emerging: all this data has been stored, and no one has had tools for doing anything with it. The balance has shifted toward accessing the data. You can't have one without the other, though. If you are not collecting the data, you are not going to be doing much data visualization.
Question: What are the technological drawbacks of traditional data warehouses for data visualization?
What you are seeing emerge is a lot of analytical database technologies: (EMC) Greenplum, (HP) Vertica, (Teradata) Aster Data and, to a lesser extent, Hadoop. What is happening is a shift from supporting transactions to supporting analysis. A transaction means you buy something or change a seat assignment. It's an update: you check out a record and you update it. Nowadays the real loads are doing analysis. Tell me how many seats I have not sold. That means I have to look at the entire database. Those queries are starting to dominate the workloads, and the fact that traditional database technology is not very good at them is driving innovation in the data industry. Now that the data has grown so much, we are seeing a whole new generation of data warehouse technologies, and Tableau is great for these technologies. Hadoop is good at throughput but not as good at latency, and with visualization you want low latency. You are starting to see technology emerge that answers questions on the fly, and visualization is driving that.
You are starting to see database technology emerge with high throughput and low latency.
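The difference between the two workloads can be illustrated with a toy seat map. In this Python sketch (hypothetical names and made-up data, not modeled on any particular database), changing a seat assignment touches a single record found by its key, while asking how many seats are unsold has to examine every record, so the cost of the analytical query grows with the size of the data.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Seat:
    seat_id: str
    passenger: Optional[str] = None  # None means the seat is unsold


def build_seats(n_rows: int, letters: str = "ABCDEF") -> Dict[str, Seat]:
    """Hypothetical seat map keyed by seat ID, playing the role of a primary-key index."""
    return {
        f"{row}{letter}": Seat(f"{row}{letter}")
        for row in range(1, n_rows + 1)
        for letter in letters
    }


def assign_seat(seats: Dict[str, Seat], seat_id: str, passenger: str) -> int:
    """Transactional workload: look up one record by key and update it.

    Returns the number of records touched, which is always 1.
    """
    seats[seat_id].passenger = passenger
    return 1


def count_unsold(seats: Dict[str, Seat]) -> Tuple[int, int]:
    """Analytical workload: 'how many seats have I not sold?'

    Returns (unsold_count, records_touched); every record must be examined.
    """
    touched = 0
    unsold = 0
    for seat in seats.values():
        touched += 1
        if seat.passenger is None:
            unsold += 1
    return unsold, touched


seats = build_seats(30)  # 30 rows x 6 seats = 180 seats
assign_seat(seats, "12C", "Alice")
assign_seat(seats, "14A", "Bob")
unsold, touched = count_unsold(seats)
print(unsold, touched)  # 178 180
```

Doubling the number of seats leaves `assign_seat` at one record touched but doubles the work in `count_unsold`; engines built for analysis aim to make that second pattern fast.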
Question: Hadoop has issues with latency, so how can that be optimized?
A lot of people are working on that now. In practice, what they do is take Hadoop, run a bunch of queries and produce a fast database: they stuff the data into a database like Vertica. In some sense they are doing a two-step process. They ingest the data with Hadoop, get it down to a reasonable size and then put it into a database that supports fast queries.
That is one approach. As for making Hadoop itself run faster, a lot of companies are trying to do that, but none of the technology is commercially viable. Hit a query and get an answer in two seconds: that kind of technology is still not viable.
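The two-step pattern Hanrahan describes, batch reduction first and a fast database second, can be sketched in a few lines of Python. Here a plain function stands in for the Hadoop job and an in-memory SQLite table stands in for an analytic database like Vertica; the event data, the `daily_sales` table and its column names are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw, event-level records: (state, day, sale amount).
raw_events = [
    ("CA", "2012-05-01", 20.0),
    ("CA", "2012-05-01", 15.0),
    ("WA", "2012-05-01", 5.0),
    ("CA", "2012-05-02", 10.0),
    ("WA", "2012-05-02", 7.5),
]


def batch_aggregate(events):
    """Step 1, standing in for the Hadoop batch job: reduce raw events
    to one summary row per (state, day)."""
    totals = defaultdict(float)
    for state, day, amount in events:
        totals[(state, day)] += amount
    return [(state, day, total) for (state, day), total in sorted(totals.items())]


summary = batch_aggregate(raw_events)

# Step 2, standing in for loading into an analytic database: store the
# reduced data where interactive queries can reach it quickly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (state TEXT, day TEXT, total REAL)")
conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", summary)

# Interactive queries from the visualization layer hit the summary table,
# not the raw events.
by_state = conn.execute(
    "SELECT state, SUM(total) FROM daily_sales GROUP BY state ORDER BY state"
).fetchall()
print(by_state)  # [('CA', 45.0), ('WA', 12.5)]
```

In a real deployment the summary would be far smaller than the raw event data, which is what lets the second system answer visualization queries with low latency while the batch system handles throughput.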
Question: What are the different classes of data visualization? How do you define data visualization?
My definition of visualization is anything I see, so to me tables are visualizations. From my point of view, what matters in terms of usefulness is the class, the type: a map, a time series. The classic data visualizations have been around for decades, centuries, millennia, actually. They have stood the test of time: maps, time series, even the lowly bar chart. The goofier ones are sometimes useful and sometimes not.
Services Angle
Most analysts are not trained to aggregate data, clean it up and then place it in an analytics service for piping into a data visualization environment. It's really a field for the next generation of data analysts. The current crop of business intelligence solutions architects understands transactional data, but unstructured data is a mystery to most analysts. What is really needed is more services, so companies can train their analysts to use data visualization while at the same time nurturing a new generation of data visualization specialists. Hanrahan gets into this more in the interview, which we will run in the second part of the Q&A.