“I think we need to change the way we think of the word ‘search’,” bit.ly Data Scientist and self-defined “data geek” Hilary Mason told David Vellante and John Furrier on SiliconAngle.TV at the Strata Conference in early February (see here for the full video of the interview). “So we have this idea that search is you go on a Web site, and it’s got a box on it; you type a query in, and you get back a list of results.”
But, she says, this unfiltered listing of results is an outmoded idea of what search needs to be in an era in which people trying to find information choke on the fire hose of data. “When we think about real-time search, we’re trying to think about helping you discover the information you will want to know as soon as possible.”
What she and her cohorts are working on is developing statistical analysis based on bit.ly’s unique visibility into what information people find valuable enough to share – not just access. For instance, in her keynote presentation at Strata she showed a chart of the huge spike in sharing of information on the anti-Mubarak demonstrations shaking Egypt. Just searching for “Egypt” or “Mubarak”, however, is likely to turn up a lot of background information, plus malware sites masquerading as news sources. What bit.ly is developing is an enhanced search that will eliminate the uninformative sources and the malware and provide links to the sites that are most interesting, based not on what people looked at but on what they then shared with their social networks via services like Facebook and Twitter. The first iteration of this, she says, will soon be available as a new iPad app.
Bit.ly has a unique view into just what people do find valuable on the Internet from its position as a leading URL-shortening service. The service can be used directly at Bit.ly or through popular front-end software such as TweetDeck, which allows users to update their status on multiple social services simultaneously and monitor their social contacts’ status updates in a single location. Bit.ly uses that data to provide various services. For instance, it tracks how often and where users distribute its shortened links. Anyone can see this basic data by adding a “+” to the end of a link. This will display all the raw statistics on that particular link from the bit.ly database.
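As a small illustration of the “+” convention described above, the following Python sketch builds the statistics address for a shortened link. The function name `stats_url` and the sample link `http://bit.ly/abc123` are made up for illustration; the only behavior taken from the article is that a “+” is added to the end of the link.

```python
from urllib.parse import urlsplit, urlunsplit


def stats_url(short_url: str) -> str:
    """Return the address of the statistics page for a shortened link.

    Hypothetical helper: it only appends "+" to the end of the link,
    as the article describes, and makes no network request.
    """
    parts = urlsplit(short_url)
    path = parts.path.rstrip("/")
    if not path:
        # A bare domain has no short code to look up.
        raise ValueError("expected a shortened link such as http://bit.ly/abc123")
    if not path.endswith("+"):
        path += "+"
    # Drop any query string or fragment; only the short code matters here.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # The sample link below is a placeholder, not a real shortened URL.
    print(stats_url("http://bit.ly/abc123"))
```

Opening the resulting address in a browser is what would display the link’s statistics; the sketch itself does not fetch anything.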
Bit.ly analyzes the activity around each new link as it comes in to provide security to its users. One weakness of a shortened link is that it may hide clues that might reveal that the URL accesses a malware or spam site. To prevent that, bit.ly analyzes the traffic associated with each new link to identify probable malware or spam. It also matches links to the malware source lists provided by Google and recognized online security services and blocks access to any sites they identify as dangerous or that its own analysis identifies as having a high probability of being dangerous.
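The two-part check described above, a lookup against known malware lists plus bit.ly’s own probability estimate, can be sketched roughly as follows. This is not bit.ly’s implementation, which the article does not detail: the function name `should_block`, the `risk_score` input, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def should_block(target_url: str,
                 blocklisted_hosts: set[str],
                 risk_score: float,
                 threshold: float = 0.9) -> bool:
    """Decide whether a shortened link's destination should be blocked.

    Hypothetical sketch: `blocklisted_hosts` stands in for third-party
    malware source lists, and `risk_score` (0.0 to 1.0) stands in for
    the service's own traffic analysis. The threshold is an assumed value.
    """
    host = (urlsplit(target_url).hostname or "").lower()
    # Rule 1: block anything a recognized list already flags as dangerous.
    if host in blocklisted_hosts:
        return True
    # Rule 2: block when the service's own analysis says danger is likely.
    return risk_score >= threshold


if __name__ == "__main__":
    known_bad = {"malware.example"}  # placeholder entry
    print(should_block("http://malware.example/page", known_bad, 0.1))
    print(should_block("http://news.example/story", known_bad, 0.2))
```

Either rule alone is enough to block a destination, which matches the article’s description of blocking sites flagged by outside lists or by bit.ly’s own analysis.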
Bit.ly also provides a paid service to companies that want to see how their brands are being received and used on the Internet, based on its statistics on how, how often, and where they are shared by users.
Opportunities in Data Science
This, however, is just the beginning. “I think there are amazing opportunities for start-ups right now,” she says. “We’re doing a very good job of solving problems we solved 20 years ago and 10 years ago more quickly and efficiently … but we still need to figure out what the new capabilities are for solving problems we haven’t been able to address at all in the past.”
What does it take to solve these problems? She defines data science as a combination of advanced mathematics, computer science, statistics, and “finally just hacking, and I think that is by far the most important.
“If you’re the kind of person who can say, ‘I have some cool data, and I’m really curious about some questions about that data, and I’m going to figure this out’, and if you can do the math, then you can become a data scientist and play in this exciting new field.”