Building Big Data: What Choice Do You Have?


One of the most understated impacts of big data in our lives is just how much it can influence the choices we make. For example, retailers and product sellers from just about every industry these days are using data to some degree, analyzing their customers’ purchase histories and habits to come up with special offers and recommendations for products they’re more likely to buy.

Big data is proving itself to be a highly successful sales tactic – but what most people don’t realize is just how subtle it can be, or how much influence it can have on their decision making.


The world’s number one ecommerce portal, eBay, is striving to make our ‘choices’ easier for us, using big data analytics to shape its search engine and throw up ‘results’ that consumers are more likely to purchase.

eBay claims that it’s trying to make life easier for its customers, but there can be little doubt that profit is also a motivating factor behind its efforts to optimize its search engines by deciphering consumer behavioral patterns.

From its data analysis, eBay has pulled up some curious findings recently. For example, a user looking to buy something called a Pilzlampe – a mushroom-styled lamp made in Germany – was found to be far more likely to actually purchase an item if he or she searched for the term “pilz lampe” (two words), as opposed to “pilzlampe” (one word).



The reason for this is that most sellers spell the word incorrectly, so a misspelt search for “pilz lampe” actually throws up more and better results than a correctly spelt one.

Knowing this, eBay has begun to adapt your search queries for you. It groups search terms together, so that when a user searches for one term, it automatically adds common, alternative terms and synonyms associated with that phrase, so that it can come up with the best possible results.

eBay does this for thousands of different searches, but it’s not without its dangers. One of the problems is understanding search terms in the correct context – for example, if a user searches for “Orlando Magic”, eBay has to be careful that its search results don’t throw up witches’ broomsticks or loaded dice instead of merchandise for the sports team.
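The term grouping described above can be illustrated with a minimal sketch. This is not eBay's actual implementation: the synonym groups, the function names `expand_query` and `search`, and the sample listings are all hypothetical, chosen only to show the idea of adding alternative terms to a query.

```python
# Hypothetical synonym groups; the article does not describe eBay's real ones.
SYNONYM_GROUPS = [
    {"pilzlampe", "pilz lampe"},
    {"magic", "magic tricks"},
]


def expand_query(query: str) -> list[str]:
    """Return the normalized query plus any alternative terms grouped with it."""
    normalized = " ".join(query.lower().split())
    for group in SYNONYM_GROUPS:
        # Match the whole query, not its individual words.
        if normalized in group:
            return [normalized] + sorted(group - {normalized})
    return [normalized]


def search(query: str, listings: list[str]) -> list[str]:
    """Return listings whose title contains any of the expanded terms."""
    terms = expand_query(query)
    return [title for title in listings if any(t in title.lower() for t in terms)]


listings = ["Vintage Pilz Lampe orange", "Pilzlampe 70er Jahre", "Desk lamp"]
print(search("pilzlampe", listings))
```

Because this sketch matches the whole query against each group rather than expanding word by word, a search for “Orlando Magic” is left alone instead of picking up the alternatives for “magic”. That is one simple way to sidestep the context problem, though a real search engine would need something far more sophisticated.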

As Hugh Williams of eBay Australia explains, its search engine is something that’s constantly evolving:

“There are very subtle problems that can occur at our scale, so we need the likes of data scientists to investigate these issues.”

All this to make you buy more stuff – isn’t that good of them?

TV Viewing

One of the things that the online video streaming service Netflix is famed for is its uncanny ability to come up with ‘suggestions’ that viewers will actually want to watch.

Things weren’t always this way. Netflix’s movie recommendations were actually pretty hit and miss in the company’s early days, and it wasn’t until it offered up a cool $1 million prize (the Netflix Prize, launched in 2006 and awarded in 2009) that a team of data experts was able to improve the accuracy of the system significantly.

Apparently, recommending stuff isn’t as simple as it sounds. There’s more to it than just coming up with movies of a genre that a viewer has a preference for, as Netflix would lose all credibility if the films it offered up were box office failures.

So Netflix takes into account every single piece of data it can get its hands on, including purchase history, movies that customers looked at but didn’t buy, product ratings, response to previous recommendations, products their friends have bought, products bought by people with similar tastes, and products bought by people living in similar regions.

Key to all this is Netflix’s use of the Amazon Cloud and Hadoop. By using an enterprise data store that integrates with the R statistical language, Netflix is able to rapidly develop and improve its recommendation algorithm using data already stored in the enterprise data warehouse, and by integrating this with unstructured data stored in Hadoop. This eliminates the need to copy massive amounts of data, reduces the cost of the IT infrastructure required to support the project, and allows Netflix to quickly deploy improved recommendations on its site.

Netflix’s data sources are too numerous to mention, but this only underlines the point that the more data you can use, and the smarter your algorithms are, the better your results will be (making customers more likely to buy).
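To make one of the signals listed above more concrete (items chosen by people with similar tastes), here is a minimal sketch of similarity-weighted recommendation. It is not Netflix's algorithm: the viewers, titles, ratings, and the functions `cosine` and `recommend` are all invented for illustration, and a real system would combine many more signals.

```python
from math import sqrt

# Hypothetical ratings (1-5), keyed by viewer and then by title.
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "ben": {"Alien": 5, "Heat": 5, "Blade Runner": 5},
    "cy": {"Up": 5, "Alien": 1, "Frozen": 5},
}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity computed over the titles both viewers have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(viewer: str, top_n: int = 3) -> list[str]:
    """Rank unseen titles by the similarity-weighted sum of other viewers' ratings."""
    seen = ratings[viewer]
    scores: dict[str, float] = {}
    for other, other_ratings in ratings.items():
        if other == viewer:
            continue
        similarity = cosine(seen, other_ratings)
        for title, rating in other_ratings.items():
            if title not in seen:
                scores[title] = scores.get(title, 0.0) + similarity * rating
    return sorted(scores, key=lambda title: scores[title], reverse=True)[:top_n]


print(recommend("ana"))
```

In this toy data, "ana" rates films much as "ben" does and almost the opposite of "cy", so ben's unseen pick ranks above cy's. Adding more viewers and more kinds of data gives the scoring more to work with, which is the point the article makes about data volume.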


It’s nice to know that not everyone is using big data recommendations to make money. In fact, some people are going so far as to make suggestions that can help you make money as well (one day), by proposing an educational path that is best suited to your interests and skills.

Motivating this new approach is the worrying level of dropouts in America’s educational institutions. According to a New York Times article:

“Only 31% of students at public colleges get their bachelor’s degree within four years, and 56 percent graduate within six years.”

The article goes on to show how Arizona State University is using big data to power its eAdvisor system – borrowed from the University of Florida – to make personalized recommendations on which majors students should take, as part of an effort to reduce dropout rates.

If students fail to do well enough, eAdvisor automatically flags them as being “off-track” – should any student be flagged twice in a row, they may be forced to change majors.

This might sound harsh, but the idea is that you have to be cruel to be kind. In addition, the eAdvisor system helps students to know what they’re getting themselves into by front-loading the key courses they’ll have to take. For example, a psychology major necessitates that students also do well in statistics.
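The “flagged twice in a row” rule described above can be sketched in a few lines. The article does not say how eAdvisor decides that a student has failed to do well enough, so the grade threshold and the function names `is_off_track` and `flagged_twice_in_a_row` here are purely hypothetical.

```python
# Hypothetical threshold: the article does not define "well enough".
MIN_KEY_COURSE_GRADE = 2.0


def is_off_track(key_course_grades: list[float]) -> bool:
    """Flag a term if any key course grade falls below the threshold."""
    return any(grade < MIN_KEY_COURSE_GRADE for grade in key_course_grades)


def flagged_twice_in_a_row(terms: list[list[float]]) -> bool:
    """True if two consecutive terms were both flagged off-track."""
    flags = [is_off_track(grades) for grades in terms]
    return any(first and second for first, second in zip(flags, flags[1:]))


# Each inner list holds one term's grades in the major's key courses.
history = [[3.0, 2.5], [1.7, 3.0], [1.5, 2.0]]
print(flagged_twice_in_a_row(history))
```

Here the second and third terms are both flagged, so the student would be sent to see an advisor. A student flagged in the first and third terms only would not be.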

The case of biology major Ms. Eriven is a fine example of how eAdvisor can help. Ms. Eriven was unfortunate enough to be flagged twice by the system, which meant she had to go and see an advisor for crunch talks on her future.

Eriven’s meeting proved to be an extremely fruitful one – there, she listed her interests, which included science and music. She also stated that she likes writing and is a family-orientated person, and these answers were fed to eAdvisor, which came up with a number of alternative majors she could study, among them creative writing.

Creative writing would involve only a couple of classes each semester, so she could still take science and, hopefully, switch back to biology.

And so that’s exactly what she decided to do.