UPDATED 13:40 EST / FEBRUARY 29 2012

Strata Conference Day 1: Data Mining and Predictive Models

One of the biggest conferences dedicated to big data kicked off yesterday, and we at SiliconAngle are here to bring you highlights of the event along with exclusive interviews. Last year’s conference focused heavily on Hadoop, and it’s still a central topic this year, joined by a lot of data mining and predictive analytics.

John Furrier and Dave Vellante brought in sociologist Marc Smith from the Social Media Research Foundation for an interview on theCube. He believes that now is the perfect time to build tools that educate the masses about big data and social data, saying such tools give us a leg up in understanding the “big picture.”

Whoever “races to the top of the Big Data mountain first will see that vista,” and this is to their advantage, as they’ll be the first to explore, develop and use big data patterns. Big data also enables social scientists to understand society.

To support Smith’s claim, here’s Christopher Berry’s take on Strata Conference Day 1 in a few bullets:

On Web Mining

• The web is an infinite series of edge cases.
• Robots.txt is not a terms-of-use document.
• Scraping should be done ethically: respect robots.txt, respect the site’s rate limits, and be transparent about who you are and why you’re taking data.
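
For readers who want to see what those scraping rules look like in practice, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The bot name, contact address, URLs and robots.txt content are all hypothetical and are inlined so the example runs without touching a real site. It only decides what may be fetched and how long to wait between requests; it does not do the fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identity: a descriptive name plus a contact address,
# so site owners can see who is crawling and how to reach them.
USER_AGENT = "example-research-bot/0.1 (+mailto:crawler-owner@example.org)"

# Hypothetical robots.txt, inlined so the sketch needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_crawl(parser, urls, default_delay=1.0):
    """Return (allowed_urls, seconds_to_wait_between_requests)."""
    delay = parser.crawl_delay(USER_AGENT)
    # Fall back to an assumed default when the site states no Crawl-delay.
    wait = float(delay) if delay is not None else default_delay
    allowed = [url for url in urls if parser.can_fetch(USER_AGENT, url)]
    return allowed, wait


rules = load_rules(SAMPLE_ROBOTS_TXT)
allowed_urls, wait_seconds = plan_crawl(
    rules,
    [
        "https://example.org/articles/strata.html",
        "https://example.org/private/report.html",
    ],
)
```

A real crawler would send `USER_AGENT` in its request headers and sleep for `wait_seconds` between requests. As the bullet above notes, passing the robots.txt check does not replace reading the site’s actual terms of use.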

Predictive Models

• The complexity of the model needs only to be proportional to the complexity of the problem.
• Producing random trees and generating a forest is a good way to produce a model without systematic error.
• Keep the trees shallow.
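
To make the forest idea concrete, here is a toy sketch in plain Python. It is not Berry’s code and not a full random forest implementation: each “tree” is a depth-1 stump (the shallowest tree possible), trained on a bootstrap sample and a single randomly chosen feature, and the forest predicts by majority vote. The data set, function names and parameters are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical toy data: two well-separated clusters with two features each.
ROWS = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.9), (1.2, 2.1),
        (5.0, 5.5), (5.5, 4.8), (6.0, 6.1), (4.8, 5.9)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def train_stump(rows, labels, feature):
    """Depth-1 tree: pick the threshold on one feature with the fewest errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote across all stumps."""
    votes = Counter(left if row[feature] <= threshold else right
                    for feature, threshold, left, right in forest)
    return votes.most_common(1)[0][0]


forest = train_forest(ROWS, LABELS)
```

Calling `predict(forest, (0.5, 0.5))` or `predict(forest, (9.0, 9.0))` asks every stump for a vote. Any single stump can be thrown off by an unlucky sample, but the vote across many of them smooths that out, which is the intuition behind growing a forest of simple trees rather than one complicated one.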

Aside from big data talk, Strata Conference is also a hub for product launches. To start, we have the Cloudera University announcement. Their Shared Learning Collaborative, an effort to make data work for students, received a lot of support. This open source project is geared toward building technology that brings personalized educational materials and powerful tools directly to teachers’ fingertips, so they can easily find the resources, techniques and strategies that will help them meet individual students’ learning needs.

Here are also some interesting statistics about the conference’s attendees, originally compiled by The Guardian. The highlights are as follows:

• Developers traveled an average of 2,346 miles to attend the conference.
• The total company air miles: 2,174,144.
• About 83% of attendees are male. Ugh.
• About 33% of attendees said their organizations each store an average of one terabyte of data.

