The Largest Data Industry Will Change Our World: Jose Ferreira of Knewton on Educational Technology’s Global Impact

Whether you look at it from the perspective of hard numbers or abstract impact, education is the world’s largest data industry, with incredible potential to address enduring social problems worldwide. Students produce and consume orders of magnitude more actionable data per year than the average online user. Jose Ferreira explains that we all benefit from the better-educated world that data technology will help create, positing: “What if the girl who invents the cure for ovarian cancer is growing up in a Cambodian fishing village and otherwise wouldn’t have a chance? As distribution of technology continues to improve, adaptive learning will give her and similar students countless opportunities that they otherwise wouldn’t have.”

Inspired by his own background in education and technology, Ferreira created Knewton, an adaptive learning platform, to take an innovative approach to the enduring but solvable problem of access to education internationally. As the educational landscape fundamentally shifts from a standardized model to personalized programs, Knewton’s adaptive learning platform leads the industry with $33 million in investments and a valuation of more than $150 million.

In this interview we discuss what makes educational data so massive and the problem of educational access so urgent. We also explore how Knewton’s technological infrastructure handles massive throughput to provide personalized learning, and what distinguishes the company from competitors. Finally, we get personal with Ferreira as he shares the greatest lesson he learned outside of the classroom and his childhood fascination with Spiderman.

You’ve said, “Education is the world’s largest data industry by far.” What makes it so big? How big are we talking?

Education has always produced an enormous amount of data – but until recently, we have been able to capture virtually none of it. Two things distinguish how much data education produces (versus, say, e-tailing or social media): academic study by nature requires a prolonged period of engagement, and every educational concept is highly correlated with hundreds of other concepts; as a result, data mining education produces an exponentially increasing cascade effect. A Knewton student can produce millions of actionable data points each day, roughly five orders of magnitude more than a typical online search user generates.



What was the motivation to start Knewton?

I spent most of my career in education and technology. I worked at Kaplan, and while I was there I was trying to innovate as much as possible around personalization. But at that point, it meant giving people lots of proctored tests to generate data. It was cumbersome for administrators and not a great user experience for students. That was back in the mid-‘90s, but I never stopped thinking about the intersection of data mining and education. So, when the technology finally caught up, I decided to start Knewton.

Knewton’s goal is to personalize education. We want to become the industry standard data platform for education. But our ultimate vision – and what really motivated me to start the company – is to solve the access problem for the human race once and for all. Only 22% of the world finishes high school; only 55% finish sixth grade. This is a preventable tragedy. Adaptive learning can give students around the world access to high-quality education they wouldn’t otherwise have.

What makes Knewton different from other adaptive learning companies?

When most people talk about adaptive learning, what they’re discussing is either (a) single-point adaptivity, which evaluates a student’s performance at one point in time in order to determine the level of instruction or material she receives from that point on, or (b) adaptive testing, which determines a student’s exact proficiency level using a fixed number of questions.

When Knewton refers to adaptive learning, we mean a system that is continuously adaptive — that responds in real-time to each individual’s performance and activity on the system and that maximizes the likelihood a student will obtain her learning objectives by providing the right instruction, at the right time, about the right thing.

Knewton isn’t just a personalized quizzing engine. We can power any type of content – whether for teaching or assessment – in any kind of media. Knewton harnesses the power of all your previous data plus the combined data set of every other student who’s ever used the platform to optimize learning for each individual student.

Unlike other groups dabbling in adaptive learning, Knewton doesn’t force you to buy pre-fabricated products using our own content. Our platform makes it possible for anyone — publishers, instructors, app developers, and others — to build her own adaptive applications using any content she likes.

How does Knewton leverage big data to provide personalized experiences for students?

Education produces a LOT of data per user: depending on the student, 12-16 hours a day, nine months a year for 10-20 years.

There is a very high degree of correlation between educational data points, and the aggregated effect of all those correlations is profound. If, for example, a student has demonstrated mastery of fractions, algorithms can reveal how likely it is that he will demonstrate mastery of exponentiation as well. The hierarchical nature of educational concepts means that they can be organized in a graph-like structure, which allows student flow from concept to concept to be optimized over time, as Knewton learns more and more about the relationships between them through data. To process the tremendous amounts of student data involved, Knewton has established its own adaptive infrastructure.
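The idea of inferring likely mastery of one concept from demonstrated mastery of a related one can be sketched in a few lines. This is a minimal illustration, not Knewton’s actual model: the concept names, edge weights, and damped carry-over rule are all assumptions made for the example.

```python
# Directed edges: prerequisite -> (dependent concept, correlation weight).
# All names and weights here are illustrative assumptions.
CONCEPT_GRAPH = {
    "fractions": [("exponents", 0.7), ("ratios", 0.8)],
    "ratios": [("proportions", 0.9)],
}

def propagate_mastery(evidence, graph, damping=0.5):
    """Given observed mastery probabilities, estimate mastery of
    downstream concepts as a damped, weighted carry-over."""
    estimates = dict(evidence)
    frontier = list(evidence)
    while frontier:
        concept = frontier.pop()
        for dependent, weight in graph.get(concept, []):
            carried = estimates[concept] * weight * damping
            # Keep the strongest estimate found for each concept.
            if carried > estimates.get(dependent, 0.0):
                estimates[dependent] = carried
                frontier.append(dependent)
    return estimates

# A student strong in fractions gets a nonzero prior on exponents,
# ratios, and (transitively) proportions.
scores = propagate_mastery({"fractions": 0.9}, CONCEPT_GRAPH)
```

The graph structure is what makes the “cascade effect” mentioned above concrete: one observation updates estimates for every concept reachable from it.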

There are millions of students using Knewton. How do you ensure your system can handle that massive throughput?

Knewton has established a cloud-based infrastructure that allows the platform to process tremendous amounts of student data and scale up and down in real-time as needed. We use algorithms that lend themselves to parallelization—for instance, “graph algorithms,” which are special in that they can be broken down into units of computation that depend only on other specific units and can thus be parallelized very efficiently.
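The parallelization property described here can be illustrated with a toy scheduler: if each unit of computation depends only on specific other units, then all units whose dependencies are satisfied can run concurrently. The dependency graph and the work function below are assumptions made for the sketch, not Knewton’s actual algorithms.

```python
from concurrent.futures import ThreadPoolExecutor

# unit -> the units it depends on (illustrative dependency graph)
DEPS = {"a": [], "b": [], "c": ["a", "b"], "d": ["b"], "e": ["c", "d"]}

def run_in_waves(deps, work):
    """Run units in dependency order; each wave runs concurrently."""
    done, results = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(deps):
            # Units whose dependencies are all finished can run together.
            wave = [u for u in deps
                    if u not in done and all(d in done for d in deps[u])]
            for unit, out in zip(wave, pool.map(work, wave)):
                results[unit] = out
            done.update(wave)
    return results

# "a" and "b" run in parallel, then "c" and "d", then "e".
results = run_in_waves(DEPS, lambda unit: unit.upper())
```

The same wave pattern scales from threads to machines: independent units go to different workers, and only the declared dependencies force ordering.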

Given the absence of robust, public frameworks for accomplishing these computations at a large scale, Knewton has designed its own framework, called AltNode, which works by dividing work among machines and then sending continuous updates only among the minimal necessary set of machines. All significant updates are stored in a distributed Cassandra database. If one machine fails, another one nearby automatically takes its place, recovering recent values from the database and resuming work. One unique feature of AltNode is that it allows models to recover from any state and respond to new data as it arrives.
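The checkpoint-and-recover pattern described above can be sketched in a few lines. AltNode’s internals are not public, so this is only an illustration of the general technique: a plain dict stands in for the Cassandra store, and the model state is an invented running event count.

```python
class Worker:
    """A worker that checkpoints its state so a replacement can resume."""

    def __init__(self, name, store):
        self.name, self.store = name, store
        # Recover the last checkpointed state, if a predecessor wrote one.
        self.state = dict(store.get(name, {"events_seen": 0}))

    def process(self, event):
        self.state["events_seen"] += 1
        # Checkpoint every significant update to the shared store.
        self.store[self.name] = dict(self.state)

store = {}  # stand-in for a distributed database
w1 = Worker("model-7", store)
for event in ["answer", "answer", "hint"]:
    w1.process(event)

# w1 "fails"; a replacement recovers from the store and resumes,
# picking up exactly where the failed worker left off.
w2 = Worker("model-7", store)
w2.process("answer")
```

Because every significant update is persisted, the replacement worker loses at most the work since the last checkpoint, which matches the “recover from any state” behavior described above.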

Knewton technology is not just for educational institutions, but also corporations. How might Knewton impact the bottom-line for such companies?

In addition to educational institutions, publishers, game and app developers, corporate trainers, and media companies can all use Knewton adaptive learning. For example, an organization might use Knewton to establish a personalized, self-paced corporate training program, allowing it to save money and train its employees far more efficiently and effectively.

You believe Knewton is “gonna make a big impact on the human race and change the world forever.” Can you connect the dots between adaptive learning implementation and a better society (e.g., things that adaptive learning might foster, like lower unemployment and greater political participation)?

Education is a gateway problem – it drives virtually every problem we face. Improving education, then, makes all other problems more solvable.

Aggregating open educational resources in one adaptive platform could give every student in the world access to a baseline high-quality educational experience. The lectures and “classes” might not always involve live-in-person teaching, but they would be low or no cost for the developing world and still very high quality thanks to data mining and personalization. What if the girl who invents the cure for ovarian cancer is growing up in a Cambodian fishing village and otherwise wouldn’t have a chance? As distribution of technology continues to improve, adaptive learning will give her and similar students countless opportunities that they otherwise wouldn’t have.

You have also stated: “It’s a crime that 80% of the world doesn’t have [access to education] and it’s a solvable problem.” What do you believe have been the greatest challenges to solving this problem?

Education has always been an infrastructure-intensive endeavor. You have to create a school building equipped with electricity, plumbing, learning materials and highly trained teachers able to make that content come to life for students. As technology continues to get faster, cheaper and stronger, you’re able to work around some of the physical limitations of a student’s location.

We always end this segment with: How would you like to see the field of data science evolve over the next few years?

I want my fridge and cupboards to be able to order all my groceries for me.

What’s one of the greatest lessons you’ve learned outside of the classroom?

You can’t do everything by yourself. A strong team beats even the most talented individual.

If you could do it all over again, where would you have studied abroad in college?

Cambridge – but I wouldn’t have gotten in.

What was one of your favorite educational television shows as a child?

Electric Company, because it had a Spiderman segment. I watched that show religiously to see Spiderman.