DataKind pairs great data scientists with governments, non-profits and NGOs that address social problems. Harnessing Big Data to make sense of complex problems like racial discrimination in New York and the utility of financial services in South Africa, DataKind illuminates processes that would otherwise have been difficult and costly to understand for organizations with often limited budgets. We spoke with DataKind co-founder Drew Conway (whom Alistair Croll of O’Reilly Media has deemed the “James Bond of data”) to discuss the goals of a data-driven social change movement, the new career opportunities social initiatives are opening up for data scientists, and what the new “era of hacktivism” means for the evolution of data science in general. Conway also shares his favorite brain food, recreational activity and scientific thinker.
What is your vision for a data-driven social change movement?
I think the social change movement starts with showing social organizations the power
of data through working with our organization. It has been our observation that many
organizations in the social sector see data as a way of showing progress to their donors,
and thus as a means to maintain or increase funding. While this is true, our goal for social
change in these organizations is to show them how data can improve their overall mission.
What makes the work data scientists can contribute to social change organizations different from that of sociologists or statisticians using programs like SPSS or STATA?
We do not think there is a difference, per se. As an organization we are completely
tool agnostic, and have many participants in Datadives who are practicing social
scientists using SPSS, STATA, or any number of other proprietary statistical computing packages.
By expanding the pool of participants to the data science community – more broadly
defined – we can contribute many additional skills, such as acquiring data, whether
through web scraping or accessing APIs; cleaning and structuring it; specifying models
and computing them; and finally taking that analysis and conveying the results visually.
This cycle requires many different tools, and often many different people with different
skill sets. But, of course, that includes social scientists!
What would be included in a checklist of questions a non-profit organization should ask themselves to determine if they’d make a good candidate for DataKind assistance?
We hope to get to the point as an organization where such a checklist is not necessary, and we can assist organizations at any technical level. That said, we are not there yet, and there are some baseline requirements for collaboration – especially in Datadives.
First, some set of machine-readable data. The “machine-readable” qualification is
important, because many organizations have data, but it may be stored in PDFs or
paper files. While converting those data to something that can be computed on is
extremely important, it is currently not one of our services.
Given this type of data, the other requirement is a well-defined problem to investigate
with the data. Part of our task as an organization is to assist the non-profit in defining
that problem, as they often have broad questions that need to be focused. Once we’ve
narrowed the problem, we find this allows the data scientists to think creatively about
how to approach solving it.
Right now it seems that data scientists’ involvement with DataKind is something that occurs on the side or in phases. Going forward, do you envision “Data Science for Social Good” to be something that data scientists can pursue as a full-time career opportunity? I think it’s clear how DataKind can help transform society, but how might you also be changing the terrain of data science?
The simple answer is: yes. The primary motivation for Jake and me to form DataKind
was that we did not have an outlet for using our skills for social good. We knew social
organizations had lots of data, and we knew there were many data scientists who wanted
to help them analyze it, but we just didn’t know how to connect the two communities.
The response to DataKind has been overwhelming, which thoroughly convinced us that data science for social good is not only a viable career path, but also a necessary part of defining data science as a discipline. That is, apart from engineering tools, or applying
statistical methods, data science is a discipline of using this tool-set to effect social change.
We always end with: How would you like to see the field of data science evolve over the next few years?
I would like the discipline to move beyond a conversation about engineering tools, e.g., Hadoop, R, etc., to one where we focus on the broader set of problems that can now be addressed with our expanded access to data. At the risk of sounding whimsical, humanity has reached a unique crossroads where we now have access to data that can actually address some fundamental questions about human behavior and the mechanics of society.
For data science to be a viable discipline over the next few years it must be the
vanguard that pushes these questions forward.
JAMES BOND OF DATA
Favorite brain food to eat and footwear to sport at a Datadive?
For Datadives I like quick and easy, so pizza and cookies are my favorite. But Jake
and Craig are vegetarians, and have been getting me to eat more healthily! Footwear
has to be comfortable, so I opt for sneakers or bare feet.
Favorite break-time activity in between heavy-duty data analysis?
I think keeping active is key, so either a trip to the gym, or a quick run. We’ve been
toying with the idea of adding a group run to Datadives.
Favorite scientist from history – data or otherwise?
Not a scientist, but M.C. Escher – to my mind – was the first data scientist in
history. He understood how to convey data and mathematical relationships visually
before anyone, which now constitutes much of what modern data scientists must do.