When it comes to the sophisticated tracking, monitoring and analysis of individual behavior that big data affords, more than just advertising profits are at stake. Democracy and justice may be compromised with every digital step we take. If, as Alistair Croll’s thoughtful critique of big data analytics practices states, “Big data is our generation’s civil rights issue,” where are all the protests? Perhaps, as David Gurman (Brainvise) and Eric L. Berlow (Vibrant Data Labs) of WeTheData point out, we have not yet heard much discussion or resistance concerning data and ethics because these concerns have not been made accessible to a broader audience beyond jargon-savvy tech specialists.

Vibrant Data is a project made up of a diverse team of data, writing and creative professionals that harnesses innovative data technology to reveal the social and political implications of big data’s pervasiveness. The project is a collaboration with Intel Research, Emily Aiken (The Story Studio), Juliette Powell and Quid.com.
In our interview, Gurman and Berlow discuss what it means to democratize data and why this issue is truly a matter of life and death. They also explain their pioneering crowd-mapping technology and MAPPR app, which help explore the interconnectedness of expert ideas on democracy and data.
Can you share your inspiration for We The Data?
David Gurman: The project was inspired by a conversation with Brandon Barnett and John Sherry at Intel’s Innovation Lab. The lab was trying to understand a different approach to business where social problems are the impetus for innovation and market growth, rather than the more traditional model, where we “create” markets around the technology and not the problem the technology addresses. Together we homed in on the fundamental question we wanted to address and ultimately came to: How do we Democratize Data? We are not talking just about Big Data; we mean Vibrant Data: data that has social implications in its generation and use.
Eric L. Berlow: As David mentioned, the initial impetus for us was work that Intel’s research group (that includes anthropologists and ethnographers) has been doing on the social and economic implications of this new world of data we live in. We are more and more aware that our every click, text, and tweet, our every purchase or inquiry, is being tracked… and large corporations are “battling for control” of those data because they are incredibly valuable. The sad thing is that often the most “value” we get from our data is more targeted marketing. The vast majority of Big Data analytics is focused on how to sell us more stuff. Is that really the best we can do? We’re interested in understanding how individuals can retain more control over their personal data, collaboratively discover its value, and directly benefit from it.
One area where Intel’s research group has really demonstrated thought leadership is in the idea that solving social problems that don’t have obvious immediate economic benefit actually can catalyze new business ecosystems that didn’t exist before. For example, recent investment in affordable art spaces in Oakland has helped feed a new economic ecosystem that is revitalizing a wide variety of downtown businesses – including many completely unexpected ones.
A major goal of the project is to democratize data. What does the democratization of data mean to you? What makes it so urgent?
DG: Well, the question for me is how do we gain control of our own data trail so that it can be used for social good and not used against us. Every day as we traverse our digitally augmented lives, we secrete an amazing amount of very personal data. That data for the most part is harvested and stored by our service providers in the best case, and by tyrannical governments in the worst. In the best of the worst cases our data is used to target advertising toward us; in the worst it is used to hunt people down for their political beliefs.
The goal of Data Democratization is to investigate fundamental qualities of applications, services, storage, analysis, security and reputation systems that ensure that our data is our own and not used to harm us or others. It’s about making the data more accessible and legible while simultaneously protecting and respecting the privacy of the data’s author, so that they may be visible when desired and anonymous when necessary.
It’s so urgent because in some instances we are literally talking life or death. And we are sitting at a tipping point where our digital lives and embodied lives are merging as one and the same. If we don’t begin to address this now, we will quickly lose the opportunity to address it from the bottom up. This will become a top-down mandate, a frontier we will have helped create but have no control over. This means the FCC and corporate bodies like Google and Facebook will tell us what we can do with the data we create in the US. And in other countries ruled by tyrants, citizens will in some cases be tracked and hunted, and information will be censored. This is an immediate issue of civil liberties and justice.
ELB: We are becoming more aware of how our data are tracked. For the privileged it may just mean putting up with a few more targeted ads. But many are not so lucky. If you are a citizen journalist documenting a violent event with your phone camera and sharing it with the world, you are at risk of being imprisoned or killed. For others it might mean unfair or unethical discrimination or profiling that could deny them healthcare, insurance, a small business loan, etc. A big risk I see is that our new world of Big Data will create an even greater disparity between the haves and the have-nots. So when we talk of Democratizing Data, it’s about how we can make our data work for us and not against us. How can we empower everyday people to avoid being harmed by, and more directly benefit from, the data we collectively generate?
You are pioneering a crowd-mapped network structure of a complex problem. Can you explain why you chose this approach to understand collective perception of a problem?
DG: It is more like “Expert Mapped.” We harnessed the expert wisdom on an issue to give a bird’s-eye view of the diverse challenges, approaches, thoughts and practices that are the perceived components of a problem. Sometimes specialization can act as blinders; we were surprised to find out how many “experts” were unaware of the work of others in their own disciplines on the same problem! Knowing that expert views are deeply focused but perhaps myopic individually, yet broad and diverse collectively, we decided to harness and organize the expert wisdom to help map the network structure.
We received a wide swath of interdisciplinary perspectives on a complex problem. This means that when Eric asserts the tractable aspects of a problem, he is organizing and reflecting what the experts in the relevant fields examining the problem have told him. This is a focused way of laying collective understanding of a problem on the table and identifying the relationship of one issue to the next. It gives participants a bird’s-eye view of their peers’ ways of thinking, and it shows the common perception of how all the parts of the problem interrelate, with each one helping or hurting other aspects of the system.
ELB: So the big question is, how do we constrain the problem? They say that we humans are not very good at wrapping our brains around more than 7 plus or minus 2 things. How can we cull a list of 90 challenges down to 7 plus or minus 2, or fewer? One way to do that is to tally votes: what are the most “popular” challenges, the ones mentioned by the most experts? But the most popular isn’t always the most impactful. So instead we asked experts to map the relationships among challenges: if one is solved, what are the collateral benefits (or costs) to others? Then from the network structure of that collective input, a few challenge areas consistently emerged that, if solved, could help solve many other problems but had few things helping them.
I agree with David about “expert-mapping” – Any successful crowd-sourcing effort is really “expert-sourcing” – the key thing is enabling individuals with unique expertise to add their perspective to a problem. In our case, we specifically reached out to an initial group of known experts to map their collective input, while at the same time leaving the process open for anyone to contribute. Two really cool aspects of this approach are that, first, potential catalysts for change emerge from collective input on what influences what (rather than what is most ‘popular’), and second, the picture can evolve with more input.
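The analysis Berlow describes, surfacing challenges that help many others but are helped by few, can be sketched as a tiny directed-network exercise. Everything below is illustrative: the challenge names and influence links are invented for the example, and the real MAPPR analysis is certainly more sophisticated than this simple degree-based score.

```python
# Sketch of the "catalyst" idea: expert input forms a directed influence
# network, where an edge (a, b) means "solving challenge a helps challenge b".
# Candidate catalysts have many outgoing benefits but few incoming ones.
# All names and links here are invented examples, not project data.

influences = {
    ("data literacy", "privacy norms"),
    ("data literacy", "open standards"),
    ("data literacy", "citizen oversight"),
    ("open standards", "citizen oversight"),
    ("privacy norms", "citizen oversight"),
}

challenges = {c for edge in influences for c in edge}

def catalyst_score(challenge):
    """Outgoing benefits minus incoming help: a high scorer helps many
    other challenges while depending on few."""
    helps = sum(1 for a, b in influences if a == challenge)
    helped_by = sum(1 for a, b in influences if b == challenge)
    return helps - helped_by

ranked = sorted(challenges, key=catalyst_score, reverse=True)
print(ranked[0])  # "data literacy": helps three challenges, helped by none
```

A vote tally would have favored “citizen oversight” (mentioned in the most links), which is exactly the “popular isn’t always impactful” distinction Berlow draws: here it depends on everything else and catalyzes nothing.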
The MAPPR app allowed you to identify how, if one challenge improved, it would improve other challenges or make them worse. Can you provide more details as to how exactly the tool works to discern these connections?
DG: It’s a pretty simple app. All problems have a lot of moving parts, and understanding how those moving parts influence each other really opens our minds to the possibilities of creative, informed problem solving. The app gives you the opportunity to select a few parts of a problem that you know about. It then puts those in a list on the left and asks you to compare them to all the other expert-identified parts of the problem on the right. It allows you to identify a few key traits of the parts’ relationships to each other. It first asks, “Is there a relationship between these parts?” and “Which direction does it travel in?” From A to B, from B to A, or both directions.
Next, the app draws a directional line linking those issues. Then it asks, “Well, what happens to those parts if one gets better? Does the other get better or worse?” Is it a positive relationship, meaning that they help one another, or is it negative, in the sense that as one grows the other retracts? The app then renders this into the drawing, indicating the nature of the relationship between the parts. Over the course of this process experts cycle through all the moving parts, identifying the links between all the parts of the system, and voilà, we are left with a map of their collective understanding of the problem’s structure.
ELB: The MAPPR app is part of an emerging form of “cartography” that is mapping the interconnectedness of things (and in this case ideas). It’s a tool to help create a “collective nervous system.” In the network images on VibrantData.org each node is a challenge identified by a human (expert) brain, and then each link is at least one human brain saying, “If this challenge improves I think it has a really strong, direct influence on this other challenge.” The idea is simple, but implementing it is not so simple because both the ‘nodes’ and the ‘links’ in the network are fuzzy concepts. In this app we’re taking advantage of what human brains excel at. Algorithms are great at crunching data and identifying similarity (e.g., people who like this also like that), but humans are way better at grasping subtle context and implicit causality (e.g., this influences that).
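The elicitation loop Gurman and Berlow walk through, is there a link, which way does it run, and is it positive or negative, can be captured as a small record per expert judgment. This is a hypothetical data model sketched for illustration, not MAPPR’s actual implementation; the names and helper function are assumptions.

```python
from dataclasses import dataclass

# Hypothetical record of one expert judgment in the elicitation loop:
# which two parts are linked, which direction the influence runs,
# and whether improving the source helps or hurts the target.
@dataclass(frozen=True)
class InfluenceLink:
    source: str     # the part that changes first
    target: str     # the part it affects
    positive: bool  # True: source improving helps target; False: it hurts

def add_judgment(links, a, b, direction, positive):
    """Record one expert's answer for the pair (a, b).
    direction is 'a->b', 'b->a', or 'both' (a bidirectional link
    becomes two directed edges)."""
    if direction in ("a->b", "both"):
        links.add(InfluenceLink(a, b, positive))
    if direction in ("b->a", "both"):
        links.add(InfluenceLink(b, a, positive))
    return links

links = set()
add_judgment(links, "data literacy", "privacy norms", "a->b", True)
add_judgment(links, "surveillance", "trust", "both", False)
# links now holds three directed, signed edges ready to be drawn as a map
```

Aggregating such records across many experts yields exactly the kind of network shown on VibrantData.org: each node a challenge, each signed edge at least one brain asserting a direct influence.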
Another major goal of your project is to involve a broad audience outside of the world of data science. What challenges have you run into when trying to make visualized data accessible to audiences that may not even know what the term “big data” means? How have you tried to work around them?
DG: Oof! You hit the nail on the head. Trying to explain this endeavor is hard. It is barely graspable for technologists, scientists and data junkies! There are many reasons for this. We are building an approach, a platform for understanding complex problems. It might aid in solving them, but really the onus of the “solving” is in our collective hands. This approach does, however, equip us with an organized view of expert wisdom and give us a few waypoints to move towards. Namely, it identifies high-impact, tractable aspects of the problem. So, that is the approach, but not the problem itself.
Explaining the approach is challenging enough, but “Democratizing Data” – ouch! – that is an entirely different animal. Mostly because many people who are not “in the field” are not even aware that they are data generators, or that Democracy and Data might have any relationship. I think, however, that when we discuss examples like Tunisians using Twitter, Facebook and YouTube to organize and share ideas, suddenly it makes sense that there is a lifecycle of data exchange that spills from our virtual lives into our daily lives and has massive potential to aid in the fight for democracy and human rights. And what this highlights is that the Tunisians driving this were not technologists per se but engaged people leveraging accessible technology for their communication needs. So our approach has been much like the Tunisians’: use the accessible technologies around us to move beyond mere data visualization to data storytelling, asking: What does the data tell us, and why?
We use art in the service of the narrative emerging from the data because it holds people’s attention and communicates on poetic, intellectual, and relatable levels. We collaborate with Emily Aiken from the Story Studio to take the data narrative and craft it into a legible, meaningful story. We work with Quid to help analyze and visualize the data itself; Quid’s data visualization platform allows us to make pretty pictures. Then I assemble Eric’s analysis, Emily’s story and Quid’s visualization into a consumable package, sprinkling in illustration, motion graphics, video, interactive web components and other rich media. Really, our approach has been to get interdisciplinary creatives together to present the story that emerges from the data in a way that we all can take something from.
ELB: We had NO idea at the start how difficult this was going to be to communicate. As David mentioned earlier, many of the experts we talked with were often so focused on their own field that they were not immediately aware of how it fit within a broader problem. So the first challenge has been just outlining how Airbnb and the Arab Spring are in some ways part of the same “Vibrant Data Ecosystem.” Micro-entrepreneurship – e.g., renting my car, my spare bedroom, or even my advice to strangers – has exploded because everyday people can more easily share data, see the big picture, and convert that knowledge into real value. At the same time, citizens toppling a dictator were helped not just by the ability to communicate through social networks, but by the ability to see the trends of their movement in near real time. It was as if data analytics and visualization for the masses helped individuals see themselves in context – to see that they were part of something much bigger than they imagined, which gave them the courage to act.
Both Emily Aiken and Juliette Powell have been instrumental in helping us engage with a broader audience. So at this stage, if we can just communicate the bigger problem (and the central challenges that need solving) to a wide audience, then I think we will have been successful. The next stage is to catalyze creative action, so stay tuned for that.
We always end with: How would you like to see the field of data science evolve over the next few years?
ELB: Data Science is already transforming how Science itself is being conducted, so I don’t want to speculate too much on that because I’m sure I’d be wrong. Data used to be rare and hard to gather, so the process of scientific inquiry focused on first identifying a clear question and then gathering the data needed to address it. That process is being flipped on its head: now we gather tons of data and mine it for interesting questions we never considered. When you consider Network Science, there are conceptual similarities with Systems Theory and Complexity Theory from the 1970s and 1980s, but the big difference is that now we have two things – reams of empirical data and cheap computational power – that change how we think about systems that are orders of magnitude more complex than were ever examined before. It is even changing how we think about prediction, causality, and understanding – from a more mechanistic, engineering perspective to one based on probability and statistical relationships.
Some big questions in network science we critically need to advance are: How can we better infer function (e.g., the ‘importance’ of a given node) from network structure alone? How can we more rigorously test the consequences of our assumptions or of inherent uncertainties in the data?