Companies continue to be excited about the possibilities of big data, including how data on their customers might reveal new patterns and insights. Although a lot of attention has focused on the benefits of big data, less attention has been paid to the new ethical complications big data presents. An upcoming book, Ethics of Big Data: Balancing Risk and Innovation (O’Reilly), will attempt to address these ethical issues. Howard Wen at O’Reilly Radar does a good job of highlighting some of the most prominent ethical issues which the book is expected to cover. The primary ethical issues involve the privacy expectations of those from whom the data is collected.
Although the issue of online privacy is not new, the debate has largely focused on the personal privacy concerns of individual consumers rather than those of professionals served by specialty publishers. For example, the retention of smartphone location data and the practice of online tracking have repeatedly drawn criticism from concerned consumers. At a recent conference on internet privacy, social networks, and data aggregation, it was clear that many professionals are only just realizing some of the data collection practices currently in use (e.g., email parsing for ad generation and search query collection).
As professionals and their clients become increasingly aware of how data may be collected while professionals perform research and other tasks for their clients, publishers serving those professionals should consider how professionals differ from individual consumers and how those differences might shape the privacy debate. Two such differences are a professional’s duty to protect the privacy of clients and the limits of anonymizing the data collected.
Unlike individual consumers, professionals are restricted in how they may be able to respond to these new privacy concerns. In the privacy debate involving individual consumers, it has been suggested that consumers should simply accept a world with less privacy or view privacy as a commodity that is traded in exchange for free services like social networks or web search. Many professionals, however, have a specific duty requiring them to protect the privacy of client or patient information. Professional licensing requirements, statutes, or professional codes of conduct may, therefore, not allow professionals to react in the same manner as individual consumers.
In the past, one way to balance the needs of data aggregators and the privacy of individuals was to “anonymize” or “de-identify” collected data (i.e., to remove the parts of the data that directly identify the individual). As big data tools increase the ability to find patterns in data, however, the ability to re-identify the source of data also increases. The most notable example of this was the 1997 re-identification of Massachusetts Governor William Weld’s private medical history from publicly available insurance data, which had been stripped of direct identifiers. Some features on professional research platforms may inadvertently make this re-identification problem worse. For example, features that allow users to organize their work by client create the potential to link collected data to a specific client.
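To make the re-identification risk concrete, here is a minimal sketch of a linkage check in Python. Everything in it is hypothetical: the records are invented, and the choice of ZIP code, birth date, and sex as the fields left behind after de-identification is an assumption made for illustration, not a description of any particular dataset or platform. It shows how a record with no name attached can still be tied to a person when the remaining fields are unique in both a "de-identified" dataset and a public list.

```python
from collections import defaultdict

# Hypothetical, invented records: direct identifiers (names) already removed.
deidentified_records = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition C"},
]

# Hypothetical public list that still carries names.
public_list = [
    {"name": "Person One", "zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Person Two", "zip": "02139", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Person Three", "zip": "02140", "birth_date": "1971-11-30", "sex": "F"},
]

# Assumed quasi-identifiers: fields that are not names but can narrow down a person.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def quasi_key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(deidentified, public):
    """Link names to diagnoses where the quasi-identifiers match exactly one
    record in each dataset. Ambiguous matches are skipped."""
    records_by_key = defaultdict(list)
    for record in deidentified:
        records_by_key[quasi_key(record)].append(record)

    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for key, records in records_by_key.items():
        names = names_by_key.get(key, [])
        if len(records) == 1 and len(names) == 1:
            matches[names[0]] = records[0]["diagnosis"]
    return matches


matches = reidentify(deidentified_records, public_list)
print(matches)
```

In this made-up data, only one person is linked, because the other combination of fields appears twice in the de-identified set. That is the intuition behind grouping or coarsening fields before release: the more people who share the same combination of values, the harder it is to single anyone out. A client label attached to a research session could act as one more such field.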
There are no easy answers when trying to strike the right balance between user privacy expectations and the benefits of being a data-driven organization in a big data world. However, considering the unique privacy needs of professionals will allow publishers to be proactive, design better software, and work with professional associations to provide clarity for professionals and publishers alike.