Our article “Refractive datasets as a sensemaking methodology in closed data ecosystems” has now been published in Big Data & Society. This is the first paper from a very exciting collaboration with the Search Prompt Integrity & Learning Lab (SPILL), founded by Francesca Tripodi (University of North Carolina at Chapel Hill). The research was led by Anna Beers and conducted with an incredible team: Viviane Ito, Agustin Orozco, Patrick Gildersleve, Francesca Tripodi, and myself. In the paper, we demonstrate how platforms such as Wikipedia and Google Trends can be leveraged to generate refractive datasets, offering new ways to understand the dynamics of closed data platforms.

Abstract:
As digital platforms restrict their APIs, researchers face diminishing options for studying social phenomena in digital environments. During what has been called the post-API era, researchers have found themselves looking for reliable data sources in an unreliable and frequently changing platform data ecosystem. In this context, we propose analyzing refractive datasets as a methodology for researchers to understand the dynamics of closed data platforms. Refractive datasets come from platforms with relatively more open data policies, and their analysis sheds light on platforms with more restrictive data policies. Like a prism, refractive datasets reflect but also transform data-based phenomena unfolding on closed platforms. Using refractive datasets from Wikipedia and Google Trends, we present three studies to demonstrate our methodology. We first show how refractive data from Wikipedia’s multiple language editions can be used to understand a fractured global platform ecosystem in a case study of hydroxychloroquine, a purported COVID-19 medicine. Second, we use Google Trends to show how similar refractive analyses can be used to understand information lost to platform deletion, in a profile of an online panic over the drug brand Galaxy Gas. Finally, we show how Wikipedia data can be used as a grounding point for a refractive analysis of how new generative algorithms reproduce and distort data across the social web. We discuss how refractive datasets can be a way for researchers to “sensemake” in increasingly opaque big data environments, enabling interpretivist analyses which aim to generate new hypotheses rather than verify existing claims.