Originally Published March 8, 2023
The phenomenon of big data is revolutionising various industries, and the media industry is one of the key sectors significantly affected at the global level. The internet is generating massive volumes of data, commonly referred to as big data. This data originates from multiple sources—social media interactions, online activities, financial transactions, and more. Our smart devices alone are capable of producing vast amounts of data, which must be broken down into meaningful information.
Journalism, as a profession, is centred around disseminating information to the public. Big data offers immense possibilities for journalists to read, analyse, and interpret numbers in a way that is accessible and understandable to the public. With effective storytelling techniques, journalists can simplify complex datasets and present them in layman’s language.
Understanding Data Journalism
Data journalism is a hybrid term, combining two broader concepts: ‘Data’ and ‘Journalism’. According to the Oxford Learner’s Dictionary, data refers to facts or information, especially when examined to discover new insights or make decisions. Journalism involves the gathering and dissemination of information. While these terms have been defined differently in academic literature, data journalism can broadly be summarised as the activity of obtaining, analysing, reporting, and writing news stories based on data.
Data journalism initiatives in India emerged around a decade ago. A few mainstream newspapers have since begun publishing data-driven stories. Digital-native organisations like IndiaSpend and Newslaundry are leading the way by actively promoting data journalism through their storytelling formats.
Although the concept of data journalism is relatively new, numbers have always played a significant role in news coverage, especially in sports, business, and election reporting. Academic studies also highlight that statistics often hold more rhetorical value than the factual information they convey. Opinions on this remain diverse, but in the era of big data, the role of numbers in journalism is being revisited and redefined. Topics like education, public health, elections, and the environment have immense potential for data-driven storytelling.
Linking Data Journalism and Big Data
The relationship between data journalism and big data is both straightforward and complex. At a basic level, it involves finding a dataset, analysing it, and writing a story. However, it is more than that. Access to diverse datasets and deeper analysis can lead to richer, more informative, and engaging news stories. Big data refers to massive volumes of structured and unstructured data available across public and private domains.
Data journalism can be a transformative tool if journalists are willing to explore various sources, identify patterns and trends, and uncover hidden stories. Journalists who access, read, and analyse big data can uncover powerful stories that go beyond traditional narratives.
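The basic workflow described above (find a dataset, analyse it, look for a pattern worth reporting) can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended newsroom pipeline: the districts, years, and enrolment figures are invented for the example, and the function name percent_change_by_district is hypothetical.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE_CSV = """district,year,enrolment
District A,2021,1000
District A,2022,1100
District B,2021,800
District B,2022,760
"""


def percent_change_by_district(csv_text, start_year, end_year):
    """Return {district: percent change in enrolment between two years}."""
    values = {}  # (district, year) -> enrolment
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[(row["district"], int(row["year"]))] = float(row["enrolment"])

    changes = {}
    for (district, year), start_value in values.items():
        if year != start_year:
            continue
        end_value = values.get((district, end_year))
        if end_value is None or start_value == 0:
            continue  # skip districts with missing or unusable data
        change = (end_value - start_value) / start_value * 100
        changes[district] = round(change, 1)
    return changes


changes = percent_change_by_district(SAMPLE_CSV, 2021, 2022)

# List districts from the largest fall to the largest rise.
for district, change in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{district}: {change:+.1f}%")
```

A figure such as a fall in enrolment in one district is a lead rather than a story: the reporter still has to verify the source data and find out why the number moved.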
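In 2001, Doug Laney conceptualised the “3Vs” of big data: Volume (the sheer amount of data), Variety (the range of formats and sources), and Velocity (the speed at which data is generated and processed).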
Big Data vs Open Data
It is important to distinguish between big data and open data. Big data refers to datasets that are vast in volume and may or may not be publicly accessible. Open data, which often overlaps with big data, is freely available for public use. Over the past decade, many democratic governments have joined the open data movement, making government records accessible to citizens. In India, the government publishes datasets and data visualisations on its official open data platform: data.gov.in.
This growing domain of big and open data offers journalists a valuable opportunity to mine and analyse datasets. Initiatives like DataLeads, part of the Google News Initiative, have been promoting data literacy in Indian newsrooms through events such as “Data Dialogue,” launched in December 2022 across 20 Indian cities.
Opportunities and Challenges
While the opportunities are vast, data journalism comes with challenges. During my PhD research on data journalism in India, I conducted face-to-face interviews with several Indian data journalists. Many of them reported difficulty finding datasets and said that accessing data often required persistent follow-up with authorities. The challenges reporters mentioned most often were outdated datasets, unavailable data, and inaccessible formats (e.g., data stored as PDF or JPEG files), which must be converted into machine-readable formats like XLS or CSV before analysis.
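To give a sense of what converting to a machine-readable format involves, here is a minimal Python sketch. It assumes the table text has already been copied out of a PDF and that columns are separated by two or more spaces. Real documents are usually messier, and scanned images such as JPEGs need optical character recognition first, which this sketch does not cover. The sample table and the function names are invented for illustration.

```python
import csv
import io
import re

# Hypothetical text as it might look after copying a table out of a PDF:
# columns are separated by runs of two or more spaces.
RAW_TABLE = """\
District      Year    Cases
District A    2022    1,250
District B    2022      980
"""


def table_text_to_rows(raw_text):
    """Split whitespace-aligned table text into rows of cell strings."""
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Two or more spaces mark a column boundary; a single space stays
        # inside a cell (e.g. "District A").
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialise rows as CSV text, dropping thousands separators in numbers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        cleaned = [
            cell.replace(",", "") if re.fullmatch(r"[\d,]+", cell) else cell
            for cell in row
        ]
        writer.writerow(cleaned)
    return buffer.getvalue()


rows = table_text_to_rows(RAW_TABLE)
csv_text = rows_to_csv(rows)
print(csv_text)
```

The resulting text can be saved with a .csv extension and opened in any spreadsheet program. Every converted row should still be checked against the original document, since column boundaries are easy to misread.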
Despite these constraints, a considerable amount of public data is already available in accessible formats. For a novice reporter, this can be an excellent starting point. Writing data stories requires basic digital and data literacy skills that can be improved through free online courses and educational resources. Small newsrooms can organise training sessions for their reporters to build foundational data journalism skills.
I believe a data-driven approach may also lead to increased demand for public datasets at the local level. This, in turn, could push governments to be more transparent. Some Indian journalists have successfully combined the Right to Information (RTI) Act with data journalism, uncovering impactful stories that might otherwise remain hidden. Filing RTI applications has become a tool for extracting unreleased or concealed datasets.
Data literacy can give journalists more power!