Lessons Learned: 10 Common Mistakes in Data Journalism

Modern data journalism around the world has gripped audiences — and often moved policy — with compelling, understandable evidence of climate change impacts, bad government policies, and racial or gender discrimination. But there are gaps.

Traditional investigative journalism has blind spots — for instance, the general inability of watchdog reporters to trace false robocalls to their scammer origins, and a reluctance to tackle rights abuse issues within religions. And there are many blind spots for data reporting as well in what it covers, how it evaluates data, and how it tells stories.

At this year’s annual NICAR summit, GIJN asked speakers and attendees in the hallways for the data journalism gaps they see, and for under-covered topic areas and under-used skills that newsrooms can address. We asked the same question of trusted sources in the Global South.

1. Lack of Connection Between Data Analysis and the Narrative Story

“We are not doing nearly enough narratives — we are seeing data used as an end in itself, rather than a jumping-off point for strong journalism,” said Sarah Cohen, Knight Chair in Journalism at the Walter Cronkite School of Journalism. “I’m a judge [for data journalism awards,] and I can tell you we throw out 90% of the submissions because they are great data exercises, but not great journalism. We get to do both — but if we forget to do the part that we’re good at as journalists, then what’s the point?”

MaryJo Webster, data editor at the Minneapolis Star Tribune, agreed: “Data should be the spine to the story, but too often it’s become the body. In fact, I think the term ‘data story’ should be shipped out. Young reporters need to think of stories as stories, and data as a source to make those narratives authoritative.” She added: “We used to spend three months on one paragraph, identifying the impactful data and humanizing it.”

Cohen agreed: “Data reporters tend to think their studies are interesting, but they’re not — people are. …The real blind spot I see among a lot of data journalists is just the reporting side in general.”

2. Poor Hyperlink Placement for Low-Vision Audiences

Data stories frequently invite readers to “click here” to see original data sources, or “read more” to find related stories in a series, or even to click on a URL. Such stories overlook the fact that blind or low-vision audiences, or anyone else who uses screen reader software to have the story read aloud, cannot make sense of hyperlinks placed on text in this way.

“We should all be using a full descriptive phrase, instead of just saying ‘click here,’ because on a screen reader, that just reflects as a list,” explained Helina Selemon, an investigative reporter at New York Amsterdam News. “Place the link on words that actually describe what it links to. The media is also not offering news at lower reading levels for people with mental disabilities.”

Avoid hyperlinking on phrases like “click here,” as this practice can make data analysis difficult to parse for people with disabilities who use screen-reader software. Image: Screenshot, University of California IT blog
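
One way a newsroom might check a story for this problem before publishing is sketched below in Python. It is only an illustration, not a tool mentioned by Selemon or GIJN: the list of vague phrases, the class name LinkTextAuditor, and the function name find_vague_links are all assumptions made for the example, and it uses only Python's standard-library HTML parser.

```python
from html.parser import HTMLParser

# Phrases treated as non-descriptive. This list is an illustrative assumption,
# not an accessibility standard; adjust it for your own house style.
VAGUE_PHRASES = {"click here", "here", "read more", "more", "link"}


class LinkTextAuditor(HTMLParser):
    """Collects the visible text of every <a> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []       # (href, link text) pairs, in document order
        self._inside = False  # True while an <a> element is open
        self._href = ""
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href") or ""
            self._parts = []

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            # Collapse runs of whitespace so the text compares cleanly.
            text = " ".join("".join(self._parts).split())
            self.links.append((self._href, text))
            self._inside = False


def find_vague_links(html):
    """Return (href, text) pairs whose link text is empty, vague, or a raw URL."""
    auditor = LinkTextAuditor()
    auditor.feed(html)
    auditor.close()
    flagged = []
    for href, text in auditor.links:
        normalized = text.lower().rstrip(".:")
        is_raw_url = normalized.startswith(("http://", "https://", "www."))
        if not normalized or normalized in VAGUE_PHRASES or is_raw_url:
            flagged.append((href, text))
    return flagged
```

Passing a story's HTML to find_vague_links returns the links an editor may want to rewrite so that the linked words describe the destination. A check like this catches only the obvious cases; it does not replace testing the page with an actual screen reader.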

3. Failure to Double-Check Outliers in Official Data

“One of the aspects that worries me the most is when the data leads us to wrong conclusions,” said Argentina-based data journalism trainer Sandra Crucianelli, an investigative journalist at ICIJ and founder of Sololocal.Info. “The data may be incomplete, outdated, or even incorrectly loaded into the database.”

Crucianelli shared the real-world example of writing a campaign finance story, where an official database stated that one donor had contributed $1,000,000 to a political party. “The high value led us to mistrust the number,” she recalled. “We located the donor to ask him if he had actually made that donation. It turned out that he had only donated $100,000. When we consulted the person responsible for the database we were told ‘it was a loading error.’ Imagine if we had made a headline stating the million!”

Since checking every data row is unrealistic, she recommended that reporters focus on the extremes. “When we analyze data, the numbers themselves can also lie,” she noted. “That is why verification is essential. What has changed the most over time, or what has decreased the most?”

(This point was echoed in GIJN’s “hallways round-up” story from NICAR in 2023, which also noted the common error threat of blank rows in spreadsheets. See GIJN’s 10 Simple Data Errors That Can Ruin an Investigation.)
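
As a rough illustration of "focusing on the extremes," the Python sketch below loads a table of amounts, counts blank or malformed rows instead of silently dropping them, and lists the values that dwarf the typical entry so a reporter can verify them by phone. The column names, the function names load_amounts and flag_extremes, and the cutoff of 10 times the median are assumptions chosen for the example, not a method described by Crucianelli.

```python
import csv
import io
from statistics import median


def load_amounts(csv_text, name_field="donor", amount_field="amount"):
    """Parse CSV text into (name, amount) pairs.

    Rows with a blank or non-numeric amount are counted rather than
    silently dropped, so the reporter knows how much data is missing.
    """
    rows = []
    skipped = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(amount_field) or "").replace(",", "").strip()
        try:
            rows.append((row.get(name_field) or "", float(raw)))
        except ValueError:
            skipped += 1
    return rows, skipped


def flag_extremes(rows, ratio=10.0):
    """Return rows whose amount is at least `ratio` times the median, largest first.

    The ratio is an arbitrary starting point: a flagged row is a reason
    to call the source, not proof of an error.
    """
    if not rows:
        return []
    typical = median(amount for _, amount in rows)
    flagged = [r for r in rows if r[1] >= ratio * typical]
    return sorted(flagged, key=lambda r: r[1], reverse=True)


# Hypothetical data, invented for this example.
sample = "donor,amount\nA,500\nB,750\n,\nC,1000000\nD,1200\nE,900\n"
rows, skipped = load_amounts(sample)
print(skipped, "row(s) skipped;", "to verify:", flag_extremes(rows))
```

The same idea applies to the smallest values and to the biggest changes over time: sort, look at both ends of the list, and check those entries with the people behind them before publishing.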

4. Missing Red Flags in Data Due to Lack of Beat Reporters

How many news outlets nowadays have the capacity for, say, a labor, local politics, or primary education beat reporter? In the past, these beat reporters often recognized important but complex stories, and subtle red flags, in data they received from sources in their niche sector, and took this data to investigative or data editors, who would often then enlist the beat reporter to join a team project. Mike Reilley, founder of JournalistsToolbox.AI, said the global reduction in newsroom staffing has meant that numerous data-based investigations have disappeared along with a layer of beat reporters who could provide key context on arcane subjects.

“We need to get creative to begin to fill this gap — collaboration would certainly help,” said Reilley. He added that data editors should actively think about the data they might be overlooking in dormant beats, such as aviation or waste management or elder care.

5. Getting Stuck in the Climate Change Data Rut

“While it’s true that there is extensive reporting on the most evident consequences of climate change, such as heatwaves, floods, and droughts, it’s necessary to delve deeper beyond these surface-level aspects,” said Hassel Fallas, data analysis editor at La Data Cuenta, a Latin-American independent media outlet focusing on data journalism related to climate change and gender. “Outlets need to delve into the analysis of adaptation measures to climate change: the long-term strategies to mitigate and adapt.”

Fallas added: “By only addressing the most visible consequences of climate change, there’s a risk of overlooking the complexity of the issue and its various dimensions. For instance, climate change adaptation extends beyond environmental issues to encompass socio-economic, political, and cultural aspects.”

La Data Cuenta has used data to look at deeper impacts of climate change, like this chart of planned migration in numerous countries due to risks from climate change-related effects and natural disasters. Image: Screenshot, La Data Cuenta

6. Overlooking Labor Trafficking Data Stories

A handful of high-profile investigations into labor trafficking have captivated audiences in recent years, involving abusive conditions for migrants employed at places like overseas US military bases and global fast food chains in the Middle East. Labor trafficking often involves exploitation, harsh working conditions, travel restrictions, and even the confiscation of workers’ passports by labor contractors who work with global employers in foreign countries.

Andrew Lehren — former investigations editor at NBC News, who has worked on several of these stories — believes these investigations only represent the tip of the global iceberg of worker exploitation, and that there is important data to be found and analyzed.

“Our more recent story was about Western warehouses in the Middle East staffed by people who were trafficked, but there is no doubt that is not an exception,” said Lehren, who is now Director of Investigative Reporting at the CUNY Journalism School in New York City. “There are large corporations operating in different parts of the world that have huge demand for cheap labor. The labor supply firms they rely on have a huge incentive to cut corners, leading to the abuse of migrant workers from places like Pakistan, Sri Lanka, Nepal, and the Philippines. These stories do take time and require collaboration, but they are out there.”

7. Neglecting Data Angles in Investigating Consumer Scams

One potentially data-rich topic that receives little journalistic attention involves identity theft operations and online scams. In addition to the sheer volume of texts, emails, and calls designed to trick people into disclosing their identity or financial details, experts say public service investigations are especially important in this area because many victims are embarrassed to report or admit to falling for the con.

“There are a number of scams involving identity theft, phishing, and getting people to give up their passport information and security codes that happen in organized ways,” said Jeremy Caplan, director of teaching and learning at the CUNY Journalism School. “We don’t see enough coverage. How do these scams work? What enables them? What gangs are behind them? What does the data tell us about the threat trends?”

He added: “If I send you an email to say: ‘Your teenage child deposited money into the wrong account at our bank — her birth date is X, and we want to make sure it goes into her proper account,’ that can be persuasive for a lot of people.”

8. Ignoring ‘Positive’ Outlier Data Stories

It is hard to think of traditional investigative stories that expose a positive outcome. By contrast, data journalism can readily reveal otherwise unknown, or unanticipated, positive policy outcomes — and these can have both a needed trust impact for audiences, and an accountability impact for opponents of those policies.

News outlets rarely reflect dramatic improvements in big metrics, such as the recent sharp decreases in extreme global poverty and in infant mortality, which has fallen 59% in the past three decades, or encouraging increases in, say, female representation in many parliaments in Africa.

CUNY’s Caplan said newsrooms should assess whether they exhibit negativity bias, and warned that audiences notice when the story balance between negative data and at least some positive data “is out of whack.”

He said there were good data lessons to learn from the book “Factfulness: Ten Reasons We’re Wrong About the World — and Why Things Are Better Than You Think,” by Swedish public health expert Hans Rosling, who challenged journalists to “convince yourself that things can be both better and bad.”

“Rosling did some great data experiments that showed positive trends over time, things like dramatically lower numbers of people living under a dollar a day,” Caplan recounted. “We have a tendency to think things are much worse than what the data shows them to be, and media plays a role. This is one of the main reasons people don’t want to consume news, because, while there is certainly a need for negative or critical news, audiences believe it’s too negative.”

When exposing disproportionate harm or discrimination by some institution in a major project, reporters often stumble upon a few positive outlier data points — some city or sector where harm has been prevented, or benefits achieved. These, Caplan said, should be explored in follow-up data stories — especially if reasons for the positive change were not anticipated by the policymakers.

9. Failure to Build Diverse Teams and Partnerships

“A significant blind spot is the lack of vision in many newsrooms to have interdisciplinary teams for data journalism,” said La Data Cuenta’s Fallas. “Many media outlets have a ‘data unit,’ which is essentially one person expected to do everything: gather data, analyze it, visualize it, report on it, write the story, and even make a viral TikTok. This doesn’t work. Data journalism projects should be comprehensive, involving reporters, data analysts, data visualizers, social media experts, and editors from the outset.”

Fallas said this also applies to external collaboration — including established partnerships with civil society organizations, academia, and experts, and media partnerships for regional projects with outlets that offer different skill sets and audiences from your own.

“Rather than choosing all partners for an investigation with the same profile, it’s essential to identify and select media partners based on the strengths needed for the investigation: data analysis, visualization, storytelling, podcast creation, graphic design,” she said.

10. Lack of Advanced Data Analysis and AI Skills in Many Newsrooms

“Spreadsheet management, whether in Excel or Google Sheets, is fundamental for data journalism,” noted Fallas. “However, many reporters stick to the basics of these tools: adding, subtracting, identifying who’s up or down. Analyses could go much further than these elementary functions. It’s crucial for more journalists to undergo training in programming languages like R and Python. Understanding how to apply machine learning algorithms or statistics, such as linear regression or cluster analysis, is fundamental for conducting more complex analyses with data. These tools allow us to better unravel patterns and find answers of greater public interest.”
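
For readers who have not yet moved beyond spreadsheet arithmetic, the sketch below shows what one of the techniques Fallas names, linear regression, can look like in plain Python. It fits a straight line by ordinary least squares and then ranks the records that sit furthest from that line, which are often the ones worth reporting on. The function names fit_line and largest_residuals and all the numbers are hypothetical, invented for illustration; a real project would more likely rely on an established statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def largest_residuals(labels, xs, ys, top=2):
    """Return the `top` records furthest from the fitted line.

    Each result is (label, residual), where residual is the actual y
    minus the y predicted by the line.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [
        (label, y - (slope * x + intercept))
        for label, x, y in zip(labels, xs, ys)
    ]
    return sorted(residuals, key=lambda r: abs(r[1]), reverse=True)[:top]


# Hypothetical districts: x could be enrollment, y could be spending.
districts = ["A", "B", "C", "D", "E"]
x_values = [0, 1, 2, 3, 4]
y_values = [1, 3, 15, 7, 9]
print(fit_line(x_values, y_values))
print(largest_residuals(districts, x_values, y_values, top=1))
```

A large residual only says that a record does not follow the overall pattern. Whether that reflects a data error, a policy difference, or a story still has to be established through reporting.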

Fallas said reporters need to learn generative AI tools “critically and cleverly” as assistants to solve some of the basic, time-consuming data tasks that might otherwise stall an investigation.

“For example, a few weeks ago in Puerto Rico, I conducted a workshop for colleagues where we explored both the benefits and limitations of using ChatGPT to address challenges in data journalism,” she recalled. “We discussed clear ways to create ‘prompts.’ We emphasized a reflective attitude, rather than simply ‘copying and pasting.’ Generative AI can serve as a valuable tool for learning programming languages that allow you to delve deeper.”


Rowan Philp is GIJN’s senior reporter. He was formerly chief reporter for South Africa’s Sunday Times. As a foreign correspondent, he has reported on news, politics, corruption and conflict from more than two dozen countries around the world, and has also served as an assignments editor for newsrooms in the UK, US and Africa.
