From Space to Story in Data Journalism

Almost 25 years ago, The Washington Post reported on the first picture delivered by the brand-new Ikonos satellite. It was the first commercial imaging satellite capable of acquiring data that rivaled the resolution of spy satellites. Curiously, the reporter, D’Vera Cohn, failed to mention that one of the industries that would be changed by satellite imagery would be the news business itself.

The launch of Ikonos was one of a handful of developments that allowed newsrooms to expand from reporting on rocket launches and satellite hardware, to using remote sensing data as an essential tool to help tell stories. A wide variety of satellite data are now used to provide context to the news, to document events, and as a tool for investigation.

A handful of factors combined with the advent of commercial high-resolution data to help make remote sensing a resource for journalists. In the early 2000s, data from government research satellites became widely available at no cost. This trend culminated in 2008 when the entire archive of Landsat data, which once cost thousands of dollars per image, was released for free.

At the same time, advances in computers enabled rapid processing and storage of large datasets. Google’s Earth Engine and other cloud computing services allow types of analysis, especially of time series, that once required supercomputers. Finally, an ecosystem of free and open source software has evolved to supplement the boutique commercial applications that were once required to read the (sometimes esoteric) formats used to store and distribute remote sensing data.

Instead of reporting on a scientist’s research or claims made by an intelligence agency, reporters could now tell their own stories with this data.

Images for Investigations

To me, the most exciting use of satellite imagery in journalism is for investigative reporting – data as a research tool, used to make discoveries and draw inferences. One early and innovative example came from Reveal News in the story Who is the Wet Prince of Bel Air? Here are the Likely Culprits.

The reporters – Michael Corey and Lance Williams – used a combination of techniques to identify the largest residential users of water in Los Angeles during the California drought of the mid 2010s. (State water agencies released a list of their largest water users, but could not share names or addresses.)

This map shows the largest residential consumers of water in Los Angeles – identified through the analysis of data from NAIP and Landsat. The map itself, however, is an example of the use of imagery as context. Image from the story: Who is the Wet Prince of Bel Air? Here are the Likely Culprits, by Michael Corey and Lance Williams for Reveal News. Image: Screenshot, map data from Google, City of Los Angeles, Los Angeles County, and the US Census Bureau

A measure of vegetation health called the Normalized Difference Vegetation Index (NDVI) helped identify properties in Los Angeles with large expanses of lush greenery. The vegetation measurements were derived from National Agriculture Imagery Program (NAIP) data, a free source of high-resolution aerial imagery refreshed every few years. This was combined with estimates of soil moisture from Landsat data, which is lower resolution than NAIP but provides information in additional wavelengths. The combined datasets gave more reliable estimates of water use than either technique used alone. To reduce the uncertainty further, Reveal even looked at the proportion of grass, trees, and shrubs on each property.
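For readers unfamiliar with NDVI, it is computed per pixel as (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red reflectance values. The sketch below is only an illustration of that formula, not Reveal's actual workflow: the reflectance values are invented, and the 0.5 cutoff for "lush" vegetation is an arbitrary assumption.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid dividing by zero on empty (no-data) pixels
    return (nir - red) / total


def lush_fraction(nir_band, red_band, threshold=0.5):
    """Return the share of pixels whose NDVI exceeds the threshold.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    values = [
        ndvi(n, r)
        for nir_row, red_row in zip(nir_band, red_band)
        for n, r in zip(nir_row, red_row)
    ]
    return sum(v > threshold for v in values) / len(values)


# Invented reflectance values for a tiny 2 x 3 pixel parcel.
nir_band = [[0.50, 0.45, 0.30], [0.48, 0.20, 0.25]]
red_band = [[0.05, 0.06, 0.20], [0.08, 0.18, 0.22]]

print(round(ndvi(0.50, 0.05), 3))         # healthy grass: NDVI near 0.8
print(lush_fraction(nir_band, red_band))  # share of the parcel that looks lush
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, so irrigated lawns score close to 1 while bare soil and pavement score near 0.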

The result? A list of locations, each annually consuming millions of gallons of water, and an investigation by the Los Angeles City Council. It is an example of investigative journalism having an impact on the actions of local government.

A primary use of satellite data is to monitor inaccessible locations. A great example of this is Bellingcat’s efforts to track illicit shipping of grain from the port of Sevastopol in occupied Crimea, Ukraine. In addition to the port being in a war zone, ships docking in Sevastopol often turn off their Automatic Identification System (AIS) transponder – effectively hiding their location. By obscuring their movements, ships can evade sanctions on exports from Sevastopol and transport stolen grain.

As with Reveal, the Bellingcat team combined multiple types of data to track hidden activity for the story Grain Trail: Tracking Russia’s Ghost Ships with Satellite Imagery. They used commercial high- and very-high resolution optical PlanetScope and SkySat imagery from Planet, plus open access medium-resolution Synthetic Aperture Radar (SAR) data from Sentinel-1. The Planet imagery revealed a ship docked at the Avlita grain terminal on more than 100 days in the year following the Russian invasion of Ukraine, despite incomplete coverage and frequent cloudiness.

Sentinel-1 SAR data analyzed with the Ship Detection Tool (a machine learning algorithm run on Google Earth Engine) determined there was a ship present at the terminal on more than two dozen additional days. SAR can penetrate clouds, but is not available as frequently as Planet’s optical imagery, so even the combined dataset is likely an undercount.
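The logic of combining the two sources can be sketched as a set union: a day counts if either sensor saw a ship, and the "additional" days are those that only the radar caught. The dates below are invented placeholders, not Bellingcat's data, and the function name is hypothetical.

```python
from datetime import date


def days_with_ship(*sources):
    """Return every date on which at least one source detected a ship."""
    combined = set()
    for detections in sources:
        combined |= set(detections)
    return combined


# Invented detection dates, for illustration only.
optical_days = {date(2022, 6, 1), date(2022, 6, 2), date(2022, 6, 5)}
sar_days = {date(2022, 6, 2), date(2022, 6, 3)}

combined = days_with_ship(optical_days, sar_days)
sar_only = sar_days - optical_days  # days the optical imagery missed

print(len(combined), len(sar_only))
```

Days on which neither sensor had a usable pass never enter either set, which is why the combined total is a lower bound rather than a full count.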

Bellingcat reporters augmented the satellite data with photographs of the Avlita grain terminal in Sevastopol, and the Bosphorus Strait that links the Black Sea with the Mediterranean. This “ground truth” information helped the researchers identify and track the individual ships spotted in Crimea. The combined datasets reveal the larger scope of illegal grain shipments in a way that is more comprehensive than any of the techniques alone.

Grid of images showing ships docked next to the Avlita grain terminal in Sevastopol, a port city in Russian-occupied Crimea, Ukraine. Image: Screenshot, Bellingcat

Like Bellingcat, The New York Times used a mix of ground-based evidence, satellite data, and machine learning to monitor illicit activity. But instead of monitoring the motions of ships through time, the Times staff mapped unregistered airstrips across the Brazilian Amazon. They then analyzed additional satellite data to document illegal mining that occurred near the airstrips, and tracked aircraft delivering supplies.

Flight paths are indicated on a high-res satellite image showing individual airstrips, complementing the overview provided by the map. Image, Screenshot from the story: The Illegal Airstrips Bringing Toxic Mining to Brazil’s Indigenous Land, by Manuela Andreoni, Blacki Migliozzi, Pablo Robles, and Denise Lu. New York Times. August 2, 2022

Another example of researchers using machine learning and satellite data to detect illegal activity is Myanmar’s Poisoned Mountains by Global Witness. Since they’re advocates and not journalists they don’t quite fit, but I think the story of the growth of illegal rare earth mines along Myanmar’s border with China is one worth reading.

One of the more creative uses of satellite data I’ve seen is an analysis of the flight of the Chinese surveillance balloon that passed over Canada and the United States in early 2023. The story started with a machine learning approach similar to those I’ve already described, which was used to locate the balloon over North America and then track it back to Hainan Island, China. But that left an outstanding question – was the path of the balloon driven solely by wind currents? Or was it being actively guided? With no known source of propulsion, the only way to steer the balloon would be to adjust its altitude until it was carried along by favorable winds.

Tracking the Chinese Balloon From Space, by Muyi Xiao, Ishaan Jhaveri, Eleanor Lutz, Christoph Koettl, and Julian E. Barnes. Image: Screenshot, The New York Times. March 20, 2023

The Times’s Visual Investigations team took advantage of a quirk present in most satellite imagery – each color is collected at a slightly different time – to determine the balloon’s altitude. (You may have noticed rainbow planes while browsing Google Earth or a similar satellite-driven map. The phenomenon is similar, except there’s additional spacing between each color due to an aircraft’s high speed.) Essentially, by knowing the speed and altitude of the satellite, and the elapsed time between each picture, they could estimate the balloon’s altitude with trigonometry. They concluded the balloon was, in fact, being guided – at least over some of its journey.
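The trigonometry can be sketched with similar triangles. This is a simplified illustration, not the Times's published method. It assumes the satellite looks straight down, the ground is flat, the offset between color bands is measured along the satellite's track, and the object itself does not move between exposures. If the satellite at altitude H travels a baseline B between two band exposures, an object at altitude h appears shifted on the ground by d = B * h / (H - h), which rearranges to h = d * H / (B + d). All numbers below are hypothetical.

```python
def baseline_m(satellite_speed_m_s, band_delay_s):
    """Distance the satellite travels between two band exposures."""
    return satellite_speed_m_s * band_delay_s


def expected_offset_m(satellite_altitude_m, baseline, object_altitude):
    """Apparent ground shift between bands: d = B * h / (H - h)."""
    return baseline * object_altitude / (satellite_altitude_m - object_altitude)


def object_altitude_m(satellite_altitude_m, baseline, band_offset_m):
    """Invert the relation above: h = d * H / (B + d)."""
    return band_offset_m * satellite_altitude_m / (baseline + band_offset_m)


# Hypothetical values: 500 km orbit, 7.5 km/s, 0.2 s between bands.
H = 500_000.0
B = baseline_m(7_500.0, 0.2)

offset = expected_offset_m(H, B, 18_000.0)  # shift for an object at 18 km
recovered = object_altitude_m(H, B, offset)

print(round(offset, 1), round(recovered, 1))
```

A drifting balloon adds its own motion to the measured offset, so a real analysis would need to separate the two effects. The sketch ignores that.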

ProPublica is well known for their deep dives into American politics, but they also report on a wide range of environmental issues, often with the help of remote sensing data. Their series on locations at risk for future Ebola outbreaks combined investigative reporting with original scientific research. The articles uncovered how the fragmentation of forests around networks of villages and towns in Equatorial Africa correlated with known outbreaks of Ebola, and identified places where the disease may next spill over from wildlife to humans.

Map of recent deforestation and forest edges in Nigeria. These two characteristics were associated with many of the known Ebola outbreaks. Image, Screenshot from the story: How We Used Machine Learning to Investigate Where Ebola May Strike, by Caroline Chen, Al Shaw, and Irena Hwang. ProPublica. August 8, 2023

The series combined satellite data – long-term records of changing global forest cover and settlement maps – with pattern-finding algorithms, calculations of forest fragmentation, cloud computing on Google Earth Engine, epidemiological models, consultation with scientists, and interviews with the people of Meliandou, Guinea, who survived the worst Ebola outbreak in history. Their conclusion is not just a warning for at-risk communities, but also a set of recommendations to reduce the likelihood of future outbreaks.
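One simple way to quantify forest fragmentation is to count how much of the forest sits on an edge. The sketch below is a generic illustration of that idea on a tiny made-up grid and should not be read as ProPublica's metric: a cell is 1 for forest and 0 for cleared land, and a forest cell counts as "edge" if any of its four direct neighbors is cleared.

```python
def forest_edge_cells(grid):
    """Count forest cells (1) that touch at least one cleared cell (0).

    Only the four direct neighbors are checked; cells beyond the grid
    boundary are ignored rather than treated as cleared.
    """
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    edges += 1
                    break  # count each forest cell at most once
    return edges


def edge_fraction(grid):
    """Share of forest cells that lie on an edge (0.0 if no forest)."""
    forest = sum(cell == 1 for row in grid for cell in row)
    return forest_edge_cells(grid) / forest if forest else 0.0


# Made-up 4 x 4 grids: intact forest versus forest with two clearings.
intact = [[1] * 4 for _ in range(4)]
fragmented = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

print(edge_fraction(intact), edge_fraction(fragmented))
```

Two small clearings remove only two cells of forest, yet half of the remaining forest becomes edge, which is why fragmentation can rise much faster than total forest loss.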

Most of my examples have shown reporting in far-flung locales (at least from my perspective in San Francisco’s tech industry), which is one of the primary strengths of satellite data. The data journalists at texty.org.ua, however, had to deal directly with tragedy and trauma when Russia invaded Ukraine in early 2022. They responded with some of the most detailed reporting on the impact of the war I’ve seen, despite working during blackouts and while sheltering from air raids.

Ukrainian reporters used the shape of shell craters in an irrigated field to determine the origin of the artillery fire. Image, Screenshot from the story: From Where and How is Russia Shelling Mykolaiv? by Denis Gubashov. Texty.org.ua. August 31, 2022

Texty used multiple types of data to cover the war – including high resolution commercial imagery, night lights data, and NASA fire locations. Combined, the datasets give civilians whose lives have been upended by the invasion a means to investigate and respond to the tragedy in Ukraine that has been forced upon them. The stories reflect their interests and priorities.

This series of maps shows cultivation of wheat (top), sunflower (middle), and maize (bottom) from 2021–23 in both government-controlled and Russian-occupied Ukraine. The data were derived by OneSoil from a combination of satellite and in-situ data. Image, Screenshot from the story: Harvest from the Occupied Territories, by Yevhenia Drozdova and Nadia Kelm. Texty.org.ua. November 22, 2023

Approaches for Using Satellite Data in the Newsroom

The use of satellite data is rapidly increasing in journalism, a trend fueled by growing availability, higher quality, and the development of more usable analysis tools. What does it take to successfully use this data to tell stories in a newsroom, and develop innovative reporting?

Teamwork: It’s difficult for a single reporter to have the wide range of skills necessary to fully exploit the potential of satellite data. Teams with expertise in a range of fields – investigative reporting, writing, design, programming, and data analysis – are conducting the most novel and impactful data journalism.

Data literacy: Satellite data comes in many forms, suitable for a wide variety of applications. Knowing what data is available, and the strengths and weaknesses of each type, is essential for using it effectively.

Outside experts: The field of remote sensing has thrived for over 50 years. In that time scientists and technicians in government, academia, and industry have developed techniques to derive insights from data. They’re an invaluable resource for both background information and innovative new ideas.

Local knowledge: Data collected from a few hundred miles above the Earth’s surface is often limited when used in isolation. It is far more reliable when combined with in-situ data, augmented by on-the-ground reporting, and (perhaps most importantly) informed by the perspective of the people who live in the areas being imaged.

Over the past 10 years satellite imagery has become an important component of data journalism. In the next 10, it will likely evolve further, from a tool used primarily to illustrate stories to one that is an integral part of research and investigative reporting. I’m excited to see how reporters develop innovative uses of existing datasets, and explore new types of data.

This story was originally published by Nightingale, the Journal of the Data Visualization Society. This is an extract from a larger piece, which is reprinted with permission. You can read the original here.


Robert Simmon is a designer and visualizer renowned for his work in cartography and science communication. With decades of experience at Planet and the NASA Earth Observatory, he transforms satellite data into captivating imagery. His work has appeared on the front page of The New York Times and the cover of National Geographic. 