Breaking the ice: Addressing data barriers in Polar research

08 January 2025


Dr Scott Hosking is leading the charge when it comes to using Artificial Intelligence (AI) to tackle some of the biggest environmental challenges of our time. As head of British Antarctic Survey’s AI Lab and co-director of the Turing Research and Innovation Cluster in Digital Twins at The Alan Turing Institute, the climate-scientist-turned-data-expert leads a multi-disciplinary team of scientists and engineers developing cutting-edge tools to help predict how our planet is changing.

A key area of Dr Hosking’s work is building ‘digital twins’ – highly detailed simulations of real-world environments that use AI and machine learning to create digital versions of the physical world for testing future scenarios and forecasting changes. Drawing on data from satellites, drones, radars, ocean floats, aircraft, robots, underwater vehicles and a range of other sources, his team is currently working on a digital twin of the Polar regions.

Dr Hosking is also leading a pioneering collaboration to address the impact of climate change in the Arctic region using an AI-based sea ice forecasting system. IceNet draws on satellite data and weather observations to predict sea ice concentrations across the Arctic with remarkable precision, providing daily forecasts up to six months ahead. It’s proving to be a vital tool, helping Indigenous Arctic communities and global policy organisations prepare for rapidly changing conditions, as well as helping to track and protect endangered wildlife including polar bears and Arctic foxes, and to reduce carbon emissions from shipping operations.

Working with WWF and Canadian partners, Dr Hosking’s team has developed forecasts that have been used to help government researchers plan large-scale polar bear surveys, and to develop novel migration early-warning systems for endangered caribou. By blending decades of real-world observations and climate simulations, IceNet is already outperforming traditional physics-based numerical models in potentially game-changing ways, according to Dr Hosking.

“AI is different in that you show it so much information that it starts to forecast forward, based on data rather than physical understanding,” he says. “For a long time, physicists were sceptical that a data-driven approach would be better because you’re removing physics. I’m a physicist by training, but we’ve been surprised that AI can do a better job. It’s taking the weather community by storm and suddenly old traditional weather models are just being outpaced now.”
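To give a flavour of what ‘forecasting from data rather than physics’ means in practice, here is a deliberately simple sketch in Python. It is not IceNet’s actual code (IceNet is a deep-learning system released as open source); the data here is synthetic and the model is a basic regression, but the pattern – learn from past grids of sea ice concentration, then predict a future grid – is the same idea in miniature.

```python
# Toy sketch only: a data-driven "forecast" trained on synthetic sea ice maps.
# IceNet itself uses deep learning on real satellite and climate data; the
# shapes, data and model below are invented purely for illustration.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Fake monthly sea-ice concentration maps: (months, lat, lon), values in [0, 1].
n_months, ny, nx = 120, 16, 16
sic = rng.uniform(0.0, 1.0, size=(n_months, ny, nx))

lag, lead = 12, 6  # learn from the last 12 months, predict 6 months ahead

# Build (history, target) training pairs, flattening each map into a vector.
X = np.stack([sic[t - lag:t].reshape(-1) for t in range(lag, n_months - lead)])
y = np.stack([sic[t + lead].reshape(-1) for t in range(lag, n_months - lead)])

model = Ridge(alpha=1.0).fit(X, y)

# Forecast 6 months ahead from the most recent year of "observations".
latest = sic[-lag:].reshape(1, -1)
forecast = model.predict(latest).reshape(ny, nx).clip(0.0, 1.0)
print("Forecast grid shape:", forecast.shape)
```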

Drilling down into the data

The datasets that Dr Hosking and his team draw on for their work are all open access, ranging from in situ British Antarctic Survey sensors to satellite data from Copernicus – the EU space programme’s observation component – as well as satellite systems managed by the European Space Agency and the European Centre for Medium-Range Weather Forecasts (ECMWF). As a Natural Environment Research Council (NERC) organisation that’s part of UK Research and Innovation, British Antarctic Survey follows strict data policies, frameworks and protocols which aim to preserve and manage all environmental data of long-term value. The data outputs underpinning all Dr Hosking’s research are published and shared via British Antarctic Survey’s Polar Data Centre. The aim for all projects, Dr Hosking says, is to be as open and transparent as possible when it comes to sharing data that has been gathered from a myriad of different sources.
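As an illustration of how open much of this data is, the snippet below shows one common way of pulling reanalysis data from the Copernicus Climate Data Store using its cdsapi Python client. This is a generic example rather than anything specific to Dr Hosking’s pipelines: it assumes a free CDS account, and the dataset name and request fields shown are just one plausible selection from the catalogue.

```python
# Illustrative sketch: requesting open ERA5 reanalysis data from the Copernicus
# Climate Data Store with the cdsapi client. Requires a free CDS account and an
# API key in ~/.cdsapirc. The dataset name and request fields are examples;
# check the CDS catalogue for the exact variables and dates you need.
import cdsapi

client = cdsapi.Client()
client.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": ["sea_ice_cover", "2m_temperature"],
        "year": "2024",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "netcdf",
    },
    "era5_sample.nc",  # file the data is downloaded to
)
```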

“In a way, we are flooded with data,” says Dr Hosking. “We can access almost all the data we need, but it’s just in different places around the world in different formats, updated at different times and often fragmented and imperfect.”

And herein lies one of the biggest challenges facing Dr Hosking and his colleagues, who spend much of their time working out how to combine this fragmented data in a meaningful way. The challenge is exacerbated by technical hurdles, with data held in different formats and on different servers. This means that data scientists are spending more than half of their time accessing, downloading and cleaning data before they can even start their research, Dr Hosking says.
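The ‘glue’ work described here often looks something like the sketch below: opening many files as one dataset, renaming mismatched coordinates and putting everything on a common grid before any science can start. The file paths and variable names are invented for illustration; a real workflow would depend entirely on the sources involved.

```python
# Illustrative sketch of combining fragmented climate data with xarray.
# Requires xarray plus dask and netCDF4 installed. Paths, coordinate names
# and variables below are placeholders, not real datasets.
import xarray as xr

# Open many NetCDF files as one lazy, chunked dataset.
obs = xr.open_mfdataset("observations/*.nc", combine="by_coords", chunks={})

# Harmonise naming so data from different providers lines up.
obs = obs.rename({"latitude": "lat", "longitude": "lon"})

reanalysis = xr.open_dataset("reanalysis/era5_sample.nc")

# Put both sources on the same grid and time step before analysis.
reanalysis_on_obs_grid = reanalysis.interp(lat=obs.lat, lon=obs.lon)
merged = xr.merge([obs, reanalysis_on_obs_grid])
monthly = merged.resample(time="1MS").mean()
monthly.to_netcdf("analysis_ready.nc")
```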

“You’d think that, with the millions we spend on satellites, we would have good tools to access the data, but we just don’t,” says Dr Hosking. “We’ve got more data than we have the ability to process. We work with terabytes of data at a time. We have access to petabytes of data but we have no means to ingest all that data.”

The biggest data barrier for cutting-edge environmental research like his, Dr Hosking says, is building the digital infrastructure and the scaffolding that connects everything together.

Removing barriers

To address these obstacles, Dr Hosking’s team at British Antarctic Survey and The Alan Turing Institute includes Research Software Engineers (RSEs). These positions have been set up to build the underpinning digital tools and infrastructure needed to support open science. However, software development and data management are often treated as an afterthought in research budgets and are not allocated the time or funding they need, according to Dr Hosking.

Another related challenge is that scientists often work in silos, spending weeks developing software to download and clean data that is then discarded once a paper is published – leading to wasted resources and duplicated efforts.

“To do groundbreaking, real-world science, we cannot sit in our silos,” says Dr Hosking, whose team shares open-source software code for their projects via the platform GitHub.

“We need to be far more joined up,” he says. “For me, the biggest barrier is building sustainable research quality software and building in the legacy so that teams can work together and they don’t have to reinvent the wheel again and again.”

The government could help to address these barriers, Dr Hosking says, by ringfencing money in future research contracts to make sure data management is prioritised. Making funding available for Research Software Engineers to build the digital infrastructure that sits on top of the data and to provide the tools for scientists to do the research, for example, would be a huge step forward.

“For me, it’s less about where the data sits because climate data is very open and available,” says Dr Hosking. “It’s all about the digital infrastructure and the software. If I walk into a normal library with books, everything’s there in front of me ready to read. I want to do the same thing through my web interface or even through my computer terminal. I want to be able to log in and have access to everything through simple lines of code.”
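That ‘library through a few lines of code’ experience already exists in pockets of the climate community, typically as cloud-hosted, analysis-ready archives that can be opened lazily from a URL. The sketch below shows the general pattern with xarray and Zarr; the URL and variable name are placeholders rather than a real catalogue entry.

```python
# Illustrative sketch: opening a cloud-hosted, analysis-ready archive straight
# from a catalogue URL. Requires xarray, zarr and fsspec (with aiohttp for
# HTTP stores). The URL and variable name below are hypothetical placeholders.
import xarray as xr

store_url = "https://example.org/catalogue/arctic-sea-ice.zarr"  # placeholder

# Lazily open the whole archive; nothing is downloaded until it is needed.
ds = xr.open_zarr(store_url, consolidated=True)

# Subset and compute only the slice of interest.
summer_ice = ds["sea_ice_concentration"].sel(time=slice("2020-06", "2020-09"))
print(summer_ice.mean(dim=("lat", "lon")).load())
```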

Written by Vicky Anning

Read the Access to Data Report