Generating synthetic off-grid observations for multimodal sea-ice forecasting models
Background
Around 2025, several research efforts were investigating machine-learning-based approaches to environmental forecasting that could complement or replace parts of traditional physics-based numerical climate and weather models. Systems such as Aardvark Weather and IceNet demonstrated that forecasting models could be trained directly from observational data rather than relying on complex and computationally expensive physics-based models. A project at the Alan Turing Institute aimed to develop a multimodal, fully machine-learning-based forecasting model for sea ice, inspired by the architectural ideas behind Aardvark while building on the progress already achieved by IceNet for sea-ice concentration prediction.
Challenge
One of the goals of the new pipeline was the ability to ingest heterogeneous data sources, including not only gridded climate fields but also irregular off-grid observations such as stations or drifting buoys, which introduce additional spatial and temporal complexity. Rather than immediately integrating heterogeneous observational datasets, the team first explored whether the modelling pipeline could ingest such data structures at all.
Contribution
I worked on this project in its initial phases, as part of a one-year research software engineering placement with the Research Engineering Group at the Alan Turing Institute. One of my contributions was to generate synthetic off-grid observations from gridded climate anomaly fields, simulating station and drifting-buoy measurements. This allowed the team to develop and test a pipeline able to ingest irregular observational data before integrating heterogeneous real-world sources, which would involve additional data engineering challenges.
Data
The first set of synthetic off-grid data was generated from gridded NetCDF files containing mean sea level pressure anomalies. This variable was chosen because recent ablation studies in IceNet had identified it as a useful predictor for sea-ice concentration, making it a sensible variable with which to test the pipeline on synthetic off-grid data.
Method
Two scenarios were simulated:
- Fixed stations
A set of random latitude-longitude coordinates was sampled within the spatial domain of the original grid. For each timestamp, the value of the anomaly field at each synthetic station location was obtained by linearly interpolating from the surrounding grid cells. Repeating this operation across all timestamps produced a non-gridded time series for each station.
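The fixed-station procedure can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: it assumes a regular latitude-longitude grid with ascending coordinates, ignores longitude wrap-around, and all names (`bilinear`, `sample_stations`, `station_time_series`) are hypothetical. In practice a library routine such as xarray's `DataArray.interp` would likely do the interpolation.

```python
import bisect
import random


def bilinear(lats, lons, field, lat, lon):
    """Linearly interpolate field[i][j], defined on ascending lats x lons, at (lat, lon)."""
    # Index of the lower corner of the enclosing grid cell, clamped so that
    # i + 1 and j + 1 are always valid indices.
    i = min(max(bisect.bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect.bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    # Fractional position of the point inside the cell along each axis.
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on both cell edges, then along latitude.
    lower = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    upper = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return lower * (1 - ty) + upper * ty


def sample_stations(lats, lons, n_stations, seed=0):
    """Draw random (lat, lon) station locations inside the grid's extent."""
    rng = random.Random(seed)
    return [
        (rng.uniform(lats[0], lats[-1]), rng.uniform(lons[0], lons[-1]))
        for _ in range(n_stations)
    ]


def station_time_series(lats, lons, fields, stations):
    """fields holds one 2D grid per timestamp; returns one time series per station."""
    return [
        [bilinear(lats, lons, field, lat, lon) for field in fields]
        for lat, lon in stations
    ]
```

Calling `station_time_series(lats, lons, fields, sample_stations(lats, lons, 50))` would return 50 lists, each holding one interpolated value per timestamp. The station locations are drawn once and reused for every timestamp, which is what makes them "fixed".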
Example of synthetic fixed stations sampling mean sea level pressure anomalies.
This visualisation was used only as a sanity check that the transformed data had the expected spatial distribution.
- Drifting buoys
To simulate moving sensors, each buoy was assigned an initial random location within the spatial domain and an initial random direction of travel. At each timestamp, the buoy's position was updated by combining:
- a fixed movement step in the current direction,
- a small random change in heading, and
- an additional random spatial perturbation (jitter).
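The three components of the update can be sketched as a simple random walk with a persistent heading. This is an illustration under stated assumptions, not the project's actual implementation: the function name `simulate_buoy`, the parameter values, the Gaussian noise, the step expressed in degrees, and the clamping of positions to the domain are all hypothetical choices, since the description above does not specify them.

```python
import math
import random


def simulate_buoy(n_steps, bounds, step=0.5, turn_sd=0.2, jitter_sd=0.05, seed=0):
    """Return a list of (lat, lon) positions, one per timestamp.

    bounds is (lat_min, lat_max, lon_min, lon_max). Distances are in degrees,
    which ignores the convergence of meridians at high latitudes (assumption).
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    rng = random.Random(seed)
    # Initial random location and initial random direction of travel.
    lat = rng.uniform(lat_min, lat_max)
    lon = rng.uniform(lon_min, lon_max)
    heading = rng.uniform(0.0, 2.0 * math.pi)
    track = [(lat, lon)]
    for _ in range(n_steps - 1):
        # Fixed movement step in the current direction, plus spatial jitter.
        lat += step * math.sin(heading) + rng.gauss(0.0, jitter_sd)
        lon += step * math.cos(heading) + rng.gauss(0.0, jitter_sd)
        # Keep the buoy inside the domain (assumption: simple clamping).
        lat = min(max(lat, lat_min), lat_max)
        lon = min(max(lon, lon_min), lon_max)
        # Small random change in heading, applied to the next step.
        heading += rng.gauss(0.0, turn_sd)
        track.append((lat, lon))
    return track
```

For example, `simulate_buoy(365, (60.0, 85.0, -180.0, 180.0), seed=42)` would produce one position per timestamp for a year of daily data. The anomaly field would then be sampled at each position with the same kind of linear interpolation used for the fixed stations. Keeping the heading between steps and perturbing it only slightly is what gives smooth, drift-like tracks rather than uncorrelated jumps.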
Example of synthetic drifting buoys sampling the same anomaly field.
Again, the visualisation served only to check that the simulated trajectories and sampled values behaved as intended.
Credits
The generation of synthetic off-grid observations from gridded climate data presented on this page was developed by me as part of my work during a research software engineering placement with the Research Engineering Group at the Alan Turing Institute. This work took place within a broader collaborative effort involving several researchers, research software engineers and data scientists working on different aspects of an AI-based sea-ice forecasting pipeline as part of the Turing's Environment & Sustainability research area.



