
Abstract
MDIO is an open-source framework for storing and retrieving energy data in the cloud and on premises. It represents wind resource and seismic data as high-resolution, multi-dimensional arrays that serve multiple applications. Because it is built on the Zarr format, MDIO provides chunked storage and parallel I/O, so the same data can be accessed efficiently across diverse computing infrastructures. This paper covers MDIO's application in renewable energy (wind simulations), predictive analytics, and seismic imaging and interpretation, with the aim of providing a robust technical platform for researchers navigating the evolving energy landscape.

Introduction
The energy sector is deeply anchored in data, which guides exploration, production, and asset management strategies. Numerous data formats have emerged to support diverse scientific and business needs. NetCDF4 (Unidata, 2021) and HDF5 (The HDF Group, 1997–2023) are prominent in the atmospheric sciences and climate modelling. SEG-Y is the prevailing format for seismic data storage in exploration geophysics, while academic groups have adopted formats such as SEP, SU, and Madagascar for seismic research. Commercial ventures have also developed, and later open-sourced, formats such as the Data Dictionary System (DDS, freedds.org), OpenVDS (Barre et al., 2022), and OpenZGY (OSDU, 2023). We have tried to cover the major binary formats for large datasets, but we acknowledge possible omissions and welcome readers' suggestions on overlooked formats for subsequent revisions.

This article details the MDIO format and specification and reviews five energy data applications: wind resource assessment, seismic data management, data processing, machine learning, and seismic data visualisation.

MDIO format
MDIO (Sansal et al., 2023) is a file format that simplifies access to seismic and wind data by storing it as Zarr (Miles et al., 2023) arrays with JSON metadata. Because data and metadata live in the same file structure, no separate database is needed. MDIO makes data easier to share and analyse, and it supports both lossless and lossy compression. It can also map irregular data onto a regularised hypercube without taking up extra storage space. For SEG-Y input and output, MDIO integrates the Segyio library (Equinor, 2023) to parse SEG-Y files and to write their textual and binary headers.
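The claim that irregular data can sit in a regular hypercube at little storage cost follows from how chunked stores work: a Zarr reader treats a missing chunk as filled with the array's fill value, so only chunks that hold live traces need to exist. The toy sketch below illustrates that idea with a plain dict standing in for a directory or bucket. The helper names (`chunk_keys_for_trace`, `build_store`, `total_chunks`) are hypothetical, the `.zarray`/`.zattrs` keys and `"i.j.k"` chunk names follow Zarr version 2 conventions, and MDIO's own layout may differ. Chunk values are lists of trace positions standing in for encoded chunk bytes.

```python
import json
import math


def chunk_keys_for_trace(il, xl, shape, chunks):
    """Zarr-v2-style keys ("i.j.k") of every chunk that one trace passes through."""
    n_sample_chunks = math.ceil(shape[2] / chunks[2])
    return [f"{il // chunks[0]}.{xl // chunks[1]}.{k}" for k in range(n_sample_chunks)]


def build_store(shape, chunks, live_traces, fill_value=0.0):
    """Map live (inline, crossline) traces onto a regular 3D grid.

    Returns a dict standing in for a key-value store. Array metadata is
    kept as JSON next to the chunks, so no separate database is needed.
    Chunks that contain no live trace are never created; a reader would
    return the fill value for them.
    """
    store = {
        ".zarray": json.dumps({"shape": list(shape), "chunks": list(chunks),
                               "fill_value": fill_value}),
        ".zattrs": json.dumps({"live_trace_count": len(live_traces)}),
    }
    for il, xl in live_traces:
        for key in chunk_keys_for_trace(il, xl, shape, chunks):
            # Placeholder for encoded bytes: record which traces the chunk holds.
            store.setdefault(key, []).append((il, xl))
    return store


def total_chunks(shape, chunks):
    """Chunk count of the fully populated hypercube, for comparison."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
```

For a hypothetical 128 × 128 × 256 grid in 64³ chunks, a survey that only covers the first 64 × 64 corner plus one outlying trace at (100, 10) materialises 8 of the 16 possible chunks; the other half of the hypercube costs nothing on disk.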

MDIO also provides a standard API to read, write, and perform tensor operations on data regardless of where it is stored, for example on network file storage or in cloud object stores. The library includes converters for popular file formats such as SEG-Y and NetCDF4. Seismic and wind projects often share similar data patterns, including multiple datasets on the same grid, common coordinate reference system information, and time or depth series of spatial data. Although workflows vary from project to project, MDIO can simplify data management and analysis.
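A storage-agnostic API usually rests on a small key-value interface with interchangeable backends chosen from the URI. The sketch below shows that pattern under stated assumptions: `LocalStore`, `MemoryStore`, `open_store`, `copy_keys`, and the `memory://` scheme are all hypothetical, with the in-memory class standing in for a cloud bucket. This is not MDIO's API; in the Python ecosystem, libraries such as fsspec play this role for real object stores.

```python
import os
import tempfile
from urllib.parse import urlparse


class MemoryStore:
    """Stand-in for a cloud object store: a flat mapping of keys to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


class LocalStore:
    """Local or network-mounted file system: keys become relative paths."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        return os.path.join(self.root, *key.split("/"))

    def put(self, key, data):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


_BUCKETS = {}  # hypothetical in-memory "buckets", keyed by bucket name


def open_store(uri):
    """Pick a backend from the URI scheme; callers never see which one."""
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        return LocalStore(parsed.path)
    if parsed.scheme == "memory":
        return _BUCKETS.setdefault(parsed.netloc, MemoryStore())
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


def copy_keys(src, dst, keys):
    """Same code path whether src and dst are files, shares, or buckets."""
    for key in keys:
        dst.put(key, src.get(key))


if __name__ == "__main__":
    local = open_store(tempfile.mkdtemp())
    local.put("seismic/.zarray", b'{"shape": [2, 2]}')
    bucket = open_store("memory://survey-bucket")
    copy_keys(local, bucket, ["seismic/.zarray"])
    print(bucket.get("seismic/.zarray"))
```

Keeping the interface to `get` and `put` on string keys is what lets chunked array code stay unchanged when data moves between a workstation, a shared file system, and object storage.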

MDIO specification
This section gives a brief overview of the specification for seismic data stored as MDIO. The MDIO format was developed as an extension of the Zarr specification. Figure 1 illustrates the current specification (MDIO 0.4.2), which includes the following key features:

1. Global attributes: information about how the dataset was created, details of the geometry, and global sample statistics, namely the minimum, maximum, root-mean-square (RMS), mean, and standard deviation.
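The specification lists these global statistics but not how they are computed. For a volume too large to hold in memory, one plausible approach is to reduce each chunk to a few partial sums, which can run in parallel, and then merge them. The sketch below assumes that approach and the population (not sample) standard deviation; the function names are hypothetical and the code is not taken from MDIO.

```python
import math


def chunk_partial(samples):
    """Partial statistics for one non-empty chunk of samples."""
    return {
        "count": len(samples),
        "sum": sum(samples),
        "sum_sq": sum(s * s for s in samples),
        "min": min(samples),
        "max": max(samples),
    }


def merge_partials(partials):
    """Combine per-chunk partials into dataset-wide global statistics."""
    count = sum(p["count"] for p in partials)
    total = sum(p["sum"] for p in partials)
    total_sq = sum(p["sum_sq"] for p in partials)
    mean = total / count
    mean_sq = total_sq / count
    # Population variance E[x^2] - E[x]^2, clamped against tiny negative round-off.
    variance = max(mean_sq - mean * mean, 0.0)
    return {
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
        "mean": mean,
        "rms": math.sqrt(mean_sq),
        "std": math.sqrt(variance),
    }
```

The sum-of-squares formula is simple but can lose precision when the mean is large relative to the spread of the samples; a production implementation might prefer Welford's algorithm with Chan's parallel merge.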

Integrating energy datasets: the MDIO format
Figure 1 MDIO seismic data specification, v0.4.2, showing various access patterns for 3D seismic datasets.
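Why chunk shape matters for access patterns can be sketched with simple arithmetic: reading a full inline, crossline, or time/depth slice fetches every chunk that the slice intersects. The helper names, the 4-byte sample size, and the example chunk shapes below are illustrative assumptions, not values from the MDIO specification.

```python
import math


def chunks_touched(shape, chunks, axis):
    """Chunks read to extract one full 2D slice perpendicular to `axis`.

    Fixing one index on `axis` intersects exactly one chunk along that
    axis and every chunk along the two remaining axes.
    """
    counts = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    counts[axis] = 1
    return math.prod(counts)


def bytes_read(shape, chunks, axis, itemsize=4):
    """Uncompressed bytes fetched for that slice, since whole chunks are read.

    Assumes full-size chunks, so partial edge chunks are over-counted.
    """
    return chunks_touched(shape, chunks, axis) * math.prod(chunks) * itemsize
```

For a hypothetical 512 × 512 × 1024 (inline, crossline, sample) survey, cubic 64-sample chunks need 128 chunk reads for an inline and 64 for a time slice. Inline-shaped chunks of (1, 512, 1024) reverse the trade-off: one chunk per inline but 512 per time slice. This tension is one reason a store may keep more than one chunking of the same volume.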
