
MDIO: Open-Source Format for Multidimensional Energy Data

First published in The Leading Edge, July 2023, by Altay Sansal, Sribharath Kainkaryam, Ben Lasscock, and Alejandro Valenciano, TGS

Abstract

MDIO is a fully open-source data storage format that enables computational workflows for a variety of high-dimensional energy data sets, including seismic data and wind models. Designed to be efficient and flexible, MDIO provides software infrastructure that interoperates with existing energy data standards. It builds on the open-source Zarr format, so the same data can be used on cloud object stores and on on-premises file systems. We give an overview of the MDIO data model and schema and demonstrate an open-source Python library developed to work with MDIO data. We explain how MDIO supports different computational workflows and discuss applications for data management, seismic imaging, machine learning, wind resource assessment, and real-time seismic visualization. Overall, MDIO gives researchers, practitioners, and developers in the energy sector a standardized, open approach to managing and sharing multidimensional energy data.
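Zarr keeps each array as a small JSON metadata document plus one compressed object per chunk, all addressed by keys in a key-value store. That layout is what lets the same data sit in a local directory or a cloud bucket. The sketch below is a toy, standard-library-only imitation of that idea, assuming Zarr v2-style chunk keys (`"seismic/2.1.1"`), zlib compression, and a plain `dict` standing in for the store. It is not the Zarr or MDIO API; `write_array`, `read_value`, and `chunk_key` are hypothetical names, and the metadata is heavily simplified.

```python
import json
import math
import sys
import zlib
from array import array
from itertools import product


def chunk_key(name, idx):
    # Zarr v2-style key: chunk grid indices joined by "." under the array prefix
    return f"{name}/" + ".".join(str(i) for i in idx)


def write_array(store, name, get, shape, chunks):
    """Write an N-D float32 array into `store` as zlib-compressed chunks.

    `get(*index)` returns the sample at a global index. Edge chunks are
    zero-padded to the full chunk shape, mirroring Zarr v2 behavior.
    """
    endian = "<" if sys.byteorder == "little" else ">"
    # Simplified metadata document (a real .zarray has more fields)
    meta = {"shape": list(shape), "chunks": list(chunks), "dtype": endian + "f4"}
    store[f"{name}/.zarray"] = json.dumps(meta).encode()
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    for idx in product(*(range(g) for g in grid)):
        buf = array("f")  # samples of this chunk in C order
        for local in product(*(range(c) for c in chunks)):
            gidx = [i * c + j for i, c, j in zip(idx, chunks, local)]
            inside = all(g < s for g, s in zip(gidx, shape))
            buf.append(get(*gidx) if inside else 0.0)
        store[chunk_key(name, idx)] = zlib.compress(buf.tobytes())


def read_value(store, name, *index):
    """Fetch one sample by reading and decompressing only the chunk that holds it."""
    meta = json.loads(store[f"{name}/.zarray"])
    chunks = meta["chunks"]
    idx = [i // c for i, c in zip(index, chunks)]
    buf = array("f")
    buf.frombytes(zlib.decompress(store[chunk_key(name, idx)]))
    offset = 0
    for i, c in zip(index, chunks):
        offset = offset * c + i % c  # C-order offset within the chunk
    return buf[offset]


# Usage: a dict stands in for a directory or an object-store bucket
store = {}
write_array(store, "seismic", lambda i, j, k: i * 100 + j * 10 + k, (5, 4, 6), (2, 2, 4))
print(sum(not k.endswith(".zarray") for k in store))  # 12 chunks (3 x 2 x 2 grid)
print(read_value(store, "seismic", 4, 3, 5))  # 435.0
```

Because a reader only needs the metadata key and the keys of the chunks it touches, swapping the `dict` for any mapping backed by files or object storage leaves the access logic unchanged.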

Introduction

The energy industry relies heavily on data to make informed exploration, production, and asset management decisions. Different data formats facilitate scientific workflows depending on the business or application needs. SEG-Y (Society of Exploration Geophysicists, 2002) is a widely used data format for storing and sharing seismic data in the exploration geophysics industry. Academia introduced the SEP (Claerbout, 1991), SU (Seismic Unix), and Madagascar (Fomel, 2013) formats and associated utility software to facilitate research in exploration seismic processing. Other formats for commercial seismic data workloads were developed and later open sourced, such as the Data Dictionary System (DDS), OpenVDS (Barré et al., 2022), and OpenZGY (OSDU, 2023). NetCDF4 (Unidata, 2021) and HDF5 (The HDF Group, 1997–2023) are popular formats in atmospheric and oceanic sciences, geophysics, and climate modeling. While we have endeavored to include popular binary storage formats, we acknowledge potential omissions and invite readers to suggest any overlooked alternatives for future updates.

This article explains the data model and schema for MDIO, a modern, fully open-source energy data storage format. We review the input/output (I/O) access patterns for five applications: seismic data management, seismic data processing, machine learning, wind resource assessment, and seismic data visualization. We illustrate why the chunk size and data compression settings that MDIO requires are essential for performance, and we review how to set these parameters for both cloud object stores and local file systems. We highlight the performance considerations and potential storage-cost savings of data compression, with examples of default settings for seismic data applications. Because it is built on tools from the PyData ecosystem, MDIO integrates with existing libraries for processing and analyzing scientific data, so many algorithms do not need to be reimplemented.
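To see why chunk shape matters for I/O, it helps to count how many chunks a given read must fetch. The sketch below does that arithmetic for a 3D (inline, crossline, sample) cube under stated assumptions: a hypothetical 1000 × 1200 × 1500 float32 survey, two illustrative chunk shapes (cubic 64³ chunks and one-inline-per-chunk), and uncompressed sizes only. None of these numbers are MDIO defaults, and the helper names are hypothetical; compression would shrink the bytes transferred but not the number of chunk requests.

```python
import math


def chunks_touched(chunks, selection):
    """Number of chunks intersected by per-axis half-open (start, stop) ranges."""
    n = 1
    for (start, stop), c in zip(selection, chunks):
        n *= (stop - 1) // c - start // c + 1  # last chunk index - first + 1
    return n


def read_cost(chunks, selection, itemsize=4):
    """Return (chunks read, uncompressed bytes fetched, bytes actually requested)."""
    n = chunks_touched(chunks, selection)
    fetched = n * math.prod(chunks) * itemsize  # whole chunks must be read
    wanted = math.prod(stop - start for start, stop in selection) * itemsize
    return n, fetched, wanted


def access_patterns(shape):
    """Typical reads on an (inline, crossline, sample) seismic cube."""
    ni, nx, nt = shape
    return {
        "inline": [(0, 1), (0, nx), (0, nt)],
        "crossline": [(0, ni), (0, 1), (0, nt)],
        "time slice": [(0, ni), (0, nx), (0, 1)],
        "trace": [(0, 1), (0, 1), (0, nt)],
    }


shape = (1000, 1200, 1500)  # hypothetical survey size, float32 samples
for chunks in [(64, 64, 64), (1, 1200, 1500)]:
    for name, sel in access_patterns(shape).items():
        n, fetched, wanted = read_cost(chunks, sel)
        print(f"{chunks} {name:>10}: {n:5d} chunks, "
              f"{fetched / 2**20:8.1f} MiB fetched for {wanted / 2**20:6.2f} MiB")
```

With cubic chunks, inline, crossline, and time-slice reads all cost a few hundred chunk requests, while a single trace still pulls 24 chunks. Chunking by whole inlines makes an inline read a single request but forces a crossline or time-slice read to touch every chunk in the file, which is the kind of trade-off that motivates matching chunk shape to the dominant access pattern.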
