Reading the dictionary from the beginning every time you need to find a word would not be an efficient use of your time unless you needed to spell "aardvark"! Unfortunately, this is similar to what a computer does when it needs to query data written in a sequential format. Modern cloud-native formats are revolutionizing the way data storage, processing and management are implemented, reducing work time and cutting expenses.

Multidimensional seismic data are usually large and require substantial storage in the SEG-Y file format. The SEG-Y standard has served the industry well but has become outdated because it stores seismic data in a sequential pattern. This sequential layout complicates most modern data management, data processing, and interpretation workflows and makes migration to the cloud inefficient and cost-prohibitive.

To be efficient in the cloud, scientific software engineers had to rethink how to write, query, and export data. New cloud-native algorithms and data formatting solutions make it possible to quickly gather spatially relevant pieces from a multidimensional dataset and output the data for delivery and interpretation. These new formats also enable massive data parallelism and work well with managed services provided in the cloud.

Cloud-friendly Formats Set to Disrupt Energy Data Storage

Multidimensional Input/Output (MDIO) is a solution developed by TGS for quickly accessing and disseminating data on the cloud. This new data format and its associated tools have been specifically designed to support storing and manipulating multidimensional datasets, including seismic, wind, and other energy data. It supports innovation in applications such as seismic and renewable energy solutions, and its algorithms continue to evolve to benefit other industry sectors.

The energy industry has extensively used multidimensional data for various aspects of exploration and analysis. Multidimensional data is used in seismic imaging, reservoir simulation, petrophysical analysis, fluid flow simulation, weather analysis, and many other applications. Much of this data has been stored in sequential formats built for the hardware and network speeds of the past. As a result, analyzing it in the current computing ecosystem is expensive.

How MDIO Works

MDIO is a format for chunked (bricked) multidimensional data, designed specifically to take advantage of cloud object storage. It is built on top of Zarr, a community project to develop specifications and software for storing large multidimensional arrays, with applications from biology to satellite data. MDIO is process- and thread-safe and allows fast structured and random I/O. It supports machine learning training and other HPC workloads, with immediate access to metadata that is stored separately. MDIO leverages other open-source software components for cloud I/O and distributed computing, such as Dask, FSSpec, and Xarray.
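To make the idea of chunked storage with separate metadata concrete, here is a minimal sketch in plain Python. It is not MDIO's or Zarr's actual API: the functions `write_chunked` and `read_sample`, the `.meta` key, and the in-memory dictionary standing in for an object store are all assumptions made for illustration. The "i.j.k" key naming only loosely mirrors the way chunked array stores address individual chunks.

```python
import json
import struct
from itertools import product


def flat_index(idx, shape):
    """C-order offset of a 3D index inside a flat buffer of the given shape."""
    (i, j, k), (_, nj, nk) = idx, shape
    return (i * nj + j) * nk + k


def write_chunked(values, shape, chunks, store):
    """Write a flat, C-ordered volume into `store` as one object per chunk."""
    # Metadata lives under its own key, so it can be read without touching data.
    store[".meta"] = json.dumps({"shape": shape, "chunks": chunks}).encode()
    grid = [range(0, n, c) for n, c in zip(shape, chunks)]
    for origin in product(*grid):
        stops = [min(o + c, n) for o, c, n in zip(origin, chunks, shape)]
        block = [
            values[flat_index((i, j, k), shape)]
            for i in range(origin[0], stops[0])
            for j in range(origin[1], stops[1])
            for k in range(origin[2], stops[2])
        ]
        key = ".".join(str(o // c) for o, c in zip(origin, chunks))
        store[key] = struct.pack(f"<{len(block)}f", *block)


def read_sample(idx, store):
    """Fetch one sample by reading the metadata plus exactly one chunk object."""
    meta = json.loads(store[".meta"])
    shape, chunks = meta["shape"], meta["chunks"]
    key = ".".join(str(i // c) for i, c in zip(idx, chunks))
    origin = [(i // c) * c for i, c in zip(idx, chunks)]
    # Chunks on the far edge of the volume may be smaller than the nominal size.
    extent = [min(c, n - o) for c, n, o in zip(chunks, shape, origin)]
    local = [i - o for i, o in zip(idx, origin)]
    return struct.unpack_from("<f", store[key], 4 * flat_index(local, extent))[0]


# Hypothetical toy volume: 5 inlines x 6 crosslines x 8 samples.
shape, chunks = (5, 6, 8), (2, 3, 4)
values = [float(n) for n in range(5 * 6 * 8)]
store = {}  # stand-in for an object store bucket
write_chunked(values, shape, chunks, store)
print(read_sample((4, 5, 7), store), "read from 1 of", len(store) - 1, "chunks")
```

In this sketch a random read needs only the small metadata object and a single chunk, regardless of how large the full volume is, which is the property that makes object storage a good fit for this kind of layout.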

Converting a traditional SEG-Y file to MDIO begins with ingesting all trace headers and compressing them losslessly. The trace data is then chunked (bricked) for fast random access and can be compressed in a lossless or lossy manner, depending on the user's and the data's requirements. Requests for inlines, crosslines, depth/time slices, or headers become very fast because of the newly chunked data. As seen in Figure 1, these storage advantages dramatically increase the speed at which the data can be visualized and quality controlled, enabling tensor and image-based machine learning and massively parallel high-performance computing tasks.
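A back-of-the-envelope calculation helps show why chunking speeds up slice requests. The sketch below counts how many chunks each kind of request intersects. The survey dimensions and the 64 x 64 x 64 chunk shape are hypothetical values chosen for illustration, not MDIO defaults, and `chunks_touched` is a helper invented for this example.

```python
from math import ceil, prod


def chunk_grid(shape, chunks):
    """Number of chunks along each axis."""
    return tuple(ceil(n / c) for n, c in zip(shape, chunks))


def chunks_touched(shape, chunks, selection):
    """Count chunks intersected by a request.

    `selection` holds one entry per axis: a fixed index, or None to take
    the whole axis.
    """
    grid = chunk_grid(shape, chunks)
    return prod(1 if sel is not None else g for g, sel in zip(grid, selection))


shape = (1000, 1000, 1500)  # hypothetical survey: inlines, crosslines, samples
chunks = (64, 64, 64)       # hypothetical chunk shape
total = prod(chunk_grid(shape, chunks))

requests = {
    "inline": (500, None, None),
    "crossline": (None, 500, None),
    "time slice": (None, None, 750),
}
for name, selection in requests.items():
    n = chunks_touched(shape, chunks, selection)
    print(f"{name:>10}: {n} of {total} chunks ({100 * n / total:.1f}%)")
```

Under these assumptions each of the three requests touches roughly 4 to 6 percent of the chunks. In a trace-sequential file, by contrast, a time slice needs one sample from every trace, so the reader has to visit every trace in the volume.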

Figure 1. Data visualization showing how bricked and chunked seismic data can be used to speed up visualization and perform machine learning at revolutionary speeds.

Benefits of MDIO

MDIO saves interpretation time, simplifies workflows, and minimizes storage space while streamlining data management pipelines. MDIO as a format is agnostic to the storage backend (filesystems, object stores, etc.), and computational workflows built on it run faster in the cloud than in on-premises data centers.

As time is saved, so are storage and processing costs. With lossless compression, MDIO requires 20-30% less storage space than SEG-Y, whether on-premises or in the cloud. When lossy compression is acceptable, the savings can increase to around 60-70% and sometimes approach 90%. Savings in storage space often translate directly into cost savings.
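The trade-off between lossless and lossy compression can be illustrated with a small, self-contained sketch. It uses `zlib` from the Python standard library and a simple 8-bit quantization as stand-ins. These are not the codecs MDIO uses, the trace is synthetic, and the sizes it prints say nothing about the percentages quoted above. The point is only the mechanism: lossless compression returns the stored samples exactly, while lossy compression trades a bounded error for a smaller size.

```python
import math
import random
import struct
import zlib

# Synthetic trace (an assumption for illustration): damped sinusoid plus noise.
rng = random.Random(0)
n = 2000
raw = [
    math.exp(-i / 800) * math.sin(i / 9) + 0.01 * rng.gauss(0, 1)
    for i in range(n)
]
fmt = f"<{n}f"
# Round to 32-bit floats first, since that is the precision being stored.
trace = struct.unpack(fmt, struct.pack(fmt, *raw))

# Lossless: compress the float32 bytes; decompression restores them exactly.
lossless = zlib.compress(struct.pack(fmt, *trace), 9)
restored = struct.unpack(fmt, zlib.decompress(lossless))

# Lossy: quantize to signed 8-bit integers, then compress the integers.
scale = max(abs(x) for x in trace) / 127
quantized = [round(x / scale) for x in trace]
lossy = zlib.compress(struct.pack(f"<{n}b", *quantized), 9)
approx = [q * scale for q in struct.unpack(f"<{n}b", zlib.decompress(lossy))]
max_error = max(abs(a - x) for a, x in zip(approx, trace))

print("uncompressed bytes:", 4 * n)
print("lossless bytes:    ", len(lossless), "exact:", restored == trace)
print("lossy bytes:       ", len(lossy), "max error:", max_error)
```

Whether a given error level is acceptable depends on the downstream use, which is why the choice is left to the user and the data requirements.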

MDIO was built with flexibility in mind. Industries that need to quickly and efficiently store and process multidimensional data can benefit from the savings and efficiency it affords. Those with ambitions of employing machine learning in their workflows can drastically reduce the time spent managing data.

MDIO is made accessible to the broader energy exploration developer community through the Python language. Users include data managers, academics, data scientists, and software developers. They can use MDIO to source the data faster, making implementation in their workflows and software seamless.

TGS Using MDIO for Seismic and Wind Data

As energy data and insights specialists, TGS continues to impact data projects across the new energy sector. Several workflows for processing multidimensional seismic and wind data are operational in the cloud, enabled by the MDIO format. Petabytes of data have been ported to the cloud, proving MDIO's worth in managing large data sets for processing, machine learning applications, and client delivery.

Large amounts of seismic data have already been converted from their native SEG-Y format to MDIO. The MDIO toolkit has addressed many issues with sequentially stored multidimensional data, allowing this data to be held in adjacent "cubes" or "chunks" sized according to subsequent processing or data access needs.

Wind data shares similar storage-intensive challenges with seismic data. NetCDF is the wind industry's standard format, as SEG-Y is for seismic, and most NetCDF files store data in a sequential pattern. Even when chunking is enabled in NetCDF, the available tooling for accessing NetCDF files from cloud object stores makes it very challenging to scale up using modern technologies.

Running high-performance computing workloads or machine learning workflows on wind data to extract insights requires high computing capacity. Like seismic, this work has traditionally been done with small volumes of data using computing available in data centers. MDIO can utilize the benefits of cloud-native storage to retrieve the data needed to make accurate wind model calculations faster. Calculations such as these are used in TGS' Wind AXIOM platform to assess idealized energy output potential and accurate revenue expectations of future wind farm projects.
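Because each chunk can be fetched and processed independently, a calculation over a large dataset can be split into per-chunk pieces that run in parallel and are combined at the end. The sketch below shows that pattern for a mean wind speed. The dictionary standing in for an object store, the chunk keys, and the wind-speed values are all hypothetical, and the code does not represent how Wind AXIOM or MDIO are implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an object store: chunk key -> wind speeds in m/s.
store = {
    "0.0": [6.1, 7.4, 8.0, 5.5],
    "0.1": [9.2, 10.1, 8.7],
    "1.0": [4.3, 5.0, 6.6, 7.1],
    "1.1": [11.0, 9.5],
}


def chunk_stats(key):
    """Fetch one chunk and reduce it to (sum, count); no shared state is written."""
    values = store[key]
    return sum(values), len(values)


def parallel_mean(keys, max_workers=4):
    """Reduce every chunk in its own task, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_stats, keys))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(f"mean wind speed: {parallel_mean(sorted(store)):.2f} m/s")
```

The same structure scales from threads on one machine to many workers in the cloud, since each task only needs the key of the chunk it is responsible for.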

The Future of Multidimensional Data in the Energy Industry

With decades of big data expertise, TGS is applying its knowledge to benefit the energy industry. With MDIO, a serverless cloud-native format built for multidimensional data storage, developers can realize the full potential of the data they possess. TGS holds the most extensive seismic and well data library on the planet and takes a unique approach to global wind modeling and measurement. Applied to these assets and combined with scalable cloud processing access, the MDIO format opens vast possibilities for the future of energy data and insight.