Reducing Cloud Storage through Open-Source Lossless Compression

The energy sector is evolving rapidly, and efficient data management has become a pressing need. Organizations grapple with the volume and complexity of their data, which ranges from seismic surveys to wind, solar, and well data and can require petabytes (PB) of expensive storage and complex data management infrastructure. Because data generation will only continue to grow, the need for a more efficient, streamlined approach is clear.

A company with datasets as large as TGS's is in a unique position to solve this problem. TGS currently manages more than 150 PB of seismic data alone, spanning nearly one million pieces of media, and the volume grows daily. The TGS Data Science team set a goal of reducing cloud storage by 30% and exceeded it using lossless compression technology developed in-house and subsequently released to the industry under an open-source license. This technology, MDIO (Multidimensional Input/Output), is now used across TGS and by TGS customers.

Our journey began a few years ago, when the TGS Data Science team started applying artificial intelligence (AI) and machine learning (ML) to TGS data for internal analytics and data processing. The team quickly ran into challenges working with these massive datasets, which caused delays and drove up high-performance computing (HPC) costs. After a thorough assessment of the solutions available on the market, the team set out to build its own, focusing first on optimizing the data infrastructure and starting with data compression. Although the team ultimately delivered a full data management solution, TGS Data Verse (Figure 1), this article focuses on one component of it: cloud data storage. Future articles in this series will cover the additional components required to implement a comprehensive data management solution.

Figure 1: Comprehensive offerings of TGS Data Verse

MDIO is the industry's first fully open-source, cloud-native data storage architecture for AI, ML, and HPC workflows. Sometimes called the "Swiss Army Knife" of energy data, MDIO enables fast, efficient processing of large data volumes. It was initially designed for numerical weather prediction and exploration seismology, but its versatility has made it an important tool in a range of data-intensive fields. MDIO is built on Zarr, a community-driven open-source format for chunked, compressed N-dimensional arrays, and reflects the collaborative innovation behind that project. Its integration into platforms such as TGS Data Verse highlights its central role in modern data management and processing infrastructure.
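Zarr stores a large array as a grid of chunks, each compressed independently, so a cloud reader can fetch and decompress only the chunks it needs. The sketch below illustrates that idea with the Python standard library only. It is not MDIO's or Zarr's API: the toy volume, its shape, the chunk shape, and the helper names are hypothetical, the chunk keys merely mimic the dotted style Zarr v2 uses, and zlib stands in for whatever codec a real MDIO dataset is configured with. The round-trip check shows what "lossless" means: decompressing every chunk reproduces the original values exactly.

```python
from array import array
import zlib

# Hypothetical toy volume and chunk shapes (not taken from any real MDIO dataset).
SHAPE = (8, 8, 64)  # (inline, crossline, time sample)
CHUNK = (4, 4, 64)  # each chunk holds 4 x 4 traces of 64 samples


def make_volume(shape):
    """Build a flat float32 cube with repeating structure so zlib can shrink it."""
    ni, nx, nt = shape
    return array("f", (((t % 16) - 8) * 0.25 + 0.01 * (i + x)
                       for i in range(ni) for x in range(nx) for t in range(nt)))


def chunk_slices(shape, chunk):
    """Yield (key, index ranges) for every chunk in a regular chunk grid."""
    grid = [range(0, n, c) for n, c in zip(shape, chunk)]
    for i0 in grid[0]:
        for x0 in grid[1]:
            for t0 in grid[2]:
                # Dotted chunk key, mimicking Zarr v2-style chunk names like "0.1.0".
                key = f"{i0 // chunk[0]}.{x0 // chunk[1]}.{t0 // chunk[2]}"
                yield key, (range(i0, min(i0 + chunk[0], shape[0])),
                            range(x0, min(x0 + chunk[1], shape[1])),
                            range(t0, min(t0 + chunk[2], shape[2])))


def encode(volume, shape, chunk, level=9):
    """Compress each chunk independently with zlib (a lossless stand-in codec)."""
    _, nx, nt = shape
    store = {}
    for key, (ri, rx, rt) in chunk_slices(shape, chunk):
        block = array("f", (volume[(i * nx + x) * nt + t]
                            for i in ri for x in rx for t in rt))
        store[key] = zlib.compress(block.tobytes(), level)
    return store


def decode(store, shape, chunk):
    """Decompress every chunk and scatter its samples back into a flat array."""
    ni, nx, nt = shape
    out = array("f", [0.0]) * (ni * nx * nt)
    for key, (ri, rx, rt) in chunk_slices(shape, chunk):
        block = array("f")
        block.frombytes(zlib.decompress(store[key]))
        k = 0
        for i in ri:
            for x in rx:
                for t in rt:
                    out[(i * nx + x) * nt + t] = block[k]
                    k += 1
    return out


if __name__ == "__main__":
    volume = make_volume(SHAPE)
    store = encode(volume, SHAPE, CHUNK)
    restored = decode(store, SHAPE, CHUNK)
    raw_bytes = len(volume.tobytes())
    stored_bytes = sum(len(blob) for blob in store.values())
    print("chunks:", sorted(store))
    print("lossless round trip:", restored == volume)
    print(f"compression ratio: {raw_bytes / stored_bytes:.2f}x")
```

The ratio this toy prints depends entirely on the synthetic data and says nothing about real seismic files; the point is the structure: per-chunk compression plus an exact round trip.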

After applying MDIO to its internal datasets, TGS extended the capability to energy operators, who have achieved significant cloud storage savings while retaining immediate access to large datasets. Figure 2 compares three files before and after MDIO lossless compression, with storage savings ranging from 19% to 53%.
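Compression results are quoted either as a ratio (original size divided by compressed size) or as a percentage saved, and the two are related by savings = 1 - 1/ratio. The small sketch below converts between them; the function names are hypothetical helpers, not part of MDIO. It shows, for example, that the 19% and 53% endpoints in Figure 2 correspond to ratios of roughly 1.23x and 2.13x, and that the 30% goal corresponds to about 1.43x. The final line is purely illustrative arithmetic, not a reported result.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of storage saved for a compression ratio of original/compressed size."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings: float) -> float:
    """Compression ratio required to save a given fraction (0 <= savings < 1)."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


if __name__ == "__main__":
    # Endpoints of the Figure 2 range plus the 30% target mentioned in the text.
    for s in (0.19, 0.30, 0.53):
        print(f"{s:.0%} savings <-> {ratio_from_savings(s):.2f}x compression")
    # prints: 19% ... 1.23x, 30% ... 1.43x, 53% ... 2.13x
    # Illustrative arithmetic only: a 30% reduction applied to 150 PB.
    print(f"30% of 150 PB = {150 * 0.30:.0f} PB")  # prints "30% of 150 PB = 45 PB"
```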

Figure 2: Comparison of files before/after MDIO lossless compression

This is the first step in showing how energy companies can use TGS Data Verse to reduce costs and save time. Learn more about TGS Data Verse here: https://www.tgs.com/digital/data-management-as-a-service

To learn more about MDIO and how TGS gives back to the industry through the open-source community, visit https://mdio.dev and check out "Get Started in 10 Minutes".

Figure 3: MDIO’s dedicated website for the Open-Source Community

Sign up for our spotlight articles to follow the TGS Data Verse series as we continue to explore topics such as master data management, data accessibility, OSDU integration, data delivery, and data science.