Paper Summary
Traditional machine learning approaches for seismic data interpretation have historically depended on iterative training and inference applied to individual datasets, yielding models that lack robustness and fail to generalize beyond their specific training domains. To address these limitations, we introduce a methodology that leverages self-supervised training and the scaling of 3D Vision Transformer (ViT) architectures to significantly enhance seismic interpretation capabilities, enabling improved generalization across diverse geological datasets. This study focuses on the complexities of large-scale training, utilizing a comprehensive global dataset of 63 seismic volumes. We employ the Masked Autoencoder (MAE) architecture with the ViT-H model, which comprises 660 million parameters, to achieve unprecedented scalability and performance in seismic data processing.
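The core of MAE pretraining is to split each volume into patch tokens, hide a large fraction of them, and reconstruct the hidden patches from the visible ones. The sketch below illustrates that masking-and-loss logic in numpy only; the patch size, mask ratio, and zero-filled "prediction" are illustrative placeholders, not the paper's actual model or hyperparameters.

```python
import numpy as np

def patchify_3d(volume, patch=8):
    # Split a seismic volume into non-overlapping 3D patches, flattened
    # into tokens (MAE treats each patch as one token).
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    v = volume.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(-1, patch ** 3)  # (num_tokens, voxels_per_patch)

def random_mask(num_tokens, mask_ratio=0.75, rng=None):
    # MAE hides most tokens; only the visible subset is passed through
    # the heavy encoder, which is what makes large ViTs affordable.
    rng = rng or np.random.default_rng(0)
    n_masked = int(num_tokens * mask_ratio)
    perm = rng.permutation(num_tokens)
    return perm[n_masked:], perm[:n_masked]  # (visible_ids, masked_ids)

def mae_loss(pred, target, masked_ids):
    # Reconstruction loss is computed on the masked patches only.
    diff = pred[masked_ids] - target[masked_ids]
    return float((diff ** 2).mean())

# Toy 32^3 "volume": 4*4*4 = 64 tokens of 8^3 = 512 voxels each.
vol = np.random.default_rng(1).standard_normal((32, 32, 32)).astype(np.float32)
tokens = patchify_3d(vol, patch=8)
visible, masked = random_mask(len(tokens))
# A real model would encode the visible tokens and decode all positions;
# a zero array stands in here for the untrained decoder output.
pred = np.zeros_like(tokens)
loss = mae_loss(pred, tokens, masked)
```

In a full implementation the encoder/decoder would be the 3D ViT-H described above; this sketch only shows why the masking objective scales well, since the encoder never sees the masked 75% of tokens.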
Our approach is underpinned by a cloud-native, digitalized seismic data infrastructure that effectively tackles data engineering challenges, eliminating the need for data duplication and streamlining access to large-scale datasets. This infrastructure, combined with the MDIO seismic data format, facilitates efficient data management and high-throughput access, ensuring that computational resources, such as A100 GPU clusters, are fully utilized during training. As a practical demonstration of the model's capabilities, we fine-tuned it for salt segmentation.
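Chunked, cloud-native formats such as MDIO make this high-throughput access possible because a random training crop only requires reading the chunks it overlaps, never the full volume. The sketch below is a stdlib-only illustration of that chunk-selection arithmetic; the crop coordinates and 64^3 chunk shape are assumed values for illustration, not MDIO's API or the paper's configuration.

```python
import math

def chunks_for_crop(crop_start, crop_stop, chunk_shape):
    # For a chunked seismic volume, map a requested sub-volume
    # [start, stop) to the set of 3D chunk indices it overlaps.
    # Only these chunks need to be fetched from object storage.
    ranges = []
    for start, stop, c in zip(crop_start, crop_stop, chunk_shape):
        ranges.append(range(start // c, math.ceil(stop / c)))
    return [(i, j, k) for i in ranges[0] for j in ranges[1] for k in ranges[2]]

# A 128-voxel training crop against 64^3 chunks touches at most
# 3 chunks per axis (27 total here), regardless of total volume size.
touched = chunks_for_crop((100, 100, 100), (228, 228, 228), (64, 64, 64))
```

The number of chunks fetched depends only on the crop and chunk shapes, which is why many such reads can saturate GPU data loaders without ever duplicating or restaging the source volumes.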