Paper Summary
3D Seismic Foundation Models (SFMs) have been scaled to 1.8 billion parameters, pushing the boundaries of AI-driven seismic analysis. This work employs Vision Transformers (ViTs) augmented with multi-dimensional rotary positional embeddings and FlashAttention-2 to handle larger 3D spatial contexts efficiently. Pretraining was conducted on 20 terabytes of seismic data spanning 444,000 km² using a Masked Autoencoder (MAE) approach for self-supervised learning. Drawing on advances in large-model optimization, including key/query normalization and mixed-precision training, the models achieved state-of-the-art generalization on salt segmentation tasks, with mean Intersection over Union (IoU) scores exceeding 0.9 across unseen datasets. A memory consumption analysis reveals a log-linear scaling relationship between model size, context size, and memory requirements. These results showcase the transformative potential of scaled SFMs in geophysical interpretation.
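To make the architectural ingredients concrete, the following is a minimal sketch, not the authors' implementation, of how a three-axis (z, y, x) rotary positional embedding and query/key normalization might be applied inside a ViT attention layer, using PyTorch's `scaled_dot_product_attention`, which can dispatch to FlashAttention-style fused kernels. All function names (`axis_rotary`, `apply_rotary`, `rope3d`, `attention`), shapes, and the L2-normalization form of QK-norm are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch only: 3D rotary embeddings + QK normalization + fused attention.
import torch
import torch.nn.functional as F


def axis_rotary(pos, dim, base=10000.0):
    """Cos/sin tables for one spatial axis; `dim` must be even."""
    freqs = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = pos.float()[:, None] * freqs[None, :]            # (tokens, dim/2)
    return angles.cos(), angles.sin()


def apply_rotary(x, cos, sin):
    """Rotate channel pairs of x (..., tokens, dim) by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def rope3d(x, coords, head_dim):
    """Split the head dimension into three groups, one rotated per spatial axis."""
    d = head_dim // 3 // 2 * 2                                # even chunk per axis
    chunks = []
    for axis in range(3):
        cos, sin = axis_rotary(coords[:, axis], d)
        chunks.append(apply_rotary(x[..., axis * d:(axis + 1) * d], cos, sin))
    chunks.append(x[..., 3 * d:])                              # leftover channels untouched
    return torch.cat(chunks, dim=-1)


def attention(q, k, v, coords):
    """q, k, v: (batch, heads, tokens, head_dim); coords: (tokens, 3) patch indices."""
    head_dim = q.shape[-1]
    # One simple form of query/key normalization to stabilize large-scale training.
    q = F.normalize(q, dim=-1) * head_dim ** 0.5
    k = F.normalize(k, dim=-1) * head_dim ** 0.5
    q = rope3d(q, coords, head_dim)
    k = rope3d(k, coords, head_dim)
    # SDPA selects a fused (FlashAttention-style) kernel when available.
    return F.scaled_dot_product_attention(q, k, v)


if __name__ == "__main__":
    B, H, D = 2, 8, 96                                         # toy sizes
    z, y, x = torch.meshgrid(torch.arange(4), torch.arange(4), torch.arange(4), indexing="ij")
    coords = torch.stack([z, y, x], dim=-1).reshape(-1, 3)     # 64 patches on a 4x4x4 grid
    q = torch.randn(B, H, coords.shape[0], D)
    k, v = torch.randn_like(q), torch.randn_like(q)
    print(attention(q, k, v, coords).shape)                    # torch.Size([2, 8, 64, 96])
```

The design idea this illustrates is that each 3D patch carries an absolute (z, y, x) index, and rotating a separate slice of the head dimension per axis encodes relative position along all three axes without adding parameters, which is what allows the attention context to grow with volume size.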