Paper Summary

Pre-trained seismic foundation models (SFMs) have shown promising performance on seismic interpretation tasks and generalize effectively across diverse geographic areas. However, the impact of dataset size and model complexity on seismic data applications remains underexplored. Understanding these relationships is vital for optimizing model performance and improving the accuracy of geological interpretation in practice. We systematically assess the effects of dataset size and model complexity on seismic data by pre-training multiple Vision Transformer (ViT) variants with the Masked Autoencoder (MAE) technique, and we benchmark these models on a few-shot facies classification task using the established LANDMASS1 dataset. Our analysis reveals clear scaling trends: performance improves with larger models and datasets, consistent with scaling laws observed in other domains. These insights offer actionable guidelines for scaling to larger 3D models and datasets.
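The scaling behavior summarized above is conventionally modeled as a power law, error ≈ a·N^(−b), where N is the dataset (or model) size. A minimal sketch of fitting such a law in log-log space, using synthetic numbers that are not taken from the paper:

```python
import numpy as np

# Synthetic illustration only: hypothetical few-shot error rates at four
# pre-training dataset sizes (number of seismic patches). These values
# are assumptions for demonstration, NOT results from the paper.
dataset_sizes = np.array([1e4, 1e5, 1e6, 1e7])
error_rates = np.array([0.40, 0.28, 0.20, 0.14])  # 1 - accuracy

# A power law error = a * N^(-b) becomes linear in log-log space:
# log(error) = log(a) - b * log(N). A degree-1 polyfit recovers b and a.
slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(error_rates), 1)
a, b = np.exp(intercept), -slope
print(f"fitted law: error ~ {a:.2f} * N^(-{b:.3f})")

# Extrapolate one decade further in dataset size (illustrative only;
# real scaling curves can bend or saturate outside the fitted range).
predicted_error = a * (1e8) ** (-b)
print(f"predicted error at N = 1e8: {predicted_error:.3f}")
```

Fitting in log-log space is the standard way such exponents are estimated; the exponent b then quantifies how quickly additional data (or parameters) pays off, which is the kind of actionable guideline the summary refers to.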