Paper Summary

Extracting structured metadata from unstructured SEG-Y text headers is essential for organizing and retrieving seismic data. We develop and evaluate an LLM-powered API that maps free-form SEG-Y text header content onto predefined JSON schemas. Across 21 datasets, the method achieves a semantic accuracy of 90.33% and a strict (worst-case) accuracy of 79.45%, indicating that it is effective for scalable metadata extraction. We also discuss areas for improvement, including domain knowledge integration and model enhancements, to further improve structured text extraction.
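
To make the input and output of such a pipeline concrete, the following is a minimal sketch and not the API evaluated in the paper. It assumes the standard SEG-Y layout of a 3200-byte textual header (40 lines of 80 characters, EBCDIC or ASCII) and shows how a JSON record returned by a model could be checked against a predefined schema. The schema fields (`survey_name`, `sample_interval_ms`, `crs`), the function names, and the first-byte encoding heuristic are all hypothetical choices made for illustration; the LLM call itself is deliberately left out.

```python
import json

CARD_WIDTH = 80                          # characters per header line
CARD_COUNT = 40                          # lines in a textual header
HEADER_BYTES = CARD_WIDTH * CARD_COUNT   # 3200 bytes

# Hypothetical schema for illustration only: field name -> accepted Python types.
SCHEMA = {
    "survey_name": (str,),
    "sample_interval_ms": (int, float),
    "crs": (str,),
}


def decode_text_header(raw: bytes) -> list[str]:
    """Split a 3200-byte SEG-Y textual header into 40 right-stripped lines."""
    if len(raw) != HEADER_BYTES:
        raise ValueError(f"expected {HEADER_BYTES} bytes, got {len(raw)}")
    # Assumed heuristic: lines conventionally start with 'C', which is
    # 0x43 in ASCII and 0xC3 in EBCDIC (code page 037).
    encoding = "ascii" if raw[0] == 0x43 else "cp037"
    text = raw.decode(encoding, errors="replace")
    return [text[i:i + CARD_WIDTH].rstrip()
            for i in range(0, HEADER_BYTES, CARD_WIDTH)]


def validate_record(record: dict, schema: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the record fits."""
    problems = []
    for field, accepted in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field] is not None and not isinstance(record[field], accepted):
            problems.append(f"wrong type for field: {field}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


# Build a synthetic EBCDIC header so the sketch runs without a real file.
cards = [f"C{n:2d} ".ljust(CARD_WIDTH) for n in range(1, CARD_COUNT + 1)]
cards[0] = "C 1 CLIENT: EXAMPLE SURVEY".ljust(CARD_WIDTH)
raw_header = "".join(cards).encode("cp037")
header_lines = decode_text_header(raw_header)

# Stand-in for a model response; null marks a value absent from the header.
model_output = '{"survey_name": "EXAMPLE SURVEY", "sample_interval_ms": 4, "crs": null}'
violations = validate_record(json.loads(model_output), SCHEMA)
print(header_lines[0])
print(violations)
```

Allowing `null` for a field reflects the assumption that a header may simply not state a value; a stricter schema could reject missing values instead.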