The TACO Specification
A FAIR-compliant, cloud-native specification that defines a formal and scalable format for packaging and sharing AI-ready Earth Observation datasets.
There is no standard for AI-ready Earth Observation datasets. The data remains fragmented, siloed, and difficult to unify, holding back the full potential of planetary-scale insight.
Data producers adopt inconsistent organizational patterns, making datasets difficult to interpret, compare, and integrate across different projects.
Without standardized schemas, data interpretation becomes subjective and error-prone, hindering automated processing.
Varying metadata standards prevent meaningful data integration and limit the development of interoperable AI tools.
Most datasets lack efficient partial read support and cloud-native access patterns, resulting in poor performance and high costs.
TACO is a unified specification that brings structure, clarity, and interoperability to AI-ready Earth Observation data. To achieve this, it defines:
A structured data model built on SAMPLEs 🌽, TORTILLA 🫓, and rich, standardized metadata schemas, all coming together to generate a TACO 🌮.
SAMPLE units
TORTILLA containers
Efficient binary serialization using Apache Parquet metadata and GDAL VFS for fast, scalable access to geospatial data.
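As a rough illustration of how byte-range metadata and GDAL's virtual file systems can work together, the sketch below builds a GDAL `/vsisubfile/` path that addresses one sample inside a larger container file. The `/vsisubfile/offset_size,filename` syntax and the `/vsicurl/` prefix are standard GDAL features; the helper name `vsisubfile_path`, the URL, and the `offset`/`size` fields are assumptions made for this example and are not taken from the TACO specification.

```python
def vsisubfile_path(container: str, offset: int, size: int) -> str:
    """Return a GDAL /vsisubfile/ path for `size` bytes starting at `offset`.

    `container` may itself be a GDAL virtual path, e.g. a /vsicurl/ URL.
    """
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    return f"/vsisubfile/{offset}_{size},{container}"


# Hypothetical metadata row: where one sample lives inside the container.
sample = {"id": "sample_0001", "offset": 4096, "size": 1048576}

# Hypothetical remote container, accessed through GDAL's /vsicurl/ handler.
container = "/vsicurl/https://example.com/dataset.tortilla"

path = vsisubfile_path(container, sample["offset"], sample["size"])
print(path)
```

A path of this form can be passed to any GDAL-based reader, which would then fetch only the addressed bytes rather than the whole container.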
Consistent interfaces across Python, R, and Julia for creating and accessing datasets.
Built-in support for major cloud storage platforms (S3, Azure, GCS) with efficient partial reads and minimal HTTP requests, reducing both latency and operational costs.
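To make "partial reads with minimal HTTP requests" concrete, here is a minimal sketch of one common technique: merging adjacent byte ranges before requesting them, so that several neighbouring samples are fetched with a single HTTP `Range` request. It is a general illustration, not the behaviour of any TACO library; the function names and the example ranges are hypothetical.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (offset, size) byte ranges that touch or lie within `max_gap` bytes."""
    merged = []
    for offset, size in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1] + max_gap:
            prev_offset, prev_size = merged[-1]
            end = max(prev_offset + prev_size, offset + size)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            merged.append((offset, size))
    return merged


def range_header(offset, size):
    """Build an HTTP Range header; the end byte is inclusive."""
    return {"Range": f"bytes={offset}-{offset + size - 1}"}


# Hypothetical byte ranges of three samples; the first two are contiguous.
wanted = [(0, 10), (10, 5), (100, 10)]

requests_needed = [range_header(o, s) for o, s in coalesce_ranges(wanted)]
print(requests_needed)
```

Raising `max_gap` trades a few wasted bytes for fewer round trips, which often matters more than bandwidth on object storage.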
Designed specifically for the era of foundation models, enabling seamless combination of diverse datasets for effective training and performance evaluation.
Reduce time-to-science with standardized, ready-to-use datasets
Built-in validation and consistency checks ensure data quality
Standardized formats enable reliable scientific reproducibility
Developed with input from leading EO and AI researchers
Designed to handle datasets from gigabytes to petabytes
Optimized for training and evaluating large-scale AI models
Be part of the movement to standardize AI-ready Earth Observation data. Collaborate, learn, and contribute to tools that accelerate global insight and innovation.