The TACO Specification

A FAIR-compliant, cloud-native specification that defines a formal and scalable format for packaging and sharing AI-ready Earth Observation datasets.

The Structural Bottleneck

There is no standard for AI-ready Earth Observation datasets. The data remain fragmented, siloed, and difficult to unify, which holds back planetary-scale analysis.

Ad Hoc File Structures

Data producers adopt inconsistent organizational patterns, making datasets difficult to interpret, compare, and integrate across different projects.

Loosely Defined Formats

Without standardized schemas, data interpretation becomes subjective and error-prone, hindering automated processing.

Inconsistent Semantic Encodings

Varying metadata standards prevent meaningful data integration and limit the development of interoperable AI tools.

Limited Cloud Optimization

Most datasets lack efficient partial read support and cloud-native access patterns, resulting in poor performance and high costs.

Introducing TACO

Built on GDAL & Apache Parquet • Multi-language Support • Cloud-Optimized • FAIR-compliant

TACO is a unified specification that brings structure, clarity, and interoperability to AI-ready Earth Observation data. To achieve this, it defines the following components:

Formal Data Model

A structured data model built on SAMPLEs 🌽, TORTILLAs 🫓, and rich, standardized metadata schemas, which together form a TACO 🌮.

  • Self-contained SAMPLE units
  • Hierarchical TORTILLA containers
  • Standardized metadata conventions
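
The hierarchy described above can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the class names mirror the terms SAMPLE, TORTILLA, and TACO, but the fields (`id`, `path`, `metadata`, `children`, `root`) and the example identifiers are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class Sample:
    """Hypothetical SAMPLE: one self-contained unit of data plus its metadata."""
    id: str
    path: str  # illustrative location of the payload
    metadata: dict = field(default_factory=dict)


@dataclass
class Tortilla:
    """Hypothetical TORTILLA: a container of SAMPLEs or nested TORTILLAs."""
    id: str
    children: list[Union["Sample", "Tortilla"]] = field(default_factory=list)

    def iter_samples(self) -> Iterator[Sample]:
        """Walk the hierarchy depth-first and yield every leaf SAMPLE."""
        for child in self.children:
            if isinstance(child, Tortilla):
                yield from child.iter_samples()
            else:
                yield child


@dataclass
class Taco:
    """Hypothetical TACO: dataset-level metadata around a root TORTILLA."""
    name: str
    root: Tortilla
    metadata: dict = field(default_factory=dict)


# Made-up example: one scene with an image and a nested group of labels.
scene = Tortilla(
    id="scene_0001",
    children=[
        Sample(id="s2_l2a", path="scene_0001/s2.tif", metadata={"sensor": "Sentinel-2"}),
        Tortilla(id="labels", children=[Sample(id="landcover", path="scene_0001/lc.tif")]),
    ],
)
dataset = Taco(name="demo", root=scene, metadata={"license": "CC-BY-4.0"})
sample_ids = [s.id for s in dataset.root.iter_samples()]
```

Flattening the tree into a list of leaf samples, as `iter_samples` does, is one way a hierarchical container could be exposed as flat rows for training loops.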

Binary File Format

Efficient binary serialization that uses Apache Parquet for metadata and the GDAL virtual file system (VFS) for fast, scalable access to geospatial data.

  • Partial read optimization
  • Cloud-native access patterns
  • Self-contained & portable
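
To make the idea of partial reads concrete, here is a toy container in Python. It is not the TACO byte layout, which this page does not describe. It only assumes a common pattern, also used by Parquet, of storing an index at the end of the file so that a reader can locate one item without scanning the rest. The function names and the JSON index are hypothetical.

```python
import io
import json
import struct
from typing import BinaryIO

FOOTER_LEN = struct.Struct("<Q")  # 8-byte little-endian length of the index


def write_container(items: dict[str, bytes]) -> bytes:
    """Concatenate payloads, then append a JSON index and the index length."""
    buf = io.BytesIO()
    index = {}
    for name, payload in items.items():
        index[name] = {"offset": buf.tell(), "length": len(payload)}
        buf.write(payload)
    index_bytes = json.dumps(index).encode("utf-8")
    buf.write(index_bytes)
    buf.write(FOOTER_LEN.pack(len(index_bytes)))
    return buf.getvalue()


def read_item(stream: BinaryIO, name: str) -> bytes:
    """Fetch one payload with three small reads instead of the whole file."""
    # 1) Read the fixed-size footer to learn how long the index is.
    stream.seek(-FOOTER_LEN.size, io.SEEK_END)
    (index_len,) = FOOTER_LEN.unpack(stream.read(FOOTER_LEN.size))
    # 2) Read only the index.
    stream.seek(-(FOOTER_LEN.size + index_len), io.SEEK_END)
    index = json.loads(stream.read(index_len))
    # 3) Seek straight to the requested payload.
    entry = index[name]
    stream.seek(entry["offset"])
    return stream.read(entry["length"])


blob = write_container({"a.tif": b"AAAA", "b.tif": b"BBBBBB"})
item = read_item(io.BytesIO(blob), "b.tif")
```

On object storage, each of the three reads would correspond to one ranged request, which is why a compact index matters for latency and cost.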

Unified API

Consistent interfaces across Python, R, and Julia for creating and accessing datasets.

  • Toolbox for dataset creation
  • Reader for efficient access
  • DataFrame integration

Cloud-Optimized Architecture

Built-in support for major cloud storage platforms (S3, Azure, GCS) with efficient partial reads and minimal HTTP requests, reducing both latency and operational costs.
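
As a rough illustration of how cloud objects can be addressed through GDAL's virtual file system, the sketch below maps storage URLs to GDAL VSI paths. The prefixes `/vsis3/`, `/vsigs/`, `/vsiaz/`, `/vsicurl/`, and `/vsisubfile/` are standard GDAL handlers. The helper functions, the `az://` URL convention, and the bucket and file names are assumptions for this example and are not part of any TACO API.

```python
from urllib.parse import urlparse

# GDAL virtual file system prefixes for common object stores.
VSI_PREFIXES = {"s3": "/vsis3/", "gs": "/vsigs/", "az": "/vsiaz/"}


def to_vsi_path(url: str) -> str:
    """Translate a storage URL into a path that GDAL can open."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        # Plain HTTP(S) files are read through ranged requests by /vsicurl/.
        return "/vsicurl/" + url
    prefix = VSI_PREFIXES.get(parsed.scheme)
    if prefix is None:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return f"{prefix}{parsed.netloc}{parsed.path}"


def subfile_path(vsi_path: str, offset: int, size: int) -> str:
    """Address a byte range inside a larger file via /vsisubfile/."""
    return f"/vsisubfile/{offset}_{size},{vsi_path}"


# Hypothetical object names, used only to show the resulting strings.
s3_path = to_vsi_path("s3://my-bucket/eo/dataset.bin")
http_path = to_vsi_path("https://example.com/dataset.bin")
chunk_path = subfile_path(s3_path, 1024, 4096)
```

A path such as `chunk_path` lets GDAL treat a byte range of a remote object as a file in its own right, so only that range needs to be fetched.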

Foundation Model Ready

Designed specifically for the era of foundation models, enabling seamless combination of diverse datasets for effective training and performance evaluation.

Why Choose TACO?

Accelerated Development

Reduce time-to-science with standardized, ready-to-use datasets

Data Integrity

Built-in validation and consistency checks ensure data quality

Enhanced Reproducibility

Standardized formats enable reliable scientific reproducibility

Community-Driven

Developed with input from leading EO and AI researchers

Scalable Architecture

Designed to handle datasets from gigabytes to petabytes

Foundation Model Ready

Optimized for training and evaluating large-scale AI models

Shape the Future of AI-ready EO Datasets

Be part of the movement to standardize AI-ready Earth Observation data. Collaborate, learn, and contribute to tools that accelerate global insight and innovation.
