The TACO Specification

A FAIR-compliant, cloud-native specification that defines a formal and scalable format for packaging and sharing AI-ready Earth Observation datasets.

The Structural Bottleneck

There is no standard for AI-ready Earth Observation datasets. The data remains fragmented, siloed, and difficult to unify, holding back the full potential of planetary-scale insight.

Ad Hoc File Structures

Data producers adopt inconsistent organizational patterns, making datasets difficult to interpret, compare, and integrate across different projects.

Loosely Defined Formats

Without standardized schemas, data interpretation becomes subjective and error-prone, hindering automated processing.

Inconsistent Semantic Encodings

Varying metadata standards prevent meaningful data integration and limit the development of interoperable AI tools.

Limited Cloud Optimization

Most datasets lack efficient partial read support and cloud-native access patterns, resulting in poor performance and high costs.

Introducing TACO

Built on GDAL & Apache Parquet • Multi-language Support • Cloud-Optimized • FAIR-compliant

TACO is a unified specification that brings structure, clarity, and interoperability to AI-ready Earth Observation data. It comprises the following components:

Formal Data Model

A structured data model built on SAMPLEs 🌽, TORTILLAs 🫓, and rich, standardized metadata schemas, which together form a TACO 🌮.

Self-contained SAMPLE units
Hierarchical TORTILLA containers
Standardized metadata conventions
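
To make the nesting concrete, here is a minimal sketch of how SAMPLEs, TORTILLAs, and a TACO could relate to one another. The class names, fields, and the `leaf_paths` helper are hypothetical illustrations, not the API of any TACO library; in particular, letting a SAMPLE hold a nested TORTILLA is an assumption made here to model the "hierarchical" property.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Sample:
    """Hypothetical SAMPLE: one self-contained unit of data plus its metadata."""
    id: str
    data: Union[bytes, "Tortilla"]  # raw payload, or a nested container (assumption)
    metadata: dict = field(default_factory=dict)


@dataclass
class Tortilla:
    """Hypothetical TORTILLA: an ordered container of SAMPLEs."""
    samples: List[Sample] = field(default_factory=list)

    def leaf_paths(self, prefix: str = "") -> List[str]:
        """Return slash-separated paths to every non-container SAMPLE."""
        paths = []
        for sample in self.samples:
            path = f"{prefix}/{sample.id}" if prefix else sample.id
            if isinstance(sample.data, Tortilla):
                paths.extend(sample.data.leaf_paths(path))
            else:
                paths.append(path)
        return paths


@dataclass
class Taco:
    """Hypothetical TACO: a top-level TORTILLA plus dataset-level metadata."""
    tortilla: Tortilla
    metadata: dict = field(default_factory=dict)


# One scene holding two sensor acquisitions (placeholder bytes, not real imagery).
scene = Tortilla([Sample("s2", b"\x00"), Sample("s1", b"\x01")])
taco = Taco(
    Tortilla([Sample("scene_001", scene, {"split": "train"})]),
    {"title": "demo"},
)
paths = taco.tortilla.leaf_paths()
```

Each leaf keeps its own metadata, while the container hierarchy gives every payload a stable path such as `scene_001/s2`.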

Binary File Format

Efficient binary serialization that stores metadata as Apache Parquet and reads payloads through GDAL's virtual file system (VFS) for fast, scalable access to geospatial data.

Partial read optimization
Cloud-native access patterns
Self-contained & portable
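
The idea behind partial reads can be sketched with the standard library alone: payloads are concatenated into one blob, and an index records each payload's byte offset and length so a reader can fetch a single item without scanning the rest. This is a conceptual illustration only. The `pack` and `read_one` helpers and the plain-dict index are assumptions standing in for the Parquet metadata, and the layout is not the actual TACO file format.

```python
import io


def pack(payloads):
    """Concatenate payloads and record (offset, length) for each name."""
    blob = io.BytesIO()
    index = {}
    for name, data in payloads.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


def read_one(stream, index, name):
    """Read exactly one payload by seeking to its recorded offset."""
    offset, length = index[name]
    stream.seek(offset)
    return stream.read(length)


blob, index = pack({"a": b"AAAA", "b": b"BBBBBB", "c": b"CC"})
chunk = read_one(io.BytesIO(blob), index, "b")
```

Because the index is small and separate from the payloads, a reader can load it once and then touch only the bytes it needs.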

Unified API

Consistent interfaces across Python, R, and Julia for creating and accessing datasets.

Toolbox for dataset creation
Reader for efficient access
DataFrame integration

Cloud-Optimized Architecture

Built-in support for major cloud storage platforms (S3, Azure, GCS) with efficient partial reads and minimal HTTP requests, reducing both latency and operational costs.
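
Over HTTP, a partial read is a single request carrying a `Range` header, and GDAL exposes the same idea through its `/vsicurl/` and `/vsisubfile/` virtual paths. The sketch below only builds the request and the path without touching the network. The URL is a placeholder, the helper names are hypothetical, and whether a TACO reader composes paths in exactly this way is an assumption.

```python
from urllib.request import Request


def range_request(url, offset, length):
    """Build an HTTP request for `length` bytes starting at `offset`."""
    end = offset + length - 1  # HTTP byte ranges are inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{end}"})


def gdal_subfile_path(url, offset, length):
    """Build a GDAL virtual path that exposes a byte window of a remote file."""
    return f"/vsisubfile/{offset}_{length},/vsicurl/{url}"


url = "https://example.com/demo.taco"  # placeholder, not a real dataset
req = range_request(url, 4, 6)
vsi_path = gdal_subfile_path(url, 4, 6)
```

Fetching one payload then costs one request for a known byte window, which is what keeps both latency and egress low.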

Foundation Model Ready

Designed specifically for the era of foundation models, enabling seamless combination of diverse datasets for effective training and performance evaluation.

Why Choose TACO?

Accelerated Development

Reduce time-to-science with standardized, ready-to-use datasets

Data Integrity

Built-in validation and consistency checks ensure data quality

Enhanced Reproducibility

Standardized formats enable reliable scientific reproducibility

Community-Driven

Developed with input from leading EO and AI researchers

Scalable Architecture

Designed to handle datasets from gigabytes to petabytes

Foundation Model Ready

Optimized for training and evaluating large-scale AI models

Shape the Future of AI-ready EO Datasets

Be part of the movement to standardize AI-ready Earth Observation data. Collaborate, learn, and contribute to tools that accelerate global insight and innovation.
