
tacozip: High-Speed Archival for Cloud-Native Data

A blazing-fast, STORE-only (uncompressed) ZIP writer with an embedded TACO Header for instant metadata access. Built in C, with first-class Python bindings.
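
Why STORE-only matters: a stored member is written without compression, so its bytes appear verbatim inside the archive, and a byte range in the file is a byte range of the member. The sketch below shows this with Python's standard zipfile module, not tacozip. The helper stored_member_span is hypothetical; it only parses the standard 30-byte ZIP local file header to find where a stored member's data begins.

```python
import io
import struct
import zipfile

# Standard ZIP local file header: 30 fixed bytes, little-endian.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50

def stored_member_span(archive: bytes, name: str) -> tuple[int, int]:
    """Hypothetical helper: return (offset, length) of a STORE'd member's raw bytes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is compressed; its bytes are not stored verbatim")
    fields = LOCAL_HEADER.unpack_from(archive, info.header_offset)
    if fields[0] != LOCAL_SIG:
        raise ValueError("bad local file header signature")
    name_len, extra_len = fields[9], fields[10]
    # Member data starts right after the fixed header, file name and extra field.
    start = info.header_offset + LOCAL_HEADER.size + name_len + extra_len
    return start, info.compress_size

# Build a small STORE-only archive in memory.
payload = b"row group bytes " * 64
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("train.parquet", payload)
archive = buf.getvalue()

offset, length = stored_member_span(archive, "train.parquet")
# The member's bytes sit verbatim at that span of the archive.
assert archive[offset:offset + length] == payload
print(offset, length)
```

With compression (e.g. ZIP_DEFLATED) the same span would hold compressed bytes, so a reader could not jump to an arbitrary position inside a member.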

Quick Start

See the tacozip API in action:

  • Archive large files (e.g., Parquet files or data.bin) with custom metadata.
  • Define byte-range "entries" (e.g., Parquet row groups) as (offset, length) pairs in the header.
  • Read metadata back instantly using read_header().

  import requests
  import tacozip
  from pathlib import Path

  # 1. Create sample Parquet files (placeholder bytes, not real Parquet)
  Path("train.parquet").write_bytes(b"training data..." * 1000)
  Path("test.parquet").write_bytes(b"test data..." * 500)

  # 2. Archive with row group metadata as (offset, length) entries
  tacozip.create(
      "dataset.taco",
      src_files=["train.parquet", "test.parquet"],
      entries=[
          (1000, 5000),  # Row group 0: offset 1000, length 5000 -> bytes [1000, 6000)
          (6000, 4500),  # Row group 1: offset 6000, length 4500 -> bytes [6000, 10500)
      ],
  )

  # 3. Read metadata instantly (only the first 165 bytes are read)
  entries = tacozip.read_header("dataset.taco")
  print(f"Metadata: {entries}")

  # 4. Works with cloud storage (S3, HTTP) via an HTTP Range request
  r = requests.get(
      "https://cdn.example.com/dataset.taco",
      headers={"Range": "bytes=0-164"},  # first 165 bytes (range is inclusive)
  )
  r.raise_for_status()
  entries = tacozip.read_header(r.content)
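
Because entries are plain (offset, length) pairs, a reader can fetch exactly those bytes with an HTTP Range request or a local seek. Below is a minimal sketch using only the Python standard library, assuming entry offsets are absolute byte positions in the .taco file (this page does not state that explicitly). entry_to_range, range_request and read_span are hypothetical helpers, not part of tacozip.

```python
import os
import tempfile
import urllib.request

# Assumption: entries are absolute (offset, length) pairs in the .taco file,
# and the TACO Header occupies the first 165 bytes, as stated on this page.
HEADER_SIZE = 165

def entry_to_range(offset: int, length: int) -> str:
    """Hypothetical helper: turn an (offset, length) entry into an HTTP Range value."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges include both endpoints, so the last byte is offset + length - 1.
    return f"bytes={offset}-{offset + length - 1}"

def range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a GET request for one entry."""
    return urllib.request.Request(url, headers={"Range": entry_to_range(offset, length)})

def read_span(path: str, offset: int, length: int) -> bytes:
    """Hypothetical helper: read one entry's bytes from a local file via seek + read."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise EOFError(f"expected {length} bytes at offset {offset}, got {len(data)}")
    return data

# Usage with the illustrative entries from the example above.
entries = [(1000, 5000), (6000, 4500)]
header_range = entry_to_range(0, HEADER_SIZE)
ranges = [entry_to_range(off, n) for off, n in entries]
req = range_request("https://cdn.example.com/dataset.taco", *entries[0])

# Local equivalent: write a dummy file (12,032 bytes) and read the second span.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 47)
    path = tmp.name
try:
    span = read_span(path, *entries[1])
finally:
    os.remove(path)
```

Because Range values include both endpoints, the 165-byte header is requested as bytes=0-164, and a 5000-byte entry at offset 1000 becomes bytes=1000-5999.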

Contact Us

Drop me a line

ISP · Image & Signal Processing

Universitat de València

Released under the MIT License.