tacozip: High-Speed Archival for Cloud-Native Data

A blazing-fast, STORE-only (uncompressed) ZIP writer with an embedded TACO Header for instant metadata access. Built in C with first-class Python bindings.

What is tacozip?


⚡

Native C Performance

C library with zero-overhead Python bindings. Built for processing multi-GB files without Python GIL bottlenecks.

🗂️

100% ZIP Compatible

Open archives with WinZip, 7-Zip, or any standard ZIP tool. The TACO Header appears as a regular file inside the archive, so the format stays fully compatible with the ZIP ecosystem.
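Because the archive is STORE-only (no compression), every member's bytes sit verbatim at a fixed offset, which is what makes byte-range access possible. The following minimal sketch uses only Python's standard zipfile module, not tacozip, to build a stored archive, list it as any ZIP reader would, and locate one member's raw bytes. The helper stored_member_span is hypothetical and exists only for illustration.

```python
import io
import struct
import zipfile


def stored_member_span(zip_bytes: bytes, name: str) -> tuple[int, int]:
    """Return (offset, length) of a STORED member's raw bytes within zip_bytes.

    Hypothetical helper for illustration; not part of tacozip.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is compressed; its bytes cannot be sliced directly")
    start = info.header_offset
    if zip_bytes[start:start + 4] != b"PK\x03\x04":
        raise ValueError("local file header signature not found")
    # A local file header is 30 fixed bytes, followed by the file name and an
    # extra field whose lengths are stored at bytes 26-29 of that header.
    name_len, extra_len = struct.unpack_from("<HH", zip_bytes, start + 26)
    data_offset = start + 30 + name_len + extra_len
    return data_offset, info.compress_size


# Build a STORE-only archive in memory with the standard library.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("train.parquet", b"training data..." * 1000)
    zf.writestr("test.parquet", b"test data..." * 500)
data = buf.getvalue()

# Any standard ZIP reader can list the members...
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    names = zf.namelist()

# ...and a stored member is just a contiguous byte range of the archive.
offset, length = stored_member_span(data, "test.parquet")
chunk = data[offset:offset + length]
```

With compressed members (e.g. ZIP_DEFLATED) the slice would hold deflate output rather than the original bytes, which is why a STORE-only layout suits range-based access.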

☁️

Cloud-Native Design

A single 165-byte read retrieves all metadata. Well suited to S3, Azure, and HTTP CDNs: clients can fetch specific chunks (e.g., Parquet row groups) without downloading entire archives.
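To fetch one chunk, a client only needs its offset and length. The following minimal sketch assumes entries are (offset, length) pairs, as the Quick Start comments suggest; byte_range_header is a hypothetical helper, not part of the tacozip API. HTTP byte ranges are inclusive at both ends, which is why the 165-byte header corresponds to bytes=0-164.

```python
def byte_range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at `offset`.

    Hypothetical helper; assumes entries are (offset, length) pairs.
    HTTP byte ranges are inclusive, so the last byte is offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


# The 165-byte TACO Header at the start of the archive.
header_range = byte_range_header(0, 165)      # {"Range": "bytes=0-164"}

# Row group 0 from the Quick Start example: offset 1000, length 5000.
row_group_0 = byte_range_header(1000, 5000)   # {"Range": "bytes=1000-5999"}
```

The returned dict can be passed as headers= to requests.get(), as in the cloud-storage step of the Quick Start.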

Quick Start

See the tacozip API in action:

Archive large files (like Parquet or data.bin) with custom metadata.
Define byte-range "entries" (e.g., row groups) in the header.
Read metadata back instantly using read_header().


  import requests
  import tacozip
  from pathlib import Path

  # 1. Create sample files (placeholder bytes standing in for real Parquet data)
  Path("train.parquet").write_bytes(b"training data..." * 1000)
  Path("test.parquet").write_bytes(b"test data..." * 500)

  # 2. Archive with row group metadata; each entry is (offset, length)
  tacozip.create(
      "dataset.taco",
      src_files=["train.parquet", "test.parquet"],
      entries=[
          (1000, 5000),  # Row group 0: bytes 1000-5999
          (6000, 4500),  # Row group 1: bytes 6000-10499
      ],
  )

  # 3. Read metadata instantly (165 bytes only!)
  entries = tacozip.read_header("dataset.taco")
  print(f"Metadata: {entries}")

  # 4. Works with cloud storage (S3, HTTP): fetch only the first 165 bytes
  r = requests.get(
      "https://cdn.example.com/dataset.taco",
      headers={"Range": "bytes=0-164"},
  )
  r.raise_for_status()
  entries = tacozip.read_header(r.content)

Contact Us


ISP · Image & Signal Processing

Universitat de València