Compression Heads: Specialized Encoders

Three Segments, Three Custom Algorithms

After segmentation routing, each segment type gets compressed by a specialized head optimized for its statistical properties:

REPEAT Head

Custom RLE Encoder
Format:
[value: 1 byte]
[runLength: varuint]

Example:
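Input: 0x20 repeated 65536 times
Encoded: 0x20 0x80 0x80 0x04
         (space, then run length 65536 as a varuint,
          assuming LEB128-style 7-bit groups)
Output: 4 bytes vs 64 KB
Ratio: 16384:1 (13107:1 at 5 bytes, e.g. with one byte of framing)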

Features:

  • Variable-length integer encoding for run length
  • Single byte + length = minimal overhead
  • Perfect for padding, spaces, nulls
  • Table reuse for repeated constant patterns
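
To make the value + varuint format concrete, here is a minimal sketch of a run-length encoder and decoder. It assumes the varuint is a LEB128-style integer (7 payload bits per byte, high bit set when more bytes follow); the source does not specify the exact wire format, and the function names are hypothetical.

```python
def encode_varuint(n: int) -> bytes:
    """LEB128-style unsigned varint (assumed format): 7 bits per byte,
    high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("varuint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # final group
            return bytes(out)


def decode_varuint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position of the first byte after the varuint)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def rle_encode(data: bytes) -> bytes:
    """Emit [value: 1 byte][runLength: varuint] for each run."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append(data[i])
        out += encode_varuint(j - i)
        i = j
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        value = encoded[pos]
        run, pos = decode_varuint(encoded, pos + 1)
        out += bytes([value]) * run
    return bytes(out)
```

Under this assumed encoding, a long constant run collapses to one value byte plus a few length bytes, and the cost grows only logarithmically with the run length.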

STATIC Head

Custom FSE Coder
Based on FA-CVM Histograms:

1. Build histogram from segment
   (or use FA-CVM frequency tracking)

2. Normalize to 2^12 scale via
   Top-K + largest remainder

3. Encode with FSE (Finite State
   Entropy) - tANS variant

4. Table reuse: if the KL divergence
   from a cached table is ≤ 0.08 bits
   → reference the prior table ID
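
Step 2 can be sketched as follows. This is an illustrative largest-remainder normalization to a 2^12 total, not the author's implementation: the Top-K stage is not modeled, and the rule that every symbol present in the segment keeps at least one slot is an assumption (an FSE table cannot encode a symbol whose weight is zero). The function name `normalize_histogram` is hypothetical.

```python
def normalize_histogram(counts: list[int], table_log: int = 12) -> list[int]:
    """Scale raw counts so they sum to exactly 2**table_log.

    Deterministic: ties are broken by symbol index, so encoder and
    decoder derive the same table from the same counts.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    scale = 1 << table_log
    present = [s for s, c in enumerate(counts) if c > 0]
    if len(present) > scale:
        raise ValueError("more distinct symbols than table slots")

    norm = [0] * len(counts)
    remainder = {}
    for s in present:
        q, r = divmod(counts[s] * scale, total)
        norm[s] = max(1, q)  # assumption: present symbols keep >= 1 slot
        remainder[s] = r

    deficit = scale - sum(norm)
    if deficit > 0:
        # Largest remainder: hand leftover slots to the biggest fractions.
        order = sorted(present, key=lambda s: (-remainder[s], s))
        for s in order[:deficit]:
            norm[s] += 1
    while deficit < 0:
        # Rounding rare symbols up to 1 overshot; take from the largest.
        s = max(present, key=lambda s: (norm[s], -s))
        norm[s] -= 1
        deficit += 1
    return norm
```

Integer arithmetic (`divmod`) is used throughout so the result does not depend on floating-point rounding, which matters when the decoder must rebuild the identical table.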

Features:

  • FA-CVM sparse frequency → FSE weights
  • Deterministic normalization (256 symbols → 4096 scale)
  • Table caching across segments
  • Perfect for stationary distributions (text, structured data)
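
The table-caching decision can be illustrated with a small sketch. It assumes the KL divergence is measured from the segment's empirical distribution to a cached normalized table, in bits per symbol, which is the expected extra cost of coding the segment with the cached table instead of its own. The 0.08-bit threshold comes from the text; the function names and the cache layout (a dict from table ID to weights) are hypothetical.

```python
import math

KL_REUSE_THRESHOLD_BITS = 0.08  # threshold stated in the text


def kl_bits(counts: list[int], table: list[int], table_log: int = 12) -> float:
    """KL(segment || table) in bits per symbol."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    scale = 1 << table_log
    kl = 0.0
    for c, w in zip(counts, table):
        if c == 0:
            continue
        if w == 0:
            return math.inf  # the table cannot encode this symbol at all
        p = c / total
        q = w / scale
        kl += p * math.log2(p / q)
    return kl


def pick_cached_table(counts: list[int], cache: dict[int, list[int]]):
    """Return the ID of the closest cached table if it is close enough,
    otherwise None (meaning: emit a fresh table)."""
    best_id, best_kl = None, math.inf
    for table_id, table in cache.items():
        d = kl_bits(counts, table)
        if d < best_kl:
            best_id, best_kl = table_id, d
    return best_id if best_kl <= KL_REUSE_THRESHOLD_BITS else None
```

A real encoder would also weigh the size of a fresh table header against the per-symbol penalty times the segment length; this sketch applies only the fixed threshold.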

ADAPTIVE Head

Quantum Attention Modeler
Multi-Order Context Mixing:

ICM (Indirect Context Model)
  ↓
ISSE (Indirect Secondary Symbol
      Estimation)
  ↓  
SSE (Secondary Symbol Estimation)
  ↓
Arithmetic Coder

Context orders: 0-8
(dynamically adapts to local statistics)

Features:

  • Multi-order context mixing with weight learning
  • Adaptive probability models that evolve during encoding
  • ICM + ISSE chain for deep statistical dependencies
  • Perfect for non-stationary, high-entropy data
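
The idea of mixing several context orders with learned weights can be shown in a few lines. The sketch below is deliberately simplified and is not the head described above: it swaps the ICM/ISSE/SSE chain for plain per-context probability tables, uses orders 0-2 rather than 0-8, and reports the ideal arithmetic-coding cost in bits instead of emitting a bitstream. The class name, learning rates, and initial weights are all assumptions.

```python
import math


def stretch(p: float) -> float:
    """Logit: maps a probability to the mixing domain."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch, clamped to avoid overflow."""
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))


class ContextMixer:
    def __init__(self, max_order: int = 2, weight_lr: float = 0.02,
                 prob_rate: float = 0.05):
        self.max_order = max_order
        self.weight_lr = weight_lr    # hypothetical mixer learning rate
        self.prob_rate = prob_rate    # hypothetical table adaptation rate
        self.tables = [{} for _ in range(max_order + 1)]  # ctx -> P(bit=1)
        self.weights = [0.3] * (max_order + 1)

    def cost_bits(self, data: bytes) -> float:
        """Adaptively model data bit by bit; return the total ideal code
        length in bits (what an arithmetic coder would approach)."""
        total = 0.0
        for i, byte in enumerate(data):
            partial = 1  # leading 1 followed by the bits seen in this byte
            for k in range(7, -1, -1):
                bit = (byte >> k) & 1
                # Order-o context: previous o bytes plus the partial byte.
                keys = [(data[max(0, i - o):i], partial)
                        for o in range(self.max_order + 1)]
                probs = [t.get(key, 0.5) for t, key in zip(self.tables, keys)]
                inputs = [stretch(p) for p in probs]
                p1 = squash(sum(w * x for w, x in zip(self.weights, inputs)))
                p1 = min(max(p1, 1e-6), 1.0 - 1e-6)
                total -= math.log2(p1 if bit else 1.0 - p1)
                err = bit - p1
                for o in range(self.max_order + 1):
                    # Gradient step on coding cost, then adapt the table.
                    self.weights[o] += self.weight_lr * err * inputs[o]
                    p = probs[o] + self.prob_rate * (bit - probs[o])
                    self.tables[o][keys[o]] = min(max(p, 1 / 4096), 1 - 1 / 4096)
                partial = (partial << 1) | bit
        return total
```

Calling `ContextMixer().cost_bits(segment)` gives an estimate of the compressed size in bits. Because both the tables and the mixing weights update after every bit, the model follows local statistics as they change, which is the property the head relies on for non-stationary input.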

Complete Compression Pipeline

Input Data (enwik9: 1 GB Wikipedia XML)
    ↓
[FA-CVM Segmentation Detector]
    ↓
  Split into segments based on entropy and distinct-symbol-count variance
    ↓
┌──────────────┬──────────────────┬───────────────────┐
│   REPEAT     │      STATIC      │     ADAPTIVE      │
│  segments    │     segments     │     segments      │
│              │                  │                   │
│   purity     │  stationary +    │  non-stationary   │
│   ≥ 98.5%    │  structured      │  or high-entropy  │
└──────┬───────┴────────┬─────────┴─────────┬─────────┘
       ↓                ↓                   ↓
   [RLE Head]     [FSE Coder]    [Context-Mixing Head]
       │                │                   │
       │                │                   │
    [value+len]   [histogram→FSE]   [ICM+ISSE chain]
       │                │                   │
       └────────────────┴───────────────────┘
                        ↓
              [Binary Archive Stream]
                        ↓
              Compressed Output File
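
The routing step in the diagram can be sketched as a small classifier. Only the 98.5% purity threshold comes from the diagram; the entropy and drift thresholds, the half-versus-half stationarity test, and the function names are hypothetical stand-ins, and exact byte counts are used in place of the FA-CVM detector.

```python
import math
from collections import Counter

PURITY_THRESHOLD = 0.985   # from the pipeline diagram
HIGH_ENTROPY_BITS = 7.5    # hypothetical: above this, treat as high-entropy
DRIFT_BITS = 0.5           # hypothetical: entropy gap that signals drift


def entropy_bits(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def route(segment: bytes) -> str:
    """Pick a head for one segment: 'REPEAT', 'STATIC' or 'ADAPTIVE'."""
    if not segment:
        raise ValueError("empty segment")
    # Purity: share of the segment taken by its most common byte.
    purity = Counter(segment).most_common(1)[0][1] / len(segment)
    if purity >= PURITY_THRESHOLD:
        return "REPEAT"
    # Crude stationarity check: compare the entropy of the two halves.
    half = len(segment) // 2
    drift = abs(entropy_bits(segment[:half]) - entropy_bits(segment[half:]))
    if entropy_bits(segment) > HIGH_ENTROPY_BITS or drift > DRIFT_BITS:
        return "ADAPTIVE"
    return "STATIC"
```

For example, `route(b" " * 1000)` selects the RLE head, while repeated English text with stable statistics falls through to STATIC.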
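
Typical ratios by head:

  • REPEAT Compression: ~13000:1 for pure constant runs (spaces, padding)
  • STATIC Compression: ~4-6:1 for English text and structured data
  • ADAPTIVE Compression: ~2-4:1 for mixed, non-stationary data (encrypted or already-compressed input is essentially incompressible and stays near 1:1)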