After segmentation routing, each segment type gets compressed by a specialized head optimized for its statistical properties:

RLE Head (REPEAT segments)

Format:
  [value: 1 byte]
  [runLength: varuint]

Example:
  Input:   0x20 repeated 65536 times
  Encoded: 0x20 0x80 0x80 0x04
           (the space byte, then the run length 65536 as a 3-byte LEB128 varuint)
  Output:  4 bytes vs 64 KB
  Ratio:   16384:1
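
As a minimal sketch of this record format, the Python below encodes one run, assuming the runLength varuint is standard LEB128 (7 payload bits per byte, high bit as a continuation flag); the function names are illustrative, not the project's API.

def encode_varuint(n: int) -> bytes:
    """LEB128-style varuint: 7 payload bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # last byte
            return bytes(out)

def rle_encode_run(value: int, run_length: int) -> bytes:
    """Emit one [value: 1 byte][runLength: varuint] record."""
    return bytes([value]) + encode_varuint(run_length)

# 0x20 repeated 65536 times -> 4 bytes: 20 80 80 04
print(rle_encode_run(0x20, 65536).hex(" "))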

FSE Coder (STATIC segments)

Features:
  Based on FA-CVM histograms:
  1. Build a histogram from the segment (or reuse FA-CVM's frequency tracking).
  2. Normalize counts to a 2^12 (4096) scale via Top-K selection plus
     largest-remainder rounding (sketched below).
  3. Encode with FSE (Finite State Entropy), a tANS variant.
  4. Table reuse: if the KL divergence to a previously emitted table is
     ≤ 0.08 bits, reference that table's ID instead of writing a new table.
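
The normalization and table-reuse steps can be sketched directly. The Python below is a simplified illustration, assuming a 2^12 = 4096 target scale and KL divergence measured in bits; it skips the Top-K symbol selection and uses plain largest-remainder rounding, and the helper names and prior-table lookup are assumptions, not the project's API.

import math
from collections import Counter

TABLE_LOG = 12
SCALE = 1 << TABLE_LOG   # 4096

def normalize_histogram(segment: bytes) -> dict:
    """Scale raw byte counts so they sum to exactly SCALE (largest-remainder rounding).
    Every symbol that occurs keeps at least one slot so it remains encodable."""
    counts = Counter(segment)
    total = len(segment)
    exact = {s: c * SCALE / total for s, c in counts.items()}
    scaled = {s: max(1, int(x)) for s, x in exact.items()}
    leftover = SCALE - sum(scaled.values())
    if leftover > 0:
        # hand the remaining slots to the largest fractional remainders
        order = sorted(exact, key=lambda s: exact[s] - int(exact[s]), reverse=True)
        for s in order[:leftover]:
            scaled[s] += 1
    elif leftover < 0:
        # the 1-slot minimums overshot: shave the excess off the biggest entries
        for s in sorted(scaled, key=scaled.get, reverse=True):
            take = min(-leftover, scaled[s] - 1)
            scaled[s] -= take
            leftover += take
            if leftover == 0:
                break
    return scaled

def kl_bits(new: dict, old: dict) -> float:
    """KL(new || old) in bits between two tables normalized to SCALE."""
    kl = 0.0
    for s, n in new.items():
        o = old.get(s, 0)
        if o == 0:
            return math.inf            # old table cannot code this symbol at all
        kl += (n / SCALE) * math.log2(n / o)
    return kl

def choose_table(new_hist: dict, prior_tables: dict, threshold: float = 0.08):
    """Return the ID of a reusable prior table, or None if a fresh table is needed."""
    for table_id, old_hist in prior_tables.items():
        if kl_bits(new_hist, old_hist) <= threshold:
            return table_id
    return None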

Context-Mixing Head (ADAPTIVE segments)

Features:
  Multi-order context mixing:

    ICM (Indirect Context Model)
      ↓
    ISSE (Indirect Secondary Symbol Estimation)
      ↓
    SSE (Secondary Symbol Estimation)
      ↓
    Arithmetic Coder

  Context orders: 0-8 (dynamically adapts to local statistics);
  a simplified sketch of the prediction chain follows.
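
The full ICM → ISSE → SSE chain hashes contexts to bit-history states, which is more than a short example can carry, so the sketch below substitutes a direct order-1 bit model refined by a simple SSE stage (a table indexed by the byte context and the quantized first-stage prediction). It shows the shape of the predict/update loop that an arithmetic coder would sit on top of; all class and parameter names here are illustrative.

class BitModel:
    """Adaptive 12-bit probability that the next bit is 1, one slot per context."""
    def __init__(self, n_contexts: int, rate: int = 5):
        self.p = [2048] * n_contexts        # start at p(1) = 0.5
        self.rate = rate

    def predict(self, ctx: int) -> int:
        return self.p[ctx]

    def update(self, ctx: int, bit: int) -> None:
        target = 4095 if bit else 0
        self.p[ctx] += (target - self.p[ctx]) >> self.rate


class SSE:
    """Secondary estimation: refine p1 using (context, quantized p1) as the index."""
    def __init__(self, n_contexts: int, buckets: int = 33, rate: int = 6):
        self.buckets = buckets
        self.rate = rate
        self.table = [[(i * 4095) // (buckets - 1) for i in range(buckets)]
                      for _ in range(n_contexts)]
        self.last = None

    def refine(self, ctx: int, p1: int) -> int:
        b = min(self.buckets - 1, (p1 * (self.buckets - 1)) // 4096)
        self.last = (ctx, b)
        return self.table[ctx][b]

    def update(self, bit: int) -> None:
        ctx, b = self.last
        target = 4095 if bit else 0
        self.table[ctx][b] += (target - self.table[ctx][b]) >> self.rate


def model_stream(data: bytes):
    """Yield (p_of_1, bit) pairs; an arithmetic coder would consume these
    (a real coder would also clamp p away from 0 and 4095)."""
    o1 = BitModel(256 * 256)                # order-1 byte context x bit-tree node
    sse = SSE(256)
    prev_byte = 0
    for byte in data:
        node = 1                            # partial-byte context as a bit-tree index
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            ctx = prev_byte * 256 + node
            p = sse.refine(prev_byte, o1.predict(ctx))
            yield p, bit
            o1.update(ctx, bit)
            sse.update(bit)
            node = (node << 1) | bit
        prev_byte = byte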

Overall Pipeline

           Input Data (enwik9: 1 GB Wikipedia XML)
                              ↓
               [FA-CVM Segmentation Detector]
                              ↓
   Split into segments based on entropy/distinct variance
                              ↓
┌───────────────────┬───────────────────┬───────────────────────┐
│      REPEAT       │      STATIC       │       ADAPTIVE        │
│     segments      │     segments      │       segments        │
│                   │                   │                       │
│      purity       │   stationary +    │    non-stationary     │
│      ≥ 98.5%      │    structured     │    or high-entropy    │
└─────────┬─────────┴─────────┬─────────┴───────────┬───────────┘
          ↓                   ↓                     ↓
     [RLE Head]          [FSE Coder]      [Context-Mixing Head]
          │                   │                     │
     [value+len]       [histogram→FSE]      [ICM+ISSE chain]
          │                   │                     │
          └───────────────────┴─────────────────────┘
                              ↓
                  [Binary Archive Stream]
                              ↓
                   Compressed Output File
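
A compact way to read the routing step in the diagram is as a per-segment classifier. The Python below is a sketch under stated assumptions: the purity threshold (≥ 98.5%) comes from the diagram, but the entropy cutoff and the half-vs-half drift test used to separate STATIC from ADAPTIVE segments are stand-ins for whatever stationarity/structure measure FA-CVM actually provides.

import math
from collections import Counter
from enum import Enum

class Head(Enum):
    REPEAT = "RLE Head"
    STATIC = "FSE Coder"
    ADAPTIVE = "Context-Mixing Head"

def entropy_bits(data: bytes) -> float:
    """Empirical entropy of the segment, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def route_segment(segment: bytes,
                  purity_threshold: float = 0.985,    # from the diagram
                  entropy_cutoff: float = 6.0,        # assumed, bits/byte
                  drift_cutoff: float = 0.5) -> Head: # assumed stationarity proxy
    """Choose a compression head for one segment."""
    counts = Counter(segment)
    purity = counts.most_common(1)[0][1] / len(segment)
    if purity >= purity_threshold:
        return Head.REPEAT
    # Crude stationarity proxy: entropy drift between the two halves.
    half = len(segment) // 2
    drift = abs(entropy_bits(segment[:half]) - entropy_bits(segment[half:]))
    if drift < drift_cutoff and entropy_bits(segment) < entropy_cutoff:
        return Head.STATIC
    return Head.ADAPTIVE

print(route_segment(b" " * 4096))              # Head.REPEAT
print(route_segment(bytes(range(256)) * 16))   # high entropy -> Head.ADAPTIVE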