CVM (Cardinality Variance Model) addresses the count-distinct problem: efficiently estimating the number of unique elements in a data stream without storing every element.
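Challenge: given a stream of n elements with d distinct values, estimate d using far less memory than the O(d) (worst case O(n)) needed to store each distinct value.
CVM analyzes data in a sliding window, continuously estimating its cardinality and classifying the window into one of two regimes: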
High repetition → Small alphabet ⟹ Use Static compression. Dictionary methods excel with a limited vocabulary.
High diversity → Large alphabet ⟹ Use Adaptive compression. Context modeling captures complex patterns.
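The two regimes above can be sketched as a routing function over a sliding window. This is only an illustration: the names `choose_head` and `route_stream`, the window size of 64, and the cutoff of 16 distinct symbols are assumptions, not values from the source. An exact `set` count stands in for the approximate estimator to keep the example short.

```python
from collections import deque


def choose_head(window, small_alphabet_max=16):
    """Map a window to a compression head (cutoff is a hypothetical value)."""
    distinct = len(set(window))  # exact count, used here only for clarity
    return "static" if distinct <= small_alphabet_max else "adaptive"


def route_stream(data, window_size=64, small_alphabet_max=16):
    """Yield (position, head) for every full sliding window over data."""
    window = deque(maxlen=window_size)  # oldest symbol drops out automatically
    for pos, symbol in enumerate(data):
        window.append(symbol)
        if len(window) == window_size:
            yield pos, choose_head(window, small_alphabet_max)
```

For example, `list(route_stream(b"ab" * 64))` classifies every window as `"static"`, because only two distinct byte values ever appear.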
CVM uses space-efficient algorithms for cardinality estimation. Their underlying data structures provide approximate counts with bounded error using logarithmic space, which is essential for real-time segmentation decisions.
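The source does not name the specific algorithms, so the following is a sketch of one published estimator of this kind, the sampling algorithm of Chakraborty, Vinodchandran, and Meel, and not necessarily what this system uses. The function name `estimate_distinct`, the `capacity` parameter, and its default are assumptions. The sketch also repeats the halving step until the buffer has room, instead of reporting failure as the original algorithm does.

```python
import random


def estimate_distinct(stream, capacity=1000, seed=None):
    """Estimate the number of distinct items while holding < capacity items."""
    rng = random.Random(seed)
    p = 1.0          # current sampling probability
    buffer = set()   # sampled items
    for item in stream:
        # Forget any earlier decision about this item, then re-sample it,
        # so only its most recent occurrence determines membership.
        buffer.discard(item)
        if rng.random() < p:
            buffer.add(item)
        # Buffer full: keep each item with probability 1/2 and halve p.
        while len(buffer) >= capacity:
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2.0
    # Each distinct item is in the buffer with probability p.
    return len(buffer) / p
```

When the stream holds fewer distinct items than `capacity`, `p` never drops below 1 and the result is exact. Otherwise a larger `capacity` trades memory for a tighter estimate.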
CVM serves as the segmentation oracle: it continuously monitors data characteristics and triggers compression head switches when cardinality crosses predefined thresholds. This enables adaptive routing without full-pass analysis.
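A minimal sketch of such an oracle follows, under several assumptions the source does not state. It uses two thresholds with hysteresis (`low` and `high`) so the head does not flip back and forth near a single cutoff. It tracks window cardinality exactly with a `Counter` rather than with an approximate estimator. The names `switch_points`, `low`, `high`, and the head labels are hypothetical.

```python
from collections import Counter, deque


def switch_points(data, window_size=64, low=8, high=24, initial="static"):
    """Return (position, new_head) events where window cardinality crosses a threshold."""
    window = deque()
    counts = Counter()   # symbol -> occurrences inside the current window
    head = initial
    events = []
    for pos, symbol in enumerate(data):
        window.append(symbol)
        counts[symbol] += 1
        if len(window) > window_size:
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]  # symbol has left the window entirely
        distinct = len(counts)
        # Hysteresis: switch up only above `high`, down only below `low`.
        if head == "static" and distinct > high:
            head = "adaptive"
            events.append((pos, head))
        elif head == "adaptive" and distinct < low:
            head = "static"
            events.append((pos, head))
    return events
```

Each event marks a segment boundary in a single pass over the data. A run of repetitive bytes followed by a run of varied bytes produces one switch to `"adaptive"`, and a return to repetitive data produces one switch back.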