fixed-size round-robin time-series storage with circular overwrite
Whisper implements a fixed-size round-robin database that pre-allocates disk space for time-series metrics and overwrites oldest data when capacity is reached, eliminating dynamic allocation overhead. Data is written sequentially by the Carbon ingestion service into pre-defined storage slots indexed by timestamp, enabling predictable disk usage and O(1) write performance regardless of dataset age. The circular buffer design trades retention flexibility for guaranteed storage footprint and consistent write latency.
Unique: Uses fixed-size circular buffer design with pre-allocated disk space and sequential writes, eliminating dynamic allocation and enabling deterministic write performance — unlike traditional databases that grow dynamically and require compaction/vacuuming
vs alternatives: Provides guaranteed O(1) write latency and predictable disk footprint for long-term metric storage, whereas InfluxDB and Prometheus require active compaction and dynamic scaling to maintain performance
multi-resolution time-series aggregation with retention tiers
Whisper supports creating multiple aggregation levels (e.g., 1-minute raw data, 10-minute averages, 1-hour summaries) within a single database file, enabling efficient long-term storage by downsampling older data while preserving high-resolution recent data. Each retention tier is a separate round-robin buffer with different time granularity and aggregation function (average, sum, last, etc.), allowing queries to select appropriate resolution based on time range. This hierarchical approach reduces storage for historical data while maintaining query flexibility without requiring separate database files.
Unique: Implements multi-resolution storage within single database files using stacked round-robin buffers with configurable aggregation functions per tier, enabling space-efficient long-term retention without separate database files or external aggregation pipelines
vs alternatives: More storage-efficient than Prometheus for long-term retention because aggregation happens at write-time into fixed-size buffers, whereas Prometheus requires external tools like Thanos for downsampling and long-term storage
carbon-integrated metric ingestion with timestamp-based write routing
Whisper integrates with the Carbon service through a write API that accepts timestamped metric data and routes writes to appropriate storage slots based on metric name and timestamp. Carbon handles network protocol parsing (plaintext, pickle, AMQP) and metric name validation, then delegates to Whisper for storage, which maps timestamps to fixed storage offsets using modulo arithmetic on the circular buffer. This separation enables Carbon to handle protocol complexity while Whisper focuses on efficient storage, with no direct user-facing API — all writes flow through Carbon.
Unique: Delegates protocol handling to Carbon while Whisper focuses purely on storage, using timestamp-based modulo arithmetic for O(1) write routing without index lookups — enabling high-throughput ingestion without database overhead
vs alternatives: Simpler write path than InfluxDB (no query parsing, no schema validation at storage layer) but less flexible than Prometheus (no built-in scraping, requires external metric collection)
graphite-web query integration with time-range-based data retrieval
Whisper provides a read interface consumed by Graphite-web to retrieve time-series data for graph rendering and API responses. Queries specify metric name and time range, and Whisper maps the time range to storage offsets in the circular buffer, reading only the relevant data without scanning the entire file. Graphite-web handles query parsing, metric pattern matching, and result formatting (PNG, CSV, JSON, XML), while Whisper provides the low-level storage retrieval. This separation enables Graphite-web to support complex queries (wildcards, functions, aggregations) without Whisper needing to understand query semantics.
Unique: Provides time-range-based retrieval using circular buffer offset calculation, enabling efficient reads without index structures — Graphite-web handles all query complexity, keeping Whisper focused on fast storage access
vs alternatives: Simpler query interface than InfluxDB (no query language, no server-side aggregation) but requires external tool (Graphite-web) for complex queries, whereas Prometheus includes query language in storage layer
metric schema definition with retention and aggregation configuration
Whisper requires upfront schema definition specifying retention policies (how long to keep data at each resolution) and aggregation functions (how to downsample data between tiers). Schema is typically defined in a configuration file (storage-schemas.conf) that maps metric name patterns to retention/aggregation rules, which Carbon uses to create Whisper database files with matching structure. Once created, schema is immutable — changing retention or aggregation requires recreating the database file and losing historical data. This design trades flexibility for predictability: storage footprint and query performance are determined at schema definition time.
Unique: Requires upfront schema definition with immutable retention/aggregation configuration, enabling deterministic storage allocation and performance but requiring careful planning — unlike dynamic schema systems (InfluxDB, Prometheus) that adapt to incoming data
vs alternatives: Provides predictable storage costs and performance through schema-driven design, whereas Prometheus and InfluxDB require active management of retention policies and cardinality limits to prevent unbounded growth
efficient metric file storage with minimal filesystem overhead
Whisper stores each metric as a single binary file on disk containing a header (metadata about retention tiers and aggregation) followed by pre-allocated round-robin buffers. Files are created at metric creation time with full size allocated upfront, eliminating fragmentation and enabling direct offset calculation for reads/writes. The binary format is compact and uncompressed, optimized for sequential access patterns typical of time-series workloads. No indexing structures are maintained — metric discovery relies on filesystem directory scanning, making Whisper suitable for systems with moderate metric cardinality (thousands to tens of thousands) but not billions of unique metrics.
Unique: Uses single-file-per-metric design with pre-allocated binary storage and no indexing, enabling O(1) offset calculation for reads/writes but requiring filesystem-level metric discovery — simpler than database systems but less scalable than indexed storage
vs alternatives: Simpler operational model than InfluxDB or Prometheus (no compaction, no WAL management) but less efficient for high-cardinality metrics (billions of unique series) where filesystem overhead becomes prohibitive