2026-04-13 · 6 min read
PULSE Check: Is Your ClickHouse Compression Actually Working?
Your ClickHouse cluster might be using 3x more disk than it needs to. Bad compression is invisible — the data loads, queries work, nothing errors. You just pay more for storage, backups take longer, and merges move more bytes than necessary. These three queries take 10 seconds and tell you exactly where the waste is.
Each query maps to the E (Efficiency) dimension of the PULSE Framework.
Query 1: Table-Level Compression Overview
Start here. This shows every table's compression ratio — how much disk you're using vs. how much raw data you have.
SELECT
database,
table,
formatReadableSize(sum(data_compressed_bytes)) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
round(sum(data_compressed_bytes) / greatest(sum(data_uncompressed_bytes), 1) * 100, 1) AS ratio_pct,
sum(rows) AS total_rows,
formatReadableSize(sum(bytes_on_disk)) AS disk_total
FROM system.parts
WHERE active = 1
GROUP BY database, table
ORDER BY sum(data_uncompressed_bytes) DESC
LIMIT 20;
What to look for:
- ratio_pct > 70% → Very poor. You're likely storing high-entropy data (UUIDs, hashes, random strings) without appropriate codecs.
- ratio_pct 50-70% → Poor compression. Investigate column-level codecs.
- ratio_pct < 15% → Excellent. Typical for well-typed timestamp/integer-heavy tables with appropriate codecs.
Benchmark: A typical ClickHouse analytics table with timestamps, integers, and low-cardinality strings should compress to 15-30% with default LZ4.
Query 2: Column-Level Compression Breakdown
The table-level view hides the real story. Usually one or two columns are responsible for most of the waste.
SELECT
database,
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
round(sum(column_data_compressed_bytes) / greatest(sum(column_data_uncompressed_bytes), 1) * 100, 1) AS ratio_pct,
any(compression_codec) AS codec
FROM system.parts_columns
WHERE active = 1
GROUP BY database, table, column, type
HAVING sum(column_data_uncompressed_bytes) > 104857600 -- > 100MB uncompressed
ORDER BY sum(column_data_uncompressed_bytes) DESC
LIMIT 30;
What to look for:
- String columns with ratio > 60% → Try ZSTD(3) instead of LZ4
- UUID columns with ratio > 80% → UUIDs don't compress well; consider if you actually need them or if a hash would work
- Timestamp columns with ratio > 20% → Use Delta or DoubleDelta codec
- Integer columns with ratio > 30% → Use Delta codec for monotonically increasing values
- Nullable columns → Nullable adds overhead. Drop Nullable if the column never contains nulls
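Before dropping Nullable, confirm the column really never holds NULLs — converting a column that contains NULLs will fail (or silently coerce, depending on settings). A quick check, using the events/user_agent names from the examples below as placeholders:

```sql
-- Count NULLs before converting a Nullable(String) column to plain String
SELECT countIf(isNull(user_agent)) AS null_rows, count() AS total_rows
FROM events;

-- If null_rows = 0, the Nullable wrapper (and its sidecar null map) can go
ALTER TABLE events
MODIFY COLUMN user_agent String CODEC(ZSTD(3));
```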
Query 3: Codec Recommendations
This query identifies columns that would benefit most from a codec change, sorted by potential disk savings.
SELECT
database,
table,
column,
type,
any(compression_codec) AS current_codec,
formatReadableSize(sum(column_data_compressed_bytes)) AS current_compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
round(sum(column_data_compressed_bytes) / greatest(sum(column_data_uncompressed_bytes), 1) * 100, 1) AS ratio_pct,
multiIf(
type LIKE '%Int%' AND ratio_pct > 30, 'Try CODEC(Delta, LZ4)',
type LIKE 'DateTime%' AND ratio_pct > 20, 'Try CODEC(DoubleDelta, LZ4)',
type = 'String' AND ratio_pct > 50, 'Try CODEC(ZSTD(3))',
type LIKE 'UUID%' AND ratio_pct > 70, 'Consider replacing with hash',
type LIKE 'Float%' AND ratio_pct > 40, 'Try CODEC(Gorilla, LZ4)',
ratio_pct > 60, 'Try CODEC(ZSTD(3))',
'OK'
) AS recommendation
FROM system.parts_columns
WHERE active = 1
GROUP BY database, table, column, type
HAVING sum(column_data_uncompressed_bytes) > 104857600
AND recommendation != 'OK'
ORDER BY sum(column_data_compressed_bytes) - sum(column_data_uncompressed_bytes) * 0.2 DESC
LIMIT 20;
Applying Codec Changes
Once you've identified wasteful columns:
-- Change codec for a specific column
ALTER TABLE events
MODIFY COLUMN user_agent CODEC(ZSTD(3));
-- Change codec for timestamp column
ALTER TABLE events
MODIFY COLUMN event_time CODEC(DoubleDelta, LZ4);
-- Change codec for integer counter
ALTER TABLE metrics
MODIFY COLUMN request_count CODEC(Delta, LZ4);
Important: Codec changes apply to future inserts only. Existing parts keep the old codec until they're merged. To recompress existing data:
-- Force recompression (rewrites all data — expensive)
OPTIMIZE TABLE events FINAL;
For large tables, recompress partition by partition to spread the I/O impact. See 312 Parts Per Partition for why you should check merge queue health before running OPTIMIZE.
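One way to do the partition-by-partition recompression: list the table's partitions by size, then OPTIMIZE each one individually. The partition literal format depends on your PARTITION BY expression — the example below assumes a toYYYYMM key on the events table:

```sql
-- List active partitions for the table, largest first
SELECT partition, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND table = 'events'
GROUP BY partition
ORDER BY sum(bytes_on_disk) DESC;

-- Recompress one partition at a time, ideally during low-traffic windows
-- ('202604' assumes PARTITION BY toYYYYMM(event_time))
OPTIMIZE TABLE events PARTITION '202604' FINAL;
```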
The Codec Cheat Sheet
| Data Type | Best Codec | Expected Ratio | When to Use |
|---|---|---|---|
| DateTime/DateTime64 | DoubleDelta + LZ4 | 5-15% | Timestamps, event times |
| UInt32/UInt64 (monotonic) | Delta + LZ4 | 10-20% | Auto-increment IDs, counters |
| UInt32/UInt64 (random) | LZ4 | 30-50% | Hash values, random IDs |
| Float32/Float64 | Gorilla + LZ4 | 15-30% | Metrics, sensor data |
| String (low-cardinality) | LZ4 | 5-15% | Status codes, country codes (also use LowCardinality type) |
| String (high-cardinality) | ZSTD(3) | 30-50% | URLs, user agents, log messages |
| UUID | LZ4 | 70-90% | UUIDs compress poorly by nature |
| Enum8/Enum16 | LZ4 | 2-5% | Already very efficient |
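Putting the cheat sheet together, a new table's DDL might look like this — a sketch with hypothetical column names, not a schema from the post:

```sql
-- Hypothetical events table applying the cheat-sheet codecs above
CREATE TABLE events_v2
(
    event_time  DateTime  CODEC(DoubleDelta, LZ4),   -- timestamps
    event_id    UInt64    CODEC(Delta, LZ4),         -- monotonic counter
    latency_ms  Float64   CODEC(Gorilla, LZ4),       -- metric
    country     LowCardinality(String),              -- low-cardinality string, default LZ4
    user_agent  String    CODEC(ZSTD(3)),            -- high-cardinality string
    status      Enum8('ok' = 1, 'error' = 2)         -- already very compact
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, event_id);
```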
What ClusterSight Monitors
ClusterSight tracks compression ratios as part of the E (Efficiency) PULSE dimension:
- Per-table compression trending — alerts when a ratio degrades after schema changes or new data patterns
- Column-level waste detection — flags columns with the highest savings potential
- Codec recommendations — suggests specific codec changes based on column type and data distribution
- Disk budget tracking — shows which tables are growing fastest and why
A PULSE Efficiency score below 70 means you're overspending on storage. The fix is usually a few ALTER TABLE statements.
Check your cluster's PULSE.
This post is part of the PULSE Check series — tactical health checks for ClickHouse operators.
Frequently Asked Questions
How do I check compression ratio in ClickHouse?
Query SELECT database, table, formatReadableSize(sum(data_compressed_bytes)) AS compressed, formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed, round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100, 1) AS ratio_pct FROM system.parts WHERE active GROUP BY database, table ORDER BY sum(data_uncompressed_bytes) DESC. A healthy ratio is under 30%.
What is a good compression ratio for ClickHouse?
A good compression ratio is 10-30% (compressed size vs uncompressed). ClickHouse with LZ4 typically achieves 20-40%. With ZSTD, 10-25%. If your ratio exceeds 50%, your codec choice or column types likely need attention.
How do I change compression codec in ClickHouse?
Use ALTER TABLE database.table MODIFY COLUMN column_name codec_expression. For example: ALTER TABLE events MODIFY COLUMN user_agent CODEC(ZSTD(3)). The new codec applies to future inserts; existing parts keep the old codec until merged.
Which ClickHouse compression codec should I use?
Use LZ4 (default) for hot data with fast reads. Use ZSTD for cold data or high-cardinality string columns. Use Delta + LZ4 for counters and monotonically increasing integers. Use DoubleDelta + LZ4 for timestamps, and Gorilla + LZ4 for float metrics.