2026-06-15 · 6 min read
system.parts_columns: How to Find the Tables Destroying Your Disk Budget
Your largest ClickHouse table isn't necessarily the one with the most rows. It's the one with a user_agent column storing 500 million unique strings with default LZ4 compression. system.parts_columns tells you exactly which columns in which tables are eating your disk — down to the byte.
This is the deep-dive reference for the E (Efficiency) dimension of the PULSE Framework. If you ran the compression PULSE check and found waste, this guide shows you where it lives and how to fix it.
The Disk Budget Query
Start here. This shows your top 20 most expensive columns across all tables:
SELECT
database,
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
round(sum(column_data_compressed_bytes) / greatest(sum(column_data_uncompressed_bytes), 1) * 100, 1) AS ratio_pct,
any(compression_codec) AS codec
FROM system.parts_columns
WHERE active = 1
GROUP BY database, table, column, type
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 20;
The results will likely surprise you. One or two columns typically account for 40-60% of a table's disk usage. These are your optimization targets.
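To see that concentration directly, a variant of the same query can compute each column's share of its table's total. This sketch uses a window function over the aggregate, which is supported in modern ClickHouse releases:

```sql
-- Each column's share of its table's total compressed size
SELECT
    database,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
    round(100 * sum(column_data_compressed_bytes)
        / sum(sum(column_data_compressed_bytes)) OVER (PARTITION BY database, table), 1) AS pct_of_table
FROM system.parts_columns
WHERE active = 1
GROUP BY database, table, column
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 20;
```

Any column whose pct_of_table exceeds roughly 40% is worth a codec review on its own.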
Finding the Five Types of Disk Waste
1. String Columns With Bad Codecs
-- Strings consuming the most disk
SELECT
database,
table,
column,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
round(sum(column_data_compressed_bytes) / greatest(sum(column_data_uncompressed_bytes), 1) * 100, 1) AS ratio_pct,
any(compression_codec) AS codec
FROM system.parts_columns
WHERE active = 1
AND type = 'String'
GROUP BY database, table, column
HAVING sum(column_data_compressed_bytes) > 104857600 -- > 100MB
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 15;
Fix: High-cardinality strings (URLs, user agents, log messages) with a compression ratio above 50% should use CODEC(ZSTD(3)). Low-cardinality strings (country codes, status values) should use LowCardinality(String).
-- Change to ZSTD for better compression
ALTER TABLE events MODIFY COLUMN user_agent CODEC(ZSTD(3));
-- Or switch to LowCardinality for low-cardinality columns
ALTER TABLE events MODIFY COLUMN country_code LowCardinality(String);
Note that a codec-only change applies to newly written parts; existing parts keep the old codec until they are merged (or you force a rewrite with OPTIMIZE TABLE ... FINAL).
2. Unused Columns Still Consuming Disk
-- Columns that exist in the table but may not be queried
-- Cross-reference with system.query_log to find unused columns
SELECT
pc.database,
pc.table,
pc.column,
formatReadableSize(sum(pc.column_data_compressed_bytes)) AS disk_usage,
pc.type
FROM system.parts_columns pc
WHERE pc.active = 1
GROUP BY pc.database, pc.table, pc.column, pc.type
HAVING sum(pc.column_data_compressed_bytes) > 52428800 -- > 50MB
ORDER BY sum(pc.column_data_compressed_bytes) DESC
LIMIT 30;
Cross-reference this with system.query_log to identify columns that are stored but never queried. Dropping unused columns from wide tables can save significant disk.
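As a sketch of that cross-reference: recent ClickHouse versions record which columns each query touched in the query log's columns field (this requires query logging to be enabled). Columns that appear in storage but never in this list are drop candidates:

```sql
-- Distinct "database.table.column" names read by queries in the last 30 days
SELECT DISTINCT arrayJoin(columns) AS used_column
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date >= today() - 30;
```

Compare this list against the column names from the storage query above; anything stored but absent here has not been read in the window you checked.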
3. Timestamps Without Delta Encoding
-- DateTime/DateTime64 columns with poor compression
SELECT
database,
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
round(sum(column_data_compressed_bytes) / greatest(sum(column_data_uncompressed_bytes), 1) * 100, 1) AS ratio_pct,
any(compression_codec) AS codec
FROM system.parts_columns
WHERE active = 1
AND type LIKE 'DateTime%'
AND compression_codec NOT LIKE '%Delta%' -- also excludes DoubleDelta
GROUP BY database, table, column, type
HAVING sum(column_data_compressed_bytes) > 52428800
ORDER BY sum(column_data_compressed_bytes) DESC;
Fix: DateTime columns almost always compress better with Delta or DoubleDelta encoding, since consecutive timestamps differ by small, regular amounts:
ALTER TABLE events MODIFY COLUMN event_time CODEC(DoubleDelta, LZ4);
ALTER TABLE events MODIFY COLUMN created_at CODEC(DoubleDelta, LZ4);
4. Nullable Overhead
-- Nullable columns and their overhead
SELECT
database,
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
any(compression_codec) AS codec
FROM system.parts_columns
WHERE active = 1
AND type LIKE 'Nullable(%'
GROUP BY database, table, column, type
HAVING sum(column_data_compressed_bytes) > 52428800
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 15;
Every Nullable column stores a separate null mask — one extra byte per row — alongside the data itself. If a column never actually contains NULLs, drop the Nullable wrapper.
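Before dropping the wrapper, it's worth confirming the column really never holds NULLs. A quick check, against a hypothetical optional_field column on the events table used in the examples above:

```sql
-- If null_rows is 0, the Nullable wrapper is pure overhead
SELECT
    countIf(isNull(optional_field)) AS null_rows,
    count() AS total_rows
FROM events;
```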
ALTER TABLE events MODIFY COLUMN optional_field UInt32 DEFAULT 0;
5. TTL Not Cleaning Up
-- Data that should have been deleted by TTL
SELECT
database,
table,
partition_id,
formatReadableSize(sum(bytes_on_disk)) AS disk_usage,
min(min_time) AS oldest_data,
max(max_time) AS newest_data,
dateDiff('day', min(min_time), now()) AS age_days
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
HAVING age_days > 90 -- Adjust to your retention policy
ORDER BY sum(bytes_on_disk) DESC
LIMIT 20;
If partitions older than your TTL retention still exist, TTL merges haven't processed them.
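Before forcing merges, you can check what TTL (if any) the table actually declares, and ask ClickHouse to re-evaluate TTL for existing parts. MATERIALIZE TTL is available in modern releases and is lighter than a full OPTIMIZE ... FINAL; the database and table names below are placeholders:

```sql
-- Inspect the table definition, including any TTL clause
SELECT engine_full
FROM system.tables
WHERE database = 'database' AND name = 'table';

-- Recompute TTL for already-written parts
ALTER TABLE database.table MATERIALIZE TTL;
```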
-- Force TTL cleanup
OPTIMIZE TABLE database.table FINAL;
-- Or drop expired partitions directly (instant metadata operation, no merge I/O)
ALTER TABLE database.table DROP PARTITION 'partition_id';
The Disk Budget Dashboard
Combine everything into a single table-level overview:
SELECT
database,
table,
count(DISTINCT column) AS columns,
formatReadableSize(sum(column_data_compressed_bytes)) AS total_compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS total_uncompressed,
round(sum(column_data_compressed_bytes) / greatest(sum(column_data_uncompressed_bytes), 1) * 100, 1) AS overall_ratio_pct,
argMax(column, column_data_compressed_bytes) AS largest_column,
formatReadableSize(max(column_data_compressed_bytes)) AS largest_column_size
FROM (
SELECT database, table, column,
sum(column_data_compressed_bytes) AS column_data_compressed_bytes,
sum(column_data_uncompressed_bytes) AS column_data_uncompressed_bytes
FROM system.parts_columns
WHERE active = 1
GROUP BY database, table, column
)
GROUP BY database, table
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 20;
This shows each table's total disk cost, compression ratio, and which single column is the biggest contributor. Often the largest column alone accounts for 30-50% of the table's disk usage.
How ClusterSight Uses system.parts_columns
ClusterSight analyzes column-level storage as part of the E (Efficiency) PULSE dimension:
- Top disk consumers — ranked by column across all tables
- Codec recommendations — per-column suggestions based on data type and current ratio
- TTL compliance checking — detects expired data still on disk
- Nullable audit — flags Nullable columns that never contain nulls
- Disk budget trending — tracks storage growth per table over time
When the E score drops, ClusterSight pinpoints exactly which columns changed and recommends specific ALTER TABLE commands.
Check your cluster's PULSE.
This post is part of the ClickHouse at Scale series. Previously: system.replicas, system.merges, and system.query_log.
Frequently Asked Questions
What is system.parts_columns in ClickHouse?
system.parts_columns is a system table that shows storage details for every column in every active data part. It reveals compressed and uncompressed sizes, compression codecs, and row counts at the column level — far more granular than system.parts, which only reports sizes per part, not per column.
How do I find which columns use the most disk in ClickHouse?
Query SELECT database, table, column, formatReadableSize(sum(column_data_compressed_bytes)) AS compressed FROM system.parts_columns WHERE active = 1 GROUP BY database, table, column ORDER BY sum(column_data_compressed_bytes) DESC LIMIT 20.
How do I reduce disk usage in ClickHouse?
Identify oversized columns with system.parts_columns, then: change compression codecs (ZSTD for strings, Delta for timestamps), drop unused columns, use LowCardinality for low-cardinality strings, and verify TTL rules are deleting expired data.
How do I check if ClickHouse TTL is working?
Query SELECT database, table, partition_id, min_time, max_time FROM system.parts WHERE active = 1 ORDER BY min_time ASC. If you see partitions with data older than your TTL retention, TTL isn't cleaning up properly. Run OPTIMIZE TABLE database.table FINAL to force TTL cleanup.