2026-04-06 · 9 min read

312 Parts Per Partition: The Merge Backlog Time Bomb Nobody Monitors

At 312 parts per partition, your ClickHouse cluster is 12 seconds away from rejecting every insert. Not because anything is broken. Not because a node is down. Because the merge queue fell behind three days ago and nobody noticed — because no dashboard was watching.

This is the merge backlog time bomb. It's the second most common production failure in ClickHouse (after replication drift), and it's almost always preventable.

How Parts Work (The 30-Second Version)

Every INSERT into a MergeTree table creates at least one data part per partition. ClickHouse's background merge process combines smaller parts into larger ones. In a healthy cluster, merges keep pace with inserts — part count stays stable.

The problem starts when inserts outpace merges:

Hour 0:   50 parts/partition  (healthy)
Hour 12: 120 parts/partition  (no alert)
Hour 24: 200 parts/partition  (no alert)
Hour 36: 280 parts/partition  (no alert)
Hour 48: 312 parts/partition  ← you are here
Hour 49: 350 parts/partition  (ClickHouse warns in logs)
Hour 50: INSERT rejected      (incident begins)

At no point in this timeline did CPU spike. Memory was fine. Disk wasn't full. Queries were still fast (a little slower, but within SLA). Every Grafana panel showed green. The time bomb ticked for 48 hours while nobody watched system.parts.

Why Merge Backlogs Form

1. High-Frequency Small Inserts

The most common cause. If your application inserts one row at a time — or even 100 rows per insert — you're creating parts faster than ClickHouse can merge them.

-- How fast are you creating parts?
SELECT
    database,
    table,
    toStartOfHour(modification_time) AS hour,
    count() AS parts_created
FROM system.parts
WHERE active = 1
  AND modification_time > now() - INTERVAL 24 HOUR
GROUP BY database, table, hour
ORDER BY hour DESC, parts_created DESC
LIMIT 50;

Rule of thumb: If you're creating more than 1 part per second for a single table, you need to batch your inserts. Use the Buffer table engine or batch at the application layer.
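One way to batch without touching application code is a Buffer table in front of the MergeTree target. A minimal sketch, assuming a hypothetical target table named events (substitute your own table and tune the thresholds to your workload):

```sql
-- Inserts go to events_buffer; ClickHouse flushes to events
-- when all min thresholds are met or any max threshold is hit.
CREATE TABLE events_buffer AS events
ENGINE = Buffer(
    currentDatabase(), 'events',
    16,                   -- num_layers: parallel in-memory buffers
    10, 60,               -- min_time, max_time: flush window in seconds
    10000, 1000000,       -- min_rows, max_rows
    10485760, 104857600   -- min_bytes (10MB), max_bytes (100MB)
);
```

Caveat: a Buffer table holds rows in memory, so data still sitting in the buffer is lost on an abnormal server shutdown. If that's unacceptable, batch at the application layer instead.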

2. Merges Competing for I/O

Background merges need disk I/O. If your cluster is serving heavy analytical queries simultaneously, merges get starved. The merge queue grows while your query dashboard looks healthy.

-- Are merges getting enough resources?
SELECT
    database,
    table,
    count() AS active_merges,
    round(sum(progress) / count() * 100, 1) AS avg_progress_pct,
    formatReadableSize(sum(total_size_bytes_compressed)) AS total_merge_size,
    max(elapsed) AS longest_merge_seconds
FROM system.merges
GROUP BY database, table
ORDER BY active_merges DESC;

If active_merges is at the limit (default: background_pool_size = 16) and avg_progress_pct is low, merges are bottlenecked. See system.merges Decoded for the full breakdown.

3. Large Parts Blocking Small Merges

ClickHouse prioritizes merging similarly-sized parts. If you have one 10GB part and fifty 1MB parts, ClickHouse may defer merging the small parts because combining them with the large part would be expensive. The small parts accumulate.

-- Part size distribution — look for imbalance
SELECT
    database,
    table,
    partition_id,
    count() AS parts,
    formatReadableSize(min(bytes_on_disk)) AS smallest_part,
    formatReadableSize(max(bytes_on_disk)) AS largest_part,
    formatReadableSize(avg(bytes_on_disk)) AS avg_part_size,
    round(max(bytes_on_disk) / greatest(min(bytes_on_disk), 1), 0) AS size_ratio
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
HAVING parts > 10
ORDER BY size_ratio DESC
LIMIT 20;

A size_ratio above 1000x means highly imbalanced parts — the merge scheduler will struggle.

4. Mutations Blocking Merges

Active mutations prevent merges on affected parts. A stuck mutation can freeze the merge queue for a single table while other tables continue merging normally. The part count for the affected table climbs silently.

-- Mutations that might be blocking merges
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    dateDiff('second', create_time, now()) AS age_seconds,
    parts_to_do
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time ASC;

A mutation older than 10 minutes with parts_to_do > 0 is likely stuck, and it blocks merges on every part it still has to rewrite.
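If a mutation is clearly stuck (for example, it references a column that no longer exists), it can be cancelled. A sketch, with hypothetical database, table, and mutation_id values; substitute the real ones from the system.mutations query above:

```sql
-- Cancel a stuck mutation. Parts it has already rewritten stay
-- rewritten; the remaining parts become mergeable again.
KILL MUTATION
WHERE database = 'mydb' AND table = 'events' AND mutation_id = 'mutation_42.txt';
```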

The Dashboard That Would Have Caught This

Here's what you should be monitoring — the queries that catch merge backlogs days before they become insert failures.

Part Count Per Partition (The Critical Metric)

-- THE query. Run this. Alert on it.
SELECT
    database,
    table,
    partition_id,
    count() AS parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS disk_size,
    min(modification_time) AS oldest_part,
    max(modification_time) AS newest_part
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
HAVING parts > 50
ORDER BY parts DESC
LIMIT 30;

Alert thresholds:

  • Green: < 100 parts per partition
  • Warning: 100–200 parts per partition
  • Critical: > 200 parts per partition
  • Emergency: > 300 parts per partition (inserts will fail soon)
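The thresholds above translate directly into one alert query. A sketch you could wire into a monitoring scraper, using the threshold values listed (adjust to taste):

```sql
-- Classify every partition by the alert thresholds above
SELECT
    database,
    table,
    partition_id,
    count() AS parts,
    multiIf(
        count() > 300, 'emergency',
        count() > 200, 'critical',
        count() > 100, 'warning',
        'green') AS severity
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
HAVING severity != 'green'
ORDER BY parts DESC;
```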

Part Creation Rate vs Merge Rate

-- Are merges keeping pace with inserts?
SELECT
    database,
    table,
    countIf(event_type = 'NewPart') AS parts_created,
    countIf(event_type = 'MergeParts') AS parts_merged,
    countIf(event_type = 'NewPart') - countIf(event_type = 'MergeParts') AS net_growth
FROM system.part_log
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY database, table
HAVING net_growth > 10
ORDER BY net_growth DESC;

If net_growth is consistently positive, your merge queue is falling behind. The rate matters more than the absolute count.

Tiny Parts Detection

-- Parts under 1MB that should have been merged by now
SELECT
    database,
    table,
    partition_id,
    count() AS tiny_parts,
    formatReadableSize(sum(bytes_on_disk)) AS total_tiny_size,
    min(modification_time) AS oldest_tiny_part
FROM system.parts
WHERE active = 1
  AND bytes_on_disk < 1048576  -- 1MB
GROUP BY database, table, partition_id
HAVING tiny_parts > 10
ORDER BY tiny_parts DESC;

Many tiny parts mean inserts are small and merges aren't combining them. A tiny part that has survived for more than an hour is a red flag.

Fixing the Time Bomb

Immediate: Force Merge

-- Force merge for a specific table (expensive but immediate)
OPTIMIZE TABLE database.table_name FINAL;
 
-- Force merge for a specific partition (less expensive)
OPTIMIZE TABLE database.table_name PARTITION ID 'partition_id';

Warning: OPTIMIZE TABLE ... FINAL rewrites all data in the table. On a 500GB table, this takes hours and consumes significant I/O. Use partition-level optimization when possible.

Short-term: Increase Merge Throughput

-- Allow larger merges on the affected table (a MergeTree setting,
-- changeable per table without a restart)
ALTER TABLE database.table_name
    MODIFY SETTING max_bytes_to_merge_at_max_space_in_pool = 161061273600; -- 150GB

background_pool_size (default: 16) controls how many merges run concurrently. It is a server-level setting: a session-level SET does not resize the running pool, so raise it in the ClickHouse server config and restart.
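To persist the larger merge pool across restarts, the setting belongs in the server configuration. A sketch of a config override file; the filename is hypothetical, and the exact location of this setting has moved between ClickHouse versions (older releases read it from the default user profile), so verify against your version's documentation:

```xml
<!-- /etc/clickhouse-server/config.d/merge_pool.xml (hypothetical filename) -->
<clickhouse>
    <background_pool_size>32</background_pool_size>
</clickhouse>
```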

Long-term: Fix the Insert Pattern

The real fix is always reducing part creation rate:

  1. Batch inserts — Aim for 10,000+ rows per INSERT, not 1 row
  2. Use Buffer tables — Insert into a Buffer table that flushes to MergeTree in batches
  3. Reduce partition granularity — Partitioning by minute creates 1440 partitions/day; partition by hour or day instead
  4. Tune merge settings — parts_to_throw_insert (default: 300) and parts_to_delay_insert (default: 150) control the thresholds
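Item 4 can be applied per table without a restart. A sketch that raises both thresholds as a stopgap while the insert pattern is being fixed; the values are illustrative:

```sql
-- Delay inserts later and reject them later. This buys time,
-- it does not fix the underlying part creation rate.
ALTER TABLE database.table_name
    MODIFY SETTING parts_to_delay_insert = 300,
                   parts_to_throw_insert = 600;
```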

How This Maps to PULSE

Part count monitoring is the P (Parts) dimension of the PULSE Framework. It's the first letter for a reason — part health is the foundation of ClickHouse performance.

When the P score drops in ClusterSight, it means one of:

  • Part counts are growing (merge backlog forming)
  • Tiny parts are accumulating (insert pattern issue)
  • Detached or broken parts exist (data integrity issue)
  • Compression ratios are degrading (schema or data change)

ClusterSight tracks part count trends per partition, alerts on growth rate (not just threshold), and provides the exact OPTIMIZE TABLE command for the specific partition that needs attention.

The 312 Parts Post-Mortem

Back to the cluster with 312 parts per partition. What actually happened:

  1. Day 0: A new microservice started inserting events one row at a time via HTTP interface. 50 inserts/second to a single table.
  2. Day 1: Part count grew from 30 to 150 per partition. No alert — Grafana wasn't monitoring system.parts.
  3. Day 2: Part count reached 250. Query latency increased 40% but stayed within SLA. Nobody investigated.
  4. Day 3, 8 AM: Part count hit 312. Insert latency spiked to 30 seconds (ClickHouse delays inserts above parts_to_delay_insert).
  5. Day 3, 8:15 AM: Part count hit 350. Inserts started failing with "Too many parts" error. Pipeline backed up. Alert finally fired — on the application error rate, not on ClickHouse health.

The fix took 4 hours: force-merge the partitions, batch the inserts, tune merge settings. The prevention would have taken 30 seconds: one query against system.parts in the monitoring stack.

Check your cluster's PULSE.


This post is part of the Operator's Blind Spot series — deep dives into ClickHouse production failures that standard monitoring misses.

Frequently Asked Questions

What happens when ClickHouse has too many parts?

By default, ClickHouse starts delaying inserts above parts_to_delay_insert (150 parts per partition) and rejects them with a "Too many parts" error above parts_to_throw_insert (300); both thresholds are tunable. Before that point, query performance degrades because each query must open file handles for every active part. The fix is reducing insert frequency, increasing merge throughput, or both.

How do I check part count per partition in ClickHouse?

Query SELECT database, table, partition_id, count() AS parts FROM system.parts WHERE active = 1 GROUP BY database, table, partition_id ORDER BY parts DESC. Any partition above 200 parts needs investigation.

Why is my ClickHouse merge queue falling behind?

Common causes: too many small inserts (each creates a part), insufficient disk I/O for background merges, concurrent heavy queries competing for resources, or merge settings that are too conservative. Check system.merges for active merge count and system.parts for accumulation rate.

How do I fix too many parts in ClickHouse?

Immediate fix: run OPTIMIZE TABLE database.table FINAL to force-merge all parts (expensive on large tables). Long-term fix: batch inserts to reduce part creation rate, tune merge settings (background_pool_size, max_bytes_to_merge_at_max_space_in_pool), and monitor part count trends.