2026-06-08 · 5 min read

PULSE Check: The One Query That Shows If Merges Are Falling Behind

This one query tells you more about your ClickHouse cluster's future than any dashboard panel. It answers a question every operator needs answered: are merges keeping pace with inserts, or is a part count explosion quietly forming?

Maps to the P (Parts) dimension of the PULSE Framework.

The Query

SELECT
    database,
    table,
    countIf(event_type = 'NewPart') AS parts_created,
    countIf(event_type = 'MergeParts') AS merges_completed,
    countIf(event_type = 'NewPart') - countIf(event_type = 'MergeParts') AS net_growth,
    round(countIf(event_type = 'NewPart') / greatest(countIf(event_type = 'MergeParts'), 1), 2) AS creation_to_merge_ratio
FROM system.part_log
WHERE event_time > now() - INTERVAL 1 HOUR
  AND event_type IN ('NewPart', 'MergeParts')
GROUP BY database, table
HAVING parts_created > 10
ORDER BY net_growth DESC;

Reading the Results

| net_growth | creation_to_merge_ratio | Status | Action |
|---|---|---|---|
| ≤ 0 | ≤ 1.0 | Healthy | Merges keeping pace. No action needed. |
| 1–20 | 1.0–1.5 | Normal | Slight growth during high ingestion. Monitor. |
| 20–50 | 1.5–2.0 | Warning | Merges falling behind. Check I/O and merge pool. |
| 50–100 | 2.0–3.0 | Critical | Significant backlog forming. Act now. |
| > 100 | > 3.0 | Emergency | Part count explosion in progress. Reduce insert rate or increase merge throughput immediately. |
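As a sketch of how to consume these results programmatically, here is a hypothetical `classify` helper (not part of any ClickHouse tooling) that maps a row of the query output to a status using the thresholds from the table above:

```python
def classify(net_growth: int, ratio: float) -> str:
    """Map (net_growth, creation_to_merge_ratio) to a status.

    Thresholds mirror the table above; tune them for your workload.
    """
    if net_growth > 100 or ratio > 3.0:
        return "Emergency"
    if net_growth > 50 or ratio > 2.0:
        return "Critical"
    if net_growth > 20 or ratio > 1.5:
        return "Warning"
    if net_growth > 0 or ratio > 1.0:
        return "Normal"
    return "Healthy"
```

Checking on either condition (growth or ratio) means a table with modest absolute growth but a runaway ratio still escalates.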

A creation_to_merge_ratio above 2.0 means you're creating parts twice as fast as you're merging them. At that rate, a table starting at 50 parts will hit 300 parts (the danger zone) within hours.

Why This Matters More Than Part Count

Part count tells you where you are. This query tells you where you're heading.

A table with 200 parts and net_growth = 0 is stable — merges are keeping up. A table with 80 parts and net_growth = 50/hour will cross 300 parts in about 4½ hours. The second scenario is more urgent, but a part count threshold alert won't fire until it's almost too late.
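The arithmetic behind that projection is simple enough to sketch. This hypothetical helper (not a real ClickHouse function) assumes net growth stays constant:

```python
def hours_until(danger: int, current_parts: int, net_growth_per_hour: int) -> float:
    """Hours until a table crosses the danger threshold at the current growth rate."""
    if net_growth_per_hour <= 0:
        return float("inf")  # stable or shrinking: never reaches the threshold
    return (danger - current_parts) / net_growth_per_hour

hours_until(300, 80, 50)   # the urgent scenario above: (300 - 80) / 50 = 4.4 hours
hours_until(300, 200, 0)   # the stable table: never, despite the higher part count
```

This is why the trend matters more than the snapshot: the projection depends far more on net growth than on today's part count.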

When net_growth Is Positive: Diagnose the Cause

Too Many Small Inserts?

-- Average rows per insert
SELECT
    database,
    table,
    count() AS inserts,
    round(avg(rows), 0) AS avg_rows_per_insert,
    round(min(rows), 0) AS min_rows,
    round(max(rows), 0) AS max_rows
FROM system.part_log
WHERE event_type = 'NewPart'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY database, table
ORDER BY inserts DESC
LIMIT 15;

If avg_rows_per_insert is below 1,000, you're inserting too frequently. Batch to 10,000+ rows per insert.
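To see why batch size dominates part creation, here is the back-of-the-envelope math (illustrative numbers; assumes each insert creates one part, i.e. touches a single partition):

```python
rows_per_hour = 1_000_000  # same ingest volume in both cases

# Each INSERT creates at least one part per partition it touches;
# assume one part per insert for simplicity.
parts_small = rows_per_hour // 500      # 500-row inserts
parts_big   = rows_per_hour // 10_000   # 10,000-row batches

print(parts_small, parts_big)  # 2000 vs 100 parts per hour
```

Same data volume, 20× fewer parts — the merge pool has to do far less work to keep net growth at zero.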

Merges Bottlenecked on I/O?

-- Is the merge pool saturated?
SELECT
    count() AS active_merges,
    getSetting('background_pool_size') AS pool_size,
    round(count() / getSetting('background_pool_size') * 100, 0) AS utilization_pct
FROM system.merges;

If utilization_pct is at 100%, the merge pool is full. See system.merges Decoded for tuning.

Mutations Blocking Merges?

-- Are mutations consuming merge capacity?
SELECT database, table, mutation_id, command,
    dateDiff('second', create_time, now()) AS age_sec
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time ASC;

Any stuck mutation blocks merges on that table. See The Mutation That Ran for 72 Hours.

Emergency Response

If net_growth is above 50 and climbing:

-- 1. Check current part count (how urgent?)
SELECT database, table, partition_id, count() AS parts
FROM system.parts WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts DESC LIMIT 10;
 
-- 2. Increase merge throughput (temporary)
-- In newer versions this is a server-level setting (config.xml),
-- so it may require a restart rather than a session SET:
-- background_pool_size = 32
 
-- 3. Force merge on the worst partition
OPTIMIZE TABLE database.table PARTITION ID 'partition_id';
 
-- 4. If caused by small inserts, use Buffer table
-- CREATE TABLE buffer AS database.table
-- ENGINE = Buffer(database, table, 16, 10, 100, 10000, 100000, 1000000, 10000000);

Automate This Check

Run the query every 5 minutes. Alert when net_growth exceeds 20 for any table across two consecutive checks. That gives you an early warning while there's still time to fix the root cause.
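One way to implement the two-consecutive-checks rule is a tiny per-table state machine. This is a sketch, not a real monitoring integration — feed it the net_growth value for each table from every 5-minute run:

```python
from collections import defaultdict

THRESHOLD = 20  # net_growth alert threshold from above

class MergeLagAlerter:
    """Fire only after two consecutive breaches, ignoring one-off spikes."""

    def __init__(self, threshold: int = THRESHOLD):
        self.threshold = threshold
        self.breaches = defaultdict(int)  # table -> consecutive breach count

    def observe(self, table: str, net_growth: int) -> bool:
        """Record one check; return True if an alert should fire."""
        if net_growth > self.threshold:
            self.breaches[table] += 1
        else:
            self.breaches[table] = 0  # any healthy check resets the streak
        return self.breaches[table] >= 2

alerter = MergeLagAlerter()
alerter.observe("db.events", 35)  # first breach: no alert yet
alerter.observe("db.events", 40)  # second consecutive breach: alert fires
```

Requiring two consecutive breaches filters out single-interval ingestion bursts while still alerting well before part count thresholds are hit.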

ClusterSight runs this check continuously and tracks the creation-to-merge ratio as a trending metric in the P (Parts) PULSE dimension. When the ratio exceeds 1.5, the P score begins dropping — hours before part count thresholds are breached.

Check your cluster's PULSE.


This post is part of the PULSE Check series — tactical health checks for ClickHouse operators.

Frequently Asked Questions

How do I know if ClickHouse merges are falling behind?

Query system.part_log and compare parts created (NewPart events) vs parts merged (MergeParts events) in the last hour. If the net growth is consistently positive, merges are falling behind inserts and part count will grow until inserts fail.

What is a healthy merge rate in ClickHouse?

A healthy cluster has a net part growth near zero — parts created roughly equals parts consumed by merges. Consistent net growth above 10 parts per hour for a single table indicates the merge queue is falling behind.

How do I speed up ClickHouse merges?

Increase background_pool_size (default 16) if disk I/O has headroom. Batch inserts to create fewer parts. Reduce partition granularity. Check if stuck mutations are blocking merges. On NVMe, pool sizes of 32-64 are common.