2026-04-20 · 9 min read

system.merges Decoded: Understanding ClickHouse's Background Engine

system.merges shows you exactly what ClickHouse is doing in the background right now. Every MergeTree table relies on background merges to consolidate data parts. When merges fall behind, parts accumulate until inserts fail. This table tells you whether merges are healthy, what's slowing them down, and whether you're about to have a problem.

The official docs list the columns. This guide explains what they mean operationally and how to use them to prevent incidents.

Why Merges Matter

Every INSERT creates at least one data part. Without merges, part count grows without bound. By default, ClickHouse delays inserts once a partition accumulates too many active parts (parts_to_delay_insert) and rejects them outright at parts_to_throw_insert; the classic defaults were 150 and 300, raised to 1000 and 3000 in recent releases. Merges are the process that prevents this: they combine smaller parts into larger ones.
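To see how close each partition is to those thresholds, a quick check against system.parts (counting only active parts) looks like this:

-- Active part count per partition: how close are we to the insert limits?
SELECT
    database,
    table,
    partition,
    count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table, partition
ORDER BY active_parts DESC
LIMIT 10;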

For the full part lifecycle, see ClickHouse Merges: How They Work.

The Table at a Glance

SELECT * FROM system.merges FORMAT Vertical;

This table is live — it only shows currently running merges. When a merge completes, its row disappears. To see merge history, query system.part_log instead.

Column-by-Column Reference

Identity Columns

database / table

Which table this merge belongs to. A busy cluster might show merges from dozens of tables simultaneously.

partition_id

The partition being merged. Merges never cross partition boundaries — each merge combines parts within a single partition.

result_part_name

The name of the part that will be created when this merge completes. The naming convention encodes the partition, min/max block numbers, and merge level.
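As an illustration, a hypothetical result_part_name of 202604_1_100_3 breaks down as:

-- 202604_1_100_3  (hypothetical part name)
--   202604  = partition ID (here, a toYYYYMM partition)
--   1 / 100 = min / max block numbers covered by the part
--   3       = merge level (how many generations of merges produced it)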

Progress Columns

elapsed

What it is: Seconds since the merge started.

What it means operationally: Short merges (< 60 seconds) are normal. Merges lasting 10+ minutes usually involve large parts. Merges lasting 1+ hours indicate very large data or I/O problems.

-- Long-running merges — potential I/O bottleneck
SELECT
    database,
    table,
    partition_id,
    round(elapsed, 0) AS seconds,
    round(progress * 100, 1) AS progress_pct,
    formatReadableSize(total_size_bytes_compressed) AS merge_size,
    num_parts AS source_parts
FROM system.merges
WHERE elapsed > 300  -- > 5 minutes
ORDER BY elapsed DESC;

Alert threshold: Any merge running longer than 30 minutes warrants investigation.

progress

What it is: A float from 0 to 1 representing merge completion.

What it means operationally: The most important diagnostic column. A merge with progress advancing steadily is healthy — it's just large. A merge with progress stuck at the same value for minutes has an I/O problem.

-- Snapshot this every 60 seconds, compare progress values
-- If progress hasn't moved, merges are I/O-starved
SELECT
    database,
    table,
    result_part_name,
    round(progress * 100, 2) AS progress_pct,
    elapsed,
    formatReadableSize(bytes_read_uncompressed) AS read_so_far,
    formatReadableSize(total_size_bytes_compressed) AS total_size,
    now() AS checked_at
FROM system.merges
ORDER BY elapsed DESC;

num_parts

What it is: How many source parts are being combined in this merge.

What it means operationally: Small merges combine 2-5 parts. Larger merges can combine 10-20. If you see merges with num_parts = 2 when your table has 200 parts, ClickHouse is only able to do tiny incremental merges — the backlog will grow faster than it clears.
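A quick way to see whether active merges are wide enough to keep up, using only columns from system.merges itself:

-- Are merges combining enough source parts?
SELECT
    database,
    table,
    round(avg(num_parts), 1) AS avg_parts_per_merge,
    max(num_parts) AS widest_merge
FROM system.merges
GROUP BY database, table
ORDER BY avg_parts_per_merge ASC;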

Size Columns

total_size_bytes_compressed / total_size_marks

What they are: The total compressed size and mark count of the source parts participating in this merge (an upper bound on the size of the resulting part).

What they mean operationally: Large merges (> 1GB compressed) are expensive. Very large merges (> 10GB) can take hours and dominate disk I/O during that time.

-- Merge size distribution — are merges too large?
SELECT
    database,
    table,
    count() AS active_merges,
    formatReadableSize(sum(total_size_bytes_compressed)) AS total_merge_work,
    formatReadableSize(avg(total_size_bytes_compressed)) AS avg_merge_size,
    formatReadableSize(max(total_size_bytes_compressed)) AS largest_merge
FROM system.merges
GROUP BY database, table
ORDER BY sum(total_size_bytes_compressed) DESC;

bytes_read_uncompressed / bytes_written_uncompressed

What they are: How much data has been read from source parts and written to the result part so far.

What they mean operationally: Both counters are uncompressed, so the gap between them reflects row reduction, not compression. If bytes_written is much less than bytes_read, the merge is collapsing or expiring rows (ReplacingMergeTree deduplication, CollapsingMergeTree, TTL). If they're similar, the merge is a straight concatenation of the source rows.
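A sketch of the read/write ratio per active merge:

-- Read vs written so far for each active merge
SELECT
    database,
    table,
    result_part_name,
    formatReadableSize(bytes_read_uncompressed) AS read_uncompressed,
    formatReadableSize(bytes_written_uncompressed) AS written_uncompressed,
    round(bytes_written_uncompressed / greatest(bytes_read_uncompressed, 1), 2) AS write_read_ratio
FROM system.merges
ORDER BY write_read_ratio ASC;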

Resource Columns

memory_usage

What it is: Current memory consumed by this merge operation.

What it means operationally: Each merge needs memory for read buffers, sort operations, and write buffers. If total merge memory across all concurrent merges approaches your server's memory limit, merges compete with queries for RAM.

-- Total memory consumed by all merges
SELECT
    formatReadableSize(sum(memory_usage)) AS total_merge_memory,
    count() AS concurrent_merges
FROM system.merges;

If this is over 20% of total ClickHouse memory, consider reducing background_pool_size.

is_mutation

What it is: Whether this "merge" is actually a mutation (ALTER TABLE UPDATE/DELETE) executing as a merge operation.

What it means operationally: Mutations execute by rewriting parts. They appear in system.merges with is_mutation = 1. These are often slower than regular merges because they must evaluate predicates on every row.

-- Separate mutations from regular merges
SELECT
    is_mutation,
    count() AS count,
    formatReadableSize(sum(total_size_bytes_compressed)) AS total_size,
    round(avg(elapsed), 0) AS avg_seconds
FROM system.merges
GROUP BY is_mutation;

If mutations are consuming most of the merge pool, regular merges can't run — and parts accumulate. See Your Replicas Are Lying for how stuck mutations cause data inconsistency.
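To check whether mutations are backed up, system.mutations shows the pending queue (is_done = 0 means still executing or stuck):

-- Pending mutations that may be hogging the merge pool
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    parts_to_do,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time ASC;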

The Five Diagnostic Queries

1. Merge Queue Health Overview

SELECT
    database,
    table,
    count() AS active_merges,
    round(avg(progress) * 100, 1) AS avg_progress_pct,
    round(max(elapsed), 0) AS longest_merge_sec,
    formatReadableSize(sum(total_size_bytes_compressed)) AS total_work,
    sum(num_parts) AS total_source_parts,
    countIf(is_mutation = 1) AS mutation_merges
FROM system.merges
GROUP BY database, table
ORDER BY active_merges DESC;

2. I/O Contention Detection

-- Merges that aren't making progress
SELECT
    database,
    table,
    result_part_name,
    round(progress * 100, 2) AS progress_pct,
    elapsed AS seconds_running,
    formatReadableSize(bytes_read_uncompressed) AS data_read,
    formatReadableSize(total_size_bytes_compressed) AS total_size,
    round(bytes_read_uncompressed / greatest(elapsed, 1) / 1048576, 1) AS read_mb_per_sec
FROM system.merges
WHERE elapsed > 120
ORDER BY read_mb_per_sec ASC;

read_mb_per_sec below 10 indicates serious I/O contention. Healthy merges on SSD should read at 100+ MB/s.

3. Merge Pool Saturation

-- Are we at the merge concurrency limit?
SELECT
    count() AS active_merges,
    getSetting('background_pool_size') AS max_merges,
    round(count() / getSetting('background_pool_size') * 100, 0) AS utilization_pct
FROM system.merges;

If utilization_pct is consistently at 100%, the merge pool is saturated. Parts are accumulating faster than they can be merged. Either increase background_pool_size or reduce insert rate.
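Note that on recent ClickHouse versions background_pool_size is a server-level setting, so getSetting() may not resolve it in the query above; as a fallback, check system.server_settings:

-- Where the merge pool size lives on newer versions
SELECT name, value
FROM system.server_settings
WHERE name = 'background_pool_size';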

4. Per-Table Merge Throughput

-- Which tables are consuming the most merge resources?
SELECT
    database,
    table,
    count() AS merges,
    formatReadableSize(sum(memory_usage)) AS memory_used,
    formatReadableSize(sum(total_size_bytes_compressed)) AS data_being_merged,
    round(sum(elapsed), 0) AS total_merge_seconds  -- wall-clock, not CPU time
FROM system.merges
GROUP BY database, table
ORDER BY sum(total_size_bytes_compressed) DESC;

If one table dominates, it may be starving other tables of merge resources.

5. Merge History (from part_log)

-- Recent merge completions — how fast are merges finishing?
SELECT
    database,
    table,
    event_type,
    toStartOfMinute(event_time) AS minute,
    count() AS merges_completed,
    round(avg(duration_ms) / 1000, 1) AS avg_duration_sec,
    formatReadableSize(sum(bytes_uncompressed)) AS data_merged
FROM system.part_log
WHERE event_type = 'MergeParts'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY database, table, event_type, minute
ORDER BY minute DESC
LIMIT 30;

Tuning Merge Performance

Increase Concurrency (Carefully)

-- Default: 16. Increase if disk I/O has headroom.
-- Note: on recent releases this is a server setting (config.xml), not a session setting.
SET background_pool_size = 32;

More concurrent merges means faster part reduction — but only if disk I/O can handle it. On spinning disks, more merges can create more contention and actually slow things down. On NVMe, you can often double the pool size.
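On recent versions, background_pool_size must be raised in the server configuration rather than per session; a minimal sketch (the file path is illustrative):

<!-- /etc/clickhouse-server/config.d/merge_pool.xml -->
<clickhouse>
    <background_pool_size>32</background_pool_size>
</clickhouse>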

Allow Larger Merges

-- Default: ~150GB. Increase if you have large tables with big parts.
-- This is a MergeTree-level setting; apply it per table (table name is illustrative).
ALTER TABLE my_table MODIFY SETTING max_bytes_to_merge_at_max_space_in_pool = 214748364800; -- 200GB

Prioritize Merge Over Queries (Emergency)

-- During a part count emergency, temporarily reduce query concurrency
SET max_threads = 4;  -- Per-session setting; default: number of cores
-- Give merges more workers (on recent releases, raise background_pool_size
-- in the server config instead of via SET)
SET background_pool_size = 48;

This is a temporary measure. See 312 Parts Per Partition for the full emergency response.

How ClusterSight Uses system.merges

ClusterSight tracks merge activity as part of the P (Parts) dimension in the PULSE Framework:

  • Merge throughput trending — alerts when merge completion rate drops below part creation rate
  • I/O contention detection — flags merges with throughput below 10 MB/s
  • Pool saturation monitoring — alerts when merge pool is at capacity for more than 5 minutes
  • Mutation vs merge tracking — detects when mutations are starving regular merges

When the P score drops, ClusterSight tells you whether the cause is insert rate, I/O contention, or mutation blocking — and provides the specific settings change or OPTIMIZE command to fix it.

Check your cluster's PULSE.


This post is part of the ClickHouse at Scale series — authoritative deep dives on ClickHouse system tables.


Frequently Asked Questions

What is system.merges in ClickHouse?

system.merges is a ClickHouse system table that shows all currently running merge operations. Each row represents one active merge, with details about source parts, progress, memory usage, and throughput. The table is empty when no merges are running.

How do I check if ClickHouse merges are running?

Query SELECT database, table, elapsed, progress, num_parts, total_size_bytes_compressed FROM system.merges ORDER BY elapsed DESC. An empty result means no merges are currently active. If your table has many parts and no merges are running, check if mutations are blocking the merge queue.

Why are ClickHouse merges slow?

Common causes: disk I/O contention from concurrent queries, too many concurrent merges saturating I/O bandwidth, very large parts (>10GB) taking hours to merge, insufficient memory for merge buffers, or RAID/network storage with high latency. Check system.merges progress column — if it's not advancing, I/O is the bottleneck.

How many concurrent merges can ClickHouse run?

ClickHouse runs up to background_pool_size concurrent merges (default: 16). You can increase this for faster merge throughput, but each merge consumes disk I/O and memory. On I/O-bound systems, increasing the pool size can actually slow merges by creating more contention.