2026-06-15 · 7 min read
Stop Running OPTIMIZE TABLE in Production — Here's Why
Every ClickHouse runbook has it. Every "fix slow queries" guide recommends it. And it causes more production incidents than the problem it's trying to solve.
OPTIMIZE TABLE ... FINAL is the ClickHouse equivalent of rm -rf in its ability to seem helpful while causing damage. It's not inherently bad — it's just wildly misused in production environments.
What OPTIMIZE TABLE Actually Does
Without FINAL
```sql
OPTIMIZE TABLE events;
```

This triggers one round of merges on the table. ClickHouse picks candidate parts and merges them according to its normal merge algorithm. This is relatively safe — it's essentially asking ClickHouse to do what it would do anyway, just sooner.
With FINAL
```sql
OPTIMIZE TABLE events FINAL;
```

This forces ClickHouse to merge ALL active parts in every partition into a single part per partition. It rewrites every byte of data in the table. On a 500GB table, that means:
- Reading 500GB from disk
- Decompressing all data
- Re-sorting and merging
- Recompressing
- Writing 500GB back to disk
- Marking old parts for deletion
During this process:
- Disk I/O is consumed by the rewrite
- Regular merges for this table are blocked
- Part count temporarily increases (new part created before old parts deleted)
- Replication must transfer the new parts to replicas
- Memory usage spikes for merge buffers
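If a FINAL rewrite is already running, its progress and cost are visible in `system.merges`. A sketch, with `events` as a stand-in table name:

```sql
-- Progress of in-flight merges (including a running OPTIMIZE ... FINAL)
SELECT
    database,
    table,
    round(elapsed) AS elapsed_sec,
    round(progress * 100, 1) AS pct_done,
    num_parts,
    formatReadableSize(total_size_bytes_compressed) AS merging,
    formatReadableSize(memory_usage) AS mem
FROM system.merges
WHERE table = 'events';
```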
The Five Ways OPTIMIZE TABLE FINAL Hurts Production
1. It Blocks the Merge Queue
While OPTIMIZE FINAL runs, no other merges can happen on that table. If your table receives constant inserts, new parts accumulate while FINAL runs. On a table that takes 2 hours to optimize, you could accumulate enough parts during those 2 hours to trigger "too many parts" errors.
2. It Causes Replication Storms
The new merged part must be replicated to every replica. A 500GB table optimized to one part per partition means replicas need to fetch hundreds of gigabytes of new parts. This saturates network bandwidth and causes replication lag — potentially across all tables on the cluster, not just the one being optimized.
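The resulting replication pressure shows up in `system.replicas`. Something like this surfaces the lagging tables (the filter is illustrative):

```sql
-- Replication health while (or after) an OPTIMIZE runs
SELECT
    database,
    table,
    absolute_delay,
    queue_size,
    inserts_in_queue,
    merges_in_queue
FROM system.replicas
WHERE queue_size > 0
ORDER BY absolute_delay DESC;
```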
3. It Doubles Disk Usage Temporarily
During the merge, both the old parts and the new merged part exist simultaneously. A 500GB table briefly needs 1TB of disk. If your disk is 70% full before running OPTIMIZE, you'll run out of space.
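The disk-space math is worth making explicit. A minimal sketch with hypothetical numbers, checking whether a table can be rewritten in place:

```python
def can_optimize(disk_total_gb: float, disk_used_gb: float, table_size_gb: float) -> bool:
    """During OPTIMIZE ... FINAL the old parts and the new merged part
    coexist on disk, so the table's full size must fit in free space."""
    free_gb = disk_total_gb - disk_used_gb
    return free_gb > table_size_gb

# A 1TB disk at 70% full leaves 300GB free: not enough to rewrite a 500GB table
print(can_optimize(1000, 700, 500))  # False
```

In practice you'd also leave margin for concurrent inserts and other merges, so treat "table size fits in free space" as the bare minimum, not a comfortable target.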
4. It's Non-Resumable
If OPTIMIZE FINAL is interrupted (server restart, OOM, disk full), all progress is lost. The partially-written merged part is discarded. You're back where you started — except now you've wasted hours of I/O.
5. It Triggers Unnecessary Work
ClickHouse's merge algorithm is smart. It merges parts that benefit from merging — similar sizes, same partition, manageable I/O cost. OPTIMIZE FINAL overrides this intelligence and forces merges that ClickHouse deliberately deferred because they're expensive.
What To Do Instead
Option 1: Optimize Per Partition
```sql
-- Find the partitions that actually need optimization
SELECT
    database,
    table,
    partition_id,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
HAVING parts > 10
ORDER BY parts DESC
LIMIT 20;

-- Optimize just the problematic partition
OPTIMIZE TABLE events PARTITION ID '202603';
```

This limits the blast radius. One partition at a time. If it fails or causes issues, only that partition is affected. (Note the `PARTITION ID` syntax — the query above returns `partition_id` values, and plain `PARTITION` expects the partition expression instead.)
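To automate the partition-at-a-time approach, you can generate one statement per oversized partition from the query's results instead of touching the whole table. A sketch — the table name and threshold are illustrative:

```python
def optimize_statements(parts_per_partition, table="events", threshold=10):
    """Given (partition_id, part_count) pairs, emit one
    OPTIMIZE ... PARTITION ID statement per oversized partition,
    worst-first, so each merge has a limited blast radius."""
    oversized = [(pid, n) for pid, n in parts_per_partition if n > threshold]
    oversized.sort(key=lambda x: -x[1])
    return [f"OPTIMIZE TABLE {table} PARTITION ID '{pid}';" for pid, _ in oversized]

stmts = optimize_statements([("202601", 4), ("202602", 35), ("202603", 17)])
print(stmts)
# ["OPTIMIZE TABLE events PARTITION ID '202602';",
#  "OPTIMIZE TABLE events PARTITION ID '202603';"]
```

Run the statements one at a time, watching part counts and replication lag between each, rather than firing them all at once.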
Option 2: Tune Merge Settings
Instead of forcing merges, help ClickHouse merge more effectively:
```sql
-- Server-level (config.xml, applied at startup): allow more
-- concurrent background merges. Default: 16. Increase on fast storage.
-- <background_pool_size>32</background_pool_size>

-- Table-level MergeTree setting: allow larger merges (here, up to 200GB)
ALTER TABLE events MODIFY SETTING
    max_bytes_to_merge_at_max_space_in_pool = 214748364800;
```

Note that `background_pool_size` is a server configuration setting, not a query setting — a session-level `SET` won't change the background merge pools. Likewise, `max_bytes_to_merge_at_max_space_in_pool` is a MergeTree table setting, applied per table with `ALTER TABLE ... MODIFY SETTING`.

Option 3: Use OPTIMIZE Without FINAL
```sql
-- Safe: triggers one round of normal merges
OPTIMIZE TABLE events;
```

This respects ClickHouse's merge algorithm. It won't merge everything into one part, but it will reduce part count using the same logic ClickHouse uses internally.
Option 4: Let Merges Happen Naturally
If part count is under 200 per partition and merges aren't falling behind, you don't need to optimize at all. ClickHouse will merge parts on its own schedule. The default "too many parts" threshold (`parts_to_throw_insert`) is 300 in older releases and 3000 in recent ones — there's headroom.
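Rather than assuming the defaults, you can read the limits your server actually enforces from `system.merge_tree_settings`:

```sql
-- The part-count limits ClickHouse enforces on inserts
SELECT name, value
FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert');
```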
When OPTIMIZE FINAL Is Actually Fine
There are legitimate use cases:
- Small tables (< 1GB): The rewrite completes in seconds. No meaningful impact.
- After codec changes: You changed a column's compression codec and want it applied to existing data. Optimize one partition at a time.
- Post-migration cleanup: After a large data migration on an idle table. No ongoing inserts to compete with.
- ReplacingMergeTree deduplication: To force final deduplication before reading. But prefer FINAL in the SELECT query instead: SELECT * FROM table FINAL.
- Development/staging: No production traffic. Do whatever you want.
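For the ReplacingMergeTree case specifically, query-time FINAL deduplicates while reading, with no on-disk rewrite. A sketch — the table and filter are illustrative, and the setting can speed up FINAL when rows never move across partitions:

```sql
-- Deduplicate at read time; nothing is rewritten on disk
SELECT *
FROM events FINAL
WHERE event_date >= today() - 7
SETTINGS do_not_merge_across_partitions_select_final = 1;
```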
The Cron Job Anti-Pattern
The worst pattern: a cron job that runs OPTIMIZE TABLE ... FINAL on every table nightly. This exists in many production environments because someone added it during initial setup to "keep things clean." It:
- Rewrites terabytes of data nightly for no benefit
- Causes predictable replication lag every night
- Doubles disk usage during the optimization window
- Blocks natural merges during the highest-I/O period
- Masks underlying problems (like high-frequency inserts) by brute-forcing part counts down
If you have this cron job: delete it. Monitor part counts instead. Fix root causes (insert batching, merge settings). Only optimize partitions that actually need it.
How to Check If You Need OPTIMIZE
Before running any optimization, check whether it's actually needed:
```sql
-- Do any tables actually have too many parts?
SELECT database, table, partition_id, count() AS parts
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
HAVING parts > 100
ORDER BY parts DESC;
```

If this returns nothing, you don't need to optimize. Your merge queue is healthy. Let it be.
If it returns results, diagnose the root cause before optimizing:
- Check merge rate vs insert rate
- Check for stuck mutations blocking merges
- Check system.merges for I/O contention
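These checks map directly onto system tables. A sketch of the queries:

```sql
-- Stuck mutations that may be blocking merges
SELECT database, table, mutation_id, create_time, latest_fail_reason
FROM system.mutations
WHERE NOT is_done;

-- Current merges: long elapsed times with low progress suggest I/O contention
SELECT table, round(elapsed) AS elapsed_sec, round(progress * 100, 1) AS pct_done
FROM system.merges
ORDER BY elapsed DESC;
```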
Fix the cause. Then, if part count is still high, optimize the specific partitions that need it.
ClusterSight monitors part count trends and merge queue health. It tells you when optimization is actually needed — and for which specific partitions — so you never have to guess.
Check your cluster's PULSE.
This post is part of the Opinions series — honest takes on ClickHouse operations. Previously: Grafana + Prometheus Is Not ClickHouse Monitoring.
Frequently Asked Questions
Is OPTIMIZE TABLE safe in ClickHouse production?
OPTIMIZE TABLE without FINAL is generally safe — it triggers one round of merges. OPTIMIZE TABLE FINAL rewrites ALL parts into one and should be used cautiously in production. On large tables it consumes massive I/O, blocks other merges, and can cause replication lag. Use partition-level optimization instead.
What does OPTIMIZE TABLE FINAL do in ClickHouse?
OPTIMIZE TABLE FINAL forces ClickHouse to merge ALL active parts in a table into a single part per partition. This rewrites every byte of data, consuming significant disk I/O and memory. During the operation, regular merges for that table are blocked and part count temporarily increases.
What should I use instead of OPTIMIZE TABLE FINAL?
Use OPTIMIZE TABLE database.table PARTITION ID 'partition_id' to optimize one partition at a time. Or let ClickHouse's background merge process handle optimization naturally by tuning merge settings (background_pool_size, max_bytes_to_merge_at_max_space_in_pool).
When is OPTIMIZE TABLE FINAL appropriate?
OPTIMIZE TABLE FINAL is appropriate for: small tables under 1GB, after codec changes that you want applied to existing data, post-migration cleanup on idle tables, and development/staging environments. Never on large production tables during active ingestion.