2026-05-18 · 10 min read

system.query_log: The 15 Fields That Actually Matter for Performance

system.query_log has 60+ columns. Reading all of them for every slow query investigation is like reading an entire phone book to find one person. These 15 fields diagnose 95% of query performance issues. Learn what each one means operationally and you'll never waste time scrolling through irrelevant columns again.

The Quick Start: Two Queries That Solve Most Problems

Before the deep dive, here are the two queries you'll use 80% of the time:

Slowest Queries in the Last Hour

SELECT
    query_id,
    user,
    query_duration_ms,
    formatReadableSize(read_bytes) AS data_read,
    read_rows,
    formatReadableSize(memory_usage) AS memory,
    result_rows,
    substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
  AND query_kind = 'Select'
ORDER BY query_duration_ms DESC
LIMIT 20;

Most Resource-Intensive Queries (Aggregated)

SELECT
    normalized_query_hash,
    count() AS executions,
    round(avg(query_duration_ms), 0) AS avg_ms,
    round(max(query_duration_ms), 0) AS max_ms,
    formatReadableSize(avg(read_bytes)) AS avg_read,
    formatReadableSize(avg(memory_usage)) AS avg_memory,
    substring(any(query), 1, 150) AS example_query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 24 HOUR
  AND query_kind = 'Select'
GROUP BY normalized_query_hash
ORDER BY avg_ms * executions DESC
LIMIT 20;

This second query is more powerful — it groups identical queries by their normalized hash and shows which query patterns consume the most total resources. A query that takes 500ms but runs 10,000 times/day is more impactful than a query that takes 30 seconds but runs once.

The 15 Fields That Matter

1. query_duration_ms

What it is: Total wall-clock time for the query in milliseconds.

What it means operationally: The primary performance metric. But duration alone doesn't tell you why a query is slow. A 10-second query that reads 50GB of data isn't broken — it's doing a lot of work. A 10-second query that reads 5MB of data has a problem.

Always pair with: read_bytes and read_rows to determine if duration is proportional to work.
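
One way to make that pairing concrete is to compute read throughput alongside duration: a query that is slow but has a low bytes-per-second rate is probably waiting, not working. A sketch — the duration and size thresholds are arbitrary starting points, tune them to your workload:

-- Slow queries whose duration is not explained by the work they did
SELECT
    query_id,
    query_duration_ms,
    formatReadableSize(read_bytes) AS data_read,
    round(read_bytes / greatest(query_duration_ms, 1) * 1000 / 1048576, 1) AS mb_per_sec,
    substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
  AND query_duration_ms > 5000        -- slow...
  AND read_bytes < 100 * 1048576      -- ...but read under 100MB
ORDER BY query_duration_ms DESC
LIMIT 20;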

2. read_rows

What it is: Total rows read from storage to serve this query.

What it means operationally: The most important diagnostic field. If read_rows is much larger than result_rows, the query is scanning far more data than it returns. This usually means:

  • The WHERE clause doesn't align with the primary key
  • A GROUP BY on high-cardinality columns
  • Missing skip indexes

-- Queries with worst scan-to-result ratio
SELECT
    query_id,
    read_rows,
    result_rows,
    round(read_rows / greatest(result_rows, 1), 0) AS amplification,
    query_duration_ms,
    substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
  AND result_rows > 0
ORDER BY amplification DESC
LIMIT 20;

An amplification above 1000x is usually fixable. See the query performance guide for optimization strategies.

3. read_bytes

What it is: Total uncompressed bytes read from storage to serve this query (the on-disk compressed I/O is smaller; ProfileEvents has the disk-level counters).

What it means operationally: Shows the actual I/O cost. Compare with read_rows — if read_bytes is huge but read_rows is modest, you're reading too many wide columns. If both are huge, you're scanning too many rows.
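
The "too many wide columns" case can be surfaced directly by computing bytes per row. A sketch — the row-count floor is an arbitrary cutoff to skip trivial queries:

-- Queries reading unusually many bytes per row (likely too many wide columns)
SELECT
    query_id,
    read_rows,
    formatReadableSize(read_bytes) AS data_read,
    round(read_bytes / greatest(read_rows, 1), 0) AS bytes_per_row,
    substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
  AND read_rows > 1000000
ORDER BY bytes_per_row DESC
LIMIT 20;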

4. memory_usage

What it is: Total memory used by this query.

What it means operationally: High memory queries risk OOM errors. Monitor this to find queries that need max_memory_usage limits or query restructuring.

-- Memory-heavy queries that could cause OOM
SELECT
    query_id,
    user,
    formatReadableSize(memory_usage) AS memory,
    formatReadableSize(peak_memory_usage) AS peak_memory,
    query_duration_ms,
    substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 24 HOUR
  AND memory_usage > 1073741824  -- > 1GB
ORDER BY memory_usage DESC
LIMIT 20;

5. peak_memory_usage

What it is: The maximum memory used at any point during query execution.

What it means operationally: memory_usage is cumulative. peak_memory_usage shows the actual high-water mark. A query might use 2GB total but peak at 500MB — or peak at 4GB and nearly OOM. The peak determines whether the query will trigger memory limits.

6. result_rows

What it is: Number of rows returned to the client.

What it means operationally: Compare with read_rows to calculate scan amplification. If result_rows is very high (millions), the client might be pulling too much data — consider server-side aggregation.
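
To find clients pulling oversized result sets, sort by result_rows directly. A sketch — the one-million-row threshold is an assumption, adjust it to what "too much" means for your clients:

-- Queries returning very large result sets to clients
SELECT
    query_id,
    user,
    result_rows,
    formatReadableSize(result_bytes) AS result_size,
    substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 24 HOUR
  AND result_rows > 1000000
ORDER BY result_rows DESC
LIMIT 20;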

7. written_rows / written_bytes

What they are: Rows and bytes written (for INSERT, CREATE TABLE AS SELECT, materialized views).

What they mean operationally: For insert monitoring, these show actual data volume ingested. For materialized views, they show the write amplification — how much data the MV generates per insert.
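
A per-table ingest summary can be built from these fields. A sketch, assuming your inserts show up with query_kind = 'Insert':

-- Insert volume by table over the last 24 hours
SELECT
    arrayJoin(tables) AS table_name,
    count() AS inserts,
    sum(written_rows) AS rows_written,
    formatReadableSize(sum(written_bytes)) AS bytes_written
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 24 HOUR
GROUP BY table_name
ORDER BY rows_written DESC
LIMIT 15;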

8. type

What it is: Query lifecycle event type. Values: QueryStart, QueryFinish, ExceptionBeforeStart, ExceptionWhileProcessing.

What it means operationally: Always filter by type in your queries:

  • QueryFinish — successful queries with full resource metrics
  • ExceptionWhileProcessing — failed queries with the error
  • ExceptionBeforeStart — queries that failed before execution (syntax errors, permission denied)

-- Error rate by type
SELECT
    type,
    count() AS count,
    round(count() / (SELECT count() FROM system.query_log
        WHERE event_time > now() - INTERVAL 1 HOUR) * 100, 2) AS pct
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY type
ORDER BY count DESC;

9. exception / exception_code

What they are: Error message and code for failed queries.

What they mean operationally: Group by exception_code to find systematic errors:

-- Most common query errors
SELECT
    exception_code,
    substring(any(exception), 1, 150) AS example_error,
    count() AS occurrences,
    uniqExact(user) AS affected_users
FROM system.query_log
WHERE type IN ('ExceptionBeforeStart', 'ExceptionWhileProcessing')
  AND event_time > now() - INTERVAL 24 HOUR
GROUP BY exception_code
ORDER BY occurrences DESC
LIMIT 15;

Common codes to watch: 241 (memory limit), 252 (too many parts), 159 (timeout), 60 (table doesn't exist).

10. tables

What it is: Array of tables accessed by the query.

What it means operationally: Identify which tables generate the most load:

-- Most queried tables
SELECT
    arrayJoin(tables) AS table_name,
    count() AS query_count,
    round(avg(query_duration_ms), 0) AS avg_ms,
    formatReadableSize(avg(read_bytes)) AS avg_read
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 24 HOUR
GROUP BY table_name
ORDER BY query_count DESC
LIMIT 15;

11. query_kind

What it is: Select, Insert, Create, Alter, System, etc.

What it means operationally: Filter noise. Most performance investigations care about Select. Insert monitoring cares about Insert. System commands (OPTIMIZE, SYSTEM SYNC REPLICA) show up as System.
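
A breakdown by kind gives a quick picture of what the cluster actually spends its time on — a minimal sketch:

-- Query mix over the last 24 hours
SELECT
    query_kind,
    count() AS queries,
    round(avg(query_duration_ms), 0) AS avg_ms,
    formatReadableSize(sum(read_bytes)) AS total_read
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 24 HOUR
GROUP BY query_kind
ORDER BY queries DESC;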

12. is_initial_query

What it is: Boolean — whether this query was initiated directly by a client (vs. being a distributed sub-query).

What it means operationally: On a multi-shard cluster, one SELECT generates sub-queries on each shard. is_initial_query = 1 gives you client-facing queries only. Without this filter, you'll count each query N times (once per shard).
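
You can measure how much the sub-queries inflate your counts by comparing totals with and without the filter — a sketch:

-- Client-facing vs. total query counts
SELECT
    countIf(is_initial_query) AS client_queries,
    count() AS all_queries,
    round(count() / greatest(countIf(is_initial_query), 1), 1) AS inflation_factor
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR;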

13. user

What it is: The ClickHouse user that ran the query.

What it means operationally: Identify which application or team generates the most load. Essential for cost attribution and noisy-neighbor detection.
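
A per-user load summary is a useful starting point for attribution — a sketch you can extend with error counts or memory columns:

-- Load by user over the last 24 hours
SELECT
    user,
    count() AS queries,
    round(sum(query_duration_ms) / 1000, 0) AS total_seconds,
    formatReadableSize(sum(read_bytes)) AS total_read,
    formatReadableSize(max(memory_usage)) AS max_memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 24 HOUR
GROUP BY user
ORDER BY total_seconds DESC
LIMIT 15;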

14. normalized_query_hash

What it is: A hash of the query with literals replaced by placeholders.

What it means operationally: Groups identical query patterns. SELECT * FROM events WHERE user_id = '123' and SELECT * FROM events WHERE user_id = '456' get the same hash. This is how you find the most expensive query patterns, not individual executions.
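
You can check this behavior yourself with the normalizedQueryHash function, which applies the same literal-stripping normalization; the two calls below should produce the same hash:

-- Two queries differing only in literals normalize to the same hash
SELECT
    normalizedQueryHash('SELECT * FROM events WHERE user_id = \'123\'') =
    normalizedQueryHash('SELECT * FROM events WHERE user_id = \'456\'') AS same_hash;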

15. ProfileEvents

What it is: A Map of detailed execution metrics — hundreds of counters covering I/O, network, cache hits, marks read, and more.

What it means operationally: The deep diagnostic tool. When the basic fields don't explain why a query is slow:

-- Deep dive into a specific slow query
SELECT
    ProfileEvents['SelectedMarks'] AS marks_selected,
    ProfileEvents['SelectedRanges'] AS ranges_selected,
    ProfileEvents['FileOpen'] AS files_opened,
    ProfileEvents['ReadBufferFromFileDescriptorReadBytes'] AS disk_read_bytes,
    ProfileEvents['NetworkSendBytes'] AS network_sent,
    ProfileEvents['RealTimeMicroseconds'] AS wall_time_us,
    ProfileEvents['UserTimeMicroseconds'] AS cpu_time_us
FROM system.query_log
WHERE query_id = 'your-query-id-here'
  AND type = 'QueryFinish';

Key ProfileEvents to check:

  • SelectedMarks — how many granules ClickHouse had to read (lower is better)
  • FileOpen — an excessive number of file opens means the query touched too many parts (see 312 Parts Per Partition)
  • RealTimeMicroseconds vs UserTimeMicroseconds — if real time >> user time, the query is waiting on I/O, not CPU
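
That real-vs-user-time comparison can be turned into a scan for wait-bound queries. A rough sketch — the 3x ratio is an arbitrary cutoff, and note that both counters accumulate across threads, so treat the ratio as a heuristic rather than a proof:

-- Queries spending most of their time waiting rather than computing
SELECT
    query_id,
    query_duration_ms,
    round(ProfileEvents['RealTimeMicroseconds'] /
          greatest(ProfileEvents['UserTimeMicroseconds'], 1), 1) AS wait_ratio,
    substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
  AND query_duration_ms > 1000
  AND ProfileEvents['RealTimeMicroseconds'] >
      ProfileEvents['UserTimeMicroseconds'] * 3
ORDER BY wait_ratio DESC
LIMIT 20;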

Operational Patterns

Pattern: Detecting Query Regressions After Deploys

-- Compare query performance between two time windows
SELECT
    normalized_query_hash,
    substring(any(query), 1, 100) AS query_pattern,
    avgIf(query_duration_ms, event_time < now() - INTERVAL 2 HOUR) AS avg_ms_before,
    avgIf(query_duration_ms, event_time > now() - INTERVAL 1 HOUR) AS avg_ms_after,
    round(avg_ms_after / greatest(avg_ms_before, 1), 2) AS slowdown_factor
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 3 HOUR
  AND query_kind = 'Select'
GROUP BY normalized_query_hash
HAVING count() > 10
   AND avg_ms_after > avg_ms_before * 1.5
ORDER BY slowdown_factor DESC
LIMIT 15;

A slowdown_factor above 2x after a deploy means something changed. This catches regressions that individual query monitoring misses.

Pattern: Finding Queries That Should Be Materialized Views

-- Frequently repeated heavy queries — candidates for materialization
SELECT
    normalized_query_hash,
    count() AS executions,
    round(avg(query_duration_ms), 0) AS avg_ms,
    formatReadableSize(sum(read_bytes)) AS total_read,
    round(sum(query_duration_ms) / 1000, 0) AS total_seconds_spent,
    substring(any(query), 1, 150) AS query_pattern
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 7 DAY
  AND query_kind = 'Select'
  AND query_duration_ms > 1000  -- > 1 second
GROUP BY normalized_query_hash
HAVING executions > 50
ORDER BY total_seconds_spent DESC
LIMIT 10;

Queries running 50+ times with > 1 second each are prime candidates for materialized views or pre-aggregation.

How ClusterSight Uses query_log

ClusterSight analyzes system.query_log as part of the L (Latency) dimension in the PULSE Framework:

  • Slow query trending — tracks p50/p95/p99 query duration over time, alerts on regression
  • Error rate monitoring — groups exceptions by code and alerts on spikes
  • Query pattern analysis — identifies the top resource-consuming query patterns using normalized_query_hash
  • Memory pressure detection — flags queries approaching memory limits before they OOM
  • Post-deploy regression detection — automatically compares query performance before and after deployments

These feed into the L score. When latency degrades, ClusterSight shows which query patterns changed and when.

Check your cluster's PULSE.


This post is part of the ClickHouse at Scale series — authoritative deep dives on ClickHouse system tables. Previously: The Complete Guide to system.replicas and system.merges Decoded.


Frequently Asked Questions

What is system.query_log in ClickHouse?

system.query_log is a system table that records every query executed on the ClickHouse server. It includes timing, resource usage, read statistics, error information, and execution metadata. It's the primary tool for diagnosing query performance issues.

How do I find slow queries in ClickHouse?

Query SELECT query, query_duration_ms, read_rows, memory_usage FROM system.query_log WHERE type = 'QueryFinish' AND query_duration_ms > 5000 AND event_time > now() - INTERVAL 1 HOUR ORDER BY query_duration_ms DESC LIMIT 20. This shows queries over 5 seconds in the last hour.

What are the most important columns in system.query_log?

The 15 most important fields are: query_duration_ms, read_rows, read_bytes, memory_usage, peak_memory_usage, result_rows, written_rows/written_bytes, type, exception/exception_code, tables, query_kind, is_initial_query, user, normalized_query_hash, and ProfileEvents (for granular metrics).

How long does ClickHouse keep query_log data?

Recent ClickHouse versions apply a 30-day TTL to query_log by default, configurable in the <query_log> section of the server configuration (where you can set a custom TTL or table engine). On busy clusters, query_log can still grow very large — consider reducing retention or skipping lightweight queries via the log_queries_min_query_duration_ms setting.
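
Before tuning retention, it helps to know how much disk query_log itself occupies — a sketch using system.parts:

-- Disk footprint of the query_log table
SELECT
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
    sum(rows) AS total_rows,
    min(min_date) AS oldest_day
FROM system.parts
WHERE database = 'system'
  AND table = 'query_log'
  AND active;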