API Reference

ClusterSight exposes a REST API at /api/v1/. All responses use the ApiResponse envelope unless noted.


Authentication

When CLUSTERSIGHT_PASSWORD is set, all endpoints (except /api/v1/health) require an Authorization header.

Token derivation

The bearer token is the lowercase hex SHA-256 of the plaintext password:

# Derive the token from your password
echo -n "your_password" | sha256sum | cut -d' ' -f1
# Example output (for the password "password"): 5e884898da28047151d0e56f8dc629277...
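
The same derivation can be done without a shell. The following is a minimal Python sketch; the function name derive_token is hypothetical, and it assumes the password is encoded as UTF-8 (which is what echo -n emits on a UTF-8 terminal).

```python
import hashlib


def derive_token(password: str) -> str:
    """Return the lowercase hex SHA-256 digest of the UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()
```

The result is always 64 lowercase hex characters and can be placed directly after "Bearer " in the Authorization header.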

Making authenticated requests

curl -H "Authorization: Bearer <token>" http://localhost:3001/api/v1/clusters

Dev mode (no authentication)

When CLUSTERSIGHT_PASSWORD is not set, no Authorization header is required. A warning is logged on startup (auth_mode: "dev_mode_no_auth").
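
A client can cover both the authenticated and the dev-mode case with one helper. The sketch below uses only the Python standard library and builds the request without sending it; build_request and BASE_URL are illustrative names, not part of ClusterSight, and the URL assumes the default Docker mapping.

```python
from typing import Optional
from urllib.request import Request

# Assumption: default Docker mapping used throughout this page.
BASE_URL = "http://localhost:3001/api/v1"


def build_request(path: str, token: Optional[str] = None) -> Request:
    """Build a GET request; attach the bearer header only when a token is given."""
    headers = {}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return Request(f"{BASE_URL}/{path.lstrip('/')}", headers=headers)
```

Passing token=None matches dev mode (and /health); the returned object can be handed to urllib.request.urlopen to perform the call.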


Base URL & Versioning

All endpoints are prefixed with /api/v1/.

Base URL (default Docker mapping): http://localhost:3001/api/v1

Response envelope

All endpoints return the ApiResponse envelope:

{
  "data": <payload>,
  "meta": { "cache_hit": false },
  "error": null
}

| Field | Type | Description |
| --- | --- | --- |
| data | any | Response payload (object, array, or null) |
| meta | object | Metadata; may include cache_hit: true for cached panel responses |
| error | object or null | Error details on failure; null on success |
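
Because every response shares this shape, a client can unwrap it in one place. The following is a small sketch of that idea; ApiError and unwrap are hypothetical names, and it assumes only what the envelope above states: a non-null error means failure.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the envelope's "error" field is not null (hypothetical name)."""


def unwrap(body: str) -> Any:
    """Parse an ApiResponse envelope and return its "data" payload."""
    envelope = json.loads(body)
    if envelope.get("error") is not None:
        # Surface the error object unchanged so callers can inspect it.
        raise ApiError(envelope["error"])
    return envelope.get("data")
```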

Endpoints Overview

| Method | Path | Description | Auth |
| --- | --- | --- | --- |
| GET | /health | Health check + version | No |
| GET | /overview | All clusters with health + alert summary | Yes |
| GET | /clusters | List configured clusters | Yes |
| POST | /clusters | Add a new cluster | Yes |
| POST | /clusters/test | Test connection & discover topology | Yes |
| GET | /clusters/{id} | Get a single cluster | Yes |
| PUT | /clusters/{id} | Update cluster configuration | Yes |
| DELETE | /clusters/{id} | Remove a cluster | Yes |
| GET | /panels/health-score | Cluster health score + components | Yes |
| GET | /panels/replication | Replication lag time-series | Yes |
| GET | /panels/merges | Active merges time-series | Yes |
| GET | /panels/disk | Disk usage per disk | Yes |
| GET | /panels/mutations | Active mutations | Yes |
| GET | /panels/broken-parts | Broken/detached parts count | Yes |
| GET | /panels/keeper | Keeper connection health | Yes |
| GET | /panels/keeper-nodes | Per-node Keeper TCP status | Yes |
| GET | /panels/zookeeper | ZooKeeper session health | Yes |
| GET | /panels/compression | Per-table compression ratios | Yes |
| GET | /panels/errors | Error event time-series | Yes |
| GET | /alerts/rules/metrics | Valid metric keys for custom rules | Yes |
| GET | /alerts/rules | List alert rules | Yes |
| POST | /alerts/rules | Create a custom alert rule | Yes |
| PUT | /alerts/rules/{id} | Update an alert rule | Yes |
| DELETE | /alerts/rules/{id} | Delete a custom alert rule | Yes |
| GET | /alerts/history | Alert history with filters | Yes |
| POST | /alerts/{id}/acknowledge | Acknowledge an alert | Yes |
| POST | /alerts/{id}/snooze | Snooze an alert | Yes |
| POST | /alerts/{id}/escalate | Escalate alert to Slack | Yes |
| GET | /alerts/summary | Unresolved alert count | Yes |
| GET | /alerts/{id} | Single alert detail | Yes |
| GET | /queries/slow | Slow queries analysis | Yes |
| POST | /queries/explain | Fetch EXPLAIN for a query | Yes |
| GET | /queries/failed | Failed queries breakdown | Yes |
| GET | /queries/parts-distribution | Parts size distribution | Yes |
| GET | /settings | Application settings | Yes |
| PUT | /settings | Update application settings | Yes |
| POST | /settings/test-notification | Send a test Slack notification | Yes |

Health

GET /api/v1/health

Health check. Does not require authentication. Safe to use as a liveness probe.

Response:

{
  "data": { "version": "0.1.0" },
  "meta": {},
  "error": null
}

curl:

curl http://localhost:3001/api/v1/health

Cluster Overview

GET /api/v1/overview

Returns a summary of all active clusters: health score, alert counts, collector status. Powers the Cluster Overview page.

Response: Array of ClusterOverviewItem inside data.

{
  "data": [
    {
      "id": "abc123",
      "name": "production",
      "host": "clickhouse.internal",
      "port": 8123,
      "health_score": 87.4,
      "grade": "B+",
      "active_alert_count": 1,
      "collector_status": "online",
      "last_collected_at": "2026-03-22T14:00:00Z",
      "logical_cluster_count": 3
    }
  ],
  "meta": {},
  "error": null
}
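
As an illustration of consuming this payload, the sketch below picks out clusters that may need attention. The function name and the 80.0 score threshold are assumptions chosen for the example, not ClusterSight defaults, and it assumes health_score and active_alert_count are always present as in the sample above.

```python
def needs_attention(items: list, min_score: float = 80.0) -> list:
    """Return names of clusters with active alerts or a score below min_score."""
    return [
        item["name"]
        for item in items
        if item["active_alert_count"] > 0 or item["health_score"] < min_score
    ]
```

With the sample response above, the "production" cluster is returned because it has one active alert, even though its score is above the threshold.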

Clusters

GET /api/v1/clusters

List all configured clusters.

curl:

curl -H "Authorization: Bearer <token>" http://localhost:3001/api/v1/clusters

Response: Array of ClusterResponse objects.


POST /api/v1/clusters

Add a new cluster.

Request body:

{
  "host": "clickhouse.internal",
  "port": 8123,
  "username": "clustersight_ro",
  "password": "readonly_password",
  "name": "production"
}

Response: ClusterResponse with assigned id.


POST /api/v1/clusters/test

Test a connection before saving. Returns topology information (discovered clusters, Keeper nodes).

Request body: Same as POST /clusters.

Response:

{
  "data": {
    "success": true,
    "message": "Connected successfully",
    "clusters": ["cluster1", "cluster2"],
    "keeper_nodes": ["keeper1:9181", "keeper2:9181"]
  },
  "meta": {},
  "error": null
}

GET /api/v1/clusters/{id}

Get a single cluster by ID.


PUT /api/v1/clusters/{id}

Update cluster configuration (host, port, credentials, name).


DELETE /api/v1/clusters/{id}

Soft-delete a cluster. The cluster is marked inactive; its data is retained.

Response:

{ "data": { "id": "abc123", "deleted": true }, "meta": {}, "error": null }

Dashboard Panels

Each panel is a dedicated endpoint. Panel data is cached for CACHE_TTL seconds (default: 30).

GET /api/v1/panels/health-score

Returns the current health score, grade, trend, and per-component breakdown.

Response:

{
  "data": {
    "overall_score": 87.4,
    "grade": "B+",
    "trend": "stable",
    "previous_score": 85.0,
    "components": {
      "replication": { "score": 100.0, "label": "Replication" },
      "storage":     { "score": 82.0,  "label": "Storage" },
      "errors":      { "score": 95.0,  "label": "Errors" },
      "infrastructure": { "score": 80.0, "label": "Infrastructure" },
      "queries":     { "score": 70.0,  "label": "Queries" }
    },
    "calculated_at": "2026-03-22T14:00:00Z"
  },
  "meta": { "cache_hit": false },
  "error": null
}

No ?range parameter — returns the most recently computed score.
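
A common use of the component breakdown is to find what is dragging the overall score down. The following is a minimal sketch over the "data" object shown above; weakest_component is a hypothetical helper name.

```python
from typing import Tuple


def weakest_component(data: dict) -> Tuple[str, float]:
    """Return (key, score) of the lowest-scoring entry in data["components"]."""
    components = data["components"]
    key = min(components, key=lambda name: components[name]["score"])
    return key, components[key]["score"]
```

Applied to the sample response, this yields the "queries" component with a score of 70.0.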


GET /api/v1/panels/replication

Query params: ?range=1h|6h|24h|7d (default: 1h)

Returns replication lag time-series per replicated table.


GET /api/v1/panels/merges

Query params: ?range=1h|6h|24h|7d (default: 1h)

Returns active merge counts and estimated completion times.


GET /api/v1/panels/disk

No range param. Returns current disk usage per disk path (used bytes, free bytes, usage %).


GET /api/v1/panels/mutations

No range param. Returns active mutations with estimated parts remaining.


GET /api/v1/panels/broken-parts

No range param. Returns count of detached/broken parts per table.


GET /api/v1/panels/keeper

No range param. Returns Keeper connection status (ONLINE / OFFLINE).


GET /api/v1/panels/keeper-nodes

No range param. Returns per-node TCP health status (status: 1 = reachable, 0 = unreachable).


GET /api/v1/panels/zookeeper

No range param. Returns ZooKeeper/Keeper session health metrics.


GET /api/v1/panels/compression

No range param. Returns per-table compression ratios (compressed / uncompressed bytes).


GET /api/v1/panels/errors

Query params: ?range=<seconds> (integer, 60–604800, default: 3600)

Note: Unlike replication and merges, the errors panel uses an integer seconds range, not the 1h|6h|24h|7d enum.

Returns error event count time-series from system.errors.
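
Since the range parameter has two different formats, client code may want to validate it per panel before sending. This is a sketch under the rules stated on this page only; range_param and ENUM_RANGES are hypothetical names, and how the server actually rejects bad values is not covered here.

```python
ENUM_RANGES = ("1h", "6h", "24h", "7d")


def range_param(panel: str, value) -> str:
    """Validate a ?range value for the given panel and return it as a string."""
    if panel == "errors":
        seconds = int(value)  # raises ValueError for inputs such as "6h"
        if not 60 <= seconds <= 604800:
            raise ValueError(f"errors range must be 60-604800 seconds, got {seconds}")
        return str(seconds)
    if panel in ("replication", "merges"):
        if value not in ENUM_RANGES:
            raise ValueError(f"{panel} range must be one of {ENUM_RANGES}, got {value!r}")
        return value
    # Every other panel documented here takes no range parameter.
    raise ValueError(f"panel {panel!r} takes no range parameter")
```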

curl:

# Last 6 hours (21600 seconds)
curl -H "Authorization: Bearer <token>" \
  "http://localhost:3001/api/v1/panels/errors?range=21600"

Alerts

GET /api/v1/alerts/rules

List all alert rules (built-in and custom).

curl:

curl -H "Authorization: Bearer <token>" http://localhost:3001/api/v1/alerts/rules

POST /api/v1/alerts/rules

Create a custom alert rule.

Request body:

{
  "name": "My Custom Rule",
  "metric_key": "disk.usage_pct",
  "operator": ">",
  "threshold": 70.0,
  "severity": "warning",
  "cooldown_sec": 3600
}

Get valid metric keys: GET /api/v1/alerts/rules/metrics
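
For clients that are not using curl, the request can be assembled as follows. This is a standard-library sketch that only builds the request object; create_rule_request is a hypothetical helper, and base_url and token are supplied by the caller.

```python
import json
from urllib.request import Request


def create_rule_request(base_url: str, token: str, rule: dict) -> Request:
    """Build (but do not send) a POST /alerts/rules request with a JSON body."""
    return Request(
        f"{base_url}/alerts/rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is a matter of passing it to urllib.request.urlopen; the metric_key should be one of the values returned by the metrics endpoint above.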


PUT /api/v1/alerts/rules/{id}

Update an existing rule. Built-in rules can have their threshold, severity, and cooldown updated but cannot be deleted.


DELETE /api/v1/alerts/rules/{id}

Delete a custom rule. Returns {"id": "...", "deleted": true}. Built-in rules (rule_type: "built_in") cannot be deleted.


GET /api/v1/alerts/history

List alert history with optional filters.

Query params:

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| severity | string | (all) | Filter by warning or critical |
| status | string | (all) | Filter by active, acknowledged, or resolved |
| hours | int (1–168) | 24 | Lookback window in hours |
| limit | int (1–200) | 50 | Maximum results to return |
| offset | int (≥ 0) | 0 | Pagination offset |
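
The parameter constraints above can be checked client-side before a request is made. The sketch below builds the path and query string with the documented defaults and ranges; history_query is a hypothetical helper, and it does not contact the server.

```python
from typing import Optional
from urllib.parse import urlencode


def history_query(severity: Optional[str] = None, status: Optional[str] = None,
                  hours: int = 24, limit: int = 50, offset: int = 0) -> str:
    """Return the /alerts/history path with a validated query string."""
    if severity not in (None, "warning", "critical"):
        raise ValueError(f"invalid severity: {severity!r}")
    if status not in (None, "active", "acknowledged", "resolved"):
        raise ValueError(f"invalid status: {status!r}")
    if not 1 <= hours <= 168:
        raise ValueError("hours must be between 1 and 168")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    params = {"hours": hours, "limit": limit, "offset": offset}
    # Omit the filters entirely when unset so the server returns all values.
    if severity is not None:
        params["severity"] = severity
    if status is not None:
        params["status"] = status
    return "/alerts/history?" + urlencode(params)
```

To page through results, call it repeatedly while increasing offset by limit.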

curl:

# Active critical alerts in the last 24 hours
curl -H "Authorization: Bearer <token>" \
  "http://localhost:3001/api/v1/alerts/history?status=active&severity=critical&hours=24"

POST /api/v1/alerts/{id}/acknowledge

Mark an alert as acknowledged. Sets status to acknowledged.


POST /api/v1/alerts/{id}/snooze

Snooze an alert for a specified duration.

Request body:

{ "duration_minutes": 60 }

Response: { "snoozed_until": "<ISO timestamp>", "rule_id": "..." }


POST /api/v1/alerts/{id}/escalate

Escalate an alert to Slack immediately, regardless of notification cooldown.


GET /api/v1/alerts/summary

Returns the count of currently unresolved alerts.

Response:

{ "data": { "count": 3 }, "meta": {}, "error": null }

GET /api/v1/alerts/{id}

Get full detail for a single alert, including metric history and fix command.


Query Inspector

GET /api/v1/queries/slow

Returns the slowest queries from system.query_log.

Query params: ?hours=<1–168> (default: 24), ?limit=<1–200> (default: 50)

curl:

# Top 20 slowest queries in the last 6 hours
curl -H "Authorization: Bearer <token>" \
  "http://localhost:3001/api/v1/queries/slow?hours=6&limit=20"

POST /api/v1/queries/explain

Fetch the EXPLAIN output for a specific query by its hash.

Request body:

{ "query_hash": "abc123def456..." }

GET /api/v1/queries/failed

Returns failed queries grouped by error type.

Query params: ?hours=<1–168> (default: 24), ?limit=<1–200> (default: 50)


GET /api/v1/queries/parts-distribution

Returns parts size distribution across tables.

Query params: ?limit=<1–500> (default: 100)


Settings

GET /api/v1/settings

Get current application settings. The slack_webhook_url field is masked in the response.


PUT /api/v1/settings

Update application settings.

Request body (all fields optional):

{
  "slack_webhook_url": "https://hooks.slack.com/services/...",
  "collection_interval_sec": 30,
  "tier1_enabled": true,
  "tier2_enabled": true,
  "retention_raw_days": 7,
  "retention_hourly_days": 30,
  "retention_daily_days": 365
}
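
Because every field is optional, a client can send only the settings it wants to change. The sketch below serializes such a partial body and rejects field names not listed above; settings_body and SETTINGS_FIELDS are hypothetical names, and it is an assumption (not stated on this page) that omitted fields keep their current values.

```python
import json

# Field names taken from the request body documented above.
SETTINGS_FIELDS = frozenset({
    "slack_webhook_url",
    "collection_interval_sec",
    "tier1_enabled",
    "tier2_enabled",
    "retention_raw_days",
    "retention_hourly_days",
    "retention_daily_days",
})


def settings_body(**changes) -> str:
    """Serialize a partial settings update, rejecting unknown field names."""
    unknown = sorted(set(changes) - SETTINGS_FIELDS)
    if unknown:
        raise ValueError(f"unknown settings fields: {unknown}")
    return json.dumps(changes, sort_keys=True)
```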

POST /api/v1/settings/test-notification

Send a test Slack notification using the configured webhook URL.

Response: { "data": { "sent": true }, "meta": {}, "error": null }


Error Codes

| HTTP Status | Meaning in ClusterSight |
| --- | --- |
| 401 Unauthorized | Missing or invalid Authorization header. Provide Bearer <sha256-token> or disable password protection. |
| 404 Not Found | The requested resource (cluster, alert, rule) does not exist or has been deleted. |
| 409 Conflict | Duplicate resource, e.g., a cluster with the same host:port already exists. |
| 422 Unprocessable Entity | Validation error. Response body contains field-level details: { "error": { "detail": [...] } } |
| 503 Service Unavailable | ClickHouse is unreachable. The panel or query endpoint could not connect to the cluster. Verify ClickHouse is running and accessible. |
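
Client code can translate these statuses into short, actionable messages. The following is a sketch of one way to do that; describe_status, is_retryable, and the message wording are illustrative, and treating 503 as the only retryable status is an assumption based on its description above.

```python
STATUS_HINTS = {
    401: "Missing or invalid Authorization header",
    404: "Resource does not exist or has been deleted",
    409: "Duplicate resource",
    422: "Validation error; see error.detail in the response body",
    503: "ClickHouse is unreachable",
}


def describe_status(status: int) -> str:
    """Map an HTTP status code to a short hint based on the table above."""
    if 200 <= status < 300:
        return "ok"
    return STATUS_HINTS.get(status, f"unexpected HTTP {status}")


def is_retryable(status: int) -> bool:
    """Assumption: only 503 is worth retrying, since the cluster may come back."""
    return status == 503
```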

curl Examples

List all clusters

TOKEN=$(echo -n "your_password" | sha256sum | cut -d' ' -f1)
 
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:3001/api/v1/clusters

Get health score

curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:3001/api/v1/panels/health-score

List active alerts (last 24 hours)

curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:3001/api/v1/alerts/history?status=active&hours=24"

Add a cluster

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"host":"clickhouse.internal","port":8123,"username":"clustersight_ro","password":"pw","name":"prod"}' \
  http://localhost:3001/api/v1/clusters

Acknowledge an alert

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  http://localhost:3001/api/v1/alerts/<alert-id>/acknowledge