## Overview

LangSight exposes two complementary monitoring surfaces:

- **Prometheus `/metrics` endpoint** — pull-based metrics for infrastructure dashboards and alerting rules
- **SSE live event feed** (`GET /api/live/events`) — push-based real-time events for dashboard UIs and custom integrations

Use Prometheus + Grafana for long-term monitoring, capacity planning, and SLO-based alerting. Use the SSE feed for instant UI updates and event-driven automation.
## Prometheus Metrics

### Available metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `langsight_http_requests_total` | Counter | `method`, `path`, `status` | Total HTTP requests processed by the API |
| `langsight_http_request_duration_seconds` | Histogram | `method`, `path` | Request duration with buckets: 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s |
| `langsight_spans_ingested_total` | Counter | — | Total tool call spans ingested via `/api/traces/spans` and `/api/traces/otlp` |
| `langsight_active_sse_connections` | Gauge | — | Number of currently connected SSE live feed clients |
| `langsight_health_checks_total` | Counter | `server`, `status` | Total MCP health checks performed, labeled by server name and result status |
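As a rough sketch of how metrics like these are declared, here is an illustrative version using the official `prometheus_client` Python library. The metric names and buckets mirror the table above, but this is an assumption about the shape of the instrumentation, not LangSight's actual code:

```python
from prometheus_client import Counter, Gauge, Histogram, generate_latest

# Declarations mirroring the metrics table above (illustrative sketch)
HTTP_REQUESTS = Counter(
    "langsight_http_requests_total",
    "Total HTTP requests processed by the API",
    ["method", "path", "status"],
)
HTTP_DURATION = Histogram(
    "langsight_http_request_duration_seconds",
    "Request duration",
    ["method", "path"],
    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0),
)
SPANS_INGESTED = Counter(
    "langsight_spans_ingested_total", "Total tool call spans ingested"
)
SSE_CONNECTIONS = Gauge(
    "langsight_active_sse_connections", "Connected SSE live feed clients"
)

# Record one request, then render the text exposition format
HTTP_REQUESTS.labels(method="GET", path="/api/health/servers", status="200").inc()
print(generate_latest().decode())
```

`generate_latest()` renders exactly the format that `/metrics` serves, which makes it easy to unit-test label sets before wiring metrics into middleware.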
### Path normalization

The `path` label in HTTP metrics uses normalized paths to keep cardinality bounded. UUIDs and long hex identifiers are collapsed to `{id}`:

```
/api/agents/sessions/abc123def456   -->  /api/agents/sessions/{id}
/api/projects/proj-xyz-789/members  -->  /api/projects/{id}/members
```

High-frequency internal paths (`/metrics`, `/api/liveness`, `/api/readiness`) are excluded from instrumentation entirely.
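The collapsing rule can be sketched as a small normalizer. This is a hypothetical reimplementation for illustration: LangSight's actual heuristic is internal and evidently also handles IDs like `proj-xyz-789` that are not pure hex, which this sketch does not:

```python
import re

# Segments that look like IDs are replaced with a fixed placeholder so the
# Prometheus `path` label keeps a small, bounded set of values.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
HEX_RE = re.compile(r"^[0-9a-fA-F]{8,}$")  # long hex identifiers

def normalize_path(path: str) -> str:
    """Collapse ID-like path segments to {id} to bound label cardinality."""
    return "/".join(
        "{id}" if UUID_RE.match(seg) or HEX_RE.match(seg) else seg
        for seg in path.split("/")
    )

print(normalize_path("/api/agents/sessions/abc123def456"))
# /api/agents/sessions/{id}
```

Without this step, every session or project ID would create a new label value, and Prometheus memory usage would grow with traffic.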
### Authentication

The `/metrics` endpoint requires no authentication. Prometheus scrapers can reach it directly without API keys. Access control should be enforced at the network level (firewall rules, Docker internal network, reverse proxy ACLs).
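One way to enforce that boundary with a Docker internal network is sketched below. The service and network names here are hypothetical, not taken from LangSight's shipped Compose file:

```yaml
# docker-compose.yml (sketch): only containers attached to the shared
# "monitoring" network can reach the API's metrics port
services:
  api:
    networks: [default, monitoring]
  prometheus:
    networks: [monitoring]

networks:
  monitoring:
    internal: true   # no routing to or from outside this Compose network
```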
### Scrape configuration

Add the following job to your `prometheus.yml`:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: langsight
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8000"]
    metrics_path: /metrics
```
If running inside Docker Compose, use the service name:

```yaml
scrape_configs:
  - job_name: langsight
    scrape_interval: 15s
    static_configs:
      - targets: ["api:8000"]
    metrics_path: /metrics
```
### Verify the endpoint

```bash
curl http://localhost:8000/metrics
```

You should see Prometheus text exposition format output:

```
# HELP langsight_http_requests_total Total HTTP requests
# TYPE langsight_http_requests_total counter
langsight_http_requests_total{method="GET",path="/api/health/servers",status="200"} 42.0
# HELP langsight_spans_ingested_total Total tool call spans ingested
# TYPE langsight_spans_ingested_total counter
langsight_spans_ingested_total 1337.0
...
```
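To check the output programmatically rather than by eye, the exposition format can be parsed with `prometheus_client`'s bundled parser. This sketch parses an inlined sample rather than fetching from a live endpoint, so it runs standalone:

```python
from prometheus_client.parser import text_string_to_metric_families

# A small inline sample in the text exposition format, as served by /metrics
SAMPLE = """\
# HELP langsight_spans_ingested_total Total tool call spans ingested
# TYPE langsight_spans_ingested_total counter
langsight_spans_ingested_total 1337.0
"""

# Each family groups the samples for one metric; print name, labels, value
for family in text_string_to_metric_families(SAMPLE):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```

In a real smoke test you would feed in `response.text` from an HTTP GET of `/metrics` instead of the inline string.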
## Grafana Dashboard Tips

### Recommended panels

| Panel | PromQL | Visualization |
|---|---|---|
| Request rate | `rate(langsight_http_requests_total[5m])` | Time series, stacked by path |
| Error rate | `sum(rate(langsight_http_requests_total{status=~"5.."}[5m])) / sum(rate(langsight_http_requests_total[5m]))` | Stat panel, threshold: red > 1% |
| p99 latency | `histogram_quantile(0.99, rate(langsight_http_request_duration_seconds_bucket[5m]))` | Time series, by path |
| Span ingestion rate | `rate(langsight_spans_ingested_total[5m])` | Stat panel (spans/sec) |
| Active SSE clients | `langsight_active_sse_connections` | Gauge panel |
| Health check rate | `rate(langsight_health_checks_total[5m])` | Time series, by server |
| Health check failures | `rate(langsight_health_checks_total{status="down"}[5m])` | Time series, by server |
### Alert rules (Grafana Alerting)

```yaml
# Example: alert when API error rate exceeds 5% for 5 minutes
groups:
  - name: langsight
    rules:
      - alert: LangSightHighErrorRate
        expr: |
          sum(rate(langsight_http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(langsight_http_requests_total[5m]))
          > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LangSight API error rate above 5%"

      - alert: LangSightHighLatency
        expr: |
          histogram_quantile(0.99, rate(langsight_http_request_duration_seconds_bucket[5m]))
          > 2.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LangSight API p99 latency above 2 seconds"

      - alert: LangSightHealthCheckFailing
        expr: |
          rate(langsight_health_checks_total{status="down"}[5m]) > 0
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "MCP server {{ $labels.server }} failing health checks"
```
### Dashboard JSON

A pre-built Grafana dashboard JSON is planned for a future release. In the meantime, create a dashboard manually using the PromQL queries above, or import the panels into an existing infrastructure dashboard.
## SSE Live Event Feed

### How it works

`GET /api/live/events` opens a Server-Sent Events stream. The server pushes events as they happen — no polling required.

**Authentication:** This endpoint requires the same authentication as all other API routes (`X-API-Key` header or session proxy headers).
Events pushed:

| Event type | Triggered by | Payload fields |
|---|---|---|
| `span:new` | Span ingestion via `/api/traces/spans` | `session_id`, `agent_name`, `server_name`, `tool_name`, `status`, `latency_ms` |
| `health:check` | Health check completion | `server_name`, `status`, `latency_ms` |
Connection behavior:

- Keepalive comments (`: keepalive`) are sent every 15 seconds to prevent proxy/load-balancer timeouts
- Maximum 200 concurrent clients — the 201st connection receives an `event: error` and closes
- Each client has a 50-event buffer; if the client is slower than the event rate, the oldest event is dropped
- The browser `EventSource` API reconnects automatically on disconnection
### JavaScript example (browser)

```javascript
// Connect to the live event stream
const source = new EventSource("/api/proxy/live/events");

// Listen for new span ingestion events
source.addEventListener("span:new", (event) => {
  const span = JSON.parse(event.data);
  console.log(`[span:new] ${span.agent_name} -> ${span.server_name}/${span.tool_name}: ${span.status} (${span.latency_ms}ms)`);
  // Example: update a dashboard counter
  updateSpanCount(span);
});

// Listen for health check events
source.addEventListener("health:check", (event) => {
  const check = JSON.parse(event.data);
  console.log(`[health:check] ${check.server_name}: ${check.status} (${check.latency_ms}ms)`);
  // Example: flash a status indicator
  updateServerHealth(check);
});

// Handle connection errors
source.onerror = () => {
  console.warn("SSE connection lost — EventSource will reconnect automatically");
};
```
When using the LangSight dashboard, the live event feed is connected automatically. The example above is for custom integrations or external dashboards that want to receive real-time events from LangSight.
### Python example (httpx-sse)

```python
import httpx
from httpx_sse import connect_sse

with httpx.Client() as client:
    with connect_sse(
        client,
        "GET",
        "http://localhost:8000/api/live/events",
        headers={"X-API-Key": "ls_your_key"},
    ) as event_source:
        for sse in event_source.iter_sse():
            if sse.event == "span:new":
                print(f"New span: {sse.data}")
            elif sse.event == "health:check":
                print(f"Health check: {sse.data}")
```
### curl example

```bash
curl -N -H "X-API-Key: ls_your_key" \
  http://localhost:8000/api/live/events
```

The `-N` flag disables output buffering so events appear immediately.
## Dashboard Integration

The LangSight Next.js dashboard uses both monitoring surfaces:

- **Polling (existing):** SWR fetchers poll REST API endpoints at 5s (health) and 30s (metrics) intervals for page-level data
- **SSE (new):** The dashboard connects to `GET /api/live/events` for instant notifications — span ingestion events trigger session list refreshes, and health check events update server status indicators without waiting for the next poll cycle
The Prometheus metrics are not consumed by the dashboard directly. They are intended for external monitoring infrastructure (Prometheus, Grafana, Datadog, etc.) to provide infrastructure-level visibility into LangSight itself.
## Architecture

```
Agent frameworks           LangSight API                   Monitoring
────────────────           ─────────────                   ──────────
CrewAI        ──spans──►  POST /api/traces/spans
Pydantic AI                  │
OpenAI Agents                ├── store in ClickHouse
                             ├── broadcast to SSE ─────►  GET /api/live/events
                             │                                  │
                             │                           ┌──────▼───────┐
                             │                           │  Dashboard   │
                             │                           │ (EventSource)│
                             │                           └──────────────┘
                             ▼
                        GET /metrics
                             │
                      ┌──────▼─────┐
                      │ Prometheus │
                      │  (scrapes) │
                      └──────┬─────┘
                             │
                      ┌──────▼─────┐
                      │  Grafana   │
                      │ (visualize)│
                      └────────────┘
```
### Single-instance limitation

The `SSEBroadcaster` is an in-memory asyncio pub/sub. Events published on one API instance are not visible to SSE clients connected to a different instance. For single-instance deployments (the default Docker Compose setup), this is not a limitation.

For multi-instance horizontal scaling, a Redis pub/sub layer can be added behind the same `SSEBroadcaster` interface. This is planned for a future release when horizontal scaling becomes a requirement.