What it is
Blast Radius answers the first question in any MCP server outage: who is affected? When a server goes DOWN or DEGRADED, LangSight queries real tool-call traffic from the last 24 hours and computes which agents called this server, how many sessions they ran, and what the error rate was. It classifies the outage severity (CRITICAL, HIGH, MEDIUM, or LOW) based on how many agents and sessions are at risk.
For servers that are currently UP, Blast Radius shows a pre-emptive view: “if this server went down right now, here is what would be affected.”
Where to find it
MCP Servers (/servers) → click any server name → Health tab
The Blast Radius panel appears at the top of the Health tab, above the health history chart.
Severity classification
| Severity | Condition |
|---|---|
| CRITICAL | More than 5 distinct agents, or more than 100 sessions at risk in the last 24h |
| HIGH | More than 2 agents, or more than 20 sessions |
| MEDIUM | Any recent traffic: at least one agent called this server in the last 24h |
| LOW | No traffic in the last 24h, so no agents will be immediately impacted |
LOW does not mean the server is unimportant. It means no agent has called it recently. A server that is used infrequently (e.g., a nightly batch job) will show LOW during the day even if its failure would be catastrophic during the batch window.
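The thresholds in the table can be read as a top-down check in which the first matching rule wins. The following is a minimal Python sketch under that assumption; the function name `classify_severity` is hypothetical and is not taken from the LangSight codebase.

```python
def classify_severity(total_agents: int, total_sessions: int) -> str:
    """Map 24h traffic totals to a severity label (illustrative sketch only)."""
    # CRITICAL: more than 5 distinct agents, or more than 100 sessions
    if total_agents > 5 or total_sessions > 100:
        return "CRITICAL"
    # HIGH: more than 2 agents, or more than 20 sessions
    if total_agents > 2 or total_sessions > 20:
        return "HIGH"
    # MEDIUM: any recent traffic at all (at least one agent)
    if total_agents >= 1:
        return "MEDIUM"
    # LOW: no agent called the server in the window
    return "LOW"


print(classify_severity(total_agents=3, total_sessions=12))
```

The thresholds are strict inequalities, so a server with exactly 5 agents and 100 sessions would be classified HIGH under this reading, not CRITICAL.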
What the UI shows
Banner
For DOWN or DEGRADED servers: a red banner with a red left border appears at the top of the Health tab.
Summary metric cards
Three cards appear below the banner:
| Card | Value |
|---|---|
| Agents at risk | Number of distinct agent names that called this server in the last 24h |
| Sessions at risk | Number of distinct session IDs that touched this server |
| Calls (24h) | Total tool call volume from all agents |
Per-agent breakdown
Each affected agent is shown as a card:
| Field | Description |
|---|---|
| Agent name | From agent_name span attribute |
| Call count | Total tool calls from this agent to this server in the last 24h |
| Session count | Distinct sessions in which this agent called this server |
| Error count | Failed calls |
| Error rate badge | Red badge showing X% errors — only displayed when error rate > 0% |
| Avg latency | Mean tool call latency |
| Last called | Timestamp of the most recent call |
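The error rate badge follows from the call count and error count on each card. The sketch below shows one way that rule could be expressed in Python. The `AgentImpact` class, the `error_count` field name, and the one-decimal formatting are assumptions made for illustration; only `agent_name` and `call_count` appear as identifiers in this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentImpact:
    # Hypothetical container for one per-agent card
    agent_name: str
    call_count: int
    error_count: int


def error_rate_badge(agent: AgentImpact) -> Optional[str]:
    """Return badge text, or None when the badge should be hidden."""
    # The badge is only displayed when the error rate is above 0%
    if agent.call_count == 0 or agent.error_count == 0:
        return None
    rate = 100.0 * agent.error_count / agent.call_count
    return f"{rate:.1f}% errors"


print(error_rate_badge(AgentImpact("billing-agent", call_count=200, error_count=5)))
```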
The Blast Radius panel is read-only. It reflects traffic data already in ClickHouse — it does not perform any new health checks or projections. The 24-hour window is fixed and not configurable in the UI. Use the API for custom time windows.
How blast radius is computed
The backend queries mcp_tool_calls in ClickHouse for the specified time window, grouped by agent_name. The totals are derived from the per-agent rows:
- total_agents = number of distinct agent_name values returned
- total_sessions = sum of session_count across all agents
The computation is implemented in src/langsight/rca/blast_radius.py as compute_blast_radius(), which returns a BlastRadiusResult Pydantic model.
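The roll-up from per-agent rows to totals can be sketched in plain Python. This is an illustration of the arithmetic described above, not the contents of compute_blast_radius(). The `AgentRow` class, the `summarize` function, and the `total_calls` key are hypothetical names, and the rows are assumed to be the result of the grouped query.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentRow:
    # One row per agent, as a grouped query might return it (assumed shape)
    agent_name: str
    session_count: int
    call_count: int


def summarize(rows: List[AgentRow]) -> Dict[str, object]:
    """Derive the summary totals from per-agent rows."""
    return {
        # Number of distinct agent_name values returned
        "total_agents": len({r.agent_name for r in rows}),
        # Sum of session_count across all agents
        "total_sessions": sum(r.session_count for r in rows),
        # Total tool call volume from all agents
        "total_calls": sum(r.call_count for r in rows),
        # Per-agent breakdown, sorted by call_count descending
        "agents": sorted(rows, key=lambda r: r.call_count, reverse=True),
    }


rows = [
    AgentRow("support-agent", session_count=4, call_count=30),
    AgentRow("billing-agent", session_count=10, call_count=120),
]
summary = summarize(rows)
print(summary["total_agents"], summary["total_sessions"], summary["total_calls"])
```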
API reference
Get blast radius for a server
Time window in hours. How far back to look in tool-call history when computing impact.
Response fields
The MCP server identifier.
Current health check status: up, degraded, or down.
Computed blast radius severity: CRITICAL, HIGH, MEDIUM, or LOW.
Number of distinct agents that called this server in the time window.
Number of distinct sessions that touched this server in the time window.
Total tool call volume from all agents in the time window.
Per-agent breakdown. Sorted by call_count descending.
Python module
The blast radius computation is available as a standalone function for use in custom alerting pipelines or RCA scripts.
BlastRadiusResult is a Pydantic model: all fields are typed and validated.
Using blast radius in incident response
When you get paged for a server outage, the recommended workflow is:
1. Open MCP Servers and find the affected server. The server table shows DOWN or DEGRADED status. Click the server name.
2. Check the Blast Radius severity. The Health tab opens with the Blast Radius banner at the top. Read the severity and the agent list. This tells you how many teams are affected and who to notify.
3. Check the Drift tab. If the server went DEGRADED, there may be a schema drift event that explains what changed. A BREAKING drift with high blast radius is the highest-priority combination.
4. Page the right teams. Use the affected agent list to find which teams own each agent. The last_called_at field tells you which agents are actively running right now vs. which ran earlier.
Related
- Health Monitoring: how DOWN and DEGRADED status is determined
- Schema Drift Detection: breaking changes that cause DEGRADED status
- MCP Servers Dashboard: full detail panel reference
- Session Health: session-level health tags that reflect server outages