
Overview

The Dashboard at /dashboard is the first page you land on. It has three tabs: Overview, Tools, and Models.
  • Overview tab — aggregate health across all agents: session counts, error totals, anomaly alerts, SLO status, and the Error Breakdown section
  • Tools tab — per-tool reliability table: call counts, error rates, latencies, Calls per Session, and Silent Failures
  • Models tab — per-model usage table: call counts, token totals, estimated cost, and Context Window Pressure
All tabs respect the time window selected in the top-right corner: 1h, 6h, 24h, or 7d.

Error Breakdown

The Error Breakdown section appears at the bottom of the Overview tab whenever there are failed spans in the selected time window. It shows exactly what kinds of errors are occurring — not just that errors happened.

Categories

| Category | Meaning |
| --- | --- |
| Safety Filter | LLM call blocked by the provider’s content safety policy (finish_reason = SAFETY, PROHIBITED_CONTENT, or content_filter) |
| Max Tokens Hit | LLM output was truncated because the response reached the context or token limit (finish_reason = MAX_TOKENS or length) |
| API Unavailable | An upstream LLM provider or MCP server returned a 5xx error or UNAVAILABLE status |
| Timeout | A tool call or LLM generation exceeded the configured timeout threshold |
| Rate Limited (429) | Too many requests — the API rate limit was hit |
| Auth Error (401/403) | Authentication or authorisation failure on a tool call or LLM API request |
| Agent Crash | The agent process threw an unhandled exception — TaskGroup error, RuntimeError, or similar |
| Other | Errors that do not match any of the patterns above |
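Conceptually, the categorisation works as a first-match rule chain over a failed span's finish reason, status code, and error message. The sketch below is illustrative only: the span field names and matching order are assumptions, not LangSight's actual implementation.

```python
def classify_error(span: dict) -> str:
    """Map a failed span onto an Error Breakdown category (illustrative)."""
    finish = (span.get("finish_reason") or "").upper()
    status = span.get("status_code")            # HTTP status, if any
    message = span.get("error_message") or ""

    if finish in {"SAFETY", "PROHIBITED_CONTENT", "CONTENT_FILTER"}:
        return "Safety Filter"
    if finish in {"MAX_TOKENS", "LENGTH"}:
        return "Max Tokens Hit"
    if status == 429:
        return "Rate Limited (429)"
    if status in {401, 403}:
        return "Auth Error (401/403)"
    if (status is not None and 500 <= status < 600) or "UNAVAILABLE" in message:
        return "API Unavailable"
    if "timeout" in message.lower():
        return "Timeout"
    if any(tok in message for tok in ("TaskGroup", "RuntimeError", "Traceback")):
        return "Agent Crash"
    return "Other"
```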

Display

Each category is shown as a horizontal bar with:
  • Percentage of all errors in the time window
  • Count of failed spans in that category
Categories are sorted by percentage descending — the most common error type is at the top. Only failed spans are counted. Successful spans are excluded from the breakdown entirely.
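The bar values follow directly from those rules: count failed spans per category, convert counts to shares of the total, and sort descending. A minimal sketch (the function name is ours):

```python
from collections import Counter

def error_breakdown(failed_span_categories):
    """Return (category, count, percentage) rows, most common category first."""
    counts = Counter(failed_span_categories)
    total = sum(counts.values())
    rows = [(cat, n, round(100 * n / total, 1)) for cat, n in counts.items()]
    # Sorting by count is equivalent to sorting by percentage descending
    return sorted(rows, key=lambda row: row[1], reverse=True)
```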

Why it matters

“Error rate is 12%” tells you something is wrong. “48% of errors are API Unavailable, 45% are Safety Filter hits” tells you what to fix. The breakdown converts a number into an actionable diagnosis. Common patterns to look for:
| Pattern | What it usually means |
| --- | --- |
| High Safety Filter | Prompt is generating content the provider blocks — review prompt templates |
| High Rate Limited | Agent is hitting API quotas — add retry backoff or request a quota increase |
| High API Unavailable | Upstream provider or MCP server is unreliable — check provider status and consider adding retries |
| High Timeout | Tools or LLM calls are slow — check MCP server health on the MCP Servers page |
| High Agent Crash | Unhandled exception in agent code — check session traces for the stack trace |
| High Auth Error | Credentials rotated or expired — check API key configuration |

Session-level drill-down

Click any error category bar to navigate to the Sessions page filtered to sessions containing errors of that type. From there you can open individual session traces to see the exact span that failed.
The Error Breakdown section is hidden when there are zero errors in the selected time window. This is intentional — if everything is working, the section does not appear.

Calls per Session / Tool Retry Rate

The Calls/Session column in the Tools tab shows how many times each tool is called on average per session. It is the primary signal for detecting tool loops and redundant calling patterns.

Display

The column shows the average as X.X× (e.g. 3.2×).
  • Amber — value > 5×, signalling a potential loop or redundant call pattern
  • Grey (normal) — value ≤ 5×

How to interpret it

| Value | Meaning |
| --- | --- |
| 1.0× | Tool called exactly once per session — clean, efficient use |
| 2–3× | Normal for iterative agents that refine results across multiple calls |
| 5–8× | Worth investigating — the agent may be retrying on errors or looping on the same query |
| 10×+ | Almost certainly a loop or a design issue — review the agent’s tool-use logic |
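The column itself is simple arithmetic: total calls to the tool divided by a session count, with the >5× amber threshold applied on top. A sketch, assuming the denominator is the number of sessions in the window (the doc does not say whether only sessions that used the tool are counted):

```python
def calls_per_session(total_calls: int, session_count: int) -> tuple[str, bool]:
    """Return the 'X.X×' label and whether the cell renders amber (> 5×)."""
    avg = total_calls / session_count
    return f"{avg:.1f}\u00d7", avg > 5
```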

Using calls/session alongside error rate

Calls/session and error rate tell different stories about the same tool. Use them together:
| Calls/session | Error rate | Interpretation |
| --- | --- | --- |
| High | Low | Tool is called successfully many times — possible redundancy or iterative logic |
| High | High | Tool is failing and being retried — reliability problem, investigate the failures |
| Low | High | Tool fails on the first or only attempt — not a retry loop, a clean failure |
| Low | Low | Tool is healthy and used efficiently |
For example: a read_file tool showing 8.2× and 2% error rate means the agent reads the same file many times per session but it rarely fails. That is a design pattern to review, not a reliability incident. The same tool at 8.2× and 60% error rate means the agent is retrying a broken read — fix the tool.

Relationship to Session Health Score

If an agent’s health score is degraded and a tool shows high calls/session plus high error rate, that tool’s retry loop is likely dragging sessions into tool_failure or loop_detected state. Start your investigation there. For loop detection with automatic prevention, see Session Health — loop_detected.

Silent Failures

The Silent Failures column in the Tools tab counts MCP tool calls where the protocol reported success (isError=False) but the response content itself contained an error message.

Why this matters

A tool returning isError=False with "Error: database connection failed" in its content is the hardest failure mode to debug. The MCP protocol says “success”, so the error does not appear in the regular Error Breakdown or error rate columns. The agent receives the broken content, treats it as a valid result, and passes it upstream — where the failure surfaces much later as a hallucination or a wrong answer. Silent failures are invisible to every metric except this one.

What triggers a silent failure count

LangSight’s SDK inspects the first content block of every MCP tool response at the time of the call. A call is flagged as a silent failure when isError=False and the content starts with any of the following prefixes:
  • Error: or ERROR:
  • Exception:
  • Traceback
  • [ERROR]
  • Failed: or FAILED:
Only tool_call spans (MCP calls) are inspected. LLM generation spans are not included.
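A minimal sketch of that check, assuming the first content block of the response is available as text (the function name and span shape are ours, not the SDK's API):

```python
# The content prefixes listed above that mark a "successful" response as
# a silent failure.
SILENT_FAILURE_PREFIXES = (
    "Error:", "ERROR:", "Exception:", "Traceback", "[ERROR]",
    "Failed:", "FAILED:",
)

def is_silent_failure(is_error: bool, content_blocks: list[str]) -> bool:
    """Flag isError=False responses whose first content block is error text."""
    if is_error or not content_blocks:
        return False  # protocol-level errors are counted in the normal error rate
    return content_blocks[0].startswith(SILENT_FAILURE_PREFIXES)
```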

Display

  • Amber — one or more silent failures detected for this tool in the time window
  • Grey / zero — no silent failures (clean)

Interpreting the column

| Silent Failures | Error rate | Interpretation |
| --- | --- | --- |
| 0 | Low | Tool is healthy |
| > 0 | Low | Tool is returning error text despite isError=False — fix the MCP server’s error handling |
| > 0 | High | Both protocol-level and content-level errors — the tool has multiple failure modes |
| 0 | High | Clean protocol errors only — these are visible in the standard Error Breakdown |
A non-zero Silent Failures count paired with a low error rate is a red flag: it means the tool’s error rate is understated. The true failure rate is (Errors + Silent Failures) / Calls.
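That correction is worth spelling out in code, because the gap can be large:

```python
def true_failure_rate(errors: int, silent_failures: int, calls: int) -> float:
    """Combined protocol-level and content-level failure rate."""
    return (errors + silent_failures) / calls

# A tool reporting a 2% error rate but with 18 silent failures over
# 100 calls is actually failing 20% of the time.
```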

Overview tab — full column reference

| Panel | Description |
| --- | --- |
| Total sessions | Session count in the time window |
| Healthy sessions | Sessions tagged success or success_with_fallback |
| Health rate | Healthy / total sessions (same calculation as per-agent Health Score) |
| Total errors | Failed spans across all sessions |
| Anomalies | Count of active anomalies from Anomaly Detection |
| SLO status | Pass/breach status for defined Agent SLOs |
| Error Breakdown | Per-category error distribution — see above |

Week-over-week trend badges

When the 7d time window is selected, the three top-level stat cards on the Overview tab — Sessions, Error Rate, and Avg Latency — each display a small trend badge comparing the last 7 days against the previous 7 days.

Format

↑12.4% vs last 7d
↓8.2% vs last 7d
The arrow indicates direction of change. The percentage is calculated as:
trend = (current 7d value − previous 7d value) / previous 7d value × 100

Colour logic

Direction does not determine colour uniformly — the colour depends on whether the change is good or bad for each metric:
| Card | Up (↑) | Down (↓) |
| --- | --- | --- |
| Sessions | Green — more sessions = more agent activity | Red — sessions are declining |
| Error Rate | Red — error rate is rising | Green — error rate is falling |
| Avg Latency | Red — agents are slower | Green — agents are faster |
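The table reduces to one flag per card: whether an increase is good news. A sketch:

```python
# Whether an upward trend is good news for each stat card (illustrative).
UP_IS_GOOD = {
    "Sessions": True,      # more sessions = more agent activity
    "Error Rate": False,   # a rising error rate is bad
    "Avg Latency": False,  # slower agents are bad
}

def badge_colour(card: str, change_pct: float) -> str:
    """Green when the direction of change matches what is good for the card."""
    going_up = change_pct > 0
    return "green" if going_up == UP_IS_GOOD[card] else "red"
```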

When the badge appears

The trend badge is only shown when the 7d window is selected. For the 1h, 6h, and 24h windows, the badges are hidden — there is no meaningful previous period of the same length to compare against.

What it means for engineers

| Trend | Common interpretation |
| --- | --- |
| Error Rate ↑40% vs last 7d after a deploy | The deploy introduced regressions — investigate sessions from the last 7 days |
| Avg Latency ↓18% vs last 7d | Performance improved — result of an optimization or infrastructure upgrade |
| Sessions ↑25% vs last 7d | More agent activity — growing traffic, new users, or newly deployed agents |
| Sessions ↓30% vs last 7d | Usage dropped — check for deployment issues or agents that stopped running |

Complement with SLO targets

Trend badges show relative change but carry no pass/fail judgement. A 10% error rate increase is concerning if your baseline was already high; it is less urgent if your baseline was 0.1%. Pair trend badges with Agent SLOs to apply absolute thresholds that fire alerts when a hard limit is crossed.
Trend badges require at least one session in both the current 7-day window and the previous 7-day window. If either window has no data, the badge is hidden and the card shows only the current value.

Tools tab — full column reference

| Column | Description |
| --- | --- |
| Tool | Tool name as reported in spans |
| Server | MCP server that provides the tool |
| Calls | Total calls in the time window |
| Errors | Failed calls (isError=True or exception) |
| Error rate | Errors / total calls (%) |
| Avg latency | Mean call duration (ms) |
| p99 latency | 99th percentile call duration (ms) |
| Calls/Session | Average calls per session — see Calls per Session |
| Silent Failures | isError=False calls whose content contained error text — see Silent Failures |
Click any tool row to open the MCP Servers catalog entry for that server, filtered to that tool.

Context Window Pressure

The Ctx Usage column in the Models tab shows what percentage of a model’s context window is being consumed on average per call.

Calculation

ctx_usage = avg input tokens per call ÷ model context limit × 100

Why this matters

An agent consistently at 80%+ context usage is operating near its limit. When the context window fills, one of two things happens:
  1. The model returns MAX_TOKENS — the call fails outright
  2. The model silently truncates the input — responses degrade without an explicit error
Context Window Pressure is a leading indicator: it warns you before MAX_TOKENS errors start appearing. If you see high Ctx Usage in the Models tab and Max Tokens Hit in the Error Breakdown on the Overview tab, you have found the root cause.

Display

| Colour | Condition | Meaning |
| --- | --- | --- |
| Red | > 80% | Agent is near the context limit — expect MAX_TOKENS errors |
| Amber | > 50% | Worth watching — usage is elevated |
| Grey | ≤ 50% | Healthy |
| — | No token data or unknown model | Context limit not available for this model |

Model context limits

LangSight uses the following context limits when calculating Ctx Usage:
| Model | Context limit (tokens) |
| --- | --- |
| Gemini 2.5 Flash / Pro | 1,048,576 |
| Gemini 2.0 Flash | 1,048,576 |
| Gemini 1.5 Pro / Flash | 1,048,576 |
| GPT-4o / GPT-4o Mini | 128,000 |
| o3 / o3-mini | 200,000 |
| Claude Opus 4.6 | 200,000 |
| Claude Sonnet 4.6 | 200,000 |
| Claude Haiku 4.5 | 200,000 |
Models not in this list show no Ctx Usage value.

Relationship with Error Breakdown

If the Error Breakdown (Overview tab) shows a spike in Max Tokens Hit and the Models tab shows red Ctx Usage for the same model, the problem is clear: the agent is passing too much context. Common causes:
  • System prompt is too large
  • Agent is accumulating the full conversation history instead of windowing it
  • Tool responses are returned in full when only a subset is needed

Models tab — full column reference

| Column | Description |
| --- | --- |
| Model | Model name as reported in spans |
| Calls | Total LLM generation calls in the time window |
| Input tokens | Total input tokens across all calls |
| Output tokens | Total output tokens across all calls |
| Avg input/call | Mean input tokens per individual call |
| Avg output/call | Mean output tokens per individual call |
| Estimated cost | Cost estimate based on model pricing — see Cost Model |
| Ctx Usage | Average context window utilisation — see Context Window Pressure |