Overview
The Dashboard at /dashboard is the first page you land on. It has three tabs: Overview, Tools, and Models.
- Overview tab — aggregate health across all agents: session counts, error totals, anomaly alerts, SLO status, and the Error Breakdown section
- Tools tab — per-tool reliability table: call counts, error rates, latencies, Calls per Session, and Silent Failures
- Models tab — per-model usage table: call counts, token totals, estimated cost, and Context Window Pressure
All tabs respect the time window selected in the top-right corner: 1h, 6h, 24h, or 7d.
Error Breakdown
The Error Breakdown section appears at the bottom of the Overview tab whenever there are failed spans in the selected time window. It shows exactly what kinds of errors are occurring — not just that errors happened.
Categories
| Category | Meaning |
|---|---|
| Safety Filter | LLM call blocked by the provider’s content safety policy (finish_reason = SAFETY, PROHIBITED_CONTENT, or content_filter) |
| Max Tokens Hit | LLM output was truncated because the response reached the context or token limit (finish_reason = MAX_TOKENS or length) |
| API Unavailable | An upstream LLM provider or MCP server returned a 5xx error or UNAVAILABLE status |
| Timeout | A tool call or LLM generation exceeded the configured timeout threshold |
| Rate Limited (429) | Too many requests — the API rate limit was hit |
| Auth Error (401/403) | Authentication or authorisation failure on a tool call or LLM API request |
| Agent Crash | The agent process threw an unhandled exception — TaskGroup error, RuntimeError, or similar |
| Other | Errors that do not match any of the patterns above |
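The categorisation rules above can be sketched as a simple pattern match. This is an illustrative sketch only, not LangSight's internal implementation; the field names (finish_reason, status_code, error_text) are assumptions.

```python
# Hypothetical sketch of the Error Breakdown categorisation rules.
# Field names (finish_reason, status_code, error_text) are assumptions,
# not LangSight internals.

def categorise_error(finish_reason=None, status_code=None, error_text=""):
    if finish_reason in ("SAFETY", "PROHIBITED_CONTENT", "content_filter"):
        return "Safety Filter"
    if finish_reason in ("MAX_TOKENS", "length"):
        return "Max Tokens Hit"
    if status_code == 429:
        return "Rate Limited (429)"
    if status_code in (401, 403):
        return "Auth Error (401/403)"
    if (status_code is not None and 500 <= status_code < 600) or "UNAVAILABLE" in error_text:
        return "API Unavailable"
    if "timeout" in error_text.lower():
        return "Timeout"
    if any(marker in error_text for marker in ("TaskGroup", "RuntimeError", "Traceback")):
        return "Agent Crash"
    return "Other"
```

Each failed span lands in exactly one category; the first matching rule wins, and anything unmatched falls through to Other.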
Display
Each category is shown as a horizontal bar with:
- Percentage of all errors in the time window
- Count of failed spans in that category
Categories are sorted by percentage descending — the most common error type is at the top.
Only failed spans are counted. Successful spans are excluded from the breakdown entirely.
Why it matters
“Error rate is 12%” tells you something is wrong. “48% of errors are API Unavailable, 45% are Safety Filter hits” tells you what to fix. The breakdown converts a number into an actionable diagnosis.
Common patterns to look for:
| Pattern | What it usually means |
|---|---|
| High Safety Filter | Prompt is generating content the provider blocks — review prompt templates |
| High Rate Limited | Agent is hitting API quotas — add retry backoff or request a quota increase |
| High API Unavailable | Upstream provider or MCP server is unreliable — check provider status and consider adding retries |
| High Timeout | Tools or LLM calls are slow — check MCP server health on the MCP Servers page |
| High Agent Crash | Unhandled exception in agent code — check session traces for the stack trace |
| High Auth Error | Credentials rotated or expired — check API key configuration |
Session-level drill-down
Click any error category bar to navigate to the Sessions page filtered to sessions containing errors of that type. From there you can open individual session traces to see the exact span that failed.
The Error Breakdown section is hidden when there are zero errors in the selected time window. This is intentional — if everything is working, the section does not appear.
Calls per Session
The Calls/Session column in the Tools tab shows how many times each tool is called on average per session. It is the primary signal for detecting tool loops and redundant calling patterns.
Display
The column shows the average as X.X× (e.g. 3.2×).
- Amber — value > 5×, signalling a potential loop or redundant call pattern
- Grey (normal) — value ≤ 5×
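The display logic amounts to an average plus a single threshold. A minimal sketch, using the 5× threshold from the docs; the function names are illustrative, not LangSight internals:

```python
# Sketch of the Calls/Session display logic. The 5x amber threshold
# comes from the docs; function names are illustrative.

def calls_per_session(tool_calls: int, sessions: int) -> float:
    # Average number of calls to this tool per session in the window.
    return tool_calls / sessions if sessions else 0.0

def calls_badge_colour(avg: float) -> str:
    # Amber signals a potential loop or redundant call pattern.
    return "amber" if avg > 5 else "grey"
```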
How to interpret it
| Value | Meaning |
|---|---|
| 1.0× | Tool called exactly once per session — clean, efficient use |
| 2–3× | Normal for iterative agents that refine results across multiple calls |
| 5–8× | Worth investigating — the agent may be retrying on errors or looping on the same query |
| 10×+ | Almost certainly a loop or a design issue — review the agent’s tool-use logic |
Using calls/session alongside error rate
Calls/session and error rate tell different stories about the same tool. Use them together:
| Calls/session | Error rate | Interpretation |
|---|---|---|
| High | Low | Tool is called successfully many times — possible redundancy or iterative logic |
| High | High | Tool is failing and being retried — reliability problem, investigate the failures |
| Low | High | Tool fails on the first or only attempt — not a retry loop, a clean failure |
| Low | Low | Tool is healthy and used efficiently |
For example: a read_file tool showing 8.2× and 2% error rate means the agent reads the same file many times per session but it rarely fails. That is a design pattern to review, not a reliability incident. The same tool at 8.2× and 60% error rate means the agent is retrying a broken read — fix the tool.
If an agent’s health score is degraded and a tool shows high calls/session plus high error rate, that tool’s retry loop is likely dragging sessions into tool_failure or loop_detected state. Start your investigation there.
For loop detection with automatic prevention, see Session Health — loop_detected.
Silent Failures
The Silent Failures column in the Tools tab counts MCP tool calls where the protocol reported success (isError=False) but the response content itself contained an error message.
Why this matters
A tool returning isError=False with "Error: database connection failed" in its content is the hardest failure mode to debug. The MCP protocol says “success”, so the error does not appear in the regular Error Breakdown or error rate columns. The agent receives the broken content, treats it as a valid result, and passes it upstream — where the failure surfaces much later as a hallucination or a wrong answer.
Silent failures are invisible to every metric except this one.
What triggers a silent failure count
LangSight’s SDK inspects the first content block of every MCP tool response at the time of the call. A call is flagged as a silent failure when isError=False and the content starts with any of the following prefixes:
- Error: or ERROR:
- Exception:
- Traceback
- [ERROR]
- Failed: or FAILED:
Only tool_call spans (MCP calls) are inspected. LLM generation spans are not included.
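The heuristic described above can be sketched as a prefix check on the first content block. A minimal sketch mirroring the documented rules; the SDK's actual implementation may differ:

```python
# Sketch of the silent-failure heuristic: isError=False, but the first
# content block starts with a known error prefix. Mirrors the documented
# rules; not the SDK's actual code.

SILENT_FAILURE_PREFIXES = (
    "Error:", "ERROR:", "Exception:", "Traceback", "[ERROR]", "Failed:", "FAILED:",
)

def is_silent_failure(is_error: bool, first_content_block: str) -> bool:
    # Protocol-level errors (isError=True) are counted in the regular
    # error rate, so only "successful" responses are inspected here.
    if is_error:
        return False
    return first_content_block.startswith(SILENT_FAILURE_PREFIXES)
```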
Display
- Amber — one or more silent failures detected for this tool in the time window
- Grey / zero — no silent failures (clean)
Interpreting the column
| Silent Failures | Error rate | Interpretation |
|---|---|---|
| 0 | Low | Tool is healthy |
| > 0 | Low | Tool is returning error text despite isError=False — fix the MCP server’s error handling |
| > 0 | High | Both protocol-level and content-level errors — the tool has multiple failure modes |
| 0 | High | Clean protocol errors only — these are visible in the standard Error Breakdown |
A non-zero Silent Failures count paired with a low error rate is a red flag: it means the tool’s error rate is understated. The true failure rate is (Errors + Silent Failures) / Calls.
Overview tab — full column reference
| Panel | Description |
|---|---|
| Total sessions | Session count in the time window |
| Healthy sessions | Sessions tagged success or success_with_fallback |
| Health rate | Healthy / total sessions (same calculation as per-agent Health Score) |
| Total errors | Failed spans across all sessions |
| Anomalies | Count of active anomalies from Anomaly Detection |
| SLO status | Pass/breach status for defined Agent SLOs |
| Error Breakdown | Per-category error distribution — see above |
Week-over-week trend badges
When the 7d time window is selected, the three top-level stat cards on the Overview tab — Sessions, Error Rate, and Avg Latency — each display a small trend badge comparing the last 7 days against the previous 7 days.
↑12.4% vs last 7d
↓8.2% vs last 7d
The arrow indicates direction of change. The percentage is calculated as:
trend = (current 7d value − previous 7d value) / previous 7d value × 100
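The formula above, expressed as a function. A minimal sketch; the None return for an empty previous window reflects the hidden-badge behaviour described later in this section:

```python
def trend_pct(current: float, previous: float):
    # Week-over-week change as a percentage. Returns None when the
    # previous 7-day window has no data, in which case no badge is shown.
    if previous == 0:
        return None
    return (current - previous) / previous * 100
```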
Colour logic
Direction does not determine colour uniformly — the colour depends on whether the change is good or bad for each metric:
| Card | Up (↑) | Down (↓) |
|---|---|---|
| Sessions | Green — more sessions = more agent activity | Red — sessions are declining |
| Error Rate | Red — error rate is rising | Green — error rate is falling |
| Avg Latency | Red — agents are slower | Green — agents are faster |
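The per-card colour logic in the table above reduces to one question: is "up" good for this metric? A minimal sketch (the card names match the docs; the function is illustrative):

```python
# Sketch of the trend-badge colour logic: green when the change moves
# the metric in its "good" direction, red otherwise.

GOOD_WHEN_UP = {
    "Sessions": True,      # more sessions = more agent activity
    "Error Rate": False,   # rising error rate is bad
    "Avg Latency": False,  # slower agents are bad
}

def trend_badge_colour(card: str, trend_pct: float) -> str:
    going_up = trend_pct > 0
    return "green" if going_up == GOOD_WHEN_UP[card] else "red"
```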
When the badge appears
The trend badge is only shown when the 7d window is selected. For the 1h, 6h, and 24h windows, the badges are hidden — there is no meaningful previous period of the same length to compare against.
What it means for engineers
| Trend | Common interpretation |
|---|---|
| Error Rate ↑40% vs last 7d after a deploy | The deploy introduced regressions — investigate sessions from the last 7 days |
| Avg Latency ↓18% vs last 7d | Performance improved — result of an optimisation or infrastructure upgrade |
| Sessions ↑25% vs last 7d | More agent activity — growing traffic, new users, or newly deployed agents |
| Sessions ↓30% vs last 7d | Usage dropped — check for deployment issues or agents that stopped running |
Complement with SLO targets
Trend badges show relative change but carry no pass/fail judgement. A 10% error rate increase is concerning if your baseline was already high; it is less urgent if your baseline was 0.1%. Pair trend badges with Agent SLOs to apply absolute thresholds that fire alerts when a hard limit is crossed.
Trend badges require at least one session in both the current 7-day window and the previous 7-day window. If either window has no data, the badge is hidden and the card shows only the current value.
Tools tab — full column reference
| Column | Description |
|---|---|
| Tool | Tool name as reported in spans |
| Server | MCP server that provides the tool |
| Calls | Total calls in the time window |
| Errors | Failed calls (isError=True or exception) |
| Error rate | Errors / total calls (%) |
| Avg latency | Mean call duration (ms) |
| p99 latency | 99th percentile call duration (ms) |
| Calls/Session | Average calls per session — see Calls per Session |
| Silent Failures | isError=False calls whose content contained error text — see Silent Failures |
Click any tool row to open the MCP Servers catalog entry for that server, filtered to that tool.
Context Window Pressure
The Ctx Usage column in the Models tab shows what percentage of a model’s context window is being consumed on average per call.
Calculation
ctx_usage = avg input tokens per call ÷ model context limit × 100
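The calculation, together with the colour thresholds described below, can be sketched as follows. The model-name keys here are illustrative placeholders; the context limits are taken from the table later in this section:

```python
# Sketch of the Ctx Usage calculation and colour thresholds.
# Model-name keys are illustrative; limits come from the docs' table.

CONTEXT_LIMITS = {
    "gemini-2.5-flash": 1_048_576,
    "gpt-4o": 128_000,
    "claude-sonnet-4.6": 200_000,
}

def ctx_usage(model: str, avg_input_tokens: float):
    limit = CONTEXT_LIMITS.get(model)
    if limit is None:
        return None  # unknown model: shown as "—" in the UI
    return avg_input_tokens / limit * 100

def ctx_colour(pct):
    if pct is None:
        return "—"
    if pct > 80:
        return "red"    # near the context limit — expect MAX_TOKENS errors
    if pct > 50:
        return "amber"  # elevated, worth watching
    return "grey"       # healthy
```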
Why this matters
An agent consistently at 80%+ context usage is operating near its limit. When the context window fills, one of two things happens:
- The model returns MAX_TOKENS — the call fails outright
- The model silently truncates the input — responses degrade without an explicit error
Context Window Pressure is a leading indicator: it warns you before MAX_TOKENS errors start appearing. If you see high Ctx Usage in the Models tab and Max Tokens Hit in the Error Breakdown on the Overview tab, you have found the root cause.
Display
| Colour | Condition | Meaning |
|---|---|---|
| Red | > 80% | Agent is near the context limit — expect MAX_TOKENS errors |
| Amber | > 50% | Worth watching — usage is elevated |
| Grey | ≤ 50% | Healthy |
| — | No token data or unknown model | Context limit not available for this model |
Model context limits
LangSight uses the following context limits when calculating Ctx Usage:
| Model | Context limit (tokens) |
|---|---|
| Gemini 2.5 Flash / Pro | 1,048,576 |
| Gemini 2.0 Flash | 1,048,576 |
| Gemini 1.5 Pro / Flash | 1,048,576 |
| GPT-4o / GPT-4o Mini | 128,000 |
| o3 / o3-mini | 200,000 |
| Claude Opus 4.6 | 200,000 |
| Claude Sonnet 4.6 | 200,000 |
| Claude Haiku 4.5 | 200,000 |
Models not in this list show — for Ctx Usage.
Relationship with Error Breakdown
If the Error Breakdown (Overview tab) shows a spike in Max Tokens Hit and the Models tab shows red Ctx Usage for the same model, the problem is clear: the agent is passing too much context. Common causes:
- System prompt is too large
- Agent is accumulating the full conversation history instead of windowing it
- Tool responses are returned in full when only a subset is needed
Models tab — full column reference
| Column | Description |
|---|---|
| Model | Model name as reported in spans |
| Calls | Total LLM generation calls in the time window |
| Input tokens | Total input tokens across all calls |
| Output tokens | Total output tokens across all calls |
| Avg input/call | Mean input tokens per individual call |
| Avg output/call | Mean output tokens per individual call |
| Estimated cost | Cost estimate based on model pricing — see Cost Model |
| Ctx Usage | Average context window utilisation — see Context Window Pressure |