Overview
The Dashboard at /dashboard is the first page you land on. It has three tabs: Overview, Tools, and Models.
- Overview tab — aggregate health across all agents: session counts, error totals, anomaly alerts, SLO status, and the Error Breakdown section
- Tools tab — per-tool reliability table: call counts, error rates, latencies, Calls per Session, and Silent Failures
- Models tab — per-model usage table: call counts, token totals, estimated cost, and Context Window Pressure
All tabs respect the time window selected in the top-right corner: 1h, 6h, 24h, or 7d.
Error Breakdown
The Error Breakdown section appears at the bottom of the Overview tab whenever there are failed spans in the selected time window. It shows exactly what kinds of errors are occurring — not just that errors happened.
Categories
| Category | Meaning |
|---|---|
| Safety Filter | LLM call blocked by the provider’s content safety policy (finish_reason = SAFETY, PROHIBITED_CONTENT, or content_filter) |
| Max Tokens Hit | LLM output was truncated because the response reached the context or token limit (finish_reason = MAX_TOKENS or length) |
| API Unavailable | An upstream LLM provider or MCP server returned a 5xx error or UNAVAILABLE status |
| Timeout | A tool call or LLM generation exceeded the configured timeout threshold |
| Rate Limited (429) | Too many requests — the API rate limit was hit |
| Auth Error (401/403) | Authentication or authorisation failure on a tool call or LLM API request |
| Agent Crash | The agent process threw an unhandled exception — TaskGroup error, RuntimeError, or similar |
| Other | Errors that do not match any of the patterns above |
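The categorisation rules above can be sketched as a simple pattern match. This is an illustrative sketch only, not LangSight's internal implementation; the field names (finish_reason, status_code, error_text) are assumptions.

```python
# Hypothetical sketch of the Error Breakdown categorisation rules.
# Field names (finish_reason, status_code, error_text) are assumptions,
# not LangSight internals.

def categorise_error(finish_reason=None, status_code=None, error_text=""):
    if finish_reason in ("SAFETY", "PROHIBITED_CONTENT", "content_filter"):
        return "Safety Filter"
    if finish_reason in ("MAX_TOKENS", "length"):
        return "Max Tokens Hit"
    if status_code == 429:
        return "Rate Limited (429)"
    if status_code in (401, 403):
        return "Auth Error (401/403)"
    if (status_code is not None and 500 <= status_code < 600) or "UNAVAILABLE" in error_text:
        return "API Unavailable"
    if "timeout" in error_text.lower():
        return "Timeout"
    if any(marker in error_text for marker in ("TaskGroup", "RuntimeError", "Traceback")):
        return "Agent Crash"
    return "Other"
```

Each failed span lands in exactly one category; the first matching rule wins, and anything unmatched falls through to Other.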
Display
Each category is shown as a horizontal bar with:
- Percentage of all errors in the time window
- Count of failed spans in that category
Categories are sorted by percentage descending — the most common error type is at the top.
Only failed spans are counted. Successful spans are excluded from the breakdown entirely.
Why it matters
“Error rate is 12%” tells you something is wrong. “48% of errors are API Unavailable, 45% are Safety Filter hits” tells you what to fix. The breakdown converts a number into an actionable diagnosis.
Common patterns to look for:
| Pattern | What it usually means |
|---|---|
| High Safety Filter | Prompt is generating content the provider blocks — review prompt templates |
| High Rate Limited | Agent is hitting API quotas — add retry backoff or request a quota increase |
| High API Unavailable | Upstream provider or MCP server is unreliable — check provider status and consider adding retries |
| High Timeout | Tools or LLM calls are slow — check MCP server health on the MCP Servers page |
| High Agent Crash | Unhandled exception in agent code — check session traces for the stack trace |
| High Auth Error | Credentials rotated or expired — check API key configuration |
Session-level drill-down
Click any error category bar to navigate to the Sessions page filtered to sessions containing errors of that type. From there you can open individual session traces to see the exact span that failed.
The Error Breakdown section is hidden when there are zero errors in the selected time window. This is intentional — if everything is working, the section does not appear.
Calls per Session
The Calls/Session column in the Tools tab shows how many times each tool is called on average per session. It is the primary signal for detecting tool loops and redundant calling patterns.
Display
The column shows the average as X.X× (e.g. 3.2×).
- Amber — value > 5×, signalling a potential loop or redundant call pattern
- Grey (normal) — value ≤ 5×
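The display logic amounts to an average plus a single threshold. A minimal sketch, using the 5× threshold from the docs; the function names are illustrative, not LangSight internals:

```python
# Sketch of the Calls/Session display logic. The 5x amber threshold
# comes from the docs; function names are illustrative.

def calls_per_session(tool_calls: int, sessions: int) -> float:
    # Average number of calls to this tool per session in the window.
    return tool_calls / sessions if sessions else 0.0

def calls_badge_colour(avg: float) -> str:
    # Amber signals a potential loop or redundant call pattern.
    return "amber" if avg > 5 else "grey"
```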
How to interpret it
| Value | Meaning |
|---|---|
| 1.0× | Tool called exactly once per session — clean, efficient use |
| 2–3× | Normal for iterative agents that refine results across multiple calls |
| 5–8× | Worth investigating — the agent may be retrying on errors or looping on the same query |
| 10×+ | Almost certainly a loop or a design issue — review the agent’s tool-use logic |
Using calls/session alongside error rate
Calls/session and error rate tell different stories about the same tool. Use them together:
| Calls/session | Error rate | Interpretation |
|---|---|---|
| High | Low | Tool is called successfully many times — possible redundancy or iterative logic |
| High | High | Tool is failing and being retried — reliability problem, investigate the failures |
| Low | High | Tool fails on the first or only attempt — not a retry loop, a clean failure |
| Low | Low | Tool is healthy and used efficiently |
For example: a read_file tool showing 8.2× and 2% error rate means the agent reads the same file many times per session but it rarely fails. That is a design pattern to review, not a reliability incident. The same tool at 8.2× and 60% error rate means the agent is retrying a broken read — fix the tool.
If an agent’s health score is degraded and a tool shows high calls/session plus high error rate, that tool’s retry loop is likely dragging sessions into tool_failure or loop_detected state. Start your investigation there.
For loop detection with automatic prevention, see Session Health — loop_detected.
Silent Failures
The Silent Failures column in the Tools tab counts MCP tool calls where the protocol reported success (isError=False) but the response content itself contained an error message.
Why this matters
A tool returning isError=False with "Error: database connection failed" in its content is the hardest failure mode to debug. The MCP protocol says “success”, so the error does not appear in the regular Error Breakdown or error rate columns. The agent receives the broken content, treats it as a valid result, and passes it upstream — where the failure surfaces much later as a hallucination or a wrong answer.
Silent failures are invisible to every metric except this one.
What triggers a silent failure count
LangSight’s SDK inspects the first content block of every MCP tool response at the time of the call. A call is flagged as a silent failure when isError=False and the content starts with any of the following prefixes:
- Error: or ERROR:
- Exception:
- Traceback
- [ERROR]
- Failed: or FAILED:
Only tool_call spans (MCP calls) are inspected. LLM generation spans are not included.
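The heuristic described above can be sketched as a prefix check on the first content block. A minimal sketch mirroring the documented rules; the SDK's actual implementation may differ:

```python
# Sketch of the silent-failure heuristic: isError=False, but the first
# content block starts with a known error prefix. Mirrors the documented
# rules; not the SDK's actual code.

SILENT_FAILURE_PREFIXES = (
    "Error:", "ERROR:", "Exception:", "Traceback", "[ERROR]", "Failed:", "FAILED:",
)

def is_silent_failure(is_error: bool, first_content_block: str) -> bool:
    # Protocol-level errors (isError=True) are counted in the regular
    # error rate, so only "successful" responses are inspected here.
    if is_error:
        return False
    return first_content_block.startswith(SILENT_FAILURE_PREFIXES)
```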
Display
- Amber — one or more silent failures detected for this tool in the time window
- Grey / zero — no silent failures (clean)
Interpreting the column
| Silent Failures | Error rate | Interpretation |
|---|---|---|
| 0 | Low | Tool is healthy |
| > 0 | Low | Tool is returning error text despite isError=False — fix the MCP server’s error handling |
| > 0 | High | Both protocol-level and content-level errors — the tool has multiple failure modes |
| 0 | High | Clean protocol errors only — these are visible in the standard Error Breakdown |
A non-zero Silent Failures count paired with a low error rate is a red flag: it means the tool’s error rate is understated. The true failure rate is (Errors + Silent Failures) / Calls.
Overview tab — full column reference
| Panel | Description |
|---|---|
| Total sessions | Session count in the time window |
| Healthy sessions | Sessions tagged success or success_with_fallback |
| Health rate | Healthy / total sessions (same calculation as per-agent Health Score) |
| Total errors | Failed spans across all sessions |
| Anomalies | Count of active anomalies from Anomaly Detection |
| SLO status | Pass/breach status for defined Agent SLOs |
| Error Breakdown | Per-category error distribution — see above |
Week-over-week trend badges
When the 7d time window is selected, the three top-level stat cards on the Overview tab — Sessions, Error Rate, and Avg Latency — each display a small trend badge comparing the last 7 days against the previous 7 days.
↑12.4% vs last 7d
↓8.2% vs last 7d
The arrow indicates direction of change. The percentage is calculated as:
trend = (current 7d value − previous 7d value) / previous 7d value × 100
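The formula above, expressed as a function. A minimal sketch; the None return for an empty previous window reflects the hidden-badge behaviour described later in this section:

```python
def trend_pct(current: float, previous: float):
    # Week-over-week change as a percentage. Returns None when the
    # previous 7-day window has no data, in which case no badge is shown.
    if previous == 0:
        return None
    return (current - previous) / previous * 100
```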
Colour logic
Direction does not determine colour uniformly — the colour depends on whether the change is good or bad for each metric:
| Card | Up (↑) | Down (↓) |
|---|---|---|
| Sessions | Green — more sessions = more agent activity | Red — sessions are declining |
| Error Rate | Red — error rate is rising | Green — error rate is falling |
| Avg Latency | Red — agents are slower | Green — agents are faster |
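The per-card colour logic in the table above reduces to one question: is "up" good for this metric? A minimal sketch (the card names match the docs; the function is illustrative):

```python
# Sketch of the trend-badge colour logic: green when the change moves
# the metric in its "good" direction, red otherwise.

GOOD_WHEN_UP = {
    "Sessions": True,      # more sessions = more agent activity
    "Error Rate": False,   # rising error rate is bad
    "Avg Latency": False,  # slower agents are bad
}

def trend_badge_colour(card: str, trend_pct: float) -> str:
    going_up = trend_pct > 0
    return "green" if going_up == GOOD_WHEN_UP[card] else "red"
```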
When the badge appears
The trend badge is only shown when the 7d window is selected. For the 1h, 6h, and 24h windows, the badges are hidden — there is no meaningful previous period of the same length to compare against.
What it means for engineers
| Trend | Common interpretation |
|---|---|
| Error Rate ↑40% vs last 7d after a deploy | The deploy introduced regressions — investigate sessions from the last 7 days |
| Avg Latency ↓18% vs last 7d | Performance improved — result of an optimisation or infrastructure upgrade |
| Sessions ↑25% vs last 7d | More agent activity — growing traffic, new users, or newly deployed agents |
| Sessions ↓30% vs last 7d | Usage dropped — check for deployment issues or agents that stopped running |
Complement with SLO targets
Trend badges show relative change but carry no pass/fail judgement. A 10% error rate increase is concerning if your baseline was already high; it is less urgent if your baseline was 0.1%. Pair trend badges with Agent SLOs to apply absolute thresholds that fire alerts when a hard limit is crossed.
Trend badges require at least one session in both the current 7-day window and the previous 7-day window. If either window has no data, the badge is hidden and the card shows only the current value.
Tools tab — full column reference
| Column | Description |
|---|---|
| Tool | Tool name as reported in spans |
| Server | MCP server that provides the tool |
| Calls | Total calls in the time window |
| Errors | Failed calls (isError=True or exception) |
| Error rate | Errors / total calls (%) |
| Avg latency | Mean call duration (ms) |
| p99 latency | 99th percentile call duration (ms) |
| Calls/Session | Average calls per session — see Calls per Session |
| Silent Failures | isError=False calls whose content contained error text — see Silent Failures |
Click any tool row to open the MCP Servers catalog entry for that server, filtered to that tool.
Context Window Pressure
The Ctx Usage column in the Models tab shows what percentage of a model’s context window is being consumed on average per call.
Calculation
ctx_usage = avg input tokens per call ÷ model context limit × 100
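The calculation, together with the colour thresholds described below, can be sketched as follows. The model-name keys here are illustrative placeholders; the context limits are taken from the table later in this section:

```python
# Sketch of the Ctx Usage calculation and colour thresholds.
# Model-name keys are illustrative; limits come from the docs' table.

CONTEXT_LIMITS = {
    "gemini-2.5-flash": 1_048_576,
    "gpt-4o": 128_000,
    "claude-sonnet-4.6": 200_000,
}

def ctx_usage(model: str, avg_input_tokens: float):
    limit = CONTEXT_LIMITS.get(model)
    if limit is None:
        return None  # unknown model: shown as "—" in the UI
    return avg_input_tokens / limit * 100

def ctx_colour(pct):
    if pct is None:
        return "—"
    if pct > 80:
        return "red"    # near the context limit — expect MAX_TOKENS errors
    if pct > 50:
        return "amber"  # elevated, worth watching
    return "grey"       # healthy
```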
Why this matters
An agent consistently at 80%+ context usage is operating near its limit. When the context window fills, one of two things happens:
- The model returns MAX_TOKENS — the call fails outright
- The model silently truncates the input — responses degrade without an explicit error
Context Window Pressure is a leading indicator: it warns you before MAX_TOKENS errors start appearing. If you see high Ctx Usage in the Models tab and Max Tokens Hit in the Error Breakdown on the Overview tab, you have found the root cause.
Display
| Colour | Condition | Meaning |
|---|---|---|
| Red | > 80% | Agent is near the context limit — expect MAX_TOKENS errors |
| Amber | > 50% | Worth watching — usage is elevated |
| Grey | ≤ 50% | Healthy |
| — | No token data or unknown model | Context limit not available for this model |
Model context limits
LangSight uses the following context limits when calculating Ctx Usage:
| Model | Context limit (tokens) |
|---|---|
| Gemini 2.5 Flash / Pro | 1,048,576 |
| Gemini 2.0 Flash | 1,048,576 |
| Gemini 1.5 Pro / Flash | 1,048,576 |
| GPT-4o / GPT-4o Mini | 128,000 |
| o3 / o3-mini | 200,000 |
| Claude Opus 4.6 | 200,000 |
| Claude Sonnet 4.6 | 200,000 |
| Claude Haiku 4.5 | 200,000 |
Models not in this list show — for Ctx Usage.
Relationship with Error Breakdown
If the Error Breakdown (Overview tab) shows a spike in Max Tokens Hit and the Models tab shows red Ctx Usage for the same model, the problem is clear: the agent is passing too much context. Common causes:
- System prompt is too large
- Agent is accumulating the full conversation history instead of windowing it
- Tool responses are returned in full when only a subset is needed
Models tab — full column reference
| Column | Description |
|---|---|
| Model | Model name as reported in spans |
| Calls | Total LLM generation calls in the time window |
| Input tokens | Total input tokens across all calls |
| Output tokens | Total output tokens across all calls |
| Avg input/call | Mean input tokens per individual call |
| Avg output/call | Mean output tokens per individual call |
| Estimated cost | Cost estimate based on model pricing — see Cost Model |
| Ctx Usage | Average context window utilisation — see Context Window Pressure |