Introduction - LangSight

What is LangSight?

LangSight is the runtime reliability layer for AI agent toolchains. Langfuse watches the brain — prompts, completions, evals, token costs. LangSight watches the hands — the tools your agents call, their health, safety, cost, and blast radius. When your agent fails at 2 AM, LangSight tells you which tool broke, why it broke, how many agents are affected, and stops it from happening again.

Where LangSight fits

Question	Best tool
Did the prompt/model perform well?	LangWatch / Langfuse / LangSmith
Is my server CPU/memory healthy?	Datadog / New Relic
Which tool call failed in production?	LangSight
Is my agent stuck in a loop?	LangSight
Is an MCP server unhealthy or drifting?	LangSight
Is an MCP server exposed or risky?	LangSight
Why did this session cost $47 instead of $3?	LangSight
If this tool goes down, which agents break?	LangSight

Use LangSight alongside Langfuse and LangWatch, not instead of them. They observe prompts and model output; LangSight observes the tool layer underneath.

The four pillars

1. Prevent — stop failures before users notice

from langsight.sdk import LangSightClient

client = LangSightClient(
    url="http://localhost:8000",
    loop_detection=True,        # same tool+args 3x → auto-stop
    max_cost_usd=1.00,          # hard budget limit per session
    max_steps=25,               # hard step limit
    circuit_breaker=True,       # auto-disable tools after 5 failures
)

Loop detection — same tool called with same args 3x → session terminated, alert fired
Budget guardrails — max cost / max steps per session → hard stop before bill shock
Circuit breaker — tool fails 5x → auto-disabled for cooldown → alert → auto-recovery
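
To make the three guardrails concrete, here is a minimal standalone sketch of the logic they describe. It is not LangSight's implementation: `SessionGuard`, `CircuitBreaker`, `GuardTripped`, and the 60-second cooldown are hypothetical names and values chosen for illustration, and treating the five failures as consecutive is an assumption. The thresholds (3 repeats, 25 steps, $1.00, 5 failures) mirror the client configuration above.

```python
import json
import time
from collections import Counter


class GuardTripped(Exception):
    """Raised when a session should be stopped (hypothetical name)."""


class SessionGuard:
    """Illustrative per-session guard: loop detection plus step and cost budgets."""

    def __init__(self, loop_threshold=3, max_steps=25, max_cost_usd=1.00):
        self.loop_threshold = loop_threshold
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0
        self.calls = Counter()

    def check(self, tool, args, cost_usd=0.0):
        """Record one tool call and raise GuardTripped if any limit is hit."""
        # Serialize args with sorted keys so key order cannot hide a repeat.
        key = (tool, json.dumps(args, sort_keys=True))
        self.calls[key] += 1
        self.steps += 1
        self.cost_usd += cost_usd
        if self.calls[key] >= self.loop_threshold:
            raise GuardTripped(f"loop_detected: {tool} called {self.calls[key]}x with same args")
        if self.steps > self.max_steps:
            raise GuardTripped("budget_exceeded: max_steps")
        if self.cost_usd > self.max_cost_usd:
            raise GuardTripped("budget_exceeded: max_cost_usd")


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_s=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s  # assumed value; the source does not state one
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a call to the tool may proceed."""
        if self.opened_at is None:
            return True
        # After the cooldown, let a probe call through to test recovery.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok):
        """Update breaker state with the outcome of one call."""
        if ok:
            self.failures = 0
            self.opened_at = None  # a success closes the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit
```

In this sketch, a wrapper around each tool call would consult `allow()` first, call `check(...)` before running the tool, and pass the outcome to `record(...)` afterwards.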

2. Detect — see what broke and why

$ langsight sessions --id sess-f2a9b1

Trace: sess-f2a9b1  (support-agent)  [LOOP_DETECTED]
├── jira-mcp/get_issue        89ms  ✓
├── postgres-mcp/query        42ms  ✓
├── → billing-agent           handoff
│   ├── crm-mcp/update       120ms  ✓
│   └── slack-mcp/notify         —  ✗  timeout
Root cause: slack-mcp timed out at 14:32 UTC

Action traces — every tool call in every session, with latency, status, cost
Multi-agent trees — full call tree across agent handoffs
Run health tags — every session auto-classified: success, success_with_fallback, loop_detected, budget_exceeded, tool_failure, circuit_breaker_open, timeout, schema_drift
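
As an illustration of how a session might be mapped to one of these tags, the sketch below classifies a list of tool-call records. The record shape (`ToolCall`), the flag arguments, and especially the precedence order are assumptions made for this example; the source does not specify how LangSight resolves a session that matches more than one tag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolCall:
    """Hypothetical record of one tool call inside a session."""
    tool: str
    ok: bool
    error: str = ""         # e.g. "timeout", "schema_drift", "circuit_breaker_open"
    fallback: bool = False  # True if this call stood in for an earlier failure


# Error strings that map directly to a tag of the same name.
DIRECT_TAGS = ("circuit_breaker_open", "timeout", "schema_drift")


def classify(calls: List[ToolCall], loop_detected: bool = False,
             budget_exceeded: bool = False) -> str:
    """Return one health tag for a session (assumed precedence, top to bottom)."""
    if loop_detected:
        return "loop_detected"
    if budget_exceeded:
        return "budget_exceeded"
    failures = [c for c in calls if not c.ok]
    # Simplification: any successful fallback call counts as a recovery.
    recovered = any(c.ok and c.fallback for c in calls)
    if failures and recovered:
        return "success_with_fallback"
    for call in failures:
        if call.error in DIRECT_TAGS:
            return call.error
    if failures:
        return "tool_failure"
    return "success"
```

Fed the four calls from the sample trace with no loop flag set, this sketch returns `timeout`, because the only failed call is the slack-mcp timeout.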

3. Monitor — MCP health + security

$ langsight mcp-health

Server              Status    Latency     Schema    Circuit
snowflake-mcp       ✅ UP     142ms       Stable    closed
jira-mcp            ❌ DOWN   —           —         open (5 failures)
postgres-mcp        ✅ UP     31ms        Changed   closed

$ langsight security-scan

CRITICAL  jira-mcp        CVE-2025-6514  Remote code execution
HIGH      postgres-mcp    OWASP-MCP-04   No authentication configured

MCP health checks — continuous ping, latency, uptime tracking
Schema drift detection — tool schemas change → alert fires before agents hallucinate
Security scanning — CVE (OSV), OWASP MCP Top 10, tool poisoning, auth audit
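
Schema drift detection can be pictured as comparing a fingerprint of each tool's schema between two checks. The sketch below is one way to do that under stated assumptions: tool definitions are dictionaries with `name` and `inputSchema` keys (the shape an MCP `tools/list` result uses), and `fingerprint` and `diff_tools` are hypothetical helper names, not LangSight APIs.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash a tool definition in canonical form so key order does not matter."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_tools(before: list, after: list) -> dict:
    """Compare two snapshots of a server's tool list and report drift."""
    old = {t["name"]: fingerprint(t) for t in before}
    new = {t["name"]: fingerprint(t) for t in after}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


# Example snapshots: the `query` tool gains a required `timeout_s` argument.
baseline = [{"name": "query", "inputSchema": {
    "type": "object",
    "properties": {"sql": {"type": "string"}},
    "required": ["sql"],
}}]
current = [{"name": "query", "inputSchema": {
    "type": "object",
    "properties": {"sql": {"type": "string"}, "timeout_s": {"type": "number"}},
    "required": ["sql", "timeout_s"],
}}]

drift = diff_tools(baseline, current)
print(drift)  # {'added': [], 'removed': [], 'changed': ['query']}
```

A non-empty `changed` list is the condition that would raise an alert: an agent still sending the old argument set would start failing or guessing at the new one.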

4. Map — blast radius via lineage

postgres-mcp ❌ DOWN

Impact:
  - support-agent: 200 sessions/day (HIGH)
  - billing-agent: 50 sessions/day (MEDIUM)

Total: ~250 sessions/day affected
Circuit breaker: active

Lineage DAG — which agents call which tools
Blast radius — if this tool goes down, what else breaks?
Impact alerts — enriched with affected agents and session counts
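
The blast-radius calculation can be sketched as a reverse walk over the lineage graph. The graph below is hypothetical: the edges and the `blast_radius` helper are assumptions for illustration, and the session counts reuse the example figures above. Handoffs between agents are treated as dependencies, so an outage propagates upward to the agents that delegate.

```python
from collections import deque

# Hypothetical lineage: each agent maps to the tools and sub-agents it calls.
LINEAGE = {
    "support-agent": ["jira-mcp", "postgres-mcp", "billing-agent"],
    "billing-agent": ["crm-mcp", "slack-mcp", "postgres-mcp"],
}
# Daily session counts taken from the example impact report.
SESSIONS_PER_DAY = {"support-agent": 200, "billing-agent": 50}


def blast_radius(target, lineage=LINEAGE, sessions=SESSIONS_PER_DAY):
    """Return (per-agent impact, total sessions) for an outage of `target`."""
    # Invert the graph: for each node, which agents call it?
    callers = {}
    for agent, deps in lineage.items():
        for dep in deps:
            callers.setdefault(dep, set()).add(agent)
    # Breadth-first walk upward from the failed node through its callers.
    affected, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for agent in callers.get(node, ()):
            if agent not in affected:
                affected.add(agent)
                queue.append(agent)
    impact = {agent: sessions.get(agent, 0) for agent in sorted(affected)}
    return impact, sum(impact.values())


impact, total = blast_radius("postgres-mcp")
print(impact, total)  # {'billing-agent': 50, 'support-agent': 200} 250
```

Summing sessions per agent, as done here, can overcount when one session spans several agents; a production system would presumably deduplicate by session ID.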

What you can observe per tool type

Tool type	Trace	Health check	Security scan	Cost	Guardrails
MCP servers	Yes	Yes	Yes	Yes	Yes
HTTP APIs	Yes	—	—	Yes	Yes
Python functions	Yes	—	—	Yes	Yes
Sub-agents	Yes	—	—	Yes	Yes

MCP servers get extra depth (health checks, security scanning, schema drift) because the MCP protocol is standard and inspectable.

How it works

Your agents                LangSight
──────────                 ──────────────────────────────────────
LangGraph       ──SDK──►   Prevent: loops, budgets, circuit breakers
CrewAI          ──SDK──►   Detect:  traces, health tags, root cause
Pydantic AI     ──SDK──►   Monitor: MCP health, security, schema drift
Claude Desktop  ◄──health─ Map:     lineage, blast radius, impact
                ◄──scan──  Alert:   Slack, OpsGenie, PagerDuty

LangSight dashboard — sessions, error rate, latency overview

Installation

pip install langsight
langsight init          # auto-discovers MCP servers
langsight sessions      # view recent sessions with health tags
langsight mcp-health    # check MCP server health + circuit state

Quickstart →

Get from install to your first guarded agent trace in 5 minutes.

Integrations

Framework	Integration
LangGraph	LangSightLangGraphCallback
LangChain / Langflow	LangSightLangChainCallback
OpenAI / Anthropic / Gemini SDK	`wrap_llm()`
CrewAI	LangSightCrewAICallback
OpenAI Agents SDK	LangSightOpenAIHooks
Anthropic / Claude Agent SDK	AnthropicToolTracer
Pydantic AI	@langsight_tool decorator
Claude Desktop / Cursor / VS Code	Auto-discovered by `langsight init`
Any OTEL framework	OTLP endpoint

Open source

LangSight is Apache 2.0 licensed — free to use, modify, and distribute.

git clone https://github.com/LangSight/langsight

LangSight session graph — multi-agent call tree showing coordinator, sub-agents, and MCP servers