LLM Observability Docs
Everything you need to monitor your LLM prompts, detect drift, and prevent regressions in production.
Overview
Deadpipe provides LLM observability that answers one question: "Is this prompt still behaving safely?"
The Core Problem
LLMs are non-deterministic. The same prompt can produce different outputs, and model updates can silently break your application. You need baselines to detect when behavior shifts.
- You cannot detect regression without a baseline
- You cannot alert without stable fingerprints
- You cannot audit without provenance
Automatic Baselines
We compute rolling baselines for every prompt: latency p95, token distributions, schema pass rates. No configuration required.
Drift Detection
Get alerted when latency spikes, token counts shift, schema validation drops, or output patterns change unexpectedly.
Schema Validation
Pass your Pydantic model and we validate every LLM output, tracking pass rates and detecting regressions.
Hallucination Proxies
Track refusals, empty outputs, JSON parse failures, and enum violations as early indicators of model misbehavior.
Quick Start
Get LLM observability running in under 5 minutes with zero-config instrumentation.
1. Install the SDK
Python
pip install deadpipe
Node.js
npm install deadpipe
2. Set your API key
export DEADPIPE_API_KEY="dp_your_api_key"
3. Wrap your client (zero code changes!)
The universal wrap() function auto-detects your provider:
Python
from deadpipe import wrap
from openai import OpenAI
from anthropic import Anthropic
# Universal wrap() - wrap once with app context
openai = wrap(OpenAI(), app="my_app")
anthropic = wrap(Anthropic(), app="my_app")
# Pass prompt_id per call to identify each prompt
response = openai.chat.completions.create(
    prompt_id="checkout_agent",
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund"}]
)
# → Automatically captures everything: latency, tokens, cost, schema validation, etc.
Node.js
import { wrap } from 'deadpipe';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
// Universal wrap() - wrap once with app context
const openai = wrap(new OpenAI(), { app: 'my_app' });
const anthropic = wrap(new Anthropic(), { app: 'my_app' });
// Pass promptId per call to identify each prompt
const response = await openai.chat.completions.create({
  promptId: 'checkout_agent',
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund' }]
});
// → Automatically captures everything: latency, tokens, cost, schema validation, etc.
Alternative: Manual tracking with schema validation
Python
from deadpipe import wrap_openai
from pydantic import BaseModel
from openai import OpenAI
class RefundResponse(BaseModel):
    order_id: str
    amount: float
    status: str
# Wrap client once
client = wrap_openai(OpenAI(), app="my_app")
# Pass schema per-call (each prompt can have its own schema)
response = client.chat.completions.create(
    prompt_id="checkout_agent",
    schema=RefundResponse,  # Per-call schema validation
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund"}],
    response_format={"type": "json_object"}
)
# → Schema validation + all automatic tracking
Node.js
import { wrapOpenAI } from 'deadpipe';
import { z } from 'zod';
import OpenAI from 'openai';
const RefundSchema = z.object({
  orderId: z.string(),
  amount: z.number(),
  status: z.string()
});
// Create validator helper
const zodValidator = (schema: z.ZodSchema) => ({
  validate: (data: unknown) => {
    const result = schema.safeParse(data);
    return {
      success: result.success,
      data: result.success ? result.data : undefined,
      errors: result.success ? undefined : result.error.errors.map(e => e.message)
    };
  }
});
// Wrap client once
const client = wrapOpenAI(new OpenAI(), { app: 'my_app' });
// Pass schema per-call (each prompt can have its own schema)
const response = await client.chat.completions.create({
  promptId: 'checkout_agent',
  schema: zodValidator(RefundSchema), // Per-call schema validation
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund' }],
  response_format: { type: 'json_object' }
});
// → Schema validation + all automatic tracking
What Gets Captured Automatically
Performance
- Request/response latency
- Time to first token (streaming)
- Input/output tokens
- Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
- Model and provider (auto-detected)
Quality & Safety
- Schema validation results
- JSON parse success/failure
- Empty output detection
- Refusal detection
- Enum/bounds constraint checking
Change Tracking
- Prompt hash (message content)
- Tool schema hash
- System prompt hash
- Output hash for deduplication
- Version/git hash
Python SDK
The Python SDK provides multiple ways to track LLM calls: wrap() for universal zero-config instrumentation, provider-specific wrappers like wrap_openai() and wrap_anthropic(), and the track() context manager for manual tracking.
Installation
pip install deadpipe
Recommended: Universal wrap() Function
The universal wrap() function auto-detects your provider and wraps appropriately:
from deadpipe import wrap
from openai import OpenAI
from anthropic import Anthropic
# Universal wrap() - wrap once with app context
openai_client = wrap(OpenAI(), app="my_app")
anthropic_client = wrap(Anthropic(), app="my_app")
# Pass prompt_id per call to identify each prompt
response = openai_client.chat.completions.create(
    prompt_id="checkout_agent",
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund for order 1938"}]
)
# → Automatically captures: latency, tokens, cost, schema validation, etc.
Supported Providers
Provider-specific wrappers are available for explicit control:
| Provider | Wrapper Function | Client |
|---|---|---|
| OpenAI | wrap_openai() | OpenAI() |
| Anthropic | wrap_anthropic() | Anthropic() |
| Google AI (Gemini) | wrap_google_ai() | genai.GenerativeModel() |
| Mistral | wrap_mistral() | MistralClient() |
| Cohere | wrap_cohere() | cohere.Client() |
Advanced: Manual Tracking with track()
For streaming, custom logic, or non-OpenAI providers:
from deadpipe import track
from openai import OpenAI
client = OpenAI()
# Store params in variable for context capture
params = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
}
with track(prompt_id="my_agent") as t:
    response = client.chat.completions.create(**params)
    # Pass input params to capture full context (messages, tools, system prompt)
    t.record(response, input=params)
With Pydantic Schema Validation
Pass a Pydantic model to validate every LLM output and track schema pass rates over time.
from deadpipe import wrap_openai
from pydantic import BaseModel, Field
from typing import Literal
from openai import OpenAI
class ProductRecommendation(BaseModel):
    product_id: str
    confidence: float = Field(ge=0, le=1)
    reasoning: str
    category: Literal["electronics", "clothing", "home"]
    alternatives: list[str] = []
# Schema validation with wrapper
client = wrap_openai(OpenAI(), app="my_app")
response = client.chat.completions.create(
    prompt_id="recommender",
    schema=ProductRecommendation,  # Per-call schema validation
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Recommend a laptop"}],
    response_format={"type": "json_object"}
)
# → Automatically validates output against ProductRecommendation
# → Tracks schema pass rates over time
# → Alerts when validation rates drop
With Enum and Numeric Bounds
Add additional validation rules for hallucination detection:
from deadpipe import track
with track(
    prompt_id="pricing_agent",
    enum_fields={
        "currency": ["USD", "EUR", "GBP"],
        "tier": ["free", "pro", "enterprise"]
    },
    numeric_bounds={
        "price": (0, 10000),   # Must be between 0-10000
        "quantity": (1, 100)   # Must be between 1-100
    }
) as t:
    response = client.chat.completions.create(...)
    t.record(response)
# → Automatically flags enum_out_of_range and numeric_out_of_bounds
Streaming Support
from deadpipe import track
with track(prompt_id="streaming_agent") as t:
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True
    )
    chunks = []
    for chunk in stream:
        if chunk.choices[0].delta.content:
            t.mark_first_token()  # Call once when first content arrives
            chunks.append(chunk.choices[0].delta.content)
            print(chunk.choices[0].delta.content, end="")
    # Record with the stream object - captures timing
    t.record(stream)
Anthropic Claude Support
Use the universal wrap() or provider-specific wrap_anthropic():
from deadpipe import wrap, wrap_anthropic
from anthropic import Anthropic
# Option 1: Universal wrap (recommended)
client = wrap(Anthropic(), app="my_app")
# Option 2: Provider-specific wrapper
client = wrap_anthropic(Anthropic(), app="my_app")
# All calls automatically tracked - pass prompt_id per call
response = client.messages.create(
    prompt_id="claude_agent",
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}]
)
# → Provider auto-detected, cost estimated automatically
Retry Tracking
with track(prompt_id="retrying_agent") as t:
    for attempt in range(3):
        try:
            if attempt > 0:
                t.mark_retry()  # Call before each retry
            response = client.chat.completions.create(...)
            t.record(response)
            break
        except Exception as e:
            if attempt == 2:
                raise
Configuration Options
| Parameter | Type | Description |
|---|---|---|
| prompt_id* | string | Unique identifier for this prompt type |
| schema | BaseModel | Pydantic model for output validation |
| enum_fields | Dict[str, List] | Enum field constraints for hallucination detection |
| numeric_bounds | Dict[str, Tuple] | Numeric bounds for constraint validation |
| app_id | string | Application identifier for grouping |
| environment | string | Environment (production, staging, etc.) |
| version | string | Version or git hash for change tracking |
| api_key | string | Override DEADPIPE_API_KEY env var |
| base_url | string | Override Deadpipe API URL |
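To see how these options fit together, here is a minimal sketch that combines several of them. It assumes track() accepts the listed options as keyword arguments alongside prompt_id, and the enum/bounds values shown are illustrative only:
from deadpipe import track
from openai import OpenAI

client = OpenAI()
# Illustrative configuration; not every option is required, and the values are examples
with track(
    prompt_id="checkout_agent",
    app_id="my_app",
    environment="production",
    version="v1.2.3",
    enum_fields={"status": ["approved", "rejected"]},
    numeric_bounds={"amount": (0, 10000)}
) as t:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Process refund"}]
    )
    t.record(response)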
Environment Variables
# Required
export DEADPIPE_API_KEY="dp_your_api_key"
# Optional
export DEADPIPE_APP_ID="my-app"
export DEADPIPE_ENVIRONMENT="production"
export DEADPIPE_VERSION="v1.2.3"  # or GIT_COMMIT
What Gets Captured Automatically
Performance
- Request/response latency
- Time to first token (streaming)
- Input/output tokens
- Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
- Model and provider (auto-detected)
Quality
- Schema validation results
- JSON parse success/failure
- Empty output detection
- Refusal detection
- Output hash for deduplication
Change Tracking
- Prompt hash (from messages)
- Tool schema hash
- System prompt hash
- Version/git hash
Safety
- Enum out-of-range detection
- Numeric bounds checking
- HTTP status codes
- Provider error codes
- Retry count tracking
Node.js SDK
The Node.js SDK provides wrap() for universal zero-config instrumentation, provider-specific wrappers like wrapOpenAI() and wrapAnthropic(), and track() for advanced manual tracking. Full TypeScript support included.
Installation
npm install deadpipe
# or
yarn add deadpipe
# or
pnpm add deadpipe
Recommended: Universal wrap() Function
The universal wrap() function auto-detects your provider and wraps appropriately:
import { wrap } from 'deadpipe';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
// Universal wrap() - wrap once with app context
const openai = wrap(new OpenAI(), { app: 'my_app' });
const anthropic = wrap(new Anthropic(), { app: 'my_app' });
// Pass promptId per call to identify each prompt
const response = await openai.chat.completions.create({
  promptId: 'checkout_agent',
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund for order 1938' }]
});
// → Automatically captures: latency, tokens, cost, schema validation, etc.
Supported Providers
Provider-specific wrappers are available for explicit control:
| Provider | Wrapper Function | Client |
|---|---|---|
| OpenAI | wrapOpenAI() | new OpenAI() |
| Anthropic | wrapAnthropic() | new Anthropic() |
| Google AI (Gemini) | wrapGoogleAI() | GoogleGenerativeAI() |
| Mistral | wrapMistral() | new MistralClient() |
| Cohere | wrapCohere() | new CohereClient() |
Advanced: Manual Tracking with track()
For streaming, custom logic, or non-OpenAI providers:
import { track } from 'deadpipe';
import OpenAI from 'openai';
const client = new OpenAI();
// Store params in variable for context capture
const params = {
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
};
const response = await track('my_agent', async (t) => {
  const response = await client.chat.completions.create(params);
  // Pass input params to capture full context (messages, tools, system prompt)
  t.record(response, undefined, params);
  return response;
});
With Zod Schema Validation
Create a schema validator for output validation:
import { track, SchemaValidator } from 'deadpipe';
import { z } from 'zod';
const ProductSchema = z.object({
  productId: z.string(),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  category: z.enum(['electronics', 'clothing', 'home']),
  alternatives: z.array(z.string()).default([])
});
// Create Zod adapter for Deadpipe
const zodValidator: SchemaValidator = {
  validate: (data) => {
    const result = ProductSchema.safeParse(data);
    return {
      success: result.success,
      data: result.success ? result.data : undefined,
      errors: result.success ? undefined : result.error.errors.map(e => e.message)
    };
  }
};
const result = await track('recommender', async (t) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Recommend a laptop' }],
    response_format: { type: 'json_object' }
  });
  return t.record(response);
}, { schema: zodValidator });
// result is typed as ProductSchema | null
if (result) {
  console.log(result.productId); // TypeScript knows this is ProductSchema
}
With Enum and Numeric Bounds
Add additional validation rules for hallucination detection:
const result = await track('pricing_agent', async (t) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Set pricing' }]
  });
  return t.record(response);
}, {
  enumFields: {
    currency: ['USD', 'EUR', 'GBP'],
    tier: ['free', 'pro', 'enterprise']
  },
  numericBounds: {
    price: [0, 10000],   // Must be between 0-10000
    quantity: [1, 100]   // Must be between 1-100
  }
});
// → Automatically flags enum_out_of_range and numeric_out_of_bounds
Streaming Support
const params = {
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
};
const response = await track('streaming_agent', async (t) => {
  const stream = await client.chat.completions.create(params);
  let fullContent = '';
  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      t.markFirstToken(); // Call once on first token
      fullContent += chunk.choices[0].delta.content;
    }
  }
  // Record manually for streams - pass input params to capture context
  t.record({
    model: 'gpt-4',
    choices: [{ message: { content: fullContent } }],
    usage: { prompt_tokens: 10, completion_tokens: 100, total_tokens: 110 }
  }, undefined, params);
  return fullContent;
});
Anthropic Claude Support
Use the universal wrap() or provider-specific wrapAnthropic():
import { wrap, wrapAnthropic } from 'deadpipe';
import Anthropic from '@anthropic-ai/sdk';
// Option 1: Universal wrap (recommended)
const client = wrap(new Anthropic(), { app: 'my_app' });
// Option 2: Provider-specific wrapper
// const client = wrapAnthropic(new Anthropic(), { app: 'my_app' });
// All calls automatically tracked - pass promptId per call
const response = await client.messages.create({
  promptId: 'claude_agent',
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello, Claude!' }]
});
// → Provider auto-detected, cost estimated automatically
Retry Tracking
const response = await track('retrying_agent', async (t) => {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      if (attempt > 0) t.markRetry(); // Call before each retry
      const response = await client.chat.completions.create({...});
      t.record(response);
      return response;
    } catch (error) {
      if (attempt === 2) throw error;
    }
  }
});
Framework Integration
Next.js API Routes
export async function POST(request: Request) {
  const { prompt } = await request.json();
  const response = await track('api_handler', async (t) => {
    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }]
    });
    t.record(completion);
    return completion;
  });
  return Response.json({
    result: response.choices[0].message.content
  });
}
Express.js
app.post('/generate', async (req, res) => {
  const response = await track('express_endpoint', async (t) => {
    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: req.body.messages
    });
    t.record(completion);
    return completion;
  });
  res.json(response);
});
Configuration Options
| Option | Type | Description |
|---|---|---|
| promptId* | string | Unique identifier for this prompt type |
| apiKey | string | Override DEADPIPE_API_KEY env var |
| baseUrl | string | Override Deadpipe API URL |
| appId | string | Application ID for grouping |
| environment | string | Environment (production, staging) |
| version | string | Version or git hash |
| schema | SchemaValidator | Schema validator for output validation |
| enumFields | Record<string, any[]> | Enum field constraints |
| numericBounds | Record<string, [number\|null, number\|null]> | Numeric bounds for validation |
Environment Variables
# Required
export DEADPIPE_API_KEY="dp_your_api_key"
# Optional
export DEADPIPE_APP_ID="my-app"
export DEADPIPE_ENVIRONMENT="production"
export DEADPIPE_VERSION="1.2.3"
export GIT_COMMIT="abc123"  # Fallback for version
Tracker Methods
The tracker object provides these methods:
await track('my-prompt', async (t) => {
  // Mark when first token arrives (for streaming)
  t.markFirstToken();
  // Mark retry attempts
  t.markRetry();
  // Record the response (required)
  t.record(response, parsedOutput?, input?);
  // Or record an error manually
  t.recordError(error);
  // Check if already recorded
  if (!t.isRecorded()) { /* ... */ }
  // Get current telemetry
  const telemetry = t.getTelemetry();
});
What Gets Captured Automatically
Performance
- Request/response latency
- Time to first token (streaming)
- Input/output tokens
- Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
- Model and provider (auto-detected)
Quality
- Schema validation results
- JSON parse success/failure
- Empty output detection
- Refusal detection
- Output hash for deduplication
Change Tracking
- Prompt hash (from messages)
- Tool schema hash
- System prompt hash
- Version/git hash
Safety
- Enum out-of-range detection
- Numeric bounds checking
- HTTP status codes
- Provider error codes
- Retry count tracking
Input Context Capture
Deadpipe automatically captures your input context (messages, tools, system prompts) to track when your prompts change and correlate changes with behavior shifts.
Why Input Context Matters
Without capturing input context, you can't detect when prompt changes cause drift. Deadpipe automatically extracts and hashes:
- Prompt hash - Hash of all messages to detect prompt template changes
- Tool schema hash - Hash of function/tool definitions for tool-calling prompts
- System prompt hash - Hash of system instructions for change tracking
- Input previews - Last user message for dashboard inspection
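Conceptually, each hash is just a stable digest of the relevant part of the request. A minimal sketch of the idea, assuming SHA-256 over a deterministic JSON serialization (the SDK's exact canonicalization may differ, and the tool name below is only an example):
import hashlib
import json

def stable_hash(value) -> str:
    # Deterministic serialization, then SHA-256
    payload = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

messages = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello!"}
]
tools = [{"type": "function", "function": {"name": "lookup_order"}}]

prompt_hash = stable_hash(messages)
system_prompt_hash = stable_hash([m for m in messages if m["role"] == "system"])
tool_schema_hash = stable_hash(tools)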
Zero-Config with wrap()
When using the universal wrapper, context is automatically extracted from every API call:
Python
from deadpipe import wrap
from openai import OpenAI
client = wrap(OpenAI(), app="my_app")
# Context automatically extracted - pass prompt_id per call
response = client.chat.completions.create(
    prompt_id="my_agent",
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello!"}
    ],
    tools=[{"type": "function", "function": {...}}]
)
# → prompt_hash, tool_schema_hash, system_prompt_hash auto-captured
Node.js
import { wrap } from 'deadpipe';
import OpenAI from 'openai';
const client = wrap(new OpenAI(), { app: 'my_app' });
// Context automatically extracted - pass promptId per call
const response = await client.chat.completions.create({
  promptId: 'my_agent',
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are helpful' },
    { role: 'user', content: 'Hello!' }
  ],
  tools: [{ type: 'function', function: {...} }]
});
// → prompt_hash, tool_schema_hash, system_prompt_hash auto-captured
Manual Context Capture
When using track() directly, pass input parameters to record():
Python
from deadpipe import track
# Store params in variable
params = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello!"}
    ],
    "tools": [{"type": "function", "function": {...}}]
}
with track(prompt_id="my_agent") as t:
    response = client.chat.completions.create(**params)
    # Pass params to capture context
    t.record(response, input=params)
Node.js
import { track } from 'deadpipe';
const params = {
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are helpful' },
    { role: 'user', content: 'Hello!' }
  ],
  tools: [{ type: 'function', function: {...} }]
};
const response = await track('my_agent', async (t) => {
  const response = await client.chat.completions.create(params);
  // Pass params to capture context
  t.record(response, undefined, params);
  return response;
});
Best Practice: Always Pass Input Parameters
❌ Bad - Missing Context
# Don't do this - no input context captured
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
t.record(response)
✅ Good - Context Captured
# Always pass input params
params = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
}
response = client.chat.completions.create(**params)
t.record(response, input=params)
What Gets Hashed
| Hash Type | What It Tracks | Use Case |
|---|---|---|
| prompt_hash | All messages in conversation | Detect prompt template changes |
| system_prompt_hash | System message content | Track system instruction changes |
| tool_schema_hash | Function/tool definitions | Detect tool API changes |
| output_hash | Model output content | Detect output pattern changes |
Change Correlation
When Deadpipe detects anomalies, it checks if any of these hashes changed recently. This helps you distinguish between model-side drift and your own code changes.
Fail-Safe Design
Deadpipe is designed to never break your LLM calls. All telemetry is sent asynchronously and failures are silently ignored.
Zero Impact on Performance
Asynchronous Telemetry
- SDK never awaits telemetry sends
- Fire-and-forget HTTP requests
- No blocking on your LLM calls
- Background threads (Python) or fetch (Node.js)
Fail-Safe Error Handling
- All exceptions are caught and ignored
- Deadpipe downtime doesn't affect you
- Network failures are silent
- Your app continues working normally
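To illustrate the pattern (a simplified sketch, not the SDK's actual transport): the telemetry payload is handed to a daemon thread and any failure is swallowed so it can never reach your call site.
import json
import threading
import urllib.request

def send_telemetry(payload: dict) -> None:
    def _post():
        try:
            req = urllib.request.Request(
                "https://deadpipe.com/api/v1/prompt",
                data=json.dumps(payload).encode("utf-8"),
                headers={"Content-Type": "application/json", "X-API-Key": "dp_your_api_key"}
            )
            urllib.request.urlopen(req, timeout=2)
        except Exception:
            pass  # Telemetry failures never reach the caller
    # Daemon thread: the LLM call and process shutdown are never blocked
    threading.Thread(target=_post, daemon=True).start()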
Automatic Features
The SDK automatically handles complexity so you don't have to:
Provider Auto-Detection
# No need to specify provider
with track(prompt_id="my_prompt") as t:
    # Works with OpenAI, Anthropic, etc.
    response = client.chat.completions.create(...)
    t.record(response)  # Provider auto-detected
Response Parsing
# Handles all response formats
t.record(openai_response) # OpenAI format
t.record(anthropic_response) # Anthropic format
t.record(custom_response)  # Best effort parsing
Cost Estimation
# Automatic cost calculation
# Supports OpenAI, Anthropic, Gemini, Mistral, Cohere
t.record(response)  # Cost auto-estimated
Context Extraction
# With universal wrap()
client = wrap(OpenAI(), app="my_app")
response = client.chat.completions.create(
    prompt_id="agent",
    messages=[...],
    tools=[...]
)  # Context auto-extracted
Fail-Safe Example
from deadpipe import wrap
from openai import OpenAI
# If Deadpipe is down, your code continues working
client = wrap(OpenAI(), app="my_app")
try:
    response = client.chat.completions.create(prompt_id="agent", ...)
    # Works normally even if telemetry fails
except Exception as e:
    # Only your LLM error, never Deadpipe
    pass
What Happens When Deadpipe is Down
✅ Your App Continues
- LLM calls work normally
- No exceptions thrown
- No performance impact
- Silent telemetry failure
❌ What Doesn't Happen
- No blocking or delays
- No error propagation
- No failed LLM requests
- No broken user flows
Payload Optimization
Telemetry payloads are automatically optimized to reduce bandwidth:
Filtered Out
- undefined and null values
- Empty strings and arrays
- false booleans (keeps true)
- Empty objects
Result
- ~40-60% smaller payloads
- Faster network requests
- Reduced bandwidth costs
- Better performance on slow connections
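A simplified sketch of this kind of pruning (the SDK's exact rules may differ):
def should_keep(value) -> bool:
    # Drop null, empty strings/arrays/objects, and False; keep True and 0
    if value is None or value is False:
        return False
    if value in ("", [], {}):
        return False
    return True

def prune(payload: dict) -> dict:
    cleaned = {}
    for key, value in payload.items():
        if isinstance(value, dict):
            value = prune(value)
        if should_keep(value):
            cleaned[key] = value
    return cleaned

prune({"model": "gpt-4o-mini", "error": None, "truncated": False, "retry_count": 0, "tags": []})
# → {"model": "gpt-4o-mini", "retry_count": 0}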
Development vs Production
Development Mode
# Warns about missing API key
export DEADPIPE_DEBUG=1
# or
export NODE_ENV=development
Shows helpful warnings without production noise
Production Mode
# Silent operation
export DEADPIPE_API_KEY="dp_xxx"
# No warnings, no errors
Completely silent - never affects your app
Prompt Tracking API
For direct API integration without the SDK, use the prompt tracking endpoint.
POST /api/v1/prompt
Send prompt execution telemetry for tracking and baseline computation.
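If you prefer not to use the SDK, you can post telemetry yourself. A minimal sketch with the requests library, sending only a subset of the fields (values are illustrative); the full field set appears in the cURL example and table below:
import time
import requests

start = int(time.time() * 1000)
# ... call your LLM here ...
end = int(time.time() * 1000)

requests.post(
    "https://deadpipe.com/api/v1/prompt",
    headers={"X-API-Key": "dp_your_api_key"},
    json={
        "prompt_id": "review-analyzer",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "request_start": start,
        "end_time": end,
        "total_latency_ms": end - start,
        "http_status": 200
    },
    timeout=5
)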
curl -X POST https://deadpipe.com/api/v1/prompt \
-H "Content-Type: application/json" \
-H "X-API-Key: dp_your_api_key" \
-d '{
"prompt_id": "review-analyzer",
"model": "gpt-4o-mini",
"provider": "openai",
"request_start": 1704067200000,
"end_time": 1704067201500,
"total_latency_ms": 1500,
"input_tokens": 150,
"output_tokens": 75,
"total_tokens": 225,
"estimated_cost": 0.00045,
"http_status": 200,
"output_length": 512,
"empty_output": false,
"json_parse_success": true,
"schema_validation_pass": true,
"refusal_flag": false,
"output_hash": "abc123..."
}'
Request Fields
| Field | Type | Description |
|---|---|---|
| Identity | | |
| prompt_id* | string | Unique identifier for this prompt type |
| model* | string | Model identifier (gpt-4o-mini, claude-3-5-sonnet, etc.) |
| provider | string | LLM provider (openai, anthropic, google) |
| app_id | string | Application identifier |
| environment | string | Environment (production, staging) |
| version | string | Version or git hash |
| Timing | | |
| request_start* | number | Unix timestamp (ms) when request started |
| first_token_time | number | Unix timestamp (ms) of first token received |
| end_time* | number | Unix timestamp (ms) when request completed |
| total_latency_ms* | number | Total latency in milliseconds |
| Volume | | |
| input_tokens | number | Number of input tokens |
| output_tokens | number | Number of output tokens |
| total_tokens | number | Total tokens (input + output) |
| estimated_cost | number | Estimated cost in USD |
| Reliability | | |
| http_status | number | HTTP response status code |
| timeout | boolean | Whether the request timed out |
| retry_count | number | Number of retry attempts |
| provider_error_code | string | Provider-specific error code |
| Output Integrity | | |
| output_length | number | Character length of output |
| empty_output | boolean | Whether output was empty |
| truncated | boolean | Whether output was truncated |
| json_parse_success | boolean | Whether JSON parsing succeeded |
| schema_validation_pass | boolean | Whether schema validation passed |
| missing_required_fields | string[] | List of missing required fields |
| Behavioral Fingerprint | | |
| output_hash | string | SHA-256 hash of output for deduplication |
| refusal_flag | boolean | Whether model refused to respond |
| tool_call_flag | boolean | Whether response included tool calls |
| Change Context | | |
| prompt_hash | string | Hash of the prompt template |
| tool_schema_hash | string | Hash of tool/function schemas |
| system_prompt_hash | string | Hash of system prompt |
Response
{
  "received": true,
  "prompt_id": "review-analyzer",
  "event_id": "evt_abc123",
  "baseline": {
    "latency_mean": 1200,
    "latency_p95": 2500,
    "token_mean": 200,
    "schema_pass_rate": 0.98,
    "sample_count": 1500
  },
  "anomalies": []
}
GET /api/v1/prompt?prompt_id={id}
Retrieve baselines and statistics for a specific prompt.
curl "https://deadpipe.com/api/v1/prompt?prompt_id=review-analyzer" \
  -H "X-API-Key: dp_your_api_key"
Baselines & Drift Detection
Deadpipe automatically computes rolling baselines for every prompt and alerts you when metrics drift beyond thresholds.
How Baselines Work
1. Every prompt execution updates a rolling statistical baseline
2. We use Welford's algorithm for efficient online mean/variance computation (see the sketch below)
3. Each incoming event is compared against the baseline in real time
4. Anomalies are flagged when metrics exceed threshold deviations
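For reference, a minimal sketch of the Welford update behind the rolling mean and variance (simplified; the service also tracks percentiles and rates):
class RollingBaseline:
    """Online mean/variance via Welford's algorithm."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # Sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.count if self.count > 1 else 0.0

baseline = RollingBaseline()
for latency_ms in (1180, 1250, 1320, 2900):
    baseline.update(latency_ms)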
Baseline Metrics
Latency
- latency_mean - Average latency
- latency_variance - Latency variance
- latency_p95 - 95th percentile
Tokens
- input_token_mean - Avg input tokens
- output_token_mean - Avg output tokens
- token_variance - Token variance
Reliability
- success_rate - % successful calls
- error_rate - % errors (4xx/5xx)
- timeout_rate - % timeouts
Output Quality
- schema_pass_rate - % valid outputs
- empty_rate - % empty outputs
- refusal_rate - % refusals
Drift Detection Rules
Anomalies are automatically detected when:
| Anomaly Type | Trigger Condition |
|---|---|
| latency_spike | Latency > p95 + 2σ |
| token_anomaly | Token count deviates > 3σ from mean |
| schema_violation | Schema validation fails (when baseline pass rate > 95%) |
| empty_output | Empty output (when baseline empty rate < 5%) |
| refusal | Refusal detected (when baseline refusal rate < 5%) |
Baseline Warm-Up
Drift detection activates after 50 samples to ensure statistical significance. During warm-up, we collect data without triggering anomaly alerts.
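To make the thresholds above concrete, here is an illustrative check of a single event against a baseline (a sketch of the rules, not the service's implementation):
import math

def detect_anomalies(event: dict, baseline: dict) -> list:
    anomalies = []
    latency_sigma = math.sqrt(baseline["latency_variance"])
    token_sigma = math.sqrt(baseline["token_variance"])
    # latency_spike: latency > p95 + 2σ
    if event["total_latency_ms"] > baseline["latency_p95"] + 2 * latency_sigma:
        anomalies.append("latency_spike")
    # token_anomaly: token count deviates > 3σ from the mean
    if abs(event["total_tokens"] - baseline["token_mean"]) > 3 * token_sigma:
        anomalies.append("token_anomaly")
    # schema_violation: only meaningful once the baseline pass rate is high
    if not event["schema_validation_pass"] and baseline["schema_pass_rate"] > 0.95:
        anomalies.append("schema_violation")
    return anomalies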
Schema Validation
Validate every LLM output against your expected structure and track validation rates over time.
Python: Pydantic Models
from pydantic import BaseModel, Field
from typing import Literal, List
class ProductRecommendation(BaseModel):
    product_id: str
    confidence: float = Field(ge=0, le=1)
    reasoning: str
    category: Literal["electronics", "clothing", "home"]
    alternatives: List[str] = []
with dp.track("recommender", schema=ProductRecommendation) as t:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Recommend..."}],
        response_format={"type": "json_object"}
    )
    result = t.record(response)
    # result.schema_pass tells you if validation passed
    # result.validation_error contains the error message if failed
    # result.validated contains the parsed Pydantic model if passed
TypeScript: Zod Schemas
import { z } from 'zod';
const ProductSchema = z.object({
  productId: z.string(),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  category: z.enum(['electronics', 'clothing', 'home']),
  alternatives: z.array(z.string()).default([])
});
const result = await dp.track('recommender', {
  schema: ProductSchema,
  fn: async () => {
    return await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: 'Recommend...' }],
      response_format: { type: 'json_object' }
    });
  }
});
if (result.validated) {
  // TypeScript knows this is ProductSchema type
  console.log(result.validated.productId);
}
Schema Drift Alerts
When your baseline schema pass rate is above 95% and validation starts failing, we flag this as a schema_violation anomaly. This catches cases where model updates or prompt changes cause structural regressions.
Output Integrity
Beyond schema validation, we track multiple signals that indicate potential model misbehavior.
Refusal Detection
Detects when models refuse to respond, using patterns like "I cannot", "I'm not able to", etc.
Empty Output
Flags when outputs are empty or contain only whitespace.
JSON Parse Failures
Tracks when JSON mode responses fail to parse.
Truncation
Detects when outputs are cut off due to token limits.
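These checks are heuristics. A rough sketch of how such flags could be derived from a raw output string (the patterns and the finish_reason convention are illustrative, not the SDK's exact logic):
import json
import re

REFUSAL_PATTERNS = re.compile(r"\b(i cannot|i can't|i'm not able to|i am unable to)\b", re.IGNORECASE)

def integrity_flags(output: str, finish_reason=None) -> dict:
    flags = {
        "empty_output": len(output.strip()) == 0,
        "refusal_flag": bool(REFUSAL_PATTERNS.search(output)),
        "truncated": finish_reason == "length"  # Provider-reported stop reason
    }
    try:
        json.loads(output)
        flags["json_parse_success"] = True
    except ValueError:
        flags["json_parse_success"] = False
    return flags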
Hallucination Proxy Flags
We don't claim to detect hallucinations directly, but we track proxy signals (the first two are illustrated in the sketch after this list):
- Enum out of range — When a Literal/enum field contains an unexpected value
- Numeric out of bounds — When constrained fields (0-1, positive, etc.) violate their constraints
- Schema violations — When structured output doesn't match the expected format
- Output pattern shift — When the output hash distribution changes significantly
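For example, the enum and bounds checks amount to roughly the following (a sketch; the field values mirror the earlier pricing_agent example):
def check_constraints(output: dict, enum_fields: dict, numeric_bounds: dict) -> list:
    flags = []
    for field, allowed in enum_fields.items():
        if field in output and output[field] not in allowed:
            flags.append("enum_out_of_range")
    for field, (low, high) in numeric_bounds.items():
        value = output.get(field)
        if isinstance(value, (int, float)) and not (low <= value <= high):
            flags.append("numeric_out_of_bounds")
    return flags

check_constraints(
    {"currency": "BTC", "price": 25000},
    enum_fields={"currency": ["USD", "EUR", "GBP"]},
    numeric_bounds={"price": (0, 10000)}
)
# → ["enum_out_of_range", "numeric_out_of_bounds"]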
Change Detection
Track when your prompts, system prompts, or tool schemas change to correlate with behavior shifts.
Change Context Hashes
The SDK automatically computes and tracks hashes for:
| Hash | What It Tracks |
|---|---|
| prompt_hash | Hash of the prompt template (user message) |
| system_prompt_hash | Hash of the system prompt |
| tool_schema_hash | Hash of function/tool definitions |
| output_hash | Hash of the output for deduplication |
Correlating Changes with Drift
When we detect anomalies, we check if any of these hashes changed recently. This helps you identify whether drift is caused by your changes or model-side updates.
LLM Alerts
Configure alerts in your Dashboard to get notified when prompts drift.
Latency P95 Spike
Alert when prompt latency exceeds baseline p95
Schema Validation Drop
Alert when schema pass rate drops below threshold
Empty Output Spike
Alert when empty output rate increases
Refusal Rate Increase
Alert when model refusal rate spikes
Error Rate Threshold
Alert when API error rate exceeds limit
Cost Anomaly
Alert when prompt costs spike unexpectedly
Notification Channels
Alerts can be sent via Email, Slack, or Webhooks. Configure your preferred channels in the Dashboard settings.
Rate Limits
All API endpoints are protected with rate limiting to ensure fair usage.
Rate Limits by Endpoint
| Endpoint | Limit | Window |
|---|---|---|
| /api/v1/prompt | 1000 requests | 1 minute |
| /api/v1/heartbeat | 60 requests | 1 minute |
| /api/v1/monitor/events | 100 requests | 1 minute |
Payload Size Limits
| Endpoint | Max Payload |
|---|---|
| /api/v1/prompt | 50 KB |
| /api/v1/heartbeat | 10 KB |
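When a limit is exceeded, the API responds with HTTP 429 and a retryAfter value (response shape shown below). A minimal sketch of honoring it with the requests library:
import time
import requests

def post_with_backoff(payload: dict, attempts: int = 3):
    for _ in range(attempts):
        resp = requests.post(
            "https://deadpipe.com/api/v1/prompt",
            headers={"X-API-Key": "dp_your_api_key"},
            json=payload,
            timeout=5
        )
        if resp.status_code != 429:
            return resp
        # Wait the server-suggested interval before retrying
        time.sleep(resp.json().get("retryAfter", 60))
    return resp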
Rate Limit Response
{
  "error": "Rate limit exceeded",
  "retryAfter": 60
}
Pipeline Heartbeat
For non-LLM batch jobs and scheduled tasks, use pipeline heartbeats to track job health.
Basic Usage
from deadpipe import Deadpipe
dp = Deadpipe()
@dp.heartbeat("daily-etl")
def run_etl():
    process_data()
    return {"records_processed": 1500}
run_etl()  # Sends heartbeat on success/failure
cURL
curl -X POST https://deadpipe.com/api/v1/heartbeat \
-H "Content-Type: application/json" \
-H "X-API-Key: dp_your_api_key" \
-d '{"pipeline_id": "my-job", "status": "success"}'Auto-Creation
Pipelines are automatically created on the first heartbeat if they don't exist. Customize settings in the Dashboard.
Ready to monitor your prompts?
Create your free account and start detecting drift in under 5 minutes.