LLM Observability Docs

Everything you need to monitor your LLM prompts, detect drift, and prevent regressions in production.

Overview

Deadpipe provides LLM observability that answers one question: "Is this prompt still behaving safely?"

The Core Problem

LLMs are non-deterministic. The same prompt can produce different outputs, and model updates can silently break your application. You need baselines to detect when behavior shifts.

  • You cannot detect regression without a baseline
  • You cannot alert without stable fingerprints
  • You cannot audit without provenance

Automatic Baselines

We compute rolling baselines for every prompt: latency p95, token distributions, schema pass rates. No configuration required.

Drift Detection

Get alerted when latency spikes, token counts shift, schema validation drops, or output patterns change unexpectedly.

Schema Validation

Pass your Pydantic model and we validate every LLM output, tracking pass rates and detecting regressions.

Hallucination Proxies

Track refusals, empty outputs, JSON parse failures, and enum violations as early indicators of model misbehavior.

Quick Start

Get LLM observability running in under 5 minutes with zero-config instrumentation.

1. Install the SDK

Python

pip install deadpipe

Node.js

npm install deadpipe

2. Set your API key

export DEADPIPE_API_KEY="dp_your_api_key"

3. Wrap your client (one-line change)

The universal wrap() function auto-detects your provider:

Python

from deadpipe import wrap
from openai import OpenAI
from anthropic import Anthropic

# Universal wrap() - wrap once with app context
openai = wrap(OpenAI(), app="my_app")
anthropic = wrap(Anthropic(), app="my_app")

# Pass prompt_id per call to identify each prompt
response = openai.chat.completions.create(
    prompt_id="checkout_agent",
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund"}]
)
# → Automatically captures everything: latency, tokens, cost, schema validation, etc.

Node.js

import { wrap } from 'deadpipe';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';

// Universal wrap() - wrap once with app context
const openai = wrap(new OpenAI(), { app: 'my_app' });
const anthropic = wrap(new Anthropic(), { app: 'my_app' });

// Pass promptId per call to identify each prompt
const response = await openai.chat.completions.create({
  promptId: 'checkout_agent',
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund' }]
});
// → Automatically captures everything: latency, tokens, cost, schema validation, etc.

Alternative: Manual tracking with schema validation

Python

from deadpipe import wrap_openai
from pydantic import BaseModel
from openai import OpenAI

class RefundResponse(BaseModel):
    order_id: str
    amount: float
    status: str

# Wrap client once
client = wrap_openai(OpenAI(), app="my_app")

# Pass schema per-call (each prompt can have its own schema)
response = client.chat.completions.create(
    prompt_id="checkout_agent",
    schema=RefundResponse,  # Per-call schema validation
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund"}],
    response_format={"type": "json_object"}
)
# → Schema validation + all automatic tracking

Node.js

import { wrapOpenAI } from 'deadpipe';
import { z } from 'zod';
import OpenAI from 'openai';

const RefundSchema = z.object({
  orderId: z.string(),
  amount: z.number(),
  status: z.string()
});

// Create validator helper
const zodValidator = (schema: z.ZodSchema) => ({
  validate: (data: unknown) => {
    const result = schema.safeParse(data);
    return {
      success: result.success,
      data: result.success ? result.data : undefined,
      errors: result.success ? undefined : result.error.errors.map(e => e.message)
    };
  }
});

// Wrap client once
const client = wrapOpenAI(new OpenAI(), { app: 'my_app' });

// Pass schema per-call (each prompt can have its own schema)
const response = await client.chat.completions.create({
  promptId: 'checkout_agent',
  schema: zodValidator(RefundSchema),  // Per-call schema validation
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund' }],
  response_format: { type: 'json_object' }
});
// → Schema validation + all automatic tracking

What Gets Captured Automatically

Performance

  • Request/response latency
  • Time to first token (streaming)
  • Input/output tokens
  • Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
  • Model and provider (auto-detected)

Quality & Safety

  • Schema validation results
  • JSON parse success/failure
  • Empty output detection
  • Refusal detection
  • Enum/bounds constraint checking

Change Tracking

  • Prompt hash (message content)
  • Tool schema hash
  • System prompt hash
  • Output hash for deduplication
  • Version/git hash

Python SDK

The Python SDK provides three ways to track LLM calls: wrap() for universal zero-config instrumentation, provider-specific wrappers like wrap_openai() and wrap_anthropic(), and the track() context manager for manual tracking.

Installation

pip install deadpipe

Recommended: Universal wrap() Function

The universal wrap() function auto-detects your provider and wraps appropriately:

from deadpipe import wrap
from openai import OpenAI
from anthropic import Anthropic

# Universal wrap() - wrap once with app context
openai_client = wrap(OpenAI(), app="my_app")
anthropic_client = wrap(Anthropic(), app="my_app")

# Pass prompt_id per call to identify each prompt
response = openai_client.chat.completions.create(
    prompt_id="checkout_agent",
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund for order 1938"}]
)
# → Automatically captures: latency, tokens, cost, schema validation, etc.

Supported Providers

Provider-specific wrappers are available for explicit control:

Provider           | Wrapper Function  | Client
OpenAI             | wrap_openai()     | OpenAI()
Anthropic          | wrap_anthropic()  | Anthropic()
Google AI (Gemini) | wrap_google_ai()  | genai.GenerativeModel()
Mistral            | wrap_mistral()    | MistralClient()
Cohere             | wrap_cohere()     | cohere.Client()

Advanced: Manual Tracking with track()

For streaming, custom logic, or providers without a built-in wrapper:

from deadpipe import track
from openai import OpenAI

client = OpenAI()

# Store params in variable for context capture
params = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
}

with track(prompt_id="my_agent") as t:
    response = client.chat.completions.create(**params)
    # Pass input params to capture full context (messages, tools, system prompt)
    t.record(response, input=params)

With Pydantic Schema Validation

Pass a Pydantic model to validate every LLM output and track schema pass rates over time.

from deadpipe import wrap_openai
from pydantic import BaseModel, Field
from typing import Literal
from openai import OpenAI

class ProductRecommendation(BaseModel):
    product_id: str
    confidence: float = Field(ge=0, le=1)
    reasoning: str
    category: Literal["electronics", "clothing", "home"]
    alternatives: list[str] = []

# Schema validation with wrapper
client = wrap_openai(OpenAI(), app="my_app")

response = client.chat.completions.create(
    prompt_id="recommender",
    schema=ProductRecommendation,  # Per-call schema validation
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Recommend a laptop"}],
    response_format={"type": "json_object"}
)
# → Automatically validates output against ProductRecommendation
# → Tracks schema pass rates over time
# → Alerts when validation rates drop

With Enum and Numeric Bounds

Add additional validation rules for hallucination detection:

from deadpipe import track

with track(
    prompt_id="pricing_agent",
    enum_fields={
        "currency": ["USD", "EUR", "GBP"],
        "tier": ["free", "pro", "enterprise"]
    },
    numeric_bounds={
        "price": (0, 10000),      # Must be between 0-10000
        "quantity": (1, 100)      # Must be between 1-100
    }
) as t:
    response = client.chat.completions.create(...)
    t.record(response)
    # → Automatically flags enum_out_of_range and numeric_out_of_bounds

Streaming Support

from deadpipe import track

with track(prompt_id="streaming_agent") as t:
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True
    )

    chunks = []
    first_token_seen = False
    for chunk in stream:
        if chunk.choices[0].delta.content:
            if not first_token_seen:
                t.mark_first_token()  # Record time to first token once
                first_token_seen = True
            chunks.append(chunk.choices[0].delta.content)
            print(chunk.choices[0].delta.content, end="")

    # Record with the stream object - captures timing
    t.record(stream)

Anthropic Claude Support

Use the universal wrap() or provider-specific wrap_anthropic():

from deadpipe import wrap, wrap_anthropic
from anthropic import Anthropic

# Option 1: Universal wrap (recommended)
client = wrap(Anthropic(), app="my_app")

# Option 2: Provider-specific wrapper
client = wrap_anthropic(Anthropic(), app="my_app")

# All calls automatically tracked - pass prompt_id per call
response = client.messages.create(
    prompt_id="claude_agent",
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}]
)
# → Provider auto-detected, cost estimated automatically

Retry Tracking

with track(prompt_id="retrying_agent") as t:
    for attempt in range(3):
        try:
            if attempt > 0:
                t.mark_retry()  # Call before each retry (not the first attempt)
            response = client.chat.completions.create(...)
            t.record(response)
            break
        except Exception:
            if attempt == 2:
                raise

Configuration Options

Parameter      | Type             | Description
prompt_id*     | string           | Unique identifier for this prompt type
schema         | BaseModel        | Pydantic model for output validation
enum_fields    | Dict[str, List]  | Enum field constraints for hallucination detection
numeric_bounds | Dict[str, Tuple] | Numeric bounds for constraint validation
app_id         | string           | Application identifier for grouping
environment    | string           | Environment (production, staging, etc.)
version        | string           | Version or git hash for change tracking
api_key        | string           | Override DEADPIPE_API_KEY env var
base_url       | string           | Override Deadpipe API URL

Environment Variables

# Required
export DEADPIPE_API_KEY="dp_your_api_key"

# Optional
export DEADPIPE_APP_ID="my-app"
export DEADPIPE_ENVIRONMENT="production"
export DEADPIPE_VERSION="v1.2.3"  # or GIT_COMMIT

What Gets Captured Automatically

Performance

  • Request/response latency
  • Time to first token (streaming)
  • Input/output tokens
  • Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
  • Model and provider (auto-detected)

Quality

  • Schema validation results
  • JSON parse success/failure
  • Empty output detection
  • Refusal detection
  • Output hash for deduplication

Change Tracking

  • Prompt hash (from messages)
  • Tool schema hash
  • System prompt hash
  • Version/git hash

Safety

  • Enum out-of-range detection
  • Numeric bounds checking
  • HTTP status codes
  • Provider error codes
  • Retry count tracking

Node.js SDK

The Node.js SDK provides wrap() for universal zero-config instrumentation, provider-specific wrappers like wrapOpenAI() and wrapAnthropic(), and track() for advanced manual tracking. Full TypeScript support included.

Installation

npm install deadpipe
# or
yarn add deadpipe
# or
pnpm add deadpipe

Recommended: Universal wrap() Function

The universal wrap() function auto-detects your provider and wraps appropriately:

import { wrap } from 'deadpipe';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';

// Universal wrap() - wrap once with app context
const openai = wrap(new OpenAI(), { app: 'my_app' });
const anthropic = wrap(new Anthropic(), { app: 'my_app' });

// Pass promptId per call to identify each prompt
const response = await openai.chat.completions.create({
  promptId: 'checkout_agent',
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund for order 1938' }]
});
// → Automatically captures: latency, tokens, cost, schema validation, etc.

Supported Providers

Provider-specific wrappers are available for explicit control:

Provider           | Wrapper Function | Client
OpenAI             | wrapOpenAI()     | new OpenAI()
Anthropic          | wrapAnthropic()  | new Anthropic()
Google AI (Gemini) | wrapGoogleAI()   | GoogleGenerativeAI()
Mistral            | wrapMistral()    | new MistralClient()
Cohere             | wrapCohere()     | new CohereClient()

Advanced: Manual Tracking with track()

For streaming, custom logic, or providers without a built-in wrapper:

import { track } from 'deadpipe';
import OpenAI from 'openai';

const client = new OpenAI();

// Store params in variable for context capture
const params = {
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
};

const response = await track('my_agent', async (t) => {
  const response = await client.chat.completions.create(params);
  // Pass input params to capture full context (messages, tools, system prompt)
  t.record(response, undefined, params);
  return response;
});

With Zod Schema Validation

Create a schema validator for output validation:

import { track, SchemaValidator } from 'deadpipe';
import { z } from 'zod';

const ProductSchema = z.object({
  productId: z.string(),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  category: z.enum(['electronics', 'clothing', 'home']),
  alternatives: z.array(z.string()).default([])
});

// Create Zod adapter for Deadpipe
const zodValidator: SchemaValidator = {
  validate: (data) => {
    const result = ProductSchema.safeParse(data);
    return {
      success: result.success,
      data: result.success ? result.data : undefined,
      errors: result.success ? undefined : result.error.errors.map(e => e.message)
    };
  }
};

const result = await track('recommender', async (t) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Recommend a laptop' }],
    response_format: { type: 'json_object' }
  });
  return t.record(response);
}, { schema: zodValidator });

// result is typed as ProductSchema | null
if (result) {
  console.log(result.productId); // TypeScript knows this is ProductSchema
}

With Enum and Numeric Bounds

Add additional validation rules for hallucination detection:

const result = await track('pricing_agent', async (t) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Set pricing' }]
  });
  return t.record(response);
}, {
  enumFields: {
    currency: ['USD', 'EUR', 'GBP'],
    tier: ['free', 'pro', 'enterprise']
  },
  numericBounds: {
    price: [0, 10000],      // Must be between 0-10000
    quantity: [1, 100]      // Must be between 1-100
  }
});
// → Automatically flags enum_out_of_range and numeric_out_of_bounds

Streaming Support

const params = {
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
};

const response = await track('streaming_agent', async (t) => {
  const stream = await client.chat.completions.create(params);

  let fullContent = '';
  let firstTokenSeen = false;
  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      if (!firstTokenSeen) {
        t.markFirstToken(); // Record time to first token once
        firstTokenSeen = true;
      }
      fullContent += chunk.choices[0].delta.content;
    }
  }

  // Record manually for streams - pass input params to capture context
  // (the usage numbers below are placeholders; substitute real counts if your provider reports them)
  t.record({
    model: 'gpt-4',
    choices: [{ message: { content: fullContent } }],
    usage: { prompt_tokens: 10, completion_tokens: 100, total_tokens: 110 }
  }, undefined, params);

  return fullContent;
});

Anthropic Claude Support

Use the universal wrap() or provider-specific wrapAnthropic():

import { wrap, wrapAnthropic } from 'deadpipe';
import Anthropic from '@anthropic-ai/sdk';

// Option 1: Universal wrap (recommended)
const client = wrap(new Anthropic(), { app: 'my_app' });

// Option 2: Provider-specific wrapper (alternative to Option 1; don't declare client twice)
// const client = wrapAnthropic(new Anthropic(), { app: 'my_app' });

// All calls automatically tracked - pass promptId per call
const response = await client.messages.create({
  promptId: 'claude_agent',
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello, Claude!' }]
});
// → Provider auto-detected, cost estimated automatically

Retry Tracking

const response = await track('retrying_agent', async (t) => {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      if (attempt > 0) t.markRetry(); // Call before each retry (not the first attempt)
      const response = await client.chat.completions.create({ /* request params */ });
      t.record(response);
      return response;
    } catch (error) {
      if (attempt === 2) throw error;
    }
  }
});

Framework Integration

Next.js API Routes

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const response = await track('api_handler', async (t) => {
    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }]
    });
    t.record(completion);
    return completion;
  });

  return Response.json({
    result: response.choices[0].message.content
  });
}

Express.js

app.post('/generate', async (req, res) => {
  const response = await track('express_endpoint', async (t) => {
    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: req.body.messages
    });
    t.record(completion);
    return completion;
  });

  res.json(response);
});

Configuration Options

Option        | Type                                       | Description
promptId*     | string                                     | Unique identifier for this prompt type
apiKey        | string                                     | Override DEADPIPE_API_KEY env var
baseUrl       | string                                     | Override Deadpipe API URL
appId         | string                                     | Application ID for grouping
environment   | string                                     | Environment (production, staging)
version       | string                                     | Version or git hash
schema        | SchemaValidator                            | Schema validator for output validation
enumFields    | Record<string, any[]>                      | Enum field constraints
numericBounds | Record<string, [number|null, number|null]> | Numeric bounds for validation

Environment Variables

# Required
export DEADPIPE_API_KEY="dp_your_api_key"

# Optional
export DEADPIPE_APP_ID="my-app"
export DEADPIPE_ENVIRONMENT="production"
export DEADPIPE_VERSION="1.2.3"
export GIT_COMMIT="abc123"  # Fallback for version

Tracker Methods

The tracker object provides these methods:

await track('my-prompt', async (t) => {
  // Mark when first token arrives (for streaming)
  t.markFirstToken();

  // Mark retry attempts
  t.markRetry();

  // Record the response (required)
  t.record(response, parsedOutput?, input?);

  // Or record an error manually
  t.recordError(error);

  // Check if already recorded
  if (!t.isRecorded()) { /* ... */ }

  // Get current telemetry
  const telemetry = t.getTelemetry();
});

What Gets Captured Automatically

Performance

  • Request/response latency
  • Time to first token (streaming)
  • Input/output tokens
  • Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
  • Model and provider (auto-detected)

Quality

  • Schema validation results
  • JSON parse success/failure
  • Empty output detection
  • Refusal detection
  • Output hash for deduplication

Change Tracking

  • Prompt hash (from messages)
  • Tool schema hash
  • System prompt hash
  • Version/git hash

Safety

  • Enum out-of-range detection
  • Numeric bounds checking
  • HTTP status codes
  • Provider error codes
  • Retry count tracking

Input Context Capture

Deadpipe automatically captures your input context (messages, tools, system prompts) to track when your prompts change and correlate changes with behavior shifts.

Why Input Context Matters

Without capturing input context, you can't detect when prompt changes cause drift. Deadpipe automatically extracts and hashes:

  • Prompt hash - Hash of all messages to detect prompt template changes
  • Tool schema hash - Hash of function/tool definitions for tool-calling prompts
  • System prompt hash - Hash of system instructions for change tracking
  • Input previews - Last user message for dashboard inspection

Zero-Config with wrap()

When using the universal wrapper, context is automatically extracted from every API call:

Python

from deadpipe import wrap
from openai import OpenAI

client = wrap(OpenAI(), app="my_app")

# Context automatically extracted - pass prompt_id per call
response = client.chat.completions.create(
    prompt_id="my_agent",
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello!"}
    ],
    tools=[{"type": "function", "function": {...}}]
)
# → prompt_hash, tool_schema_hash, system_prompt_hash auto-captured

Node.js

import { wrap } from 'deadpipe';
import OpenAI from 'openai';

const client = wrap(new OpenAI(), { app: 'my_app' });

// Context automatically extracted - pass promptId per call
const response = await client.chat.completions.create({
  promptId: 'my_agent',
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are helpful' },
    { role: 'user', content: 'Hello!' }
  ],
  tools: [{ type: 'function', function: {...} }]
});
// → prompt_hash, tool_schema_hash, system_prompt_hash auto-captured

Manual Context Capture

When using track() directly, pass input parameters to record():

Python

from deadpipe import track

# Store params in variable
params = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello!"}
    ],
    "tools": [{"type": "function", "function": {...}}]
}

with track(prompt_id="my_agent") as t:
    response = client.chat.completions.create(**params)
    # Pass params to capture context
    t.record(response, input=params)

Node.js

import { track } from 'deadpipe';

const params = {
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are helpful' },
    { role: 'user', content: 'Hello!' }
  ],
  tools: [{ type: 'function', function: {...} }]
};

const response = await track('my_agent', async (t) => {
  const response = await client.chat.completions.create(params);
  // Pass params to capture context
  t.record(response, undefined, params);
  return response;
});

Best Practice: Always Pass Input Parameters

❌ Bad - Missing Context

# Don't do this - no input context captured
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
t.record(response)  # No input passed - prompt/tool/system hashes are lost

✅ Good - Context Captured

# Always pass input params
params = {
    "model": "gpt-4", 
    "messages": [{"role": "user", "content": "Hello!"}]
}
response = client.chat.completions.create(**params)
t.record(response, input=params)

What Gets Hashed

Hash Type          | What It Tracks               | Use Case
prompt_hash        | All messages in conversation | Detect prompt template changes
system_prompt_hash | System message content       | Track system instruction changes
tool_schema_hash   | Function/tool definitions    | Detect tool API changes
output_hash        | Model output content         | Detect output pattern changes
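
The exact hashing scheme is internal to the SDK, but the idea is straightforward: serialize the relevant slice of the request deterministically and hash it. A minimal sketch of that idea in Python (stable_hash, the sample payloads, and the lookup_order tool are illustrative, not the SDK's internals):

import hashlib
import json

def stable_hash(value) -> str:
    """Hash any JSON-serializable value deterministically (illustrative only)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

messages = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello!"},
]
tools = [{"type": "function", "function": {"name": "lookup_order"}}]  # hypothetical tool

prompt_hash = stable_hash(messages)
system_prompt_hash = stable_hash([m for m in messages if m["role"] == "system"])
tool_schema_hash = stable_hash(tools)
# Any edit to the template, system prompt, or tool schema changes the corresponding hash,
# which is what lets the backend correlate anomalies with recent changes.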

Change Correlation

When Deadpipe detects anomalies, it checks if any of these hashes changed recently. This helps you distinguish between model-side drift and your own code changes.

Fail-Safe Design

Deadpipe is designed to never break your LLM calls. All telemetry is sent asynchronously and failures are silently ignored.

Zero Impact on Performance

Asynchronous Telemetry

  • SDK never awaits telemetry sends
  • Fire-and-forget HTTP requests
  • No blocking on your LLM calls
  • Background threads (Python) or fetch (Node.js)

Fail-Safe Error Handling

  • All exceptions are caught and ignored
  • Deadpipe downtime doesn't affect you
  • Network failures are silent
  • Your app continues working normally
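
As a rough illustration of this fire-and-forget pattern (not the SDK's actual internals), a background-thread sender that swallows every failure might look like:

import json
import threading
import urllib.request

def send_telemetry(payload: dict, api_key: str) -> None:
    """Fire-and-forget: post from a daemon thread and swallow every error."""
    def _post():
        try:
            req = urllib.request.Request(
                "https://deadpipe.com/api/v1/prompt",  # documented ingest endpoint
                data=json.dumps(payload).encode("utf-8"),
                headers={"Content-Type": "application/json", "X-API-Key": api_key},
            )
            urllib.request.urlopen(req, timeout=5)
        except Exception:
            pass  # telemetry must never break the caller

    threading.Thread(target=_post, daemon=True).start()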

Automatic Features

The SDK automatically handles complexity so you don't have to:

Provider Auto-Detection

# No need to specify provider
with track(prompt_id="my_prompt") as t:
    # Works with OpenAI, Anthropic, etc.
    response = client.chat.completions.create(...)
    t.record(response)  # Provider auto-detected

Response Parsing

# Handles all response formats
t.record(openai_response)    # OpenAI format
t.record(anthropic_response) # Anthropic format
t.record(custom_response)    # Best effort parsing

Cost Estimation

# Automatic cost calculation
# Supports OpenAI, Anthropic, Gemini, Mistral, Cohere
t.record(response)  # Cost auto-estimated
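
Under the hood, estimation is just token counts multiplied by per-model prices. The sketch below uses placeholder prices; it is not Deadpipe's actual pricing table:

# Hypothetical per-1K-token prices in USD; real prices vary by provider and model
PRICE_PER_1K = {"gpt-4o-mini": {"input": 0.00015, "output": 0.0006}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int):
    price = PRICE_PER_1K.get(model)
    if price is None:
        return None  # unknown model: no estimate
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]

print(estimate_cost("gpt-4o-mini", 150, 75))  # → roughly 6.75e-05 USD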

Context Extraction

# With universal wrap()
client = wrap(OpenAI(), app="my_app")
response = client.chat.completions.create(
    prompt_id="agent",
    messages=[...],
    tools=[...]
)  # Context auto-extracted

Fail-Safe Example

from deadpipe import wrap
from openai import OpenAI

# If Deadpipe is down, your code continues working
client = wrap(OpenAI(), app="my_app")

try:
    response = client.chat.completions.create(prompt_id="agent", ...)
    # Works normally even if telemetry fails
except Exception as e:
    # Only your LLM error, never Deadpipe
    pass

What Happens When Deadpipe is Down

✅ Your App Continues

  • LLM calls work normally
  • No exceptions thrown
  • No performance impact
  • Silent telemetry failure

❌ What Doesn't Happen

  • No blocking or delays
  • No error propagation
  • No failed LLM requests
  • No broken user flows

Payload Optimization

Telemetry payloads are automatically optimized to reduce bandwidth:

Filtered Out

  • undefined and null values
  • Empty strings and arrays
  • false booleans (keeps true)
  • Empty objects

Result

  • ~40-60% smaller payloads
  • Faster network requests
  • Reduced bandwidth costs
  • Better performance on slow connections
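
A sketch of that filtering step in Python (not the SDK's exact logic):

def compact(payload: dict) -> dict:
    """Drop keys that carry no signal: None, False, and empty strings/lists/objects."""
    kept = {}
    for key, value in payload.items():
        if value is None or value is False:
            continue
        if value == "" or value == [] or value == {}:
            continue
        kept[key] = value
    return kept

print(compact({"refusal_flag": True, "empty_output": False, "provider_error_code": "",
               "missing_required_fields": [], "output_length": 512}))
# → {'refusal_flag': True, 'output_length': 512}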

Development vs Production

Development Mode

# Warns about missing API key
export DEADPIPE_DEBUG=1
# or
export NODE_ENV=development

Shows helpful warnings without production noise

Production Mode

# Silent operation
export DEADPIPE_API_KEY="dp_xxx"
# No warnings, no errors

Completely silent - never affects your app

Prompt Tracking API

For direct API integration without the SDK, use the prompt tracking endpoint.

POST /api/v1/prompt

Send prompt execution telemetry for tracking and baseline computation.

curl -X POST https://deadpipe.com/api/v1/prompt \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dp_your_api_key" \
  -d '{
    "prompt_id": "review-analyzer",
    "model": "gpt-4o-mini",
    "provider": "openai",
    "request_start": 1704067200000,
    "end_time": 1704067201500,
    "total_latency_ms": 1500,
    "input_tokens": 150,
    "output_tokens": 75,
    "total_tokens": 225,
    "estimated_cost": 0.00045,
    "http_status": 200,
    "output_length": 512,
    "empty_output": false,
    "json_parse_success": true,
    "schema_validation_pass": true,
    "refusal_flag": false,
    "output_hash": "abc123..."
  }'
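
The same payload can be sent from any HTTP client. A minimal Python sketch using the requests library, with only a subset of the documented fields:

import time
import requests  # any HTTP client works; requests shown for brevity

start_ms = int(time.time() * 1000)
# ... call your LLM here ...
end_ms = int(time.time() * 1000)

resp = requests.post(
    "https://deadpipe.com/api/v1/prompt",
    headers={"X-API-Key": "dp_your_api_key"},
    json={
        "prompt_id": "review-analyzer",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "request_start": start_ms,
        "end_time": end_ms,
        "total_latency_ms": end_ms - start_ms,
        "input_tokens": 150,
        "output_tokens": 75,
        "total_tokens": 225,
        "http_status": 200,
    },
    timeout=5,
)
print(resp.json()["baseline"])  # the rolling baseline is returned in the response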

Request Fields

Field | Type | Description

Identity
prompt_id*  | string | Unique identifier for this prompt type
model*      | string | Model identifier (gpt-4o-mini, claude-3-5-sonnet, etc.)
provider    | string | LLM provider (openai, anthropic, google)
app_id      | string | Application identifier
environment | string | Environment (production, staging)
version     | string | Version or git hash

Timing
request_start*    | number | Unix timestamp (ms) when request started
first_token_time  | number | Unix timestamp (ms) of first token received
end_time*         | number | Unix timestamp (ms) when request completed
total_latency_ms* | number | Total latency in milliseconds

Volume
input_tokens   | number | Number of input tokens
output_tokens  | number | Number of output tokens
total_tokens   | number | Total tokens (input + output)
estimated_cost | number | Estimated cost in USD

Reliability
http_status         | number  | HTTP response status code
timeout             | boolean | Whether the request timed out
retry_count         | number  | Number of retry attempts
provider_error_code | string  | Provider-specific error code

Output Integrity
output_length           | number   | Character length of output
empty_output            | boolean  | Whether output was empty
truncated               | boolean  | Whether output was truncated
json_parse_success      | boolean  | Whether JSON parsing succeeded
schema_validation_pass  | boolean  | Whether schema validation passed
missing_required_fields | string[] | List of missing required fields

Behavioral Fingerprint
output_hash    | string  | SHA-256 hash of output for deduplication
refusal_flag   | boolean | Whether model refused to respond
tool_call_flag | boolean | Whether response included tool calls

Change Context
prompt_hash        | string | Hash of the prompt template
tool_schema_hash   | string | Hash of tool/function schemas
system_prompt_hash | string | Hash of system prompt

Response

{
  "received": true,
  "prompt_id": "review-analyzer",
  "event_id": "evt_abc123",
  "baseline": {
    "latency_mean": 1200,
    "latency_p95": 2500,
    "token_mean": 200,
    "schema_pass_rate": 0.98,
    "sample_count": 1500
  },
  "anomalies": []
}

GET /api/v1/prompt?prompt_id={id}

Retrieve baselines and statistics for a specific prompt.

curl "https://deadpipe.com/api/v1/prompt?prompt_id=review-analyzer" \
  -H "X-API-Key: dp_your_api_key"

Baselines & Drift Detection

Deadpipe automatically computes rolling baselines for every prompt and alerts you when metrics drift beyond thresholds.

How Baselines Work

  1. Every prompt execution updates a rolling statistical baseline
  2. We use Welford's algorithm for efficient online mean/variance computation (sketched below)
  3. Each incoming event is compared against the baseline in real-time
  4. Anomalies are flagged when metrics exceed threshold deviations
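
The service-side implementation isn't exposed, but Welford's update itself is standard. A minimal sketch of the online mean/variance computation referenced in step 2 (the RunningStats class is illustrative, not part of the SDK):

class RunningStats:
    """Welford's online algorithm: update mean and variance one sample at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.count if self.count > 1 else 0.0

stats = RunningStats()
for latency_ms in (1180, 1230, 1450, 1210):
    stats.update(latency_ms)
print(stats.mean, stats.variance ** 0.5)  # rolling mean and standard deviation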

Baseline Metrics

Latency

  • latency_mean - Average latency
  • latency_variance - Latency variance
  • latency_p95 - 95th percentile

Tokens

  • input_token_mean - Avg input tokens
  • output_token_mean - Avg output tokens
  • token_variance - Token variance

Reliability

  • success_rate - % successful calls
  • error_rate - % errors (4xx/5xx)
  • timeout_rate - % timeouts

Output Quality

  • schema_pass_rate - % valid outputs
  • empty_rate - % empty outputs
  • refusal_rate - % refusals

Drift Detection Rules

Anomalies are automatically detected when:

Anomaly Type     | Trigger Condition
latency_spike    | Latency > p95 + 2σ
token_anomaly    | Token count deviates > 3σ from mean
schema_violation | Schema validation fails (when baseline pass rate > 95%)
empty_output     | Empty output (when baseline empty rate < 5%)
refusal          | Refusal detected (when baseline refusal rate < 5%)
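
Expressed as code, these rules are simple comparisons against the stored baseline. A sketch assuming the baseline fields match the metrics listed above (token_mean as in the POST /api/v1/prompt response example); the detect_anomalies helper is illustrative, not the service's actual implementation:

def detect_anomalies(event: dict, baseline: dict) -> list:
    """Illustrative checks mirroring the table above."""
    # In practice these checks only run after the warm-up period described below.
    anomalies = []
    latency_std = baseline["latency_variance"] ** 0.5
    if event["total_latency_ms"] > baseline["latency_p95"] + 2 * latency_std:
        anomalies.append("latency_spike")
    token_std = baseline["token_variance"] ** 0.5
    if abs(event["total_tokens"] - baseline["token_mean"]) > 3 * token_std:
        anomalies.append("token_anomaly")
    if not event.get("schema_validation_pass", True) and baseline["schema_pass_rate"] > 0.95:
        anomalies.append("schema_violation")
    if event.get("empty_output") and baseline["empty_rate"] < 0.05:
        anomalies.append("empty_output")
    if event.get("refusal_flag") and baseline["refusal_rate"] < 0.05:
        anomalies.append("refusal")
    return anomalies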

Baseline Warm-Up

Drift detection activates after 50 samples to ensure statistical significance. During warm-up, we collect data without triggering anomaly alerts.

Schema Validation

Validate every LLM output against your expected structure and track validation rates over time.

Python: Pydantic Models

from deadpipe import track
from pydantic import BaseModel, Field
from typing import Literal, List

class ProductRecommendation(BaseModel):
    product_id: str
    confidence: float = Field(ge=0, le=1)
    reasoning: str
    category: Literal["electronics", "clothing", "home"]
    alternatives: List[str] = []

with track(prompt_id="recommender", schema=ProductRecommendation) as t:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Recommend..."}],
        response_format={"type": "json_object"}
    )
    result = t.record(response)
    
    # result.schema_pass tells you if validation passed
    # result.validation_error contains the error message if failed
    # result.validated contains the parsed Pydantic model if passed

TypeScript: Zod Schemas

import { track, SchemaValidator } from 'deadpipe';
import { z } from 'zod';

const ProductSchema = z.object({
  productId: z.string(),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  category: z.enum(['electronics', 'clothing', 'home']),
  alternatives: z.array(z.string()).default([])
});

// Zod adapter for Deadpipe (see the Node.js SDK section)
const zodValidator: SchemaValidator = {
  validate: (data) => {
    const result = ProductSchema.safeParse(data);
    return {
      success: result.success,
      data: result.success ? result.data : undefined,
      errors: result.success ? undefined : result.error.errors.map(e => e.message)
    };
  }
};

const result = await track('recommender', async (t) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Recommend...' }],
    response_format: { type: 'json_object' }
  });
  return t.record(response);
}, { schema: zodValidator });

if (result) {
  // TypeScript knows this is the ProductSchema output type
  console.log(result.productId);
}

Schema Drift Alerts

When your baseline schema pass rate is above 95% and validation starts failing, we flag this as a schema_violation anomaly. This catches cases where model updates or prompt changes cause structural regressions.

Output Integrity

Beyond schema validation, we track multiple signals that indicate potential model misbehavior.

Refusal Detection

Detects when the model refuses to respond, using patterns like "I cannot", "I'm not able to", and similar phrases.

Empty Output

Flags when outputs are empty or contain only whitespace.

JSON Parse Failures

Tracks when JSON mode responses fail to parse.

Truncation

Detects when outputs are cut off due to token limits.

Hallucination Proxy Flags

We don't claim to detect hallucinations directly, but we track proxy signals:

  • Enum out of range — When a Literal/enum field contains an unexpected value
  • Numeric out of bounds — When a constrained field (0-1, positive, etc.) violates its bounds
  • Schema violations — When structured output doesn't match expected format
  • Output pattern shift — When output hash distribution changes significantly
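
The enum and bounds checks in particular are easy to picture. A minimal sketch over a parsed JSON output, reusing the example constraints from the SDK sections above (the proxy_flags helper is illustrative, not the SDK's implementation):

def proxy_flags(output: dict, enum_fields: dict, numeric_bounds: dict) -> list:
    """Illustrative enum/bounds checks over a parsed JSON output."""
    flags = []
    for field, allowed in enum_fields.items():
        if field in output and output[field] not in allowed:
            flags.append(f"enum_out_of_range:{field}")
    for field, (low, high) in numeric_bounds.items():
        value = output.get(field)
        if isinstance(value, (int, float)):
            if (low is not None and value < low) or (high is not None and value > high):
                flags.append(f"numeric_out_of_bounds:{field}")
    return flags

print(proxy_flags(
    {"currency": "JPY", "price": 25000},
    enum_fields={"currency": ["USD", "EUR", "GBP"]},
    numeric_bounds={"price": (0, 10000)},
))
# → ['enum_out_of_range:currency', 'numeric_out_of_bounds:price']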

Change Detection

Track when your prompts, system prompts, or tool schemas change to correlate with behavior shifts.

Change Context Hashes

The SDK automatically computes and tracks hashes for:

Hash               | What It Tracks
prompt_hash        | Hash of the prompt template (user message)
system_prompt_hash | Hash of the system prompt
tool_schema_hash   | Hash of function/tool definitions
output_hash        | Hash of the output for deduplication

Correlating Changes with Drift

When we detect anomalies, we check if any of these hashes changed recently. This helps you identify whether drift is caused by your changes or model-side updates.

LLM Alerts

Configure alerts in your Dashboard to get notified when prompts drift.

Latency P95 Spike

Alert when prompt latency exceeds baseline p95

Schema Validation Drop

Alert when schema pass rate drops below threshold

Empty Output Spike

Alert when empty output rate increases

Refusal Rate Increase

Alert when model refusal rate spikes

Error Rate Threshold

Alert when API error rate exceeds limit

Cost Anomaly

Alert when prompt costs spike unexpectedly

Notification Channels

Alerts can be sent via Email, Slack, or Webhooks. Configure your preferred channels in the Dashboard settings.

Rate Limits

All API endpoints are protected with rate limiting to ensure fair usage.

Rate Limits by Endpoint

Endpoint               | Limit         | Window
/api/v1/prompt         | 1000 requests | 1 minute
/api/v1/heartbeat      | 60 requests   | 1 minute
/api/v1/monitor/events | 100 requests  | 1 minute

Payload Size Limits

Endpoint          | Max Payload
/api/v1/prompt    | 50 KB
/api/v1/heartbeat | 10 KB

Rate Limit Response

{
  "error": "Rate limit exceeded",
  "retryAfter": 60
}
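
Clients should back off using the retryAfter hint. A minimal Python sketch (the retry policy itself is up to you; the API only reports the wait time):

import time
import requests

def post_with_backoff(url: str, payload: dict, api_key: str, max_attempts: int = 3):
    """Retry on 429 using the retryAfter hint from the response body."""
    for attempt in range(max_attempts):
        resp = requests.post(url, json=payload, headers={"X-API-Key": api_key}, timeout=5)
        if resp.status_code != 429:
            return resp
        retry_after = resp.json().get("retryAfter", 60)  # seconds, per the response above
        time.sleep(retry_after)
    return resp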

Pipeline Heartbeat

For non-LLM batch jobs and scheduled tasks, use pipeline heartbeats to track job health.

Basic Usage

from deadpipe import Deadpipe

dp = Deadpipe()

@dp.heartbeat("daily-etl")
def run_etl():
    process_data()
    return {"records_processed": 1500}

run_etl()  # Sends heartbeat on success/failure

cURL

curl -X POST https://deadpipe.com/api/v1/heartbeat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dp_your_api_key" \
  -d '{"pipeline_id": "my-job", "status": "success"}'

Auto-Creation

Pipelines are automatically created on the first heartbeat if they don't exist. Customize settings in the Dashboard.

Ready to monitor your prompts?

Create your free account and start detecting drift in under 5 minutes.