LLM Observability Docs
Everything you need to monitor your LLM prompts, detect drift, and prevent regressions in production.
Overview
Deadpipe provides LLM observability that answers one question: "Is this prompt still behaving safely?"
The Core Problem
LLMs are non-deterministic. The same prompt can produce different outputs, and model updates can silently break your application. You need baselines to detect when behavior shifts.
- You cannot detect regression without a baseline
- You cannot alert without stable fingerprints
- You cannot audit without provenance
Automatic Baselines
We compute rolling baselines for every prompt: latency p95, token distributions, schema pass rates. No configuration required.
Drift Detection
Get alerted when latency spikes, token counts shift, schema validation drops, or output patterns change unexpectedly.
Schema Validation
Pass your Pydantic model and we validate every LLM output, tracking pass rates and detecting regressions.
Hallucination Proxies
Track refusals, empty outputs, JSON parse failures, and enum violations as early indicators of model misbehavior.
Quick Start
Get LLM observability running in under 5 minutes with zero-config instrumentation.
1. Install the SDK
Python
pip install deadpipe
Node.js
npm install deadpipe
2. Set your API key
export DEADPIPE_API_KEY="dp_your_api_key"
3. Wrap your client (zero code changes!)
The universal wrap() function auto-detects your provider:
Python
from deadpipe import wrap
from openai import OpenAI
from anthropic import Anthropic
# Universal wrap() - wrap once with app context
openai = wrap(OpenAI(), app="my_app")
anthropic = wrap(Anthropic(), app="my_app")
# Pass prompt_id per call to identify each prompt
response = openai.chat.completions.create(
    prompt_id="checkout_agent",
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund"}]
)
# → Automatically captures everything: latency, tokens, cost, schema validation, etc.
Node.js
import { wrap } from 'deadpipe';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
// Universal wrap() - wrap once with app context
const openai = wrap(new OpenAI(), { app: 'my_app' });
const anthropic = wrap(new Anthropic(), { app: 'my_app' });
// Pass promptId per call to identify each prompt
const response = await openai.chat.completions.create({
  promptId: 'checkout_agent',
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund' }]
});
// → Automatically captures everything: latency, tokens, cost, schema validation, etc.
Alternative: Manual tracking with schema validation
Python
from deadpipe import wrap_openai
from pydantic import BaseModel
from openai import OpenAI
class RefundResponse(BaseModel):
    order_id: str
    amount: float
    status: str
# Wrap client once
client = wrap_openai(OpenAI(), app="my_app")
# Pass schema per-call (each prompt can have its own schema)
response = client.chat.completions.create(
    prompt_id="checkout_agent",
    schema=RefundResponse,  # Per-call schema validation
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund"}],
    response_format={"type": "json_object"}
)
# → Schema validation + all automatic tracking
Node.js
import { wrapOpenAI } from 'deadpipe';
import { z } from 'zod';
import OpenAI from 'openai';
const RefundSchema = z.object({
  orderId: z.string(),
  amount: z.number(),
  status: z.string()
});
// Create validator helper
const zodValidator = (schema: z.ZodSchema) => ({
  validate: (data: unknown) => {
    const result = schema.safeParse(data);
    return {
      success: result.success,
      data: result.success ? result.data : undefined,
      errors: result.success ? undefined : result.error.errors.map(e => e.message)
    };
  }
});
// Wrap client once
const client = wrapOpenAI(new OpenAI(), { app: 'my_app' });
// Pass schema per-call (each prompt can have its own schema)
const response = await client.chat.completions.create({
  promptId: 'checkout_agent',
  schema: zodValidator(RefundSchema), // Per-call schema validation
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund' }],
  response_format: { type: 'json_object' }
});
// → Schema validation + all automatic tracking
What Gets Captured Automatically
Performance
- Request/response latency
- Time to first token (streaming)
- Input/output tokens
- Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
- Model and provider (auto-detected)
Quality & Safety
- Schema validation results
- JSON parse success/failure
- Empty output detection
- Refusal detection
- Enum/bounds constraint checking
Change Tracking
- Prompt hash (message content)
- Tool schema hash
- System prompt hash
- Output hash for deduplication
- Version/git hash
Python SDK
The Python SDK provides multiple ways to track LLM calls: wrap() for universal zero-config instrumentation, provider-specific wrappers like wrap_openai() and wrap_anthropic(), and the track() context manager for manual tracking.
Installation
pip install deadpipe
Recommended: Universal wrap() Function
The universal wrap() function auto-detects your provider and wraps appropriately:
from deadpipe import wrap
from openai import OpenAI
from anthropic import Anthropic
# Universal wrap() - wrap once with app context
openai_client = wrap(OpenAI(), app="my_app")
anthropic_client = wrap(Anthropic(), app="my_app")
# Pass prompt_id per call to identify each prompt
response = openai_client.chat.completions.create(
    prompt_id="checkout_agent",
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Process refund for order 1938"}]
)
# → Automatically captures: latency, tokens, cost, schema validation, etc.
Supported Providers
Provider-specific wrappers are available for explicit control:
| Provider | Wrapper Function | Client |
|---|---|---|
| OpenAI | wrap_openai() | OpenAI() |
| Anthropic | wrap_anthropic() | Anthropic() |
| Google AI (Gemini) | wrap_google_ai() | genai.GenerativeModel() |
| Mistral | wrap_mistral() | MistralClient() |
| Cohere | wrap_cohere() | cohere.Client() |
Advanced: Manual Tracking with track()
For streaming, custom logic, or non-OpenAI providers:
from deadpipe import track
from openai import OpenAI
client = OpenAI()
# Store params in variable for context capture
params = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
}
with track(prompt_id="my_agent") as t:
    response = client.chat.completions.create(**params)
    # Pass input params to capture full context (messages, tools, system prompt)
    t.record(response, input=params)
With Pydantic Schema Validation
Pass a Pydantic model to validate every LLM output and track schema pass rates over time.
from deadpipe import wrap_openai
from pydantic import BaseModel, Field
from typing import Literal
from openai import OpenAI
class ProductRecommendation(BaseModel):
    product_id: str
    confidence: float = Field(ge=0, le=1)
    reasoning: str
    category: Literal["electronics", "clothing", "home"]
    alternatives: list[str] = []
# Schema validation with wrapper
client = wrap_openai(OpenAI(), app="my_app")
response = client.chat.completions.create(
    prompt_id="recommender",
    schema=ProductRecommendation,  # Per-call schema validation
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Recommend a laptop"}],
    response_format={"type": "json_object"}
)
# → Automatically validates output against ProductRecommendation
# → Tracks schema pass rates over time
# → Alerts when validation rates drop
With Enum and Numeric Bounds
Add additional validation rules for hallucination detection:
from deadpipe import track
with track(
    prompt_id="pricing_agent",
    enum_fields={
        "currency": ["USD", "EUR", "GBP"],
        "tier": ["free", "pro", "enterprise"]
    },
    numeric_bounds={
        "price": (0, 10000),   # Must be between 0-10000
        "quantity": (1, 100)   # Must be between 1-100
    }
) as t:
    response = client.chat.completions.create(...)
    t.record(response)
# → Automatically flags enum_out_of_range and numeric_out_of_bounds
Streaming Support
from deadpipe import track
with track(prompt_id="streaming_agent") as t:
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True
    )
    chunks = []
    for chunk in stream:
        if chunk.choices[0].delta.content:
            t.mark_first_token()  # Call once when first content arrives
            chunks.append(chunk.choices[0].delta.content)
            print(chunk.choices[0].delta.content, end="")
    # Record with the stream object - captures timing
    t.record(stream)
Anthropic Claude Support
Use the universal wrap() or provider-specific wrap_anthropic():
from deadpipe import wrap, wrap_anthropic
from anthropic import Anthropic
# Option 1: Universal wrap (recommended)
client = wrap(Anthropic(), app="my_app")
# Option 2: Provider-specific wrapper
client = wrap_anthropic(Anthropic(), app="my_app")
# All calls automatically tracked - pass prompt_id per call
response = client.messages.create(
    prompt_id="claude_agent",
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}]
)
# → Provider auto-detected, cost estimated automatically
Retry Tracking
with track(prompt_id="retrying_agent") as t:
    for attempt in range(3):
        try:
            if attempt > 0:
                t.mark_retry()  # Call before each retry
            response = client.chat.completions.create(...)
            t.record(response)
            break
        except Exception as e:
            if attempt == 2:
                raise
Configuration Options
| Parameter | Type | Description |
|---|---|---|
| prompt_id* | string | Unique identifier for this prompt type |
| schema | BaseModel | Pydantic model for output validation |
| enum_fields | Dict[str, List] | Enum field constraints for hallucination detection |
| numeric_bounds | Dict[str, Tuple] | Numeric bounds for constraint validation |
| app_id | string | Application identifier for grouping |
| environment | string | Environment (production, staging, etc.) |
| version | string | Version or git hash for change tracking |
| api_key | string | Override DEADPIPE_API_KEY env var |
| base_url | string | Override Deadpipe API URL |
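To see how these options fit together, here is a minimal sketch that combines several of them. It assumes track() accepts the listed options as keyword arguments alongside prompt_id, and the enum/bounds values shown are illustrative only:
from deadpipe import track
from openai import OpenAI

client = OpenAI()
# Illustrative configuration; not every option is required, and the values are examples
with track(
    prompt_id="checkout_agent",
    app_id="my_app",
    environment="production",
    version="v1.2.3",
    enum_fields={"status": ["approved", "rejected"]},
    numeric_bounds={"amount": (0, 10000)}
) as t:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Process refund"}]
    )
    t.record(response)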
Environment Variables
# Required
export DEADPIPE_API_KEY="dp_your_api_key"
# Optional
export DEADPIPE_APP_ID="my-app"
export DEADPIPE_ENVIRONMENT="production"
export DEADPIPE_VERSION="v1.2.3"  # or GIT_COMMIT
What Gets Captured Automatically
Performance
- Request/response latency
- Time to first token (streaming)
- Input/output tokens
- Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
- Model and provider (auto-detected)
Quality
- Schema validation results
- JSON parse success/failure
- Empty output detection
- Refusal detection
- Output hash for deduplication
Change Tracking
- Prompt hash (from messages)
- Tool schema hash
- System prompt hash
- Version/git hash
Safety
- Enum out-of-range detection
- Numeric bounds checking
- HTTP status codes
- Provider error codes
- Retry count tracking
Node.js SDK
The Node.js SDK provides wrap() for universal zero-config instrumentation, provider-specific wrappers like wrapOpenAI() and wrapAnthropic(), and track() for advanced manual tracking. Full TypeScript support included.
Installation
npm install deadpipe
# or
yarn add deadpipe
# or
pnpm add deadpipe
Recommended: Universal wrap() Function
The universal wrap() function auto-detects your provider and wraps appropriately:
import { wrap } from 'deadpipe';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
// Universal wrap() - wrap once with app context
const openai = wrap(new OpenAI(), { app: 'my_app' });
const anthropic = wrap(new Anthropic(), { app: 'my_app' });
// Pass promptId per call to identify each prompt
const response = await openai.chat.completions.create({
  promptId: 'checkout_agent',
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Process refund for order 1938' }]
});
// → Automatically captures: latency, tokens, cost, schema validation, etc.
Supported Providers
Provider-specific wrappers are available for explicit control:
| Provider | Wrapper Function | Client |
|---|---|---|
| OpenAI | wrapOpenAI() | new OpenAI() |
| Anthropic | wrapAnthropic() | new Anthropic() |
| Google AI (Gemini) | wrapGoogleAI() | GoogleGenerativeAI() |
| Mistral | wrapMistral() | new MistralClient() |
| Cohere | wrapCohere() | new CohereClient() |
Advanced: Manual Tracking with track()
For streaming, custom logic, or non-OpenAI providers:
import { track } from 'deadpipe';
import OpenAI from 'openai';
const client = new OpenAI();
// Store params in variable for context capture
const params = {
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
};
const response = await track('my_agent', async (t) => {
  const response = await client.chat.completions.create(params);
  // Pass input params to capture full context (messages, tools, system prompt)
  t.record(response, undefined, params);
  return response;
});
With Zod Schema Validation
Create a schema validator for output validation:
import { track, SchemaValidator } from 'deadpipe';
import { z } from 'zod';
const ProductSchema = z.object({
  productId: z.string(),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  category: z.enum(['electronics', 'clothing', 'home']),
  alternatives: z.array(z.string()).default([])
});
// Create Zod adapter for Deadpipe
const zodValidator: SchemaValidator = {
  validate: (data) => {
    const result = ProductSchema.safeParse(data);
    return {
      success: result.success,
      data: result.success ? result.data : undefined,
      errors: result.success ? undefined : result.error.errors.map(e => e.message)
    };
  }
};
const result = await track('recommender', async (t) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Recommend a laptop' }],
    response_format: { type: 'json_object' }
  });
  return t.record(response);
}, { schema: zodValidator });
// result is typed as ProductSchema | null
if (result) {
  console.log(result.productId); // TypeScript knows this is ProductSchema
}
With Enum and Numeric Bounds
Add additional validation rules for hallucination detection:
const result = await track('pricing_agent', async (t) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Set pricing' }]
  });
  return t.record(response);
}, {
  enumFields: {
    currency: ['USD', 'EUR', 'GBP'],
    tier: ['free', 'pro', 'enterprise']
  },
  numericBounds: {
    price: [0, 10000],   // Must be between 0-10000
    quantity: [1, 100]   // Must be between 1-100
  }
});
// → Automatically flags enum_out_of_range and numeric_out_of_bounds
Streaming Support
const params = {
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
};
const response = await track('streaming_agent', async (t) => {
  const stream = await client.chat.completions.create(params);
  let fullContent = '';
  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      t.markFirstToken(); // Call once on first token
      fullContent += chunk.choices[0].delta.content;
    }
  }
  // Record manually for streams - pass input params to capture context
  t.record({
    model: 'gpt-4',
    choices: [{ message: { content: fullContent } }],
    usage: { prompt_tokens: 10, completion_tokens: 100, total_tokens: 110 }
  }, undefined, params);
  return fullContent;
});
Anthropic Claude Support
Use the universal wrap() or provider-specific wrapAnthropic():
import { wrap, wrapAnthropic } from 'deadpipe';
import Anthropic from '@anthropic-ai/sdk';
// Option 1: Universal wrap (recommended)
const client = wrap(new Anthropic(), { app: 'my_app' });
// Option 2: Provider-specific wrapper
// const client = wrapAnthropic(new Anthropic(), { app: 'my_app' });
// All calls automatically tracked - pass promptId per call
const response = await client.messages.create({
  promptId: 'claude_agent',
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello, Claude!' }]
});
// → Provider auto-detected, cost estimated automatically
Retry Tracking
const response = await track('retrying_agent', async (t) => {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      if (attempt > 0) t.markRetry(); // Call before each retry
      const response = await client.chat.completions.create({...});
      t.record(response);
      return response;
    } catch (error) {
      if (attempt === 2) throw error;
    }
  }
});
Framework Integration
Next.js API Routes
export async function POST(request: Request) {
  const { prompt } = await request.json();
  const response = await track('api_handler', async (t) => {
    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }]
    });
    t.record(completion);
    return completion;
  });
  return Response.json({
    result: response.choices[0].message.content
  });
}
Express.js
app.post('/generate', async (req, res) => {
  const response = await track('express_endpoint', async (t) => {
    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: req.body.messages
    });
    t.record(completion);
    return completion;
  });
  res.json(response);
});
Configuration Options
| Option | Type | Description |
|---|---|---|
| promptId* | string | Unique identifier for this prompt type |
| apiKey | string | Override DEADPIPE_API_KEY env var |
| baseUrl | string | Override Deadpipe API URL |
| appId | string | Application ID for grouping |
| environment | string | Environment (production, staging) |
| version | string | Version or git hash |
| schema | SchemaValidator | Schema validator for output validation |
| enumFields | Record<string, any[]> | Enum field constraints |
| numericBounds | Record<string, [number\|null, number\|null]> | Numeric bounds for validation |
Environment Variables
# Required
export DEADPIPE_API_KEY="dp_your_api_key"
# Optional
export DEADPIPE_APP_ID="my-app"
export DEADPIPE_ENVIRONMENT="production"
export DEADPIPE_VERSION="1.2.3"
export GIT_COMMIT="abc123"  # Fallback for version
Tracker Methods
The tracker object provides these methods:
await track('my-prompt', async (t) => {
  // Mark when first token arrives (for streaming)
  t.markFirstToken();
  // Mark retry attempts
  t.markRetry();
  // Record the response (required)
  t.record(response, parsedOutput?, input?);
  // Or record an error manually
  t.recordError(error);
  // Check if already recorded
  if (!t.isRecorded()) { /* ... */ }
  // Get current telemetry
  const telemetry = t.getTelemetry();
});
What Gets Captured Automatically
Performance
- Request/response latency
- Time to first token (streaming)
- Input/output tokens
- Estimated cost (OpenAI, Anthropic, Gemini, Mistral, Cohere)
- Model and provider (auto-detected)
Quality
- Schema validation results
- JSON parse success/failure
- Empty output detection
- Refusal detection
- Output hash for deduplication
Change Tracking
- Prompt hash (from messages)
- Tool schema hash
- System prompt hash
- Version/git hash
Safety
- Enum out-of-range detection
- Numeric bounds checking
- HTTP status codes
- Provider error codes
- Retry count tracking
Input Context Capture
Deadpipe automatically captures your input context (messages, tools, system prompts) to track when your prompts change and correlate changes with behavior shifts.
Why Input Context Matters
Without capturing input context, you can't detect when prompt changes cause drift. Deadpipe automatically extracts and hashes:
- Prompt hash - Hash of all messages to detect prompt template changes
- Tool schema hash - Hash of function/tool definitions for tool-calling prompts
- System prompt hash - Hash of system instructions for change tracking
- Input previews - Last user message for dashboard inspection
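Conceptually, each hash is just a stable digest of the relevant part of the request. A minimal sketch of the idea, assuming SHA-256 over a deterministic JSON serialization (the SDK's exact canonicalization may differ, and the tool name below is only an example):
import hashlib
import json

def stable_hash(value) -> str:
    # Deterministic serialization, then SHA-256
    payload = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

messages = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello!"}
]
tools = [{"type": "function", "function": {"name": "lookup_order"}}]

prompt_hash = stable_hash(messages)
system_prompt_hash = stable_hash([m for m in messages if m["role"] == "system"])
tool_schema_hash = stable_hash(tools)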
Zero-Config with wrap()
When using the universal wrapper, context is automatically extracted from every API call:
Python
from deadpipe import wrap
from openai import OpenAI
client = wrap(OpenAI(), app="my_app")
# Context automatically extracted - pass prompt_id per call
response = client.chat.completions.create(
    prompt_id="my_agent",
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello!"}
    ],
    tools=[{"type": "function", "function": {...}}]
)
# → prompt_hash, tool_schema_hash, system_prompt_hash auto-captured
Node.js
import { wrap } from 'deadpipe';
import OpenAI from 'openai';
const client = wrap(new OpenAI(), { app: 'my_app' });
// Context automatically extracted - pass promptId per call
const response = await client.chat.completions.create({
  promptId: 'my_agent',
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are helpful' },
    { role: 'user', content: 'Hello!' }
  ],
  tools: [{ type: 'function', function: {...} }]
});
// → prompt_hash, tool_schema_hash, system_prompt_hash auto-captured
Manual Context Capture
When using track() directly, pass input parameters to record():
Python
from deadpipe import track
# Store params in variable
params = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello!"}
    ],
    "tools": [{"type": "function", "function": {...}}]
}
with track(prompt_id="my_agent") as t:
    response = client.chat.completions.create(**params)
    # Pass params to capture context
    t.record(response, input=params)
Node.js
import { track } from 'deadpipe';
const params = {
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are helpful' },
    { role: 'user', content: 'Hello!' }
  ],
  tools: [{ type: 'function', function: {...} }]
};
const response = await track('my_agent', async (t) => {
  const response = await client.chat.completions.create(params);
  // Pass params to capture context
  t.record(response, undefined, params);
  return response;
});
Best Practice: Always Pass Input Parameters
❌ Bad - Missing Context
# Don't do this - no input context captured
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
t.record(response)
✅ Good - Context Captured
# Always pass input params
params = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
}
response = client.chat.completions.create(**params)
t.record(response, input=params)
What Gets Hashed
| Hash Type | What It Tracks | Use Case |
|---|---|---|
| prompt_hash | All messages in conversation | Detect prompt template changes |
| system_prompt_hash | System message content | Track system instruction changes |
| tool_schema_hash | Function/tool definitions | Detect tool API changes |
| output_hash | Model output content | Detect output pattern changes |
Change Correlation
When Deadpipe detects anomalies, it checks if any of these hashes changed recently. This helps you distinguish between model-side drift and your own code changes.
Fail-Safe Design
Deadpipe is designed to never break your LLM calls. All telemetry is sent asynchronously and failures are silently ignored.
Zero Impact on Performance
Asynchronous Telemetry
- SDK never awaits telemetry sends
- Fire-and-forget HTTP requests
- No blocking on your LLM calls
- Background threads (Python) or fetch (Node.js)
Fail-Safe Error Handling
- All exceptions are caught and ignored
- Deadpipe downtime doesn't affect you
- Network failures are silent
- Your app continues working normally
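To illustrate the pattern (a simplified sketch, not the SDK's actual transport): the telemetry payload is handed to a daemon thread and any failure is swallowed so it can never reach your call site.
import json
import threading
import urllib.request

def send_telemetry(payload: dict) -> None:
    def _post():
        try:
            req = urllib.request.Request(
                "https://deadpipe.com/api/v1/prompt",
                data=json.dumps(payload).encode("utf-8"),
                headers={"Content-Type": "application/json", "X-API-Key": "dp_your_api_key"}
            )
            urllib.request.urlopen(req, timeout=2)
        except Exception:
            pass  # Telemetry failures never reach the caller
    # Daemon thread: the LLM call and process shutdown are never blocked
    threading.Thread(target=_post, daemon=True).start()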
Automatic Features
The SDK automatically handles complexity so you don't have to:
Provider Auto-Detection
# No need to specify provider
with track(prompt_id="my_prompt") as t:
    # Works with OpenAI, Anthropic, etc.
    response = client.chat.completions.create(...)
    t.record(response)  # Provider auto-detected
Response Parsing
# Handles all response formats
t.record(openai_response) # OpenAI format
t.record(anthropic_response) # Anthropic format
t.record(custom_response)  # Best effort parsing
Cost Estimation
# Automatic cost calculation
# Supports OpenAI, Anthropic, Gemini, Mistral, Cohere
t.record(response)  # Cost auto-estimated
Context Extraction
# With universal wrap()
client = wrap(OpenAI(), app="my_app")
response = client.chat.completions.create(
    prompt_id="agent",
    messages=[...],
    tools=[...]
)  # Context auto-extracted
Fail-Safe Example
from deadpipe import wrap
from openai import OpenAI
# If Deadpipe is down, your code continues working
client = wrap(OpenAI(), app="my_app")
try:
    response = client.chat.completions.create(prompt_id="agent", ...)
    # Works normally even if telemetry fails
except Exception as e:
    # Only your LLM error, never Deadpipe
    pass
What Happens When Deadpipe is Down
✅ Your App Continues
- LLM calls work normally
- No exceptions thrown
- No performance impact
- Silent telemetry failure
❌ What Doesn't Happen
- No blocking or delays
- No error propagation
- No failed LLM requests
- No broken user flows
Payload Optimization
Telemetry payloads are automatically optimized to reduce bandwidth:
Filtered Out
- undefined and null values
- Empty strings and arrays
- false booleans (keeps true)
- Empty objects
Result
- ~40-60% smaller payloads
- Faster network requests
- Reduced bandwidth costs
- Better performance on slow connections
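A simplified sketch of this kind of pruning (the SDK's exact rules may differ):
def should_keep(value) -> bool:
    # Drop null, empty strings/arrays/objects, and False; keep True and 0
    if value is None or value is False:
        return False
    if value in ("", [], {}):
        return False
    return True

def prune(payload: dict) -> dict:
    cleaned = {}
    for key, value in payload.items():
        if isinstance(value, dict):
            value = prune(value)
        if should_keep(value):
            cleaned[key] = value
    return cleaned

prune({"model": "gpt-4o-mini", "error": None, "truncated": False, "retry_count": 0, "tags": []})
# → {"model": "gpt-4o-mini", "retry_count": 0}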
Development vs Production
Development Mode
# Warns about missing API key
export DEADPIPE_DEBUG=1
# or
export NODE_ENV=development
Shows helpful warnings without production noise
Production Mode
# Silent operation
export DEADPIPE_API_KEY="dp_xxx"
# No warnings, no errors
Completely silent - never affects your app
Prompt Tracking API
For direct API integration without the SDK, use the prompt tracking endpoint.
POST /api/v1/prompt
Send prompt execution telemetry for tracking and baseline computation.
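If you prefer not to use the SDK, you can post telemetry yourself. A minimal sketch with the requests library, sending only a subset of the fields (values are illustrative); the full field set appears in the cURL example and table below:
import time
import requests

start = int(time.time() * 1000)
# ... call your LLM here ...
end = int(time.time() * 1000)

requests.post(
    "https://deadpipe.com/api/v1/prompt",
    headers={"X-API-Key": "dp_your_api_key"},
    json={
        "prompt_id": "review-analyzer",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "request_start": start,
        "end_time": end,
        "total_latency_ms": end - start,
        "http_status": 200
    },
    timeout=5
)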
curl -X POST https://deadpipe.com/api/v1/prompt \
-H "Content-Type: application/json" \
-H "X-API-Key: dp_your_api_key" \
-d '{
"prompt_id": "review-analyzer",
"model": "gpt-4o-mini",
"provider": "openai",
"request_start": 1704067200000,
"end_time": 1704067201500,
"total_latency_ms": 1500,
"input_tokens": 150,
"output_tokens": 75,
"total_tokens": 225,
"estimated_cost": 0.00045,
"http_status": 200,
"output_length": 512,
"empty_output": false,
"json_parse_success": true,
"schema_validation_pass": true,
"refusal_flag": false,
"output_hash": "abc123..."
}'
Request Fields
| Field | Type | Description |
|---|---|---|
| Identity | | |
| prompt_id* | string | Unique identifier for this prompt type |
| model* | string | Model identifier (gpt-4o-mini, claude-3-5-sonnet, etc.) |
| provider | string | LLM provider (openai, anthropic, google) |
| app_id | string | Application identifier |
| environment | string | Environment (production, staging) |
| version | string | Version or git hash |
| Timing | | |
| request_start* | number | Unix timestamp (ms) when request started |
| first_token_time | number | Unix timestamp (ms) of first token received |
| end_time* | number | Unix timestamp (ms) when request completed |
| total_latency_ms* | number | Total latency in milliseconds |
| Volume | | |
| input_tokens | number | Number of input tokens |
| output_tokens | number | Number of output tokens |
| total_tokens | number | Total tokens (input + output) |
| estimated_cost | number | Estimated cost in USD |
| Reliability | | |
| http_status | number | HTTP response status code |
| timeout | boolean | Whether the request timed out |
| retry_count | number | Number of retry attempts |
| provider_error_code | string | Provider-specific error code |
| Output Integrity | | |
| output_length | number | Character length of output |
| empty_output | boolean | Whether output was empty |
| truncated | boolean | Whether output was truncated |
| json_parse_success | boolean | Whether JSON parsing succeeded |
| schema_validation_pass | boolean | Whether schema validation passed |
| missing_required_fields | string[] | List of missing required fields |
| Behavioral Fingerprint | | |
| output_hash | string | SHA-256 hash of output for deduplication |
| refusal_flag | boolean | Whether model refused to respond |
| tool_call_flag | boolean | Whether response included tool calls |
| Change Context | | |
| prompt_hash | string | Hash of the prompt template |
| tool_schema_hash | string | Hash of tool/function schemas |
| system_prompt_hash | string | Hash of system prompt |
Response
{
  "received": true,
  "prompt_id": "review-analyzer",
  "event_id": "evt_abc123",
  "baseline": {
    "latency_mean": 1200,
    "latency_p95": 2500,
    "token_mean": 200,
    "schema_pass_rate": 0.98,
    "sample_count": 1500
  },
  "anomalies": []
}
GET /api/v1/prompt?prompt_id={id}
Retrieve baselines and statistics for a specific prompt.
curl "https://deadpipe.com/api/v1/prompt?prompt_id=review-analyzer" \
  -H "X-API-Key: dp_your_api_key"
Baselines & Drift Detection
Deadpipe automatically computes rolling baselines for every prompt and alerts you when metrics drift beyond thresholds.
How Baselines Work
1. Every prompt execution updates a rolling statistical baseline
2. We use Welford's algorithm for efficient online mean/variance computation (see the sketch below)
3. Each incoming event is compared against the baseline in real time
4. Anomalies are flagged when metrics exceed threshold deviations
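For reference, a minimal sketch of the Welford update behind the rolling mean and variance (simplified; the service also tracks percentiles and rates):
class RollingBaseline:
    """Online mean/variance via Welford's algorithm."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # Sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.count if self.count > 1 else 0.0

baseline = RollingBaseline()
for latency_ms in (1180, 1250, 1320, 2900):
    baseline.update(latency_ms)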
Baseline Metrics
Latency
- latency_mean - Average latency
- latency_variance - Latency variance
- latency_p95 - 95th percentile
Tokens
- input_token_mean - Avg input tokens
- output_token_mean - Avg output tokens
- token_variance - Token variance
Reliability
- success_rate - % successful calls
- error_rate - % errors (4xx/5xx)
- timeout_rate - % timeouts
Output Quality
- schema_pass_rate - % valid outputs
- empty_rate - % empty outputs
- refusal_rate - % refusals
Drift Detection Rules
Anomalies are automatically detected when:
| Anomaly Type | Trigger Condition |
|---|---|
| latency_spike | Latency > p95 + 2σ |
| token_anomaly | Token count deviates > 3σ from mean |
| schema_violation | Schema validation fails (when baseline pass rate > 95%) |
| empty_output | Empty output (when baseline empty rate < 5%) |
| refusal | Refusal detected (when baseline refusal rate < 5%) |
Baseline Warm-Up
Drift detection activates after 50 samples to ensure statistical significance. During warm-up, we collect data without triggering anomaly alerts.
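To make the thresholds above concrete, here is an illustrative check of a single event against a baseline (a sketch of the rules, not the service's implementation):
import math

def detect_anomalies(event: dict, baseline: dict) -> list:
    anomalies = []
    latency_sigma = math.sqrt(baseline["latency_variance"])
    token_sigma = math.sqrt(baseline["token_variance"])
    # latency_spike: latency > p95 + 2σ
    if event["total_latency_ms"] > baseline["latency_p95"] + 2 * latency_sigma:
        anomalies.append("latency_spike")
    # token_anomaly: token count deviates > 3σ from the mean
    if abs(event["total_tokens"] - baseline["token_mean"]) > 3 * token_sigma:
        anomalies.append("token_anomaly")
    # schema_violation: only meaningful once the baseline pass rate is high
    if not event["schema_validation_pass"] and baseline["schema_pass_rate"] > 0.95:
        anomalies.append("schema_violation")
    return anomalies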
Schema Validation
Validate every LLM output against your expected structure and track validation rates over time.
Python: Pydantic Models
from pydantic import BaseModel, Field
from typing import Literal, List
class ProductRecommendation(BaseModel):
    product_id: str
    confidence: float = Field(ge=0, le=1)
    reasoning: str
    category: Literal["electronics", "clothing", "home"]
    alternatives: List[str] = []
with dp.track("recommender", schema=ProductRecommendation) as t:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Recommend..."}],
        response_format={"type": "json_object"}
    )
    result = t.record(response)
    # result.schema_pass tells you if validation passed
    # result.validation_error contains the error message if failed
    # result.validated contains the parsed Pydantic model if passed
TypeScript: Zod Schemas
import { z } from 'zod';
const ProductSchema = z.object({
  productId: z.string(),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  category: z.enum(['electronics', 'clothing', 'home']),
  alternatives: z.array(z.string()).default([])
});
const result = await dp.track('recommender', {
  schema: ProductSchema,
  fn: async () => {
    return await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: 'Recommend...' }],
      response_format: { type: 'json_object' }
    });
  }
});
if (result.validated) {
  // TypeScript knows this is ProductSchema type
  console.log(result.validated.productId);
}
Schema Drift Alerts
When your baseline schema pass rate is above 95% and validation starts failing, we flag this as a schema_violation anomaly. This catches cases where model updates or prompt changes cause structural regressions.
Output Integrity
Beyond schema validation, we track multiple signals that indicate potential model misbehavior.
Refusal Detection
Detects when models refuse to respond, using patterns like "I cannot", "I'm not able to", etc.
Empty Output
Flags when outputs are empty or contain only whitespace.
JSON Parse Failures
Tracks when JSON mode responses fail to parse.
Truncation
Detects when outputs are cut off due to token limits.
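These checks are heuristics. A rough sketch of how such flags could be derived from a raw output string (the patterns and the finish_reason convention are illustrative, not the SDK's exact logic):
import json
import re

REFUSAL_PATTERNS = re.compile(r"\b(i cannot|i can't|i'm not able to|i am unable to)\b", re.IGNORECASE)

def integrity_flags(output: str, finish_reason=None) -> dict:
    flags = {
        "empty_output": len(output.strip()) == 0,
        "refusal_flag": bool(REFUSAL_PATTERNS.search(output)),
        "truncated": finish_reason == "length"  # Provider-reported stop reason
    }
    try:
        json.loads(output)
        flags["json_parse_success"] = True
    except ValueError:
        flags["json_parse_success"] = False
    return flags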
Hallucination Proxy Flags
We don't claim to detect hallucinations directly, but we track proxy signals (the first two are illustrated in the sketch after this list):
- Enum out of range — When a Literal/enum field contains an unexpected value
- Numeric out of bounds — When constrained fields (0-1, positive, etc.) violate their constraints
- Schema violations — When structured output doesn't match the expected format
- Output pattern shift — When the output hash distribution changes significantly
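For example, the enum and bounds checks amount to roughly the following (a sketch; the field values mirror the earlier pricing_agent example):
def check_constraints(output: dict, enum_fields: dict, numeric_bounds: dict) -> list:
    flags = []
    for field, allowed in enum_fields.items():
        if field in output and output[field] not in allowed:
            flags.append("enum_out_of_range")
    for field, (low, high) in numeric_bounds.items():
        value = output.get(field)
        if isinstance(value, (int, float)) and not (low <= value <= high):
            flags.append("numeric_out_of_bounds")
    return flags

check_constraints(
    {"currency": "BTC", "price": 25000},
    enum_fields={"currency": ["USD", "EUR", "GBP"]},
    numeric_bounds={"price": (0, 10000)}
)
# → ["enum_out_of_range", "numeric_out_of_bounds"]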
Change Detection
Track when your prompts, system prompts, or tool schemas change to correlate with behavior shifts.
Change Context Hashes
The SDK automatically computes and tracks hashes for:
| Hash | What It Tracks |
|---|---|
| prompt_hash | Hash of the prompt template (user message) |
| system_prompt_hash | Hash of the system prompt |
| tool_schema_hash | Hash of function/tool definitions |
| output_hash | Hash of the output for deduplication |
Correlating Changes with Drift
When we detect anomalies, we check if any of these hashes changed recently. This helps you identify whether drift is caused by your changes or model-side updates.
LLM Alerts
Configure alerts in your Dashboard to get notified when prompts drift.
Latency P95 Spike
Alert when prompt latency exceeds baseline p95
Schema Validation Drop
Alert when schema pass rate drops below threshold
Empty Output Spike
Alert when empty output rate increases
Refusal Rate Increase
Alert when model refusal rate spikes
Error Rate Threshold
Alert when API error rate exceeds limit
Cost Anomaly
Alert when prompt costs spike unexpectedly
Notification Channels
Alerts can be sent via Email, Slack, or Webhooks. Configure your preferred channels in the Dashboard settings.
Rate Limits
All API endpoints are protected with rate limiting to ensure fair usage.
Rate Limits by Endpoint
| Endpoint | Limit | Window |
|---|---|---|
| /api/v1/prompt | 1000 requests | 1 minute |
| /api/v1/heartbeat | 60 requests | 1 minute |
| /api/v1/monitor/events | 100 requests | 1 minute |
Payload Size Limits
| Endpoint | Max Payload |
|---|---|
| /api/v1/prompt | 50 KB |
| /api/v1/heartbeat | 10 KB |
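When a limit is exceeded, the API responds with HTTP 429 and a retryAfter value (response shape shown below). A minimal sketch of honoring it with the requests library:
import time
import requests

def post_with_backoff(payload: dict, attempts: int = 3):
    for _ in range(attempts):
        resp = requests.post(
            "https://deadpipe.com/api/v1/prompt",
            headers={"X-API-Key": "dp_your_api_key"},
            json=payload,
            timeout=5
        )
        if resp.status_code != 429:
            return resp
        # Wait the server-suggested interval before retrying
        time.sleep(resp.json().get("retryAfter", 60))
    return resp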
Rate Limit Response
{
  "error": "Rate limit exceeded",
  "retryAfter": 60
}
Pipeline Heartbeat
For non-LLM batch jobs and scheduled tasks, use pipeline heartbeats to track job health.
Basic Usage
from deadpipe import Deadpipe
dp = Deadpipe()
@dp.heartbeat("daily-etl")
def run_etl():
    process_data()
    return {"records_processed": 1500}
run_etl()  # Sends heartbeat on success/failure
cURL
curl -X POST https://deadpipe.com/api/v1/heartbeat \
-H "Content-Type: application/json" \
-H "X-API-Key: dp_your_api_key" \
-d '{"pipeline_id": "my-job", "status": "success"}'Auto-Creation
Pipelines are automatically created on the first heartbeat if they don't exist. Customize settings in the Dashboard.
Ready to monitor your prompts?
Create your free account and start detecting drift in under 5 minutes.