See when your AI app
breaks itself.
Your chatbot loops. Your agent burns through tokens. Your RAG bot makes things up. Whoopsie catches it live and shows you what happened.
Catches what's going wrong
No code, no terminal
PII scrubbed before bytes leave your app
Standard mode ships the prompt, completion, tool args, and model reasoning text, with emails, phones, SSNs, card numbers, JWTs, and provider API keys replaced in the SDK before egress. Switch to metadata-only if you can't send any text off-machine. What we store →
How it works
- 01
Pick where you build
Lovable, Replit, Bolt, v0 — all supported. We give you a prompt tailored to that tool.
- 02
Paste the prompt
Open your AI builder's chat. Paste. Send. The AI installs Whoopsie and wires it up. It takes about a minute.
- 03
Watch your live dashboard
As soon as someone uses your app, every chat call shows up. Failures get a red tag that says what went wrong, in plain English.
Failures we've caught before
AI failures don't throw exceptions. They return a response that looks fine — until your OpenAI bill arrives or a screenshot shows up on Twitter. Six ways your app can break silently, and the detector that catches each one before your users do.
- cost
Your Lovable app went viral overnight. The OpenAI bill was $400 by morning.
An agent quietly looped on a tool call, 9k tokens per turn, for twelve hours. The cost detector flags the first call that crosses $0.50 or 8k tokens — you’d have seen it before the bill arrived.
- hallucination
Customer-support bot invented a product feature that doesn’t exist.
Your RAG retrieved nothing useful, so the bot made something up to sound helpful. The hallucination detector compares response claims against the Sources block in the prompt and flags the gap.
- loop
Web-search agent kept calling search → search → search and never answered.
Six identical tool calls in a row. The loop detector flags tool repetition, low tool diversity, and A→B→A→B cycles before your user gives up and refreshes.
- repetition
Chatbot kept ending every turn with “Is there anything else?” even after the user said no.
Same line three times in five turns. The repetition detector catches line-level and n-gram repeats in the completion text.
- context
RAG bot ignored the user’s “vegetarian only” filter and recommended chicken parm.
Response used zero key tokens from the user’s context block. The context detector flags it before the user notices and DMs you.
- completion
Your summarizer stopped at the 4k token cap mid-sentence and your users never saw the end.
Finish reason was length, not stop. The completion detector catches premature stops on questions and runaway 4k+ token outputs.
A seventh detector — derailment — catches when an agent's tool sequence doesn't match the task it was given. All seven ship in v1, run locally on every trace, and add zero per-call cost.
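If you're curious what a check like this looks like in code, here is a rough sketch of three of the detectors described above: cost, loop, and completion. It is an illustration, not Whoopsie's actual implementation. The `Trace` shape, the function names, and the `minRun` default of 3 are all assumptions made for the example; only the $0.50 and 8k-token thresholds and the "finish reason was length" rule come from the descriptions above.

```typescript
// Hypothetical trace shape for illustration; the real SDK's types may differ.
interface Trace {
  costUsd: number;       // estimated cost of this call
  totalTokens: number;   // prompt + completion tokens
  toolCalls: string[];   // tool names, in the order they were called
  finishReason: string;  // e.g. "stop" or "length"
}

// Cost: flag a call that reaches $0.50 or 8k tokens.
// (Treating "crosses" as >= is an assumption.)
function flagsCost(t: Trace): boolean {
  return t.costUsd >= 0.5 || t.totalTokens >= 8000;
}

// Length of the longest run of identical consecutive tool calls.
function longestRun(calls: string[]): number {
  let best = 0;
  let run = 0;
  for (let i = 0; i < calls.length; i++) {
    run = i > 0 && calls[i] === calls[i - 1] ? run + 1 : 1;
    best = Math.max(best, run);
  }
  return best;
}

// True if any four consecutive calls form an A→B→A→B pattern with A != B.
function hasAlternation(calls: string[]): boolean {
  for (let i = 0; i + 3 < calls.length; i++) {
    const [a, b, c, d] = calls.slice(i, i + 4);
    if (a !== b && a === c && b === d) return true;
  }
  return false;
}

// Loop: repeated identical calls or an A→B→A→B cycle.
// minRun = 3 is an assumed default, not a documented threshold.
function flagsLoop(t: Trace, minRun = 3): boolean {
  return longestRun(t.toolCalls) >= minRun || hasAlternation(t.toolCalls);
}

// Completion: the model stopped because it hit the token cap.
function flagsTruncation(t: Trace): boolean {
  return t.finishReason === "length";
}
```

Each check is a pure function over data that is already in the trace, which is one way detectors can run locally without making any extra model calls.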
For developers (the actual code)
import { streamText, convertToModelMessages } from "ai";
import { openai } from "@ai-sdk/openai";
import { observe } from "@whoopsie/sdk";

export async function POST(req: Request) {
  const { messages } = await req.json();
  // Convert the UI messages sent by the client into model messages.
  const modelMessages = await convertToModelMessages(messages);
  const result = streamText({
    // Wrap the model so every call is traced; "standard" redacts PII before it leaves your app.
    model: observe(openai("gpt-4o"), { redact: "standard" }),
    messages: modelMessages,
  });
  return result.toUIMessageStreamResponse();
}

That's the entire wrap. It targets Vercel AI SDK v6 and runs in any Next.js app (or anywhere you call streamText). If you're comfortable in a terminal, npx @whoopsie/cli init does the wrap for you.
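To give a feel for what the redact: "standard" option means in practice, here is a minimal sketch of in-process scrubbing with regular expressions. It is not the SDK's real code: the `redactText` function, the patterns, and the `[EMAIL]`-style placeholders are assumptions for illustration, and it covers only a few of the categories listed earlier (emails, SSNs, JWTs, and `sk-`-prefixed keys).

```typescript
// Hypothetical redaction rules; the real SDK's patterns and placeholders may differ.
const RULES: Array<{ label: string; pattern: RegExp }> = [
  // JWT: three base64url segments, the first starting with "eyJ".
  { label: "[JWT]", pattern: /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g },
  // API keys that start with "sk-" (an assumed prefix for the example).
  { label: "[API_KEY]", pattern: /\bsk-[A-Za-z0-9_-]{20,}/g },
  // Email addresses.
  { label: "[EMAIL]", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // US Social Security numbers in 123-45-6789 form.
  { label: "[SSN]", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Replace every match of every rule with its placeholder label.
function redactText(text: string): string {
  return RULES.reduce((out, rule) => out.replace(rule.pattern, rule.label), text);
}

// Example: scrub a prompt before it is attached to a trace.
const scrubbed = redactText("Email jane@example.com about SSN 123-45-6789");
console.log(scrubbed); // prints "Email [EMAIL] about SSN [SSN]"
```

Because the replacement happens inside your app before anything is sent, the raw values never reach the network. A production version would also need phone and card number rules, which are easy to get wrong with simple patterns.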