PromptCacheAI: Your LLM’s Memory Layer
Cache prompts and responses. Avoid duplicate LLM calls. Save time, tokens, and money — all while speeding up your AI app.
How it works
A simple, explicit flow that gives you full control — and keeps you provider-agnostic.
1) Check the cache first
Send your prompt to /chat. If we’ve seen it, you get the saved response instantly.
2) No hit? Call your LLM
Keep your provider keys and features (streaming, retries, custom params). PromptCacheAI is provider-agnostic:
- No lock-in — switch models anytime
- Full observability of your outbound requests
- Apply your own safety/PII filters before saving
3) Save the response
Store it with /cache/save using the provided prompt_hash. Next time it’s faster and cheaper, with your configured TTL.
- Per-namespace TTL controls
- Per-user isolation & multi-tenant safety
- Usage metrics and hit-rate dashboards
Three calls on a miss — still simple
// 1) Check cache
const { cached, response, prompt_hash } = await pc.fetch('/chat', { prompt, namespace, provider, model });
// 2) If miss, call your LLM (keep your API keys, streaming, and retries)
const live = cached ? response : await llm.generate(prompt);
// 3) Save back to cache (only when miss)
if (!cached) await pc.fetch('/cache/save', { prompt_hash, namespace, response: live });
return live;

Why this is better: you keep control of provider features, secrets, and costs — PromptCacheAI adds a fast, similarity-aware memory layer on top.
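Conceptually, the three steps are classic read-through caching with a TTL. Here is a rough, self-contained in-memory model of that flow (the `store` Map, `cacheKey` helper, and `TTLS` table are illustrative stand-ins, not the service's actual API or storage):

```javascript
// Illustrative only: an in-memory model of the check -> call -> save flow.
// The real service persists entries server-side and hashes prompts;
// a plain string key and a local Map are enough to show the logic.
const TTLS = { 'support-bot': 3600_000 }; // per-namespace TTL in ms (assumed config)
const store = new Map();

function cacheKey(namespace, prompt) {
  return `${namespace}:${prompt}`;
}

function getCached(namespace, prompt, now = Date.now()) {
  const entry = store.get(cacheKey(namespace, prompt));
  if (!entry) return null; // miss: never seen
  const ttl = TTLS[namespace] ?? 60_000; // fallback TTL, assumed
  if (now - entry.savedAt > ttl) return null; // miss: expired
  return entry.response; // hit: serve the saved response
}

function saveResponse(namespace, prompt, response, now = Date.now()) {
  store.set(cacheKey(namespace, prompt), { response, savedAt: now });
}

async function chat(namespace, prompt, callLLM) {
  const hit = getCached(namespace, prompt);
  if (hit !== null) return hit;          // 1) serve from cache
  const live = await callLLM(prompt);    // 2) miss: call your provider
  saveResponse(namespace, prompt, live); // 3) save for next time
  return live;
}
```

With this shape, a second identical prompt in the same namespace never reaches the provider until the TTL expires.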
Provider-agnostic compatibility
Works with all major LLMs — OpenAI, Claude, Gemini, custom LLMs, and more.
Build and test your AI application without burning tokens
PromptCacheAI lets you replay real LLM responses during development. Once a prompt is cached, you can iterate on UI, logic, and flows without making repeated calls to your AI provider.
- Test UI and workflows using cached responses
- Run demos without live LLM API keys once responses are cached
- Avoid surprise token bills during development
- Safely load-test your app using cached results
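One way to picture this in a test suite: a replay-only stub that serves recorded responses and fails loudly on anything uncached, guaranteeing no live provider call (and no token spend). The `fixtures` map and `replayOnly` name below are made up for illustration:

```javascript
// Hedged sketch: a test-mode lookup that refuses live LLM calls, so suites
// run entirely from previously cached responses. Not part of the real API.
const fixtures = new Map([
  ['support-bot:How do I reset my password?',
   'Go to Settings > Security > Reset password.'],
]);

function replayOnly(namespace, prompt) {
  const key = `${namespace}:${prompt}`;
  if (!fixtures.has(key)) {
    // Failing hard here is the point: a miss in test mode means a prompt
    // was never recorded, not that the provider should be called.
    throw new Error(`No cached response for "${key}" - record it before testing`);
  }
  return fixtures.get(key); // served with no provider call and no API key
}
```

In CI you would point the stub at a snapshot of your cache instead of a hardcoded map.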
Why developers choose PromptCacheAI
- ⚡ Faster apps: Cut response times for repeated calls.
- 💸 Lower costs: Avoid paying for duplicate requests.
- 🧠 Similarity search: Serve near-duplicate prompts from cache.
- ⏱️ Configurable TTLs: Control cache lifetime per namespace.
- 🔒 Per-user isolation: Namespaces are scoped for safety.
- 🧩 Provider-agnostic: Works with all major LLMs and more.
- 🧪 Dev & test friendly: Reuse cached responses to test flows without LLM costs.
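Similarity search is what lets near-duplicate prompts ("reset my password" vs. "how do I reset my password?") hit the same entry. PromptCacheAI's embedding model and threshold are internal, so the sketch below is a generic cosine-similarity lookup over toy vectors, purely to show the shape of the technique:

```javascript
// Illustrative sketch: serve a cached response when a new prompt's embedding
// is close enough to a stored one. Vectors and the 0.95 threshold are toy
// values, not the service's actual parameters.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const entries = [
  { embedding: [0.9, 0.1, 0.0], response: 'Reset it from Settings > Security.' },
];

function similarLookup(queryEmbedding, threshold = 0.95) {
  let best = null, bestScore = -1;
  for (const e of entries) {
    const score = cosine(queryEmbedding, e.embedding);
    if (score > bestScore) { best = e; bestScore = score; }
  }
  return bestScore >= threshold ? best.response : null; // null => treat as a miss
}
```

The threshold is the key tuning knob: too low and unrelated prompts share answers, too high and paraphrases stop hitting the cache.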
Example
curl https://api.prompt-cache.ai/v1/chat \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"namespace": "support-bot",
"provider": "openai",
"model": "gpt-4o",
"prompt": "How do I reset my password?"
}'

Tip: Set a namespace TTL to control how long responses stay hot.
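On a miss, the companion call writes the provider's answer back. A sketch, assuming /cache/save takes the same auth header plus the prompt_hash returned by /chat (the response text here is a placeholder):

curl https://api.prompt-cache.ai/v1/cache/save \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"namespace": "support-bot",
"prompt_hash": "<prompt_hash from /chat>",
"response": "Go to Settings > Security > Reset password."
}'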
Stop paying twice for the same LLM call.
Join developers saving tokens and speeding up applications with PromptCacheAI.