How PromptCacheAI works

Understand the PromptCacheAI workflow

These concepts apply whether you integrate through the SDK or the REST API. Start with one namespace in test mode, review which responses would have been reused, then switch to live mode once you trust the behavior.

Namespaces

A namespace is a separate cache boundary. Entries in one namespace are never visible from another namespace. Exact matching, semantic matching, prompt variants, and TTL settings all operate within a single namespace.

Use one namespace for one workflow, tenant, environment, or app.
Use test mode to evaluate a workflow before serving cached responses.
Use live mode when cached responses are allowed to be returned.

Example: support-bot-prod and support-bot-dev never share cached responses.
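As an illustration of the boundary and the two modes, here is a minimal in-memory sketch. It is a hypothetical model, not the PromptCacheAI SDK: `NamespaceCache`, `save`, `lookup`, and `would_hits` are illustrative names, and only exact matches are modeled to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NamespaceCache:
    # Hypothetical model of one namespace; not the real SDK.
    name: str
    mode: str = "test"  # "test" records would-hits; "live" serves cached responses
    entries: Dict[str, str] = field(default_factory=dict)
    would_hits: int = 0

    def save(self, prompt: str, response: str) -> None:
        self.entries[prompt] = response

    def lookup(self, prompt: str) -> Optional[str]:
        cached = self.entries.get(prompt)
        if cached is None:
            return None  # miss: the app calls the model
        if self.mode == "test":
            self.would_hits += 1  # recorded for review, but not served
            return None
        return cached  # live mode: serve the cached response


# Each namespace owns its own entries, so nothing is shared between them.
caches = {
    "support-bot-prod": NamespaceCache("support-bot-prod", mode="live"),
    "support-bot-dev": NamespaceCache("support-bot-dev", mode="test"),
}

prompt = "How do I reset my password?"
caches["support-bot-prod"].save(prompt, "Use the 'Forgot password' link.")
print(caches["support-bot-prod"].lookup(prompt))  # served from the live namespace
print(caches["support-bot-dev"].lookup(prompt))   # None: dev never sees prod entries
```

Saving the same prompt into `support-bot-dev` would only increment its `would_hits` counter, which mirrors reviewing test-mode results before switching a namespace to live.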

Similarity matching

If PromptCacheAI does not find an exact match in a namespace, it looks for a semantically similar prompt. If the saved response safely covers the new prompt, the existing cached answer can be reused in live mode.

Similarity matching is based on the prompt itself. Model, provider, and temperature do not create separate similarity spaces. If model-specific behavior matters, use separate namespaces.

Exact:
"What is the capital of France?"  -> Exact hit

Similar wording:
"capital of france"               -> Similarity hit
"What city is France's capital?"  -> Similarity hit

Different namespace:
"What is the capital of France?"  -> Not shared across namespaces

Semantic caching works best for repeated prompts with reusable answers. For personalized prompts or live user data, such as shipping status, billing status, account records, or order details, call the model or the source system directly instead of serving a cached response.
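The exact-then-similar lookup order can be sketched as follows. This is an assumption-laden illustration, not PromptCacheAI's matcher: `toy_embed` is a toy word-count stand-in for a real embedding model (it catches shared words, not paraphrases such as "What city is France's capital?"), and the `0.6` threshold is hypothetical. Note that only the prompt text is compared, matching the point that model, provider, and temperature do not create separate similarity spaces.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Optional, Tuple


def toy_embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: lowercase word counts.
    # It only detects shared words, not true paraphrases.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup(
    entries: Dict[str, str],
    prompt: str,
    threshold: float = 0.6,  # hypothetical cutoff
    embed: Callable[[str], Counter] = toy_embed,
) -> Tuple[str, Optional[str]]:
    # `entries` holds one namespace's saved prompts; other namespaces are never searched.
    if prompt in entries:
        return "exact", entries[prompt]
    query = embed(prompt)
    best_prompt, best_score = None, 0.0
    for saved_prompt in entries:
        score = cosine(query, embed(saved_prompt))
        if score > best_score:
            best_prompt, best_score = saved_prompt, score
    if best_prompt is not None and best_score >= threshold:
        return "similar", entries[best_prompt]
    return "miss", None


prod = {"What is the capital of France?": "Paris"}
dev: Dict[str, str] = {}

print(lookup(prod, "What is the capital of France?"))  # ('exact', 'Paris')
print(lookup(prod, "capital of france"))               # ('similar', 'Paris')
print(lookup(dev, "What is the capital of France?"))   # ('miss', None)
```

Swapping `embed` for a real embedding model is what lets rephrasings with little word overlap still register as similarity hits.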

Semantic validation

PromptCacheAI uses two similarity tiers. High-confidence matches can be served directly in live mode. Mid-confidence matches may be sent to a validator that checks whether the cached response is appropriate for the new prompt.

If validator capacity is exhausted or the validator rejects the match, PromptCacheAI treats the request as a miss instead of serving an unvalidated cached response. This favors correctness for uncertain matches.
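The two-tier decision can be expressed as a small function. This is a hedged sketch: `HIGH_CONFIDENCE`, `MID_CONFIDENCE`, `ValidatorBudget`, and the validator signature are hypothetical, since the actual tier boundaries and validator interface are not documented here. What it shows is the rule from the text: a mid-confidence match is only served when a validator call is available and approves it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tier boundaries; the real thresholds are not documented here.
HIGH_CONFIDENCE = 0.92
MID_CONFIDENCE = 0.80


@dataclass
class ValidatorBudget:
    remaining: int  # validator calls still available

    def try_consume(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def decide(
    score: float,
    prompt: str,
    cached_response: str,
    validator: Callable[[str, str], bool],
    budget: ValidatorBudget,
) -> str:
    """Return "serve" or "miss" for a similarity match in live mode."""
    if score >= HIGH_CONFIDENCE:
        return "serve"  # high confidence: served directly, no validator call
    if score >= MID_CONFIDENCE:
        if not budget.try_consume():
            return "miss"  # no validator capacity: never serve unvalidated
        return "serve" if validator(prompt, cached_response) else "miss"
    return "miss"  # below the mid tier: not a usable match


def always_ok(prompt: str, response: str) -> bool:
    return True


budget = ValidatorBudget(remaining=1)
print(decide(0.95, "q", "a", always_ok, budget))  # serve, no validator call
print(decide(0.85, "q", "a", always_ok, budget))  # serve, uses the last validator call
print(decide(0.85, "q", "a", always_ok, budget))  # miss, capacity exhausted
```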

Prompt variants

A prompt variant is a similar prompt that PromptCacheAI matched to an existing cached response. Variants help you see how users rephrase the same request and decide whether that wording should reuse the cached answer.

Approved variants can reuse the cached response in live mode.
Pending variants are not served automatically until reviewed.
Rejected variants are treated as misses, so your app calls the model.

Manual review takes precedence over automatic similarity or validator decisions for that variant-to-response relationship.
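The precedence of manual review over automatic matching might look like this. It is a sketch under stated assumptions: `VariantStatus` and `should_serve` are hypothetical names, and the `None` case (a prompt with no variant record, which falls back to the automatic similarity and validator outcome) is an assumed fallback rather than documented behavior.

```python
from enum import Enum
from typing import Optional


class VariantStatus(Enum):
    APPROVED = "approved"
    PENDING = "pending"
    REJECTED = "rejected"


def should_serve(status: Optional[VariantStatus], automatic_match: bool, mode: str = "live") -> bool:
    """Decide whether the cached response is returned for a prompt variant.

    status: review state of the variant-to-response link, or None when the
        prompt has no variant record (hypothetical fallback case).
    automatic_match: outcome of the similarity and validator checks.
    """
    if mode != "live":
        return False  # test mode never serves cached responses
    if status is VariantStatus.APPROVED:
        return True  # manual approval wins even if the automatic check said no
    if status in (VariantStatus.PENDING, VariantStatus.REJECTED):
        return False  # pending waits for review; rejected behaves as a miss
    return automatic_match  # no review state: fall back to the automatic decision


print(should_serve(VariantStatus.APPROVED, automatic_match=False))  # True
print(should_serve(VariantStatus.REJECTED, automatic_match=True))   # False
```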

TTL strategy

Each namespace has one TTL value that applies to every cached entry in it. Configure it in Settings > Cache TTL.

When a cache entry expires, PromptCacheAI keeps the prompt fingerprint for lookup history but treats the entry as a miss. Your app calls the model again and saves a fresh response.

| Workload | Example TTL |
| --- | --- |
| Stable FAQ answers | Days or weeks |
| Docs or RAG answers | Hours or days, based on update frequency |
| QA and demo prompts | Longer TTLs |
| Personalized or live user data | Usually do not cache |
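The expiry behavior can be modeled with a single namespace-wide TTL. This is a minimal sketch, not the service's implementation: the SHA-256 `fingerprint`, the `Namespace` class, and treating an entry as expired once its age reaches the TTL (`>=`) are all assumptions. Times are passed in explicitly as seconds so the behavior is easy to follow.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Set


def fingerprint(prompt: str) -> str:
    # Hypothetical fingerprint: SHA-256 of the prompt text.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


@dataclass
class Entry:
    response: str
    saved_at: float  # seconds


class Namespace:
    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds  # one TTL for every entry in the namespace
        self.entries: Dict[str, Entry] = {}
        self.lookup_history: Set[str] = set()  # fingerprints kept after expiry

    def save(self, prompt: str, response: str, now: float) -> None:
        fp = fingerprint(prompt)
        self.entries[fp] = Entry(response, now)
        self.lookup_history.add(fp)

    def lookup(self, prompt: str, now: float) -> Optional[str]:
        entry = self.entries.get(fingerprint(prompt))
        if entry is None or now - entry.saved_at >= self.ttl_seconds:
            return None  # miss or expired: the app calls the model and saves again
        return entry.response


ns = Namespace(ttl_seconds=3600)  # e.g. a one-hour TTL for docs answers
ns.save("What is the capital of France?", "Paris", now=0)
print(ns.lookup("What is the capital of France?", now=1800))  # Paris
print(ns.lookup("What is the capital of France?", now=3600))  # None: expired
```

After an expired lookup, calling `save` again with the fresh model response restarts the TTL for that fingerprint.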

Dashboard and cache insight

The dashboard helps you confirm that your integration is working. You can track total requests, hit rate, exact hits, similarity hits, test-mode would-hits, estimated savings, and cache entries, filtered by namespace and date range.

Search prompt and response text to see what your AI app is being asked repeatedly. Open cache entries to inspect or edit reusable answers, and review prompt variants tied to those cached responses.
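To sanity-check the dashboard numbers against your own logs, a hit rate can be computed from the counters it shows. This is an assumption, not the dashboard's documented formula: here hit rate is taken to be served hits (exact plus similarity) divided by total requests, with test-mode would-hits reported as a separate rate. `UsageStats`, `hit_rate`, and `would_hit_rate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    total_requests: int
    exact_hits: int
    similarity_hits: int
    would_hits: int = 0  # test-mode matches recorded but not served


def hit_rate(stats: UsageStats) -> float:
    # Assumed definition: served hits (exact + similarity) over all requests.
    if stats.total_requests == 0:
        return 0.0
    return (stats.exact_hits + stats.similarity_hits) / stats.total_requests


def would_hit_rate(stats: UsageStats) -> float:
    # Share of requests that test mode reports could have been served from cache.
    if stats.total_requests == 0:
        return 0.0
    return stats.would_hits / stats.total_requests


live = UsageStats(total_requests=200, exact_hits=60, similarity_hits=40)
trial = UsageStats(total_requests=50, exact_hits=0, similarity_hits=0, would_hits=20)
print(hit_rate(live))         # 0.5
print(would_hit_rate(trial))  # 0.4
```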

For a deeper product walkthrough, read the LLM cache dashboard guide.