Persistence¶
ParaLLeM saves every LLM response to a local datastore keyed by a hash of the request content. On subsequent runs, a matching hash returns the cached response instantly — no API call is made.
How caching works¶
Every call to ask_llm computes a SHA-256 hash of:
- The system prompt (instructions)
- All input documents (strings, images, function call outputs, …)
- Any additional salt terms (see below)
If ParaLLeM has already seen your hash, the previous response is returned immediately. Otherwise, a request is sent to the provider and the result is stored.
with pllm.resume_directory(".pllm/myproject", provider="openai", strategy="sync") as orch:
    with orch.agent() as agt:
        resp = agt.ask_llm("Name a prime number.")
        agt.print(resp.final_answer)  # live on first run, instant on subsequent runs
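Conceptually, the cache key is just a digest over the request content. The following is a minimal sketch of that idea in plain Python — the serialization ParaLLeM actually uses is an internal detail and may differ:

```python
import hashlib

def cache_key(instructions: str, documents: list[str]) -> str:
    """Conceptual sketch: digest the system prompt and all input
    documents. ParaLLeM's real serialization may differ."""
    h = hashlib.sha256()
    h.update(instructions.encode("utf-8"))
    for doc in documents:
        h.update(doc.encode("utf-8"))
    return h.hexdigest()

# Identical content always yields the same key...
k1 = cache_key("You are helpful.", ["Name a prime number."])
k2 = cache_key("You are helpful.", ["Name a prime number."])
# ...while any change to the content yields a different key.
k3 = cache_key("You are helpful.", ["Name an even number."])
```

Because the key depends only on content, reruns of an unchanged script hit the cache on every call.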
Hashing¶
What is and isn't hashed¶
By default, only the message content is hashed — including instructions (system prompt) and all input documents. Config that is not included in the hash:
- Model name / LLM identity
- Tool definitions
- Provider type
This means that if you change the model but keep the same prompt, the cached response from the old model is returned. Use hash_by or salt to avoid this.
hash_by¶
hash_by is a list of named terms to fold into the hash. Currently the only supported value is "llm", which appends the model identity string before hashing.
agt.ask_llm("Name a prime.", hash_by=["llm"])
Now switching from gpt-4o to gpt-4o-mini produces a different hash and a separate cache entry.
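In terms of the conceptual hash sketch, hash_by=["llm"] amounts to appending the model identity string to the hashed content. The serialization below is hypothetical:

```python
import hashlib

def cache_key(prompt: str, model: str = "") -> str:
    """Sketch: with hash_by=["llm"], the model identity string is
    appended before hashing (hypothetical serialization)."""
    h = hashlib.sha256()
    h.update(prompt.encode("utf-8"))
    if model:
        h.update(model.encode("utf-8"))
    return h.hexdigest()

shared = cache_key("Name a prime.")               # model not hashed: one shared entry
a = cache_key("Name a prime.", model="gpt-4o")
b = cache_key("Name a prime.", model="gpt-4o-mini")
# a != b: each model now gets its own cache entry
```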
salt¶
salt is a free-form string that distinguishes otherwise identical content. For example, if you are not happy with a cached result, pass a salt to force a new hash and bypass it.
agt.ask_llm("Name a prime.")
agt.ask_llm("Name a prime.", salt="1")  # Will not collide
Use salt when you want to force a fresh response without clearing the whole cache — for example when tool definitions change (which are not hashed):
agt.ask_llm(prompt, tools=my_tools, salt="tools-v2")
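In the same conceptual sketch, a salt is simply one more term folded into the digest, so any new salt value produces a new cache entry (again, the real serialization is ParaLLeM's internal detail):

```python
import hashlib

def cache_key(prompt: str, salt: str = "") -> str:
    # Sketch: the salt is just one more term in the digest
    # (ParaLLeM's real serialization may differ).
    h = hashlib.sha256()
    h.update(prompt.encode("utf-8"))
    h.update(salt.encode("utf-8"))
    return h.hexdigest()

original = cache_key("Name a prime.")
salted = cache_key("Name a prime.", salt="tools-v2")  # distinct cache entry
```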
ignore_cache and rewrite_cache¶
| Parameter | Effect |
|---|---|
| ignore_cache=True | Always call the provider, ignoring any stored response. |
| rewrite_cache=True | Always call the provider and overwrite the stored response with the new one. |
Both are set on resume_directory:
pllm.resume_directory(".pllm/myproject", provider="openai", ignore_cache=True)
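The two flags can be read as a read/write policy on the datastore. The following dict-backed schematic illustrates that policy — it is a sketch of the semantics described above, not ParaLLeM's actual implementation:

```python
def respond(prompt, cache, call_provider, ignore_cache=False, rewrite_cache=False):
    """Schematic of the cache lookup policy; not ParaLLeM's actual code."""
    if not ignore_cache and not rewrite_cache and prompt in cache:
        return cache[prompt]               # hit: served from the datastore
    response = call_provider(prompt)       # miss or bypass: live API call
    if rewrite_cache or prompt not in cache:
        cache[prompt] = response           # rewrite_cache overwrites the entry
    return response                        # ignore_cache leaves the old entry intact

# Demo with a dict cache and a stub provider
cache, calls = {}, []
def provider(p):
    calls.append(p)
    return f"response-{len(calls)}"

first = respond("Name a prime.", cache, provider)                     # live call, stored
cached = respond("Name a prime.", cache, provider)                    # served from cache
fresh = respond("Name a prime.", cache, provider, ignore_cache=True)  # live, old entry kept
```

Note the asymmetry: with ignore_cache the stored response survives, so dropping the flag later returns the original cached answer; rewrite_cache replaces it.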