Ask API¶
We believe that an agent is a program, not an LLM. While that program often uses LLMs to automate its decision-making, agents can also ask functions and humans. As a result, we can unify `ask_llm`, `ask_functions`, and `ask_human` into a common interface!
ask_llm¶
Send a prompt (or full conversation) to an LLM and receive a lazy LLMResponse.
```python
resp = agt.ask_llm("What is the capital of France?")
print(resp.final_answer)
```
The response is lazy-loaded: it is not resolved until you call `.final_answer`, `.resolve()`, or iterate over function calls. This lets the runtime batch and cache calls efficiently.
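To make the lazy-resolution idea concrete, here is a minimal pure-Python sketch of a deferred response object. This is illustrative only, not parallem's actual `LLMResponse` implementation: the point is simply that no work happens until an accessor forces it.

```python
class LazyResponse:
    """Illustrative stand-in for a lazy response: defers work until accessed."""

    def __init__(self, compute):
        self._compute = compute  # zero-arg callable producing the answer
        self._result = None
        self._resolved = False

    def resolve(self):
        # Force the deferred computation exactly once, then cache the result.
        if not self._resolved:
            self._result = self._compute()
            self._resolved = True
        return self._result

    @property
    def final_answer(self):
        return self.resolve()

calls = []
resp = LazyResponse(lambda: calls.append("llm call") or "Paris")
assert calls == []        # nothing has run yet
print(resp.final_answer)  # forces the computation: prints "Paris"
assert calls == ["llm call"]
```

A runtime built around objects like this can inspect many pending responses before resolving any of them, which is what enables batching and caching.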
Signature¶
```python
ask_llm(
    documents,               # str | image | LLMResponse | MessageState | list thereof
    *additional_documents,
    instructions=None,       # system prompt
    llm=None,                # override model, e.g. "gpt-4o"
    salt=None,               # manual hash differentiator
    hash_by=None,            # e.g. ["llm"] to include model name in hash
    structured_output=None,  # Pydantic model for structured output
    tools=None,              # list of tool dicts or ServerTool objects
    tag=None,                # optional label for dashboards
    save_input=None,         # whether to persist input documents
)
```
Structured output¶
Pass a Pydantic model to structured_output to get back a validated object:
```python
from pydantic import BaseModel

class Answer(BaseModel):
    capital: str

resp = agt.ask_llm("What is the capital of France?", structured_output=Answer)
print(resp.final_answer)  # {"capital":"Paris"}
```
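For intuition on what "validated" means here, the snippet below exercises Pydantic itself, independent of parallem: a JSON payload shaped like the output above parses into a typed object, and a malformed one raises an error.

```python
import json

from pydantic import BaseModel

class Answer(BaseModel):
    capital: str

# A raw JSON payload like the one above parses into a typed object.
ans = Answer(**json.loads('{"capital":"Paris"}'))
print(ans.capital)  # Paris
```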
Tool use¶
```python
import logging

from dotenv import load_dotenv

import parallem as pllm

load_dotenv()

def multiply(a: int, b: int) -> int:
    """Calculates a times b."""
    return a * b

def add(a: int, b: int) -> int:
    """Calculates a plus b."""
    return a + b

def divide(a: int, b: int) -> float:
    """Calculates a divided by b."""
    return a / b

with pllm.resume_directory(
    ".pllm/simplest-tool",
    provider="google",
    strategy="sync",
    log_level=logging.DEBUG,
    dashboard=True,
    hash_by=["llm"],
    # ignore_cache=True,
) as orch:
    with orch.agent() as agt:
        # See docs on the MessageState abstraction.
        conv = agt.get_msg_state()
        last_msg = conv.ask_llm(
            "Add 3 and 4.",
            tools=pllm.to_tool_schema([multiply, add, divide]),
        )
        while last_msg.resolve_function_calls():
            conv.ask_functions(multiply=multiply, add=add, divide=divide)
            last_msg = conv.ask_llm()
        agt.print(conv.resolve())
        # ['Add 3 and 4.', '', FunctionCallOutput(name=add, call_id=, content=7...), '3 + 4 = 7']
```
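In the example above, `pllm.to_tool_schema` turns plain Python functions into tool declarations. As a rough illustration of how such a helper can work (a sketch under assumptions, not parallem's implementation), a function's signature and docstring can be introspected into a JSON-Schema-style description:

```python
import inspect

def sketch_tool_schema(fn):
    """Illustrative only: derive a minimal tool schema from a function's
    type annotations and docstring."""
    type_names = {int: "integer", float: "number", str: "string", bool: "boolean"}
    sig = inspect.signature(fn)
    params = {
        name: {"type": type_names.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

def add(a: int, b: int) -> int:
    """Calculates a plus b."""
    return a + b

schema = sketch_tool_schema(add)
print(schema["name"])  # add
```

This is why the docstrings and type hints on `multiply`, `add`, and `divide` matter: they are what the model sees when deciding which tool to call.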
ask_functions¶
After ask_llm returns a response that contains function calls, ask_functions executes them by running the actual Python functions.
```python
def count_files(directory: str) -> int:
    """Counts files in a directory"""
    return 4

# ...
resp = agt.ask_llm(prompt, tools=pllm.to_tool_schema([count_files]))
fc_outs = agt.ask_functions(resp, count_files=count_files)
final = agt.ask_llm([prompt, resp, *fc_outs])
```
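Conceptually, executing function calls amounts to dispatching each call the model requested to the matching keyword-registered Python callable. Here is a minimal pure-Python sketch of that dispatch; the call-dict shape (`name`, `arguments`) is a hypothetical stand-in, not parallem's internal representation.

```python
def run_function_calls(calls, **functions):
    """Illustrative dispatcher: run each requested call against the
    keyword-registered functions and collect the outputs in order."""
    outputs = []
    for call in calls:
        fn = functions[call["name"]]             # look up the callable by tool name
        outputs.append(fn(**call["arguments"]))  # invoke with the model's arguments
    return outputs

def count_files(directory: str) -> int:
    """Counts files in a directory"""
    return 4

outs = run_function_calls(
    [{"name": "count_files", "arguments": {"directory": "/tmp"}}],
    count_files=count_files,
)
print(outs)  # [4]
```

This mirrors the shape of `agt.ask_functions(resp, count_files=count_files)`: the keyword names are the registry, and the response supplies which calls to make and with what arguments.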
ask_human¶
ask_human is for human-in-the-loop checkpoints, approvals, or missing context that should come from a person instead of a model.
```python
conv = agt.get_msg_state()
resp = agt.ask_human(
    "Please confirm next step",
    # Must pass the conversation for hashing purposes:
    # ParaLLeM must associate the human response with
    # the right point in the conversation.
    conv,
)
print(resp.final_answer)
```
It follows the same hashing idea as `ask_llm`:

- `prompt` acts like the system prompt/instructions.
- `documents` and `*additional_documents` are the hash basis.
- `salt` can be used to force differentiation.