RAG

By design, we do not offer a native API for retrieval augmented generation (RAG).

There are many vector stores (Chroma, FAISS, Qdrant, Weaviate, Milvus, Pinecone, Elasticsearch, pgvector), and we do not want to force a choice of any single one.

Nor do we want to bundle a vector store library. To keep parallem slim and lightweight, RAG is outside our scope. However, RAG can still be easily accomplished with function calling.
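In fact, the retrieval function handed to the model does not even need to be backed by a vector database; any callable that maps a query to context strings will do. A minimal stdlib-only sketch (the scoring here is illustrative keyword overlap, not embeddings):

```python
def keyword_retrieve(query: str, docs: list[str], k: int = 2) -> str:
    """Rank documents by word overlap with the query and return the top-k, joined."""
    q_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return "\n".join(ranked[:k])
```

Such a function can be exposed to the model exactly like the `vector_store_tool` defined below.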

For example, wrap your vector store in a plain Python function and expose it to the model as a tool. Here is a minimal in-memory chromadb example:

# Requires: pip install chromadb
import chromadb
from dotenv import load_dotenv

import parallem as pllm

# RAG implementation.
# parallem does not bundle any RAG libraries; this example implements
# retrieval with chromadb and exposes it to the model as a tool.

client = chromadb.Client()
collection = client.create_collection(name="rag_demo")

collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "Refunds are available within 30 days of purchase with a valid receipt.",
        "Digital products are non-refundable after download unless required by law.",
        "Our offices are based in Palo Alto, California.",
    ],
)


def vector_store_tool(query: str, k: int = 2) -> str:
    """Given a query, retrieves relevant documents from the vector store."""
    result = collection.query(query_texts=[query], n_results=k)
    docs = result["documents"][0]
    return "\n".join(docs)


# Begin parallem logic


def rag_agent(agt: pllm.AgentContext, query: str):
    conv = agt.get_msg_state()
    # Ask the model, making the retrieval function available as a tool.
    conv.ask_llm(
        query,
        instructions="Only supply information relevant to the user's question.",
        tools=pllm.to_tool_schema([vector_store_tool]),
    )
    # Execute the tool calls the model requested, then let it answer.
    conv.ask_functions(vector_store_tool=vector_store_tool)
    conv.ask_llm()
    return conv[-1].final_answer


if __name__ == "__main__":
    load_dotenv()
    with pllm.resume_directory(
        ".pllm/example/rag", hash_by=["llm"], provider="google"
    ) as orch:
        with orch.agent() as agt:
            out = rag_agent(agt, "What is the refund policy for digital products?")
            print(out)

In the example above, retrieval is exposed to the model as a function call. However, you can also query the vector store yourself and feed the retrieved documents to the model directly -- up to you.

def simpler_rag_agent(agt: pllm.AgentContext, query: str):
    # Retrieve context up front instead of letting the model call a tool.
    documents = vector_store_tool(query)
    resp = agt.ask_llm([documents, query])
    return resp.final_answer
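For longer source documents, you would typically split the text into overlapping chunks before calling `collection.add`, so that each retrieved passage stays small and focused. A minimal fixed-size character chunker (illustrative only; real projects often chunk by tokens or sentences instead):

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size`, each overlapping the next by `overlap`."""
    step = size - overlap
    # max(..., 1) ensures a short text still yields one chunk.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The resulting chunks can be passed as the `documents` list (with generated `ids`) when populating the collection.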