Changelog
Pinax now aligns with LanceDB’s latest API, replacing the deprecated table_names() with list_tables() and updating the minimum LanceDB version to 0.26.0. This avoids deprecation-related breakages and keeps integrations stable.
Details
- Action required: upgrade LanceDB to version 0.26.0 or later
- If you call LanceDB directly, update usage to list_tables(); Pinax’s adapter handles this internally
Who this is for: Teams using LanceDB as a vector store and relying on stable, forward-compatible storage integrations.
Human-in-the-loop confirmation now correctly applies to MCP Function tools via toolkit-level settings (e.g., requires_confirmation_tools). This restores predictable approval gates before tool execution, improving safety and oversight.
Details
- Centralized policy control at the toolkit level
- No code changes required to enable HITL confirmation
Who this is for: Organizations enforcing governance, compliance, or risk controls in agentic workflows.
AwsBedrockEmbedder now supports Cohere Embed v4, including configurable output dimensions and multimodal (text + image) embeddings, with async variants. This expands what you can index and search while tuning for cost, latency, and quality.
Details
- Control vector size via output_dimension for performance and cost management
- Operates through AWS Bedrock for governance and consolidated operations
Who this is for: Teams standardizing on Bedrock that need scalable, multimodal semantic search and RAG.
Introduce smarter retrieval with the new AwsBedrockReranker, supporting Cohere Rerank 3.5 and Amazon Rerank 1.0. By scoring and reordering retrieved passages, you can boost precision and reduce noise in generated answers.
Details
- Plug-and-play integration for existing retrieval pipelines
- Convenience classes streamline setup and adoption on AWS
Who this is for: Teams building retrieval-augmented generation on AWS that need higher-quality, production-grade ranking.
Condition steps now support else_steps, allowing you to define a clear alternative path when a condition evaluates to false. This makes complex automations easier to express and maintain without extra workaround steps.
Details
- First-class true/false branching directly in workflows
- Backward compatible; no changes required to existing flows
Who this is for: Teams orchestrating complex, decision-heavy workflows that need clearer control flow and easier maintenance.
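The branching semantics can be sketched in a few lines. This is a toy model of the idea, not Pinax's actual Condition class; the function and parameter names are illustrative only:

```python
def run_condition(condition, steps, else_steps=None, context=None):
    """Run `steps` when condition(context) is truthy, otherwise run
    `else_steps`. Mirrors the first-class true/false branching that
    else_steps adds to Condition steps."""
    branch = steps if condition(context) else (else_steps or [])
    return [step(context) for step in branch]


results = run_condition(
    condition=lambda ctx: ctx["amount"] > 1000,
    steps=[lambda ctx: "escalate_to_human"],
    else_steps=[lambda ctx: "auto_approve"],
    context={"amount": 250},
)
print(results)  # ['auto_approve']
```

Without else_steps, expressing the false branch required an inverted second condition step; with it, both paths live in one place.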
Async text_reader.aread() now returns an empty list ([]) for empty files, aligning behavior with the sync API. This removes special-case handling and simplifies downstream pipelines.
Details
- Consistent return types across sync and async code paths
- Reduced edge-case logic and clearer semantics for job orchestration
- Action required: Update any logic that expects a placeholder empty document
Who this is for: Teams running ingestion pipelines or ETL workflows that need predictable document handling.
WebsiteReader now computes a unique content hash per crawled URL, fixing skip_if_exists for multi-page crawls. This ensures accurate per-page deduplication, reduces redundant ingestion, and saves processing cost during re-crawls.
Details
- Correct per-page deduplication for predictable skip_if_exists behavior
- Fewer unnecessary writes and tokens when re-indexing multi-page sites
- Action required: Clear existing website crawl entries in your knowledge store before re-indexing to avoid duplicates
Who this is for: Teams maintaining search indexes, documentation portals, or knowledge bases sourced from websites.
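The per-page deduplication model can be illustrated with a short, self-contained sketch: hash each page's normalized content, then re-index only URLs whose hash changed. Names here are illustrative and independent of WebsiteReader's internals:

```python
import hashlib


def content_hash(page_text: str) -> str:
    """Stable hash of one page's content, with whitespace normalized
    so a reflowed page does not register as changed."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pages_to_index(crawled: dict, known_hashes: dict) -> list:
    """Return URLs whose content changed since the last crawl,
    i.e. skip_if_exists semantics applied at page granularity."""
    return [
        url for url, text in crawled.items()
        if known_hashes.get(url) != content_hash(text)
    ]


known = {"https://example.com/a": content_hash("hello world")}
crawl = {
    "https://example.com/a": "hello  world",   # same content, reflowed
    "https://example.com/b": "brand new page",  # never seen before
}
print(pages_to_index(crawl, known))  # ['https://example.com/b']
```

Because hashes are keyed per URL rather than per crawl, a re-crawl that touches one updated page re-ingests only that page.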
We’ve added a first-class SeltzTools toolkit that brings Seltz-powered semantic search directly into Pinax. Teams can now plug high-quality semantic retrieval into Agents and Workflows without building custom adapters, improving response relevance and cutting integration time from days to minutes.
Details
- Standard Tool interface works seamlessly across Agents and Workflows for consistent, composable search.
- Reduces maintenance by relying on a supported integration instead of bespoke connectors.
- Quick start:
- pip install seltz
- Set SELTZ_API_KEY in your environment
Who this is for: Teams building RAG assistants, enterprise search, or knowledge-heavy automation that want robust semantic retrieval with minimal integration effort.
Learning is now simpler and more effective. When learning=True, user memory is enabled by default, and the LearnedKnowledgeStore captures organizational context (goals, constraints, policies) to guide agent behavior. We also improved prompts, streamlined tool parameter handling, and updated status messages. New quickstart cookbooks help teams adopt faster.
Details
- Faster time-to-value with sensible defaults — no extra setup to persist memory
- Better outcomes via richer organizational context and improved prompt quality
- Reduced integration effort with simpler tool parameter handling
- Clearer operational visibility with improved status text and cookbooks
Who this is for: Teams piloting or scaling learning agents that need strong governance signals, faster setup, and consistent outcomes.
Pinax now supports Moonshot.ai as a model provider with initial models and examples to help you get started quickly. This broadens your options for performance/cost trade-offs and lets you evaluate or deploy Moonshot models using the same configuration patterns you use today. The provider integrates seamlessly, so you can swap or A/B test models without refactoring agents or workflows.
Details
- Standardized configuration and invocation across providers
- Ready-to-use examples to accelerate evaluation and onboarding
- Compatible with Agents, Tools, and Workflows
Who this is for: Teams optimizing their model portfolio for accuracy, latency, budget, or regional availability.
We added UnsplashTools, a first-class toolkit for discovering and retrieving high-quality, royalty-free images directly in Pinax. Teams can now search, fetch by ID, request a random image, and download assets without building or maintaining custom integrations. This streamlines image sourcing across agents and workflows, reduces time-to-value for media-heavy features, and lowers ongoing integration overhead.
Details
- Turnkey tools: search_photos, get_photo, get_random_photo, download_photo
- Consistent interface usable from agents and workflows
- Eliminates custom API wrappers and reduces maintenance
Who this is for: Product, content, and AI assistant teams needing on-demand images for generation, prototyping, or production experiences.
We introduced a dedicated ExcelReader for .xls/.xlsx with sheet filtering, options to skip hidden sheets, and chunking controls. ReaderFactory now routes Excel files to ExcelReader automatically. This eliminates CSV conversions and reliance on CSVReader, reducing setup time and avoiding common formatting pitfalls. Teams gain more predictable ingestion of large workbooks and can tune performance and cost via chunk sizing.
Details
- Automatic routing of .xls/.xlsx to ExcelReader; minimal code changes for common cases
- Include/exclude specific sheets and optionally skip hidden tabs to control what’s ingested
- Chunking controls to handle large files reliably and at scale
- Migration: Projects that used CSVReader for Excel should switch to ExcelReader and install the extra: pip install "pinaxai[excel]"
Who this is for: Teams ingesting spreadsheets into knowledge bases or agent workflows; platform owners standardizing document ingestion.
A fix restores reliable table creation across AsyncSQLiteDb, AsyncPostgresDb, AsyncMySQLDb, and FirestoreDb. This removes a blocker that could prevent schema setup during initialization, improving startup reliability and reducing manual intervention across environments.
Details
- Unblocks table creation during provisioning and cold starts
- Applies consistently across multiple async backends in one upgrade
- No application changes required
Who this is for: Teams using async database backends for storage who need predictable deployment and operations.
We introduced OpenAI Responses API–compatible clients, including a base OpenResponses and provider-specific clients for Ollama and OpenRouter. This gives teams a consistent request/response schema across local and hosted models, simplifying migrations and reducing provider-specific branching. The result is faster adoption, cleaner integrations, and more flexibility to switch or mix models without refactoring.
Details
- One API shape across multiple providers for better portability and governance
- Supports self-hosted (Ollama) and hosted marketplaces (OpenRouter)
- No breaking changes — upgrade and start using Responses-compatible clients
Who this is for: Platform teams running hybrid model stacks and organizations seeking vendor flexibility with minimal integration overhead.
Knowledge now connects to private Azure Blob Storage as a first-class source — alongside SharePoint and GitHub — so Azure-centric organizations can centralize content without custom ETL. This enables teams to index documents securely from private containers and make them available to agents and workflows for retrieval-augmented generation and search.
Details
- Works with private Azure Blob Storage containers under your existing access controls
- Parity with existing SharePoint and GitHub loaders for consistent operations
- Reduces setup time and ongoing maintenance for Azure-first environments
Who this is for: Teams standardizing on Azure that need governed, scalable ingestion for internal content.
Async generator tools now capture and surface errors on the tool call, matching synchronous behavior, instead of re-raising exceptions. This delivers more predictable orchestration and fewer unexpected failures in long-running or streaming tool workflows. If your implementation relied on exceptions being thrown, update handlers accordingly.
Details
- Aligns async error handling with sync tools for consistent behavior
- Reduces unexpected cancellations caused by unhandled async exceptions
- Improves reliability in streaming and long-running workflows
Who this is for: Teams building automation with async or streaming tools.
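The capture-instead-of-raise pattern looks roughly like this. A stdlib-only sketch of the behavior, not Pinax's actual tool runner; the result shape is assumed for illustration:

```python
import asyncio


async def run_tool_stream(gen):
    """Drain an async generator tool, recording any exception on the
    result instead of re-raising it to the caller."""
    chunks, error = [], None
    try:
        async for chunk in gen:
            chunks.append(chunk)
    except Exception as exc:  # surfaced on the tool call, not re-raised
        error = str(exc)
    return {"chunks": chunks, "error": error}


async def flaky_tool():
    yield "partial output"
    raise RuntimeError("upstream timeout")


result = asyncio.run(run_tool_stream(flaky_tool()))
print(result)  # {'chunks': ['partial output'], 'error': 'upstream timeout'}
```

Orchestrators inspect the error field on the result rather than wrapping every streamed tool in try/except.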
We corrected streaming token accounting for Perplexity by collecting usage only on the final chunk for providers that return cumulative metrics. This change prevents inflated token counts so your dashboards, budgets, and alerts reflect actual usage.
Details
- More accurate token and cost metrics for streaming responses
- Historical comparisons may show a step change; adjust thresholds as needed
- No application changes required
Who this is for: Platform, FinOps, and observability teams tracking model usage and spend.
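The accounting fix is easiest to see with simulated chunks. When a provider reports cumulative usage on every streamed chunk, the final chunk already holds the totals, so summing per chunk inflates the count. A stdlib sketch with an assumed chunk shape:

```python
def usage_from_stream(chunks):
    """For providers that report *cumulative* usage on each streamed
    chunk, only the last reported figures are the true totals."""
    last = None
    for chunk in chunks:
        if chunk.get("usage"):
            last = chunk["usage"]
    return last


stream = [
    {"delta": "Hel", "usage": {"total_tokens": 3}},
    {"delta": "lo",  "usage": {"total_tokens": 5}},  # cumulative, not per-chunk
    {"delta": "",    "usage": {"total_tokens": 6}},
]
naive = sum(c["usage"]["total_tokens"] for c in stream)  # 14, inflated
print(usage_from_stream(stream))  # {'total_tokens': 6}
```

The naive sum here reports more than double the real six tokens, which is exactly the dashboard inflation this release removes.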
Knowledge can now ingest content from private GitHub repositories and SharePoint, via SDK and API. This enables organizations to consolidate code, docs, and operational knowledge from private systems while maintaining governance, reducing manual exports, and improving coverage for enterprise RAG and analytics.
Details
- Supports authenticated access to private GitHub and SharePoint sources
- Preserves structure and basic metadata to enhance retrieval relevance
- Reduces integration effort by using a single ingestion pathway
Who this is for: Enterprises with critical content in private repos and SharePoint who need secure, governed ingestion.
Knowledge now natively ingests Excel files by routing spreadsheets through the CSV reader. Each sheet is parsed into its own document with sheet-level metadata and normalized cell content. This removes manual pre-processing steps and makes enterprise spreadsheet data immediately searchable and useful in retrieval workflows.
Details
- Each sheet becomes a separate, metadata-rich document for targeted retrieval
- Normalized cell content improves parsing, chunking, and search quality
- Works via SDK and API with no special configuration
Who this is for: Knowledge and data platform teams standardizing enterprise documents for RAG and search.
We’ve added n1n.ai as an OpenAI-compatible provider, giving teams more flexibility to optimize for cost, performance, and regional availability. The new model class and cookbook examples make it simple to adopt N1N with minimal changes, enabling vendor diversification without rework across your stack.
Details
- OpenAI-compatible semantics reduce switching costs and integration risk
- Cookbook examples accelerate rollout across Agents, Tools, and Workflows
- No changes required to existing workflows beyond selecting the provider
Who this is for: Platform teams pursuing a multi-model strategy or looking for cost and supply redundancy.
AgentOS now uses a unified db parameter and deprecates tracing_db. This reduces configuration complexity and clarifies data storage for both operational and tracing needs.
Details
- Replace tracing_db with db in configuration
- Aligns all AgentOS data under a single, explicit database setting
Who this is for: Platform teams managing AgentOS deployments who want simpler, less error-prone configuration.
Knowledge.add_content has been renamed to insert and insert_many for clarity and alignment with the new protocol direction. The change improves semantic consistency and makes batch operations explicit.
Details
- Replace add_content with insert (single) or insert_many (batch)
- No behavioral changes — just clearer method names
Who this is for: Developers ingesting data into Knowledge who want clean, consistent APIs.
We replaced the DuckDuckGo-specific web search tool with a generic WebSearchTools interface. This standardization broadens provider choice and future-proofs search integrations.
Details
- Update any DDG-specific references to the new WebSearchTools
- Switch providers without redesigning your agents in the future
Who this is for: Teams embedding web search into agents who want provider flexibility and a stable interface.
We removed deprecated fields across tools/hooks and API parameters to simplify the surface area and reduce ambiguity. This change keeps the platform focused and easier to maintain at scale.
Details
- Some integrations may require minor updates to align with the current API
- Review custom tools and hooks to replace deprecated parameters
Who this is for: Teams with custom tools or hooks who need a stable, predictable API surface.
We resolved a 400 error caused by message formatting for file Part objects in Gemini (Vertex AI) uploads. Uploads now work as expected, unblocking multimodal use cases.
Details
- No action required; existing integrations resume normal behavior
- Applies to Gemini file inputs via Vertex AI
Who this is for: Teams relying on Gemini for multimodal processing and document-aware workflows.
Gemini now accepts gs:// URIs and HTTPS URLs (including presigned URLs) directly, eliminating the need to download files before processing. This reduces operational overhead and speeds up multimodal workflows, especially for large assets.
Details
- Pass GCS and external HTTPS sources without intermediate storage
- Reduces data handling, infra footprint, and latency
- Opt-in; no changes required to existing flows
Who this is for: Teams on GCP and security-conscious organizations that prefer presigned URLs and minimal data movement.
You can now persist and manage Agent, Team, and Workflow definitions in a database, with new AgentOS endpoints for programmatic create, read, update, and delete. This consolidates configuration, reduces sprawl, and makes it easier to automate promotion across environments.
Details
- Consistent, API-driven management of component definitions
- Simplifies deployments, environment parity, and CI/CD integration
- No migration required; adopt incrementally
Who this is for: Platform and MLOps teams standardizing how they define, version, and roll out AI system components.
We introduced KnowledgeProtocol, a unified interface that enables multiple Knowledge backends to work interchangeably with Agents and Teams. The default Knowledge implementation now conforms to this protocol, opening the door to alternative stores without changing your agent logic.
Details
- Standardizes how Knowledge is integrated, improving portability and vendor choice
- Existing projects continue to work; adopt new backends when ready
Who this is for: Teams that need to bring their own vector DB or conform to enterprise data platforms without refactoring agents.
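The protocol idea can be sketched with typing.Protocol: any backend exposing the agreed methods is accepted wherever Knowledge is expected. The protocol name, method set, and in-memory store below are illustrative assumptions, not KnowledgeProtocol's actual definition:

```python
from typing import Iterable, Protocol


class KnowledgeLike(Protocol):
    """Stand-in for the protocol idea: structural typing means any
    backend with these methods is interchangeable."""
    def insert(self, text: str) -> None: ...
    def search(self, query: str) -> Iterable[str]: ...


class InMemoryKnowledge:
    def __init__(self):
        self._docs = []

    def insert(self, text: str) -> None:
        self._docs.append(text)

    def search(self, query: str):
        return [d for d in self._docs if query.lower() in d.lower()]


def answer(knowledge: KnowledgeLike, query: str) -> list:
    # Agent logic depends only on the protocol, never a concrete store.
    return list(knowledge.search(query))


kb = InMemoryKnowledge()
kb.insert("Pinax supports hybrid search")
print(answer(kb, "hybrid"))  # ['Pinax supports hybrid search']
```

Swapping InMemoryKnowledge for a vector-DB-backed implementation leaves answer() untouched, which is the portability the protocol buys.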
We’ve introduced request-scoped isolation for agents, teams, and workflows. Each incoming request now runs against a fresh copy of the component while expensive resources (database connections, models, MCP tools) are safely shared. This eliminates cross-request state leakage, reduces race conditions, and delivers consistent results under load. New factory helpers — get_agent_for_request, get_team_for_request, and get_workflow_for_request — simplify adoption with minimal code changes. This upgrade strengthens reliability for concurrent and multi-tenant deployments without breaking existing integrations.
Details
- Deterministic behavior via deep-copy isolation per request
- Shared heavy resources keep latency and cost in check
- Updated routers and extensive tests for confidence
- Drop-in helpers standardize lifecycle management
Who this is for: Platform and product teams operating agents at scale, especially in multi-tenant or high-concurrency environments requiring predictable execution and low operational overhead.
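The isolation model is simple to sketch: deep-copy the per-request state, re-attach the heavy resources by reference. The helper below reuses the get_agent_for_request name from this release but its signature and the Agent shape are illustrative assumptions:

```python
import copy


class Agent:
    def __init__(self, db):
        self.db = db        # heavy shared resource (connection, model, ...)
        self.history = []   # per-request mutable state


def get_agent_for_request(template):
    """Deep-copy the component for one request while sharing the
    expensive resource instead of duplicating it."""
    shared_db = template.db
    template.db = None                   # keep the copy cheap
    fresh = copy.deepcopy(template)
    template.db = fresh.db = shared_db   # share, don't duplicate
    return fresh


base = Agent(db=object())
a, b = get_agent_for_request(base), get_agent_for_request(base)
a.history.append("request-1 message")
print(b.history)     # [] -> no cross-request leakage
print(a.db is b.db)  # True -> the connection is shared
```

Mutations made while serving one request stay on that request's copy, while both copies still point at the same database handle.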
We now classify common non-retryable conditions (e.g., 4xx responses, payload too large, context limit exceeded) and skip retries across both sync and async flows. This delivers faster failure signals, lower compute spend, and clearer logs — improving reliability without any changes to your code.
Details
- Consistent behavior across orchestration paths and providers
- Automatic optimization; no configuration required
Who this is for: Teams running production LLM workloads at scale who want to minimize wasted cycles and speed up incident triage.
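A minimal classifier conveys the idea. This sketch is not Pinax's implementation; in particular, carving out 429 (rate limit) as the one retryable 4xx is this sketch's assumption, and the marker strings are illustrative:

```python
NON_RETRYABLE_MARKERS = ("payload too large", "context limit", "context length")


def should_retry(status_code=None, message=""):
    """Fail fast on permanent errors: 4xx responses and known
    non-retryable conditions skip the retry loop entirely."""
    if status_code is not None and 400 <= status_code < 500:
        return status_code == 429  # rate limits are worth retrying
    msg = message.lower()
    return not any(marker in msg for marker in NON_RETRYABLE_MARKERS)


print(should_retry(status_code=400))   # False: permanent client error
print(should_retry(status_code=429))   # True: rate limit
print(should_retry(status_code=503))   # True: transient server error
print(should_retry(message="Context limit exceeded"))  # False
```

Skipping hopeless retries is where the faster failure signals and lower compute spend come from.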
A new AST-based Code Chunker splits code into semantically meaningful units, preserving function and class boundaries across multiple languages and tokenizer options. This improves retrieval and embedding relevance for code RAG and analysis, reduces token waste, and eliminates the need for custom chunking logic.
Details
- Language-agnostic AST parsing for structured, coherent chunks
- Configurable tokenizer settings to align with your model choices
- Drop-in adoption for existing ingestion and retrieval pipelines
Who this is for: Teams building code-aware RAG, search, review assistants, static analysis, and compliance workflows that require precise code understanding.
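A toy version of the approach, using the stdlib ast module to cut Python source at top-level function and class boundaries. The shipped chunker is multi-language and tokenizer-aware; this sketch only shows why AST boundaries beat fixed-size splits:

```python
import ast


def chunk_python_source(source: str):
    """Split Python source into chunks that each hold one complete
    top-level function or class, so no chunk ends mid-definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks


code = '''\
def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
'''
for chunk in chunk_python_source(code):
    print(chunk, end="\n---\n")
```

Each chunk is a self-contained definition, which keeps embeddings coherent and avoids wasting tokens on fragments of two unrelated functions.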
We introduced a unified learning system that enables agents to learn from every interaction. Teams can choose learning types and plug in preferred storage backends, making continuous improvement a first-class capability without custom scaffolding. This reduces manual tuning, accelerates time-to-value, and provides consistent controls for how knowledge is captured and retained across agents and workflows.
Details
- Configurable learning modes and retention policies to fit governance and cost requirements
- Works across agents and workflows, with pluggable storage backends for flexibility
- Low-ops adoption: enable learning without restructuring existing implementations
Who this is for: Platform teams scaling agent experiences that benefit from personalization, long-term context, and continuous improvement.
Crawl4aiTools now supports proxy_config via BrowserConfig, allowing traffic to route through enterprise proxies and enabling browser-level network configuration. This removes a common blocker for deployments in egress-controlled networks and makes crawling behavior predictable across environments.
Details
- Configure proxies centrally via BrowserConfig for consistent network behavior
- Simplifies deployment in VPCs and corporate networks with mandatory outbound proxies
Who this is for: Enterprises operating behind corporate proxies and teams standardizing network egress for web-crawling workloads.
This release introduces a breaking change: PythonTools and MLXTranscribeTools now operate only within their defined base directory by default. Workloads that previously accessed arbitrary filesystem paths will be constrained unless updated. The change improves security posture and prevents unintended file access.
Details
- To maintain broader access, set restrict_to_base_dir=False or expand the base directory to include required paths
- Provides stronger guardrails with minimal configuration overhead
Who this is for: Teams upgrading existing workloads that rely on cross-directory file access and need clear migration steps.
We introduced a restrict_to_base_dir parameter for PythonTools and MLXTranscribeTools, enabled by default. Tools now operate within a contextual base directory, minimizing blast radius and protecting local or mounted data during execution.
Details
- On by default: tools read/write only within their base directory
- Opt out per tool by setting restrict_to_base_dir=False
- Adjust the base directory to allow intended paths while maintaining isolation
Who this is for: Security-conscious teams, multi-tenant deployments, and anyone running tools on shared infrastructure.
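The guardrail boils down to a path-containment check. A stdlib sketch of the idea, reusing the restrict_to_base_dir name from this release but with an illustrative function signature:

```python
from pathlib import Path


def resolve_in_base(base_dir, requested, restrict_to_base_dir=True):
    """Resolve a requested path and refuse anything that escapes the
    base directory, unless the caller explicitly opts out."""
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()
    if restrict_to_base_dir and not target.is_relative_to(base):
        raise PermissionError(f"{requested!r} escapes base dir {base}")
    return target


print(resolve_in_base("/tmp/workspace", "notes/todo.txt").name)  # todo.txt
try:
    resolve_in_base("/tmp/workspace", "../../etc/passwd")
except PermissionError as exc:
    print("blocked:", exc)
```

Resolving before comparing is what defeats ../ traversal; simply prefix-matching the raw string would not.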
Provider usage metrics (including token counts) are now propagated to the model response in both sync and async paths. This ensures reliable cost tracking, quota enforcement, and observability without custom plumbing.
Details
- Uniform access to usage data across sync and async responses
- Simplifies building budgets, alerts, and chargeback reporting
- No migration or code changes required
Who this is for: Platform owners, FinOps, and engineering teams who track spend, quotas, or performance across models.
Toolkit now supports async tool functions and automatically selects them when an agent runs in an async context. This delivers lower latency and higher throughput for concurrent workloads, while removing boilerplate required to manage sync/async paths manually.
Details
- Automatic selection of async tools in async runs; no code changes required
- Improves responsiveness and resource efficiency under load
- Works alongside existing tools without migration
Who this is for: Teams building high-concurrency agents, streaming experiences, or serverless workloads that benefit from end-to-end async execution.
When a URL is provided, MCPTools now default to StreamableHttp transport. This makes it easier to connect to external MCP servers and improves streaming behavior out of the box, reducing configuration overhead.
Details
- Better defaults for modern streaming workflows and real-time interactions
- Fewer setup steps when integrating with third-party MCP servers
- To retain previous behavior, explicitly set your preferred transport
Who this is for: Teams integrating MCP servers that want faster setup and more reliable streaming by default.
JWTMiddleware now supports a configurable audience parameter to validate the aud claim. This ensures tokens are intended for your services, reducing the risk of token replay or misrouting and strengthening your zero-trust posture.
Details
- Enforce audience verification without changing existing token flows
- Compatible with major identity providers and standard JWT libraries
- Non-breaking change; enable when ready by configuring your expected audience
Who this is for: Security-conscious teams and enterprises running production workloads with strict auth requirements.
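The check itself is small. Per RFC 7519 the aud claim may be a single string or a list of strings, and this stdlib sketch (operating on already-decoded claims, not raw tokens) handles both; it illustrates the verification JWTMiddleware performs once an audience is configured:

```python
def check_audience(claims: dict, expected_audience: str) -> bool:
    """Return True if the decoded token's aud claim includes the
    audience this service expects (string or list-of-strings form)."""
    aud = claims.get("aud")
    if aud is None:
        return False  # no audience claim: reject when audience is enforced
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_audience in audiences


print(check_audience({"sub": "u1", "aud": "pinax-api"}, "pinax-api"))             # True
print(check_audience({"sub": "u1", "aud": ["billing", "pinax-api"]}, "pinax-api"))  # True
print(check_audience({"sub": "u1"}, "pinax-api"))                                  # False
```

A token minted for another service fails this check even if its signature is valid, which is what closes the misrouting gap.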
Pinax now supports native reasoning for OpenAI GPT-5.1/5.2, Google Gemini 3/3.5/deepthink, and DeepSeek r1/reasoner. This expands your model choices while preserving a consistent interface — making it easier to optimize for accuracy, latency, or cost without refactoring.
Details
- Standardized reasoning interface simplifies A/B testing and fallback strategies
- Unlocks provider flexibility for regional, compliance, and pricing needs
- No migration required; adopt new models as drop-in options
Who this is for: Teams optimizing cost-performance across providers and those standardizing on a reasoning-first development approach.
You can now connect to and orchestrate remote agents via A2A using the new A2AClient, with cookbook examples to get started. This unlocks scale-out and cross-boundary scenarios — such as running agents in separate processes, hosts, or partner environments — including support for Google ADK agents on AgentOS (beta).
Details
- Execute remote agents with a consistent, local-like interface
- Improve isolation, resiliency, and resource utilization by distributing workloads
- Cookbook and examples reduce setup time and operational risk
Who this is for: Enterprises coordinating agents across services or networks and teams integrating partner- or vendor-hosted agents.
MCPTools and MultiMCPTools now support a header_provider callback to generate request headers at run time. This enables per-user and per-tenant authentication without custom plumbing, supports short-lived credentials, and simplifies compliance with enterprise security standards.
Details
- Issue per-run, per-user tokens without forking or duplicating tool definitions
- Reduce integration overhead for multi-tenant deployments and token rotation
- Works across multiple MCP endpoints with consistent behavior
Who this is for: SaaS platforms and internal developer platforms that need strong isolation and policy-driven auth for MCP-based integrations.
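The callback pattern looks roughly like this. Token minting below is a stand-in; real deployments would call their identity provider, and everything except the header_provider name is an illustrative assumption:

```python
import time


def make_header_provider(tenant_id):
    """Build a header_provider-style callback that mints short-lived,
    tenant-scoped headers each time it is invoked."""
    def header_provider():
        # Fake short-lived token; substitute a real credential fetch.
        token = f"{tenant_id}-{int(time.time())}"
        return {
            "Authorization": f"Bearer {token}",
            "X-Tenant-Id": tenant_id,
        }
    return header_provider


provider = make_header_provider("acme-corp")
headers = provider()          # invoked at request time, not at setup time
print(headers["X-Tenant-Id"])  # acme-corp
```

Because the callback runs per request, rotated or freshly minted credentials flow through automatically without redefining the tool.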
We introduced a first-class Skills system, including a Skills class plus validation and loader utilities. Teams can now define, validate, and reuse skills across agents with a consistent interface. This reduces boilerplate, accelerates onboarding, and improves governance by making capabilities explicit and testable.
Details
- Validate skills at load time to catch issues early and reduce runtime failures
- Load local, versioned skills for consistent behavior across environments
- Examples and tests included to speed adoption and standardize usage
Who this is for: Platform teams building shared capability catalogs and organizations that need consistent, auditable agent behaviors.
Agent-as-Judge evaluation runs are now returned on GET endpoints, making them fully visible and manageable in the AgentOS UI. This gives teams end-to-end observability of evaluation pipelines, improves governance with auditable results, and reduces time-to-triage when diagnosing model or agent behavior.
Details
- Retrieve status, scores, and metadata for evaluation runs via read APIs
- Monitor, filter, and drill into evaluations directly in the AgentOS UI
- Backward-compatible; no workflow changes required to start seeing results
Who this is for: Platform, MLOps, and QA teams validating agent behavior and benchmarking models at scale.
We overhauled the getting-started cookbook with structured examples, ready-to-use configs, and clear requirements. New projects reach first value faster, with fewer setup errors and better alignment to the latest APIs and patterns.
Details
- End-to-end templates that demonstrate common agent, tool, and workflow scenarios.
- Copy-paste configurations for typical environments reduce integration time.
- Up-to-date guidance minimizes rework and accelerates team ramp-up.
Who this is for: New adopters, solution engineers, and teams scaling Pinax across multiple projects.
Pinax’s LiteLLM integration now extracts and surfaces reasoning_content for supported models, enabling richer, audit-ready reasoning traces. Teams gain better visibility into model behavior for debugging, evaluation, and governance — without changing application logic.
Details
- Structured reasoning signals are available through standard responses when using the LiteLLM gateway.
- Enhances experiment design, incident analysis, and compliance reviews with traceable model steps.
- Works across compatible reasoning models supported by LiteLLM.
Who this is for: Teams standardizing on LiteLLM who need stronger tracing for reliability engineering, model evaluation, and oversight.
We introduced an async-capable cancellation manager with in-memory and Redis-backed options. This lets you reliably stop long-running or runaway work across distributed workers, improving cost control and adherence to SLAs without adding orchestration complexity.
Details
- Redis-backed manager coordinates cancellation across multiple nodes; in-memory remains available for local and single-node use.
- Bring your own implementation via the new public API to standardize cancellation with your existing infrastructure.
- Non-disruptive adoption; defaults remain unchanged.
Who this is for: Platform and SRE teams running distributed agents/workflows who need predictable termination, cost containment, and safer rollback scenarios.
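A single-node sketch of the cancellation-manager idea; a Redis-backed variant would keep the cancelled set in Redis so every worker observes it. Class and method names are illustrative, not the actual public API:

```python
class InMemoryCancellationManager:
    """Track cancelled run ids; workers poll between units of work."""

    def __init__(self):
        self._cancelled = set()

    def cancel(self, run_id):
        self._cancelled.add(run_id)

    def is_cancelled(self, run_id):
        return run_id in self._cancelled


manager = InMemoryCancellationManager()


def run_job(run_id, steps):
    done = []
    for step in steps:
        if manager.is_cancelled(run_id):  # checked between steps
            return done, "cancelled"
        done.append(step)
        if step == "step-2":
            manager.cancel(run_id)  # simulate an external cancel signal
    return done, "completed"


print(run_job("run-42", ["step-1", "step-2", "step-3"]))
# (['step-1', 'step-2'], 'cancelled')
```

Cooperative checks between steps are what make termination predictable: work stops at the next safe boundary rather than mid-operation.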
You can now pass Google OAuth2 service account credentials directly when configuring Vertex AI models. This removes reliance on ambient credentials and gives platform teams precise control over how agents authenticate to Google Cloud, improving security posture and simplifying deployments across environments.
Details
- Accepts google.oauth2.service_account.Credentials for direct Vertex AI authentication
- Enables per-environment and per-agent credential isolation for stronger governance
- Streamlines CI/CD, serverless, and multi-project setups without additional scaffolding
- Additive change with no breaking impact or migration required
Who this is for: Platform, security, and MLOps teams standardizing on service accounts, especially in regulated or multi-tenant environments.
SemanticChunking now works with all Pinax embedders (e.g., Azure OpenAI, Mistral) and custom chonkie BaseEmbeddings via a wrapper, with new parameters for finer control. This expands model choice, helps optimize cost/latency, and reduces vendor lock-in without refactoring pipelines.
Details
- Plug in your preferred embedding provider with minimal configuration
- Tune chunk sizes and thresholds to match corpus and performance goals
- Maintain consistent chunking strategies across environments
Who this is for: RAG builders and platform teams optimizing retrieval quality and TCO.
Workflow event streams now support robust reconnection, catch-up, and replay. Clients automatically resume from the last known event after transient network issues, preventing gaps in dashboards, human-in-the-loop experiences, and downstream automations.
Details
- Event buffering and replay ensure continuity without manual intervention
- Backoff and resubscribe logic reduce dropped events and duplicate handling
- No changes required to existing workflows
Who this is for: Teams running long-lived or interactive workflows that require consistent real-time updates.
AgentOSClient is a first-class client for connecting to and operating a remote AgentOS. It standardizes how you authenticate, manage agents/teams/workflows, and stream events, reducing integration effort and operational risk while accelerating time-to-value.
Details
- Production-ready patterns with examples and tests to speed adoption
- Consistent error handling and simplified remote operations
- Fits CI/CD and service-to-service integrations without bespoke tooling
Who this is for: Platform owners and integrators who need a reliable, supported way to manage AgentOS remotely.
Introducing RemoteAgent, RemoteTeam, and RemoteWorkflow to execute orchestration on a remote AgentOS. This decouples runtime from application code so you can centralize governance and observability, isolate workloads for security and compliance, and scale horizontally without increasing client complexity.
Details
- Maintain the same agent and workflow definitions; no migration needed
- Run close to data for lower latency and better utilization
- Standard APIs for consistent operations across environments
Who this is for: Platform and infrastructure teams operating multi-tenant or regulated environments, or deploying across cloud and on-prem.
Hybrid search combines dense semantic similarity with keyword matching using reciprocal rank fusion (RRF) for Chroma-backed knowledge bases. This delivers more relevant results across diverse content, especially for queries with rare terms, acronyms, or exact phrases, improving answer quality and reducing false negatives in production RAG systems.
Details
- Works with existing Chroma stores; no schema or migration required
- Balances lexical and semantic signals for robust top-k retrieval
- Improves consistency across varied content types and edge-case queries
Who this is for: Teams running RAG, internal search, or support automation that need dependable retrieval quality at scale.
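The fusion step behind this feature is reciprocal rank fusion, which is simple enough to sketch directly. The `rrf_fuse` helper below is illustrative, not the Pinax internal; `k = 60` is the conventional RRF constant:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each document scores sum(1 / (k + rank)),
    # accumulated over every ranking it appears in (rank is 1-based).
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a dense (semantic) ranking with a keyword (lexical) ranking.
dense   = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_c", "doc_a", "doc_d"]
fused = rrf_fuse([dense, keyword])
```

Documents that rank well in both lists rise to the top, while a document found by only one signal (a rare acronym hit, say) still makes the fused list rather than being dropped.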
We resolved issues in read and async_read across multiple readers (CSV, field‑labeled CSV, JSON, Markdown, PDF, DOCX, PPTX, S3, Text, and Web Search). Pipelines now ingest documents and data consistently in both synchronous and asynchronous modes, reducing failures, retries, and operational noise.
Details
- Restores parity between read and async_read for predictable behavior and outputs
- Stabilizes ingestion from popular file formats, S3, and web sources used in production
- No code changes required; upgrade to benefit immediately
Who this is for: Teams building knowledge bases, ETL/ingestion pipelines, and retrieval workflows that rely on diverse document sources or high‑throughput async processing.
When both JWT and security key authentication are enabled, JWT now takes precedence. This standardizes behavior, reduces ambiguity for clients, and aligns with common enterprise security practices.
Details
- No change if only one method is in use
- For deployments using both, ensure clients present a valid JWT after upgrade
- Improves governance and reduces authorization edge cases
Who this is for: Security and platform administrators, API consumers, and teams operating shared gateways.
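The precedence rule can be pictured as a simple selection function. The header names below (`Authorization`, `X-Security-Key`) and the `select_auth` helper are illustrative assumptions, not the actual Pinax middleware:

```python
def select_auth(headers: dict[str, str]) -> str:
    # When both credentials are present, the JWT wins; the security key is
    # only consulted when no bearer token is supplied.
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return "jwt"
    if headers.get("X-Security-Key"):
        return "security_key"
    return "anonymous"
```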
AgentOS now exposes an API endpoint to migrate all managed databases in one operation. This reduces operational overhead in multi-tenant or multi-environment deployments and ensures consistent schema versions during upgrades.
Details
- Orchestrates migrations across all databases, reducing error risk and manual work
- Fits CI/CD workflows for faster, safer rollouts
- Action recommended after upgrade: invoke the endpoint to ensure all schemas are current
Who this is for: Platform and SRE teams operating multiple agents, tenants, or environments.
We added a cost field to Metrics for OpenRouter-backed activity. This provides a reliable, standardized view of model spend without manual spreadsheets or custom aggregations, improving financial governance across environments.
Details
- Capture per-run and aggregate cost to support budget tracking, reporting, and chargeback
- Enables cost dashboards and alerts for proactive spend management
- No configuration changes required; cost appears automatically wherever Metrics are used
Who this is for: Platform and FinOps teams managing LLM spend across providers.
A2A protocol endpoints have been updated to follow standardized URL conventions, and related payloads were aligned to the protocol. Clients must migrate to the new paths to remain compatible with future releases.
Details
- Update client base paths and payload shapes to the new conventions
- Use the new Agent Card retrieval endpoint where applicable
- Plan a staged rollout to minimize downtime and validate behavior
Who this is for: Integration teams and platform owners maintaining A2A clients and cross-system agent orchestration.
JWTMiddleware now enforces token presence on every request; validate=False no longer permits requests without a token. This improves baseline security and reduces the risk of accidental unauthenticated access.
Details
- Action: propagate JWTs across all clients and internal services
- Validate that non-verified paths still include tokens and behave as expected
- Monitor auth metrics to verify parity post‑migration
Who this is for: Operators and integration teams managing authentication across services and environments.
AgentOS now blocks initialization/resync if duplicate IDs are detected across Agents, Teams, or Workflows. This ensures unambiguous references and prevents hard-to-debug behavior at runtime.
Details
- Breaking change: initialization will fail on duplicate IDs
- Action: audit and ensure unique IDs before upgrading
Who this is for: Platform owners and multi-team deployments managing large catalogs of agents and workflows.
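A quick pre-upgrade audit is easy to script. The `find_duplicate_ids` helper and the sample IDs below are illustrative, not part of Pinax:

```python
from collections import Counter

def find_duplicate_ids(*catalogs: list[str]) -> set[str]:
    # Flatten the IDs of agents, teams, and workflows and report any ID
    # that appears more than once across the combined catalog.
    counts = Counter(i for catalog in catalogs for i in catalog)
    return {i for i, n in counts.items() if n > 1}

agents    = ["support-agent", "billing-agent"]
teams     = ["support-team"]
workflows = ["support-agent"]          # clashes with an agent ID
dupes = find_duplicate_ids(agents, teams, workflows)
```

Running a check like this in CI before upgrading surfaces any clash that would now fail initialization.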
output_schema now accepts provider-specific JSON schemas and passes them directly to model APIs (OpenAI, Claude, and OpenAI‑like). This removes mapping layers, reduces boilerplate, and enables faster adoption of the latest vendor features.
Details
- Send provider-native JSON schema objects directly to models
- Less custom translation code and fewer maintenance points
- Backward-compatible; existing usages continue to work
Who this is for: Teams standardizing structured outputs across multiple model providers.
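As an example of a provider-native schema, here is a dict in the shape OpenAI's structured outputs accept; with this change it can be handed to output_schema as-is. The `invoice` schema itself is illustrative:

```python
# A provider-native JSON schema object (OpenAI structured-outputs shape).
# The field names and enum values are made up for illustration.
invoice_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "invoice",
        "strict": True,  # ask the provider to enforce the schema exactly
        "schema": {
            "type": "object",
            "properties": {
                "invoice_id": {"type": "string"},
                "total": {"type": "number"},
                "currency": {"type": "string", "enum": ["USD", "EUR"]},
            },
            "required": ["invoice_id", "total", "currency"],
            "additionalProperties": False,
        },
    },
}
```

Because the object passes through untouched, provider-specific knobs like `strict` work without waiting for a mapping layer to support them.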
Milvus search and async_search now support radius, range_filter, and async search_parameters. These controls help teams tune recall vs. precision and reduce tail latency in high-throughput workloads.
Details
- Radius and range_filter for precise vector similarity windows
- Optional async execution for lower latency and higher throughput
- Backward-compatible; defaults unchanged
Who this is for: Teams running RAG and vector search on Milvus that need predictable performance and relevance.
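For reference, a Milvus-style range-search parameter block looks like this. With similarity metrics such as COSINE, `radius` is the lower score bound and `range_filter` the upper bound; the specific values here are illustrative:

```python
# Milvus range-search parameters: this window keeps hits whose similarity
# score satisfies radius < score <= range_filter. Values are examples only.
search_parameters = {
    "metric_type": "COSINE",
    "params": {
        "radius": 0.4,        # exclude weak matches below this score
        "range_filter": 0.9,  # optionally cap near-duplicate matches
    },
}
```

Note that the bounds flip for distance metrics like L2, where smaller values mean closer matches.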
We introduced conventional A2A endpoints — including Agent Card retrieval — and aligned run endpoints and payloads to the updated protocol. This reduces custom handling across clients, improves cross-system compatibility, and clarifies long-term API boundaries.
Details
- New Agent Card retrieval endpoint
- Protocol-aligned run endpoints and payloads
- Requires client updates to adopt the new endpoints and schema
Who this is for: Platform teams integrating agents across services and organizations standardizing on A2A interfaces.
You can now stream reasoning chunks whenever a reasoning model is used. A new ReasoningManager coordinates streaming and lifecycle, giving teams earlier visibility into model thinking, faster debugging, and better auditability — with minimal changes to existing workflows.
Details
- Real-time streaming of reasoning traces for supported models
- Centralized control and error handling via ReasoningManager
- Backward-compatible; enable by providing a reasoning model
Who this is for: Teams building evaluators, regulated or safety-critical applications, and leaders who need transparent reasoning for review and governance.
We’ve added role-based access control (RBAC) to AgentOS via JWT middleware with per-endpoint authorization and per-resource scopes. This brings consistent, least-privilege enforcement across Agents, Teams, and Workflows, reducing custom policy code and operational risk. Standardized scopes help security and platform teams implement clear policies, simplify reviews, and support multi-tenant deployments with confidence. The release includes predefined scopes, enforcement, tests, and examples to speed adoption.
Details
- Per-endpoint authorization with scoped access to individual resources
- Clear, reusable scopes reduce policy drift and review overhead
- Backward compatible; adopting RBAC requires configuration
Who this is for: Platform and security teams, enterprise deployments, and organizations needing strong governance and least-privilege controls.
A new unified token counting utility provides consistent, accurate token estimates across OpenAI, Anthropic, AWS Bedrock, Google Gemini, and LiteLLM. We’ve also integrated token-based compression into Compression Manager to automatically fit content within model limits. Together, these changes simplify multi-model operations and help teams proactively control cost, latency, and throughput.
Details
- Single API for cross-provider token accounting improves planning and governance
- Token-aware compression prioritizes relevant context to meet target budgets
- Reduces prompt overruns and tail latency caused by context overflow
- Backward-compatible; no required action to upgrade
Who this is for: Platform teams orchestrating multi-model workloads, cost-sensitive deployments, and applications that must meet strict SLAs.
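The budget-fitting idea can be sketched with a crude estimator. The ~4-characters-per-token heuristic below is a rough stand-in for the real per-provider tokenizers the utility uses; `estimate_tokens` and `fit_to_budget` are illustrative names:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Crude provider-agnostic estimate (~4 chars/token for English text);
    # the shipped utility consults each provider's actual tokenizer.
    return max(1, round(len(text) / chars_per_token))

def fit_to_budget(chunks: list[str], budget: int) -> list[str]:
    # Keep chunks in priority order until the token budget is exhausted,
    # mirroring the idea behind token-aware compression.
    kept, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

chunks = ["a" * 40, "b" * 40, "c" * 40]   # ~10 estimated tokens each
kept = fit_to_budget(chunks, budget=25)    # only two chunks fit
```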
We now populate provider metadata for OpenAI Chat responses and surface it across key response and event objects. Completion ID, system fingerprint, and other model-specific fields are included on ModelResponse/Message and emitted in RunOutput and RunCompletedEvent. This gives teams reliable identifiers to correlate with provider logs and invoices, streamlining debugging, cost analysis, and auditability — without disrupting existing workflows.
Details
- Access completion_id, system_fingerprint, and model_extra from response.provider_data or event payloads
- Available in ModelResponse/Message, RunOutput, and RunCompletedEvent
- Backward compatible and additive; no migration required
Who this is for: Platform, MLOps, and application teams that need faster root-cause analysis, precise cost attribution, and improved observability in production.
To make runs more predictable, the stream and stream_events flags no longer persist across run/arun calls. This eliminates hidden state between invocations and ensures teams explicitly control streaming behavior per execution, improving reproducibility in development and production.
Details
- Set streaming flags on each run/arun to opt in per execution
- Reduces surprises and aligns runs across services and environments
Who this is for: Platform owners and teams standardizing run behavior in production pipelines and multi-service deployments.
Streaming experiences using Gemini now accept URL context and web_search_queries, enabling real-time retrieval and reasoning over live web content. This removes prior limitations in streaming flows, improving answer quality for research, summarization, and monitoring scenarios — without requiring any migration.
Details
- Provide URLs and suggested search queries during streaming for richer, in-flow context
- Improve response relevance in assistants that reason over current web data
Who this is for: Teams building real-time assistants, research tools, or monitoring workflows on Google Gemini.
We introduced a Shopify toolkit that lets agents analyze store data such as sales, customers, and products without custom integration work. This reduces time-to-value for commerce analytics and reporting, and provides a clear path from prototype to production via a cookbook example.
Details
- Standardized Tools interface to authenticate and query Shopify data
- Plug into agents and workflows for automated reporting, alerts, and insights
- Cookbook example to go from zero to actionable analytics quickly
Who this is for: Shopify developers, data teams, and commerce platforms building analytics, automation, or customer operations on Shopify.
Knowledge add_content_ methods now support true synchronous execution. This removes the async-only limitation, making it straightforward to integrate content ingestion into synchronous services and batch jobs without event loop management or architectural workarounds.
Details
- Synchronous parity with existing async methods for consistent behavior
- Drop-in for frameworks and environments that don’t use async
- No migration steps required
Who this is for: Backend teams building on synchronous frameworks and data pipelines that need reliable, easy-to-use Knowledge ingestion.
Pinax now supports reasoning messages from OpenRouter, enabling you to capture and act on models’ reasoning outputs where available. This provides greater transparency for debugging, evaluation, and governance, and expands the set of model capabilities you can use without changing your integration approach.
Details
- Ingest reasoning messages alongside standard outputs for improved traceability
- Works with existing routing, logging, and evaluation workflows
- No migration required; enable where OpenRouter models support reasoning
Who this is for: Teams adopting OpenRouter models that expose reasoning signals and need better observability and evaluation fidelity.
A new built-in evaluation system lets you automate LLM quality checks with binary and numeric scoring, background execution, post-hooks, and customizable evaluator agents. This makes it easier to standardize evals, gate releases, and compare models — without bolting on external systems.
Details
- Run evaluations in the background to keep pipelines responsive
- Use post-hooks to persist metrics, trigger alerts, or update dashboards
- Create custom evaluator agents to encode domain-specific criteria
Who this is for: AI platform teams, ML engineers, and QA leads who need consistent, auditable evaluation workflows at scale.
We’ve added AsyncMySQLDb with native compatibility for the asyncmy driver, enabling fully asynchronous MySQL operations. This unlocks higher concurrency, better throughput, and lower latency for agent and workflow backends that depend on MySQL. Built-in tracing support and cookbook examples reduce integration time and improve observability from day one.
Details
- Non-blocking I/O with asyncmy for scalable, event-driven architectures
- Integrated tracing hooks for end-to-end visibility and troubleshooting
- Cookbook examples to shorten time-to-value and standardize adoption
Who this is for: Teams running high-throughput agents, streaming pipelines, or workflow services that need async database performance and robust observability.
MemoriTools has been removed in favor of Memori SDK v3’s built-in auto-recording. This consolidates functionality in the SDK, reduces integration complexity, and lowers maintenance overhead. To avoid breakage, remove MemoriTools from your code and rely on the SDK for conversation recording.
Details
- MemoriTools is no longer supported; SDK v3 provides automatic recording
- Action required: remove MemoriTools imports/usages and update your flows to SDK v3
- Outcome: a simpler, more reliable integration path with fewer components to manage
Who this is for: Teams adopting or maintaining Memori-based conversation storage who want a supported, lower-friction integration.
Pinax now ships with Memori SDK v3.0.5, enabling automatic recording of agent conversations without a separate tool. This simplifies integration, reduces setup time, and ensures a consistent audit trail out of the box. If you previously used MemoriTools, you can remove it — SDK v3 handles recording automatically.
Details
- Zero-config conversation capture for Pinax agents
- Fewer moving parts and dependencies to maintain
- Action recommended: upgrade to SDK v3.0.5 to adopt auto-recording
Who this is for: Teams standardizing on conversation archiving for compliance, analytics, or customer support quality.
We’ve moved retry logic from Agents/Teams to the Model layer. When you set retries on a model, Pinax now retries at the model execution level, which is more effective for handling provider throttling and transient errors. Agent/Team retries now apply only to run-level exceptions. This change reduces wasted cycles, makes behavior more predictable, and improves throughput under rate limits.
Details
- Configure retries on the Model to handle LLM/provider errors directly
- Agent/Team retries now cover orchestration-level failures only
- Action required: move any Agent/Team retry settings to the associated Model
Who this is for: Teams running production workloads at scale who need consistent behavior and better resilience under variable provider limits.
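The model-level pattern amounts to retrying the provider call itself with backoff, while letting orchestration errors propagate. The `with_retries` helper below is a generic sketch of that behavior, not the Pinax implementation:

```python
import random
import time

def with_retries(call, retries: int = 3, base_delay: float = 0.5,
                 retryable=(TimeoutError, ConnectionError)):
    # Retry a model invocation on transient provider errors with jittered
    # exponential backoff; non-retryable errors propagate immediately.
    for attempt in range(retries + 1):
        try:
            return call()
        except retryable:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

Retrying at this layer means a throttled provider call is repeated alone, rather than re-running the whole agent turn the way an Agent/Team-level retry would.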
AgentOS evaluation endpoints now work with asynchronous database backends. Teams using async DB classes can run evaluations without changing their stack, removing a key limitation for modern, event-driven deployments.
Details
- Evals run as expected with async database drivers, improving parity across environments
- No configuration changes or migration steps required
Who this is for: Engineering teams standardizing on asynchronous databases who need reliable, automated evaluation workflows.
RunRequirement simplifies how agents request and manage human input. Requirements now surface directly in agent responses or as RunPaused events in streaming flows, providing a consistent pattern for approvals, confirmations, and other human checkpoints. This reduces implementation effort today and lays the groundwork for richer triggers and orchestration in the future.
Details
- Unified model for HITL across synchronous and streaming executions
- Less glue code and fewer edge cases to handle in application logic
- No action required to benefit from the new model
Who this is for: Teams implementing approvals, compliance gates, or manual reviews in agent-driven workflows.
We introduced RedshiftTools, giving agents first-class access to Amazon Redshift without custom glue. Teams can explore schemas, describe tables, inspect and run queries, and export data directly through a consistent tool interface. The toolkit supports both standard credential-based auth and IAM-based authentication (via explicit credentials or AWS profiles), aligning with enterprise security practices.
Details
- Speed up prototyping and operations by eliminating one-off scripts and SDK wiring
- Standardize Redshift access patterns across agents and workflows
- Reduce integration risk with built-in support for IAM authentication
Who this is for: Data and platform teams building agents or workflows that need secure, governed access to Redshift.
We’ve introduced a Spotify toolkit and example agent to manage and interact with Spotify, including library management. This addition reduces custom API work and speeds up delivery of music features in assistants, automations, and internal tools. Teams can quickly prototype, then productionize common Spotify workflows without building from scratch.
Details
- Prebuilt capabilities for common library operations to minimize integration effort
- Example agent demonstrates end-to-end usage for faster adoption
- Compatible with existing Pinax agents and workflows; no migration required
Who this is for: Product and platform teams building Spotify-powered assistants, content curation tools, or media automations.
To prevent CreateTable validation errors and ensure reliable, time-ordered queries, the DynamoDB schema for the user Memory table now requires a global secondary index (GSI) on created_at. Deployments using DynamoDB must add this index or recreate the table using the updated schema.
Details
- Action required for DynamoDB users: add the created_at GSI or reprovision the table.
- Eliminates schema validation failures and improves query performance.
- No changes required for other storage backends.
Who this is for: Teams running Pinax with AWS DynamoDB for Memory storage.
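For teams reprovisioning, a boto3-style table definition with the required created_at GSI looks roughly like this. The table name, index name, and key attributes are illustrative; check the updated schema shipped with Pinax for the exact definition:

```python
# Illustrative boto3-style definition of the Memory table with the
# required created_at global secondary index. Names are examples only.
memory_table_schema = {
    "TableName": "pinax_memory",
    "AttributeDefinitions": [
        {"AttributeName": "memory_id", "AttributeType": "S"},
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    "KeySchema": [{"AttributeName": "memory_id", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "created_at-index",
            "KeySchema": [
                {"AttributeName": "user_id", "KeyType": "HASH"},
                {"AttributeName": "created_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}
# A real deployment would pass this to boto3:
# boto3.client("dynamodb").create_table(**memory_table_schema)
```

With created_at as the GSI sort key, time-ordered queries per user become a single indexed `Query` instead of a scan.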
We introduced native tracing with OpenTelemetry, including first-class spans and new endpoints to inspect traces. Spans are stored in your configured database, giving you immediate, consistent observability without additional instrumentation. This improves debugging, performance analysis, and compliance with reliability objectives.
Details
- New endpoints: /traces, /traces/<trace_id>, /traces/<trace_id>?<span_id>
- Works with Pinax-supported storage backends; optionally configure exporters to forward data to your observability stack.
- Speeds up root-cause analysis and shortens time-to-resolution.
Who this is for: Platform, SRE, and ops teams that need standardized, low-friction tracing across agents and tools.
Agent and Team pre- and post-hooks now run as background tasks in AgentOS, so they no longer block the main operation. This reduces end-to-end latency and increases throughput, especially under concurrent load. Teams should ensure hooks are idempotent and not dependent on synchronous completion.
Details
- Hooks execute concurrently and may complete after the primary request returns.
- Move any required synchronous logic into the main flow; treat hooks as asynchronous side effects.
- Expect reduced wait times and better parallelism in high-throughput environments.
Who this is for: Teams scaling agent workloads, multi-tenant platforms, and latency-sensitive use cases.
Runs now support an optional citations field across single, team, and workflow executions. This lets you store and surface model-provided source citations directly in your run metadata, improving traceability, auditability, and user trust without changing existing integrations. The field is non-breaking and can be adopted incrementally to power features like “show your work,” compliance review, and knowledge attribution.
Details
- Available in RunSchema, TeamRunSchema, and WorkflowRunSchema responses.
- Optional and backward-compatible; no migrations required.
Who this is for: Teams building user-facing experiences that require explainability, or organizations in regulated environments that need evidence of sources and decision trails.
We added a complete Gemini 3 demo, including example agents, configuration, and generated assets. This makes it faster to evaluate and roll out Gemini 3 within Pinax by providing opinionated, runnable patterns you can copy, adapt, and deploy. Teams can stand up proofs of concept in minutes and standardize on a repeatable setup, reducing integration effort and risk.
Details
- Preconfigured agents and sample assets showcase best practices for orchestration and evaluation.
- Works out of the box; no changes required to existing projects.
Who this is for: Platform teams and developers evaluating Gemini 3 or scaling multi-model strategies with minimal setup time.
MongoDB clients now support Motor and PyMongo async libraries with improved error handling and typing. This enables non-blocking storage operations, better concurrency, and lower latency in async-first applications.
Details
- Drop-in async clients for faster, more scalable data operations
- Enhanced typing and error handling improve reliability and observability
- Reduces custom glue code for Python async stacks
Who this is for: Teams running high-QPS, async Python services that depend on MongoDB.
We’ve added an optional API key path for AWS Bedrock Claude in addition to IAM. This reduces setup friction in environments where IAM is not feasible while preserving IAM as the default for production.
Details
- Use AWS_BEDROCK_API_KEY as an alternative authentication method
- No changes required for existing IAM-based configurations
- Simplifies local development, cross-account, and restricted-policy scenarios
Who this is for: Enterprises with constrained IAM policies or teams needing rapid prototyping paths.
You can now override output_schema at runtime for both Agent and Team (streaming and non-streaming), with automatic restoration after the run. This enables per-request structured output variations without cloning agents or adding conditional boilerplate.
Details
- Change the expected output format for a single run; state is restored automatically
- Works for streaming and batch runs to support diverse downstream consumers
- Simplifies A/B testing, multi-tenant formats, and evolving contract needs
Who this is for: Platform teams orchestrating varied integrations and output contracts across services.
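The override-then-restore behavior is essentially a scoped attribute swap. The context manager and `ToyAgent` class below illustrate the pattern generically; they are not the Pinax API, which handles restoration for you:

```python
from contextlib import contextmanager

@contextmanager
def override_attr(obj, name: str, value):
    # Temporarily swap an attribute and guarantee restoration afterwards,
    # mirroring how the per-run output_schema override behaves.
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, previous)

class ToyAgent:
    def __init__(self):
        self.output_schema = {"type": "object"}  # default contract

agent = ToyAgent()
with override_attr(agent, "output_schema", {"type": "array"}):
    per_run = agent.output_schema        # per-request schema in effect
restored = agent.output_schema           # default restored after the run
```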
Pinax now offers full support for Google Gemini File Search, including store and document management, uploads/imports, metadata filters, citation extraction, and async APIs. This enables high-quality retrieval workflows with traceability and scale on the Gemini platform.
Details
- Manage file stores and documents, including bulk uploads and async ingestion
- Filter by metadata and extract citations for auditability and explainability
- First-class integration to accelerate RAG and knowledge-heavy agents
Who this is for: Teams standardizing on Google’s AI stack and building retrieval-rich applications with compliance needs.
A new MemoryOptimizationStrategy framework and APIs allow you to summarize and optimize memories outside of agent runs. By decoupling memory maintenance from inference, you can keep context high-signal while reducing runtime tokens and improving decision quality at scale.
Details
- Schedule or trigger memory compaction and summarization independently of agent runs
- Keep knowledge current and concise to improve downstream model performance
- Works without changes to agent logic; designed for scale and governance
Who this is for: Production teams with large or fast-growing memory stores seeking lower costs and tighter control.
Automatically compress and summarize tool call results to keep agent context safely within model token windows. This change reduces context overflow errors, stabilizes long-running workflows, and lowers token spend without requiring any application changes.
Details
- Summarizes large tool outputs before adding them to conversation history
- Improves reliability for tool-heavy agents and extended sessions
- Reduces token usage while preserving relevant signal for downstream reasoning
Who this is for: Teams operating long-running or tool-intensive agents where reliability and cost control are priorities.
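At its simplest, fitting a tool result into a token budget means keeping the most useful parts and eliding the rest. The head-and-tail truncation below is a deliberately simple stand-in for the summarization Pinax applies; the helper name and the ~4-chars-per-token estimate are illustrative:

```python
def compress_tool_result(result: str, max_tokens: int,
                         chars_per_token: float = 4.0) -> str:
    # Keep the head and tail of an oversized tool result and elide the
    # middle; real compression would summarize rather than truncate.
    budget_chars = int(max_tokens * chars_per_token)
    if len(result) <= budget_chars:
        return result
    keep = max((budget_chars - 5) // 2, 1)
    return result[:keep] + " ... " + result[-keep:]
```

Head-and-tail retention is a common fallback because tool outputs (logs, query results) often carry their signal at the start and end; a summarizing model can do better when budget allows.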
We fixed an issue where filtering memories by topic could return incorrect results when using SQLite or AsyncSQLite backends. Topic-based queries now behave predictably, improving the accuracy of agents and workflows that rely on segmented memory retrieval. No action is required — existing implementations will benefit immediately after upgrading.
Details
- Accurate topic filters for both SQLite and AsyncSQLite memory backends
- Reduces debugging and unexpected agent responses caused by misclassified results
- Improves determinism for evaluations, automation, and knowledge reuse
Who this is for: Teams using local SQLite storage for Memory, especially those organizing knowledge by topic for agents, offline/edge deployments, or deterministic test environments.
We added first-class support for Anthropic’s structured outputs, including schema enforcement, strict tool calling, and robust response parsing across synchronous, asynchronous, and streaming APIs. This delivers predictable, typed responses, cuts custom parsing and boilerplate, and improves reliability and governance for production workloads using Claude.
Details
- Enforce JSON/object schemas to keep outputs consistent and machine-readable
- Strict tools and end-to-end parsing reduce failure modes and post-processing
- Streaming support preserves low latency while maintaining structure
- Backward-compatible and additive; adopt incrementally
Who this is for: Teams standardizing on Claude, building workflow automations, or requiring dependable, typed outputs.
We introduced NanoBananaTools, a turnkey toolkit to generate images with Google’s Nano Banana model. It includes built-in parameter validation and a cookbook example, enabling faster adoption and fewer integration errors. Standardizing how you invoke the model within Pinax reduces glue code and makes image features easier to operate and maintain.
Details
- Ready-to-use tool wrappers with input validations to prevent malformed requests
- Cookbook example accelerates first run and team onboarding
- Designed to plug into your existing toolchain to minimize integration effort
Who this is for: Product and platform teams adding image generation or evaluating Google’s vision models.
We removed deprecated AgentOS parameters to standardize on stable naming: os_id -> id, fastapi_app -> base_app, enable_mcp -> enable_mcp_server, replace_routes -> on_route_conflict.
Details
- Action required: rename parameters to the stable forms
- Reduces ambiguity and future migration effort
Who this is for: Platform teams embedding AgentOS into services and APIs.
We removed get_messages_for_session and get_messages_from_last_n_runs in favor of get_messages, get_session_messages, and get_chat_history. This unifies patterns and reduces mental overhead.
Details
- Action required: migrate to the new method names
- Clearer contracts for history retrieval and observability
Who this is for: Teams managing conversation history, logging, or analytics.
When using knowledge_filters, you must configure contents_db. This ensures deterministic, stateless filtering aligned with AgentOS and prevents silent mismatches.
Details
- Action required: provide a contents_db for any knowledge base that uses filters
- Improves reliability and reproducibility of retrieval and filtering
Who this is for: Teams building RAG and knowledge-aware agents at scale.
We removed the stream_events parameter from print_response/aprint_response and CLI. Streaming now works correctly by default, reducing configuration and edge cases.
Details
- Action required: remove the parameter from calls
- For fine-grained control, use run()/arun() instead of print helpers
Who this is for: Teams embedding CLIs or console output in developer workflows and demos.
