Zaytri is an AI automation platform built to orchestrate multi-agent workflows, run brand-aware RAG pipelines with pgvector, and route requests across multiple LLM providers. The system uses Celery and Redis for async task queues, supports local inference for cost optimization, and exposes a premium Next.js dashboard with full observability.
2025 - 2026
The product needed a production-grade AI automation platform that could run multi-agent workflows, support brand-aware retrieval (RAG), optimize costs via local inference, and provide a single dashboard for observability across multiple LLM providers.
Design and build a multi-agent AI orchestration system with RAG using pgvector, Celery async workflows with Redis, multi-LLM routing (Ollama, OpenAI, Gemini), and a premium Next.js dashboard with observability.
Built a multi-agent AI orchestration system for complex workflows; implemented brand-aware RAG retrieval with pgvector; cut inference costs with local models via Ollama; designed Celery async workflows backed by a Redis queue for reliability; implemented multi-LLM routing across Ollama, OpenAI, and Gemini; delivered a premium Next.js dashboard with observability and monitoring.
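The brand-aware retrieval step can be sketched in miniature. This is an illustrative, self-contained version: pgvector would do this ranking in SQL with its `<=>` cosine-distance operator, but here the same ranking is computed in pure Python over toy 3-dimensional embeddings (real pipelines use model-generated vectors; the corpus and chunk names are invented for the example).

```python
import math

def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> operator defines it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def retrieve(query_vec, corpus, k=2):
    """Return the k chunks nearest to the query embedding.

    In production this is a single SQL query against pgvector, roughly:
      SELECT chunk FROM docs ORDER BY embedding <=> %s LIMIT %s
    """
    ranked = sorted(corpus, key=lambda item: cosine_distance(query_vec, item["embedding"]))
    return [item["chunk"] for item in ranked[:k]]

# Hypothetical brand-knowledge chunks with toy embeddings.
corpus = [
    {"chunk": "brand voice guide", "embedding": [0.9, 0.1, 0.0]},
    {"chunk": "pricing FAQ",       "embedding": [0.0, 1.0, 0.2]},
    {"chunk": "tone examples",     "embedding": [0.8, 0.2, 0.1]},
]

print(retrieve([1.0, 0.0, 0.0], corpus, k=2))  # → ['brand voice guide', 'tone examples']
```

The retrieved chunks are then injected into the generation prompt, which is where the brand-consistency tuning mentioned above happens.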
Shipped the platform with three LLM providers (Ollama, OpenAI, Gemini), a ~40% cost reduction via local inference, sub-2s RAG retrieval latency on pgvector, and a dashboard that loads in under 500ms for observability across agents and queues.
Coordinating multiple LLM providers, keeping RAG context brand-consistent, and ensuring the dashboard remained performant while displaying real-time task and model metrics required careful architecture.
Ensure consistent routing logic across providers, maintain brand-aware retrieval quality, and build a dashboard that scales with usage without impacting backend throughput.
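Consistent routing logic across providers reduces to a small, testable core: try providers cheapest-first and fall back on failure. The sketch below assumes that shape; the provider names match the platform's, but the `Provider` class, its `complete` method, and the per-token costs are illustrative stand-ins, not real pricing or the actual provider SDKs.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # illustrative USD figures, not real pricing
    healthy: bool = True

    def complete(self, prompt: str) -> str:
        # Stand-in for a real provider SDK call.
        if not self.healthy:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] response to: {prompt}"

def route(prompt: str, providers: list[Provider]) -> str:
    """Cost-based selection with fallback: cheapest provider first,
    next-cheapest on failure, error only when every provider fails."""
    errors = []
    for p in sorted(providers, key=lambda p: p.cost_per_1k_tokens):
        try:
            return p.complete(prompt)
        except RuntimeError as exc:
            errors.append(str(exc))
    raise RuntimeError("all providers failed: " + "; ".join(errors))

providers = [
    Provider("openai", 0.60),
    Provider("ollama", 0.05),  # local inference: cheapest, so tried first
    Provider("gemini", 0.30),
]
providers[1].healthy = False   # simulate a local-node outage
print(route("summarize release notes", providers))  # falls back to gemini
```

Because selection is a pure function of cost and health, the same logic applies identically to every provider, which is what keeps routing behavior consistent as providers are added.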
Designed an abstraction layer for multi-LLM routing with fallbacks and cost-based provider selection; tuned pgvector indexes and RAG prompts for brand consistency; propagated observability metadata through Celery task chains, with Redis as the backing store; built the Next.js dashboard on server components with efficient polling/streaming for live metrics.
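The task-chain observability pattern can be sketched without a broker. In production each step is a Celery task and the metadata lands in a Redis hash; here, as a minimal simulation, a plain dict stands in for Redis, a decorator stands in for Celery signal handlers, and `run_chain` pipes outputs like `celery.chain`. Task names and payloads are invented for the example.

```python
import time

metrics = {}  # stand-in for a Redis hash keyed by task name

def instrumented(fn):
    """Record per-step status and duration, as Celery signals plus Redis would."""
    def wrapper(payload):
        start = time.perf_counter()
        try:
            result = fn(payload)
            metrics[fn.__name__] = {"status": "ok", "seconds": time.perf_counter() - start}
            return result
        except Exception:
            metrics[fn.__name__] = {"status": "error", "seconds": time.perf_counter() - start}
            raise
    return wrapper

@instrumented
def retrieve(query):
    # In the real pipeline: pgvector similarity search.
    return {"query": query, "context": ["brand voice guide"]}

@instrumented
def generate(payload):
    # In the real pipeline: routed LLM call with the retrieved context.
    return f"answer to {payload['query']} using {len(payload['context'])} chunks"

def run_chain(query, steps):
    """Feed each step's output to the next, like celery.chain."""
    value = query
    for step in steps:
        value = step(value)
    return value

print(run_chain("refund policy?", [retrieve, generate]))
print(metrics)  # per-step status/duration, ready for the dashboard to poll
```

The dashboard then reads this per-step metadata on its polling/streaming path instead of querying the workers directly, which is what keeps live metrics from impacting backend throughput.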
Multi-LLM routing with p95 latency under 3s and zero provider lock-in; improved RAG relevance scores; dashboard TTI under 1s with no increase in backend error rate.