AuraMind · Our production RAG platform

Our product, our proof.

Our production RAG platform. Multi-LLM, multi-tenant, in production today.

At a glance

Status● In production

CadenceIterative since launch

ProvidersAkashML · Ollama · Gemini · OpenRouter

APIOpenAI-compatible

RetrievalTag-based RBAC

Structured dataSandboxed Pandas

§ 01

What it is

A production RAG platform, not a notebook demo.

AuraMind is a retrieval-augmented generation platform built for teams that need to put a real LLM-backed system in front of real users — not a notebook demo, not a Slack bot wired to a single API, but a multi-tenant product with access control, structured-data Q&A, and the operational surface to run it in production.

It does three things most off-the-shelf RAG tools won't do without help. It routes across multiple LLM providers so the system isn't locked to a single vendor's pricing or uptime. It enforces tag-based role-based access control, so the same document can be queryable by some users and invisible to others. And it answers questions about structured data — CSVs, tables, exported reports — by running sandboxed Pandas, not by hoping the LLM gets the math right.

We built AuraMind because we needed a production RAG system Arc10 actually ran. Today it is that system, and it's the proof point we point at when a buyer asks whether we've shipped AI to production.

§ 02

How it's wired

Client → API → routing → retrieval → stream.

The shape of the system is straightforward: a client calls the API, the request is routed to whichever LLM provider matches the configured policy for that workspace, retrieval (or sandboxed Pandas, depending on the question) happens before the model is called, and the result streams back over SSE. RBAC is enforced at the document layer, not at the API layer — which means a user who asks a question that would otherwise hit a document they don't have access to gets a result that doesn't include it, instead of a permission error.

Architecture diagram in preparation. The walkthrough on the call is the inspection — pick the part you want to see.

§ 03

The components

Five components. Each independently load-bearing.

Multi-LLM routing

Routes inference requests across AkashML, Ollama, Gemini, and OpenRouter based on a configurable policy per workspace.

Why it matters

Single-provider RAG systems are brittle. Provider outages, rate limits, and model deprecations all become production incidents. Multi-provider routing makes the system survivable.

How it's built

A thin routing layer in front of the model call. The policy is workspace-scoped, so different tenants can default to different providers. Failover is automatic.

Tag-based RBAC

Documents and structured-data sources are tagged. Users see what they're entitled to see — and the LLM is only ever shown documents the user could see directly.

Why it matters

Most RAG systems do access control at the wrong layer. Filtering after retrieval is too late; the model has already seen the data. Tag-based RBAC pushes enforcement to the retrieval boundary.

How it's built

Tags are first-class. They're attached at upload, queried at retrieval, and enforced before any context reaches the model. The same document can be visible in one workspace and invisible in another, with no duplication.

Sandboxed Pandas execution

Lets users ask questions about structured data — "what was Q3 revenue by region?" — and get answers grounded in the actual numbers, not in the LLM's guesswork.

Why it matters

LLMs are unreliable at arithmetic. Asking a model to summarize a CSV is a recipe for confidently wrong answers. Running real Pandas in a sandbox eliminates the math-hallucination class of failure entirely.

How it's built

The model translates the question into a small Pandas program, the program runs in a sandboxed environment with no network and no file-system access outside the dataset, and the result is what comes back. The user sees the answer; the model never sees the raw data.

OpenAI-compatible API

AuraMind exposes an API that speaks the OpenAI Chat Completions wire format.

Why it matters

Most LLM client libraries already speak this format. By being OpenAI-compatible, AuraMind drops into existing client code without rewrites — and it lets the buyer keep tooling investments they've already made.

How it's built

A compatibility layer in front of the routing logic. From the client's perspective, AuraMind looks like an OpenAI endpoint. From inside the system, requests are routed across providers and retrieval-augmented before being served back.

SSE streaming with disconnect recovery

Streams responses to the client over Server-Sent Events. Recovers cleanly from disconnects mid-stream.

Why it matters

Streaming responses is the table-stakes UX for any LLM product — but most implementations break badly when the client connection drops. AuraMind picks up where it left off instead of restarting the request.

How it's built

A persistent stream identifier on each request. If the client reconnects within the recovery window, the server replays the missed tokens and continues. Outside the window, the request is restarted cleanly.

§ 04

The decisions, and why

A few choices that aren't obvious until you've lived with the alternative.

Multi-provider over single-provider

The fastest way to ship a RAG system is to call OpenAI directly and stop thinking about it. We did not do that. The reason: every production AI system we have shipped or watched ship has eventually had a provider-side outage that turned into a customer-facing one. Multi-provider routing is operational hygiene, not vendor agnosticism for its own sake. The cost is a routing layer that has to know about the differences between providers — token limits, streaming behavior, rate-limit semantics. We pay that cost on purpose.

Sandboxed Pandas over raw code execution

There is a class of agent tooling that gives the LLM unrestricted code execution and lets it figure out how to answer a question. That works in demos. It is not a production posture. The model can — and will — write code that touches the file system, makes network calls, or shells out in ways the operator did not intend. Sandboxed Pandas with no network and no file-system access is the smallest tool that solves the structured-data question without opening that surface. It's deliberately limited; that's the point.

OpenAI-compatible API over a proprietary one

There was a moment, early in the build, where we considered designing a richer API that exposed our routing semantics directly. We decided against it. The compatibility surface is the surface most clients already speak. Asking buyers to rewrite their integration code in order to get our routing benefits would have been a tax we did not have a good reason to charge. The internal routing is still rich. The external API is just OpenAI-shaped.

§ 05

How it ships

Iterative since the first deploy.

AuraMind has been shipped iteratively since the first deploy. The shipping log below is the public-facing summary; the actual history lives in Git.

Backend

Iterative shipping of RAG core, routing layer, RBAC enforcement, sandboxed Pandas, and SSE streaming with disconnect recovery.

Frontend

Workspace UI, document upload and tagging, query surface, and streaming response rendering.

We don't show you a roadmap on a slide. We show you the commits.

§ 06

The bridge

The team that ships ours is the team that ships yours.

AuraMind exists because Arc10 needed a production RAG platform Arc10 itself ran on. The team that designed it, built it, and runs it is the same senior engineering bench we put on client engagements.

If the architecture above looks like the kind of thing you'd want shipped into your product, the path is short — we're already that team. We've already made the production-AI mistakes you would otherwise pay to learn. Hire us once and you skip the prototype-to-production cliff that ends most AI initiatives.

Talk

Talk to a senior engineer.

We'll walk you through AuraMind's architecture on the call — yours, not ours. Pick the part you want to inspect.

Book a 30-minute call→hello@arc10.io→

What to expect

Emailhello@arc10.io

First replyWithin one business day

First call30 minutes with a senior engineer

No salesEngineering questions, engineering answers