A while back, I posted that I was going to build a real, production-grade AI platform — not a toy, not a demo, but the kind of system you’d actually trust to run things.
This is the follow-through.
It’s built. It’s running. And as of this week, it’s open source: github.com/Bleenq-Technology/ai-homelab.
But the part I actually want to talk about isn’t the platform itself. It’s what the platform does to your velocity once it exists. Because the real story here isn’t “look at this big stack.” It’s this: once you’ve built the foundation, the interesting projects stop taking months and start taking afternoons.
Let me show you what I mean.
The moment the foundation paid off
A few weeks ago I wanted something specific: I wanted to talk to my own code. Not grep it. Not scroll it. Ask it questions — “where do we wire the LLM gateway,” “what did we decide about secrets,” “show me how the KB ingest works” — and get real answers, with the actual files behind them.
A year ago, that would have been a project. Stand up a vector database. Wire embeddings. Build an ingestion pipeline. Add a chat interface. Bolt on auth so it isn’t wide open. Add monitoring so you know when it breaks. Weeks of yak-shaving before you ask your first question.
This time, every one of those pieces was already running. The vector store (Qdrant), the embedding model (BGE-M3 served through our gateway), the chat UI (Open WebUI), the workflow engine (n8n), the identity layer in front of all of it — present and accounted for.
So the “project” collapsed to: point a manifest at our repositories, let the ingest flow embed them into knowledge bases, and wire a chat pipe. We were asking our own codebase questions that same day. Then we surfaced the same knowledge bases through a Discord bot, so the answers live where the team already talks.
That’s the whole thesis of this piece: the platform is the multiplier. The projects are easy because the hard part was already done, once, properly.
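To make the "point a manifest at our repositories" step concrete, here is a minimal sketch of what a manifest-driven ingest can look like before the embedding call. It is not the code from our repo: the manifest shape, the names `MANIFEST`, `Chunk`, `chunk_text`, and `iter_chunks`, and the chunk sizes are all assumptions made for illustration. It walks the files a manifest names and splits them into overlapping chunks, keeping the source path so an answer can point back at the actual file.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Hypothetical manifest shape: knowledge-base name -> repo root and file patterns.
MANIFEST = {
    "platform-docs": {"root": "./ai-homelab", "patterns": ["*.md", "*.yml"]},
}


@dataclass(frozen=True)
class Chunk:
    kb: str      # target knowledge base
    source: str  # path relative to the repo root, kept for citations
    index: int   # position of the chunk within its file
    text: str


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def iter_chunks(manifest: dict) -> Iterator[Chunk]:
    """Yield chunks for every file matched by the manifest, ready to embed."""
    for kb, spec in manifest.items():
        root = Path(spec["root"])
        for pattern in spec["patterns"]:
            for path in sorted(root.rglob(pattern)):
                if not path.is_file():
                    continue
                text = path.read_text(encoding="utf-8", errors="replace")
                for i, piece in enumerate(chunk_text(text)):
                    yield Chunk(kb, str(path.relative_to(root)), i, piece)
```

Each `Chunk` would then be sent to the embedding model and written to the vector store. Those two calls are left out here on purpose, since they depend on your gateway and collection setup.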
What we actually built
The platform is organized into four Docker Compose stacks — a clean mental model that’s worth stealing even if you never touch our code:
1. The data layer (where truth lives). PostgreSQL with vector extensions, Redis, MinIO (S3-compatible object storage), ClickHouse for analytics, QuestDB for time-series, Neo4j for graphs, and Qdrant for vectors. Different data shapes want different engines, and real AI reasoning happens across them, not in one.
2. The core / edge (the plumbing nobody sees until it’s missing). Traefik as the reverse proxy, issuing a single Let’s Encrypt wildcard certificate via a DNS-01 challenge — so every service gets TLS with zero per-service config. Keycloak for single sign-on. Infisical as the secrets source of truth. NetBox, AdGuard.
3. Monitoring (the nervous system). Prometheus + Grafana + Loki + Alertmanager, plus Uptime Kuma and, crucially for AI work, Langfuse for LLM-specific tracing. When an alert fires, it lands in Discord, where someone will actually see it.
4. The AI tooling (the actual point). Open WebUI, LiteLLM as the OpenAI-compatible gateway, a local LLM (a quantized Qwen3-8B served on the GPU with an 80k-token context), ComfyUI for images, Wyoming services for speech, n8n and Flowise for agentic workflows, MLflow for experiment tracking, SearXNG for private search, and a BGE-M3 embedding container wired into the gateway.
That’s roughly 40 services, every container image pinned to an audited version, all of it on a single machine with a used RTX 3090.
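Pinning is easy to claim and easy to let slip, so it helps to check it mechanically. The sketch below is one way to do that, not the tooling we ship: `is_pinned` and `unpinned_images` are hypothetical helpers, and the regex-based parsing assumes simple one-line `image:` entries in a Compose file. It flags any image with no tag or with the `latest` tag. It says nothing about whether a pinned version has been CVE-audited, and it treats a variable such as `${TAG}` as a tag without resolving it.

```python
import re

# Matches one-line "image: <ref>" entries; commented-out lines are skipped
# because the line must start with optional whitespace followed by "image:".
IMAGE_LINE = re.compile(r"""^\s*image:\s*["']?([^\s"'#]+)""", re.MULTILINE)


def is_pinned(ref: str) -> bool:
    """True if the reference carries a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in ref:
        return True
    # Only the last path segment can hold a tag; an earlier colon is a registry port.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "" and tag != "latest"


def unpinned_images(compose_text: str) -> list[str]:
    """Return every image reference in the Compose text that is not pinned."""
    return [ref for ref in IMAGE_LINE.findall(compose_text) if not is_pinned(ref)]
```

Run over each stack's Compose file in CI, a non-empty result is a reasonable signal to fail the build.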
It is, the README cheerfully admits, a “homelab” in name only.
Why it’s professional grade (and not a pile of containers)
Here’s the distinction that matters, and it’s the one most DIY AI setups skip.
Anyone can docker run a stack of AI tools. What makes this enterprise-credible is everything that’s boring until it isn’t:
- Identity from day one. Almost every UI sits behind Keycloak SSO — native OIDC where the app supports it, an oauth2-proxy forward-auth gate where it doesn’t. Nothing sensitive is reachable without logging in.
- Secrets done right. No credentials in the compose files. They live in Infisical and flow into a generated .env at deploy time. (More on this in a moment; it’s the part I got publicly reminded to take seriously.)
- Observability built in, not bolted on. You can’t operate what you can’t see. Metrics, logs, uptime, and LLM traces are all captured, so when a model call misbehaves, I can actually watch what happened.
- Pinned and patched. Every image is pinned to a stable, CVE-audited version. No mutable :latest roulette.
- Backups that survive the building burning down. Nightly, online-consistent snapshots of every datastore, plus an encrypted off-host replica.
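The "generated .env at deploy time" idea from the secrets bullet can be sketched in a few lines. This is an illustration under stated assumptions, not our deploy script: fetching from Infisical is deliberately left out, `render_env` and `write_env` are hypothetical names, and the single-quote convention is an assumption you should check against whatever reads the file.

```python
import os
import re

KEY_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_env(secrets: dict[str, str]) -> str:
    """Render secrets as KEY='value' lines in sorted order for stable output."""
    lines = []
    for key in sorted(secrets):
        value = secrets[key]
        if not KEY_PATTERN.fullmatch(key):
            raise ValueError(f"invalid variable name: {key!r}")
        # Assumed convention: single-quoted values are literal, so refuse
        # anything that cannot be represented safely instead of guessing.
        if "'" in value or "\n" in value:
            raise ValueError(f"value for {key} contains a quote or newline")
        lines.append(f"{key}='{value}'")
    return "\n".join(lines) + "\n"


def write_env(path: str, secrets: dict[str, str]) -> None:
    """Write the rendered file, creating it with owner-only permissions."""
    # The 0o600 mode applies on creation; an existing file keeps its current mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(render_env(secrets))
```

The point of the design is that the file is an output of the deploy, never an input you commit: regenerate it from the secret store each time and keep it out of version control.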
None of this is glamorous. All of it is the difference between a demo and a system you can depend on. I’ve spent 30 years building things that have to last — survive real users, real load, and the 2 a.m. failure nobody’s awake to explain. Those instincts don’t switch off just because the new toy is an LLM. If anything, AI systems need them more, because they fail in stranger ways.
And yes — it replaces a cloud bill. No surprise invoices, no data leaving the building, full control of the stack. There are absolutely workloads where the cloud wins. But for a platform you want to learn on, iterate fast on, and own outright? Hardware you control is a very different value proposition than renting it by the second.
Every model call goes through one door
Here’s a piece I want to call out specifically, because it’s exactly where “professional grade” and “AI” meet.
Every LLM call in the platform — from every app, to every model, local or cloud — goes through a single gateway (LiteLLM today, with a planned move to Kong). That one door buys three things you simply don’t get when each app talks to models directly:
- Scoped access, per application. Apollo, the Discord bot, the knowledge-base flows — each gets its own virtual key, scoped to exactly the models it’s allowed to touch. No app holds blanket access to everything. Spin up a new project, issue it a scoped key; need to cut it off, revoke in one place. Try to call a model you’re not scoped for, and the gateway says no.
- Full visibility, automatically. Every call is traced to Langfuse: the prompt, the response, token counts, latency, cost, and errors. When something misbehaves I’m not guessing — I’m reading the trace. When I want to know what we’re spending and where, it’s right there, per app and per model.
- Governance as a property of the system, not a policy doc. Access control, observability, and cost tracking live at the gateway, so they apply to everything by default and can’t be quietly skipped by the next service someone wires in.
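The three properties above are easier to reason about with a toy model in hand. The sketch below is not LiteLLM's API or configuration; it is a small in-memory stand-in, with hypothetical names (`Gateway`, `VirtualKey`, `ScopeError`) and made-up key and model names, showing why a single choke point gives you scoping, revocation, and per-app, per-model accounting together.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class ScopeError(PermissionError):
    """Raised when a key is unknown, revoked, or not scoped for the model."""


@dataclass(frozen=True)
class VirtualKey:
    app: str
    allowed_models: frozenset


@dataclass
class Gateway:
    keys: dict = field(default_factory=dict)
    # (app, model) -> total tokens, so usage can be read per app and per model.
    usage: dict = field(default_factory=lambda: defaultdict(int))

    def issue(self, token: str, app: str, models: set) -> None:
        """Give an application a key scoped to an explicit set of models."""
        self.keys[token] = VirtualKey(app, frozenset(models))

    def revoke(self, token: str) -> None:
        """Cut an application off in one place."""
        self.keys.pop(token, None)

    def authorize(self, token: str, model: str) -> str:
        """Return the calling app's name, or refuse the call."""
        key = self.keys.get(token)
        if key is None:
            raise ScopeError("unknown or revoked key")
        if model not in key.allowed_models:
            raise ScopeError(f"{key.app} is not scoped for {model}")
        return key.app

    def record(self, token: str, model: str, total_tokens: int) -> None:
        """Authorize first, then account for the call; nothing is logged unchecked."""
        app = self.authorize(token, model)
        self.usage[(app, model)] += total_tokens
```

Because every call passes through `authorize` before `record`, a new service cannot skip access control or accounting: it either has a scoped key or it gets no answer.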
This is precisely the control enterprises require and DIY AI setups skip. It’s also the spine of the gateway-and-security work we’ll go deep on later: the LiteLLM-to-Kong migration, hosting MCP servers with real authentication, and the LLM threat model nobody ships with.
The payoff: a foundation breeds projects
This is the part I’m most excited about, and the reason we’re building in public.
Because the platform exists, a portfolio of genuinely useful projects came together fast:
Knowledge bases over our own work. A manifest-driven library ingests our repositories and docs into vector knowledge bases, which we chat with in Open WebUI and surface through a Discord bot. Onboarding, “how did we do X,” institutional memory that doesn’t walk out the door — all of it becomes a conversation instead of an archaeology dig.
A Discord curator bot. The platform feeds a bot that gives our knowledge bases visibility right where the team already lives. (It’s its own open repo, built on this foundation.)
Apollo — a personal AI voice assistant. My own interactive voice bot, with the model, the knowledge bases, memory, and tools baked in — riding on the same gateway, the same embeddings, the same identity layer. It isn’t a separate stack. It’s an application of this one.
See the pattern? None of these required standing up infrastructure. They required using it. The foundation turned each one from a multi-week project into a focused build. That compounding is the entire reason to invest in the boring layers first.
It’s open source — here’s how to actually run it
The whole thing is public: github.com/Bleenq-Technology/ai-homelab, MIT-licensed.
And not as a code dump. We wrote the documentation we wish had existed when we started — a full DEPLOY guide for standing it up on a fresh host (including the genuinely tricky bits, like bootstrapping a secret store that lives inside the stack it secures), a “change this for your environment” checklist, and DNS/TLS guidance that doesn’t assume you have our exact network gear.
A confession on that front, because building in public means being honest: while preparing the repo, a secret-scanner flagged an old credential in our git history. We’d already rotated it, and we caught it before release — but it was a useful, humbling reminder that “we’ll clean it up later” is not a security strategy. We scrubbed the history, layered multiple scanners, and wired secret scanning into CI. The lesson made it into the docs. That’s the kind of thing open-sourcing forces you to get right, and it’s better for it.
One more thing I love about this project: my son Jacob and I built it together. His name is on the commits next to mine. There’s something right about a foundation like this being a father-and-son effort — and about putting it in the open so other people can start where we finished instead of where we started.
How you can emulate this
You do not need a data center, a research budget, or permission.
- Steal the four-stack model. Data, edge/identity, monitoring, AI tooling. Decide what goes where before you start gluing containers together. The structure is most of the value.
- Do the boring layers first. TLS, SSO, secrets, monitoring. They feel like overhead on day one and like oxygen on day thirty. They make every later project faster, because you’re never re-solving them.
- Then build the fun stuff on top. Once the foundation is real, your projects are applications, not expeditions.
- Or — start from ours. That’s literally why it’s public. Clone it, change the domain, bring your own secrets, and skip the part where you fight infrastructure for a month.
Hardware is cheap (used GPUs are a gift from the crypto crash). The knowledge is learnable. The foundation, once built, pays you back every single time you build the next thing.
What’s next
This is the first in a series. We’ll go deep on the pieces that matter — the data layer, the identity-and-secrets story, observability for AI specifically, and the gateway evolution we’re planning (moving from LiteLLM toward Kong, hosting MCP servers with proper auth, and the LLM-security threat model nobody ships with). And we’ll show the projects the platform enabled, in detail — the knowledge-base chatbots and Apollo among them.
If you’re building AI systems, or thinking about it: don’t start with the model. Start with the system. The model is the easy part. The platform around it is where the leverage lives — and now there’s an open one you can build on.