Self-Hosted Qwen3 LLM Stack


How do I get this on my iPhone?

Use LLMConnect on your iPhone over Tailscale. There is no separate “company app” — you install Tailscale (our secure network), then the LLMConnect app, and point it at our private endpoint.

  1. Install Tailscale on your iPhone (App Store). Open it and sign in with your work Apple ID or Google account.
  2. Accept the Tailnet invitation your admin sent you (email or link). You must be on the Tailnet before the chat app can reach our servers.
  3. Install LLMConnect (App Store).
  4. Add our endpoint in LLMConnect:
    • In LLMConnect, add a custom (Ollama) endpoint.
    • URL: your admin will give you this (e.g. http://ai-training.your-tailnet:11434 — they’ll send the exact hostname).
    • Model: qwen3:8b (or the model name your admin specifies).
    • Save. You can now chat through LLMConnect while you’re on Tailscale (iPhone will use the VPN automatically when the app needs it).

Who sends the Tailnet invite and the URL? Your IT or the person who runs this platform. Full quick start: ops/onboarding.md.
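
Once you have the URL, you (or whoever runs the Tailnet) can sanity-check the endpoint from any machine that is already on Tailscale before pointing LLMConnect at it. Here is a minimal sketch using Node 18+ and its built-in fetch; the hostname below is a placeholder, so substitute the one your admin sent:

    // check-endpoint.mjs  (run with: node check-endpoint.mjs)
    // Placeholder hostname: replace with the endpoint your admin gave you.
    const BASE = "http://ai-training.your-tailnet:11434";

    // List the models the server has pulled (Ollama's /api/tags endpoint).
    const tags = await (await fetch(`${BASE}/api/tags`)).json();
    console.log("available models:", tags.models?.map((m) => m.name));

    // Ask for a short, non-streaming completion from the model you plan to use.
    const res = await fetch(`${BASE}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "qwen3:8b", prompt: "Reply with one word: ok", stream: false }),
    });
    console.log((await res.json()).response);

If both calls succeed, the Tailnet route and the Ollama server are healthy; a failure at this point is almost certainly a networking or ACL issue rather than a problem with LLMConnect.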

Prefer a browser? While connected to Tailscale, open the Open WebUI URL your admin gave you in Safari and add it to your Home Screen.


Executive summary

This document describes the architecture and operation of a private, self-hosted Large Language Model (LLM) platform built on Qwen3 8B. The system is designed for internal Management and staff use, prioritizing privacy, reproducibility, and operational clarity.

Key characteristics:

The documentation is split into two major areas:


Goals & design principles


Documentation map


Repository structure

llm-platform/
├── README.md                # High-level overview (this document)
├── app/                     # Next.js docs app (Vercel deployment)
├── package.json             # Node deps for docs site
├── vercel.json              # Vercel config
├── arch/
│   ├── architecture.mmd     # High-level system diagram
│   ├── diagrams.md          # Mermaid diagrams
│   └── capacity-planning.md # Users, tokens/sec, GPU limits
│
├── infra/
│   ├── proxmox.md           # GPU passthrough & VM layout
│   ├── docker-compose.yml   # Ollama + Open WebUI
│   └── backups.md           # Backup targets and procedures
│
├── ml/
│   ├── training.md          # QLoRA fine-tuning workflow
│   ├── ocr.md               # Marker OCR parallelization
│   ├── requirements.txt     # Python dependencies
│   └── scripts/
│       ├── run_ocr.sh       # Parallel OCR runner
│       └── train_qlora.py   # QLoRA training script
│
├── ops/
│   ├── onboarding.md        # Management & staff access
│   ├── runbooks.md          # Failure scenarios and recovery
│   ├── security.md          # VPN, ACLs, isolation guarantees
│   └── changelog.md         # Operational change history
│
└── .gitignore

Deploy docs to Vercel

The repo includes a Next.js docs site (static export) so you can host the documentation on Vercel.
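
The static export is enabled in the app's Next.js config. A minimal sketch of what that setting looks like (hypothetical; the actual file under app/ may differ, and output: 'export' requires Next.js 13.3 or newer):

    // next.config.mjs  (sketch only; see app/ for the real config)
    /** @type {import('next').NextConfig} */
    const nextConfig = {
      output: 'export', // `next build` emits a fully static site into out/
    };

    export default nextConfig;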

  1. Install and build locally (optional):

    npm install
    npm run build
    

    Static output is in out/.

  2. Deploy to Vercel:

    • Push the repo to GitHub/GitLab/Bitbucket, or use the Vercel CLI (see the sketch after this list).
    • In Vercel, import the project; it will detect Next.js and apply the correct build settings.
    • Deploy. The docs will be served at your project URL (e.g. https://your-project.vercel.app).
  3. Local dev: npm run dev — then open http://localhost:3000.
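
If you take the Vercel CLI path from step 2, the flow is roughly as follows (assuming you have a Vercel account and have not linked the project yet):

    npm i -g vercel     # install the CLI
    vercel login        # authenticate once
    vercel              # link the project and create a preview deployment
    vercel --prod       # deploy to production

Either path serves the same static export; the CLI is simply convenient when you don't want to connect a Git provider.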

Note: Only the documentation is deployed to Vercel. The LLM stack (Ollama, Open WebUI, training, OCR) runs on your own infrastructure (see infra/ and ops/).


Future improvements


Tone & audience

This repository is written to be:

Technical depth increases progressively by directory.
