v0.2 · Korean live · all-local voice · self-hosted · MIT

Pick up the phone.
Talk to your homelab.

Papercup is a Discord voice bot that calls a Claude Code session running on your own box. Press /pickup, talk like it's a phone call, get spoken answers. No cloud STT/TTS, no audio leaves your network.

$ install Papercup on your homelab

bash <(curl -fsSL https://raw.githubusercontent.com/powder-nomad/papercup/main/install.sh)

Need a different engine or agent backend? Use the wizard ↓

VAD → Whisper → Claude Code → Kokoro, all running on your homelab

What makes this different

Three things, all load-bearing.

01 All-local voice stack

Silero VAD → faster-whisper STT → Kokoro/MeloTTS TTS, all running in Python sidecars on your hardware. Audio never leaves your LAN.

~3–8s loop on 4 cores

02 Phone-call UX

Speak, pause, get a spoken reply. Hang up and resume by name later. Multilingual (English, Korean, Japanese, Chinese, Spanish, French, …), with each utterance auto-routed to the right engine.

9 languages today

03 Subagents do real work

The speaker delegates to sandboxed background Claude Code instances via an embedded MCP server. You hang up; they keep coding.

spawn → check → list

Three ways to install

Same core, different distribution shape. Pick whichever fits your setup.

One-liner (recommended)

Paste, answer three Discord questions (token, client ID, guild ID), done. Select engines via flags or the wizard below.

As a plugin

Drop into ~/.claude/plugins. Drive setup with the /papercup:setup, start, and status slash commands.

OpenClaw plugin

Adds Papercup's voice stack to OpenClaw's Discord channel adapter as a SpeechProviderPlugin.

Configure your one-liner

Your one-liner

bash <(curl -fsSL https://raw.githubusercontent.com/powder-nomad/papercup/main/install.sh)

Paste into your homelab terminal. You will be prompted interactively for the Discord token, client ID, and guild ID. Re-run with different flags at any time to reconfigure.

How a call flows

┌─ Discord (phone / desktop) ─┐         ┌─────────── Homelab ───────────┐
│                             │  voice  │                               │
│  /pickup → speak → /hangup  │ ──────► │  Silero VAD → Whisper STT     │
│                             │         │       ↓                       │
│  bot speaks back            │ ◄────── │  Speaker agent (Haiku)        │
│                             │ Kokoro  │       ↓                       │
└─────────────────────────────┘         │  Kokoro / MeloTTS → audio     │
                                        │                               │
                                        │  spawn_extension(task) ───►   │
                                        │       Claude Code subagent    │
                                        │       in sandboxed dir        │
                                        └───────────────────────────────┘

The speaker handles the call directly. For anything bigger than a quick file read, it spawns a background extension — a full Claude Code instance in its own dir — and narrates progress while it works. You can hang up; resume the session by name later (/resume name:foo).
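The turn described above can be pictured as four stages chained together. The sketch below is only an illustration of that shape, not Papercup's code: `CallPipeline`, its field names, and the stand-in stages are all hypothetical, and the real stages are model-backed sidecars.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stage signatures; none of these names come from Papercup itself.
@dataclass
class CallPipeline:
    is_speech: Callable[[bytes], bool]   # VAD stage (Silero in Papercup)
    transcribe: Callable[[bytes], str]   # STT stage (Whisper)
    respond: Callable[[str], str]        # speaker agent
    synthesize: Callable[[str], bytes]   # TTS stage (Kokoro / MeloTTS)

    def handle_utterance(self, frames: List[bytes]) -> bytes:
        """Run one speak -> pause -> reply turn and return the reply audio."""
        # Keep only the frames the VAD stage marks as speech.
        voiced = b"".join(f for f in frames if self.is_speech(f))
        if not voiced:
            return b""  # only silence was captured, so say nothing
        text = self.transcribe(voiced)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stand-in stages so the sketch runs without any models installed.
demo = CallPipeline(
    is_speech=lambda frame: frame != b"\x00",
    transcribe=lambda audio: f"{len(audio)} voiced bytes",
    respond=lambda text: f"heard {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Passing the stages in as callables keeps the turn logic independent of which engine sits behind each one, which is roughly what swapping Kokoro for MeloTTS requires.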

System requirements

Tested on a 4-core Linux homelab. macOS works for the base path; the MeloTTS (Korean) path is Linux-tested only.

Requirement	Minimum	Recommended
OS	Linux x86_64, macOS (Intel or Apple Silicon)	Ubuntu 22.04+
Python	3.10	3.12
Node	20	20+
Disk (English-only, Kokoro)	2 GB free	4 GB free
Disk (with Korean / MeloTTS)	4 GB free	8 GB free
RAM	2 GB free	4 GB free
CPU	2 cores	4+ cores (real-time STT/TTS)
Network	Outbound HTTPS for model downloads	—
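A quick way to compare a machine against the Minimum column is a small preflight script. This is a sketch under stated assumptions: the function names are hypothetical, it checks the English-only (Kokoro) disk figure, and it only confirms that `node` is on PATH rather than verifying that it is version 20.

```python
import os
import shutil
import sys

# Thresholds copied from the Minimum column of the table above.
MIN_PYTHON = (3, 10)
MIN_FREE_GB = 2   # English-only (Kokoro) install
MIN_CORES = 2


def preflight(python_version, free_gb, cores, node_found):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10 or newer is required")
    if free_gb < MIN_FREE_GB:
        problems.append("at least 2 GB of free disk is required")
    if cores < MIN_CORES:
        problems.append("at least 2 CPU cores are required")
    if not node_found:
        problems.append("node was not found on PATH (Node 20 is required)")
    return problems


def probe_local_machine(path="."):
    """Gather the local values and run the same checks against them."""
    free_gb = shutil.disk_usage(path).free / 1e9
    cores = os.cpu_count() or 1
    node_found = shutil.which("node") is not None
    return preflight(sys.version_info, free_gb, cores, node_found)


if __name__ == "__main__":
    for problem in probe_local_machine():
        print("preflight:", problem)
```

Keeping `preflight` free of system calls makes the thresholds easy to adjust, for example raising `MIN_FREE_GB` to 4 for the Korean / MeloTTS path.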

apt (Linux)

# Base install (Kokoro TTS only)
sudo apt-get install -y espeak-ng python3-venv

# + Korean / MeloTTS path
sudo apt-get install -y libmecab-dev mecab-ipadic-utf8 libssl-dev pkg-config

brew (macOS)

brew install espeak-ng node python@3.12
# Korean path also needs:
brew install mecab mecab-ipadic openssl pkg-config

What you also need running

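Discord bot with token, client ID, and a guild ID. Create one in the Discord Developer Portal → New Application → Bot, then enable MESSAGE CONTENT INTENT under "Privileged Gateway Intents". Voice state is a standard (non-privileged) gateway intent, so it has no toggle there; the bot declares it itself when it connects to the gateway.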
One agent backend: a logged-in claude CLI (Claude Code), codex CLI (ChatGPT), or an Anthropic API key. The wizard above lets you pick.
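Before running the installer it can help to sanity-check the three Discord values. The sketch below assumes they are held in a dictionary under the hypothetical keys `DISCORD_TOKEN`, `DISCORD_CLIENT_ID`, and `DISCORD_GUILD_ID`; Papercup's actual variable names may differ. It relies on the fact that Discord IDs are numeric snowflakes, typically 17 to 20 digits long.

```python
import re

# Discord client and guild IDs are numeric "snowflakes".
SNOWFLAKE = re.compile(r"\d{17,20}")

# Hypothetical key names, chosen for illustration only.
ID_KEYS = ("DISCORD_CLIENT_ID", "DISCORD_GUILD_ID")


def invalid_discord_settings(settings):
    """Return the keys that are missing or clearly malformed."""
    bad = []
    if not settings.get("DISCORD_TOKEN", "").strip():
        bad.append("DISCORD_TOKEN")
    for key in ID_KEYS:
        if not SNOWFLAKE.fullmatch(settings.get(key, "")):
            bad.append(key)
    return bad
```

A common mistake this catches is pasting the application name or the bot token into one of the ID fields.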

What gets downloaded on first run

Whisper model (~140 MB for base)
Kokoro voice + voices.bin (~100 MB)
Silero VAD (~1.7 MB)
If Korean enabled: PyTorch CPU (~700 MB) at install time; Korean BERT (~440 MB) lazy-loaded on first Korean utterance
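Adding up the approximate sizes listed above gives a rough bandwidth budget. The figures are the ones from the list (Whisper `base`, Kokoro, Silero, and the two Korean-path items); the function name is just for illustration.

```python
# Approximate sizes in MB, taken from the list above.
BASE_DOWNLOADS_MB = {
    "Whisper base model": 140,
    "Kokoro voice + voices.bin": 100,
    "Silero VAD": 1.7,
}
KOREAN_DOWNLOADS_MB = {
    "PyTorch CPU (install time)": 700,
    "Korean BERT (first Korean utterance)": 440,
}


def approx_download_mb(korean=False):
    """Sum the listed downloads, optionally including the Korean path."""
    total = sum(BASE_DOWNLOADS_MB.values())
    if korean:
        total += sum(KOREAN_DOWNLOADS_MB.values())
    return round(total, 1)


if __name__ == "__main__":
    print("English-only:", approx_download_mb(), "MB")
    print("With Korean:", approx_download_mb(korean=True), "MB")
```

That works out to roughly 242 MB for the base path and about 1.4 GB once the Korean path is enabled, which fits within the disk figures in the requirements table.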

Capabilities matrix

Component	Today	Notes
VAD	Silero	Only option
STT	Whisper	`small` (multilingual, default) auto-detects 99 languages; `base` / `base.en` / `small.en` available
TTS	Kokoro + MeloTTS + XTTS-v2 (`auto`)	Kokoro: en/ja/zh/es/fr/hi/it/pt. Korean → MeloTTS (light, monotone) or XTTS-v2 (~58 voices, voice cloning). Set via `TTS_KO_ENGINE`
Agent	10 backends (7 CLI agents + 3 HTTP APIs)	claude-code · codex · aider · gemini-cli · opencode · crush · amp · anthropic-api · openai-compat · gemini-api. Switch via `/backend` at runtime.
Per-session config	model · effort · permissions · backend · streaming · reactivity · notify · mode	Set via `/pickup` flags or hot-swap mid-session via individual slash commands
Modes	Voice (phone-call prompt) + Text (vibecoding)	`/pickup mode:voice` or `mode:text`. Text mode drops the system prompt → normal Claude Code behavior
Reasoning effort	minimal · low · medium · high · xhigh · max	xhigh / max are Opus-only
Live progress	sticky message, optional event log	Text mode + `/streaming summary|full`. Anti-bomb: edit-throttled, auto-skips short turns
Budget tracking	per-day USD + tokens, daily cap	`BOT_DAILY_BUDGET_USD` or `/budget set_usd:<n>`; live on bot's rich-presence
Process hygiene	detached spawn, group-kill cancel, boot-time reaper	Each agent turn tracked in `data/process-registry.json`; orphans cleaned up on restart
Multi-bot	loop cap, reactivity modes, in-band roster	Multiple operators can co-host bots in one channel; cap prevents bot-to-bot loops
Transport	Discord voice + text	Bind a single channel via `/bind`, or @-mention anywhere
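The TTS row implies a per-utterance routing decision: Kokoro for the eight languages it covers, and a Korean engine chosen by `TTS_KO_ENGINE`. The sketch below shows one way such routing could look. The accepted values (`melotts`, `xtts-v2`), the MeloTTS default, and the function name are assumptions made for illustration; check the project's configuration docs for the real spellings.

```python
import os

# Languages the matrix lists for Kokoro.
KOKORO_LANGS = {"en", "ja", "zh", "es", "fr", "hi", "it", "pt"}

# Assumed spellings for TTS_KO_ENGINE values; not confirmed by the source.
KO_ENGINES = {"melotts", "xtts-v2"}


def pick_tts_engine(lang, env=None):
    """Return an engine name for a detected language code, or None if unsupported."""
    env = os.environ if env is None else env
    if lang in KOKORO_LANGS:
        return "kokoro"
    if lang == "ko":
        # Assumption: fall back to MeloTTS when the variable is unset.
        engine = env.get("TTS_KO_ENGINE", "melotts").lower()
        if engine not in KO_ENGINES:
            raise ValueError(f"unknown TTS_KO_ENGINE value: {engine!r}")
        return engine
    return None
```

Eight Kokoro languages plus Korean accounts for the "9 languages today" figure quoted earlier.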

See Slash commands for the runtime surface and Components for the deep dive.
