Description
NPCs that talk back — in any voice you want. Clone any voice from a WAV sample, record your own lines, or use built-in TTS voices. Drop audio files in a folder and your NPCs start speaking with them immediately. Players talk to NPCs through proximity voice chat and get real spoken responses — full speech-to-text, LLM conversation, and text-to-speech pipeline injected through Rust's native voice system. No mods, no client-side anything.
Want your NPCs to sound like a grizzled drill sergeant? Record a sample clip, drop it in, done. Want them to sound like your server's most notorious player? Grab a recording of their voice and now every scientist on the map talks like them. The plugin auto-discovers WAV and MP3 files, pre-encodes them to Opus, and serves them with zero latency. Other players nearby hear the NPC talking in proximity voice chat, just like a real player.
Why This Is Different
- Real voice conversation — not chat messages or pop-up text. NPCs speak through Rust's proximity voice system. Other players nearby hear it too.
- Full pipeline — STT captures player speech, LLM generates a response, TTS speaks it back. All automatic.
- Custom voice library — drop your own WAV files in a folder and NPCs use them. Record your own voice lines, rip audio from movies, clone voices externally — anything goes.
- Per-NPC personalities — each NPC archetype has its own personality, knowledge, voice, and behavior triggers
- $2-3/month — recommended stack (Groq STT + Groq LLM + Voxtral TTS) costs next to nothing for a typical server.
Audio Library
Drop WAV or MP3 files into data/TommygunNPCVoices/audio/ and NPCs start using them. The plugin auto-discovers files, pre-encodes to Opus, and caches everything in memory for instant playback. Use this alongside the live TTS pipeline — pre-recorded files for barks and combat callouts, TTS for dynamic conversation.
How to build your NPC voice library
- Record yourself — grab a mic, record aggro barks, death screams, taunts. Drop the WAVs in. Your NPCs now sound like you.
- Use any voice tool — ElevenLabs, Bark, Piper, whatever. Generate lines externally, save as WAV, drop them in.
- Rip from anything — movie clips, game audio, sound packs. If it's a WAV or MP3, it works.
- Per-player voice capture — record a player's voice on your server, use those clips as NPC voice lines. Your regulars become the NPCs.
Organize by event
Structure your audio folder by trigger type and files auto-map to events:
audio/ combat/aggro_1.wav combat/aggro_2.wav combat/aggro_3.wav death/last_words_1.wav death/last_words_2.wav taunt/kill_taunt_1.wav ambient/creepy_hum.wav trader/greeting_1.wav
Multiple files per trigger = random selection. Five aggro barks means each fight sounds different.
LLM-powered audio selection
The LLM knows your entire audio library. When responding to a player, it can embed [PLAY:filename] tags to mix pre-recorded audio with live TTS:
[PLAY:evil_laugh.wav] You think you can raid MY base?
The laugh plays instantly from cache (zero latency), then the text is spoken via TTS. Lower cost, more expressive, faster than pure TTS.
Vocal sound tags
The LLM can inject natural sounds mid-sentence: [cough], [laugh], [sigh] — making NPC speech feel alive, not robotic.
Admin audio tools
| Command | Description |
|---|---|
/voice list [category] |
List loaded audio files, optionally filtered by category |
/voice play <file> [npc_id] |
Test playback from nearest NPC or a specific NPC |
/voice reload |
Rescan audio folder and regenerate manifest |
/voice info <file> |
Show metadata: duration, format, assigned NPCs, play count |
/voice unassigned |
List files loaded but not assigned to any NPC or event |
/voice describe-all |
Auto-generate descriptions for all audio files using LLM |
The Voice Pipeline
- Player speaks — proximity voice is captured and buffered with silence detection
- Speech-to-text — audio sent to Groq Whisper (or Deepgram, Voxtral) for transcription
- LLM thinks — transcript + conversation history + NPC personality + situational context sent to Groq LLM
- Text-to-speech — response synthesized by Voxtral (or Groq Orpheus, OpenAI, Deepgram)
- Voice injection — audio encoded to Opus and broadcast through Rust's native voice system from the NPC's position
NPC Archetypes
- Per-NPC personality — custom system prompt with personality traits and domain knowledge
- Per-NPC voice — assign different TTS voices to different NPC types
- Conversation memory — NPCs remember what you said (configurable history length). Admin-injected memories persist across restarts.
- Initiation — NPCs can speak first when a player walks into range. Configurable phrases or LLM-generated greetings.
- Fourth wall toggle — NPCs can be aware they're in a game, or not
- Situational awareness — LLM context includes NPC health, combat state, time of day, weather, player gear, nearby threats, squad count, corpses nearby
Activation modes
Control how NPCs respond. Enable multiple modes per archetype:
| Mode | Behavior |
|---|---|
proximity |
NPC initiates when player enters listen range |
proximity_facing |
Proximity + player must be facing the NPC |
voice |
NPC listens for player voice chat (full AI conversation) |
chat |
NPC listens for nearby text chat (fallback for no-mic players) |
addressed |
NPC only responds when its name is spoken/typed |
line_of_sight |
NPC initiates when it has LOS to a player in range |
13 Voice Triggers
NPCs react to in-game events with voice lines — pre-recorded audio, TTS phrases, or live LLM responses. Each trigger has its own chance roll, cooldown, and audio pool:
| Trigger | When | Default Chance |
|---|---|---|
OnSpawn |
NPC spawns near a player | 30% |
OnPlayerSpotted |
First detection of a player | 50% |
OnAggro |
Enters combat | 70% |
OnHit |
Takes damage | 30% |
OnInjured |
Health drops below 50% | 80% |
OnKill |
Kills a player | 90% |
OnDeath |
Dies (last words) | 100% |
OnAllyKilled |
Squadmate dies nearby | 60% |
OnAllyInjured |
Squadmate injured nearby | 40% |
OnCombatEnd |
Fight ends | 50% |
OnPlayerFled |
Player leaves range | 40% |
OnPlayerLooting |
Player loots a corpse nearby | 60% |
OnReload |
NPC reloads weapon | 30% |
TTS Providers
| Provider | Notes |
|---|---|
| Voxtral | Primary. Mistral's voice model — high quality, voice cloning support, also handles STT. |
| Groq (Orpheus) | Fast, 6 voice options (autumn, diana, hannah, austin, daniel, troy). Included with Groq API key. |
| OpenAI | 9 voice options across tts-1 and tts-1-hd models |
| Deepgram (Aura) | Low latency streaming |
Rate Limiting
- Per-NPC cooldown — prevents spam from a single NPC
- Nearby radius cooldown — stops 10 NPCs all talking at once in crowded areas
- Per-trigger cooldown — each event type has its own timer
- Chance rolls — triggers fire probabilistically, not every time
- Per-player rate limit — configurable API request cap (default 10 req/60 sec per player)
- Busy gate — NPC waits for current response to finish before accepting new input
Admin Controls
- In-game settings panel — full CUI with tabs for general, providers, archetypes, and status
- Live archetype editing — create and configure NPC personalities without restarting
- Conversation monitor — real-time feed of all NPC conversations with timing breakdown
- NPC browser — list all tracked NPCs, view conversation history, teleport-to-NPC
- Debug tools — voice packet capture, replay, WAV export, sine wave test, STT test mode
Commands
| Command | Description |
|---|---|
/npcvoices |
Open the settings panel |
/npcmemory <npc> <text> |
Inject a persistent memory into an NPC |
tts.say <npc_id> <text> |
(Console) Make a specific NPC speak |
tts.say.all <archetype> <text> |
(Console) All NPCs of an archetype speak simultaneously |
tts.memory <npc_id> <text> |
(Console) Inject a memory |
tts.memory.clear <npc_id|all> |
(Console) Clear injected memories |
tts.memory.list <npc_id> |
(Console) Show all memories for an NPC |
Compatibility
Works with any NPC that extends BasePlayer — vanilla or modded. Archetypes are matched by NPC display name, so if the name matches your config, the NPC gets a voice.
- All vanilla scientists — military tunnel, oil rig, junkpile roamers, excavator, arctic base, etc.
- HumanNPC — custom NPCs created with HumanNPC get full voice support
- BotReSpawn / BetterNPC — spawned scientists and custom bots
- Raidable Bases — NPC defenders
- Any plugin that spawns named NPCs — if the NPC has a display name that matches an archetype, it works
Requirements
-
Concentus.dll — managed Opus codec, download here and place in
RustDedicated_Data/Managed/ - API keys — recommended: Groq (STT + LLM) + Mistral (Voxtral TTS). ~$2-3/month for a typical server.
Permissions
| Permission | Default | Description |
|---|---|---|
tommygunvoices.admin |
Manual grant | Access settings panel and all admin commands |
Plugin API (for developers)
Other plugins can use the full voice pipeline without building it:
| Method | Description |
|---|---|
API_PlayAudioOnNPC(BasePlayer npc, string path) |
Play cached audio through an NPC |
API_PlayAudioAtPosition(float x, y, z, string path) |
Play cached audio at a world position |
API_PlayTTSOnNPC(BasePlayer npc, string text, string voice) |
Generate TTS and play through an NPC |
API_Say(BaseEntity npc, string text) |
Make an NPC speak a line |
API_SayAll(string archetype, string text) |
All NPCs of an archetype speak simultaneously |
API_InjectMemory(BaseEntity npc, string memory) |
Inject a persistent memory into an NPC |
API_ClearMemories(BaseEntity npc) |
Clear all injected memories |
API_GetMemories(BaseEntity npc) |
Get all memories for an NPC |
API_StopNPC(BasePlayer npc) |
Stop active playback on an NPC |
API_GetAudioFiles() |
List all cached audio files |
API_GetAudioDuration(string path) |
Get duration of a cached audio file |
Hook:
| Hook | Description |
|---|---|
OnNPCHeardPlayer(BaseEntity npc, BasePlayer player, string transcript) |
Fires when NPC transcribes player speech. Build your own conversation logic on top. |
Support
- For all questions and feedback, dm me (1928tommygun) on my discord at discord.fragmod.com
License
TOMMYGUN'S EULA – BY USING THIS PLUGIN YOU AGREE TO THE FOLLOWING!
- Code contained in this file is not licensed to be copied, shared, resold, or modified in any way.
- You may copy the plugin freely to each server instance that your organization owns.
- Do not share this plugin with other server organizations — they must purchase their own licenses.
Hello
what does this groq, mistral stuff do?
is there any chanc – maybe in the future – to make this work without it?
I love the idea. But my server is pretty small and upto now got no really community to do somethin where additional monthly payments are required. that hurts me