Hear every language.
In real time.
Voxis listens to whatever your computer plays — a video, a game, a meeting — and speaks it back in your language, instantly. No subtitles to read. No drivers to install. Just understanding.
Zero drivers.
Direct audio capture.
No VB-CABLE, no virtual sound cards. Voxis captures the system mix directly at the WASAPI layer, leaves its own output out of the capture to prevent feedback loops, and ducks other apps at the source.
Two paths.
Your keys, or our cloud.
Voxis ships as a self-hosted engine on GitHub and a managed SaaS build. Use your own keys locally for total privacy, or sign in to the managed build and let our Go backend handle keys and quotas for you.
Choose Your Execution Tier
Voxis follows an open-core model. Run the engine on your own hardware with your own key, or use the managed build backed by our Go auth service.
GitHub / BYOK
Take complete control of your translation setup. Bring your own Gemini API key, run the engine locally, and inspect the open-core pipeline code directly on GitHub.
- Your Gemini key is encrypted on-device
- Zero infrastructure costs (pay your LLM provider directly)
- Fully auditable open-source engine
Official Release Build
The premium pre-compiled experience. Get an optimized native `.exe` with lightning-fast WASAPI capture. Sign in through our Go auth-core backend, which tracks sessions and managed-minute quotas in PocketBase.
Go Auth-Core Backend
JWT signing, per-session rate limiting, and zero-trust encryption.
Cloud Minute Quotas
High-speed PocketBase cloud store tracking quotas in real time.
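For readers curious what JWT signing involves, here is a minimal sketch of an HS256 token being signed and verified. It is not the Voxis backend, which is described as written in Go; Python is used only to match the snippet further down this page. The function names, the claims, and the shared-secret scheme are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        _b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now: float):
    """Return the claims if the signature matches and "exp" is in the future, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token, bad base64, or bad JSON
    if claims.get("exp", 0) <= now:
        return None  # expired, or no expiry claim at all
    return claims


# Hypothetical session token: a user id and an expiry timestamp.
token = sign_token({"sub": "user-1", "exp": 1_000}, b"demo-secret")
```

A production verifier would also check the header's `alg` field and would normally use a maintained JWT library rather than hand-rolled code.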
One engine. Every conversation.
The best ideas are
locked behind language.
A breakthrough talk in Japanese. A teammate in Berlin. A streamer in São Paulo. Today you pause, copy, paste into a translator, lose the moment — and read instead of listen.
Subtitles break the flow
Reading text while watching means you miss the faces, the timing, the room. Translation should reach your ears, not steal your eyes.
Live moments don't wait
Meetings and streams move in real time. Copy-paste tools are built for documents — not for a conversation already three sentences ahead.
Setup gets in the way
Virtual cables, routing rules, fragile audio chains. Most tools ask you to become an audio engineer before you hear a single word.
From sound wave to understanding
Voxis sits quietly beside your system audio and turns it into a translated voice — four steps, all running live on the stream.
System audio, driverless
A WASAPI loopback capture in process-exclusion mode grabs the exact mix you hear while leaving out Voxis itself, so it never translates its own voice.
Real speech only
On-device Silero VAD isolates speech from noise and music, while the original audio gently ducks so the translation can lead.
Gemini Live, streaming
A live speech-to-speech session translates as the words arrive — no waiting for full sentences, no copy-paste round trips.
Natural voice, aligned
A 24 kHz voice plays back into your headphones, or into a virtual mic for calls. Playback is timed against the original audio, so the translation stays in step with what you are watching.
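The "gently ducks" behaviour in the second step can be pictured as a gain that glides down while speech is detected and glides back up afterwards. The sketch below is a plain smoothed-gain ducker, not the psychoacoustic ducker Voxis describes, and the function name, the -12 dB depth, and the attack and release rates are all assumed values chosen for illustration.

```python
def duck_gains(speech_flags, duck_db=-12.0, attack=0.5, release=0.1):
    """Return one linear gain per audio frame for the original track.

    speech_flags: iterable of booleans, True where the VAD reports speech.
    The gain moves a fraction of the remaining distance toward its target
    on every frame, so it fades instead of switching abruptly.
    """
    ducked = 10 ** (duck_db / 20.0)  # convert decibels to a linear gain
    gain = 1.0                       # start at full volume
    gains = []
    for active in speech_flags:
        target = ducked if active else 1.0
        rate = attack if active else release  # duck quickly, recover slowly
        gain += rate * (target - gain)
        gains.append(gain)
    return gains


# Three silent frames, twenty speech frames, then fifty silent frames.
gains = duck_gains([False] * 3 + [True] * 20 + [False] * 50)
```

Multiplying each frame of the original audio by its gain lowers it while the translated voice plays and restores it once speech stops.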
One app, built for both
watching and talking.
Video & Game
One-way translation of anything that plays — streams, films, lectures, live gameplay. The original ducks, the translation leads.
- Captures system audio with zero routing
- Smart ducking keeps music in the background
- Optional on-screen subtitles & live transcript
Meeting
Two-way translation for Teams, Zoom and Meet. You hear them in your language; they hear you in theirs — through a virtual mic.
- Two independent Live sessions, in and out
- Your translated voice into any call app
- Works with the mic and speakers you already use
Engineered for the moment
Every detail tuned so translation feels like part of the audio — not a layer bolted on top.
Driverless by design
No VB-CABLE, no virtual sound card, no routing diagrams. Voxis captures the system mix directly and ducks other apps at the source. Install, sign in, listen — that's the whole setup.
Studio-grade ducking
A psychoacoustic ducker carves space for the translated voice while preserving the original's music and ambience.
Voice you choose
Pick a natural Gemini voice for the translated audio.
Live transcript
Every translation streams into a transcript and optional overlay — save it to a file when you're done.
Latency-aware
An RTT estimator keeps the translated voice aligned with the original so dialogue never drifts out of sync.
Quality presets & profiles
Switch between presets tuned for clarity, speed or fidelity — and save your favourite setup as a profile you can recall in one click.
Keys stay yours
Bring your own key, encrypted on-device — or run on the managed SaaS key. Your choice, every session.
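The latency-aware feature above mentions an RTT estimator. How Voxis computes or applies its estimate is not stated here, so the sketch below simply borrows the smoothed round-trip estimator familiar from TCP (RFC 6298): an exponentially weighted average plus a variance term. The class name, the millisecond units, and the idea of turning the estimate into a playback delay are assumptions.

```python
class RttEstimator:
    """Smoothed round-trip-time estimate in the style of RFC 6298."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha    # weight of each new sample in the average
        self.beta = beta      # weight of each new sample in the variance
        self.srtt = None      # smoothed RTT, in milliseconds
        self.rttvar = None    # smoothed deviation, in milliseconds

    def update(self, sample_ms):
        """Fold one measured round trip into the estimate and return the new average."""
        if self.srtt is None:
            # The first sample seeds both values.
            self.srtt = float(sample_ms)
            self.rttvar = sample_ms / 2.0
        else:
            # Update the deviation using the previous average, then the average.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample_ms)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_ms
        return self.srtt

    def playback_delay_ms(self):
        """Conservative delay: the average plus four deviations, or 0 before any sample."""
        if self.srtt is None:
            return 0.0
        return self.srtt + 4 * self.rttvar


est = RttEstimator()
est.update(200)  # first measured round trip
est.update(280)  # a slower one nudges the estimate upward
```

Smoothing means one slow round trip shifts the estimate only slightly, which keeps the alignment from jumping around.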
Yours to trust,
yours to inspect.
Voxis ships as an official SaaS app and as an open-source build on GitHub. The audio engine is the same — and you can read exactly what it does.
- Open-source engine
  Inspect, fork and self-host the desktop engine from the public GitHub build.
- Keys encrypted on-device
  Bring-your-own keys are sealed with Fernet, bound to your machine and account, and useless if copied elsewhere.
- Speech detection runs locally
  Silero VAD decides what's speech on your machine, before anything is sent for translation.
- Transcripts stay with you
  Saved transcripts are written to your own disk, never to a cloud you don't control.
# Same Live session lifecycle, two key sources
if IS_OFFICIAL_RELEASE:
    key = server.session_key()   # SaaS: per-session
else:
    key = byok.load(user_id)     # BYOK: local only
    # Fernet, bound to MachineGuid + user_id
    fkey = sha256(
        machine_guid, user_id, "voxis-byok-v1"
    )

session = LiveTranslator(
    model="gemini-3.5-live-translate-preview",
    target=cfg.target_language_incoming,
    sample_rate_out=24000,
)
session.stream()                 # quota enforced server-side
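The `fkey` line in the snippet above is shorthand. One way to turn a machine GUID and a user ID into a key in the format Fernet expects (32 bytes, URL-safe base64) is sketched below using only the standard library. The function name and the NUL separator are assumptions; the actual derivation lives in the open-source engine and may differ.

```python
import base64
import hashlib


def derive_fernet_key(machine_guid: str, user_id: str,
                      context: str = "voxis-byok-v1") -> bytes:
    """Derive a Fernet-format key bound to one machine and one account."""
    # Join the parts with a NUL byte so ("ab", "c") and ("a", "bc") cannot collide.
    material = "\x00".join((machine_guid, user_id, context)).encode("utf-8")
    digest = hashlib.sha256(material).digest()  # always 32 bytes
    return base64.urlsafe_b64encode(digest)     # 44 ASCII bytes


# Hypothetical identifiers, for illustration only.
key = derive_fernet_key("example-machine-guid", "user-1")
```

A key in this format can be passed to `Fernet(key)` from the `cryptography` package. Because the same inputs always give the same key, a blob sealed on one machine will not decrypt under another machine's GUID or another account's ID.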
Start free. Scale when you do.
Every plan unlocks the full engine — both modes, every language. You only choose how many minutes you need.
Developer (BYOK)
For engineers compiling from source. Run locally with your own Gemini API key — your key stays on your device.
- Access to GitHub Repository
- BYOK Integration & Local Processing
- Community Support & Source Transparency
Creator
Pre-compiled production .exe artifact with {min} managed minutes per month. Zero configuration, no API keys required.
- Pre-compiled .exe Artifact
- {min} Managed Minutes / month
- No API keys required
Pro
For agencies and power users. {min} managed minutes per month with priority DSP pipeline routing and commercial usage licensing.
- {min} Managed Minutes / month
- Priority DSP pipeline routing
- Commercial usage licensing
Enterprise
For teams and organisations.
- Everything in Pro
- Self-host & BYOK at scale
- Dedicated support & SLA
Prices are monthly subscriptions billed via Stripe; cancel anytime from your account. New accounts include 15 free minutes — or bring your own key to translate on your own quota.
Stop reading.
Start understanding.
Download Voxis and turn any sound your computer makes into your own language — live.
Windows 10 & 11 · Free to start · No credit card
Common questions
Last updated: June 2026
What is Voxis?
Voxis is a real-time voice-translation app for Windows that translates the audio your computer plays — videos, games, calls and meetings — into your language as you listen, and speaks it back in a natural voice. It works as live, simultaneous interpretation: it captures system audio directly (driverless — no virtual audio cables), detects speech on-device, and is powered by Gemini Live.
Is Voxis real-time or simultaneous interpretation?
Both. Voxis performs real-time interpretation of whatever your PC plays — it listens, translates, and speaks the result back with only a short delay, so you follow a video, stream or meeting live instead of reading subtitles after the fact.
Does Voxis need virtual audio cables or a meeting bot?
No. Voxis captures Windows system audio directly through WASAPI loopback — no virtual audio cables (VB-CABLE), no driver setup, and no bot that joins your Zoom, Teams or Google Meet call. Most live-translation tools rely on one or the other; Voxis runs locally alongside your audio.
How is it different from subtitles?
Subtitle tools show captions you have to read, which pulls your eyes off the screen. Voxis speaks the translation back in a natural voice with psychoacoustic ducking and latency sync, so you can keep watching, playing or talking while you listen in your own language.
What can Voxis translate?
Anything your computer plays: foreign videos and news, game audio, online courses, podcasts, and Zoom, Teams or Discord calls. A two-way meeting mode translates both sides of a conversation in real time.
Is Voxis private, and how much does it cost?
Speech detection runs on-device. You can bring your own Gemini API key (Developer / BYOK, free) to translate entirely on your own quota, or use managed cloud minutes — Creator ($19/mo) and Pro ($39/mo). New accounts include 15 free minutes.