Hear every language.
In real time.

Voxis listens to whatever your computer plays — a video, a game, a meeting — and speaks it back in your language, instantly. No subtitles to read. No drivers to install. Just understanding.

Download for Windows

Direct Hardware Loopback

Zero drivers.
Direct audio capture.

No VB-CABLE, no virtual sound cards. Voxis captures the system mix directly at the WASAPI layer, leaves its own output out of the capture so the translated voice never loops back in, and ducks other apps at the source.

Open-Core Sovereignty

Two paths.
Your keys, or our cloud.

Voxis ships as a self-hosted engine on GitHub and as a managed SaaS build. Bring your own Gemini key and run locally, with the key encrypted on-device, or sign in and let our Go backend handle authentication and managed sessions.

Compare Tiers

Voxis Engine LIVE

JA→EN

この戦略は完璧に機能している。

This strategy is working perfectly.

DE→EN

Lass uns zum nächsten Punkt übergehen.

Let's move on to the next point.

WASAPI Loopback Path

System Mix → Voxis Core → Headphones

Psychoacoustic Ducking Engine

Real-time attenuation of source background audio.

Active

Sovereignty Matrix

gemini.key = Fernet.encrypt(user_key)

On-Device Vault

go-auth.session.active = true

Authenticated

Speech detection (Silero VAD) runs on-device; only detected speech is sent to Google Gemini Live for translation, and the audio is never stored.


Architecture

Choose Your Execution Tier

Voxis is a true open-core framework. Run the engine client-side with your own key and full control of your hardware, or use the managed build backed by our Go auth-core backend.

Open-Core

GitHub / BYOK

Take complete control of your translation setup. Bring your own Gemini API key, run the engine locally, and inspect the open-core pipeline code directly on GitHub.

Your Gemini key is encrypted on-device
Zero infrastructure costs (you pay the LLM provider directly)
Fully auditable open-source engine

Free open-source license

View Repository

SaaS Managed

Official Release Build

The premium pre-compiled build. Get an optimized native Windows `.exe` with low-latency WASAPI capture. Sign in through our Go auth-core backend, which tracks sessions and managed minute quotas in PocketBase.

Go Auth-Core Backend

JWT signing, session rate limiting, and zero-trust encryption.

Cloud Minute Quotas

A PocketBase cloud store tracks minute quotas in real time.

v1.0.15 (pre-compiled installer)

Download .exe Installer
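The managed tiers meter usage in minutes against a monthly allowance. The page does not show how the Go auth-core backend does this, so what follows is only a minimal sketch of the accounting idea, written in Python to match routing.py. `MinuteQuota` and `debit` are hypothetical names, and the real service keeps this state in PocketBase rather than in memory.

```python
from dataclasses import dataclass


@dataclass
class MinuteQuota:
    """Hypothetical per-account quota record; field names are illustrative."""
    monthly_minutes: int
    used_seconds: float = 0.0

    @property
    def remaining_seconds(self) -> float:
        return max(0.0, self.monthly_minutes * 60 - self.used_seconds)

    def debit(self, seconds: float) -> float:
        """Charge streamed audio against the quota.

        Returns the seconds actually granted; a value smaller than `seconds`
        means the allowance ran out and the session should end.
        """
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        granted = min(seconds, self.remaining_seconds)
        self.used_seconds += granted
        return float(granted)


quota = MinuteQuota(monthly_minutes=15)  # e.g. the 15 free minutes on a new account
print(quota.debit(600))          # 600.0 (10 minutes granted)
print(quota.debit(600))          # 300.0 (only 5 minutes were left)
print(quota.remaining_seconds)   # 0.0
```

Granting only what remains, rather than rejecting the whole request, lets a session end cleanly at the moment the allowance runs out.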

One engine. Every conversation.

English Türkçe 日本語 Español Deutsch Français 中文 한국어 Português Italiano Русский العربية

The barrier

The best ideas are
locked behind language.

A breakthrough talk in Japanese. A teammate in Berlin. A streamer in São Paulo. Today you pause, copy, paste into a translator, lose the moment — and read instead of listen.

Subtitles break the flow

Reading text while watching means you miss the faces, the timing, the room. Translation should reach your ears, not steal your eyes.

Live moments don't wait

Meetings and streams move in real time. Copy-paste tools are built for documents — not for a conversation already three sentences ahead.

Setup gets in the way

Virtual cables, routing rules, fragile audio chains. Most tools ask you to become an audio engineer before you hear a single word.

The journey

From sound wave to understanding

Voxis sits quietly beside your system audio and turns it into a translated voice — four steps, all running live on the stream.

1 Capture

System audio, driverless

Process-excluding WASAPI loopback captures the exact mix you hear while leaving out Voxis itself, so it never translates its own voice.

2 Detect

Real speech only

On-device Silero VAD tells speech apart from noise and music, while the original audio gently ducks so the translation can lead.

3 Translate

Gemini Live, streaming

A live speech-to-speech session translates as the words arrive — no waiting for full sentences, no copy-paste round trips.

4 Speak

Natural voice, aligned

A 24 kHz voice plays back into your headphones, or into a virtual mic for calls. Playback is aligned with the original using a round-trip-time estimate, so the translation stays in step with what you're hearing.
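For a sense of the data rates behind the 24 kHz voice, here is the arithmetic for the output stream. It assumes 16-bit mono PCM, which the page does not state; the constants and helper names are illustrative.

```python
SAMPLE_RATE_OUT = 24_000  # Hz, the sample_rate_out used in routing.py
BYTES_PER_SAMPLE = 2      # assumption: 16-bit signed PCM
CHANNELS = 1              # assumption: mono voice


def bytes_per_second(rate: int = SAMPLE_RATE_OUT) -> int:
    """Raw PCM throughput of the translated voice."""
    return rate * BYTES_PER_SAMPLE * CHANNELS


def frame_bytes(frame_ms: int, rate: int = SAMPLE_RATE_OUT) -> int:
    """Size in bytes of one playback frame lasting frame_ms milliseconds."""
    samples = rate * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def buffered_ms(n_bytes: int, rate: int = SAMPLE_RATE_OUT) -> float:
    """Listening delay added by a playback buffer holding n_bytes."""
    return n_bytes * 1000 / bytes_per_second(rate)


print(bytes_per_second())   # 48000
print(frame_bytes(20))      # 960 (480 samples per 20 ms frame)
print(buffered_ms(9_600))   # 200.0
```

The last helper shows why playback buffers are kept small: at this rate, every 9,600 queued bytes is another 200 ms between the original line and its translation.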

Two ways to listen

One app, built for both
watching and talking.

JA → EN

この戦略は完璧に機能している。

This strategy is working perfectly.

TR → EN

Lütfen devam edin.

Please go ahead.

WASAPI Loopback Capture

Incoming Audio

Video & Game

One-way translation of anything that plays — streams, films, lectures, live gameplay. The original ducks, the translation leads.

Captures system audio with zero routing
Smart ducking keeps music in the background
Optional on-screen subtitles & live transcript

Ducking Depth 98%

Playback Sync RTT-aligned

Under the hood

Engineered for the moment

Every detail tuned so translation feels like part of the audio — not a layer bolted on top.

Driverless by design

No VB-CABLE, no virtual sound card, no routing diagrams. Voxis captures the system mix directly and ducks other apps at the source. Install, sign in, listen — that's the whole setup.

Studio-grade ducking

A psychoacoustic ducker carves space for the translated voice while preserving the original's music and ambience.
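The ducker's psychoacoustic model is not described here. As a simpler stand-in, the sketch below is a plain broadband gain ducker with a one-pole attack/release smoother, not the psychoacoustic design itself. It reads the "Ducking Depth 98%" figure as 98% of the original's amplitude removed while the translation speaks; that interpretation, the parameter names, and the coefficients are all assumptions.

```python
import math
from typing import Iterable, List


def depth_to_db(depth: float) -> float:
    """Convert a ducking depth (fraction of amplitude removed) to gain in dB."""
    return 20 * math.log10(1.0 - depth)


def duck_envelope(
    speech_active: Iterable[bool],
    depth: float = 0.98,
    attack: float = 0.5,
    release: float = 0.1,
) -> List[float]:
    """Per-block gain to apply to the original audio.

    A one-pole smoother moves the gain toward its target: quickly (attack)
    while the translated voice is speaking, slowly (release) once it stops,
    so the original fades back in instead of jumping.
    """
    floor = 1.0 - depth
    gain = 1.0
    gains: List[float] = []
    for active in speech_active:
        target = floor if active else 1.0
        coeff = attack if active else release
        gain += coeff * (target - gain)
        gains.append(gain)
    return gains


print(round(depth_to_db(0.98), 1))                                  # -34.0
print([round(g, 4) for g in duck_envelope([True, True, False])])    # [0.51, 0.265, 0.3385]
```

The asymmetric coefficients are the usual ducking trade-off: a fast attack clears room before the first translated syllable, while a slow release avoids an audible pump when the voice pauses.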

Voice you choose

Pick a natural Gemini voice for the translated audio.

Live transcript

Every translation streams into a transcript and optional overlay — save it to a file when you're done.
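A saved transcript needs little more than a timestamp, the original line, and its translation. This is a hypothetical sketch of such a file; `TranscriptLine`, `render_transcript`, `save_transcript`, and the layout are illustrative, not Voxis's actual file format.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class TranscriptLine:
    t_seconds: float   # offset from the start of the session
    source: str        # text in the original language
    translation: str   # translated text


def format_timestamp(t_seconds: float) -> str:
    """Render a session offset as HH:MM:SS."""
    total = int(t_seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render_transcript(lines: List[TranscriptLine]) -> str:
    """One timestamped original line, followed by its indented translation."""
    out: List[str] = []
    for line in lines:
        out.append(f"[{format_timestamp(line.t_seconds)}] {line.source}")
        out.append(f"    {line.translation}")
    return "\n".join(out) + "\n" if out else ""


def save_transcript(lines: List[TranscriptLine], path: Path) -> None:
    """Write the transcript to a local UTF-8 file, e.g. when the session ends."""
    path.write_text(render_transcript(lines), encoding="utf-8")


print(render_transcript([TranscriptLine(65.0, "Lütfen devam edin.", "Please go ahead.")]), end="")
# [00:01:05] Lütfen devam edin.
#     Please go ahead.
```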

Latency-aware

An RTT estimator keeps the translated voice aligned with the original so dialogue never drifts out of sync.
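The page does not say how the RTT estimator works. One well-known approach, shown here purely as an illustration, is the smoothed RTT and variance update that TCP uses (RFC 6298). Feeding it per-chunk round-trip samples and deriving a playback delay target with `margin_k` are assumptions of this sketch, as is the class name.

```python
from typing import Optional


class RttEstimator:
    """Smoothed round-trip time in the style of TCP's SRTT/RTTVAR (RFC 6298)."""

    def __init__(self, alpha: float = 1 / 8, beta: float = 1 / 4) -> None:
        self.alpha = alpha  # weight of a new sample in the smoothed RTT
        self.beta = beta    # weight of a new deviation in the variance
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, sample_ms: float) -> float:
        """Fold one measured round trip (send chunk, receive audio) into the estimate."""
        if self.srtt is None or self.rttvar is None:
            # First sample: RFC 6298 initializes SRTT = R and RTTVAR = R / 2.
            self.srtt = sample_ms
            self.rttvar = sample_ms / 2
        else:
            # The variance uses the previous SRTT, so update it first.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample_ms)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_ms
        return self.srtt

    def playback_delay_ms(self, margin_k: float = 2.0) -> float:
        """Hypothetical alignment target: expected delay plus a jitter margin."""
        if self.srtt is None or self.rttvar is None:
            raise RuntimeError("no RTT samples yet")
        return self.srtt + margin_k * self.rttvar


est = RttEstimator()
est.update(400.0)
print(est.update(480.0))        # 410.0
print(est.playback_delay_ms())  # 750.0
```

RFC 6298 itself uses SRTT + 4 * RTTVAR for retransmission timeouts; the smaller multiplier of 2.0 here is an arbitrary choice that trades some safety margin for less added delay.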

Quality presets & profiles

Switch between presets tuned for clarity, speed or fidelity — and save your favourite setup as a profile you can recall in one click.
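Presets and profiles map naturally onto a small immutable settings record plus a serialized user profile. The preset names below come from the page, but every field and value is hypothetical, and JSON is an assumed storage format.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Preset:
    # Hypothetical knobs: the page names the presets but not their settings.
    vad_threshold: float   # speech probability needed to open the gate
    ducking_depth: float   # fraction of the original's amplitude removed
    chunk_ms: int          # audio sent per request, in milliseconds


PRESETS: Dict[str, Preset] = {
    "clarity": Preset(vad_threshold=0.6, ducking_depth=0.98, chunk_ms=64),
    "speed": Preset(vad_threshold=0.5, ducking_depth=0.90, chunk_ms=32),
    "fidelity": Preset(vad_threshold=0.5, ducking_depth=0.80, chunk_ms=96),
}


def save_profile(name: str, preset: str, target_language: str, voice: str) -> str:
    """Serialize a profile: a named preset plus the user's own choices."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset}")
    profile = {"name": name, "preset": preset,
               "target_language": target_language, "voice": voice}
    return json.dumps(profile, sort_keys=True)


def load_profile(blob: str) -> Tuple[dict, Preset]:
    """Recall a saved profile in one step: its fields plus the resolved preset."""
    data = json.loads(blob)
    return data, PRESETS[data["preset"]]


blob = save_profile("Streams", "speed", "en", "example-voice")
profile, preset = load_profile(blob)
print(profile["target_language"], preset.chunk_ms)  # en 32
```

Keeping presets immutable and storing only the preset's name in the profile means retuning a preset later updates every profile that uses it.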

Keys stay yours

Bring your own key, encrypted on-device — or run on the managed SaaS key. Your choice, every session.

0+ Languages, in and out

24 kHz Studio-grade translated voice

2 Modes: watch & talk

Zero Virtual drivers to install

Open core · private by default

Yours to trust,
yours to inspect.

Voxis ships as an official SaaS app and as an open-source build on GitHub. The audio engine is the same — and you can read exactly what it does.

Open-source engine

Inspect, fork and self-host the desktop engine from the public GitHub build.

Keys encrypted on-device

Bring-your-own keys are sealed with Fernet and bound to your machine and account, so they are useless if copied elsewhere.

Speech detection runs locally

Silero VAD decides what's speech on your machine, before anything is sent for translation.

Transcripts stay with you

Saved transcripts are written to your own disk, never to a cloud you don't control.
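Silero VAD reports a speech probability for each short audio chunk, and turning those probabilities into a send/skip decision is where the gating logic lives. This sketch uses a simple hysteresis gate with a hangover period. The function name, both thresholds, and the hangover length are assumptions (0.5 is a common starting threshold for Silero), and obtaining the probabilities from the model is left out.

```python
from typing import Iterable, List


def speech_gate(
    probs: Iterable[float],
    on_threshold: float = 0.5,
    off_threshold: float = 0.35,
    hangover_chunks: int = 3,
) -> List[bool]:
    """Turn per-chunk speech probabilities into send (True) / skip (False) decisions.

    Hysteresis: the gate opens as soon as a chunk reaches on_threshold, and it
    only closes after hangover_chunks consecutive chunks fall below
    off_threshold.
    """
    decisions: List[bool] = []
    is_open = False
    quiet = 0  # consecutive below-off_threshold chunks while open
    for p in probs:
        if p >= on_threshold:
            is_open, quiet = True, 0
        elif is_open:
            # A chunk between the thresholds breaks the quiet run.
            quiet = quiet + 1 if p < off_threshold else 0
            if quiet >= hangover_chunks:
                is_open, quiet = False, 0
        decisions.append(is_open)
    return decisions


print(speech_gate([0.1, 0.9, 0.8, 0.2, 0.4, 0.1, 0.1, 0.1, 0.7]))
# [False, True, True, True, True, True, True, False, True]
```

The hangover keeps the gate open through the short pauses between words, so a sentence is not chopped into fragments before it reaches the translator.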

routing.py

import base64
import hashlib

# Same Live session lifecycle, two key sources
if IS_OFFICIAL_RELEASE:
    key = server.session_key()      # SaaS: per-session
else:
    key = byok.load(user_id)        # BYOK: local only

# Fernet key bound to MachineGuid + user_id. Fernet expects 32 bytes,
# url-safe base64-encoded, so hash the binding material and encode it
# (the ":" separator is illustrative).
fkey = base64.urlsafe_b64encode(
    hashlib.sha256(f"{machine_guid}:{user_id}:voxis-byok-v1".encode()).digest()
)

session = LiveTranslator(
    model="gemini-3.5-live-translate-preview",
    target=cfg.target_language_incoming,
    sample_rate_out=24000,
)
session.stream()   # quota enforced server-side

Pricing

Start free. Scale when you do.

Every plan unlocks the full engine — both modes, every language. You only choose how many minutes you need.

Developer (BYOK)

For engineers compiling from source. Run locally with your own Gemini API key — your key stays on your device.

$0 / Lifetime

Lifetime Access

Access to GitHub Repository
BYOK Integration & Local Processing
Community Support & Source Transparency

Fork on GitHub

Creator

Pre-compiled production .exe artifact with {min} managed minutes per month. Zero configuration, no API keys required.

$19 /mo

{min} managed minutes / month

Pre-compiled .exe Artifact
{min} Managed Minutes / month
No API keys required

Pro

For agencies and power users. {min} managed minutes per month with priority DSP pipeline routing and commercial usage licensing.

$39 /mo

{min} managed minutes / month

{min} Managed Minutes / month
Priority DSP pipeline routing
Commercial usage licensing

Enterprise

For teams and organisations.

Custom

Unlimited minutes

Everything in Pro
Self-host & BYOK at scale
Dedicated support & SLA

Contact sales

Creator and Pro are monthly subscriptions billed via Stripe; cancel anytime from your account. New accounts include 15 free minutes, or you can bring your own key to translate on your own quota.

Stop reading.
Start understanding.

Download Voxis and turn any sound your computer makes into your own language — live.

Download for Windows

Windows 10 & 11 · Free to start · No credit card

FAQ

Common questions

Last updated: June 2026

What is Voxis?

Voxis is a real-time voice-translation app for Windows that translates the audio your computer plays — videos, games, calls and meetings — into your language as you listen, and speaks it back in a natural voice. It works as live, simultaneous interpretation: it captures system audio directly (driverless — no virtual audio cables), detects speech on-device, and is powered by Gemini Live.

Is Voxis real-time or simultaneous interpretation?

Both. Voxis interprets whatever your PC plays in real time, and it does so simultaneously: it translates while the speaker is still talking instead of waiting for full sentences, then speaks the result back with only a short delay. You follow a video, stream or meeting live instead of reading subtitles after the fact.

Does Voxis need virtual audio cables or a meeting bot?

No. Voxis captures Windows system audio directly through WASAPI loopback — no virtual audio cables (VB-CABLE), no driver setup, and no bot that joins your Zoom, Teams or Google Meet call. Most live-translation tools rely on one or the other; Voxis runs locally alongside your audio.

How is it different from subtitles?

Subtitle tools show captions you have to read, which pulls your eyes off the screen. Voxis speaks the translation back in a natural voice with psychoacoustic ducking and latency sync, so you can keep watching, playing or talking while you listen in your own language.

What can Voxis translate?

Anything your computer plays: foreign videos and news, game audio, online courses, podcasts, and Zoom, Teams or Discord calls. A two-way meeting mode translates both sides of a conversation in real time.

Is Voxis private, and how much does it cost?

Speech detection runs on-device, and detected speech is sent to Google Gemini Live for translation without being stored. You can bring your own Gemini API key (Developer / BYOK, free) to translate entirely on your own quota, or use managed cloud minutes with Creator ($19/mo) or Pro ($39/mo). New accounts include 15 free minutes.
