System audio · in sync, end-to-end encrypted

Your Mac plays it. Your iPad hears it.

A remote desktop without sound is a screen recording. Remio captures the system audio output from your host machine and streams it to the client in lockstep with the video — YouTube tabs, Spotify, Zoom calls, Logic Pro mixes, system alerts. Native audio capture, Opus codec at 48 kHz, end-to-end encrypted, lip-sync maintained at the frame level. AirPods on the iPad work. CarPlay works. Bluetooth headphones work. It just sounds like the computer is in the room.

How the audio path works

From speaker tap on the host to AirPods on the client

Remio's audio path is built like the video path: capture from a native OS API on the host, encode with a low-latency codec, transmit on the same encrypted peer-to-peer channel as the video, then decode and present on the client. There is no separate “audio mode,” no Bluetooth re-pairing across the network, no DLNA discovery. The audio is part of the session.

01
Native capture on the host

macOS Audio Tap and Windows WASAPI loopback — native, not virtual cables

On macOS, Remio uses the system Audio Tap API to capture whatever is being mixed at the output device. On Windows, the same is done through the WASAPI loopback interface. Neither approach installs a virtual audio cable, neither asks you to route apps through a software mixer, and neither blocks other audio software from working normally. The capture happens at the same point the OS hands audio to the physical speakers — whatever you would hear with the speakers turned up is what the client receives.

02
Opus at 48 kHz

Encoded with Opus — the codec Zoom, Discord, and WhatsApp trust

The captured audio is encoded with Opus at 48 kHz, stereo, with an adaptive bitrate between 32 and 192 kbps depending on connection quality. Opus is the codec the major real-time apps standardized on: high quality at low bitrates, low algorithmic delay (about 22 ms end-to-end in Remio on a healthy network), and strong recovery from packet loss. Music sounds clean, dialogue sounds clean, and system alerts sound exactly as they should.
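To make those numbers concrete, here is a minimal sketch of the arithmetic behind a 48 kHz stereo Opus stream. The 20 ms frame duration and the `pick_bitrate` policy (a fixed share of measured bandwidth) are assumptions for illustration, not Remio's documented behaviour; only the sample rate, channel count, and 32 to 192 kbps range come from the text above.

```python
SAMPLE_RATE_HZ = 48_000        # from the text
CHANNELS = 2                   # stereo, from the text
MIN_KBPS, MAX_KBPS = 32, 192   # adaptive range from the text
FRAME_MS = 20                  # assumption: a common Opus frame duration


def samples_per_frame(frame_ms: int = FRAME_MS) -> int:
    """PCM samples per channel contained in one frame."""
    return SAMPLE_RATE_HZ * frame_ms // 1000


def payload_bytes(bitrate_kbps: int, frame_ms: int = FRAME_MS) -> int:
    """Approximate encoded size of one frame at the given bitrate."""
    bits = bitrate_kbps * 1000 * frame_ms // 1000
    return bits // 8


def pick_bitrate(available_kbps: float, audio_share: float = 0.1) -> int:
    """Hypothetical policy: give audio a fixed share of the measured
    bandwidth, clamped to the 32-192 kbps range."""
    target = int(available_kbps * audio_share)
    return max(MIN_KBPS, min(MAX_KBPS, target))


if __name__ == "__main__":
    print(samples_per_frame())   # samples per channel in a 20 ms frame
    print(payload_bytes(32))     # bytes per frame at the low end
    print(payload_bytes(192))    # bytes per frame at the high end
    print(pick_bitrate(1000))    # bitrate chosen on a 1 Mbps link
```

Under these assumptions each packet carries 960 samples per channel and stays well under a typical network MTU even at the top of the bitrate range, which is part of why audio can share a channel with video without fragmentation.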

03
Same channel as video

Encrypted on the same channel — no extra hops, no extra latency

Audio frames travel inside the same DTLS-SRTP encrypted channel as the video frames. That means the same direct peer-to-peer route, the same AES-256-GCM encryption, and the same end-to-end guarantee that nobody in the middle (not Remio, not Cloudflare) can decrypt what's being said or played. When a relay is needed, it sees opaque packets and forwards them.

04
Frame-synced presentation

Aligned to the video frame — lip-sync that holds for hours

Each audio and video packet carries a capture timestamp from the host's clock. The client decodes both streams and presents them at the same wall-clock time — lip-sync stays aligned to within one frame, and there is no drift over a multi-hour session because the clock anchor is the host. Scrubbing through a YouTube video, dragging the playhead in Logic Pro, alt-clicking through a Final Cut timeline — all stay in sync.
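The timestamp scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Remio's implementation: the `Frame` and `PresentationClock` names, the fixed 20 ms playout buffer, and the single-anchor offset are all hypothetical. The point it shows is that presentation time depends only on the host capture timestamp, so an audio frame and a video frame captured together are presented together even if they arrive at different moments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    kind: str          # "audio" or "video"
    capture_ms: float  # capture timestamp on the host's clock


class PresentationClock:
    """Maps host capture timestamps onto the client's clock."""

    def __init__(self, playout_delay_ms: float = 20.0):
        # Hypothetical fixed buffer that absorbs network jitter.
        self.playout_delay_ms = playout_delay_ms
        # client_time - host_time, fixed when the first frame arrives.
        self.offset_ms: Optional[float] = None

    def present_at(self, frame: Frame, arrival_ms: float) -> float:
        """Client-clock time at which this frame should be presented."""
        if self.offset_ms is None:
            # Anchor once; later arrivals never move the anchor,
            # so spacing between frames follows the host clock.
            self.offset_ms = arrival_ms - frame.capture_ms
        return frame.capture_ms + self.offset_ms + self.playout_delay_ms


if __name__ == "__main__":
    clock = PresentationClock()
    video = clock.present_at(Frame("video", 1000.0), arrival_ms=5000.0)
    audio = clock.present_at(Frame("audio", 1000.0), arrival_ms=5007.0)
    print(video == audio)  # same capture time, same presentation time
```

A production client would also need to handle late frames and clock-rate differences between devices; this sketch leaves both out.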

05
Handed to the OS

Played through whatever device you're using — AirPods, CarPlay, anything

On the client device, decoded audio hands off to the normal OS audio output pipeline. If you have AirPods connected, audio goes to AirPods. If you switch to CarPlay mid-session, audio follows. If you plug in headphones, audio follows. Remio does not own the output routing — it produces audio frames that the OS routes wherever it routes everything else.

What you can actually hear

Real workflows that need the audio

A remote desktop session is silent without audio routing — which means half the apps people actually use stop being useful. Here is what works with audio in place.

Music streaming — Spotify, Apple Music, YouTube Music
Whatever's playing on the host Mac plays through the iPad speakers in real time. Skip a track on the Mac with the iPad keyboard's media keys and the audio updates instantly. Cross-fade behaviour and EQ settings carry over because Remio captures at the system output level; lossless sources are captured as played, then encoded with Opus for the stream.
Video calls — Zoom, Google Meet, FaceTime, Teams
The other side of the call streams to your client device. Lip-sync stays aligned at the frame level, so voices match the faces on screen. The 22 ms audio delay is well below the 80 ms at which a listener notices echo, and far below the 200 ms at which call participants start noticing latency on the speaker's side.
Audio production — Logic Pro, Ableton, Pro Tools
Mixes play back through your client headphones in sync with the timeline. Scrub the playhead and audio scrubs with you. Solo/mute changes are instant. The 22 ms audio delay is within tolerance for editing and mixing; tracking with a microphone on the client side and monitoring through plugins on the host is closer to 40 ms round-trip, which is usable but tight for vocal monitoring.
Video editing — Final Cut Pro, Premiere, DaVinci Resolve
Audio waveforms scrub in sync with the timeline as you drag through. Multitrack audio mix-downs play back accurately. Audio meters on the host respond to the same audio the client is hearing because both come from the same buffer.
Games — spatial audio, footsteps, voice chat
Stereo positional cues survive the codec — footstep direction, weapon report, ambient music all arrive at the client with the same spatial signature they had on the host. Discord voice chat coming in to the host gets mixed with the game audio in the system output, captured together, streamed together.
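The delay figures quoted in the workflows above can be checked against their thresholds with a few lines of arithmetic. This is a sketch only: the constants are the numbers stated in the text, and the helper names `headroom_ms` and `fits_budget` are hypothetical.

```python
# Figures quoted in the text, in milliseconds.
ONE_WAY_DELAY_MS = 22
ROUND_TRIP_MONITORING_MS = 40
LIMITS_MS = {
    "listener notices echo": 80,
    "speaker notices call latency": 200,
}


def headroom_ms(delay_ms: int) -> dict[str, int]:
    """Margin left under each limit; a negative value means exceeded."""
    return {name: limit - delay_ms for name, limit in LIMITS_MS.items()}


def fits_budget(delay_ms: int, budget_ms: int) -> bool:
    """True when a delay is at or under a workflow's latency budget."""
    return delay_ms <= budget_ms


if __name__ == "__main__":
    print(headroom_ms(ONE_WAY_DELAY_MS))
    # A hypothetical 10 ms live-monitoring budget is not met at 40 ms.
    print(fits_budget(ROUND_TRIP_MONITORING_MS, 10))
```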

Audio privacy

Audio that nobody in the middle can hear

A remote desktop that streams unencrypted audio is sharing what you're listening to with every hop in between. Remio's audio path is encrypted end-to-end with the same keys as the video stream — a Cloudflare relay edge forwarding the packets cannot decrypt them, and neither can Remio.

Encryption

DTLS-SRTP with AES-256-GCM

Audio frames are encrypted inside the same DTLS-SRTP envelope as video frames. The keys are negotiated via Curve25519 ECDHE at pairing time, anchored to the one-time 4-digit PIN you exchanged on the local network. No third party ever holds those keys.

No relay decrypt

Relay sees encrypted bytes only

When the direct peer-to-peer path is blocked and Remio falls back to the Cloudflare TURN relay, only encrypted packets cross the relay. The relay forwards bytes without ever knowing whether they carry audio, video, or input events. There is no audio decryption anywhere outside your two paired devices.

Common questions

Common questions about audio in Remio

Five questions people ask about remote desktop audio before they trust their workflow to it.

Does Remio stream audio from the host?
Yes. Every Remio session includes a bidirectional audio channel that opens automatically alongside the video stream. No setting to enable, no separate app, no DLNA discovery. Whatever the host is playing routes to the client speakers in sync with the video.
Which audio codec does Remio use?
Opus at 48 kHz, stereo, with adaptive bitrate between 32 and 192 kbps. Opus is the same codec used by Zoom, WhatsApp, and Discord: high quality at low bitrates, about 22 ms end-to-end delay on a healthy network, and excellent packet loss resilience.
Does audio stay in sync with the video?
Yes. Remio tags audio and video frames with the same capture clock on the host and presents them together on the client. Scrubbing through a YouTube video, dragging the playhead in Logic Pro, or alt-clicking through a Final Cut timeline all keep audio and video aligned to within one frame. There is no audio drift over long sessions.
Does it work with AirPods, Bluetooth headphones, and CarPlay?
Yes. The audio plays through whatever the client device is currently routing to: built-in speakers, AirPods, AirPods Pro, USB-C headphones, Bluetooth speakers, CarPlay. The host has no idea what the client's output device is; it just sends the encoded Opus stream to the client, and the client hands it to the OS's normal audio routing pipeline.
Is the latency low enough for audio production?
For monitoring playback, yes. The 22 ms audio delay is well within the threshold most producers tolerate for casual mixing. For real-time tracking with a microphone on the client side feeding back to monitor through the host's plugins, the round-trip is closer to 40 ms on a good network, which is at the edge for sensitive vocal monitoring. For production work that needs sub-10 ms monitoring, a local interface is still the right answer; for editing, mixing, and remote tweaks, Remio is comfortable.

Free during launch · no account · no card

Audio you can actually hear.

Remio's audio path is built into the session — native capture on the host, Opus codec, encrypted end-to-end, lip-sync at the frame level. No second app, no virtual cables, no Bluetooth pairing dance. The computer plays. The phone hears it. The end.

macOS, iOS, iPadOS, Windows, and Android. Audio routing built into every session.