Olm fails to establish direct connection due to UDP port conflict during NAT holepunching on macOS #16

@danohn

Description

When using the --holepunch flag on macOS, olm frequently fails to establish a direct peer-to-peer connection and falls back to relay mode due to a race condition between the holepunch goroutine and WireGuard device initialization.

Current Behavior

  1. Olm starts UDP holepunching on a randomly selected port (e.g., 55704)
  2. The holepunch goroutine holds this port open to maintain NAT mappings
  3. WireGuard attempts to bind to the same port for the tunnel
  4. Binding fails with error: Unable to update bind: listen udp4 :55704: bind: address already in use
  5. Connection falls back to relay mode after 4 seconds
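
The conflict in steps 3 and 4 can be reproduced outside olm. The Go sketch below is an illustration, not olm's code: reproduceConflict and isAddrInUse are hypothetical helpers, and port 0 (an OS-chosen free port) stands in for the randomly selected holepunch port. It binds a UDP port, then tries to bind the same port again while the first socket is still open.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// reproduceConflict binds a UDP port the way a holepunch socket might, then
// tries to bind the same port again while the first socket is still open.
// It returns the error from the second bind attempt, or nil if it succeeded.
func reproduceConflict() error {
	// Port 0 asks the OS for a free port (stand-in for olm's random port).
	first, err := net.ListenUDP("udp4", &net.UDPAddr{Port: 0})
	if err != nil {
		return fmt.Errorf("first bind failed: %w", err)
	}
	defer first.Close()

	port := first.LocalAddr().(*net.UDPAddr).Port

	// Second bind on the same port (stand-in for WireGuard's bind).
	second, err := net.ListenUDP("udp4", &net.UDPAddr{Port: port})
	if err == nil {
		second.Close()
		return nil
	}
	return err
}

// isAddrInUse reports whether err wraps EADDRINUSE.
func isAddrInUse(err error) bool {
	return errors.Is(err, syscall.EADDRINUSE)
}
```

On Unix-like systems the second bind is expected to fail with "address already in use", the same error as in step 4, for as long as the first socket stays open.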

Expected Behavior

WireGuard should successfully bind to the port after holepunching completes, enabling direct peer-to-peer connectivity without relay.

Root Cause

The delay between close(stopHolepunch) and WireGuard's dev.Up() call is too short. The current 10ms delay is not enough for the UDP port to be released on macOS, so WireGuard's bind attempt fails.
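
To illustrate why the timing matters, here is a hedged Go sketch. It is not olm's code: the goroutine's structure, the 50ms ticker, and the names holepunch, stopThenBind, and done are all assumptions for illustration. Closing a stop channel returns immediately, so the socket may still be open when the next bind runs. In this sketch the goroutine closes an extra done channel after it has closed its socket, so the caller can wait for the port to be released rather than guess at a delay.

```go
package main

import (
	"net"
	"time"
)

// holepunch keeps a UDP socket open until stop is closed, then closes the
// socket and signals done. Deferred calls run last-in, first-out, so done is
// closed only after conn.Close() has returned.
func holepunch(conn *net.UDPConn, stop <-chan struct{}, done chan<- struct{}) {
	defer close(done)
	defer conn.Close()

	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			// A real implementation would send a holepunch packet here.
		}
	}
}

// stopThenBind signals the holepunch goroutine, waits until it confirms the
// socket is closed, and only then binds the same port again.
func stopThenBind() (*net.UDPConn, error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{Port: 0})
	if err != nil {
		return nil, err
	}
	port := conn.LocalAddr().(*net.UDPAddr).Port

	stop := make(chan struct{})
	done := make(chan struct{})
	go holepunch(conn, stop, done)

	close(stop) // returns immediately; the socket may still be open here
	<-done      // the goroutine has closed the socket once this returns

	return net.ListenUDP("udp4", &net.UDPAddr{Port: port})
}
```

Whether olm's holepunch goroutine can be given such a completion signal depends on code not shown in this issue.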

Logs

ERROR: wireguard: 2025/08/29 12:38:01 Unable to update bind: listen udp4 :55704: bind: address already in use
DEBUG: wireguard: 2025/08/29 12:38:01 Interface state was Down, requested Up, now Down
ERROR: 2025/08/29 12:38:01 Failed to bring up WireGuard device: listen udp4 :55704: bind: address already in use

Environment

  • OS: macOS 15.6.1 (tested on MacBook Pro with M1 Pro)
  • Olm version: 1.1.0
  • Using --holepunch flag

Solution

Increase the delay after closing the holepunch channel from 10ms to 500ms to ensure the OS has released the port before WireGuard attempts to bind.
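
A minimal Go sketch of the delay-then-bind idea follows. It is a stand-in, not the actual patch: bindAfterDelay and demoDelayedRebind are hypothetical names, and a plain UDP listen replaces WireGuard's dev.Up(). With delay set to 500ms and attempts set to 1 it behaves like the fixed sleep proposed above. Allowing more attempts with a shorter delay is a variant that returns as soon as the port is free.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// bindAfterDelay sleeps for delay, then tries to bind the UDP port. On
// failure it repeats the sleep and the bind, up to attempts tries in total.
func bindAfterDelay(port int, delay time.Duration, attempts int) (*net.UDPConn, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		time.Sleep(delay)
		conn, err := net.ListenUDP("udp4", &net.UDPAddr{Port: port})
		if err == nil {
			return conn, nil
		}
		lastErr = err
	}
	return nil, fmt.Errorf("bind to port %d failed after %d attempts: %w", port, attempts, lastErr)
}

// demoDelayedRebind holds a port, releases it in the background after 100ms
// (a stand-in for a slow holepunch shutdown), and rebinds it with retries.
func demoDelayedRebind() error {
	held, err := net.ListenUDP("udp4", &net.UDPAddr{Port: 0})
	if err != nil {
		return err
	}
	port := held.LocalAddr().(*net.UDPAddr).Port

	time.AfterFunc(100*time.Millisecond, func() { held.Close() })

	conn, err := bindAfterDelay(port, 50*time.Millisecond, 10)
	if err != nil {
		return err
	}
	return conn.Close()
}
```

The trade-off is that a fixed sleep adds its full duration to every connection attempt, while a retry loop adds only as much wait as the port release actually needs.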

Reproduction Steps

  1. Run sudo -E olm --holepunch on macOS
  2. Observe logs showing port binding failure
  3. Note fallback to relay mode instead of direct connection

Impact

  • Higher latency due to unnecessary relay usage
  • Increased bandwidth costs through relay server
  • Degraded performance when direct connectivity should be possible
