jared 80de9fe057 Initial commit
AtTable iOS app with multipeer connectivity for mesh messaging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 22:06:32 -05:00


Multipeer Connectivity (MPC) Architecture

This document explains how AtTable uses Apple's Multipeer Connectivity framework to create a peer-to-peer mesh network for real-time communication between deaf and hearing users.


Overview

AtTable uses Multipeer Connectivity (MPC) to establish direct device-to-device connections without requiring a central server. The app supports connections over:

  • Wi-Fi (same network)
  • Peer-to-peer Wi-Fi (AWDL - Apple Wireless Direct Link)
  • Bluetooth

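MPC never carries traffic over the cellular radio. When devices aren't on the same Wi-Fi network (e.g., one is on 5G/cellular), MPC automatically falls back to AWDL for peer-to-peer discovery and data transfer. This still requires Wi-Fi to be switched on, though not joined to a network.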


User Onboarding Flow

1. Initial Setup (OnboardingView.swift)

When a user launches the app:

  1. They enter their name
  2. Select their role (Deaf or Hearing)
  3. Choose an aura color (for visual identity in the mesh)
  4. Tap "Start Conversation" to enter the mesh

User launches app → OnboardingView → Enter details → ChatView (mesh starts)
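As a rough illustration, the data collected during onboarding can be modeled as a small value type. This is a hypothetical sketch: the names OnboardingProfile, AppScreen, and screenAfterStartTapped, and the rule that a non-blank name is required before "Start Conversation" proceeds, are assumptions rather than the app's actual code.

```swift
import Foundation

// Hypothetical model of what OnboardingView collects. Type names, fields,
// and the "name must be non-blank" rule are illustrative assumptions.
enum UserRole: String, Codable {
    case deaf
    case hearing
}

struct OnboardingProfile {
    var name: String
    var role: UserRole
    var auraColorHex: String  // e.g. "#FF8800"

    /// True once a name containing non-whitespace characters has been entered.
    var canStartConversation: Bool {
        !name.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

enum AppScreen {
    case onboarding
    case chat  // ChatView appears and the mesh starts
}

/// Moves from onboarding to chat only when the profile is complete.
func screenAfterStartTapped(_ profile: OnboardingProfile) -> AppScreen {
    profile.canStartConversation ? .chat : .onboarding
}
```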

2. Identity Generation (NodeIdentity.swift)

Upon first launch, the app generates a stable Node Identity:

  • nodeID: A UUID persisted in UserDefaults (stable per app installation)
  • instance: A monotonic counter that increments each time a session starts

This identity system allows the mesh to:

  • Reliably identify users across reconnections
  • Detect and filter "ghost" peers (stale connections from previous sessions)
  • Handle device reboots gracefully
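A minimal sketch of how such an identity could be persisted, assuming UserDefaults keys named "nodeID" and "instance" (hypothetical) and a defaults parameter added here only so the logic can be exercised in isolation:

```swift
import Foundation

// Hypothetical sketch of NodeIdentity; key names and API shape are assumptions.
struct NodeIdentity {
    private static let nodeIDKey = "nodeID"      // hypothetical key
    private static let instanceKey = "instance"  // hypothetical key

    /// Returns the persisted UUID string, creating and storing one on first use.
    static func nodeID(in defaults: UserDefaults = .standard) -> String {
        if let existing = defaults.string(forKey: nodeIDKey) {
            return existing
        }
        let fresh = UUID().uuidString
        defaults.set(fresh, forKey: nodeIDKey)
        return fresh
    }

    /// Increments and persists the session counter, returning the new value.
    /// integer(forKey:) returns 0 when the key is missing, so the first call yields 1.
    static func nextInstance(in defaults: UserDefaults = .standard) -> Int {
        let next = defaults.integer(forKey: instanceKey) + 1
        defaults.set(next, forKey: instanceKey)
        return next
    }
}
```

Because nextInstance() writes the incremented value back immediately, a relaunch or reboot continues from the last stored number instead of restarting at 1.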

Network Connection Process

Discovery & Connection (MultipeerSession.swift)

When ChatView appears, it calls multipeerSession.start(), which:

  1. Sets up the MCSession with encryption disabled (encryption preference .none, chosen for faster AWDL connections; session data is therefore not encrypted by MPC)
  2. Starts browsing for nearby peers using MCNearbyServiceBrowser
  3. Starts advertising (after 0.5s delay) using MCNearbyServiceAdvertiser
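The ordering of start() can be sketched without the framework by hiding each MPC call behind a closure. In this hypothetical version, DelayScheduler, MainQueueScheduler, and MeshStarter are illustrative names; in the app the closures would wrap MCSession setup, startBrowsingForPeers(), and startAdvertisingPeer().

```swift
import Foundation

// Sketch of start() sequencing only. Real MPC calls are replaced by
// hypothetical closures so the order and the delay are visible.
protocol DelayScheduler {
    func schedule(after seconds: TimeInterval, _ work: @escaping () -> Void)
}

/// Scheduler backed by the main dispatch queue.
struct MainQueueScheduler: DelayScheduler {
    func schedule(after seconds: TimeInterval, _ work: @escaping () -> Void) {
        DispatchQueue.main.asyncAfter(deadline: .now() + seconds, execute: work)
    }
}

struct MeshStarter {
    var setUpSession: () -> Void      // stands in for creating the MCSession
    var startBrowsing: () -> Void     // stands in for startBrowsingForPeers()
    var startAdvertising: () -> Void  // stands in for startAdvertisingPeer()
    var advertiseDelay: TimeInterval = 0.5

    func start(using scheduler: DelayScheduler) {
        setUpSession()
        startBrowsing()
        // Advertising is deferred by advertiseDelay, as described above.
        scheduler.schedule(after: advertiseDelay, startAdvertising)
    }
}
```

Injecting the scheduler keeps the delay testable: a test double can record the requested delay and run the work immediately.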

Wi-Fi vs Cellular/5G Connections

| Network Type | Connection Method | Handshake Delay | Connection Time |
|---|---|---|---|
| Wi-Fi (same network) | Infrastructure Wi-Fi | 0.5 seconds | Near-instant |
| Cellular/5G | AWDL (peer-to-peer Wi-Fi) | 1.5 seconds | Up to 60 seconds |

The app uses NetworkMonitor.swift to detect the current network type and adjusts timing:

let isWiFi = NetworkMonitor.shared.isWiFi
let delay: TimeInterval = isWiFi ? 0.5 : 1.5  // seconds; longer delay for AWDL stability

Deterministic Leader/Follower Protocol

To prevent connection races (both devices trying to invite each other), the app uses a deterministic leader election:

if myNodeID > theirNodeID {
    // I am LEADER - I will send the invite
} else {
    // I am FOLLOWER - I wait for their invite
}

This ensures exactly one device initiates each connection.
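The comparison can be wrapped in a small function. This sketch assumes both devices compare nodeIDs in the same textual form (UUID().uuidString produces uppercase hex), and it returns no role for equal IDs, an assumption covering the case where a device encounters its own identity.

```swift
import Foundation

enum ConnectionRole: Equatable {
    case leader    // sends the invitation
    case follower  // waits for the other side's invitation
}

/// Both devices evaluate the same comparison from opposite sides, so for
/// distinct IDs exactly one of them ends up as leader.
/// Returns nil for identical IDs (hypothetical "ignore" case).
func connectionRole(myNodeID: String, theirNodeID: String) -> ConnectionRole? {
    if myNodeID == theirNodeID { return nil }
    return myNodeID > theirNodeID ? .leader : .follower
}
```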


Handshake Protocol

Once MCSession reports a peer as connected (transport level), devices exchange handshake messages containing:

struct MeshMessage {
    var senderNodeID: String     // Stable identity
    var senderInstance: Int      // Session counter (for ghost detection)
    var senderRole: UserRole     // Deaf or Hearing
    var senderColorHex: String   // Aura color
    var isHandshake: Bool        // Identifies this as handshake
}

The handshake:

  1. Registers the peer in connectedPeerUsers for UI display
  2. Starts a 15-second stability timer before clearing failure counters
  3. Maps the MCPeerID to the stable nodeID for reliable identification
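The handshake payload and its registration step might look like the following sketch. Encoding with JSONEncoder, the HandshakeMessage name, and the PeerRegistry type (which keys peers by a string standing in for MCPeerID) are assumptions; the stability timer is left out here.

```swift
import Foundation

// Hypothetical, trimmed-down handshake payload mirroring the fields above.
// JSON via Codable is an assumption; the app's wire format may differ.
enum UserRole: String, Codable { case deaf, hearing }

struct HandshakeMessage: Codable, Equatable {
    var senderNodeID: String
    var senderInstance: Int
    var senderRole: UserRole
    var senderColorHex: String
    var isHandshake: Bool
}

/// Maps a transport-level peer key (standing in for an MCPeerID) to the
/// sender's stable nodeID and tracks which nodeIDs the UI should show.
final class PeerRegistry {
    private(set) var nodeIDForPeer: [String: String] = [:]
    private(set) var connectedNodeIDs: Set<String> = []

    func handle(_ data: Data, fromPeer peerKey: String) throws {
        let message = try JSONDecoder().decode(HandshakeMessage.self, from: data)
        guard message.isHandshake else { return }  // not a handshake; ignore here
        nodeIDForPeer[peerKey] = message.senderNodeID
        connectedNodeIDs.insert(message.senderNodeID)
    }
}
```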

User Leaving the Conversation

Explicit Leave (ChatView.swift)

When a user taps "Leave":

Button(action: {
    speechRecognizer.stopRecording()  // Stop audio transcription
    multipeerSession.stop()           // Disconnect from mesh
    isOnboardingComplete = false      // Return to onboarding
}) {
    Text("Leave")
}

Disconnect Cleanup (MultipeerSession.disconnect())

The disconnect() function performs complete cleanup:

  1. Cancel pending work: Recovery tasks, connection timers
  2. Stop services: Advertising and browsing
  3. Clear delegates: Prevent zombie callbacks
  4. Disconnect session: session?.disconnect()
  5. Clear all state:
    • connectedPeers / connectedPeerUsers
    • pendingInvites / latestByNodeID
    • cooldownUntil / consecutiveFailures
  6. Stop keep-alive heartbeats

Partial Transcript Preservation

If a peer disconnects mid-speech, their partial transcript is preserved as a final message:

if let partialText = liveTranscripts[peerKey], !partialText.isEmpty {
    let finalMessage = MeshMessage(content: partialText, ...)
    receivedMessages.append(finalMessage)
}
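A self-contained version of the same idea, using hypothetical ChatMessage and TranscriptStore types in place of MeshMessage. Removing the live entry after promoting it is an assumption of this sketch, added so the same text is not displayed twice.

```swift
import Foundation

// Hypothetical, simplified types; the real MeshMessage has more fields.
struct ChatMessage: Equatable {
    var senderKey: String
    var content: String
    var isFinal: Bool
}

struct TranscriptStore {
    var liveTranscripts: [String: String] = [:]  // peerKey -> in-progress text
    var receivedMessages: [ChatMessage] = []

    /// On disconnect, promote any non-empty partial transcript to a final
    /// message, then drop the live entry.
    mutating func peerDidDisconnect(_ peerKey: String) {
        if let partialText = liveTranscripts[peerKey], !partialText.isEmpty {
            receivedMessages.append(ChatMessage(senderKey: peerKey, content: partialText, isFinal: true))
        }
        liveTranscripts[peerKey] = nil
    }
}
```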

Rejoining the Conversation

Identity Recovery

When a user returns to the conversation:

  1. App resets isOnboardingComplete = false on every launch (intentional: this forces the Login screen, i.e. onboarding)
  2. User completes onboarding again (name/role/color preserved in @AppStorage)
  3. multipeerSession.start() called again

Instance Increment

The key to reliable rejoining is the instance counter:

myInstance = NodeIdentity.nextInstance() // Monotonically increasing

When other devices see the new instance:

  1. Ghost Detection: Old connections with lower instances are rejected
  2. Cooldown Clear: Any cooldowns from previous failures are removed
  3. Fresh Connect: The leader initiates a new invitation
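One way to express ghost detection and cooldown clearing together, as a sketch: latestByNodeID and cooldownUntil reuse names from the disconnect() state list, while GhostFilter and the exact rule (reject lower instances, accept equal ones, reset on higher ones) are assumptions.

```swift
import Foundation

// Hypothetical sketch of instance-based ghost filtering.
struct GhostFilter {
    var latestByNodeID: [String: Int] = [:]
    var cooldownUntil: [String: Date] = [:]

    /// Returns false for traffic from an older instance (a ghost).
    /// A newer instance replaces the record and clears any cooldown.
    mutating func accept(nodeID: String, instance: Int) -> Bool {
        let latest = latestByNodeID[nodeID] ?? Int.min
        if instance < latest { return false }  // ghost: stale session
        if instance > latest {
            latestByNodeID[nodeID] = instance  // rejoin or first contact
            cooldownUntil[nodeID] = nil        // give it a fresh chance
        }
        return true
    }
}
```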

Handling Stale Peers

The mesh uses multiple mechanisms to handle rejoins:

| Mechanism | Purpose |
|---|---|
| Ghost Filtering | Reject messages/invites from older instances |
| Cooldown Clear | Give returning peers a fresh chance |
| Half-Open Deadlock Fix | If we think we're connected but they invite us, accept the new invite |
| Stability Timer | Only reset failure counts after 15 s of stable connection |
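A sketch of the stability rule. Instead of a 15-second timer, this hypothetical StabilityTracker records when each peer connected and checks the elapsed time when asked, which keeps the logic easy to test; a disconnect restarts the window.

```swift
import Foundation

// Hypothetical sketch: failure counts are cleared only after a connection
// has stayed up for the whole stability window.
struct StabilityTracker {
    let stabilityWindow: TimeInterval = 15
    var consecutiveFailures: [String: Int] = [:]
    var connectedSince: [String: Date] = [:]

    mutating func didConnect(_ nodeID: String, at now: Date) {
        connectedSince[nodeID] = now
    }

    mutating func didDisconnect(_ nodeID: String) {
        connectedSince[nodeID] = nil  // a drop restarts the window
    }

    /// Resets the failure counter only if the peer has been connected
    /// for at least stabilityWindow seconds.
    mutating func checkStability(_ nodeID: String, now: Date) {
        guard let since = connectedSince[nodeID],
              now.timeIntervalSince(since) >= stabilityWindow else { return }
        consecutiveFailures[nodeID] = 0
    }
}
```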

Keep-Alive & Mesh Health

Heartbeat System

When connected, the mesh sends heartbeats every 10 seconds:

let message = MeshMessage(
    content: "💓",
    isKeepAlive: true,
    connectedNodeIDs: connectedPeerUsers.map { $0.nodeID } // Gossip
)

Gossip Protocol

Heartbeats include a list of connected peers, enabling clique repair:

  1. Device A receives heartbeat from Device B
  2. If B knows Device C but A doesn't, A can proactively invite C
  3. This heals mesh partitions without requiring everyone to be discoverable
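The clique-repair step reduces to a set difference. In this hypothetical sketch, applying the leader/follower rule to gossip-driven invites is an assumption borrowed from the leader election protocol, and only target nodeIDs are computed; how the app turns a nodeID into an MCPeerID it can invite is not described in this document.

```swift
import Foundation

/// Hypothetical gossip step: given the nodeIDs a heartbeat sender reports,
/// return those this device should invite. Peers we would follow are left
/// to invite us.
func peersToInvite(myNodeID: String,
                   connected: Set<String>,
                   gossiped: [String]) -> [String] {
    var candidates = Set(gossiped).subtracting(connected)  // drop known peers
    candidates.remove(myNodeID)                            // never invite ourselves
    return candidates.filter { myNodeID > $0 }.sorted()    // leader rule
}
```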

Connection Recovery

Exponential Backoff

Failed connections trigger increasing cooldown periods:

// 0.5s → 1.0s → 2.0s → 4.0s → ... → max 30s (failures counts from 1)
let delay = min(0.5 * pow(2.0, Double(failures - 1)), 30.0)
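The formula can be wrapped with the per-peer bookkeeping it implies. backoffDelay(failures:) follows the formula above; BackoffState and its methods are hypothetical names for tracking consecutiveFailures and cooldownUntil.

```swift
import Foundation

/// Cooldown after the nth consecutive failure: 0.5 s doubling, capped at 30 s.
func backoffDelay(failures: Int) -> TimeInterval {
    guard failures > 0 else { return 0 }
    return min(0.5 * pow(2.0, Double(failures - 1)), 30.0)
}

// Hypothetical bookkeeping: each failure bumps the counter and pushes the
// peer's cooldownUntil further out.
struct BackoffState {
    var consecutiveFailures: [String: Int] = [:]
    var cooldownUntil: [String: Date] = [:]

    mutating func recordFailure(_ nodeID: String, now: Date) {
        let failures = (consecutiveFailures[nodeID] ?? 0) + 1
        consecutiveFailures[nodeID] = failures
        cooldownUntil[nodeID] = now.addingTimeInterval(backoffDelay(failures: failures))
    }

    func isCoolingDown(_ nodeID: String, now: Date) -> Bool {
        guard let until = cooldownUntil[nodeID] else { return false }
        return now < until
    }
}
```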

Smart Retry

Instead of restarting everything, failed connections are retried individually:

  1. Only the leader initiates retries (prevents race conditions)
  2. Retries respect cooldown periods
  3. After 5 consecutive failures → "Poisoned State" triggers full reset

Poisoned State Recovery

If a peer has too many consecutive failures:

if failures >= 5 {
    restartServices(forcePoisonedRecovery: true)
    // Creates new MCPeerID, clears all cooldowns
}
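The smart-retry and poisoned-state rules can be combined into one decision function. This is a sketch under stated assumptions: RetryDecision and retryDecision(...) are hypothetical, the threshold of 5 comes from the text, and checking for the poisoned state before the leader rule is a guess about precedence.

```swift
import Foundation

enum RetryDecision: Equatable {
    case notLeader        // the other side is responsible for retrying
    case waitForCooldown
    case retryNow
    case fullReset        // "poisoned": restart services with a new MCPeerID
}

/// Hypothetical decision function combining leader-only retries,
/// cooldown periods, and the poisoned-state threshold.
func retryDecision(myNodeID: String, theirNodeID: String,
                   failures: Int, cooldownUntil: Date?, now: Date,
                   poisonThreshold: Int = 5) -> RetryDecision {
    if failures >= poisonThreshold { return .fullReset }
    guard myNodeID > theirNodeID else { return .notLeader }
    if let until = cooldownUntil, now < until { return .waitForCooldown }
    return .retryNow
}
```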

Summary

| Event | What Happens |
|---|---|
| User joins | NodeID retrieved, instance incremented, advertise + browse started |
| On Wi-Fi | Fast handshake (0.5 s), near-instant connections |
| On 5G/Cellular | AWDL used, slower handshake (1.5 s), up to 60 s to connect |
| User leaves | Full cleanup, partial transcripts preserved |
| User rejoins | New instance number, ghosts filtered, cooldowns cleared |
| Connection fails | Exponential backoff, smart retry by leader only |

The architecture prioritizes reliability over speed, using defensive mechanisms like ghost filtering, stability timers, and gossip-based clique repair to maintain mesh health despite the inherent unreliability of peer-to-peer wireless connections.