# iOS 1:1 Video Chat for Deaf Users & Interpreters

## Full Technical Specifications Document

**Status:** LOCKED

**Version:** v1.3 (Final)

**Scope:** Proof of Concept (Production-Aligned)

-----

### 1. Purpose & Product Overview

This project delivers an iOS proof-of-concept application that enables secure, real-time, one-to-one video communication between:

1. **Callers** (deaf users), and
2. **Interpreters** (who wait for incoming calls)

The application is designed explicitly for sign language communication, prioritizing:

* Visual clarity
* Low latency
* Predictable call behavior
* Privacy and trust

The POC is architected to evolve directly into a production system without re-architecture.

-----

### 2. Platform & Runtime Constraints

| Category | Specification |
| :--- | :--- |
| **Target OS** | iOS |
| **Minimum iOS Version** | 18.6 |
| **UI Framework** | SwiftUI |
| **Devices** | iPhone & iPad |
| **Orientation** | Portrait & Landscape |
| **Background Execution** | Not supported (foreground-only) |
| **Distribution** | TestFlight |
| **Accessibility** | Sign-language-first UI decisions |
| **Audio** | Implicit with video (no audio-only mode) |
| **Network** | **Local Network Permission Required** (Discovery) |

-----

### 3. User Roles & Authorization Model

**3.1 Roles**

Roles are assigned at registration time and are ephemeral.

* **Caller:** Can initiate calls.
* **Interpreter:** Can receive calls only.

**3.2 Enforcement Rules**

* Only callers may initiate `CALL_REQUEST`.
* Only interpreters may respond with `CALL_ACCEPT` or `CALL_DECLINE`.
* Interpreters cannot initiate calls.
* Role enforcement is server-side authoritative.

-----

### 4. Identity & Presence

**4.1 Username Rules**

* Usernames must be globally unique.
* Validated by registrar server.
* Ephemeral (no persistence).
* Cannot be changed while registered.
* No authentication.

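The username and role rules above could be enforced by a small in-memory registrar on the server. The following is a minimal sketch under stated assumptions, not the project's actual implementation: `Registrar`, `RegisterResult`, the `sessionId` key, and the error codes are hypothetical names, and exact-match username comparison is assumed because the spec does not define case or normalization rules.

```typescript
// Hypothetical names throughout: Role, RegisterResult and Registrar are illustrative only.
type Role = "caller" | "interpreter";

type RegisterResult =
  | { ok: true; username: string; role: Role }
  | { ok: false; error: "USERNAME_TAKEN" | "ALREADY_REGISTERED" };

class Registrar {
  // In-memory only: registrations vanish on restart, matching "ephemeral (no persistence)".
  private readonly bySession = new Map<string, { username: string; role: Role }>();
  private readonly usernames = new Set<string>();

  register(sessionId: string, username: string, role: Role): RegisterResult {
    // A registered session cannot change its username or role.
    if (this.bySession.has(sessionId)) {
      return { ok: false, error: "ALREADY_REGISTERED" };
    }
    // Global uniqueness check (exact string match is an assumption).
    if (this.usernames.has(username)) {
      return { ok: false, error: "USERNAME_TAKEN" };
    }
    this.usernames.add(username);
    this.bySession.set(sessionId, { username, role });
    return { ok: true, username, role };
  }

  // Releases the username, e.g. on disconnect or heartbeat expiry.
  unregister(sessionId: string): void {
    const entry = this.bySession.get(sessionId);
    if (!entry) return;
    this.usernames.delete(entry.username);
    this.bySession.delete(sessionId);
  }

  // Server-side role check: only callers may send CALL_REQUEST.
  canInitiateCall(sessionId: string): boolean {
    return this.bySession.get(sessionId)?.role === "caller";
  }
}

// Usage: the second registration reuses the name "alex" and is rejected.
const registrar = new Registrar();
const first = registrar.register("s1", "alex", "caller");
const duplicate = registrar.register("s2", "alex", "interpreter");
```

Keeping the uniqueness set and the session map in one class means a username is released in exactly one place, which keeps the ephemeral-identity rule easy to audit.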
**4.2 Presence Lifecycle**

Presence is server-authoritative and maintained via:

* **Heartbeat interval:** 15 seconds
* **Presence TTL:** 20 seconds
* **Presence delivery:** WebSocket push (immediate)

A user is considered present only while:

1. Registered
2. Heartbeat valid
3. WebSocket connection active

-----

### 5. Presence States

Each interpreter exists in exactly one state:

0. **UNAVAILABLE:** Default state after login. Can NOT receive calls.
1. **AVAILABLE:** Can receive calls.
2. **RINGING:** Call request active.
3. **IN_CALL:** Actively connected.

**Presence List Behavior**

* Only interpreters in `AVAILABLE` appear to callers.
* Interpreters in `UNAVAILABLE`, `RINGING` or `IN_CALL` are hidden.
* State transitions are server-controlled.
* `UNAVAILABLE` is the enforced starting state for all interpreters.
* Interpreters must explicitly "Go Online" to become `AVAILABLE`.

-----

### 6. Video & Networking Architecture

**6.1 Media Transport**

* WebRTC Peer-to-Peer (1:1)
* DTLS-SRTP encryption (Media)
* **Signaling Transport:** WebSocket (`ws://`) for POC (Production requires `wss://`).
* No SFU or MCU

**6.2 NAT Traversal**

* **STUN:** for candidate discovery.
* **TURN:** self-hosted (coturn) as mandatory fallback.
* **Requirements:** TURN is required for Symmetric NATs, Carrier-grade NAT, and Enterprise firewalls.

**6.3 Video Capture & Encoding Strategy**

* **Capture:** Best available front camera format (Target 1080p @ 30fps).
* **Encoding/Streaming:** Dynamic resolution adaptation based on device aspect ratio (e.g. 16:9 for iPhones, 4:3 for iPads). Default target is 720p equivalent.
* **iPad Support:**
    * **Dynamic Resolution:** Detects screen aspect ratio (using `UIScreen.nativeBounds`) to scale video output correctly (e.g. 1440x1080 for iPad 4:3) preventing distortion.
    * **Stability Fix:** Explicitly forces the `.high` session preset (with fallback to `.medium`/`.low`) *after* capture start to override WebRTC defaults that crash iPad Mini.
    * **Format Selection:** Strictly prioritizes standard 16:9 capture formats (1280x720, 1920x1080) to ensure hardware compatibility, avoiding unstable 4:3 formats like 1280x960.

-----

### 7. Call Lifecycle & Concurrency Model

**7.1 Call Creation (Authoritative Server Flow)**

1. Caller selects interpreter.
2. Server generates `callId` (UUID).
3. **Interpreter state transitions:** `AVAILABLE` → `RINGING`
4. Server starts 10-second ring timer.
5. Interpreter receives call request.

**7.2 Ring Outcomes**

* **Accept within 10s:** State `RINGING` → `IN_CALL`. WebRTC negotiation begins.
* **Decline:** State resets to `AVAILABLE`.
* **Timeout:** Server auto-reverts to `AVAILABLE`.
* **Race condition:** The losing caller receives an immediate `BUSY` error.

**7.3 Call Termination**

Any of the following reverts the interpreter to `AVAILABLE`:

* Hangup
* WebRTC failure
* ICE timeout
* Heartbeat expiration
* WebSocket disconnect
* App crash / force quit

-----

### 8. Timeouts (Hard Guarantees)

| Stage | Timeout |
| :--- | :--- |
| **Ringing** | 10 seconds |
| **Offer/Answer** | 10 seconds after accept |
| **ICE gathering/connection** | 10–15 seconds |
| **Max “connecting” state** | 20 seconds total |

-----

### 9. Signaling & Call Identification

**9.1 callId**

* UUID generated only by server.
* Required on all signaling messages.
* Used to correlate messages, enforce authorization, and prevent race conditions.

**9.2 Authorization Rules**

For a given `callId`:

* Only caller + interpreter may exchange signaling.
* Messages with unknown/expired `callId` are rejected.
* Messages violating role rules are rejected.
* Invalid messages are explicitly errored (not forwarded).

-----

### 10. Signaling Protocol

**WebSocket Message Envelope**

```json
{
  "type": "CALL_REQUEST | CALL_ACCEPT | CALL_DECLINE | OFFER | ANSWER | ICE | HANGUP | BUSY | REPORT_ABUSE | STATS_UPDATE | VIDEO_VISIBLE",
  "callId": "uuid",
  "from": "username",
  "to": "username",
  "payload": {}
}
```

-----

### 11. Presence Delivery

* Presence updates are pushed immediately via WebSocket.
* Clients do not poll.
* Server is the single source of truth.
* Client does not infer presence locally.

-----

### 12. Video UX Requirements

| Aspect | Specification |
| :--- | :--- |
| **Camera** | Front only |
| **Remote View** | Full-screen |
| **Local Preview** | Picture-in-Picture (PiP) |
| **PiP Default** | Top-right |
| **PiP Behavior** | Draggable, snap-to-corners |
| **Safe Area** | Enforced |
| **Mirroring** | Enabled (front camera) |
| **Controls** | Hang-up only |

-----

### 13. Bandwidth & Connection Quality Rules

**Setup Phase**

* **TCP Relay:** Permitted as a fallback for restrictive firewalls and campus networks.

**Connected Phase**

* **Packet Loss:** Accepted.
* **Error Correction:** WebRTC handles packet loss via NACK and FEC (Forward Error Correction).
* **Degradation Policy:**
    * If quality drops below usable thresholds (defined as tunable constants): **Show "Poor Connection" UI Warning.**
    * Connection remains active unless fully severed by network timeout.

-----

### 14. iOS Client Architecture

**14.1 Frameworks**

* SwiftUI
* WebRTC iOS SDK
* Combine / async-await as needed

**14.2 Architecture Pattern**

* MVVM

**14.3 Modules**

* Registration
* Presence
* Call State Machine
* WebRTC Engine
* PiP Video View

**14.4 Call State Machine**

`Idle` → `Registered` → `Calling` → `IncomingCall` → `Connecting` → `InCall` → `Ending` → `Error`

* Transitions driven only by: User action, Server signaling, WebRTC callbacks, Timeout events.

-----

### 15. Backend Architecture

**15.1 Stack**

* Node.js (TypeScript)
* WebSocket signaling
* In-memory presence store (POC)
* HTTPS/WSS only in production (the POC uses `ws://`, per 6.1)

**15.2 Server Responsibilities**

* Username uniqueness
* Role enforcement
* Presence tracking
* Call state transitions
* Mutex/locking on call requests
* **Service Discovery:** Advertises via Bonjour (`_bemyears._tcp`) for zero-conf client connection.
* TURN configuration delivery

-----

### 16. TURN Server Specification

| Item | Specification |
| :--- | :--- |
| **Software** | coturn |
| **Auth** | Static long-term credentials (POC only) |
| **Transport** | UDP + TCP |
| **Encryption** | TURN-TLS enabled |
| **Capacity** | ≤5 concurrent calls |

*Security Note: Static credentials are acceptable for TestFlight only and must be replaced before App Store release.*

-----

### 17. Service Discovery (Bonjour)

**17.1 Mechanism**

* The iOS client uses `NetServiceBrowser` to discover the backend server on the local network.
* **Service Type:** `_bemyears._tcp`
* **Domain:** `local.`
* **Resolution:** Resolves the IPv4 address of the backend and auto-populates it for the user.
* **Fallback:** Manual IP entry is supported via `UserDefaults` persistence.

-----

### 18. Logging & Privacy

**18.1 Allowed Server Logs**

* Timestamp (YYYY-MM-DD HH:MM:SS format)
* Event type
* callId (Required for call-related events)
* sessionId
* Role
* Offender identity (for Abuse Reports)
* Periodic User Statistics (every 5 minutes)

**18.2 Explicitly Forbidden**

* SDP bodies
* ICE candidates
* Video metadata
* Media statistics tied to identity

**18.3 Client-Side**

* No logging beyond OS crash reports.
* `lastCallId` and `lastRemoteUser` are persisted temporarily for abuse reporting only.

-----

### 19. Security Posture

* TLS for all signaling (Production).
* DTLS-SRTP for media.
* No stored personal data.
* No call content persistence.
* Server-authoritative enforcement everywhere.

-----

### 20. Acceptance Criteria

* Users register successfully with unique usernames.
* Presence updates are immediate.
* Interpreters disappear on call request.
* Race conditions result in exactly one successful call.
* Calls succeed behind NAT using TURN (including TCP relay fallback).
* PiP works reliably for signing framing.
* All failure paths recover cleanly.

-----
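The criterion that race conditions result in exactly one successful call, together with the ring outcomes in section 7, might be met on the Node.js server with a synchronous check-and-set on interpreter state. The sketch below is illustrative only: `CallCoordinator`, its method names, and the injected `newId` generator are hypothetical, and the ring timer is left out (a real server would call `end` when its 10-second timer fires and would pass a UUID generator such as Node's `crypto.randomUUID`).

```typescript
// Hypothetical sketch: CallCoordinator and its methods are not defined by the spec.
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

type CallResult =
  | { ok: true; callId: string }
  | { ok: false; error: "BUSY" };

interface Call {
  callId: string;
  caller: string;
  interpreter: string;
}

class CallCoordinator {
  private readonly states = new Map<string, InterpreterState>();
  private readonly calls = new Map<string, Call>();
  private readonly newId: () => string;

  // The id generator is injected so the example stays deterministic.
  constructor(newId: () => string) {
    this.newId = newId;
  }

  setState(interpreter: string, state: InterpreterState): void {
    this.states.set(interpreter, state);
  }

  // Unknown interpreters are treated as UNAVAILABLE, the enforced starting state.
  getState(interpreter: string): InterpreterState {
    return this.states.get(interpreter) ?? "UNAVAILABLE";
  }

  // The check and the transition run in one synchronous function, so on a
  // single-threaded event loop no second request can interleave between them.
  requestCall(caller: string, interpreter: string): CallResult {
    if (this.getState(interpreter) !== "AVAILABLE") {
      return { ok: false, error: "BUSY" };
    }
    const callId = this.newId();
    this.states.set(interpreter, "RINGING");
    this.calls.set(callId, { callId, caller, interpreter });
    return { ok: true, callId };
  }

  // Accept: RINGING -> IN_CALL. Rejects unknown callIds and senders other than the interpreter.
  accept(callId: string, from: string): boolean {
    const call = this.calls.get(callId);
    if (!call || call.interpreter !== from || this.getState(from) !== "RINGING") {
      return false;
    }
    this.states.set(from, "IN_CALL");
    return true;
  }

  // Decline, ring timeout and hangup all end the call and revert to AVAILABLE.
  end(callId: string): boolean {
    const call = this.calls.get(callId);
    if (!call) return false;
    this.calls.delete(callId);
    this.states.set(call.interpreter, "AVAILABLE");
    return true;
  }
}

// Usage: two callers race for the same interpreter; only the first gets a callId.
let nextCallNumber = 0;
const coordinator = new CallCoordinator(() => `call-${++nextCallNumber}`); // stand-in for a UUID
coordinator.setState("interp1", "AVAILABLE");
const winner = coordinator.requestCall("callerA", "interp1");
const loser = coordinator.requestCall("callerB", "interp1");
```

This relies on all call state living in one process. If the in-memory store were later replaced by a shared database, the same check-and-set would need an atomic operation or lock at the storage layer.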