BeMyEars/project-technical-specifications.md
jared d29b8182ca Initial commit
Add iOS app with Node.js/TypeScript backend for BeMyEars project.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 21:51:47 -05:00

iOS 1:1 Video Chat for Deaf Users & Interpreters

Full Technical Specifications Document

Status: LOCKED | Version: v1.3 (Final) | Scope: Proof of Concept (Production-Aligned)


1. Purpose & Product Overview

This project delivers an iOS proof-of-concept application that enables secure, real-time, one-to-one video communication between:

  1. Callers (deaf users), and
  2. Interpreters (who wait for incoming calls)

The application is designed explicitly for sign language communication, prioritizing:

  • Visual clarity
  • Low latency
  • Predictable call behavior
  • Privacy and trust

The POC is architected to evolve directly into a production system without re-architecture.


2. Platform & Runtime Constraints

| Category | Specification |
| --- | --- |
| Target OS | iOS |
| Minimum iOS Version | 18.6 |
| UI Framework | SwiftUI |
| Devices | iPhone & iPad |
| Orientation | Portrait & Landscape |
| Background Execution | Not supported (foreground-only) |
| Distribution | TestFlight |
| Accessibility | Sign-language-first UI decisions |
| Audio | Implicit with video (no audio-only mode) |
| Network | Local Network permission required (for discovery) |

3. User Roles & Authorization Model

3.1 Roles

Roles are assigned at registration time and are ephemeral.

  • Caller: Can initiate calls.
  • Interpreter: Can receive calls only.

3.2 Enforcement Rules

  • Only callers may initiate CALL_REQUEST.
  • Only interpreters may respond with CALL_ACCEPT or CALL_DECLINE.
  • Interpreters cannot initiate calls.
  • Role enforcement is server-side authoritative.
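
The enforcement rules above amount to a small lookup from message type to the one role allowed to originate it. The following is a minimal sketch of such a server-side check, not the project's actual implementation; the names `Role`, `RoleGatedType`, `ALLOWED_ROLE`, and `mayOriginate` are hypothetical.

```typescript
// Hypothetical names; only the roles and message types come from the spec.
type Role = "caller" | "interpreter";
type RoleGatedType = "CALL_REQUEST" | "CALL_ACCEPT" | "CALL_DECLINE";

// The single role allowed to originate each role-gated message type.
const ALLOWED_ROLE: Record<RoleGatedType, Role> = {
  CALL_REQUEST: "caller",
  CALL_ACCEPT: "interpreter",
  CALL_DECLINE: "interpreter",
};

// Returns true when a sender holding `role` may originate a message of `type`.
function mayOriginate(role: Role, type: RoleGatedType): boolean {
  return ALLOWED_ROLE[type] === role;
}
```

Because the check runs on the server against the role recorded at registration, a modified client cannot grant itself the ability to place or answer calls.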

4. Identity & Presence

4.1 Username Rules

  • Usernames must be globally unique.
  • Validated by registrar server.
  • Ephemeral (no persistence).
  • Cannot be changed while registered.
  • No authentication.

4.2 Presence Lifecycle

Presence is server-authoritative and maintained via:

  • Heartbeat interval: 15 seconds
  • Presence TTL: 20 seconds
  • Presence delivery: WebSocket push (immediate)

A user is considered present only while:

  1. Registered
  2. Heartbeat valid
  3. WebSocket connection active
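
The three presence conditions can be evaluated as one predicate. This is a sketch under assumptions: the `PresenceRecord` shape and the function name are hypothetical, and only the 15-second interval and 20-second TTL come from the specification.

```typescript
// Values from section 4.2; all identifiers are hypothetical.
const HEARTBEAT_INTERVAL_MS = 15_000; // client heartbeat cadence
const PRESENCE_TTL_MS = 20_000;       // presence expires after this much silence

// Slack between an expected heartbeat and expiry (5 seconds with these values).
const HEARTBEAT_SLACK_MS = PRESENCE_TTL_MS - HEARTBEAT_INTERVAL_MS;

interface PresenceRecord {
  registered: boolean;
  socketOpen: boolean;
  lastHeartbeatAt: number; // epoch milliseconds
}

// A user is present only while all three conditions hold.
function isPresent(rec: PresenceRecord, now: number): boolean {
  const heartbeatValid = now - rec.lastHeartbeatAt <= PRESENCE_TTL_MS;
  return rec.registered && heartbeatValid && rec.socketOpen;
}
```

With these values a heartbeat may arrive up to 5 seconds late before presence lapses, so a single fully missed heartbeat is enough to expire a user.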

5. Presence States

Each interpreter exists in exactly one state:

  1. UNAVAILABLE: Default state after registration. Cannot receive calls.
  2. AVAILABLE: Can receive calls.
  3. RINGING: Call request active.
  4. IN_CALL: Actively connected.

Presence List Behavior

  • Only interpreters in AVAILABLE appear to callers.
  • Interpreters in UNAVAILABLE, RINGING or IN_CALL are hidden.
  • State transitions are server-controlled.
  • UNAVAILABLE is the enforced starting state for all interpreters.
  • Interpreters must explicitly "Go Online" to become AVAILABLE.
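
The visibility rule reduces to a filter over interpreter states. A minimal sketch follows, assuming a hypothetical `Interpreter` record and function name; the four state names are taken from the list above.

```typescript
// State names from section 5; the record shape and function name are hypothetical.
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

interface Interpreter {
  username: string;
  state: InterpreterState;
}

// Builds the caller-facing presence list: only AVAILABLE interpreters are shown.
function visibleToCallers(all: Interpreter[]): string[] {
  return all
    .filter((i) => i.state === "AVAILABLE")
    .map((i) => i.username);
}
```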

6. Video & Networking Architecture

6.1 Media Transport

  • WebRTC Peer-to-Peer (1:1)
  • DTLS-SRTP encryption (Media)
  • Signaling Transport: WebSocket (ws://) for POC (Production requires wss://).
  • No SFU or MCU

6.2 NAT Traversal

  • STUN: for candidate discovery.
  • TURN: self-hosted (coturn) as mandatory fallback.
  • Requirement: A TURN relay is needed whenever a direct peer-to-peer path cannot be established, notably behind symmetric NATs, carrier-grade NAT, and enterprise firewalls.
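
As an illustration of what the server might hand to clients as ICE configuration, the sketch below builds a list in the shape WebRTC expects (`urls`, `username`, `credential`). It assumes the same coturn host also answers STUN requests and listens on the conventional ports 3478 (STUN/TURN) and 5349 (TURN over TLS); the host name, ports, and function name are assumptions, not values from this specification.

```typescript
// Mirrors the fields of WebRTC's RTCIceServer; declared locally to stay self-contained.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

// Hypothetical helper: host, ports, and credentials are supplied by deployment config.
function buildIceServers(host: string, username: string, credential: string): IceServer[] {
  return [
    // STUN for candidate discovery (assumed to be served by the same coturn host).
    { urls: [`stun:${host}:3478`] },
    // TURN fallback over UDP, TCP, and TLS.
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
        `turns:${host}:5349?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}
```

Listing UDP, TCP, and TLS variants lets ICE choose the first transport that survives the network path.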

6.3 Video Capture & Encoding Strategy

  • Capture: Best available front camera format (Target 1080p @ 30fps).
  • Encoding/Streaming: Dynamic resolution adaptation based on device aspect ratio (e.g. 16:9 for iPhones, 4:3 for iPads). Default target is 720p equivalent.
  • iPad Support:
    • Dynamic Resolution: Detects screen aspect ratio (using UIScreen.nativeBounds) to scale video output correctly (e.g. 1440x1080 for iPad 4:3) preventing distortion.
    • Stability Fix: Explicitly forces .high session preset (with fallback to .medium/.low) after capture start to override WebRTC defaults that crash iPad Mini.
    • Format Selection: Strictly prioritizes standard 16:9 capture formats (1280x720, 1920x1080) to ensure hardware compatibility, avoiding unstable 4:3 formats like 1280x960.

7. Call Lifecycle & Concurrency Model

7.1 Call Creation (Authoritative Server Flow)

  1. Caller selects interpreter.
  2. Server generates callId (UUID).
  3. Interpreter state transitions: AVAILABLE → RINGING
  4. Server starts 10-second ring timer.
  5. Interpreter receives call request.

7.2 Ring Outcomes

  • Accept within 10s: State RINGING → IN_CALL. WebRTC negotiation begins.
  • Decline: State resets to AVAILABLE.
  • Timeout: Server auto-reverts to AVAILABLE.
  • Race condition: If a caller requests an interpreter who is no longer AVAILABLE, the server immediately returns a BUSY error to that caller.
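
The ring outcomes can be modeled as a guarded check-and-set on the interpreter's state. The sketch below is one possible shape, not the project's implementation; `RingCoordinator` and its method names are hypothetical. It relies on Node.js running each synchronous function to completion, so two requests cannot both observe AVAILABLE.

```typescript
// Hypothetical sketch; only the state names and the 10 s timer come from the spec.
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

const RING_TIMEOUT_MS = 10_000; // section 7.1 ring timer

class RingCoordinator {
  private states = new Map<string, InterpreterState>();
  private activeCall = new Map<string, string>(); // interpreter -> callId

  setState(interpreter: string, state: InterpreterState): void {
    this.states.set(interpreter, state);
  }

  getState(interpreter: string): InterpreterState {
    return this.states.get(interpreter) ?? "UNAVAILABLE";
  }

  // AVAILABLE -> RINGING for the first request; every later request gets BUSY.
  requestCall(interpreter: string, callId: string): "RINGING" | "BUSY" {
    if (this.getState(interpreter) !== "AVAILABLE") return "BUSY";
    this.states.set(interpreter, "RINGING");
    this.activeCall.set(interpreter, callId);
    return "RINGING";
  }

  // Accept: RINGING -> IN_CALL, only for the call that is actually ringing.
  accept(interpreter: string, callId: string): boolean {
    if (!this.isRinging(interpreter, callId)) return false;
    this.states.set(interpreter, "IN_CALL");
    return true;
  }

  // Decline or ring timeout: RINGING -> AVAILABLE. No-op once accepted.
  release(interpreter: string, callId: string): boolean {
    if (!this.isRinging(interpreter, callId)) return false;
    this.states.set(interpreter, "AVAILABLE");
    this.activeCall.delete(interpreter);
    return true;
  }

  private isRinging(interpreter: string, callId: string): boolean {
    return this.getState(interpreter) === "RINGING" &&
      this.activeCall.get(interpreter) === callId;
  }
}
```

A server using this shape would schedule `release(interpreter, callId)` after `RING_TIMEOUT_MS`; if the interpreter has already accepted, the late timer fires harmlessly because the state is no longer RINGING.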

7.3 Call Termination

Any of the following reverts the interpreter to AVAILABLE:

  • Hangup
  • WebRTC failure
  • ICE timeout
  • Heartbeat expiration
  • WebSocket disconnect
  • App crash / force quit

8. Timeouts (Hard Guarantees)

| Stage | Timeout |
| --- | --- |
| Ringing | 10 seconds |
| Offer/Answer | 10 seconds after accept |
| ICE gathering/connection | 10–15 seconds |
| Max “connecting” state | 20 seconds total |
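
Because the per-stage timeouts can add up to more than the 20-second overall limit, each stage deadline has to be capped by the total budget. The sketch below shows one way to compute that, assuming the "connecting" clock starts when the interpreter accepts; that starting point and all identifiers are assumptions.

```typescript
// Values from the timeouts table above; identifiers are hypothetical.
const TIMEOUTS_MS = {
  ringing: 10_000,
  offerAnswer: 10_000,     // measured from accept
  iceMax: 15_000,          // upper end of the 10 to 15 s window
  connectingTotal: 20_000, // hard cap on the whole connecting phase
} as const;

// Deadline (epoch ms) for a stage, capped by the overall connecting budget.
// Assumption: the connecting budget starts at `acceptedAt`.
function stageDeadline(acceptedAt: number, stageStartedAt: number, stageTimeoutMs: number): number {
  const stageLimit = stageStartedAt + stageTimeoutMs;
  const overallLimit = acceptedAt + TIMEOUTS_MS.connectingTotal;
  return Math.min(stageLimit, overallLimit);
}
```

For example, if offer/answer completes 8 seconds after accept, ICE gets 12 seconds rather than 15 before the 20-second cap applies.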

9. Signaling & Call Identification

9.1 callId

  • UUID generated only by server.
  • Required on all signaling messages.
  • Used to correlate messages, enforce authorization, and prevent race conditions.

9.2 Authorization Rules

For a given callId:

  • Only caller + interpreter may exchange signaling.
  • Messages with unknown/expired callId are rejected.
  • Messages violating role rules are rejected.
  • Invalid messages are explicitly errored (not forwarded).
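
A sketch of the per-message authorization check follows. It assumes the server keeps a map from callId to the two participants plus an expiry time; the `CallRecord` shape, the `expiresAt` field, and the verdict names are hypothetical, since the specification does not define how calls expire or how errors are named.

```typescript
// Hypothetical shapes; the spec only fixes the rules, not the data model.
interface CallRecord {
  caller: string;
  interpreter: string;
  expiresAt: number; // epoch ms; how expiry is set is an assumption
}

interface RoutedMessage {
  callId: string;
  from: string;
  to: string;
}

type Verdict = "FORWARD" | "UNKNOWN_OR_EXPIRED_CALL" | "NOT_A_PARTICIPANT";

// Decides whether a signaling message may be forwarded for its callId.
function authorize(calls: Map<string, CallRecord>, msg: RoutedMessage, now: number): Verdict {
  const call = calls.get(msg.callId);
  if (!call || now >= call.expiresAt) return "UNKNOWN_OR_EXPIRED_CALL";

  // Only the two participants may talk, and only to each other.
  const callerToInterpreter = msg.from === call.caller && msg.to === call.interpreter;
  const interpreterToCaller = msg.from === call.interpreter && msg.to === call.caller;
  return callerToInterpreter || interpreterToCaller ? "FORWARD" : "NOT_A_PARTICIPANT";
}
```

Any verdict other than `FORWARD` would be answered with an explicit error to the sender rather than silently dropped.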

10. Signaling Protocol

WebSocket Message Envelope

{
  "type": "CALL_REQUEST | CALL_ACCEPT | CALL_DECLINE | OFFER | ANSWER | ICE | HANGUP | BUSY | REPORT_ABUSE | STATS_UPDATE | VIDEO_VISIBLE",
  "callId": "uuid",
  "from": "username",
  "to": "username",
  "payload": {}
}
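
The envelope above maps naturally onto a TypeScript type with a runtime validator. The sketch below is illustrative only: `Envelope`, `MESSAGE_TYPES`, and `parseEnvelope` are hypothetical names, and the validator checks only the envelope's shape, not UUID syntax or per-type payload contents.

```typescript
// Message type names are from the envelope above; identifiers are hypothetical.
const MESSAGE_TYPES = [
  "CALL_REQUEST", "CALL_ACCEPT", "CALL_DECLINE", "OFFER", "ANSWER", "ICE",
  "HANGUP", "BUSY", "REPORT_ABUSE", "STATS_UPDATE", "VIDEO_VISIBLE",
] as const;

type MessageType = (typeof MESSAGE_TYPES)[number];

interface Envelope {
  type: MessageType;
  callId: string;
  from: string;
  to: string;
  payload: Record<string, unknown>;
}

// Parses a raw WebSocket text frame; returns null for anything malformed.
function parseEnvelope(raw: string): Envelope | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const m = data as Record<string, unknown>;
  const typeOk = typeof m.type === "string" &&
    (MESSAGE_TYPES as readonly string[]).indexOf(m.type) !== -1;
  const stringsOk = typeof m.callId === "string" &&
    typeof m.from === "string" && typeof m.to === "string";
  const payloadOk = typeof m.payload === "object" &&
    m.payload !== null && !Array.isArray(m.payload);

  return typeOk && stringsOk && payloadOk ? (m as unknown as Envelope) : null;
}
```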

11. Presence Delivery

  • Presence updates are pushed immediately via WebSocket.
  • Clients do not poll.
  • Server is the single source of truth.
  • Client does not infer presence locally.

12. Video UX Requirements

| Aspect | Specification |
| --- | --- |
| Camera | Front only |
| Remote View | Full-screen |
| Local Preview | Picture-in-Picture (PiP) |
| PiP Default | Top-right |
| PiP Behavior | Draggable, snap-to-corners |
| Safe Area | Enforced |
| Mirroring | Enabled (front camera) |
| Controls | Hang-up only |

13. Bandwidth & Connection Quality Rules

Setup Phase

  • TCP Relay: Permitted as a fallback for restrictive firewalls and campus networks.

Connected Phase

  • Packet Loss: Tolerated. Loss alone does not end the call.
  • Error Correction: WebRTC handles packet loss via NACK and FEC (Forward Error Correction).
  • Degradation Policy:
    • If quality drops below usable thresholds (defined as tunable constants): Show "Poor Connection" UI Warning.
    • Connection remains active unless fully severed by network timeout.
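
The degradation policy can be expressed as a pure predicate over periodic quality samples. In the sketch below the metric names and every threshold value are placeholders chosen for illustration; the specification says only that the thresholds are tunable constants.

```typescript
// All identifiers and numeric values here are hypothetical placeholders.
interface QualitySample {
  packetLossFraction: number; // 0.0 to 1.0
  roundTripTimeMs: number;
  framesPerSecond: number;
}

interface QualityThresholds {
  maxPacketLossFraction: number;
  maxRoundTripTimeMs: number;
  minFramesPerSecond: number;
}

// Tunable constants (placeholder values, not from the spec).
const DEFAULT_THRESHOLDS: QualityThresholds = {
  maxPacketLossFraction: 0.1,
  maxRoundTripTimeMs: 500,
  minFramesPerSecond: 15,
};

// Returns true when the "Poor Connection" warning should be shown.
// It never ends the call; termination is left to network timeouts.
function isPoorConnection(s: QualitySample, t: QualityThresholds = DEFAULT_THRESHOLDS): boolean {
  return s.packetLossFraction > t.maxPacketLossFraction ||
    s.roundTripTimeMs > t.maxRoundTripTimeMs ||
    s.framesPerSecond < t.minFramesPerSecond;
}
```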

14. iOS Client Architecture

14.1 Frameworks

  • SwiftUI
  • WebRTC iOS SDK
  • Combine / async-await as needed

14.2 Architecture Pattern

  • MVVM

14.3 Modules

  • Registration
  • Presence
  • Call State Machine
  • WebRTC Engine
  • PiP Video View

14.4 Call State Machine

Idle → Registered → Calling → IncomingCall → Connecting → InCall → Ending → Error

  • Transitions driven only by: User action, Server signaling, WebRTC callbacks, Timeout events.

15. Backend Architecture

15.1 Stack

  • Node.js (TypeScript)
  • WebSocket signaling
  • In-memory presence store (POC)
  • HTTPS/WSS only in production (the POC uses ws://, as noted in section 6.1)

15.2 Server Responsibilities

  • Username uniqueness
  • Role enforcement
  • Presence tracking
  • Call state transitions
  • Mutex/locking on call requests
  • Service Discovery: Advertises via Bonjour (_bemyears._tcp) for zero-conf client connection.
  • TURN configuration delivery

16. TURN Server Specification

| Item | Specification |
| --- | --- |
| Software | coturn |
| Auth | Static long-term credentials (POC only) |
| Transport | UDP + TCP |
| Encryption | TURN-TLS enabled |
| Capacity | ≤5 concurrent calls |

Security Note: Static credentials are acceptable for TestFlight only and must be replaced before App Store release.


17. Service Discovery (Bonjour)

17.1 Mechanism

  • The iOS client uses NetServiceBrowser to discover the backend server on the local network.
  • Service Type: _bemyears._tcp
  • Domain: local.
  • Resolution: Resolves IPv4 address of the backend and auto-populates it for the user.
  • Fallback: Manual IP entry is supported; the entered address is persisted in UserDefaults.

18. Logging & Privacy

18.1 Allowed Server Logs

  • Timestamp (YYYY-MM-DD HH:MM:SS format)
  • Event type
  • callId (Required for call-related events)
  • sessionId
  • Role
  • Offender identity (for Abuse Reports)
  • Periodic User Statistics (every 5 minutes)

18.2 Explicitly Forbidden

  • SDP bodies
  • ICE candidates
  • Video metadata
  • Media statistics tied to identity
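
One way to keep forbidden data out of the logs is an allow-list: the log entry type has fields only for permitted values, so SDP bodies and ICE candidates have nowhere to go. The sketch below assumes UTC timestamps and a `key=value` line layout, neither of which is specified here; all identifiers are hypothetical.

```typescript
// Hypothetical entry shape containing only fields the spec allows.
type Role = "caller" | "interpreter";

interface ServerLogEntry {
  at: Date;
  eventType: string;
  sessionId: string;
  role: Role;
  callId?: string; // required by the spec for call-related events
}

// Formats a date as YYYY-MM-DD HH:MM:SS (UTC is an assumption).
function formatTimestamp(d: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  const date = `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}`;
  const time = `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`;
  return `${date} ${time}`;
}

// Renders one log line from allow-listed fields only.
function formatLogLine(e: ServerLogEntry): string {
  const parts = [
    formatTimestamp(e.at),
    e.eventType,
    `session=${e.sessionId}`,
    `role=${e.role}`,
  ];
  if (e.callId !== undefined) parts.push(`call=${e.callId}`);
  return parts.join(" ");
}
```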

18.3 Client-Side

  • No logging beyond OS crash reports.
  • lastCallId and lastRemoteUser are persisted temporarily for abuse reporting only.

19. Security Posture

  • TLS for all signaling (Production).
  • DTLS-SRTP for media.
  • No stored personal data.
  • No call content persistence.
  • Server-authoritative enforcement everywhere.

20. Acceptance Criteria

  • Users register successfully with unique usernames.
  • Presence updates are immediate.
  • Interpreters disappear from the caller-facing presence list as soon as a call request moves them to RINGING.
  • Race conditions result in exactly one successful call.
  • Calls succeed behind NAT using TURN (including TCP relay fallback).
  • PiP works reliably, so users can check the framing of their own signing.
  • All failure paths recover cleanly.