iOS 1:1 Video Chat for Deaf Users & Interpreters
Full Technical Specifications Document
Status: LOCKED
Version: v1.3 (Final)
Scope: Proof of Concept (Production-Aligned)
1. Purpose & Product Overview
This project delivers an iOS proof-of-concept application that enables secure, real-time, one-to-one video communication between:
- Callers (deaf users), and
- Interpreters (who wait for incoming calls)
The application is designed explicitly for sign language communication, prioritizing:
- Visual clarity
- Low latency
- Predictable call behavior
- Privacy and trust
The POC is architected to evolve directly into a production system without re-architecture.
2. Platform & Runtime Constraints
| Category | Specification |
|---|---|
| Target OS | iOS |
| Minimum iOS Version | 18.6 |
| UI Framework | SwiftUI |
| Devices | iPhone & iPad |
| Orientation | Portrait & Landscape |
| Background Execution | Not supported (foreground-only) |
| Distribution | TestFlight |
| Accessibility | Sign-language-first UI decisions |
| Audio | Implicit with video (no audio-only mode) |
| Network | Local Network Permission Required (Discovery) |
3. User Roles & Authorization Model
3.1 Roles
Roles are assigned at registration time and are ephemeral.
- Caller: Can initiate calls.
- Interpreter: Can receive calls only.
3.2 Enforcement Rules
- Only callers may initiate CALL_REQUEST.
- Only interpreters may respond with CALL_ACCEPT or CALL_DECLINE.
- Interpreters cannot initiate calls.
- Role enforcement is server-side authoritative.
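The enforcement rules above could be expressed on the server as a simple lookup from message type to the one role allowed to send it. This is a minimal sketch, not the project's actual implementation; the names Role, RoleGatedMessage, ROLE_PERMISSIONS, and isAllowed are assumptions introduced for illustration.

```typescript
// Hypothetical sketch of server-side role enforcement (all names are illustrative).
type Role = "caller" | "interpreter";
type RoleGatedMessage = "CALL_REQUEST" | "CALL_ACCEPT" | "CALL_DECLINE";

// The single role permitted to send each role-gated message type (section 3.2).
const ROLE_PERMISSIONS: Record<RoleGatedMessage, Role> = {
  CALL_REQUEST: "caller",
  CALL_ACCEPT: "interpreter",
  CALL_DECLINE: "interpreter",
};

// Returns true only when the sender's registered role matches the rule.
// The role must come from the server's own registration record,
// never from a field supplied by the client in the message.
function isAllowed(senderRole: Role, messageType: RoleGatedMessage): boolean {
  return ROLE_PERMISSIONS[messageType] === senderRole;
}
```

Because interpreters never map to CALL_REQUEST, the rule "interpreters cannot initiate calls" falls out of the table without a separate check.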
4. Identity & Presence
4.1 Username Rules
- Usernames must be globally unique.
- Validated by registrar server.
- Ephemeral (no persistence).
- Cannot be changed while registered.
- No authentication.
4.2 Presence Lifecycle
Presence is server-authoritative and maintained via:
- Heartbeat interval: 15 seconds
- Presence TTL: 20 seconds
- Presence delivery: WebSocket push (immediate)
A user is considered present only while:
- Registered
- Heartbeat valid
- WebSocket connection active
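As an illustration of the three presence conditions, the check below treats a user as present only while all of them hold. It is a sketch under assumptions: the PresenceEntry shape, the function name, and the choice to pass the current time in as a parameter are not taken from the project.

```typescript
// Hypothetical in-memory presence check (names and structure are assumptions).
const HEARTBEAT_INTERVAL_MS = 15_000; // clients send a heartbeat this often
const PRESENCE_TTL_MS = 20_000;       // presence expires this long after the last heartbeat

interface PresenceEntry {
  registered: boolean;
  socketOpen: boolean;
  lastHeartbeatMs: number; // server timestamp of the most recent heartbeat
}

// A user is present only while registered, connected, and within the TTL.
function isPresent(entry: PresenceEntry, nowMs: number): boolean {
  const heartbeatValid = nowMs - entry.lastHeartbeatMs <= PRESENCE_TTL_MS;
  return entry.registered && entry.socketOpen && heartbeatValid;
}

// Slack between the heartbeat interval and the TTL: how late one heartbeat
// may arrive before the user is dropped from presence.
const HEARTBEAT_GRACE_MS = PRESENCE_TTL_MS - HEARTBEAT_INTERVAL_MS;
```

With the values in this section the grace period is 5 seconds, so a single delayed heartbeat is tolerated but a missed one is not.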
5. Presence States
Each interpreter exists in exactly one state:
- UNAVAILABLE: Default state after login. Can NOT receive calls.
- AVAILABLE: Can receive calls.
- RINGING: Call request active.
- IN_CALL: Actively connected.
Presence List Behavior
- Only interpreters in AVAILABLE appear to callers.
- Interpreters in UNAVAILABLE, RINGING, or IN_CALL are hidden.
- State transitions are server-controlled.
- UNAVAILABLE is the enforced starting state for all interpreters.
- Interpreters must explicitly "Go Online" to become AVAILABLE.
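The presence list behavior can be sketched as a filter over server-held interpreter state. The Interpreter record and the function names below are hypothetical, as are the example usernames in any usage of them.

```typescript
// Hypothetical sketch of the caller-facing presence list (names are illustrative).
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

interface Interpreter {
  username: string;
  state: InterpreterState;
}

// Every interpreter starts UNAVAILABLE after registration.
function registerInterpreter(username: string): Interpreter {
  return { username, state: "UNAVAILABLE" };
}

// "Go Online" is the only way to reach AVAILABLE from the starting state.
// Calling it in any other state leaves the interpreter unchanged.
function goOnline(interpreter: Interpreter): Interpreter {
  return interpreter.state === "UNAVAILABLE"
    ? { ...interpreter, state: "AVAILABLE" }
    : interpreter;
}

// Callers only ever see interpreters that are AVAILABLE.
function visibleToCallers(all: Interpreter[]): string[] {
  return all.filter((i) => i.state === "AVAILABLE").map((i) => i.username);
}
```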
6. Video & Networking Architecture
6.1 Media Transport
- WebRTC Peer-to-Peer (1:1)
- DTLS-SRTP encryption (Media)
- Signaling Transport: WebSocket (ws://) for POC (Production requires wss://).
- No SFU or MCU
6.2 NAT Traversal
- STUN: for candidate discovery.
- TURN: self-hosted (coturn) as mandatory fallback.
- Requirements: TURN is required for Symmetric NATs, Carrier-grade NAT, and Enterprise firewalls.
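One way the server might assemble the STUN/TURN list it hands to clients is sketched below. The IceServer interface mirrors the shape of WebRTC's RTCIceServer dictionary but is declared locally so the snippet stands alone. The function name and the host passed in are assumptions. Ports 3478 and 5349 are the conventional STUN/TURN and TURN-over-TLS defaults, not values stated in this spec.

```typescript
// Hypothetical builder for the ICE server list delivered to clients.
// Local stand-in for the WebRTC RTCIceServer dictionary shape.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

function buildIceServers(host: string, username: string, credential: string): IceServer[] {
  return [
    // STUN for candidate discovery (no credentials needed).
    { urls: [`stun:${host}:3478`] },
    // TURN as the mandatory relay fallback: UDP first, then TCP, then TLS.
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
        `turns:${host}:5349?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}
```

Listing the TCP and TLS variants alongside UDP gives the ICE agent a relay path on networks that block UDP, which is the symmetric NAT and enterprise firewall case this section calls out.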
6.3 Video Capture & Encoding Strategy
- Capture: Best available front camera format (Target 1080p @ 30fps).
- Encoding/Streaming: Dynamic resolution adaptation based on device aspect ratio (e.g. 16:9 for iPhones, 4:3 for iPads). Default target is 720p equivalent.
- iPad Support:
  - Dynamic Resolution: Detects screen aspect ratio (using UIScreen.nativeBounds) to scale video output correctly (e.g. 1440x1080 for iPad 4:3), preventing distortion.
  - Stability Fix: Explicitly forces .high session preset (with fallback to .medium/.low) after capture start to override WebRTC defaults that crash iPad Mini.
  - Format Selection: Strictly prioritizes standard 16:9 capture formats (1280x720, 1920x1080) to ensure hardware compatibility, avoiding unstable 4:3 formats like 1280x960.
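The arithmetic behind the aspect-ratio scaling is language-neutral and is shown here in TypeScript; the iOS client would do the equivalent in Swift. The function name is hypothetical, and rounding the width to an even number is an assumption, based on the fact that video encoders commonly expect even dimensions.

```typescript
// Hypothetical helper: derive an output resolution from a screen aspect ratio
// and a target height, so the encoded video is not stretched.
function outputResolution(
  aspectWidth: number,
  aspectHeight: number,
  targetHeight: number
): { width: number; height: number } {
  const rawWidth = (targetHeight * aspectWidth) / aspectHeight;
  // Round to the nearest even number (assumption: encoder-friendly dimensions).
  const width = Math.round(rawWidth / 2) * 2;
  return { width, height: targetHeight };
}
```

For a 4:3 iPad at a 1080-line target this gives 1440x1080, matching the example in the list above. For a 16:9 iPhone at the 720p default it gives 1280x720.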
7. Call Lifecycle & Concurrency Model
7.1 Call Creation (Authoritative Server Flow)
- Caller selects interpreter.
- Server generates callId (UUID).
- Interpreter state transitions: AVAILABLE → RINGING
- Server starts 10-second ring timer.
- Interpreter receives call request.
7.2 Ring Outcomes
- Accept within 10s: State RINGING → IN_CALL. WebRTC negotiation begins.
- Decline: State resets to AVAILABLE.
- Timeout: Server auto-reverts to AVAILABLE.
- Race condition: Immediate BUSY error to caller.
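The call creation flow and ring outcomes can be sketched as a small in-memory registry. This is an illustration under assumptions, not the project's server code: the CallRegistry class and its method names are hypothetical, the callId is passed in rather than generated, and time is supplied as a parameter instead of using real timers.

```typescript
// Hypothetical sketch of authoritative call creation and ring resolution.
type State = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";
type RingOutcome = "ACCEPT" | "DECLINE" | "TIMEOUT";

const RING_TIMEOUT_MS = 10_000;

class CallRegistry {
  private states = new Map<string, State>();
  private ringing = new Map<string, { interpreter: string; deadlineMs: number }>();

  setState(interpreter: string, state: State): void {
    this.states.set(interpreter, state);
  }

  getState(interpreter: string): State | undefined {
    return this.states.get(interpreter);
  }

  // Synchronous check-and-set: within one Node.js process no other request
  // can run between the check and the transition, so only one caller wins.
  requestCall(callId: string, interpreter: string, nowMs: number): "RINGING" | "BUSY" {
    if (this.states.get(interpreter) !== "AVAILABLE") return "BUSY";
    this.states.set(interpreter, "RINGING");
    this.ringing.set(callId, { interpreter, deadlineMs: nowMs + RING_TIMEOUT_MS });
    return "RINGING";
  }

  // Applies an accept, decline, or timeout. Returns the interpreter's new
  // state, or undefined when the callId is unknown or already resolved.
  resolveRing(callId: string, outcome: RingOutcome, nowMs: number): State | undefined {
    const ring = this.ringing.get(callId);
    if (!ring) return undefined;
    this.ringing.delete(callId);
    // An accept that arrives after the ring deadline is treated as a timeout.
    const accepted = outcome === "ACCEPT" && nowMs <= ring.deadlineMs;
    const next: State = accepted ? "IN_CALL" : "AVAILABLE";
    this.states.set(ring.interpreter, next);
    return next;
  }
}
```

The atomicity here relies on a single process with an in-memory store, as the POC uses. A deployment with several server processes would need an external lock or a transactional store to keep the same guarantee.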
7.3 Call Termination
Any of the following reverts the interpreter to AVAILABLE:
- Hangup
- WebRTC failure
- ICE timeout
- Heartbeat expiration
- WebSocket disconnect
- App crash / force quit
8. Timeouts (Hard Guarantees)
| Stage | Timeout |
|---|---|
| Ringing | 10 seconds |
| Offer/Answer | 10 seconds after accept |
| ICE gathering/connection | 10–15 seconds |
| Max “connecting” state | 20 seconds total |
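The timeout table can be read as per-stage limits plus an overall cap on the connecting state. The sketch below makes two assumptions the spec does not state: the 20-second budget is counted from the moment the call is accepted, and the ICE limit uses the upper bound of the 10–15 second range. All names are hypothetical.

```typescript
// Hypothetical constants and check for the connecting-phase timeouts.
const TIMEOUTS_MS = {
  ringing: 10_000,
  offerAnswer: 10_000,     // counted from accept
  iceMax: 15_000,          // upper bound of the 10-15 s range (assumption)
  connectingTotal: 20_000, // overall cap on the "connecting" state
} as const;

type ConnectingStage = "offerAnswer" | "ice";

// Returns true when either the current stage or the overall connecting
// budget has been exceeded. All arguments are millisecond timestamps.
function connectingTimedOut(
  stage: ConnectingStage,
  acceptedAtMs: number,
  stageStartedAtMs: number,
  nowMs: number
): boolean {
  const stageLimit = stage === "offerAnswer" ? TIMEOUTS_MS.offerAnswer : TIMEOUTS_MS.iceMax;
  const stageExpired = nowMs - stageStartedAtMs > stageLimit;
  const totalExpired = nowMs - acceptedAtMs > TIMEOUTS_MS.connectingTotal;
  return stageExpired || totalExpired;
}
```

The per-stage limits add up to 25 seconds, more than the 20-second cap, so the total budget can end a call attempt while the ICE stage is still within its own limit.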
9. Signaling & Call Identification
9.1 callId
- UUID generated only by server.
- Required on all signaling messages.
- Used to correlate messages, enforce authorization, and prevent race conditions.
9.2 Authorization Rules
For a given callId:
- Only caller + interpreter may exchange signaling.
- Messages with unknown/expired callId are rejected.
- Messages violating role rules are rejected.
- Invalid messages are explicitly errored (not forwarded).
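A minimal sketch of the per-callId authorization check follows. The ActiveCall record, the Verdict values, and the function name are assumptions for illustration. Role-rule checks are left out to keep the example focused on participant matching.

```typescript
// Hypothetical authorization check for signaling messages tied to a callId.
interface ActiveCall {
  caller: string;
  interpreter: string;
  expired: boolean;
}

interface RoutedMessage {
  callId: string;
  from: string;
  to: string;
}

type Verdict = "FORWARD" | "REJECT_UNKNOWN_CALL" | "REJECT_NOT_PARTICIPANT";

function authorize(calls: Map<string, ActiveCall>, msg: RoutedMessage): Verdict {
  const call = calls.get(msg.callId);
  // Unknown or expired callId: reject explicitly, never forward.
  if (!call || call.expired) return "REJECT_UNKNOWN_CALL";

  // Both endpoints must be the two participants of this call, one each way.
  const fromOk = msg.from === call.caller || msg.from === call.interpreter;
  const toOk = msg.to === call.caller || msg.to === call.interpreter;
  const distinct = msg.from !== msg.to;
  return fromOk && toOk && distinct ? "FORWARD" : "REJECT_NOT_PARTICIPANT";
}
```

In a real server the from value would be taken from the authenticated WebSocket session rather than trusted from the message body; the sketch assumes that substitution has already happened.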
10. Signaling Protocol
WebSocket Message Envelope
{
"type": "CALL_REQUEST | CALL_ACCEPT | CALL_DECLINE | OFFER | ANSWER | ICE | HANGUP | BUSY | REPORT_ABUSE | STATS_UPDATE | VIDEO_VISIBLE",
"callId": "uuid",
"from": "username",
"to": "username",
"payload": {}
}
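The envelope above could be typed and validated on receipt roughly as follows. The type and function names are hypothetical. The sketch only checks the envelope shape and leaves payload contents to per-message handlers.

```typescript
// Hypothetical typed view of the signaling envelope with a runtime validator.
const MESSAGE_TYPES = [
  "CALL_REQUEST", "CALL_ACCEPT", "CALL_DECLINE", "OFFER", "ANSWER", "ICE",
  "HANGUP", "BUSY", "REPORT_ABUSE", "STATS_UPDATE", "VIDEO_VISIBLE",
] as const;

type MessageType = (typeof MESSAGE_TYPES)[number];

interface SignalingEnvelope {
  type: MessageType;
  callId: string;
  from: string;
  to: string;
  payload: Record<string, unknown>;
}

// Parses a raw WebSocket text frame. Returns null for anything that is not
// a well-formed envelope, so the caller can reply with an explicit error.
function parseEnvelope(raw: string): SignalingEnvelope | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;

  const type = fields["type"];
  const payload = fields["payload"];
  const typeOk =
    typeof type === "string" && (MESSAGE_TYPES as readonly string[]).indexOf(type) !== -1;
  const idsOk =
    typeof fields["callId"] === "string" &&
    typeof fields["from"] === "string" &&
    typeof fields["to"] === "string";
  const payloadOk =
    typeof payload === "object" && payload !== null && !Array.isArray(payload);

  return typeOk && idsOk && payloadOk ? (fields as unknown as SignalingEnvelope) : null;
}
```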
11. Presence Delivery
- Presence updates are pushed immediately via WebSocket.
- Clients do not poll.
- Server is the single source of truth.
- Client does not infer presence locally.
12. Video UX Requirements
| Aspect | Specification |
|---|---|
| Camera | Front only |
| Remote View | Full-screen |
| Local Preview | Picture-in-Picture (PiP) |
| PiP Default | Top-right |
| PiP Behavior | Draggable, snap-to-corners |
| Safe Area | Enforced |
| Mirroring | Enabled (front camera) |
| Controls | Hang-up only |
13. Bandwidth & Connection Quality Rules
Setup Phase
- TCP Relay: Permitted (allowed as fallback for restrictive firewalls/campus networks).
Connected Phase
- Packet Loss: Accepted.
- Error Correction: WebRTC handles packet loss via NACK and FEC (Forward Error Correction).
- Degradation Policy:
- If quality drops below usable thresholds (defined as tunable constants): Show "Poor Connection" UI Warning.
- Connection remains active unless fully severed by network timeout.
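The degradation policy amounts to a warning flag driven by tunable thresholds, with no automatic hangup. The metrics and threshold values below are placeholders chosen for illustration only; the spec does not define them.

```typescript
// Hypothetical thresholds: placeholder values, not taken from the spec.
const POOR_CONNECTION_THRESHOLDS = {
  maxPacketLossRatio: 0.1, // fraction of packets lost
  maxRoundTripMs: 500,
  minFramesPerSecond: 15,
};

interface QualitySample {
  packetLossRatio: number;
  roundTripMs: number;
  framesPerSecond: number;
}

// Returns true when the "Poor Connection" warning should be shown.
// It only drives the UI: the call itself is never ended by this check.
function shouldWarnPoorConnection(sample: QualitySample): boolean {
  return (
    sample.packetLossRatio > POOR_CONNECTION_THRESHOLDS.maxPacketLossRatio ||
    sample.roundTripMs > POOR_CONNECTION_THRESHOLDS.maxRoundTripMs ||
    sample.framesPerSecond < POOR_CONNECTION_THRESHOLDS.minFramesPerSecond
  );
}
```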
14. iOS Client Architecture
14.1 Frameworks
- SwiftUI
- WebRTC iOS SDK
- Combine / async-await as needed
14.2 Architecture Pattern
- MVVM
14.3 Modules
- Registration
- Presence
- Call State Machine
- WebRTC Engine
- PiP Video View
14.4 Call State Machine
Idle → Registered → Calling → IncomingCall → Connecting → InCall → Ending → Error
- Transitions driven only by: User action, Server signaling, WebRTC callbacks, Timeout events.
15. Backend Architecture
15.1 Stack
- Node.js (TypeScript)
- WebSocket signaling
- In-memory presence store (POC)
- HTTPS/WSS only in production (the POC uses ws://, per section 6.1)
15.2 Server Responsibilities
- Username uniqueness
- Role enforcement
- Presence tracking
- Call state transitions
- Mutex/locking on call requests
- Service Discovery: Advertises via Bonjour (_bemyears._tcp) for zero-conf client connection.
- TURN configuration delivery
16. TURN Server Specification
| Item | Specification |
|---|---|
| Software | coturn |
| Auth | Static long-term credentials (POC only) |
| Transport | UDP + TCP |
| Encryption | TURN-TLS enabled |
| Capacity | ≤5 concurrent calls |
Security Note: Static credentials are acceptable for TestFlight only and must be replaced before App Store release.
17. Service Discovery (Bonjour)
17.1 Mechanism
- The iOS client uses NetServiceBrowser to discover the backend server on the local network.
- Service Type: _bemyears._tcp
- Domain: local.
- Resolution: Resolves IPv4 address of the backend and auto-populates it for the user.
- Fallback: Manual IP entry is supported via UserDefaults persistence.
18. Logging & Privacy
18.1 Allowed Server Logs
- Timestamp (YYYY-MM-DD HH:MM:SS format)
- Event type
- callId (Required for call-related events)
- sessionId
- Role
- Offender identity (for Abuse Reports)
- Periodic User Statistics (every 5 minutes)
18.2 Explicitly Forbidden
- SDP bodies
- ICE candidates
- Video metadata
- Media statistics tied to identity
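One way to keep forbidden data out of server logs is an allowlist: only named fields are ever written, so an SDP body or ICE candidate attached to an event is dropped by construction. The field names and function names below are hypothetical, and formatting the timestamp in UTC is an assumption, since the spec gives the format but not the time zone.

```typescript
// Hypothetical allowlist-based log formatter (field names are illustrative).
const ALLOWED_LOG_FIELDS = ["eventType", "callId", "sessionId", "role", "offender"] as const;

function twoDigits(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Formats a date as YYYY-MM-DD HH:MM:SS, using UTC (assumption).
function formatTimestamp(d: Date): string {
  const date = `${d.getUTCFullYear()}-${twoDigits(d.getUTCMonth() + 1)}-${twoDigits(d.getUTCDate())}`;
  const time = `${twoDigits(d.getUTCHours())}:${twoDigits(d.getUTCMinutes())}:${twoDigits(d.getUTCSeconds())}`;
  return `${date} ${time}`;
}

// Builds one log line from an event, keeping only allowlisted string fields.
// Anything else on the event (e.g. an SDP body) is silently omitted.
function formatLogLine(at: Date, event: Record<string, unknown>): string {
  const parts = [formatTimestamp(at)];
  for (const field of ALLOWED_LOG_FIELDS) {
    const value = event[field];
    if (typeof value === "string") parts.push(`${field}=${value}`);
  }
  return parts.join(" ");
}
```

An allowlist fails safe: a new message field added later stays out of the logs until someone deliberately adds it to ALLOWED_LOG_FIELDS.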
18.3 Client-Side
- No logging beyond OS crash reports.
- lastCallId and lastRemoteUser are persisted temporarily for abuse reporting only.
19. Security Posture
- TLS for all signaling (Production).
- DTLS-SRTP for media.
- No stored personal data.
- No call content persistence.
- Server-authoritative enforcement everywhere.
20. Acceptance Criteria
- Users register successfully with unique usernames.
- Presence updates are immediate.
- Interpreters disappear from the caller-facing presence list as soon as a call request moves them to RINGING.
- Race conditions result in exactly one successful call.
- Calls succeed behind NAT using TURN (including TCP relay fallback).
- The PiP local preview works reliably, so users can keep their signing in frame.
- All failure paths recover cleanly.