Add iOS app with Node.js/TypeScript backend for BeMyEars project. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
371 lines
10 KiB
Markdown
# iOS 1:1 Video Chat for Deaf Users & Interpreters

## Full Technical Specifications Document

**Status:** LOCKED
**Version:** v1.3 (Final)
**Scope:** Proof of Concept (Production-Aligned)

-----

### 1\. Purpose & Product Overview

This project delivers an iOS proof-of-concept application that enables secure, real-time, one-to-one video communication between:

1. **Callers** (deaf users), and
2. **Interpreters** (who wait for incoming calls)

The application is designed explicitly for sign language communication, prioritizing:

* Visual clarity
* Low latency
* Predictable call behavior
* Privacy and trust

The POC is architected to evolve directly into a production system without re-architecture.

-----

### 2\. Platform & Runtime Constraints

| Category | Specification |
| :--- | :--- |
| **Target OS** | iOS |
| **Minimum iOS Version** | 18.6 |
| **UI Framework** | SwiftUI |
| **Devices** | iPhone & iPad |
| **Orientation** | Portrait & Landscape |
| **Background Execution** | Not supported (foreground-only) |
| **Distribution** | TestFlight |
| **Accessibility** | Sign-language-first UI decisions |
| **Audio** | Implicit with video (no audio-only mode) |
| **Network** | **Local Network Permission Required** (for Bonjour service discovery) |

-----

### 3\. User Roles & Authorization Model

**3.1 Roles**

Roles are assigned at registration time and are ephemeral.

* **Caller:** Can initiate calls.
* **Interpreter:** Can receive calls only.

**3.2 Enforcement Rules**

* Only callers may send `CALL_REQUEST`.
* Only interpreters may respond with `CALL_ACCEPT` or `CALL_DECLINE`.
* Interpreters cannot initiate calls.
* The server is authoritative for role enforcement.

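The rules in 3.2 reduce to a lookup from message type to the only role allowed to send it. The following is a minimal TypeScript sketch for the Node.js backend. The names `Role`, `ROLE_RESTRICTED`, and `isAllowed` are hypothetical. Message types that 3.2 does not mention pass this check and are left to the per-call authorization in Section 9.

```typescript
// Hypothetical names; encodes only the role rules from Section 3.2.
type Role = "caller" | "interpreter";

// Message types whose sender role is restricted.
const ROLE_RESTRICTED = new Map<string, Role>([
  ["CALL_REQUEST", "caller"], // only callers initiate calls
  ["CALL_ACCEPT", "interpreter"], // only interpreters answer
  ["CALL_DECLINE", "interpreter"],
]);

/** Server-side check: false if a sender with this role may not send this type. */
function isAllowed(senderRole: Role, messageType: string): boolean {
  const required = ROLE_RESTRICTED.get(messageType);
  return required === undefined || required === senderRole;
}
```
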
-----

### 4\. Identity & Presence

**4.1 Username Rules**

* Usernames must be globally unique.
* Uniqueness is validated by the registrar server.
* Usernames are ephemeral (not persisted).
* A username cannot be changed while registered.
* There is no authentication.

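The username rules can be met with a single in-memory map, which matches the in-memory store listed for the POC backend. The following is a minimal sketch with a hypothetical `UsernameRegistry` class. A Node.js handler runs to completion before the next one starts, so the check-then-insert in `register` cannot interleave with a competing registration. There is deliberately no rename method, because a username cannot change while registered.

```typescript
type Role = "caller" | "interpreter";

interface Registration {
  username: string;
  role: Role;
}

/** Hypothetical in-memory registrar: names are unique, unauthenticated, and never persisted. */
class UsernameRegistry {
  private readonly users = new Map<string, Registration>();

  /** Returns false if the name is taken; the server should then reject the registration. */
  register(username: string, role: Role): boolean {
    if (this.users.has(username)) return false;
    this.users.set(username, { username, role });
    return true;
  }

  /** Frees the name, e.g. on disconnect or presence expiry. */
  unregister(username: string): void {
    this.users.delete(username);
  }

  roleOf(username: string): Role | undefined {
    return this.users.get(username)?.role;
  }
}
```
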
**4.2 Presence Lifecycle**

Presence is server-authoritative and maintained via:

* **Heartbeat interval:** 15 seconds
* **Presence TTL:** 20 seconds
* **Presence delivery:** WebSocket push (immediate)

A user is considered present only while all of the following hold:

1. Registered
2. Heartbeat valid (last heartbeat received within the 20-second TTL)
3. WebSocket connection active

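The presence rule reads naturally as a pure predicate. The following sketch assumes a hypothetical `PresenceRecord` that stores the arrival time of the last heartbeat in epoch milliseconds. Heartbeats arrive every 15 s and the TTL is 20 s, so one heartbeat may be up to 5 s late before the user drops out of presence. Treating an age of exactly 20 s as still valid is an assumption.

```typescript
const PRESENCE_TTL_MS = 20_000; // clients heartbeat every 15 s

interface PresenceRecord {
  registered: boolean;
  lastHeartbeatAt: number; // epoch ms of the most recent heartbeat
  socketOpen: boolean; // WebSocket connection currently active
}

/** Present only while registered, heartbeat within TTL, and socket open (Section 4.2). */
function isPresent(rec: PresenceRecord, nowMs: number): boolean {
  const heartbeatValid = nowMs - rec.lastHeartbeatAt <= PRESENCE_TTL_MS;
  return rec.registered && heartbeatValid && rec.socketOpen;
}
```
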
-----

### 5\. Presence States

Each interpreter is in exactly one of the following states:

0. **UNAVAILABLE:** Default state after registration. Cannot receive calls.
1. **AVAILABLE:** Can receive calls.
2. **RINGING:** A call request is pending.
3. **IN_CALL:** In an active call.

**Presence List Behavior**

* Only interpreters in `AVAILABLE` appear to callers.
* Interpreters in `UNAVAILABLE`, `RINGING`, or `IN_CALL` are hidden.
* State transitions are server-controlled.
* `UNAVAILABLE` is the enforced starting state for all interpreters.
* Interpreters must explicitly "Go Online" to become `AVAILABLE`.

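Server control of transitions is easiest to enforce with an explicit table of allowed moves. The following sketch collects the edges stated in Sections 5 and 7. The `AVAILABLE` → `UNAVAILABLE` edge for going offline is an assumption, since this spec only describes "Go Online". `TRANSITIONS` and `canTransition` are hypothetical names.

```typescript
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

// Allowed server-side transitions; AVAILABLE -> UNAVAILABLE is assumed, not specified.
const TRANSITIONS: Record<InterpreterState, readonly InterpreterState[]> = {
  UNAVAILABLE: ["AVAILABLE"], // explicit "Go Online"
  AVAILABLE: ["RINGING", "UNAVAILABLE"], // call request; going offline (assumed)
  RINGING: ["IN_CALL", "AVAILABLE"], // accept; decline or ring timeout
  IN_CALL: ["AVAILABLE"], // any termination (Section 7.3)
};

/** Rejects any state change the table does not list. */
function canTransition(from: InterpreterState, to: InterpreterState): boolean {
  return TRANSITIONS[from].includes(to);
}
```
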
-----

### 6\. Video & Networking Architecture

**6.1 Media Transport**

* WebRTC Peer-to-Peer (1:1)
* DTLS-SRTP encryption for media
* **Signaling Transport:** WebSocket (`ws://`) for the POC; production requires `wss://`.
* No SFU or MCU

**6.2 NAT Traversal**

* **STUN:** Used for candidate discovery.
* **TURN:** Self-hosted (coturn) as a mandatory fallback.
* **Requirements:** TURN is required behind symmetric NATs, carrier-grade NAT (CGNAT), and enterprise firewalls.

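The server delivers TURN configuration to clients (Section 15.2). The following sketch shows what that list could look like, using the same field names as the W3C `RTCIceServer` dictionary (`urls`, `username`, `credential`). The host, credentials, and `buildIceServers` helper are placeholders. 3478 and 5349 are the conventional STUN/TURN and TURN-over-TLS ports, and coturn listens on them by default. How the list reaches the client (for example, in a registration response) is not specified here.

```typescript
/** Same field names as the W3C RTCIceServer dictionary. */
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

/** Hypothetical builder for the list handed to clients; host and credentials are placeholders. */
function buildIceServers(host: string, username: string, credential: string): IceServer[] {
  return [
    { urls: [`stun:${host}:3478`] }, // candidate discovery
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`, // TCP relay fallback (Section 13)
        `turns:${host}:5349?transport=tcp`, // TURN over TLS (Section 16)
      ],
      username, // static long-term credentials, POC only
      credential,
    },
  ];
}
```
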
**6.3 Video Capture & Encoding Strategy**

* **Capture:** Best available front camera format (target 1080p @ 30 fps).
* **Encoding/Streaming:** Dynamic resolution adaptation based on device aspect ratio (e.g., 16:9 for iPhones, 4:3 for iPads). The default target is 720p-equivalent.
* **iPad Support:**
    * **Dynamic Resolution:** Detects the screen aspect ratio (using `UIScreen.nativeBounds`) and scales the video output to match (e.g., 1440x1080 for a 4:3 iPad), preventing distortion.
    * **Stability Fix:** Explicitly forces the `.high` session preset (falling back to `.medium`, then `.low`) *after* capture starts, overriding WebRTC defaults that crash the iPad mini.
    * **Format Selection:** Strictly prioritizes standard 16:9 capture formats (1280x720, 1920x1080) for hardware compatibility, avoiding unstable 4:3 formats such as 1280x960.

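The aspect-ratio arithmetic above can be illustrated in a few lines. The client itself is Swift; this TypeScript version only shows the math. The following assumptions are not from the spec: the screen ratio snaps to whichever of 16:9 or 4:3 is nearer, and the width is rounded to an even number. Even dimensions are the safe choice for 4:2:0 video. The helper names are hypothetical.

```typescript
type Dimensions = { width: number; height: number };

const RATIO_16_9 = 16 / 9;
const RATIO_4_3 = 4 / 3;

/** Snap the long/short side ratio of the screen to the nearer of 16:9 or 4:3 (assumption). */
function snapAspectRatio(screenWidth: number, screenHeight: number): number {
  const ratio = Math.max(screenWidth, screenHeight) / Math.min(screenWidth, screenHeight);
  return Math.abs(ratio - RATIO_16_9) < Math.abs(ratio - RATIO_4_3) ? RATIO_16_9 : RATIO_4_3;
}

/** Landscape output size for a target height, with the width rounded down to an even number. */
function outputDimensions(screenWidth: number, screenHeight: number, targetHeight: number): Dimensions {
  const rawWidth = Math.round(targetHeight * snapAspectRatio(screenWidth, screenHeight));
  return { width: rawWidth - (rawWidth % 2), height: targetHeight };
}
```

For example, a 1179x2556 portrait screen snaps to 16:9 and yields 1280x720 at a 720-pixel target. A 1488x2266 screen snaps to 4:3 and yields the 1440x1080 output cited above at a 1080-pixel target.
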
-----

### 7\. Call Lifecycle & Concurrency Model

**7.1 Call Creation (Authoritative Server Flow)**

1. Caller selects an interpreter and sends `CALL_REQUEST`.
2. Server generates the `callId` (UUID).
3. **Interpreter state transitions:** `AVAILABLE` → `RINGING`
4. Server starts a 10-second ring timer.
5. Interpreter receives the call request.

**7.2 Ring Outcomes**

* **Accept within 10s:** State `RINGING` → `IN_CALL`. WebRTC negotiation begins.
* **Decline:** State `RINGING` → `AVAILABLE`.
* **Timeout:** Server auto-reverts the state to `AVAILABLE`.
* **Race condition:** If the interpreter is no longer `AVAILABLE` when a `CALL_REQUEST` is processed (e.g., two callers select the same interpreter), the losing caller receives an immediate `BUSY` error.

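Sections 7.1 and 7.2 combine into one server-side unit: an atomic check-and-set on the interpreter's state plus a ring timer that is settled exactly once. The following minimal sketch uses a hypothetical `CallCoordinator` class. It relies on Node.js running each handler to completion, which makes the check-and-set safe without an explicit lock, as long as no `await` sits between the check and the set. Sending `CALL_REQUEST`, `BUSY`, or timeout notifications over the WebSocket is omitted. Accepted calls would be handed to the termination logic of Section 7.3.

```typescript
import { randomUUID } from "node:crypto";

type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";
type RequestResult = { ok: true; callId: string } | { ok: false; error: "BUSY" };

interface RingingCall {
  interpreter: string;
  timer: ReturnType<typeof setTimeout>;
}

/** Hypothetical coordinator for the ringing phase (Sections 7.1 and 7.2). */
class CallCoordinator {
  private readonly ringing = new Map<string, RingingCall>();
  private readonly states: Map<string, InterpreterState>;
  private readonly ringTimeoutMs: number;

  constructor(states: Map<string, InterpreterState>, ringTimeoutMs = 10_000) {
    this.states = states;
    this.ringTimeoutMs = ringTimeoutMs;
  }

  /** No await between check and set, so two requests cannot both succeed. */
  requestCall(interpreter: string): RequestResult {
    if (this.states.get(interpreter) !== "AVAILABLE") return { ok: false, error: "BUSY" };
    const callId = randomUUID(); // server-generated callId (Section 9.1)
    this.states.set(interpreter, "RINGING");
    const timer = setTimeout(() => this.settle(callId, "AVAILABLE"), this.ringTimeoutMs);
    this.ringing.set(callId, { interpreter, timer });
    return { ok: true, callId };
  }

  accept(callId: string, from: string): boolean {
    return this.settle(callId, "IN_CALL", from);
  }

  decline(callId: string, from: string): boolean {
    return this.settle(callId, "AVAILABLE", from);
  }

  /** Settles a ringing call once; late, repeated, or foreign replies return false. */
  private settle(callId: string, next: InterpreterState, from?: string): boolean {
    const call = this.ringing.get(callId);
    if (!call || (from !== undefined && from !== call.interpreter)) return false;
    clearTimeout(call.timer);
    this.ringing.delete(callId);
    this.states.set(call.interpreter, next);
    return true;
  }
}
```
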
**7.3 Call Termination**

Any of the following ends the call and reverts the interpreter to `AVAILABLE` (if the interpreter itself is the party that disconnected, it no longer meets the presence conditions in 4.2 and is removed from presence instead):

* Hangup
* WebRTC failure
* ICE timeout
* Heartbeat expiration
* WebSocket disconnect
* App crash / force quit

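Several of these triggers can fire for the same call. For example, a hangup may be followed immediately by a WebSocket close, so termination has to be idempotent. The following sketch uses a hypothetical `endCall` function. The server cannot observe a crash or force quit directly; it sees them as a WebSocket disconnect or a heartbeat expiration. The sketch assumes the presence layer has already deleted a disconnected interpreter from `states`, so it only reverts interpreters that are still present.

```typescript
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

// A crash or force quit reaches the server as one of the last two reasons.
type EndReason =
  | "HANGUP"
  | "WEBRTC_FAILURE"
  | "ICE_TIMEOUT"
  | "HEARTBEAT_EXPIRED"
  | "WEBSOCKET_DISCONNECT";

interface ActiveCall {
  caller: string;
  interpreter: string;
}

/** Hypothetical handler: returns true only for the trigger that actually ended the call. */
function endCall(
  calls: Map<string, ActiveCall>,
  states: Map<string, InterpreterState>,
  callId: string,
  reason: EndReason,
): boolean {
  const call = calls.get(callId);
  if (!call) return false; // already ended by an earlier trigger
  calls.delete(callId);
  // Interpreters removed from presence (Section 4.2) are not re-listed.
  if (states.has(call.interpreter)) states.set(call.interpreter, "AVAILABLE");
  console.log(`call ended: callId=${callId} reason=${reason}`); // no SDP/ICE (Section 18)
  return true;
}
```
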
-----

### 8\. Timeouts (Hard Guarantees)

| Stage | Timeout |
| :--- | :--- |
| **Ringing** | 10 seconds |
| **Offer/Answer** | 10 seconds after accept |
| **ICE gathering/connection** | 10–15 seconds |
| **Max “connecting” state** | 20 seconds total |

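The per-stage timeouts and the 20-second overall cap apply together. Offer/answer (10 s) plus ICE (up to 15 s) can exceed 20 s, so the overall cap often decides. The following sketch computes the time left in the current connecting stage. Two assumptions apply: the cap is measured from the moment the interpreter accepts, and the ICE budget uses the upper end of the 10–15 s range. The names are hypothetical.

```typescript
// Values from Section 8; ICE uses the upper end of the 10-15 s range (assumption).
const TIMEOUTS_MS = {
  ringing: 10_000,
  offerAnswer: 10_000,
  ice: 15_000,
  maxConnecting: 20_000,
} as const;

type ConnectingStage = "offerAnswer" | "ice";

/**
 * Milliseconds left before the current stage must be aborted. The stage timeout and the
 * overall cap both apply, so the earlier deadline wins. connectingStartedAt = time of accept.
 */
function remainingMs(
  stage: ConnectingStage,
  stageStartedAt: number,
  connectingStartedAt: number,
  now: number,
): number {
  const stageLeft = TIMEOUTS_MS[stage] - (now - stageStartedAt);
  const totalLeft = TIMEOUTS_MS.maxConnecting - (now - connectingStartedAt);
  return Math.max(0, Math.min(stageLeft, totalLeft));
}
```

For example, if offer/answer takes 8 s, ICE starts with 12 s left rather than 15 s.
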
-----

### 9\. Signaling & Call Identification

**9.1 callId**

* UUID generated only by the server.
* Required on all signaling messages.
* Used to correlate messages, enforce authorization, and prevent race conditions.

**9.2 Authorization Rules**

For a given `callId`:

* Only that call's caller and interpreter may exchange signaling.
* Messages with an unknown or expired `callId` are rejected.
* Messages violating role rules are rejected.
* Invalid messages are answered with an explicit error and are never forwarded.

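The following minimal sketch applies these rules before a message is relayed. It assumes ended calls are deleted from the `calls` map, so unknown and expired IDs look the same. It also assumes `from` has already been checked against the sender's WebSocket session, because a client could otherwise spoof it. The `authorize` function and its reason strings are hypothetical.

```typescript
interface CallRecord {
  caller: string;
  interpreter: string;
}

interface Envelope {
  type: string;
  callId: string;
  from: string; // assumed already matched to the sender's WebSocket session
  to: string;
}

type Verdict = { ok: true } | { ok: false; reason: string };

/** Applies the Section 9.2 rules; a failed verdict is returned as an error, never forwarded. */
function authorize(msg: Envelope, calls: ReadonlyMap<string, CallRecord>): Verdict {
  const call = calls.get(msg.callId);
  if (!call) return { ok: false, reason: "UNKNOWN_CALL" }; // unknown or expired
  const parties = [call.caller, call.interpreter];
  if (msg.from === msg.to || !parties.includes(msg.from) || !parties.includes(msg.to)) {
    return { ok: false, reason: "NOT_A_PARTICIPANT" };
  }
  const interpreterOnly = msg.type === "CALL_ACCEPT" || msg.type === "CALL_DECLINE";
  if (interpreterOnly && msg.from !== call.interpreter) {
    return { ok: false, reason: "ROLE_VIOLATION" };
  }
  return { ok: true };
}
```
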
-----

### 10\. Signaling Protocol

**WebSocket Message Envelope**

Every signaling message uses the envelope below. `type` carries exactly one of the listed values, and the contents of `payload` depend on the message type.

```json
{
  "type": "CALL_REQUEST | CALL_ACCEPT | CALL_DECLINE | OFFER | ANSWER | ICE | HANGUP | BUSY | REPORT_ABUSE | STATS_UPDATE | VIDEO_VISIBLE",
  "callId": "uuid",
  "from": "username",
  "to": "username",
  "payload": {}
}
```

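On the Node.js side, the envelope maps to a TypeScript type plus a runtime check. Incoming frames are untrusted text, so a type annotation alone validates nothing. The following sketch follows the field list above and Section 9.1 literally: every message needs a non-empty `callId`, and `payload` must be an object. The helper names are hypothetical.

```typescript
const MESSAGE_TYPES = [
  "CALL_REQUEST", "CALL_ACCEPT", "CALL_DECLINE", "OFFER", "ANSWER", "ICE",
  "HANGUP", "BUSY", "REPORT_ABUSE", "STATS_UPDATE", "VIDEO_VISIBLE",
] as const;

type MessageType = (typeof MESSAGE_TYPES)[number];

interface SignalingMessage {
  type: MessageType;
  callId: string;
  from: string;
  to: string;
  payload: Record<string, unknown>;
}

function isMessageType(value: unknown): value is MessageType {
  return typeof value === "string" && (MESSAGE_TYPES as readonly string[]).includes(value);
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

/** Parses one WebSocket text frame; returns null for anything that does not match the envelope. */
function parseMessage(raw: string): SignalingMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, callId, from, to, payload } = data as Record<string, unknown>;
  if (!isMessageType(type)) return null; // exactly one listed type
  if (!isNonEmptyString(callId) || !isNonEmptyString(from) || !isNonEmptyString(to)) return null;
  if (typeof payload !== "object" || payload === null || Array.isArray(payload)) return null;
  return { type, callId, from, to, payload: payload as Record<string, unknown> };
}
```
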
-----

### 11\. Presence Delivery

* Presence updates are pushed immediately via WebSocket.
* Clients do not poll.
* The server is the single source of truth.
* Clients do not infer presence locally.

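Clients never infer presence locally. One simple way to honor that is for the server to push the full visible list to every connected caller on each change, rather than deltas the client must merge. The following sketch takes that approach. The `PresenceHub` class, the `Connection` interface, and the `PRESENCE` message shape are hypothetical; the envelope in Section 10 does not list a presence message type.

```typescript
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

/** Hypothetical connection handle: anything that can send a text frame. */
interface Connection {
  send(data: string): void;
}

/** Server-side presence store that pushes a full snapshot to every caller on each change. */
class PresenceHub {
  private readonly interpreters = new Map<string, InterpreterState>();
  private readonly callers = new Set<Connection>();

  subscribe(conn: Connection): void {
    this.callers.add(conn);
    conn.send(this.snapshot()); // new subscribers get the current list immediately
  }

  unsubscribe(conn: Connection): void {
    this.callers.delete(conn);
  }

  /** null removes the interpreter entirely (e.g. presence expired). */
  setState(name: string, state: InterpreterState | null): void {
    if (state === null) this.interpreters.delete(name);
    else this.interpreters.set(name, state);
    const msg = this.snapshot();
    for (const conn of this.callers) conn.send(msg);
  }

  /** Only AVAILABLE interpreters are visible to callers (Section 5). */
  private snapshot(): string {
    const available = [...this.interpreters]
      .filter(([, s]) => s === "AVAILABLE")
      .map(([name]) => name)
      .sort();
    return JSON.stringify({ type: "PRESENCE", interpreters: available });
  }
}
```
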
-----

### 12\. Video UX Requirements

| Aspect | Specification |
| :--- | :--- |
| **Camera** | Front only |
| **Remote View** | Full-screen |
| **Local Preview** | Picture-in-Picture (PiP) |
| **PiP Default** | Top-right |
| **PiP Behavior** | Draggable, snap-to-corners |
| **Safe Area** | Enforced |
| **Mirroring** | Enabled (front camera) |
| **Controls** | Hang-up only |

-----

### 13\. Bandwidth & Connection Quality Rules

**Setup Phase**

* **TCP Relay:** Permitted as a fallback for restrictive firewalls and campus networks.

**Connected Phase**

* **Packet Loss:** Tolerated.
* **Error Correction:** WebRTC mitigates packet loss via NACK (retransmission requests) and FEC (Forward Error Correction).
* **Degradation Policy:**
    * If quality drops below usability thresholds (defined as tunable constants), show a "Poor Connection" UI warning.
    * The connection remains active unless it is fully severed and hits a network timeout.

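A "Poor Connection" check needs loss measured over a recent window. WebRTC's `packetsLost` and `packetsReceived` counters are cumulative over the whole call, so the check compares two consecutive samples. The following sketch takes that approach. The field names follow the W3C statistics objects (`packetsReceived` and `packetsLost` on inbound RTP stats, `currentRoundTripTime` in seconds on candidate-pair stats). The threshold values and helper names are hypothetical placeholders for the tunable constants.

```typescript
// Hypothetical tunable constants; the spec leaves the actual values open.
const POOR_LOSS_RATIO = 0.05; // more than 5% of packets lost in the window
const POOR_RTT_SECONDS = 0.4; // round-trip time above 400 ms

/** One stats sample; counters are cumulative since the start of the call. */
interface StatsSample {
  packetsReceived: number;
  packetsLost: number;
  currentRoundTripTime: number; // seconds
}

type Quality = "good" | "poor";

/** Compares consecutive samples; "poor" should show the UI warning, never end the call. */
function assessQuality(prev: StatsSample, curr: StatsSample): Quality {
  const received = curr.packetsReceived - prev.packetsReceived;
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost); // packetsLost can dip on duplicates
  const total = received + lost;
  const lossRatio = total > 0 ? lost / total : 0;
  const poor = lossRatio > POOR_LOSS_RATIO || curr.currentRoundTripTime > POOR_RTT_SECONDS;
  return poor ? "poor" : "good";
}
```
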
-----

### 14\. iOS Client Architecture

**14.1 Frameworks**

* SwiftUI
* WebRTC iOS SDK
* Combine / async/await as needed

**14.2 Architecture Pattern**

* MVVM

**14.3 Modules**

* Registration
* Presence
* Call State Machine
* WebRTC Engine
* PiP Video View

**14.4 Call State Machine**

`Idle` → `Registered` → `Calling` (callers) or `IncomingCall` (interpreters) → `Connecting` → `InCall` → `Ending`; `Error` is entered from any failure path.

* Transitions are driven only by user actions, server signaling, WebRTC callbacks, and timeout events.

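The client state machine can be written as a transition table so that unexpected events fail loudly instead of silently corrupting state. The client is Swift; this sketch uses TypeScript only to match the other examples. The spec lists the states but not every edge, so the table is an assumption. In particular, the returns to `Registered` after `Ending`, a decline, or `Error` are inferred from the call flow in Section 7. `NEXT` and `transition` are hypothetical names.

```typescript
type CallState =
  | "Idle" | "Registered" | "Calling" | "IncomingCall"
  | "Connecting" | "InCall" | "Ending" | "Error";

// Assumed transition table; adjust to the actual client implementation.
const NEXT: Record<CallState, readonly CallState[]> = {
  Idle: ["Registered"],
  Registered: ["Calling", "IncomingCall"], // caller dials / interpreter is rung
  Calling: ["Connecting", "Registered", "Error"], // accepted / declined, BUSY, or timed out
  IncomingCall: ["Connecting", "Registered"], // accepted / declined or ring timeout
  Connecting: ["InCall", "Ending", "Error"], // bounded by the 20 s cap (Section 8)
  InCall: ["Ending", "Error"],
  Ending: ["Registered"],
  Error: ["Registered", "Idle"],
};

/** Applies one transition, or throws if the table does not allow it. */
function transition(from: CallState, to: CallState): CallState {
  if (!NEXT[from].includes(to)) throw new Error(`Illegal transition ${from} -> ${to}`);
  return to;
}
```
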
-----

### 15\. Backend Architecture

**15.1 Stack**

* Node.js (TypeScript)
* WebSocket signaling
* In-memory presence store (POC)
* HTTPS/WSS only in production (the POC signals over plain `ws://`; see 6.1)

**15.2 Server Responsibilities**

* Username uniqueness
* Role enforcement
* Presence tracking
* Call state transitions
* Mutex/locking on call requests
* **Service Discovery:** Advertises via Bonjour (`_bemyears._tcp`) for zero-conf client connection.
* TURN configuration delivery

-----

### 16\. TURN Server Specification

| Item | Specification |
| :--- | :--- |
| **Software** | coturn |
| **Auth** | Static long-term credentials (POC only) |
| **Transport** | UDP + TCP |
| **Encryption** | TURN-TLS enabled |
| **Capacity** | ≤5 concurrent calls |

*Security Note: Static credentials are acceptable for TestFlight only and must be replaced before App Store release.*

-----

### 17. Service Discovery (Bonjour)

**17.1 Mechanism**

* The iOS client uses `NetServiceBrowser` to discover the backend server on the local network.
* **Service Type:** `_bemyears._tcp`
* **Domain:** `local.`
* **Resolution:** Resolves the backend's IPv4 address and auto-populates it for the user.
* **Fallback:** Manual IP entry is supported and persisted in `UserDefaults`.

-----

### 18. Logging & Privacy

**18.1 Allowed Server Logs**

* Timestamp (`YYYY-MM-DD HH:MM:SS` format)
* Event type
* `callId` (required for call-related events)
* `sessionId`
* Role
* Offender identity (for abuse reports)
* Periodic user statistics (every 5 minutes)

**18.2 Explicitly Forbidden**

* SDP bodies
* ICE candidates
* Video metadata
* Media statistics tied to identity

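The allow-list in 18.1 is easiest to guarantee if the logger copies only named fields instead of serializing whatever object it is given. TypeScript's structural typing would otherwise let an object carrying extra fields, such as an SDP body, reach the log. The following sketch takes the allow-list approach. Rendering the timestamp in UTC is an assumption, since the spec fixes only the format. `LogEvent` and `formatLogLine` are hypothetical names.

```typescript
type Role = "caller" | "interpreter";

/** Only the fields Section 18.1 allows; anything else is dropped at runtime. */
interface LogEvent {
  event: string;
  callId?: string; // required for call-related events
  sessionId?: string;
  role?: Role;
  offender?: string; // abuse reports only
}

const ALLOWED_KEYS = ["event", "callId", "sessionId", "role", "offender"] as const;

/** YYYY-MM-DD HH:MM:SS, rendered in UTC (time zone is an assumption). */
function formatTimestamp(date: Date): string {
  return date.toISOString().slice(0, 19).replace("T", " ");
}

/** Builds a log line from the explicit allow-list, so SDP or ICE data can never leak in. */
function formatLogLine(entry: LogEvent, now: Date = new Date()): string {
  const parts = [formatTimestamp(now)];
  for (const key of ALLOWED_KEYS) {
    const value = entry[key];
    if (value !== undefined) parts.push(`${key}=${value}`);
  }
  return parts.join(" ");
}
```
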
**18.3 Client-Side**

* No logging beyond OS crash reports.
* `lastCallId` and `lastRemoteUser` are persisted temporarily for abuse reporting only.

-----

### 19. Security Posture

* TLS for all signaling (Production).
* DTLS-SRTP for media.
* No stored personal data.
* No call content persistence.
* Server-authoritative enforcement everywhere.

-----

### 20. Acceptance Criteria

* Users register successfully with unique usernames.
* Presence updates are immediate.
* Interpreters disappear from the presence list as soon as a call request moves them to `RINGING`.
* Race conditions result in exactly one successful call.
* Calls succeed behind NAT using TURN (including TCP relay fallback).
* PiP works reliably for framing the user's signing.
* All failure paths recover cleanly.

-----