BeMyEars/project-technical-specifications.md
jared d29b8182ca Initial commit
Add iOS app with Node.js/TypeScript backend for BeMyEars project.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 21:51:47 -05:00

# iOS 1:1 Video Chat for Deaf Users & Interpreters
## Full Technical Specifications Document
**Status:** LOCKED
**Version:** v1.3 (Final)
**Scope:** Proof of Concept (Production-Aligned)
-----
### 1\. Purpose & Product Overview
This project delivers an iOS proof-of-concept application that enables secure, real-time, one-to-one video communication between:
1. **Callers** (deaf users), and
2. **Interpreters** (who wait for incoming calls)
The application is designed explicitly for sign language communication, prioritizing:
* Visual clarity
* Low latency
* Predictable call behavior
* Privacy and trust
The POC is architected to evolve directly into a production system without re-architecture.
-----
### 2\. Platform & Runtime Constraints
| Category | Specification |
| :--- | :--- |
| **Target OS** | iOS |
| **Minimum iOS Version** | 18.6 |
| **UI Framework** | SwiftUI |
| **Devices** | iPhone & iPad |
| **Orientation** | Portrait & Landscape |
| **Background Execution** | Not supported (foreground-only) |
| **Distribution** | TestFlight |
| **Accessibility** | Sign-language-first UI decisions |
| **Audio** | Implicit with video (no audio-only mode) |
| **Network** | **Local Network Permission Required** (Discovery) |
-----
### 3\. User Roles & Authorization Model
**3.1 Roles**
Roles are assigned at registration time and are ephemeral.
* **Caller:** Can initiate calls.
* **Interpreter:** Can receive calls only.
**3.2 Enforcement Rules**
* Only callers may initiate `CALL_REQUEST`.
* Only interpreters may respond with `CALL_ACCEPT` or `CALL_DECLINE`.
* Interpreters cannot initiate calls.
* Role enforcement is server-side authoritative.
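
The enforcement rules above could be expressed on the server as a small lookup. This is a minimal sketch, not the project's actual implementation; the `Role` type and the `roleMaySend` helper are hypothetical names introduced for illustration.

```typescript
type Role = "caller" | "interpreter";

// Message types that only one role may originate (taken from the rules above).
const ROLE_GATED = new Map<string, Role>([
  ["CALL_REQUEST", "caller"],
  ["CALL_ACCEPT", "interpreter"],
  ["CALL_DECLINE", "interpreter"],
]);

// Returns true when `role` may send a message of `type`.
// Types that are not role-gated (for example HANGUP) pass this check.
function roleMaySend(role: Role, type: string): boolean {
  const required = ROLE_GATED.get(type);
  return required === undefined || required === role;
}
```
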
-----
### 4\. Identity & Presence
**4.1 Username Rules**
* Usernames must be globally unique.
* Validated by registrar server.
* Ephemeral (no persistence).
* Cannot be changed while registered.
* No authentication.
**4.2 Presence Lifecycle**
Presence is server-authoritative and maintained via:
* **Heartbeat interval:** 15 seconds
* **Presence TTL:** 20 seconds
* **Presence delivery:** WebSocket push (immediate)
A user is considered present only while:
1. Registered
2. Heartbeat valid
3. WebSocket connection active
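
The three presence conditions can be checked in one place. The sketch below assumes a hypothetical `PresenceEntry` record and an injected clock value; only the 20-second TTL comes from this document.

```typescript
// Presence TTL from section 4.2. Clients heartbeat every 15 s,
// which leaves a 5 s grace window before presence lapses.
const PRESENCE_TTL_MS = 20000;

interface PresenceEntry {
  registered: boolean;
  socketOpen: boolean;
  lastHeartbeatAt: number; // epoch milliseconds of the last heartbeat
}

// A user is present only while all three conditions hold.
function isPresent(entry: PresenceEntry, now: number): boolean {
  const heartbeatValid = now - entry.lastHeartbeatAt <= PRESENCE_TTL_MS;
  return entry.registered && entry.socketOpen && heartbeatValid;
}
```
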
-----
### 5\. Presence States
Each interpreter exists in exactly one state:
0. **UNAVAILABLE:** Default state after login. Can NOT receive calls.
1. **AVAILABLE:** Can receive calls.
2. **RINGING:** Call request active.
3. **IN_CALL:** Actively connected.
**Presence List Behavior**
* Only interpreters in `AVAILABLE` appear to callers.
* Interpreters in `UNAVAILABLE`, `RINGING` or `IN_CALL` are hidden.
* State transitions are server-controlled.
* `UNAVAILABLE` is the enforced starting state for all interpreters.
* Interpreters must explicitly "Go Online" to become `AVAILABLE`.
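
As an illustration of the presence list behavior, the filter below returns only the interpreters a caller should see. The `Interpreter` shape and both helper names are assumptions made for this sketch.

```typescript
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

interface Interpreter {
  username: string;
  state: InterpreterState;
}

// Every interpreter starts UNAVAILABLE until they explicitly go online.
function newInterpreter(username: string): Interpreter {
  return { username, state: "UNAVAILABLE" };
}

// Only AVAILABLE interpreters are shown to callers; all other states are hidden.
function visibleToCallers(all: Interpreter[]): string[] {
  return all.filter((i) => i.state === "AVAILABLE").map((i) => i.username);
}
```
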
-----
### 6\. Video & Networking Architecture
**6.1 Media Transport**
* WebRTC Peer-to-Peer (1:1)
* DTLS-SRTP encryption (Media)
* **Signaling Transport:** WebSocket (`ws://`) for POC (Production requires `wss://`).
* No SFU or MCU
**6.2 NAT Traversal**
* **STUN:** for candidate discovery.
* **TURN:** self-hosted (coturn) as mandatory fallback.
* **Requirements:** TURN is required for Symmetric NATs, Carrier-grade NAT, and Enterprise firewalls.
**6.3 Video Capture & Encoding Strategy**
* **Capture:** Best available front camera format (Target 1080p @ 30fps).
* **Encoding/Streaming:** Dynamic resolution adaptation based on device aspect ratio (e.g. 16:9 for iPhones, 4:3 for iPads). Default target is 720p equivalent.
* **iPad Support:**
    * **Dynamic Resolution:** Detects screen aspect ratio (using `UIScreen.nativeBounds`) to scale video output correctly (e.g. 1440x1080 for iPad 4:3), preventing distortion.
    * **Stability Fix:** Explicitly forces `.high` session preset (with fallback to `.medium`/`.low`) *after* capture start to override WebRTC defaults that crash iPad Mini.
    * **Format Selection:** Strictly prioritizes standard 16:9 capture formats (1280x720, 1920x1080) to ensure hardware compatibility, avoiding unstable 4:3 formats like 1280x960.
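
The aspect-ratio arithmetic behind the dynamic resolution rule can be illustrated as follows. This only shows the calculation: the real client would do it in Swift, and snapping to the nearer of 16:9 and 4:3 is an assumption of this sketch, not something the document prescribes.

```typescript
interface Size {
  width: number;
  height: number;
}

// Output aspect ratios named in section 6.3.
const SUPPORTED_RATIOS = [16 / 9, 4 / 3];

// Given the screen's native pixel bounds and a target short side (e.g. 720 or 1080),
// return a landscape output size at the closest supported aspect ratio.
function outputSize(nativeWidth: number, nativeHeight: number, shortSide: number): Size {
  const ratio = Math.max(nativeWidth, nativeHeight) / Math.min(nativeWidth, nativeHeight);
  const snapped = SUPPORTED_RATIOS.reduce((best, r) =>
    Math.abs(r - ratio) < Math.abs(best - ratio) ? r : best
  );
  // Round the long side to an even number of pixels.
  const width = Math.round((shortSide * snapped) / 2) * 2;
  return { width, height: shortSide };
}
```

For a 4:3 screen such as 1536x2048 with a 1080 short side, this yields 1440x1080, matching the iPad example above.
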
-----
### 7\. Call Lifecycle & Concurrency Model
**7.1 Call Creation (Authoritative Server Flow)**
1. Caller selects interpreter.
2. Server generates `callId` (UUID).
3. **Interpreter state transitions:** `AVAILABLE` → `RINGING`
4. Server starts 10-second ring timer.
5. Interpreter receives call request.
**7.2 Ring Outcomes**
* **Accept within 10s:** State `RINGING` → `IN_CALL`. WebRTC negotiation begins.
* **Decline:** State resets to `AVAILABLE`.
* **Timeout:** Server auto-reverts to `AVAILABLE`.
* **Race condition:** If the interpreter is no longer `AVAILABLE` when a request arrives, the caller receives an immediate `BUSY` error.
**7.3 Call Termination**
Any of the following reverts the interpreter to `AVAILABLE`:
* Hangup
* WebRTC failure
* ICE timeout
* Heartbeat expiration
* WebSocket disconnect
* App crash / force quit
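
The interpreter-side lifecycle in sections 7.1 to 7.3 can be summarized as a transition table. The event names below are hypothetical labels for this sketch; `CALL_ENDED` stands in for every termination cause listed above.

```typescript
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";
type InterpreterEvent =
  | "GO_ONLINE"
  | "CALL_REQUEST"
  | "ACCEPT"
  | "DECLINE"
  | "RING_TIMEOUT"
  | "CALL_ENDED";

// Each event is valid from exactly one state and leads to exactly one state.
const TRANSITIONS: Record<InterpreterEvent, { from: InterpreterState; to: InterpreterState }> = {
  GO_ONLINE: { from: "UNAVAILABLE", to: "AVAILABLE" },
  CALL_REQUEST: { from: "AVAILABLE", to: "RINGING" },
  ACCEPT: { from: "RINGING", to: "IN_CALL" },
  DECLINE: { from: "RINGING", to: "AVAILABLE" },
  RING_TIMEOUT: { from: "RINGING", to: "AVAILABLE" },
  CALL_ENDED: { from: "IN_CALL", to: "AVAILABLE" },
};

// Returns the next state, or null when the event is not valid in the current state.
// A null result for CALL_REQUEST corresponds to the BUSY error sent to the caller.
function nextState(state: InterpreterState, event: InterpreterEvent): InterpreterState | null {
  const t = TRANSITIONS[event];
  return t.from === state ? t.to : null;
}
```
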
-----
### 8\. Timeouts (Hard Guarantees)
| Stage | Timeout |
| :--- | :--- |
| **Ringing** | 10 seconds |
| **Offer/Answer** | 10 seconds after accept |
| **ICE gathering/connection** | 10–15 seconds |
| **Max “connecting” state** | 20 seconds total |
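
One way to read the table is that the 20-second "connecting" cap bounds the offer/answer and ICE stages together. The sketch below encodes that interpretation, which is an assumption; the constant names and the `iceBudgetMs` helper are hypothetical.

```typescript
// Timeout values from the table above, in milliseconds.
const TIMEOUTS_MS = {
  ringing: 10000,
  offerAnswer: 10000,
  iceMin: 10000,
  iceMax: 15000,
  connectingTotal: 20000,
};

// Effective ICE budget: the configured value clamped to the 10-15 s range,
// then limited by whatever remains of the 20 s total connecting budget.
function iceBudgetMs(configuredMs: number, elapsedSinceAcceptMs: number): number {
  const clamped = Math.min(Math.max(configuredMs, TIMEOUTS_MS.iceMin), TIMEOUTS_MS.iceMax);
  const remaining = Math.max(TIMEOUTS_MS.connectingTotal - elapsedSinceAcceptMs, 0);
  return Math.min(clamped, remaining);
}
```

For example, if offer/answer took 8 seconds, ICE gets at most 12 seconds even when configured for 15.
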
-----
### 9\. Signaling & Call Identification
**9.1 callId**
* UUID generated only by server.
* Required on all signaling messages.
* Used to correlate messages, enforce authorization, and prevent race conditions.
**9.2 Authorization Rules**
For a given `callId`:
* Only caller + interpreter may exchange signaling.
* Messages with unknown/expired `callId` are rejected.
* Messages violating role rules are rejected.
* Invalid messages are explicitly errored (not forwarded).
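
A server-side check for these rules might look like the sketch below. The `Call` record, the `authorizeSignal` function, and the error codes are all hypothetical; the document does not define an error vocabulary beyond `BUSY`.

```typescript
interface Call {
  callId: string;
  caller: string;
  interpreter: string;
  expired: boolean;
}

interface Signal {
  type: string;
  callId: string;
  from: string;
  to: string;
}

interface AuthResult {
  ok: boolean;
  error?: string; // set only when ok is false
}

function authorizeSignal(calls: Map<string, Call>, msg: Signal): AuthResult {
  const call = calls.get(msg.callId);
  // Unknown or expired callId: reject with an explicit error, never forward.
  if (call === undefined || call.expired) return { ok: false, error: "UNKNOWN_CALL" };

  // Only the two participants may exchange signaling, and only with each other.
  const callerToInterpreter = msg.from === call.caller && msg.to === call.interpreter;
  const interpreterToCaller = msg.from === call.interpreter && msg.to === call.caller;
  if (!callerToInterpreter && !interpreterToCaller) return { ok: false, error: "NOT_A_PARTICIPANT" };

  // Role rule: only the interpreter may accept or decline.
  const isResponse = msg.type === "CALL_ACCEPT" || msg.type === "CALL_DECLINE";
  if (isResponse && !interpreterToCaller) return { ok: false, error: "ROLE_VIOLATION" };

  return { ok: true };
}
```
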
-----
### 10\. Signaling Protocol
**WebSocket Message Envelope**
```json
{
"type": "CALL_REQUEST | CALL_ACCEPT | CALL_DECLINE | OFFER | ANSWER | ICE | HANGUP | BUSY | REPORT_ABUSE | STATS_UPDATE | VIDEO_VISIBLE",
"callId": "uuid",
"from": "username",
"to": "username",
"payload": {}
}
```
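
On the backend, the envelope could be typed and validated before any routing. This is a minimal sketch: `parseEnvelope` is a hypothetical helper, and it checks only field presence and types, not UUID format or payload contents.

```typescript
const MESSAGE_TYPES = [
  "CALL_REQUEST", "CALL_ACCEPT", "CALL_DECLINE", "OFFER", "ANSWER", "ICE",
  "HANGUP", "BUSY", "REPORT_ABUSE", "STATS_UPDATE", "VIDEO_VISIBLE",
] as const;

type MessageType = typeof MESSAGE_TYPES[number];

interface Envelope {
  type: MessageType;
  callId: string;
  from: string;
  to: string;
  payload: Record<string, unknown>;
}

// Parses a raw WebSocket text frame; returns null for anything malformed.
function parseEnvelope(raw: string): Envelope | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { type, callId, from, to, payload } = data as Record<string, unknown>;
  if (typeof type !== "string" || MESSAGE_TYPES.indexOf(type as MessageType) === -1) return null;
  if (typeof callId !== "string" || typeof from !== "string" || typeof to !== "string") return null;
  if (typeof payload !== "object" || payload === null || Array.isArray(payload)) return null;

  return { type: type as MessageType, callId, from, to, payload: payload as Record<string, unknown> };
}
```
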
-----
### 11\. Presence Delivery
* Presence updates are pushed immediately via WebSocket.
* Clients do not poll.
* Server is the single source of truth.
* Client does not infer presence locally.
-----
### 12\. Video UX Requirements
| Aspect | Specification |
| :--- | :--- |
| **Camera** | Front only |
| **Remote View** | Full-screen |
| **Local Preview** | Picture-in-Picture (PiP) |
| **PiP Default** | Top-right |
| **PiP Behavior** | Draggable, snap-to-corners |
| **Safe Area** | Enforced |
| **Mirroring** | Enabled (front camera) |
| **Controls** | Hang-up only |
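
The snap-to-corners and safe-area rows reduce to simple geometry. The sketch below shows that arithmetic only (the client itself would implement it in SwiftUI); the type names, the `margin` parameter, and both functions are assumptions.

```typescript
interface Point { x: number; y: number; }
interface Size { width: number; height: number; }
interface Insets { top: number; right: number; bottom: number; left: number; }
type Corner = "topLeft" | "topRight" | "bottomLeft" | "bottomRight";

// PiP default position from the table above.
const DEFAULT_CORNER: Corner = "topRight";

// Picks the corner closest to where the PiP's center was released.
function nearestCorner(center: Point, container: Size): Corner {
  const top = center.y < container.height / 2;
  const left = center.x < container.width / 2;
  if (top) return left ? "topLeft" : "topRight";
  return left ? "bottomLeft" : "bottomRight";
}

// Top-left origin of the PiP for a given corner, kept inside the safe area.
function pipOrigin(corner: Corner, container: Size, pip: Size, safe: Insets, margin: number): Point {
  const isLeft = corner === "topLeft" || corner === "bottomLeft";
  const isTop = corner === "topLeft" || corner === "topRight";
  return {
    x: isLeft ? safe.left + margin : container.width - safe.right - margin - pip.width,
    y: isTop ? safe.top + margin : container.height - safe.bottom - margin - pip.height,
  };
}
```
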
-----
### 13\. Bandwidth & Connection Quality Rules
**Setup Phase**
* **TCP Relay:** Permitted (allowed as fallback for restrictive firewalls/campus networks).
**Connected Phase**
* **Packet Loss:** Accepted.
* **Error Correction:** WebRTC handles packet loss via NACK and FEC (Forward Error Correction).
* **Degradation Policy:**
    * If quality drops below usable thresholds (defined as tunable constants): **Show "Poor Connection" UI Warning.**
    * Connection remains active unless fully severed by network timeout.
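
The degradation policy could be driven by a threshold check like the one below. The metrics chosen and every threshold value are placeholders invented for this sketch; the document only states that the thresholds are tunable constants.

```typescript
interface QualitySample {
  packetLossPct: number;   // percentage of packets lost in the sample window
  roundTripMs: number;     // round-trip time in milliseconds
  framesPerSecond: number; // decoded remote frame rate
}

// Placeholder thresholds (assumed values, meant to be tuned).
const POOR_CONNECTION_THRESHOLDS = {
  maxPacketLossPct: 10,
  maxRoundTripMs: 400,
  minFramesPerSecond: 15,
};

// True when any metric crosses its threshold. The caller shows the
// "Poor Connection" warning but keeps the call connected.
function isPoorConnection(s: QualitySample, t = POOR_CONNECTION_THRESHOLDS): boolean {
  return (
    s.packetLossPct > t.maxPacketLossPct ||
    s.roundTripMs > t.maxRoundTripMs ||
    s.framesPerSecond < t.minFramesPerSecond
  );
}
```
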
-----
### 14\. iOS Client Architecture
**14.1 Frameworks**
* SwiftUI
* WebRTC iOS SDK
* Combine / async-await as needed
**14.2 Architecture Pattern**
* MVVM
**14.3 Modules**
* Registration
* Presence
* Call State Machine
* WebRTC Engine
* PiP Video View
**14.4 Call State Machine**
`Idle` → `Registered` → `Calling` → `IncomingCall` → `Connecting` → `InCall` → `Ending` → `Error`
* Transitions driven only by: User action, Server signaling, WebRTC callbacks, Timeout events.
-----
### 15\. Backend Architecture
**15.1 Stack**
* Node.js (TypeScript)
* WebSocket signaling
* In-memory presence store (POC)
* HTTPS/WSS only in production (the POC signals over plain `ws://`, see 6.1)
**15.2 Server Responsibilities**
* Username uniqueness
* Role enforcement
* Presence tracking
* Call state transitions
* Mutex/locking on call requests
* **Service Discovery:** Advertises via Bonjour (`_bemyears._tcp`) for zero-conf client connection.
* TURN configuration delivery
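
For the locking responsibility, a synchronous check-and-set may be enough on a single Node.js process, because JavaScript runs one callback at a time and nothing can interleave unless the code awaits. The sketch below assumes the in-memory store from 15.1; `requestCall` and its injected `makeCallId` generator are hypothetical.

```typescript
type InterpreterState = "UNAVAILABLE" | "AVAILABLE" | "RINGING" | "IN_CALL";

interface CallRequestResult {
  callId?: string; // set when the interpreter was claimed
  error?: "BUSY";  // set when someone else got there first
}

// Atomically claims an interpreter for a call. There is no `await` between the
// state check and the state write, so two requests cannot both succeed.
function requestCall(
  states: Map<string, InterpreterState>,
  interpreter: string,
  makeCallId: () => string
): CallRequestResult {
  if (states.get(interpreter) !== "AVAILABLE") return { error: "BUSY" };
  states.set(interpreter, "RINGING");
  return { callId: makeCallId() };
}
```

A multi-process deployment would need a shared store with an equivalent compare-and-set, which is outside the POC scope described here.
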
-----
### 16\. TURN Server Specification
| Item | Specification |
| :--- | :--- |
| **Software** | coturn |
| **Auth** | Static long-term credentials (POC only) |
| **Transport** | UDP + TCP |
| **Encryption** | TURN-TLS enabled |
| **Capacity** | ≤5 concurrent calls |
*Security Note: Static credentials are acceptable for TestFlight only and must be replaced before App Store release.*
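
The TURN configuration delivered to clients could take the shape of a WebRTC ICE server list covering the transports in the table. In this sketch the host and credentials are supplied by the caller, the ports are the conventional defaults (3478 for STUN/TURN, 5349 for TURN over TLS), and `buildIceServers` is a hypothetical helper.

```typescript
// Mirrors the fields of a WebRTC ICE server entry.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

function buildIceServers(host: string, username: string, credential: string): IceServer[] {
  return [
    // STUN for candidate discovery.
    { urls: [`stun:${host}:3478`] },
    // TURN fallback over UDP, TCP, and TLS, using the static POC credentials.
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
        `turns:${host}:5349?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}
```
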
-----
### 17. Service Discovery (Bonjour)
**17.1 Mechanism**
* The iOS client uses `NetServiceBrowser` to discover the backend server on the local network.
* **Service Type:** `_bemyears._tcp`
* **Domain:** `local.`
* **Resolution:** Resolves IPv4 address of the backend and auto-populates it for the user.
* **Fallback:** Manual IP entry is supported; the entered address is persisted in `UserDefaults`.
-----
### 18. Logging & Privacy
**18.1 Allowed Server Logs**
* Timestamp (YYYY-MM-DD HH:MM:SS format)
* Event type
* callId (Required for call-related events)
* sessionId
* Role
* Offender identity (for Abuse Reports)
* Periodic User Statistics (every 5 minutes)
**18.2 Explicitly Forbidden**
* SDP bodies
* ICE candidates
* Video metadata
* Media statistics tied to identity
**18.3 Client-Side**
* No logging beyond OS crash reports.
* `lastCallId` and `lastRemoteUser` are persisted temporarily for abuse reporting only.
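
On the server, an allow-list is one way to keep forbidden fields out of the logs: anything not explicitly named is dropped. In the sketch below the field keys, the `toLogLine` helper, and the choice of UTC for timestamps are assumptions; only the timestamp format comes from 18.1.

```typescript
// Keys permitted in a log entry (hypothetical names for the items in 18.1).
const ALLOWED_FIELDS = ["eventType", "callId", "sessionId", "role", "offender"];

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Formats a date as YYYY-MM-DD HH:MM:SS (UTC is assumed here).
function formatTimestamp(d: Date): string {
  const date = `${d.getUTCFullYear()}-${pad2(d.getUTCMonth() + 1)}-${pad2(d.getUTCDate())}`;
  const time = `${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}:${pad2(d.getUTCSeconds())}`;
  return `${date} ${time}`;
}

// Builds one JSON log line, copying only allow-listed fields.
// SDP bodies, ICE candidates, and any other keys are silently dropped.
function toLogLine(at: Date, fields: Record<string, unknown>): string {
  const entry: Record<string, unknown> = { timestamp: formatTimestamp(at) };
  for (const key of ALLOWED_FIELDS) {
    if (fields[key] !== undefined) entry[key] = fields[key];
  }
  return JSON.stringify(entry);
}
```
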
-----
### 19. Security Posture
* TLS for all signaling (Production).
* DTLS-SRTP for media.
* No stored personal data.
* No call content persistence.
* Server-authoritative enforcement everywhere.
-----
### 20. Acceptance Criteria
* Users register successfully with unique usernames.
* Presence updates are immediate.
* Interpreters disappear from the callers' presence list as soon as a call request is made.
* Race conditions result in exactly one successful call.
* Calls succeed behind NAT using TURN (including TCP relay fallback).
* PiP works reliably, so users can check that their signing stays in frame.
* All failure paths recover cleanly.
-----