## CheapRetouch — Revised Specification

### Project Overview

**Platform**: iOS 17.0+

**Objective**: On-device photo editor for removing unwanted elements using only Apple's public frameworks and classical image processing. No custom ML models.

**Core Capabilities** (achievable without ML):
- Person removal (Vision handles this well)
- Foreground object removal (user-initiated, Vision-assisted)
- Wire/line removal (geometric contour detection)

**Removed from Scope**:
- Automatic fence/mesh detection (requires semantic understanding)
- Automatic identification of object types (trash cans, stop signs, etc.)

---

### Technical Stack

| Layer | Framework | Purpose |
|-------|-----------|---------|
| UI | SwiftUI + UIKit interop | Canvas, tools, state management |
| Masking | Vision | `VNGenerateForegroundInstanceMaskRequest`, `VNDetectContoursRequest` |
| Subject Interaction | VisionKit | `ImageAnalyzer`, `ImageAnalysis`, `ImageAnalysisInteraction` |
| Inpainting | Metal (custom) | Patch-based synthesis, mask feathering, blending |
| Compositing | Core Image | Color adjustments, preview pipeline |
| Fallback Processing | Accelerate/vImage | Simulator, older devices without Metal |

---

### Features

#### 1. Person Removal

**How it works**:
1. User taps a person in the photo
2. `VNGenerateForegroundInstanceMaskRequest` generates a precise mask
3. Mask is dilated and feathered
4. Custom Metal inpainting fills the region from surrounding context

**Why this works**: Vision's person segmentation is robust and well-documented for iOS 17+.
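
A minimal sketch of steps 1–2, assuming the tap point has already been converted to Vision's normalized coordinate space; the per-instance lookup is elided:

```swift
import Vision

/// Sketch of steps 1–2. Assumes `tap` is already in Vision's normalized
/// coordinate space; picking out the single tapped instance (by sampling
/// `observation.instanceMask` at that point) is elided here.
func subjectMask(in cgImage: CGImage, tap: CGPoint) throws -> CVPixelBuffer? {
    let handler = VNImageRequestHandler(cgImage: cgImage)
    let request = VNGenerateForegroundInstanceMaskRequest()
    try handler.perform([request])

    guard let observation = request.results?.first else {
        return nil // nothing salient detected: surface the brush fallback
    }
    // Full-resolution soft mask aligned with the source image. Selecting
    // `allInstances` is a simplification for this sketch.
    return try observation.generateScaledMaskForImage(
        forInstances: observation.allInstances,
        from: handler
    )
}
```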

**User flow**:
```
Tap person → Mask preview shown → Confirm → Inpaint → Done
```

**Edge cases**:
- Multiple people: user taps each individually or uses a "select all people" option
- Partial occlusion: Vision still provides a usable mask; user can refine with brush
- No person detected: show "No person found at tap location" feedback

---

#### 2. Foreground Object Removal

**How it works**:
1. User taps an object
2. `VNGenerateForegroundInstanceMaskRequest` attempts to isolate it
3. If successful, the mask is used for inpainting
4. If Vision returns no mask (object not salient), fall back to the smart brush

**Smart brush fallback** (edge refinement sketched below):
- User paints a rough selection over the object
- App refines the selection to the nearest strong edges using gradient-magnitude analysis
- User confirms the refined mask
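
A sketch of the gradient stage of that refinement, assuming Core Image's built-in `CIEdges` filter is an acceptable gradient-magnitude proxy; the filter choice and intensity value are illustrative:

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical first stage of the smart brush refinement: an edge-strength
/// map the refinement pass can threshold against. Filter choice (CIEdges)
/// and the intensity value are illustrative, not fixed by this spec.
func edgeMagnitudeMap(for image: CIImage) -> CIImage {
    let edges = CIFilter.edges()
    edges.inputImage = image
    edges.intensity = 4.0              // amplify the gradient response
    let edgeMap = edges.outputImage ?? image

    // Collapse to luminance so later thresholding sees a single channel.
    let mono = CIFilter.colorControls()
    mono.inputImage = edgeMap
    mono.saturation = 0.0
    return mono.outputImage ?? edgeMap
}
```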

**Why this works**: Vision detects visually distinct foreground regions. It doesn't know *what* the object is, but it can separate it from the background if there's sufficient contrast.

**Limitations** (be explicit with users):
- Works best on objects that stand out from their background
- Low-contrast objects require manual brush selection
- App cannot identify object types — it sees shapes, not meanings

**User flow**:
```
Tap object → Vision attempts mask
├─ Success → Mask preview → Confirm → Inpaint
└─ Failure → "Use brush to select" prompt → User paints → Edge refinement → Confirm → Inpaint
```

---

#### 3. Wire & Line Removal

**How it works**:
1. User taps near a wire or line
2. `VNDetectContoursRequest` returns all detected contours
3. App scores contours by (see the scoring sketch after this list):
   - Proximity to tap point
   - Aspect ratio (thin and elongated)
   - Straightness / low curvature
   - Length (longer scores higher)
4. Best-scoring contour becomes the mask
5. Mask is expanded to a configurable width (default 6px, range 2–20px)
6. Inpaint along the mask
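
An illustrative scorer for step 3, working in `VNContour`'s normalized coordinates; the weights are placeholders and the curvature term is omitted for brevity:

```swift
import Vision
import simd

/// Illustrative "wire-likeness" score; weights are placeholders and the
/// curvature term is omitted for brevity. Works in VNContour's normalized
/// coordinate space, with `tap` already converted to the same space.
func wireScore(for contour: VNContour, tap: simd_float2) -> Float {
    let points = contour.normalizedPoints
    guard points.count > 1 else { return 0 }

    // Elongation: thin, stretched contours score higher.
    let xs = points.map(\.x), ys = points.map(\.y)
    let width = xs.max()! - xs.min()!
    let height = ys.max()! - ys.min()!
    let elongation = max(width, height) / max(min(width, height), 1e-4)

    // Proximity: distance from the tap to the nearest contour point.
    let nearest = points.map { simd_distance($0, tap) }.min() ?? 1
    let proximity = 1 / (1 + nearest * 10)

    // Length: total polyline length; longer wires score higher.
    let length = zip(points, points.dropFirst())
        .reduce(Float(0)) { $0 + simd_distance($1.0, $1.1) }

    return proximity * min(elongation / 20, 1) * min(length * 2, 1)
}
```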

**Line brush fallback**:
When contour detection fails (low contrast, busy background):
- User switches to the "Line brush" tool
- User draws along the wire
- App maintains a consistent stroke width
- Stroke becomes the mask for inpainting

**Why this works**: Power lines against sky have strong edges that `VNDetectContoursRequest` captures reliably. The scoring heuristics select the most "wire-like" contour.

**Limitations**:
- High-contrast lines (sky background): works well
- Low-contrast lines (against buildings, trees): require the manual line brush
- Curved wires: contour detection still works; scoring allows moderate curvature

**User flow**:
```
Tap near wire → Contour analysis
├─ Match found → Highlight line → Confirm → Inpaint
└─ No match → "Use line brush" prompt → User draws → Inpaint
```

---

### Inpainting Engine (Metal)

Since there's no public Apple API for content-aware fill, you must implement this yourself.

**Algorithm**: Exemplar-based inpainting (Criminisi-style)

**Why this approach**:
- Deterministic (same input → same output)
- Handles textures reasonably well
- No ML required
- Well-documented in academic literature

**Pipeline**:
```
1. Input: source image + binary mask
2. Dilate mask by 2–4px (capture edge pixels)
3. Feather mask edges (gaussian blur on alpha)
4. Build image pyramid (for preview vs export)
5. For each pixel on mask boundary (priority order):
   a. Find best matching patch from known region
   b. Copy patch into unknown region
   c. Update boundary
6. Final edge-aware blend to reduce seams
```
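
A sketch of the fill-order priority used in step 5, following Criminisi's P(p) = C(p) · D(p); the struct layout and normalization constant are illustrative:

```swift
import simd

/// Sketch of the fill-order priority in step 5, after Criminisi et al.:
/// P(p) = C(p) · D(p). C(p) is the fraction of the patch around p that is
/// already known; D(p) measures how strongly an isophote (line of constant
/// intensity) runs into the mask boundary at p. Field names are illustrative.
struct FrontPixel {
    var patchKnownFraction: Float   // C(p) in [0, 1]
    var isophote: simd_float2       // image gradient at p, rotated 90°
    var boundaryNormal: simd_float2 // unit normal of the mask boundary at p
}

func fillPriority(_ p: FrontPixel, alpha: Float = 255) -> Float {
    // D(p) = |isophote · n| / alpha: boundary pixels where linear structure
    // (a wire, an edge) enters the hole are filled first, so those
    // structures propagate before flat texture.
    let dataTerm = abs(simd_dot(p.isophote, p.boundaryNormal)) / alpha
    return p.patchKnownFraction * dataTerm
}
```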

**Performance targets**:

| Resolution | Target Time | Device Baseline |
|------------|-------------|-----------------|
| Preview (2048px) | < 300 ms | iPhone 12 / A14 |
| Export (12 MP) | < 4 s | iPhone 12 / A14 |
| Export (48 MP) | < 12 s | iPhone 15 Pro / A17 Pro |

**Memory management**:
- Tile-based processing for images > 12 MP (see the sketch below)
- Peak memory budget: 1.5 GB
- Release intermediate textures aggressively
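
A minimal tiling sketch; the tile side and overlap are placeholder values, and blending results across the overlaps is elided:

```swift
import CoreGraphics

/// Minimal tiling sketch for exports above 12 MP: walk the image in
/// overlapping tiles so peak memory stays bounded. Tile side and overlap
/// are placeholder values; blending results across overlaps is elided.
func tiles(covering extent: CGRect,
           tileSide: CGFloat = 1024,
           overlap: CGFloat = 64) -> [CGRect] {
    var result: [CGRect] = []
    var y = extent.minY
    while y < extent.maxY {
        var x = extent.minX
        while x < extent.maxX {
            let tile = CGRect(x: x, y: y,
                              width: tileSide + overlap,
                              height: tileSide + overlap)
            result.append(tile.intersection(extent))
            x += tileSide
        }
        y += tileSide
    }
    return result
}
```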

---

### Data Model (Non-Destructive Editing)

**Principles**:
- Original image is never modified
- All edits stored as an operation stack
- Full undo/redo support

**Operation types**:

```swift
import Foundation

enum ToolType: String, Codable { case person, object, wire, brush }

enum EditOperation: Codable {
    case mask(MaskOperation)
    case inpaint(InpaintOperation)
    case adjustment(AdjustmentOperation)
}

struct MaskOperation: Codable {
    let id: UUID
    let toolType: ToolType
    let maskData: Data        // compressed R8 texture
    let timestamp: Date
}

struct InpaintOperation: Codable {
    let id: UUID
    let maskOperationId: UUID // the mask this fill was applied to
    let patchRadius: Int
    let featherAmount: Float
    let timestamp: Date
}

struct AdjustmentOperation: Codable {
    let id: UUID
    let timestamp: Date
    // color-adjustment parameters elided in this spec
}
```

**Persistence** (sketched below):
- Project saved as JSON (operation stack) + original image reference
- Store the PHAsset local identifier when sourced from Photos
- Store embedded image data when imported from Files
- Cached previews marked `isExcludedFromBackup = true`
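
A sketch of that persistence scheme, assuming a `Project` shape along the lines implied above; field names are illustrative:

```swift
import Foundation

/// Sketch of project persistence; this `Project` shape is an assumption
/// matching the bullets above, not a finalized model.
struct Project: Codable {
    var operations: [EditOperation]
    var assetLocalIdentifier: String? // set when sourced from Photos
    var embeddedImageData: Data?      // set when imported from Files
}

func save(_ project: Project, to url: URL) throws {
    let data = try JSONEncoder().encode(project)
    try data.write(to: url, options: .atomic)

    // Keep cached previews out of backups (assumes the previews directory
    // already exists alongside the project file).
    var values = URLResourceValues()
    values.isExcludedFromBackup = true
    var previews = url.deletingLastPathComponent().appendingPathComponent("previews")
    try? previews.setResourceValues(values)
}
```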

---

### UI Specification

**Main Canvas**:
- Pinch to zoom, pan to navigate
- Mask overlay toggle (red tint / marching ants / hidden)
- Before/after comparison (long press or toggle; see the sketch below)
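
A sketch of the press-and-hold comparison, using a hypothetical `CanvasCompareView`; the original shows while the finger is down:

```swift
import SwiftUI

/// Sketch of the press-and-hold comparison: the original shows while the
/// finger is down, the edited image otherwise. View name is illustrative.
struct CanvasCompareView: View {
    let original: Image
    let edited: Image
    @GestureState private var showingOriginal = false

    var body: some View {
        (showingOriginal ? original : edited)
            .resizable()
            .scaledToFit()
            .gesture(
                DragGesture(minimumDistance: 0)
                    .updating($showingOriginal) { _, state, _ in state = true }
            )
    }
}
```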

**Toolbar**:

| Tool | Icon | Behavior |
|------|------|----------|
| Person | 👤 | Tap to select/remove people |
| Object | ⬭ | Tap to select foreground objects |
| Wire | ⚡ | Tap to select lines/wires |
| Brush | 🖌 | Manual selection for fallback |
| Undo | ↩ | Step back in operation stack |
| Redo | ↪ | Step forward in operation stack |

**Inspector Panel** (contextual):
- Brush size slider (when brush active)
- Feather amount slider
- Mask expansion slider (for wire tool)
- "Refine edges" toggle

**Feedback states**:
- Processing: show spinner on affected region
- No detection: toast message with fallback suggestion
- Success: brief checkmark animation

---

### Error Handling

| Scenario | Response |
|----------|----------|
| Vision returns no mask | "Couldn't detect object. Try the brush tool to select manually." |
| Vision returns low-confidence mask | Show mask preview with "Does this look right?" confirmation |
| Contour detection finds no lines | "No lines detected. Use the line brush to draw along the wire." |
| Inpaint produces visible seams | Offer "Refine" button that expands mask and re-runs |
| Memory pressure during export | "Image too large to process. Try cropping first." |
| Metal unavailable | Fall back to Accelerate with "Processing may be slower" warning |
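
The copy above can be centralized; a sketch with a hypothetical `RetouchFailure` type:

```swift
/// Sketch: centralizing the fallback copy from the table above behind a
/// hypothetical error type.
enum RetouchFailure: Error {
    case noMask
    case noContours
    case memoryPressure

    var userMessage: String {
        switch self {
        case .noMask:
            return "Couldn't detect object. Try the brush tool to select manually."
        case .noContours:
            return "No lines detected. Use the line brush to draw along the wire."
        case .memoryPressure:
            return "Image too large to process. Try cropping first."
        }
    }
}
```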

---

### Privacy & Permissions

- All processing on-device
- No network calls for core functionality
- Photo library access via `PHPickerViewController` (limited access supported; import sketch below)
- Request write permission only when the user saves
- No analytics or telemetry in core features
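
A sketch of limited-access-friendly import using `PhotosPicker`, SwiftUI's wrapper over the same out-of-process picker; the view and callback names are illustrative:

```swift
import SwiftUI
import PhotosUI
import UIKit

/// Sketch of photo import via PhotosPicker (SwiftUI's wrapper around the
/// out-of-process picker). No library permission prompt is required just
/// to read what the user picks, and limited access works unchanged.
struct ImportButton: View {
    @State private var selection: PhotosPickerItem?
    var onImage: (UIImage) -> Void   // hypothetical callback into the editor

    var body: some View {
        PhotosPicker("Import Photo", selection: $selection, matching: .images)
            .onChange(of: selection) { _, newItem in
                guard let newItem else { return }
                Task {
                    // Transferable data → UIImage; errors fall through to nil.
                    if let data = try? await newItem.loadTransferable(type: Data.self),
                       let image = UIImage(data: data) {
                        onImage(image)
                    }
                }
            }
    }
}
```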

---

### Accessibility

- All tools labeled for VoiceOver
- Brush size adjustable via stepper (not just slider)
- High contrast mask visualization option
- Reduce Motion: disable transition animations
- Dynamic Type support in all UI text

---

### Testing Requirements

| Test Type | Coverage |
|-----------|----------|
| Unit | Edit stack operations, mask combination logic, contour scoring |
| Snapshot | Inpaint engine (reference images with known outputs) |
| UI | Full flow: import → edit → export |
| Performance | Render times on A14, A15, A17 devices |
| Memory | Peak usage during 48 MP export |

---

### Project Structure

```
CheapRetouch/
├── App/
│   └── CheapRetouchApp.swift
├── Features/
│   ├── Editor/
│   │   ├── PhotoEditorView.swift
│   │   ├── CanvasView.swift
│   │   └── ToolbarView.swift
│   └── Export/
│       └── ExportView.swift
├── Services/
│   ├── MaskingService.swift      // Vision/VisionKit wrappers
│   ├── ContourService.swift      // Line detection + scoring
│   └── InpaintEngine/
│       ├── InpaintEngine.swift   // Public interface
│       ├── Shaders.metal         // Metal kernels
│       └── PatchMatch.swift      // Algorithm implementation
├── Models/
│   ├── EditOperation.swift
│   ├── Project.swift
│   └── MaskData.swift
├── Utilities/
│   ├── ImagePipeline.swift       // Preview/export rendering
│   └── EdgeRefinement.swift      // Smart brush edge detection
└── Resources/
    └── Assets.xcassets
```

---

### What's Explicitly Out of Scope

| Feature | Reason |
|---------|--------|
| Automatic fence/mesh detection | Requires semantic understanding (ML) |
| Object type identification | Requires classification (ML) |
| "Find all X in photo" | Requires semantic search (ML) |
| Blemish/skin retouching | Removed to keep scope focused; could add later |
| Background replacement | Different feature set; out of scope for v1 |

---

### Summary

This spec delivers three solid features using only public APIs:

1. **Person removal** — Vision handles the hard part
2. **Object removal** — Vision-assisted with brush fallback
3. **Wire removal** — Contour detection with line brush fallback

Each feature has a clear primary path and a fallback for when detection fails. The user is never stuck.