Implementation Plan: ItemSense MVP
Goal Description
Build a desktop application that identifies items through the webcam using OpenAI's vision capabilities, presented in a native macOS UI (PyObjC).
User Review Required
- Technology Shift: Switching from Tkinter to PyObjC (AppKit).
- Camera Strategy: Using OpenCV for frame capture and bridging to AppKit for display. This keeps the implementation simpler than writing a raw AVFoundation delegate in Python while maintaining a native UI.
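
The bridge itself can stay small. A minimal sketch of the conversion, assuming frames arrive as OpenCV BGR arrays; the helper name `frame_to_nsimage` is ours:

```python
# Bridge an OpenCV frame (BGR numpy array) to an AppKit NSImage.
# Sketch only: encodes via PNG for simplicity; JPEG is faster if needed.
import cv2
from Cocoa import NSData, NSImage

def frame_to_nsimage(frame):
    # cv2.imencode expects BGR input, so no color conversion is needed here.
    ok, buf = cv2.imencode(".png", frame)
    if not ok:
        return None
    data = NSData.dataWithBytes_length_(buf.tobytes(), len(buf))
    return NSImage.alloc().initWithData_(data)
```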
Proposed Changes
Spec 001: Core UI & Camera Feed
[NEW] main.py
- Initialize `NSApplication` and `NSWindow` (AppKit).
- Implement a custom `AppDelegate` to handle the app lifecycle.
- Integrate OpenCV (`cv2`) for webcam capture.
- Display video frames in an `NSImageView` (see the sketch after this list).
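
A minimal sketch of how these pieces could fit together, reusing the `frame_to_nsimage` helper above; the window size, timer rate, and title are placeholder choices:

```python
import cv2
from Cocoa import (
    NSApplication, NSApplicationActivationPolicyRegular, NSBackingStoreBuffered,
    NSImageView, NSMakeRect, NSObject, NSTimer, NSWindow,
    NSWindowStyleMaskClosable, NSWindowStyleMaskTitled,
)

class AppDelegate(NSObject):
    def applicationDidFinishLaunching_(self, notification):
        self.capture = cv2.VideoCapture(0)  # default webcam
        self.window = NSWindow.alloc().initWithContentRect_styleMask_backing_defer_(
            NSMakeRect(0, 0, 640, 480),
            NSWindowStyleMaskTitled | NSWindowStyleMaskClosable,
            NSBackingStoreBuffered, False,
        )
        self.window.setTitle_("ItemSense")
        self.image_view = NSImageView.alloc().initWithFrame_(NSMakeRect(0, 0, 640, 480))
        self.window.contentView().addSubview_(self.image_view)
        self.window.makeKeyAndOrderFront_(None)
        # Poll the camera ~30x/second on the main thread via an NSTimer.
        NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            1 / 30.0, self, "tick:", None, True
        )

    def tick_(self, timer):
        ok, frame = self.capture.read()
        if ok:
            self.image_view.setImage_(frame_to_nsimage(frame))

if __name__ == "__main__":
    app = NSApplication.sharedApplication()
    app.setActivationPolicy_(NSApplicationActivationPolicyRegular)  # behave as a normal app
    delegate = AppDelegate.alloc().init()
    app.setDelegate_(delegate)
    app.run()
```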
[NEW] requirements.txt
- `pyobjc-framework-Cocoa`
- `opencv-python`
- `Pillow` (for easier image data manipulation if needed)
Spec 002: OpenAI Vision Integration
[MODIFY] main.py
- Add a Capture button (`NSButton`) to the UI.
- Implement logic to snapshot the current OpenCV frame.
- Send the image to the OpenAI API.
- Run the OpenAI request in a background thread to prevent UI freezing (see the sketch below).
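
A sketch of the capture flow using the chat-completions vision format; `identify_frame`, the prompt text, and the model name are our placeholders:

```python
import base64
import threading

import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def identify_frame(frame, on_done):
    """Snapshot `frame`, query the vision model off the main thread,
    then hand the text result to `on_done`."""
    ok, buf = cv2.imencode(".jpg", frame)  # JPEG keeps the payload small
    if not ok:
        return
    b64 = base64.b64encode(buf.tobytes()).decode("ascii")

    def worker():
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any vision-capable model works
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "What item is shown in this photo?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        on_done(response.choices[0].message.content)

    # Daemon thread: the UI thread stays responsive while the request runs.
    threading.Thread(target=worker, daemon=True).start()
```

Note that `on_done` runs on the worker thread; AppKit is not thread-safe, so the callback should hop back to the main thread (e.g. via `performSelectorOnMainThread_withObject_waitUntilDone_`) before touching any view.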
[MODIFY] requirements.txt
- Add `openai`
- Add `python-dotenv`
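
With `python-dotenv` in place, the API key can live in a local `.env` file and be loaded at startup; a minimal sketch:

```python
# Read OPENAI_API_KEY from ./.env into os.environ before the
# OpenAI client is constructed, keeping the key out of the source.
from dotenv import load_dotenv

load_dotenv()
```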
Spec 003: Result Display
[MODIFY] main.py
- Add an `NSTextView` (inside an `NSScrollView`) for results.
- Add "Scan Another" button logic.
- Ensure the UI layout manages state transitions cleanly (Live vs. Result); see the sketch below.
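
One way the result pane could be built; `make_result_view` is our name, and the hide/show toggle is just one option for the Live/Result transition:

```python
# Sketch of the Spec 003 result pane: a read-only NSTextView in an NSScrollView.
from Cocoa import NSMakeRect, NSScrollView, NSTextView

def make_result_view(frame_rect):
    scroll = NSScrollView.alloc().initWithFrame_(frame_rect)
    scroll.setHasVerticalScroller_(True)
    text = NSTextView.alloc().initWithFrame_(frame_rect)
    text.setEditable_(False)  # results are display-only
    scroll.setDocumentView_(text)
    return scroll, text

# Live vs. Result: keep both the live NSImageView and this NSScrollView in the
# window and flip setHidden_ on each when "Capture" / "Scan Another" fire.
```

Populating the pane is then a single `text.setString_(result)` call once the background request reports back on the main thread.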
Verification Plan
Manual Verification
- Launch: Run `python main.py`.
- Native Look: Verify the window uses native macOS controls.
- Feed: Verify the camera feed is smooth and correctly oriented.
- Flow: Capture -> Processing -> Result -> Scan Another.