# Implementation Plan: ItemSense MVP

## Goal Description

Build a desktop application to identify items using the webcam and OpenAI's visual capabilities, using a **native macOS UI (PyObjC)**.

## User Review Required

- **Technology Shift**: Switching from Tkinter to PyObjC (AppKit).
- **Camera Strategy**: Using OpenCV for frame capture and bridging to AppKit for display. This keeps the implementation simpler than writing a raw AVFoundation delegate in Python while maintaining a native UI.

## Proposed Changes

### Spec 001: Core UI & Camera Feed

#### [NEW] main.py

- Initialize `NSApplication` and `NSWindow` (AppKit).
- Implement a custom `AppDelegate` to handle the app lifecycle.
- Integrate OpenCV (`cv2`) for webcam capture.
- Display video frames in an `NSImageView`.
- (See the Spec 001 sketch in the appendix below.)

#### [NEW] requirements.txt

- `pyobjc-framework-Cocoa`
- `opencv-python`
- `Pillow` (for easier image data manipulation, if needed)

### Spec 002: OpenAI Vision Integration

#### [MODIFY] main.py

- Add a `Capture` button (`NSButton`) to the UI.
- Implement logic to snapshot the current OpenCV frame.
- Run the OpenAI API request in a background thread to prevent the UI from freezing.
- Send the image to the OpenAI API.
- (See the Spec 002 sketch in the appendix below.)

#### [MODIFY] requirements.txt

- Add `openai`
- Add `python-dotenv`

### Spec 003: Result Display

#### [MODIFY] main.py

- Add an `NSTextView` (in an `NSScrollView`) for results.
- Add "Scan Another" button logic.
- Ensure the UI layout manages state transitions cleanly (Live vs. Result).
- (See the Spec 003 sketch in the appendix below.)

## Verification Plan

### Manual Verification

1. **Launch**: Run `python main.py`.
2. **Native Look**: Verify the window uses native macOS controls.
3. **Feed**: Verify the camera feed is smooth and correctly oriented.
4. **Flow**: Capture -> Processing -> Result -> Scan Another.
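
## Appendix: Illustrative Sketches

### Spec 001 Sketch: Window and Camera Feed

A minimal sketch of the Spec 001 wiring, assuming the default camera at index 0 and a simple JPEG round-trip (`cv2.imencode` into an `NSImage`) rather than a raw pixel-buffer bridge. Frame polling via `NSTimer` on the main run loop is one plausible way to realize the "bridge OpenCV to AppKit" strategy; it is not a final design, and the 30 fps interval is a placeholder.

```python
# Sketch: native AppKit window showing webcam frames captured with OpenCV.
# Assumptions: default camera at index 0, JPEG round-trip for display.
import cv2
import Cocoa
from PyObjCTools import AppHelper


class AppDelegate(Cocoa.NSObject):
    def applicationDidFinishLaunching_(self, notification):
        rect = Cocoa.NSMakeRect(0, 0, 640, 480)
        style = (Cocoa.NSWindowStyleMaskTitled
                 | Cocoa.NSWindowStyleMaskClosable
                 | Cocoa.NSWindowStyleMaskMiniaturizable)
        self.window = Cocoa.NSWindow.alloc().initWithContentRect_styleMask_backing_defer_(
            rect, style, Cocoa.NSBackingStoreBuffered, False)
        self.window.setTitle_("ItemSense")
        self.imageView = Cocoa.NSImageView.alloc().initWithFrame_(rect)
        self.window.setContentView_(self.imageView)
        self.window.makeKeyAndOrderFront_(None)

        self.capture = cv2.VideoCapture(0)  # default webcam
        # Poll ~30 fps on the main run loop instead of writing an AVFoundation delegate.
        Cocoa.NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            1 / 30.0, self, b"tick:", None, True)

    def tick_(self, timer):
        ok, frame = self.capture.read()
        if not ok:
            return
        frame = cv2.flip(frame, 1)  # mirror so the feed reads naturally
        # JPEG-encode the BGR frame and hand the bytes to AppKit as an NSImage.
        ok, buf = cv2.imencode(".jpg", frame)
        if not ok:
            return
        data = Cocoa.NSData.dataWithBytes_length_(buf.tobytes(), len(buf))
        self.imageView.setImage_(Cocoa.NSImage.alloc().initWithData_(data))


if __name__ == "__main__":
    app = Cocoa.NSApplication.sharedApplication()
    app.setActivationPolicy_(Cocoa.NSApplicationActivationPolicyRegular)
    delegate = AppDelegate.alloc().init()
    app.setDelegate_(delegate)
    app.activateIgnoringOtherApps_(True)
    AppHelper.runEventLoop()
```

The JPEG round-trip costs an encode per frame but avoids hand-building an `NSBitmapImageRep` from raw pixels; if profiling shows it too slow, a direct bitmap bridge (possibly via Pillow) is the fallback.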
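
### Spec 002 Sketch: Background Vision Request

A hedged sketch of the capture-then-ask flow, assuming `OPENAI_API_KEY` is set in the environment (e.g. loaded via `python-dotenv`). The helper name `identify_item`, the prompt text, and the model choice are illustrative, not decided; any vision-capable model would do. The key point is that the network call runs off the main thread and the result is bounced back to the main loop before touching AppKit.

```python
# Sketch: snapshot a frame, query OpenAI in a background thread, report back
# on the main thread. identify_item and on_done are illustrative names.
import base64
import threading

import cv2
from openai import OpenAI
from PyObjCTools import AppHelper

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def identify_item(frame, on_done):
    """JPEG-encode a BGR frame and ask the model what the item is."""
    def worker():
        ok, buf = cv2.imencode(".jpg", frame)
        if not ok:
            AppHelper.callAfter(on_done, "Could not encode frame.")
            return
        b64 = base64.b64encode(buf.tobytes()).decode("ascii")
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any vision-capable model works
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Identify the item in this photo."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        # Never touch AppKit from this thread; hop back to the main run loop.
        AppHelper.callAfter(on_done, response.choices[0].message.content)

    threading.Thread(target=worker, daemon=True).start()
```

In the delegate, the `Capture` button's action would snapshot the most recent frame (e.g. a `self.lastFrame` stored in `tick_`) and call `identify_item(frame, self.showResult_)`.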
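
### Spec 003 Sketch: Live vs. Result State Toggle

An abbreviated continuation of the Spec 001 delegate showing one way to manage the two states: swap the window's content view between the live `NSImageView` and a read-only `NSTextView` wrapped in an `NSScrollView`. Method and attribute names (`buildResultUI_`, `showResult_`, `scanButton`) are illustrative.

```python
# Sketch: Live <-> Result transition as extra methods on the Spec 001 delegate.
import Cocoa


class AppDelegate(Cocoa.NSObject):  # abbreviated; builds on the Spec 001 sketch
    def buildResultUI_(self, rect):
        """Create the read-only result text view inside a scroll view."""
        self.scrollView = Cocoa.NSScrollView.alloc().initWithFrame_(rect)
        self.scrollView.setHasVerticalScroller_(True)
        self.resultView = Cocoa.NSTextView.alloc().initWithFrame_(rect)
        self.resultView.setEditable_(False)
        self.scrollView.setDocumentView_(self.resultView)
        self.scanButton = Cocoa.NSButton.buttonWithTitle_target_action_(
            "Scan Another", self, b"scanAnotherPressed:")

    def showResult_(self, text):
        """Result state: show the model's answer instead of the live feed."""
        self.resultView.setString_(text)
        self.window.setContentView_(self.scrollView)

    def scanAnotherPressed_(self, sender):
        """Live state: swap the streaming image view back in."""
        self.window.setContentView_(self.imageView)
```

A fuller version would also pause the frame timer while in the Result state and lay out the button with Auto Layout rather than swapping whole content views, but the swap keeps the state machine explicit for the MVP.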