Implementation Plan: ItemSense MVP

Goal Description

Build a desktop application that identifies items via the webcam using OpenAI's vision capabilities, presented in a native macOS UI (PyObjC).

User Review Required

  • Technology Shift: Switching from Tkinter to PyObjC (AppKit).
  • Camera Strategy: Using OpenCV for frame capture and bridging to AppKit for display (see the sketch after this list). This keeps the implementation simpler than writing a raw AVFoundation delegate in Python while retaining a native UI.
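As a rough illustration of this bridge, a frame can be encoded to PNG bytes with OpenCV and wrapped in an NSImage via NSData. The helper name frame_to_nsimage is our own for this plan, not a library API:

```python
# Hedged sketch: convert an OpenCV (BGR) frame into an NSImage by encoding
# it to PNG and wrapping the bytes in NSData. frame_to_nsimage is a name
# invented for this plan, not a library function.
import cv2
from Cocoa import NSData, NSImage

def frame_to_nsimage(frame):
    """Encode a BGR numpy frame as PNG and wrap it in an NSImage."""
    ok, png = cv2.imencode(".png", frame)  # imencode handles the BGR byte order
    if not ok:
        return None
    buf = png.tobytes()
    data = NSData.dataWithBytes_length_(buf, len(buf))
    return NSImage.alloc().initWithData_(data)
```

Encoding every frame to PNG is not the fastest path (a raw NSBitmapImageRep would skip the encode step), but it keeps the bridge to a few lines, which is adequate for an MVP.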

Proposed Changes

Spec 001: Core UI & Camera Feed

[NEW] main.py

  • Initialize NSApplication and NSWindow (AppKit).
  • Implement a custom AppDelegate to handle app lifecycle.
  • Integrate OpenCV (cv2) for webcam capture.
  • Display video frames in an NSImageView (a skeleton follows this list).
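A minimal skeleton of what this file could look like, reusing the frame_to_nsimage helper sketched above; the window geometry and the 30 fps polling interval are placeholder choices:

```python
# Sketch only: an AppKit window showing a live OpenCV feed, polled by NSTimer.
import cv2
from Cocoa import (NSApplication, NSBackingStoreBuffered, NSImageView,
                   NSMakeRect, NSObject, NSTimer, NSWindow,
                   NSWindowStyleMaskClosable, NSWindowStyleMaskTitled)
from PyObjCTools import AppHelper

class AppDelegate(NSObject):
    def applicationDidFinishLaunching_(self, notification):
        style = NSWindowStyleMaskTitled | NSWindowStyleMaskClosable
        self.window = NSWindow.alloc().initWithContentRect_styleMask_backing_defer_(
            NSMakeRect(200, 200, 640, 480), style, NSBackingStoreBuffered, False)
        self.window.setTitle_("ItemSense")
        self.imageView = NSImageView.alloc().initWithFrame_(NSMakeRect(0, 0, 640, 480))
        self.window.contentView().addSubview_(self.imageView)
        self.window.makeKeyAndOrderFront_(None)
        self.capture = cv2.VideoCapture(0)  # default webcam
        # Poll the camera roughly 30 times per second on the main run loop.
        NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            1 / 30.0, self, "tick:", None, True)

    def tick_(self, timer):
        ok, frame = self.capture.read()
        if ok:
            # frame_to_nsimage: see the bridging sketch above.
            self.imageView.setImage_(frame_to_nsimage(frame))

if __name__ == "__main__":
    app = NSApplication.sharedApplication()
    delegate = AppDelegate.alloc().init()
    app.setDelegate_(delegate)  # PyObjC delegates are weak; keep a reference
    AppHelper.runEventLoop()
```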

[NEW] requirements.txt

  • pyobjc-framework-Cocoa
  • opencv-python
  • Pillow (for easier image data manipulation if needed)
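Concretely, the initial requirements.txt would read (left unpinned here; versions can be pinned once the stack is verified):

```
pyobjc-framework-Cocoa
opencv-python
Pillow
```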

Spec 002: OpenAI Vision Integration

[MODIFY] main.py

  • Add a Capture button (NSButton) to the UI.
  • Implement logic to snapshot the current OpenCV frame.
  • Send the snapshot to the OpenAI API, running the request on a background thread so the UI does not freeze (see the sketch after this list).
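A sketch of that capture path, assuming the v1 OpenAI Python client; the model name ("gpt-4o"), the prompt text, and the identify_async/on_result names are placeholders. AppHelper.callAfter hands the result back to the main thread, since AppKit views must only be touched there:

```python
# Hedged sketch: run the vision request on a worker thread, then deliver
# the answer back on the AppKit main thread. Model and prompt are placeholders.
import base64
import threading

import cv2
from openai import OpenAI
from PyObjCTools import AppHelper

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def identify_async(frame, on_result):
    """Encode the frame, query the model off the main thread, and pass the
    resulting text to on_result on the main thread."""
    ok, jpg = cv2.imencode(".jpg", frame)
    if not ok:
        return
    b64 = base64.b64encode(jpg.tobytes()).decode("ascii")

    def worker():
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model choice
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "What item is shown in this image?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        AppHelper.callAfter(on_result, response.choices[0].message.content)

    threading.Thread(target=worker, daemon=True).start()
```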

[MODIFY] requirements.txt

  • Add openai
  • Add python-dotenv (to load OPENAI_API_KEY from a .env file; see the sketch below)
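Key loading is then two lines at startup; the .env file itself stays out of version control:

```python
# Minimal sketch: python-dotenv pulls OPENAI_API_KEY from a local .env file
# so the key is never hard-coded.
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()      # loads .env from the working directory, if present
client = OpenAI()  # picks up OPENAI_API_KEY from the environment
```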

Spec 003: Result Display

[MODIFY] main.py

  • Add NSTextView (in NSScrollView) for results.
  • Add "Scan Another" button logic.
  • Ensure the UI manages state transitions cleanly (Live vs. Result); see the sketch after this list.
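One possible shape for the result pane and the Live/Result toggle; the frame sizes and function names are illustrative only:

```python
# Sketch: a read-only NSTextView inside an NSScrollView, plus helpers that
# flip between the live feed and the result text.
from Cocoa import NSMakeRect, NSScrollView, NSTextView

def make_result_pane(content_view):
    scroll = NSScrollView.alloc().initWithFrame_(NSMakeRect(0, 0, 640, 160))
    text = NSTextView.alloc().initWithFrame_(NSMakeRect(0, 0, 640, 160))
    text.setEditable_(False)
    scroll.setDocumentView_(text)
    scroll.setHasVerticalScroller_(True)
    scroll.setHidden_(True)  # hidden while in the Live state
    content_view.addSubview_(scroll)
    return scroll, text

def show_result(image_view, scroll, text_view, result_text):
    """Live -> Result: show the description, hide the feed."""
    text_view.setString_(result_text)
    scroll.setHidden_(False)
    image_view.setHidden_(True)

def show_live(image_view, scroll):
    """'Scan Another': return to the Live state."""
    scroll.setHidden_(True)
    image_view.setHidden_(False)
```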

Verification Plan

Manual Verification

  1. Launch: Run python main.py.
  2. Native Look: Verify the window uses native macOS controls.
  3. Feed: Verify camera feed is smooth and correctly oriented.
  4. Flow: Capture -> Processing -> Result -> Scan Another.