Implementation Plan: ItemSense MVP
Goal Description
Build a desktop application that identifies items through the webcam using OpenAI's vision capabilities, presented in a native macOS UI (PyObjC).
User Review Required
- Technology Shift: Switching from Tkinter to PyObjC (AppKit).
- Camera Strategy: Using OpenCV for frame capture and bridging to AppKit for display. This keeps the implementation simpler than writing a raw AVFoundation delegate in Python while maintaining a native UI.
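
The bridge itself can stay small. A minimal sketch of the conversion, assuming frames arrive as OpenCV BGR arrays; the helper name `frame_to_nsimage` is ours:

```python
# Bridge an OpenCV frame (BGR numpy array) to an AppKit NSImage.
# Sketch only: encodes via PNG for simplicity; JPEG is faster if needed.
import cv2
from Cocoa import NSData, NSImage

def frame_to_nsimage(frame):
    # cv2.imencode expects BGR input, so no color conversion is needed here.
    ok, buf = cv2.imencode(".png", frame)
    if not ok:
        return None
    data = NSData.dataWithBytes_length_(buf.tobytes(), len(buf))
    return NSImage.alloc().initWithData_(data)
```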
Proposed Changes
Spec 001: Core UI & Camera Feed
[NEW] main.py
- Initialize `NSApplication` and `NSWindow` (AppKit).
- Implement a custom `AppDelegate` to handle the app lifecycle.
- Integrate OpenCV (`cv2`) for webcam capture.
- Display video frames in an `NSImageView` (see the sketch after this list).
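
A minimal sketch of how these pieces could fit together, reusing the `frame_to_nsimage` helper above; the window size, timer rate, and title are placeholder choices:

```python
import cv2
from Cocoa import (
    NSApplication, NSApplicationActivationPolicyRegular, NSBackingStoreBuffered,
    NSImageView, NSMakeRect, NSObject, NSTimer, NSWindow,
    NSWindowStyleMaskClosable, NSWindowStyleMaskTitled,
)

class AppDelegate(NSObject):
    def applicationDidFinishLaunching_(self, notification):
        self.capture = cv2.VideoCapture(0)  # default webcam
        self.window = NSWindow.alloc().initWithContentRect_styleMask_backing_defer_(
            NSMakeRect(0, 0, 640, 480),
            NSWindowStyleMaskTitled | NSWindowStyleMaskClosable,
            NSBackingStoreBuffered, False,
        )
        self.window.setTitle_("ItemSense")
        self.image_view = NSImageView.alloc().initWithFrame_(NSMakeRect(0, 0, 640, 480))
        self.window.contentView().addSubview_(self.image_view)
        self.window.makeKeyAndOrderFront_(None)
        # Poll the camera ~30x/second on the main thread via an NSTimer.
        NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            1 / 30.0, self, "tick:", None, True
        )

    def tick_(self, timer):
        ok, frame = self.capture.read()
        if ok:
            self.image_view.setImage_(frame_to_nsimage(frame))

if __name__ == "__main__":
    app = NSApplication.sharedApplication()
    app.setActivationPolicy_(NSApplicationActivationPolicyRegular)  # behave as a normal app
    delegate = AppDelegate.alloc().init()
    app.setDelegate_(delegate)
    app.run()
```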
[NEW] requirements.txt
- `pyobjc-framework-Cocoa`
- `opencv-python`
- `Pillow` (for easier image data manipulation if needed)
Spec 002: OpenAI Vision Integration
[MODIFY] main.py
- Add a Capture button (`NSButton`) to the UI.
- Implement logic to snapshot the current OpenCV frame.
- Send the image to the OpenAI API.
- Run the OpenAI request in a background thread to prevent UI freezing (see the sketch below).
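
A sketch of the capture flow using the chat-completions vision format; `identify_frame`, the prompt text, and the model name are our placeholders:

```python
import base64
import threading

import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def identify_frame(frame, on_done):
    """Snapshot `frame`, query the vision model off the main thread,
    then hand the text result to `on_done`."""
    ok, buf = cv2.imencode(".jpg", frame)  # JPEG keeps the payload small
    if not ok:
        return
    b64 = base64.b64encode(buf.tobytes()).decode("ascii")

    def worker():
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any vision-capable model works
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "What item is shown in this photo?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        on_done(response.choices[0].message.content)

    # Daemon thread: the UI thread stays responsive while the request runs.
    threading.Thread(target=worker, daemon=True).start()
```

Note that `on_done` runs on the worker thread; AppKit is not thread-safe, so the callback should hop back to the main thread (e.g. via `performSelectorOnMainThread_withObject_waitUntilDone_`) before touching any view.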
[MODIFY] requirements.txt
- Add `openai`
- Add `python-dotenv`
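
With `python-dotenv` in place, the API key can live in a local `.env` file and be loaded at startup; a minimal sketch:

```python
# Read OPENAI_API_KEY from ./.env into os.environ before the
# OpenAI client is constructed, keeping the key out of the source.
from dotenv import load_dotenv

load_dotenv()
```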
Spec 003: Result Display
[MODIFY] main.py
- Add an `NSTextView` (inside an `NSScrollView`) for results.
- Add "Scan Another" button logic.
- Ensure the UI layout manages state transitions cleanly (Live vs. Result); see the sketch below.
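
One way the result pane could be built; `make_result_view` is our name, and the hide/show toggle is just one option for the Live/Result transition:

```python
# Sketch of the Spec 003 result pane: a read-only NSTextView in an NSScrollView.
from Cocoa import NSMakeRect, NSScrollView, NSTextView

def make_result_view(frame_rect):
    scroll = NSScrollView.alloc().initWithFrame_(frame_rect)
    scroll.setHasVerticalScroller_(True)
    text = NSTextView.alloc().initWithFrame_(frame_rect)
    text.setEditable_(False)  # results are display-only
    scroll.setDocumentView_(text)
    return scroll, text

# Live vs. Result: keep both the live NSImageView and this NSScrollView in the
# window and flip setHidden_ on each when "Capture" / "Scan Another" fire.
```

Populating the pane is then a single `text.setString_(result)` call once the background request reports back on the main thread.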
Verification Plan
Manual Verification
- Launch: Run `python main.py`.
- Native Look: Verify the window uses native macOS controls.
- Feed: Verify the camera feed is smooth and correctly oriented.
- Flow: Capture -> Processing -> Result -> Scan Another.