# Implementation Plan: ItemSense MVP
## Goal Description
Build a desktop application that identifies items through the webcam using OpenAI's vision capabilities, with a **native macOS UI (PyObjC)**.
## User Review Required
- **Technology Shift**: Switching from Tkinter to PyObjC (AppKit).
- **Camera Strategy**: Using OpenCV for frame capture and bridging to AppKit for display. This keeps the implementation simpler than writing a raw AVFoundation delegate in Python while maintaining a native UI.
## Proposed Changes
### Spec 001: Core UI & Camera Feed
#### [NEW] main.py
- Initialize `NSApplication` and `NSWindow` (AppKit).
- Implement a custom `AppDelegate` to handle app lifecycle.
- Integrate OpenCV (`cv2`) for webcam capture.
- Display video frames in an `NSImageView`.
#### [NEW] requirements.txt
- `pyobjc-framework-Cocoa`
- `opencv-python`
- `Pillow` (optional; simplifies image-format conversion if the PNG bridge proves insufficient)
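
With those entries in place, the Spec 001 environment can be set up with:

```shell
# Install the Spec 001 dependencies (Pillow is optional)
pip install pyobjc-framework-Cocoa opencv-python Pillow
```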
### Spec 002: OpenAI Vision Integration
#### [MODIFY] main.py
- Add `Capture` button (`NSButton`) to the UI.
- Implement logic to snapshot the current OpenCV frame.
- Run OpenAI API request in a background thread to prevent UI freezing.
- Send image to OpenAI API.
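
The background-thread step can be sketched as follows. `analyze_in_background` and `request_fn` are hypothetical names; the request function is injected so the threading logic can be shown (and tested) without network access, and in the real app it would wrap the OpenAI client call. Note that the result callback must hop back to the main thread before touching AppKit views, e.g. via `performSelectorOnMainThread:withObject:waitUntilDone:`.

```python
# Sketch: run the vision request off the main thread so the UI
# stays responsive. `request_fn` stands in for the real OpenAI call.
import base64
import threading

def analyze_in_background(png_bytes, request_fn, on_result):
    """Encode the snapshot and invoke request_fn on a worker thread."""
    def worker():
        image_b64 = base64.b64encode(png_bytes).decode("ascii")
        # The caller is responsible for marshalling AppKit updates
        # inside on_result back onto the main thread.
        on_result(request_fn(image_b64))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```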
#### [MODIFY] requirements.txt
- Add `openai`
- Add `python-dotenv`
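
A minimal sketch of the key loading these two packages enable; the `try/except` keeps it runnable even where `python-dotenv` is absent:

```python
# Load OPENAI_API_KEY from a local .env file when python-dotenv is
# available, falling back to the plain process environment.
import os

try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # no .env support; rely on exported environment variables

api_key = os.environ.get("OPENAI_API_KEY")
```

The `openai` client reads `OPENAI_API_KEY` from the environment by default, so the key need not be passed to it explicitly.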
### Spec 003: Result Display
#### [MODIFY] main.py
- Add `NSTextView` (in `NSScrollView`) for results.
- Add "Scan Another" button logic.
- Ensure UI layout manages state transitions cleanly (Live vs Result).
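
The Live vs Result transitions can be pinned down with a small state table; `Screen`, `TRANSITIONS`, and `next_screen` are illustrative names for a sketch of the flow, not existing code:

```python
# Sketch: an explicit state table for the UI flow, so invalid events
# (e.g. Capture while a request is in flight) are simply ignored.
from enum import Enum, auto

class Screen(Enum):
    LIVE = auto()        # camera feed shown, Capture enabled
    PROCESSING = auto()  # request in flight, controls disabled
    RESULT = auto()      # NSTextView visible, Scan Another enabled

TRANSITIONS = {
    (Screen.LIVE, "capture"): Screen.PROCESSING,
    (Screen.PROCESSING, "result"): Screen.RESULT,
    (Screen.RESULT, "scan_another"): Screen.LIVE,
}

def next_screen(current, event):
    """Return the next screen, staying put on invalid events."""
    return TRANSITIONS.get((current, event), current)
```

Centralizing the transitions this way lets the AppKit code toggle control visibility in one place per state change instead of scattering show/hide calls across callbacks.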
## Verification Plan
### Manual Verification
1. **Launch**: Run `python main.py`.
2. **Native Look**: Verify the window uses native macOS controls.
3. **Feed**: Verify camera feed is smooth and correctly oriented.
4. **Flow**: Capture -> Processing -> Result -> Scan Another.