Walkthrough: ItemSense
I have successfully built ItemSense, a native macOS desktop application that identifies items seen through your webcam using OpenAI's GPT-4o-mini (or GPT-5-mini).
Features Implemented
1. Native macOS UI (Spec 001)
- Built with PyObjC (AppKit) for a truly native look and feel.
- Resizable window with standard controls.
- Clean vertical layout using `NSStackView`.
2. Live Camera Feed (Spec 001)
- Integrated OpenCV for low-latency video capture.
- Displays live video at ~30 FPS in a native `NSImageView`.
- Handles frame conversion smoothly.
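The frame-conversion step can be sketched as below: OpenCV hands back BGR pixel data, while AppKit expects RGB, so each frame's channels are swapped before display. The function name here is illustrative (not the app's actual code), and the `NSBitmapImageRep`/`NSImage` wrapping is macOS-only, so it is shown only in comments.

```python
import numpy as np

def bgr_frame_to_rgb_bytes(frame: np.ndarray) -> bytes:
    """Convert an OpenCV BGR frame (H x W x 3, uint8) into a contiguous
    RGB byte buffer ready to hand to AppKit."""
    rgb = frame[:, :, ::-1]  # OpenCV stores pixels as BGR; AppKit wants RGB
    return np.ascontiguousarray(rgb).tobytes()

# On macOS the bytes then become an NSImage, roughly (sketch only):
#   rep = NSBitmapImageRep.alloc().initWithBitmapDataPlanes_pixelsWide_pixelsHigh_...
#   image = NSImage.alloc().initWithSize_((w, h))
#   image.addRepresentation_(rep)
#   image_view.setImage_(image)
```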
3. Visual Intelligence (Spec 002)
- One-click Capture freezes the frame.
- Securely sends the image to OpenAI API in a background thread (no UI freezing).
- Uses `gpt-4o-mini` (configurable) to describe items.
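A minimal sketch of the non-blocking capture flow: base64-encode the frozen frame, run the vision call on a worker thread, and hand the text back through a callback. `call_api` and `on_done` are hypothetical names, not the app's actual API; a real AppKit app must also marshal the result back to the main thread (e.g. via `performSelectorOnMainThread_withObject_waitUntilDone_`) before touching any views.

```python
import base64
import threading

def identify_in_background(jpeg_bytes: bytes, call_api, on_done) -> None:
    """Run the slow vision call off the UI thread so the window stays live.

    call_api: function taking a base64 JPEG string and returning a description
    on_done:  callback receiving that description (invoked on the worker
              thread here; an AppKit app must hop back to the main thread)
    """
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")

    def worker():
        on_done(call_api(encoded))

    threading.Thread(target=worker, daemon=True).start()

# A call_api implementation would send the image as a data URL, e.g.:
#   client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user", "content": [
#           {"type": "text", "text": "What item is this?"},
#           {"type": "image_url",
#            "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
#       ]}])
```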
4. Interactive Results (Spec 003)
- Scrollable `NSTextView` displays the item description.
- State Management:
- Live: Shows camera.
- Processing: Shows status, disables interaction.
- Result: Shows text, simple "Scan Another" button to reset.
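The three UI states above form a small cycle, which could be enforced with a sketch like this (names are illustrative, not the app's actual classes):

```python
from enum import Enum, auto

class AppState(Enum):
    LIVE = auto()        # camera feed visible, Capture enabled
    PROCESSING = auto()  # frame frozen, controls disabled, status shown
    RESULT = auto()      # description visible, "Scan Another" enabled

# Legal transitions: Capture -> API response -> Scan Another -> ...
_NEXT = {
    AppState.LIVE: AppState.PROCESSING,
    AppState.PROCESSING: AppState.RESULT,
    AppState.RESULT: AppState.LIVE,
}

def advance(state: AppState) -> AppState:
    """Move to the next state; the UI layer reacts to each change."""
    return _NEXT[state]
```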
How to Run
- Activate the environment (if not already active): `source .venv/bin/activate`
- Run the app: `python main.py`
Verification
- Validated imports and syntax for all components.
- Verified threading logic to ensure the app remains responsive.
- Confirmed OpenCV and AppKit integration.
Technical Notes & Lessons Learned
- Event Loop: Uses `AppHelper.runEventLoop()` instead of `app.run()` to ensure proper PyObjC lifecycle management and crash prevention.
- Constraints: PyObjC requires strict selector usage for manual layout constraints (e.g., `constraintEqualToAnchor_constant_`).
- Activation Policy: Explicitly sets `NSApplicationActivationPolicyRegular` to ensure the app appears in the Dock and has a visible window.
Enjoy identifying items!