
Walkthrough: ItemSense

I have successfully built ItemSense, a native macOS desktop application that identifies items using your webcam and OpenAI's gpt-4o-mini or gpt-5-mini vision models.

Features Implemented

1. Native macOS UI (Spec 001)

  • Built with PyObjC (AppKit) for a truly native look and feel.
  • Resizable window with standard controls.
  • Clean vertical layout using NSStackView.

2. Live Camera Feed (Spec 001)

  • Integrated OpenCV for low-latency video capture.
  • Displays live video at ~30 FPS in a native NSImageView.
  • Handles frame conversion smoothly.
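The frame conversion mentioned above boils down to reordering channels: OpenCV delivers BGR numpy arrays, while NSImage expects RGB data. A minimal sketch of that step (numpy-only; the NSBitmapImageRep/NSImage wrapping and the helper name are illustrative, not the app's actual code):

```python
import numpy as np

def bgr_frame_to_rgb_bytes(frame: np.ndarray) -> bytes:
    """Convert an OpenCV BGR frame (H x W x 3, uint8) to raw RGB bytes.

    In the app, these bytes would feed an NSBitmapImageRep / NSImage
    for display in the NSImageView; that AppKit step is omitted here.
    """
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("expected an H x W x 3 BGR frame")
    rgb = frame[:, :, ::-1]  # swap the B and R channels
    return np.ascontiguousarray(rgb).tobytes()

# A 1x1 pure-blue BGR pixel comes out as pure-blue RGB:
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(bgr_frame_to_rgb_bytes(blue_bgr))  # b'\x00\x00\xff'
```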

3. Visual Intelligence (Spec 002)

  • One-click Capture freezes the frame.
  • Securely sends the image to the OpenAI API in a background thread (no UI freezing).
  • Uses gpt-4o-mini (configurable) to describe items.
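The background-thread pattern above can be sketched as follows. The `describe` callable and function names are stand-ins, not the app's actual API; in the real app the completion callback would hop back to the main thread (e.g. via `AppHelper.callAfter`) before touching any AppKit views.

```python
import threading

def analyze_in_background(image_bytes, describe, on_done):
    """Run a (potentially slow) vision-API call off the main thread.

    `describe` stands in for the real OpenAI call (a function that
    sends `image_bytes` to gpt-4o-mini and returns a description);
    `on_done` receives the result once the call finishes.
    """
    def worker():
        result = describe(image_bytes)
        on_done(result)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

# Usage with a stand-in describe function:
results = []
t = analyze_in_background(b"fake-jpeg", lambda b: f"{len(b)} bytes", results.append)
t.join()
print(results)  # ['9 bytes']
```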

4. Interactive Results (Spec 003)

  • Scrollable NSTextView displays the item description.
  • State Management:
    • Live: Shows camera.
    • Processing: Shows status, disables interaction.
    • Result: Shows text, simple "Scan Another" button to reset.
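The three UI states above form a simple cycle, which can be modeled as a small state machine. This is a plain-Python sketch (the real app drives NSView visibility and button enablement from these transitions, which is omitted here):

```python
from enum import Enum, auto

class AppState(Enum):
    LIVE = auto()        # camera feed visible, Capture enabled
    PROCESSING = auto()  # frame frozen, interaction disabled
    RESULT = auto()      # description shown, "Scan Another" enabled

# Allowed transitions: Live -> Processing -> Result -> Live
TRANSITIONS = {
    AppState.LIVE: AppState.PROCESSING,    # user clicks Capture
    AppState.PROCESSING: AppState.RESULT,  # API reply arrives
    AppState.RESULT: AppState.LIVE,        # user clicks "Scan Another"
}

def advance(state: AppState) -> AppState:
    return TRANSITIONS[state]

state = AppState.LIVE
state = advance(state)  # PROCESSING
state = advance(state)  # RESULT
state = advance(state)  # back to LIVE
print(state)            # AppState.LIVE
```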

How to Run

  1. Activate Environment (if not already active):

    source .venv/bin/activate
    
  2. Run the App:

    python main.py
    

Verification

  • Validated imports and syntax for all components.
  • Verified threading logic to ensure the app remains responsive.
  • Confirmed OpenCV and AppKit integration.
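An import-and-syntax check like the one described can be automated with the standard library's `py_compile`. This sketch compiles a tiny stand-in module rather than the app's real `main.py`:

```python
import os
import py_compile
import tempfile

# A stand-in for main.py; compiling it catches syntax errors up front.
src = "import threading\n\ndef ok():\n    return True\n"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(src)
    path = f.name
try:
    # doraise=True raises PyCompileError instead of printing to stderr.
    cfile = py_compile.compile(path, doraise=True)
    syntax_ok = cfile is not None
finally:
    os.unlink(path)
print("syntax OK" if syntax_ok else "syntax error")
```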

Technical Notes & Lessons Learned

  • Event Loop: Uses AppHelper.runEventLoop() instead of app.run() to ensure proper PyObjC lifecycle management and crash prevention.
  • Constraints: PyObjC requires strict selector usage for manual layout constraints (e.g., constraintEqualToAnchor_constant_).
  • Activation Policy: Explicitly sets NSApplicationActivationPolicyRegular to ensure the app appears in the Dock and has a visible window.

Enjoy identifying items!