Initial commit: MediaPipe landmarks demo

HTML demos for face, hand, gesture, and posture tracking using MediaPipe.
Includes Python CLI tools for processing video files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 22:38:40 -05:00
commit 8bcc62b045
22 changed files with 2347 additions and 0 deletions

Training_handshape_B.md

@@ -0,0 +1,179 @@
Let's add a custom gesture for the **ASL letter “B”** (flat hand, fingers together, thumb folded across the palm) using MediaPipe **Gesture Recognizer (Model Maker)**.
# Plan (what you'll build)
* A custom model with a new class label, e.g. `ASL_B`, plus the required `none` class.
* A small, labeled image dataset (Model Maker will extract hand landmarks for you).
* A trained `.task` file you can drop into your Python/JS app and allowlist.
---
# 1) Pick labels
Use:
* `ASL_B` ← your new gesture
* `none` ← anything that's not one of your target gestures (mandatory)
Folder layout:
```
dataset/
ASL_B/
...images...
none/
...images...
```
---
# 2) Collect the right data (what to capture)
Target handshape for **B**:
* **Fingers**: index–pinky fully extended and **pressed together**
* **Thumb**: folded across palm (not sticking out to the side)
* **Palm**: facing the camera (front), plus a few angled views
Suggested minimums (per label):
| Bucket | Shots |
| --------------------------------------------------- | -------------------- |
| Distances: close (\~40–60 cm), medium (\~80–120 cm) | 80                   |
| View angles: front, \~30°, \~60° yaw | 80 |
| Rotations: slight roll/tilt | 40 |
| Lighting: bright, dim, backlit | 40 |
| Backgrounds: plain wall, cluttered office/outdoor | 40 |
| Hands: left & right (both) | included across all |
| Skin tones / several people | as many as practical |
Collect **at least \~300–500** `ASL_B` images to start.
For **`none`**, include: open palm (“High-Five”), slightly spread fingers, thumbs-up, fist, pointing, random objects/background frames, other ASL letters—especially **Open\_Palm** look-alikes so the model learns “not B”.
Quick ways to get images:
* Record short clips on laptop/phone and extract frames (e.g., 2 fps).
* Ask 3–5 colleagues to contribute a short 10–20 s clip each.
Frame extraction example:
```bash
# Extract 2 frames/sec from a video into dataset/ASL_B/
ffmpeg -i b_sign.mov -vf fps=2 dataset/ASL_B/b_%05d.jpg
# Do the same for negatives into dataset/none/
```
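Before training, it helps to confirm each label actually has enough frames; a minimal count sketch, assuming the `dataset/` layout above and JPEG frames:
```python
from pathlib import Path

# Count extracted frames per label folder (assumes dataset/<label>/*.jpg as above)
for label_dir in sorted(Path("dataset").iterdir()):
    if label_dir.is_dir():
        n_images = len(list(label_dir.glob("*.jpg")))
        print(f"{label_dir.name}: {n_images} images")
```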
---
# 3) Train with Model Maker (Python)
Create and activate a venv, then:
```bash
pip install --upgrade pip
pip install mediapipe-model-maker
```
Training script (save as `train_asl_b.py` and run it):
```python
from mediapipe_model_maker import gesture_recognizer as gr

DATA_DIR = "dataset"
EXPORT_DIR = "exported_model"

# Load & auto-preprocess (runs hand detection, keeps images with a detected hand)
data = gr.Dataset.from_folder(
    dirname=DATA_DIR,
    hparams=gr.HandDataPreprocessingParams(  # you can tweak these if needed
        min_detection_confidence=0.5
    )
)

# Split
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

# Hyperparameters (start small; bump epochs if needed)
hparams = gr.HParams(
    export_dir=EXPORT_DIR,
    epochs=12,
    batch_size=16,
    learning_rate=0.001,
)

# Optional model head size & dropout
options = gr.GestureRecognizerOptions(
    hparams=hparams,
    model_options=gr.ModelOptions(layer_widths=[128, 64], dropout_rate=0.1)
)

model = gr.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

# Evaluate
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss={loss:.4f}, acc={acc:.4f}")

# Export .task
model.export_model()  # writes exported_model/gesture_recognizer.task
print("Exported:", EXPORT_DIR + "/gesture_recognizer.task")
```
Tips:
* If many `ASL_B` images get dropped at load time (no hand detected), back up the camera a little or ensure the whole hand is visible; a quick count check is sketched after these tips.
* If `none` is weak, add more “near-miss” negatives: open palm with fingers slightly apart, thumb slightly out, partial occlusions.
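To quantify how many images were actually kept, compare the on-disk count with the loaded dataset size. A rough sketch, assuming the training script above is still in scope and that the loaded `Dataset` exposes its example count as `size` (an assumption worth checking against your Model Maker version):
```python
from pathlib import Path

# Images on disk vs. examples Model Maker kept (i.e., where a hand was detected).
# `data.size` is assumed here to report the number of loaded examples.
on_disk = sum(1 for p in Path(DATA_DIR).rglob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
print(f"On disk: {on_disk}  loaded with a detected hand: {data.size}")
```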
---
# 4) Plug it into your app
**Python (Tasks API example):**
```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
GestureRecognizerResult = mp.tasks.vision.GestureRecognizerResult
VisionRunningMode = mp.tasks.vision.RunningMode
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions

# LIVE_STREAM mode delivers results asynchronously via a callback
def print_result(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    if result.gestures and result.gestures[0]:
        top = result.gestures[0][0]
        print(f"{timestamp_ms} ms: {top.category_name} ({top.score:.2f})")

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=print_result,      # required in LIVE_STREAM mode
    custom_gesture_classifier_options=ClassifierOptions(
        score_threshold=0.6,           # tighten until false positives drop
        category_allowlist=["ASL_B"],  # only report your class
    ),
)
recognizer = GestureRecognizer.create_from_options(options)
```
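A minimal feeding loop to go with the options above (a sketch, assuming OpenCV supplies the webcam frames; results arrive in the `result_callback`, since `recognize_async` returns nothing):
```python
import time

import cv2  # assumption: OpenCV is used for webcam capture

cap = cv2.VideoCapture(0)
try:
    while cap.isOpened():
        ok, frame_bgr = cap.read()
        if not ok:
            break
        # MediaPipe expects SRGB images; OpenCV delivers BGR
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
        # Timestamps must be monotonically increasing in LIVE_STREAM mode
        recognizer.recognize_async(mp_image, int(time.monotonic() * 1000))
finally:
    cap.release()
    recognizer.close()
```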
**Web (JS):**
```js
import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";

// Load the WASM assets, then the custom .task model
const fileset = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
const recognizer = await GestureRecognizer.createFromOptions(fileset, {
  baseOptions: { modelAssetPath: "exported_model/gesture_recognizer.task" },
  runningMode: "VIDEO", // the web API uses IMAGE/VIDEO (there is no LIVE_STREAM mode)
  customGesturesClassifierOptions: {
    scoreThreshold: 0.6,
    categoryAllowlist: ["ASL_B"]
  }
});
```
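A minimal recognition loop to pair with it (a sketch in JS to match the block above; the `webcam` `<video>` element and its getUserMedia wiring are assumptions):
```js
const video = document.getElementById("webcam"); // assumed <video> element showing the camera stream

function loop() {
  if (video.readyState >= 2) {
    // VIDEO mode: pass the current frame plus a timestamp in ms
    const result = recognizer.recognizeForVideo(video, performance.now());
    const top = result.gestures?.[0]?.[0];
    if (top) {
      console.log(`${top.categoryName} (${top.score.toFixed(2)})`);
    }
  }
  requestAnimationFrame(loop);
}
requestAnimationFrame(loop);
```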
---
# 5) Troubleshooting & tuning
* **False positives with Open Palm:** Add more `none` examples where fingers are together but **thumb is visible** to the side. The model needs to see “almost B but not B.”
* **Left vs right hand:** Include both in training. If you only trained on right hands, left hands may underperform.
* **Distance issues:** If far-away hands fail, capture more medium/far shots. Landmarks get noisier when small.
* **Thresholds:** Raise `score_threshold` to reduce spurious detections; lower it if you miss true Bs.
* **Confusion matrix:** If accuracy is fine but live results wobble, collect more data from the exact camera/lighting you'll use (a per-class tally sketch follows below).
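One way to see where live behavior diverges from test accuracy is to run the exported `.task` over a held-out labeled folder and count predictions. A minimal sketch; the `holdout/` path and its subfolder names are assumptions, and no allowlist is set so misfires stay visible:
```python
from collections import Counter
from pathlib import Path

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.IMAGE,
)

# holdout/ASL_B/*.jpg and holdout/none/*.jpg are assumed label folders
confusion = Counter()
with GestureRecognizer.create_from_options(options) as recognizer:
    for true_label in ("ASL_B", "none"):
        for path in Path("holdout", true_label).glob("*.jpg"):
            image = mp.Image.create_from_file(str(path))
            result = recognizer.recognize(image)
            has_gesture = result.gestures and result.gestures[0]
            pred = result.gestures[0][0].category_name if has_gesture else "no_hand"
            confusion[(true_label, pred)] += 1

for (true_label, pred), n in sorted(confusion.items()):
    print(f"{true_label:>6} -> {pred:<10} {n}")
```
If `ASL_B` frequently lands in `none` or `no_hand` here, the threshold and negative-data tips above are the first things to revisit.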
---