Let's add a custom gesture for the **ASL letter “B”** (flat hand, fingers together, thumb folded across the palm) using MediaPipe **Gesture Recognizer (Model Maker)**.
# Plan (what you'll build)
* A custom model with a new class label, e.g. `ASL_B`, plus the required `none` class.
* A small, labeled image dataset (Model Maker will extract hand landmarks for you).
* A trained `.task` file you can drop into your Python/JS app and allowlist.
---
# 1) Pick labels
Use:
* `ASL_B` ← your new gesture
* `none` ← anything that's not one of your target gestures (mandatory)
Folder layout:
```
dataset/
  ASL_B/
    ...images...
  none/
    ...images...
```
---
# 2) Collect the right data (what to capture)
Target handshape for **B**:
* **Fingers**: index through pinky fully extended and **pressed together**
* **Thumb**: folded across palm (not sticking out to the side)
* **Palm**: facing camera (front) and also a few angles
Suggested minimums (per label):
| Bucket | Shots |
| --------------------------------------------------- | -------------------- |
| Distances: close (\~40-60 cm), medium (\~80-120 cm) | 80 |
| View angles: front, \~30°, \~60° yaw | 80 |
| Rotations: slight roll/tilt | 40 |
| Lighting: bright, dim, backlit | 40 |
| Backgrounds: plain wall, cluttered office/outdoor | 40 |
| Hands: left & right (both) | included across all |
| Skin tones / several people | as many as practical |
Do **at least \~300-500** `ASL_B` images to start.
For **`none`**, include: open palm (“High-Five”), slightly spread fingers, thumbs-up, fist, pointing, random objects/background frames, other ASL letters—especially **Open\_Palm** look-alikes so the model learns “not B”.
Quick ways to get images:
* Record short clips on laptop/phone and extract frames (e.g., 2 fps).
* Ask 3-5 colleagues to contribute a short 10-20 s clip each.
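If you would rather grab frames straight from a webcam than record clips first, here is a minimal sketch (assumes OpenCV is installed; the ~2 fps rate, the file names, and the `q` quit key are illustrative choices, not anything MediaPipe requires):
```python
# Save webcam frames into a label folder at roughly 2 fps; press "q" to stop.
import time
from pathlib import Path

import cv2

LABEL = "ASL_B"               # switch to "none" when capturing negatives
OUT_DIR = Path("dataset") / LABEL
OUT_DIR.mkdir(parents=True, exist_ok=True)

cap = cv2.VideoCapture(0)
count, last_save = 0, 0.0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    now = time.time()
    if now - last_save >= 0.5:  # ~2 frames per second
        cv2.imwrite(str(OUT_DIR / f"{LABEL.lower()}_{count:05d}.jpg"), frame)
        count += 1
        last_save = now
    cv2.imshow("capture (q to quit)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
print(f"Saved {count} frames to {OUT_DIR}")
```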
Frame extraction example:
```bash
# Extract 2 frames/sec from a video into dataset/ASL_B/
ffmpeg -i b_sign.mov -vf fps=2 dataset/ASL_B/b_%05d.jpg
# Do the same for negatives into dataset/none/
```
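After extraction, a quick per-label count helps confirm you are near the targets above; a small sketch assuming the `dataset/` layout from step 1:
```python
# Count images per label folder under dataset/.
from pathlib import Path

DATA_DIR = Path("dataset")
for label_dir in sorted(p for p in DATA_DIR.iterdir() if p.is_dir()):
    n = sum(1 for f in label_dir.iterdir() if f.suffix.lower() in {".jpg", ".jpeg", ".png"})
    print(f"{label_dir.name}: {n} images")
```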
---
# 3) Train with Model Maker (Python)
Create and activate a venv, then:
```bash
pip install --upgrade pip
pip install mediapipe-model-maker
```
Training script (save as `train_asl_b.py` and run it):
```python
from mediapipe_model_maker import gesture_recognizer as gr
DATA_DIR = "dataset"
EXPORT_DIR = "exported_model"
# Load & auto-preprocess (runs hand detection, keeps images with a detected hand)
data = gr.Dataset.from_folder(
    dirname=DATA_DIR,
    hparams=gr.HandDataPreprocessingParams(  # you can tweak these if needed
        min_detection_confidence=0.5
    ),
)
# Split
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)
# Hyperparameters (start small; bump epochs if needed)
hparams = gr.HParams(
    export_dir=EXPORT_DIR,
    epochs=12,
    batch_size=16,
    learning_rate=0.001,
)
# Optional model head size & dropout
options = gr.GestureRecognizerOptions(
    hparams=hparams,
    model_options=gr.ModelOptions(layer_widths=[128, 64], dropout_rate=0.1),
)
model = gr.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)
# Evaluate
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss={loss:.4f}, acc={acc:.4f}")
# Export .task
model.export_model() # writes exported_model/gesture_recognizer.task
print("Exported:", EXPORT_DIR + "/gesture_recognizer.task")
```
Tips:
* If many `ASL_B` images get dropped at load time (no hand detected), move the camera back a bit or make sure the whole hand is in frame.
* If `none` is weak, add more “near-miss” negatives: open palm with fingers slightly apart, thumb slightly out, partial occlusions.
---
# 4) Plug it into your app
**Python (Tasks API example):**
```python
import mediapipe as mp
BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions
# LIVE_STREAM mode requires a result callback; results arrive asynchronously.
def on_result(result, image, timestamp_ms):
    if result.gestures:
        top = result.gestures[0][0]
        print(top.category_name, f"{top.score:.2f}")

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=on_result,
    custom_gesture_classifier_options=ClassifierOptions(
        score_threshold=0.6,           # tighten until false positives drop
        category_allowlist=["ASL_B"]   # only report your class
    ),
)
recognizer = GestureRecognizer.create_from_options(options)
```
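In LIVE_STREAM mode you push frames to the recognizer yourself and results come back through the callback; here is a rough sketch of that loop using OpenCV for capture (the camera index, window handling, and millisecond timestamps are illustrative choices, and it assumes the `recognizer` created above):
```python
# Feed webcam frames to the LIVE_STREAM recognizer created above.
# Results are printed by the result_callback, not returned here.
import time

import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
    # Timestamps must be monotonically increasing (milliseconds).
    recognizer.recognize_async(mp_image, int(time.time() * 1000))
    cv2.imshow("preview (q to quit)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```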
**Web (JS):**
```js
// `fileset` comes from FilesetResolver.forVisionTasks(...).
const recognizer = await GestureRecognizer.createFromOptions(fileset, {
  baseOptions: { modelAssetPath: "exported_model/gesture_recognizer.task" },
  runningMode: "VIDEO", // the web API uses IMAGE/VIDEO; there is no LIVE_STREAM mode
  customGesturesClassifierOptions: {
    scoreThreshold: 0.6,
    categoryAllowlist: ["ASL_B"]
  }
});
// Each frame: const result = recognizer.recognizeForVideo(videoElement, performance.now());
// then read result.gestures.
```
---
# 5) Troubleshooting & tuning
* **False positives with Open Palm:** Add more `none` examples where fingers are together but **thumb is visible** to the side. The model needs to see “almost B but not B.”
* **Left vs right hand:** Include both in training. If you only trained on right hands, left hands may underperform.
* **Distance issues:** If far-away hands fail, capture more medium/far shots. Landmarks get noisier when small.
* **Thresholds:** Raise `score_threshold` to reduce spurious detections; lower it if you miss true Bs.
* **Confusion matrix:** If accuracy is fine but live results wobble, collect more data from the exact camera/lighting you'll use (see the spot-check sketch below).
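For that spot check, one option is to run the exported `.task` in IMAGE mode over a held-out folder and tally predictions per true label; a rough sketch (the `holdout/` folder, its layout mirroring `dataset/`, and the `.jpg` glob are assumptions):
```python
# Tally the exported model's predictions over held-out images, per true label.
from collections import Counter
from pathlib import Path

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    # default running mode is IMAGE, which is what we want here
)
recognizer = GestureRecognizer.create_from_options(options)

for label_dir in sorted(p for p in Path("holdout").iterdir() if p.is_dir()):
    counts = Counter()
    for img_path in label_dir.glob("*.jpg"):
        result = recognizer.recognize(mp.Image.create_from_file(str(img_path)))
        if result.gestures:
            counts[result.gestures[0][0].category_name] += 1
        else:
            counts["<no hand>"] += 1
    print(label_dir.name, dict(counts))
```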
---