HTML demos for face, hand, gesture, and posture tracking using MediaPipe. Includes Python CLI tools for processing video files.
Let’s add a custom gesture for the ASL letter “B” (flat hand, fingers together, thumb folded across the palm) using MediaPipe Gesture Recognizer (Model Maker).
Plan (what you’ll build)
- A custom model with a new class label, e.g. `ASL_B`, plus the required `none` class.
- A small, labeled image dataset (Model Maker will extract hand landmarks for you).
- A trained `.task` file you can drop into your Python/JS app and allowlist.
1) Pick labels
Use:
- `ASL_B` ← your new gesture
- `none` ← anything that's not one of your target gestures (mandatory)
Folder layout:
```
dataset/
  ASL_B/
    ...images...
  none/
    ...images...
```
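As you fill these folders, here's a quick standard-library check (optional, just a convenience sketch) to see how many images each label has:

```python
from pathlib import Path

# Count images per label folder to track progress toward the suggested minimums below.
for label_dir in sorted(p for p in Path("dataset").iterdir() if p.is_dir()):
    n = sum(1 for f in label_dir.rglob("*") if f.suffix.lower() in {".jpg", ".jpeg", ".png"})
    print(f"{label_dir.name}: {n} images")
```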
2) Collect the right data (what to capture)
Target handshape for B:
- Fingers: index–pinky fully extended and pressed together
- Thumb: folded across palm (not sticking out to the side)
- Palm: facing the camera (front), plus a few angled views
Suggested minimums (per label):
| Bucket | Shots |
|---|---|
| Distances: close (~40–60 cm), medium (~80–120 cm) | 80 |
| View angles: front, ~30°, ~60° yaw | 80 |
| Rotations: slight roll/tilt | 40 |
| Lighting: bright, dim, backlit | 40 |
| Backgrounds: plain wall, cluttered office/outdoor | 40 |
| Hands: left & right (both) | included across all |
| Skin tones / several people | as many as practical |
Collect at least ~300–500 `ASL_B` images to start.
For `none`, include: open palm (“High-Five”), slightly spread fingers, thumbs-up, fist, pointing, random objects/background frames, and other ASL letters, especially `Open_Palm` look-alikes, so the model learns “not B”.
Quick ways to get images:
- Record short clips on laptop/phone and extract frames (e.g., 2 fps).
- Ask 3–5 colleagues to contribute a short 10–20s clip each.
Frame extraction example:
```bash
# Extract 2 frames/sec from a video into dataset/ASL_B/
ffmpeg -i b_sign.mov -vf fps=2 dataset/ASL_B/b_%05d.jpg
# Do the same for negatives into dataset/none/
```
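If you'd rather stay in Python than shell out to ffmpeg, here is a rough equivalent using OpenCV (assumes `opencv-python` is installed; the filename pattern and 2 fps default just mirror the ffmpeg example):

```python
import cv2  # assumes opencv-python is installed
from pathlib import Path

def extract_frames(video_path, out_dir, fps=2.0):
    """Save roughly `fps` frames per second from video_path into out_dir as JPEGs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(round(video_fps / fps)), 1)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(str(out / f"b_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

print(extract_frames("b_sign.mov", "dataset/ASL_B"), "frames saved")
```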
3) Train with Model Maker (Python)
Create and activate a venv, then:
```bash
pip install --upgrade pip
pip install mediapipe-model-maker
```
Training script (save as `train_asl_b.py` and run it):
```python
from mediapipe_model_maker import gesture_recognizer as gr

DATA_DIR = "dataset"
EXPORT_DIR = "exported_model"

# Load & auto-preprocess (runs hand detection, keeps images with a detected hand)
data = gr.Dataset.from_folder(
    dirname=DATA_DIR,
    hparams=gr.HandDataPreprocessingParams(  # you can tweak these if needed
        min_detection_confidence=0.5
    )
)

# Split
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

# Hyperparameters (start small; bump epochs if needed)
hparams = gr.HParams(
    export_dir=EXPORT_DIR,
    epochs=12,
    batch_size=16,
    learning_rate=0.001,
)

# Optional model head size & dropout
options = gr.GestureRecognizerOptions(
    hparams=hparams,
    model_options=gr.ModelOptions(layer_widths=[128, 64], dropout_rate=0.1)
)

model = gr.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

# Evaluate
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss={loss:.4f}, acc={acc:.4f}")

# Export .task
model.export_model()  # writes exported_model/gesture_recognizer.task
print("Exported:", EXPORT_DIR + "/gesture_recognizer.task")
```
Tips:
- If many `ASL_B` images get dropped at load time (no hand detected), back up the camera a little or ensure the whole hand is visible.
- If `none` is weak, add more “near-miss” negatives: open palm with fingers slightly apart, thumb slightly out, partial occlusions.
4) Plug it into your app
Python (Tasks API example):
```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions

# LIVE_STREAM mode requires a result callback; results arrive asynchronously.
def on_result(result, output_image, timestamp_ms):
    if result.gestures and result.gestures[0]:
        print(result.gestures[0][0].category_name, result.gestures[0][0].score)

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=on_result,        # required in LIVE_STREAM mode
    custom_gesture_classifier_options=ClassifierOptions(
        score_threshold=0.6,          # tighten until false positives drop
        category_allowlist=["ASL_B"]  # only report your class
    ),
)
recognizer = GestureRecognizer.create_from_options(options)
```
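In LIVE_STREAM mode you push frames with `recognize_async` and results come back through the callback above. A rough webcam loop, assuming OpenCV for capture:

```python
import time
import cv2  # assumes opencv-python is installed

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
    # Timestamps must be monotonically increasing in LIVE_STREAM mode.
    recognizer.recognize_async(mp_image, int(time.monotonic() * 1000))
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```

Note that predictions are printed by `on_result` rather than returned here, which is why the loop only feeds frames.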
Web (JS):
```js
import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";

const fileset = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
const recognizer = await GestureRecognizer.createFromOptions(fileset, {
  baseOptions: { modelAssetPath: "exported_model/gesture_recognizer.task" },
  runningMode: "VIDEO", // the web API uses IMAGE/VIDEO; call recognizeForVideo() per frame
  customGesturesClassifierOptions: {
    scoreThreshold: 0.6,
    categoryAllowlist: ["ASL_B"]
  }
});
```
5) Troubleshooting & tuning
- False positives with Open Palm: Add more `none` examples where fingers are together but the thumb is visible to the side. The model needs to see “almost B but not B.”
- Left vs right hand: Include both in training. If you only trained on right hands, left hands may underperform.
- Distance issues: If far-away hands fail, capture more medium/far shots. Landmarks get noisier when small.
- Thresholds: Raise `score_threshold` to reduce spurious detections; lower it if you miss true B’s.
- Confusion matrix: If accuracy is fine but live results wobble, collect more from the exact camera/lighting you’ll use.
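To see which labels actually get confused (e.g. B vs. near-miss open palms), here is a rough sketch, not an official recipe, that runs the exported model over a held-out folder with the same `dataset/`-style layout (the `holdout/` name is just an example) and tallies predictions per true label:

```python
from collections import Counter, defaultdict
from pathlib import Path
import mediapipe as mp

options = mp.tasks.vision.GestureRecognizerOptions(
    base_options=mp.tasks.BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=mp.tasks.vision.RunningMode.IMAGE,
)
confusions = defaultdict(Counter)  # true label -> Counter of predicted labels
with mp.tasks.vision.GestureRecognizer.create_from_options(options) as recognizer:
    for label_dir in sorted(p for p in Path("holdout").iterdir() if p.is_dir()):
        for img_path in label_dir.glob("*.jpg"):
            result = recognizer.recognize(mp.Image.create_from_file(str(img_path)))
            if result.gestures and result.gestures[0]:
                predicted = result.gestures[0][0].category_name
            else:
                predicted = "no_hand"
            confusions[label_dir.name][predicted] += 1

for true_label, counts in confusions.items():
    print(true_label, dict(counts))
```

If `ASL_B` images frequently land in `none` (or vice versa), that points you back at the bullets above: more near-miss negatives, more of the weak distance/angle buckets, or a different `score_threshold`.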