Let’s add a custom gesture for the **ASL letter “B”** (flat hand, fingers together, thumb folded across the palm) using MediaPipe **Gesture Recognizer (Model Maker)**.

# Plan (what you’ll build)

* A custom model with a new class label, e.g. `ASL_B`, plus the required `none` class.
* A small, labeled image dataset (Model Maker will extract hand landmarks for you).
* A trained `.task` file you can drop into your Python/JS app and allowlist.

---

# 1) Pick labels

Use:

* `ASL_B` ← your new gesture
* `none` ← anything that’s not one of your target gestures (mandatory)

Folder layout:

```
dataset/
  ASL_B/
    ...images...
  none/
    ...images...
```

---

# 2) Collect the right data (what to capture)

Target handshape for **B**:

* **Fingers**: index–pinky fully extended and **pressed together**
* **Thumb**: folded across the palm (not sticking out to the side)
* **Palm**: facing the camera (front), plus a few angled views

Suggested minimums (per label):

| Bucket                                              | Shots                |
| --------------------------------------------------- | -------------------- |
| Distances: close (~40–60 cm), medium (~80–120 cm)   | 80                   |
| View angles: front, ~30°, ~60° yaw                  | 80                   |
| Rotations: slight roll/tilt                         | 40                   |
| Lighting: bright, dim, backlit                      | 40                   |
| Backgrounds: plain wall, cluttered office/outdoor   | 40                   |
| Hands: left & right (both)                          | included across all  |
| Skin tones / several people                         | as many as practical |

Do **at least ~300–500** `ASL_B` images to start.

For **`none`**, include: open palm (“High-Five”), slightly spread fingers, thumbs-up, fist, pointing, random objects/background frames, and other ASL letters, especially **Open_Palm** look-alikes, so the model learns “not B”.

Quick ways to get images:

* Record short clips on a laptop/phone and extract frames (e.g., 2 fps).
* Ask 3–5 colleagues to contribute a short 10–20 s clip each.

Frame extraction example:

```bash
# Extract 2 frames/sec from a video into dataset/ASL_B/
ffmpeg -i b_sign.mov -vf fps=2 dataset/ASL_B/b_%05d.jpg
# Do the same for negatives into dataset/none/
```

---

# 3) Train with Model Maker (Python)

Create and activate a venv, then:

```bash
pip install --upgrade pip
pip install mediapipe-model-maker
```

Training script (save as `train_asl_b.py` and run it):

```python
from mediapipe_model_maker import gesture_recognizer as gr

DATA_DIR = "dataset"
EXPORT_DIR = "exported_model"

# Load & auto-preprocess (runs hand detection, keeps images with a detected hand)
data = gr.Dataset.from_folder(
    dirname=DATA_DIR,
    hparams=gr.HandDataPreprocessingParams(
        # you can tweak this if needed
        min_detection_confidence=0.5
    )
)

# Split
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

# Hyperparameters (start small; bump epochs if needed)
hparams = gr.HParams(
    export_dir=EXPORT_DIR,
    epochs=12,
    batch_size=16,
    learning_rate=0.001,
)

# Optional model head size & dropout
options = gr.GestureRecognizerOptions(
    hparams=hparams,
    model_options=gr.ModelOptions(layer_widths=[128, 64], dropout_rate=0.1)
)

model = gr.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

# Evaluate
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss={loss:.4f}, acc={acc:.4f}")

# Export .task
model.export_model()  # writes exported_model/gesture_recognizer.task
print("Exported:", EXPORT_DIR + "/gesture_recognizer.task")
```

Tips:

* If many `ASL_B` images get dropped at load time (no hand detected), back the camera up a little or make sure the whole hand is visible in frame.
* If `none` is weak, add more “near-miss” negatives: open palm with fingers slightly apart, thumb slightly out, partial occlusions.
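Optional but useful: before wiring the `.task` file into an app, sanity-check the export on a single held-out photo. The sketch below uses the Tasks API in IMAGE mode; it assumes the export path from the training step above, and `b_test.jpg` is a hypothetical test photo you supply.

```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.IMAGE,
)

with GestureRecognizer.create_from_options(options) as recognizer:
    # b_test.jpg is a placeholder: any held-out photo of the B handshape
    image = mp.Image.create_from_file("b_test.jpg")
    result = recognizer.recognize(image)
    # result.gestures holds one list of ranked categories per detected hand
    for hand_gestures in result.gestures:
        for category in hand_gestures:
            print(category.category_name, f"{category.score:.3f}")
```

If `ASL_B` isn’t near the top for an obvious B photo, revisit the training data before you start tuning thresholds.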
---

# 4) Plug it into your app

**Python (Tasks API example):**

```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions

def on_result(result, output_image, timestamp_ms):
    # LIVE_STREAM mode delivers results asynchronously through this callback
    for hand_gestures in result.gestures:
        print(hand_gestures[0].category_name, hand_gestures[0].score)

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=on_result,  # required in LIVE_STREAM mode
    custom_gesture_classifier_options=ClassifierOptions(
        score_threshold=0.6,           # tighten until false positives drop
        category_allowlist=["ASL_B"]   # only report your class
    ),
)

recognizer = GestureRecognizer.create_from_options(options)
```

(A complete webcam loop that feeds frames to this recognizer is sketched after the troubleshooting section below.)

**Web (JS):**

```js
const recognizer = await GestureRecognizer.createFromOptions(fileset, {
  baseOptions: { modelAssetPath: "exported_model/gesture_recognizer.task" },
  runningMode: "VIDEO", // the Web API supports IMAGE or VIDEO; there is no LIVE_STREAM mode
  customGesturesClassifierOptions: {
    scoreThreshold: 0.6,
    categoryAllowlist: ["ASL_B"]
  }
});
```

---

# 5) Troubleshooting & tuning

* **False positives with Open Palm:** Add more `none` examples where the fingers are together but the **thumb is visible** to the side. The model needs to see “almost B but not B.”
* **Left vs. right hand:** Include both in training. If you only trained on right hands, left hands may underperform.
* **Distance issues:** If far-away hands fail, capture more medium/far shots. Landmarks get noisier when the hand is small in frame.
* **Thresholds:** Raise `score_threshold` to reduce spurious detections; lower it if you miss true B’s.
* **Offline vs. live gap:** If test accuracy is fine but live results wobble, collect more data from the exact camera and lighting you’ll deploy with.

---
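For completeness, here is a minimal live-demo sketch tying the Python pieces together: it opens the default webcam with OpenCV, feeds frames to the LIVE_STREAM recognizer configured in step 4, and prints detections from the result callback. The webcam index, Esc-to-quit handling, and the printout format are assumptions; adapt them to your app.

```python
import time

import cv2
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions

def on_result(result, output_image, timestamp_ms):
    # Called asynchronously per frame; result.gestures is one list of
    # ranked categories per detected hand
    for hand_gestures in result.gestures:
        top = hand_gestures[0]
        if top.category_name == "ASL_B":
            print(f"ASL_B  score={top.score:.2f}  t={timestamp_ms} ms")

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=on_result,
    custom_gesture_classifier_options=ClassifierOptions(
        score_threshold=0.6,
        category_allowlist=["ASL_B"],
    ),
)

with GestureRecognizer.create_from_options(options) as recognizer:
    cap = cv2.VideoCapture(0)  # default webcam; change the index if needed
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV delivers BGR; MediaPipe expects RGB
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
        # Timestamps must be monotonically increasing milliseconds
        recognizer.recognize_async(mp_image, int(time.monotonic() * 1000))
        cv2.imshow("ASL_B demo", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
    cap.release()
    cv2.destroyAllWindows()
```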