# Handshapes Multiclass (Holistic) — README

A small end-to-end pipeline that records MediaPipe **Holistic** landmarks, builds fixed-length sequences, trains a **bidirectional GRU** classifier, evaluates it, and runs a **live webcam demo** that recognizes classes such as words (“Mother”, “Father”, “Go”) or letters.

---

## Quick Start

```bash
# 0) Create class folders
./make_seq_dirs.sh Mother Father Go

# 1) Capture clips (per class; adjust counts as you like)
python capture_sequence.py --label Mother --split train --seconds 0.8 --count 100
python capture_sequence.py --label Mother --split val --seconds 0.8 --count 20
python capture_sequence.py --label Father --split train --seconds 0.8 --count 100
python capture_sequence.py --label Father --split val --seconds 0.8 --count 20
python capture_sequence.py --label Go --split train --seconds 0.8 --count 100
python capture_sequence.py --label Go --split val --seconds 0.8 --count 20

# 2) Build fixed-length dataset (32 frames/clip)
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32

# 3) Train, evaluate, and run live inference
python train_seq.py --landmarks landmarks_seq32 --out asl_seq32_gru_mother_father_go.pt
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_mother_father_go.pt
python infer_seq_webcam.py --model asl_seq32_gru_mother_father_go.pt --threshold 0.35 --smooth 0.1
```

Folder layout after capture:

```
sequences/
  train/
    Mother/  clip_001.npz ...
    Father/  clip_001.npz ...
    Go/      clip_001.npz ...
  val/
    Mother/  ...
    Father/  ...
    Go/      ...
```

---

## Feature Representation (per frame)

From MediaPipe **Holistic**:

* **Right hand** 21×(x,y,z) → 63
* **Left hand** 21×(x,y,z) → 63
* **Face** 468×(x,y,z) → 1,404
* **Pose** 33×(x,y,z,visibility) → 132
* **Face-relative hand extras**: wrist (x,y) + index tip (x,y) for each hand, expressed in the face-normalized frame → 8

**Total** = **1,670 dims** per frame.

### Normalization (high level)

* Hands: translate to the wrist, mirror left → right, rotate so the middle-finger MCP points +Y, scale by the max pairwise distance (a sketch appears at the end of this README).
* Face: center at the eye midpoint, scale by inter-ocular distance, rotate to align the eyeline horizontally.
* Pose: center at the shoulder midpoint, scale by shoulder width, rotate the shoulders horizontal.
* Extras: per-hand wrist/tip projected into the face frame so the model retains *where* the hand is relative to the face (critical for signs like **Mother** vs **Father**).

---

## How the Pipeline Works

### 1) `make_seq_dirs.sh`

Creates the directory scaffolding under `sequences/` for any labels you pass (letters or words).

* **Usage:** `./make_seq_dirs.sh Mother Father Go`
* **Why:** Keeps data organized as `train/` and `val/` per class.

---

### 2) `capture_sequence.py`

Records short clips from your webcam and saves per-frame **feature vectors** into compressed `.npz` files.

**Key behaviors**

* Uses **MediaPipe Holistic** to extract right/left hands, full face mesh, and pose.
* Computes normalized features + face-relative extras.
* Writes each clip as `sequences/<split>/<label>/clip_XXX.npz`.
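---

For orientation, here is a minimal sketch of what a capture loop like `capture_sequence.py` might look like, using the legacy `mp.solutions.holistic` API. It is simplified: it flattens raw coordinates only (the first 1,662 dims) and omits the normalization and face-relative extras described above; `frame_features` and `capture_clip` are illustrative names, not the repo's actual code.

```python
import time

import cv2
import mediapipe as mp
import numpy as np


def frame_features(results):
    """Flatten one Holistic result into a single vector, using zeros for any
    part that was not detected."""
    def block(lms, count, per_point):
        if lms is None:
            return np.zeros(count * per_point, dtype=np.float32)
        vals = []
        for lm in lms.landmark:
            vals.extend([lm.x, lm.y, lm.z])
            if per_point == 4:
                vals.append(lm.visibility)  # pose landmarks carry visibility
        return np.asarray(vals, dtype=np.float32)

    return np.concatenate([
        block(results.right_hand_landmarks, 21, 3),  # 63
        block(results.left_hand_landmarks, 21, 3),   # 63
        block(results.face_landmarks, 468, 3),       # 1,404
        block(results.pose_landmarks, 33, 4),        # 132
    ])


def capture_clip(out_path, seconds=0.8, camera_index=0):
    """Grab webcam frames for `seconds`, featurize each one, save the clip."""
    cap = cv2.VideoCapture(camera_index)
    frames = []
    with mp.solutions.holistic.Holistic() as holistic:
        deadline = time.time() + seconds
        while time.time() < deadline:
            ok, frame = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe wants RGB
            frames.append(frame_features(holistic.process(rgb)))
    cap.release()
    if frames:
        np.savez_compressed(out_path, frames=np.stack(frames))
```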
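The hand-normalization steps listed earlier (translate to wrist, mirror, rotate, scale) can be sketched as below. This assumes the standard MediaPipe hand indexing (0 = wrist, 9 = middle-finger MCP); `normalize_hand` is a hypothetical helper, and the repo's actual function may differ in detail.

```python
import numpy as np


def normalize_hand(landmarks_21x3, is_left=False):
    """Canonicalize one hand's 21 (x, y, z) landmarks into a 63-dim block."""
    pts = np.asarray(landmarks_21x3, dtype=np.float32).copy()

    # 1) Translate so the wrist (landmark 0) sits at the origin.
    pts -= pts[0]

    # 2) Mirror left hands across x so both hands share one canonical frame.
    if is_left:
        pts[:, 0] *= -1.0

    # 3) Rotate in the image plane so the middle-finger MCP (landmark 9)
    #    points along +Y.
    dx, dy = pts[9, 0], pts[9, 1]
    angle = np.arctan2(dx, dy)                     # angle measured from +Y
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]], dtype=np.float32)
    pts[:, :2] = pts[:, :2] @ rot.T

    # 4) Scale by the maximum pairwise distance so hand size is ~1.
    diffs = pts[:, None, :] - pts[None, :, :]
    max_dist = np.sqrt((diffs ** 2).sum(-1)).max()
    pts /= (max_dist + 1e-8)

    return pts.reshape(-1)
```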
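Finally, step 2 of the Quick Start (`prep_sequence_resampled.py`) turns variable-length clips into fixed 32-frame sequences. One plausible approach, consistent with the script's name but not confirmed against its source, is linear interpolation along the time axis:

```python
import numpy as np


def resample_clip(frames, target_len=32):
    """Resample a (T, D) clip to (target_len, D) by interpolating each
    feature dimension over a normalized time axis."""
    frames = np.asarray(frames, dtype=np.float32)
    t_src = np.linspace(0.0, 1.0, num=len(frames))
    t_dst = np.linspace(0.0, 1.0, num=target_len)
    out = np.empty((target_len, frames.shape[1]), dtype=np.float32)
    for d in range(frames.shape[1]):
        out[:, d] = np.interp(t_dst, t_src, frames[:, d])
    return out
```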