Initial commit: handshapes multiclass project

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 22:27:20 -05:00
commit 816e34cb17
22 changed files with 2820 additions and 0 deletions

14
.gitignore vendored Normal file

@@ -0,0 +1,14 @@
# Ignore everything
*
# But not directories (so git can traverse into them)
!*/
# Allow these file types
!*.py
!*.txt
!*.md
!*.sh
# Don't ignore .gitignore itself
!.gitignore

255
README.md Normal file

@@ -0,0 +1,255 @@
# Handshapes Multiclass (Holistic) — README
A small end-to-end pipeline that records MediaPipe **Holistic** landmarks, builds fixed-length sequences, trains a **bidirectional GRU** classifier, evaluates it, and runs a **live webcam demo** that recognizes classes such as words (“Mother”, “Father”, “Go”) or letters.
---
## Quick Start
```bash
# 0) Create class folders
./make_seq_dirs.sh Mother Father Go
# 1) Capture clips (per class; adjust counts as you like)
python capture_sequence.py --label Mother --split train --seconds 0.8 --count 100
python capture_sequence.py --label Mother --split val --seconds 0.8 --count 20
python capture_sequence.py --label Father --split train --seconds 0.8 --count 100
python capture_sequence.py --label Father --split val --seconds 0.8 --count 20
python capture_sequence.py --label Go --split train --seconds 0.8 --count 100
python capture_sequence.py --label Go --split val --seconds 0.8 --count 20
# 2) Build fixed-length dataset (32 frames/clip)
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
# 3) Train, evaluate, and run live inference
python train_seq.py --landmarks landmarks_seq32 --out asl_seq32_gru_mother_father_go.pt
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_mother_father_go.pt
python infer_seq_webcam.py --model asl_seq32_gru_mother_father_go.pt --threshold 0.35 --smooth 0.1
```
Folder layout after capture:
```
sequences/
train/
Mother/ clip_001.npz ...
Father/ clip_001.npz ...
Go/ clip_001.npz ...
val/
Mother/ ...
Father/ ...
Go/ ...
```
---
## Feature Representation (per frame)
From MediaPipe **Holistic**:
* **Right hand** 21×(x,y,z) → 63
* **Left hand** 21×(x,y,z) → 63
* **Face** 468×(x,y,z) → 1,404
* **Pose** 33×(x,y,z,visibility) → 132
* **Face-relative hand extras**: wrist (x,y) + index tip (x,y) for each hand, expressed in the face-normalized frame → 8
**Total** = **1,670 dims** per frame.
### Normalization (high level)
* Hands: translate to wrist, mirror left → right, rotate so middle-finger MCP points +Y, scale by max pairwise distance.
* Face: center at eye midpoint, scale by inter-ocular distance, rotate to align eyeline horizontally.
* Pose: center at shoulder midpoint, scale by shoulder width, rotate shoulders horizontal.
* Extras: per-hand wrist/tip projected into the face frame so the model retains *where* the hand is relative to the face (critical for signs like **Mother** vs **Father**).
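A minimal sketch of how one frame vector is assembled from these pieces (zero-filled when a part is missing), matching the concatenation order used in `capture_sequence.py`:
```python
import numpy as np

def build_frame_feature(rh, lh, face, pose, rh_extra, lh_extra):
    """Assemble one 1,670-D frame vector.
    rh/lh: normalized hands (21,3) or None; face: normalized face (468,3);
    pose: normalized pose (33,4) or None; *_extra: (4,) face-frame wrist/tip coords."""
    rh_flat = rh.reshape(-1) if rh is not None else np.zeros(63, np.float32)
    lh_flat = lh.reshape(-1) if lh is not None else np.zeros(63, np.float32)
    pose_flat = pose.reshape(-1) if pose is not None else np.zeros(132, np.float32)
    feat = np.concatenate([rh_flat, lh_flat, face.reshape(-1), pose_flat, rh_extra, lh_extra])
    assert feat.shape == (63 + 63 + 1404 + 132 + 8,)   # 1,670 dims total
    return feat.astype(np.float32)
```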
---
## How the Pipeline Works
### 1) `make_seq_dirs.sh`
Creates the directory scaffolding under `sequences/` for any labels you pass (letters or words).
* **Usage:** `./make_seq_dirs.sh Mother Father Go`
* **Why:** Keeps data organized as `train/` and `val/` per class.
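The script itself is not reproduced in this excerpt; a rough Python equivalent of the scaffolding it creates (an assumption based purely on the description above) might look like:
```python
import sys
from pathlib import Path

# Hypothetical stand-in for make_seq_dirs.sh: create sequences/{train,val}/<label>/
# for every label passed on the command line.
for label in sys.argv[1:]:
    for split in ("train", "val"):
        Path("sequences", split, label).mkdir(parents=True, exist_ok=True)
```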
---
### 2) `capture_sequence.py`
Records short clips from your webcam and saves per-frame **feature vectors** into compressed `.npz` files.
**Key behaviors**
* Uses **MediaPipe Holistic** to extract right/left hands, full face mesh, and pose.
* Computes normalized features + face-relative extras.
* Writes each clip as `sequences/<split>/<label>/clip_XXX.npz` with an array `X` of shape `(T, 1670)`.
**Common flags**
* `--label` (string): class name (e.g., `Mother`, `Go`).
* `--split`: `train` or `val`.
* `--seconds` (float): clip length; 0.8s pairs well with 32 frames.
* `--count` (int): how many clips to record in one run.
* `--camera`, `--width`, `--height`: webcam settings.
* `--holistic-complexity` (`0|1|2`): higher is more accurate but slower.
* UI niceties: 3-second countdown; on-screen progress bar; optional fingertip markers.
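To sanity-check a recorded clip, load it directly (the path below is just an example):
```python
import numpy as np

# Inspect one captured clip; 'X' holds the per-frame feature vectors.
clip = np.load("sequences/train/Mother/clip_001.npz")
X = clip["X"]
print(X.shape, X.dtype)   # e.g. (24, 1670) float32; T depends on camera FPS and --seconds
```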
---
### 3) `prep_sequence_resampled.py`
Aggregates all `clip_*.npz` files into a fixed-length dataset.
**What it does**
* Loads each clip's `X` `(T, 1670)` and **linearly resamples** to exactly `N` frames (default `32`), resulting in `(N, 1670)`.
* Stacks clips into:
* `train_X.npy` `(Nclips, Nframes, F)`
* `train_y.npy` `(Nclips,)`
* `val_X.npy`, `val_y.npy`
* `class_names.json` (sorted list of class names)
* `meta.json` with `{ "frames": N, "input_dim": F }`
**Flags**
* `--in` root of `sequences/`
* `--out` dataset folder (e.g., `landmarks_seq32`)
* `--frames` number of frames per clip after resampling (e.g., `16`, `32`, `64`)
> Tip: Reducing `--frames` (e.g., 16) lowers first-prediction latency in the live demo, at the cost of some stability/accuracy.
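For reference, a condensed version of the linear resampling (edge cases for empty or single-frame clips omitted), as implemented in `prep_sequence_resampled.py`:
```python
import numpy as np

# Resample a variable-length clip (T, F) to exactly (N, F) by interpolating each feature channel.
def resample_sequence(X, N=32):
    T, F = X.shape
    src = np.linspace(0.0, T - 1, num=T, dtype=np.float32)   # original frame positions
    dst = np.linspace(0.0, T - 1, num=N, dtype=np.float32)   # N evenly spaced target positions
    out = np.zeros((N, F), np.float32)
    for d in range(F):
        out[:, d] = np.interp(dst, src, X[:, d])
    return out
```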
---
### 4) `train_seq.py`
Trains a **bidirectional GRU** classifier on the resampled sequences.
**What it does**
* Loads `train_*.npy` / `val_*.npy`, `class_names.json`, and `meta.json`.
* Computes **feature-wise mean/std** on the train set; normalizes train/val.
* Model: `GRU(input_dim → 128 hidden, bidirectional) → ReLU → Dropout → Linear(num_classes)`.
* Tracks best **val accuracy**; saves a checkpoint containing:
* `model` weights
* `classes`, `frames`
* `X_mean`, `X_std` (for inference normalization)
**Flags**
* `--epochs`, `--batch`, `--lr`: typical training hyperparams.
* `--out`: model file (e.g., `asl_seq32_gru_mother_father_go.pt`)
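The checkpoint is a plain Python dict; a quick way to inspect what it contains (keys as listed above, path from the Quick Start; example values are illustrative):
```python
import torch

# Peek inside a saved checkpoint.
state = torch.load("asl_seq32_gru_mother_father_go.pt", map_location="cpu", weights_only=False)
print(state["classes"], state["frames"])             # e.g. ['Father', 'Go', 'Mother'] 32
print(state["X_mean"].shape, state["X_std"].shape)   # per-feature train stats, shape (1, 1670)
# state["model"] is the state_dict used to rebuild the SeqGRU for eval/inference.
```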
---
### 5) `eval_val.py`
Evaluates your saved model on the validation set.
**What it does**
* Loads `val_X.npy`, `val_y.npy`, `class_names.json`, `meta.json`, and the `*.pt` checkpoint.
* Normalizes `val_X` using the **training** mean/std stored in the checkpoint.
* Prints **confusion matrix** and a full **classification report** (precision/recall/F1/accuracy).
**Usage**
```bash
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_mother_father_go.pt
```
---
### 6) `infer_seq_webcam.py`
Live webcam demo that streams landmarks, builds a rolling buffer, and classifies in real time.
**Key behaviors**
* Maintains a **rolling window** of `T` frames (from the model's `frames` value; default 32).
* No prediction until the buffer is full → expect a short warm-up.
* Applies the same normalization using the model's stored `X_mean`/`X_std`.
* Optional **EMA smoothing** over probabilities for stability.
* Example **action hook** included: spell “W → E → B” to open a URL.
**Common flags**
* `--threshold` (e.g., `0.35`): minimum top-class probability to “emit” a label.
* `--smooth` (seconds): temporal EMA (0 disables). Lower = more responsive; higher = steadier.
* `--holistic-complexity`, `--det-thresh`: detector accuracy/sensitivity tradeoffs.
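For reference, the EMA uses a frame-rate-independent blend factor derived from the frame interval; a condensed sketch of the update performed in `infer_seq_webcam.py`:
```python
import math

# Time-based EMA over class probabilities (dt = seconds since the previous frame,
# smooth = the --smooth value in seconds; smooth <= 0 disables smoothing).
def ema_update(ema_probs, probs, dt, smooth):
    if smooth <= 0:
        return probs
    alpha = 1.0 - math.exp(-dt / smooth)   # larger dt or smaller smooth puts more weight on new probs
    return probs if ema_probs is None else (1.0 - alpha) * ema_probs + alpha * probs
```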
---
## Parameters & Practical Tips
* **Threshold vs Smooth**
* Lower `--threshold` (e.g., `0.3–0.4`) → more sensitive, but may produce more false positives.
* `--smooth` of `0.1–0.3s` → responsive; `0.5–0.8s` → steadier but laggier.
* **Frames (`--frames` in prep)**
* `16–24` frames: snappier first detection.
* `32` frames: balanced.
* `64` frames: more context, slower to first prediction.
* **Data balance & variety**
* Similar clip counts per class help training.
* Vary lighting, small head angles, distance, and speed of motion.
* For location-based signs (e.g., Mother vs Father), the **face-relative extras** help the model disambiguate.
---
## File-by-File Summary
| File | Purpose | Inputs → Outputs |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| `make_seq_dirs.sh` | Creates `train/` and `val/` subfolders for each label you pass in. | Labels → `sequences/train/<label>/`, `sequences/val/<label>/` |
| `capture_sequence.py` | Captures webcam frames, extracts Holistic landmarks, normalizes, builds per-frame 1,670-D features, and saves each clip. | Webcam → `sequences/<split>/<label>/clip_XXX.npz` (X: `(T,1670)`) |
| `prep_sequence_resampled.py` | Resamples variable-length clips to fixed length; aggregates into train/val arrays and writes metadata. | `sequences/` → `landmarks_seq32/{train_X,train_y,val_X,val_y}.npy`, `class_names.json`, `meta.json` |
| `train_seq.py` | Trains a BiGRU multiclass classifier with normalization and simple augmentation. | `landmarks_seq32` → `asl_seq32_gru_*.pt` (includes model, classes, frames, mean/std) |
| `eval_val.py` | Evaluates the saved model on the validation split; prints metrics. | Model + `landmarks_seq32` → console metrics |
| `infer_seq_webcam.py` | Streams webcam landmarks, builds rolling sequences, classifies in real time; optional action (e.g., open URL on sequence). | Webcam + `asl_seq32_gru_*.pt` → on-screen predictions/actions |
| `what_to_do.txt` | Step-by-step command cheat-sheet reflecting the current multi-word workflow. | — |
---
## Troubleshooting
* **“No classes found in sequences/train/”**
Ensure class folders exist: `sequences/train/<Label>/` and `sequences/val/<Label>/`, and that they contain `clip_*.npz`.
* **No live prediction initially**
Expected; the model needs the first **T** frames to fill the buffer.
* **Lag or low FPS**
Try `--holistic-complexity 0`, reduce camera resolution, or use a smaller `--frames` and retrain.
* **Overconfident but wrong**
Raise `--threshold`, increase `--smooth`, or record more varied data per class (especially negatives or near-misses).
---
## Add/Remove Classes
* To **add** a class (e.g., `Go`): create dirs, capture clips, rerun **prep**, retrain, re-eval.
* To **remove/replace** a class: delete its folders or rename, **then** rerun **prep** and retrain.
---
## Dependencies
* Python 3.x, `numpy`, `opencv-python`, `mediapipe`, `torch`, `scikit-learn` (for evaluation).
* macOS with Apple Silicon can use MPS acceleration automatically (already handled in the code).
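For reference, the device selection in `train_seq.py`, `eval_val.py`, and `infer_seq_webcam.py` is essentially:
```python
import torch

# Use Apple's MPS backend when available, otherwise fall back to CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
```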
---
## Notes
* Labels are **arbitrary strings** (not restricted to A–Z).
* Features are **zero-filled** for missing parts in a frame (e.g., if a hand isn't detected) to keep dimensions stable.
* The face is used as a global anchor for geometry; keeping the face visible improves robustness.
---

259
capture_sequence.py Executable file

@@ -0,0 +1,259 @@
#!/usr/bin/env python3
# capture_sequence.py
# Record N short sequences per label with MediaPipe Holistic and build per-frame features:
# RightHand(63) + LeftHand(63) + Face(468*3=1404) + Pose(33*4=132) + Face-relative hand extras(8) = 1670 dims
# Requirements: numpy, opencv-python, mediapipe
import argparse, os, time, math, re
from pathlib import Path
import numpy as np, cv2, mediapipe as mp
mp_holistic = mp.solutions.holistic
# ---------- geometry / normalization ----------
def _angle(v):
return math.atan2(v[1], v[0])
def _rot2d(t):
c, s = math.cos(t), math.sin(t)
return np.array([[c, -s], [s, c]], dtype=np.float32)
def normalize_hand(pts, handed=None):
"""Hand (21,3) → translate wrist, mirror left, rotate middle-MCP to +Y, scale by max pairwise distance."""
pts = pts.astype(np.float32).copy()
pts[:, :2] -= pts[0, :2]
if handed and str(handed).lower().startswith("left"):
pts[:, 0] *= -1.0
v = pts[9, :2]
R = _rot2d(math.pi/2 - _angle(v))
pts[:, :2] = pts[:, :2] @ R.T
xy = pts[:, :2]
d = np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1).max()
d = 1.0 if d < 1e-6 else float(d)
pts[:, :2] /= d; pts[:, 2] /= d
return pts # (21,3)
def normalize_face(face):
"""Face (468,3) → center at eye midpoint, scale by inter-ocular, rotate eye-line horizontal."""
f = face.astype(np.float32).copy()
left = f[33, :2]; right = f[263, :2] # outer eye corners
center = 0.5 * (left + right)
f[:, :2] -= center[None, :]
eye_vec = right - left
eye_dist = float(np.linalg.norm(eye_vec)) or 1.0
f[:, :2] /= eye_dist; f[:, 2] /= eye_dist
R = _rot2d(-_angle(eye_vec))
f[:, :2] = f[:, :2] @ R.T
return f # (468,3)
def normalize_pose(pose):
"""
Pose (33,4: x,y,z,vis) → center at shoulder midpoint, scale by shoulder width, rotate shoulders horizontal.
Keep visibility ([:,3]) as-is.
"""
p = pose.astype(np.float32).copy()
ls = p[11, :2]; rs = p[12, :2] # left/right shoulder
center = 0.5 * (ls + rs)
p[:, :2] -= center[None, :]
sw_vec = rs - ls
sw = float(np.linalg.norm(sw_vec)) or 1.0
p[:, :2] /= sw; p[:, 2] /= sw
R = _rot2d(-_angle(sw_vec))
p[:, :2] = p[:, :2] @ R.T
return p # (33,4)
def face_frame_transform(face_pts):
"""
Return (center, eye_dist, R) to map image XY to the normalized face frame (same as normalize_face).
Use: v' = ((v - center)/eye_dist) @ R.T
"""
left = face_pts[33, :2]; right = face_pts[263, :2]
center = 0.5*(left + right)
eye_vec = right - left
eye_dist = float(np.linalg.norm(eye_vec)) or 1.0
# rotation that aligns eye line to +X (inverse of normalize_face's rotation matrix)
# normalize_face uses R = rot(-theta) applied after scaling/centering.
theta = _angle(eye_vec)
R = _rot2d(-theta)
return center, eye_dist, R
def to_face_frame(pt_xy, center, eye_dist, R):
v = (pt_xy - center) / eye_dist
return (v @ R.T).astype(np.float32)
# ---------- utils ----------
def next_idx(folder: Path, prefix="clip_"):
pat = re.compile(rf"^{re.escape(prefix)}(\d+)\.npz$")
mx = 0
if folder.exists():
for n in os.listdir(folder):
m = pat.match(n)
if m: mx = max(mx, int(m.group(1)))
return mx + 1
def countdown(cap, seconds=3):
for i in range(seconds, 0, -1):
start = time.time()
while time.time() - start < 1.0:
ok, frame = cap.read()
if not ok: continue
h, w = frame.shape[:2]
text = str(i)
(tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 5, 10)
cv2.putText(frame, text, ((w - tw)//2, (h + th)//2),
cv2.FONT_HERSHEY_SIMPLEX, 5, (0,0,255), 10, cv2.LINE_AA)
msg = "Starting in..."
(mw, mh), _ = cv2.getTextSize(msg, cv2.FONT_HERSHEY_SIMPLEX, 1.2, 3)
cv2.putText(frame, msg, ((w - mw)//2, (h//2) - th - 20),
cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0,255,255), 3, cv2.LINE_AA)
cv2.imshow("sequence capture", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
cap.release(); cv2.destroyAllWindows(); raise SystemExit("Aborted during countdown")
def draw_progress_bar(img, frac_remaining, bar_h=16, margin=12):
h, w = img.shape[:2]
x0, x1 = margin, w - margin
y0, y1 = margin, margin + bar_h
cv2.rectangle(img, (x0, y0), (x1, y1), (40, 40, 40), -1)
cv2.rectangle(img, (x0, y0), (x1, y1), (90, 90, 90), 2)
rem_w = int((x1 - x0) * max(0.0, min(1.0, frac_remaining)))
if rem_w > 0:
cv2.rectangle(img, (x0, y0), (x0 + rem_w, y1), (0, 200, 0), -1)
# ---------- holistic wrapper ----------
class HolisticDetector:
def __init__(self, det_conf=0.5, track_conf=0.5, model_complexity=1):
self.h = mp_holistic.Holistic(
static_image_mode=False,
model_complexity=model_complexity,
smooth_landmarks=True,
enable_segmentation=False,
refine_face_landmarks=False,
min_detection_confidence=det_conf,
min_tracking_confidence=track_conf,
)
def process(self, rgb):
return self.h.process(rgb)
# ---------- main ----------
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--label", required=True, help="Class label (e.g., A, B, Mother, Father, etc.)")
ap.add_argument("--split", required=True, choices=["train","val"])
ap.add_argument("--seconds", type=float, default=0.8)
ap.add_argument("--camera", type=int, default=0)
ap.add_argument("--width", type=int, default=640)
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--count", type=int, default=None)
ap.add_argument("--det-thresh", type=float, default=0.5)
ap.add_argument("--holistic-complexity", type=int, default=1, choices=[0,1,2])
args = ap.parse_args()
L = args.label.strip()
if len(L) == 0 or ("/" in L or "\\" in L):
raise SystemExit("Use a non-empty label without slashes")
if args.count is None:
args.count = 100 if args.split == "train" else 20
out_dir = Path("sequences") / args.split / L
out_dir.mkdir(parents=True, exist_ok=True)
idx = next_idx(out_dir)
det = HolisticDetector(args.det_thresh, args.det_thresh, args.holistic_complexity)
cap = cv2.VideoCapture(args.camera)
if not cap.isOpened(): raise SystemExit(f"Could not open camera {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)
print(f"Recording {args.count} clips for {L}/{args.split}, {args.seconds}s each. (R+L hands + face + pose + face-relative extras)")
countdown(cap, 3)
for n in range(args.count):
seq_X = []
start_t = time.time(); end_t = start_t + args.seconds
while True:
now = time.time()
if now >= end_t: break
ok, fr = cap.read()
if not ok: break
rgb = cv2.cvtColor(fr, cv2.COLOR_BGR2RGB)
res = det.process(rgb)
# hands
right_pts = left_pts = None
if res.right_hand_landmarks is not None:
right_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.right_hand_landmarks.landmark], np.float32)
if res.left_hand_landmarks is not None:
left_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.left_hand_landmarks.landmark], np.float32)
# face
face_pts = None
if res.face_landmarks is not None:
face_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.face_landmarks.landmark], np.float32)
# pose
pose_arr = None
if res.pose_landmarks is not None:
pose_arr = np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in res.pose_landmarks.landmark], np.float32)
# Build feature: require face present and at least one hand (pose optional)
if face_pts is not None and (right_pts is not None or left_pts is not None):
f_norm = normalize_face(face_pts) # (468,3)
# transform pieces to express hand positions in face frame
f_center, f_scale, f_R = face_frame_transform(face_pts)
def hand_face_extras(hand_pts):
if hand_pts is None:
return np.zeros(4, np.float32)
wrist_xy = hand_pts[0, :2]
tip_xy = hand_pts[8, :2]
w = to_face_frame(wrist_xy, f_center, f_scale, f_R)
t = to_face_frame(tip_xy, f_center, f_scale, f_R)
return np.array([w[0], w[1], t[0], t[1]], np.float32)
rh_ex = hand_face_extras(right_pts) # 4
lh_ex = hand_face_extras(left_pts) # 4
rh = normalize_hand(right_pts, "Right").reshape(-1) if right_pts is not None else np.zeros(63, np.float32)
lh = normalize_hand(left_pts, "Left").reshape(-1) if left_pts is not None else np.zeros(63, np.float32)
p_norm = normalize_pose(pose_arr).reshape(-1) if pose_arr is not None else np.zeros(33*4, np.float32)
feat = np.concatenate([rh, lh, f_norm.reshape(-1), p_norm, rh_ex, lh_ex], axis=0) # (1670,)
seq_X.append(feat)
# optional fingertip markers for visual feedback
if right_pts is not None:
pt = normalize_hand(right_pts, "Right")[8, :2]
cv2.circle(fr, (int(fr.shape[1]*pt[0]), int(fr.shape[0]*pt[1])), 6, (0,255,0), -1)
if left_pts is not None:
pt = normalize_hand(left_pts, "Left")[8, :2]
cv2.circle(fr, (int(fr.shape[1]*pt[0]), int(fr.shape[0]*pt[1])), 6, (255,0,0), -1)
# UI
frac_remaining = (end_t - now) / max(1e-6, args.seconds)
draw_progress_bar(fr, frac_remaining, bar_h=16, margin=12)
cv2.putText(fr, f"{L} {args.split} Clip {n+1}/{args.count}",
(20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0), 2, cv2.LINE_AA)
cv2.imshow("sequence capture", fr)
if cv2.waitKey(1) & 0xFF == ord('q'):
cap.release(); cv2.destroyAllWindows(); return
if seq_X:
X = np.stack(seq_X, 0).astype(np.float32) # (T, 1670)
path = out_dir / f"clip_{idx:03d}.npz"
np.savez_compressed(path, X=X)
print(f"💾 saved {path} frames={X.shape[0]} dims={X.shape[1]}")
idx += 1
else:
print("⚠️ Not enough frames with face + any hand; skipped clip.")
print("✅ Done recording.")
cap.release(); cv2.destroyAllWindows()
if __name__ == "__main__":
main()

323
doc/capture_sequence.py Normal file

@@ -0,0 +1,323 @@
#!/usr/bin/env python3
# capture_sequence.py
# Record N short sequences per label with MediaPipe Holistic and build per-frame features:
# RightHand(63) + LeftHand(63) + Face(468*3=1404) + Pose(33*4=132) + Face-relative hand extras(8) = 1670 dims
# Requirements: numpy, opencv-python, mediapipe
import argparse, os, time, math, re # stdlib: args, filesystem, timing, trig, regex
from pathlib import Path # pathlib for portable paths
import numpy as np, cv2, mediapipe as mp # core libs: arrays, webcam/GUI, landmarks
mp_holistic = mp.solutions.holistic # alias to the Holistic solution entry
# ---------- geometry / normalization ----------
def _angle(v):
"""
Return atan2(y, x) of a 2D vector.
Used to compute the orientation of a segment in the image plane.
"""
return math.atan2(v[1], v[0]) # angle in radians for rotation normalization
def _rot2d(t):
"""
Build a 2×2 rotation matrix for angle t (radians).
Used to rotate landmark sets into a canonical frame.
"""
c, s = math.cos(t), math.sin(t) # precompute cos/sin for speed
return np.array([[c, -s], [s, c]], dtype=np.float32) # standard 2D rotation matrix
def normalize_hand(pts, handed=None):
"""
Normalize a (21,3) hand landmark array:
1) translate so wrist (idx 0) is at origin
2) mirror X for left hands so both hands look like right
3) rotate so vector from wrist->middle-MCP (idx 9) points +Y
4) scale by max pairwise XY distance so size is comparable across frames
Returns: (21,3) float32
"""
pts = pts.astype(np.float32).copy() # make float32 copy (avoid mutating caller)
pts[:, :2] -= pts[0, :2] # translate: wrist to origin (stabilizes position)
if handed and str(handed).lower().startswith("left"):
pts[:, 0] *= -1.0 # mirror X for left hand to canonicalize handedness
v = pts[9, :2] # vector from wrist→middle MCP (index 9)
R = _rot2d(math.pi/2 - _angle(v)) # rotate so this vector points up (+Y)
pts[:, :2] = pts[:, :2] @ R.T # apply rotation to XY (keep Z as-is for now)
xy = pts[:, :2] # convenience view
d = np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1).max() # max pairwise XY distance (scale)
d = 1.0 if d < 1e-6 else float(d) # avoid divide-by-zero on degenerate frames
pts[:, :2] /= d; pts[:, 2] /= d # isotropic scale XY and Z by same factor
return pts # return normalized hand landmarks
def normalize_face(face):
"""
Normalize a (468,3) face mesh:
1) center at midpoint between outer eye corners (33, 263)
2) scale by inter-ocular distance
3) rotate so eye-line is horizontal
Returns: (468,3) float32
"""
f = face.astype(np.float32).copy() # safe copy
left = f[33, :2]; right = f[263, :2] # outer eye corners per MediaPipe indexing
center = 0.5 * (left + right) # center between eyes anchors the face
f[:, :2] -= center[None, :] # translate to center
eye_vec = right - left # vector from left→right eye
eye_dist = float(np.linalg.norm(eye_vec)) or 1.0 # scale factor; avoid zero
f[:, :2] /= eye_dist; f[:, 2] /= eye_dist # scale all dims consistently
R = _rot2d(-_angle(eye_vec)) # rotate so eye line aligns with +X
f[:, :2] = f[:, :2] @ R.T # apply rotation to XY
return f
def normalize_pose(pose):
"""
Normalize a (33,4) pose landmark array (x,y,z,visibility):
1) center at shoulder midpoint (11,12)
2) scale by shoulder width
3) rotate so shoulders are horizontal
Visibility channel ([:,3]) is preserved as-is.
Returns: (33,4) float32
"""
p = pose.astype(np.float32).copy() # copy to avoid mutating input
ls = p[11, :2]; rs = p[12, :2] # left/right shoulder in XY
center = 0.5 * (ls + rs) # mid-shoulder anchor
p[:, :2] -= center[None, :] # translate to center
sw_vec = rs - ls # shoulder vector (scale + rotation anchor)
sw = float(np.linalg.norm(sw_vec)) or 1.0 # shoulder width (avoid zero)
p[:, :2] /= sw; p[:, 2] /= sw # scale pose consistently
R = _rot2d(-_angle(sw_vec)) # rotate so shoulders are horizontal
p[:, :2] = p[:, :2] @ R.T # apply rotation to XY
return p
def face_frame_transform(face_pts):
"""
Compute a transform that maps image XY into the normalized face frame
(same definition as in normalize_face).
Returns:
center : (2,) eye midpoint
eye_dist: scalar inter-ocular distance
R : 2×2 rotation aligning eye-line to +X
Use downstream as: v' = ((v - center)/eye_dist) @ R.T
"""
left = face_pts[33, :2]; right = face_pts[263, :2] # reference points: eye corners
center = 0.5*(left + right) # face center
eye_vec = right - left # direction of eye line
eye_dist = float(np.linalg.norm(eye_vec)) or 1.0 # scale of face
theta = _angle(eye_vec) # angle of eye line
R = _rot2d(-theta) # rotation to align with +X
return center, eye_dist, R
def to_face_frame(pt_xy, center, eye_dist, R):
"""
Transform a 2D point from image space into the normalized face frame.
Inputs are from face_frame_transform().
"""
v = (pt_xy - center) / eye_dist # translate + scale
return (v @ R.T).astype(np.float32) # rotate into face frame
# ---------- utils ----------
def next_idx(folder: Path, prefix="clip_"):
"""
Scan a folder for files like 'clip_###.npz' and return the next index.
Keeps your saved clips sequential without collisions.
"""
pat = re.compile(rf"^{re.escape(prefix)}(\d+)\.npz$") # matches clip index
mx = 0 # track max index seen
if folder.exists(): # only if folder exists
for n in os.listdir(folder): # iterate files
m = pat.match(n) # regex match
if m: mx = max(mx, int(m.group(1))) # update max on matches
return mx + 1 # next available index
def countdown(cap, seconds=3):
"""
Show a full-screen countdown overlay before recording starts.
Press 'q' to abort during countdown.
"""
for i in range(seconds, 0, -1): # 3..2..1
start = time.time() # ensure ~1s display per number
while time.time() - start < 1.0:
ok, frame = cap.read() # read a frame
if not ok: continue # skip if camera hiccups
h, w = frame.shape[:2] # frame size for centering text
text = str(i) # the digit to render
(tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 5, 10) # size of big number
cv2.putText(frame, text, ((w - tw)//2, (h + th)//2),
cv2.FONT_HERSHEY_SIMPLEX, 5, (0,0,255), 10, cv2.LINE_AA) # draw big red numeral
msg = "Starting in..." # helper message above the number
(mw, mh), _ = cv2.getTextSize(msg, cv2.FONT_HERSHEY_SIMPLEX, 1.2, 3)
cv2.putText(frame, msg, ((w - mw)//2, (h//2) - th - 20),
cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0,255,255), 3, cv2.LINE_AA)
cv2.imshow("sequence capture", frame) # show overlay
if cv2.waitKey(1) & 0xFF == ord('q'): # allow abort
cap.release(); cv2.destroyAllWindows()
raise SystemExit("Aborted during countdown")
def draw_progress_bar(img, frac_remaining, bar_h=16, margin=12):
"""
Draw a simple progress bar at the top of the frame.
frac_remaining in [0,1] indicates time left in the clip.
"""
h, w = img.shape[:2] # image dimensions
x0, x1 = margin, w - margin # horizontal extent
y0, y1 = margin, margin + bar_h # vertical extent
cv2.rectangle(img, (x0, y0), (x1, y1), (40, 40, 40), -1) # dark background bar
cv2.rectangle(img, (x0, y0), (x1, y1), (90, 90, 90), 2) # border
rem_w = int((x1 - x0) * max(0.0, min(1.0, frac_remaining))) # filled width clamped
if rem_w > 0:
cv2.rectangle(img, (x0, y0), (x0 + rem_w, y1), (0, 200, 0), -1) # green fill
# ---------- holistic wrapper ----------
class HolisticDetector:
"""
Thin wrapper around MediaPipe Holistic to fix configuration once and expose process().
"""
def __init__(self, det_conf=0.5, track_conf=0.5, model_complexity=1):
# Build the Holistic detector with steady defaults; smooth_landmarks helps temporal stability.
self.h = mp_holistic.Holistic(
static_image_mode=False, # realtime video stream
model_complexity=model_complexity, # 0=fastest, 2=most accurate
smooth_landmarks=True, # temporal smoothing reduces jitter
enable_segmentation=False, # not needed; saves compute
refine_face_landmarks=False, # faster; we only need coarse face
min_detection_confidence=det_conf, # detection threshold
min_tracking_confidence=track_conf, # tracking threshold
)
def process(self, rgb):
"""
Run landmark detection on an RGB frame and return MediaPipe results object.
"""
return self.h.process(rgb) # delegate to MP
# ---------- main ----------
def main():
"""
CLI entry: capture N clips of length --seconds for a given --label and --split,
save per-frame 1670-D features into sequences/<split>/<label>/clip_XXX.npz.
"""
ap = argparse.ArgumentParser() # CLI flag parsing
ap.add_argument("--label", required=True, help="Class label (e.g., A, B, Mother, Father, etc.)")
ap.add_argument("--split", required=True, choices=["train","val"])
ap.add_argument("--seconds", type=float, default=0.8)
ap.add_argument("--camera", type=int, default=0)
ap.add_argument("--width", type=int, default=640)
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--count", type=int, default=None)
ap.add_argument("--det-thresh", type=float, default=0.5)
ap.add_argument("--holistic-complexity", type=int, default=1, choices=[0,1,2])
args = ap.parse_args() # finalize args
L = args.label.strip() # normalized label string
if len(L) == 0 or ("/" in L or "\\" in L): # basic validation to keep clean paths
raise SystemExit("Use a non-empty label without slashes")
if args.count is None: # default count per split for convenience
args.count = 100 if args.split == "train" else 20
out_dir = Path("sequences") / args.split / L # where clip_*.npz will go
out_dir.mkdir(parents=True, exist_ok=True) # ensure directory exists
idx = next_idx(out_dir) # next clip index to use
det = HolisticDetector(args.det_thresh, args.det_thresh, args.holistic_complexity) # detector
cap = cv2.VideoCapture(args.camera) # open camera device
if not cap.isOpened(): # fail early if missing
raise SystemExit(f"Could not open camera {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width) # set capture width
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height) # set capture height
print(f"Recording {args.count} clips for {L}/{args.split}, {args.seconds}s each. (R+L hands + face + pose + face-relative extras)")
countdown(cap, 3) # give operator time to get ready
for n in range(args.count): # loop over requested clips
seq_X = [] # holds per-frame features
start_t = time.time(); end_t = start_t + args.seconds # fixed-length recording window
while True: # per-frame capture loop
now = time.time()
if now >= end_t: break # stop after desired duration
ok, fr = cap.read() # grab a frame
if not ok: break # camera yielded nothing; end clip
rgb = cv2.cvtColor(fr, cv2.COLOR_BGR2RGB) # MediaPipe expects RGB
res = det.process(rgb) # run landmark detection
# hands
right_pts = left_pts = None # initialize as missing
if res.right_hand_landmarks is not None: # if right detected…
right_pts = np.array([[lm.x, lm.y, lm.z]
for lm in res.right_hand_landmarks.landmark], np.float32) # (21,3)
if res.left_hand_landmarks is not None: # if left detected…
left_pts = np.array([[lm.x, lm.y, lm.z]
for lm in res.left_hand_landmarks.landmark], np.float32) # (21,3)
# face
face_pts = None
if res.face_landmarks is not None: # 468 face landmarks
face_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.face_landmarks.landmark], np.float32)
# pose
pose_arr = None
if res.pose_landmarks is not None: # 33 pose landmarks with visibility
pose_arr = np.array([[lm.x, lm.y, lm.z, lm.visibility]
for lm in res.pose_landmarks.landmark], np.float32)
# Build feature: require face present and at least one hand (pose optional)
if face_pts is not None and (right_pts is not None or left_pts is not None):
f_norm = normalize_face(face_pts) # canonicalize face geometry → (468,3)
# transform pieces to express hand positions in face frame
f_center, f_scale, f_R = face_frame_transform(face_pts) # face frame for extras
def hand_face_extras(hand_pts):
"""
For a hand, return [wrist_x, wrist_y, tip_x, tip_y] in face frame.
If hand missing, returns zeros. Keeps coarse spatial relation to face.
"""
if hand_pts is None:
return np.zeros(4, np.float32) # missing hand → zeros to keep dims fixed
wrist_xy = hand_pts[0, :2] # wrist point
tip_xy = hand_pts[8, :2] # index fingertip (salient for pointing)
w = to_face_frame(wrist_xy, f_center, f_scale, f_R) # project to face frame
t = to_face_frame(tip_xy, f_center, f_scale, f_R)
return np.array([w[0], w[1], t[0], t[1]], np.float32) # pack features
rh_ex = hand_face_extras(right_pts) # (4,) right extras
lh_ex = hand_face_extras(left_pts) # (4,) left extras
rh = normalize_hand(right_pts, "Right").reshape(-1) if right_pts is not None else np.zeros(63, np.float32) # hand (63,)
lh = normalize_hand(left_pts, "Left" ).reshape(-1) if left_pts is not None else np.zeros(63, np.float32) # hand (63,)
p_norm = normalize_pose(pose_arr).reshape(-1) if pose_arr is not None else np.zeros(33*4, np.float32) # pose (132,)
feat = np.concatenate([rh, lh, f_norm.reshape(-1), p_norm, rh_ex, lh_ex], axis=0) # (1670,) full feature
seq_X.append(feat) # push this frame's feature vector
# optional fingertip markers for visual feedback (normalized hand to index tip)
if right_pts is not None:
pt = normalize_hand(right_pts, "Right")[8, :2] # index tip in normalized [0..1]-ish coords
cv2.circle(fr, (int(fr.shape[1]*pt[0]), int(fr.shape[0]*pt[1])), 6, (0,255,0), -1) # green dot
if left_pts is not None:
pt = normalize_hand(left_pts, "Left")[8, :2]
cv2.circle(fr, (int(fr.shape[1]*pt[0]), int(fr.shape[0]*pt[1])), 6, (255,0,0), -1) # blue/red dot
# UI overlay (progress + label)
frac_remaining = (end_t - now) / max(1e-6, args.seconds) # progress bar fraction
draw_progress_bar(fr, frac_remaining, bar_h=16, margin=12)
cv2.putText(fr, f"{L} {args.split} Clip {n+1}/{args.count}",
(20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0), 2, cv2.LINE_AA)
cv2.imshow("sequence capture", fr) # show live preview
if cv2.waitKey(1) & 0xFF == ord('q'): # allow stopping whole session
cap.release(); cv2.destroyAllWindows(); return
# After clip duration, save if we collected any valid frames
if seq_X:
X = np.stack(seq_X, 0).astype(np.float32) # (T, 1670) stack into array
path = out_dir / f"clip_{idx:03d}.npz" # next filename
np.savez_compressed(path, X=X) # compressed .npz with key 'X'
print(f"💾 saved {path} frames={X.shape[0]} dims={X.shape[1]}")
idx += 1 # advance index
else:
print("⚠️ Not enough frames with face + any hand; skipped clip.") # guardrail
print("✅ Done recording.") # session complete
cap.release(); cv2.destroyAllWindows() # clean up resources
if __name__ == "__main__":
main() # run CLI

70
doc/eval_val.py Normal file

@@ -0,0 +1,70 @@
#!/usr/bin/env python3
# Evaluate a trained SeqGRU on the validation set; reads input_dim from meta.json
import os, json, argparse # stdlib
import numpy as np # arrays
import torch, torch.nn as nn # model
from sklearn.metrics import classification_report, confusion_matrix # metrics
class SeqGRU(nn.Module):
"""
BiGRU classifier head:
GRU(input_dim → hidden, bidirectional) → Linear/ReLU/Dropout → Linear(num_classes)
Uses the last time step's hidden state for classification.
"""
def __init__(self, input_dim, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True) # temporal encoder
self.head = nn.Sequential( # MLP head
nn.Linear(hidden*2, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x):
h, _ = self.gru(x) # h: (B, T, 2*hidden)
return self.head(h[:, -1, :]) # take last time step → logits
def main():
"""
Load val split + model checkpoint, normalize using stored mean/std, run inference,
then print confusion matrix and classification report.
"""
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32") # dataset folder
ap.add_argument("--model", required=True) # .pt checkpoint path
args = ap.parse_args()
vaX = np.load(os.path.join(args.landmarks,"val_X.npy")) # (N, T, F)
vaY = np.load(os.path.join(args.landmarks,"val_y.npy")) # (N,)
classes = json.load(open(os.path.join(args.landmarks,"class_names.json"))) # label names
meta = json.load(open(os.path.join(args.landmarks,"meta.json"))) # frames, input_dim
T = int(meta.get("frames", vaX.shape[1])) # clip length
input_dim = int(meta.get("input_dim", vaX.shape[-1])) # feature dimension
state = torch.load(args.model, map_location="cpu", weights_only=False) # load checkpoint dict
X_mean, X_std = state["X_mean"], state["X_std"] # stored normalization stats
if isinstance(X_mean, torch.Tensor): X_mean = X_mean.numpy() # ensure numpy arrays
if isinstance(X_std, torch.Tensor): X_std = X_std.numpy()
X_mean = X_mean.astype(np.float32) # float32 for compute
X_std = (X_std.astype(np.float32) + 1e-6) # add epsilon for safety
vaXn = (vaX - X_mean) / X_std # normalize val features
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu") # accel if on Mac
model = SeqGRU(input_dim=input_dim, hidden=128, num_classes=len(classes)) # build model
model.load_state_dict(state["model"]) # load trained weights
model.eval().to(device) # eval mode
with torch.no_grad(): # no grad for eval
xb = torch.from_numpy(vaXn).float().to(device) # tensorize val set
logits = model(xb) # forward pass
pred = logits.argmax(1).cpu().numpy() # top-1 class indices
cm = confusion_matrix(vaY, pred) # confusion matrix
print("Classes:", classes)
print("\nConfusion matrix (rows=true, cols=pred):\n", cm)
print("\nReport:\n", classification_report(vaY, pred, target_names=classes)) # precision/recall/F1
if __name__ == "__main__":
main()

249
doc/infer_seq_webcam.py Normal file

@@ -0,0 +1,249 @@
#!/usr/bin/env python3
"""
Live webcam inference for two hands + full face + pose + face-relative hand extras (1670 dims/frame).
Works for letters (A..Z) or word classes (e.g., Mother, Father).
Optionally detects the sequence W → E → B to open a URL.
"""
import os, math, argparse, time, webbrowser # stdlib
import numpy as np # arrays
import cv2 # webcam UI
import torch # inference
import mediapipe as mp # Holistic landmarks
# Quiet logs: reduce console noise from TF/absl/OpenCV
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"; os.environ["GLOG_minloglevel"] = "2"
import absl.logging; absl.logging.set_verbosity(absl.logging.ERROR)
cv2.setLogLevel(0)
mp_holistic = mp.solutions.holistic
# ---------- normalization ----------
def _angle(v):
"""atan2 for 2D vector."""
return math.atan2(v[1], v[0])
def _rot2d(t):
"""2×2 rotation matrix for angle t."""
c, s = math.cos(t), math.sin(t)
return np.array([[c, -s], [s, c]], dtype=np.float32)
def normalize_hand(pts, handed=None):
"""
Wrist-translate, mirror left→right, rotate so middle MCP is +Y, scale by max XY spread.
Returns (21,3).
"""
pts = pts.astype(np.float32).copy()
pts[:, :2] -= pts[0, :2]
if handed and str(handed).lower().startswith("left"): pts[:, 0] *= -1.0
v = pts[9, :2]; R = _rot2d(math.pi/2 - _angle(v))
pts[:, :2] = pts[:, :2] @ R.T
xy = pts[:, :2]; d = np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1).max()
d = 1.0 if d < 1e-6 else float(d)
pts[:, :2] /= d; pts[:, 2] /= d
return pts
def normalize_face(face):
"""Center at eye midpoint, scale by inter-ocular, rotate eye-line horizontal; returns (468,3)."""
f = face.astype(np.float32).copy()
left, right = f[33, :2], f[263, :2]
center = 0.5*(left+right)
f[:, :2] -= center[None, :]
eye_vec = right - left; eye_dist = float(np.linalg.norm(eye_vec)) or 1.0
f[:, :2] /= eye_dist; f[:, 2] /= eye_dist
R = _rot2d(-_angle(eye_vec)); f[:, :2] = f[:, :2] @ R.T
return f
def normalize_pose(pose):
"""Center at shoulder midpoint, scale by shoulder width, rotate shoulders horizontal; returns (33,4)."""
p = pose.astype(np.float32).copy()
ls, rs = p[11, :2], p[12, :2]
center = 0.5*(ls+rs); p[:, :2] -= center[None, :]
sw_vec = rs - ls; sw = float(np.linalg.norm(sw_vec)) or 1.0
p[:, :2] /= sw; p[:, 2] /= sw
R = _rot2d(-_angle(sw_vec)); p[:, :2] = p[:, :2] @ R.T
return p
def face_frame_transform(face_pts):
"""Return (center, eye_dist, R) to project points into the face-normalized frame."""
left = face_pts[33, :2]; right = face_pts[263, :2]
center = 0.5*(left + right)
eye_vec = right - left
eye_dist = float(np.linalg.norm(eye_vec)) or 1.0
R = _rot2d(-_angle(eye_vec))
return center, eye_dist, R
def to_face_frame(pt_xy, center, eye_dist, R):
"""Project a 2D point into the face frame."""
v = (pt_xy - center) / eye_dist
return (v @ R.T).astype(np.float32)
# ---------- model ----------
class SeqGRU(torch.nn.Module):
"""
BiGRU classifier used at training time; same shape and head for inference.
"""
def __init__(self, input_dim, hidden=128, num_classes=26):
super().__init__()
self.gru = torch.nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = torch.nn.Sequential(
torch.nn.Linear(hidden*2, 128), torch.nn.ReLU(), torch.nn.Dropout(0.2),
torch.nn.Linear(128, num_classes),
)
def forward(self, x):
h,_ = self.gru(x) # (B,T,2H)
return self.head(h[:, -1, :]) # last-time-step logits
# ---------- main ----------
def main():
"""
Stream webcam, build rolling window of T frames, normalize with training stats,
classify with BiGRU, overlay current top prediction, and optionally trigger
an action when the sequence 'W', 'E', 'B' is observed.
"""
ap = argparse.ArgumentParser()
ap.add_argument("--model", required=True) # path to .pt checkpoint
ap.add_argument("--camera", type=int, default=0) # webcam device index
ap.add_argument("--threshold", type=float, default=0.35) # emit threshold for top prob
ap.add_argument("--smooth", type=float, default=0.1, help="EMA window (seconds); 0 disables")
ap.add_argument("--width", type=int, default=640) # capture resolution
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--holistic-complexity", type=int, default=1, choices=[0,1,2]) # accuracy/speed
ap.add_argument("--det-thresh", type=float, default=0.5) # detector confidence thresholds
ap.add_argument("--url", type=str, default="https://www.google.com") # used on WEB
args = ap.parse_args()
state = torch.load(args.model, map_location="cpu", weights_only=False) # load checkpoint dict
classes = state["classes"] # label names
T = int(state.get("frames", 32)) # window length
X_mean = state["X_mean"].cpu().numpy().astype(np.float32) # normalization stats
X_std = (state["X_std"].cpu().numpy().astype(np.float32) + 1e-6)
input_dim = X_mean.shape[-1] # expected F (1670)
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu") # Apple MPS if avail
model = SeqGRU(input_dim=input_dim, hidden=128, num_classes=len(classes)).to(device) # same arch
model.load_state_dict(state["model"]); model.eval() # load weights
hol = mp_holistic.Holistic( # configure detector
static_image_mode=False,
model_complexity=args.holistic_complexity,
smooth_landmarks=True,
enable_segmentation=False,
refine_face_landmarks=False,
min_detection_confidence=args.det_thresh,
min_tracking_confidence=args.det_thresh,
)
cap = cv2.VideoCapture(args.camera) # open camera
if not cap.isOpened(): raise SystemExit(f"❌ Could not open camera {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width); cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)
print(f"✅ Loaded {args.model} frames={T} classes={classes} input_dim={input_dim}")
print("Press 'q' to quit.")
seq_buffer, ema_probs = [], None # rolling window + smoother
last_ts = time.time() # for EMA time constant
last_emitted = None # de-bounce repeated prints
history = [] # recent emitted labels
while True:
ok, frame = cap.read() # grab a frame
if not ok: break
now = time.time(); dt = max(1e-6, now - last_ts); last_ts = now # frame delta seconds
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # BGR→RGB
res = hol.process(rgb) # run detection
overlay = "No face/hand" # default HUD text
current = None # currently confident label
# hands
right_pts = left_pts = None
if res.right_hand_landmarks is not None:
right_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.right_hand_landmarks.landmark], np.float32)
if res.left_hand_landmarks is not None:
left_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.left_hand_landmarks.landmark], np.float32)
# face
face_pts = None
if res.face_landmarks is not None:
face_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.face_landmarks.landmark], np.float32)
# pose
pose_arr = None
if res.pose_landmarks is not None:
pose_arr = np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in res.pose_landmarks.landmark], np.float32)
if face_pts is not None and (right_pts is not None or left_pts is not None):
f_norm = normalize_face(face_pts) # normalized face (anchor)
# build extras in face frame (preserve where hands are relative to face)
f_center, f_scale, f_R = face_frame_transform(face_pts)
def hand_face_extras(hand_pts):
"""Return [wrist.x, wrist.y, tip.x, tip.y] projected into the face frame, or zeros."""
if hand_pts is None:
return np.zeros(4, np.float32)
wrist_xy = hand_pts[0, :2]
tip_xy = hand_pts[8, :2]
w = to_face_frame(wrist_xy, f_center, f_scale, f_R)
t = to_face_frame(tip_xy, f_center, f_scale, f_R)
return np.array([w[0], w[1], t[0], t[1]], np.float32)
rh_ex = hand_face_extras(right_pts)
lh_ex = hand_face_extras(left_pts)
rh = normalize_hand(right_pts, "Right").reshape(-1) if right_pts is not None else np.zeros(63, np.float32)
lh = normalize_hand(left_pts, "Left" ).reshape(-1) if left_pts is not None else np.zeros(63, np.float32)
p_norm = normalize_pose(pose_arr).reshape(-1) if pose_arr is not None else np.zeros(33*4, np.float32)
feat = np.concatenate([rh, lh, f_norm.reshape(-1), p_norm, rh_ex, lh_ex], axis=0) # (1670,)
seq_buffer.append(feat) # push newest feature frame
if len(seq_buffer) > T: seq_buffer.pop(0) # keep last T frames only
if len(seq_buffer) == T: # only infer when buffer full
X = np.stack(seq_buffer, 0) # (T, F)
Xn = (X - X_mean) / X_std # normalize with training stats
xt = torch.from_numpy(Xn).float().unsqueeze(0).to(device) # (1, T, F)
with torch.no_grad(): # inference (no grads)
probs = torch.softmax(model(xt), dim=1)[0].cpu().numpy() # class probabilities
if args.smooth > 0:
alpha = 1.0 - math.exp(-dt / args.smooth) # EMA with time-based alpha
ema_probs = probs if ema_probs is None else (1.0 - alpha) * ema_probs + alpha * probs
use = ema_probs
else:
use = probs
top_idx = int(np.argmax(use)); top_p = float(use[top_idx]); top_cls = classes[top_idx] # best class
overlay = f"{top_cls} {top_p*100:.1f}%" # HUD text
if top_p >= args.threshold: current = top_cls # only emit when confident
else:
seq_buffer, ema_probs = [], None # reset if face+hand not available
# Emit on change & optional "WEB" sequence trigger
if current is not None and current != last_emitted:
print(f"Detected: {current}") # console feedback
last_emitted = current
history.append(current) # remember last few
if len(history) > 3: history.pop(0)
if history == ["W","E","B"]: # simple finite-seq detector
print("🚀 Detected WEB! Opening browser…")
try: webbrowser.open(args.url) # launch default browser
except Exception as e: print(f"⚠️ Browser open failed: {e}")
history.clear() # reset after triggering
# Overlay HUD
buf = f"buf={len(seq_buffer)}/{T}" # show buffer fill
if ema_probs is not None:
ti = int(np.argmax(ema_probs)); tp = float(ema_probs[ti]); tc = classes[ti]
buf += f" top={tc} {tp:.2f}" # show smoothed top prob
cv2.putText(frame, overlay, (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.1, (0,255,0), 2)
cv2.putText(frame, buf, (20, 75), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0,255,0), 2)
cv2.imshow("ASL demo (R+L hands + face + pose + extras)", frame) # preview window
if cv2.waitKey(1) & 0xFF == ord('q'): break # quit key
cap.release(); cv2.destroyAllWindows() # cleanup
if __name__ == "__main__":
main()

93
doc/prep_sequence_resampled.py Normal file

@@ -0,0 +1,93 @@
#!/usr/bin/env python3
# Build fixed-length (N frames) dataset from sequences/<split>/<CLASS>/clip_*.npz
import argparse, os, glob, json # stdlib utilities
from pathlib import Path
import numpy as np # arrays
def resample_sequence(X, N=32):
"""
Linearly resample a variable-length sequence (T,F) to exactly (N,F) over the frame index.
This preserves temporal order and distributes frames evenly across the clip.
"""
T = len(X) # original number of frames
if T == 0: return np.zeros((N, X.shape[1]), np.float32) # empty → zeros
if T == 1: return np.repeat(X, N, axis=0) # single frame → tile N times
src = np.linspace(0, T-1, num=T, dtype=np.float32) # original frame positions
dst = np.linspace(0, T-1, num=N, dtype=np.float32) # desired positions
out = np.zeros((N, X.shape[1]), np.float32) # allocate result
for d in range(X.shape[1]): # interpolate each feature independently
out[:, d] = np.interp(dst, src, X[:, d]) # linear interpolation
return out
def load_classes(seq_root: Path):
"""
Discover class subfolders under sequences/train/.
Ignores hidden/system directories. Returns sorted list of class names.
"""
train_dir = seq_root / "train"
if not train_dir.exists():
raise SystemExit(f"Missing folder: {train_dir}")
classes = sorted([
p.name for p in train_dir.iterdir()
if p.is_dir() and not p.name.startswith(".")
])
if not classes:
raise SystemExit("No classes found in sequences/train/ (folders should be class names like Mother, Father, etc.)")
return classes
def collect_split(seq_root: Path, split: str, classes, N):
"""
Collect all clips for a given split ('train' or 'val'):
- Load each clip_*.npz
- Resample to (N,F)
- Stack into X (num_clips, N, F) and y (num_clips,)
"""
Xs, ys = [], []
for ci, cls in enumerate(classes): # class index, name
for f in sorted(glob.glob(str(seq_root / split / cls / "clip_*.npz"))): # iterate clips
d = np.load(f) # load .npz
Xi = d["X"].astype(np.float32) # (T,F) features
XiN = resample_sequence(Xi, N) # (N,F) resampled
Xs.append(XiN); ys.append(ci) # add to lists
if Xs:
X = np.stack(Xs, 0); y = np.array(ys, np.int64) # stack arrays
else:
X = np.zeros((0, N, 1), np.float32); y = np.zeros((0,), np.int64) # empty split guard
return X, y
def main():
"""
CLI: read sequences/*/*/clip_*.npz, resample to --frames, and write dataset arrays and metadata.
"""
ap = argparse.ArgumentParser()
ap.add_argument("--in", dest="in_dir", default="sequences") # source root
ap.add_argument("--out", default="landmarks_seq32") # destination folder
ap.add_argument("--frames", type=int, default=32) # target frames per clip
args = ap.parse_args()
seq_root = Path(args.in_dir) # resolve input root
outdir = Path(args.out); outdir.mkdir(parents=True, exist_ok=True)
classes = load_classes(seq_root) # discover class names
trX, trY = collect_split(seq_root, "train", classes, args.frames) # build train split
vaX, vaY = collect_split(seq_root, "val", classes, args.frames) # build val split
if trX.size == 0 and vaX.size == 0: # sanity check
raise SystemExit("Found no clips. Did you run capture and save any clip_*.npz files?")
np.save(outdir/"train_X.npy", trX) # save arrays
np.save(outdir/"train_y.npy", trY)
np.save(outdir/"val_X.npy", vaX)
np.save(outdir/"val_y.npy", vaY)
json.dump(classes, open(outdir/"class_names.json", "w")) # save labels
# Detect true feature dimension from data (in case it changes)
input_dim = int(trX.shape[-1] if trX.size else vaX.shape[-1])
json.dump({"frames": args.frames, "input_dim": input_dim}, open(outdir/"meta.json","w"))
print(f"Saved dataset → {outdir}")
print(f" train {trX.shape}, val {vaX.shape}, classes={classes}, input_dim={input_dim}")
if __name__ == "__main__":
main()

137
doc/train_seq.py Normal file

@@ -0,0 +1,137 @@
#!/usr/bin/env python3
# Train BiGRU on (T, F) sequences; reads input_dim from meta.json
import os, json, argparse # stdlib
import numpy as np # arrays
import torch, torch.nn as nn # model/ops
from torch.utils.data import Dataset, DataLoader # data pipeline
def get_device():
"""
Prefer Apple Silicon's MPS if available; fallback to CPU/GPU accordingly.
"""
return torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
class SeqDataset(Dataset):
"""
Simple dataset wrapper with optional light augmentation.
"""
def __init__(self, X, y, augment=False):
self.X = X.astype(np.float32) # ensure float32 features
self.y = y.astype(np.int64) # class indices as int64
self.augment = augment
def __len__(self): return len(self.y) # number of samples
def _augment(self, seq):
# Add tiny Gaussian noise; helpful regularizer for high-D continuous features.
return seq + np.random.normal(0, 0.01, size=seq.shape).astype(np.float32)
def __getitem__(self, i):
xi = self.X[i] # (T, F)
if self.augment: xi = self._augment(xi) # optional noise
return torch.from_numpy(xi).float(), int(self.y[i]) # return (tensor, label)
class SeqGRU(nn.Module):
"""
BiGRU → MLP head classifier.
Uses last time step of GRU outputs (many-to-one).
"""
def __init__(self, input_dim, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = nn.Sequential(
nn.Linear(hidden*2, 128), nn.ReLU(), nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x):
h,_ = self.gru(x) # (B,T,2H)
return self.head(h[:, -1, :]) # logits (B,C)
def main():
"""
Train loop:
- Load prepared dataset
- Compute global mean/std on train and normalize train/val
- Train BiGRU with AdamW + cosine schedule
- Save best checkpoint by val accuracy (includes mean/std)
"""
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32") # dataset folder
ap.add_argument("--epochs", type=int, default=40)
ap.add_argument("--batch", type=int, default=64)
ap.add_argument("--lr", type=float, default=1e-3)
ap.add_argument("--out", default="asl_seq32_gru.pt") # model save path
args = ap.parse_args()
trX = np.load(os.path.join(args.landmarks,"train_X.npy")) # (Ntr, T, F)
trY = np.load(os.path.join(args.landmarks,"train_y.npy")) # (Ntr,)
vaX = np.load(os.path.join(args.landmarks,"val_X.npy")) # (Nva, T, F)
vaY = np.load(os.path.join(args.landmarks,"val_y.npy")) # (Nva,)
classes = json.load(open(os.path.join(args.landmarks,"class_names.json")))
meta = json.load(open(os.path.join(args.landmarks,"meta.json")))
T = int(meta["frames"]) # #frames per clip
input_dim = int(meta.get("input_dim", trX.shape[-1])) # feature dim (safety)
print(f"Loaded: train {trX.shape} val {vaX.shape} classes={classes} input_dim={input_dim}")
# Global normalization (feature-wise) computed on TRAIN ONLY
X_mean = trX.reshape(-1, trX.shape[-1]).mean(axis=0, keepdims=True).astype(np.float32) # (1,F)
X_std = trX.reshape(-1, trX.shape[-1]).std(axis=0, keepdims=True).astype(np.float32) + 1e-6
trXn = (trX - X_mean) / X_std # normalize train
vaXn = (vaX - X_mean) / X_std # normalize val using train stats
tr_ds = SeqDataset(trXn, trY, augment=True) # datasets
va_ds = SeqDataset(vaXn, vaY, augment=False)
tr_dl = DataLoader(tr_ds, batch_size=args.batch, shuffle=True) # loaders
va_dl = DataLoader(va_ds, batch_size=args.batch, shuffle=False)
device = get_device() # target device
model = SeqGRU(input_dim=input_dim, hidden=128, num_classes=len(classes)).to(device)
crit = nn.CrossEntropyLoss() # standard multi-class loss
opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=1e-4) # AdamW helps generalization
sch = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=args.epochs) # smooth LR decay
best_acc, best_state = 0.0, None # track best val acc
for epoch in range(1, args.epochs+1):
model.train()
tot, correct, loss_sum = 0, 0, 0.0
for xb, yb in tr_dl:
xb, yb = xb.to(device), yb.to(device) # move to device
opt.zero_grad(set_to_none=True) # reset grads
logits = model(xb) # forward
loss = crit(logits, yb) # compute loss
loss.backward() # backprop
opt.step() # update weights
loss_sum += loss.item() * yb.size(0) # accumulate loss
correct += (logits.argmax(1)==yb).sum().item() # count train correct
tot += yb.size(0) # sample counter
tr_loss = loss_sum / max(1, tot)
tr_acc = correct / max(1, tot)
model.eval()
vtot, vcorrect = 0, 0
with torch.no_grad():
for xb, yb in va_dl:
xb, yb = xb.to(device), yb.to(device)
logits = model(xb)
vcorrect += (logits.argmax(1)==yb).sum().item()
vtot += yb.size(0)
va_acc = vcorrect / max(1, vtot) # validation accuracy
sch.step() # update LR schedule
print(f"Epoch {epoch:02d}: train_loss={tr_loss:.4f} train_acc={tr_acc:.3f} val_acc={va_acc:.3f}")
if va_acc > best_acc: # save best checkpoint
best_acc = va_acc
best_state = {
"model": model.state_dict(),
"classes": classes,
"frames": T,
"X_mean": torch.from_numpy(X_mean),
"X_std": torch.from_numpy(X_std),
}
torch.save(best_state, args.out)
print(f" ✅ Saved best → {args.out} (val_acc={best_acc:.3f})")
print("Done. Best val_acc:", best_acc)
if __name__ == "__main__":
main()

61
eval_val.py Executable file

@@ -0,0 +1,61 @@
#!/usr/bin/env python3
# Evaluate a trained SeqGRU on the validation set; reads input_dim from meta.json
import os, json, argparse
import numpy as np
import torch, torch.nn as nn
from sklearn.metrics import classification_report, confusion_matrix
class SeqGRU(nn.Module):
def __init__(self, input_dim, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = nn.Sequential(
nn.Linear(hidden*2, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x):
h,_ = self.gru(x)
return self.head(h[:, -1, :])
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32")
ap.add_argument("--model", required=True)
args = ap.parse_args()
vaX = np.load(os.path.join(args.landmarks,"val_X.npy"))
vaY = np.load(os.path.join(args.landmarks,"val_y.npy"))
classes = json.load(open(os.path.join(args.landmarks,"class_names.json")))
meta = json.load(open(os.path.join(args.landmarks,"meta.json")))
T = int(meta.get("frames", vaX.shape[1]))
input_dim = int(meta.get("input_dim", vaX.shape[-1]))
state = torch.load(args.model, map_location="cpu", weights_only=False)
X_mean, X_std = state["X_mean"], state["X_std"]
if isinstance(X_mean, torch.Tensor): X_mean = X_mean.numpy()
if isinstance(X_std, torch.Tensor): X_std = X_std.numpy()
X_mean = X_mean.astype(np.float32)
X_std = (X_std.astype(np.float32) + 1e-6)
vaXn = (vaX - X_mean) / X_std
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = SeqGRU(input_dim=input_dim, hidden=128, num_classes=len(classes))
model.load_state_dict(state["model"])
model.eval().to(device)
with torch.no_grad():
xb = torch.from_numpy(vaXn).float().to(device)
logits = model(xb)
pred = logits.argmax(1).cpu().numpy()
cm = confusion_matrix(vaY, pred)
print("Classes:", classes)
print("\nConfusion matrix (rows=true, cols=pred):\n", cm)
print("\nReport:\n", classification_report(vaY, pred, target_names=classes))
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,216 @@
# Handshape Sequence Classifier (MediaPipe + PyTorch, macOS MPS-ready)
Live ASL handshape letter demo powered by MediaPipe Hands landmarks and a bidirectional GRU sequence model.
Record short clips per letter, resample to a fixed length, train, evaluate, and run a real-time webcam demo that can react to detected letter sequences (e.g., **W → E → B** opens a URL).
## Features
* **Data capture UI:** 3-second centered countdown + top progress bar; fingertip dot feedback.
* **Robust normalization:** wrist-anchored, left/right mirroring, rotation to +Y, scale by max pairwise distance.
* **Fixed-length preprocessing:** linear resampling to *N* frames (default **32**).
* **Sequence model:** BiGRU (128 hidden × 2) → MLP head; light augmentation during training.
* **Live inference:** EMA smoothing + thresholding; emits letters only on change; detects special sequences (**WEB**) and opens a browser.
---
## Quick Start
```bash
# 0) (optional) Create & activate a virtual env
python -m venv .venv && source .venv/bin/activate
# 1) Install deps
pip install numpy opencv-python mediapipe torch scikit-learn
# 2) Make directories for the letters you'll collect
./make_seq_dirs.sh A B J Z
# 3) Capture short clips per letter (train/val)
python capture_sequence.py --label A --split train
python capture_sequence.py --label A --split val
# ...repeat for B, J, Z
# 4) Preprocess → fixed-length dataset (32 frames)
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
# 5) Train the BiGRU
python train_seq.py --landmarks landmarks_seq32 --epochs 40 --batch 64 --lr 1e-3 \
--out asl_seq32_gru_ABJZ.pt
# 6) Evaluate on the validation set (confusion matrix + report)
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_ABJZ.pt
# 7) Live webcam demo (press 'q' to quit)
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.8 --smooth 0.7
```
> **WEB trigger:** In the live demo, if the emitted letters form **W → E → B**, the app prints a message and opens `--url` (default: Google).
> Example: `--url https://www.gallaudet.edu`
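Under the hood this is just a three-item rolling history compared against `["W", "E", "B"]`. A minimal sketch of the check, mirroring the logic in `infer_seq_webcam.py` (the helper name here is illustrative):

```python
import webbrowser

history = []                      # letters emitted on change, most recent last

def on_emitted(letter, url="https://www.google.com"):
    history.append(letter)
    if len(history) > 3:
        history.pop(0)            # keep only the last three emissions
    if history == ["W", "E", "B"]:
        webbrowser.open(url)      # fire once per occurrence, then reset
        history.clear()
```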
---
## Repository Layout
```
handshapes-multiclass/
├─ make_seq_dirs.sh # creates sequences/train|val/<LETTER>/
├─ capture_sequence.py # webcam capture → clip_XXX.npz (X: (T,63), tip: (T,2))
├─ prep_sequence_resampled.py # resample clips to fixed N frames → landmarks_seq32/
├─ train_seq.py # train BiGRU; saves best checkpoint (.pt + stats)
├─ eval_val.py # evaluate on val set; prints metrics
├─ infer_seq_webcam.py # live demo; emits letters; detects "WEB" → opens URL
├─ what_to_do.txt # quick, step-by-step playbook
└─ sequences/                     # created by you (after running make_seq_dirs.sh)
   ├─ train/<LETTER>/clip_XXX.npz
   └─ val/<LETTER>/clip_XXX.npz
```
**Clip file format (`clip_XXX.npz`)**
* `X`: `(T, 63)` — per-frame normalized landmarks (21 points × (x, y, z))
* `tip`: `(T, 2)` — normalized index fingertip positions (for sanity checks)
**Prepared dataset (`landmarks_seq32/`)**
* `train_X.npy`, `train_y.npy`, `val_X.npy`, `val_y.npy`
* `class_names.json` (e.g., `["A","B","J","Z"]`)
* `meta.json` (e.g., `{"frames":32,"input_dim":63}`)
**Checkpoint (`*.pt`)**
* `model` (state_dict), `classes`, `frames`, `X_mean`, `X_std`
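To sanity-check these artifacts from a Python shell, something like the following works (a sketch; the clip name and checkpoint filename are just the examples used in this README):

```python
import json
import numpy as np
import torch

clip = np.load("sequences/train/A/clip_001.npz")
print(clip["X"].shape, clip["tip"].shape)              # e.g. (24, 63) (24, 2)

X = np.load("landmarks_seq32/train_X.npy")             # (N, 32, 63)
y = np.load("landmarks_seq32/train_y.npy")             # (N,)
classes = json.load(open("landmarks_seq32/class_names.json"))
meta = json.load(open("landmarks_seq32/meta.json"))    # {"frames": 32, "input_dim": 63}

ckpt = torch.load("asl_seq32_gru_ABJZ.pt", map_location="cpu", weights_only=False)
print(sorted(ckpt))                                    # X_mean, X_std, classes, frames, model
```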
---
## Normalization (consistent across capture & inference)
1. Translate so **wrist** (landmark 0) is at the origin.
2. If detected **left** hand, mirror `x *= -1`.
3. Rotate so the **middle-finger MCP** (landmark 9) points along **+Y**.
4. Scale all coords by the **max pairwise distance** among 2D landmarks.
5. Flatten to **63 features** per frame.
This keeps classification driven by the hand's shape (the letter), not by camera pose, hand position, or scale.
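For reference, here are the five steps condensed into one function; this is a sketch of what `normalize_frame` (capture) and `normalize_landmarks` (inference) both implement:

```python
import math
import numpy as np

def normalize(pts, handed=None):
    """pts: (21, 3) MediaPipe hand landmarks; returns a (63,) float32 feature vector."""
    pts = pts.astype(np.float32).copy()
    pts[:, :2] -= pts[0, :2]                           # 1) translate wrist to origin
    if handed and handed.lower().startswith("left"):
        pts[:, 0] *= -1.0                              # 2) mirror a left hand
    ang = math.atan2(pts[9, 1], pts[9, 0])             # 3) rotate middle-finger MCP to +Y
    c, s = math.cos(math.pi / 2 - ang), math.sin(math.pi / 2 - ang)
    pts[:, :2] = pts[:, :2] @ np.array([[c, -s], [s, c]], np.float32).T
    xy = pts[:, :2]                                    # 4) scale by max pairwise distance
    d = float(np.linalg.norm(xy[None] - xy[:, None], axis=-1).max())
    if d < 1e-6:
        d = 1.0
    pts[:, :2] /= d; pts[:, 2] /= d
    return pts.reshape(-1)                             # 5) flatten to 63 features
```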
---
## Training Details
* **Model:** BiGRU (input=63, hidden=128, bidirectional) → `[Linear(256→128), ReLU, Dropout(0.2), Linear(128→num_classes)]`
* **Optimizer:** AdamW (`lr=1e-3`, `weight_decay=1e-4`)
* **Scheduler:** CosineAnnealingLR (`T_max = epochs`)
* **Augmentation:** small 2D rotate (±7°), scale (±10%), Gaussian noise (σ=0.01)
* **Normalization:** global `X_mean`/`X_std` computed over **train** (time+batch), applied to both train & val and saved into the checkpoint.
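That last point matters: the statistics are fit on the flattened **train** tensor only and reused everywhere, including the live buffer at inference. A minimal sketch matching what `train_seq.py` does (helper names are illustrative):

```python
import numpy as np

def fit_stats(trX):
    """trX: (N, T, F) train sequences → per-feature mean/std, each of shape (1, F)."""
    flat = trX.reshape(-1, trX.shape[-1])
    return (flat.mean(axis=0, keepdims=True).astype(np.float32),
            flat.std(axis=0, keepdims=True).astype(np.float32) + 1e-6)

def apply_stats(X, mean, std):
    """Works for train/val arrays of shape (N, T, F) and for a live (T, F) buffer."""
    return (X - mean) / std
```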
---
## Live Inference Behavior
* Maintains a rolling buffer of **T = frames** (from the checkpoint).
* Applies the saved `X_mean`/`X_std`.
* **EMA smoothing** over softmax probs with time constant `--smooth` (seconds).
* Emits a letter only if:
* top prob ≥ `--threshold` (e.g., 0.8), **and**
* the letter **changed** from the previous emission (prevents repeats).
* Tracks a short history of emitted letters to detect **W → E → B**; on match:
* prints “Detected WEB! …”
* calls `webbrowser.open(--url)`
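The smoothing is frame-rate aware: the blend factor is derived from the elapsed time `dt` and the `--smooth` time constant. A sketch of the update used in `infer_seq_webcam.py` (the helper name is illustrative):

```python
import math

def ema_update(prev_probs, probs, dt, smooth):
    """prev_probs: previous EMA (or None); dt: seconds since the last frame;
    smooth: --smooth time constant in seconds (<= 0 disables smoothing)."""
    if smooth <= 0 or prev_probs is None:
        return probs
    alpha = 1.0 - math.exp(-dt / smooth)   # larger dt → heavier weight on the new frame
    return (1.0 - alpha) * prev_probs + alpha * probs
```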
**Common flags**
```bash
# Camera & size
--camera 0 --width 640 --height 480
# Confidence vs. latency tradeoffs
--threshold 0.85 # higher → fewer false positives
--smooth 1.0 # higher → steadier output but more lag
# Action on sequence
--url https://example.com
```
---
## Tips for High Accuracy
* Record **balanced** train/val counts per class (e.g., 100 train / 20 val).
* Keep the hand **centered**, well lit, and mostly **single-hand** (model expects 1 hand).
* Maintain consistent **distance** and **orientation** during capture.
* If you add new letters later, just record them, re-run preprocessing, and retrain — classes are **auto-discovered** from `sequences/train/*`.
---
## macOS (M-series) Notes
* PyTorch runs on **Metal (MPS)** when `torch.backends.mps.is_available()` returns `True`; the scripts fall back to CPU otherwise.
* If the webcam feed runs at a low FPS, try reducing `--width`/`--height`; if predictions flicker, raise `--threshold` or `--smooth`.
---
## Troubleshooting
* **“Could not open camera”** → try `--camera 1` (or check macOS camera permission).
* **No detections / “No hand” on screen** → improve lighting, ensure a single clear hand, check MediaPipe install.
* **Model emits wrong letters** → increase `--threshold`, collect more data, or raise `--smooth`.
* **Mismatch T during inference** → ensure `--frames` at preprocessing matches the checkpoint's `frames` (saved & auto-used).
---
## Commands Reference
### Create class folders
```bash
./make_seq_dirs.sh A B J Z
```
### Capture clips
```bash
python capture_sequence.py --label A --split train --seconds 0.8 --count 100
python capture_sequence.py --label A --split val --seconds 0.8 --count 20
```
### Prepare dataset (resample to 32 frames)
```bash
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
```
### Train
```bash
python train_seq.py --landmarks landmarks_seq32 --epochs 40 --batch 64 --lr 1e-3 \
--out asl_seq32_gru_ABJZ.pt
```
### Evaluate
```bash
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_ABJZ.pt
```
### Live demo (open URL on “WEB”)
```bash
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.8 --smooth 0.7 \
--url https://www.gallaudet.edu
```
---
## License
MIT
---
## Acknowledgments
* **MediaPipe Hands** for robust, fast hand landmark detection.
* **PyTorch** for flexible sequence modeling on CPU/MPS.
---

View File

@@ -0,0 +1,176 @@
#!/usr/bin/env python3
# capture_sequence.py
# Automatically record N short sequences for each label (default: 100 train / 20 val)
# Centered 3-second countdown before recording.
# Per-clip depleting progress bar (full → empty) across the top during capture.
import argparse, os, time, math, re
from pathlib import Path
import numpy as np, cv2, mediapipe as mp
def normalize_frame(pts, handed=None):
pts = pts.astype(np.float32).copy()
pts[:, :2] -= pts[0, :2]
if handed and handed.lower().startswith("left"):
pts[:, 0] *= -1.0
v = pts[9, :2]
ang = math.atan2(v[1], v[0])
c, s = math.cos(math.pi/2 - ang), math.sin(math.pi/2 - ang)
R = np.array([[c, -s], [s, c]], np.float32)
pts[:, :2] = pts[:, :2] @ R.T
xy = pts[:, :2]
d = np.max(np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1))
if d < 1e-6: d = 1.0
pts[:, :2] /= d; pts[:, 2] /= d
return pts
def next_idx(folder: Path, prefix="clip_"):
pat = re.compile(rf"^{re.escape(prefix)}(\d+)\.npz$")
mx = 0
if folder.exists():
for n in os.listdir(folder):
m = pat.match(n)
if m: mx = max(mx, int(m.group(1)))
return mx + 1
def countdown(cap, seconds=3):
"""Display a centered countdown before starting capture."""
for i in range(seconds, 0, -1):
start = time.time()
while time.time() - start < 1.0:
ok, frame = cap.read()
if not ok:
continue
h, w = frame.shape[:2]
# Main big number in center
text = str(i)
font_scale = 5
thickness = 10
(tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness)
cv2.putText(frame, text,
((w - tw)//2, (h + th)//2),
cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0,0,255), thickness, cv2.LINE_AA)
# Smaller message above
msg = "Starting in..."
font_scale_msg = 1.2
thickness_msg = 3
(mw, mh), _ = cv2.getTextSize(msg, cv2.FONT_HERSHEY_SIMPLEX, font_scale_msg, thickness_msg)
cv2.putText(frame, msg,
((w - mw)//2, (h//2) - th - 20),
cv2.FONT_HERSHEY_SIMPLEX, font_scale_msg, (0,255,255), thickness_msg, cv2.LINE_AA)
cv2.imshow("sequence capture", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
cap.release(); cv2.destroyAllWindows(); raise SystemExit("Aborted during countdown")
def draw_progress_bar(img, frac_remaining, bar_h=16, margin=12):
"""
Draw a top progress bar that starts full and depletes to empty.
frac_remaining: 1.0 at start → 0.0 at end.
"""
h, w = img.shape[:2]
x0, x1 = margin, w - margin
y0, y1 = margin, margin + bar_h
# Background bar
cv2.rectangle(img, (x0, y0), (x1, y1), (40, 40, 40), -1) # dark gray
cv2.rectangle(img, (x0, y0), (x1, y1), (90, 90, 90), 2) # border
# Foreground (remaining)
rem_w = int((x1 - x0) * max(0.0, min(1.0, frac_remaining)))
if rem_w > 0:
cv2.rectangle(img, (x0, y0), (x0 + rem_w, y1), (0, 200, 0), -1) # green
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--label", required=True, help="Letter label (A..Z)")
ap.add_argument("--split", required=True, choices=["train","val"])
ap.add_argument("--seconds", type=float, default=0.8, help="Clip length (s)")
ap.add_argument("--camera", type=int, default=0)
ap.add_argument("--width", type=int, default=640)
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--count", type=int, default=None,
help="How many clips (default=100 train, 20 val)")
args = ap.parse_args()
if args.count is None:
args.count = 100 if args.split == "train" else 20
L = args.label.upper().strip()
if not (len(L) == 1 and "A" <= L <= "Z"):
raise SystemExit("Use --label A..Z")
out_dir = Path("sequences") / args.split / L
out_dir.mkdir(parents=True, exist_ok=True)
idx = next_idx(out_dir)
hands = mp.solutions.hands.Hands(
static_image_mode=False, max_num_hands=1, min_detection_confidence=0.5
)
cap = cv2.VideoCapture(args.camera)
if not cap.isOpened():
raise SystemExit(f"Could not open camera {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)
print(f"Recording {args.count} clips for {L}/{args.split}, {args.seconds}s each.")
countdown(cap, 3)
for n in range(args.count):
seq_X, seq_tip = [], []
start_t = time.time()
end_t = start_t + args.seconds
while True:
now = time.time()
if now >= end_t:
break
ok, fr = cap.read()
if not ok:
break
rgb = cv2.cvtColor(fr, cv2.COLOR_BGR2RGB)
res = hands.process(rgb)
if res.multi_hand_landmarks:
ih = res.multi_hand_landmarks[0]
handed = None
if res.multi_handedness:
handed = res.multi_handedness[0].classification[0].label
pts = np.array([[lm.x, lm.y, lm.z] for lm in ih.landmark], np.float32)
pts = normalize_frame(pts, handed)
seq_X.append(pts.reshape(-1))
seq_tip.append(pts[8, :2])
# draw fingertip marker (for feedback)
cv2.circle(fr,
(int(fr.shape[1] * ih.landmark[8].x), int(fr.shape[0] * ih.landmark[8].y)),  # raw landmark coords; pts is already normalized
6, (0, 255, 0), -1)
# overlay progress + status
frac_remaining = (end_t - now) / max(1e-6, args.seconds) # 1 → 0
draw_progress_bar(fr, frac_remaining, bar_h=16, margin=12)
cv2.putText(fr, f"{L} {args.split} Clip {n+1}/{args.count}",
(20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0), 2, cv2.LINE_AA)
cv2.imshow("sequence capture", fr)
if cv2.waitKey(1) & 0xFF == ord('q'):
cap.release(); cv2.destroyAllWindows(); return
if seq_X:
X = np.stack(seq_X, 0)
tip = np.stack(seq_tip, 0)
path = out_dir / f"clip_{idx:03d}.npz"
np.savez_compressed(path, X=X, tip=tip)
print(f"💾 saved {path} frames={X.shape[0]}")
idx += 1
else:
print("⚠️ No hand detected; skipped clip.")
print("✅ Done recording.")
cap.release(); cv2.destroyAllWindows()
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,60 @@
#!/usr/bin/env python3
# eval_seq_val.py
import os, json, argparse
import numpy as np
import torch, torch.nn as nn
from sklearn.metrics import classification_report, confusion_matrix
class SeqGRU(nn.Module):
def __init__(self, input_dim=63, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = nn.Sequential(
nn.Linear(hidden*2, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x):
h,_ = self.gru(x)
h_last = h[:, -1, :]
return self.head(h_last)
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32")
ap.add_argument("--model", default="asl_seq32_gru_ABJZ.pt")
args = ap.parse_args()
vaX = np.load(os.path.join(args.landmarks,"val_X.npy")) # (N, T, 63)
vaY = np.load(os.path.join(args.landmarks,"val_y.npy"))
classes = json.load(open(os.path.join(args.landmarks,"class_names.json")))
meta = json.load(open(os.path.join(args.landmarks,"meta.json")))
T = int(meta.get("frames", 32))
state = torch.load(args.model, map_location="cpu", weights_only=False)
X_mean, X_std = state["X_mean"], state["X_std"]
if isinstance(X_mean, torch.Tensor): X_mean = X_mean.numpy()
if isinstance(X_std, torch.Tensor): X_std = X_std.numpy()
X_mean = X_mean.astype(np.float32)
X_std = (X_std.astype(np.float32) + 1e-6)
vaXn = (vaX - X_mean) / X_std
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = SeqGRU(63, 128, num_classes=len(classes))
model.load_state_dict(state["model"])
model.eval().to(device)
with torch.no_grad():
xb = torch.from_numpy(vaXn).float().to(device)
logits = model(xb)
pred = logits.argmax(1).cpu().numpy()
cm = confusion_matrix(vaY, pred)
print("Classes:", classes)
print("\nConfusion matrix (rows=true, cols=pred):\n", cm)
print("\nReport:\n", classification_report(vaY, pred, target_names=classes))
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,198 @@
#!/usr/bin/env python3
"""
infer_seq_webcam.py
Live webcam demo: detect a hand with MediaPipe, normalize landmarks,
classify with a trained sequence GRU model (multiclass).
Examples:
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.8 --smooth 0.7
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.85 --smooth 1.0 --url https://www.google.com
"""
import os, math, argparse, time, webbrowser
import numpy as np
import cv2
import torch
import mediapipe as mp
# --- Quiet logs ---
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
os.environ["GLOG_minloglevel"] = "2"
import absl.logging
absl.logging.set_verbosity(absl.logging.ERROR)
cv2.setLogLevel(0)
# ---------- geometry helpers ----------
def _angle(v): return math.atan2(v[1], v[0])
def _rot2d(t):
c, s = math.cos(t), math.sin(t)
return np.array([[c, -s], [s, c]], dtype=np.float32)
def normalize_landmarks(pts, handedness_label=None):
"""
pts: (21,3) MediaPipe normalized coords in [0..1]
Steps: translate wrist->origin, mirror left to right, rotate to +Y, scale by max pairwise distance.
Returns: (63,) float32
"""
pts = pts.astype(np.float32).copy()
pts[:, :2] -= pts[0, :2]
if handedness_label and handedness_label.lower().startswith("left"):
pts[:, 0] *= -1.0
v = pts[9, :2] # middle MCP
R = _rot2d(math.pi/2 - _angle(v))
pts[:, :2] = pts[:, :2] @ R.T
xy = pts[:, :2]
d = np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1).max()
d = 1.0 if d < 1e-6 else float(d)
pts[:, :2] /= d; pts[:, 2] /= d
return pts.reshape(-1)
# ---------- sequence model ----------
class SeqGRU(torch.nn.Module):
def __init__(self, input_dim=63, hidden=128, num_classes=26):
super().__init__()
self.gru = torch.nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = torch.nn.Sequential(
torch.nn.Linear(hidden*2, 128),
torch.nn.ReLU(),
torch.nn.Dropout(0.2),
torch.nn.Linear(128, num_classes),
)
def forward(self, x):
h, _ = self.gru(x) # (B,T,2H)
h_last = h[:, -1, :] # or h.mean(1)
return self.head(h_last)
# ---------- main ----------
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--model", required=True, help="Path to trained .pt model")
ap.add_argument("--camera", type=int, default=0)
ap.add_argument("--threshold", type=float, default=0.8)
ap.add_argument("--smooth", type=float, default=0.7,
help="EMA smoothing window in seconds (0 disables smoothing)")
ap.add_argument("--width", type=int, default=640)
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--url", type=str, default="https://www.google.com",
help="URL to open when the sequence W→E→B is detected")
args = ap.parse_args()
if not os.path.exists(args.model):
raise SystemExit(f"❌ Model file not found: {args.model}")
# Load checkpoint (support numpy or tensor stats; support 'frames' if present)
state = torch.load(args.model, map_location="cpu", weights_only=False)
classes = state["classes"]
T = int(state.get("frames", 32))
X_mean, X_std = state["X_mean"], state["X_std"]
if isinstance(X_mean, torch.Tensor): X_mean = X_mean.cpu().numpy()
if isinstance(X_std, torch.Tensor): X_std = X_std.cpu().numpy()
X_mean = X_mean.astype(np.float32)
X_std = (X_std.astype(np.float32) + 1e-6)
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = SeqGRU(63, 128, num_classes=len(classes)).to(device)
model.load_state_dict(state["model"])
model.eval()
hands = mp.solutions.hands.Hands(
static_image_mode=False, max_num_hands=1, min_detection_confidence=0.5
)
cap = cv2.VideoCapture(args.camera)
if not cap.isOpened():
raise SystemExit(f"❌ Could not open camera index {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)
print(f"✅ Loaded {args.model} frames={T} classes={classes}")
print("Press 'q' to quit.")
seq_buffer, ema_probs = [], None
last_ts = time.time()
last_emitted_letter = None
# Rolling history of emitted letters to detect the sequence "WEB"
detected_history = [] # only stores emitted letters (deduped by change)
while True:
ok, frame = cap.read()
if not ok: break
now = time.time()
dt = max(1e-6, now - last_ts)
last_ts = now
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
res = hands.process(rgb)
overlay_text = "No hand"
current_letter = None
if res.multi_hand_landmarks:
ih = res.multi_hand_landmarks[0]
handed = None
if res.multi_handedness:
handed = res.multi_handedness[0].classification[0].label
pts = np.array([[lm.x, lm.y, lm.z] for lm in ih.landmark], dtype=np.float32)
feat = normalize_landmarks(pts, handedness_label=handed)
seq_buffer.append(feat)
if len(seq_buffer) > T: seq_buffer.pop(0)
if len(seq_buffer) == T:
X = np.stack(seq_buffer, 0)
Xn = (X - X_mean) / X_std
xt = torch.from_numpy(Xn).float().unsqueeze(0).to(device)
with torch.no_grad():
logits = model(xt)
probs = torch.softmax(logits, dim=1)[0].cpu().numpy()
if args.smooth > 0:
alpha = 1.0 - math.exp(-dt / args.smooth)
if ema_probs is None: ema_probs = probs
else: ema_probs = (1.0 - alpha) * ema_probs + alpha * probs
use_probs = ema_probs
else:
use_probs = probs
top_idx = int(np.argmax(use_probs))
top_p = float(use_probs[top_idx])
top_cls = classes[top_idx]
if top_p >= args.threshold:
overlay_text = f"{top_cls} {top_p*100:.1f}%"
current_letter = top_cls
else:
seq_buffer, ema_probs = [], None
# Only emit when a *letter* changes (ignore no-hand and repeats)
if current_letter is not None and current_letter != last_emitted_letter:
print(f"Detected: {current_letter}")
last_emitted_letter = current_letter
# Update rolling history
detected_history.append(current_letter)
if len(detected_history) > 3:
detected_history.pop(0)
# Check for special sequence "WEB"
if detected_history == ["W", "E", "B"]:
print("🚀 Detected WEB! Time to open the web browser app.")
try:
webbrowser.open(args.url)
except Exception as e:
print(f"⚠️ Failed to open browser: {e}")
detected_history.clear() # fire once per occurrence
# On-screen overlay (still shows "No hand" when nothing is detected)
cv2.putText(frame, overlay_text, (20, 40),
cv2.FONT_HERSHEY_SIMPLEX, 1.1, (0,255,0), 2)
cv2.imshow("ASL sequence demo", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,19 @@
#!/usr/bin/env bash
# Create sequences/<train|val>/<LETTER>/ for the given letters.
# Example: ./make_seq_dirs.sh A B J Z
set -euo pipefail
if [ "$#" -lt 1 ]; then
echo "Usage: $0 LETTER [LETTER ...] e.g. $0 A B J Z"
exit 1
fi
ROOT="sequences"
for SPLIT in train val; do
for L in "$@"; do
mkdir -p "$ROOT/$SPLIT/$L"
done
done
echo "✅ Created $ROOT/train and $ROOT/val for: $*"

View File

@@ -0,0 +1,71 @@
#!/usr/bin/env python3
# prep_sequence_resampled.py
# Build a fixed-length (N frames) multiclass dataset from sequences/<split>/<CLASS>/clip_*.npz
import argparse, os, glob, json
from pathlib import Path
import numpy as np
def resample_sequence(X, N=32):
# X: (T,63) -> (N,63) by linear interpolation along frame index
T = len(X)
if T == 0:
return np.zeros((N, X.shape[1]), np.float32)
if T == 1:
return np.repeat(X, N, axis=0)
src = np.linspace(0, T-1, num=T)
dst = np.linspace(0, T-1, num=N)
out = np.zeros((N, X.shape[1]), np.float32)
for d in range(X.shape[1]):
out[:, d] = np.interp(dst, src, X[:, d])
return out.astype(np.float32)
def load_classes(seq_root: Path):
# classes are subdirs in sequences/train/
classes = sorted([p.name for p in (seq_root/"train").iterdir() if p.is_dir()])
classes = [c for c in classes if len(c)==1 and "A"<=c<="Z"]
if not classes:
raise SystemExit("No letter classes found in sequences/train/")
return classes
def collect_split(seq_root: Path, split: str, classes, N):
Xs, ys = [], []
for ci, cls in enumerate(classes):
for f in sorted(glob.glob(str(seq_root/split/cls/"clip_*.npz"))):
d = np.load(f)
Xi = d["X"].astype(np.float32) # (T,63)
XiN = resample_sequence(Xi, N) # (N,63)
Xs.append(XiN); ys.append(ci)
if Xs:
X = np.stack(Xs, 0)
y = np.array(ys, np.int64)
else:
X = np.zeros((0, N, 63), np.float32); y = np.zeros((0,), np.int64)
return X, y
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--in", dest="in_dir", default="sequences", help="Root sequences/ with train/ and val/")
ap.add_argument("--out", default="landmarks_seq32", help="Output folder with npy files")
ap.add_argument("--frames", type=int, default=32, help="Frames per clip after resampling (default: 32)")
args = ap.parse_args()
seq_root = Path(args.in_dir)
outdir = Path(args.out)
outdir.mkdir(parents=True, exist_ok=True)
classes = load_classes(seq_root)
trX, trY = collect_split(seq_root, "train", classes, args.frames)
vaX, vaY = collect_split(seq_root, "val", classes, args.frames)
np.save(outdir/"train_X.npy", trX)
np.save(outdir/"train_y.npy", trY)
np.save(outdir/"val_X.npy", vaX)
np.save(outdir/"val_y.npy", vaY)
json.dump(classes, open(outdir/"class_names.json", "w"))
json.dump({"frames": args.frames, "input_dim": 63}, open(outdir/"meta.json","w"))
print(f"Saved dataset → {outdir}")
print(f" train {trX.shape}, val {vaX.shape}, classes={classes}")
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,136 @@
#!/usr/bin/env python3
# train_seq.py
import os, json, argparse
import numpy as np
import torch, torch.nn as nn
from torch.utils.data import Dataset, DataLoader
def get_device():
return torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
class SeqDataset(Dataset):
def __init__(self, X, y, augment=False):
self.X = X.astype(np.float32) # (Nclip, T, 63)
self.y = y.astype(np.int64)
self.augment = augment
def __len__(self): return len(self.y)
def _augment(self, seq): # seq: (T,63)
T = seq.shape[0]
pts = seq.reshape(T, 21, 3).copy()
# small 2D rotation (±7°) + scale (±10%) + Gaussian noise (σ=0.01)
ang = np.deg2rad(np.random.uniform(-7, 7))
c, s = np.cos(ang), np.sin(ang)
R = np.array([[c,-s],[s,c]], np.float32)
scale = np.random.uniform(0.9, 1.1)
pts[:, :, :2] = (pts[:, :, :2] @ R.T) * scale
pts += np.random.normal(0, 0.01, size=pts.shape).astype(np.float32)
return pts.reshape(T, 63)
def __getitem__(self, i):
xi = self.X[i]
if self.augment:
xi = self._augment(xi)
return torch.from_numpy(xi).float(), int(self.y[i])
class SeqGRU(nn.Module):
def __init__(self, input_dim=63, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = nn.Sequential(
nn.Linear(hidden*2, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x): # x: (B,T,63)
h,_ = self.gru(x) # (B,T,2H)
h_last = h[:, -1, :] # or mean over time: h.mean(1)
return self.head(h_last)
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32", help="Folder from prep_sequence_resampled.py")
ap.add_argument("--epochs", type=int, default=40)
ap.add_argument("--batch", type=int, default=64)
ap.add_argument("--lr", type=float, default=1e-3)
ap.add_argument("--out", default="asl_seq32_gru.pt")
args = ap.parse_args()
# Load dataset
trX = np.load(os.path.join(args.landmarks,"train_X.npy")) # (N, T, 63)
trY = np.load(os.path.join(args.landmarks,"train_y.npy"))
vaX = np.load(os.path.join(args.landmarks,"val_X.npy"))
vaY = np.load(os.path.join(args.landmarks,"val_y.npy"))
classes = json.load(open(os.path.join(args.landmarks,"class_names.json")))
meta = json.load(open(os.path.join(args.landmarks,"meta.json")))
T = int(meta["frames"])
print(f"Loaded: train {trX.shape} val {vaX.shape} classes={classes}")
# Global mean/std over train (time+batch)
X_mean = trX.reshape(-1, trX.shape[-1]).mean(axis=0, keepdims=True).astype(np.float32) # (1,63)
X_std = trX.reshape(-1, trX.shape[-1]).std(axis=0, keepdims=True).astype(np.float32) + 1e-6
trXn = (trX - X_mean) / X_std
vaXn = (vaX - X_mean) / X_std
tr_ds = SeqDataset(trXn, trY, augment=True)
va_ds = SeqDataset(vaXn, vaY, augment=False)
tr_dl = DataLoader(tr_ds, batch_size=args.batch, shuffle=True)
va_dl = DataLoader(va_ds, batch_size=args.batch, shuffle=False)
device = get_device()
model = SeqGRU(input_dim=63, hidden=128, num_classes=len(classes)).to(device)
crit = nn.CrossEntropyLoss()
opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=1e-4)
sch = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=args.epochs)
best_acc, best_state = 0.0, None
for epoch in range(1, args.epochs+1):
# Train
model.train()
tot, correct, loss_sum = 0, 0, 0.0
for xb, yb in tr_dl:
xb, yb = xb.to(device), yb.to(device)
opt.zero_grad(set_to_none=True)
logits = model(xb)
loss = crit(logits, yb)
loss.backward()
opt.step()
loss_sum += loss.item() * yb.size(0)
correct += (logits.argmax(1)==yb).sum().item()
tot += yb.size(0)
tr_loss = loss_sum / max(1, tot)
tr_acc = correct / max(1, tot)
# Validate
model.eval()
vtot, vcorrect = 0, 0
with torch.no_grad():
for xb, yb in va_dl:
xb, yb = xb.to(device), yb.to(device)
logits = model(xb)
vcorrect += (logits.argmax(1)==yb).sum().item()
vtot += yb.size(0)
va_acc = vcorrect / max(1, vtot)
sch.step()
print(f"Epoch {epoch:02d}: train_loss={tr_loss:.4f} train_acc={tr_acc:.3f} val_acc={va_acc:.3f}")
if va_acc > best_acc:
best_acc = va_acc
best_state = {
"model": model.state_dict(),
"classes": classes,
"frames": T,
"X_mean": torch.from_numpy(X_mean), # tensors → future-proof
"X_std": torch.from_numpy(X_std),
}
torch.save(best_state, args.out)
print(f" ✅ Saved best → {args.out} (val_acc={best_acc:.3f})")
print("Done. Best val_acc:", best_acc)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,24 @@
# 1) Create dirs
# ./make_seq_dirs.sh A B J Z
# 2) Capture clips (0.8s each by default)
python capture_sequence.py --label A --split train
python capture_sequence.py --label A --split val
python capture_sequence.py --label B --split train
python capture_sequence.py --label B --split val
python capture_sequence.py --label J --split train
python capture_sequence.py --label J --split val
python capture_sequence.py --label Z --split train
python capture_sequence.py --label Z --split val
# 3) Preprocess to 32 frames (auto-picks classes from sequences/train/*)
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
# 4) Train GRU (multiclass on A/B/J/Z)
python train_seq.py --landmarks landmarks_seq32 --epochs 40 --batch 64 --lr 1e-3 --out asl_seq32_gru_ABJZ.pt
# 5) Live inference
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.6 --smooth 0.2
# If you later add more letters (e.g., C, D),
# just create those folders, record clips, re-run the prep step, then train again — the pipeline will include whatever letters exist under sequences/train/.

227
infer_seq_webcam.py Executable file
View File

@@ -0,0 +1,227 @@
#!/usr/bin/env python3
"""
Live webcam inference for two hands + full face + pose + face-relative hand extras (1670 dims/frame).
Works for letters (A..Z) or word classes (e.g., Mother, Father).
Optionally detects the sequence W → E → B to open a URL.
"""
import os, math, argparse, time, webbrowser
import numpy as np
import cv2
import torch
import mediapipe as mp
# Quiet logs
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"; os.environ["GLOG_minloglevel"] = "2"
import absl.logging; absl.logging.set_verbosity(absl.logging.ERROR)
cv2.setLogLevel(0)
mp_holistic = mp.solutions.holistic
# ---------- normalization ----------
def _angle(v):
return math.atan2(v[1], v[0])
def _rot2d(t):
c, s = math.cos(t), math.sin(t)
return np.array([[c, -s], [s, c]], dtype=np.float32)
def normalize_hand(pts, handed=None):
pts = pts.astype(np.float32).copy()
pts[:, :2] -= pts[0, :2]
if handed and str(handed).lower().startswith("left"): pts[:, 0] *= -1.0
v = pts[9, :2]; R = _rot2d(math.pi/2 - _angle(v))
pts[:, :2] = pts[:, :2] @ R.T
xy = pts[:, :2]; d = np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1).max()
d = 1.0 if d < 1e-6 else float(d)
pts[:, :2] /= d; pts[:, 2] /= d
return pts
def normalize_face(face):
f = face.astype(np.float32).copy()
left, right = f[33, :2], f[263, :2]
center = 0.5*(left+right)
f[:, :2] -= center[None, :]
eye_vec = right - left; eye_dist = float(np.linalg.norm(eye_vec)) or 1.0
f[:, :2] /= eye_dist; f[:, 2] /= eye_dist
R = _rot2d(-_angle(eye_vec)); f[:, :2] = f[:, :2] @ R.T
return f
def normalize_pose(pose):
p = pose.astype(np.float32).copy()
ls, rs = p[11, :2], p[12, :2]
center = 0.5*(ls+rs); p[:, :2] -= center[None, :]
sw_vec = rs - ls; sw = float(np.linalg.norm(sw_vec)) or 1.0
p[:, :2] /= sw; p[:, 2] /= sw
R = _rot2d(-_angle(sw_vec)); p[:, :2] = p[:, :2] @ R.T
return p
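# Face-frame helpers: describe the face by its eye midpoint, inter-ocular distance, and eyeline
# rotation, then re-express 2D points (hand wrist / index tip) in that normalized face frame.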
def face_frame_transform(face_pts):
left = face_pts[33, :2]; right = face_pts[263, :2]
center = 0.5*(left + right)
eye_vec = right - left
eye_dist = float(np.linalg.norm(eye_vec)) or 1.0
R = _rot2d(-_angle(eye_vec))
return center, eye_dist, R
def to_face_frame(pt_xy, center, eye_dist, R):
v = (pt_xy - center) / eye_dist
return (v @ R.T).astype(np.float32)
# ---------- model ----------
class SeqGRU(torch.nn.Module):
def __init__(self, input_dim, hidden=128, num_classes=26):
super().__init__()
self.gru = torch.nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = torch.nn.Sequential(
torch.nn.Linear(hidden*2, 128), torch.nn.ReLU(), torch.nn.Dropout(0.2),
torch.nn.Linear(128, num_classes),
)
def forward(self, x):
h,_ = self.gru(x); return self.head(h[:, -1, :])
# ---------- main ----------
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--model", required=True)
ap.add_argument("--camera", type=int, default=0)
ap.add_argument("--threshold", type=float, default=0.35)
ap.add_argument("--smooth", type=float, default=0.1, help="EMA window (seconds); 0 disables")
ap.add_argument("--width", type=int, default=640)
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--holistic-complexity", type=int, default=1, choices=[0,1,2])
ap.add_argument("--det-thresh", type=float, default=0.5)
ap.add_argument("--url", type=str, default="https://www.google.com")
args = ap.parse_args()
state = torch.load(args.model, map_location="cpu", weights_only=False)
classes = state["classes"]
T = int(state.get("frames", 32))
X_mean = state["X_mean"].cpu().numpy().astype(np.float32)
X_std = (state["X_std"].cpu().numpy().astype(np.float32) + 1e-6)
input_dim = X_mean.shape[-1] # expected 1670
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = SeqGRU(input_dim=input_dim, hidden=128, num_classes=len(classes)).to(device)
model.load_state_dict(state["model"]); model.eval()
hol = mp_holistic.Holistic(
static_image_mode=False,
model_complexity=args.holistic_complexity,
smooth_landmarks=True,
enable_segmentation=False,
refine_face_landmarks=False,
min_detection_confidence=args.det_thresh,
min_tracking_confidence=args.det_thresh,
)
cap = cv2.VideoCapture(args.camera)
if not cap.isOpened(): raise SystemExit(f"❌ Could not open camera {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width); cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)
print(f"✅ Loaded {args.model} frames={T} classes={classes} input_dim={input_dim}")
print("Press 'q' to quit.")
seq_buffer, ema_probs = [], None
last_ts = time.time()
last_emitted = None
history = []
while True:
ok, frame = cap.read()
if not ok: break
now = time.time(); dt = max(1e-6, now - last_ts); last_ts = now
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
res = hol.process(rgb)
overlay = "No face/hand"
current = None
# hands
right_pts = left_pts = None
if res.right_hand_landmarks is not None:
right_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.right_hand_landmarks.landmark], np.float32)
if res.left_hand_landmarks is not None:
left_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.left_hand_landmarks.landmark], np.float32)
# face
face_pts = None
if res.face_landmarks is not None:
face_pts = np.array([[lm.x, lm.y, lm.z] for lm in res.face_landmarks.landmark], np.float32)
# pose
pose_arr = None
if res.pose_landmarks is not None:
pose_arr = np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in res.pose_landmarks.landmark], np.float32)
if face_pts is not None and (right_pts is not None or left_pts is not None):
f_norm = normalize_face(face_pts)
f_center, f_scale, f_R = face_frame_transform(face_pts)
def hand_face_extras(hand_pts):
if hand_pts is None:
return np.zeros(4, np.float32)
wrist_xy = hand_pts[0, :2]
tip_xy = hand_pts[8, :2]
w = to_face_frame(wrist_xy, f_center, f_scale, f_R)
t = to_face_frame(tip_xy, f_center, f_scale, f_R)
return np.array([w[0], w[1], t[0], t[1]], np.float32)
rh_ex = hand_face_extras(right_pts)
lh_ex = hand_face_extras(left_pts)
rh = normalize_hand(right_pts, "Right").reshape(-1) if right_pts is not None else np.zeros(63, np.float32)
lh = normalize_hand(left_pts, "Left").reshape(-1) if left_pts is not None else np.zeros(63, np.float32)
p_norm = normalize_pose(pose_arr).reshape(-1) if pose_arr is not None else np.zeros(33*4, np.float32)
feat = np.concatenate([rh, lh, f_norm.reshape(-1), p_norm, rh_ex, lh_ex], axis=0) # (1670,)
seq_buffer.append(feat)
if len(seq_buffer) > T: seq_buffer.pop(0)
if len(seq_buffer) == T:
X = np.stack(seq_buffer, 0)
Xn = (X - X_mean) / X_std
xt = torch.from_numpy(Xn).float().unsqueeze(0).to(device)
with torch.no_grad():
probs = torch.softmax(model(xt), dim=1)[0].cpu().numpy()
if args.smooth > 0:
alpha = 1.0 - math.exp(-dt / args.smooth)
ema_probs = probs if ema_probs is None else (1.0 - alpha) * ema_probs + alpha * probs
use = ema_probs
else:
use = probs
top_idx = int(np.argmax(use)); top_p = float(use[top_idx]); top_cls = classes[top_idx]
overlay = f"{top_cls} {top_p*100:.1f}%"
if top_p >= args.threshold: current = top_cls
else:
seq_buffer, ema_probs = [], None
# Emit on change & optional "WEB" sequence trigger
if current is not None and current != last_emitted:
print(f"Detected: {current}")
last_emitted = current
history.append(current)
if len(history) > 3: history.pop(0)
if history == ["W","E","B"]:
print("🚀 Detected WEB! Opening browser…")
try: webbrowser.open(args.url)
except Exception as e: print(f"⚠️ Browser open failed: {e}")
history.clear()
# Overlay
buf = f"buf={len(seq_buffer)}/{T}"
if ema_probs is not None:
ti = int(np.argmax(ema_probs)); tp = float(ema_probs[ti]); tc = classes[ti]
buf += f" top={tc} {tp:.2f}"
cv2.putText(frame, overlay, (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.1, (0,255,0), 2)
cv2.putText(frame, buf, (20, 75), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0,255,0), 2)
cv2.imshow("ASL demo (R+L hands + face + pose + extras)", frame)
if cv2.waitKey(1) & 0xFF == ord('q'): break
cap.release(); cv2.destroyAllWindows()
if __name__ == "__main__":
main()

19
make_seq_dirs.sh Executable file
View File

@@ -0,0 +1,19 @@
#!/usr/bin/env bash
# Create sequences/<train|val>/<LETTER>/ for the given letters.
# Example: ./make_seq_dirs.sh A B J Z
set -euo pipefail
if [ "$#" -lt 1 ]; then
echo "Usage: $0 LETTER [LETTER ...] e.g. $0 A B J Z"
exit 1
fi
ROOT="sequences"
for SPLIT in train val; do
for L in "$@"; do
mkdir -p "$ROOT/$SPLIT/$L"
done
done
echo "✅ Created $ROOT/train and $ROOT/val for: $*"

77
prep_sequence_resampled.py Executable file
View File

@@ -0,0 +1,77 @@
#!/usr/bin/env python3
# Build fixed-length (N frames) dataset from sequences/<split>/<CLASS>/clip_*.npz
import argparse, os, glob, json
from pathlib import Path
import numpy as np
def resample_sequence(X, N=32):
# X: (T,F) -> (N,F) via linear interpolation over frame index
T = len(X)
if T == 0: return np.zeros((N, X.shape[1]), np.float32)
if T == 1: return np.repeat(X, N, axis=0)
src = np.linspace(0, T-1, num=T, dtype=np.float32)
dst = np.linspace(0, T-1, num=N, dtype=np.float32)
out = np.zeros((N, X.shape[1]), np.float32)
for d in range(X.shape[1]):
out[:, d] = np.interp(dst, src, X[:, d])
return out
def load_classes(seq_root: Path):
# Accept ANY class subfolder under sequences/train/, ignore hidden/system dirs
train_dir = seq_root / "train"
if not train_dir.exists():
raise SystemExit(f"Missing folder: {train_dir}")
classes = sorted([
p.name for p in train_dir.iterdir()
if p.is_dir() and not p.name.startswith(".")
])
if not classes:
raise SystemExit("No classes found in sequences/train/ (folders should be class names like Mother, Father, etc.)")
return classes
def collect_split(seq_root: Path, split: str, classes, N):
Xs, ys = [], []
for ci, cls in enumerate(classes):
for f in sorted(glob.glob(str(seq_root / split / cls / "clip_*.npz"))):
d = np.load(f)
Xi = d["X"].astype(np.float32) # (T,F)
XiN = resample_sequence(Xi, N) # (N,F)
Xs.append(XiN); ys.append(ci)
if Xs:
X = np.stack(Xs, 0); y = np.array(ys, np.int64)
else:
X = np.zeros((0, N, 1), np.float32); y = np.zeros((0,), np.int64)
return X, y
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--in", dest="in_dir", default="sequences")
ap.add_argument("--out", default="landmarks_seq32")
ap.add_argument("--frames", type=int, default=32)
args = ap.parse_args()
seq_root = Path(args.in_dir)
outdir = Path(args.out); outdir.mkdir(parents=True, exist_ok=True)
classes = load_classes(seq_root)
trX, trY = collect_split(seq_root, "train", classes, args.frames)
vaX, vaY = collect_split(seq_root, "val", classes, args.frames)
if trX.size == 0 and vaX.size == 0:
raise SystemExit("Found no clips. Did you run capture and save any clip_*.npz files?")
np.save(outdir/"train_X.npy", trX)
np.save(outdir/"train_y.npy", trY)
np.save(outdir/"val_X.npy", vaX)
np.save(outdir/"val_y.npy", vaY)
json.dump(classes, open(outdir/"class_names.json", "w"))
# Detect true feature dimension from data
input_dim = int(trX.shape[-1] if trX.size else vaX.shape[-1])
json.dump({"frames": args.frames, "input_dim": input_dim}, open(outdir/"meta.json","w"))
print(f"Saved dataset → {outdir}")
print(f" train {trX.shape}, val {vaX.shape}, classes={classes}, input_dim={input_dim}")
if __name__ == "__main__":
main()

120
train_seq.py Executable file
View File

@@ -0,0 +1,120 @@
#!/usr/bin/env python3
# Train a BiGRU on (T, F) sequences (F = 1670 with the Holistic features); input_dim is read from meta.json
import os, json, argparse
import numpy as np
import torch, torch.nn as nn
from torch.utils.data import Dataset, DataLoader
def get_device():
return torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
class SeqDataset(Dataset):
def __init__(self, X, y, augment=False):
self.X = X.astype(np.float32)
self.y = y.astype(np.int64)
self.augment = augment
def __len__(self): return len(self.y)
def _augment(self, seq):
# Light Gaussian noise — safe for high-D features
return seq + np.random.normal(0, 0.01, size=seq.shape).astype(np.float32)
def __getitem__(self, i):
xi = self.X[i]
if self.augment: xi = self._augment(xi)
return torch.from_numpy(xi).float(), int(self.y[i])
class SeqGRU(nn.Module):
def __init__(self, input_dim, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = nn.Sequential(
nn.Linear(hidden*2, 128), nn.ReLU(), nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x):
h,_ = self.gru(x)
return self.head(h[:, -1, :])
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32")
ap.add_argument("--epochs", type=int, default=40)
ap.add_argument("--batch", type=int, default=64)
ap.add_argument("--lr", type=float, default=1e-3)
ap.add_argument("--out", default="asl_seq32_gru.pt")
args = ap.parse_args()
trX = np.load(os.path.join(args.landmarks,"train_X.npy"))
trY = np.load(os.path.join(args.landmarks,"train_y.npy"))
vaX = np.load(os.path.join(args.landmarks,"val_X.npy"))
vaY = np.load(os.path.join(args.landmarks,"val_y.npy"))
classes = json.load(open(os.path.join(args.landmarks,"class_names.json")))
meta = json.load(open(os.path.join(args.landmarks,"meta.json")))
T = int(meta["frames"])
input_dim = int(meta.get("input_dim", trX.shape[-1]))
print(f"Loaded: train {trX.shape} val {vaX.shape} classes={classes} input_dim={input_dim}")
# Global normalization (feature-wise)
X_mean = trX.reshape(-1, trX.shape[-1]).mean(axis=0, keepdims=True).astype(np.float32)
X_std = trX.reshape(-1, trX.shape[-1]).std(axis=0, keepdims=True).astype(np.float32) + 1e-6
trXn = (trX - X_mean) / X_std
vaXn = (vaX - X_mean) / X_std
tr_ds = SeqDataset(trXn, trY, augment=True)
va_ds = SeqDataset(vaXn, vaY, augment=False)
tr_dl = DataLoader(tr_ds, batch_size=args.batch, shuffle=True)
va_dl = DataLoader(va_ds, batch_size=args.batch, shuffle=False)
device = get_device()
model = SeqGRU(input_dim=input_dim, hidden=128, num_classes=len(classes)).to(device)
crit = nn.CrossEntropyLoss()
opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=1e-4)
sch = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=args.epochs)
best_acc, best_state = 0.0, None
for epoch in range(1, args.epochs+1):
model.train()
tot, correct, loss_sum = 0, 0, 0.0
for xb, yb in tr_dl:
xb, yb = xb.to(device), yb.to(device)
opt.zero_grad(set_to_none=True)
logits = model(xb)
loss = crit(logits, yb)
loss.backward()
opt.step()
loss_sum += loss.item() * yb.size(0)
correct += (logits.argmax(1)==yb).sum().item()
tot += yb.size(0)
tr_loss = loss_sum / max(1, tot)
tr_acc = correct / max(1, tot)
model.eval()
vtot, vcorrect = 0, 0
with torch.no_grad():
for xb, yb in va_dl:
xb, yb = xb.to(device), yb.to(device)
logits = model(xb)
vcorrect += (logits.argmax(1)==yb).sum().item()
vtot += yb.size(0)
va_acc = vcorrect / max(1, vtot)
sch.step()
print(f"Epoch {epoch:02d}: train_loss={tr_loss:.4f} train_acc={tr_acc:.3f} val_acc={va_acc:.3f}")
if va_acc > best_acc:
best_acc = va_acc
best_state = {
"model": model.state_dict(),
"classes": classes,
"frames": T,
"X_mean": torch.from_numpy(X_mean),
"X_std": torch.from_numpy(X_std),
}
torch.save(best_state, args.out)
print(f" ✅ Saved best → {args.out} (val_acc={best_acc:.3f})")
print("Done. Best val_acc:", best_acc)
if __name__ == "__main__":
main()

16
what_to_do.txt Normal file
View File

@@ -0,0 +1,16 @@
./make_seq_dirs.sh Mother Father Go
python capture_sequence.py --label Mother --split train --seconds 0.8 --count 100
python capture_sequence.py --label Mother --split val --seconds 0.8 --count 20
python capture_sequence.py --label Father --split train --seconds 0.8 --count 100
python capture_sequence.py --label Father --split val --seconds 0.8 --count 20
python capture_sequence.py --label Go --split train --seconds 0.8 --count 100
python capture_sequence.py --label Go --split val --seconds 0.8 --count 20
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
python train_seq.py --landmarks landmarks_seq32 --out asl_seq32_gru_mother_father_go.pt
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_mother_father_go.pt
python infer_seq_webcam.py --model asl_seq32_gru_mother_father_go.pt --threshold 0.35 --smooth 0.1