Initial commit: handshapes multiclass project

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 22:27:20 -05:00
commit 816e34cb17
22 changed files with 2820 additions and 0 deletions


@@ -0,0 +1,216 @@
# Handshape Sequence Classifier (MediaPipe + PyTorch, macOS MPS-ready)
Live ASL handshape letter demo powered by MediaPipe Hands landmarks and a bidirectional GRU sequence model.
Record short clips per letter, resample to a fixed length, train, evaluate, and run a real-time webcam demo that can react to detected letter sequences (e.g., **W → E → B** opens a URL).
## Features
* **Data capture UI:** 3-second centered countdown + top progress bar; fingertip dot feedback.
* **Robust normalization:** wrist-anchored, left/right mirroring, rotation to +Y, scale by max pairwise distance.
* **Fixed-length preprocessing:** linear resampling to *N* frames (default **32**).
* **Sequence model:** BiGRU (128 hidden × 2) → MLP head; light augmentation during training.
* **Live inference:** EMA smoothing + thresholding; emits letters only on change; detects special sequences (**WEB**) and opens a browser.
---
## Quick Start
```bash
# 0) (optional) Create & activate a virtual env
python -m venv .venv && source .venv/bin/activate
# 1) Install deps
pip install numpy opencv-python mediapipe torch scikit-learn
# 2) Make directories for the letters you'll collect
./make_seq_dirs.sh A B J Z
# 3) Capture short clips per letter (train/val)
python capture_sequence.py --label A --split train
python capture_sequence.py --label A --split val
# ...repeat for B, J, Z
# 4) Preprocess → fixed-length dataset (32 frames)
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
# 5) Train the BiGRU
python train_seq.py --landmarks landmarks_seq32 --epochs 40 --batch 64 --lr 1e-3 \
--out asl_seq32_gru_ABJZ.pt
# 6) Evaluate on the validation set (confusion matrix + report)
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_ABJZ.pt
# 7) Live webcam demo (press 'q' to quit)
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.8 --smooth 0.7
```
> **WEB trigger:** In the live demo, if the emitted letters form **W → E → B**, the app prints a message and opens `--url` (default: Google).
> Example: `--url https://www.gallaudet.edu`
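The sequence check itself is just a three-letter rolling history compared against `["W", "E", "B"]`. A minimal sketch of the logic used in `infer_seq_webcam.py` (the helper name `on_emit` is illustrative, not part of the script):
```python
import webbrowser

TARGET = ["W", "E", "B"]
history = []  # holds only letters emitted on change (no repeats, no "no hand")

def on_emit(letter, url="https://www.google.com"):
    history.append(letter)
    if len(history) > len(TARGET):
        history.pop(0)            # keep only the last three emissions
    if history == TARGET:
        webbrowser.open(url)      # fire once per occurrence...
        history.clear()           # ...then reset the history
```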
---
## Repository Layout
```
handshapes-multiclass/
├─ make_seq_dirs.sh # creates sequences/train|val/<LETTER>/
├─ capture_sequence.py # webcam capture → clip_XXX.npz (X: (T,63), tip: (T,2))
├─ prep_sequence_resampled.py # resample clips to fixed N frames → landmarks_seq32/
├─ train_seq.py # train BiGRU; saves best checkpoint (.pt + stats)
├─ eval_val.py # evaluate on val set; prints metrics
├─ infer_seq_webcam.py # live demo; emits letters; detects "WEB" → opens URL
├─ what_to_do.txt # quick, step-by-step playbook
└─ sequences/ # created by you (after running make_seq_dirs.sh)
├─ train/<LETTER>/clip_XXX.npz
└─ val/<LETTER>/clip_XXX.npz
```
**Clip file format (`clip_XXX.npz`)**
* `X`: `(T, 63)` — per-frame normalized landmarks (21 points × (x, y, z))
* `tip`: `(T, 2)` — normalized index fingertip positions (for sanity checks)
**Prepared dataset (`landmarks_seq32/`)**
* `train_X.npy`, `train_y.npy`, `val_X.npy`, `val_y.npy`
* `class_names.json` (e.g., `["A","B","J","Z"]`)
* `meta.json` (e.g., `{"frames":32,"input_dim":63}`)
**Checkpoint (`*.pt`)**
* `model` (state_dict), `classes`, `frames`, `X_mean`, `X_std`
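A quick way to sanity-check these artifacts from a Python shell (paths are examples; point them at whatever you actually recorded and trained):
```python
import json
import numpy as np
import torch

clip = np.load("sequences/train/A/clip_001.npz")          # any recorded clip
print(clip["X"].shape, clip["tip"].shape)                  # (T, 63), (T, 2); T varies per clip

X = np.load("landmarks_seq32/train_X.npy")                 # prepared dataset
y = np.load("landmarks_seq32/train_y.npy")
classes = json.load(open("landmarks_seq32/class_names.json"))
meta = json.load(open("landmarks_seq32/meta.json"))
print(X.shape, y.shape, classes, meta)                     # e.g. (N, 32, 63), (N,), ["A","B","J","Z"], {"frames": 32, "input_dim": 63}

ckpt = torch.load("asl_seq32_gru_ABJZ.pt", map_location="cpu", weights_only=False)
print(sorted(ckpt.keys()))                                 # ['X_mean', 'X_std', 'classes', 'frames', 'model']
```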
---
## Normalization (consistent across capture & inference)
1. Translate so **wrist** (landmark 0) is at the origin.
2. If detected **left** hand, mirror `x *= -1`.
3. Rotate so the **middle-finger MCP** (landmark 9) points along **+Y**.
4. Scale all coords by the **max pairwise distance** among 2D landmarks.
5. Flatten to **63 features** per frame.
This ensures the handshape itself, not camera pose, drives classification.
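A quick invariant check (a sketch; run from the repo root so `capture_sequence.py` is importable): after normalization the wrist sits exactly at the origin and the middle-finger MCP lies on the y-axis, no matter how the hand was posed.
```python
import numpy as np
from capture_sequence import normalize_frame  # same function used during capture

pts = np.random.rand(21, 3).astype(np.float32)   # stand-in for one frame of MediaPipe landmarks
out = normalize_frame(pts, handed="Right")       # (21, 3) normalized points

assert np.allclose(out[0, :2], 0.0)              # wrist (landmark 0) at the origin
assert abs(out[9, 0]) < 1e-4                     # middle MCP (landmark 9) has x ≈ 0, i.e. lies along the y-axis
```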
---
## Training Details
* **Model:** BiGRU (input=63, hidden=128, bidirectional) → `[Linear(256→128), ReLU, Dropout(0.2), Linear(128→num_classes)]`
* **Optimizer:** AdamW (`lr=1e-3`, `weight_decay=1e-4`)
* **Scheduler:** CosineAnnealingLR (`T_max = epochs`)
* **Augmentation:** small 2D rotate (±7°), scale (±10%), Gaussian noise (σ=0.01)
* **Normalization:** global `X_mean`/`X_std` computed over **train** (time+batch), applied to both train & val and saved into the checkpoint.
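`X_mean`/`X_std` are plain per-feature statistics over every frame of every training clip; a condensed sketch of how `train_seq.py` computes and applies them (the same arrays are saved into the checkpoint so inference can reuse them):
```python
import numpy as np

trX = np.load("landmarks_seq32/train_X.npy")                      # (N, T, 63)
flat = trX.reshape(-1, trX.shape[-1])                             # collapse clip + time axes
X_mean = flat.mean(axis=0, keepdims=True).astype(np.float32)      # (1, 63)
X_std = flat.std(axis=0, keepdims=True).astype(np.float32) + 1e-6
trXn = (trX - X_mean) / X_std                                     # broadcasts over (N, T, 63)
```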
---
## Live Inference Behavior
* Maintains a rolling buffer of **T = frames** (from the checkpoint).
* Applies the saved `X_mean`/`X_std`.
* **EMA smoothing** over softmax probs with time constant `--smooth` (seconds).
* Emits a letter only if:
* top prob ≥ `--threshold` (e.g., 0.8), **and**
* the letter **changed** from the previous emission (prevents repeats).
* Tracks a short history of emitted letters to detect **W → E → B**; on match:
* prints “Detected WEB! …”
* calls `webbrowser.open(--url)`
**Common flags**
```bash
# Camera & size
--camera 0 --width 640 --height 480
# Confidence vs. latency tradeoffs
--threshold 0.85 # higher → fewer false positives
--smooth 1.0 # higher → steadier output but more lag
# Action on sequence
--url https://example.com
```
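`--smooth` acts as a time constant rather than a fixed blend weight, so the smoothing behaves the same regardless of frame rate. A sketch of the update used in `infer_seq_webcam.py` (the helper name `ema_update` is illustrative):
```python
import math

def ema_update(ema_probs, probs, dt, smooth):
    """dt: seconds since the last frame; smooth: the --smooth time constant (0 disables)."""
    if smooth <= 0:
        return probs
    alpha = 1.0 - math.exp(-dt / smooth)   # ≈ dt / smooth when dt is small
    if ema_probs is None:
        return probs
    return (1.0 - alpha) * ema_probs + alpha * probs
```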
---
## Tips for High Accuracy
* Record **balanced** train/val counts per class (e.g., 100 train / 20 val).
* Keep the hand **centered** and well lit, and keep only **one hand** in frame (the model expects a single hand).
* Maintain consistent **distance** and **orientation** during capture.
* If you add new letters later, just record them, re-run preprocessing, and retrain — classes are **auto-discovered** from `sequences/train/*`.
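Class discovery is just a directory listing; a condensed equivalent of `load_classes()` in `prep_sequence_resampled.py`:
```python
from pathlib import Path

classes = sorted(
    p.name for p in Path("sequences/train").iterdir()
    if p.is_dir() and len(p.name) == 1 and "A" <= p.name <= "Z"
)
print(classes)   # e.g. ['A', 'B', 'J', 'Z']
```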
---
## macOS (M-series) Notes
* PyTorch will automatically use **Metal (MPS)** if available (`torch.backends.mps.is_available()`); otherwise CPU.
* If the webcam feed runs at low FPS, try reducing `--width`/`--height` or raising `--threshold` / `--smooth`.
---
## Troubleshooting
* **“Could not open camera”** → try `--camera 1` (or check macOS camera permission).
* **No detections / “No hand” on screen** → improve lighting, ensure a single clear hand, check MediaPipe install.
* **Model emits wrong letters** → increase `--threshold`, collect more data, or raise `--smooth`.
* **Sequence length (T) mismatch during inference** → ensure the `--frames` used at preprocessing matches the checkpoint's `frames` (saved and applied automatically).
---
## Commands Reference
### Create class folders
```bash
./make_seq_dirs.sh A B J Z
```
### Capture clips
```bash
python capture_sequence.py --label A --split train --seconds 0.8 --count 100
python capture_sequence.py --label A --split val --seconds 0.8 --count 20
```
### Prepare dataset (resample to 32 frames)
```bash
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
```
### Train
```bash
python train_seq.py --landmarks landmarks_seq32 --epochs 40 --batch 64 --lr 1e-3 \
--out asl_seq32_gru_ABJZ.pt
```
### Evaluate
```bash
python eval_val.py --landmarks landmarks_seq32 --model asl_seq32_gru_ABJZ.pt
```
### Live demo (open URL on “WEB”)
```bash
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.8 --smooth 0.7 \
--url https://www.gallaudet.edu
```
---
## License
MIT
---
## Acknowledgments
* **MediaPipe Hands** for robust, fast hand landmark detection.
* **PyTorch** for flexible sequence modeling on CPU/MPS.
---


@@ -0,0 +1,176 @@
#!/usr/bin/env python3
# capture_sequence.py
# Automatically record N short sequences for each label (default: 100 train / 20 val)
# Centered 3-second countdown before recording.
# Per-clip depleting progress bar (full → empty) across the top during capture.
import argparse, os, time, math, re
from pathlib import Path
import numpy as np, cv2, mediapipe as mp
def normalize_frame(pts, handed=None):
pts = pts.astype(np.float32).copy()
pts[:, :2] -= pts[0, :2]
if handed and handed.lower().startswith("left"):
pts[:, 0] *= -1.0
v = pts[9, :2]
ang = math.atan2(v[1], v[0])
c, s = math.cos(math.pi/2 - ang), math.sin(math.pi/2 - ang)
R = np.array([[c, -s], [s, c]], np.float32)
pts[:, :2] = pts[:, :2] @ R.T
xy = pts[:, :2]
d = np.max(np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1))
if d < 1e-6: d = 1.0
pts[:, :2] /= d; pts[:, 2] /= d
return pts
def next_idx(folder: Path, prefix="clip_"):
pat = re.compile(rf"^{re.escape(prefix)}(\d+)\.npz$")
mx = 0
if folder.exists():
for n in os.listdir(folder):
m = pat.match(n)
if m: mx = max(mx, int(m.group(1)))
return mx + 1
def countdown(cap, seconds=3):
"""Display a centered countdown before starting capture."""
for i in range(seconds, 0, -1):
start = time.time()
while time.time() - start < 1.0:
ok, frame = cap.read()
if not ok:
continue
h, w = frame.shape[:2]
# Main big number in center
text = str(i)
font_scale = 5
thickness = 10
(tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness)
cv2.putText(frame, text,
((w - tw)//2, (h + th)//2),
cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0,0,255), thickness, cv2.LINE_AA)
# Smaller message above
msg = "Starting in..."
font_scale_msg = 1.2
thickness_msg = 3
(mw, mh), _ = cv2.getTextSize(msg, cv2.FONT_HERSHEY_SIMPLEX, font_scale_msg, thickness_msg)
cv2.putText(frame, msg,
((w - mw)//2, (h//2) - th - 20),
cv2.FONT_HERSHEY_SIMPLEX, font_scale_msg, (0,255,255), thickness_msg, cv2.LINE_AA)
cv2.imshow("sequence capture", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
cap.release(); cv2.destroyAllWindows(); raise SystemExit("Aborted during countdown")
def draw_progress_bar(img, frac_remaining, bar_h=16, margin=12):
"""
Draw a top progress bar that starts full and depletes to empty.
frac_remaining: 1.0 at start → 0.0 at end.
"""
h, w = img.shape[:2]
x0, x1 = margin, w - margin
y0, y1 = margin, margin + bar_h
# Background bar
cv2.rectangle(img, (x0, y0), (x1, y1), (40, 40, 40), -1) # dark gray
cv2.rectangle(img, (x0, y0), (x1, y1), (90, 90, 90), 2) # border
# Foreground (remaining)
rem_w = int((x1 - x0) * max(0.0, min(1.0, frac_remaining)))
if rem_w > 0:
cv2.rectangle(img, (x0, y0), (x0 + rem_w, y1), (0, 200, 0), -1) # green
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--label", required=True, help="Letter label (A..Z)")
ap.add_argument("--split", required=True, choices=["train","val"])
ap.add_argument("--seconds", type=float, default=0.8, help="Clip length (s)")
ap.add_argument("--camera", type=int, default=0)
ap.add_argument("--width", type=int, default=640)
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--count", type=int, default=None,
help="How many clips (default=100 train, 20 val)")
args = ap.parse_args()
if args.count is None:
args.count = 100 if args.split == "train" else 20
L = args.label.upper().strip()
if not (len(L) == 1 and "A" <= L <= "Z"):
raise SystemExit("Use --label A..Z")
out_dir = Path("sequences") / args.split / L
out_dir.mkdir(parents=True, exist_ok=True)
idx = next_idx(out_dir)
hands = mp.solutions.hands.Hands(
static_image_mode=False, max_num_hands=1, min_detection_confidence=0.5
)
cap = cv2.VideoCapture(args.camera)
if not cap.isOpened():
raise SystemExit(f"Could not open camera {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)
print(f"Recording {args.count} clips for {L}/{args.split}, {args.seconds}s each.")
countdown(cap, 3)
for n in range(args.count):
seq_X, seq_tip = [], []
start_t = time.time()
end_t = start_t + args.seconds
while True:
now = time.time()
if now >= end_t:
break
ok, fr = cap.read()
if not ok:
break
rgb = cv2.cvtColor(fr, cv2.COLOR_BGR2RGB)
res = hands.process(rgb)
if res.multi_hand_landmarks:
ih = res.multi_hand_landmarks[0]
handed = None
if res.multi_handedness:
handed = res.multi_handedness[0].classification[0].label
pts = np.array([[lm.x, lm.y, lm.z] for lm in ih.landmark], np.float32)
pts = normalize_frame(pts, handed)
seq_X.append(pts.reshape(-1))
seq_tip.append(pts[8, :2])
                # draw fingertip marker (for feedback) using the raw MediaPipe
                # image coords; the normalized pts are wrist-relative and no
                # longer map to pixel positions
                cv2.circle(fr,
                           (int(fr.shape[1] * ih.landmark[8].x),
                            int(fr.shape[0] * ih.landmark[8].y)),
                           6, (0, 255, 0), -1)
# overlay progress + status
frac_remaining = (end_t - now) / max(1e-6, args.seconds) # 1 → 0
draw_progress_bar(fr, frac_remaining, bar_h=16, margin=12)
cv2.putText(fr, f"{L} {args.split} Clip {n+1}/{args.count}",
(20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0), 2, cv2.LINE_AA)
cv2.imshow("sequence capture", fr)
if cv2.waitKey(1) & 0xFF == ord('q'):
cap.release(); cv2.destroyAllWindows(); return
if seq_X:
X = np.stack(seq_X, 0)
tip = np.stack(seq_tip, 0)
path = out_dir / f"clip_{idx:03d}.npz"
np.savez_compressed(path, X=X, tip=tip)
print(f"💾 saved {path} frames={X.shape[0]}")
idx += 1
else:
print("⚠️ No hand detected; skipped clip.")
print("✅ Done recording.")
cap.release(); cv2.destroyAllWindows()
if __name__ == "__main__":
main()


@@ -0,0 +1,60 @@
#!/usr/bin/env python3
# eval_seq_val.py
import os, json, argparse
import numpy as np
import torch, torch.nn as nn
from sklearn.metrics import classification_report, confusion_matrix
class SeqGRU(nn.Module):
def __init__(self, input_dim=63, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = nn.Sequential(
nn.Linear(hidden*2, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x):
h,_ = self.gru(x)
h_last = h[:, -1, :]
return self.head(h_last)
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32")
ap.add_argument("--model", default="asl_seq32_gru_ABJZ.pt")
args = ap.parse_args()
vaX = np.load(os.path.join(args.landmarks,"val_X.npy")) # (N, T, 63)
vaY = np.load(os.path.join(args.landmarks,"val_y.npy"))
classes = json.load(open(os.path.join(args.landmarks,"class_names.json")))
meta = json.load(open(os.path.join(args.landmarks,"meta.json")))
T = int(meta.get("frames", 32))
state = torch.load(args.model, map_location="cpu", weights_only=False)
X_mean, X_std = state["X_mean"], state["X_std"]
if isinstance(X_mean, torch.Tensor): X_mean = X_mean.numpy()
if isinstance(X_std, torch.Tensor): X_std = X_std.numpy()
X_mean = X_mean.astype(np.float32)
X_std = (X_std.astype(np.float32) + 1e-6)
vaXn = (vaX - X_mean) / X_std
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = SeqGRU(63, 128, num_classes=len(classes))
model.load_state_dict(state["model"])
model.eval().to(device)
with torch.no_grad():
xb = torch.from_numpy(vaXn).float().to(device)
logits = model(xb)
pred = logits.argmax(1).cpu().numpy()
cm = confusion_matrix(vaY, pred)
print("Classes:", classes)
print("\nConfusion matrix (rows=true, cols=pred):\n", cm)
print("\nReport:\n", classification_report(vaY, pred, target_names=classes))
if __name__ == "__main__":
main()


@@ -0,0 +1,198 @@
#!/usr/bin/env python3
"""
infer_seq_webcam.py
Live webcam demo: detect a hand with MediaPipe, normalize landmarks,
classify with a trained sequence GRU model (multiclass).
Examples:
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.8 --smooth 0.7
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.85 --smooth 1.0 --url https://www.google.com
"""
import os, math, argparse, time, webbrowser
import numpy as np
import cv2
import torch
import mediapipe as mp
# --- Quiet logs ---
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
os.environ["GLOG_minloglevel"] = "2"
import absl.logging
absl.logging.set_verbosity(absl.logging.ERROR)
cv2.setLogLevel(0)
# ---------- geometry helpers ----------
def _angle(v): return math.atan2(v[1], v[0])
def _rot2d(t):
c, s = math.cos(t), math.sin(t)
return np.array([[c, -s], [s, c]], dtype=np.float32)
def normalize_landmarks(pts, handedness_label=None):
"""
pts: (21,3) MediaPipe normalized coords in [0..1]
Steps: translate wrist->origin, mirror left to right, rotate to +Y, scale by max pairwise distance.
Returns: (63,) float32
"""
pts = pts.astype(np.float32).copy()
pts[:, :2] -= pts[0, :2]
if handedness_label and handedness_label.lower().startswith("left"):
pts[:, 0] *= -1.0
v = pts[9, :2] # middle MCP
R = _rot2d(math.pi/2 - _angle(v))
pts[:, :2] = pts[:, :2] @ R.T
xy = pts[:, :2]
d = np.linalg.norm(xy[None,:,:] - xy[:,None,:], axis=-1).max()
d = 1.0 if d < 1e-6 else float(d)
pts[:, :2] /= d; pts[:, 2] /= d
return pts.reshape(-1)
# ---------- sequence model ----------
class SeqGRU(torch.nn.Module):
def __init__(self, input_dim=63, hidden=128, num_classes=26):
super().__init__()
self.gru = torch.nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = torch.nn.Sequential(
torch.nn.Linear(hidden*2, 128),
torch.nn.ReLU(),
torch.nn.Dropout(0.2),
torch.nn.Linear(128, num_classes),
)
def forward(self, x):
h, _ = self.gru(x) # (B,T,2H)
h_last = h[:, -1, :] # or h.mean(1)
return self.head(h_last)
# ---------- main ----------
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--model", required=True, help="Path to trained .pt model")
ap.add_argument("--camera", type=int, default=0)
ap.add_argument("--threshold", type=float, default=0.8)
ap.add_argument("--smooth", type=float, default=0.7,
help="EMA smoothing window in seconds (0 disables smoothing)")
ap.add_argument("--width", type=int, default=640)
ap.add_argument("--height", type=int, default=480)
ap.add_argument("--url", type=str, default="https://www.google.com",
help="URL to open when the sequence W→E→B is detected")
args = ap.parse_args()
if not os.path.exists(args.model):
raise SystemExit(f"❌ Model file not found: {args.model}")
# Load checkpoint (support numpy or tensor stats; support 'frames' if present)
state = torch.load(args.model, map_location="cpu", weights_only=False)
classes = state["classes"]
T = int(state.get("frames", 32))
X_mean, X_std = state["X_mean"], state["X_std"]
if isinstance(X_mean, torch.Tensor): X_mean = X_mean.cpu().numpy()
if isinstance(X_std, torch.Tensor): X_std = X_std.cpu().numpy()
X_mean = X_mean.astype(np.float32)
X_std = (X_std.astype(np.float32) + 1e-6)
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = SeqGRU(63, 128, num_classes=len(classes)).to(device)
model.load_state_dict(state["model"])
model.eval()
hands = mp.solutions.hands.Hands(
static_image_mode=False, max_num_hands=1, min_detection_confidence=0.5
)
cap = cv2.VideoCapture(args.camera)
if not cap.isOpened():
raise SystemExit(f"❌ Could not open camera index {args.camera}")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)
print(f"✅ Loaded {args.model} frames={T} classes={classes}")
print("Press 'q' to quit.")
seq_buffer, ema_probs = [], None
last_ts = time.time()
last_emitted_letter = None
# Rolling history of emitted letters to detect the sequence "WEB"
detected_history = [] # only stores emitted letters (deduped by change)
while True:
ok, frame = cap.read()
if not ok: break
now = time.time()
dt = max(1e-6, now - last_ts)
last_ts = now
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
res = hands.process(rgb)
overlay_text = "No hand"
current_letter = None
if res.multi_hand_landmarks:
ih = res.multi_hand_landmarks[0]
handed = None
if res.multi_handedness:
handed = res.multi_handedness[0].classification[0].label
pts = np.array([[lm.x, lm.y, lm.z] for lm in ih.landmark], dtype=np.float32)
feat = normalize_landmarks(pts, handedness_label=handed)
seq_buffer.append(feat)
if len(seq_buffer) > T: seq_buffer.pop(0)
if len(seq_buffer) == T:
X = np.stack(seq_buffer, 0)
Xn = (X - X_mean) / X_std
xt = torch.from_numpy(Xn).float().unsqueeze(0).to(device)
with torch.no_grad():
logits = model(xt)
probs = torch.softmax(logits, dim=1)[0].cpu().numpy()
if args.smooth > 0:
alpha = 1.0 - math.exp(-dt / args.smooth)
if ema_probs is None: ema_probs = probs
else: ema_probs = (1.0 - alpha) * ema_probs + alpha * probs
use_probs = ema_probs
else:
use_probs = probs
top_idx = int(np.argmax(use_probs))
top_p = float(use_probs[top_idx])
top_cls = classes[top_idx]
if top_p >= args.threshold:
overlay_text = f"{top_cls} {top_p*100:.1f}%"
current_letter = top_cls
else:
seq_buffer, ema_probs = [], None
# Only emit when a *letter* changes (ignore no-hand and repeats)
if current_letter is not None and current_letter != last_emitted_letter:
print(f"Detected: {current_letter}")
last_emitted_letter = current_letter
# Update rolling history
detected_history.append(current_letter)
if len(detected_history) > 3:
detected_history.pop(0)
# Check for special sequence "WEB"
if detected_history == ["W", "E", "B"]:
print("🚀 Detected WEB! Time to open the web browser app.")
try:
webbrowser.open(args.url)
except Exception as e:
print(f"⚠️ Failed to open browser: {e}")
detected_history.clear() # fire once per occurrence
# On-screen overlay (still shows "No hand" when nothing is detected)
cv2.putText(frame, overlay_text, (20, 40),
cv2.FONT_HERSHEY_SIMPLEX, 1.1, (0,255,0), 2)
cv2.imshow("ASL sequence demo", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
if __name__ == "__main__":
main()


@@ -0,0 +1,19 @@
#!/usr/bin/env bash
# Create sequences/<train|val>/<LETTER>/ for the given letters.
# Example: ./make_seq_dirs.sh A B J Z
set -euo pipefail
if [ "$#" -lt 1 ]; then
echo "Usage: $0 LETTER [LETTER ...] e.g. $0 A B J Z"
exit 1
fi
ROOT="sequences"
for SPLIT in train val; do
for L in "$@"; do
mkdir -p "$ROOT/$SPLIT/$L"
done
done
echo "✅ Created $ROOT/train and $ROOT/val for: $*"


@@ -0,0 +1,71 @@
#!/usr/bin/env python3
# prep_sequence_resampled.py
# Build a fixed-length (N frames) multiclass dataset from sequences/<split>/<CLASS>/clip_*.npz
import argparse, os, glob, json
from pathlib import Path
import numpy as np
def resample_sequence(X, N=32):
# X: (T,63) -> (N,63) by linear interpolation along frame index
T = len(X)
if T == 0:
return np.zeros((N, X.shape[1]), np.float32)
if T == 1:
return np.repeat(X, N, axis=0)
src = np.linspace(0, T-1, num=T)
dst = np.linspace(0, T-1, num=N)
out = np.zeros((N, X.shape[1]), np.float32)
for d in range(X.shape[1]):
out[:, d] = np.interp(dst, src, X[:, d])
return out.astype(np.float32)
def load_classes(seq_root: Path):
# classes are subdirs in sequences/train/
classes = sorted([p.name for p in (seq_root/"train").iterdir() if p.is_dir()])
classes = [c for c in classes if len(c)==1 and "A"<=c<="Z"]
if not classes:
raise SystemExit("No letter classes found in sequences/train/")
return classes
def collect_split(seq_root: Path, split: str, classes, N):
Xs, ys = [], []
for ci, cls in enumerate(classes):
for f in sorted(glob.glob(str(seq_root/split/cls/"clip_*.npz"))):
d = np.load(f)
Xi = d["X"].astype(np.float32) # (T,63)
XiN = resample_sequence(Xi, N) # (N,63)
Xs.append(XiN); ys.append(ci)
if Xs:
X = np.stack(Xs, 0)
y = np.array(ys, np.int64)
else:
X = np.zeros((0, N, 63), np.float32); y = np.zeros((0,), np.int64)
return X, y
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--in", dest="in_dir", default="sequences", help="Root sequences/ with train/ and val/")
ap.add_argument("--out", default="landmarks_seq32", help="Output folder with npy files")
ap.add_argument("--frames", type=int, default=32, help="Frames per clip after resampling (default: 32)")
args = ap.parse_args()
seq_root = Path(args.in_dir)
outdir = Path(args.out)
outdir.mkdir(parents=True, exist_ok=True)
classes = load_classes(seq_root)
trX, trY = collect_split(seq_root, "train", classes, args.frames)
vaX, vaY = collect_split(seq_root, "val", classes, args.frames)
np.save(outdir/"train_X.npy", trX)
np.save(outdir/"train_y.npy", trY)
np.save(outdir/"val_X.npy", vaX)
np.save(outdir/"val_y.npy", vaY)
json.dump(classes, open(outdir/"class_names.json", "w"))
json.dump({"frames": args.frames, "input_dim": 63}, open(outdir/"meta.json","w"))
print(f"Saved dataset → {outdir}")
print(f" train {trX.shape}, val {vaX.shape}, classes={classes}")
if __name__ == "__main__":
main()


@@ -0,0 +1,136 @@
#!/usr/bin/env python3
# train_seq.py
import os, json, argparse
import numpy as np
import torch, torch.nn as nn
from torch.utils.data import Dataset, DataLoader
def get_device():
return torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
class SeqDataset(Dataset):
def __init__(self, X, y, augment=False):
self.X = X.astype(np.float32) # (Nclip, T, 63)
self.y = y.astype(np.int64)
self.augment = augment
def __len__(self): return len(self.y)
def _augment(self, seq): # seq: (T,63)
T = seq.shape[0]
pts = seq.reshape(T, 21, 3).copy()
# small 2D rotation (±7°) + scale (±10%) + Gaussian noise (σ=0.01)
ang = np.deg2rad(np.random.uniform(-7, 7))
c, s = np.cos(ang), np.sin(ang)
R = np.array([[c,-s],[s,c]], np.float32)
scale = np.random.uniform(0.9, 1.1)
pts[:, :, :2] = (pts[:, :, :2] @ R.T) * scale
pts += np.random.normal(0, 0.01, size=pts.shape).astype(np.float32)
return pts.reshape(T, 63)
def __getitem__(self, i):
xi = self.X[i]
if self.augment:
xi = self._augment(xi)
return torch.from_numpy(xi).float(), int(self.y[i])
class SeqGRU(nn.Module):
def __init__(self, input_dim=63, hidden=128, num_classes=26):
super().__init__()
self.gru = nn.GRU(input_dim, hidden, batch_first=True, bidirectional=True)
self.head = nn.Sequential(
nn.Linear(hidden*2, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, num_classes),
)
def forward(self, x): # x: (B,T,63)
h,_ = self.gru(x) # (B,T,2H)
h_last = h[:, -1, :] # or mean over time: h.mean(1)
return self.head(h_last)
def main():
ap = argparse.ArgumentParser()
ap.add_argument("--landmarks", default="landmarks_seq32", help="Folder from prep_sequence_resampled.py")
ap.add_argument("--epochs", type=int, default=40)
ap.add_argument("--batch", type=int, default=64)
ap.add_argument("--lr", type=float, default=1e-3)
ap.add_argument("--out", default="asl_seq32_gru.pt")
args = ap.parse_args()
# Load dataset
trX = np.load(os.path.join(args.landmarks,"train_X.npy")) # (N, T, 63)
trY = np.load(os.path.join(args.landmarks,"train_y.npy"))
vaX = np.load(os.path.join(args.landmarks,"val_X.npy"))
vaY = np.load(os.path.join(args.landmarks,"val_y.npy"))
classes = json.load(open(os.path.join(args.landmarks,"class_names.json")))
meta = json.load(open(os.path.join(args.landmarks,"meta.json")))
T = int(meta["frames"])
print(f"Loaded: train {trX.shape} val {vaX.shape} classes={classes}")
# Global mean/std over train (time+batch)
X_mean = trX.reshape(-1, trX.shape[-1]).mean(axis=0, keepdims=True).astype(np.float32) # (1,63)
X_std = trX.reshape(-1, trX.shape[-1]).std(axis=0, keepdims=True).astype(np.float32) + 1e-6
trXn = (trX - X_mean) / X_std
vaXn = (vaX - X_mean) / X_std
tr_ds = SeqDataset(trXn, trY, augment=True)
va_ds = SeqDataset(vaXn, vaY, augment=False)
tr_dl = DataLoader(tr_ds, batch_size=args.batch, shuffle=True)
va_dl = DataLoader(va_ds, batch_size=args.batch, shuffle=False)
device = get_device()
model = SeqGRU(input_dim=63, hidden=128, num_classes=len(classes)).to(device)
crit = nn.CrossEntropyLoss()
opt = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=1e-4)
sch = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=args.epochs)
best_acc, best_state = 0.0, None
for epoch in range(1, args.epochs+1):
# Train
model.train()
tot, correct, loss_sum = 0, 0, 0.0
for xb, yb in tr_dl:
xb, yb = xb.to(device), yb.to(device)
opt.zero_grad(set_to_none=True)
logits = model(xb)
loss = crit(logits, yb)
loss.backward()
opt.step()
loss_sum += loss.item() * yb.size(0)
correct += (logits.argmax(1)==yb).sum().item()
tot += yb.size(0)
tr_loss = loss_sum / max(1, tot)
tr_acc = correct / max(1, tot)
# Validate
model.eval()
vtot, vcorrect = 0, 0
with torch.no_grad():
for xb, yb in va_dl:
xb, yb = xb.to(device), yb.to(device)
logits = model(xb)
vcorrect += (logits.argmax(1)==yb).sum().item()
vtot += yb.size(0)
va_acc = vcorrect / max(1, vtot)
sch.step()
print(f"Epoch {epoch:02d}: train_loss={tr_loss:.4f} train_acc={tr_acc:.3f} val_acc={va_acc:.3f}")
if va_acc > best_acc:
best_acc = va_acc
best_state = {
"model": model.state_dict(),
"classes": classes,
"frames": T,
"X_mean": torch.from_numpy(X_mean), # tensors → future-proof
"X_std": torch.from_numpy(X_std),
}
torch.save(best_state, args.out)
print(f" ✅ Saved best → {args.out} (val_acc={best_acc:.3f})")
print("Done. Best val_acc:", best_acc)
if __name__ == "__main__":
main()


@@ -0,0 +1,24 @@
# 1) Create dirs
# ./make_seq_dirs.sh A B J Z
# 2) Capture clips (0.8s each by default)
python capture_sequence.py --label A --split train
python capture_sequence.py --label A --split val
python capture_sequence.py --label B --split train
python capture_sequence.py --label B --split val
python capture_sequence.py --label J --split train
python capture_sequence.py --label J --split val
python capture_sequence.py --label Z --split train
python capture_sequence.py --label Z --split val
# 3) Preprocess to 32 frames (auto-picks classes from sequences/train/*)
python prep_sequence_resampled.py --in sequences --out landmarks_seq32 --frames 32
# 4) Train GRU (multiclass on A/B/J/Z)
python train_seq.py --landmarks landmarks_seq32 --epochs 40 --batch 64 --lr 1e-3 --out asl_seq32_gru_ABJZ.pt
# 5) Live inference
python infer_seq_webcam.py --model asl_seq32_gru_ABJZ.pt --threshold 0.6 --smooth 0.2
# If you later add more letters (e.g., C, D),
# just create those folders, record clips, re-run the prep step, then train again — the pipeline will include whatever letters exist under sequences/train/.