Initial commit: MediaPipe landmarks demo

HTML demos for face, hand, gesture, and posture tracking using MediaPipe.
Includes Python CLI tools for processing video files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 22:38:40 -05:00
commit 8bcc62b045
22 changed files with 2347 additions and 0 deletions

3
.gitignore vendored Normal file

@@ -0,0 +1,3 @@
.venv/
__pycache__/
*.pyc

179
Training_handshape_B.md Normal file

@@ -0,0 +1,179 @@
Let’s add a custom gesture for the **ASL letter “B”** (flat hand, fingers together, thumb folded across the palm) using MediaPipe **Gesture Recognizer (Model Maker)**.
# Plan (what you’ll build)
* A custom model with a new class label, e.g. `ASL_B`, plus the required `none` class.
* A small, labeled image dataset (Model Maker will extract hand landmarks for you).
* A trained `.task` file you can drop into your Python/JS app and allowlist.
---
# 1) Pick labels
Use:
* `ASL_B` ← your new gesture
* `none` ← anything that’s not one of your target gestures (mandatory)
Folder layout:
```
dataset/
ASL_B/
...images...
none/
...images...
```
---
# 2) Collect the right data (what to capture)
Target handshape for **B**:
* **Fingers**: index–pinky fully extended and **pressed together**
* **Thumb**: folded across palm (not sticking out to the side)
* **Palm**: facing camera (front) and also a few angles
Suggested minimums (per label):
| Bucket | Shots |
| --------------------------------------------------- | -------------------- |
| Distances: close (\~40–60 cm), medium (\~80–120 cm) | 80 |
| View angles: front, \~30°, \~60° yaw | 80 |
| Rotations: slight roll/tilt | 40 |
| Lighting: bright, dim, backlit | 40 |
| Backgrounds: plain wall, cluttered office/outdoor | 40 |
| Hands: left & right (both) | included across all |
| Skin tones / several people | as many as practical |
Do **at least \~300–500** `ASL_B` images to start.
For **`none`**, include: open palm (“High-Five”), slightly spread fingers, thumbs-up, fist, pointing, random objects/background frames, other ASL letters—especially **Open\_Palm** look-alikes so the model learns “not B”.
Quick ways to get images:
* Record short clips on laptop/phone and extract frames (e.g., 2 fps).
* Ask 3–5 colleagues to contribute a short 10–20 s clip each.
Frame extraction example:
```bash
# Extract 2 frames/sec from a video into dataset/ASL_B/
ffmpeg -i b_sign.mov -vf fps=2 dataset/ASL_B/b_%05d.jpg
# Do the same for negatives into dataset/none/
```
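Before training, it also helps to sanity-check how many images each class actually has on disk. A minimal sketch, assuming the `dataset/` layout above and common image extensions:
```python
from pathlib import Path

# Count images per label folder to spot an unbalanced dataset early.
for label_dir in sorted(Path("dataset").iterdir()):
    if label_dir.is_dir():
        n = sum(1 for p in label_dir.glob("*")
                if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
        print(f"{label_dir.name}: {n} images")
```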
---
# 3) Train with Model Maker (Python)
Create and activate a venv, then:
```bash
pip install --upgrade pip
pip install mediapipe-model-maker
```
Training script (save as `train_asl_b.py` and run it):
```python
from mediapipe_model_maker import gesture_recognizer as gr

DATA_DIR = "dataset"
EXPORT_DIR = "exported_model"

# Load & auto-preprocess (runs hand detection, keeps images with a detected hand)
data = gr.Dataset.from_folder(
    dirname=DATA_DIR,
    hparams=gr.HandDataPreprocessingParams(  # you can tweak these if needed
        min_detection_confidence=0.5
    )
)

# Split
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

# Hyperparameters (start small; bump epochs if needed)
hparams = gr.HParams(
    export_dir=EXPORT_DIR,
    epochs=12,
    batch_size=16,
    learning_rate=0.001,
)

# Optional model head size & dropout
options = gr.GestureRecognizerOptions(
    hparams=hparams,
    model_options=gr.ModelOptions(layer_widths=[128, 64], dropout_rate=0.1)
)

model = gr.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

# Evaluate
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss={loss:.4f}, acc={acc:.4f}")

# Export .task
model.export_model()  # writes exported_model/gesture_recognizer.task
print("Exported:", EXPORT_DIR + "/gesture_recognizer.task")
```
Tips:
* If many `ASL_B` images get dropped at load time (no hand detected), back up the camera a little or ensure the whole hand is visible; the pre-screening sketch below estimates the keep rate before you train.
* If `none` is weak, add more “near-miss” negatives: open palm with fingers slightly apart, thumb slightly out, partial occlusions.
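To estimate the keep rate ahead of time, you can pre-screen a folder with the Hand Landmarker bundled in this repo. This is only a rough proxy for Model Maker's own preprocessing; it assumes `hand_landmarker.task` is in the working directory and that a similar detection threshold gives a similar drop rate:
```python
from pathlib import Path
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
HandLandmarker = mp.tasks.vision.HandLandmarker
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions

options = HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path="hand_landmarker.task"),
    min_hand_detection_confidence=0.5,  # mirrors HandDataPreprocessingParams above
)

kept = dropped = 0
with HandLandmarker.create_from_options(options) as landmarker:
    for path in Path("dataset/ASL_B").glob("*.jpg"):
        result = landmarker.detect(mp.Image.create_from_file(str(path)))
        if result.hand_landmarks:
            kept += 1
        else:
            dropped += 1
print(f"kept={kept}, dropped={dropped}")
```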
---
# 4) Plug it into your app
**Python (Tasks API example):**
```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions

# LIVE_STREAM mode is asynchronous: results arrive through a callback.
def on_result(result, output_image, timestamp_ms):
    if result.gestures:
        top = result.gestures[0][0]
        print(f"{top.category_name} ({top.score:.2f}) @ {timestamp_ms} ms")

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=on_result,  # required when running_mode is LIVE_STREAM
    custom_gesture_classifier_options=ClassifierOptions(
        score_threshold=0.6,          # tighten until false positives drop
        category_allowlist=["ASL_B"]  # only report your class
    ),
)
recognizer = GestureRecognizer.create_from_options(options)
```
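In `LIVE_STREAM` mode the recognizer only reports results through `result_callback`; you push frames to it with `recognize_async`. A minimal capture-loop sketch that reuses `mp` and `recognizer` from the snippet above (the default webcam index and the Esc-to-quit handling are assumptions):
```python
import time
import cv2

cap = cv2.VideoCapture(0)  # assumed default webcam
start = time.monotonic()
while cap.isOpened():
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
    # Timestamps must increase monotonically and are in milliseconds.
    recognizer.recognize_async(mp_image, int((time.monotonic() - start) * 1000))
    cv2.imshow("camera", frame_bgr)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to stop
        break
cap.release()
cv2.destroyAllWindows()
```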
**Web (JS):**
```js
const recognizer = await GestureRecognizer.createFromOptions(fileset, {
  baseOptions: { modelAssetPath: "exported_model/gesture_recognizer.task" },
  runningMode: "VIDEO", // the web Tasks API runs in "IMAGE" or "VIDEO" mode
  customGesturesClassifierOptions: {
    scoreThreshold: 0.6,
    categoryAllowlist: ["ASL_B"]
  }
});
```
---
# 5) Troubleshooting & tuning
* **False positives with Open Palm:** Add more `none` examples where fingers are together but **thumb is visible** to the side. The model needs to see “almost B but not B.”
* **Left vs right hand:** Include both in training. If you only trained on right hands, left hands may underperform.
* **Distance issues:** If far-away hands fail, capture more medium/far shots. Landmarks get noisier when small.
* **Thresholds:** Raise `score_threshold` to reduce spurious detections; lower it if you miss true Bs (see the threshold-sweep sketch after this list).
* **Confusion matrix:** If accuracy is fine but live results wobble, collect more data from the exact camera/lighting you’ll use.
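A practical way to pick `score_threshold` is to score a small held-out folder per class with the exported model and see how misses and false positives trade off. A rough sketch under stated assumptions: the `holdout/ASL_B/` and `holdout/none/` folders are hypothetical, and the threshold is set to 0 so (nearly) every `ASL_B` score gets reported:
```python
from pathlib import Path
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    custom_gesture_classifier_options=ClassifierOptions(score_threshold=0.0),
)

def asl_b_scores(folder):
    """Top ASL_B score per image (0.0 if no hand or no ASL_B is reported)."""
    scores = []
    with GestureRecognizer.create_from_options(options) as rec:
        for path in sorted(Path(folder).glob("*.jpg")):
            result = rec.recognize(mp.Image.create_from_file(str(path)))
            best = 0.0
            for hand in result.gestures:
                for cat in hand:
                    if cat.category_name == "ASL_B":
                        best = max(best, cat.score)
            scores.append(best)
    return scores

pos = asl_b_scores("holdout/ASL_B")   # images that ARE the B handshape
neg = asl_b_scores("holdout/none")    # near-miss negatives
for t in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(f"threshold={t:.1f}  missed B: {sum(s < t for s in pos)}/{len(pos)}  "
          f"false positives: {sum(s >= t for s in neg)}/{len(neg)}")
```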
---

435
face.html Normal file

@@ -0,0 +1,435 @@
<!-- face.html • Single-file MediaPipe Face Landmarker demo -->
<!-- Copyright 2023 The MediaPipe Authors.
Licensed under the Apache License, Version 2.0 -->
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="Cache-control" content="no-cache, no-store, must-revalidate" />
<meta http-equiv="Pragma" content="no-cache" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<title>Face Landmarker</title>
<!-- Material Components (styles only for the raised button) -->
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet" />
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
<style>
/* Inlined CSS from your snippet (with minor cleanups) */
body {
font-family: helvetica, arial, sans-serif;
margin: 2em;
color: #3d3d3d;
--mdc-theme-primary: #007f8b;
--mdc-theme-on-primary: #f1f3f4;
}
h1 {
font-style: italic;
color: #007f8b;
}
h2 {
clear: both;
}
em { font-weight: bold; }
video {
clear: both;
display: block;
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
section {
opacity: 1;
transition: opacity 500ms ease-in-out;
}
.removed { display: none; }
.invisible { opacity: 0.2; }
.note {
font-style: italic;
font-size: 130%;
}
.videoView,
.detectOnClick,
.blend-shapes {
position: relative;
float: left;
width: 48%;
margin: 2% 1%;
cursor: pointer;
}
.videoView p,
.detectOnClick p {
position: absolute;
padding: 5px;
background-color: #007f8b;
color: #fff;
border: 1px dashed rgba(255, 255, 255, 0.7);
z-index: 2;
font-size: 12px;
margin: 0;
}
.highlighter {
background: rgba(0, 255, 0, 0.25);
border: 1px dashed #fff;
z-index: 1;
position: absolute;
}
.canvas {
z-index: 1;
position: absolute;
pointer-events: none;
}
.output_canvas {
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
.detectOnClick { z-index: 0; }
.detectOnClick img { width: 100%; }
.blend-shapes-item {
display: flex;
align-items: center;
height: 20px;
}
.blend-shapes-label {
display: flex;
width: 120px;
justify-content: flex-end;
align-items: center;
margin-right: 4px;
}
.blend-shapes-value {
display: flex;
height: 16px;
align-items: center;
background-color: #007f8b;
color: #fff;
padding: 0 6px;
border-radius: 2px;
white-space: nowrap;
overflow: hidden;
}
/* Ensure video/canvas overlap correctly inside the container */
#liveView > div {
position: relative;
display: inline-block;
}
#webcam {
position: absolute; left: 0; top: 0;
}
#output_canvas {
position: absolute; left: 0; top: 0;
}
</style>
</head>
<body>
<h1>Face landmark detection using the MediaPipe FaceLandmarker task</h1>
<section id="demos" class="invisible">
<h2>Demo: Webcam continuous face landmarks detection</h2>
<p>
Hold your face in front of your webcam to get real-time face landmarker detection.<br />
Click <b>enable webcam</b> below and grant access to the webcam if prompted.
</p>
<div id="liveView" class="videoView">
<button id="webcamButton" class="mdc-button mdc-button--raised">
<span class="mdc-button__ripple"></span>
<span class="mdc-button__label">ENABLE WEBCAM</span>
</button>
<div>
<video id="webcam" autoplay playsinline></video>
<canvas class="output_canvas" id="output_canvas"></canvas>
</div>
</div>
<div class="blend-shapes">
<ul class="blend-shapes-list" id="video-blend-shapes"></ul>
</div>
</section>
<script type="module">
// Inlined JS (converted to plain JS; removed TS types)
import vision from "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3";
const { FaceLandmarker, FilesetResolver, DrawingUtils } = vision;
const demosSection = document.getElementById("demos");
const imageBlendShapes = document.getElementById("image-blend-shapes");
const videoBlendShapes = document.getElementById("video-blend-shapes");
let faceLandmarker;
let runningMode = "IMAGE"; // "IMAGE" | "VIDEO"
let enableWebcamButton;
let webcamRunning = false;
const videoWidth = 480;
async function createFaceLandmarker() {
const filesetResolver = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
);
faceLandmarker = await FaceLandmarker.createFromOptions(filesetResolver, {
baseOptions: {
modelAssetPath:
"https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task",
delegate: "GPU",
},
outputFaceBlendshapes: true,
runningMode,
numFaces: 1,
});
demosSection.classList.remove("invisible");
}
createFaceLandmarker();
/********************************************************************
// Demo 1: Click image to detect landmarks
********************************************************************/
const imageContainers = document.getElementsByClassName("detectOnClick");
for (let imageContainer of imageContainers) {
imageContainer.children[0].addEventListener("click", handleClick);
}
async function handleClick(event) {
if (!faceLandmarker) {
console.log("Wait for faceLandmarker to load before clicking!");
return;
}
if (runningMode === "VIDEO") {
runningMode = "IMAGE";
await faceLandmarker.setOptions({ runningMode });
}
const parent = event.target.parentNode;
const allCanvas = parent.getElementsByClassName("canvas");
for (let i = allCanvas.length - 1; i >= 0; i--) {
const n = allCanvas[i];
n.parentNode.removeChild(n);
}
const faceLandmarkerResult = faceLandmarker.detect(event.target);
const canvas = document.createElement("canvas");
canvas.setAttribute("class", "canvas");
canvas.setAttribute("width", event.target.naturalWidth + "px");
canvas.setAttribute("height", event.target.naturalHeight + "px");
canvas.style.left = "0px";
canvas.style.top = "0px";
canvas.style.width = `${event.target.width}px`;
canvas.style.height = `${event.target.height}px`;
parent.appendChild(canvas);
const ctx = canvas.getContext("2d");
const drawingUtils = new DrawingUtils(ctx);
for (const landmarks of faceLandmarkerResult.faceLandmarks) {
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_TESSELATION,
{ color: "#C0C0C070", lineWidth: 1 }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYE,
{ color: "#FF3030" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYEBROW,
{ color: "#FF3030" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LEFT_EYE,
{ color: "#30FF30" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LEFT_EYEBROW,
{ color: "#30FF30" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_FACE_OVAL,
{ color: "#E0E0E0" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LIPS,
{ color: "#E0E0E0" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_RIGHT_IRIS,
{ color: "#FF3030" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LEFT_IRIS,
{ color: "#30FF30" }
);
}
drawBlendShapes(imageBlendShapes, faceLandmarkerResult.faceBlendshapes);
}
/********************************************************************
// Demo 2: Webcam stream detection
********************************************************************/
const video = document.getElementById("webcam");
const canvasElement = document.getElementById("output_canvas");
const canvasCtx = canvasElement.getContext("2d");
function hasGetUserMedia() {
return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}
if (hasGetUserMedia()) {
enableWebcamButton = document.getElementById("webcamButton");
enableWebcamButton.addEventListener("click", enableCam);
} else {
console.warn("getUserMedia() is not supported by your browser");
}
function enableCam() {
if (!faceLandmarker) {
console.log("Wait! faceLandmarker not loaded yet.");
return;
}
webcamRunning = !webcamRunning;
enableWebcamButton.innerText = webcamRunning
? "DISABLE PREDICTIONS"
: "ENABLE PREDICTIONS";
const constraints = { video: true };
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
video.srcObject = stream;
video.addEventListener("loadeddata", predictWebcam);
});
}
let lastVideoTime = -1;
let results;
const drawingUtils = new DrawingUtils(canvasCtx);
async function predictWebcam() {
const ratio = video.videoHeight / video.videoWidth;
video.style.width = videoWidth + "px";
video.style.height = videoWidth * ratio + "px";
canvasElement.style.width = videoWidth + "px";
canvasElement.style.height = videoWidth * ratio + "px";
canvasElement.width = video.videoWidth;
canvasElement.height = video.videoHeight;
if (runningMode === "IMAGE") {
runningMode = "VIDEO";
await faceLandmarker.setOptions({ runningMode });
}
const startTimeMs = performance.now();
if (lastVideoTime !== video.currentTime) {
lastVideoTime = video.currentTime;
results = faceLandmarker.detectForVideo(video, startTimeMs);
}
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
if (results && results.faceLandmarks) {
for (const landmarks of results.faceLandmarks) {
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_TESSELATION,
{ color: "#C0C0C070", lineWidth: 1 }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYE,
{ color: "#FF3030" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYEBROW,
{ color: "#FF3030" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LEFT_EYE,
{ color: "#30FF30" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LEFT_EYEBROW,
{ color: "#30FF30" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_FACE_OVAL,
{ color: "#E0E0E0" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LIPS,
{ color: "#E0E0E0" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_RIGHT_IRIS,
{ color: "#FF3030" }
);
drawingUtils.drawConnectors(
landmarks,
FaceLandmarker.FACE_LANDMARKS_LEFT_IRIS,
{ color: "#30FF30" }
);
}
}
drawBlendShapes(videoBlendShapes, (results && results.faceBlendshapes) || []);
if (webcamRunning === true) {
window.requestAnimationFrame(predictWebcam);
}
}
function drawBlendShapes(el, blendShapes) {
if (!blendShapes || !blendShapes.length) {
el.innerHTML = "";
return;
}
let htmlMaker = "";
blendShapes[0].categories.forEach((shape) => {
const label = shape.displayName || shape.categoryName;
const pct = Math.max(0, Math.min(1, Number(shape.score) || 0));
htmlMaker += `
<li class="blend-shapes-item">
<span class="blend-shapes-label">${label}</span>
<span class="blend-shapes-value" style="width: calc(${pct * 100}% - 120px)">${pct.toFixed(4)}</span>
</li>
`;
});
el.innerHTML = htmlMaker;
}
</script>
</body>
</html>

BIN
face_landmarker.task Normal file

Binary file not shown.

1
fingers_positions.sh Executable file

@@ -0,0 +1 @@
python hand_landmarker_cli.py --image hand.png --model hand_landmarker.task --out annotated.png

290
gesture.html Normal file

@@ -0,0 +1,290 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>MediaPipe Hand Gesture Recognizer — Single File Demo</title>
<!-- Material Components (for button styling) -->
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet" />
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
<style>
/* Inlined from the CodePen CSS (Sass directives removed) */
body {
font-family: Roboto, system-ui, -apple-system, Segoe UI, Helvetica, Arial, sans-serif;
margin: 2em;
color: #3d3d3d;
--mdc-theme-primary: #007f8b;
--mdc-theme-on-primary: #f1f3f4;
}
h1 { color: #007f8b; }
h2 { clear: both; }
video {
clear: both;
display: block;
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
height: 280px;
}
section { opacity: 1; transition: opacity 500ms ease-in-out; }
.removed { display: none; }
.invisible { opacity: 0.2; }
.detectOnClick {
position: relative;
float: left;
width: 48%;
margin: 2% 1%;
cursor: pointer;
z-index: 0;
font-size: calc(8px + 1.2vw);
}
.videoView {
position: absolute;
float: left;
width: 48%;
margin: 2% 1%;
cursor: pointer;
min-height: 500px;
}
.videoView p,
.detectOnClick p {
padding-top: 5px;
padding-bottom: 5px;
background-color: #007f8b;
color: #fff;
border: 1px dashed rgba(255, 255, 255, 0.7);
z-index: 2;
margin: 0;
}
.highlighter { background: rgba(0, 255, 0, 0.25); border: 1px dashed #fff; z-index: 1; position: absolute; }
.canvas { z-index: 1; position: absolute; pointer-events: none; }
.output_canvas {
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
.detectOnClick img { width: 45vw; }
.output { display: none; width: 100%; font-size: calc(8px + 1.2vw); }
</style>
</head>
<body>
<section id="demos" class="invisible">
<h2><br>Demo: Webcam continuous hand gesture detection</h2>
<p>Use your hand to make gestures in front of the camera to get gesture classification. <br />Click <b>enable webcam</b> below and grant access to the webcam if prompted.</p>
<PRE>
Gesture Label Description
Closed_Fist Hand fully closed into a fist
Open_Palm Flat open hand
Pointing_Up Index finger extended upward, others closed
Thumb_Down Thumb extended downward
Thumb_Up Thumb extended upward
Victory Index and middle finger extended in a “V”
ILoveYou Thumb, index, and pinky extended (ASL “I love you”)
None No recognized gesture / below confidence threshold
</PRE>
<div id="liveView" class="videoView">
<button id="webcamButton" class="mdc-button mdc-button--raised">
<span class="mdc-button__ripple"></span>
<span class="mdc-button__label">ENABLE WEBCAM</span>
</button>
<div style="position: relative;">
<video id="webcam" autoplay playsinline></video>
<canvas class="output_canvas" id="output_canvas" width="1280" height="720" style="position: absolute; left: 0; top: 0;"></canvas>
<p id="gesture_output" class="output"></p>
</div>
</div>
</section>
<script type="module">
import { GestureRecognizer, FilesetResolver, DrawingUtils } from "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3";
const demosSection = document.getElementById("demos");
/** @type {GestureRecognizer} */
let gestureRecognizer;
let runningMode = "IMAGE";
/** @type {HTMLButtonElement} */
let enableWebcamButton;
let webcamRunning = false;
const videoHeight = "360px";
const videoWidth = "480px";
// Load the WASM and model, then reveal the demos section
const createGestureRecognizer = async () => {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
);
gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
baseOptions: {
modelAssetPath: "https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/1/gesture_recognizer.task",
delegate: "GPU"
},
runningMode
});
demosSection.classList.remove("invisible");
};
createGestureRecognizer();
/********************************************************************
// Demo 1: Detect hand gestures in images
********************************************************************/
const imageContainers = document.getElementsByClassName("detectOnClick");
for (let i = 0; i < imageContainers.length; i++) {
const img = imageContainers[i].children[0];
img.addEventListener("click", handleClick);
}
async function handleClick(event) {
if (!gestureRecognizer) {
alert("Please wait for gestureRecognizer to load");
return;
}
if (runningMode === "VIDEO") {
runningMode = "IMAGE";
await gestureRecognizer.setOptions({ runningMode: "IMAGE" });
}
const parent = event.target.parentNode;
// Remove previous overlays
const allCanvas = parent.getElementsByClassName("canvas");
for (let i = allCanvas.length - 1; i >= 0; i--) {
const n = allCanvas[i];
n.parentNode.removeChild(n);
}
const results = gestureRecognizer.recognize(event.target);
console.log(results);
if (results.gestures && results.gestures.length > 0) {
const p = parent.querySelector(".classification");
p.classList.remove("removed");
const categoryName = results.gestures[0][0].categoryName;
const categoryScore = (results.gestures[0][0].score * 100).toFixed(2);
const handedness = results.handednesses[0][0].displayName;
p.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${categoryScore}%\n Handedness: ${handedness}`;
p.style.left = "0px";
p.style.top = event.target.height + "px";
p.style.width = event.target.width - 10 + "px";
const canvas = document.createElement("canvas");
canvas.setAttribute("class", "canvas");
canvas.setAttribute("width", event.target.naturalWidth + "px");
canvas.setAttribute("height", event.target.naturalHeight + "px");
canvas.style.left = "0px";
canvas.style.top = "0px";
canvas.style.width = event.target.width + "px";
canvas.style.height = event.target.height + "px";
parent.appendChild(canvas);
const canvasCtx = canvas.getContext("2d");
const drawingUtils = new DrawingUtils(canvasCtx);
if (results.landmarks) {
for (const landmarks of results.landmarks) {
drawingUtils.drawConnectors(landmarks, GestureRecognizer.HAND_CONNECTIONS, { lineWidth: 5 });
drawingUtils.drawLandmarks(landmarks, { lineWidth: 1 });
}
}
}
}
/********************************************************************
// Demo 2: Continuously grab image from webcam stream and detect it.
********************************************************************/
const video = document.getElementById("webcam");
const canvasElement = document.getElementById("output_canvas");
const canvasCtx = canvasElement.getContext("2d");
const gestureOutput = document.getElementById("gesture_output");
function hasGetUserMedia() {
return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}
if (hasGetUserMedia()) {
enableWebcamButton = document.getElementById("webcamButton");
enableWebcamButton.addEventListener("click", enableCam);
} else {
console.warn("getUserMedia() is not supported by your browser");
}
function enableCam() {
if (!gestureRecognizer) {
alert("Please wait for gestureRecognizer to load");
return;
}
webcamRunning = !webcamRunning;
enableWebcamButton.innerText = webcamRunning ? "DISABLE PREDICTIONS" : "ENABLE PREDICTIONS";
const constraints = { video: true };
navigator.mediaDevices.getUserMedia(constraints).then(function (stream) {
video.srcObject = stream;
video.addEventListener("loadeddata", predictWebcam);
});
}
let lastVideoTime = -1;
let results;
async function predictWebcam() {
const webcamElement = document.getElementById("webcam");
if (runningMode === "IMAGE") {
runningMode = "VIDEO";
await gestureRecognizer.setOptions({ runningMode: "VIDEO" });
}
const nowInMs = Date.now();
if (video.currentTime !== lastVideoTime) {
lastVideoTime = video.currentTime;
results = gestureRecognizer.recognizeForVideo(video, nowInMs);
}
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
const drawingUtils = new DrawingUtils(canvasCtx);
canvasElement.style.height = videoHeight;
webcamElement.style.height = videoHeight;
canvasElement.style.width = videoWidth;
webcamElement.style.width = videoWidth;
if (results && results.landmarks) {
for (const landmarks of results.landmarks) {
drawingUtils.drawConnectors(landmarks, GestureRecognizer.HAND_CONNECTIONS, { lineWidth: 5 });
drawingUtils.drawLandmarks(landmarks, { lineWidth: 2 });
}
}
canvasCtx.restore();
if (results && results.gestures && results.gestures.length > 0) {
gestureOutput.style.display = "block";
gestureOutput.style.width = videoWidth;
const categoryName = results.gestures[0][0].categoryName;
const categoryScore = (results.gestures[0][0].score * 100).toFixed(2);
const handedness = results.handednesses[0][0].displayName;
gestureOutput.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${categoryScore} %\n Handedness: ${handedness}`;
} else {
gestureOutput.style.display = "none";
}
if (webcamRunning === true) {
window.requestAnimationFrame(predictWebcam);
}
}
</script>
</body>
</html>

5
gesture.sh Executable file

@@ -0,0 +1,5 @@
export GLOG_minloglevel=2
export TF_CPP_MIN_LOG_LEVEL=3
python recognize_gesture.py --image ily.png --model gesture_recognizer.task 2>/dev/null

BIN
gesture_recognizer.task Normal file

Binary file not shown.

BIN
hand.png Normal file

Binary file not shown.


BIN
hand_landmarker.task Normal file

Binary file not shown.

125
hand_landmarker_cli.py Executable file

@@ -0,0 +1,125 @@
#!/usr/bin/env python3
"""
Hand Landmarks on a static image using MediaPipe Tasks.
Usage:
python hand_landmarker_cli.py --image hand.png --model hand_landmarker.task --max_hands 2 --out annotated.png
What it does:
• Loads the MediaPipe Hand Landmarker model (.task file)
• Runs landmark detection on a single image
• Prints handedness and 21 landmark coords for each detected hand
• Saves an annotated image with landmarks and connections
"""
import argparse
import sys
from pathlib import Path
import cv2
import numpy as np
import mediapipe as mp
# MediaPipe Tasks API aliases
BaseOptions = mp.tasks.BaseOptions
HandLandmarker = mp.tasks.vision.HandLandmarker
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
# Landmark connection topology (same as mp.solutions.hands.HAND_CONNECTIONS, copied to avoid extra dependency)
HAND_CONNECTIONS = [
(0,1),(1,2),(2,3),(3,4), # Thumb
(0,5),(5,6),(6,7),(7,8), # Index
(5,9),(9,10),(10,11),(11,12), # Middle
(9,13),(13,14),(14,15),(15,16), # Ring
(13,17),(17,18),(18,19),(19,20), # Pinky
(0,17) # Palm base to pinky base
]
def draw_landmarks(image_bgr: np.ndarray, landmarks_norm: list):
    """
    Draws landmarks and connections on a BGR image.
    `landmarks_norm` is a list of normalized (x,y,z) MediaPipe landmarks (0..1).
    """
    h, w = image_bgr.shape[:2]
    # Convert normalized to pixel coords
    pts = []
    for lm in landmarks_norm:
        x = int(lm.x * w)
        y = int(lm.y * h)
        pts.append((x, y))
    # Draw connections
    for a, b in HAND_CONNECTIONS:
        if 0 <= a < len(pts) and 0 <= b < len(pts):
            cv2.line(image_bgr, pts[a], pts[b], (0, 255, 0), 2, cv2.LINE_AA)
    # Draw keypoints
    for i, (x, y) in enumerate(pts):
        cv2.circle(image_bgr, (x, y), 3, (255, 255, 255), -1, cv2.LINE_AA)
        cv2.circle(image_bgr, (x, y), 2, (0, 0, 255), -1, cv2.LINE_AA)

def main():
    ap = argparse.ArgumentParser(description="MediaPipe Hand Landmarker (static image)")
    ap.add_argument("--image", required=True, help="Path to an input image (e.g., hand.jpg)")
    ap.add_argument("--model", default="hand_landmarker.task", help="Path to MediaPipe .task model")
    ap.add_argument("--max_hands", type=int, default=2, help="Maximum hands to detect")
    ap.add_argument("--out", default="annotated.png", help="Output path for annotated image")
    args = ap.parse_args()

    img_path = Path(args.image)
    if not img_path.exists():
        print(f"[ERROR] Image not found: {img_path}", file=sys.stderr)
        sys.exit(1)

    model_path = Path(args.model)
    if not model_path.exists():
        print(f"[ERROR] Model not found: {model_path}", file=sys.stderr)
        print("Download the model bundle (.task) and point --model to it.", file=sys.stderr)
        sys.exit(2)

    # Load image for MP and for drawing
    mp_image = mp.Image.create_from_file(str(img_path))
    image_bgr = cv2.imread(str(img_path))
    if image_bgr is None:
        print(f"[ERROR] Could not read image with OpenCV: {img_path}", file=sys.stderr)
        sys.exit(3)

    # Configure and run the landmarker
    options = HandLandmarkerOptions(
        base_options=BaseOptions(model_asset_path=str(model_path)),
        running_mode=VisionRunningMode.IMAGE,
        num_hands=args.max_hands,
        min_hand_detection_confidence=0.5,
        min_hand_presence_confidence=0.5,
        min_tracking_confidence=0.5
    )
    with HandLandmarker.create_from_options(options) as landmarker:
        result = landmarker.detect(mp_image)

    # Print results
    if not result.hand_landmarks:
        print("No hands detected.")
    else:
        for i, (handedness, lms, world_lms) in enumerate(
            zip(result.handedness, result.hand_landmarks, result.hand_world_landmarks)
        ):
            label = handedness[0].category_name if handedness else "Unknown"
            score = handedness[0].score if handedness else 0.0
            print(f"\nHand #{i+1}: {label} (score {score:.3f})")
            for idx, lm in enumerate(lms):
                print(f" L{idx:02d}: x={lm.x:.3f} y={lm.y:.3f} z={lm.z:.3f}")
            # Draw
            draw_landmarks(image_bgr, lms)
            # Put label
            cv2.putText(image_bgr, f"{label}", (10, 30 + i*30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0), 2, cv2.LINE_AA)

    # Save annotated image
    cv2.imwrite(str(args.out), image_bgr)
    print(f"\nSaved annotated image to: {args.out}")

if __name__ == "__main__":
    main()

262
holistic.html Normal file

@@ -0,0 +1,262 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>MediaPipe Holistic — Main Output Only</title>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="https://fonts.googleapis.com/css2?family=Titillium+Web:wght@400;600&display=swap" rel="stylesheet">
<style>
@keyframes spin { 0% {transform: rotate(0)} 100% {transform: rotate(360deg)} }
.abs { position: absolute; }
a { color: white; text-decoration: none; } a:hover { color: lightblue; }
body {
margin: 0; color: white; font-family: 'Titillium Web', sans-serif;
position: absolute; inset: 0; overflow: hidden; background: #000;
}
.container {
position: absolute; inset: 0; background-color: #596e73; height: 100%;
}
.canvas-container {
display: flex; height: 100%; width: 100%;
justify-content: center; align-items: center;
}
.output_canvas { max-width: 100%; display: block; position: relative; }
/* Hide ALL video elements so only the processed canvas is visible */
video { display: none !important; }
.control-panel { position: absolute; left: 10px; top: 10px; z-index: 6; }
.loading {
display: flex; position: absolute; inset: 0; align-items: center; justify-content: center;
backface-visibility: hidden; opacity: 1; transition: opacity 1s; z-index: 10;
}
.loading .spinner {
position: absolute; width: 120px; height: 120px; animation: spin 1s linear infinite;
border: 32px solid #bebebe; border-top: 32px solid #3498db; border-radius: 50%;
}
.loading .message { font-size: x-large; }
.loaded .loading { opacity: 0; }
.logo { bottom: 10px; right: 20px; }
.logo .title { color: white; font-size: 28px; }
.shoutout { left: 0; right: 0; bottom: 40px; text-align: center; font-size: 24px; position: absolute; z-index: 4; }
</style>
</head>
<body>
<div class="container">
<!-- Hidden capture element kept for MediaPipe pipeline -->
<video class="input_video" playsinline></video>
<div class="canvas-container">
<canvas class="output_canvas" width="1280" height="720"></canvas>
</div>
<!-- Loading spinner -->
<div class="loading">
<div class="spinner"></div>
<div class="message">Loading</div>
</div>
<!-- Logo/link -->
<a class="abs logo" href="https://mediapipe.dev" target="_blank" rel="noreferrer">
<div style="display:flex;align-items:center;bottom:0;right:10px;">
<img class="logo" alt="" style="height:50px"
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGMAAQAABQABJtqz7QAAAABJRU5ErkJggg==" />
<span class="title" style="margin-left:8px">MediaPipe</span>
</div>
</a>
<!-- Info link -->
<div class="shoutout">
<div><a href="https://solutions.mediapipe.dev/holistic" target="_blank" rel="noreferrer">Click here for more info</a></div>
</div>
</div>
<!-- Control panel container -->
<div class="control-panel"></div>
<!-- MediaPipe libs (globals: mpHolistic, drawingUtils, controlsNS, etc.) -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/holistic/holistic.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js"></script>
<!-- Device detector is ESM; we import it and run the app -->
<script type="module">
import DeviceDetector from "https://cdn.skypack.dev/device-detector-js@2.2.10";
function testSupport(supportedDevices) {
const dd = new DeviceDetector();
const d = dd.parse(navigator.userAgent);
let ok = false;
for (const dev of supportedDevices) {
if (dev.client && !(new RegExp(`^${dev.client}$`)).test(d.client.name)) continue;
if (dev.os && !(new RegExp(`^${dev.os}$`)).test(d.os.name)) continue;
ok = true; break;
}
if (!ok) alert(`This demo, running on ${d.client.name}/${d.os.name}, is not well supported at this time, continue at your own risk.`);
}
testSupport([{ client: 'Chrome' }]);
const controlsNS = window;
const mpHolistic = window;
const drawingUtils = window;
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const controlsElement = document.getElementsByClassName('control-panel')[0];
const canvasCtx = canvasElement.getContext('2d');
const fpsControl = new controlsNS.FPS();
const spinner = document.querySelector('.loading');
spinner.ontransitionend = () => { spinner.style.display = 'none'; };
function removeElements(landmarks, elements) {
if (!landmarks) return;
for (const e of elements) delete landmarks[e];
}
function removeLandmarks(results) {
if (results.poseLandmarks) {
removeElements(results.poseLandmarks, [0,1,2,3,4,5,6,7,8,9,10,15,16,17,18,19,20,21,22]);
}
}
function connect(ctx, connectors) {
const c = ctx.canvas;
for (const [from, to] of connectors) {
if (!from || !to) continue;
if (from.visibility && to.visibility && (from.visibility < 0.1 || to.visibility < 0.1)) continue;
ctx.beginPath();
ctx.moveTo(from.x * c.width, from.y * c.height);
ctx.lineTo(to.x * c.width, to.y * c.height);
ctx.stroke();
}
}
let activeEffect = 'mask';
function onResults(results) {
document.body.classList.add('loaded');
removeLandmarks(results);
fpsControl.tick();
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
if (results.segmentationMask) {
canvasCtx.drawImage(results.segmentationMask, 0, 0, canvasElement.width, canvasElement.height);
if (activeEffect === 'mask' || activeEffect === 'both') {
canvasCtx.globalCompositeOperation = 'source-in';
canvasCtx.fillStyle = '#00FF007F';
canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);
} else {
canvasCtx.globalCompositeOperation = 'source-out';
canvasCtx.fillStyle = '#0000FF7F';
canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);
}
canvasCtx.globalCompositeOperation = 'destination-atop';
canvasCtx.drawImage(results.image, 0, 0, canvasElement.width, canvasElement.height);
canvasCtx.globalCompositeOperation = 'source-over';
} else {
canvasCtx.drawImage(results.image, 0, 0, canvasElement.width, canvasElement.height);
}
canvasCtx.lineWidth = 5;
if (results.poseLandmarks) {
if (results.rightHandLandmarks) {
canvasCtx.strokeStyle = 'white';
connect(canvasCtx, [[
results.poseLandmarks[mpHolistic.POSE_LANDMARKS.RIGHT_ELBOW],
results.rightHandLandmarks[0]
]]);
}
if (results.leftHandLandmarks) {
canvasCtx.strokeStyle = 'white';
connect(canvasCtx, [[
results.poseLandmarks[mpHolistic.POSE_LANDMARKS.LEFT_ELBOW],
results.leftHandLandmarks[0]
]]);
}
}
drawingUtils.drawConnectors(canvasCtx, results.poseLandmarks, mpHolistic.POSE_CONNECTIONS, { color: 'white' });
drawingUtils.drawLandmarks(
canvasCtx,
Object.values(mpHolistic.POSE_LANDMARKS_LEFT).map(i => results.poseLandmarks?.[i]),
{ visibilityMin: 0.65, color: 'white', fillColor: 'rgb(255,138,0)' }
);
drawingUtils.drawLandmarks(
canvasCtx,
Object.values(mpHolistic.POSE_LANDMARKS_RIGHT).map(i => results.poseLandmarks?.[i]),
{ visibilityMin: 0.65, color: 'white', fillColor: 'rgb(0,217,231)' }
);
drawingUtils.drawConnectors(canvasCtx, results.rightHandLandmarks, mpHolistic.HAND_CONNECTIONS, { color: 'white' });
drawingUtils.drawLandmarks(canvasCtx, results.rightHandLandmarks, {
color: 'white', fillColor: 'rgb(0,217,231)', lineWidth: 2,
radius: (data) => drawingUtils.lerp(data.from?.z ?? 0, -0.15, 0.1, 10, 1)
});
drawingUtils.drawConnectors(canvasCtx, results.leftHandLandmarks, mpHolistic.HAND_CONNECTIONS, { color: 'white' });
drawingUtils.drawLandmarks(canvasCtx, results.leftHandLandmarks, {
color: 'white', fillColor: 'rgb(255,138,0)', lineWidth: 2,
radius: (data) => drawingUtils.lerp(data.from?.z ?? 0, -0.15, 0.1, 10, 1)
});
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_TESSELATION, { color: '#C0C0C070', lineWidth: 1 });
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_RIGHT_EYE, { color: 'rgb(0,217,231)' });
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_RIGHT_EYEBROW, { color: 'rgb(0,217,231)' });
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_LEFT_EYE, { color: 'rgb(255,138,0)' });
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_LEFT_EYEBROW, { color: 'rgb(255,138,0)' });
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_FACE_OVAL, { color: '#E0E0E0', lineWidth: 5 });
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_LIPS, { color: '#E0E0E0', lineWidth: 5 });
canvasCtx.restore();
}
const holistic = new mpHolistic.Holistic({
locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/holistic@${mpHolistic.VERSION}/${file}`
});
holistic.onResults(onResults);
new controlsNS.ControlPanel(controlsElement, {
selfieMode: true,
modelComplexity: 1,
smoothLandmarks: true,
enableSegmentation: false,
smoothSegmentation: true,
minDetectionConfidence: 0.5,
minTrackingConfidence: 0.5,
effect: 'background',
})
.add([
new controlsNS.StaticText({ title: 'MediaPipe Holistic' }),
fpsControl,
new controlsNS.Toggle({ title: 'Selfie Mode', field: 'selfieMode' }),
new controlsNS.SourcePicker({
onSourceChanged: () => { holistic.reset(); },
onFrame: async (input, size) => {
const aspect = size.height / size.width;
let width, height;
if (window.innerWidth > window.innerHeight) {
height = window.innerHeight; width = height / aspect;
} else {
width = window.innerWidth; height = width * aspect;
}
canvasElement.width = width;
canvasElement.height = height;
await holistic.send({ image: input });
},
}),
new controlsNS.Slider({ title: 'Model Complexity', field: 'modelComplexity', discrete: ['Lite', 'Full', 'Heavy'] }),
new controlsNS.Toggle({ title: 'Smooth Landmarks', field: 'smoothLandmarks' }),
new controlsNS.Toggle({ title: 'Enable Segmentation', field: 'enableSegmentation' }),
new controlsNS.Toggle({ title: 'Smooth Segmentation', field: 'smoothSegmentation' }),
new controlsNS.Slider({ title: 'Min Detection Confidence', field: 'minDetectionConfidence', range: [0, 1], step: 0.01 }),
new controlsNS.Slider({ title: 'Min Tracking Confidence', field: 'minTrackingConfidence', range: [0, 1], step: 0.01 }),
new controlsNS.Slider({ title: 'Effect', field: 'effect', discrete: { background: 'Background', mask: 'Foreground' } }),
])
.on(x => {
const options = x;
videoElement.classList.toggle('selfie', !!options.selfieMode);
activeEffect = x['effect'];
holistic.setOptions(options);
});
</script>
</body>
</html>

BIN
ily.png Normal file

Binary file not shown.


BIN
landmarks.png Normal file

Binary file not shown.


268
marker.html Normal file

@@ -0,0 +1,268 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>MediaPipe Hand Landmarker — Single File Demo</title>
<!-- Material Components (for the button styling) -->
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet">
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
<!-- Drawing utils (provides drawConnectors, drawLandmarks) -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
<!-- Hands (provides HAND_CONNECTIONS constant) -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js" crossorigin="anonymous"></script>
<style>
/* Inline CSS from the CodePen, cleaned for single-file use */
body {
font-family: Roboto, Arial, sans-serif;
margin: 2em;
color: #3d3d3d;
--mdc-theme-primary: #007f8b;
--mdc-theme-on-primary: #f1f3f4;
}
h1 { color: #007f8b; }
h2 { clear: both; }
em { font-weight: bold; }
video {
clear: both;
display: block;
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
section {
opacity: 1;
transition: opacity 500ms ease-in-out;
}
.removed { display: none; }
.invisible { opacity: 0.2; }
.note {
font-style: italic;
font-size: 130%;
}
.videoView, .detectOnClick {
position: relative;
float: left;
width: 48%;
margin: 2% 1%;
cursor: pointer;
}
.videoView p, .detectOnClick p {
position: absolute;
padding: 5px;
background-color: #007f8b;
color: #fff;
border: 1px dashed rgba(255, 255, 255, 0.7);
z-index: 2;
font-size: 12px;
margin: 0;
}
.highlighter {
background: rgba(0, 255, 0, 0.25);
border: 1px dashed #fff;
z-index: 1;
position: absolute;
}
.canvas, .output_canvas {
z-index: 1;
position: absolute;
pointer-events: none;
}
.output_canvas {
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
.detectOnClick { z-index: 0; }
.detectOnClick img { width: 100%; }
</style>
</head>
<body>
<section id="demos" class="invisible">
<h2>Demo: Webcam continuous hands landmarks detection</h2>
<p>Hold your hand in front of your webcam to get real-time hand landmarker detection.<br>Click <b>ENABLE WEBCAM</b> below and grant access to the webcam if prompted.</p>
<div id="liveView" class="videoView">
<button id="webcamButton" class="mdc-button mdc-button--raised">
<span class="mdc-button__ripple"></span>
<span class="mdc-button__label">ENABLE WEBCAM</span>
</button>
<div style="position: relative;">
<video id="webcam" style="position: absolute; left: 0; top: 0;" autoplay playsinline></video>
<canvas class="output_canvas" id="output_canvas" style="left: 0; top: 0;"></canvas>
</div>
</div>
</section>
<script type="module">
// Import the Tasks Vision ESM build
import { HandLandmarker, FilesetResolver } from "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0";
const demosSection = document.getElementById("demos");
let handLandmarker;
let runningMode = "IMAGE";
let enableWebcamButton;
let webcamRunning = false;
// Load the model and enable the demos section
const createHandLandmarker = async () => {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0/wasm"
);
handLandmarker = await HandLandmarker.createFromOptions(vision, {
baseOptions: {
modelAssetPath: "https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task",
delegate: "GPU"
},
runningMode,
numHands: 2
});
demosSection.classList.remove("invisible");
};
createHandLandmarker();
/********************************************************************
// Demo 1: Click images to run landmark detection
********************************************************************/
const imageContainers = document.getElementsByClassName("detectOnClick");
for (let i = 0; i < imageContainers.length; i++) {
const img = imageContainers[i].children[0];
img.addEventListener("click", handleClick);
}
async function handleClick(event) {
if (!handLandmarker) {
console.log("Wait for handLandmarker to load before clicking!");
return;
}
if (runningMode === "VIDEO") {
runningMode = "IMAGE";
await handLandmarker.setOptions({ runningMode: "IMAGE" });
}
const container = event.target.parentNode;
// Remove old overlays
const old = container.getElementsByClassName("canvas");
for (let i = old.length - 1; i >= 0; i--) {
old[i].parentNode.removeChild(old[i]);
}
// Run detection
const result = handLandmarker.detect(event.target);
// Create overlay canvas aligned to the image element
const canvas = document.createElement("canvas");
canvas.className = "canvas";
canvas.width = event.target.naturalWidth;
canvas.height = event.target.naturalHeight;
canvas.style.left = "0px";
canvas.style.top = "0px";
canvas.style.width = event.target.width + "px";
canvas.style.height = event.target.height + "px";
container.appendChild(canvas);
const ctx = canvas.getContext("2d");
if (result && result.landmarks) {
for (const landmarks of result.landmarks) {
// drawConnectors and drawLandmarks are provided by drawing_utils.js
// HAND_CONNECTIONS is provided by hands.js
drawConnectors(ctx, landmarks, HAND_CONNECTIONS, {
color: "#00FF00",
lineWidth: 5
});
drawLandmarks(ctx, landmarks, { color: "#FF0000", lineWidth: 1 });
}
}
}
/********************************************************************
// Demo 2: Webcam stream detection
********************************************************************/
const video = document.getElementById("webcam");
const canvasElement = document.getElementById("output_canvas");
const canvasCtx = canvasElement.getContext("2d");
const hasGetUserMedia = () => !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
if (hasGetUserMedia()) {
enableWebcamButton = document.getElementById("webcamButton");
enableWebcamButton.addEventListener("click", enableCam);
} else {
console.warn("getUserMedia() is not supported by your browser");
}
function enableCam() {
if (!handLandmarker) {
console.log("Wait! HandLandmarker not loaded yet.");
return;
}
webcamRunning = !webcamRunning;
enableWebcamButton.innerText = webcamRunning ? "DISABLE PREDICTIONS" : "ENABLE PREDICTIONS";
if (!webcamRunning) return;
const constraints = { video: true };
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
video.srcObject = stream;
video.addEventListener("loadeddata", predictWebcam);
});
}
let lastVideoTime = -1;
let results;
async function predictWebcam() {
// Match canvas to the video size
canvasElement.style.width = video.videoWidth + "px";
canvasElement.style.height = video.videoHeight + "px";
canvasElement.width = video.videoWidth;
canvasElement.height = video.videoHeight;
// Switch to VIDEO mode for streaming
if (runningMode === "IMAGE") {
runningMode = "VIDEO";
await handLandmarker.setOptions({ runningMode: "VIDEO" });
}
const startTimeMs = performance.now();
if (lastVideoTime !== video.currentTime) {
lastVideoTime = video.currentTime;
results = handLandmarker.detectForVideo(video, startTimeMs);
}
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
if (results && results.landmarks) {
for (const landmarks of results.landmarks) {
drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS, {
color: "#00FF00",
lineWidth: 5
});
drawLandmarks(canvasCtx, landmarks, { color: "#FF0000", lineWidth: 2 });
}
}
canvasCtx.restore();
if (webcamRunning) {
window.requestAnimationFrame(predictWebcam);
}
}
</script>
</body>
</html>

2
more_info.txt Normal file

@@ -0,0 +1,2 @@
https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker
https://ai.google.dev/edge/mediapipe/solutions/customization/gesture_recognizer

298
posture.html Normal file

@@ -0,0 +1,298 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="Cache-control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Pragma" content="no-cache">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<title>Pose Landmarker — Single File Demo</title>
<!-- Material Components (for the button styling) -->
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet">
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
<style>
/* Copyright 2023 The MediaPipe Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
/* NOTE: The original CSS used `@use "@material";` which is a Sass directive.
That's not valid in plain CSS, so it's removed here. */
body {
font-family: Roboto, system-ui, -apple-system, Segoe UI, Arial, sans-serif;
margin: 2em;
color: #3d3d3d;
--mdc-theme-primary: #007f8b;
--mdc-theme-on-primary: #f1f3f4;
}
h1 { color: #007f8b; }
h2 { clear: both; }
em { font-weight: bold; }
video {
clear: both;
display: block;
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
section {
opacity: 1;
transition: opacity 500ms ease-in-out;
}
header, footer { clear: both; }
.removed { display: none; }
.invisible { opacity: 0.2; }
.note {
font-style: italic;
font-size: 130%;
}
.videoView, .detectOnClick {
position: relative;
float: left;
width: 48%;
margin: 2% 1%;
cursor: pointer;
}
.videoView p, .detectOnClick p {
position: absolute;
padding: 5px;
background-color: #007f8b;
color: #fff;
border: 1px dashed rgba(255, 255, 255, 0.7);
z-index: 2;
font-size: 12px;
margin: 0;
}
.highlighter {
background: rgba(0, 255, 0, 0.25);
border: 1px dashed #fff;
z-index: 1;
position: absolute;
}
.canvas {
z-index: 1;
position: absolute;
pointer-events: none;
}
.output_canvas {
transform: rotateY(180deg);
-webkit-transform: rotateY(180deg);
-moz-transform: rotateY(180deg);
}
.detectOnClick { z-index: 0; }
.detectOnClick img { width: 100%; }
/* Simple layout fix for the video/canvas wrapper */
.video-wrapper {
position: relative;
width: 1280px;
max-width: 100%;
aspect-ratio: 16 / 9;
}
.video-wrapper video,
.video-wrapper canvas {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
</style>
</head>
<body>
<h1>Pose detection using the MediaPipe PoseLandmarker task</h1>
<section id="demos" class="invisible">
<h2>Demo: Webcam continuous pose landmarks detection</h2>
<p>Stand in front of your webcam to get real-time pose landmarker detection.<br>Click <b>enable webcam</b> below and grant access to the webcam if prompted.</p>
<div id="liveView" class="videoView">
<button id="webcamButton" class="mdc-button mdc-button--raised">
<span class="mdc-button__ripple"></span>
<span class="mdc-button__label">ENABLE WEBCAM</span>
</button>
<div class="video-wrapper">
<video id="webcam" autoplay playsinline></video>
<canvas class="output_canvas" id="output_canvas" width="1280" height="720"></canvas>
</div>
</div>
</section>
<script type="module">
// Copyright 2023 The MediaPipe Authors.
// Licensed under the Apache License, Version 2.0 (the "License");
import {
PoseLandmarker,
FilesetResolver,
DrawingUtils
} from "https://cdn.skypack.dev/@mediapipe/tasks-vision@0.10.0";
const demosSection = document.getElementById("demos");
let poseLandmarker = undefined;
let runningMode = "IMAGE";
let enableWebcamButton;
let webcamRunning = false;
const videoHeight = "360px";
const videoWidth = "480px";
// Load the Vision WASM and the Pose Landmarker model
const createPoseLandmarker = async () => {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0/wasm"
);
poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
baseOptions: {
modelAssetPath: "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task",
delegate: "GPU"
},
runningMode: runningMode,
numPoses: 2
});
demosSection.classList.remove("invisible");
};
createPoseLandmarker();
/********************************************************************
// Demo 1: Click an image to detect pose and draw landmarks.
********************************************************************/
const imageContainers = document.getElementsByClassName("detectOnClick");
for (let i = 0; i < imageContainers.length; i++) {
imageContainers[i].children[0].addEventListener("click", handleClick);
}
async function handleClick(event) {
if (!poseLandmarker) {
console.log("Wait for poseLandmarker to load before clicking!");
return;
}
if (runningMode === "VIDEO") {
runningMode = "IMAGE";
await poseLandmarker.setOptions({ runningMode: "IMAGE" });
}
// Remove old overlays
const allCanvas = event.target.parentNode.getElementsByClassName("canvas");
for (let i = allCanvas.length - 1; i >= 0; i--) {
const n = allCanvas[i];
n.parentNode.removeChild(n);
}
poseLandmarker.detect(event.target, (result) => {
const canvas = document.createElement("canvas");
canvas.setAttribute("class", "canvas");
canvas.setAttribute("width", event.target.naturalWidth + "px");
canvas.setAttribute("height", event.target.naturalHeight + "px");
canvas.style =
"left: 0px; top: 0px; width: " + event.target.width + "px; height: " + event.target.height + "px;";
event.target.parentNode.appendChild(canvas);
const canvasCtx = canvas.getContext("2d");
const drawingUtils = new DrawingUtils(canvasCtx);
for (const landmark of result.landmarks) {
drawingUtils.drawLandmarks(landmark, {
radius: (data) => DrawingUtils.lerp((data.from && data.from.z) ?? 0, -0.15, 0.1, 5, 1)
});
drawingUtils.drawConnectors(landmark, PoseLandmarker.POSE_CONNECTIONS);
}
});
}
/********************************************************************
// Demo 2: Live webcam pose detection.
********************************************************************/
const video = document.getElementById("webcam");
const canvasElement = document.getElementById("output_canvas");
const canvasCtx = canvasElement.getContext("2d");
const drawingUtils = new DrawingUtils(canvasCtx);
const hasGetUserMedia = () => !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
if (hasGetUserMedia()) {
enableWebcamButton = document.getElementById("webcamButton");
enableWebcamButton.addEventListener("click", enableCam);
} else {
console.warn("getUserMedia() is not supported by your browser");
}
function enableCam() {
if (!poseLandmarker) {
console.log("Wait! poseLandmarker not loaded yet.");
return;
}
if (webcamRunning === true) {
webcamRunning = false;
enableWebcamButton.innerText = "ENABLE PREDICTIONS";
} else {
webcamRunning = true;
enableWebcamButton.innerText = "DISABLE PREDICTIONS";
}
const constraints = { video: true };
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
video.srcObject = stream;
video.addEventListener("loadeddata", predictWebcam);
});
}
let lastVideoTime = -1;
async function predictWebcam() {
canvasElement.style.height = videoHeight;
video.style.height = videoHeight;
canvasElement.style.width = videoWidth;
video.style.width = videoWidth;
if (runningMode === "IMAGE") {
runningMode = "VIDEO";
await poseLandmarker.setOptions({ runningMode: "VIDEO" });
}
const startTimeMs = performance.now();
if (lastVideoTime !== video.currentTime) {
lastVideoTime = video.currentTime;
poseLandmarker.detectForVideo(video, startTimeMs, (result) => {
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
for (const landmark of result.landmarks) {
drawingUtils.drawLandmarks(landmark, {
radius: (data) => DrawingUtils.lerp((data.from && data.from.z) ?? 0, -0.15, 0.1, 5, 1)
});
drawingUtils.drawConnectors(landmark, PoseLandmarker.POSE_CONNECTIONS);
}
canvasCtx.restore();
});
}
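// Keep the prediction loop running via requestAnimationFrame while predictions are enabled.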
if (webcamRunning === true) {
window.requestAnimationFrame(predictWebcam);
}
}
</script>
</body>
</html>

151
process_mp4_facial.py Executable file
View File

@@ -0,0 +1,151 @@
import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import numpy as np
from mediapipe.framework.formats import landmark_pb2
import argparse
import os
import csv
# --- Helper function to create the landmark-to-feature map ---
def create_landmark_map():
"""Creates a mapping from landmark index to facial feature name."""
landmark_map = {}
# Define the connection groups from MediaPipe's face_mesh solutions
connection_groups = {
'lips': mp.solutions.face_mesh.FACEMESH_LIPS,
'left_eye': mp.solutions.face_mesh.FACEMESH_LEFT_EYE,
'right_eye': mp.solutions.face_mesh.FACEMESH_RIGHT_EYE,
'left_eyebrow': mp.solutions.face_mesh.FACEMESH_LEFT_EYEBROW,
'right_eyebrow': mp.solutions.face_mesh.FACEMESH_RIGHT_EYEBROW,
'face_oval': mp.solutions.face_mesh.FACEMESH_FACE_OVAL,
'left_iris': mp.solutions.face_mesh.FACEMESH_LEFT_IRIS,
'right_iris': mp.solutions.face_mesh.FACEMESH_RIGHT_IRIS,
}
# Populate the map by iterating through the connection groups
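# Each connection is a (start_index, end_index) pair, so both endpoints get tagged with the part name.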
for part_name, connections in connection_groups.items():
for connection in connections:
landmark_map[connection[0]] = part_name
landmark_map[connection[1]] = part_name
return landmark_map
# --- Helper Function to Draw Landmarks ---
def draw_landmarks_on_image(rgb_image, detection_result):
"""Draws face landmarks on a single image frame."""
face_landmarks_list = detection_result.face_landmarks
annotated_image = np.copy(rgb_image)
# Loop through the detected faces to visualize.
for face_landmarks in face_landmarks_list:
face_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
face_landmarks_proto.landmark.extend([
landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in face_landmarks
])
mp.solutions.drawing_utils.draw_landmarks(
image=annotated_image,
landmark_list=face_landmarks_proto,
connections=mp.solutions.face_mesh.FACEMESH_TESSELATION,
landmark_drawing_spec=None,
connection_drawing_spec=mp.solutions.drawing_styles
.get_default_face_mesh_tesselation_style())
mp.solutions.drawing_utils.draw_landmarks(
image=annotated_image,
landmark_list=face_landmarks_proto,
connections=mp.solutions.face_mesh.FACEMESH_CONTOURS,
landmark_drawing_spec=None,
connection_drawing_spec=mp.solutions.drawing_styles
.get_default_face_mesh_contours_style())
mp.solutions.drawing_utils.draw_landmarks(
image=annotated_image,
landmark_list=face_landmarks_proto,
connections=mp.solutions.face_mesh.FACEMESH_IRISES,
landmark_drawing_spec=None,
connection_drawing_spec=mp.solutions.drawing_styles
.get_default_face_mesh_iris_connections_style())
return annotated_image
def main():
parser = argparse.ArgumentParser(description='Process a video to detect and draw face landmarks.')
parser.add_argument('input_video', help='The path to the input video file.')
args = parser.parse_args()
input_video_path = args.input_video
base_name, extension = os.path.splitext(input_video_path)
output_video_path = f"{base_name}_annotated{extension}"
output_csv_path = f"{base_name}_landmarks.csv"
# --- Create the landmark map ---
landmark_to_part_map = create_landmark_map()
# --- Configuration & Setup ---
model_path = 'face_landmarker.task'
base_options = python.BaseOptions(model_asset_path=model_path)
options = vision.FaceLandmarkerOptions(base_options=base_options,
output_face_blendshapes=True,
output_facial_transformation_matrixes=True,
num_faces=1)
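# Blendshapes and transformation matrices are computed but not exported; the CSV below stores only landmark x/y/z.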
detector = vision.FaceLandmarker.create_from_options(options)
# --- Video and CSV Setup ---
cap = cv2.VideoCapture(input_video_path)
if not cap.isOpened():
print(f"Error: Could not open video file {input_video_path}")
return
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
if fps <= 0:
    fps = 30.0  # sensible fallback when FPS metadata is missing
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))
# Open CSV file for writing
with open(output_csv_path, 'w', newline='') as csvfile:
csv_writer = csv.writer(csvfile)
# Write the CSV header row
csv_writer.writerow(['frame', 'face', 'landmark_index', 'face_part', 'x', 'y', 'z'])
print(f"Processing video: {input_video_path} 📹")
frame_number = 0
while(cap.isOpened()):
ret, frame = cap.read()
if not ret:
break
frame_number += 1
rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)
detection_result = detector.detect(mp_image)
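# Frames are processed independently in IMAGE mode; using VIDEO mode with detect_for_video would add temporal smoothing.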
# Write landmark data to CSV
if detection_result.face_landmarks:
for face_index, face_landmarks in enumerate(detection_result.face_landmarks):
for landmark_index, landmark in enumerate(face_landmarks):
# Look up the face part name from the map
face_part = landmark_to_part_map.get(landmark_index, 'unknown')
# Write the row, including the face_part column
csv_writer.writerow([frame_number, face_index, landmark_index, face_part, landmark.x, landmark.y, landmark.z])
# Draw landmarks on the frame for the video
annotated_frame = draw_landmarks_on_image(rgb_frame, detection_result)
bgr_annotated_frame = cv2.cvtColor(annotated_frame, cv2.COLOR_RGB2BGR)
out.write(bgr_annotated_frame)
# Release everything when the job is finished
cap.release()
out.release()
cv2.destroyAllWindows()
print(f"\n✅ Processing complete.")
print(f"Annotated video saved to: {output_video_path}")
print(f"Landmarks CSV saved to: {output_csv_path}")
if __name__ == '__main__':
main()

214
process_mp4_holistic.py Executable file
View File

@@ -0,0 +1,214 @@
#!/usr/bin/env python3
"""
process_mp4_holistic.py
Process an MP4 with MediaPipe Holistic:
- Saves annotated video
- Exports CSV of face/pose/hand landmarks per frame
Usage:
python process_mp4_holistic.py /path/to/input.mp4
python process_mp4_holistic.py /path/to/input.mp4 --out-video out.mp4 --out-csv out.csv --show
"""
import argparse
import csv
import os
import sys
from pathlib import Path
import cv2
import mediapipe as mp
mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils
mp_styles = mp.solutions.drawing_styles
def parse_args():
p = argparse.ArgumentParser(description="Run MediaPipe Holistic on an MP4 and export annotated video + CSV landmarks.")
p.add_argument("input", help="Input .mp4 file")
p.add_argument("--out-video", help="Output annotated MP4 path (default: <input>_annotated.mp4)")
p.add_argument("--out-csv", help="Output CSV path for landmarks (default: <input>_landmarks.csv)")
p.add_argument("--model-complexity", type=int, default=1, choices=[0, 1, 2], help="Holistic model complexity")
p.add_argument("--no-smooth", action="store_true", help="Disable smoothing (smoothing is ON by default)")
p.add_argument("--refine-face", action="store_true", help="Refine face landmarks (iris, lips).")
p.add_argument("--show", action="store_true", help="Show preview window while processing")
return p.parse_args()
def open_video_writer(cap, out_path):
# Properties from input
fps = cap.get(cv2.CAP_PROP_FPS)
if fps is None or fps <= 0:
fps = 30.0 # sensible fallback
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# Writer
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter(out_path, fourcc, float(fps), (width, height))
if not writer.isOpened():
raise RuntimeError(f"Failed to open VideoWriter at {out_path}")
return writer, fps, (width, height)
def write_landmarks_to_csv(writer, frame_idx, ts_ms, kind, landmarks, world_landmarks=None, handedness=None):
"""
landmarks: NormalizedLandmarkList (x,y,z, visibility?) -> face/hand have no visibility; pose has visibility.
world_landmarks: LandmarkList in meters (optional, pose_world_landmarks available).
handedness: "Left"|"Right"|None (we label hand sets by field name; not a confidence score here)
"""
if not landmarks:
return
# index by position; world coords may be absent or differ in length
wl = world_landmarks.landmark if world_landmarks and getattr(world_landmarks, "landmark", None) else None
for i, lm in enumerate(landmarks.landmark):
world_x = world_y = world_z = ""
if wl and i < len(wl):
world_x, world_y, world_z = wl[i].x, wl[i].y, wl[i].z
# Some landmark types (pose) include visibility; others (face/hands) don't
vis = getattr(lm, "visibility", "")
writer.writerow([
frame_idx,
int(ts_ms),
kind, # e.g., face, pose, left_hand, right_hand
i,
lm.x, lm.y, lm.z,
vis,
"", # presence not provided in Holistic landmarks
world_x, world_y, world_z,
handedness or ""
])
def main():
args = parse_args()
in_path = Path(args.input)
if not in_path.exists():
print(f"Input not found: {in_path}", file=sys.stderr)
sys.exit(1)
out_video = Path(args.out_video) if args.out_video else in_path.with_name(in_path.stem + "_annotated.mp4")
out_csv = Path(args.out_csv) if args.out_csv else in_path.with_name(in_path.stem + "_landmarks.csv")
cap = cv2.VideoCapture(str(in_path))
if not cap.isOpened():
print(f"Could not open video: {in_path}", file=sys.stderr)
sys.exit(1)
writer, fps, (w, h) = open_video_writer(cap, str(out_video))
# Prepare CSV
out_csv.parent.mkdir(parents=True, exist_ok=True)
csv_file = open(out_csv, "w", newline="", encoding="utf-8")
csv_writer = csv.writer(csv_file)
csv_writer.writerow([
"frame", "timestamp_ms", "type", "landmark_index",
"x", "y", "z", "visibility", "presence",
"world_x", "world_y", "world_z", "handedness"
])
# Holistic configuration
holistic = mp_holistic.Holistic(
static_image_mode=False,
model_complexity=args.model_complexity,
smooth_landmarks=(not args.no_smooth),
refine_face_landmarks=args.refine_face,
enable_segmentation=False
)
try:
frame_idx = 0
print(f"Processing: {in_path.name} -> {out_video.name}, {out_csv.name}")
while True:
ok, frame_bgr = cap.read()
if not ok:
break
# Timestamp (ms) based on frame index and fps
ts_ms = (frame_idx / fps) * 1000.0
# Convert to RGB for MediaPipe
image_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
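# Mark the frame read-only so MediaPipe can pass it by reference instead of copying.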
image_rgb.flags.writeable = False
results = holistic.process(image_rgb)
image_rgb.flags.writeable = True
# Draw annotations directly on the BGR frame (no copy is made; the raw frame is not reused afterwards)
out_frame = frame_bgr
# Face
if results.face_landmarks:
mp_drawing.draw_landmarks(
out_frame,
results.face_landmarks,
mp_holistic.FACEMESH_TESSELATION,
landmark_drawing_spec=None,
connection_drawing_spec=mp_styles.get_default_face_mesh_tesselation_style(),
)
write_landmarks_to_csv(csv_writer, frame_idx, ts_ms, "face", results.face_landmarks)
# Pose
if results.pose_landmarks:
mp_drawing.draw_landmarks(
out_frame,
results.pose_landmarks,
mp_holistic.POSE_CONNECTIONS,
landmark_drawing_spec=mp_styles.get_default_pose_landmarks_style()
)
write_landmarks_to_csv(
csv_writer, frame_idx, ts_ms, "pose",
results.pose_landmarks,
world_landmarks=getattr(results, "pose_world_landmarks", None)
)
# Left hand
if results.left_hand_landmarks:
mp_drawing.draw_landmarks(
out_frame,
results.left_hand_landmarks,
mp_holistic.HAND_CONNECTIONS,
landmark_drawing_spec=mp_styles.get_default_hand_landmarks_style()
)
write_landmarks_to_csv(csv_writer, frame_idx, ts_ms, "left_hand", results.left_hand_landmarks, handedness="Left")
# Right hand
if results.right_hand_landmarks:
mp_drawing.draw_landmarks(
out_frame,
results.right_hand_landmarks,
mp_holistic.HAND_CONNECTIONS,
landmark_drawing_spec=mp_styles.get_default_hand_landmarks_style()
)
write_landmarks_to_csv(csv_writer, frame_idx, ts_ms, "right_hand", results.right_hand_landmarks, handedness="Right")
# Write frame
writer.write(out_frame)
# Optional preview
if args.show:
cv2.imshow("Holistic (annotated)", out_frame)
if cv2.waitKey(1) & 0xFF == 27: # ESC
break
# Lightweight progress
if frame_idx % 120 == 0:
print(f" frame {frame_idx}", end="\r", flush=True)
frame_idx += 1
print(f"\nDone.\n Video: {out_video}\n CSV: {out_csv}")
finally:
holistic.close()
writer.release()
cap.release()
csv_file.close()
if args.show:
cv2.destroyAllWindows()
if __name__ == "__main__":
main()

98
recognize_gesture.py Executable file
View File

@@ -0,0 +1,98 @@
#!/usr/bin/env python3
import argparse
import sys
import mediapipe as mp
BaseOptions = mp.tasks.BaseOptions
VisionRunningMode = mp.tasks.vision.RunningMode
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
def _first_category(item):
"""
Accepts either:
- a Classifications object with .categories
- a list of Category
- None / empty
Returns the first Category or None.
"""
if item is None:
return None
# Shape 1: Classifications with .categories
cats = getattr(item, "categories", None)
if isinstance(cats, list):
return cats[0] if cats else None
# Shape 2: already a list[Category]
if isinstance(item, list):
return item[0] if item else None
return None
def _len_safe(x):
return len(x) if isinstance(x, list) else 0
def main():
parser = argparse.ArgumentParser(description="Recognize hand gestures in a still image with MediaPipe.")
parser.add_argument("-i", "--image", default="hand.jpg", help="Path to input image (default: hand.jpg)")
parser.add_argument("-m", "--model", default="gesture_recognizer.task",
help="Path to gesture_recognizer .task model (default: gesture_recognizer.task)")
parser.add_argument("--num_hands", type=int, default=2, help="Max hands to detect")
args = parser.parse_args()
options = GestureRecognizerOptions(
base_options=BaseOptions(model_asset_path=args.model),
running_mode=VisionRunningMode.IMAGE,
num_hands=args.num_hands,
)
# Load the image
try:
mp_image = mp.Image.create_from_file(args.image)
except Exception as e:
print(f"Failed to load image '{args.image}': {e}", file=sys.stderr)
sys.exit(1)
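# The recognizer is a context manager; leaving the with-block releases its resources.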
with GestureRecognizer.create_from_options(options) as recognizer:
result = recognizer.recognize(mp_image)
if result is None:
print("No result returned.")
return
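# gestures, handedness, and hand_landmarks are parallel lists (one entry per detected hand); take the longest to be safe.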
n = max(
_len_safe(getattr(result, "gestures", [])),
_len_safe(getattr(result, "handedness", [])),
_len_safe(getattr(result, "hand_landmarks", [])),
)
if n == 0:
print("No hands/gestures detected.")
return
for i in range(n):
handed = None
if _len_safe(getattr(result, "handedness", [])) > i:
cat = _first_category(result.handedness[i])
if cat:
handed = cat.category_name
top_gesture = None
score = None
if _len_safe(getattr(result, "gestures", [])) > i:
cat = _first_category(result.gestures[i])
if cat:
top_gesture = cat.category_name
score = cat.score
header = f"Hand #{i+1}" + (f" ({handed})" if handed else "")
print(header + ":")
if top_gesture:
print(f" Gesture: {top_gesture} (score={score:.3f})")
else:
print(" Gesture: none")
# If you want pixel landmark coordinates later:
# if _len_safe(getattr(result, "hand_landmarks", [])) > i:
# for j, lm in enumerate(result.hand_landmarks[i]):
# print(f" lm{j}: x={lm.x:.3f} y={lm.y:.3f} z={lm.z:.3f}")
if __name__ == "__main__":
main()

2
server_holistic.sh Executable file
View File

@@ -0,0 +1,2 @@
echo "Go to: http://localhost:8001/holistic.html "
python -m http.server 8001

14
source_activate_venv.sh Executable file
View File

@@ -0,0 +1,14 @@
#!/bin/bash
# ALERT: source this script, don't run it directly.
# source source_activate_venv.sh
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
echo "This script must be sourced, not run directly."
echo "source source_activate_venv.sh"
exit 1
fi
# Sourced correctly: activate the project's virtual environment.
echo "Script is being sourced. Continuing..."
source ./.venv/bin/activate