Initial commit: MediaPipe landmarks demo
HTML demos for face, hand, gesture, and posture tracking using MediaPipe. Includes Python CLI tools for processing video files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
3
.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
.venv/
__pycache__/
*.pyc
179
Training_handshape_B.md
Normal file
@@ -0,0 +1,179 @@
Let’s add a custom gesture for the **ASL letter “B”** (flat hand, fingers together, thumb folded across the palm) using MediaPipe **Gesture Recognizer (Model Maker)**.

# Plan (what you’ll build)

* A custom model with a new class label, e.g. `ASL_B`, plus the required `none` class.
* A small, labeled image dataset (Model Maker will extract hand landmarks for you).
* A trained `.task` file you can drop into your Python/JS app and allowlist.

---

# 1) Pick labels

Use:

* `ASL_B` ← your new gesture
* `none` ← anything that’s not one of your target gestures (mandatory)

Folder layout:

```
dataset/
  ASL_B/
    ...images...
  none/
    ...images...
```
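It helps to sanity-check the layout and per-label image counts as you collect data. A minimal sketch using only the standard library (the extensions list is an assumption; adjust it to whatever formats you save):

```python
# count_dataset.py — quick per-label image count for the dataset/ layout above
from pathlib import Path

DATA_DIR = Path("dataset")
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed formats; extend if needed

for label_dir in sorted(p for p in DATA_DIR.iterdir() if p.is_dir()):
    n = sum(1 for f in label_dir.rglob("*") if f.suffix.lower() in IMAGE_EXTS)
    print(f"{label_dir.name:>6}: {n} images")
```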
---

# 2) Collect the right data (what to capture)

Target handshape for **B**:

* **Fingers**: index–pinky fully extended and **pressed together**
* **Thumb**: folded across the palm (not sticking out to the side)
* **Palm**: facing the camera (front), plus a few other angles

Suggested minimums (per label):

| Bucket | Shots |
| --------------------------------------------------- | -------------------- |
| Distances: close (~40–60 cm), medium (~80–120 cm) | 80 |
| View angles: front, ~30°, ~60° yaw | 80 |
| Rotations: slight roll/tilt | 40 |
| Lighting: bright, dim, backlit | 40 |
| Backgrounds: plain wall, cluttered office/outdoor | 40 |
| Hands: left & right (both) | included across all |
| Skin tones / several people | as many as practical |

Do **at least ~300–500** `ASL_B` images to start.
For **`none`**, include: open palm (“High-Five”), slightly spread fingers, thumbs-up, fist, pointing, random objects/background frames, other ASL letters—especially **Open_Palm** look-alikes so the model learns “not B”.

Quick ways to get images:

* Record short clips on a laptop/phone and extract frames (e.g., 2 fps); a batch-extraction sketch follows the ffmpeg example below.
* Ask 3–5 colleagues to contribute a short 10–20 s clip each.

Frame extraction example:

```bash
# Extract 2 frames/sec from a video into dataset/ASL_B/
ffmpeg -i b_sign.mov -vf fps=2 dataset/ASL_B/b_%05d.jpg
# Do the same for negatives into dataset/none/
```
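If you have a whole folder of clips rather than a single video, the same ~2 fps sampling can be done with OpenCV (already used by the repo’s CLI tools). A sketch, assuming the clips sit in a `clips/` folder (that folder name and the `.mov`/`.mp4` extensions are illustrative assumptions):

```python
# extract_frames.py — sketch: sample roughly 2 fps from every clip in clips/
from pathlib import Path

import cv2

OUT_DIR = Path("dataset/ASL_B")
OUT_DIR.mkdir(parents=True, exist_ok=True)

clips = sorted(list(Path("clips").glob("*.mov")) + list(Path("clips").glob("*.mp4")))
for clip in clips:
    cap = cv2.VideoCapture(str(clip))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(round(fps / 2)), 1)  # keep about 2 frames per second
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(str(OUT_DIR / f"{clip.stem}_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    print(f"{clip.name}: saved {saved} frames")
```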
---

# 3) Train with Model Maker (Python)

Create and activate a venv, then:

```bash
pip install --upgrade pip
pip install mediapipe-model-maker
```

Training script (save as `train_asl_b.py` and run it):

```python
from mediapipe_model_maker import gesture_recognizer as gr

DATA_DIR = "dataset"
EXPORT_DIR = "exported_model"

# Load & auto-preprocess (runs hand detection, keeps images with a detected hand)
data = gr.Dataset.from_folder(
    dirname=DATA_DIR,
    hparams=gr.HandDataPreprocessingParams(  # you can tweak these if needed
        min_detection_confidence=0.5
    )
)

# Split
train_data, rest = data.split(0.8)
validation_data, test_data = rest.split(0.5)

# Hyperparameters (start small; bump epochs if needed)
hparams = gr.HParams(
    export_dir=EXPORT_DIR,
    epochs=12,
    batch_size=16,
    learning_rate=0.001,
)

# Optional model head size & dropout
options = gr.GestureRecognizerOptions(
    hparams=hparams,
    model_options=gr.ModelOptions(layer_widths=[128, 64], dropout_rate=0.1)
)

model = gr.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

# Evaluate
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss={loss:.4f}, acc={acc:.4f}")

# Export .task
model.export_model()  # writes exported_model/gesture_recognizer.task
print("Exported:", EXPORT_DIR + "/gesture_recognizer.task")
```

Tips:

* If many `ASL_B` images get dropped at load time (no hand detected), back up the camera a little or ensure the whole hand is visible.
* If `none` is weak, add more “near-miss” negatives: open palm with fingers slightly apart, thumb slightly out, partial occlusions.

---
# 4) Plug it into your app

**Python (Tasks API example):**

```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
ClassifierOptions = mp.tasks.components.processors.ClassifierOptions

# LIVE_STREAM mode delivers results asynchronously, so a result callback is required
def on_result(result, image, timestamp_ms):
    if result.gestures:
        top = result.gestures[0][0]
        print(top.category_name, f"{top.score:.2f}")

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=on_result,
    custom_gesture_classifier_options=ClassifierOptions(
        score_threshold=0.6,          # tighten until false positives drop
        category_allowlist=["ASL_B"]  # only report your class
    ),
)
recognizer = GestureRecognizer.create_from_options(options)
```
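In live-stream mode the recognizer returns results through the callback rather than from the call itself; frames are pushed with `recognize_async`. A minimal camera-loop sketch, assuming the `recognizer` created above and OpenCV (which the repo’s CLI tools already depend on):

```python
import time

import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # default webcam
try:
    while cap.isOpened():
        ok, frame_bgr = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; wrap each frame in an mp.Image
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
        # Timestamps must be monotonically increasing across calls
        recognizer.recognize_async(mp_image, int(time.monotonic() * 1000))
except KeyboardInterrupt:
    pass
finally:
    cap.release()
```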

**Web (JS):**

```js
const recognizer = await GestureRecognizer.createFromOptions(fileset, {
  baseOptions: { modelAssetPath: "exported_model/gesture_recognizer.task" },
  runningMode: "VIDEO", // the JS Tasks API uses "IMAGE" or "VIDEO"; call recognizeForVideo() per frame
  customGesturesClassifierOptions: {
    scoreThreshold: 0.6,
    categoryAllowlist: ["ASL_B"]
  }
});
```
---

# 5) Troubleshooting & tuning

* **False positives with Open Palm:** Add more `none` examples where the fingers are together but the **thumb is visible** to the side. The model needs to see “almost B but not B.”
* **Left vs. right hand:** Include both in training. If you only trained on right hands, left hands may underperform.
* **Distance issues:** If far-away hands fail, capture more medium/far shots. Landmarks get noisier when the hand is small in frame.
* **Thresholds:** Raise `score_threshold` to reduce spurious detections; lower it if you miss true B’s. A threshold-sweep sketch follows below.
* **Confusion matrix:** If test accuracy is fine but live results wobble, collect more data from the exact camera/lighting you’ll use.

---
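To pick `score_threshold` empirically, you can run the exported recognizer in IMAGE mode over a held-out labeled folder and sweep a few thresholds. A rough sketch; the `holdout/` layout (mirroring `dataset/`) and the helper name are illustrative assumptions, not part of the repo:

```python
# threshold_sweep.py — sketch: sweep score thresholds over a held-out folder
from pathlib import Path

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path="exported_model/gesture_recognizer.task"),
    running_mode=VisionRunningMode.IMAGE,
)

def asl_b_score(result, label="ASL_B"):
    """Return the recognizer's score for `label`, or 0.0 if it wasn't reported."""
    for hand in result.gestures:
        for cat in hand:
            if cat.category_name == label:
                return cat.score
    return 0.0

with GestureRecognizer.create_from_options(options) as rec:
    scores = {"ASL_B": [], "none": []}  # holdout/ASL_B and holdout/none are assumed
    for label in scores:
        for img_path in Path("holdout", label).glob("*.jpg"):
            result = rec.recognize(mp.Image.create_from_file(str(img_path)))
            scores[label].append(asl_b_score(result))

for thr in (0.4, 0.5, 0.6, 0.7, 0.8):
    tp = sum(s >= thr for s in scores["ASL_B"])
    fp = sum(s >= thr for s in scores["none"])
    recall = tp / max(len(scores["ASL_B"]), 1)
    print(f"thr={thr:.1f}  recall={recall:.2f}  false_positives={fp}")
```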
435
face.html
Normal file
@@ -0,0 +1,435 @@
|
||||
<!-- face.html • Single-file MediaPipe Face Landmarker demo -->
|
||||
<!-- Copyright 2023 The MediaPipe Authors.
|
||||
Licensed under the Apache License, Version 2.0 -->
|
||||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta http-equiv="Cache-control" content="no-cache, no-store, must-revalidate" />
|
||||
<meta http-equiv="Pragma" content="no-cache" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
|
||||
<title>Face Landmarker</title>
|
||||
|
||||
<!-- Material Components (styles only for the raised button) -->
|
||||
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet" />
|
||||
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
|
||||
|
||||
<style>
|
||||
/* Inlined CSS from your snippet (with minor cleanups) */
|
||||
|
||||
body {
|
||||
font-family: helvetica, arial, sans-serif;
|
||||
margin: 2em;
|
||||
color: #3d3d3d;
|
||||
--mdc-theme-primary: #007f8b;
|
||||
--mdc-theme-on-primary: #f1f3f4;
|
||||
}
|
||||
|
||||
h1 {
|
||||
font-style: italic;
|
||||
color: #007f8b;
|
||||
}
|
||||
|
||||
h2 {
|
||||
clear: both;
|
||||
}
|
||||
|
||||
em { font-weight: bold; }
|
||||
|
||||
video {
|
||||
clear: both;
|
||||
display: block;
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
}
|
||||
|
||||
section {
|
||||
opacity: 1;
|
||||
transition: opacity 500ms ease-in-out;
|
||||
}
|
||||
|
||||
.removed { display: none; }
|
||||
.invisible { opacity: 0.2; }
|
||||
|
||||
.note {
|
||||
font-style: italic;
|
||||
font-size: 130%;
|
||||
}
|
||||
|
||||
.videoView,
|
||||
.detectOnClick,
|
||||
.blend-shapes {
|
||||
position: relative;
|
||||
float: left;
|
||||
width: 48%;
|
||||
margin: 2% 1%;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.videoView p,
|
||||
.detectOnClick p {
|
||||
position: absolute;
|
||||
padding: 5px;
|
||||
background-color: #007f8b;
|
||||
color: #fff;
|
||||
border: 1px dashed rgba(255, 255, 255, 0.7);
|
||||
z-index: 2;
|
||||
font-size: 12px;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.highlighter {
|
||||
background: rgba(0, 255, 0, 0.25);
|
||||
border: 1px dashed #fff;
|
||||
z-index: 1;
|
||||
position: absolute;
|
||||
}
|
||||
|
||||
.canvas {
|
||||
z-index: 1;
|
||||
position: absolute;
|
||||
pointer-events: none;
|
||||
}
|
||||
|
||||
.output_canvas {
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
}
|
||||
|
||||
.detectOnClick { z-index: 0; }
|
||||
.detectOnClick img { width: 100%; }
|
||||
|
||||
.blend-shapes-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
height: 20px;
|
||||
}
|
||||
|
||||
.blend-shapes-label {
|
||||
display: flex;
|
||||
width: 120px;
|
||||
justify-content: flex-end;
|
||||
align-items: center;
|
||||
margin-right: 4px;
|
||||
}
|
||||
|
||||
.blend-shapes-value {
|
||||
display: flex;
|
||||
height: 16px;
|
||||
align-items: center;
|
||||
background-color: #007f8b;
|
||||
color: #fff;
|
||||
padding: 0 6px;
|
||||
border-radius: 2px;
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
/* Ensure video/canvas overlap correctly inside the container */
|
||||
#liveView > div {
|
||||
position: relative;
|
||||
display: inline-block;
|
||||
}
|
||||
#webcam {
|
||||
position: absolute; left: 0; top: 0;
|
||||
}
|
||||
#output_canvas {
|
||||
position: absolute; left: 0; top: 0;
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>Face landmark detection using the MediaPipe FaceLandmarker task</h1>
|
||||
|
||||
<section id="demos" class="invisible">
|
||||
<h2>Demo: Webcam continuous face landmarks detection</h2>
|
||||
<p>
|
||||
Hold your face in front of your webcam to get real-time face landmarker detection.<br />
|
||||
Click <b>enable webcam</b> below and grant access to the webcam if prompted.
|
||||
</p>
|
||||
|
||||
<div id="liveView" class="videoView">
|
||||
<button id="webcamButton" class="mdc-button mdc-button--raised">
|
||||
<span class="mdc-button__ripple"></span>
|
||||
<span class="mdc-button__label">ENABLE WEBCAM</span>
|
||||
</button>
|
||||
<div>
|
||||
<video id="webcam" autoplay playsinline></video>
|
||||
<canvas class="output_canvas" id="output_canvas"></canvas>
|
||||
</div>
|
||||
</div>
|
||||
<div class="blend-shapes">
|
||||
<ul class="blend-shapes-list" id="video-blend-shapes"></ul>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<script type="module">
|
||||
// Inlined JS (converted to plain JS; removed TS types)
|
||||
|
||||
import vision from "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3";
|
||||
const { FaceLandmarker, FilesetResolver, DrawingUtils } = vision;
|
||||
|
||||
const demosSection = document.getElementById("demos");
|
||||
const imageBlendShapes = document.getElementById("image-blend-shapes");
|
||||
const videoBlendShapes = document.getElementById("video-blend-shapes");
|
||||
|
||||
let faceLandmarker;
|
||||
let runningMode = "IMAGE"; // "IMAGE" | "VIDEO"
|
||||
let enableWebcamButton;
|
||||
let webcamRunning = false;
|
||||
const videoWidth = 480;
|
||||
|
||||
async function createFaceLandmarker() {
|
||||
const filesetResolver = await FilesetResolver.forVisionTasks(
|
||||
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
|
||||
);
|
||||
faceLandmarker = await FaceLandmarker.createFromOptions(filesetResolver, {
|
||||
baseOptions: {
|
||||
modelAssetPath:
|
||||
"https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task",
|
||||
delegate: "GPU",
|
||||
},
|
||||
outputFaceBlendshapes: true,
|
||||
runningMode,
|
||||
numFaces: 1,
|
||||
});
|
||||
demosSection.classList.remove("invisible");
|
||||
}
|
||||
createFaceLandmarker();
|
||||
|
||||
/********************************************************************
|
||||
// Demo 1: Click image to detect landmarks
|
||||
********************************************************************/
|
||||
const imageContainers = document.getElementsByClassName("detectOnClick");
|
||||
for (let imageContainer of imageContainers) {
|
||||
imageContainer.children[0].addEventListener("click", handleClick);
|
||||
}
|
||||
|
||||
async function handleClick(event) {
|
||||
if (!faceLandmarker) {
|
||||
console.log("Wait for faceLandmarker to load before clicking!");
|
||||
return;
|
||||
}
|
||||
|
||||
if (runningMode === "VIDEO") {
|
||||
runningMode = "IMAGE";
|
||||
await faceLandmarker.setOptions({ runningMode });
|
||||
}
|
||||
|
||||
const parent = event.target.parentNode;
|
||||
const allCanvas = parent.getElementsByClassName("canvas");
|
||||
for (let i = allCanvas.length - 1; i >= 0; i--) {
|
||||
const n = allCanvas[i];
|
||||
n.parentNode.removeChild(n);
|
||||
}
|
||||
|
||||
const faceLandmarkerResult = faceLandmarker.detect(event.target);
|
||||
|
||||
const canvas = document.createElement("canvas");
|
||||
canvas.setAttribute("class", "canvas");
|
||||
canvas.setAttribute("width", event.target.naturalWidth + "px");
|
||||
canvas.setAttribute("height", event.target.naturalHeight + "px");
|
||||
canvas.style.left = "0px";
|
||||
canvas.style.top = "0px";
|
||||
canvas.style.width = `${event.target.width}px`;
|
||||
canvas.style.height = `${event.target.height}px`;
|
||||
|
||||
parent.appendChild(canvas);
|
||||
const ctx = canvas.getContext("2d");
|
||||
const drawingUtils = new DrawingUtils(ctx);
|
||||
|
||||
for (const landmarks of faceLandmarkerResult.faceLandmarks) {
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_TESSELATION,
|
||||
{ color: "#C0C0C070", lineWidth: 1 }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYE,
|
||||
{ color: "#FF3030" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYEBROW,
|
||||
{ color: "#FF3030" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LEFT_EYE,
|
||||
{ color: "#30FF30" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LEFT_EYEBROW,
|
||||
{ color: "#30FF30" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_FACE_OVAL,
|
||||
{ color: "#E0E0E0" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LIPS,
|
||||
{ color: "#E0E0E0" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_RIGHT_IRIS,
|
||||
{ color: "#FF3030" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LEFT_IRIS,
|
||||
{ color: "#30FF30" }
|
||||
);
|
||||
}
|
||||
drawBlendShapes(imageBlendShapes, faceLandmarkerResult.faceBlendshapes);
|
||||
}
|
||||
|
||||
/********************************************************************
|
||||
// Demo 2: Webcam stream detection
|
||||
********************************************************************/
|
||||
const video = document.getElementById("webcam");
|
||||
const canvasElement = document.getElementById("output_canvas");
|
||||
const canvasCtx = canvasElement.getContext("2d");
|
||||
|
||||
function hasGetUserMedia() {
|
||||
return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
|
||||
}
|
||||
|
||||
if (hasGetUserMedia()) {
|
||||
enableWebcamButton = document.getElementById("webcamButton");
|
||||
enableWebcamButton.addEventListener("click", enableCam);
|
||||
} else {
|
||||
console.warn("getUserMedia() is not supported by your browser");
|
||||
}
|
||||
|
||||
function enableCam() {
|
||||
if (!faceLandmarker) {
|
||||
console.log("Wait! faceLandmarker not loaded yet.");
|
||||
return;
|
||||
}
|
||||
|
||||
webcamRunning = !webcamRunning;
|
||||
enableWebcamButton.innerText = webcamRunning
|
||||
? "DISABLE PREDICTIONS"
|
||||
: "ENABLE PREDICTIONS";
|
||||
|
||||
const constraints = { video: true };
|
||||
|
||||
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
|
||||
video.srcObject = stream;
|
||||
video.addEventListener("loadeddata", predictWebcam);
|
||||
});
|
||||
}
|
||||
|
||||
let lastVideoTime = -1;
|
||||
let results;
|
||||
const drawingUtils = new DrawingUtils(canvasCtx);
|
||||
|
||||
async function predictWebcam() {
|
||||
const ratio = video.videoHeight / video.videoWidth;
|
||||
video.style.width = videoWidth + "px";
|
||||
video.style.height = videoWidth * ratio + "px";
|
||||
canvasElement.style.width = videoWidth + "px";
|
||||
canvasElement.style.height = videoWidth * ratio + "px";
|
||||
canvasElement.width = video.videoWidth;
|
||||
canvasElement.height = video.videoHeight;
|
||||
|
||||
if (runningMode === "IMAGE") {
|
||||
runningMode = "VIDEO";
|
||||
await faceLandmarker.setOptions({ runningMode });
|
||||
}
|
||||
|
||||
const startTimeMs = performance.now();
|
||||
if (lastVideoTime !== video.currentTime) {
|
||||
lastVideoTime = video.currentTime;
|
||||
results = faceLandmarker.detectForVideo(video, startTimeMs);
|
||||
}
|
||||
|
||||
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
|
||||
|
||||
if (results && results.faceLandmarks) {
|
||||
for (const landmarks of results.faceLandmarks) {
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_TESSELATION,
|
||||
{ color: "#C0C0C070", lineWidth: 1 }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYE,
|
||||
{ color: "#FF3030" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_RIGHT_EYEBROW,
|
||||
{ color: "#FF3030" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LEFT_EYE,
|
||||
{ color: "#30FF30" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LEFT_EYEBROW,
|
||||
{ color: "#30FF30" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_FACE_OVAL,
|
||||
{ color: "#E0E0E0" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LIPS,
|
||||
{ color: "#E0E0E0" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_RIGHT_IRIS,
|
||||
{ color: "#FF3030" }
|
||||
);
|
||||
drawingUtils.drawConnectors(
|
||||
landmarks,
|
||||
FaceLandmarker.FACE_LANDMARKS_LEFT_IRIS,
|
||||
{ color: "#30FF30" }
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
drawBlendShapes(videoBlendShapes, (results && results.faceBlendshapes) || []);
|
||||
|
||||
if (webcamRunning === true) {
|
||||
window.requestAnimationFrame(predictWebcam);
|
||||
}
|
||||
}
|
||||
|
||||
function drawBlendShapes(el, blendShapes) {
|
||||
if (!blendShapes || !blendShapes.length) {
|
||||
el.innerHTML = "";
|
||||
return;
|
||||
}
|
||||
|
||||
let htmlMaker = "";
|
||||
blendShapes[0].categories.forEach((shape) => {
|
||||
const label = shape.displayName || shape.categoryName;
|
||||
const pct = Math.max(0, Math.min(1, Number(shape.score) || 0));
|
||||
htmlMaker += `
|
||||
<li class="blend-shapes-item">
|
||||
<span class="blend-shapes-label">${label}</span>
|
||||
<span class="blend-shapes-value" style="width: calc(${pct * 100}% - 120px)">${pct.toFixed(4)}</span>
|
||||
</li>
|
||||
`;
|
||||
});
|
||||
|
||||
el.innerHTML = htmlMaker;
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
BIN
face_landmarker.task
Normal file
Binary file not shown.
1
fingers_positions.sh
Executable file
@@ -0,0 +1 @@
python hand_landmarker_cli.py --image hand.png --model hand_landmarker.task --out annotated.png
290
gesture.html
Normal file
@@ -0,0 +1,290 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<title>MediaPipe Hand Gesture Recognizer — Single File Demo</title>
|
||||
<!-- Material Components (for button styling) -->
|
||||
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet" />
|
||||
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
|
||||
|
||||
<style>
|
||||
/* Inlined from the CodePen CSS (Sass directives removed) */
|
||||
body {
|
||||
font-family: Roboto, system-ui, -apple-system, Segoe UI, Helvetica, Arial, sans-serif;
|
||||
margin: 2em;
|
||||
color: #3d3d3d;
|
||||
--mdc-theme-primary: #007f8b;
|
||||
--mdc-theme-on-primary: #f1f3f4;
|
||||
}
|
||||
|
||||
h1 { color: #007f8b; }
|
||||
h2 { clear: both; }
|
||||
|
||||
video {
|
||||
clear: both;
|
||||
display: block;
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
height: 280px;
|
||||
}
|
||||
|
||||
section { opacity: 1; transition: opacity 500ms ease-in-out; }
|
||||
.removed { display: none; }
|
||||
.invisible { opacity: 0.2; }
|
||||
|
||||
.detectOnClick {
|
||||
position: relative;
|
||||
float: left;
|
||||
width: 48%;
|
||||
margin: 2% 1%;
|
||||
cursor: pointer;
|
||||
z-index: 0;
|
||||
font-size: calc(8px + 1.2vw);
|
||||
}
|
||||
|
||||
.videoView {
|
||||
position: absolute;
|
||||
float: left;
|
||||
width: 48%;
|
||||
margin: 2% 1%;
|
||||
cursor: pointer;
|
||||
min-height: 500px;
|
||||
}
|
||||
|
||||
.videoView p,
|
||||
.detectOnClick p {
|
||||
padding-top: 5px;
|
||||
padding-bottom: 5px;
|
||||
background-color: #007f8b;
|
||||
color: #fff;
|
||||
border: 1px dashed rgba(255, 255, 255, 0.7);
|
||||
z-index: 2;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.highlighter { background: rgba(0, 255, 0, 0.25); border: 1px dashed #fff; z-index: 1; position: absolute; }
|
||||
.canvas { z-index: 1; position: absolute; pointer-events: none; }
|
||||
|
||||
.output_canvas {
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
}
|
||||
|
||||
.detectOnClick img { width: 45vw; }
|
||||
|
||||
.output { display: none; width: 100%; font-size: calc(8px + 1.2vw); }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<section id="demos" class="invisible">
|
||||
<h2><br>Demo: Webcam continuous hand gesture detection</h2>
|
||||
<p>Use your hand to make gestures in front of the camera to get gesture classification. <br />Click <b>enable webcam</b> below and grant access to the webcam if prompted.</p>
|
||||
<PRE>
|
||||
Gesture Label Description
|
||||
Closed_Fist Hand fully closed into a fist
|
||||
Open_Palm Flat open hand
|
||||
Pointing_Up Index finger extended upward, others closed
|
||||
Thumb_Down Thumb extended downward
|
||||
Thumb_Up Thumb extended upward
|
||||
Victory Index and middle finger extended in a “V”
|
||||
ILoveYou Thumb, index, and pinky extended (ASL “I love you”)
|
||||
None No recognized gesture / below confidence threshold
|
||||
</PRE>
|
||||
|
||||
<div id="liveView" class="videoView">
|
||||
<button id="webcamButton" class="mdc-button mdc-button--raised">
|
||||
<span class="mdc-button__ripple"></span>
|
||||
<span class="mdc-button__label">ENABLE WEBCAM</span>
|
||||
</button>
|
||||
<div style="position: relative;">
|
||||
<video id="webcam" autoplay playsinline></video>
|
||||
<canvas class="output_canvas" id="output_canvas" width="1280" height="720" style="position: absolute; left: 0; top: 0;"></canvas>
|
||||
<p id="gesture_output" class="output"></p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<script type="module">
|
||||
import { GestureRecognizer, FilesetResolver, DrawingUtils } from "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3";
|
||||
|
||||
const demosSection = document.getElementById("demos");
|
||||
/** @type {GestureRecognizer} */
|
||||
let gestureRecognizer;
|
||||
let runningMode = "IMAGE";
|
||||
/** @type {HTMLButtonElement} */
|
||||
let enableWebcamButton;
|
||||
let webcamRunning = false;
|
||||
const videoHeight = "360px";
|
||||
const videoWidth = "480px";
|
||||
|
||||
// Load the WASM and model, then reveal the demos section
|
||||
const createGestureRecognizer = async () => {
|
||||
const vision = await FilesetResolver.forVisionTasks(
|
||||
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
|
||||
);
|
||||
gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
|
||||
baseOptions: {
|
||||
modelAssetPath: "https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/1/gesture_recognizer.task",
|
||||
delegate: "GPU"
|
||||
},
|
||||
runningMode
|
||||
});
|
||||
demosSection.classList.remove("invisible");
|
||||
};
|
||||
createGestureRecognizer();
|
||||
|
||||
/********************************************************************
|
||||
// Demo 1: Detect hand gestures in images
|
||||
********************************************************************/
|
||||
const imageContainers = document.getElementsByClassName("detectOnClick");
|
||||
for (let i = 0; i < imageContainers.length; i++) {
|
||||
const img = imageContainers[i].children[0];
|
||||
img.addEventListener("click", handleClick);
|
||||
}
|
||||
|
||||
async function handleClick(event) {
|
||||
if (!gestureRecognizer) {
|
||||
alert("Please wait for gestureRecognizer to load");
|
||||
return;
|
||||
}
|
||||
|
||||
if (runningMode === "VIDEO") {
|
||||
runningMode = "IMAGE";
|
||||
await gestureRecognizer.setOptions({ runningMode: "IMAGE" });
|
||||
}
|
||||
|
||||
const parent = event.target.parentNode;
|
||||
|
||||
// Remove previous overlays
|
||||
const allCanvas = parent.getElementsByClassName("canvas");
|
||||
for (let i = allCanvas.length - 1; i >= 0; i--) {
|
||||
const n = allCanvas[i];
|
||||
n.parentNode.removeChild(n);
|
||||
}
|
||||
|
||||
const results = gestureRecognizer.recognize(event.target);
|
||||
console.log(results);
|
||||
|
||||
if (results.gestures && results.gestures.length > 0) {
|
||||
const p = parent.querySelector(".classification");
|
||||
p.classList.remove("removed");
|
||||
|
||||
const categoryName = results.gestures[0][0].categoryName;
|
||||
const categoryScore = (results.gestures[0][0].score * 100).toFixed(2);
|
||||
const handedness = results.handednesses[0][0].displayName;
|
||||
|
||||
p.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${categoryScore}%\n Handedness: ${handedness}`;
|
||||
p.style.left = "0px";
|
||||
p.style.top = event.target.height + "px";
|
||||
p.style.width = event.target.width - 10 + "px";
|
||||
|
||||
const canvas = document.createElement("canvas");
|
||||
canvas.setAttribute("class", "canvas");
|
||||
canvas.setAttribute("width", event.target.naturalWidth + "px");
|
||||
canvas.setAttribute("height", event.target.naturalHeight + "px");
|
||||
canvas.style.left = "0px";
|
||||
canvas.style.top = "0px";
|
||||
canvas.style.width = event.target.width + "px";
|
||||
canvas.style.height = event.target.height + "px";
|
||||
|
||||
parent.appendChild(canvas);
|
||||
const canvasCtx = canvas.getContext("2d");
|
||||
const drawingUtils = new DrawingUtils(canvasCtx);
|
||||
if (results.landmarks) {
|
||||
for (const landmarks of results.landmarks) {
|
||||
drawingUtils.drawConnectors(landmarks, GestureRecognizer.HAND_CONNECTIONS, { lineWidth: 5 });
|
||||
drawingUtils.drawLandmarks(landmarks, { lineWidth: 1 });
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/********************************************************************
|
||||
// Demo 2: Continuously grab image from webcam stream and detect it.
|
||||
********************************************************************/
|
||||
const video = document.getElementById("webcam");
|
||||
const canvasElement = document.getElementById("output_canvas");
|
||||
const canvasCtx = canvasElement.getContext("2d");
|
||||
const gestureOutput = document.getElementById("gesture_output");
|
||||
|
||||
function hasGetUserMedia() {
|
||||
return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
|
||||
}
|
||||
|
||||
if (hasGetUserMedia()) {
|
||||
enableWebcamButton = document.getElementById("webcamButton");
|
||||
enableWebcamButton.addEventListener("click", enableCam);
|
||||
} else {
|
||||
console.warn("getUserMedia() is not supported by your browser");
|
||||
}
|
||||
|
||||
function enableCam() {
|
||||
if (!gestureRecognizer) {
|
||||
alert("Please wait for gestureRecognizer to load");
|
||||
return;
|
||||
}
|
||||
|
||||
webcamRunning = !webcamRunning;
|
||||
enableWebcamButton.innerText = webcamRunning ? "DISABLE PREDICTIONS" : "ENABLE PREDICTIONS";
|
||||
|
||||
const constraints = { video: true };
|
||||
navigator.mediaDevices.getUserMedia(constraints).then(function (stream) {
|
||||
video.srcObject = stream;
|
||||
video.addEventListener("loadeddata", predictWebcam);
|
||||
});
|
||||
}
|
||||
|
||||
let lastVideoTime = -1;
|
||||
let results;
|
||||
async function predictWebcam() {
|
||||
const webcamElement = document.getElementById("webcam");
|
||||
|
||||
if (runningMode === "IMAGE") {
|
||||
runningMode = "VIDEO";
|
||||
await gestureRecognizer.setOptions({ runningMode: "VIDEO" });
|
||||
}
|
||||
|
||||
const nowInMs = Date.now();
|
||||
if (video.currentTime !== lastVideoTime) {
|
||||
lastVideoTime = video.currentTime;
|
||||
results = gestureRecognizer.recognizeForVideo(video, nowInMs);
|
||||
}
|
||||
|
||||
canvasCtx.save();
|
||||
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
|
||||
const drawingUtils = new DrawingUtils(canvasCtx);
|
||||
|
||||
canvasElement.style.height = videoHeight;
|
||||
webcamElement.style.height = videoHeight;
|
||||
canvasElement.style.width = videoWidth;
|
||||
webcamElement.style.width = videoWidth;
|
||||
|
||||
if (results && results.landmarks) {
|
||||
for (const landmarks of results.landmarks) {
|
||||
drawingUtils.drawConnectors(landmarks, GestureRecognizer.HAND_CONNECTIONS, { lineWidth: 5 });
|
||||
drawingUtils.drawLandmarks(landmarks, { lineWidth: 2 });
|
||||
}
|
||||
}
|
||||
canvasCtx.restore();
|
||||
|
||||
if (results && results.gestures && results.gestures.length > 0) {
|
||||
gestureOutput.style.display = "block";
|
||||
gestureOutput.style.width = videoWidth;
|
||||
const categoryName = results.gestures[0][0].categoryName;
|
||||
const categoryScore = (results.gestures[0][0].score * 100).toFixed(2);
|
||||
const handedness = results.handednesses[0][0].displayName;
|
||||
gestureOutput.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${categoryScore} %\n Handedness: ${handedness}`;
|
||||
} else {
|
||||
gestureOutput.style.display = "none";
|
||||
}
|
||||
|
||||
if (webcamRunning === true) {
|
||||
window.requestAnimationFrame(predictWebcam);
|
||||
}
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
5
gesture.sh
Executable file
@@ -0,0 +1,5 @@
export GLOG_minloglevel=2
export TF_CPP_MIN_LOG_LEVEL=3
python recognize_gesture.py --image ily.png --model gesture_recognizer.task 2>/dev/null
BIN
gesture_recognizer.task
Normal file
Binary file not shown.
BIN
hand_landmarker.task
Normal file
Binary file not shown.
125
hand_landmarker_cli.py
Executable file
@@ -0,0 +1,125 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Hand Landmarks on a static image using MediaPipe Tasks.
|
||||
|
||||
Usage:
|
||||
python hand_landmarker_cli.py --image hand.png --model hand_landmarker.task --max_hands 2 --out annotated.png
|
||||
|
||||
What it does:
|
||||
• Loads the MediaPipe Hand Landmarker model (.task file)
|
||||
• Runs landmark detection on a single image
|
||||
• Prints handedness and 21 landmark coords for each detected hand
|
||||
• Saves an annotated image with landmarks and connections
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
import mediapipe as mp
|
||||
|
||||
# MediaPipe Tasks API aliases
|
||||
BaseOptions = mp.tasks.BaseOptions
|
||||
HandLandmarker = mp.tasks.vision.HandLandmarker
|
||||
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
|
||||
VisionRunningMode = mp.tasks.vision.RunningMode
|
||||
|
||||
# Landmark connection topology (same as mp.solutions.hands.HAND_CONNECTIONS, copied to avoid extra dependency)
|
||||
HAND_CONNECTIONS = [
|
||||
(0,1),(1,2),(2,3),(3,4), # Thumb
|
||||
(0,5),(5,6),(6,7),(7,8), # Index
|
||||
(5,9),(9,10),(10,11),(11,12), # Middle
|
||||
(9,13),(13,14),(14,15),(15,16), # Ring
|
||||
(13,17),(17,18),(18,19),(19,20), # Pinky
|
||||
(0,17) # Palm base to pinky base
|
||||
]
|
||||
|
||||
def draw_landmarks(image_bgr: np.ndarray, landmarks_norm: list):
|
||||
"""
|
||||
Draws landmarks and connections on a BGR image.
|
||||
`landmarks_norm` is a list of normalized (x,y,z) MediaPipe landmarks (0..1).
|
||||
"""
|
||||
h, w = image_bgr.shape[:2]
|
||||
|
||||
# Convert normalized to pixel coords
|
||||
pts = []
|
||||
for lm in landmarks_norm:
|
||||
x = int(lm.x * w)
|
||||
y = int(lm.y * h)
|
||||
pts.append((x, y))
|
||||
|
||||
# Draw connections
|
||||
for a, b in HAND_CONNECTIONS:
|
||||
if 0 <= a < len(pts) and 0 <= b < len(pts):
|
||||
cv2.line(image_bgr, pts[a], pts[b], (0, 255, 0), 2, cv2.LINE_AA)
|
||||
|
||||
# Draw keypoints
|
||||
for i, (x, y) in enumerate(pts):
|
||||
cv2.circle(image_bgr, (x, y), 3, (255, 255, 255), -1, cv2.LINE_AA)
|
||||
cv2.circle(image_bgr, (x, y), 2, (0, 0, 255), -1, cv2.LINE_AA)
|
||||
|
||||
def main():
|
||||
ap = argparse.ArgumentParser(description="MediaPipe Hand Landmarker (static image)")
|
||||
ap.add_argument("--image", required=True, help="Path to an input image (e.g., hand.jpg)")
|
||||
ap.add_argument("--model", default="hand_landmarker.task", help="Path to MediaPipe .task model")
|
||||
ap.add_argument("--max_hands", type=int, default=2, help="Maximum hands to detect")
|
||||
ap.add_argument("--out", default="annotated.png", help="Output path for annotated image")
|
||||
args = ap.parse_args()
|
||||
|
||||
img_path = Path(args.image)
|
||||
if not img_path.exists():
|
||||
print(f"[ERROR] Image not found: {img_path}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
model_path = Path(args.model)
|
||||
if not model_path.exists():
|
||||
print(f"[ERROR] Model not found: {model_path}", file=sys.stderr)
|
||||
print("Download the model bundle (.task) and point --model to it.", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
# Load image for MP and for drawing
|
||||
mp_image = mp.Image.create_from_file(str(img_path))
|
||||
image_bgr = cv2.imread(str(img_path))
|
||||
if image_bgr is None:
|
||||
print(f"[ERROR] Could not read image with OpenCV: {img_path}", file=sys.stderr)
|
||||
sys.exit(3)
|
||||
|
||||
# Configure and run the landmarker
|
||||
options = HandLandmarkerOptions(
|
||||
base_options=BaseOptions(model_asset_path=str(model_path)),
|
||||
running_mode=VisionRunningMode.IMAGE,
|
||||
num_hands=args.max_hands,
|
||||
min_hand_detection_confidence=0.5,
|
||||
min_hand_presence_confidence=0.5,
|
||||
min_tracking_confidence=0.5
|
||||
)
|
||||
|
||||
with HandLandmarker.create_from_options(options) as landmarker:
|
||||
result = landmarker.detect(mp_image)
|
||||
|
||||
# Print results
|
||||
if not result.hand_landmarks:
|
||||
print("No hands detected.")
|
||||
else:
|
||||
for i, (handedness, lms, world_lms) in enumerate(
|
||||
zip(result.handedness, result.hand_landmarks, result.hand_world_landmarks)
|
||||
):
|
||||
label = handedness[0].category_name if handedness else "Unknown"
|
||||
score = handedness[0].score if handedness else 0.0
|
||||
print(f"\nHand #{i+1}: {label} (score {score:.3f})")
|
||||
for idx, lm in enumerate(lms):
|
||||
print(f" L{idx:02d}: x={lm.x:.3f} y={lm.y:.3f} z={lm.z:.3f}")
|
||||
|
||||
# Draw
|
||||
draw_landmarks(image_bgr, lms)
|
||||
# Put label
|
||||
cv2.putText(image_bgr, f"{label}", (10, 30 + i*30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0), 2, cv2.LINE_AA)
|
||||
|
||||
# Save annotated image
|
||||
cv2.imwrite(str(args.out), image_bgr)
|
||||
print(f"\nSaved annotated image to: {args.out}")
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
262
holistic.html
Normal file
@@ -0,0 +1,262 @@
|
||||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<title>MediaPipe Holistic — Main Output Only</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="https://fonts.googleapis.com/css2?family=Titillium+Web:wght@400;600&display=swap" rel="stylesheet">
|
||||
<style>
|
||||
@keyframes spin { 0% {transform: rotate(0)} 100% {transform: rotate(360deg)} }
|
||||
.abs { position: absolute; }
|
||||
a { color: white; text-decoration: none; } a:hover { color: lightblue; }
|
||||
body {
|
||||
margin: 0; color: white; font-family: 'Titillium Web', sans-serif;
|
||||
position: absolute; inset: 0; overflow: hidden; background: #000;
|
||||
}
|
||||
.container {
|
||||
position: absolute; inset: 0; background-color: #596e73; height: 100%;
|
||||
}
|
||||
.canvas-container {
|
||||
display: flex; height: 100%; width: 100%;
|
||||
justify-content: center; align-items: center;
|
||||
}
|
||||
.output_canvas { max-width: 100%; display: block; position: relative; }
|
||||
/* Hide ALL video elements so only the processed canvas is visible */
|
||||
video { display: none !important; }
|
||||
.control-panel { position: absolute; left: 10px; top: 10px; z-index: 6; }
|
||||
.loading {
|
||||
display: flex; position: absolute; inset: 0; align-items: center; justify-content: center;
|
||||
backface-visibility: hidden; opacity: 1; transition: opacity 1s; z-index: 10;
|
||||
}
|
||||
.loading .spinner {
|
||||
position: absolute; width: 120px; height: 120px; animation: spin 1s linear infinite;
|
||||
border: 32px solid #bebebe; border-top: 32px solid #3498db; border-radius: 50%;
|
||||
}
|
||||
.loading .message { font-size: x-large; }
|
||||
.loaded .loading { opacity: 0; }
|
||||
.logo { bottom: 10px; right: 20px; }
|
||||
.logo .title { color: white; font-size: 28px; }
|
||||
.shoutout { left: 0; right: 0; bottom: 40px; text-align: center; font-size: 24px; position: absolute; z-index: 4; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<!-- Hidden capture element kept for MediaPipe pipeline -->
|
||||
<video class="input_video" playsinline></video>
|
||||
|
||||
<div class="canvas-container">
|
||||
<canvas class="output_canvas" width="1280" height="720"></canvas>
|
||||
</div>
|
||||
|
||||
<!-- Loading spinner -->
|
||||
<div class="loading">
|
||||
<div class="spinner"></div>
|
||||
<div class="message">Loading</div>
|
||||
</div>
|
||||
|
||||
<!-- Logo/link -->
|
||||
<a class="abs logo" href="https://mediapipe.dev" target="_blank" rel="noreferrer">
|
||||
<div style="display:flex;align-items:center;bottom:0;right:10px;">
|
||||
<img class="logo" alt="" style="height:50px"
|
||||
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGMAAQAABQABJtqz7QAAAABJRU5ErkJggg==" />
|
||||
<span class="title" style="margin-left:8px">MediaPipe</span>
|
||||
</div>
|
||||
</a>
|
||||
|
||||
<!-- Info link -->
|
||||
<div class="shoutout">
|
||||
<div><a href="https://solutions.mediapipe.dev/holistic" target="_blank" rel="noreferrer">Click here for more info</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Control panel container -->
|
||||
<div class="control-panel"></div>
|
||||
|
||||
<!-- MediaPipe libs (globals: mpHolistic, drawingUtils, controlsNS, etc.) -->
|
||||
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/holistic/holistic.js"></script>
|
||||
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js"></script>
|
||||
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js"></script>
|
||||
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js"></script>
|
||||
|
||||
<!-- Device detector is ESM; we import it and run the app -->
|
||||
<script type="module">
|
||||
import DeviceDetector from "https://cdn.skypack.dev/device-detector-js@2.2.10";
|
||||
|
||||
function testSupport(supportedDevices) {
|
||||
const dd = new DeviceDetector();
|
||||
const d = dd.parse(navigator.userAgent);
|
||||
let ok = false;
|
||||
for (const dev of supportedDevices) {
|
||||
if (dev.client && !(new RegExp(`^${dev.client}$`)).test(d.client.name)) continue;
|
||||
if (dev.os && !(new RegExp(`^${dev.os}$`)).test(d.os.name)) continue;
|
||||
ok = true; break;
|
||||
}
|
||||
if (!ok) alert(`This demo, running on ${d.client.name}/${d.os.name}, is not well supported at this time, continue at your own risk.`);
|
||||
}
|
||||
testSupport([{ client: 'Chrome' }]);
|
||||
|
||||
const controlsNS = window;
|
||||
const mpHolistic = window;
|
||||
const drawingUtils = window;
|
||||
|
||||
const videoElement = document.getElementsByClassName('input_video')[0];
|
||||
const canvasElement = document.getElementsByClassName('output_canvas')[0];
|
||||
const controlsElement = document.getElementsByClassName('control-panel')[0];
|
||||
const canvasCtx = canvasElement.getContext('2d');
|
||||
|
||||
const fpsControl = new controlsNS.FPS();
|
||||
const spinner = document.querySelector('.loading');
|
||||
spinner.ontransitionend = () => { spinner.style.display = 'none'; };
|
||||
|
||||
function removeElements(landmarks, elements) {
|
||||
if (!landmarks) return;
|
||||
for (const e of elements) delete landmarks[e];
|
||||
}
|
||||
function removeLandmarks(results) {
|
||||
if (results.poseLandmarks) {
|
||||
removeElements(results.poseLandmarks, [0,1,2,3,4,5,6,7,8,9,10,15,16,17,18,19,20,21,22]);
|
||||
}
|
||||
}
|
||||
function connect(ctx, connectors) {
|
||||
const c = ctx.canvas;
|
||||
for (const [from, to] of connectors) {
|
||||
if (!from || !to) continue;
|
||||
if (from.visibility && to.visibility && (from.visibility < 0.1 || to.visibility < 0.1)) continue;
|
||||
ctx.beginPath();
|
||||
ctx.moveTo(from.x * c.width, from.y * c.height);
|
||||
ctx.lineTo(to.x * c.width, to.y * c.height);
|
||||
ctx.stroke();
|
||||
}
|
||||
}
|
||||
|
||||
let activeEffect = 'mask';
|
||||
|
||||
function onResults(results) {
|
||||
document.body.classList.add('loaded');
|
||||
removeLandmarks(results);
|
||||
fpsControl.tick();
|
||||
|
||||
canvasCtx.save();
|
||||
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
|
||||
|
||||
if (results.segmentationMask) {
|
||||
canvasCtx.drawImage(results.segmentationMask, 0, 0, canvasElement.width, canvasElement.height);
|
||||
if (activeEffect === 'mask' || activeEffect === 'both') {
|
||||
canvasCtx.globalCompositeOperation = 'source-in';
|
||||
canvasCtx.fillStyle = '#00FF007F';
|
||||
canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);
|
||||
} else {
|
||||
canvasCtx.globalCompositeOperation = 'source-out';
|
||||
canvasCtx.fillStyle = '#0000FF7F';
|
||||
canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);
|
||||
}
|
||||
canvasCtx.globalCompositeOperation = 'destination-atop';
|
||||
canvasCtx.drawImage(results.image, 0, 0, canvasElement.width, canvasElement.height);
|
||||
canvasCtx.globalCompositeOperation = 'source-over';
|
||||
} else {
|
||||
canvasCtx.drawImage(results.image, 0, 0, canvasElement.width, canvasElement.height);
|
||||
}
|
||||
|
||||
canvasCtx.lineWidth = 5;
|
||||
if (results.poseLandmarks) {
|
||||
if (results.rightHandLandmarks) {
|
||||
canvasCtx.strokeStyle = 'white';
|
||||
connect(canvasCtx, [[
|
||||
results.poseLandmarks[mpHolistic.POSE_LANDMARKS.RIGHT_ELBOW],
|
||||
results.rightHandLandmarks[0]
|
||||
]]);
|
||||
}
|
||||
if (results.leftHandLandmarks) {
|
||||
canvasCtx.strokeStyle = 'white';
|
||||
connect(canvasCtx, [[
|
||||
results.poseLandmarks[mpHolistic.POSE_LANDMARKS.LEFT_ELBOW],
|
||||
results.leftHandLandmarks[0]
|
||||
]]);
|
||||
}
|
||||
}
|
||||
|
||||
drawingUtils.drawConnectors(canvasCtx, results.poseLandmarks, mpHolistic.POSE_CONNECTIONS, { color: 'white' });
|
||||
drawingUtils.drawLandmarks(
|
||||
canvasCtx,
|
||||
Object.values(mpHolistic.POSE_LANDMARKS_LEFT).map(i => results.poseLandmarks?.[i]),
|
||||
{ visibilityMin: 0.65, color: 'white', fillColor: 'rgb(255,138,0)' }
|
||||
);
|
||||
drawingUtils.drawLandmarks(
|
||||
canvasCtx,
|
||||
Object.values(mpHolistic.POSE_LANDMARKS_RIGHT).map(i => results.poseLandmarks?.[i]),
|
||||
{ visibilityMin: 0.65, color: 'white', fillColor: 'rgb(0,217,231)' }
|
||||
);
|
||||
|
||||
drawingUtils.drawConnectors(canvasCtx, results.rightHandLandmarks, mpHolistic.HAND_CONNECTIONS, { color: 'white' });
|
||||
drawingUtils.drawLandmarks(canvasCtx, results.rightHandLandmarks, {
|
||||
color: 'white', fillColor: 'rgb(0,217,231)', lineWidth: 2,
|
||||
radius: (data) => drawingUtils.lerp(data.from?.z ?? 0, -0.15, 0.1, 10, 1)
|
||||
});
|
||||
drawingUtils.drawConnectors(canvasCtx, results.leftHandLandmarks, mpHolistic.HAND_CONNECTIONS, { color: 'white' });
|
||||
drawingUtils.drawLandmarks(canvasCtx, results.leftHandLandmarks, {
|
||||
color: 'white', fillColor: 'rgb(255,138,0)', lineWidth: 2,
|
||||
radius: (data) => drawingUtils.lerp(data.from?.z ?? 0, -0.15, 0.1, 10, 1)
|
||||
});
|
||||
|
||||
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_TESSELATION, { color: '#C0C0C070', lineWidth: 1 });
|
||||
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_RIGHT_EYE, { color: 'rgb(0,217,231)' });
|
||||
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_RIGHT_EYEBROW, { color: 'rgb(0,217,231)' });
|
||||
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_LEFT_EYE, { color: 'rgb(255,138,0)' });
|
||||
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_LEFT_EYEBROW, { color: 'rgb(255,138,0)' });
|
||||
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_FACE_OVAL, { color: '#E0E0E0', lineWidth: 5 });
|
||||
drawingUtils.drawConnectors(canvasCtx, results.faceLandmarks, mpHolistic.FACEMESH_LIPS, { color: '#E0E0E0', lineWidth: 5 });
|
||||
|
||||
canvasCtx.restore();
|
||||
}
|
||||
|
||||
const holistic = new mpHolistic.Holistic({
|
||||
locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/holistic@${mpHolistic.VERSION}/${file}`
|
||||
});
|
||||
holistic.onResults(onResults);
|
||||
|
||||
new controlsNS.ControlPanel(controlsElement, {
|
||||
selfieMode: true,
|
||||
modelComplexity: 1,
|
||||
smoothLandmarks: true,
|
||||
enableSegmentation: false,
|
||||
smoothSegmentation: true,
|
||||
minDetectionConfidence: 0.5,
|
||||
minTrackingConfidence: 0.5,
|
||||
effect: 'background',
|
||||
})
|
||||
.add([
|
||||
new controlsNS.StaticText({ title: 'MediaPipe Holistic' }),
|
||||
fpsControl,
|
||||
new controlsNS.Toggle({ title: 'Selfie Mode', field: 'selfieMode' }),
|
||||
new controlsNS.SourcePicker({
|
||||
onSourceChanged: () => { holistic.reset(); },
|
||||
onFrame: async (input, size) => {
|
||||
const aspect = size.height / size.width;
|
||||
let width, height;
|
||||
if (window.innerWidth > window.innerHeight) {
|
||||
height = window.innerHeight; width = height / aspect;
|
||||
} else {
|
||||
width = window.innerWidth; height = width * aspect;
|
||||
}
|
||||
canvasElement.width = width;
|
||||
canvasElement.height = height;
|
||||
await holistic.send({ image: input });
|
||||
},
|
||||
}),
|
||||
new controlsNS.Slider({ title: 'Model Complexity', field: 'modelComplexity', discrete: ['Lite', 'Full', 'Heavy'] }),
|
||||
new controlsNS.Toggle({ title: 'Smooth Landmarks', field: 'smoothLandmarks' }),
|
||||
new controlsNS.Toggle({ title: 'Enable Segmentation', field: 'enableSegmentation' }),
|
||||
new controlsNS.Toggle({ title: 'Smooth Segmentation', field: 'smoothSegmentation' }),
|
||||
new controlsNS.Slider({ title: 'Min Detection Confidence', field: 'minDetectionConfidence', range: [0, 1], step: 0.01 }),
|
||||
new controlsNS.Slider({ title: 'Min Tracking Confidence', field: 'minTrackingConfidence', range: [0, 1], step: 0.01 }),
|
||||
new controlsNS.Slider({ title: 'Effect', field: 'effect', discrete: { background: 'Background', mask: 'Foreground' } }),
|
||||
])
|
||||
.on(x => {
|
||||
const options = x;
|
||||
videoElement.classList.toggle('selfie', !!options.selfieMode);
|
||||
activeEffect = x['effect'];
|
||||
holistic.setOptions(options);
|
||||
});
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
BIN
landmarks.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 977 KiB |
268
marker.html
Normal file
@@ -0,0 +1,268 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<title>MediaPipe Hand Landmarker — Single File Demo</title>
|
||||
|
||||
<!-- Material Components (for the button styling) -->
|
||||
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet">
|
||||
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
|
||||
|
||||
<!-- Drawing utils (provides drawConnectors, drawLandmarks) -->
|
||||
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
|
||||
<!-- Hands (provides HAND_CONNECTIONS constant) -->
|
||||
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js" crossorigin="anonymous"></script>
|
||||
|
||||
<style>
|
||||
/* Inline CSS from the CodePen, cleaned for single-file use */
|
||||
body {
|
||||
font-family: Roboto, Arial, sans-serif;
|
||||
margin: 2em;
|
||||
color: #3d3d3d;
|
||||
--mdc-theme-primary: #007f8b;
|
||||
--mdc-theme-on-primary: #f1f3f4;
|
||||
}
|
||||
|
||||
h1 { color: #007f8b; }
|
||||
h2 { clear: both; }
|
||||
em { font-weight: bold; }
|
||||
|
||||
video {
|
||||
clear: both;
|
||||
display: block;
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
}
|
||||
|
||||
section {
|
||||
opacity: 1;
|
||||
transition: opacity 500ms ease-in-out;
|
||||
}
|
||||
|
||||
.removed { display: none; }
|
||||
.invisible { opacity: 0.2; }
|
||||
|
||||
.note {
|
||||
font-style: italic;
|
||||
font-size: 130%;
|
||||
}
|
||||
|
||||
.videoView, .detectOnClick {
|
||||
position: relative;
|
||||
float: left;
|
||||
width: 48%;
|
||||
margin: 2% 1%;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.videoView p, .detectOnClick p {
|
||||
position: absolute;
|
||||
padding: 5px;
|
||||
background-color: #007f8b;
|
||||
color: #fff;
|
||||
border: 1px dashed rgba(255, 255, 255, 0.7);
|
||||
z-index: 2;
|
||||
font-size: 12px;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.highlighter {
|
||||
background: rgba(0, 255, 0, 0.25);
|
||||
border: 1px dashed #fff;
|
||||
z-index: 1;
|
||||
position: absolute;
|
||||
}
|
||||
|
||||
.canvas, .output_canvas {
|
||||
z-index: 1;
|
||||
position: absolute;
|
||||
pointer-events: none;
|
||||
}
|
||||
|
||||
.output_canvas {
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
}
|
||||
|
||||
.detectOnClick { z-index: 0; }
|
||||
.detectOnClick img { width: 100%; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<section id="demos" class="invisible">
<h2>Demo: Webcam continuous hands landmarks detection</h2>
|
||||
<p>Hold your hand in front of your webcam to get real-time hand landmarker detection.<br>Click <b>ENABLE WEBCAM</b> below and grant access to the webcam if prompted.</p>
|
||||
|
||||
<div id="liveView" class="videoView">
|
||||
<button id="webcamButton" class="mdc-button mdc-button--raised">
|
||||
<span class="mdc-button__ripple"></span>
|
||||
<span class="mdc-button__label">ENABLE WEBCAM</span>
|
||||
</button>
|
||||
<div style="position: relative;">
|
||||
<video id="webcam" style="position: absolute; left: 0; top: 0;" autoplay playsinline></video>
|
||||
<canvas class="output_canvas" id="output_canvas" style="left: 0; top: 0;"></canvas>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<script type="module">
|
||||
// Import the Tasks Vision ESM build
|
||||
import { HandLandmarker, FilesetResolver } from "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0";
|
||||
|
||||
const demosSection = document.getElementById("demos");
|
||||
|
||||
let handLandmarker;
|
||||
let runningMode = "IMAGE";
|
||||
let enableWebcamButton;
|
||||
let webcamRunning = false;
|
||||
|
||||
// Load the model and enable the demos section
|
||||
const createHandLandmarker = async () => {
|
||||
const vision = await FilesetResolver.forVisionTasks(
|
||||
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0/wasm"
|
||||
);
|
||||
handLandmarker = await HandLandmarker.createFromOptions(vision, {
|
||||
baseOptions: {
|
||||
modelAssetPath: "https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task",
|
||||
delegate: "GPU"
|
||||
},
|
||||
runningMode,
|
||||
numHands: 2
|
||||
});
|
||||
demosSection.classList.remove("invisible");
|
||||
};
|
||||
createHandLandmarker();
|
||||
|
||||
/********************************************************************
|
||||
// Demo 1: Click images to run landmark detection
|
||||
********************************************************************/
|
||||
const imageContainers = document.getElementsByClassName("detectOnClick");
|
||||
for (let i = 0; i < imageContainers.length; i++) {
|
||||
const img = imageContainers[i].children[0];
|
||||
img.addEventListener("click", handleClick);
|
||||
}
|
||||
|
||||
async function handleClick(event) {
|
||||
if (!handLandmarker) {
|
||||
console.log("Wait for handLandmarker to load before clicking!");
|
||||
return;
|
||||
}
|
||||
if (runningMode === "VIDEO") {
|
||||
runningMode = "IMAGE";
|
||||
await handLandmarker.setOptions({ runningMode: "IMAGE" });
|
||||
}
|
||||
|
||||
const container = event.target.parentNode;
|
||||
// Remove old overlays
|
||||
const old = container.getElementsByClassName("canvas");
|
||||
for (let i = old.length - 1; i >= 0; i--) {
|
||||
old[i].parentNode.removeChild(old[i]);
|
||||
}
|
||||
|
||||
// Run detection
|
||||
const result = handLandmarker.detect(event.target);
|
||||
|
||||
// Create overlay canvas aligned to the image element
|
||||
const canvas = document.createElement("canvas");
|
||||
canvas.className = "canvas";
|
||||
canvas.width = event.target.naturalWidth;
|
||||
canvas.height = event.target.naturalHeight;
|
||||
canvas.style.left = "0px";
|
||||
canvas.style.top = "0px";
|
||||
canvas.style.width = event.target.width + "px";
|
||||
canvas.style.height = event.target.height + "px";
|
||||
container.appendChild(canvas);
|
||||
|
||||
const ctx = canvas.getContext("2d");
|
||||
if (result && result.landmarks) {
|
||||
for (const landmarks of result.landmarks) {
|
||||
// drawConnectors and drawLandmarks are provided by drawing_utils.js
|
||||
// HAND_CONNECTIONS is provided by hands.js
|
||||
drawConnectors(ctx, landmarks, HAND_CONNECTIONS, {
|
||||
color: "#00FF00",
|
||||
lineWidth: 5
|
||||
});
|
||||
drawLandmarks(ctx, landmarks, { color: "#FF0000", lineWidth: 1 });
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/********************************************************************
|
||||
// Demo 2: Webcam stream detection
|
||||
********************************************************************/
|
||||
const video = document.getElementById("webcam");
|
||||
const canvasElement = document.getElementById("output_canvas");
|
||||
const canvasCtx = canvasElement.getContext("2d");
|
||||
|
||||
const hasGetUserMedia = () => !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
|
||||
|
||||
if (hasGetUserMedia()) {
|
||||
enableWebcamButton = document.getElementById("webcamButton");
|
||||
enableWebcamButton.addEventListener("click", enableCam);
|
||||
} else {
|
||||
console.warn("getUserMedia() is not supported by your browser");
|
||||
}
|
||||
|
||||
function enableCam() {
|
||||
if (!handLandmarker) {
|
||||
console.log("Wait! HandLandmarker not loaded yet.");
|
||||
return;
|
||||
}
|
||||
|
||||
webcamRunning = !webcamRunning;
|
||||
enableWebcamButton.innerText = webcamRunning ? "DISABLE PREDICTIONS" : "ENABLE PREDICTIONS";
|
||||
|
||||
if (!webcamRunning) return;
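// Note: turning predictions off only stops the drawing loop above; the camera stream itself keeps running.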
|
||||
|
||||
const constraints = { video: true };
|
||||
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
|
||||
video.srcObject = stream;
|
||||
video.addEventListener("loadeddata", predictWebcam);
|
||||
});
|
||||
}
|
||||
|
||||
let lastVideoTime = -1;
|
||||
let results;
|
||||
|
||||
async function predictWebcam() {
|
||||
// Match canvas to the video size
|
||||
canvasElement.style.width = video.videoWidth + "px";
|
||||
canvasElement.style.height = video.videoHeight + "px";
|
||||
canvasElement.width = video.videoWidth;
|
||||
canvasElement.height = video.videoHeight;
|
||||
|
||||
// Switch to VIDEO mode for streaming
|
||||
if (runningMode === "IMAGE") {
|
||||
runningMode = "VIDEO";
|
||||
await handLandmarker.setOptions({ runningMode: "VIDEO" });
|
||||
}
|
||||
|
||||
const startTimeMs = performance.now();
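// Only run detection when the video has advanced to a new frame; otherwise the previous results are reused.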
|
||||
if (lastVideoTime !== video.currentTime) {
|
||||
lastVideoTime = video.currentTime;
|
||||
results = handLandmarker.detectForVideo(video, startTimeMs);
|
||||
}
|
||||
|
||||
canvasCtx.save();
|
||||
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
|
||||
|
||||
if (results && results.landmarks) {
|
||||
for (const landmarks of results.landmarks) {
|
||||
drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS, {
|
||||
color: "#00FF00",
|
||||
lineWidth: 5
|
||||
});
|
||||
drawLandmarks(canvasCtx, landmarks, { color: "#FF0000", lineWidth: 2 });
|
||||
}
|
||||
}
|
||||
canvasCtx.restore();
|
||||
|
||||
if (webcamRunning) {
|
||||
window.requestAnimationFrame(predictWebcam);
|
||||
}
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
2
more_info.txt
Normal file
2
more_info.txt
Normal file
@@ -0,0 +1,2 @@
https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker
https://ai.google.dev/edge/mediapipe/solutions/customization/gesture_recognizer
298
posture.html
Normal file
298
posture.html
Normal file
@@ -0,0 +1,298 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="Cache-control" content="no-cache, no-store, must-revalidate">
|
||||
<meta http-equiv="Pragma" content="no-cache">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
|
||||
<title>Pose Landmarker — Single File Demo</title>
|
||||
|
||||
<!-- Material Components (for the button styling) -->
|
||||
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet">
|
||||
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
|
||||
|
||||
<style>
|
||||
/* Copyright 2023 The MediaPipe Authors.
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License. */
|
||||
|
||||
/* NOTE: The original CSS used `@use "@material";` which is a Sass directive.
|
||||
That's not valid in plain CSS, so it's removed here. */
|
||||
|
||||
body {
|
||||
font-family: Roboto, system-ui, -apple-system, Segoe UI, Arial, sans-serif;
|
||||
margin: 2em;
|
||||
color: #3d3d3d;
|
||||
--mdc-theme-primary: #007f8b;
|
||||
--mdc-theme-on-primary: #f1f3f4;
|
||||
}
|
||||
|
||||
h1 { color: #007f8b; }
|
||||
h2 { clear: both; }
|
||||
|
||||
em { font-weight: bold; }
|
||||
|
||||
video {
|
||||
clear: both;
|
||||
display: block;
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
}
|
||||
|
||||
section {
|
||||
opacity: 1;
|
||||
transition: opacity 500ms ease-in-out;
|
||||
}
|
||||
|
||||
header, footer { clear: both; }
|
||||
|
||||
.removed { display: none; }
|
||||
.invisible { opacity: 0.2; }
|
||||
|
||||
.note {
|
||||
font-style: italic;
|
||||
font-size: 130%;
|
||||
}
|
||||
|
||||
.videoView, .detectOnClick {
|
||||
position: relative;
|
||||
float: left;
|
||||
width: 48%;
|
||||
margin: 2% 1%;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.videoView p, .detectOnClick p {
|
||||
position: absolute;
|
||||
padding: 5px;
|
||||
background-color: #007f8b;
|
||||
color: #fff;
|
||||
border: 1px dashed rgba(255, 255, 255, 0.7);
|
||||
z-index: 2;
|
||||
font-size: 12px;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.highlighter {
|
||||
background: rgba(0, 255, 0, 0.25);
|
||||
border: 1px dashed #fff;
|
||||
z-index: 1;
|
||||
position: absolute;
|
||||
}
|
||||
|
||||
.canvas {
|
||||
z-index: 1;
|
||||
position: absolute;
|
||||
pointer-events: none;
|
||||
}
|
||||
|
||||
.output_canvas {
|
||||
transform: rotateY(180deg);
|
||||
-webkit-transform: rotateY(180deg);
|
||||
-moz-transform: rotateY(180deg);
|
||||
}
|
||||
|
||||
.detectOnClick { z-index: 0; }
|
||||
.detectOnClick img { width: 100%; }
|
||||
|
||||
/* Simple layout fix for the video/canvas wrapper */
|
||||
.video-wrapper {
|
||||
position: relative;
|
||||
width: 1280px;
|
||||
max-width: 100%;
|
||||
aspect-ratio: 16 / 9;
|
||||
}
|
||||
.video-wrapper video,
|
||||
.video-wrapper canvas {
|
||||
position: absolute;
|
||||
top: 0;
|
||||
left: 0;
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<h1>Pose detection using the MediaPipe PoseLandmarker task</h1>
|
||||
|
||||
<section id="demos" class="invisible">
|
||||
<h2>Demo: Webcam continuous pose landmarks detection</h2>
|
||||
<p>Stand in front of your webcam to get real-time pose landmarker detection.<br>Click <b>ENABLE WEBCAM</b> below and grant access to the webcam if prompted.</p>
|
||||
|
||||
<div id="liveView" class="videoView">
|
||||
<button id="webcamButton" class="mdc-button mdc-button--raised">
|
||||
<span class="mdc-button__ripple"></span>
|
||||
<span class="mdc-button__label">ENABLE WEBCAM</span>
|
||||
</button>
|
||||
<div class="video-wrapper">
|
||||
<video id="webcam" autoplay playsinline></video>
|
||||
<canvas class="output_canvas" id="output_canvas" width="1280" height="720"></canvas>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<script type="module">
|
||||
// Copyright 2023 The MediaPipe Authors.
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
|
||||
import {
|
||||
PoseLandmarker,
|
||||
FilesetResolver,
|
||||
DrawingUtils
|
||||
} from "https://cdn.skypack.dev/@mediapipe/tasks-vision@0.10.0";
|
||||
|
||||
const demosSection = document.getElementById("demos");
|
||||
|
||||
let poseLandmarker = undefined;
|
||||
let runningMode = "IMAGE";
|
||||
let enableWebcamButton;
|
||||
let webcamRunning = false;
|
||||
const videoHeight = "360px";
|
||||
const videoWidth = "480px";
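// Fixed display size applied to the video and canvas while streaming (see predictWebcam below).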
|
||||
|
||||
// Load the Vision WASM and the Pose Landmarker model
|
||||
const createPoseLandmarker = async () => {
|
||||
const vision = await FilesetResolver.forVisionTasks(
|
||||
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0/wasm"
|
||||
);
|
||||
poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
|
||||
baseOptions: {
|
||||
modelAssetPath: "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task",
|
||||
delegate: "GPU"
|
||||
},
|
||||
runningMode: runningMode,
|
||||
numPoses: 2
|
||||
});
|
||||
demosSection.classList.remove("invisible");
|
||||
};
|
||||
createPoseLandmarker();
|
||||
|
||||
/********************************************************************
|
||||
// Demo 1: Click an image to detect pose and draw landmarks.
|
||||
********************************************************************/
|
||||
const imageContainers = document.getElementsByClassName("detectOnClick");
|
||||
for (let i = 0; i < imageContainers.length; i++) {
|
||||
imageContainers[i].children[0].addEventListener("click", handleClick);
|
||||
}
|
||||
|
||||
async function handleClick(event) {
|
||||
if (!poseLandmarker) {
|
||||
console.log("Wait for poseLandmarker to load before clicking!");
|
||||
return;
|
||||
}
|
||||
|
||||
if (runningMode === "VIDEO") {
|
||||
runningMode = "IMAGE";
|
||||
await poseLandmarker.setOptions({ runningMode: "IMAGE" });
|
||||
}
|
||||
|
||||
// Remove old overlays
|
||||
const allCanvas = event.target.parentNode.getElementsByClassName("canvas");
|
||||
for (let i = allCanvas.length - 1; i >= 0; i--) {
|
||||
const n = allCanvas[i];
|
||||
n.parentNode.removeChild(n);
|
||||
}
|
||||
|
||||
poseLandmarker.detect(event.target, (result) => {
|
||||
const canvas = document.createElement("canvas");
|
||||
canvas.setAttribute("class", "canvas");
|
||||
canvas.setAttribute("width", event.target.naturalWidth + "px");
|
||||
canvas.setAttribute("height", event.target.naturalHeight + "px");
|
||||
canvas.style =
|
||||
"left: 0px; top: 0px; width: " + event.target.width + "px; height: " + event.target.height + "px;";
|
||||
|
||||
event.target.parentNode.appendChild(canvas);
|
||||
const canvasCtx = canvas.getContext("2d");
|
||||
const drawingUtils = new DrawingUtils(canvasCtx);
|
||||
|
||||
for (const landmark of result.landmarks) {
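// Dot radius is interpolated from each landmark's z value: points closer to the camera (more negative z) are drawn larger.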
|
||||
drawingUtils.drawLandmarks(landmark, {
|
||||
radius: (data) => DrawingUtils.lerp((data.from && data.from.z) ?? 0, -0.15, 0.1, 5, 1)
|
||||
});
|
||||
drawingUtils.drawConnectors(landmark, PoseLandmarker.POSE_CONNECTIONS);
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
/********************************************************************
|
||||
// Demo 2: Live webcam pose detection.
|
||||
********************************************************************/
|
||||
const video = document.getElementById("webcam");
|
||||
const canvasElement = document.getElementById("output_canvas");
|
||||
const canvasCtx = canvasElement.getContext("2d");
|
||||
const drawingUtils = new DrawingUtils(canvasCtx);
|
||||
|
||||
const hasGetUserMedia = () => !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
|
||||
|
||||
if (hasGetUserMedia()) {
|
||||
enableWebcamButton = document.getElementById("webcamButton");
|
||||
enableWebcamButton.addEventListener("click", enableCam);
|
||||
} else {
|
||||
console.warn("getUserMedia() is not supported by your browser");
|
||||
}
|
||||
|
||||
function enableCam() {
|
||||
if (!poseLandmarker) {
|
||||
console.log("Wait! poseLandmarker not loaded yet.");
|
||||
return;
|
||||
}
|
||||
|
||||
if (webcamRunning === true) {
|
||||
webcamRunning = false;
|
||||
enableWebcamButton.innerText = "ENABLE PREDICTIONS";
|
||||
} else {
|
||||
webcamRunning = true;
|
||||
enableWebcamButton.innerText = "DISABLE PREDICTIONS";
|
||||
}
|
||||
|
||||
// When predictions were just turned off, keep the existing stream and stop here.
if (webcamRunning === false) return;

const constraints = { video: true };
|
||||
|
||||
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
|
||||
video.srcObject = stream;
|
||||
video.addEventListener("loadeddata", predictWebcam);
|
||||
});
|
||||
}
|
||||
|
||||
let lastVideoTime = -1;
|
||||
async function predictWebcam() {
|
||||
canvasElement.style.height = videoHeight;
|
||||
video.style.height = videoHeight;
|
||||
canvasElement.style.width = videoWidth;
|
||||
video.style.width = videoWidth;
|
||||
|
||||
if (runningMode === "IMAGE") {
|
||||
runningMode = "VIDEO";
|
||||
await poseLandmarker.setOptions({ runningMode: "VIDEO" });
|
||||
}
|
||||
|
||||
const startTimeMs = performance.now();
|
||||
if (lastVideoTime !== video.currentTime) {
|
||||
lastVideoTime = video.currentTime;
|
||||
poseLandmarker.detectForVideo(video, startTimeMs, (result) => {
|
||||
canvasCtx.save();
|
||||
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
|
||||
for (const landmark of result.landmarks) {
|
||||
drawingUtils.drawLandmarks(landmark, {
|
||||
radius: (data) => DrawingUtils.lerp((data.from && data.from.z) ?? 0, -0.15, 0.1, 5, 1)
|
||||
});
|
||||
drawingUtils.drawConnectors(landmark, PoseLandmarker.POSE_CONNECTIONS);
|
||||
}
|
||||
canvasCtx.restore();
|
||||
});
|
||||
}
|
||||
|
||||
if (webcamRunning === true) {
|
||||
window.requestAnimationFrame(predictWebcam);
|
||||
}
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
151
process_mp4_facial.py
Executable file
151
process_mp4_facial.py
Executable file
@@ -0,0 +1,151 @@
|
||||
import cv2
|
||||
import mediapipe as mp
|
||||
from mediapipe.tasks import python
|
||||
from mediapipe.tasks.python import vision
|
||||
import numpy as np
|
||||
from mediapipe.framework.formats import landmark_pb2
|
||||
import argparse
|
||||
import os
|
||||
import csv
|
||||
|
||||
# --- Helper function to create the landmark-to-feature map ---
|
||||
def create_landmark_map():
|
||||
"""Creates a mapping from landmark index to facial feature name."""
|
||||
landmark_map = {}
|
||||
|
||||
# Define the connection groups from MediaPipe's face_mesh solutions
|
||||
connection_groups = {
|
||||
'lips': mp.solutions.face_mesh.FACEMESH_LIPS,
|
||||
'left_eye': mp.solutions.face_mesh.FACEMESH_LEFT_EYE,
|
||||
'right_eye': mp.solutions.face_mesh.FACEMESH_RIGHT_EYE,
|
||||
'left_eyebrow': mp.solutions.face_mesh.FACEMESH_LEFT_EYEBROW,
|
||||
'right_eyebrow': mp.solutions.face_mesh.FACEMESH_RIGHT_EYEBROW,
|
||||
'face_oval': mp.solutions.face_mesh.FACEMESH_FACE_OVAL,
|
||||
'left_iris': mp.solutions.face_mesh.FACEMESH_LEFT_IRIS,
|
||||
'right_iris': mp.solutions.face_mesh.FACEMESH_RIGHT_IRIS,
|
||||
}
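# Each FACEMESH_* constant is a set of (start_index, end_index) connection pairs,
# so both endpoints of every connection are labeled with the part name.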
|
||||
|
||||
# Populate the map by iterating through the connection groups
|
||||
for part_name, connections in connection_groups.items():
|
||||
for connection in connections:
|
||||
landmark_map[connection[0]] = part_name
|
||||
landmark_map[connection[1]] = part_name
|
||||
|
||||
return landmark_map
|
||||
|
||||
# --- Helper Function to Draw Landmarks ---
|
||||
def draw_landmarks_on_image(rgb_image, detection_result):
|
||||
"""Draws face landmarks on a single image frame."""
|
||||
face_landmarks_list = detection_result.face_landmarks
|
||||
annotated_image = np.copy(rgb_image)
|
||||
|
||||
# Loop through the detected faces to visualize.
|
||||
for face_landmarks in face_landmarks_list:
|
||||
face_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
|
||||
face_landmarks_proto.landmark.extend([
|
||||
landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in face_landmarks
|
||||
])
|
||||
|
||||
mp.solutions.drawing_utils.draw_landmarks(
|
||||
image=annotated_image,
|
||||
landmark_list=face_landmarks_proto,
|
||||
connections=mp.solutions.face_mesh.FACEMESH_TESSELATION,
|
||||
landmark_drawing_spec=None,
|
||||
connection_drawing_spec=mp.solutions.drawing_styles
|
||||
.get_default_face_mesh_tesselation_style())
|
||||
mp.solutions.drawing_utils.draw_landmarks(
|
||||
image=annotated_image,
|
||||
landmark_list=face_landmarks_proto,
|
||||
connections=mp.solutions.face_mesh.FACEMESH_CONTOURS,
|
||||
landmark_drawing_spec=None,
|
||||
connection_drawing_spec=mp.solutions.drawing_styles
|
||||
.get_default_face_mesh_contours_style())
|
||||
mp.solutions.drawing_utils.draw_landmarks(
|
||||
image=annotated_image,
|
||||
landmark_list=face_landmarks_proto,
|
||||
connections=mp.solutions.face_mesh.FACEMESH_IRISES,
|
||||
landmark_drawing_spec=None,
|
||||
connection_drawing_spec=mp.solutions.drawing_styles
|
||||
.get_default_face_mesh_iris_connections_style())
|
||||
|
||||
return annotated_image
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description='Process a video to detect and draw face landmarks.')
|
||||
parser.add_argument('input_video', help='The path to the input video file.')
|
||||
args = parser.parse_args()
|
||||
|
||||
input_video_path = args.input_video
|
||||
base_name, extension = os.path.splitext(input_video_path)
|
||||
output_video_path = f"{base_name}_annotated{extension}"
|
||||
output_csv_path = f"{base_name}_landmarks.csv"
|
||||
|
||||
# --- Create the landmark map ---
|
||||
landmark_to_part_map = create_landmark_map()
|
||||
|
||||
# --- Configuration & Setup ---
|
||||
model_path = 'face_landmarker.task'
|
||||
base_options = python.BaseOptions(model_asset_path=model_path)
|
||||
options = vision.FaceLandmarkerOptions(base_options=base_options,
|
||||
output_face_blendshapes=True,
|
||||
output_facial_transformation_matrixes=True,
|
||||
num_faces=1)
|
||||
detector = vision.FaceLandmarker.create_from_options(options)
|
||||
|
||||
# --- Video and CSV Setup ---
|
||||
cap = cv2.VideoCapture(input_video_path)
|
||||
if not cap.isOpened():
|
||||
print(f"Error: Could not open video file {input_video_path}")
|
||||
return
|
||||
|
||||
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
fps = int(cap.get(cv2.CAP_PROP_FPS))
|
||||
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
|
||||
out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))
|
||||
|
||||
# Open CSV file for writing
|
||||
with open(output_csv_path, 'w', newline='') as csvfile:
|
||||
csv_writer = csv.writer(csvfile)
|
||||
# Write the CSV header row
|
||||
csv_writer.writerow(['frame', 'face', 'landmark_index', 'face_part', 'x', 'y', 'z'])
|
||||
|
||||
print(f"Processing video: {input_video_path} 📹")
|
||||
frame_number = 0
|
||||
while cap.isOpened():
|
||||
ret, frame = cap.read()
|
||||
if not ret:
|
||||
break
|
||||
|
||||
frame_number += 1
|
||||
rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
|
||||
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)
|
||||
detection_result = detector.detect(mp_image)
|
||||
|
||||
# Write landmark data to CSV
|
||||
if detection_result.face_landmarks:
|
||||
for face_index, face_landmarks in enumerate(detection_result.face_landmarks):
|
||||
for landmark_index, landmark in enumerate(face_landmarks):
|
||||
# Look up the face part name from the map
|
||||
face_part = landmark_to_part_map.get(landmark_index, 'unknown')
|
||||
# Write the row, including the face_part column
|
||||
csv_writer.writerow([frame_number, face_index, landmark_index, face_part, landmark.x, landmark.y, landmark.z])
|
||||
|
||||
# Draw landmarks on the frame for the video
|
||||
annotated_frame = draw_landmarks_on_image(rgb_frame, detection_result)
|
||||
bgr_annotated_frame = cv2.cvtColor(annotated_frame, cv2.COLOR_RGB2BGR)
|
||||
out.write(bgr_annotated_frame)
|
||||
|
||||
# Release everything when the job is finished
|
||||
cap.release()
|
||||
out.release()
|
||||
cv2.destroyAllWindows()
|
||||
|
||||
print(f"\n✅ Processing complete.")
|
||||
print(f"Annotated video saved to: {output_video_path}")
|
||||
print(f"Landmarks CSV saved to: {output_csv_path}")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
214
process_mp4_holistic.py
Executable file
214
process_mp4_holistic.py
Executable file
@@ -0,0 +1,214 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
process_mp4_holistic.py
|
||||
Process an MP4 with MediaPipe Holistic:
|
||||
- Saves annotated video
|
||||
- Exports CSV of face/pose/hand landmarks per frame
|
||||
|
||||
Usage:
|
||||
python process_mp4_holistic.py /path/to/input.mp4
|
||||
python process_mp4_holistic.py /path/to/input.mp4 --out-video out.mp4 --out-csv out.csv --show
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import csv
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import cv2
|
||||
import mediapipe as mp
|
||||
|
||||
mp_holistic = mp.solutions.holistic
|
||||
mp_drawing = mp.solutions.drawing_utils
|
||||
mp_styles = mp.solutions.drawing_styles
|
||||
|
||||
|
||||
def parse_args():
|
||||
p = argparse.ArgumentParser(description="Run MediaPipe Holistic on an MP4 and export annotated video + CSV landmarks.")
|
||||
p.add_argument("input", help="Input .mp4 file")
|
||||
p.add_argument("--out-video", help="Output annotated MP4 path (default: <input>_annotated.mp4)")
|
||||
p.add_argument("--out-csv", help="Output CSV path for landmarks (default: <input>_landmarks.csv)")
|
||||
p.add_argument("--model-complexity", type=int, default=1, choices=[0, 1, 2], help="Holistic model complexity")
|
||||
p.add_argument("--no-smooth", action="store_true", help="Disable smoothing (smoothing is ON by default)")
|
||||
p.add_argument("--refine-face", action="store_true", help="Refine face landmarks (iris, lips).")
|
||||
p.add_argument("--show", action="store_true", help="Show preview window while processing")
|
||||
return p.parse_args()
|
||||
|
||||
|
||||
def open_video_writer(cap, out_path):
|
||||
# Properties from input
|
||||
fps = cap.get(cv2.CAP_PROP_FPS)
|
||||
if fps is None or fps <= 0:
|
||||
fps = 30.0 # sensible fallback
|
||||
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
|
||||
# Writer
|
||||
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
|
||||
writer = cv2.VideoWriter(out_path, fourcc, float(fps), (width, height))
|
||||
if not writer.isOpened():
|
||||
raise RuntimeError(f"Failed to open VideoWriter at {out_path}")
|
||||
return writer, fps, (width, height)
|
||||
|
||||
|
||||
def write_landmarks_to_csv(writer, frame_idx, ts_ms, kind, landmarks, world_landmarks=None, handedness=None):
|
||||
"""
|
||||
landmarks: NormalizedLandmarkList (x,y,z, visibility?) -> face/hand have no visibility; pose has visibility.
|
||||
world_landmarks: LandmarkList in meters (optional, pose_world_landmarks available).
|
||||
handedness: "Left"|"Right"|None (we label hand sets by field name; not a confidence score here)
|
||||
"""
|
||||
if not landmarks:
|
||||
return
|
||||
|
||||
# index by position; world coords may be absent or differ in length
|
||||
wl = world_landmarks.landmark if world_landmarks and getattr(world_landmarks, "landmark", None) else None
|
||||
|
||||
for i, lm in enumerate(landmarks.landmark):
|
||||
world_x = world_y = world_z = ""
|
||||
if wl and i < len(wl):
|
||||
world_x, world_y, world_z = wl[i].x, wl[i].y, wl[i].z
|
||||
|
||||
# Some landmark types (pose) include visibility; others (face/hands) don't
|
||||
vis = getattr(lm, "visibility", "")
|
||||
writer.writerow([
|
||||
frame_idx,
|
||||
int(ts_ms),
|
||||
kind, # e.g., face, pose, left_hand, right_hand
|
||||
i,
|
||||
lm.x, lm.y, lm.z,
|
||||
vis,
|
||||
"", # presence not provided in Holistic landmarks
|
||||
world_x, world_y, world_z,
|
||||
handedness or ""
|
||||
])
|
||||
|
||||
|
||||
def main():
|
||||
args = parse_args()
|
||||
in_path = Path(args.input)
|
||||
if not in_path.exists():
|
||||
print(f"Input not found: {in_path}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
out_video = Path(args.out_video) if args.out_video else in_path.with_name(in_path.stem + "_annotated.mp4")
|
||||
out_csv = Path(args.out_csv) if args.out_csv else in_path.with_name(in_path.stem + "_landmarks.csv")
|
||||
|
||||
cap = cv2.VideoCapture(str(in_path))
|
||||
if not cap.isOpened():
|
||||
print(f"Could not open video: {in_path}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
writer, fps, (w, h) = open_video_writer(cap, str(out_video))
|
||||
|
||||
# Prepare CSV
|
||||
out_csv.parent.mkdir(parents=True, exist_ok=True)
|
||||
csv_file = open(out_csv, "w", newline="", encoding="utf-8")
|
||||
csv_writer = csv.writer(csv_file)
|
||||
csv_writer.writerow([
|
||||
"frame", "timestamp_ms", "type", "landmark_index",
|
||||
"x", "y", "z", "visibility", "presence",
|
||||
"world_x", "world_y", "world_z", "handedness"
|
||||
])
|
||||
|
||||
# Holistic configuration
|
||||
holistic = mp_holistic.Holistic(
|
||||
static_image_mode=False,
|
||||
model_complexity=args.model_complexity,
|
||||
smooth_landmarks=(not args.no_smooth),
|
||||
refine_face_landmarks=args.refine_face,
|
||||
enable_segmentation=False
|
||||
)
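# Detection/tracking confidence thresholds are left at the library defaults (0.5 at the time of writing).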
|
||||
|
||||
try:
|
||||
frame_idx = 0
|
||||
print(f"Processing: {in_path.name} -> {out_video.name}, {out_csv.name}")
|
||||
while True:
|
||||
ok, frame_bgr = cap.read()
|
||||
if not ok:
|
||||
break
|
||||
|
||||
# Timestamp (ms) based on frame index and fps
|
||||
ts_ms = (frame_idx / fps) * 1000.0
|
||||
|
||||
# Convert to RGB for MediaPipe
|
||||
image_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
|
||||
image_rgb.flags.writeable = False
|
||||
results = holistic.process(image_rgb)
|
||||
image_rgb.flags.writeable = True
|
||||
|
||||
# Annotate the BGR frame in place for output (no copy is made)
|
||||
out_frame = frame_bgr
|
||||
|
||||
# Face
|
||||
if results.face_landmarks:
|
||||
mp_drawing.draw_landmarks(
|
||||
out_frame,
|
||||
results.face_landmarks,
|
||||
mp_holistic.FACEMESH_TESSELATION,
|
||||
landmark_drawing_spec=None,
|
||||
connection_drawing_spec=mp_styles.get_default_face_mesh_tesselation_style(),
|
||||
)
|
||||
write_landmarks_to_csv(csv_writer, frame_idx, ts_ms, "face", results.face_landmarks)
|
||||
|
||||
# Pose
|
||||
if results.pose_landmarks:
|
||||
mp_drawing.draw_landmarks(
|
||||
out_frame,
|
||||
results.pose_landmarks,
|
||||
mp_holistic.POSE_CONNECTIONS,
|
||||
landmark_drawing_spec=mp_styles.get_default_pose_landmarks_style()
|
||||
)
|
||||
write_landmarks_to_csv(
|
||||
csv_writer, frame_idx, ts_ms, "pose",
|
||||
results.pose_landmarks,
|
||||
world_landmarks=getattr(results, "pose_world_landmarks", None)
|
||||
)
|
||||
|
||||
# Left hand
|
||||
if results.left_hand_landmarks:
|
||||
mp_drawing.draw_landmarks(
|
||||
out_frame,
|
||||
results.left_hand_landmarks,
|
||||
mp_holistic.HAND_CONNECTIONS,
|
||||
landmark_drawing_spec=mp_styles.get_default_hand_landmarks_style()
|
||||
)
|
||||
write_landmarks_to_csv(csv_writer, frame_idx, ts_ms, "left_hand", results.left_hand_landmarks, handedness="Left")
|
||||
|
||||
# Right hand
|
||||
if results.right_hand_landmarks:
|
||||
mp_drawing.draw_landmarks(
|
||||
out_frame,
|
||||
results.right_hand_landmarks,
|
||||
mp_holistic.HAND_CONNECTIONS,
|
||||
landmark_drawing_spec=mp_styles.get_default_hand_landmarks_style()
|
||||
)
|
||||
write_landmarks_to_csv(csv_writer, frame_idx, ts_ms, "right_hand", results.right_hand_landmarks, handedness="Right")
|
||||
|
||||
# Write frame
|
||||
writer.write(out_frame)
|
||||
|
||||
# Optional preview
|
||||
if args.show:
|
||||
cv2.imshow("Holistic (annotated)", out_frame)
|
||||
if cv2.waitKey(1) & 0xFF == 27: # ESC
|
||||
break
|
||||
|
||||
# Lightweight progress
|
||||
if frame_idx % 120 == 0:
|
||||
print(f" frame {frame_idx}", end="\r", flush=True)
|
||||
frame_idx += 1
|
||||
|
||||
print(f"\nDone.\n Video: {out_video}\n CSV: {out_csv}")
|
||||
|
||||
finally:
|
||||
holistic.close()
|
||||
writer.release()
|
||||
cap.release()
|
||||
csv_file.close()
|
||||
if args.show:
|
||||
cv2.destroyAllWindows()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
98
recognize_gesture.py
Executable file
98
recognize_gesture.py
Executable file
@@ -0,0 +1,98 @@
|
||||
#!/usr/bin/env python3
|
||||
import argparse
|
||||
import sys
|
||||
import mediapipe as mp
|
||||
|
||||
BaseOptions = mp.tasks.BaseOptions
|
||||
VisionRunningMode = mp.tasks.vision.RunningMode
|
||||
GestureRecognizer = mp.tasks.vision.GestureRecognizer
|
||||
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
|
||||
|
||||
def _first_category(item):
|
||||
"""
|
||||
Accepts either:
|
||||
- a Classifications object with .categories
|
||||
- a list of Category
|
||||
- None / empty
|
||||
Returns the first Category or None.
|
||||
"""
|
||||
if item is None:
|
||||
return None
|
||||
# Shape 1: Classifications with .categories
|
||||
cats = getattr(item, "categories", None)
|
||||
if isinstance(cats, list):
|
||||
return cats[0] if cats else None
|
||||
# Shape 2: already a list[Category]
|
||||
if isinstance(item, list):
|
||||
return item[0] if item else None
|
||||
return None
|
||||
|
||||
def _len_safe(x):
|
||||
return len(x) if isinstance(x, list) else 0
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Recognize hand gestures in a still image with MediaPipe.")
|
||||
parser.add_argument("-i", "--image", default="hand.jpg", help="Path to input image (default: hand.jpg)")
|
||||
parser.add_argument("-m", "--model", default="gesture_recognizer.task",
|
||||
help="Path to gesture_recognizer .task model (default: gesture_recognizer.task)")
|
||||
parser.add_argument("--num_hands", type=int, default=2, help="Max hands to detect")
|
||||
args = parser.parse_args()
|
||||
|
||||
options = GestureRecognizerOptions(
|
||||
base_options=BaseOptions(model_asset_path=args.model),
|
||||
running_mode=VisionRunningMode.IMAGE,
|
||||
num_hands=args.num_hands,
|
||||
)
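# The stock gesture_recognizer.task reports MediaPipe's canned gestures (e.g. Open_Palm,
# Closed_Fist, Thumb_Up, Victory); a custom-trained .task file can be supplied via --model.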
|
||||
|
||||
# Load the image
|
||||
try:
|
||||
mp_image = mp.Image.create_from_file(args.image)
|
||||
except Exception as e:
|
||||
print(f"Failed to load image '{args.image}': {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
with GestureRecognizer.create_from_options(options) as recognizer:
|
||||
result = recognizer.recognize(mp_image)
|
||||
|
||||
if result is None:
|
||||
print("No result returned.")
|
||||
return
|
||||
|
||||
n = max(
|
||||
_len_safe(getattr(result, "gestures", [])),
|
||||
_len_safe(getattr(result, "handedness", [])),
|
||||
_len_safe(getattr(result, "hand_landmarks", [])),
|
||||
)
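# gestures, handedness and hand_landmarks are parallel lists (one entry per detected hand);
# taking the longest guards against any one of them being missing or empty.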
|
||||
if n == 0:
|
||||
print("No hands/gestures detected.")
|
||||
return
|
||||
|
||||
for i in range(n):
|
||||
handed = None
|
||||
if _len_safe(getattr(result, "handedness", [])) > i:
|
||||
cat = _first_category(result.handedness[i])
|
||||
if cat:
|
||||
handed = cat.category_name
|
||||
|
||||
top_gesture = None
|
||||
score = None
|
||||
if _len_safe(getattr(result, "gestures", [])) > i:
|
||||
cat = _first_category(result.gestures[i])
|
||||
if cat:
|
||||
top_gesture = cat.category_name
|
||||
score = cat.score
|
||||
|
||||
header = f"Hand #{i+1}" + (f" ({handed})" if handed else "")
|
||||
print(header + ":")
|
||||
if top_gesture:
|
||||
print(f" Gesture: {top_gesture} (score={score:.3f})")
|
||||
else:
|
||||
print(" Gesture: none")
|
||||
|
||||
# If you want pixel landmark coordinates later:
|
||||
# if _len_safe(getattr(result, "hand_landmarks", [])) > i:
|
||||
# for j, lm in enumerate(result.hand_landmarks[i]):
|
||||
# print(f" lm{j}: x={lm.x:.3f} y={lm.y:.3f} z={lm.z:.3f}")
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
2
server_holistic.sh
Executable file
2
server_holistic.sh
Executable file
@@ -0,0 +1,2 @@
echo "Go to: http://localhost:8001/holistic.html"
python -m http.server 8001
14
source_activate_venv.sh
Executable file
14
source_activate_venv.sh
Executable file
@@ -0,0 +1,14 @@
#!/bin/bash

# ALERT: source this script, don't run it directly:
#   source source_activate_venv.sh

if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  echo "This script must be sourced, not run directly."
  echo "source source_activate_venv.sh"
  exit 1
fi

# Activate the project's virtual environment
echo "Script is being sourced. Continuing..."
source ./.venv/bin/activate