Jan 27, 2026

Build face & hand tracking effects with Bolt x MediaPipe

With Bolt and Google's MediaPipe, you can create interactive apps that incorporate human tracking using nothing but a webcam. Here are four unique tools and games you can build in minutes.

Projects

A title card that reads "face and hand tracking effects with MediaPipe."

You’ve probably seen the viral hand and face tracking effects taking over X in recent months. Builders are creating endlessly fun interactive experiences online. 


In this tutorial, we’ll show you how to build interactive apps yourself using MediaPipe and Bolt. 


MediaPipe lets you track faces, hands, and bodies in real time using nothing but a camera. Bolt, of course, lets you turn that raw capability into real, usable products fast.


This walkthrough features a total of four interactive apps:


  • A MediaPipe interactive showcase

  • A fitness rep counter

  • An air piano controlled by your fingers

  • A posture guardian and meeting engagement tracker


Watch the walkthrough with Jakub Skrzypczak, designer and Bolt.new superuser:



First: What’s MediaPipe? 


MediaPipe is Google’s open-source framework for real-time human tracking. It runs entirely in the browser and gives you structured landmark data you can build on immediately.


MediaPipe supports four core detection modes:


Face


  • Tracks 468 facial landmarks

  • Use for tracking expressions, gaze, and attention


Hands


  • Tracks 21 landmarks per hand

  • Use for tracking gestures, instruments, and controls 


Pose


  • Tracks 33 body landmarks

  • Use it for tracking fitness, posture, and movement


Holistic


  • Tracks face + hands + body

  • Use it for full-body interactions


Instead of raw pixels, you get coordinates, angles, and visibility scores, which serve as perfect inputs for apps.
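
To see what that data looks like in practice, here's a minimal sketch of loading the pose model and reading landmarks in the browser, assuming the @mediapipe/tasks-vision package; the CDN and model paths shown are illustrative, not canonical.

// Minimal sketch: pose landmarks with @mediapipe/tasks-vision.
import { FilesetResolver, PoseLandmarker } from "@mediapipe/tasks-vision";

async function createPoseLandmarker(): Promise<PoseLandmarker> {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm" // illustrative CDN path
  );
  return PoseLandmarker.createFromOptions(vision, {
    baseOptions: {
      // Illustrative model path; point this at whichever pose model you host or reference.
      modelAssetPath:
        "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task",
    },
    runningMode: "VIDEO",
    numPoses: 1,
  });
}

function onFrame(landmarker: PoseLandmarker, video: HTMLVideoElement) {
  const result = landmarker.detectForVideo(video, performance.now());
  const pose = result.landmarks[0]; // 33 points, each { x, y, z, visibility }
  if (!pose) return;
  const nose = pose[0];
  console.log(nose.x, nose.y, nose.visibility); // normalized 0–1 coordinates
}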


Explore MediaPipe with an interactive showcase


Before you start building, it helps to get a sense for how MediaPipe behaves in real conditions. Fortunately, there’s a live demo for that. 


MediaPipe Showcase


MediaPipe showcase screenshot.


This showcase lets you toggle between:


  • Face detection: expressions, head rotation, blinking

  • Hand tracking: open palms, fists, finger positions

  • Pose detection: arms, legs, squats

  • Holistic mode: all of the above at once


You’ll quickly notice which angles are reliable and which aren’t (for example, flat palms facing the camera can get glitchy). It’ll help you decide how to design interactions in your app. 


App #1: Fitness rep counter (pose tracking)


You can have a lot of fun with interactive apps. But you can also build things that are genuinely useful. This Fitness Tracker is one such example. 


Fitness Tracker


Screenshot of the fitness tracker.


Prompt 


Build a clean fitness rep counter for basic exercises using MediaPipe Pose detection.

=== CORE CONCEPT ===

Simple, focused fitness app. Pick an exercise, do it, watch your reps get counted automatically with real-time form feedback. Full-screen camera view with minimal UI.

=== TECH STACK ===

React + TypeScript + Vite

MediaPipe Pose Landmarker

Tailwind CSS

=== EXERCISES (Pick 3-4 Simple Ones) ===

Push-ups (easiest to detect)

Track elbow angle

Count: Straight arms → bent elbows → straight arms

Squats (clear movement)

Track knee and hip angles

Count: Standing → deep squat → standing

Jumping Jacks (fun to track)

Track arm height and leg spread

Count: Arms down/legs together → arms up/legs apart → back

Arm Raises (simple)

Track shoulder to wrist height

Count: Arms down → arms raised above shoulder → back down

=== USER FLOW ===

Setup Screen:

Title: "FITNESS TRACKER"

Exercise selection (large buttons with icons)

Target reps: Quick buttons (10 / 15 / 20)

"Start" button

Countdown:

"3... 2... 1... GO!"

Gets user ready

Workout Screen:

FULL SCREEN camera feed (mirrored)

Thin skeleton overlay on user

Large rep counter (center or corner)

Simple phase indicator: "DOWN" / "UP"

Minimal design, focus on movement

Summary:

"Complete!" message

Final rep count

Duration

"Do Another" / "Done" buttons

=== REP COUNTING LOGIC ===

State Machine (Simple):

For each exercise, track 3 states:

READY (starting position)

PHASE_1 (movement down/out)

PHASE_2 (movement up/back)

Count rep when completing full cycle: READY → PHASE_1 → PHASE_2 → back to READY

Requirements to Count Rep:

Must pass through all states in order

Must hold each state briefly (200-300ms minimum)

Key landmarks must be visible

Movement must be smooth (not glitchy)

Example - Squats:

READY: Standing (knee angle > 160°)

DOWN: Squatting (knee angle < 110°, hip below knee)

UP: Back to standing (knee angle > 160°)

Count +1 when full cycle completes

Example - Push-ups:

READY: Plank position (elbow angle > 160°)

DOWN: Lowered (elbow angle < 100°)

UP: Back to plank (elbow angle > 160°)

Count +1 when full cycle completes

=== VISUAL DESIGN ===

Minimal & Clean:

Full screen camera (no wasted space)

Skeleton overlay: Thin white lines, semi-transparent

Rep counter: Very large number, floating card with blur background

Everything else minimal

Rep Counter:

Huge bold number (e.g., "12")

Positioned where it doesn't block user

Subtle animation when incrementing

That's it - keep it simple

Phase Indicator:

Single word: "UP" / "DOWN" / "READY"

Small text below rep counter

Changes color based on state

Optional - can be removed for even cleaner look

Skeleton Overlay:

Connect key joints with thin lines

Simple, not cluttered

Just enough to see tracking is working

Option to hide it entirely

=== FEEDBACK ===

Keep It Minimal:

Rep counts = satisfaction

Smooth animations

Optional subtle sound on rep count

No complex form feedback (save for later version)

If Form is Very Bad:

Simple message: "Slow down" or "Not detected"

Don't count bad reps

Keep it encouraging, not critical

=== SCREENS (ULTRA SIMPLE) ===

Setup:

Exercise buttons (push-ups, squats, jumping jacks)

Target number

Start button

That's it

Workout:

Camera feed + skeleton

Rep counter

Timer in corner

Stop button (small, unobtrusive)

Summary:

"15 reps completed!"

Time taken

Two buttons: Again / Done

=== TECHNICAL ===

Detection:

MediaPipe Pose at 30fps

Track relevant landmarks per exercise

Simple angle calculations

State machine logic

Smooth landmark positions

Performance:

Efficient rendering

No dropped frames

Responsive UI updates

Quick model loading

=== DESIGN PHILOSOPHY ===

KISS - Keep It Simple, Stupid.

One screen, one focus: counting reps

No complex menus or features

Works immediately

Feels fast and responsive

Satisfying to use

The app should feel like: "I just want to count my reps accurately. That's it."

Think: Apple-level simplicity. Beautiful, minimal, functional. Nothing more, nothing less.


How it works


  1. Camera feed runs full screen

  2. MediaPipe Pose tracks 33 body landmarks at ~30fps

  3. Key joint angles are calculated (knees, hips, elbows)

  4. A simple state machine counts reps (sketched below):

    • READY → DOWN → UP → READY

  5. Reps increment only when the full movement cycle completes
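
Here's a rough sketch of that state machine for squats, assuming landmark indices from MediaPipe Pose (23/25/27 = left hip, knee, ankle) and leaving out the brief hold/debounce timing the prompt calls for.

type Point = { x: number; y: number; visibility?: number };

// Angle (in degrees) at joint b, formed by the segments b→a and b→c.
function angleAt(a: Point, b: Point, c: Point): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const mag = Math.hypot(v1.x, v1.y) * Math.hypot(v2.x, v2.y);
  return (Math.acos(dot / mag) * 180) / Math.PI;
}

let phase: "READY" | "DOWN" = "READY";
let reps = 0;

function updateSquat(pose: Point[]) {
  const [hip, knee, ankle] = [pose[23], pose[25], pose[27]];
  if ([hip, knee, ankle].some((p) => (p.visibility ?? 1) < 0.5)) return;
  const kneeAngle = angleAt(hip, knee, ankle);

  if (phase === "READY" && kneeAngle < 110) {
    phase = "DOWN"; // user is in the squat
  } else if (phase === "DOWN" && kneeAngle > 160) {
    phase = "READY"; // back to standing: one full cycle
    reps += 1;
  }
}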


Exercises included


  • Squats (knee + hip angles)

  • Push-ups (elbow angle)

  • Jumping jacks (arm height + leg spread)

  • Arm raises (shoulder to wrist height)


Why this works well


  • Minimal UI = fewer distractions

  • Skeleton overlay confirms tracking

  • No over-engineered form scoring

  • Counts reps accurately and immediately


MediaPipe handles the hard part (body tracking). Bolt handles everything else, including state logic, UI, animations, and performance.


App #2: An air piano with hand tracking


Now let’s do something playful.


Air Piano


Screenshot of the air piano.


Prompt


Build a virtual 8-key piano controlled by finger positions using MediaPipe Hands with two modes: free play and rhythm game.

=== CORE CONCEPT ===

Air piano you control with your fingers. Play freely or follow falling notes Guitar Hero style.

=== TECH STACK ===

React + TypeScript + Vite

MediaPipe Hands

Tailwind CSS

Web Audio API for piano sounds

=== PIANO SETUP ===

8 Keys:

C, D, E, F, G, A, B, C (one octave)

Each finger controls one key

Left hand: Thumb=C, Index=D, Middle=E, Ring=F

Right hand: Index=G, Thumb=A, Middle=B, Ring=C

Detection:

Track one or both hands

Each extended finger hovers over its key

"Press" detected when finger dips down (Y coordinate change)

Key plays sound and lights up

=== TWO MODES ===

1. FREE PLAY:

Just play the piano freely

Keys respond to finger presses

Visual feedback: keys light up

Piano sounds play

Relaxed, exploratory mode

Record and playback optional

2. RHYTHM MODE (Guitar Hero style):

Notes fall from top of screen toward keys

Hit the right key when note reaches target line

Score based on timing accuracy

Play pre-loaded simple songs

Progress through difficulty levels

=== USER FLOW ===

Main Menu:

"FREE PLAY" button

"RHYTHM MODE" button

Settings (sound volume, hand preference)

Free Play:

Full screen with piano keyboard at bottom

Hand tracking display in corner (small)

Keys light up when pressed

That's it - simple and fun

Rhythm Mode - Song Select:

List of 3-5 simple songs:

"Twinkle Twinkle" (very easy)

"Mary Had a Little Lamb" (easy)

"Ode to Joy" (medium)

Shows difficulty and duration

"Play" button

Rhythm Mode - Gameplay:

Piano keyboard at bottom (visual reference)

Notes fall from top in lanes (one per key)

Target line where notes should be hit

Score and combo counter

Progress bar (song completion)

Results:

Final score

Accuracy percentage

Notes hit vs missed

"Play Again" / "New Song" / "Free Play"

=== VISUAL DESIGN ===

Piano Keyboard:

Simplified visual (white keys only for 8 notes)

Clean, modern design

Keys: Off-white when idle

Keys: Bright color when pressed

Labels: Note names on each key

Free Play Screen:

Large piano keyboard (bottom 30% of screen)

Camera feed (optional, can be minimized)

Hand skeleton showing finger positions

Minimal UI - focus on playing

Rhythm Mode Screen:

Piano keyboard (bottom 20%)

Note highway (top 60%)

Notes: Colored circles falling in lanes

Target line: Bright horizontal line

Score display: Top corner, unobtrusive

Note Highway:

Vertical lanes (one per key)

Lane dividers: Thin lines

Notes: Simple circles or rounded rectangles

Color-coded by key/note

Falls smoothly at constant speed

=== DETECTION LOGIC ===

Finger Tracking:

Track extended fingers

Detect finger tips (landmarks 4, 8, 12, 16, 20)

Map finger position to key (X coordinate determines which key)

Detect "press" when finger tip moves down (Y increases)

Key Press Detection:

Finger must be over correct key (X position)

Finger must dip down (Y change threshold)

Trigger once per press (prevent rapid re-triggering)

Visual and audio feedback immediate

Rhythm Mode Timing:

Check if key pressed when note in target zone

Target zone: ±100ms window (configurable)

Perfect: Within ±30ms

Good: Within ±60ms

Okay: Within ±100ms

Miss: Outside window or wrong key

=== GAMEPLAY (RHYTHM MODE) ===

Scoring:

Perfect hit: +100 points

Good hit: +75 points

Okay hit: +50 points

Miss: 0 points, combo breaks

Combo multiplier: 5+ streak = 2x points

Feedback:

Visual: "PERFECT!" / "GOOD" / "MISS" appears briefly

Audio: Different sound for perfect vs good vs miss

Note explodes in particles on hit

Smooth animations throughout

Songs:

Pre-programmed note sequences

3-5 simple, recognizable melodies

Progressive difficulty (starts easy)

Each song 30-60 seconds long

=== AUDIO ===

Piano Sounds:

Clean piano samples for each of 8 notes

Low latency playback (<50ms)

Volume control in settings

Feedback Sounds:

Hit confirmation (subtle click)

Perfect hit (satisfying ding)

Miss (soft "oops" or just silence)

Background music in menus (optional)

=== TECHNICAL ===

Hand Detection:

MediaPipe Hands at 30fps

Track one hand (or both for advanced)

Smooth finger positions

Calibrate sensitivity

Audio System:

Web Audio API for low latency

Pre-load all sound samples

Efficient playback (no lag)

Note Generation:

Pre-defined note sequences for songs

Timing based on BPM

Notes spawn and fall smoothly

=== DESIGN PHILOSOPHY ===

Make it feel like playing a real instrument, even though it's just hand tracking. Responsive, low-latency, satisfying feedback. Simple enough for anyone to try, engaging enough to keep playing.

Free Play = Creative exploration, no pressure

Rhythm Mode = Challenge and achievement, but still fun

Think: Actual musical instrument meets rhythm game. Easy to understand, hard to master, always satisfying to play.


The result: Two modes


Free Play

  • Move your fingers

  • Keys light up

  • Notes play instantly

  • No rules, just exploration


Rhythm Mode

  • Notes fall Guitar Hero–style

  • Hit the right key at the right time

  • Scoring + combos

  • Simple songs like Mary Had a Little Lamb
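
The press detection itself can stay simple. Here's a sketch of the approach the prompt describes: map each fingertip (hand landmarks 4, 8, 12, 16, 20) to a key by its X position, fire a press when the tip dips down past a Y threshold, and trigger only once per dip. The threshold values and the playNote() helper are illustrative.

type Landmark = { x: number; y: number };

const FINGERTIPS = [4, 8, 12, 16, 20];      // thumb, index, middle, ring, pinky tips
const PRESS_THRESHOLD = 0.04;               // normalized Y dip that counts as a press
const restingY = new Map<number, number>(); // fingertip index → baseline Y
const pressed = new Set<number>();          // fingertips currently held down

function keyForX(x: number): number {
  return Math.min(7, Math.max(0, Math.floor(x * 8))); // 8 equal-width keys
}

function updateHand(hand: Landmark[], playNote: (key: number) => void) {
  for (const tip of FINGERTIPS) {
    const { x, y } = hand[tip];
    const base = restingY.get(tip) ?? y;

    const isDown = y - base > PRESS_THRESHOLD; // Y grows downward in image space
    if (isDown && !pressed.has(tip)) {
      pressed.add(tip);
      playNote(keyForX(x)); // trigger once per dip
    } else if (!isDown) {
      pressed.delete(tip);
      // Let the baseline drift with the hand only while the finger is up.
      restingY.set(tip, base * 0.9 + y * 0.1);
    }
  }
}

Playing the resulting note through the Web Audio API with pre-loaded samples, as the prompt specifies, keeps latency low enough that it feels like an instrument rather than a demo.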

Design lesson


From the showcase earlier, you already know:

  • Rotations work well

  • Depth works well

  • Flat, edge-on hand angles don’t

This app avoids unreliable angles and feels surprisingly real.

App #3: Posture guardian (pose tracking, background-aware)

This one is designed to run all day and help users clean up their posture. 

Posture Guardian

Screenshot of the Posture Guardian.


Prompt 

Build a real-time posture monitoring web application that uses the device camera and AI pose detection to analyze and improve users' sitting posture.

Core Concept

A desktop/laptop web app that runs in the background while users work, continuously monitoring their posture via webcam and providing gentle alerts when posture degrades. Think of it as a digital ergonomics coach.

Tech Stack

React + TypeScript + Vite

MediaPipe Pose (load from CDN: https://cdn.jsdelivr.net/npm/@mediapipe/pose)

Zustand for state management

Tailwind CSS for styling

Lucide React for icons

date-fns for date formatting

Bolt Database for persistence (sessions, daily summaries, settings)

Key Features

1. Pose Detection & Analysis

Use MediaPipe Pose to detect 33 body landmarks from webcam feed

Focus on upper body: nose, ears, eyes, shoulders, elbows, wrists, hips

Calculate posture metrics: head angle, shoulder angle, neck angle, shoulder symmetry, head forward offset

Use a calibration system where users set their "ideal posture" as baseline

Score posture 0-100 based on deviation from calibrated baseline

Use rolling average (30 frames) to smooth scores and prevent jitter

2. Calibration Wizard

3-second countdown with visual feedback

User sits in ideal posture, app captures baseline metrics

Store calibration in both Zustand and Bolt Database

Show skeleton overlay comparison (baseline vs current)

3. Alert System with Progressive Escalation

Alert levels: none → caution → warning → alert

Thresholds based on duration of poor posture (in foreground: 60s caution, 3min warning, 5min alert)

Immediate alert if score drops below 40 (critical threshold)

Browser Notification API for background alerts

Web Audio API for sound alerts (pleasant chime, not jarring)

Configurable sensitivity (threshold score) and notification frequency

4. Visual Interface

Collapsible panel design - minimal when collapsed, detailed when expanded

Real-time skeleton visualization overlaid on video (optional, toggle in settings)

Color-coded score display (green/yellow/orange/red based on score)

Score breakdown showing head, shoulder, back, symmetry components

Animated circular progress indicator

Weekly chart showing daily averages

Today's stats: average score, hours of good posture, alerts count

5. Settings

Alert sensitivity slider (score threshold)

Notification frequency (low/medium/high)

Sound on/off toggle

Stretch reminder interval

Show/hide video feed

Show/hide baseline overlay

Critical Technical Challenge: Background Tab Detection

Browsers heavily throttle JavaScript in background tabs. This is the hardest part of the app. Here's what works:

Problem: requestAnimationFrame pauses completely when tab is hidden. setInterval is throttled to once per second at best. Video processing nearly stops.

Solution - Multi-layered approach:

Silent Audio Trick: Create an AudioContext playing inaudible audio (1Hz, gain 0.001). This signals to the browser that the page is "active" and reduces some throttling.

requestVideoFrameCallback: When available (Chrome), use this instead of requestAnimationFrame for video processing - it's less throttled.

Web Worker Heartbeat: Create a worker that sends messages at regular intervals. Workers aren't throttled the same way. Use these messages to trigger frame processing.

setInterval Fallback: In background, switch to setInterval-based processing. Even if throttled to 1fps, some detection continues.

Baseline Score Comparison: Store the user's score when they switch away from the tab. When processing finally happens in background, compare current score to that baseline. If dropped 25+ points, trigger alert. This catches degradation even with delayed detection.

Periodic Notification Check: Run a 30-second interval that checks last known score and sends notifications if poor. This survives throttling since it doesn't need real-time.

Visibility Change Handler:

document.addEventListener('visibilitychange', () => {

  if (document.visibilityState === 'hidden') {

    // Store current score for later comparison

    // Switch to background processing mode

    // Start notification interval

  } else {

    // Resume normal processing

    // Clear baseline score

  }

});

Database Schema (Bolt Database)

Tables needed:

user_settings: alert_sensitivity, notification_frequency, sound_enabled, stretch_reminder_interval, etc.

posture_sessions: id, user_id, start_time, end_time, average_score, alerts_count

daily_summaries: date, average_score, total_minutes_tracked, good_posture_minutes, alerts_count, sessions_count

Enable RLS on all tables. Users can only access their own data.

UI/UX Guidelines

Dark, muted color palette (grays, subtle blues/teals for accents)

No purple/indigo

Minimal, non-intrusive design - users are working, not staring at this app

Smooth transitions and micro-animations (respect reduced motion preference)

Clear visual hierarchy: current score is most prominent

Collapse to small floating widget showing just score + status indicator

Expand for full stats, settings, and visualization

Posture Scoring Algorithm

Score = (headScore * 0.30) + (shoulderScore * 0.30) + (backScore * 0.25) + (symmetryScore * 0.15)

- headScore: Based on head angle deviation and forward head offset from baseline

- shoulderScore: Based on shoulder width ratio and angle vs baseline (detect hunching)

- backScore: Based on neck angle deviation (slouching)

- symmetryScore: Based on shoulder height difference and angle (leaning)

Each component: 100 = perfect alignment with baseline, decreases with deviation.

Important Implementation Notes

MediaPipe must be loaded dynamically from CDN - don't bundle it

Use Camera.onFrame for proper video frame timing during calibration

Always check landmark visibility (>0.5) before using landmarks

Handle case where shoulders go off-screen (sitting too close) - this should tank the score

Clear pose history when stopping/starting tracking

Persist calibration baseline to survive page refresh

Request notification permission early in the flow

Test on both Chrome and Firefox - Safari has additional restrictions

How it works

  1. User calibrates ideal posture (3-second capture)

  2. App continuously compares:

    • Head angle

    • Neck angle

    • Shoulder symmetry

  3. A rolling average smooths jitter

  4. A posture score (0–100) updates in real time

  5. Gentle alerts trigger when posture degrades
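
A minimal sketch of that scoring, using the weights and 30-frame rolling average from the prompt; the four component scores (each 0–100 against the calibrated baseline) are assumed to be computed elsewhere.

interface Components {
  headScore: number;
  shoulderScore: number;
  backScore: number;
  symmetryScore: number;
}

// Weighted sum from the prompt: head 30%, shoulders 30%, back 25%, symmetry 15%.
function postureScore(c: Components): number {
  return (
    c.headScore * 0.3 +
    c.shoulderScore * 0.3 +
    c.backScore * 0.25 +
    c.symmetryScore * 0.15
  );
}

const WINDOW = 30;            // frames
const recent: number[] = [];  // rolling window of raw scores

function smoothedScore(c: Components): number {
  recent.push(postureScore(c));
  if (recent.length > WINDOW) recent.shift();
  return recent.reduce((sum, s) => sum + s, 0) / recent.length;
}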

The hard part: Background tabs

Browsers throttle background tabs aggressively.

This app uses:

  • requestVideoFrameCallback

  • Web Workers

  • Silent AudioContext trick

  • Periodic score comparison

Result: posture alerts still fire even when the tab isn’t visible.
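
Here's a rough sketch of two of those workarounds: the silent AudioContext and a Web Worker heartbeat. The 1Hz oscillator at gain 0.001 comes straight from the prompt; the inline worker and the processFrame() callback are illustrative.

// Keep the page looking "active" with near-silent audio.
// (Start this from a user gesture so the AudioContext isn't suspended.)
function startSilentAudio(): AudioContext {
  const ctx = new AudioContext();
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.frequency.value = 1;  // 1Hz — inaudible
  gain.gain.value = 0.001;
  osc.connect(gain).connect(ctx.destination);
  osc.start();
  return ctx;
}

// A worker's timers aren't throttled the same way as the page's,
// so use its messages to keep frame processing ticking in the background.
function startHeartbeat(processFrame: () => void): Worker {
  const source = `setInterval(() => postMessage("tick"), 1000);`;
  const worker = new Worker(
    URL.createObjectURL(new Blob([source], { type: "text/javascript" }))
  );
  worker.onmessage = () => {
    if (document.visibilityState === "hidden") processFrame();
  };
  return worker;
}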

App #4: Track meeting presence with face detection


Finally, let’s use facial landmarks to build an app that helps users see how engaged and active they are in virtual meetings. 

Meeting Stats App

Screenshot of the meeting stats app.


Prompt


Create a "Meeting Stats" web application that uses MediaPipe Face Detection to track and visualize personal meeting engagement metrics in real-time.

Core Functionality:

Meeting Session Controls

Large "Start Meeting" button to begin tracking

Live timer showing current meeting duration

"End Meeting" button that stops tracking and shows dashboard

Simple, clean interface during recording

Real-Time Tracking (Background Processing)

Attention Detection: Track when face is detected vs not detected (looking at screen vs away)

Nod Counter: Detect vertical head movements (up-down motion of face landmarks) and count as nods

Gaze Heat Map: Track which areas of the screen the user's face is positioned relative to, store coordinates over time

Post-Meeting Dashboard

Display after clicking "End Meeting":

Meeting Summary Card: Total duration, date/time

Attention Score: Percentage of time face was detected, with visual progress bar

Nod Counter: Total nods with fun copy like "You nodded 47 times - active listener! 👏"

Focus Heat Map: Visual representation showing where on screen user spent most time (use color intensity - red for most time, blue for least)

Timeline Graph: Simple line chart showing attention levels throughout meeting (5-minute intervals)

Fun Insights: Generate quirky observations like:

"Peak focus at [time]"

"You looked away X times - coffee runs? ☕"

"Most active nodding period: [timeframe]"

History & Trends

List of past meetings with basic stats

Store in localStorage

"View Details" to see full dashboard for any past meeting

Simple stats: average attention rate, total meetings, total nods

Technical Requirements:

Use MediaPipe Face Detection library via CDN

Access webcam with proper permissions handling

All processing happens locally in browser (privacy-first)

Use localStorage for meeting history

Responsive design that works on desktop

Clean, modern UI with playful elements (emojis, fun copy)

Design Style:

Minimalist, premium aesthetic

Primary color: Deep blue (#1e3a8a)

Accent: Bright cyan (#06b6d4)

Soft shadows, rounded corners

Ample white space

Playful but professional tone

UI Flow:

Landing page with "Start Meeting" CTA and brief explainer

During meeting: Minimal UI, just timer and small "End" button, webcam feed in corner

After meeting: Full dashboard with all stats and visualizations

History page: List of all past meetings

Key Features:

Privacy notice: "All data stays on your device"

Optional: Toggle webcam preview on/off during meeting

Export meeting stats as image or text summary

Clear all history button

Make it fun, visually engaging, and give users surprising insights about their meeting behavior!

What it tracks (locally only)


  • Face detected vs not detected (attention)

  • Head nods (vertical movement)

  • Gaze position on screen

  • Focus heat map

  • Attention timeline
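
Nod counting can be as simple as watching the face's vertical position over time and counting each down-then-up swing. A minimal sketch, assuming faceY is the detection box's center Y (normalized 0–1) and using an illustrative threshold:

const NOD_THRESHOLD = 0.03; // normalized drop that counts as the start of a nod

let neutralY: number | null = null; // resting head position
let dippedDown = false;
let nods = 0;

function updateNods(faceY: number) {
  if (neutralY === null) {
    neutralY = faceY;
    return;
  }
  // Track the resting position slowly so gradual drift isn't counted as nodding.
  neutralY = neutralY * 0.95 + faceY * 0.05;

  if (!dippedDown && faceY - neutralY > NOD_THRESHOLD) {
    dippedDown = true;  // head moved down past the threshold
  } else if (dippedDown && faceY - neutralY < NOD_THRESHOLD * 0.3) {
    dippedDown = false;
    nods += 1;          // head came back up: one nod completed
  }
}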


After the meeting


You get:

  • Attention score (%)

  • Nod count

  • Focus heat map

  • Timeline of engagement

  • Fun insights like:
    “You nodded 12 times — active listener 👏”


All data stays on the device. 


Ready to build your own?


Experiment by:


  • Remixing any demo

  • Swapping detection modes

  • Changing interactions

  • …or shipping something entirely new


Get started now
