What I Learned Building a Hand-Tracking Browser Game from Scratch
I've been building web software for over 20 years: APIs, databases, dashboards, the usual. When I decided to build Mirlo Volador, a browser game you control by holding your hand up in front of a webcam, I figured the hard part would be the game logic. I was wrong. Hand tracking was where most of my time and frustration went, and where I learned the most.
This is an honest account of what I ran into. Not a tutorial, not a polished retrospective. Just the problems I actually hit, the decisions I actually made, and a few things I'd do differently if I started today.
The Starting Question
The whole project started with one question: could MediaPipe hand tracking run fast enough in a browser to feel responsive in a game? Not in a demo, not in a research paper — in something people would actually play. I'd seen the MediaPipe demos before. They looked smooth. But demos are always smooth. I wanted to know what happened under real conditions: a potato laptop, a cheap webcam, a bright window behind the player.
The answer, after about two weeks of testing, is: yes, but with caveats. On a mid-range laptop from 2022 or newer, MediaPipe's hand landmark model runs at around 25 to 35 frames per second with the default configuration. On a newer MacBook or a gaming machine it can hit 60. On older hardware or phones with limited GPU access it can drop to 15. That range is wide enough to matter.
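Numbers like these are easiest to collect by timing the tracker itself rather than the render loop. Below is a minimal sketch of a rolling frame-rate meter. The class name `RollingFps` and the one-second window are assumptions for illustration. Timestamps are passed in explicitly; in the browser, `performance.now()` would supply them.

```typescript
// Hypothetical helper: estimates frames per second from the timestamps of
// recent tracker results. Timestamps are in milliseconds.
class RollingFps {
  private stamps: number[] = [];
  private windowMs: number;

  constructor(windowMs: number = 1000) {
    this.windowMs = windowMs;
  }

  // Record one completed detection at time nowMs.
  tick(nowMs: number): void {
    this.stamps.push(nowMs);
    // Discard timestamps older than the window (the newest one always stays).
    while (nowMs - this.stamps[0] > this.windowMs) {
      this.stamps.shift();
    }
  }

  // Frames per second over the current window, or 0 if there is not enough data.
  fps(): number {
    const n = this.stamps.length;
    if (n < 2) return 0;
    const spanMs = this.stamps[n - 1] - this.stamps[0];
    return spanMs > 0 ? ((n - 1) * 1000) / spanMs : 0;
  }
}
```

Showing this number in a debug overlay makes the spread between machines visible on whatever hardware is running the game.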
The game loop runs at 60fps regardless. The hand tracker runs at whatever speed the hardware allows and feeds the latest position into the game state. Decoupling those two things was the first important architectural decision.
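Here is one way to express that decoupling. It is a sketch under assumptions, not the game's actual code, and it has two parts:

- A shared slot, the hypothetical `HandInput`, which the tracker overwrites whenever a detection finishes.
- A fixed-step accumulator, the hypothetical `FixedStepLoop`, which decides how many 60 Hz simulation steps to run per animation frame.

The fixed-step accumulator is a common game-loop pattern. The text above only says the loop runs at 60fps, not how.

```typescript
// Sketch only: names and structure are assumptions, not the game's actual code.

// Shared slot: the tracker overwrites latestY whenever a detection finishes,
// and the game loop reads whatever value is newest without waiting.
interface HandInput {
  latestY: number | null; // normalized 0..1, null when no hand is visible
}

const STEP_MS = 1000 / 60; // fixed simulation step (60 Hz)
const MAX_FRAME_MS = 250; // cap on catch-up time after a stall or a hidden tab

class FixedStepLoop {
  private accumulator = 0;
  private lastMs: number | null = null;

  // Given an animation-frame timestamp in ms, returns how many fixed
  // simulation steps to run before drawing this frame.
  advance(nowMs: number): number {
    if (this.lastMs === null) {
      this.lastMs = nowMs;
      return 0;
    }
    this.accumulator += Math.min(nowMs - this.lastMs, MAX_FRAME_MS);
    this.lastMs = nowMs;
    let steps = 0;
    while (this.accumulator >= STEP_MS) {
      this.accumulator -= STEP_MS;
      steps++;
    }
    return steps;
  }
}

// Reads the slot during a simulation step: follow the hand if one is visible,
// otherwise hold the current position (an illustrative choice).
function birdTargetY(currentY: number, input: HandInput, canvasHeight: number): number {
  return input.latestY === null ? currentY : input.latestY * canvasHeight;
}
```

In the browser, `advance` would be called from a `requestAnimationFrame` callback using the timestamp it receives. The tracker's own detection loop writes `input.latestY` whenever a result arrives, so neither side ever waits for the other.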
The Jitter Problem
Here's something the MediaPipe documentation doesn't warn you about clearly enough: the raw landmark coordinates are noisy. Even when your hand is perfectly still, the detected position of your wrist or palm shifts by a few pixels every frame. On screen, this translates to the bird constantly vibrating slightly up and down even when you're holding still. In a fast-moving game, that felt awful.
The fix is smoothing, but choosing the right smoothing is non-obvious. I went through three approaches before landing on one I was happy with.
The first approach was a simple moving average over the last five frames. It worked, but it introduced noticeable lag. When you moved your hand quickly upward, the bird lagged behind by almost 100 milliseconds. That's enough to cause a death in a game that punishes imprecision.
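For reference, a five-frame moving average looks like the sketch below; the class name is illustrative. The step test shows where the lag comes from: after the hand jumps, the output only fully reflects the new position on the fifth new sample. Assuming 30fps for the arithmetic, that is four frame intervals, about 133 ms, and the average delay of a five-sample window is two frames, about 67 ms. Lag on the order of 100 ms is roughly what the filter predicts.

```typescript
// Simple moving average over the last `size` samples.
class MovingAverage {
  private buf: number[] = [];
  private size: number;

  constructor(size: number) {
    this.size = size;
  }

  // Adds a sample and returns the mean of the most recent `size` samples.
  push(v: number): number {
    this.buf.push(v);
    if (this.buf.length > this.size) this.buf.shift();
    let sum = 0;
    for (const x of this.buf) sum += x;
    return sum / this.buf.length;
  }
}
```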
The second approach was an exponential moving average with a tunable alpha. Better, but the right alpha depended on the game speed and the hardware frame rate. Because alpha is applied once per frame, the same value smooths over a longer stretch of time at 25fps than at 60fps, so a value tuned at 60fps behaves differently at 25fps. I didn't want smoothing that behaved differently depending on which computer you were on.
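A standard fix for that frame-rate dependence derives alpha from the elapsed time and a time constant tau: alpha = 1 - exp(-dt / tau). This is not the approach the game ended up using, and the class name and parameter are illustrative. Each update shrinks the remaining gap by a factor of exp(-dt / tau). Two 20 ms updates therefore shrink it by exp(-40 / tau), exactly as much as one 40 ms update.

```typescript
// Exponential moving average whose blend factor is derived from elapsed time,
// so the amount of smoothing per second does not depend on the frame rate.
class TimeConstantEma {
  private value: number | null = null;
  private tauMs: number;

  // tauMs: time constant; larger means smoother and laggier (illustrative parameter).
  constructor(tauMs: number) {
    this.tauMs = tauMs;
  }

  update(sample: number, dtMs: number): number {
    if (this.value === null) {
      this.value = sample;
      return sample;
    }
    // Fraction of the remaining gap closed during dtMs.
    const alpha = 1 - Math.exp(-dtMs / this.tauMs);
    const next = this.value + alpha * (sample - this.value);
    this.value = next;
    return next;
  }
}
```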
The approach I shipped is velocity-aware smoothing. It applies stronger averaging when the hand is moving slowly (likely jitter) and weaker averaging when it is moving quickly (likely intentional movement). It isn't complicated, just a few lines of TypeScript, but it took me a while to think of. The result is that the bird feels stable at rest and responsive in motion.
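The game's actual code isn't shown here, so the following is only a sketch of the idea, with hypothetical names and parameters. The blend factor slides from a small value (heavy smoothing) toward a large one (light smoothing) as the measured hand speed rises. It is a simplified relative of the One Euro filter, which adapts its cutoff frequency to speed in a more principled, frame-rate-aware way. This version blends once per update, so its behavior still shifts somewhat with frame rate.

```typescript
// Hypothetical parameters; the values used in the usage below are illustrative, not tuned.
interface AdaptiveSmoothingOptions {
  slowAlpha: number; // blend factor when the hand is nearly still: heavy smoothing
  fastAlpha: number; // blend factor during fast, deliberate motion: light smoothing
  fastSpeed: number; // speed (normalized units per second) at which fastAlpha applies fully
}

class VelocityAwareSmoother {
  private value: number | null = null;
  private prevRaw = 0;
  private opts: AdaptiveSmoothingOptions;

  constructor(opts: AdaptiveSmoothingOptions) {
    this.opts = opts;
  }

  // sample: raw normalized hand Y; dtMs: time since the previous sample.
  update(sample: number, dtMs: number): number {
    if (this.value === null) {
      this.value = sample;
      this.prevRaw = sample;
      return sample;
    }
    const current = this.value;
    if (dtMs <= 0) return current;
    // Estimate speed from consecutive raw samples.
    const speed = Math.abs(sample - this.prevRaw) / (dtMs / 1000);
    this.prevRaw = sample;
    // t is 0 when still and reaches 1 at or above fastSpeed.
    const t = Math.min(speed / this.opts.fastSpeed, 1);
    const alpha = this.opts.slowAlpha + t * (this.opts.fastAlpha - this.opts.slowAlpha);
    const next = current + alpha * (sample - current);
    this.value = next;
    return next;
  }
}
```

The two cases behave very differently with these illustrative numbers:

- **Jitter:** ±0.002 at about 30 updates per second gives a raw speed near 0.12 units per second, so the blend factor stays below 0.2.
- **Deliberate motion:** a jump of 0.3 in a single frame uses the full 0.8.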
Coordinate Normalization and the Dead Zone
MediaPipe returns hand landmark positions as normalized coordinates between 0 and 1, where (0, 0) is the top-left of the camera frame. The first version of Mirlo Volador mapped the full 0-to-1 range directly to the game canvas height. This meant you had to move your hand from the very bottom of the camera frame to the very top to move the bird from the lowest pipe gap to the highest. That range of motion is exhausting, and most players don't even know they're supposed to reach those extremes.
I added a configurable active range. The game now uses only the middle portion of the vertical camera space — roughly from 25% to 75% of the frame height — and maps that range to the full game height. This means normal, relaxed hand movements cover the full range of game motion. Players don't need to stretch. The top and bottom 25% of the frame become effectively dead zones for positioning purposes.
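The mapping itself is a linear rescale followed by a clamp. Here is a sketch with a hypothetical function name, using the 25% and 75% bounds from above as defaults. MediaPipe's normalized coordinates and the Canvas 2D coordinate system both put y = 0 at the top, so no inversion is needed.

```typescript
// Maps a normalized hand Y (0 = top of camera frame) to a canvas Y in pixels,
// using only the band between rangeTop and rangeBottom.
function handYToGameY(
  handY: number,
  canvasHeight: number,
  rangeTop: number = 0.25,
  rangeBottom: number = 0.75,
): number {
  const t = (handY - rangeTop) / (rangeBottom - rangeTop);
  // Clamp so positions outside the active band pin the bird to the edges.
  const clamped = Math.min(Math.max(t, 0), 1);
  return clamped * canvasHeight;
}
```

Clamping is what turns the outer bands into dead zones. Any hand position above the 25% line pins the bird to the top edge, and any position below the 75% line pins it to the bottom.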
This sounds simple. It took me a full day to get the numbers right, and it was one of the biggest improvements to the feel of the game.
Camera Permissions: The Invisible Barrier
I didn't think much about the camera permission prompt when I started building. I assumed users would read it, understand it, and click Allow. In reality, the permission prompt is one of the biggest sources of player drop-off.
Several things can happen when someone is asked for camera access on a site they just found:

- Some people immediately deny it out of privacy concern, which is completely understandable.
- Some are using a browser that remembers a previous denial. They never see the prompt at all, just a black canvas with no explanation.
- Some are on a work laptop with camera access disabled at the system level, which produces a different error entirely.
I spent time building clear error states for each of these cases. If camera access is denied, the game explains what happened and why, and gives instructions for re-enabling access in the browser settings. If no camera is found, it says so explicitly. These states seem obvious in retrospect, but the first version of the game just showed a black canvas and nothing else, which confused everyone.
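`getUserMedia` rejects with a `DOMException` whose `name` identifies the failure, which makes these cases easy to tell apart:

- `NotAllowedError`: the user or a policy blocked access. This includes a remembered denial, where no prompt appears.
- `NotFoundError`: there is no camera.
- `NotReadableError`: a device exists but cannot be opened.

A system-level block can surface as either `NotAllowedError` or `NotReadableError` depending on the browser and operating system. The sketch below groups these names into player-facing categories. The category names, the grouping, and the message text are hypothetical, not the game's actual wording.

```typescript
// Hypothetical categories for the player-facing error screens.
type CameraProblem = "denied" | "no-camera" | "unavailable" | "unknown";

// Reads the `name` of whatever getUserMedia rejected with.
function errorName(err: unknown): string {
  if (typeof err === "object" && err !== null) {
    const name = (err as { name?: unknown }).name;
    if (typeof name === "string") return name;
  }
  return "";
}

// Maps getUserMedia rejection names to a category. In the browser, the error
// would come from: await navigator.mediaDevices.getUserMedia({ video: true })
function classifyCameraError(err: unknown): CameraProblem {
  switch (errorName(err)) {
    case "NotAllowedError": // denied now, denied earlier and remembered, or blocked by policy
    case "SecurityError": // camera use disallowed for this document
      return "denied";
    case "NotFoundError": // no video input device at all
    case "OverconstrainedError": // no device satisfies the requested constraints
      return "no-camera";
    case "NotReadableError": // device exists but could not be opened (in use, OS or driver block)
    case "AbortError": // some other problem prevented the device from being used
      return "unavailable";
    default:
      return "unknown";
  }
}

// Illustrative copy for each error screen.
const CAMERA_MESSAGES: Record<CameraProblem, string> = {
  denied: "Camera access is blocked. Allow it in your browser's site settings, then reload.",
  "no-camera": "No camera was found. Connect a webcam and reload.",
  unavailable: "The camera is busy or blocked by your system. Close other apps using it and try again.",
  unknown: "The camera could not be started.",
};
```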
Mobile: A Separate Problem
I built Mirlo Volador primarily for desktop browsers with a webcam. But people tried it on their phones almost immediately, because of course they did. Mobile presented a different set of challenges.
First, a phone's front-facing camera sits in a fixed position, and people hold their phones close to their faces. In that natural holding position the camera looks straight at your face, but the game needs a view of your hand. Players had to hold the phone at arm's length or off to one side, which felt awkward.
Second, GPU acceleration in mobile browsers is inconsistent. Some phones ran MediaPipe beautifully. Others throttled the GPU under sustained load, and the frame rate dropped to single digits after a minute of play.
Third, the screen real estate on mobile is small, and the game canvas plus the hand tracking view compete for space in a way that works fine on a widescreen desktop but feels cramped on a phone.
The game works on mobile, and I made a number of adjustments to improve the experience there, but it is primarily a desktop game, and I say so in the UI. Pretending otherwise would make the mobile experience worse, not better, because I'd be optimizing for a framing that doesn't match how the game actually works.
The TypeScript and Canvas Architecture
I used TypeScript from the start, which I don't regret at all. The game state has a lot of moving parts — the bird position and velocity, the pipe array with their positions and gap sizes, the hand tracker state, the score, the game phase (waiting, playing, dead) — and typed interfaces made all of it much easier to reason about.
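As an illustration of what those typed interfaces might look like, here is a sketch; every type and field name in it is hypothetical. The union type for the phase means a typo such as "paying" is a compile error. The small transition table makes illegal moves, like restarting mid-flight, simply do nothing. The event that starts a round is also an assumption.

```typescript
// Hypothetical shapes; field names are illustrative.
type GamePhase = "waiting" | "playing" | "dead";
type GameEvent = "hand-detected" | "collision" | "restart";

interface Bird {
  y: number; // canvas pixels, 0 at the top
  velocity: number; // pixels per second
}

interface Pipe {
  x: number; // left edge, canvas pixels
  gapY: number; // center of the gap
  gapSize: number; // height of the gap
}

interface TrackerState {
  handY: number | null; // normalized 0..1, null when no hand is detected
  fps: number;
}

interface GameState {
  phase: GamePhase;
  bird: Bird;
  pipes: Pipe[];
  tracker: TrackerState;
  score: number;
}

// Allowed transitions; any event not listed leaves the phase unchanged.
const TRANSITIONS: Record<GamePhase, Partial<Record<GameEvent, GamePhase>>> = {
  waiting: { "hand-detected": "playing" },
  playing: { collision: "dead" },
  dead: { restart: "waiting" },
};

// Returns a new state with the phase updated; the input state is not mutated.
function nextPhase(state: GameState, event: GameEvent): GameState {
  const phase = TRANSITIONS[state.phase][event] ?? state.phase;
  return { ...state, phase };
}
```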
The rendering is pure HTML5 Canvas 2D. No game framework, no WebGL. For a game of this visual complexity, Canvas 2D is more than sufficient and keeps the bundle small. Vite handles the build, which is a good choice for this type of project: fast hot module replacement during development and clean production bundles without much configuration.
One architecture decision I'm genuinely happy with: the hand tracker runs in its own class with a simple interface. It exposes a single method that returns the current normalized hand Y position (0 to 1) or null if no hand is detected. The game loop doesn't care how that number is computed. If I wanted to switch from MediaPipe to a different tracking library tomorrow, I'd only have to change one file. This separation paid for itself multiple times when I was debugging tracking issues.
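A sketch of that boundary follows, with a hypothetical method name `getHandY`. The scripted implementation replays fixed values, which is one way to exercise the game loop without a webcam. A MediaPipe-backed class would implement the same interface, and nothing on the game side would change.

```typescript
// Hypothetical interface; the method name is illustrative.
interface HandTracker {
  // Latest normalized hand Y (0 = top of camera frame, 1 = bottom), or null if no hand is detected.
  getHandY(): number | null;
}

// Stand-in tracker that replays a fixed script of readings.
class ScriptedTracker implements HandTracker {
  private values: Array<number | null>;
  private index = 0;

  constructor(values: Array<number | null>) {
    this.values = values;
  }

  getHandY(): number | null {
    if (this.values.length === 0) return null;
    // Repeat the last value once the script runs out.
    const i = Math.min(this.index, this.values.length - 1);
    this.index++;
    return this.values[i];
  }
}

// Game-side code depends only on the interface, not on how the number is produced.
function birdTarget(tracker: HandTracker, currentY: number, canvasHeight: number): number {
  const handY = tracker.getHandY();
  return handY === null ? currentY : handY * canvasHeight;
}
```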
What I Would Do Differently
A few things I'd change if I started fresh:
Start with smoothing. I added it late, after the core game was working, and retrofitting it was annoying. Smoothing the input is not an optimization — it is a fundamental part of the game feel. Build it in from day one.
Test on slow hardware earlier. My development machine is fast. I tested on it constantly. The first time I handed the game to a friend with an older laptop, the frame rate dropped to 18fps and the experience was completely different. I should have had a slower test device in the loop from the beginning.
Design the camera permission flow before everything else. It is the first thing the player encounters. Getting it wrong means you lose the player before they've seen a single frame of the game. I treated it as an afterthought and paid for it in user confusion.
Building Mirlo Volador taught me that camera-based input is fundamentally different from keyboard or touch input. Keyboard input is binary and clean. Camera input is continuous, noisy, hardware-dependent, and affected by the physical environment in ways that keyboard input never is. Designing around that requires a different mindset. Once I accepted that and stopped trying to treat the hand position like a keyboard event, the pieces fell into place.
If you want to see the result of all this, the game is right here. No download, no account. Just allow camera access and hold up your hand.