Build a Palm Detection and Hand Tracking Model with Python (Part 1)

Anmol Bajpai
5 min read · Jun 15, 2021


How do you build a model that detects and tracks your hand using OpenCV, Jupyter Notebook, and MediaPipe?

Hey guys, thank you for being here. Let's learn how to get this done, starting with the prerequisites for this particular project.

Prerequisites:

  1. Basic Python programming knowledge.
  2. Use of Jupyter notebooks.
  3. Brief understanding of how Machine Learning models work.

I am using a Linux machine for this project. The setup is almost the same on a Mac. If you are working on Windows, the setup may differ slightly; don't worry, I'll provide all the necessary links here.

Let’s start.

Step 1: Install python3 and pip

Make sure you have Python 3 and pip installed on your machine. pip is the "preferred installer program", the standard tool for installing Python packages. No worries if you don't have them yet; visit python.org or check out these links to help you get started.

For Python:

For pip:
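As a quick sanity check (a minimal sketch; the exact version string on your machine will differ), you can confirm that the interpreter you launched is really Python 3 from Python itself:

```python
# Quick sanity check that the interpreter is Python 3.
import sys

print(sys.version)             # full version string of the running interpreter
print(sys.version_info.major)  # major version; should be 3
```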

Step 2: Install Virtual Environment and Jupyter Notebook.

What is a virtual environment and why do we need it?

A virtual environment is a Python environment where you can install packages, dependencies, libraries, and scripts and these do not affect your operating system's environment. Consider this virtual environment as a separate room of your house where you can mess around with absolutely anything that would not affect your house (unless your mom sees it!).

Therefore, having an independent, isolated environment lets us install exactly the packages the project needs, keeps our project organized, and runs those packages locally within that particular environment.

So, we are going to create a virtual environment where we can install all that we need and get things done.

Check out this documentation to install virtualenv on your machine:

Everything’s freshly set up. Let’s go ahead.

Step 3: Create a virtual environment and set up the notebook.

  1. Create a folder and name it “Hand tracking” or anything you prefer.
2. Open the terminal and create a virtual environment named track inside the folder you created (e.g. virtualenv track).

3. Activate the virtual environment. Do this by: source your_env_name/bin/activate.

Activation is slightly different on Windows (your_env_name\Scripts\activate), so take care of that. When your environment is activated, its name shows up before the directory in your prompt.

We have got the room. Let’s play around.
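As an aside, if you'd rather create the environment from Python instead of the terminal, the standard library's venv module can do the same job programmatically. This is an alternative sketch, not what the article uses (the article uses the virtualenv package from the terminal), but it produces the same kind of isolated environment:

```python
# Create an isolated environment named "track" using the stdlib `venv` module.
# Alternative sketch; the article itself runs `virtualenv track` in a terminal.
import venv
from pathlib import Path

env_dir = Path("track")
venv.create(env_dir, with_pip=False)  # with_pip=True also bootstraps pip

# Every environment gets a pyvenv.cfg file at its root.
print((env_dir / "pyvenv.cfg").exists())
```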

4. Install Jupyter using the command pip install jupyter, and see your terminal go mad.

5. Install ipykernel to create a kernel for the notebook. The command for that is:

pip install ipykernel

python3 -m ipykernel install --user --name=your_env_name


6. Run jupyter notebook in the terminal once Jupyter is installed. This will open a browser page with the Jupyter interface on your localhost. (Jupyter Notebook is launched, yay!!)


7. Create a notebook from the New dropdown at the top right (don't forget to rename it), and let's code.

Your notebook should look something like this; notice the kernel name is the same as your virtual environment (track in this case).

Step 4: Get familiar with the libraries.

The two dependencies we will need are:

a. OpenCV

b. MediaPipe

Let’s learn about them and see why they are needed.

OpenCV (Open Source Computer Vision Library) is a cross-platform, open-source library that helps us work with images, graphics, and media. It lets you manipulate them and is aimed extensively at real-time computer vision. One can do image processing, object detection, 3D reconstruction, feature detection, and whatnot. There are a lot of cool things you can do, and OpenCV lets you do them. Read more about OpenCV in this documentation:

Why and how are we using this library in our project?

We’ll use OpenCV:

a. to access our camera or webcam through a pre-defined function and capture frames and images.

b. to render and process our images in the required form, converting them between RGB and BGR and vice versa, also using its pre-defined functions.

Next up: MediaPipe.

MediaPipe is a framework for building machine learning pipelines. The Hand Tracking API that we will use in our case comes from here. It lets you run its graphs in Python, JavaScript, C++, Android, and iOS. Learn more about MediaPipe on their website here:

Why and how are we using this library in our project?

It implements an ML pipeline: multiple models working together to perform a task. In this case, it takes the image or frame that we as users provide, operates on it, and returns 3D keypoints (hand landmarks) from it.

Thank you for reading, people!
