Build a Palm Detection and Hand Tracking Model with Python (Part 1)

Anmol Bajpai
5 min read · Jun 15, 2021


How do you build a model that detects and tracks your hand using OpenCV, Jupyter Notebook, and MediaPipe?

Hey guys, thank you for being here. Let's learn how to get this done, starting with the prerequisites for this particular project.

Prerequisites:

  1. Basic Python programming knowledge.
  2. Use of Jupyter notebooks.
  3. Brief understanding of how Machine Learning models work.

I am using a Linux machine for this project. The setup is almost the same on a Mac. If you are working on Windows, the setup may differ slightly; don't worry, I'll provide all the necessary links here.

Let’s start.

Step 1: Install python3 and pip

Make sure you have Python 3 and pip installed on your machine. pip is the "preferred installer program", the standard tool for installing Python packages. No worries if you don't have them yet; visit python.org or check out these links to help you get started.

For Python:

For pip:
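As a quick sanity check (a minimal sketch; the exact version string on your machine will differ), you can confirm that the interpreter you launched is really Python 3 from Python itself:

```python
# Quick sanity check that the interpreter is Python 3.
import sys

print(sys.version)             # full version string of the running interpreter
print(sys.version_info.major)  # major version; should be 3
```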

Step 2: Install Virtual Environment and Jupyter Notebook.

What is a virtual environment and why do we need it?

A virtual environment is a Python environment where you can install packages, dependencies, libraries, and scripts and these do not affect your operating system's environment. Consider this virtual environment as a separate room of your house where you can mess around with absolutely anything that would not affect your house (unless your mom sees it!).

Therefore, having an independent, isolated environment lets us install exactly the packages the project needs, keeps our project organized, and runs those packages locally within that particular environment.

So, we are going to create a virtual environment where we can install all that we need and get things done.

Check out this documentation to install virtualenv on your machine:

Everything’s freshly set up. Let’s go ahead.

Step 3: Create a virtual environment and set up the notebook.

  1. Create a folder and name it “Hand tracking” or anything you prefer.
2. Open the terminal and create a virtual environment named track inside the folder you created (e.g. virtualenv track).

3. Activate the virtual environment. Do this by: source your_env_name/bin/activate.

Activation is slightly different on Windows (your_env_name\Scripts\activate), so take care of that. When your environment is activated, its name shows up before the directory in your prompt.

We have got the room. Let’s play around.
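As an aside, if you'd rather create the environment from Python instead of the terminal, the standard library's venv module can do the same job programmatically. This is an alternative sketch, not what the article uses (the article uses the virtualenv package from the terminal), but it produces the same kind of isolated environment:

```python
# Create an isolated environment named "track" using the stdlib `venv` module.
# Alternative sketch; the article itself runs `virtualenv track` in a terminal.
import venv
from pathlib import Path

env_dir = Path("track")
venv.create(env_dir, with_pip=False)  # with_pip=True also bootstraps pip

# Every environment gets a pyvenv.cfg file at its root.
print((env_dir / "pyvenv.cfg").exists())
```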

4. Install Jupyter using the command pip install jupyter, and see your terminal go mad.

5. Install ipykernel to create a kernel for the notebook. The command for that is:

pip install ipykernel

python3 -m ipykernel install --user --name=your_env_name


6. Run jupyter notebook in the terminal once Jupyter is installed. This will open a browser page with the Jupyter interface on your localhost. (Jupyter Notebook is launched, yay!!)


7. Create a notebook from the New dropdown at the top right (don't forget to rename it), and let's code.

Your notebook should look something like this; notice the kernel name is the same as your virtual environment (track in this case).

Step 4: Get familiar with the libraries.

The two dependencies we will need are:

a. OpenCV

b. MediaPipe

Let’s learn about them and see why they are needed.

OpenCV (Open Source Computer Vision Library) is a cross-platform, open-source library that helps us work with images, graphics, and media. It lets you manipulate them and is aimed extensively at real-time computer vision. One can do image processing, object detection, 3D reconstruction, feature detection, and whatnot. There are a lot of cool things you can do, and OpenCV lets you do them. Read more about OpenCV in this documentation:

Why and how are we using this library in our project?

We’ll use OpenCV:

a. to access our camera or webcam through a pre-defined function and capture frames and images.

b. to render and process our images in the required form, converting them between RGB and BGR and vice versa, also using its pre-defined functions.

Next up: MediaPipe.

MediaPipe is a framework for building machine learning pipelines. The Hand Tracking API that we will use in our case comes from here. It lets you run its graphs in Python, JavaScript, C++, Android, and iOS. Learn more about MediaPipe on their website here:

Why and how are we using this library in our project?

It implements an ML pipeline: multiple models working together to perform a task. In this case, it takes the image or frame that we as users provide, operates on it, and returns 3D keypoints (hand landmarks) from it.

Thank you for reading, people!
