Distraction Detection using Pose Estimation with OpenCV and TensorFlow

Jake Mellichamp · Published in CodeX · 5 min read · Apr 27, 2021

Figure 1. Library Dependencies

Abstract

Recently our society has been thrust into a work-from-home-centric workforce because of the COVID-19 pandemic. Society transformed from working on job sites, learning in classrooms, and sweating out projects in the library to doing all of these tasks on a centralized workstation at home. This has thrown a wrench into work-life balance, and I hypothesize that most people suffer copious distractions while working remotely. That hypothesis inspired me to create an application that can quantifiably measure an individual's attentive vs. inattentive behavior while working remotely. Using Python's TensorFlow library for face detection and OpenCV for facial rotation, an application was developed to test whether someone is engaging with their workstation.

TensorFlow Facial Detection

TensorFlow is an end-to-end open-source Python library for machine learning. That is, it contains the tools necessary to create and train robust machine learning models. The software has been available since late 2015 and has been used to develop thousands of machine learning models. Instead of reinventing the wheel and training my own face detection algorithm, I decided to use a pre-trained model for facial detection. This model was published by Yin Guobing on GitHub and detects 68 facial landmarks that can be used to define a face object [2].

Figure 2. Facial Detection Implementation using CNN Classifier
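The detector reports its landmarks relative to the detected face box, so a small post-processing step maps them back into full-image coordinates. The helper below is a sketch under the assumption that the model outputs a flat array of 68 normalized (x, y) pairs in [0, 1]; the exact output format of the pre-trained model may differ:

```python
import numpy as np

def marks_to_image_coords(raw_marks, face_box):
    """Map normalized landmark output to pixel coordinates.

    raw_marks: flat array of 136 values, interpreted as 68 (x, y)
               pairs in [0, 1] relative to the face bounding box
               (an assumption about the model's output format).
    face_box:  (x_min, y_min, x_max, y_max) of the detected face.
    """
    x_min, y_min, x_max, y_max = face_box
    marks = np.array(raw_marks, dtype=np.float64).reshape(68, 2)
    marks[:, 0] = x_min + marks[:, 0] * (x_max - x_min)
    marks[:, 1] = y_min + marks[:, 1] * (y_max - y_min)
    return marks
```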

OpenCV Pose Estimation

This is where the project deviates from supplied code to experimental code. The pose estimation problem boils down to calculating the relative rotation/orientation of the detected face object. This is a notorious mathematical problem within computer vision known as the Perspective-n-Point (PnP) problem [1]. The "n" represents the number of known, identified points in the image plane. The PnP problem is special because it gives us a way of solving for the extrinsic camera properties given a set of predefined 3D points on a reference model, their corresponding 2D coordinates in the current frame, and the intrinsic parameters of the current camera.

Figure 3. Perspective-n-Point Problem equation

Therefore, when comparing the current image frame to a previously defined face detection model, the mathematical function can solve for the extrinsic camera properties defined in the matrices R and T. For this project I am only concerned with the rotation matrix R, which can be computed with OpenCV's solvePnP(...) function.

Initial Implementation Shortcomings

After finding the rotation matrix, I thought my project was finished. Boom, easy. I simply converted the radian values returned from the solvePnP function into degrees and displayed the information. However, although the rotation matrix was solved and seemed to illustrate the current rotation of my face, the actual roll, pitch, and yaw values were extremely skewed relative to the camera.

Figure 4. Demonstrating Minimal Change to the Camera Matrix and Skewness (X in red, Y in green)

For example, simply looking straight into the camera yielded an X rotation of -155 degrees when in reality it should be 0 or 180 degrees. A slight inconvenience, but something I could fix by adding weights to the rotated axis values. After adding the weights, I added timers and determined whether the user was paying attention to the camera by checking whether the face rotation stayed within (-15, 15) degrees on the Y-axis (head tilt) and (-10, 10) degrees on the X-axis (head turn).
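That attentiveness check reduces to a bounds test plus a timer. A sketch, mirroring the bounds above (the class and function names are illustrative, not the project's actual code):

```python
import time

# Bounds (in degrees) inside which the user counts as attentive:
# Y-axis = head tilt, X-axis = head turn.
Y_BOUNDS = (-15, 15)
X_BOUNDS = (-10, 10)

def is_attentive(x_deg, y_deg):
    """True when both rotation angles fall within the attentive bounds."""
    return (X_BOUNDS[0] <= x_deg <= X_BOUNDS[1]
            and Y_BOUNDS[0] <= y_deg <= Y_BOUNDS[1])

class DistractionTimer:
    """Accumulates seconds spent outside the attentive bounds."""

    def __init__(self):
        self.distracted_seconds = 0.0
        self._last = None

    def update(self, x_deg, y_deg, now=None):
        """Call once per processed frame with the current angles."""
        now = time.time() if now is None else now
        if self._last is not None and not is_attentive(x_deg, y_deg):
            self.distracted_seconds += now - self._last
        self._last = now
```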

Figure 5. Testing Distraction vs attentiveness with the rotation matrix

A Simpler Approach

The 23.9 percent error from the rotation matrix was, to put it simply, not impressive. This was a computer-vision-inspired project, so I wanted to solve the problem using a computer vision concept. I could not help noticing, however, that Yin Guobing's facial detector places three landmarks on the nose. Therefore, we can use two of those points to make a line! With some algebra and geometry we can solve for the slope of the nose line and then calculate the angle via an arctangent function:

Figure 6. Showing theoretical Triangle on the face given nose coordinates.
Figure 7. Math used to solve for arctan
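The arctangent step can be sketched as follows, assuming two of the nose landmarks are passed in as (x, y) pixel coordinates (the point choice and helper name are illustrative):

```python
import math

def nose_line_angle(top, bottom):
    """Angle (degrees) of the line through two nose landmarks.

    top, bottom: (x, y) pixel coordinates of two points along the
    nose (e.g. bridge and tip in the 68-landmark scheme).
    Returns 90 when the line is perfectly vertical, i.e. the user
    faces the camera head-on; the value falls away from 90 as the
    head tilts.
    """
    dx = bottom[0] - top[0]
    dy = bottom[1] - top[1]
    if dx == 0:
        return 90.0  # vertical nose line
    slope = dy / dx
    return abs(math.degrees(math.atan(slope)))
```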

This means that when a user faces the camera, the nose line is nearly vertical and the angle is close to 90 degrees (exactly 90 if perfectly straight); as the user looks away from the camera, the line tilts and the angle deviates further from 90. Let's compare results:

Figure 8. Testing with the Tangent angle

Conclusion

All in all, it was a fun experimental project to predict distractions via a pose estimation algorithm. It took me down the rabbit holes of TensorFlow, facial landmarks, the PnP problem (and actually understanding it!), as well as some geometric functions. It also expanded my problem-solving skills, as I was initially derailed by the rotation matrix being... not so accurate. The geometric approach was clearly the more accurate of the two but failed if the user was looking down at a phone screen. With more time and patience, a better pose estimation algorithm could probably be developed to detect whether a user is distracted. As I was building this project, I realized that pose estimation in general cannot determine whether a user is currently distracted. Will it work most of the time? Yes; however, this problem could likely be solved more reliably with an eye-tracking algorithm instead of a pose estimation algorithm.

Source Code: https://github.com/Jacob-Mellichamp/DistractionDector

A Quick Note on Performance

I also experimented with frame rates to minimize CPU usage while the algorithm was running. At 0.2 FPS (one frame every five seconds), a 2.25 GHz Intel i5 processor was able to successfully track distractions with as little as 5% CPU usage for the duration of the run, without a discrete graphics card!
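The throttling loop can be sketched like this; `grab_frame` and `process` stand in for the real capture and detection steps (hypothetical names), and sleeping between frames is what keeps the CPU mostly idle:

```python
import time

FRAME_INTERVAL = 5.0  # seconds between frames -> 0.2 FPS

def run_tracker(grab_frame, process, stop_after=None):
    """Grab one frame every FRAME_INTERVAL seconds and process it.

    grab_frame: callable returning the next camera frame.
    process:    callable run on each frame (detection + pose check).
    stop_after: optional frame count for bounded runs; None = forever.
    """
    frames = 0
    while stop_after is None or frames < stop_after:
        process(grab_frame())
        frames += 1
        if stop_after is None or frames < stop_after:
            time.sleep(FRAME_INTERVAL)  # idle between frames
```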

References

[1] Satya Mallick. 2016. Head Pose Estimation using OpenCV and Dlib. Retrieved April 15, 2021 from https://learnopencv.com/head-pose-estimation-using-opencv-and-dlib/

[2] Yin Guobing. [n.d.]. Head Pose Estimation. Retrieved April 10, 2021 from https://github.com/yinguobing/head-pose-estimation
