Capturing Motion from Video Using the Emgu CV Library

Markus Koppensteiner

4.67/5 (9 votes)

June 21, 2017

CPOL

10 min read

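This article demonstrates how to use commands of the Emgu CV library to perform face detection, frame subtraction, and dense optical flow.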

Introduction

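OpenCV (Open Source Computer Vision Library) is a freely available software library containing ready-to-use routines for processing visual input such as images or videos. The functions and tools the library offers can be accessed from C/C++, Python, or the .NET programming languages. In this article, I focus on Emgu CV, a wrapper for OpenCV whose methods can be embedded in a C# program. I demonstrate how to load and play a movie, how to detect faces in an image (using a pretrained Haar classifier), how to apply the Farneback algorithm for dense optical flow (to capture motion), and how to use frame subtraction (i.e., subtracting the pixel information of successive frames to capture motion).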

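The content of this article can serve as an introduction to some of the commands provided by the Emgu CV library, but it is also meant to show that the techniques presented can help answer questions arising in the behavioral sciences. Therefore, the article not only contains code descriptions but also comes with a small application, a video, and some brief comments on how the described software routines can be used to record data for behavior analysis.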

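Because I access the Emgu CV wrapper through C#, a solid knowledge of this programming language is needed to properly understand the code examples. In contrast, using the included application does not require any programming skills. The application was compiled in Visual Studio 2015 and is based on the .NET Framework 4.5 and Emgu CV 3.0. A few steps are necessary to make the functions and classes of Emgu CV available in the Visual Studio development environment. First, after choosing the application type (in this case a standard Windows Forms application), go to the Solution Explorer window, right-click "References", and select "Add Reference". In the window that appears, select "Browse" and navigate to the folder in which Emgu CV was stored during installation. Then include the DLL files found in the "bin" folder. Typing the C# directive "using" followed by the name of an Emgu CV namespace in the main program (e.g., using Emgu.Util;) gives access to the necessary components (for more on this last step, see the Form1 file, which is available via "Download source code"). Detailed descriptions of all these steps can be found online.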

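I do not describe the mathematical background of the procedures presented here. I only describe how to use the tools the library provides. If you are interested in the mathematics behind all this, please consult other articles.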

Absolute Difference of the Pixel Information of Successive Movie Frames

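Next, I present a simple way to extract the amount of motion occurring between two successive frames of a video. The pixels of a video can be converted to greyscale values ranging from 0 to 255 (8-bit image, from black to white). If no motion (or no shift in the position of objects) occurs, the absolute difference of the greyscale values between the two frames (i.e., the difference between pixels at the same position in both images) produces a black image, because the pixel values cancel each other out (the operation yields 0 for all pixels). When a shift in position does occur, however, different shades of grey appear at some positions of the "difference image" (see the figure above). The amount of motion can be estimated by counting the pixels whose values lie above a certain threshold (e.g., a greyscale value of 100). Of course, this approach has limitations and problems (e.g., pixel colors change when lighting conditions change), but overall it provides a simple and fairly robust estimate of the amount of change.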
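To make the arithmetic concrete, here is a minimal sketch of the idea in plain C#, without Emgu CV. It assumes the two greyscale frames have already been copied into byte[,] arrays; the class and method names (FrameDifference, AbsDiff, CountAbove) are hypothetical and are not taken from the application or from the library.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the article's
// application and not an Emgu CV API.
public static class FrameDifference
{
    // Per-pixel absolute difference of two equally sized 8-bit greyscale frames.
    public static byte[,] AbsDiff(byte[,] previous, byte[,] current)
    {
        int rows = previous.GetLength(0);
        int cols = previous.GetLength(1);
        if (current.GetLength(0) != rows || current.GetLength(1) != cols)
            throw new ArgumentException("Frames must have the same size.");

        var diff = new byte[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                // byte - byte is evaluated as int, so the result cannot wrap around
                diff[r, c] = (byte)Math.Abs(previous[r, c] - current[r, c]);
        return diff;
    }

    // Number of pixels whose difference lies strictly above the threshold.
    public static int CountAbove(byte[,] diff, byte threshold)
    {
        int count = 0;
        foreach (byte value in diff)
            if (value > threshold) count++;
        return count;
    }
}
```

Two identical frames give a difference image of zeros and a count of 0. The count grows with the number of pixels whose grey value changed by more than the threshold, which is the motion estimate described above.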

Using the Code

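The code snippet provided below is only a skeleton version of the method used in the application (see the sample download). It focuses on the most important Emgu CV commands needed to determine the absolute difference between successive frames. The code builds on steps that have to be completed beforehand, including the initialization of several variables such as an instance of the Capture class. Capturing a movie requires a line of code like Capture capture_movie = new Capture(movie_name). For a deeper understanding, it helps to look at the source code (see mnu_LoadMovie_Click). In short, the code below captures a frame, converts it to a greyscale frame, subtracts its pixel values from those of the previous frame, and displays the result in a window (note that I mainly use the Image<> class instead of Mat because it offers some functionality that the Mat class does not).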

//
// Code skeleton for frame subtraction
//
// .... find full version of the code in Form1 file

 public void Abs_Diff_And_Areas_of_Activity()
        {
          

        // needed to clean up; all values stored in image are set to 0 (=black)
        // could also be done by reinitializing the Image array below each time  
        img_abs_diff.SetZero(); // is a global Image<> variable defined elsewhere
                

        // the current frame becomes the previous frame (is stored in prev_frame) 
        // the Mat frame variable is defined in the section global variables of the source code
        prev_frame = frame;

        // ....omitted code 
                
               
        // drives movie to the frame to be decoded/captured next; 
        // in this case to the frame given in the variable frame_nr
        // capture_movie variable has to be initialized before =>
        // Capture capture_movie = new Capture(movie_name) 
        capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);

        // capture the frame of loaded movie at position of frame_nr (see previous line)
        // QueryFrame pushes "pointer of Capture class" forward; calling it again grabs the next frame
        frame = capture_movie.QueryFrame();
              

        // used for changing original frame size; 
        // resizing factor is given in textfield on user interface
        // making images smaller accelerates processing 
        Size n_size = new Size(frame.Width / Convert.ToInt32(txt_resize_factor.Text),
              frame.Height / Convert.ToInt32(txt_resize_factor.Text));
                
        // resize frame and previous frame, CvInvoke is an Emgu CV Class
        // the destination and the source frame for resizing are the same 
        CvInvoke.Resize(frame, frame, n_size );
        CvInvoke.Resize(prev_frame, prev_frame, n_size);

        // show resized frame in window 
        CvInvoke.Imshow("Movie resized", frame);
               
        // greyscale images to store information of the frame subtraction procedure
        Image<Gray, Byte> prev_grey_img, curr_grey_img;

        // initialize images used for frame subtraction using size of resized frame above  
        prev_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);
        curr_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);
        
        // assign frame and previous frame to greyscale images (turns them into greyscales) 
        curr_grey_img = frame.ToImage<Gray, byte>(); // turns Mat frame variable into Image<> variable
        prev_grey_img = prev_frame.ToImage<Gray, Byte>();

        // subtract pixel values of successive frames (greyscale) from each other to get 
        // areas where changes in pixel color occurred 
        // only provides positive values -> absolute difference
        CvInvoke.AbsDiff(prev_grey_img, curr_grey_img, img_abs_diff);

        // ALTERNATIVE: CvInvoke.Subtract(prev_grey_img, curr_grey_img, img_abs_diff); 
        // gives signed differences; note that with 8-bit images negative results
        // saturate to 0, so a signed destination depth is needed to keep them

        //....omitted code // in source code file: code that transfers greyscale values above 
        // certain threshold (areas after subtracting) into array
       

        // show results of CvInvoke.AbsDiff function 
        CvInvoke.Imshow("Frame Subtraction", img_abs_diff);


         // Release memory
         curr_grey_img.Dispose();
         prev_grey_img.Dispose();

                
       }
//

Detecting Faces in Images Using a Haar Classifier

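Anyone who uses a modern digital camera or a smartphone camera has encountered the automatic face detection of these devices. The tools provided by OpenCV and Emgu CV make this kind of object detection possible as well. These tools are based on machine learning algorithms for object detection or, more precisely, on so-called Haar classifiers. A Haar classifier is trained with a large number of positive examples (e.g., faces) and negative examples (e.g., images of the same size that are not faces). Such a classifier can then be applied to unclassified images (e.g., images containing faces) to identify the objects in them (i.e., the objects the classifier was trained on). OpenCV provides ready-made xml files containing the data for detecting different types of objects (e.g., faces, eyes, etc.). The code provided below uses such a pretrained classifier. It is, however, also possible to create your own xml files for object classification.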

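As with the code example given above, a movie has to be loaded first (i.e., Capture capture_movie = new Capture(movie_name)) before the commands of the following code snippet can be applied. The code contains the basic principles of using a Haar classifier in Emgu CV. Using a different classifier (e.g., for eyes) will of course produce different results, but the basic principles are the same.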

Using the Code

// 
// Code skeleton for face detection
// 
// .... find full version of code in Form1 file

private void Face_Detect()
             {

                double rect_size = 0;

                // rectangle structure to store largest rectangle (largest face found) 
                // see below foreach loop
                Rectangle largest_rect = new Rectangle();

                //.... omitted code

                // using Haar classifier to find faces in images 
                // data of the trained classifier is stored in xml file 
                // has to be in the same folder as the exe file of the application
                CascadeClassifier haar = new CascadeClassifier("haarcascade_frontalface_default.xml");

                // drive movie to given frame number (stored in frame_nr variable)
                capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);

                // grab frame at the given position (given by frame_nr variable) 
                frame = capture_movie.QueryFrame();

                // convert frame stored as Mat variable to Image<Bgr, Byte> variable 
                // grabbed_image is global variable (see source code)
                grabbed_image = frame.ToImage<Bgr, Byte>();

                // used for changing original frame size 
                // resizing factor is given in textfield on user interface
                Size n_size = new Size(grabbed_image.Width / Convert.ToInt32(txt_resize_factor.Text),
                grabbed_image.Height / Convert.ToInt32(txt_resize_factor.Text));

                // resize grabbed frame 
                // for demonstration purposes I use the resize function here; 
                // this is different from other procedures in this article (e.g., frame subtraction)
                CvInvoke.Resize(grabbed_image, grabbed_image, n_size);
                

                // define greyscale image; has the same size as grabbed_image
                Image<Gray, Byte> grey_img = new Image<Gray, byte>(grabbed_image.Width, 
                grabbed_image.Height);
                // convert grabbed image to greyscale image and store the result in greyscale image 
                grey_img = grabbed_image.Convert<Gray, byte>();
                 
                // define rectangle structure array for storing the position of all faces found 
                Rectangle[] rect;

                // use Haar classifier xml file to detect faces in greyscale image and 
                // store results in rect structure array
                // second parameter is factor by which the search window is scaled 
                // between subsequent scans
                // (for example, 1.1 means increasing window by 10%)
                // third parameter is minimum number (minus 1) of neighbor rectangles 
                // that make up an object
                rect = haar.DetectMultiScale(grey_img, 1.1, 3);

                // loop through rectangle array and draw each of them onto the image
                // find largest rectangle and draw it in a different color 
                foreach (var ele in rect)
                {

                    // check if found rectangle is largest rectangle and store this information
                    if ((ele.Width * ele.Height) > rect_size)
                    {
                        rect_size = ele.Width * ele.Height;
                        largest_rect = ele;
                    }

                    // draw found rectangles onto grabbed (original) frame; use red color 
                    // (channel order is blue, green, red)
                    grabbed_image.Draw(ele, new Bgr(0, 0, 255), 3);

                }

                // draw largest rectangle onto grabbed image (in green)
                grabbed_image.Draw(largest_rect, new Bgr(0, 225, 0), 3);

                // show results of these procedures   
                CvInvoke.Imshow("Original Video", grabbed_image);

                // release memory 
                grey_img.Dispose();
                haar.Dispose();

               //.... omitted code

            }
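
The role of the second DetectMultiScale parameter (the scale factor) can be illustrated without Emgu CV. The sketch below lists the search-window sizes that result from repeatedly enlarging a base window until it no longer fits the image. It is an illustration under stated assumptions, not the detector's actual implementation: the helper name ScaleSteps.WindowSizes is hypothetical, and the 24-pixel base window used in the usage note is an assumption about the frontal-face cascade.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that only illustrates the scale factor;
// it is not Emgu CV code and does not perform any detection.
public static class ScaleSteps
{
    // Lists the square search-window sizes (in pixels) obtained by repeatedly
    // multiplying the base window by scaleFactor while it still fits the image.
    public static List<int> WindowSizes(int baseWindow, double scaleFactor,
                                        int imageWidth, int imageHeight)
    {
        if (baseWindow <= 0)
            throw new ArgumentOutOfRangeException(nameof(baseWindow), "Must be positive.");
        if (scaleFactor <= 1.0)
            throw new ArgumentOutOfRangeException(nameof(scaleFactor), "Must be greater than 1.");

        var sizes = new List<int>();
        int limit = Math.Min(imageWidth, imageHeight);
        for (double factor = 1.0; ; factor *= scaleFactor)
        {
            int window = (int)Math.Round(baseWindow * factor);
            if (window > limit) break; // window no longer fits into the image
            sizes.Add(window);
        }
        return sizes;
    }
}
```

Calling ScaleSteps.WindowSizes(24, 1.1, 320, 240) yields many closely spaced window sizes, whereas a factor of 2.0 yields only a handful. A smaller factor therefore means more scans per frame (slower, but less likely to miss a face of an intermediate size), which is worth keeping in mind when tuning the call above.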

Applying Dense Optical Flow to Capture Pixel Shifts Between Successive Movie Frames

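When the term "optical flow" was coined (Gibson, 1940), it was mainly used to describe the pattern of motion caused by relative movement between an observer and a scene. More specifically, it describes the apparent (i.e., in principle non-existent) motion of objects, surfaces, and edges that the eye has to process when a person or an animal moves through its environment. A modern, and possibly hard to grasp, definition is that optical flow is the distribution of the apparent velocities of movement of brightness patterns in an image.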

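Like the frame subtraction method presented above, optical flow algorithms process changes in pixel color to detect motion. In general, there are two main classes of algorithms, namely sparse optical flow and dense optical flow. The former uses a set of key features to detect motion, whereas the latter processes all available pixel information. Dense optical flow is more accurate but requires more resources. In the example below, I show code for dense optical flow based on Gunnar Farneback's algorithm, because accuracy is more important than processing speed for the work I do. The example code is divided into two functions. The first shows the code for the optical flow procedure; the second shows how to access the results of the procedure and how to draw them onto the screen. More information on the parameters of the Farneback algorithm can be found on the Emgu CV and OpenCV web pages. I will not (and cannot) provide information on the inner workings of the algorithm.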

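Again, as with the other code examples, a movie has to be loaded first (i.e., Capture capture_movie = new Capture(movie_name)) before the commands of the following code snippets can be applied. The code skeleton of the Draw_Farneback_flow_map() function focuses only on the lines used to access the pixel shift information and to make these shifts visible. The source code file contains a good deal of additional code (e.g., the sum of all vectors on the left and right side, information about changes in direction, etc.).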

Using the Code

// 
// Code skeleton for dense optical flow 
// 
// .... find full version of code in Form1 file 

public void Dense_Optical_Flow()
        {

           //.... omitted code

           // frame becomes previous frame (i.e., prev_frame stores information before movie 
           // is pushed forward to next frame by QueryFrame() function)
           prev_frame = frame;

           // .... omitted code

           // set "pointer" to position where frame capturing will start
           capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);

           // capture frame
           frame = capture_movie.QueryFrame();

           // .... omitted code

           // used for changing original frame size
           // resizing factor is given in textfield on user interface
           Size n_size = new Size(frame.Width / Convert.ToInt32(txt_resize_factor.Text),
                  frame.Height / Convert.ToInt32(txt_resize_factor.Text));

           // resize frame and previous frame (make them smaller to reduce processing load)
           CvInvoke.Resize(frame, frame, n_size);
           CvInvoke.Resize(prev_frame, prev_frame, n_size);

           // images that are compared during the flow operation (see below) 
           Image<Gray, Byte> prev_grey_img, curr_grey_img;

           prev_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);
           curr_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);

           // image arrays to store information of flow vectors => results of Farneback algorithm
           // one image array for each direction, which is x and y
           Image<Gray, float> flow_x;
           Image<Gray, float> flow_y;

           flow_x = new Image<Gray, float>(frame.Width, frame.Height);
           flow_y = new Image<Gray, float>(frame.Width, frame.Height);

           // assign information stored in frame and previous frame to greyscale images
           curr_grey_img = frame.ToImage<Gray, byte>();
           prev_grey_img = prev_frame.ToImage<Gray, Byte>();

           // apply Farneback dense optical flow  
           // parameters are the two greyscale images (these are compared) 
           // and two image arrays storing the results of algorithm  
           // the rest of the parameters are (for more details consult the OpenCV documentation):
           // pyrScale: specifies image scale to build pyramids: 
           //           0.5 means that each next layer is half the size of the previous one
           // levels: number of pyramid levels: 1 means no extra layers
           // winSize: the averaging window size; larger values = more robust to noise but more blur
           // iterations: number of iterations at each pyramid level
           // polyN: size of pixel neighbourhood used for the polynomial expansion: 
           //        larger = smoother, more robust estimate but more blurred motion field
           // polySigma: standard deviation of the Gaussian used to smooth the derivatives
           //            for the polynomial expansion
           // flags: operation flags; 0 uses the default behaviour
           CvInvoke.CalcOpticalFlowFarneback(prev_grey_img, curr_grey_img, flow_x, flow_y, 
                         0.5, 3, 15, 3, 6, 1.3, 0);

           // call function that shows results of Farneback algorithm (see next section)  
           Draw_Farneback_flow_map(frame.ToImage<Bgr, Byte>(), flow_x, flow_y, overall_step);
           

           // Release memory 
           prev_grey_img.Dispose();
           curr_grey_img.Dispose();
           flow_x.Dispose();
           flow_y.Dispose();

           //.... omitted code    
          
        }

private void Draw_Farneback_flow_map(Image<Bgr, Byte> img_curr, 
        Image<Gray, float> flow_x, Image<Gray, float> flow_y, int step, int shift_that_counts = 0)
        {

         // NOTE: flow Images (flow_x and flow_y) are organized like this:
         // the index into the Image array is the position of a pixel before the 
         // optical flow operation; the value stored there is the shift of this 
         // specific pixel after the flow operation
         // if no shift has occurred, the value stored at the index is zero
         // (i.e., pixel[index] = 0)
         
         // Point variable where line between pixel positions before and after flow starts
         Point from_dot_xy = new Point(); 
         // Point variable, which will be the endpoint of line between pixels before and after flow
         Point to_dot_xy = new Point(); 
            
         MCvScalar col; // variable to store color values of lines representing flow vectors
         col.V0 = 100;
         col.V1 = 255;
         col.V2 = 0;
         col.V3 = 0;

         //.... omitted code

        
         // loops over image matrix and gets positions of dots before and after optical flow operations 
         // and draws vectors between old and new positions
         // only a subset of pixels is processed (see step)
            for (int i = 0; i < flow_x.Rows; i += step) // flow_ y has the same size and row numbers
               for (int j = 0; j < flow_x.Cols; j += step) // flow_y has the same col numbers
                {

                  // pixel shift measured by optical flow is transferred to Point variables 
                  // stores starting point of motion (from_dot..) and its end points (to_dot...)
                  // accesses single pixels of flow matrix, where x-coords and y-coords of pixel after 
                  // flow procedure are stored; only gives the shift
                  to_dot_xy.X = (int)flow_x.Data[i, j, 0]; 
                  to_dot_xy.Y = (int)flow_y.Data[i, j, 0]; 

                  from_dot_xy.X = j; // index of loop is position on image (x-coord); X is cols
                  from_dot_xy.Y = i; // index of loop is position on image (y-coord); Y is rows

                  // new x-coord position of pixel 
                  // is "original" position plus shift stored in this position  
                  to_dot_xy.X = from_dot_xy.X + to_dot_xy.X;  
                  to_dot_xy.Y = from_dot_xy.Y + to_dot_xy.Y; 

                  //.... omitted code

                  // draw line between coords to display pixel shift stored in flow field 
                  CvInvoke.Line(img_curr, from_dot_xy, to_dot_xy, col, 2); 

                } 

            // show image with flow depicted as lines (once, after all vectors are drawn)
            CvInvoke.Imshow("Flow field vectors", img_curr); 

      

           //.... omitted code

         
        }


//

Points of Interest

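This article comes with a small application that can perform all the analyses described above, and more. Because I do research in the field of nonverbal communication, I am mainly interested in extracting nonverbal cues from human behavior. Therefore, the application contains some additional functions that are not mentioned in the code examples above. The frame subtraction part contains an additional function that stores all values above a certain threshold and produces an image of the areas in which changes in pixel color occurred. The optical flow function contains code that computes summed direction vectors for the right and the left side of the window. It also provides information (in an additional window) about changes in the direction of the summed vectors. In addition, some code sections store the information extracted by the routines described here. All of this is intended for automated analyses of human motion behavior.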
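
As a rough illustration of the summed direction vectors, the following plain C# sketch adds up the flow vectors of a range of columns and converts the result into a direction. It assumes the x and y shifts have been copied into float[,] arrays indexed as [row, column]; the names FlowSummary, SumRegion, and DirectionDegrees are hypothetical, and the implementation in the Form1 file may well differ.

```csharp
using System;

// Hypothetical helper for illustration; the application's own code
// is in the Form1 file and may be organized differently.
public static class FlowSummary
{
    // Sums the flow vectors in the columns [colStart, colEnd) over all rows.
    public static (double X, double Y) SumRegion(float[,] flowX, float[,] flowY,
                                                 int colStart, int colEnd)
    {
        double sumX = 0, sumY = 0;
        int rows = flowX.GetLength(0);
        for (int r = 0; r < rows; r++)
            for (int c = colStart; c < colEnd; c++)
            {
                sumX += flowX[r, c];
                sumY += flowY[r, c];
            }
        return (sumX, sumY);
    }

    // Direction of a vector in degrees; in image coordinates positive Y points down.
    public static double DirectionDegrees(double x, double y)
    {
        return Math.Atan2(y, x) * 180.0 / Math.PI;
    }
}
```

For a flow field with cols columns, SumRegion(flowX, flowY, 0, cols / 2) gives the summed vector of the left half and SumRegion(flowX, flowY, cols / 2, cols) that of the right half. Comparing DirectionDegrees of these sums across successive frames is one possible way to track changes in direction.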

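The user interface of the application shows the total number of frames of the video, its frame rate, and the current frame number. It shows the greyscale threshold used for the frame subtraction procedure (the default value of 100 means that only values above this threshold are used in the frame subtraction routine). "Steps" denotes the number of frames by which the video is moved forward (or backward) after using, for example, the "Forward" or "Apply stepwise" button. The "Reduce size by factor" text field specifies how much the original video is scaled down (2 means that the width and height of the video are halved). Reducing the size of the video speeds up the processing of the image data. I consider the option buttons, the "Play" button, and the options in the "File" menu of the interface to be self-explanatory. "Apply stepwise" applies one of the image processing routines (option buttons) to the video in a stepwise manner (it applies the routine to the current frame and to the frame given by the current frame number plus the number in "Steps"). Data captured with the software can be saved to a txt file (see menu).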

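To access the Emgu CV routines, the DLL files of the library have to be referenced (by adding the Emgu CV folder to the Windows environment variables). Alternatively, all necessary DLL files can be copied into the folder containing the exe file of the program (perhaps less elegant, but relatively simple). If you do not care about the code but would like to use the software and run into problems, please contact me.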

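There is no guarantee that the examples provided here are free of errors. Likewise, the code could certainly be organized in a more straightforward and concise way.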

Acknowledgements

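This work was supported by the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS/KNAW) (www.nias.knaw.nl), the EURIAS Fellowship Programme, and the European Commission (Marie Curie Actions - COFUND Programme - FP7).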