Capturing Video from a Web Camera on Windows 7 and 8 Using Media Foundation

Evgeny Pereguda

12 Mar 2013

CPOL

A simple library for capturing video from a web camera using Media Foundation

Download source - 25 KB

Introduction

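After I started working with Win8-Desktop, I found that some older technologies, DirectShow in particular, no longer work well. For example, capturing live video from a web camera with DirectShow works flawlessly on WinXP, Vista and Win7, and a specific resolution can be requested: from a Microsoft LifeCam Studio web camera I could get 1080p video. On Win8-Desktop, however, I could only get 640x480. The cause is that a function call that returns the HRESULT S_OK on Win7 returns a failure code on Win8-Desktop. After reading MSDN, I concluded that Microsoft is deliberately phasing out DirectShow support in favor of another technology, Media Foundation. I found some information about using Media Foundation to capture video from a web camera with the required parameters, but it is scattered across MSDN. I thought it would be useful to have a C++ class that wraps all of the initialization and hides it behind a simple interface. I have implemented such a class and present it in this tip.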

Background

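I lead an augmented reality project that needs simple support for capturing video from a web camera. I had been using videoInput, a simple library from http://muonics.net/school/spring05/videoInput/, which relies on DirectShow for this purpose. It does not work well on Win8-Desktop, however: setting the capture resolution fails. I found a way around the problem with Media Foundation, but since my project already used videoInput, I wanted a new library with the same interface as videoInput, built on Media Foundation instead. I achieved that goal, and I think the new library will be useful to others who run into the same problem while developing image recognition programs.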

Using the Code

The videoInput library was written in Visual Studio 2012 - videoInputVS2012.zip (static library: videoInput-staticlib-VS2012x86.zip). It contains nine classes:

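videoInput - the interface class. To use the library, include videoInput.h and videoInput.lib in your project. The class is designed as a singleton, which simplifies resource management.
Media_Foundation - a singleton class that manages the allocation and release of Media Foundation resources.
videoDevices - a singleton class that manages the allocation and release of video devices and gives access to a single video device.
videoDevice - a class for working with a video capture device: getting raw data, checking for a new frame, getting the supported resolutions, setting the desired resolution, and closing the device.
ImageGrabberThread - a class for managing the image-grabbing thread.
ImageGrabber - a class for initializing image grabbing from a video device. It controls the grabbing process and ends it.
RawImage - a temporary class for writing and reading one frame.
FormatReading - a class for reading information about the supported resolutions into the custom MediaType structure.
DebugPrintOut - a class for printing text to the console.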
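
Several of the classes above are described as singletons, and videoInput.h exposes static videoInput& getInstance() with a private constructor. The library's own implementation is not shown in this tip, so the following is only a minimal sketch of the common function-local-static form of that pattern. CaptureManager and demoSharedState are hypothetical names used for illustration, and the code assumes a C++11 compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for a singleton such as videoInput; not the library's code.
class CaptureManager
{
public:
    // Returns the single shared instance, created on first use.
    static CaptureManager& getInstance()
    {
        static CaptureManager instance;
        return instance;
    }

    // Copying is disabled so that no second instance can be created.
    CaptureManager(const CaptureManager&) = delete;
    CaptureManager& operator=(const CaptureManager&) = delete;

    void setVerbose(bool state) { verbose_ = state; }
    bool isVerbose() const { return verbose_; }

private:
    // Private constructor: only getInstance() can construct the object.
    CaptureManager() : verbose_(true) {}

    bool verbose_;
};

// Usage sketch: both calls yield the same object, so state set through one
// reference is visible through the other.
inline void demoSharedState()
{
    CaptureManager& a = CaptureManager::getInstance();
    CaptureManager& b = CaptureManager::getInstance();
    a.setVerbose(false);
    assert(&a == &b);
    assert(!b.isVerbose());
}
```

Because every caller reaches the same object, resources such as open devices have exactly one owner, which is presumably what makes resource management easier in the library.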

Use only the videoInput.h file as the interface to the library. Its listing is shown below.

#pragma once

#include <guiddef.h>

struct IMFMediaSource;
 
// Structure for collecting info about types of video,
// which are supported by current video device
struct MediaType
{
    unsigned int MF_MT_FRAME_SIZE;
    unsigned int height;
    unsigned int width;
    unsigned int MF_MT_YUV_MATRIX;
    unsigned int MF_MT_VIDEO_LIGHTING;
    unsigned int MF_MT_DEFAULT_STRIDE;
    unsigned int MF_MT_VIDEO_CHROMA_SITING;
    GUID MF_MT_AM_FORMAT_TYPE;
    wchar_t *pMF_MT_AM_FORMAT_TYPEName;
    unsigned int MF_MT_FIXED_SIZE_SAMPLES;
    unsigned int MF_MT_VIDEO_NOMINAL_RANGE;
    unsigned int MF_MT_FRAME_RATE;
 
    unsigned int MF_MT_FRAME_RATE_low;
    unsigned int MF_MT_PIXEL_ASPECT_RATIO;
 
    unsigned int MF_MT_PIXEL_ASPECT_RATIO_low;
    unsigned int MF_MT_ALL_SAMPLES_INDEPENDENT;
    unsigned int MF_MT_FRAME_RATE_RANGE_MIN;
    unsigned int MF_MT_FRAME_RATE_RANGE_MIN_low;
    unsigned int MF_MT_SAMPLE_SIZE;
    unsigned int MF_MT_VIDEO_PRIMARIES;
    unsigned int MF_MT_INTERLACE_MODE;
    unsigned int MF_MT_FRAME_RATE_RANGE_MAX;
    unsigned int MF_MT_FRAME_RATE_RANGE_MAX_low;
 
    GUID MF_MT_MAJOR_TYPE;
    wchar_t *pMF_MT_MAJOR_TYPEName;
    GUID MF_MT_SUBTYPE;
    wchar_t *pMF_MT_SUBTYPEName;    
 
    MediaType();
    ~MediaType();
    void Clear();
};
 
// Structure for collecting info about one parameter of current video device
struct Parametr
{
    long CurrentValue;
    long Min;
    long Max;
    long Step;
    long Default; 
    long Flag;
    Parametr();
};
 
// Structure for collecting info about 17 parameters of current video device
struct CamParametrs
{
    Parametr Brightness;
    Parametr Contrast;
    Parametr Hue;
    Parametr Saturation;
    Parametr Sharpness;
    Parametr Gamma;
    Parametr ColorEnable;
    Parametr WhiteBalance;
    Parametr BacklightCompensation;
    Parametr Gain;
 
 
    Parametr Pan;
    Parametr Tilt;
    Parametr Roll;
    Parametr Zoom;
    Parametr Exposure;
    Parametr Iris;
    Parametr Focus;
};
 
/// The only public class for controlling video devices; implemented as a singleton
class videoInput
{
public:
    virtual ~videoInput(void);
 
    // Getting of static instance of videoInput class
    static videoInput& getInstance(); 
 
    // Closing video device with deviceID
    void closeDevice(unsigned int deviceID);
    // Setting callback function for emergency events
    // (for example: removing video device with deviceID) with userData
    void setEmergencyStopEvent(unsigned int deviceID, void *userData, void(*func)(int, void *));
 
    // Closing all devices
    void closeAllDevices();
 
    // Getting of parametrs of video device with deviceID
    CamParametrs getParametrs(unsigned int deviceID);
 
    // Setting of parametrs of video device with deviceID
    void setParametrs(unsigned int deviceID, CamParametrs parametrs);
 
    // Getting the number of available video devices and listing them in the console
    unsigned int listDevices(bool silent = false);
        
    // Getting numbers of formats, which are supported by videodevice with deviceID
    unsigned int getCountFormats(unsigned int deviceID);
 
    // Getting width of image, which is getting from videodevice with deviceID
    unsigned int getWidth(unsigned int deviceID);
 
    // Getting height of image, which is getting from videodevice with deviceID
    unsigned int getHeight(unsigned int deviceID);
 
    // Getting name of videodevice with deviceID
    wchar_t *getNameVideoDevice(unsigned int deviceID);
    // Getting interface MediaSource for Media Foundation from videodevice with deviceID
    IMFMediaSource *getMediaSource(unsigned int deviceID);
    // Getting the format with index id that is supported by the video device with deviceID
    MediaType getFormat(unsigned int deviceID, unsigned int id);
 
    // Checking of existence of the suitable video devices
    bool isDevicesAcceable();
 
    // Checking of using the videodevice with deviceID
    bool isDeviceSetup(unsigned int deviceID);
 
    // Checking of using MediaSource from videodevice with deviceID
    bool isDeviceMediaSource(unsigned int deviceID);
    // Checking of using Raw Data of pixels from videodevice with deviceID
    bool isDeviceRawDataSource(unsigned int deviceID);
 
    // Enabling or disabling the printing of info to the console
    void setVerbose(bool state);
    // Initialization of video device with deviceID by media type with id
    bool setupDevice(unsigned int deviceID, unsigned int id = 0);
 
    // Initialization of video device with deviceID by width w, height h and fps idealFramerate
    bool setupDevice(unsigned int deviceID, unsigned int w, 
                     unsigned int h, unsigned int idealFramerate = 30);
 
    // Checking whether a new frame has been received from the video device with deviceID
    bool isFrameNew(unsigned int deviceID);
 
    // Writing of Raw Data pixels from video device with deviceID with correction
    // of RedAndBlue flipping flipRedAndBlue and vertical flipping flipImage
    bool getPixels(unsigned int deviceID, unsigned char * pixels, 
                   bool flipRedAndBlue = false, bool flipImage = false);
    
private: 
 
    bool accessToDevices;
    videoInput(void);
 
    void processPixels(unsigned char * src, unsigned char * dst, unsigned int width, 
         unsigned int height, unsigned int bpp, bool bRGB, bool bFlip);
    void updateListOfDevices();
};
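
The flipRedAndBlue and flipImage options of getPixels in the header above suggest a per-pixel channel swap and a reversal of the row order. The body of the library's private processPixels method is not shown in this tip, so the following is only a sketch of how such a copy could work. copyPixels is a hypothetical helper, and it assumes a tightly packed image with 3 bytes per pixel and no row padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the library's processPixels(): copies a tightly
// packed 3-bytes-per-pixel image from src to dst, optionally swapping the
// first and third channel of every pixel and/or reversing the row order.
void copyPixels(const unsigned char* src, unsigned char* dst,
                unsigned int width, unsigned int height,
                bool swapRedAndBlue, bool flipVertically)
{
    assert(src != nullptr && dst != nullptr && src != dst);

    const std::size_t bpp = 3;                                   // bytes per pixel (assumed)
    const std::size_t stride = static_cast<std::size_t>(width) * bpp;

    for (unsigned int y = 0; y < height; ++y)
    {
        // When flipping, output row y is read from the mirrored source row.
        const unsigned int srcRow = flipVertically ? (height - 1 - y) : y;
        const unsigned char* in = src + srcRow * stride;
        unsigned char* out = dst + y * stride;

        for (unsigned int x = 0; x < width; ++x)
        {
            const unsigned char* p = in + x * bpp;
            unsigned char* q = out + x * bpp;
            q[0] = swapRedAndBlue ? p[2] : p[0];
            q[1] = p[1];
            q[2] = swapRedAndBlue ? p[0] : p[2];
        }
    }
}
```

A channel swap of this kind is what converts between BGR and RGB byte order, and the vertical flip converts between bottom-up and top-down row order.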

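This class has two modes: raw data capture mode and MediaSource mode. If only the first mode is needed, there is no need to include the Media Foundation headers and libraries. In that case the method IMFMediaSource *getMediaSource(unsigned int deviceID) returns NULL, and the IMFMediaSource interface is already forward-declared in videoInput.h. In the second mode, you can call that method and use the result in your application as an ordinary media source fed by the web camera. The listing below shows how to use videoInput to get the raw data of frames. The example uses the OpenCV framework to display live video (code: TestVideoInputVS2012x86.zip, TestVideoInputVS2012x86-noexe.zip). OpenCV has its own functions for capturing from a web camera, but they are based on DirectShow and have the problem described above on Win8-Desktop.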

// TestvideoInput.cpp: defines the entry point for the console application.
//

#include "stdafx.h"
#include "videoInput.h"
#include "highgui.h"

#pragma comment(lib, "lib\\opencv\\Release\\opencv_highgui242.lib")
#pragma comment(lib, "lib\\opencv\\Release\\opencv_core242.lib")
 
#pragma comment(lib, "videoInput.lib")
 
void StopEvent(int deviceID, void *userData)
{
    videoInput *VI = &videoInput::getInstance();
 
    VI->closeDevice(deviceID);
}
 
int _tmain(int argc, _TCHAR* argv[])
{
    videoInput *VI = &videoInput::getInstance();
 
    int i = VI->listDevices();
 
    if(i > 0)
    {
        if(VI->setupDevice(i-1, 640, 480, 60))
        {
            VI->setEmergencyStopEvent(i - 1, NULL, StopEvent);
 
            if(VI->isFrameNew(i-1))
            {
                int countLeftFrames = 0;
 
                cvNamedWindow("VideoTest", CV_WINDOW_AUTOSIZE);
                CvSize size = cvSize(VI->getWidth(i-1), VI->getHeight(i-1));
 
                IplImage* frame;
 
                frame = cvCreateImage(size, 8,3);
 
                while(1)
                {
                    if(VI->isFrameNew(i-1))
                    {
                        VI->getPixels(i - 1, (unsigned char *)frame->imageData);                        
 
                        cvShowImage("VideoTest", frame);
 
                        countLeftFrames = 0;
                    }
                    else
                        countLeftFrames++;
 
                    char c = cvWaitKey(33);
 
                    if(c == 27) // Esc: leave the capture loop
                        break;
                    
                    if(c == 49) // '1': set the brightness to 128
                    {
                        CamParametrs CP = VI->getParametrs(i-1);                        
                        CP.Brightness.CurrentValue = 128; 
                        CP.Brightness.Flag = 1; 
                        VI->setParametrs(i - 1, CP);
                    }
 
                    if(!VI->isDeviceSetup(i - 1))
                    {
                        break;
                    }
 
                    if(countLeftFrames > 60)
                        break;
                }
 
                VI->closeDevice(i - 1);
                
                cvReleaseImage(&frame); // free the frame buffer allocated by cvCreateImage
                cvDestroyWindow("VideoTest");
            }
        }
    }
 
    if(i > 0 && VI->setupDevice(i-1, 1920, 1080, 60)) // skip when no device was found
    {
        if(VI->isFrameNew(i-1))
        {
            int countLeftFrames = 0;
 
            cvNamedWindow("VideoTest1", CV_WINDOW_AUTOSIZE);
            CvSize size = cvSize(VI->getWidth(i-1), VI->getHeight(i-1));
 
            IplImage* frame;
 
            frame = cvCreateImage(size, 8,3);
 
            while(1)
            {
                if(VI->isFrameNew(i-1))
                {
                    VI->getPixels(i - 1, (unsigned char *)frame->imageData,false); 
                    cvShowImage("VideoTest1", frame); 
                    countLeftFrames = 0;
                }
                else
                    countLeftFrames++;
                    
                char c = cvWaitKey(33);
 
                if(c == 27) 
                    break;
                    
                if(!VI->isDeviceSetup(i - 1))
                {
                    break;
                }
 
                if(countLeftFrames > 60)
                    break;
            }
 
            VI->closeDevice(i - 1);
                
            cvReleaseImage(&frame); // free the frame buffer allocated by cvCreateImage
            cvDestroyWindow("VideoTest1");
        }
 
    }
    return 0;
}

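In this code, the videoInput instance is obtained by calling videoInput::getInstance(), which returns a reference; the sample stores its address in the pointer VI. Before using a camera, call VI->listDevices() to get the list of available devices; it returns their count. A device is initialized by calling VI->setupDevice(i-1, 640, 480, 60). There are two overloads of setupDevice: one takes the desired resolution and frames per second, the other takes the index of the desired output type. The first looks for an existing MediaType with the requested parameters and otherwise uses the default type, index 0. Grabbing images from the MediaSource starts with the first call to VI->isFrameNew(i-1). After that call, the raw data can be retrieved with VI->getPixels(i - 1, (unsigned char *)frame->imageData, false). The camera parameters are read with VI->getParametrs(i-1), and new values are applied with VI->setParametrs(i - 1, CP). VI->closeDevice(i - 1) stops the grabbing thread and releases the context of the video device. The sample shows the same video device being used, stopped, and quickly reused. The global function StopEvent(int deviceID, void *userData) is passed as the callback to VI->setEmergencyStopEvent(i - 1, NULL, StopEvent). It is called on an unexpected stop, for example when the web camera is unplugged from the USB port.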
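
The lookup performed by the resolution-based setupDevice overload, find a matching MediaType or fall back to type 0, can be sketched as follows. This is not the library's code: SimpleFormat is a hypothetical, trimmed-down stand-in for MediaType, chooseFormat is a hypothetical function, and the exact-match rule on width, height and frame rate is an assumption, since the tip does not say how closely the library matches the request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down stand-in for the MediaType structure.
struct SimpleFormat
{
    unsigned int width;
    unsigned int height;
    unsigned int frameRate;
};

// Returns the index of the first format that matches the request exactly
// (assumed matching rule), or 0, the default type, when nothing matches.
std::size_t chooseFormat(const std::vector<SimpleFormat>& formats,
                         unsigned int w, unsigned int h, unsigned int fps)
{
    assert(!formats.empty());

    for (std::size_t id = 0; id < formats.size(); ++id)
    {
        const SimpleFormat& f = formats[id];
        if (f.width == w && f.height == h && f.frameRate == fps)
            return id;
    }
    return 0; // fall back to the default type
}
```

With the real library, a comparable list could be filled by looping from 0 to getCountFormats(deviceID) and reading each entry with getFormat(deviceID, id); the chosen index could then be passed to the setupDevice(deviceID, id) overload.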

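The second example is based on the SimpleCapture sample from the Windows SDK (code: SimpleCaptureVS2012.zip, application: SimpleCapture-exe.zip). It is too large to list here, but I can describe how it differs from the original sample. First, I removed all of the original references to the web camera and used the videoInput library instead.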

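Second, I added a second dialog for choosing a suitable resolution from the list of supported media types. Note that the IMFMediaSource interface must not be stopped manually. It is released by calling closeDevice(unsigned int deviceID).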

Points of Interest

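I spent a lot of time searching Microsoft's developer site for the relevant information, and I got no help from the experts there. I am also not the only one looking for a solution to this problem. I was surprised to find that using a web camera with Media Foundation had not been covered here, and I hope this tip makes a useful contribution to the site.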