Scraping Text from the Screen

Paul Heil

June 17, 2010

CPOL

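How to programmatically read text displayed by any process, anywhere on the screen

Introduction

In this article, we examine a method for reading text that another process displays on the screen (also known as screen scraping). The method presented here can be useful for programmatically determining the state of another program from the user's point of view. Although the accompanying sample code is written for Windows Mobile 5 and later, the concepts should transfer easily to desktop Windows.

Locating the Text to Read

The first thing we must do is identify the text to read. Microsoft introduced a useful concept in its Spy++ tool called the "finder", which extends conveniently to our application.

The user drags the finder's target icon and drops it on the object containing the text to be read.

In our application, we implement the finder control by handling the WM_COMMAND and WM_LBUTTONUP messages. We call SetCapture() so that our application receives the WM_LBUTTONUP message even when the cursor has moved outside the bounds of our dialog. Note that although the screenshot accompanying this article shows the target cursor over an item, this does not actually happen on Windows Mobile. The image was altered to make its meaning clearer.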

BEGIN_MSG_MAP( CMainDlg )
    MESSAGE_HANDLER( WM_INITDIALOG, OnInitDialog )
    // ...
    COMMAND_ID_HANDLER( IDC_FINDER, OnFinder )
    MESSAGE_HANDLER( WM_LBUTTONUP, OnLButtonUp )
END_MSG_MAP()

LRESULT OnInitDialog( UINT /*uMsg*/, 
                      WPARAM /*wParam*/, 
                      LPARAM /*lParam*/, 
                      BOOL& bHandled )
{
    // ...
    
    // pre-load the images used by the finder control
    finder_image_.LoadBitmap( MAKEINTRESOURCE( IDB_FINDER ) );
    finder_empty_image_.LoadBitmap( MAKEINTRESOURCE( IDB_FINDER_EMPTY ) );

    return ( bHandled = FALSE );
}

LRESULT OnFinder( WORD /*wNotifyCode*/, 
                  WORD /*wID*/, 
                  HWND /*hWndCtl*/, 
                  BOOL& /*bHandled*/ )
{
    // capture the cursor so we can detect WM_LBUTTONUP messages even when the
    // cursor is not within our window boundary.
    SetCapture();
    finder_.SetBitmap( finder_empty_image_ );
    return 0;
}

LRESULT OnLButtonUp( UINT /*uMsg*/, 
                     WPARAM /*wParam*/, 
                     LPARAM lParam, 
                     BOOL& /*bHandled*/ )
{
    if( m_hWnd == GetCapture() )
    {
        ReleaseCapture();
        finder_.SetBitmap( finder_image_ );

        // get the screen coordinates of our cursor
        // get the text on the screen at the given point
        // display the text to the user
    }
    return 0;
}

/// finder control
CStatic finder_;

/// image of the finder in its native state
CBitmap finder_image_;

/// image of the finder in its empty state
CBitmap finder_empty_image_;

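Locating the Text's Window

WM_LBUTTONUP supplies the client coordinates at which the user released the stylus or left mouse button. We convert them to screen coordinates and use the WindowFromPoint() function to determine which window lies at that location.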

LRESULT OnLButtonUp( UINT /*uMsg*/, 
                     WPARAM /*wParam*/, 
                     LPARAM lParam, 
                     BOOL& /*bHandled*/ )
{
    // ... 
    
    // get the screen coordinates of our cursor
    POINT screen_point = { GET_X_LPARAM( lParam ), 
                           GET_Y_LPARAM( lParam ) };
    ClientToScreen( &screen_point );
    
    // locate the window at the given point
    HWND target = ::WindowFromPoint( screen_point );
                                
    // ...
}

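Unfortunately, WindowFromPoint() has a limitation. From its MSDN page:

The WindowFromPoint function does not retrieve a handle to a hidden or disabled window, even if the point is within the window. An application should use the ChildWindowFromPoint function for a nonrestrictive search.

To use ChildWindowFromPoint(), as the documentation suggests, we must supply a parent window and coordinates relative to that parent's client area. Our code must therefore change to: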

{
    // ... 
    
    // get the screen coordinates of our cursor
    POINT screen_point = { GET_X_LPARAM( lParam ), 
                           GET_Y_LPARAM( lParam ) };
    ClientToScreen( &screen_point );

    // Locate the parent of the window at the given coordinates
    HWND parent = ::GetParent( ::WindowFromPoint( screen_point ) );

    // the cursor position in the child-most window's client coordinates
    POINT client_point;

    // perform a non-restrictive search to find the child-most window
    HWND target = GetChildMost( parent, screen_point, &client_point ); 
    
    // ...
}

/// Get the child-most window of a given parent control at a specific point
HWND GetChildMost( HWND parent_window, 
                   const POINT& screen_point, 
                   POINT* parent_point )
{
    // reset our coordinate system to the current window
    *parent_point = screen_point;
    ::ScreenToClient( parent_window, parent_point );

    // Find this window's child (if any)
    HWND child = ::ChildWindowFromPoint( parent_window, *parent_point );
    if( NULL == child || child == parent_window )
        return parent_window;

    // get the next child-most window in the stack
    return GetChildMost( child, screen_point, parent_point );
}

Now we find the correct window regardless of its state, even when it is hidden or disabled.
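
The descent performed by GetChildMost() can be tried without a device. The following is a minimal, portable sketch under a simplified window model: MockWindow, Point, ChildFromPoint(), ChildMost(), and DemoFindsLabel() are hypothetical names invented for this illustration and are not part of the Win32 API or the sample code. Each mock window stores its rectangle in its parent's client coordinates, and the sketch shifts the point into each child's coordinates as it descends, where the real function converts from screen coordinates at every level.

```cpp
#include <cstddef>
#include <vector>

struct Point { int x; int y; };

// Hypothetical stand-in for a window. The rectangle is expressed in the
// client coordinates of the parent window.
struct MockWindow
{
    int left, top, width, height;
    std::vector< const MockWindow* > children;
};

// Mock of ChildWindowFromPoint(): returns the child containing pt (given in
// the parent's client coordinates), or the parent itself if no child does.
const MockWindow* ChildFromPoint( const MockWindow* parent, const Point& pt )
{
    for( std::size_t i = 0; i < parent->children.size(); ++i )
    {
        const MockWindow* c = parent->children[ i ];
        if( pt.x >= c->left && pt.x < c->left + c->width &&
            pt.y >= c->top  && pt.y < c->top + c->height )
            return c;
    }
    return parent;
}

// Same recursion as GetChildMost(): keep descending until a window has no
// child at the point. client_pt receives the point relative to that window.
const MockWindow* ChildMost( const MockWindow* parent, Point pt, Point* client_pt )
{
    *client_pt = pt;
    const MockWindow* child = ChildFromPoint( parent, pt );
    if( child == parent )
        return parent;

    // re-express the point in the child's client coordinates
    Point in_child = { pt.x - child->left, pt.y - child->top };
    return ChildMost( child, in_child, client_pt );
}

// A dialog holding a panel at (10,10), which holds a label at (5,5).
bool DemoFindsLabel()
{
    MockWindow label  = { 5, 5, 50, 20, std::vector< const MockWindow* >() };
    MockWindow panel  = { 10, 10, 100, 100, std::vector< const MockWindow* >() };
    MockWindow dialog = { 0, 0, 240, 320, std::vector< const MockWindow* >() };
    panel.children.push_back( &label );
    dialog.children.push_back( &panel );

    Point hit = { 20, 18 };  // relative to the dialog's client area
    Point client_pt;
    const MockWindow* found = ChildMost( &dialog, hit, &client_pt );

    // dialog (20,18) is panel (10,8), which is label (5,3)
    return found == &label && client_pt.x == 5 && client_pt.y == 3;
}
```

The point handed back through client_pt is what the hit tests on more complex controls rely on later in the article.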

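Reading Text from a Static Control

Having found the control that contains the text we want, we now examine several ways of extracting that text and discuss the limitations of each. We start with the simplest control to read, the static control or label. More complex controls are covered later in the article.

The Naive Approach

The simplest and most obvious way to get a window's text is to use GetWindowText() and GetWindowTextLength(). We could implement it like this: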

// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
    DWORD text_length = ::GetWindowTextLength( target );
    if( text_length > 0 )
    {
        // buffer to hold the text from WM_GETTEXT
        std::vector< wchar_t > window_text_buffer( text_length + 1 );
        
        // text returned by WM_GETTEXT
        wchar_t* window_text = &window_text_buffer.front();
        
        if( ::GetWindowText( target, window_text, text_length + 1 ) )
        {
            // We've successfully received the text from the other process.
        }
    }
}

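Unfortunately, this approach has a number of limitations. As Raymond Chen points out in The Old New Thing, "The secret life of GetWindowText", GetWindowText() is intended primarily for retrieving window captions. If you call it from another process to get the text of a control that manages its own text, it will not work. For that, we need WM_GETTEXT and WM_GETTEXTLENGTH.

The Naive Approach II

Switching from GetWindowText() to WM_GETTEXT takes very little work. We simply replace the GetWindowText() calls with a couple of SendMessage() calls: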

// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
    DWORD text_length = ::SendMessage( target, WM_GETTEXTLENGTH, 0, 0 );
    if( text_length > 0 )
    {
        // buffer to hold the text from WM_GETTEXT
        std::vector< wchar_t > window_text_buffer( text_length + 1 );
        
        // text returned by WM_GETTEXT
        wchar_t* window_text = &window_text_buffer.front();
        
        if( ::SendMessage( target, 
                           WM_GETTEXT, 
                           text_length + 1, 
                           reinterpret_cast< LPARAM >( window_text ) ) )
        {
            // We've successfully received the text from the other process.
        }
    }
}

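Alas, it doesn't work. If we run this code, we find that WM_GETTEXTLENGTH returns the length of the text string as expected. But although WM_GETTEXT succeeds, it returns an empty string.¹ Why? Consider what we are doing: we send a message to another process and ask it to fill a buffer that lives in our process. That is strictly off limits. To make this work, we need memory that can be shared between processes. Memory-mapped files to the rescue!

A Memory-Mapped File Standard Allocator

Memory-mapped files give us access to a portion of the virtual address space that can be used to share a file or memory between processes. In our case, we are unlikely to share enough data to warrant a file, so we will use "RAM-based mapping", in which the data resides entirely in RAM and is never paged out.

The memory-mapped file API has three functions relevant to our program:

CreateFileMapping - creates a memory-mapped file in the shared virtual address space
MapViewOfFile - provides a pointer to the file
UnmapViewOfFile - releases and invalidates the pointer to the file

It would be a real shame to give up the elegance of std::vector<> and instead wrap every call to SendMessage() in memory-mapped file code. Fortunately, the standard library has a seldom-used feature for exactly this situation. The standard library containers that manage their own storage take at least two template parameters. The first (and by far the most commonly used) defines what the container stores. The second defines how the container allocates space for those objects. We will define an allocator that std::vector<> can use to allocate its space in a memory-mapped file.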

/// Standard library allocator implementation using memory mapped files
template< class T >
class MappedFileAllocator
{
public:
    typedef T         value_type;
    typedef size_t    size_type;
    typedef ptrdiff_t difference_type;
    typedef T*        pointer;
    typedef const T*  const_pointer;
    typedef T&        reference;
    typedef const T&  const_reference;

    pointer address( reference r ) const { return &r; };
    const_pointer address( const_reference r ) const { return &r; };

    void construct( pointer p, const_reference val ) { new( p ) T( val ); };
    void destroy( pointer p ) { p; p->~T(); };

    /// convert a MappedFileAllocator< T > to a MappedFileAllocator< U >
    template< class U >
    struct rebind { typedef MappedFileAllocator< U > other; };

    MappedFileAllocator() throw() : mapped_file_( INVALID_HANDLE_VALUE )
    {
    };

    template< class U >
    explicit MappedFileAllocator( const MappedFileAllocator< U >& other ) throw()
        : mapped_file_( INVALID_HANDLE_VALUE )
    {
        ::DuplicateHandle( GetCurrentProcess(), 
                           other.mapped_file_,
                           GetCurrentProcess(),
                           &this->mapped_file_,
                           0,
                           FALSE,
                           DUPLICATE_SAME_ACCESS );
    };

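    pointer allocate( size_type n, const void* /*hint*/ = 0 )
    {
        // n counts elements, so the mapping must hold n * sizeof( T ) bytes
        const DWORD size = static_cast< DWORD >( n * sizeof( T ) );

        mapped_file_ = ::CreateFileMapping( INVALID_HANDLE_VALUE, 
            NULL,
            PAGE_READWRITE,
            0,
            size,
            NULL );

        return reinterpret_cast< T* >( ::MapViewOfFile( mapped_file_, 
            FILE_MAP_READ | FILE_MAP_WRITE, 
            0, 
            0, 
            size ) );
    };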

    void deallocate( pointer p, size_type n )
    {
        if( NULL != p )
        {
            ::FlushViewOfFile( p, n * sizeof( T ) );
            ::UnmapViewOfFile( p );
        }
        if( INVALID_HANDLE_VALUE != mapped_file_ )
        {
            ::CloseHandle( mapped_file_ );
            mapped_file_ = INVALID_HANDLE_VALUE;
        }
    };

    size_type max_size() const throw() 
    { 
        return std::numeric_limits< size_type >::max() / sizeof( T );
    };

private:

    /// disallow assignment
    void operator=( const MappedFileAllocator& );

    /// handle to the memory-mapped file
    HANDLE mapped_file_;
}; // class MappedFileAllocator

We can now define a memory buffer that has all the benefits of std::vector<> and can be shared between processes.

/// a sequential byte-buffer backed by a memory-mapped file.
typedef std::vector< byte, MappedFileAllocator< byte > > MappedBuffer;
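
To see how a container hands its storage requests to the second template parameter, here is a minimal, portable sketch assuming a C++11 or later compiler. CountingAllocator, BytesRequested(), ByteBuffer, and BytesForText() are hypothetical names made up for this illustration. The allocator uses std::malloc and std::free where MappedFileAllocator calls the memory-mapped file API, and it only records how many bytes the vector asked for.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Running total of bytes requested through any CountingAllocator.
inline std::size_t& BytesRequested()
{
    static std::size_t total = 0;
    return total;
}

// Hypothetical stand-in for MappedFileAllocator. It is not shared between
// processes; it only shows where the container's allocations end up.
template< class T >
struct CountingAllocator
{
    typedef T value_type;

    CountingAllocator() {}

    template< class U >
    CountingAllocator( const CountingAllocator< U >& ) {}

    T* allocate( std::size_t n )
    {
        // n is a count of elements, so the byte size is n * sizeof( T )
        BytesRequested() += n * sizeof( T );
        void* p = std::malloc( n * sizeof( T ) );
        if( NULL == p )
            throw std::bad_alloc();
        return static_cast< T* >( p );
    }

    void deallocate( T* p, std::size_t /*n*/ ) { std::free( p ); }
};

// All instances are interchangeable because they hold no state.
template< class T, class U >
bool operator==( const CountingAllocator< T >&, const CountingAllocator< U >& )
{
    return true;
}

template< class T, class U >
bool operator!=( const CountingAllocator< T >&, const CountingAllocator< U >& )
{
    return false;
}

// Byte buffer analogous to MappedBuffer, backed by the counting allocator.
typedef std::vector< unsigned char, CountingAllocator< unsigned char > > ByteBuffer;

// Size a buffer for text_length characters plus the terminating null and
// report how many bytes the vector requested from the allocator.
std::size_t BytesForText( std::size_t text_length )
{
    const std::size_t before = BytesRequested();
    ByteBuffer buffer( ( text_length + 1 ) * sizeof( wchar_t ) );
    return BytesRequested() - before;
}
```

Because the buffer stores bytes, the caller has to multiply the character count by sizeof( wchar_t ) itself, which is why the later listings size MappedBuffer that way.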

The Naive Approach III

Let's revisit our last approach using this memory-mapped buffer and see how it works.

// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
    DWORD text_length = ::SendMessage( target, WM_GETTEXTLENGTH, 0, 0 );
    if( text_length > 0 )
    {
        // buffer to hold the text from WM_GETTEXT
        MappedBuffer window_text_buffer( ( text_length + 1 ) * sizeof( wchar_t ) );
        
        // text returned by WM_GETTEXT
        wchar_t* window_text = reinterpret_cast< wchar_t* >( &window_text_buffer.front() );
        
        if( ::SendMessage( target, 
                           WM_GETTEXT, 
                           text_length + 1,
                           reinterpret_cast< LPARAM >( window_text ) ) )
        {
            // We've successfully received the text from the other process.
        }
    }
}

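Our code has barely changed, and the result is exactly what we want, with one exception. SendMessage() waits for the target process to respond before it returns. What if the target process is frozen? As the code stands, our application could hang while waiting for the other process to recover. Fortunately, Microsoft anticipated this situation and provides SendMessageTimeout().

The Final Approach

Putting everything together, we arrive at an algorithm that can safely retrieve the text of any static control, button, check box, combo box, or edit control.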

// define some arbitrary, but reasonable timeout value    
DWORD timeout = 1000;

// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
    // length of the text in the window
    DWORD text_length = 0;
    if( ( ::SendMessageTimeout( target,
                                WM_GETTEXTLENGTH,
                                0,
                                0,
                                SMTO_NORMAL,
                                timeout,
                                &text_length ) ) &&
        ( text_length > 0 ) )
    {
        // memory-mapped buffer to hold the text from WM_GETTEXT
        MappedBuffer window_text_buffer( ( text_length + 1 ) * sizeof( wchar_t ) );

        // text returned by WM_GETTEXT
        wchar_t* window_text = 
            reinterpret_cast< wchar_t* >( &window_text_buffer.front() );

        // amount of text copied by WM_GETTEXT
        DWORD copied = 0;
        if( ( ::SendMessageTimeout( target, 
                                    WM_GETTEXT, 
                                    text_length + 1, 
                                    reinterpret_cast< LPARAM >( window_text ), 
                                    SMTO_NORMAL, 
                                    timeout, 
                                    &copied ) > 0 ) &&
            ( copied > 0 ) )
        {
            // We've successfully received the text from the other process.
        }
    }
}

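Reading Text from a List View Control

Being able to get the text of static controls, buttons, check boxes, combo boxes, and edit controls is great, but there are many other controls. Let's look at a more complex one, the list view, for which WM_GETTEXT does not work. List views are used by applications such as File Explorer and Task Manager. Retrieving their text takes three steps:

Verify that the list view contains items - LVM_GETITEMCOUNT
Locate the item under the cursor - LVM_SUBITEMHITTEST
Get the text of that item - LVM_GETITEM

Because it makes no sense to look for text in an empty list view, we first check whether the view contains any items.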

bool CheckValiditiy( HWND target, DWORD timeout = INFINITE )
{
    DWORD item_count = 0;
    if( ::SendMessageTimeout( target, 
                              LVM_GETITEMCOUNT, 
                              0, 
                              0, 
                              SMTO_NORMAL, 
                              timeout, 
                              &item_count ) > 0 )
    {
        return item_count > 0;
    }
    return false;
};

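You may have wondered why our GetChildMost() function returns the cursor position in the child window's client coordinates. After all, we did not need it to get the text of a static control. More complex controls such as the list view, however, contain multiple text elements. We use the client coordinates in a "hit test" to determine which text element we are looking at.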

typedef struct {
    int item;
    int subitem;
} item_type;

bool LocateItem( HWND target, 
                 const POINT& pt, 
                 item_type* item, 
                 DWORD timeout = INFINITE )
{
    MappedBuffer hti_buffer( sizeof( LVHITTESTINFO ) );
    LVHITTESTINFO* hti = 
        reinterpret_cast< LVHITTESTINFO* >( &hti_buffer.front() );
    hti->pt = pt;

    int res = 0;
    if( ::SendMessageTimeout( target,
                              LVM_SUBITEMHITTEST,
                              0, 
                              reinterpret_cast< LPARAM >( hti ),
                              SMTO_NORMAL,
                              timeout,
                              reinterpret_cast< DWORD* >( &res ) ) > 0 &&
        res > -1 )
    {
        item->item = hti->iItem;
        item->subitem = hti->iSubItem;
        return true;
    }    
    return false;
};
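
For readers unfamiliar with hit testing, the idea can be modelled in a few lines of portable code. This is only a rough sketch of what a report-style list does conceptually, assuming a fixed row height and a list of column widths; it is not how the list view control is implemented. Cell, HitTest(), and DemoHit() are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Result of a hit test: the row (item) and column (subitem) under the point.
struct Cell { int item; int subitem; };

// Hypothetical model of a report-style list with fixed-height rows.
// x and y are client coordinates. Returns false when no cell is hit.
bool HitTest( int x, int y,
              int row_height, int row_count,
              const std::vector< int >& column_widths,
              Cell* cell )
{
    if( x < 0 || y < 0 || row_height <= 0 )
        return false;

    // the row follows directly from the vertical position
    const int row = y / row_height;
    if( row >= row_count )
        return false;

    // walk the columns until the point falls left of a column's right edge
    int right = 0;
    for( std::size_t col = 0; col < column_widths.size(); ++col )
    {
        right += column_widths[ col ];
        if( x < right )
        {
            cell->item = row;
            cell->subitem = static_cast< int >( col );
            return true;
        }
    }
    return false;
}

// Three rows of height 20 with columns that are 100 and 60 pixels wide.
bool DemoHit( int x, int y, Cell* cell )
{
    std::vector< int > widths;
    widths.push_back( 100 );
    widths.push_back( 60 );
    return HitTest( x, y, 20, 3, widths, cell );
}
```

In the real control, the same question is answered by LVM_SUBITEMHITTEST, which fills in iItem and iSubItem of the LVHITTESTINFO structure.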

Now that we know which item and subitem our coordinates refer to, we send the list view an LVM_GETITEM message to get the text of that item.

bool GetText( HWND target, 
              const item_type& item, 
              DWORD length, 
              std::wstring* text, 
              DWORD timeout = INFINITE )
{
    MappedBuffer lvi_buffer( 
        sizeof( LV_ITEM ) + sizeof( wchar_t ) * length );
    LV_ITEM* lvi = 
        reinterpret_cast< LV_ITEM* >( &lvi_buffer.front() );
    lvi->mask = LVIF_TEXT;
    lvi->iItem = item.item;
    lvi->iSubItem = item.subitem;
    lvi->cchTextMax = length;
    lvi->pszText = reinterpret_cast< wchar_t* >( 
        &lvi_buffer.front() + sizeof( LV_ITEM ) );

    BOOL success = FALSE;
    if( ::SendMessageTimeout( target, 
                              LVM_GETITEM, 
                              0, 
                              reinterpret_cast< LPARAM >( lvi ), 
                              SMTO_NORMAL, 
                              timeout, 
                              reinterpret_cast< DWORD* >( &success ) ) > 0 &&
        success )
    {
        *text = lvi->pszText;
        return true;
    }
    return false;
};

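Reading Text from a Tab Control

As with the list view control, scraping the text of a tab control takes three steps:

Verify that the tab control contains items - TCM_GETITEMCOUNT
Locate the tab under the cursor - TCM_HITTEST
Get the text of that tab - TCM_GETITEM

As before, we first check whether the control contains any tabs.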

bool CheckValiditiy( HWND target, DWORD timeout = INFINITE )
{
    DWORD item_count = 0;
    if( ::SendMessageTimeout( target, 
                              TCM_GETITEMCOUNT, 
                              0, 
                              0, 
                              SMTO_NORMAL, 
                              timeout, 
                              &item_count ) > 0 )
    {
        return item_count > 0;
    }
    return false;
};

Next, we determine which tab our pointer is over.

BOOL LocateItem( HWND target, 
                 const POINT& pt, 
                 item_type* item, 
                 DWORD timeout = INFINITE )
{
    MappedBuffer tch_buffer( sizeof( TCHITTESTINFO ) );
    TCHITTESTINFO* tch = 
        reinterpret_cast< TCHITTESTINFO* >( &tch_buffer.front() );
    tch->pt = pt;

    // for a tab control, item_type is a plain int holding the tab index
    item_type it = -1;
    if( ::SendMessageTimeout( target,
                              TCM_HITTEST,
                              0, 
                              reinterpret_cast< LPARAM >( tch ),
                              SMTO_NORMAL,
                              timeout,
                              reinterpret_cast< DWORD* >( &it ) ) > 0 )
    {
        if( it > -1 )
        {
            *item = it;
            return true;
        }
    }
    return false;
};

Finally, we query the tab control for the text of that tab.

bool GetText( HWND target, 
              const item_type& item, 
              DWORD length, 
              std::wstring* text, 
              DWORD timeout = INFINITE )
{
    MappedBuffer tc_buffer( sizeof( TCITEM ) + sizeof( wchar_t ) * length );
    TCITEM* tc = reinterpret_cast< TCITEM* >( &tc_buffer.front() );
    tc->cchTextMax = length;
    tc->mask = TCIF_TEXT;
    tc->pszText = reinterpret_cast< wchar_t* >( 
        &tc_buffer.front() + sizeof( TCITEM ) );

    BOOL success = FALSE;
    if( ::SendMessageTimeout( target, 
                              TCM_GETITEM, 
                              item, 
                              reinterpret_cast< LPARAM >( tc ), 
                              SMTO_NORMAL, 
                              timeout, 
                              reinterpret_cast< DWORD* >( &success ) ) > 0 )
    {
        if( success )
        {
            *text = tc->pszText;
            return true;
        }
    }
    return false;
}

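Putting It All Together

By now, a pattern has clearly emerged. We can get the on-screen text of any control type by following a fairly generic procedure:

Check the validity of the control.
Locate the text item within the control.
Get the length of the text.
Get the text.

We can generalize each of these steps into a "traits" structure: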

/// traits for reading the text of a tab control
struct TabTraits
{
    /// type of item contained within this control
    typedef int item_type;

    /// name of the window class these traits are relevant to
    static const wchar_t* ClassName() { return WC_TABCONTROL; };

    /// Does the target window contain text to read?
    static bool CheckValiditiy( HWND target, DWORD timeout = INFINITE );

    /// locate the text item within the control at the given point
    static BOOL LocateItem( HWND target, 
                            const POINT& pt, 
                            item_type* item, 
                            DWORD timeout = INFINITE );

    /// get the length of the text string to retrieve
    static DWORD GetTextLength( HWND target, 
                                const item_type& item, 
                                DWORD timeout = INFINITE );

    /// retrieve the text
    static bool GetText( HWND target, 
                         const item_type& item, 
                         DWORD length, 
                         std::wstring* text, 
                         DWORD timeout = INFINITE );
}; // struct TabTraits

We supply the "traits" structure as a template parameter to a generic algorithm that performs each step.

/// read the text at a specific point on a target control window.
template< class T >
bool DoReadScreenText( HWND target, 
                       const POINT& client_point, 
                       std::wstring* screen_text, 
                       DWORD timeout = INFINITE )
{
    if( T::CheckValiditiy( target, timeout ) )
    {
        typename T::item_type item;
        if( T::LocateItem( target, client_point, &item, timeout ) )
        {
            DWORD length = T::GetTextLength( target, item, timeout );
            if( length > 0 )
            {
                return T::GetText( target, item, length, screen_text, timeout );
            }
        }
    }
    return false;
}
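
The traits pattern can be exercised end to end without any Win32 calls. The sketch below is a portable illustration under stated assumptions: MockControl, MockTraits, ReadText(), and DemoRead() are hypothetical names, the mock control is a simple row of equally wide labelled items, and timeouts are omitted because nothing crosses a process boundary. It follows the same four steps as DoReadScreenText().

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Point { int x; int y; };

// Hypothetical in-memory control: a row of equally wide labelled items.
struct MockControl
{
    int item_width;
    std::vector< std::wstring > labels;
};

// Traits offering the same four steps as TabTraits, written for the mock.
struct MockTraits
{
    typedef std::size_t item_type;

    static bool CheckValidity( const MockControl& c )
    {
        return !c.labels.empty();
    }

    static bool LocateItem( const MockControl& c, const Point& pt, item_type* item )
    {
        if( pt.x < 0 || c.item_width <= 0 )
            return false;
        const std::size_t index = static_cast< std::size_t >( pt.x / c.item_width );
        if( index >= c.labels.size() )
            return false;
        *item = index;
        return true;
    }

    static std::size_t GetTextLength( const MockControl& c, const item_type& item )
    {
        return c.labels[ item ].size();
    }

    static bool GetText( const MockControl& c, const item_type& item,
                         std::size_t length, std::wstring* text )
    {
        *text = c.labels[ item ].substr( 0, length );
        return true;
    }
};

// Generic algorithm: validate, locate, measure, read. Any failing step
// stops the process and leaves the output untouched.
template< class T, class Control >
bool ReadText( const Control& target, const Point& client_point, std::wstring* text )
{
    if( !T::CheckValidity( target ) )
        return false;

    typename T::item_type item;
    if( !T::LocateItem( target, client_point, &item ) )
        return false;

    const std::size_t length = T::GetTextLength( target, item );
    if( length == 0 )
        return false;

    return T::GetText( target, item, length, text );
}

// Read the label found at horizontal position x in a two-item control.
std::wstring DemoRead( int x )
{
    MockControl tabs;
    tabs.item_width = 40;
    tabs.labels.push_back( L"General" );
    tabs.labels.push_back( L"Advanced" );

    Point pt = { x, 5 };
    std::wstring text;
    ReadText< MockTraits >( tabs, pt, &text );
    return text;
}
```

Supporting another control type then only requires a new traits structure; the generic algorithm stays unchanged.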

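With GetClassName(), we can determine the type of control we are reading. This lets us write a control structure that dispatches to the right traits and reads text from any supported screen control.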

bool ReadScreenText( HWND target, 
                     const POINT& client_point, 
                     std::wstring* screen_text, 
                     DWORD timeout )
{
    // get the window class for the target window
    wchar_t class_name[ 257 ] = { 0 };
    ::GetClassName( target, class_name, _countof( class_name ) );

    // different window classes require different methods of getting their 
    // screen text.
    if( wcsstr( class_name, TabTraits::ClassName() ) )
    {
        return DoReadScreenText< TabTraits >( target, 
                                              client_point, 
                                              screen_text, 
                                              timeout );
    }
    else if( wcsstr( class_name, ... ) )
    {
        // ...
    }
    else if ...
}

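The accompanying code includes methods for reading from static, tab, list view, and list box controls. Reading other control types, such as header controls, menus, tree views, Today screen plug-ins, or other custom controls, is left as an exercise for the interested reader.

Footnotes

¹ This isn't strictly true. WM_GETTEXT does not always return an empty string. Three window messages are handled specially: WM_GETTEXT, WM_SETTEXT, and WM_COPYDATA. The result of sending these messages with a process-local memory buffer appears to depend on the version of Windows and on how the control handles the message. To work in the general case, we supply a memory-mapped file. Where it isn't needed, it does no harm.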