数据库虚拟游标

Hatem Mostafa

4.96/5 (58投票s)

2008年5月1日

CPOL

24分钟阅读

75565

本文演示了一种利用虚拟游标优化DBMS输出游标存储的新技术。

引言

当我开发一个关系型数据库引擎时，我在Microsoft SQL Server中测试一些查询，发现一些简单的查询需要很长时间才能执行，或者收到“磁盘空间不足”的消息。经过一番调查，我发现问题出在查询结果的保存上，或者更准确地说，是保存记录集游标值所需的时间。所以，我试图在我的设计中克服这个时间问题。本文将简单概述记录集游标、SQL语句执行，然后查看一个简单查询并解释减慢大多数DBMS性能的问题。接着，我将介绍虚拟游标的概念，并将其应用于更复杂的查询。

SQL Server中SQL语句“Select * from orders,employees,customers where orders.employeeid=4”需要55秒才能执行。我知道SQL Server在字段orders.employeeid中搜索值4所需的时间不会超过0毫秒，但问题在于获取结果并将其与来自另外两个表（employees, customers）的结果结合起来并保存输出游标。这意味着，如果输出结果包含大量或巨量的行，SQL Server需要很长时间才能保存整个结果，这促使我思考如何在我公司的引擎中克服这个问题。

我知道这不是一个正常的查询，三个表的笛卡尔积返回664,830行，但是如果你看一下我执行相同查询的时间，你会发现其中的差异，这将促使你阅读整篇文章以了解虚拟游标背后的思想。下图显示了我的引擎执行相同查询的时间。它不到1毫秒。

实际上，SQL Server执行查询本身不需要时间；总时间取决于最终的结果游标向量；如果它太大，则需要很长时间，或者返回“磁盘空间不足”的错误。如果它很小，则不需要时间来存储。作为其速度的证明，如果我们使用查询“Select count(*) from ...”，它不需要时间。所以，问题出在游标值的保存上。例如，对于以下查询：

Select * from Orders, Employees, Customers, Products 
  WHERE Orders.EmployeeID IN (4, 5, 6) 
  AND Employees.EmployeeID IN (1, 2, 3) 
  AND Customers.CustomerID LIKE 'A%'

SQL Server需要3秒才能返回244,860行（我的引擎返回相同的行需要16毫秒）。如果你尝试删除条件“Customers.CustomerID LIKE 'A%'”，你将有足够的时间去制作一个三明治，然后才能得到结果；这大约需要35秒！！！而我的引擎只需1毫秒即可获取5,448,135行。无论如何，让我们首先概述记录集游标及其内部结构，然后讨论游标扩展作为虚拟游标的引入。请花时间阅读每个部分，因为每个部分都是下一个部分的钥匙。

记录集游标

要与任何数据库管理系统交互，您应该使用Recordset对象执行SQL语句。Recordset对象执行输入查询并将结果存储在特定数据结构中，以使调用者能够枚举执行结果。枚举通常通过以下函数完成：

`int GetRecordCount()`	返回行数
`bool IsBOF()`	检查游标是否在行首
`bool IsEOF()`	检查游标是否在行尾
`void MoveNext()`	移动到下一行
`void MovePrev()`	移动到上一行
`void MoveFirst()`	移动到第一行
`void MoveLast()`	移动到最后一行
`Move(int nRows)`	从当前行移动`nRows`行
`MoveTo(int nRow)`	移动到`nRow`行

下图表示Cursor组件在DCMS组件中的位置

让我们举一个简单的例子，尝试找到一个合适的结构来表示Cursor。我假设我有四个表：T1、T2、T3和T4。F1是T1中的外键，也是T2中的主键。

对于查询“Select * from T1,T2”，我们得到一个交叉连接的结果，如下表所示：

T1.F1	T2.F1
B	A
C	A
A	A
A	A
C	A
B	B
C	B
A	B
A	B
C	B
B	C
C	C
A	C
A	C
C	C

因此，我们可以将结果想象成行的视图，而游标只是一个值向量，范围从0到行数减1。游标函数只是递增和递减：一个在最小值和最大值之间的变量。但是，如果我们有一个限制结果的条件，例如，如果我们将输入查询修改为：“Select * from T1,T2 where T2.F1=T1.F1”，会发生什么情况？在这种情况下，结果如下：

T1.F1	T2.F1
A	A
A	A
B	B
C	C
C	C

因此，结果是两个表（两个表的乘积）的交叉连接的一个子集。如下图所示：

所以，游标向量只包含项 (2, 3, 5, 11, 14)，行数就是向量的长度 (5 行)。一个简单的算法，以其最简单的形式，将游标值映射到字段记录值，如下：

void CRecordSet::GetFieldValue(CField* pField, VARIANT& var)
{
    // get field record from cursor
    int nFieldRecord = GetFieldRecord(pField);
    // retrieve field value from field
    pField->GetRecord(nFieldRecord, var);
}

int CRecordSet::GetFieldRecord(CField* pField, bool bUseCursor)
{
    // map cursor position a the global cross product of tables (look figure 1)
    int nCursor = bUseCursor ? m_vCursor[m_nCursor] : m_nCursor;
    // iterate  through recordset tables
    int nTableCount = m_vTables.size(), nTableSize, nPrefixSize = 1;
    for (int n = 0; n < nTableCount; n++)
    {
        nTableSize = m_vTables[n]->GetRecordCount();
        if(m_vTables[n] == pField->m_pTable)
            return (nCursor / nPrefixSize) % nTableSize;
        nPrefixSize *= nTableSize;
    }
    return -1;
}

GetFieldRecord 是我原始代码的简化版本。它通过游标向量 m_vCursor 映射记录集内部游标 (m_nCursor)，该向量包含图1所示的最终结果。要理解该函数，我们首先应该理解任意数量表的笛卡尔积，因为这是该函数背后的秘密。两个表的笛卡尔积只是将第二个表的每一行与第一个表的所有行重复。这在图1的左侧表格中很清楚。所以，如果执行

Select * from T3, T4

我们得到以下行：

T3.F2	T4.F3
1	X
2	X
3	X
4	X
1	Y
2	Y
3	Y
4	Y
1	Z
2	Z
3	Z
4	Z

T4的每个项目都与T3的整个行重复，这是这里的关键。在算法中，你可以找到一个变量nPrefixSize，它被初始化为“1”。因此，如果我们遍历记录集表（m_vTables），同时将nPrefixSize与每个表的大小相乘，对于第一个表，nPrefixSize等于1。每行重复一次。然后，nPrefixSize乘以第一个表的大小。因此，对于第二个表，nPrefixSize等于第一个表的大小。它的每一行重复的次数是第一个表的大小。这种迭代一直持续到我们到达字段所属的表。

if(m_vTables[n] == pField->m_pTable)
    return (nCursor / nPrefixSize) % nTableSize;

将nCursor除以nPrefixSize以消除重复，然后取结果除以nTableSize的余数。此算法可以应用于任意数量的表。因此，我们可以通过一个简单而简短的for循环将游标位置映射到字段行值。

记录集执行

在深入探讨我记录集游标构造的目标之前，让我们先了解一下通用的SELECT语句结构。

`SELECT .....`	指定要显示的字段或聚合/数学函数	必要
`DISTINCT`	指定所有返回的行必须是唯一的	可选
`FROM table0, table1, ..., tablen`	指定用于初始化记录集视图的表	必要
`WHERE ......`	指定过滤记录集视图的条件	可选
`GROUP BY field0, field1, ..., fieldn`	根据field0, field1, ..., fieldn的值对数据进行分组	可选
`HAVING ......`	指定在`GROUP BY`之后过滤记录集视图的条件	可选
`ORDER BY field0, field1, ..., fieldn`	根据field0, field1, ..., fieldn中的值对数据进行排序	可选

Select部分可以包含字段名或字段上的数学运算，或聚合函数（count, sum, avg, min, max, stdev）。

以下活动图显示了SELECT语句的执行步骤。我这里只讨论ExecuteWhere部分。

ExecuteWhere 使用 "WHERE ......." 条件来过滤 "FROM tablename(s)" 部分的笛卡尔积。该函数在子条件之间执行布尔运算；每个条件都是对指定表的一个或多个字段的过滤器。例如，在以下条件中：

(orders.orderdate>'5/10/1996' and employees.employeeID=3) and shipname like 'Toms*'

ParseCondition 操作会构建一个像这样的执行栈：

和

orders.orderdate > '5/10/1996'

employees.employeeID = 3

shipname like 'Toms*'

我将描述执行上述查询的三种方法。第一种是为了阐明直接过滤的思想，第二种是虚拟游标的关键（所以，花时间去理解它），但在某些情况下它会占用大量内存，第三种是虚拟游标解决方案，它兼具第二种解决方案的强大功能和更少的内存使用。

直接视图过滤

第一步是假设记录集通过两个表（Orders，Employees）的交叉连接完全展开。因此，记录总数等于Orders的大小乘以Employees的大小。第二步是遍历这个记录计数并为需要过滤的字段调用函数GetFieldRecord(pField, false)，并且只将符合条件的那些值（例如：orders.orderdate>'5/10/1996'）添加到游标向量中。因此，在循环之后，我们有一个可以与其他游标进行布尔运算以获得记录集的最终游标。

因此，该函数调用一个函数来搜索栈顶，该函数会调用其两个操作数并要求它们进行搜索，然后获取它们的两个游标并根据其运算符（AND、OR、XOR）执行布尔运算。我想你明白这是一个递归操作，这里不需要浪费时间和空间来讨论它。

优点

这可用于“Having”执行（聚合函数输出无索引）
它可用于过滤数学表达式的字段，例如：sin(t1.f1)/cos(t2.f2/t3.f3)^2

缺点

大量比较，因为我们比较的是扩展后的记录集视图（交叉连接）
比较是针对字段值进行的，这非常慢（应使用字段索引）
在某些情况下，每个条件游标都可能占用大量内存

游标扩展

首先，我们需要理解游标扩展的概念。因此，我需要举一个非常简单的例子，然后通过一个复杂的例子来扩大规模。如果我们在NorthWind数据库中有一个名为“Orders”的表，并且想要执行一个像这样的查询：

Select * from Orders

我们得到830行（游标向量等于表的长度）。但是，如果执行一个像这样的查询：

Select * from orders where freight<0.2

然后，我们得到5行，游标向量为 (48, 261, 396, 724, 787)。这些值直接映射到完整的表行，我们可以按照“记录集游标”一节中所述获取表行号。但是，如果查询中有两三个表，并且我们有一个像 where freight<0.2 这样的过滤器，会是什么情况？我们应该使用第一种方式：“直接视图过滤”，来过滤三个表的整个视图吗？想象一下三个表：Orders、Employees和Customers。表的交叉连接（完整视图）将有664830行。我们并没有在内存中展开这个视图，我们只是有一个包含从0到664829的序列的游标向量，然后使用“记录集游标”的算法，我们可以到达任何字段的正确记录号。但是，在过滤器 (where freight<0.2) 的情况下，我们如何在不进行“直接视图过滤”的情况下构建游标向量？这就是问题所在！！！

我们如何将查询“Select * from orders where freight<0.2”的向量 (48, 261, 396, 724, 787) 转换为查询“Select * from orders,employees,customers where freight<0.2”的向量 (48, 261, 396, 724, 787, 878, 1091, 1226, ......, 664724, 664787)，其中包含4005行？我想你现在已经知道游标扩展的含义了，所以我们可以将其定义为：“游标扩展是将一个（或多个）表的游标（通过表索引搜索得到的结果）扩展到该（这些）表与多个表的笛卡尔积上的游标的过程”。

这种扩展取决于许多因素：

表在交叉连接中的位置（过滤条件所在的表）

对于三个表，订单表有三个位置：0、1和2。这取决于它在输入查询中的顺序。所以，我们必须有一个考虑到所有情况的算法。

参与过滤的表数量

例如，如果我们有一个条件，如“Employees.employeeid=Orders.employeeid”（关系条件），我们有两个表。扩展取决于这两个表的位置以及它们之间的距离，如果它们是连续的或者被其他表隔开。

游标扩展示例

假设我们有我们的表格

并且我们有查询“SELECT T1.F1,T2.F1,T3.F2 FROM T1,T2,T3 WHERE T1.F1='C'”。

首先，使用T1.F1的索引，我们得到记录(1,4)
其次，我们需要将此结果与 T2、T3 进行交叉连接以获得结果

最终结果是T1、T2和T3三个表的笛卡尔积的子集，游标值为（1、4、6、9、11、14、16、19、21、24、26、29、31、34、36、39、41、44、46、49、51、54、56、59）。

并且，如果我们在查询中改变这三个表的顺序，像这样：“SELECT T1.F1,T2.F1,T3.F2 FROM T2, T1, T3 WHERE T1.F1='C'”，它将是(3, 4, 5, 12, 13, 14, 18, 19, 20, 27, 28, 29, 33, 34, 35, 42, 43, 44, 48, 49, 50, 57, 58, 59)，对于“SELECT T1.F1,T2.F1,T3.F2 FROM T2, T1, T3 WHERE T1.F1='C'”的顺序，它将是(12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59)。

所以，表的顺序会影响输出游标值（我们得到相同的数据，只是顺序不同；要得到特定的顺序，我们可以使用Order By）。因此，我们需要一个算法来根据共享表的顺序将条件向量扩展到最终游标。有人可能会说，“与其那样，我们可以为每个字段构建一个包含其记录的向量，并用它们通过游标位置检索值”。但是，这将占用大量空间来保存所有这些选定字段的向量，特别是当字段数量很大时（超过约20个字段）。除了浪费空间，我们还需要一个非常复杂的算法来在条件之间进行布尔运算（T1.F1='C' AND T3.F2=2）。但是，在我们只保存游标值（相对于共享表的交叉连接）的情况下，对于每个条件，我们可以使用像STL库的set_intersection和merge这样的简单算法，在条件的游标值之间简单地进行布尔运算。

在整个情况下，视图（输出）大小等于条件游标大小乘以其他表的大小。

view size = condition size * other tables' sizes

回到记录集游标部分以及图2、图3。记录集对象包含一个以树结构保存的条件栈，如图3所示。每个节点代表一个WhereItem类型的对象。查询的执行从栈顶的WhereItem开始，它接收记录集游标的引用。记录集游标是一个简单的值向量和一组函数，像这样：

// simplified code for recordset cursor
class CCursor : public vector<__int64>
{
public:
    ... constructors

public:
    vector<CTable*> m_vTables;
    // tables that share in the cursor (ordered with recordset tables)

public:
    // Boolean operators
    CCursor& operator=(const CCursor& input);
    CCursor& operator|=(const CCursor& input);    
    CCursor& operator&=(const CCursor& input);
    CCursor& operator^=(const CCursor& input);
    CCursor operator|(const CCursor& input);    
    CCursor operator&(const CCursor& input);
    CCursor operator^(const CCursor& input);

    // calculate cursor records count
    __int64 GetRecordCount(bool bCursor = true);
    // retrieve field record corresponding to cursor nRow
    int GetRecord(__int64 nRow, CIRField* pField, bool bOrdered = true);
    // Filter current cursor with a condition cursor
    bool Filter(CCursor& vCursor);
    // Expand vCursor (with its tables) to *this (with its tables)
    void Expand(CCursor& vCursor);
    
    ... order by, group by, and distinct functions
};

每个WhereItem可能是一个操作符（带有两个操作数的节点），或者是一个操作数（叶子 - 直接条件）。所以，如果栈顶是一个叶子，它将直接将条件应用于WhereItem表（在解析过程中分配 - 不在本文讨论范围）。如果它是一个操作符，它将调用其两个操作数的Search函数，并将其操作符应用于输出游标。所以，我们可以想象这样的代码：

// this a simplified code (original include more 
// conditions for time optimization and logging)
void WhereItem::Search(CCursor& cursor)
{
    if(m_cOperator) // node
    {
        // copy my cursor (take same tables)
        CCursor copy = m_vCursor;
        // search in the two operands
        m_pOperands[0]->Search(m_vCursor);
        m_pOperands[1]->Search(copy);
        switch(m_cOperator)
        {
        case 'a':    m_vCursor &= copy;  break; // and 
        case 'o':    m_vCursor |= copy;  break; // or
        case 'x':    m_vCursor ^= copy;  break; // xor
        }
    }
    else // leaf
    {
        if(m_vFields.size() == 1)
            // search using Field indexes
            m_vFields[0]->Search();
        else    if(!m_bRelation)  // relation cursor retrieved at parsing process
            m_vCursor.Filter(m_vFields[0], m_vFields[1], m_strQuery, m_op);
    }
    // filter input cursor (may contain more tables) with my filtered values
    cursor.Filter(m_vCursor);
}

栈顶接受包含所有FROM表的记录集游标，但叶子节点可能包含一个或多个表。并且，每个节点包含其操作数的表（按照记录集FROM表的顺序排列）。所以，想象一下输入查询及其节点工作表。

SELECT * FROM
Categories,Products,Employees,Customers,Orders
WHERE
Categories.CategoryID = Products.CategoryID AND
Customers.CustomerID = Orders.CustomerID AND
Employees.EmployeeID = Orders.EmployeeID

正如你从WhereItem::Search函数中可以看到的，它调用cursor.Filter(m_vCursor)来扩展父游标（从父节点传入函数的）。例如，图4中较低的AND调用其操作数时，传入的游标包含Employees、Customers和Orders表。操作数0有一个Customers、Orders表的游标。所以，它获取Customers、Orders表上的游标（关系游标），并尝试将其扩展到Employees、Customers、Orders表。操作数1有一个Employees、Orders表的游标。所以，它获取Employees、Orders表上的游标（关系游标），并尝试将其扩展到Employees、Customers、Orders表。之后，两个操作数的游标都用相同的表（以相同的顺序）扩展。所以，我们可以安全地应用布尔操作。

switch(m_cOperator)
{
    case 'a':   m_vCursor &= copy;  break; // and 
    case 'o':   m_vCursor |= copy;  break; // or
    case 'x':   m_vCursor ^= copy;  break; // xor
}

这个过程一直进行，直到我们将栈顶WhereItem的游标与recordset游标进行扩展，这可能包含更多的表，因为每个节点都从其子节点收集表。这里的关键是每个叶节点/节点都只扩展到其父节点的表，而不是记录集表。因此，它加速了整个执行。正如我之前提到的，任何条件的游标扩展都取决于条件表相对于其父表的顺序。如果没有条件，则表进行笛卡尔积，但有条件时，我们有条件游标与其他表的笛卡尔积。在一些示例之后，我发现我需要计算一些在扩展过程中共享的变量：

// size of the cursor after expansion (calculated before expansion)
__int64 m_nViewSize;
// product of tables' sizes before the first table condition
__int64 m_nPrefixSize;
// product of condition tables' sizes located
// before first middle table (see m_nMiddleSize)
__int64 m_nGroupSize;
// product of tables' sizes between condition tables
__int64 m_nMiddleSize;
// tables cross product records count
// (calculated once) without the condition
__int64 m_nRecordCount;

如下图所示：

m_nViewSize = T1.size * T3.size * T5.size * condition.size;
m_nPrefixSize = T1.size;
m_nGroupSize = T2.size;
m_nMiddleSize = T3.size;
m_nRecordCount = T1.size * T2.size * T3.size * T4.size * T5.size;

游标的函数Filter(CCursor& vCursor)接收条件的游标（过滤器）并将其扩展到其表（或者从另一个方面来说，通过条件游标过滤其表的笛卡尔积）。

bool CCursor::Filter(CCursor& vCursor)
{
    if(vCursor.m_vTables.size() == m_vTables.size())
    {   // if same tables then copy cursor (no need for expansion)
        *this = vCursor;
        return true;
    }
    // initialize values
    m_nViewSize = vCursor.size(), m_nMiddleSize = 1, m_nPrefixSize = 1;
    if(m_nViewSize == 0)
        return true;
    // seek through cursor tables to find search fields' tables 
    vector<int&gt vTableOrder;
    for (vector<CTable*>::iterator table = vCursor.m_vTables.begin(); 
               table != vCursor.m_vTables.end(); table++)
        vTableOrder.push_back(m_vTables.lfindIndex(*table));

    // loop through cursor tables to get relation size
    // before expansion and cursor size after expansion
    int nTableCount = m_vTables.size(), nTableSize;
    for (int nTable = 0; nTable < nTableCount; nTable++)
        if(vTableOrder.lfindIndex(nTable) == -1)
        {
            m_nViewSize *= 
              (nTableSize = m_vTables[nTable]->GetRecordCount());
            for(vector<int&gt::iterator i = 
                   vTableOrder.begin(); i < vTableOrder.end()-1; i++)
                if(nTable > *i && nTable < *(i+1))
                    m_nMiddleSize *= nTableSize;
        }
    // seek for search table to expand cursor
    m_nGroupSize = 1;
    bool bFound = false;
    for (int nTable = 0; nTable < nTableCount; nTable++)
    {
        nTableSize = m_vTables[nTable]->GetRecordCount();
        if(vTableOrder.lfind(nTable))
        {    // expand vCursor into (*this)
            m_nGroupSize *= nTableSize;
            bFound = true;
        }
        else
        {
            if(bFound)
                break;
            m_nPrefixSize *= nTableSize;
        }
    }
    m_nSegmentSize = vCursor.size()*m_nMiddleSize*m_nPrefixSize;
    Expand(vCursor);
    return true;
}

前面的函数初始化了游标扩展所需的变量。现在是介绍Expand函数的时候了。请记住，Expand函数的作用如定义所述：“游标扩展是将一个（或多个）表的游标（通过表索引搜索得到的结果）扩展到该（这些）表与许多表的笛卡尔积上的游标的过程”。

为了找到将条件过滤器扩展到笛卡尔积值的方法，我手动做了一个例子，并研究了结果以找到计算值的通用表达式。下表显示了对查询的尝试：

Select * from T1,T2,T3,T4,T5 where T3.F2=T5.F2

首先想象查询“Select * from T3,T5 where T3.F2=T5.F2”的结果；然后我们可以将其游标扩展到表T1、T2、T3、T4、T5。

对于扩展后的查询“Select * from T1,T2,T3,T4,T5 where T3.F2=T5.F2”

实际上，这并不是唯一的尝试。我尝试了许多样本，但每次尝试都需要一个非常大的Excel表格来演示扩展，这很难包含在文章中。最后，我找到了以下通用表达式：NC=m*G*P+(C+m)*P+p：其中NC是新的游标值，C是旧的游标值。我实现了以下Expand函数来将旧游标映射到新的共享表：

// not optimized version of the original code (for company privacy)
void CCursor::Expand(CCursor& vCursor)
{
    // resize destination cursor with the calculated view size
    resizeEx((int)m_nViewSize);
    // loop in source cursor items to expand items to recordset view
    __int64 nForeignMatch, nViewIncrement = 0, 
      nSegmentSize = vCursor.size()*m_nMiddleSize*m_nPrefixSize, nCursor;
    __int64 nMiddle, nMatch;
    vector<__int64>::iterator _first = vCursor.begin(), 
      _end = vCursor.end(), _dest = begin();
    // loop to fill all calculated view size
    while(m_nViewSize)
    {
        __int64 nMaxForeignMatch = m_nGroupSize;
        for(vector<__int64>::iterator i = _first; 
            i != _end; i += nForeignMatch, 
            nViewIncrement += m_nMiddleSize, 
            nMaxForeignMatch += m_nGroupSize)
        {
            // count cursor values that are under the size
            // of the foreign table
            // (it should be 1 for FroeignPrimary sequence)
            for(nForeignMatch = 0; i+nForeignMatch != _end && 
                 (*(i+nForeignMatch)) < nMaxForeignMatch; 
                 nForeignMatch++);
            // repeat cursor items with the size
            // of table(s) located between relation tables
            for(nMiddle = 0; nMiddle < m_nMiddleSize; nMiddle++)
            {
                nCursor = (nMiddle + nViewIncrement) * 
                           m_nGroupSize * m_nPrefixSize;
                for(nMatch = 0; nMatch < nForeignMatch; nMatch++)
                    for (__int64 nPrefix = 0, c = nCursor + 
                               (*(i+nMatch)%m_nGroupSize)*m_nPrefixSize; 
                               nPrefix < m_nPrefixSize; nPrefix++)
                        *(_dest++) = c+nPrefix;
            }
        }
        m_nViewSize -= nSegmentSize;
    }
}

该算法首先根据预期视图大小调整游标大小。要理解Segment Size的含义，我们应该首先理解如何估算输出视图大小。实际上，视图大小应该是条件大小与剩余表大小的乘积（参见图5）。并且，它可以通过另一种方法计算，可以表示为：

m_nViewSize = vCursor.size()*m_nMiddleSize*
   m_nPrefixSize*(product of tables after last condition table)
// in our case: no tables after last condition table T5, So
m_nViewSize = 5 * 3 * 15 = 225

nSegmentSize 简单地等于 vCursor.size()*m_nMiddleSize*m_nPrefixSize。无论如何，我希望您理解游标扩展的思想，现在不需要完全理解算法，因为我认为任何了解其思想的人都可以提取算法并以自己的方式实现它。

优点

简单快速的算法。
使用STL合并和交集算法，在扩展游标之间进行简单的布尔运算。

缺点

游标扩展的主要问题是内存占用过大，因为扩展后的游标可能会占用大量内存；即使我们使用硬盘来保存它，保存和检索以执行布尔运算也会花费很长时间。下一节将讨论这个缺点。

让我们检查一下查询，并从Windows任务管理器中获取一些测量数据。

Select * 
from orders,employees,customers,orderdetails,shippers,suppliers 
where
customers.customerid=orders.customerid and 
employees.employeeid=orders.employeeid and 
orders.orderid=orderdetails.orderid and 
orders.shipvia=shippers.shipperid

下表包含每个条件及其扩展前后的游标大小

不支持。	条件	扩展表	扩展前	扩展后	大小
1	customers.customerid = orders.customerid	orders, employees, customers, orderdetails, shippers	825 行	48,002,625 行	366 MB
2	employees.employeeid = orders.employeeid	orders, employees, orderdetails, shippers	830 行	5,365,950 行	41 MB
3	orders.orderid = orderdetails.orderid	orders, orderdetails, shippers	2155 行	6,465 行	50.50 KB
4	orders.shipvia = shippers.shipperid	orders, orderdetails, shippers	830 行	1,788,650 行	13.6 MB
5	condition3 and condition4	orders, orderdetails, shippers	2155 行	19,395 行	151.5 KB
6	condition2 and condition5	orders, employees, orderdetails, shippers	2155 行	191,795 行	1.46 MB
7	condition1 and condition6	orders, employees, customers, orderdetails, shippers	2155 行	62,176 行	485.7 KB

如果您注意到，条件customers.customerid=orders.customerid在游标扩展后占用了366 MB（！！！），而扩展前仅为6600字节（825行 * sizeof(__int64)）。下图显示了查询执行期间的Windows任务管理器。

为了克服这个问题，我考虑了虚拟游标，我将在下一节中演示。

SQL Server 再次声明：我不认为 SQL Server 的工作方式相同。实际上，它在查询执行期间或最终结果都不占用内存。但是，如果最终游标太大，则会花费很长时间，因为它会将其保存到硬盘，如果硬盘空间不足，或者需要更大的区域来保存，则可能会失败。无论如何，SQL Server 在保存最终游标（如果它很大）的方式上存在问题。因此，请阅读下一节，看看它是否能帮助解决这个问题。

虚拟游标

在上一节中，我们介绍了游标扩展的概念以及这种扩展如何有助于在查询条件的扩展游标之间进行布尔运算。但是，我们面临内存使用量大和内存不足的风险。因此，我将介绍一种新的游标扩展思路。

如果我们根本不扩展游标，让它们保持未扩展状态会怎样？你可能会问“但我们需要为共享表的笛卡尔积扩展它们”。你说得对，但是，如你所知，我们有Expand()函数中的游标扩展算法，所以我们可以保留相同的算法并按需扩展游标值。是的，这就是主意。我们可以创建一个新的函数GetCursor()，它接收所需的行号并使用相同的算法计算游标值（相对于笛卡尔积）。所以，我们只需要保留共享算法的变量以在该函数中使用。你可能会说“它会很慢”。你也说对了。但是，如果我们添加一些缓存并将缓存窗口扩展到最佳数量，我们可以加快这个过程。此外，我还添加了一些技巧来优化速度并更快地跳转到所需的游标值。

所以，现在的解决方案可以概括为：不扩展游标，只扩展一小部分（例如：10240）游标值，并根据游标类型向前或向后滑动这个窗口。因此，我们将扩展游标的大小替换为其原始大小加上缓存窗口的大小。

GetCursor 算法

GetCursor 算法与 Expand GetCursor 相同，只是它只扩展一小部分游标值。因此，我们可以采用相同的算法并进行一些修改。为了简化代码，我们逐步进行这些修改。初始代码如下：

// not optimized version of the original code (for company privacy)
__int64 CCursor::GetCursor(__int64 nRow)
{
    if(empty() || m_bSerial)
        return nRow;
    if(m_bVirtual == false)
        return (*this)[(int)nRow];
    if(m_nCacheRow != -1 && nRow >= m_nCacheRow && 
            nRow < m_nCacheRow+m_nCacheIndex)
        return m_vCache[nRow-m_nCacheRow];
    m_vCache.resizeEx(m_nCachePageSize);
    m_nCacheRow = nRow;
    m_nCacheIndex = 0;

    __int64 nSegment = nRow/m_nSegmentSize, nMiddle, 
                       nMatch, nPrefix, nForeignMatch;
    __int64 nViewIncrement = nSegment*m_nPageViewIncrement,
        nAdded = nSegment*m_nSegmentSize, 
        nViewSize = m_nViewSize-nAdded;
    while(nViewSize > 0)
    {
        __int64 nForeignMatchIndex = 0;
        // loop in source cursor items to expand items to this
        for(iterator i = begin(), _end = end(); 
                      i != _end; i += nForeignMatch,
            nViewIncrement += m_nMiddleSize)
            // repeat cursor items with the size
            // of table(s) located between relation
            // tables
            if(nForeignMatch = m_vForeignMatch[nForeignMatchIndex++])
                for(nMiddle = 0; nMiddle < m_nMiddleSize; nMiddle++)
                    for(nMatch = 0; nMatch < nForeignMatch; nMatch++)
                        for(nPrefix = 0; nPrefix < m_nPrefixSize; nPrefix++)
                            if(nAdded++ >= nRow)
                            {
                                m_vCache[m_nCacheIndex++] = 
                                  (nMiddle + nViewIncrement) *
                                   m_nGroupSize * m_nPrefixSize + 
                                  (*(i+nMatch)%m_nGroupSize)*m_nPrefixSize + 
                                  nPrefix;
                                if(m_nCacheIndex == m_nCachePageSize)
                                    return m_vCache[0];
                            }
        nViewSize -= m_nSegmentSize;
    }
    return m_vCache[0];
}

m_nCachePageSize 是缓存窗口大小，默认值为10240。
m_nCacheIndex 是实际的缓存窗口大小。
m_nCacheRow 是缓存窗口的起始行。
m_vCache 是缓存窗口向量。

为了加速函数的执行，我们可以在for循环中添加一些检查。查看嵌套循环：

for(nMiddle = 0; nMiddle < m_nMiddleSize; nMiddle++)
    for(nMatch = 0; nMatch < nForeignMatch; nMatch++)
        for(nPrefix = 0; nPrefix < m_nPrefixSize; nPrefix++)
            if(nAdded++ >= nRow)

它们可以是：

if(m_nCacheIndex == 0 && nRow-nAdded > m_nMiddleSize*nForeignMatch*m_nPrefixSize)
    nAdded += m_nMiddleSize*nForeignMatch*m_nPrefixSize;
else
    for(nMiddle = 0; nMiddle < m_nMiddleSize; nMiddle++)
        if(m_nCacheIndex == 0 && nRow-nAdded > nForeignMatch*m_nPrefixSize)
            nAdded += nForeignMatch*m_nPrefixSize;
        else
            for(nMatch = 0; nMatch < nForeignMatch; nMatch++)
                if(m_nCacheIndex == 0 && nRow-nAdded > m_nPrefixSize)
                    nAdded += m_nPrefixSize;
                else
                    for(nPrefix = 0; nPrefix < m_nPrefixSize; nPrefix++)
                        if(nAdded++ >= nRow)

每个if语句都是为了避免在不需要循环时进行循环。无论如何，这里尝试解释该函数是不正确的，因为它需要一块白板和一支记号笔来分解每个细节并为每个部分举例。我只关心其思想和它所面临的问题，而不是其细节。

虚拟游标的布尔运算：布尔运算在条件游标之间进行。如果游标未扩展会怎样？我们如何求交集或合并它们？我之前提到过，我使用的是STL库的set_intersection和merge简单算法。这些算法接收两个游标（向量），执行其工作并将结果输出到目标向量。但是，在虚拟游标的情况下，我们没有向量。我们只有一个向量的缓存窗口。让我们首先看看STL set_intersection算法，看看如何使用它来完成我们的任务。

STL set_intersection

template<class _InIt1,
class _InIt2,
class _OutIt> inline _OutIt set_intersection(_InIt1 _First1, _InIt1 _Last1,
    _InIt2 _First2, _InIt2 _Last2, _OutIt _Dest)
{   // AND sets (_First1, _Last1) and (_First2, _Last2)
    for (; _First1 != _Last1 && _First2 != _Last2; )
        if (*_First1 < *_First2)
            ++_First1;
        else if (*_First2 < *_First1)
            ++_First2;
        else
            *_Dest++ = *_First1++, ++_First2;
        return _Dest;
}

该算法线性遍历两个向量，如果两个值相等，则将其添加到目标。我将使用相同的想法，但不是从向量中获取值，而是调用GetCursor()。并且，由于缓存和前向游标（默认实现），它将非常快。这是CCursor类的&=运算符的一部分：

CCursor& CCursor::operator&=(const CCursor& input)
{
    if(input.empty())
        clear();
    else    if(m_bVirtual || input.m_bVirtual)
    {
        vector<__int64> dest;
        dest.set_grow_increment(10240);
        __int64 nCursor[2] = { GetCursor(0),
           ((CCursor*)&input)->GetCursor(0) };
        __int64 nSize0 = GetCursorCount(), 
          nSize1 = ((CCursor*)&input)->GetCursorCount();
        // STL set_intersection algorithm
        for (__int64 n0 = 0, n1 = 0; n0 < nSize0 && n1 < nSize1; )
            if (nCursor[0] < nCursor[1])
                nCursor[0] = GetCursor(++n0);
            else    if (nCursor[0] > nCursor[1])
                nCursor[1] = ((CCursor*)&input)->GetCursor(++n1);
            else
                dest.push_backEx(nCursor[0]), 
                    nCursor[0] = GetCursor(++n0),
                    nCursor[1] = ((CCursor*)&input)->GetCursor(++n1);
        assignEx(dest);
        m_bVirtual = false;
    }
    else
        // if the two input vectors are sorted use normal intersection
        resizeEx(set_intersection(begin(), end(), (iterator)input.begin(),
            (iterator)input.end(), begin())-begin());
    ...
    return *this;
}

同样，我们可以实现其他运算符（|=、^=、-=）。输出游标不是虚拟的，所以变量m_bVirtual设置为false。输出游标不是虚拟的，但可以由其组合器WhereItem获取，并相对于WhereItem的父级使其虚拟化。为了澄清前述声明，我们应该知道，如果游标的条件包含的表少于其父表的表，则该游标被设置为虚拟。所以，我们必须扩展它或使其虚拟化。有人可能会问，那Order By、Group By和Distinct呢？我们如何对虚拟游标进行排序？这可能吗？实际上，我曾多次尝试并思考这个问题，但我未能找到关于对虚拟游标进行排序的明确思路。但是，这里的优点是Order By是在记录集的最终游标上完成的，该游标通常很小，因为它是某些过滤条件的结果。因此，为了解决这个问题，如果查询包含Order By、Group By或Distinct，我将按照游标扩展一节中所述扩展最终游标。

bool CRecordset::ExecuteWhere()
{
    // construct record set initial view
    m_vCursor.m_vTables = m_vFromTables;
    // check if there is a where statement to be parsed
    if(m_strWhere.IsEmpty() == false)
    {
        // start search from stack top item
        m_vWhereStack[0]->Search(m_vCursor, false);
        if(m_vGroupByFields.empty() == false || m_bDistinct ||
            m_vOrderFields.empty() == false)
            m_vCursor.Expand();
    }
    else
        m_vCursor.m_bSerial = true;
    return true;
}

注意条件：

if(m_vGroupByFields.empty() == false || m_bDistinct ||
    m_vOrderFields.empty() == false)
    m_vCursor.Expand();

请看图2，了解此函数在执行Order By或Group By之前被调用。

将虚拟游标技术应用于查询后

Select * 
from orders,employees,customers,orderdetails,shippers,suppliers 
where
customers.customerid=orders.customerid and 
employees.employeeid=orders.employeeid and 
orders.orderid=orderdetails.orderid and 
orders.shipvia=shippers.shipperid

我们可以在下表中检查每个条件及其游标大小：

不支持。	条件	扩展表	光标	大小
1	customers.customerid = orders.customerid	orders, employees, customers, orderdetails, shippers	825 行	6600 字节
2	employees.employeeid = orders.employeeid	orders, employees, orderdetails, shippers	830 行	6640 字节
3	orders.orderid = orderdetails.orderid	orders, orderdetails, shippers	2155 行	17 KB
4	orders.shipvia = shippers.shipperid	orders, orderdetails, shippers	830 行	6640 字节
5	condition3 and condition4	orders, orderdetails, shippers	2155 行	17 KB
6	condition2 and condition5	orders, employees, orderdetails, shippers	2155 行	17 KB
7	condition1 and condition6	orders, employees, customers, orderdetails, shippers	2155 行	17 KB

请注意，最大游标大小仅为17 KB，而在扩展情况下为366 MB。

感谢...

感谢上帝，也感谢Harf公司给了我开发这个有趣项目的机会。

更新

2008年5月1日：首次发布。