MongoDB Tutorial - Day 4: Aggregation

VijayRana

21 May 2016

CPOL

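Aggregation in MongoDB

Introduction

Welcome to Day 4 of the MongoDB tutorial (trust me, it will take more than a day). In this article, we will look at an important MongoDB feature: aggregation. So far we have learned how to insert, update, and delete documents and how to create indexes, but a real-world application cannot do without aggregation.

MongoDB 2.2 introduced a major change called the aggregation framework. Before that, MongoDB relied on Map/Reduce for this kind of operation, so the aggregation framework is an alternative to Map/Reduce.

So what is aggregation? Put simply, an aggregation processes documents and returns a computed result.

Background

It helps to have some basic knowledge of MongoDB before going further, so I recommend reading the earlier articles in this series first.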

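Aggregate Function

To perform aggregation in MongoDB, we use the aggregate function. Its syntax is

db.collectionName.aggregate(pipeline, options)

collectionName: the name of the collection on which we want to run the aggregate function.

pipeline: the aggregation pipeline is the framework that performs the aggregation for us. When we use it, MongoDB passes the documents of the collection through a pipeline made of several stages. Each stage changes or transforms the documents, and at the end we get the computed result.

options: an optional document with additional settings for the operation.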
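The idea of documents flowing through stages can be pictured with a small plain-JavaScript sketch. It is only an illustration: runPipeline and the two stage functions are hypothetical helpers, not part of MongoDB, and the three sample documents are made up for the demonstration.

```javascript
// Illustration only: a "stage" is modelled as a function that takes an
// array of documents and returns a new array of documents.
const docs = [
  { name: "a", score: 70 },
  { name: "b", score: 90 },
  { name: "c", score: 10 },
];

// Hypothetical helper (not a MongoDB API): feed the output of each
// stage into the next one, in order.
function runPipeline(input, stages) {
  return stages.reduce((current, stage) => stage(current), input);
}

const stages = [
  (ds) => ds.filter((d) => d.score >= 50),   // plays the role of a filtering stage
  (ds) => ds.map((d) => ({ name: d.name })), // plays the role of a reshaping stage
];

const pipelineResult = runPipeline(docs, stages);
console.log(pipelineResult); // [ { name: 'a' }, { name: 'b' } ]
```

The order of the stages matters: swapping the two functions would drop the score field before the filter could read it.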

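Aggregation Pipeline Stages

The aggregation pipeline offers the following stages and operators (this is not a complete list; for the full list, see the official MongoDB documentation). Each one is mapped to its SQL counterpart so that readers with a little SQL Server knowledge get a clear picture.

In SQL	In MongoDB	Description
`Select`	`$project`	Passes documents to the next stage with their existing fields or with newly added fields. New fields can be added on the fly.
`Where`	`$match`	Filters the documents and passes only the matching ones to the next pipeline stage.
`Limit`	`$limit`	Passes the first x documents, unmodified, to the next pipeline stage, where x is the number of documents to let through.
`GroupBy`	`$group`	Groups the documents and passes the grouped results to the next pipeline stage.
`OrderBy`	`$sort`	Reorders the documents in ascending or descending order.
`Sum`	`$sum`	Calculates the sum of the numeric values.
`Join`	`$lookup`	Performs a left outer join with another collection in the same database.

Enough talk! It's time to understand some pipeline stages and operators through examples.

Now suppose we have a School database with a Student collection, as shown below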

db.Student.insert({StudentName : "Vijay",Section : "A",
                   Marks:70,Subject:["Hindi","English","Math"]})
db.Student.insert({StudentName : "Gaurav",Section : "A",Marks:90,Subject:["English"]})
db.Student.insert({StudentName : "Ajay",Section : "A",Marks:70,Subject:["Math"]})
db.Student.insert({StudentName : "Ankur",Section : "B",Marks:10,Subject:["Hindi"]})
db.Student.insert({StudentName : "Sunil",Section : "B",Marks:70,Subject:["Math"]})
db.Student.insert({StudentName : "Preeti",Section : "C",
                   Marks:80,Subject:["Hindi","English"]})
db.Student.insert({StudentName : "Anuj",Section : "C",Marks:50,Subject:["English"]})
db.Student.insert({StudentName : "Palka",Section : "D",Marks:40,Subject:["Math"]})
db.Student.insert({StudentName : "Soniya",Section : "D",
Marks:20,Subject:["English","Math"]})

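We will go through the different stages and see how each one works on this Student collection. So let's get ready for some good stuff.

$match

$match is similar to WHERE in SQL. In SQL we use WHERE to filter data, and the same applies here. When only a subset of the data should reach the next stage of the aggregation pipeline, we use $match: it filters the documents and passes the matching ones on.

Example 1

Suppose we want to filter the Student collection down to Section A. We use $match as follows

db.Student.aggregate(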
 [
  {
     "$match":
     {
        "Section":"A"
     }
  }
 ]
)

This filters the data according to our $match and passes only the 3 documents whose Section is A to the next pipeline stage.

Result
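What the filter does can be imitated on a plain JavaScript array. This is a sketch of the behavior rather than MongoDB's implementation; the students array holds a subset of the sample documents from the article, without the Subject field.

```javascript
// A subset of the article's sample documents, held in memory.
const students = [
  { StudentName: "Vijay", Section: "A", Marks: 70 },
  { StudentName: "Gaurav", Section: "A", Marks: 90 },
  { StudentName: "Ajay", Section: "A", Marks: 70 },
  { StudentName: "Ankur", Section: "B", Marks: 10 },
  { StudentName: "Sunil", Section: "B", Marks: 70 },
];

// Imitates { $match: { Section: "A" } }: keep only the matching documents
// and hand them on unchanged.
const sectionA = students.filter((s) => s.Section === "A");

console.log(sectionA.map((s) => s.StudentName)); // [ 'Vijay', 'Gaurav', 'Ajay' ]
```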

Example 2

Suppose we want to find all records whose Section is A and whose Marks are greater than 80.

db.Student.aggregate (
[
   {
       $match:
       {
           $and:[{Section:'A'},{Marks: {"$gt" :80}}]
       }
    }
 ]
)

This gives us one record.

Result
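A MongoDB filter document that lists several fields combines them with an implicit AND, so the explicit $and form above can also be written as { Section: 'A', Marks: { $gt: 80 } }. The sketch below checks that both forms select the same documents. The matches function is a hypothetical mini-matcher written for this illustration; it understands only equality, $gt, and $and, and is not how MongoDB evaluates queries.

```javascript
const students = [
  { StudentName: "Vijay", Section: "A", Marks: 70 },
  { StudentName: "Gaurav", Section: "A", Marks: 90 },
  { StudentName: "Ajay", Section: "A", Marks: 70 },
  { StudentName: "Ankur", Section: "B", Marks: 10 },
];

// Hypothetical mini-matcher: supports equality, $gt and $and only.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((sub) => matches(doc, sub));
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[key] > cond.$gt;
    }
    return doc[key] === cond;
  });
}

const explicitAnd = { $and: [{ Section: "A" }, { Marks: { $gt: 80 } }] };
const implicitAnd = { Section: "A", Marks: { $gt: 80 } };

const fromExplicit = students.filter((s) => matches(s, explicitAnd));
const fromImplicit = students.filter((s) => matches(s, implicitAnd));

console.log(fromExplicit.map((s) => s.StudentName)); // [ 'Gaurav' ]
console.log(fromImplicit.map((s) => s.StudentName)); // [ 'Gaurav' ]
```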

Note: a single aggregate call can contain more than one $match stage.

$project

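We can compare this stage with SELECT in SQL. With $project we can select certain fields and rename fields in a document. In short, $project reshapes each document by adding, removing, or renaming fields before it moves on to the next pipeline stage. To include a field, we use 1 or true; to exclude a particular field, we use 0 or false.

Example 1

In the query below, we want only StudentName, Section, and Marks from the Student collection

db.Student.aggregate(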
 [
  {
       "$project":{StudentName : 1,Section:1,Marks:1}
  }
 ]
)

Example 2

Now, if we want StudentName, Section, and Marks from the Student collection where Section is 'A', we use $match and $project together.

db.Student.aggregate(
 [
  {
     "$match":
     {
        "Section":"A"
     }
  },
  {
       "$project":
     {
         StudentName : 1,Section:1,Marks:1
     }
  }
 ]
)

Result

Note: _id is included by default. If we do not want the _id field in the result, we have to exclude it explicitly, as shown below

 "$project":{StudentName : 1,Section:1,Marks:1,_id:0}
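The inclusion rules, including the special handling of _id, can be sketched in plain JavaScript. The project function is a hypothetical helper that imitates only the simple inclusion form used above (no renaming, no computed fields), and the _id value 1 is made up for the example.

```javascript
const doc = {
  _id: 1, // made-up value; real documents get an ObjectId by default
  StudentName: "Vijay",
  Section: "A",
  Marks: 70,
  Subject: ["Hindi", "English", "Math"],
};

// Hypothetical helper imitating an inclusion-style $project.
function project(d, spec) {
  const out = {};
  // _id is kept unless the spec excludes it with 0 or false.
  if (spec._id !== 0 && spec._id !== false && "_id" in d) out._id = d._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag && field in d) out[field] = d[field];
  }
  return out;
}

const withId = project(doc, { StudentName: 1, Section: 1, Marks: 1 });
const withoutId = project(doc, { StudentName: 1, Section: 1, Marks: 1, _id: 0 });

console.log(withId);    // keeps _id, StudentName, Section, Marks
console.log(withoutId); // keeps StudentName, Section, Marks
```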

$unwind

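$unwind works on an array field inside a document. It outputs one new document for each element of the array, which is why we use $unwind to flatten data.

Example 1

Suppose we want to apply $unwind to the document for the student named Vijay. This document has an array field called Subject that holds three subjects: Hindi, English, and Math. Let's see what $unwind does with it

db.Student.aggregate(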
 [
  {
     "$match":
     {
        "StudentName":"Vijay"
     }
  },
  {
       "$unwind":"$Subject"
  }
 ]
)

Result
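The flattening can be imitated with Array.prototype.flatMap. This is a simplified sketch: the unwind helper is hypothetical and handles only fields that hold an array or are missing.

```javascript
const vijay = {
  StudentName: "Vijay",
  Section: "A",
  Marks: 70,
  Subject: ["Hindi", "English", "Math"],
};

// Hypothetical helper imitating { $unwind: "$<field>" } for array fields:
// every array element yields a copy of the document in which the array
// is replaced by that single element.
function unwind(docs, field) {
  return docs.flatMap((d) =>
    Array.isArray(d[field])
      ? d[field].map((item) => ({ ...d, [field]: item }))
      : [] // simplification: anything that is not an array yields nothing
  );
}

const unwound = unwind([vijay], "Subject");
console.log(unwound.map((d) => d.Subject)); // [ 'Hindi', 'English', 'Math' ]
```

One input document with three subjects becomes three output documents that differ only in Subject.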

Example 2

If we want only StudentName, Section, Marks, and Subject, we can add $project after $match and $unwind, as follows:

db.Student.aggregate(
 [
  {
     "$match":
       {
        "StudentName":"Vijay"
       }
  },
  {
       "$unwind":"$Subject"
  },
  {
       "$project":
        {
         StudentName : 1,Section:1,Marks:1,Subject:1
        }
  }
 ]
)

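MongoDB is schema-less, so some documents may lack the array field and others may hold an empty array. Does $unwind still work on such documents? Yes! If a document holds an empty array or has no array field at all, the pipeline ignores that input document and generates no output document for it.

Before MongoDB 3.2, $unwind raised an error when the field held a value that was not an array. From 3.2 onward, such a value is treated as a single-element array.

Example 3

Let me add two documents to our Student collection, as shown below. The first document has an empty array, and the second has no array field at all.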

db.Student.insert({StudentName : "Tarun",Section : "A",Marks:95,Subject:[]})
db.Student.insert({StudentName : "Saurabh",Section : "A",Marks:95})

Now let me run $unwind again on the documents whose StudentName is Tarun or Saurabh.

db.Student.aggregate(
 [
  {
     "$match":
     {
        "StudentName":{$in:["Saurabh","Tarun"]}
     }
  },
  {
       "$unwind":"$Subject"
  }
 ]
)

So the query above generates no output documents, because the array is either missing or empty.
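When such documents should be kept, the document form of $unwind accepts a preserveNullAndEmptyArrays option (available since MongoDB 3.2). The sketch below imitates both behaviors in memory. The unwind helper is hypothetical, and its handling of the preserved case (emit the document once, without the array field) is a simplified reading of the documented behavior that covers only empty and missing arrays.

```javascript
const students = [
  { StudentName: "Tarun", Section: "A", Marks: 95, Subject: [] },
  { StudentName: "Saurabh", Section: "A", Marks: 95 },
];

// Pipeline stage as it would be written for MongoDB (plain object here).
const preservingStage = {
  $unwind: { path: "$Subject", preserveNullAndEmptyArrays: true },
};

// Hypothetical in-memory imitation; handles arrays and missing fields only.
function unwind(docs, field, preserve = false) {
  return docs.flatMap((d) => {
    const value = d[field];
    if (Array.isArray(value) && value.length > 0) {
      return value.map((item) => ({ ...d, [field]: item }));
    }
    if (!preserve) return []; // default: the input document is ignored
    const { [field]: _dropped, ...rest } = d;
    return [rest]; // preserved: emitted once, without the array field
  });
}

const dropped = unwind(students, "Subject");
const preserved = unwind(students, "Subject", true);

console.log(dropped.length);   // 0
console.log(preserved.length); // 2
```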

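The includeArrayIndex Parameter

Example 4: $unwind also accepts a document form with an optional includeArrayIndex parameter. If we want the array index of each element in the result, we pass the name of the field that should hold it.

db.Student.aggregate(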
 [
  {
     "$match":
     {
        "StudentName":"Vijay"
     }
  },
  {
       "$unwind":{ path: "$Subject", includeArrayIndex: "arrayIndex" }
  }
 ]
)

Result
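A plain-JavaScript imitation shows what the extra field contains. The unwindWithIndex helper is hypothetical and assumes the field always holds an array; MongoDB itself stores the index as a 64-bit integer, which the sketch represents as an ordinary number.

```javascript
const vijay = {
  StudentName: "Vijay",
  Section: "A",
  Marks: 70,
  Subject: ["Hindi", "English", "Math"],
};

// Hypothetical helper imitating
// { $unwind: { path: "$<field>", includeArrayIndex: "<indexField>" } }.
function unwindWithIndex(doc, field, indexField) {
  return doc[field].map((item, i) => ({
    ...doc,
    [field]: item,
    [indexField]: i, // zero-based position of the element in the array
  }));
}

const indexed = unwindWithIndex(vijay, "Subject", "arrayIndex");
console.log(indexed.map((d) => [d.Subject, d.arrayIndex]));
// [ [ 'Hindi', 0 ], [ 'English', 1 ], [ 'Math', 2 ] ]
```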

$group

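MongoDB uses $group to group documents by a specified expression. $group is similar to the GROUP BY clause in SQL. Just as GROUP BY is normally paired with aggregate functions, $group is normally paired with accumulators such as $sum or $avg. (A $group stage that specifies only _id is allowed, but it returns nothing more than the distinct group keys.) Let's understand this through an example.

Example 1

Suppose we want the total marks per Section. We use $group as follows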

db.Student.aggregate ([
   {
      "$group":
      {
         "_id":
         {
            "Section" : "$Section"
         },
         "TotalMarks":
         {
            "$sum": "$Marks"
         }
      }
   }
])

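In this query, the _id field is mandatory. In _id, we pass the field or fields by which the documents should be grouped. This gives us the following result

Result

Example 2

If we want the total marks for Section 'A' only, we can add a $match stage as well.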

db.Student.aggregate ([
   {
       "$match":{Section :'A'}
   },
   {
      "$group":
      {
         "_id":
         {
            "Section" : "$Section"
         },
         "TotalMarks":
         {
            "$sum": "$Marks"
         }
      }
   }
])

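This sums the marks for Section 'A' only.

Result

Example 3

Suppose we want the number of students in each section along with the total and average marks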

db.Student.aggregate ([

   {
      "$group":
      {
         "_id":
         {
            "Section" : "$Section"
         },
         "TotalMarks":
         {
            "$sum": "$Marks"
         },
         "Count":{ "$sum" : 1},
         "Average" : {"$avg" : "$Marks"}
      }
   }
])

Result
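The three accumulators can be imitated with a Map keyed by Section. This is only a sketch of the arithmetic: it runs on the nine students inserted at the start of the article (so it ignores the Tarun and Saurabh documents added in the $unwind section), and the output order follows insertion order, whereas MongoDB does not guarantee any order for $group output.

```javascript
// The nine original sample students as [name, section, marks] tuples.
const students = [
  ["Vijay", "A", 70], ["Gaurav", "A", 90], ["Ajay", "A", 70],
  ["Ankur", "B", 10], ["Sunil", "B", 70],
  ["Preeti", "C", 80], ["Anuj", "C", 50],
  ["Palka", "D", 40], ["Soniya", "D", 20],
].map(([StudentName, Section, Marks]) => ({ StudentName, Section, Marks }));

// Imitates $group with $sum: "$Marks", $sum: 1 and $avg: "$Marks".
const groups = new Map();
for (const s of students) {
  const g = groups.get(s.Section) ??
    { _id: { Section: s.Section }, TotalMarks: 0, Count: 0 };
  g.TotalMarks += s.Marks; // "$sum": "$Marks"
  g.Count += 1;            // "$sum": 1
  groups.set(s.Section, g);
}
const grouped = [...groups.values()].map((g) => ({
  ...g,
  Average: g.TotalMarks / g.Count, // "$avg": "$Marks"
}));

console.log(grouped);
// Section A: TotalMarks 230, Count 3; B: 80, 2; C: 130, 2; D: 60, 2
```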

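Example 4

If we want to rename the columns from the query above (Section to SectionName and TotalMarks to Total), we can add $project after $group, as follows. Because this $project lists only SectionName and Total, the Count and Average fields do not appear in its output.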

db.Student.aggregate ([

   {
      "$group":
      {
         "_id":
         {
            "Section" : "$Section"
         },
         "TotalMarks":
         {
            "$sum": "$Marks"
         },
         "Count":{ "$sum" : 1},
         "Average" : {"$avg" : "$Marks"}
      }
   },
   {
       "$project" :
       {
           "SectionName" : "$_id.Section",
           "Total" : "$TotalMarks"
       }
   }
])

$sort

$sort is similar to the ORDER BY clause in SQL Server. It sorts the documents in ascending or descending order: MongoDB uses 1 for ascending and -1 for descending.

Example 1

If we want to sort the result by SectionName in descending order, we can use $sort.

db.Student.aggregate ([
   {
      "$group":
      {
         "_id":
         {
            "Section" : "$Section"
         },
         "TotalMarks":
         {
            "$sum": "$Marks"
         },
         "Count":{ "$sum" : 1},
         "Average" : {"$avg" : "$Marks"}
      }
   },
   {
       "$project" :
       {
           "SectionName" : "$_id.Section",
           "Total" : "$TotalMarks"
       }
   },
   {
       "$sort":{"SectionName":-1}
   }
])
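The effect of the direction value can be imitated with Array.prototype.sort. The sortBy helper is hypothetical, and the input array stands in for the grouped and projected documents; its totals are computed from the nine students inserted at the start of the article.

```javascript
// Stand-in for the output of the $group and $project stages, in arbitrary order.
const sections = [
  { SectionName: "B", Total: 80 },
  { SectionName: "D", Total: 60 },
  { SectionName: "A", Total: 230 },
  { SectionName: "C", Total: 130 },
];

// Hypothetical helper imitating { $sort: { <field>: <direction> } }
// for a single field; direction is 1 (ascending) or -1 (descending).
function sortBy(docs, field, direction) {
  return [...docs].sort((x, y) => {
    const order = x[field] < y[field] ? -1 : x[field] > y[field] ? 1 : 0;
    return order * direction;
  });
}

const descending = sortBy(sections, "SectionName", -1);
const ascending = sortBy(sections, "SectionName", 1);

console.log(descending.map((d) => d.SectionName)); // [ 'D', 'C', 'B', 'A' ]
console.log(ascending.map((d) => d.SectionName));  // [ 'A', 'B', 'C', 'D' ]
```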

$limit

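The $limit stage passes the first n documents to the next pipeline stage, where n is the limit.

Example 1

If we sort the documents as in the query above and need to pass only two of them to the next pipeline stage, we use $limit.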

db.Student.aggregate ([

   {
      "$group":
      {
         "_id":
         {
            "Section" : "$Section"
         },
         "TotalMarks":
         {
            "$sum": "$Marks"
         },
         "Count":{ "$sum" : 1},
         "Average" : {"$avg" : "$Marks"}
      }
   },
   {
       "$project" :
       {
           "SectionName" : "$_id.Section",
           "Total" : "$TotalMarks"
       }
   },
   {
       "$sort":{"SectionName":-1}
   },
   {
       "$limit" : 2
   }
])

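Result

$skip

$skip skips the first n documents and passes the rest to the next pipeline stage, where n is the number of documents we want to skip.

Example 1

In the example above, if we want to skip the first document and then pass the next two documents to the next pipeline stage, we use the query below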

db.Student.aggregate ([

   {
      "$group":
      {
         "_id":
         {
            "Section" : "$Section"
         },
         "TotalMarks":
         {
            "$sum": "$Marks"
         },
         "Count":{ "$sum" : 1},
         "Average" : {"$avg" : "$Marks"}
      }
   },
   {
       "$project" :
       {
           "SectionName" : "$_id.Section",
           "Total" : "$TotalMarks"
       }
   },
   {
       "$sort":{"SectionName":-1}
   },
   {
       "$skip" : 1
   },
   {
       "$limit" : 2
   }
])

Result
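Both stages behave like array slicing, and their order in the pipeline matters. The sketch below imitates them on an in-memory array that stands in for the sorted documents (sections D, C, B, A in descending order).

```javascript
// Stand-in for the documents after the descending $sort.
const sorted = [
  { SectionName: "D" },
  { SectionName: "C" },
  { SectionName: "B" },
  { SectionName: "A" },
];

// Imitates { $skip: 1 } followed by { $limit: 2 }.
const skipThenLimit = sorted.slice(1).slice(0, 2);

// Imitates the reverse order: { $limit: 2 } followed by { $skip: 1 }.
const limitThenSkip = sorted.slice(0, 2).slice(1);

console.log(skipThenLimit.map((d) => d.SectionName)); // [ 'C', 'B' ]
console.log(limitThenSkip.map((d) => d.SectionName)); // [ 'C' ]
```

Skipping first and limiting afterwards is the usual pattern for paging, since limiting first shrinks the set that $skip then cuts into.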

$lookup

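This was one of the most awaited features in MongoDB. $lookup is the counterpart of a join in SQL. It shipped with MongoDB 3.2; before that release, there was no concept of a join (in my first article, I mentioned that MongoDB does not support joins). Let's understand it through an example.

Example 1

Suppose we have two collections named Country and City, as shown below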

db.Country.insert({"_id":1,"Name":"India"})
db.Country.insert({"_id":2,"Name":"US"})
db.Country.insert({"_id":3,"Name":"UK"})
db.Country.insert({"_id":4,"Name":"Australia"})

db.City.insert({"_id":1,"Name":"Delhi","CountryID":1})
db.City.insert({"_id":2,"Name":"Noida","CountryID":1})
db.City.insert({"_id":3,"Name":"Chicago","CountryID":2})
db.City.insert({"_id":4,"Name":"London","CountryID":3})
db.City.insert({"_id":5,"Name":"Bristol","CountryID":3})
db.City.insert({"_id":6,"Name":"Sydney","CountryID":4})

If we want every city together with its associated country, we use $lookup as follows:

db.City.aggregate([
    {
      $lookup:
        {
          from: "Country",
          localField: "CountryID",
          foreignField: "_id",
          as: "Country"
        }
   }
])

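In the query above, we join City with the Country collection, where CountryID is the local field in City and _id is the foreign field in Country. The matching Country documents are placed in an array field named Country on each City document.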
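The left-outer-join behavior can be imitated in memory. The lookup function is a hypothetical helper that compares the two fields with strict equality, and the city named Atlantis with CountryID 99 is made up here to show what happens when nothing matches.

```javascript
const countries = [
  { _id: 1, Name: "India" },
  { _id: 2, Name: "US" },
  { _id: 3, Name: "UK" },
  { _id: 4, Name: "Australia" },
];

const cities = [
  { _id: 1, Name: "Delhi", CountryID: 1 },
  { _id: 3, Name: "Chicago", CountryID: 2 },
  { _id: 4, Name: "London", CountryID: 3 },
  { _id: 7, Name: "Atlantis", CountryID: 99 }, // made up: no matching country
];

// Hypothetical helper imitating $lookup's from/localField/foreignField/as.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((d) => ({
    ...d,
    // All matching foreign documents go into an array; no match gives [].
    [as]: foreignDocs.filter((f) => f[foreignField] === d[localField]),
  }));
}

const joined = lookup(cities, countries, "CountryID", "_id", "Country");
console.log(JSON.stringify(joined, null, 2));
```

Every city appears in the output, including the one without a matching country, whose Country array is empty. That is what makes the join a left outer join.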

$redact

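MongoDB uses $redact to restrict the contents of documents based on information stored in the documents themselves. To understand it better, I will first cover $cond, $setIntersection, and $size, and then come back to $redact.

$cond

$cond evaluates a boolean expression and returns one of two expressions depending on the result. It is not a pipeline stage, but it is worth knowing how it works because we will use it shortly.

$cond has the following syntax

{ $cond: { if: (boolean-expression), then: (true-case), else: (false-case) } }

Example 1

In our Student collection, we want Result to show Good when the marks are 70 or more and Poor when they are below 70. We can use $cond as follows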

db.Student.aggregate(
   [
      {
         $project:
           {
             StudentName: 1,
             Result:
               {
                 $cond: { if: { $gte: [ "$Marks", 70 ] }, then: "Good", else: "Poor" }
               }
           }
      }
   ]
)

Result
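The if/then/else form reads like a conditional expression, which the plain-JavaScript sketch below imitates with the ternary operator on a subset of the sample students. Note the boundary: because the comparison is $gte, a student with exactly 70 marks is rated Good.

```javascript
const students = [
  { StudentName: "Vijay", Marks: 70 },
  { StudentName: "Gaurav", Marks: 90 },
  { StudentName: "Ankur", Marks: 10 },
  { StudentName: "Anuj", Marks: 50 },
];

// Imitates the $project stage with
// $cond: { if: { $gte: ["$Marks", 70] }, then: "Good", else: "Poor" }.
const rated = students.map((s) => ({
  StudentName: s.StudentName,
  Result: s.Marks >= 70 ? "Good" : "Poor",
}));

console.log(rated.map((r) => `${r.StudentName}: ${r.Result}`));
// [ 'Vijay: Good', 'Gaurav: Good', 'Ankur: Poor', 'Anuj: Poor' ]
```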

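$setIntersection

$setIntersection takes two or more arrays as input and returns an array containing the elements that appear in every input array.

Suppose a document in my Test collection has two arrays, as shown below

db.Test.insert({"Array1":["1","2","3"],"Array2":["1","2","3","4","5"]})

Example

If we want the elements common to both arrays, we use $setIntersection as follows: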

db.Test.aggregate(
   [
     { $project: { "Array1": 1, "Array2": 1, commonToBoth: 
     { $setIntersection: [ "$Array1", "$Array2" ] }, _id: 0 } }
   ]
)

Result
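The same set operation can be imitated with a JavaScript Set. The setIntersection function below is a hypothetical two-array helper; like the MongoDB operator it treats its inputs as sets, so duplicates are ignored, and the order of the output should not be relied on.

```javascript
const testDoc = {
  Array1: ["1", "2", "3"],
  Array2: ["1", "2", "3", "4", "5"],
};

// Hypothetical helper imitating $setIntersection for two arrays.
function setIntersection(a, b) {
  const inB = new Set(b);
  // new Set(a) removes duplicates before the membership test.
  return [...new Set(a)].filter((x) => inB.has(x));
}

const commonToBoth = setIntersection(testDoc.Array1, testDoc.Array2);
console.log(commonToBoth); // [ '1', '2', '3' ]
```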

$size

$size counts and returns the total number of elements in an array. In the query below, we count the elements of Array1 and Array2.

db.Test.aggregate(
   [
      {
         $project: {
            Array1count: { $size: "$Array1" },
            Array2count: { $size: "$Array2" }
         }
      }
   ]
)

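Result

We now have a good understanding of $cond, $setIntersection, and $size, so let's understand $redact through an example. First, though, we need to remove the records whose array is empty or missing; otherwise the $redact query below throws an exception. So I removed from the Student collection the two documents whose Subject array is empty or missing (both of them have Marks of 95).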

db.Student.remove({Marks:95})

var SubjectAccess=["Math","Hindi"];
db.Student.aggregate(
[{
    "$match": {"Section":"A"}
},
{
    $redact:{
         $cond: {
           if: { $gt: [ { $size: { $setIntersection: 
                      [ "$Subject", SubjectAccess ] } }, 0 ] },
           then: "$$DESCEND",
           else: "$$PRUNE"
         }
        }
}])

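The query above checks whether Subject contains either Hindi or Math. If it does, the document is allowed through to the next pipeline stage; documents that contain neither Math nor Hindi are pruned. And of course, it also applies the condition that Section is 'A'.

Result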
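For these flat documents, the decision taken by $redact can be imitated with a filter: a document survives when its Subject array shares at least one element with SubjectAccess. The sketch is a plain-JavaScript imitation on a subset of the sample students, not MongoDB's implementation. It does not model the difference between $$DESCEND and $$KEEP, which matters only for documents with embedded sub-documents.

```javascript
const students = [
  { StudentName: "Vijay", Section: "A", Subject: ["Hindi", "English", "Math"] },
  { StudentName: "Gaurav", Section: "A", Subject: ["English"] },
  { StudentName: "Ajay", Section: "A", Subject: ["Math"] },
  { StudentName: "Ankur", Section: "B", Subject: ["Hindi"] },
];

const subjectAccess = ["Math", "Hindi"];

// Imitates $size of $setIntersection being greater than 0.
function sharesAny(a, b) {
  const inB = new Set(b);
  return a.filter((x) => inB.has(x)).length > 0;
}

const visible = students
  .filter((s) => s.Section === "A")                   // the $match stage
  .filter((s) => sharesAny(s.Subject, subjectAccess)); // keep vs. $$PRUNE

console.log(visible.map((s) => s.StudentName)); // [ 'Vijay', 'Ajay' ]
```

Gaurav is pruned because English is not in the access list, and Ankur never reaches the redaction step because he is not in Section A.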

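That's it for Day 4. Honestly, this is more than enough for one day, or even one week.

Next: Connecting MongoDB with C#

Finally, if this article helped you and you liked it, please vote above.

History

21 May 2016: Initial version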