相关子查询 vs. Case 语句
一个简单的案例研究,比较使用子查询与 Case 语句执行数据聚合的效率。
引言
这是一个简单的案例研究,比较使用相关子查询或 case 语句执行数据聚合的效率。
背景
这是一个常见的场景:我有一个新闻网站,它按通常的类别显示新闻故事
- 国际新闻
- 本地新闻
- 体育
- 生活方式
- 等等。
当用户离开网站时,每个新闻项目的页面浏览量都存储在数据库中。
 INSERT PageHits( News_ID, HitDate, NoHits )
   VALUES(@News_ID, @Date, @Hits)
现在,负责发工资的人要求对这些信息进行一些报告。也就是说,他们想知道某个范围内每个日期的新闻类别的页面点击量。
| 日期 | 本地新闻 | 国际新闻 | 本地体育 | 国际体育 | 商用版 | 生活方式 | 犯罪 | 天气 | 总点击量 | 
| 2010-02-25 | 1 | 2 | 2 | 4 | 0 | 0 | 0 | 0 | 9 | 
| 2010-02-26 | 1 | 36 | 2 | 0 | 0 | 5 | 0 | 0 | 44 | 
| 2010-02-27 | 35 | 10 | 0 | 0 | 0 | 2 | 0 | 0 | 47 | 
为了演示所需的步骤,让我们构建一个名为 NewsSite 的简单示例数据库
  USE Master
  GO
  CREATE DATABASE NewsSite
  Go
  -- Now add some tables...
  USE NewsSite
  GO
  CREATE TABLE Category(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name VARCHAR(30) UNIQUE NOT NULL
  )
  GO
  CREATE TABLE NewsItem(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Cat_ID INT NOT NULL FOREIGN KEY REFERENCES Category(ID),
    ItemDate DATETIME NOT NULL DEFAULT GETDATE(),
    Name VarChar(100) NOT NULL
  )
  GO
  CREATE TABLE NewsHits(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Date DateTime Not Null Default GETDATE(),
    News_ID INT NOT NULL FOREIGN KEY REFERENCES NewsItem(ID),
    Hits INT NOT NULL
  )
  GO
现在我们添加一些虚拟数据(我将跳过一些无聊的东西,但你会明白的......
USE NewsSite
GO
-- Category Table
INSERT Category(Name)
  VALUES('Local News')
INSERT Category(Name)
  VALUES('World News')
INSERT Category(Name)
  VALUES('Local Sport')
INSERT Category(Name)
  VALUES('World Sport')
INSERT Category(Name)
  VALUES('Business')
INSERT Category(Name)
  VALUES('Lifestyle')
INSERT Category(Name)
  VALUES('Crime')
INSERT Category(Name)
  VALUES('Weather') 
-- NewsItem Table
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(1, '2/27/2010','Man Bites Dog')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(1, '2/27/2010','Rat Bites Cat')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(3, '2/27/2010','Rugby League Player Not in Trouble')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(4, '2/27/2010','NFL Player Goes to Jail')
-- ...
-- ...
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(5, '2/27/2010','USD Plummets')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(7, '2/27/2010','Murder in East LA')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(8, '2/27/2010','Chile Tsunami Warning') 
-- NewsHits Table
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/25/2010', 1, 2)
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/25/2010', 3, 1)
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/25/2010', 7, 4)
-- ...
-- ...
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/27/2010', 2, 2)
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/26/2010', 4, 7)
Go
相关子查询
它就在这里,非常辉煌 - 如果您的眼睛容易受伤,请避开:
USE NewsSite
GO
-- Outer Query
SELECT nh.Date,  CASE ISNULL(ln.LocalNews, 0)
                  WHEN 0 THEN 0
                   ELSE ln.LocalNews
                   END AS LocalNews,
                CASE ISNULL(wn.WorldNews, 0)
                   WHEN 0 THEN 0
                   ELSE wn.WorldNews
                   END AS WorldNews,
                CASE ISNULL(ls.LocalSport, 0)
                   WHEN 0 THEN 0
                   ELSE ls.LocalSport
                   END AS LocalSport,
                CASE ISNULL(ws.WorldSport, 0)
                   WHEN 0 THEN 0
                   ELSE ws.WorldSport
                   END AS WorldSport,
                CASE ISNULL(b.Business, 0)
                   WHEN 0 THEN 0
                   ELSE b.Business
                   END AS Business,
                CASE ISNULL(l.Lifestyle, 0)
                   WHEN 0  THEN 0
                   ELSE l.Lifestyle
                   END AS LifeStyle,
                CASE ISNULL(c.Crime, 0)
                   WHEN 0  THEN 0
                   ELSE c.Crime
                   END AS Crime,
                CASE ISNULL(w.Weather, 0)
                   WHEN 0  THEN 0
                   ELSE w.Weather
                   END AS Weather,
                SUM(nh.Hits) As TotalHits
                
 FROM NewsHits nh
  -- Inner Queries...
  LEFT JOIN (SELECT nhh.Date, SUM(nhh.Hits) As LocalNews
              FROM NewsHits nhh
                JOIN NewsItem nii
                  ON nhh.News_ID = nii.ID
                JOIN Category cc
                  ON nii.Cat_ID = cc.ID
              WHERE cc.Name = 'Local News'
              GROUP BY nhh.Date) ln
  ON nh.Date = ln.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits)  As WorldNews
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'World News'
                GROUP BY nhh.Date) wn
  ON nh.Date = wn.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As LocalSport
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Local Sport'
                GROUP BY nhh.Date) ls
  ON nh.Date = ls.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As WorldSport
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'World Sport'
                GROUP BY nhh.Date) ws
  ON nh.Date = ws.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As Business
               FROM NewsHits nhh
                 JOIN NewsItem nii
                   ON nhh.News_ID = nii.ID
                 JOIN Category cc
                   ON nii.Cat_ID = cc.ID
               WHERE cc.Name = 'Business'
               GROUP BY nhh.Date) b
  ON nh.Date = b.Date
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As Crime
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Crime'
                GROUP BY nhh.Date) c
  ON nh.Date = c.Date
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As Weather
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Weather'
                GROUP BY nhh.Date) w
  ON nh.Date = w.Date
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As LifeStyle
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Lifestyle'
                GROUP BY nhh.Date) l
  ON nh.Date = l.Date
WHERE nh.Date  BETWEEN '2/24/2010' AND '2/28/2010'
GROUP BY nh.Date, ln.LocalNews, wn.WorldNews, ls.LocalSport,
         ws.WorldSport, b.Business, c.Crime, w.Weather,
         l.LifeStyle
GO
丑陋,不是吗? 根据我更有学问的朋友的说法,这还不是这个物种特别残酷的版本!
它很容易理解,外部查询返回范围内每个日期的子查询结果。 运行示例脚本,您将看到类似于上表的输出。 当有人好心地指出至少在这种情况下,我最好使用更简单的 CASE 语句结构时,我很高兴在生产环境中使用这个(或类似但更难看的东西)。
CASE 语句
USE NewsSite
Go
SELECT nh.[Date],
       SUM(CASE WHEN c.Name = 'Local News'
                THEN nh.Hits ELSE 0 END) AS 'Local News',
       SUM(CASE WHEN c.Name = 'World News'
                THEN nh.Hits ELSE 0 END) AS 'World News',
       SUM(CASE WHEN c.Name = 'Local Sport'
                THEN nh.Hits ELSE 0 END) AS 'Local Sport',
       SUM(CASE WHEN c.Name = 'World Sport'
                THEN nh.Hits ELSE 0 END) AS 'World Sport',
       SUM(CASE WHEN c.Name = 'Business'
                THEN nh.Hits ELSE 0 END) AS 'Business',
       SUM(CASE WHEN c.Name = 'Lifestyle'
                THEN nh.Hits ELSE 0 END) AS 'Lifestyle',
       SUM(CASE WHEN c.Name = 'Crime'
                THEN nh.Hits ELSE 0 END) AS 'Crime',
       SUM(CASE WHEN c.Name = 'Weather'
                THEN nh.Hits ELSE 0 END) AS 'Weather',
       SUM(nh.Hits) As [Total Hits]
FROM NewsHits nh
  JOIN NewsItem ni
    ON ni.ID = nh.News_ID
  JOIN Category c
    ON ni.Cat_ID = c.ID
WHERE nh.[Date] BETWEEN '2/24/2010' AND '2/28/2010'
GROUP BY nh.[Date]
GO
这好多了,更赏心悦目,而且更容易一目了然地理解。
决定性因素是检查查询的执行时间
| SP 结果 | CSQ | CASE | 
| INSERT、DELETE和UPDATE语句的数量 | 0 | 0 | 
| 受 INSERT、DELETE和UPDATE语句影响的行数 | 0 | 0 | 
| SELECT语句的数量 | 1 | 1 | 
| SELECT语句返回的行数 | 3 | 3 | 
| 事务数量 | 0 | 0 | 
| 网络统计 | ||
| 服务器往返次数 | 1 | 1 | 
| 从客户端发送的 TDS 数据包 | 3 | 1 | 
| 从服务器收到的 TDS 数据包 | 1 | 1 | 
| 从客户端发送的字节数 | 8748 | 1990 | 
| 从服务器收到的字节数 | 435 | 445 | 
| 时间统计 | ||
| 客户端处理时间 (ms) | 1.2 | 4.3 | 
| 总执行时间 (ms) | 9.6 | 5.2 | 
| 服务器回复等待时间 (ms) | 8.4 | 0.9 | 
CASE 语句的运行速度几乎是相关子查询示例的两倍(9.6 毫秒 vs 5.5 毫秒)。
CSQ 在客户端花费 1.2 毫秒进行处理,然后等待服务器 8.4 秒,而 CASE 示例在客户端花费 4.2 毫秒进行处理,然后仅等待服务器 0.9 毫秒。
在这个小例子中这可能并不重要,但在可能存在数千或更多记录的情况下,CSQ 开始在这场竞赛中逐渐消失。
在一个大型数据集上使用类似的结构,CASE 示例比 CSQ 快 100 倍以上。 当然,在添加一个或两个索引后,我能够将其减少到大约 16:1。
不要跳得太快...
虽然我可以在这种情况下替换一种不同的方法,但由于连接的表结构等原因,可能并非总是如此,您可能不得不使用相关子查询。 如果你确实要走这条路,请记住设置适当的索引以确保你的查询尽可能高效地运行。
关注点
这篇文章源于 General Database Forum 中的一次简短讨论,并表明当你想到解决方案时,第一个“好”主意可能并不总是最好的。
哦,CodeProject 是一个很棒的技术资源,经常有才华横溢且慷慨的人光顾 - 感谢 Mycroft Holmes 和 i.jrussell 指引我走上这条路。
历史
- 版本 1




