T-SQL 中的超额累进求和





5.00/5 (3投票s)
计算 T-SQL 中的累进求和
引言
业务开发人员经常遇到的一个问题是,根据日期创建一系列现金流入/流出的累积求和。 这里的累积求和指的是现金流入/流出的总和,例如:
日期 | 流入/流出 | 金额 | 累积求和 |
1/1/2016 | 流入 | 10000 | 10000 |
3/1/2016 | 流出 | 5000 | 5000 |
3/1/2016 | 流入 | 3000 | 8000 |
4/1/2016 | 流出 | 12000 | -4000 |
安装
例如,假设我们在 SQL Server 中创建以下 Moves
表
/****** Object: Table [dbo].[Moves] Script Date: 12/Απρ/2016 6:37:36 μμ ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Moves](
[RecID] [int] IDENTITY(1,1) NOT NULL,
[MoveDate] [datetime] NULL,
[MoveDescr] [nvarchar](2000) NULL,
[MoveSignID] [int] NULL,
[MoveProjValue] [decimal](18, 4) NULL,
[MoveActualValue] [decimal](18, 4) NULL,
[MoveComments] [nvarchar](2000) NULL,
CONSTRAINT [PK_Moves] PRIMARY KEY CLUSTERED
(
[RecID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, _
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Index [MovesIndex] Script Date: 12/Απρ/2016 6:37:42 μμ ******/
CREATE NONCLUSTERED INDEX [MovesIndex] ON [dbo].[Moves]
(
[MoveDate] ASC,
[MoveSignID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, _
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, _
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
MoveSignID
的值为 1
表示流入(例如,客户支付给我们的现金),2
表示流出(例如,支付给供应商的款项)。
然后,我们可以使用 create sample Moves.zip 查询创建一些测试数据,这将创建一个包含 40,000 条记录的测试负载,其中包含随机值,以便使用大约 6 年的数据进行真实负载测试,例如:
RecID MoveDate MoveSignID MoveProjValue
1 2016-04-17 1 6776.8625
2 2016-04-17 1 7399.8839
...
10 2016-04-17 1 7423.4752
11 2016-04-18 1 6769.2806
...
20001 2016-04-17 2 237.0888
20002 2016-04-17 2 7865.3622
...
20011 2016-04-18 2 1348.9914
20012 2016-04-18 2 5248.4128
处理
现在,假设我们想要计算这些数据的运行总计。 最明显的查询将是
SELECT a.moveDate, a.moveprojValue, CASE WHEN a.moveSignID=1 _
THEN SUM(b.moveprojValue) ELSE SUM(-b.moveprojValue) END AS RunningTotal
FROM moves a CROSS JOIN moves b
WHERE (b.moveDate <= a.moveDate)
GROUP BY a.moveDate,a.moveprojValue,a.moveSignID
ORDER BY a.moveDate
该查询在我的 W2012R2 虚拟机上使用 SQL Server 2014 Express 完成大约需要 29 秒,统计信息如下:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(40000 row(s) affected)
Table 'Worktable'. Scan count 1, logical reads 804070,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'Moves'. Scan count 2, logical reads 540,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 28422 ms, elapsed time = 29155 ms.
当然,还有更优化的解决方案,例如以下:
SELECT T1.*,
(SELECT SUM(CASE WHEN MoveSignID=1 THEN MoveProjValue ELSE -MoveProjValue END )
FROM Moves T2
WHERE T2.MoveDate <= T1.MoveDate) AS RunningTotal
FROM Moves T1
ORDER BY MoveDate
GO
该查询在同一虚拟机上完成大约需要 12-13 秒,统计信息如下:
(40000 row(s) affected)
Table 'Worktable'. Scan count 44000, logical reads 449343,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'Moves'. Scan count 2, logical reads 82890,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 8890 ms, elapsed time = 13407 ms.
不太令人满意,仍然需要相当长的时间才能生成看似微不足道的结果。
也许我们可以尝试游标解决方案?
例如
CREATE TABLE #Moves (MoveDate datetime, Amount decimal(18,4), RunningTotal decimal(18,4))
DECLARE @MoveDate datetime, @SignID int, @Amount decimal(18,4), @RunningTotal decimal(18,4)
SET @RunningTotal = 0
DECLARE rt_cursor CURSOR FOR SELECT MoveDate, MoveSignID, _
MoveProjValue FROM Moves ORDER BY MoveDate,MoveSignID
OPEN rt_cursor
FETCH NEXT FROM rt_cursor INTO @MoveDate, @SignID, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
IF @SignID=1
SET @RunningTotal = @RunningTotal + @Amount
ELSE SET @RunningTotal = @RunningTotal - @Amount
INSERT #Moves VALUES (@MoveDate, @Amount,@RunningTotal)
FETCH NEXT FROM rt_cursor INTO @MoveDate, @SignID, @Amount
END
CLOSE rt_cursor
DEALLOCATE rt_cursor
SELECT * FROM #Moves ORDER BY MoveDate
DROP TABLE #Moves
这会在 9 秒内生成结果,统计信息不明确,因为我们每天都在执行一个 insert
。 然而,它仍然不是最优的,并且有一个很大的缺点,即它不能合并到视图中,因为游标仅允许在存储过程/函数中使用。
如果我们可以访问 SQL 2012 或更高版本,我们可以尝试使用 OVER
子句。
SELECT T1.*,
SUM(CASE WHEN T1.MoveSignID=1 THEN T1.MoveProjValue ELSE -T1.MoveProjValue END )
OVER(ORDER BY moveDate) AS RunningTotal
FROM Moves T1
order by T1.MoveDate
OVER
运算符是在 SQL 中定义的,用于定义一个窗口或用户指定的行集,以便在其上应用聚合函数,例如 sum
、avg
等。 上面的查询仅需 0.6 秒即可完成,比下一个最佳解决方案(游标查询)快近 15 倍,比上述明显的子查询解决方案快 20 倍。 另请参阅相关统计信息。
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(40000 row(s) affected)
Table 'Worktable'. Scan count 2001, logical reads 174260,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'Moves'. Scan count 1, logical reads 82611,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 375 ms, elapsed time = 588 ms.
此外,它使用了更优雅的语法,非常清晰简洁。 主要缺点是,所使用的语法仅在 SQL Server 2012 或更高版本中可用,例如,即使 SQL 2008 R2 也会产生语法错误。 因此,您要么需要升级您的 SQL Server,要么使用效率较低的解决方案。
最终结果
总结在表格中
Solution SQL compatibility CPU time (ms) Elapsed time (ms)
CROSS JOIN All 28422 29155
SUBQUERY All 8890 13407
CURSOR All N/A ~9000
OVER SQL 2012 + 375 588
欢迎提出任何意见,希望对大家有所帮助!
Using the Code
只需创建一个空白数据库,运行 Moves.SQL,然后创建示例 moves.SQL。
之后,您可以使用提供的任何示例查询,尤其是 progressive sum partition order by - 仅适用于 SQL2012+.SQL。
关注点
SQL Server 和 TSQL 不断改进。 然而,它仍然缺乏一个特别重要的功能,即横向扩展…
Copyright [2016] [by E. Gimissis]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://apache.ac.cn/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
历史
- 2016 年 4 月 17 日:初始版本