Oracle 中的自定义聚合
如何使用 PL/SQL 在 Oracle 中创建自定义聚合函数
引言
尽管 Oracle 提供了各种各样的聚合函数,但并非所有功能都已实现。例如,列表中缺少基本的乘积函数。Oracle Data Cartridge 有助于扩展数据库的功能,并允许使用 PL/SQL 创建自定义聚合函数。本文展示了几个简单的自定义聚合函数示例。
第一个聚合函数:乘积
聚合函数模式 (可选)
在创建聚合函数之前,我选择创建一个新的模式。这不是强制性的,但我认为将类型和函数放在一个单独的模式中可以简化管理。
--------------------------------------------
-- Schema for aggregates
--------------------------------------------
CREATE USER CustomAggregates
IDENTIFIED BY Calculations
DEFAULT TABLESPACE Users
TEMPORARY TABLESPACE TEMP;
GRANT CONNECT TO CustomAggregates;
GRANT RESOURCE TO CustomAggregates;
运行用户创建时,请根据您的环境更改默认和临时表空间。
PL/SQL 实现
创建聚合函数需要创建一个包含头和体的类型。头包含在上下文中存储中间值所需的变量以及 Oracle 在运行时会调用的函数。强制性函数是
ODCIAggregateInitialize
:调用此函数为一组值创建一个新上下文。此 `static` 方法的目的是返回类型的全新、已初始化的实例。ODCIAggregateIterate
:每个值都会传递给此方法,该方法负责进行必要的计算等。ODCIAggregateMerge
:当两个上下文合并为一个时,会使用此方法。如果原始值集是通过多个不同上下文迭代的,则会发生合并。ODCIAggregateTerminate
:在处理完所有值并合并上下文后,此函数将返回结果。
合并最好通过一个小型(简化)图来说明
查询处理器可能会将一组值划分为更小的子集。完成聚合后,每个组都有自己的类型实例来处理子集。每个实例首先被初始化,然后对集合中的每个值进行迭代。迭代完成后,上下文将合并到另一个上下文。最后,当所有上下文都合并后,聚合将被终止,结果将返回给消费者。因此,聚合函数必须设计为支持在迭代时对部分集进行操作。
用于计算乘积的类型定义如下所示
--------------------------------------------
-- Product
--------------------------------------------
-- Type definition
CREATE OR REPLACE TYPE CustomAggregates.ProductType AS OBJECT (
mProduct NUMBER, -- product of elements
mElements NUMBER, -- number of elements
STATIC FUNCTION ODCIAggregateInitialize
(sctx IN OUT CustomAggregates.ProductType)
RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateIterate
(self IN OUT CustomAggregates.ProductType,
value IN NUMBER)
RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateMerge
(self IN OUT CustomAggregates.ProductType,
ctx2 IN CustomAggregates.ProductType)
RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateTerminate
(self IN CustomAggregates.ProductType,
returnValue OUT NUMBER,
flags IN NUMBER)
RETURN NUMBER
);
头定义了两个用于乘积计算的变量以及所有强制性函数。体实现了逻辑
-- Type implementation
CREATE OR REPLACE TYPE BODY CustomAggregates.ProductType IS
STATIC FUNCTION ODCIAggregateInitialize
( sctx IN OUT CustomAggregates.ProductType)
RETURN NUMBER IS
BEGIN
sctx := ProductType( TO_NUMBER(NULL), 0);
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateIterate
( self IN OUT CustomAggregates.ProductType,
value IN NUMBER)
RETURN NUMBER IS
BEGIN
IF (self.mElements = 0) THEN
self.mProduct := value;
ELSE
self.mProduct := self.mProduct * value;
END IF;
self.mElements := self.mElements + 1;
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateMerge
( self IN OUT CustomAggregates.ProductType,
ctx2 IN CustomAggregates.ProductType)
RETURN NUMBER IS
BEGIN
IF (self.mElements = 0) THEN
-- no elements in this set, product is the product in the set to be merged
self.mProduct := ctx2.mProduct;
ELSIF (ctx2.mElements = 0) THEN
-- no elements in ctx2 so let the product be as it is
NULL;
ELSE
-- multiply products
self.mProduct := self.mProduct * ctx2.mProduct;
END IF;
self.mElements := self.mElements + ctx2.mElements;
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateTerminate
( self IN CustomAggregates.ProductType,
returnValue OUT NUMBER,
flags IN NUMBER) RETURN NUMBER IS
BEGIN
returnValue := self.mProduct;
return ODCIConst.Success;
END;
END;
在 `ODCIAggregateInitialize` 中,会创建一个 `ProductType` 的新实例。`mProduct` 变量初始化为 `null`,`mElements` 初始化为 `0`。
所有方法都返回 `ODCIConst.Success` 或 `ODCIConst.Error`,具体取决于操作是否成功。`ODCIAggregateIterate` 方法将每个新值与现有乘积相乘,并将元素数量加 `1`。`ODCIAggregateMerge` 会将两个上下文的乘积相乘(如果它们有值),否则它会选择有值的上下文的乘积。而 `ODCIAggregateTerminate` 函数只是设置返回值。
最后一步是使用 `CREATE FUNCTION` 语句创建聚合函数本身
-- Aggregate function
CREATE OR REPLACE FUNCTION CustomAggregates.Product (value NUMBER) RETURN NUMBER
PARALLEL_ENABLE AGGREGATE USING CustomAggregates.ProductType;
该函数定义为 `PARALLEL_ENABLE`,以便优化器知道计算可以并行进行。`AGGREGATE USING` 子句定义了实现该聚合的类型。
让我们来测试一下
-- Test run 1
SELECT CustomAggregates.Product(a.Value) AS Result
FROM ( SELECT 4 AS Value FROM DUAL UNION ALL
SELECT 2 AS Value FROM DUAL UNION ALL
SELECT 5 AS Value FROM DUAL) a;
结果是:
RESULT
----------
40
嗯,这有点在意料之中。如果值中存在 `null` 会怎样?
-- Test run 2, null included
SELECT CustomAggregates.Product(a.Value) AS Result
FROM ( SELECT 4 AS Value FROM DUAL UNION ALL
SELECT 2 AS Value FROM DUAL UNION ALL
SELECT NULL AS Value FROM DUAL UNION ALL
SELECT 5 AS Value FROM DUAL) a;
结果相同
RESULT
----------
40
正如 ANSI 标准所定义的,`SUM`、`MIN` 等聚合函数会忽略 `null`,它们根本不会被传递到 `ODCIAggregateIterate`。这就是为什么函数中没有对 `null` 进行不同的处理。请注意,Oracle 11g 中的行为不同,在 Oracle 11g 中,`null` 实际上会被传递给自定义聚合函数。
让我们对一个空集进行最终测试
-- Test run 3, empty set
SELECT CustomAggregates.Product(a.Value) AS Result
FROM ( SELECT 1 AS Value FROM DUAL WHERE 1=0) a;
结果是 `null`
RESULT
----------
更多聚合函数:几何平均值和调和平均值
这些聚合函数与乘积非常相似。几何平均值定义为
现在,由于这种计算无法在迭代(在这种形式下)中完成,我们在迭代和合并步骤中计算乘积,最终结果在终止函数中计算。因此,区别在于 `ODCIAggregateTerminate` 函数
MEMBER FUNCTION ODCIAggregateTerminate
( self IN CustomAggregates.GeometricMeanType,
returnValue OUT NUMBER,
flags IN NUMBER)
RETURN NUMBER IS
BEGIN
IF ( self.mElements = 0) THEN
returnValue := NULL;
ELSE
returnValue := POWER(self.mProduct, 1 / self.mElements);
END IF;
return ODCIConst.Success;
END;
调和平均值有点不同

因此,现在我们不是计算乘积,而是对 (1/value) 求和
self.mSum := self.mSum + (1 / value);
显然,合并只是对两个上下文求和,而终止函数返回
returnValue := self.mElements / self.mSum;
从脚本中运行两个平均值的类型和函数,然后让我们测试这两个
-- Test run 1
SELECT CustomAggregates.GeometricMean(a.Value) AS Result
FROM ( SELECT 34 AS Value FROM DUAL UNION ALL
SELECT 27 AS Value FROM DUAL UNION ALL
SELECT 45 AS Value FROM DUAL UNION ALL
SELECT 55 AS Value FROM DUAL UNION ALL
SELECT 22 AS Value FROM DUAL UNION ALL
SELECT 34 AS Value FROM DUAL) a;
结果
RESULT
----------
34,54511
然后是调和平均值
-- Test run 1
SELECT CustomAggregates.HarmonicMean(a.Value) AS Result
FROM ( SELECT 34 AS Value FROM DUAL UNION ALL
SELECT 27 AS Value FROM DUAL UNION ALL
SELECT 45 AS Value FROM DUAL UNION ALL
SELECT 55 AS Value FROM DUAL UNION ALL
SELECT 22 AS Value FROM DUAL UNION ALL
SELECT 34 AS Value FROM DUAL) a;
得到
RESULT
----------
33,0179837
分析函数
自定义聚合函数可以与分析子句一起使用。例如,如果我们按两个不同的类别划分先前的数据,并且我们想要每个类别的几何平均值,查询将如下所示
-- Test run 2, analytic functions
SELECT DISTINCT
CustomAggregates.GeometricMean(a.Value)
OVER (PARTITION BY a.Cat) AS Result
FROM ( SELECT 1 AS Cat, 34 AS Value FROM DUAL UNION ALL
SELECT 1 AS Cat, 27 AS Value FROM DUAL UNION ALL
SELECT 1 AS Cat, 45 AS Value FROM DUAL UNION ALL
SELECT 2 AS Cat, 55 AS Value FROM DUAL UNION ALL
SELECT 2 AS Cat, 22 AS Value FROM DUAL UNION ALL
SELECT 2 AS Cat, 34 AS Value FROM DUAL) a;
运行此查询后,结果是
RESULT
----------
34,5213758
34,5688606
使用其他数据类型作为参数:单词聚合函数
数字并非创建聚合函数的唯一数据类型。最后一个示例是一个聚合函数,它列出了 `varchar2` 字段中的所有不同单词。
实现如下
-- Type implementation for Words
CREATE OR REPLACE TYPE BODY CustomAggregates.WordsType IS
STATIC FUNCTION ODCIAggregateInitialize
( sctx IN OUT CustomAggregates.WordsType)
RETURN NUMBER IS
BEGIN
sctx := WordsType( ';', ';');
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateIterate
( self IN OUT CustomAggregates.WordsType,
value IN VARCHAR2) RETURN NUMBER IS
nStart NUMBER := 1;
nPosition NUMBER := 1;
sPortion VARCHAR2(32767);
BEGIN
-- loop the string and search for delimiters
WHILE nPosition <= LENGTH(value) LOOP
IF SUBSTR(value, nPosition, 1)
IN (' ', '.', ';', '/', ':', ',', '!', '?', '(', ')') THEN
sPortion := SUBSTR(value, nStart, nPosition - nStart + 1);
sPortion := LTRIM(
RTRIM( sPortion, ' ,.-;:_?=)(/&%¤#"!'), ' ,.-;:_?=)(/&%¤#"!');
IF LENGTH(sPortion) > 0 THEN
IF INSTR(mWords, self.mListDelimiter
|| sPortion || self.mListDelimiter)= 0 THEN
mWords := mWords || sPortion || self.mListDelimiter;
END IF;
END IF;
nStart := nPosition + 1;
END IF;
nPosition := nPosition + 1;
END LOOP;
sPortion := SUBSTR(value, nStart, nPosition - nStart + 1);
sPortion := LTRIM( RTRIM( sPortion, ' ,.-;:_?=)(/&%¤#"!'), ' ,.-;:_?=)(/&%¤#"!');
IF LENGTH(sPortion) > 0 THEN
IF INSTR(mWords, self.mListDelimiter || sPortion || _
self.mListDelimiter) = 0 THEN
mWords := mWords || sPortion || self.mListDelimiter;
END IF;
END IF;
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateMerge
( self IN OUT CustomAggregates.WordsType,
ctx2 IN CustomAggregates.WordsType)
RETURN NUMBER IS
BEGIN
RETURN self.ODCIAggregateIterate(ctx2.mWords);
END;
MEMBER FUNCTION ODCIAggregateTerminate
( self IN CustomAggregates.WordsType,
returnValue OUT VARCHAR2,
flags IN NUMBER)
RETURN NUMBER IS
BEGIN
returnValue := RTRIM(LTRIM_
(self.mWords, self.mListDelimiter), self.mListDelimiter);
return ODCIConst.Success;
END;
END;
基本上,`iterate` 函数逐个 `char` 遍历并搜索分隔符。如果找到一个单词,它会与现有的单词列表进行检查,如果不存在,则会添加。此聚合函数的**结果**是一个以分号 (;) 分隔的单词列表。
为了测试这一点,让我们看几个简单的字符 `string`
-- Test run 1, case sensitive
SELECT CustomAggregates.Words(a.Value) AS Result
FROM ( SELECT 'This is the first string' AS Value FROM DUAL UNION ALL
SELECT 'And this is the second string' AS Value FROM DUAL) a;
结果是:
RESULT
--------------------------------------------------------------------------------
This;is;the;first;string;And;this;second
因此,结果是区分大小写的,因为单词 `This` 列出了两次。如果我们想不区分大小写地获取列表,请执行
-- Test run 2, case insensitive
SELECT CustomAggregates.Words(LOWER(a.Value)) AS Result
FROM ( SELECT 'This is the first string' AS Value FROM DUAL UNION ALL
SELECT 'And this is the second string' AS Value FROM DUAL) a;
结果是:
RESULT
--------------------------------------------------------------------------------
this;is;the;first;string;and;second
最后,输入稍微复杂一些
-- Test run 3, Sentences
SELECT CustomAggregates.Words(LOWER(a.Value)) AS Result
FROM ( SELECT 'This is the first sentence. And the second: Is this' AS Value
FROM DUAL UNION ALL
SELECT '"quote" from somewhere; And the second sentence again!' AS Value
FROM DUAL UNION ALL
SELECT 'Cursing #!#%# not allowed :)' AS Value
FROM DUAL) a;
现在结果是
RESULT
--------------------------------------------------------------------------------
this;is;the;first;sentence;and;second;quote;from;somewhere;again;cursing;not;allowed
结束语
就这些了,希望您有所收获。如果您有额外的时间,请发表评论和投票,我将不胜感激。谢谢。
历史
- 2011年3月19日:初始发布