使用表达式树实现泛型 Vector<T> 类





5.00/5 (4投票s)
本文展示了实现泛型运算符的另一种方法。本文重点关注 Vector。
Content
引言
在类似 Vector 的类实现中使用泛型是一个永无止境的故事。本文展示了另一种实现方法。
泛型 Vector 实现的主要问题在于,无法直接实现类似这样的功能:
class Vector<T>
{
public static Vector<T> operator +(Vector<T> a, Vector<T> b)
{
int cnt = a.Lenght;
Vector<T> result = new Vector(new T[cnt]);
for(int i = 0; i < cnt; i++)
result[i] = a[i] + b[i];
return result;
}
}
编译器将在以下行抛出错误:
result[i] = a[i] + b[i];
因此,如果您仍然想实现它,就必须解决这个问题,并以某种方式避免上述代码行。
一种可以帮助解决这个问题的方法是使用 lambda 表达式来实现,前提是运算符与 lambda 表达式非常相似。让我们考虑以下代码行:
Func<double, double, double> add = (a, b) => (a + b);
它的行为与我们对 double 类型上的 + 运算符的期望完全一致。但是,如果输入泛型版本,编译器将抛出错误,即使表达式 Func<T, T, T> add;
对编译器来说是可以的。
Func<T, T, T> add = (a, b) => (a + b); //give compiler error
那么,问题是如何用正确的泛型形式填充变量 Func<T, T, T> add;
?
表达式树
表达式树代表了一组类,这些类可用于构建可以编译为委托的结构。可以在此处找到描述过程、结构、类等的文章。
- Abul Kayes, https://codeproject.org.cn/Articles/235860/Expression-Tree-Basics
- Sacha Barber, https://codeproject.org.cn/Articles/30604/A-Journey-into-Expressions
- Alexandra Rusina, http://blogs.msdn.com/b/csharpfaq/archive/2009/11/19/debugging-expression-trees-in-visual-studio-2010.aspx
- Pranay Rana, https://codeproject.org.cn/Tips/438804/Expression-Tree
直接切入本文目标,让我们展示一个非常简单的表达式树用法。
ParameterExpression ap = Expression.Parameter(typeof(double), "a");
ParameterExpression bp = Expression.Parameter(typeof(double), "b");
Expression operationResult = Expression.Add(ap, bp);
Expression<Func<double, double, double>> lambda = Expression.Lambda<Func<double, double, double>>(operationResult, ap, bp);
Func<double, double, double> add = lambda.Compile();
这段代码将创建一个 Func<double, double, double> add
委托,其功能与以下代码相同:
Func<double, double, double> add = (a, b) => (a + b);
深入实现
这样,非泛型版本就实现了。泛型版本呢?如果在实现中替换 double
关键字,它仍然可以工作吗?(为了定义完整的方法,添加了头部和尾部)
Func<T, T, T> CreateAdd()
{
ParameterExpression ap = Expression.Parameter(typeof(T), "a");
ParameterExpression bp = Expression.Parameter(typeof(T), "b");
Expression operationResult = Expression.Add(ap, bp);
Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
Func<T, T, T> add = lambda.Compile();
return add;
}
这个实现与以下方法几乎相同:
Func<T, T, T> add = (a, b) => (a + b);
区别在于 CreateAdd
方法对编译器来说是 OK 的,并且 lambda.Compile()
表达式将在运行时被调用,如果类型 T 没有定义 + 运算符,则可能抛出错误。第二个实现是在编译时编译的,并且会测试(并失败)+ 运算符的泛型可用性。
CreateAdd
方法可以推广到所有运算符,因此只需要一个实现即可处理任何类型的二元运算符。
public static class FuncGenerator<T>
{
/// <summary>
/// Convert BinaryExpression into Func which acts as operator on types T and T. T = T op T, where op is provided by f param.
/// </summary>
/// <param name="f">Func which provides BinaryExpression.</param>
/// <returns>Func<T, T, T> </returns>
public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
{
ParameterExpression ap = Expression.Parameter(typeof(T), "a");
ParameterExpression bp = Expression.Parameter(typeof(T), "b");
Expression operationResult = f(ap, bp);
Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
return lambda.Compile();
}
}
现在,可以使用 FuncGenerator<T>.ExpressionToFunc()
方法创建适当的 Func 委托。但用户仍然必须熟悉表达式树。
Func<double, double, double> add = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double, double, double> sub = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Subtract(a, b));
现在我们有了二元运算符的泛型实现,剩下的就比较简单了。还是不是?
泛型 T[] 二元运算符
下面实现了数组二元运算的泛型运算符实现(参见方法 ExpressionToFuncArray
)。此实现使用 for 循环并评估 FuncGenerator<T>.ExpressionToFunc()
返回的 Func 方法。
public static class FuncGenerator<T>
{
/// <summary>
/// Convert BinaryExpression into Func which acts as operator on types T and T. T = T op T, where op is provided by f param.
/// </summary>
/// <param name="f">Func which provides BinaryExpression.</param>
/// <returns>Func<T, T, T> </returns>
public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
{
ParameterExpression ap = Expression.Parameter(typeof(T), "a");
ParameterExpression bp = Expression.Parameter(typeof(T), "b");
Expression operationResult = f(ap, bp);
Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
return lambda.Compile();
}
public static Func<T[], T[], T[]> ExpressionToFuncArray(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
{
Func<T, T, T> op = ExpressionToFunc(f);
return (a, b) =>
{
int len = a.Length;
T[] result = new T[len];
for (int i = 0; i < len; i++)
result[i] = op(ap[i], bp[i]);
return result;
};
}
}
现在关注行:
result[i] = op(ap[i], bp[i]);
op
委托每次数组操作调用 n 次,这非常耗时。因此,我们有了泛型实现,但正如可以测试的那样,这是一个非常慢的实现。回到表达式树,希望它能给我们一些解决方案。
public static Func<double[], double[], double[]> ArrayExpressionToFunc()
{
return (a, b) =>
{
int len = a.Length;
double[] result = new double[len];
for (int i = 0; i < len; i++)
result[i] = ap[i] + bp[i];
return result;
};
}
上面展示了 double
类型的非泛型实现。此实现比之前介绍的泛型实现更快。现在,让我们尝试构建一个行为与此非泛型实现完全相同的表达式树。
/// <summary>
/// Convert BinaryExpression into Func which acts as operator on types double[] and double[]. double[] = double[] op double[], where op is provided by f param.
/// </summary>
/// <param name="f"></param>
/// <returns></returns>
private static Func<double[], double[], double[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
{
//a, b are input parametres for returned Func<double[], double[], double[]> delegate
//c is output (result)
//i is loop variable
//
// //implementation looks like:
// for(int i = a.Length; i > -1; i--)
// c[i] = a[i] op b[i];
// return c;
//
ParameterExpression apA = Expression.Parameter(typeof(double[]), "a");
ParameterExpression bpA = Expression.Parameter(typeof(double[]), "b");
ParameterExpression operationResult = Expression.Parameter(typeof(double[]), "c");
ParameterExpression iA = Expression.Parameter(typeof(int), "i");
LabelTarget labelReturn = Expression.Label(typeof(double[]));
//this block represent block inside loop
Expression innerBlock = Expression.Block(
Expression.SubtractAssign(iA, Expression.Constant(1)),
Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
Expression.Return(labelReturn, operationResult)),
Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
);
//expression for easy implementation of new double[i] constructor
Expression<Func<int, double[]>> newTA = (i) => new double[i];
//main body of Func. Variable initialization and loop execution
Expression addeA = Expression.Block(
new[] { iA, operationResult },
Expression.Assign(iA, Expression.ArrayLength(apA)),
Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
Expression.Loop(innerBlock, labelReturn)
);
//Compilation to get result.
Expression<Func<double[], double[], double[]>> lambdaA = Expression.Lambda<Func<double[], double[], double[]>>(addeA, apA, bpA);
return lambdaA.Compile();
}
此实现返回一个委托,该委托可用于将二元运算符应用于 double 数组。可能的用法是:
Func<double[], double[], double[]> addArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double[], double[], double[]> subArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));
现在,泛型实现还需要最后一步。只需将适当位置的 double
关键字替换为 T
关键字。
/// <summary>
/// Convert BinaryExpression into Func which acts as operator on types T[] and T[]. T[] = T[] op T[], where op is provided by f param.
/// </summary>
/// <param name="f"></param>
/// <returns></returns>
private static Func<T[], T[], T[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
{
//a, b are input parametres for returned Func<T[], T[], T[]> delegate
//c is output (result)
//i is loop variable
//
// //implementation looks like:
// for(int i = a.Length; i > -1; i--)
// c[i] = a[i] op b[i];
// return c;
//
ParameterExpression apA = Expression.Parameter(typeof(T[]), "a");
ParameterExpression bpA = Expression.Parameter(typeof(T[]), "b");
ParameterExpression operationResult = Expression.Parameter(typeof(T[]), "c");
ParameterExpression iA = Expression.Parameter(typeof(int), "i");
LabelTarget labelReturn = Expression.Label(typeof(T[]));
//this block represent block inside loop
Expression innerBlock = Expression.Block(
Expression.SubtractAssign(iA, Expression.Constant(1)),
Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
Expression.Return(labelReturn, operationResult)),
Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
);
//expression for easy implementation of new T[i] constructor
Expression<Func<int, T[]>> newTA = (i) => new T[i];
//main body of Func. Variable initialization and loop execution
Expression addeA = Expression.Block(
new[] { iA, operationResult },
Expression.Assign(iA, Expression.ArrayLength(apA)),
Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
Expression.Loop(innerBlock, labelReturn)
);
//Compilation to get result.
Expression<Func<T[], T[], T[]>> lambdaA = Expression.Lambda<Func<T[], T[], T[]>>(addeA, apA, bpA);
return lambdaA.Compile();
}
代码可用于定义数组运算符:
Func<int[], int[], int[]> addArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<int[], int[], int[]> subArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));
时间测试
为了测试和比较上述实现,实现了一系列测试。
- 简单实现(完全编译时规范,预计速度更快)
- for 循环中的 Func<double, double, double>(使用类型 T 的 + 运算符的委托,预计速度较慢)
- 在编译时创建的 Func<double[], double[], double[]>(使用 T[] 类型的 + 运算符的委托,预计速度很快)
- 在运行时创建的 Func<double[], double[], double[]>(使用表达式树获取 T[] 类型的 + 运算符,希望足够快)
所有测试的代码如下:
int lastTick = 0;
private void Msg(string msg)
{
int thisTick = Environment.TickCount;
int delta = thisTick - lastTick;
textBox1.AppendText(delta.ToString() + "ms\t" + msg + "\r\n");
lastTick = thisTick;
}
private void TimeTests()
{
double[] a = new double[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
double[] b = new double[] { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
double[] r = new double[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
Func<double, double, double> Add = (ap, bp) => (ap + bp);
Func<double[], double[], double[]> AddA = (ap, bp) =>
{
int len = ap.Length;
double[] result = new double[len];
for (int i = 0; i < len; i++)
result[i] = ap[i] + bp[i];
return result;
};
Func<double[], double[], double[]> AddA2 = FuncGenerator<double>.CreateArrayOperatorFuncAdd();
Msg("beginLoops");
int cnt = 1000000; //1M
//Simple implementation
for (int i = 0; i < cnt; i++)
{
r = new double[a.Length];
for (int j = 0; j < a.Length; j++)
r[j] = a[j] + b[j];
}
Msg("Simple implementation");
//Func<double, double, double> in for loop
for (int i = 0; i < cnt; i++)
for (int j = 0; j < a.Length; j++)
r[j] = Add(a[j], b[j]);
Msg("Func<double, double, double> in for loop");
//Func<double[], double[], double[]> created at compile time
for (int i = 0; i < cnt; i++)
r = AddA(a, b);
Msg("Func<double[], double[], double[]> created at compile time");
//Func<double[], double[], double[]> created at runtime
for (int i = 0; i < cnt; i++)
r = AddA2(a, b);
Msg("Func<double[], double[], double[]> created at runtime");
}
此测试的结果:
0ms beginLoops
109ms Simple implementation
282ms Func<double, double, double> in for loop
109ms Func<double[], double[], double[]> without loop
109ms Func<double[], double[], double[]> created runtime
“简单实现”是最快的,并且与“运行时创建”相当。“for 循环中的 Func”最慢。这证明了目标已经实现,并且该实现非常好。最慢和最快的实现之间的差异很有趣,因为这种差异用于对 Func<double, double, double>
委托的重复调用。
请注意,在测试过程中,垃圾回收有时会收集内存,因此测试结果可能会有所不同,但引入的实现仍然与“简单实现”相当。
Vector<T> 类实现
所有部分工作都已完成,现在介绍完整实现的模板。完整实现不是目标,因此只展示了主要部分。
/// <summary>
/// Generic class for arithmetic operator overloading demonstration
/// </summary>
/// <typeparam name="T"></typeparam>
class Vector<T>
{
//static operator delegates
private static Func<T, T, T> Add = FuncGenerator<T>.CreateOperatorFuncAdd();
private static Func<T[], T[], T[]> AddT = FuncGenerator<T>.CreateArrayOperatorFuncAdd();
//http://stackoverflow.com/questions/1717444/combining-two-lamba-expressions-in-c-sharp
//the ideas for implementation was based on article
//http://blogs.msdn.com/b/csharpfaq/archive/2009/11/19/debugging-expression-trees-in-visual-studio-2010.aspx
//
private T[] values;
public Vector(int dim)
{
this.values = new T[dim];
}
public Vector(IEnumerable<T> initialValues)
{
values = initialValues.ToArray();
}
public static Vector<T> operator +(Vector<T> a, Vector<T> b)
{
return AddT(a.values, b.values);
}
/// <summary>
/// Allows easily convert an array to vector of same item type. Can be used with operator overloading.
/// </summary>
/// <param name="a">array to be converted</param>
/// <returns></returns>
public static implicit operator Vector<T>(T[] a)
{
return new Vector<T>(a);
}
/// <summary>
/// Some kind of conversion of Vector into string
/// </summary>
/// <returns></returns>
public override string ToString()
{
string result = "";
result = values.Select(t => t.ToString()).Aggregate((a, b) => a + ";" + b);
return "[" + result + "]";
}
/// <summary>
/// Some kind of parsing string into Vector. Items must be delimited by ';' char. On left side '[' char is expected, on right side ']' char is expected.
/// This is just demonstration class so I am not proud of this implementation as some strange string forms will be successfuly parsed into T[].
/// </summary>
/// <param name="value">String value which will be parsed.</param>
/// <param name="parse">Parse delegate for conversion from string into T value.</param>
/// <returns></returns>
public static T[] Parse(string value, Func<string, T> parse)
{
if (value[0] != '[' | value[value.Length - 1] != ']')
throw new FormatException(string.Format("{0} is not valid format for type {1}", value, typeof(T[]).ToString()));
string tmpStr = value.Substring(1, value.Length - 2).Trim();
string[] items = tmpStr.Split(new char[] { ';' });
var values = items.Select(s => parse(s.Trim()));
return values.ToArray();
}
}
代码使用和局限性
已经介绍了泛型 Vector<T> 的基本形式,现在让我们展示可能的用法。首先,有一系列数值类型可以很好地与此实现配合使用。
/// <summary>
/// Initialization values for numeric types
/// </summary>
string aValue = "[0; 1; 2; 3; 4; 5; 6; 7; 8; 9]";
string bValue = "[1; 1; 1; 1; 1; 1; 1; 1; 1; 1]";
/// <summary>
/// Generic type test
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="parser"></param>
private void TypeTest<T>(Func<string, T> parser)
{
Msg("Test for " + typeof(T).ToString());
Vector<T> a = Vector<T>.Parse(aValue, parser);
Vector<T> b = Vector<T>.Parse(bValue, parser);
Vector<T> result = a + b;
Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));
}
/// <summary>
/// Test of native numeric types
/// </summary>
private void ImplementationTest()
{
TypeTest<int>(s => int.Parse(s));
TypeTest<uint>(s => uint.Parse(s));
TypeTest<short>(s => short.Parse(s));
TypeTest<ushort>(s => ushort.Parse(s));
TypeTest<long>(s => long.Parse(s));
TypeTest<ulong>(s => ulong.Parse(s));
TypeTest<double>(s => double.Parse(s));
TypeTest<float>(s => float.Parse(s));
TypeTest<decimal>(s => decimal.Parse(s));
}
此测试的结果:
0ms Test for System.Int32 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 15ms Test for System.UInt32 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 0ms Test for System.Int16 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 16ms Test for System.UInt16 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 0ms Test for System.Int64 16ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 0ms Test for System.UInt64 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 15ms Test for System.Double 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 16ms Test for System.Single 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1] 16ms Test for System.Decimal 0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
不幸的是,有些类型无法通过这种方式使用,即使有人认为它们应该可以工作。所 presented 的实现不适用于 byte
和 sbyte
类型。
用户定义类型的测试是最后需要对该实现进行的测试。为此,准备了 UserNumericType
(并重载了 + 运算符),并设定了 Vector<Vector<double>>
结构的假设。
/// <summary>
/// Test of types (with overloaded operator) defined by user
/// </summary>
private void UserTypeTest()
{
Vector<UserNumericType> a = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(10, 10) };
Vector<UserNumericType> b = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(20, 20) };
Vector<UserNumericType> result = a + b;
Msg("Test for UserNumericType");
Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));
Vector<Vector<double>> avvd = new Vector<double>[] { new double[] { 0, 1, 2 }, new double[] { 10, 10, 10 } };
Vector<Vector<double>> bvvd = new Vector<double>[] { new double[] { 1, 1, 1 }, new double[] { 21, 21, 21 } };
Vector<Vector<double>> resultvvd = avvd + bvvd;
Msg("Test for Vector<Vector<double>> type");
Msg(string.Format("c = a + b; => {0} = {1} + {2}", resultvvd.ToString(), avvd.ToString(), bvvd.ToString()));
}
此测试的结果:
0ms Test for UserNumericType 16ms c = a + b; => [(0; 2);(30; 30)] = [(0; 1);(10; 10)] + [(0; 1);(20; 20)] 15ms Test for Vector<Vector<double>> type 0ms c = a + b; => [[1;2;3];[31;31;31]] = [[0;1;2];[10;10;10]] + [[1;1;1];[21;21;21]]
未来发展
文章中多次提到,目标不是实现一个完美的 Vector<T> 类,所以没有做到。只是建立了一些为未来发展奠定坚实基础的原则。如果有人愿意,可以在互联网上找到许多某种形式的 vector 的实现,并采用友好的许可证,然后根据本文 presented 的原则对其进行改造。
历史
2014-07-21 经过几天的调查后完成首次实现。
参考文献
- Abul Kayes, https://codeproject.org.cn/Articles/235860/Expression-Tree-Basics
- Sacha Barber, https://codeproject.org.cn/Articles/30604/A-Journey-into-Expressions
- Alexandra Rusina, http://blogs.msdn.com/b/csharpfaq/archive/2009/11/19/debugging-expression-trees-in-visual-studio-2010.aspx
- Pranay Rana, https://codeproject.org.cn/Tips/438804/Expression-Tree