65.9K
CodeProject 正在变化。 阅读更多。
Home

使用表达式树实现泛型 Vector<T> 类

starIconstarIconstarIconstarIconstarIcon

5.00/5 (4投票s)

2014年7月21日

CPOL

5分钟阅读

viewsIcon

19791

downloadIcon

369

本文展示了实现泛型运算符的另一种方法。本文重点关注 Vector。

Content

  1. 引言
  2. 表达式树
  3. 深入实现
  4. 泛型 T[] 二元运算符
  5. 时间测试
  6. Vector<T> 类实现
  7. 代码使用和局限性
  8. 未来发展
  9. 历史
  10. 参考文献

引言

在类似 Vector 的类实现中使用泛型是一个永无止境的故事。本文展示了另一种实现方法。

泛型 Vector 实现的主要问题在于,无法直接实现类似这样的功能:

class Vector<T>
{

    public static Vector<T> operator +(Vector<T> a, Vector<T> b)
    {
        int cnt = a.Lenght;
        Vector<T> result = new Vector(new T[cnt]);
        for(int i = 0; i < cnt; i++)
            result[i] = a[i] + b[i];   
        return result;
    }
}

编译器将在以下行抛出错误:

            result[i] = a[i] + b[i];   

因此,如果您仍然想实现它,就必须解决这个问题,并以某种方式避免上述代码行。

一种可以帮助解决这个问题的方法是使用 lambda 表达式来实现,前提是运算符与 lambda 表达式非常相似。让我们考虑以下代码行:

Func<double, double, double> add = (a, b) => (a + b);

它的行为与我们对 double 类型上的 + 运算符的期望完全一致。但是,如果输入泛型版本,编译器将抛出错误,即使表达式 Func<T, T, T> add; 对编译器来说是可以的。

Func<T, T, T> add = (a, b) => (a + b); //give compiler error

那么,问题是如何用正确的泛型形式填充变量 Func<T, T, T> add;

表达式树

表达式树代表了一组类,这些类可用于构建可以编译为委托的结构。可以在此处找到描述过程、结构、类等的文章。

直接切入本文目标,让我们展示一个非常简单的表达式树用法。

ParameterExpression ap = Expression.Parameter(typeof(double), "a");
ParameterExpression bp = Expression.Parameter(typeof(double), "b");

Expression operationResult = Expression.Add(ap, bp);

Expression<Func<double, double, double>> lambda = Expression.Lambda<Func<double, double, double>>(operationResult, ap, bp);
Func<double, double, double> add = lambda.Compile();

这段代码将创建一个 Func<double, double, double> add 委托,其功能与以下代码相同:

Func<double, double, double> add = (a, b) => (a + b);

深入实现

这样,非泛型版本就实现了。泛型版本呢?如果在实现中替换 double 关键字,它仍然可以工作吗?(为了定义完整的方法,添加了头部和尾部)

Func<T, T, T> CreateAdd()
{
    ParameterExpression ap = Expression.Parameter(typeof(T), "a");
    ParameterExpression bp = Expression.Parameter(typeof(T), "b");

    Expression operationResult = Expression.Add(ap, bp);

    Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);

    Func<T, T, T> add = lambda.Compile();
    return add;
}

这个实现与以下方法几乎相同:

Func<T, T, T> add = (a, b) => (a + b);

区别在于 CreateAdd 方法对编译器来说是 OK 的,并且 lambda.Compile() 表达式将在运行时被调用,如果类型 T 没有定义 + 运算符,则可能抛出错误。第二个实现是在编译时编译的,并且会测试(并失败)+ 运算符的泛型可用性。

CreateAdd 方法可以推广到所有运算符,因此只需要一个实现即可处理任何类型的二元运算符。

public static class FuncGenerator<T>
{
    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types T and T. T = T op T, where op is provided by f param.
    /// </summary>
    /// <param name="f">Func which provides BinaryExpression.</param>
    /// <returns>Func&LT;T, T, T&GT; </returns>
    public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
    {
        ParameterExpression ap = Expression.Parameter(typeof(T), "a");
        ParameterExpression bp = Expression.Parameter(typeof(T), "b");
        Expression operationResult = f(ap, bp);

        Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
        return lambda.Compile();
    }

}

现在,可以使用 FuncGenerator<T>.ExpressionToFunc() 方法创建适当的 Func 委托。但用户仍然必须熟悉表达式树。

Func<double, double, double> add = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double, double, double> sub = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Subtract(a, b));

现在我们有了二元运算符的泛型实现,剩下的就比较简单了。还是不是?

泛型 T[] 二元运算符

下面实现了数组二元运算的泛型运算符实现(参见方法 ExpressionToFuncArray)。此实现使用 for 循环并评估 FuncGenerator<T>.ExpressionToFunc() 返回的 Func 方法。

public static class FuncGenerator<T>
{
    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types T and T. T = T op T, where op is provided by f param.
    /// </summary>
    /// <param name="f">Func which provides BinaryExpression.</param>
    /// <returns>Func&LT;T, T, T&GT; </returns>
    public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
    {
        ParameterExpression ap = Expression.Parameter(typeof(T), "a");
        ParameterExpression bp = Expression.Parameter(typeof(T), "b");
        Expression operationResult = f(ap, bp);

        Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
        return lambda.Compile();
    }

    public static Func<T[], T[], T[]> ExpressionToFuncArray(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
    {
        Func<T, T, T> op = ExpressionToFunc(f);
        return (a, b) =>
        {
            int len = a.Length;
            T[] result = new T[len];
            for (int i = 0; i < len; i++)
                result[i] = op(ap[i], bp[i]);
            return result;
        };
    }
}

现在关注行:

                result[i] = op(ap[i], bp[i]);

op 委托每次数组操作调用 n 次,这非常耗时。因此,我们有了泛型实现,但正如可以测试的那样,这是一个非常慢的实现。回到表达式树,希望它能给我们一些解决方案。

    public static Func<double[], double[], double[]> ArrayExpressionToFunc()
    {
        return (a, b) =>
        {
            int len = a.Length;
            double[] result = new double[len];
            for (int i = 0; i < len; i++)
                result[i] = ap[i] + bp[i];
            return result;
        };
    }

上面展示了 double 类型的非泛型实现。此实现比之前介绍的泛型实现更快。现在,让我们尝试构建一个行为与此非泛型实现完全相同的表达式树。

    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types double[] and double[]. double[] = double[] op double[], where op is provided by f param.
    /// </summary>
    /// <param name="f"></param>
    /// <returns></returns>
    private static Func<double[], double[], double[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
    {
        //a, b are input parametres for returned Func<double[], double[], double[]> delegate
        //c is output (result)
        //i is loop variable
        //
        // //implementation looks like:
        // for(int i = a.Length; i > -1; i--)
        //     c[i] = a[i] op b[i];
        // return c;
        //
        ParameterExpression apA = Expression.Parameter(typeof(double[]), "a");
        ParameterExpression bpA = Expression.Parameter(typeof(double[]), "b");
        ParameterExpression operationResult = Expression.Parameter(typeof(double[]), "c");
        ParameterExpression iA = Expression.Parameter(typeof(int), "i");

        LabelTarget labelReturn = Expression.Label(typeof(double[]));

        //this block represent block inside loop
        Expression innerBlock = Expression.Block(
            Expression.SubtractAssign(iA, Expression.Constant(1)),
            Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
            Expression.Return(labelReturn, operationResult)),
            Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
            );

        //expression for easy implementation of new double[i] constructor
        Expression<Func<int, double[]>> newTA = (i) => new double[i];

        //main body of Func. Variable initialization and loop execution
        Expression addeA = Expression.Block(
            new[] { iA, operationResult },
            Expression.Assign(iA, Expression.ArrayLength(apA)),
            Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
            Expression.Loop(innerBlock, labelReturn)
            );

        //Compilation to get result.
        Expression<Func<double[], double[], double[]>> lambdaA = Expression.Lambda<Func<double[], double[], double[]>>(addeA, apA, bpA);
        return lambdaA.Compile();
    }

此实现返回一个委托,该委托可用于将二元运算符应用于 double 数组。可能的用法是:

Func<double[], double[], double[]> addArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double[], double[], double[]> subArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));

现在,泛型实现还需要最后一步。只需将适当位置的 double 关键字替换为 T 关键字。

    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types T[] and T[]. T[] = T[] op T[], where op is provided by f param.
    /// </summary>
    /// <param name="f"></param>
    /// <returns></returns>
    private static Func<T[], T[], T[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
    {
        //a, b are input parametres for returned Func<T[], T[], T[]> delegate
        //c is output (result)
        //i is loop variable
        //
        // //implementation looks like:
        // for(int i = a.Length; i > -1; i--)
        //     c[i] = a[i] op b[i];
        // return c;
        //
        ParameterExpression apA = Expression.Parameter(typeof(T[]), "a");
        ParameterExpression bpA = Expression.Parameter(typeof(T[]), "b");
        ParameterExpression operationResult = Expression.Parameter(typeof(T[]), "c");
        ParameterExpression iA = Expression.Parameter(typeof(int), "i");

        LabelTarget labelReturn = Expression.Label(typeof(T[]));

        //this block represent block inside loop
        Expression innerBlock = Expression.Block(
            Expression.SubtractAssign(iA, Expression.Constant(1)),
            Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
            Expression.Return(labelReturn, operationResult)),
            Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
            );

        //expression for easy implementation of new T[i] constructor
        Expression<Func<int, T[]>> newTA = (i) => new T[i];

        //main body of Func. Variable initialization and loop execution
        Expression addeA = Expression.Block(
            new[] { iA, operationResult },
            Expression.Assign(iA, Expression.ArrayLength(apA)),
            Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
            Expression.Loop(innerBlock, labelReturn)
            );

        //Compilation to get result.
        Expression<Func<T[], T[], T[]>> lambdaA = Expression.Lambda<Func<T[], T[], T[]>>(addeA, apA, bpA);
        return lambdaA.Compile();
    }

代码可用于定义数组运算符:

Func<int[], int[], int[]> addArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<int[], int[], int[]> subArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));

时间测试

为了测试和比较上述实现,实现了一系列测试。

  • 简单实现(完全编译时规范,预计速度更快)
  • for 循环中的 Func<double, double, double>(使用类型 T 的 + 运算符的委托,预计速度较慢)
  • 在编译时创建的 Func<double[], double[], double[]>(使用 T[] 类型的 + 运算符的委托,预计速度很快)
  • 在运行时创建的 Func<double[], double[], double[]>(使用表达式树获取 T[] 类型的 + 运算符,希望足够快)

所有测试的代码如下:

int lastTick = 0;
private void Msg(string msg)
{
    int thisTick = Environment.TickCount;
    int delta = thisTick - lastTick;
    textBox1.AppendText(delta.ToString() + "ms\t" + msg + "\r\n");
    lastTick = thisTick;
}

private void TimeTests()
{
    double[] a = new double[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    double[] b = new double[] { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
    double[] r = new double[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };

    Func<double, double, double> Add = (ap, bp) => (ap + bp);
    Func<double[], double[], double[]> AddA = (ap, bp) =>
        {
            int len = ap.Length;
            double[] result = new double[len];
            for (int i = 0; i < len; i++)
                result[i] = ap[i] + bp[i];
            return result;
        };
    Func<double[], double[], double[]> AddA2 = FuncGenerator<double>.CreateArrayOperatorFuncAdd();

    Msg("beginLoops");
    int cnt = 1000000; //1M

    //Simple implementation
    for (int i = 0; i < cnt; i++)
    {
        r = new double[a.Length];
        for (int j = 0; j < a.Length; j++)
            r[j] = a[j] + b[j];
    }
    Msg("Simple implementation");

    //Func<double, double, double> in for loop
    for (int i = 0; i < cnt; i++)
        for (int j = 0; j < a.Length; j++)
            r[j] = Add(a[j], b[j]);
    Msg("Func<double, double, double> in for loop");

    //Func<double[], double[], double[]> created at compile time
    for (int i = 0; i < cnt; i++)
        r = AddA(a, b);
    Msg("Func<double[], double[], double[]> created at compile time");

    //Func<double[], double[], double[]> created at runtime
    for (int i = 0; i < cnt; i++)
        r = AddA2(a, b);
    Msg("Func<double[], double[], double[]> created at runtime");
}

此测试的结果:

0ms      beginLoops
109ms    Simple implementation
282ms    Func<double, double, double> in for loop
109ms    Func<double[], double[], double[]> without loop
109ms    Func<double[], double[], double[]> created runtime

“简单实现”是最快的,并且与“运行时创建”相当。“for 循环中的 Func”最慢。这证明了目标已经实现,并且该实现非常好。最慢和最快的实现之间的差异很有趣,因为这种差异用于对 Func<double, double, double> 委托的重复调用。

请注意,在测试过程中,垃圾回收有时会收集内存,因此测试结果可能会有所不同,但引入的实现仍然与“简单实现”相当。

Vector<T> 类实现

所有部分工作都已完成,现在介绍完整实现的模板。完整实现不是目标,因此只展示了主要部分。

/// <summary>
/// Generic class for arithmetic operator overloading demonstration
/// </summary>
/// <typeparam name="T"></typeparam>
class Vector<T>
{
    //static operator delegates
    private static Func<T, T, T> Add = FuncGenerator<T>.CreateOperatorFuncAdd();
    private static Func<T[], T[], T[]> AddT = FuncGenerator<T>.CreateArrayOperatorFuncAdd();

    //http://stackoverflow.com/questions/1717444/combining-two-lamba-expressions-in-c-sharp

    //the ideas for implementation was based on article
    //http://blogs.msdn.com/b/csharpfaq/archive/2009/11/19/debugging-expression-trees-in-visual-studio-2010.aspx
    //

    private T[] values;
    public Vector(int dim)
    {
        this.values = new T[dim];
    }

    public Vector(IEnumerable<T> initialValues)
    {
        values = initialValues.ToArray();
    }

    public static Vector<T> operator +(Vector<T> a, Vector<T> b)
    {
        return AddT(a.values, b.values);
    }

    /// <summary>
    /// Allows easily convert an array to vector of same item type. Can be used with operator overloading.
    /// </summary>
    /// <param name="a">array to be converted</param>
    /// <returns></returns>
    public static implicit operator Vector<T>(T[] a)
    {
        return new Vector<T>(a);
    }

    /// <summary>
    /// Some kind of conversion of Vector into string
    /// </summary>
    /// <returns></returns>
    public override string ToString()
    {
        string result = "";

        result = values.Select(t => t.ToString()).Aggregate((a, b) => a + ";" + b);
        return "[" + result + "]";

    }

    /// <summary>
    /// Some kind of parsing string into Vector. Items must be delimited by ';' char. On left side '[' char is expected, on right side ']' char is expected.
    /// This is just demonstration class so I am not proud of this implementation as some strange string forms will be successfuly parsed into T[].
    /// </summary>
    /// <param name="value">String value which will be parsed.</param>
    /// <param name="parse">Parse delegate for conversion from string into T value.</param>
    /// <returns></returns>
    public static T[] Parse(string value, Func<string, T> parse)
    {
        if (value[0] != '[' | value[value.Length - 1] != ']')
            throw new FormatException(string.Format("{0}  is not valid format for type {1}", value, typeof(T[]).ToString()));

        string tmpStr = value.Substring(1, value.Length - 2).Trim();

        string[] items = tmpStr.Split(new char[] { ';' });
        var values = items.Select(s => parse(s.Trim()));
        return values.ToArray();
    }
}

代码使用和局限性

已经介绍了泛型 Vector<T> 的基本形式,现在让我们展示可能的用法。首先,有一系列数值类型可以很好地与此实现配合使用。

/// <summary>
/// Initialization values for numeric types
/// </summary>
string aValue = "[0; 1; 2; 3; 4; 5; 6; 7; 8; 9]";
string bValue = "[1; 1; 1; 1; 1; 1; 1; 1; 1; 1]";

/// <summary>
/// Generic type test
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="parser"></param>
private void TypeTest<T>(Func<string, T> parser)
{
    Msg("Test for " + typeof(T).ToString());
    Vector<T> a = Vector<T>.Parse(aValue, parser);
    Vector<T> b = Vector<T>.Parse(bValue, parser);
    Vector<T> result = a + b;

    Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));
}

/// <summary>
/// Test of native numeric types
/// </summary>
private void ImplementationTest()
{

    TypeTest<int>(s => int.Parse(s));
    TypeTest<uint>(s => uint.Parse(s));
    TypeTest<short>(s => short.Parse(s));
    TypeTest<ushort>(s => ushort.Parse(s));
    TypeTest<long>(s => long.Parse(s));
    TypeTest<ulong>(s => ulong.Parse(s));

    TypeTest<double>(s => double.Parse(s));
    TypeTest<float>(s => float.Parse(s));
    TypeTest<decimal>(s => decimal.Parse(s));

}

此测试的结果:

0ms    Test for System.Int32
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
15ms   Test for System.UInt32
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms    Test for System.Int16
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms   Test for System.UInt16
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms    Test for System.Int64
16ms   c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms    Test for System.UInt64
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
15ms   Test for System.Double
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms   Test for System.Single
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms   Test for System.Decimal
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]

不幸的是,有些类型无法通过这种方式使用,即使有人认为它们应该可以工作。所 presented 的实现不适用于 bytesbyte 类型。

用户定义类型的测试是最后需要对该实现进行的测试。为此,准备了 UserNumericType(并重载了 + 运算符),并设定了 Vector<Vector<double>> 结构的假设。

/// <summary>
/// Test of types (with overloaded operator) defined by user
/// </summary>
private void UserTypeTest()
{
    Vector<UserNumericType> a = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(10, 10) };
    Vector<UserNumericType> b = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(20, 20) };

    Vector<UserNumericType> result = a + b;
    Msg("Test for UserNumericType");
    Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));

    Vector<Vector<double>> avvd = new Vector<double>[] { new double[] { 0, 1, 2 }, new double[] { 10, 10, 10 } };
    Vector<Vector<double>> bvvd = new Vector<double>[] { new double[] { 1, 1, 1 }, new double[] { 21, 21, 21 } };
    Vector<Vector<double>> resultvvd = avvd + bvvd;

    Msg("Test for Vector<Vector<double>> type");
    Msg(string.Format("c = a + b; => {0} = {1} + {2}", resultvvd.ToString(), avvd.ToString(), bvvd.ToString()));

}

此测试的结果:

0ms    Test for UserNumericType
16ms   c = a + b; => [(0; 2);(30; 30)] = [(0; 1);(10; 10)] + [(0; 1);(20; 20)]
15ms   Test for Vector<Vector<double>> type
0ms    c = a + b; => [[1;2;3];[31;31;31]] = [[0;1;2];[10;10;10]] + [[1;1;1];[21;21;21]]

未来发展

文章中多次提到,目标不是实现一个完美的 Vector<T> 类,所以没有做到。只是建立了一些为未来发展奠定坚实基础的原则。如果有人愿意,可以在互联网上找到许多某种形式的 vector 的实现,并采用友好的许可证,然后根据本文 presented 的原则对其进行改造。

历史

2014-07-21 经过几天的调查后完成首次实现。

参考文献

© . All rights reserved.