气泡树平衡二叉树（面向所有人）

jurhas

3.30/5 (8投票s)

2016年5月17日

CPOL

5分钟阅读

13775

快速二叉树算法

引言

在计算机科学中，二叉树是最重要的用于插入、选择和更新的数据结构之一。对于单次插入和查找，哈希表可能（肯定）更快，那么为什么BST更重要呢？

因为平衡二叉树允许你使用比较器执行查询，我们可以选择大于某个值的值，小于某个值的值，或介于其他一些值之间的值。所以它是一个强大的数据处理工具，可以处理大量数据。

但是为了高效地完成它的工作，二叉树必须是平衡的，否则我们只是模拟了数组中的准线性搜索。我不想解释什么是平衡二叉树。你可以在这里找到更好的解释这里。维基百科总是学习的最佳起点。

背景

让我们做一些考虑

一个完美平衡的树，其前n-1层叶子节点都已填满，只是一个学术练习。实际上，如果“未针对读取进行优化”，则很难保持二叉树完美平衡。一种极端算法是将整个树放入一个现在已排序的数组中，然后从中间开始插入。他们用“它针对读取进行了优化”来证明这种算法。我会告诉他们，把它留在数组中，执行一个完美的普通二分查找，如果你插入新的东西，就移动数组；如果你插入很多东西，就调用qsort()，我认为在这种情况下这是更好的解决方案。

二叉树必须快速插入，必须快速删除，必须快速选择。如果你想到一个用二叉树做的练习，看看我的文章这里。用二叉树代替哈希表。这是平衡二叉树的目的。在这种情况下，它必须快速插入、删除和查找。不允许出现任何弱点。你和内存之间存在一个数据结构。如果你不记得有datastructure，这意味着这是一个好算法。

让我们考虑其他一些事情

搜索的成本是多少？

time=(comparison * depth)

所以，如果你用C编程，你知道对于string的比较，你必须调用一个名为strcmp()的函数，如果你不用C编程，你应该知道你的编译器会在你的位置调用类似的函数。因此，树的比较成本与深度级别完全相等。

该算法也可能达到准完美平衡，但它们必须对每个叶子节点执行两次（或更多次）比较，它们的行为就像深度是两倍（或更多倍）一样。

所以目标是：一个完美的二叉树，只有左叶或右叶，可以通过一次比较在其括号内运行，但几乎完美平衡。

这是一个优化问题。我们必须计算系统的每个变量。

诀窍是执行旋转：但是何时、为什么以及在哪里旋转是一个谜。

经过一些实验，我找到了它。

Using the Code

让我们构建我们的树

struct bt_node
{
 	unsigned index;
	void *value;
	long left;
	long right;
	unsigned depth;
}
struct bt_tree
{
 	unsigned n;
	unsigned buffer;
	long current_root;
	struct bt_node* nodes;
	i_stack freeslots;
}
//

这些结构很容易理解。我需要解释depth。经过多次尝试，我发现平衡树的最佳方法是根据深度差异。深度是反转的。这意味着叶子节点的depth=1，根节点包含整个深度。这是因为如果我执行旋转，更容易将新深度仅传播到我的父节点(s)，而不是我的200,000个叶子节点。如果这看起来像一个好理由，事实上，与真正的原因相比，它什么都不是。如果我从根节点开始，所有兄弟节点都有一些深度，而我失去了触发旋转的参数。这意味着我们不知道树何时需要平衡。实际上，我们在高速公路上是盲目的。为此，我们为叶子节点设置depth=1，并将深度向上传播到祖先节点。

添加函数在这里

// the function add() is the same for each case, and data type, it works just 
//with slots and returns the index of the new value if inserted
// otherwise the index of the existing value. The handling of the "error" or whatelse
// get done from the add_if_not_exists or other interface function (in my version I add also a pointer to function, one
//for compare, for simplicity I don't put it here)
long add(struct bt_tree*bt, unsigned root, unsigned newitemix);
// the function below save first the data in a new slot. this get done
//because I would generalize the function of the tree. So, when found, it 
//finds all the information in the proper slot. Without to modify the 
//parameter list of the core functions. So with different data type I just have to modify my 
//interface function. 

unsigned add_if_not_exists(struct bt_tree* bt, char *name, void*value)
{
	//the function get_new_slot() pop a value from freeslots, if it has
	// otherwise increase the counter of the tree, check if the buffer
	//can satisfy the new request, otherwise realloc() the vector
	 res=get_new_slot(bt->freeslots);
	unsigned tmp=add(bt, cur,res);	
	bt->node[res].name=name; //I just paste the address of the name 
				    //i don't duplicate yet 
	bt->node[res].value=value;
	if (tmp==cur)	//successfully inserted
	{
		bt->node[res].name=strdup(name); 
		bt->node[res].value=value;	

	}
	else 
	{  //already existing
		stack_push(bt->freeslots,res);
		bt->node[res].name=NULL; 
		bt->node[res].value=NULL;
	}
	return tmp;
}
//this is a core function, it returns "newitemix" 
//if successfully inserted otherwise the index of the existing item
//with the same key. What do with it is a matter of interface function (if raise an exception or
//sum, or just add_if_not_exists) 
long add(struct bt_tree*bt,  unsigned newitemix)
{
	int cmp;
	long root=bt->cuurent_root;
	while (root>0)
	{
		if ((cmp=strcmp(bt->nodes[root].name,bt->nodes[newitemix].name)==0)
		{
			return root;
		}else if(cmp<0)
		{
			if(bt->nodes[root].right<0)
			{
				bt->nodes[root].right=newitemix;
				bt->nodes[newitemix].parent=root;
				break;
			}
			else 
				root=bt->nodes[root].right;
		}
		else
		{
			if(bt->nodes[root].left<0)
			{
				bt->nodes[root].left=newitemix;
				bt->nodes[newitemix].parent=root;
				break;			
			}
			else 
				root=bt->nodes[root].left;
		}

	}

	propagate_depth(bt,newitemix);
	bubble(bt, newitemix);
	return newitemix; 
}

//

插入函数是一个完美的二分查找。我认为不可能实现得更快。主循环之后有一些东西。首先是propagate_depth()将深度传播到祖先节点。新插入的叶子节点可能会增加树的深度。然后是bubble()。这个函数赋予了算法的名称。它们在这里

void propagate_depth(bt,newitemix)
{
   assert(newitemix>=0);
   unsigned i, depth;
   for(i=newitemix;i>=0;i=bt->nodes[i].parent)
   {
          depth=bt->nodes[i].right<0?0:bt->nodes[bt->nodes[i].right].depth;
          depth=bt->nodes[i].left<0?depth:max(depth,bt->nodes[bt->nodes[i].left].depth);
          depth++;
          bt->nodes[i].depth=depth;
   } 
}

void bubble(struct bt_tree*bt, unsigned index)
{
	unsigned parent=bt->nodes[index].parent;
	struct bt_node*right,*left;
	while (parent>=0)
	{	right=bt->nodes[index].right<0?NULL:&bt->nodes[bt->nodes[index].right];
		left=bt->nodes[index].left<0?NULL:&bt->nodes[bt->nodes[index].left];
		if ((!left && right && right->depth>1)||
		   (!right &&  left && left->depth>1)||
		(left && right && abs(right->depth-left->depth)>1))
			rotate(bt,index);
		index=parent;
		parent=bt->nodes[index].parent;
	}
}

//

这个函数做什么？第一个函数取其叶子节点中的最大值，将其增加并向上传播深度。第二个函数向上遍历树，如果发现深度之间存在差异，则调用旋转。旋转例程如下，我只写了左侧的。右侧完全镜像。

unsigned rotate(struct bt_tree*bt,unsigned ix)
{  //we save first all the neighborhood in some comfortable variables
	asser (ix>=0 && ix<bt->n);
   struct bt_node *me,*parent,*left,*right;
  char side; //and defines comfortable which side my parent keeps me
  me=&bt->nodes[ix];
  parent=me->parent<0?NULL:&bt->nodes[me->parent];
  left=me->left<0?NULL:&bt->nodes[me->left];
  right=me->right<0?NULL:&bt->nodes[me->right];
   assert(left || right);
  if (parent)
   side=parent->left==me->index?'l':'r';
  else 
   side =0;
   if (!right || (left && left->depth>right->depth)
   {
     me->parent=left->index;
     me->left=left->right;
     left->right=me->index;
     if (me->left>=0)
        bt->nodes[me->left].parent=me->index;
     if (side='l')
        parent->left=left->index;
     else if(side='r')
        parent->right=left->index;
     else
        bt->current_root=left->index;
     propagate_depth(bt, me->index);
    }
    else
   {
      //mirror routine 

   }
}

//

关注点

关注点有很多：它易于实现且速度很快。速度来自它的轻量级。查找例程给出了其速度的概念

void* find(struct bt_tree*bt,char *key)
{
	long current_root=bt->current_root;
	int cmp;	
	while (current_root>=0)
	{	if ( (cmp=strcmp(bt->nodes[current_root].name,key)==0)
			return bt->nodes[current_root].value;
		else if (cmp<0)
			current_root=bt->nodes[current_root].right;
		else
			current_root=bt->nodes[current_root].left;
	
	}	
	printf ("item %s not present in the tree\n",key);
	return NULL; 
}

//

但是当然，如果不能正确平衡，它可能快如闪电，但是如果它有100万个叶子节点需要遍历，它就帮不上忙了。所以我做了一些测试：从我的程序中一个包含系统六个自由度所有变量的表中，这意味着：a，ta（a对时间的导数：速度），tta，Rx（力矩a）b，tb，ttb，c... x...fx（x方向力）y...z... 24个标签。在“t”区域足够随机且足够不平衡，以至于可以预料到一些世界末日的情景。我使用双重循环并在树中插入了240万个项目。结果：树达到了25的深度（尽管理论上是22），99%的叶子节点位于第23层深度内，其余24000个叶子节点位于其余2层深度内。我将其用作已分配内存管理，其中新的插入几乎已排序，并不断重复这种分布。这里有一些统计工具

void  fill_ratio(struct bt_tree*bt)
{
	unsigned depth=0:	
	long i,n,buf, fix;
	buf=max(bt->n/10,100);
	n=0;
	 long long *reg= (long long*)calloc(buf,sizeof(long long));
	reg[n++]=bt->current_root; 	
	while (n>0)
	{
		depth ++;
		printf ("depth %d count %d fillratio %g\n",depth, n, (double)n/
      for (i=0;i<fix;i++)
     {
        if (bt<<depth-1)); fix="n;" for="">nodes[reg[i]].left<0 && bt->nodes[reg[i]].right<0)
			{
               if (fix<n)
               {
                 reg[i--]=reg[--fix];
                 reg[fix]=reg[--n];
               }
               else
               {
                 reg[i--]=reg[--fix];
                 n--;
                }
				n--;
		else if (i<n) reg="">nodes[reg[i]].left<0 && bt->nodes[reg[i]].right>0)
				reg [i]=bt->nodes[reg[i]].right;
			else if (bt->nodes[reg[i]].left>=0 && bt->nodes[reg[i]].right<0)
				reg [i]=bt->nodes[reg[i]].left;
			else
			{
				n++;
				if (n>=buf)
				{
					buf+=buf;
					reg=(long long*)realloc(reg, buf*sizeof(long long));
				}
				reg [i]=bt->nodes[reg[i]].right;
				reg [n]=bt->nodes[reg[i]].left;
			}				
		}	
	}
}

//