Lower bound for heapsort?

Problem description

It's well-known that the worst-case runtime for heapsort is Ω(n lg n), but I'm having trouble seeing why this is. In particular, the first step of heapsort (making a max-heap) takes time Θ(n). This is then followed by n heap deletions. I understand why each heap deletion takes time O(lg n); rebalancing the heap involves a bubble-down operation that takes time O(h) in the height of the heap, and h = O(lg n). However, what I don't see is why this second step should take Ω(n lg n). It seems like any individual heap dequeue wouldn't necessarily cause the node moved to the top to bubble all the way down the tree.
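
To see the two phases concretely, here is a minimal Python sketch (not from the original post) of the textbook array-based heapsort, instrumented to count the swaps made by bubble-down during the deletion phase. The names `sift_down` and `heapsort`, and the choice to return the swap count, are illustrative assumptions. The demo prints the swap count relative to n lg n for a few input orders, which is one way to observe that no ordering makes the deletion phase cheap.

```python
import math
import random


def sift_down(a, i, size):
    """Bubble a[i] down within a[:size]; return the number of swaps performed."""
    swaps = 0
    while True:
        child = 2 * i + 1  # left child in the 0-indexed array layout
        if child >= size:
            return swaps
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1  # pick the larger child
        if a[i] >= a[child]:
            return swaps  # heap property restored
        a[i], a[child] = a[child], a[i]
        i = child
        swaps += 1


def heapsort(a):
    """Sort a in place; return the swaps made by bubble-down in the deletion phase."""
    n = len(a)
    # Phase 1: bottom-up heap construction, Theta(n) time.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Phase 2: deletions; move the max to the end, then repair the root.
    swaps = 0
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        swaps += sift_down(a, 0, end)
    return swaps


if __name__ == "__main__":
    n = 2**14 - 1
    inputs = {
        "ascending": list(range(n)),
        "descending": list(range(n, 0, -1)),
        "random": random.sample(range(n), n),
    }
    for name, data in inputs.items():
        swaps = heapsort(data)
        print(f"{name:>10}: {swaps} swaps, {swaps / (n * math.log2(n)):.3f} * n lg n")
```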

My question is - does anyone know of a good lower-bound proof for the best-case behavior of heapsort?

Recommended answer

So I did a bit of digging myself and it looks like this result actually is fairly recent! The first lower-bound proof I can find is from 1992, though heapsort itself was invented in 1964.

The formal lower-bound proof is due to Schaffer and Sedgewick's "The Analysis of Heapsort" paper. Here's a slightly paraphrased version of the proof that omits some of the technical details.

To begin, let's suppose that $n = 2^k - 1$ for some $k$, which guarantees that we have a complete binary heap. I'll show how to handle the case where $n$ is not of this form separately later on. Because we have $2^k - 1$ elements, the first pass of heapsort will, in $\Theta(n)$ time, build a heap of height $k$. Now, consider the first half of the dequeues from this heap, which removes $2^{k-1}$ nodes from the heap. The first key observation is that if you take the starting heap and mark all of the nodes that actually end up getting dequeued, they form a subtree of the heap (i.e. every dequeued node other than the root has a parent that also gets dequeued). You can see this because if it weren't the case, there would be some node that was dequeued while its (larger) parent was not, meaning that the values would be out of order.
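
The subtree observation is easy to check empirically. The sketch below is a Python illustration under stated assumptions: `build_max_heap` and `dequeued_positions_form_subtree` are hypothetical helper names, keys are assumed distinct, and the heap uses the usual 0-indexed array layout where the parent of position i is (i - 1) // 2. It marks the positions holding the largest (n + 1) / 2 = 2^(k-1) values and verifies that every marked position other than the root has a marked parent.

```python
import random


def build_max_heap(a):
    """Floyd's bottom-up heap construction on a 0-indexed list, in place."""
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and a[child + 1] > a[child]:
                child += 1  # larger child
            if a[i] >= a[child]:
                break
            a[i], a[child] = a[child], a[i]
            i = child
    return a


def dequeued_positions_form_subtree(heap):
    """True if the positions of the largest (n + 1) // 2 values are closed under 'parent'."""
    half = (len(heap) + 1) // 2  # 2^(k-1) when n = 2^k - 1
    large = set(sorted(heap, reverse=True)[:half])
    marked = {i for i, v in enumerate(heap) if v in large}
    return all(i == 0 or (i - 1) // 2 in marked for i in marked)


for k in range(2, 11):
    n = 2**k - 1
    heap = build_max_heap(random.sample(range(10 * n), n))
    assert dequeued_positions_form_subtree(heap)
```

An array that is not a heap, such as `[1, 2, 3]`, fails the check, since its two largest values sit below a smaller root.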

Now, consider how the nodes of this subtree are distributed across the heap. If you label the levels of the heap $0, 1, 2, \ldots, k - 1$, then some number of these nodes will be in levels $0, 1, 2, \ldots, k - 2$ (that is, everything except the bottom level of the tree). In order for these nodes to be dequeued from the heap, they have to be swapped up to the root, and they only get swapped up one level at a time, with each swap lifting exactly one node. (During the first half of the dequeues, the element moved to the root always comes from the bottom level, so a node above the bottom level can reach the root only by climbing.) This means that one way to lower-bound the runtime of heapsort is to count the number of swaps necessary to bring all of these values up to the root. In fact, that's exactly what we're going to do.
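
The counting argument can be sanity-checked in code. This Python sketch uses hypothetical helper names (`sift_down`, `depth`, `first_half_swaps_vs_depths`) and assumes distinct keys. It runs only the first 2^(k-1) dequeues on a heap of n = 2^k - 1 elements and compares the number of bubble-down swaps against the sum of the depths of the large nodes that start above the bottom level; the argument above says the swap count can never be the smaller of the two.

```python
import random


def sift_down(a, i, size):
    """Bubble a[i] down within a[:size]; return the number of swaps performed."""
    swaps = 0
    while True:
        child = 2 * i + 1
        if child >= size:
            return swaps
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1
        if a[i] >= a[child]:
            return swaps
        a[i], a[child] = a[child], a[i]
        i = child
        swaps += 1


def depth(i):
    """Level of 0-indexed array position i in a complete binary tree (root is 0)."""
    return (i + 1).bit_length() - 1


def first_half_swaps_vs_depths(k, values):
    """Return (sum of depths of large non-bottom nodes, swaps in the first 2^(k-1) dequeues)."""
    n = 2**k - 1
    a = list(values)
    for i in range(n // 2 - 1, -1, -1):  # build the max-heap
        sift_down(a, i, n)
    half = 2 ** (k - 1)
    large = set(sorted(a, reverse=True)[:half])  # the values dequeued first
    bottom_start = 2 ** (k - 1) - 1  # first array index of level k - 1
    depth_sum = sum(depth(i) for i in range(bottom_start) if a[i] in large)
    swaps = 0
    for end in range(n - 1, n - 1 - half, -1):  # first half of the dequeues
        a[0], a[end] = a[end], a[0]
        swaps += sift_down(a, 0, end)
    return depth_sum, swaps


for k in range(3, 12):
    n = 2**k - 1
    depth_sum, swaps = first_half_swaps_vs_depths(k, random.sample(range(n), n))
    assert depth_sum <= swaps
```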

The first question we need to answer is: how many of the largest $2^{k-1}$ nodes can be in the bottom level of the heap? We can show by contradiction that there are no more than $2^{k-2}$. Suppose that there are at least $2^{k-2} + 1$ of the largest nodes in the bottom level of the heap. Then each of the parents of those nodes must also be a large node, in level $k - 2$. Since each parent has only two children, there must be at least $2^{k-3} + 1$ large nodes in level $k - 2$, which in turn means that there must be at least $2^{k-4} + 1$ large nodes in level $k - 3$, and so on up to the root. Summing over all of these levels, we get at least $2^{k-2} + 2^{k-3} + \cdots + 2^0 + k = 2^{k-1} + k - 1$ large nodes. But for $k \ge 2$ this value is strictly greater than $2^{k-1}$, contradicting the fact that we're working with only $2^{k-1}$ large nodes here.
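
For small heaps the bound can be checked exhaustively. The sketch below is Python with hypothetical helper names (`build_max_heap`, `large_nodes_in_bottom`) and assumes distinct keys. It tries every ordering of 7 keys (k = 3) plus random inputs for a larger k, and asserts that at most 2^(k-2) of the largest 2^(k-1) values land in the bottom level. Note that it only exercises heaps produced by bottom-up construction, whereas the argument applies to any valid max-heap.

```python
import itertools
import random


def build_max_heap(a):
    """Floyd's bottom-up heap construction on a 0-indexed list, in place."""
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and a[child + 1] > a[child]:
                child += 1
            if a[i] >= a[child]:
                break
            a[i], a[child] = a[child], a[i]
            i = child
    return a


def large_nodes_in_bottom(heap, k):
    """Count how many of the largest 2^(k-1) values sit in level k - 1 of the heap."""
    large = set(sorted(heap, reverse=True)[: 2 ** (k - 1)])
    bottom_start = 2 ** (k - 1) - 1  # first array index of the bottom level
    return sum(1 for v in heap[bottom_start:] if v in large)


# Exhaustive check for k = 3 (all 7! orderings of 7 distinct keys).
worst = max(large_nodes_in_bottom(build_max_heap(list(p)), 3)
            for p in itertools.permutations(range(7)))
assert worst <= 2 ** (3 - 2)

# Randomized check for a larger heap.
k = 10
n = 2**k - 1
for _ in range(200):
    heap = build_max_heap(random.sample(range(n), n))
    assert large_nodes_in_bottom(heap, k) <= 2 ** (k - 2)
```

The bound is tight for k = 3: the heap `[7, 6, 3, 5, 4, 2, 1]` has exactly two of its four largest values (5 and 4) in the bottom level.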

Okay... we now know that there are at most $2^{k-2}$ large nodes in the bottom layer. This means that there must be at least $2^{k-1} - 2^{k-2} = 2^{k-2}$ of the large nodes in the first $k - 1$ layers (levels $0$ through $k - 2$). We now ask: what is the sum, over all of these nodes, of the distance from that node to the root? Well, the first $k - 3$ levels (levels $0$ through $k - 4$) contain only $2^{k-3} - 1$ positions, so at most $2^{k-3} - 1$ of these nodes can be there, and so at least $2^{k-2} - (2^{k-3} - 1) = 2^{k-3} + 1$ of them lie in level $k - 3$ or $k - 2$, each at distance at least $k - 3$ from the root. Consequently, the total number of swaps that need to be performed is at least $(k - 3) \cdot 2^{k-3}$. Since $n = 2^k - 1$, we have $k = \Theta(\lg n)$ and $2^{k-3} = \Theta(n)$, so this value is $\Theta(n \lg n)$, which gives the required $\Omega(n \lg n)$ lower bound.
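
To make the last step explicit: with $n = 2^k - 1$ we have $k = \lg(n+1)$ and $2^{k-3} = (n+1)/8$, so a swap bound of the form $(k - c) \cdot 2^{k-3}$, for any fixed constant $c$, works out as follows.

```latex
\[
(k - c)\,2^{k-3} \;=\; \frac{(n+1)\,\bigl(\lg(n+1) - c\bigr)}{8} \;=\; \Theta(n \lg n).
\]
```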
