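1 Overview
HeapSelect is an algorithm for selecting the K-th largest element of an array. It is a variant of the selection problem, which asks for a specific element of an unordered or partially ordered collection.
Algorithm outline: the array is converted into a max-heap, and the root is then repeatedly removed and replaced by the next-largest element until the K-th largest element is found.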
2 Heapselect
We saw a randomized algorithm that uses n + O(log n) comparisons in expectation. Can we get the same performance out of a deterministic algorithm?
Think about basketball tournaments, involving n teams. We form a complete binary tree with n leaves; each internal node represents an elimination game. So at the bottom level, there are n/2 games, and the n/2 winners go on to a game at the next level of the tree. Assuming the better team always wins its game, the best team always wins all its games, and can be found as the winner of the last game.
This could all easily be expressed in pseudocode. So far, it's just a complicated algorithm for finding a minimum or maximum, but it has some practical advantages: it's parallel (many games can be played at once) and fair (in contrast, if we used the sequential min algorithm, the teams placed earlier in L would have to play many more games and be at a big disadvantage).
Now, where in the tree could the second best team be? This team would always beat everyone except the eventual winner. But it must have lost once (since only the overall winner never loses). So it must have lost to the eventual winner. Therefore it's one of the log n teams that played the eventual winner and we can run another tournament algorithm among these values.
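Expressed as an algorithm for finding the second best, this uses at most n + ceil(log n) comparisons (n - 1 games in the main tournament, plus at most ceil(log n) - 1 more among the teams that lost to the winner), which is even better than the average-case algorithm above.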
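The tournament argument can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name TournamentSecondBest, the comparisons counter, and the rule that an odd team out receives a bye are all hypothetical choices, not part of the original notes. "Best" is taken to mean the largest value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: find the second-largest value with a knockout tournament.
class TournamentSecondBest {
    // Number of element comparisons ("games") made by the last call.
    static int comparisons = 0;

    // Returns the second-largest value of a; assumes a.length >= 2.
    static int secondBest(int[] a) {
        comparisons = 0;
        // beaten.get(i) collects the values that a[i] defeated directly.
        List<List<Integer>> beaten = new ArrayList<>();
        List<Integer> round = new ArrayList<>(); // indices still in the tournament
        for (int i = 0; i < a.length; i++) {
            beaten.add(new ArrayList<>());
            round.add(i);
        }
        // Play rounds until one team remains; an odd team out gets a bye.
        while (round.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int i = 0; i + 1 < round.size(); i += 2) {
                int x = round.get(i);
                int y = round.get(i + 1);
                comparisons++;
                int winner = (a[x] >= a[y]) ? x : y;
                int loser = (winner == x) ? y : x;
                beaten.get(winner).add(a[loser]);
                next.add(winner);
            }
            if (round.size() % 2 == 1) {
                next.add(round.get(round.size() - 1));
            }
            round = next;
        }
        // The runner-up must be among the teams that lost to the champion.
        List<Integer> candidates = beaten.get(round.get(0));
        int second = candidates.get(0);
        for (int i = 1; i < candidates.size(); i++) {
            comparisons++;
            if (candidates.get(i) > second) {
                second = candidates.get(i);
            }
        }
        return second;
    }
}
```

The champion plays at most one game per round, so the candidate list never holds more than ceil(log n) values, which is where the logarithmic term in the comparison count comes from.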
If you think about it, the elimination tournament described above is similar in some ways to a binary heap. And the process of finding the second best (by running through the teams that played the winner) is similar to the process of removing the minimum from a heap. We can therefore use heaps to extend the idea to other small values of k:
heapselect(L,k)
{
    heap H = heapify(L)
    for (i = 1; i < k; i++) remove min(H)
    return min(H)
}
The time is O(n + k log n): heapify takes O(n), and each of the k - 1 removals takes O(log n). So if k = O(n/log n), the total is O(n). This is interesting, but it still doesn't help for median finding, where k = n/2.
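A minimal Java sketch of the heapselect pseudocode is shown here. It assumes java.util.PriorityQueue stands in for the heap (it is a min-heap by default) and that k is 1-based. The class name HeapSelectSketch and the argument check are illustrative additions. Like the pseudocode, it returns the k-th smallest element; passing a reversed comparator would give the k-th largest instead.

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of heapselect(L, k) using the standard library heap.
class HeapSelectSketch {
    // Returns the k-th smallest element of values, with 1 <= k <= values.size().
    static int heapselect(List<Integer> values, int k) {
        if (k < 1 || k > values.size()) {
            throw new IllegalArgumentException("k must be in 1..n");
        }
        // heapify(L): build a min-heap holding every element.
        PriorityQueue<Integer> heap = new PriorityQueue<>(values);
        // Remove the minimum k - 1 times.
        for (int i = 1; i < k; i++) {
            heap.poll();
        }
        // The minimum that remains is the k-th smallest overall.
        return heap.peek();
    }
}
```

For example, on the list 21, 3, 34, 5, 13, 8, 2, 55, 1, 19 with k = 4, the three smallest values 1, 2 and 3 are removed and the call returns 5.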
3 C# source code
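using System;

namespace Legalsoft.Truffer
{
    /// <summary>
    /// Keeps the m largest values of a stream in a min-heap, so that the
    /// k-th largest value seen so far can be reported at any time.
    /// </summary>
    public class Heapselect
    {
        // Capacity: how many of the largest values are retained.
        private int m { get; set; }
        // Total number of values added so far.
        private int n { get; set; }
        // 1 if the array is currently fully sorted, 0 otherwise.
        private int srtd { get; set; }
        // Retained values; once full, heap[0] is the smallest of them.
        private double[] heap { get; set; }

        public Heapselect(int mm)
        {
            this.m = mm;
            this.n = 0;
            this.srtd = 0;
            this.heap = new double[mm];
            // Fill with a huge sentinel so unused slots sort to the end.
            for (int i = 0; i < mm; i++)
            {
                heap[i] = 1.0E99;
            }
        }

        // Offer one value from the stream.
        public void add(double val)
        {
            if (n < m)
            {
                // Still filling: append, then order the array once it is full
                // (an ascending array is a valid min-heap).
                heap[n++] = val;
                if (n == m)
                {
                    Array.Sort(heap);
                }
            }
            else
            {
                if (val > heap[0])
                {
                    // Replace the smallest retained value and sift it down.
                    heap[0] = val;
                    for (int j = 0; ;)
                    {
                        int k = (j << 1) + 1; // left child of j
                        if (k > m - 1)
                        {
                            break;
                        }
                        // Move to the right child if it is the smaller one.
                        if (k != (m - 1) && heap[k] > heap[k + 1])
                        {
                            k++;
                        }
                        if (heap[j] <= heap[k])
                        {
                            break;
                        }
                        // Swap parent and child.
                        double tmp = heap[k];
                        heap[k] = heap[j];
                        heap[j] = tmp;
                        j = k;
                    }
                }
                n++;
            }
            srtd = 0;
        }

        // Returns the k-th largest value seen so far; k = 0 is the largest.
        public double report(int k)
        {
            int mm = Math.Min(n, m);
            if (k > mm - 1)
            {
                throw new Exception("Heapselect k too big");
            }
            // The m-th largest value is the heap root; no sort is needed.
            if (k == m - 1)
            {
                return heap[0];
            }
            if (srtd == 0)
            {
                Array.Sort(heap);
                srtd = 1;
            }
            return heap[mm - 1 - k];
        }
    }
}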
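The same streaming strategy (keep only the m largest values in a min-heap and evict the root when a larger value arrives) can be sketched in Java with java.util.PriorityQueue. This is an illustrative analogue, not a translation of the C# class: the name StreamingTopM is hypothetical, and report sorts a copy of the heap on each call instead of caching a sorted state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: track the m largest values of a stream.
class StreamingTopM {
    private final int m;                      // how many of the largest values to keep
    private final PriorityQueue<Double> heap; // min-heap: root is the smallest value kept

    StreamingTopM(int m) {
        if (m < 1) {
            throw new IllegalArgumentException("m must be positive");
        }
        this.m = m;
        this.heap = new PriorityQueue<>(m);
    }

    // Offer one value from the stream.
    void add(double val) {
        if (heap.size() < m) {
            heap.offer(val);
        } else if (val > heap.peek()) {
            heap.poll();     // evict the smallest retained value
            heap.offer(val);
        }
    }

    // Returns the k-th largest value seen so far, where k = 0 is the largest.
    double report(int k) {
        if (k < 0 || k >= heap.size()) {
            throw new IllegalArgumentException("k out of range");
        }
        List<Double> sorted = new ArrayList<>(heap);
        sorted.sort(Collections.reverseOrder());
        return sorted.get(k);
    }
}
```

Each add costs O(log m) at worst, so a stream of n values is processed in O(n log m) time while only m values are stored.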
4 Reference Java code
/***********************************************************************
* Author: Isai Damier
* Title: Find the Greatest k values
* Project: geekviewpoint
* Package: algorithms
*
* Statement:
* Given a list of values, find the top k values.
*
* Time Complexity: O(n + k log n), which is O(n log n) when k = n
*
* Sample Input: {21,3,34,5,13,8,2,55,1,19}; 4
* Sample Output: {19,21,34,55}
*
* Technical Details: This selection problem is a classic and so has
* many very good solutions. In fact, any sorting algorithm can be
* modified to solve this problem. In the worst case, the problem
* can indeed be reduced to a sorting problem: where the collection
* is first sorted and then the elements at indices 0 to k-1 are
* retrieved.
*
* Presently the problem is solved using a modified version of
* heapsort called heapselect.
**********************************************************************/
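import java.util.Arrays;

// Wrapper class so the methods compile on their own; the class name is illustrative.
public class HeapSelectTopK {

    // Returns the k greatest values in ascending order. Note: G is reordered in place.
    public int[] heapselectTopK(int[] G, int k) {
        if (k < 0 || k > G.length) {
            throw new IllegalArgumentException("k must be between 0 and G.length");
        }
        int last = G.length - 1;
        // convert array to a max-heap in O(n)
        int youngestParent = (last - 1) / 2; // l = 2*p+1: p = (l-1)/2
        for (int i = youngestParent; i >= 0; i--) {
            moveDown(G, i, last);
        }
        // sort up to k (i.e. find the kth): move the current maximum to the end
        int limit = last - k;
        for (int i = last; i > limit; i--) {
            if (G[0] > G[i]) {
                swap(G, 0, i);
                moveDown(G, 0, i - 1);
            }
        }
        return Arrays.copyOfRange(G, G.length - k, G.length);
    }

    // Sift A[first] down until the max-heap property holds within A[first..last].
    private void moveDown(int[] A, int first, int last) {
        int largest = 2 * first + 1;
        while (largest <= last) {
            // pick the larger of the two children
            if (largest < last && A[largest] < A[largest + 1]) {
                largest++;
            }
            if (A[first] < A[largest]) {
                swap(A, first, largest);
                first = largest;
                largest = 2 * first + 1;
            } else {
                return;
            }
        }
    }

    // Exchange A[i] and A[j].
    private void swap(int[] A, int i, int j) {
        int tmp = A[i];
        A[i] = A[j];
        A[j] = tmp;
    }
}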