Linear-Time Sorting Algorithms: Counting Sort, Bucket Sort, and Radix Sort Explained

news2024/12/13 0:56:41

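Introduction

Comparison-based sorting algorithms such as quicksort, merge sort, and heapsort need on the order of n log n comparisons in the worst case, so they cannot run faster than O(n log n) in general. In certain settings, however, linear-time sorting algorithms, which run in O(n) under suitable assumptions about the input, can do the job more efficiently.

In this article we take a close look at the core ideas and implementations of counting sort, bucket sort, and radix sort. They exploit the characteristics and distribution of the data and largely avoid element-to-element comparisons, which is how they get below the comparison-sort bound. Their applicability is limited, but in the right setting they are excellent sorting tools.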

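1. Counting Sort

1.1 Core Idea

Counting sort is a sorting algorithm based on tallying occurrences. It applies to non-negative integers drawn from a limited range. The core idea is:

- Tally how many times each value occurs.
- Accumulate the tallies to determine each element's position in the result array.
- Use the accumulated tallies to place every element at its correct position.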

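1.2 Procedure

Assume the array to sort is arr[] and its values lie in the range [0, k):

- Tally: create a count array count[] of size k and record how many times each value occurs.
- Accumulate: turn count[] into running totals, so that count[v] is the number of elements less than or equal to v.
- Output: traverse the original array and use the positions stored in count[] to write each element into the result array.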
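To make the first two steps concrete, here is a minimal sketch that only tallies and accumulates, without producing the sorted output. The helper name countAndAccumulate is hypothetical and not part of the article's implementation; it assumes every value lies in [0, k).

```c
#include <assert.h>

/* Hypothetical helper: fills count[0..k-1] with the number of occurrences
   of each value, then converts it in place into running totals. */
void countAndAccumulate(const int arr[], int n, int count[], int k) {
    assert(n >= 0 && k > 0);

    for (int v = 0; v < k; v++) {
        count[v] = 0;
    }

    /* Tally step. For {4, 2, 2, 8, 3, 3, 1} with k = 9 this yields
       0 1 2 2 1 0 0 0 1 */
    for (int i = 0; i < n; i++) {
        assert(arr[i] >= 0 && arr[i] < k);  /* assumed precondition */
        count[arr[i]]++;
    }

    /* Accumulate step. The same input now yields
       0 1 3 5 6 6 6 6 7 */
    for (int v = 1; v < k; v++) {
        count[v] += count[v - 1];
    }
}
```

After accumulation, count[v] - 1 is the index of the last slot reserved for value v. For the sample input, count[3] is 5, so the two 3s occupy indices 3 and 4 of the sorted array.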

1.3 C Implementation


#include <stdio.h>
#include <stdlib.h>

// Sorts arr[0..n-1] in place. Every element must lie in [0, maxValue].
void countingSort(int arr[], int n, int maxValue) {
    int* count = (int*)calloc(maxValue + 1, sizeof(int)); // count array, zero-initialized
    int* output = (int*)malloc(n * sizeof(int));          // output array
    if (!count || !output) {
        perror("countingSort");
        exit(EXIT_FAILURE);
    }

    // Tally the occurrences of each value
    for (int i = 0; i < n; i++) {
        count[arr[i]]++;
    }

    // Accumulate: count[v] becomes the number of elements <= v
    for (int i = 1; i <= maxValue; i++) {
        count[i] += count[i - 1];
    }

    // Place each element into the output array according to its count
    for (int i = n - 1; i >= 0; i--) {  // traversing from the back keeps the sort stable
        output[count[arr[i]] - 1] = arr[i];
        count[arr[i]]--;
    }

    // Copy the result back into the original array
    for (int i = 0; i < n; i++) {
        arr[i] = output[i];
    }

    free(count);
    free(output);
}

int main() {
    int arr[] = {4, 2, 2, 8, 3, 3, 1};
    int n = sizeof(arr) / sizeof(arr[0]);
    int maxValue = 8;

    countingSort(arr, n, maxValue);

    printf("Sorted array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    return 0;
}

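1.4 Time and Space Complexity

Time complexity: O(n + k), where n is the array size and k is the size of the value range (maxValue + 1 in the implementation).
Space complexity: O(n + k), for the extra count array and output array.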

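1.5 Characteristics and Use Cases

Characteristics:
- Stable.
- Suited to integer data with a small value range.
Use cases:
- Data whose values fall in a small, known range, such as tallying exam scores. The values do not need to be evenly distributed; what matters is that k is not much larger than n.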
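The non-negativity requirement can be relaxed by shifting every value by the minimum before indexing the count array. The following is a sketch of that variant, not part of the original implementation; the function name countingSortRange and its return convention (0 on success, -1 on allocation failure) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch: counting sort over [minValue, maxValue], both found by scanning.
   Returns 0 on success and -1 if memory cannot be allocated. */
int countingSortRange(int arr[], int n) {
    assert(n >= 0);
    if (n <= 1) return 0;

    int minValue = arr[0], maxValue = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] < minValue) minValue = arr[i];
        if (arr[i] > maxValue) maxValue = arr[i];
    }

    /* Width of the value range, computed in long long to avoid int overflow */
    size_t k = (size_t)((long long)maxValue - minValue) + 1;

    int *count = calloc(k, sizeof *count);
    int *output = malloc((size_t)n * sizeof *output);
    if (!count || !output) {
        free(count);
        free(output);
        return -1;
    }

    /* Tally, using (value - minValue) as the index */
    for (int i = 0; i < n; i++) {
        count[(size_t)((long long)arr[i] - minValue)]++;
    }

    /* Accumulate into running totals */
    for (size_t v = 1; v < k; v++) {
        count[v] += count[v - 1];
    }

    /* Place elements from the back so that equal values keep their order */
    for (int i = n - 1; i >= 0; i--) {
        size_t slot = (size_t)((long long)arr[i] - minValue);
        output[--count[slot]] = arr[i];
    }

    for (int i = 0; i < n; i++) {
        arr[i] = output[i];
    }

    free(count);
    free(output);
    return 0;
}
```

The cost is still O(n + k), with k now equal to maxValue - minValue + 1, so the variant only pays off when the values are clustered in a narrow band.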

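2. Bucket Sort

2.1 Core Idea

Bucket sort distributes the data into a number of "buckets", sorts each bucket on its own, and then concatenates the buckets. The core idea is divide and conquer:

- Distribute the elements into buckets according to how the data is spread.
- Sort the elements inside each bucket separately.
- Concatenate the buckets in order to obtain the final result.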

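2.2 Procedure

- Create buckets: set up a number of buckets, each covering one sub-range of the values.
- Distribute: assign every element to the bucket that covers its value.
- Sort within buckets: sort each bucket separately.
- Merge: concatenate the contents of all buckets in order.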

2.3 C Implementation


#include <stdio.h>
#include <stdlib.h>

// Linked-list node
typedef struct Node {
    int value;
    struct Node* next;
} Node;

// Insert a value into a linked list that is kept in ascending order
Node* insertSorted(Node* head, int value) {
    Node* newNode = (Node*)malloc(sizeof(Node));
    if (!newNode) {
        perror("insertSorted");
        exit(EXIT_FAILURE);
    }
    newNode->value = value;
    newNode->next = NULL;

    if (!head || value < head->value) {
        newNode->next = head;
        return newNode;
    }

    // Skip past equal values too, so equal elements keep their input order
    Node* current = head;
    while (current->next && current->next->value <= value) {
        current = current->next;
    }
    newNode->next = current->next;
    current->next = newNode;
    return head;
}

// Bucket sort
void bucketSort(int arr[], int n) {
    int bucketCount = 10;  // assumes every value lies in [0, 100)
    Node** buckets = (Node**)calloc(bucketCount, sizeof(Node*));
    if (!buckets) {
        perror("bucketSort");
        exit(EXIT_FAILURE);
    }

    // Distribute the elements into buckets of width 10
    for (int i = 0; i < n; i++) {
        int bucketIndex = arr[i] / 10;
        buckets[bucketIndex] = insertSorted(buckets[bucketIndex], arr[i]);
    }

    // Concatenate all buckets, freeing the nodes along the way
    int index = 0;
    for (int i = 0; i < bucketCount; i++) {
        Node* current = buckets[i];
        while (current) {
            arr[index++] = current->value;
            Node* temp = current;
            current = current->next;
            free(temp);
        }
    }

    free(buckets);
}

int main() {
    int arr[] = {78, 17, 39, 26, 72, 94, 21, 12, 68, 36};
    int n = sizeof(arr) / sizeof(arr[0]);

    bucketSort(arr, n);

    printf("Sorted array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    return 0;
}

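2.4 Time and Space Complexity

Time complexity:
- Average: O(n + k), where k is the number of buckets, assuming the input is spread evenly so that each bucket holds only a few elements.
- Worst case: O(n²), when all the data lands in a single bucket and the per-bucket sort is an insertion sort, as in the implementation here.
Space complexity: O(n + k).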

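2.5 Characteristics and Use Cases

Characteristics:
- Distribution-based: elements are scattered by value first, so comparisons happen only between elements in the same bucket, which makes it efficient when buckets stay small.
- Stable, provided the per-bucket sort is stable.
Use cases:
- Evenly distributed data with a known range, such as floating-point numbers spread uniformly over an interval.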
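As an illustration of the floating-point use case, here is a sketch of bucket sort for doubles in [0, 1) that uses n buckets stored in one contiguous scratch array instead of linked lists. This is an alternative layout, not the article's implementation; the names bucketOf, insertionSort, and bucketSortUnit, and the 0 / -1 return convention, are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Bucket index for x in [0, 1) when there are n buckets */
static int bucketOf(double x, int n) {
    int b = (int)(x * n);
    return b >= n ? n - 1 : b;  /* guard against rounding up to n */
}

/* Stable insertion sort used inside each bucket */
static void insertionSort(double a[], int n) {
    for (int i = 1; i < n; i++) {
        double key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

/* Sketch: sorts arr[0..n-1], assuming every element lies in [0, 1).
   Returns 0 on success and -1 if memory cannot be allocated. */
int bucketSortUnit(double arr[], int n) {
    assert(n >= 0);
    if (n <= 1) return 0;

    int *start = calloc((size_t)n + 1, sizeof *start);
    double *tmp = malloc((size_t)n * sizeof *tmp);
    if (!start || !tmp) {
        free(start);
        free(tmp);
        return -1;
    }

    /* Count the elements per bucket, then accumulate to bucket end offsets */
    for (int i = 0; i < n; i++) {
        assert(arr[i] >= 0.0 && arr[i] < 1.0);  /* assumed precondition */
        start[bucketOf(arr[i], n)]++;
    }
    for (int b = 1; b < n; b++) {
        start[b] += start[b - 1];
    }
    start[n] = n;

    /* Scatter from the back; afterwards start[b] is where bucket b begins */
    for (int i = n - 1; i >= 0; i--) {
        tmp[--start[bucketOf(arr[i], n)]] = arr[i];
    }

    /* Sort each bucket, which occupies tmp[start[b] .. start[b + 1]) */
    for (int b = 0; b < n; b++) {
        insertionSort(tmp + start[b], start[b + 1] - start[b]);
    }

    for (int i = 0; i < n; i++) {
        arr[i] = tmp[i];
    }

    free(start);
    free(tmp);
    return 0;
}
```

With uniformly distributed input each bucket is expected to hold about one element, so the insertion sorts do little work. If the input is heavily skewed, most elements share a bucket and the running time degrades toward O(n²).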

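3. Radix Sort

3.1 Core Idea

Radix sort sorts the data one digit at a time (ones, tens, hundreds, and so on) until the whole array is ordered. Each pass uses a stable sorting algorithm, such as counting sort, as a subroutine; stability is what preserves the order established by earlier passes.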

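3.2 Procedure

- Starting from the least significant digit, sort the array by the value of the current digit.
- After each pass, the array is rearranged in the order of that digit.
- Repeat until the most significant digit has been processed.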
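The effect of each pass is easiest to see on a small input. The sketch below performs a single stable pass on one decimal digit; the function name sortByDigit and the caller-provided scratch buffer are assumptions made for illustration, and the input is assumed to be non-negative.

```c
#include <assert.h>
#include <string.h>

/* One stable pass: reorders arr by the decimal digit selected by exp
   (1 = ones, 10 = tens, 100 = hundreds). scratch must hold n ints. */
void sortByDigit(int arr[], int scratch[], int n, int exp) {
    assert(n >= 0 && exp > 0);
    int count[10] = {0};

    /* Tally how often each digit value 0..9 occurs at this position */
    for (int i = 0; i < n; i++) {
        assert(arr[i] >= 0);  /* assumed precondition */
        count[(arr[i] / exp) % 10]++;
    }

    /* Accumulate into running totals */
    for (int d = 1; d < 10; d++) {
        count[d] += count[d - 1];
    }

    /* Place from the back so that ties keep the order of the previous pass */
    for (int i = n - 1; i >= 0; i--) {
        scratch[--count[(arr[i] / exp) % 10]] = arr[i];
    }

    memcpy(arr, scratch, (size_t)n * sizeof arr[0]);
}
```

Applied to {170, 45, 75, 90, 802, 24, 2, 66}, the passes produce:

exp = 1:   170 90 802 2 24 45 75 66
exp = 10:  802 2 24 45 66 170 75 90
exp = 100: 2 24 45 66 75 90 170 802

In the second pass 170 stays ahead of 75 because both have tens digit 7 and 170 came first after the ones pass; the hundreds pass then separates them.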

3.3 C Implementation

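#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

// Stable counting sort on the decimal digit selected by exp (1, 10, 100, ...)
void countingSortForRadix(int arr[], int n, int exp) {
    int* output = (int*)malloc(n * sizeof(int));
    int count[10] = {0};
    if (!output) {
        perror("countingSortForRadix");
        exit(EXIT_FAILURE);
    }

    // Tally how often each digit value occurs
    for (int i = 0; i < n; i++) {
        count[(arr[i] / exp) % 10]++;
    }

    // Accumulate the counts
    for (int i = 1; i < 10; i++) {
        count[i] += count[i - 1];
    }

    // Sort by the current digit, from the back to keep the pass stable
    for (int i = n - 1; i >= 0; i--) {
        int digit = (arr[i] / exp) % 10;
        output[count[digit] - 1] = arr[i];
        count[digit]--;
    }

    for (int i = 0; i < n; i++) {
        arr[i] = output[i];
    }

    free(output);
}

// Sorts arr[0..n-1] in place. All elements must be non-negative.
void radixSort(int arr[], int n) {
    if (n <= 0) return;

    // The largest value determines how many digit passes are needed
    int maxValue = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] > maxValue) maxValue = arr[i];
    }

    for (int exp = 1; maxValue / exp > 0; exp *= 10) {
        countingSortForRadix(arr, n, exp);
        if (exp > INT_MAX / 10) break;  // the next exp *= 10 would overflow
    }
}

int main() {
    int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(arr) / sizeof(arr[0]);

    radixSort(arr, n);

    printf("Sorted array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    return 0;
}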

4. Summary and Outlook

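Property	Counting sort	Bucket sort	Radix sort
Time complexity	O(n + k)	O(n + k) average, O(n²) worst	O(d × (n + k))
Space complexity	O(n + k)	O(n + k)	O(n + k)
Stability	Stable	Stable if the per-bucket sort is stable	Stable
Use cases	Integer data, small range	Evenly distributed data	Integer data, bounded number of digits

Here n is the number of elements and d is the number of digits. k is the size of the value range for counting sort, the number of buckets for bucket sort, and the radix (10 in the implementation above) for radix sort.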