Bucket Sort

Bucket Sort is a distribution sorting algorithm that works by dividing elements into a finite number of buckets, sorting these buckets (typically with another algorithm), and then concatenating them to produce the final sorted array. It is particularly efficient when input is uniformly distributed over a range, achieving an average case time complexity of O(n+k)O(n + k) where kk is the number of buckets.

Algorithm

BUCKETSORT(A)1.n=A.length2.buckets B[0n1]3.for i=1 to n do4.insert A[i] into bucket B[  n×A[i]  ]5.sort each bucket6.concatenate sorted buckets \begin{aligned} &\textbf{BUCKETSORT}(A) \\ &\quad 1. \quad n = A.length \\ &\quad 2. \quad \text{buckets } B[0 \dots n - 1] \\ &\quad 3. \quad \textbf{for } i = 1 \text{ to } n \textbf{ do} \\ &\quad 4. \quad\quad \text{insert } A[i] \text{ into bucket } B[ \ \lfloor \ n \times A[i] \ \rfloor \ ] \\ &\quad 5. \quad \text{sort each bucket} \\ &\quad 6. \quad \text{concatenate sorted buckets} \end{aligned}

Code

import numpy as np

def bucket_sort(arr, num_buckets=5):
    buckets = [[] for _ in range(num_buckets)]

    for num in arr:
        index = int(num_buckets * num)
        buckets[index].append(num)

    for bucket in buckets:
        bucket.sort()

    sorted_arr = [num for bucket in buckets for num in bucket]
    return sorted_arr

Simple step-by-step example

Given array: [0.78,0.17,0.39,0.26,0.72][0.78, 0.17, 0.39, 0.26, 0.72], using 5 buckets.

  1. Create 5 empty buckets
  2. Place each number in appropriate bucket based on value range:
  3. Bucket 0 [0.0-0.2]: 0.17
  4. Bucket 1 [0.2-0.4]: 0.26, 0.39
  5. Bucket 2 [0.4-0.6]: empty
  6. Bucket 3 [0.6-0.8]: 0.72, 0.78
  7. Bucket 4 [0.8-1.0]: empty
  8. Sort each bucket individually
  9. Concatenate buckets: [0.17,0.26,0.39,0.72,0.78][0.17, 0.26, 0.39, 0.72, 0.78]

Visualisation

Runtime Analysis

The time complexity of bucket sort varies depending on several factors:

  1. Average Case: O(n+k)O(n + k)
  2. Distribution of input into buckets: O(n)O(n)
  3. Sorting each bucket: When input is uniformly distributed, each bucket contains approximately n/kn/k elements
  4. With insertion sort for each bucket: O(k(n/k)2)=O(n2/k)O(k * (n/k)^2) = O(n^2/k)
  5. When knk \approx n, this reduces to O(n)O(n)

  6. Worst Case: O(n2)O(n^2)

  7. Occurs when all elements are placed in a single bucket
  8. The single bucket must then be sorted using insertion sort: O(n2)O(n^2)
  9. Additional O(n)O(n) overhead for bucket creation and concatenation

  10. Best Case: O(n+k)O(n + k)

  11. When elements are uniformly distributed across buckets
  12. Each bucket contains approximately n/kn/k elements
  13. Bucket creation and distribution: O(n)O(n)
  14. Sorting small buckets: O(1)O(1) per bucket
  15. Final concatenation: O(n)O(n)

  16. Space Complexity: O(n+k)O(n + k)

  17. Storage for nn elements across kk buckets
  18. Additional space for bucket array itself

The efficiency of bucket sort heavily depends on: * The uniformity of input distribution * The number of buckets chosen * The algorithm used for sorting individual buckets

When the input is known to be uniformly distributed and the number of buckets is chosen appropriately (typically knk \approx n), bucket sort can achieve linear time complexity, making it more efficient than comparison-based sorting algorithms which have a lower bound of O(nlogn)O(n \log n).