Bucket Sort
Bucket Sort is a distribution sorting algorithm that works by dividing elements into a finite number of buckets, sorting each bucket (typically with another algorithm), and then concatenating the buckets to produce the final sorted array. It is particularly efficient when the input is uniformly distributed over a range, achieving an average-case time complexity of $O(n + n^2/k + k)$, where $n$ is the number of elements and $k$ is the number of buckets.
Algorithm
Code
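def bucket_sort(arr, num_buckets=5):
    # Assumes every value lies in [0, 1]; other ranges must be rescaled first.
    buckets = [[] for _ in range(num_buckets)]
    for num in arr:
        # Map the value to a bucket; clamp so that 1.0 falls in the last bucket.
        index = min(int(num_buckets * num), num_buckets - 1)
        buckets[index].append(num)
    # Sort each bucket with the built-in sort.
    for bucket in buckets:
        bucket.sort()
    # Concatenate the sorted buckets in order.
    sorted_arr = [num for bucket in buckets for num in bucket]
    return sorted_arr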
Simple step-by-step example
Given an unsorted array containing the values 0.17, 0.26, 0.39, 0.72 and 0.78, using 5 buckets:
- Create 5 empty buckets
- Place each number in the appropriate bucket based on its value range:
  - Bucket 0 [0.0, 0.2): 0.17
  - Bucket 1 [0.2, 0.4): 0.26, 0.39
  - Bucket 2 [0.4, 0.6): empty
  - Bucket 3 [0.6, 0.8): 0.72, 0.78
  - Bucket 4 [0.8, 1.0): empty
- Sort each bucket individually
- Concatenate buckets: [0.17, 0.26, 0.39, 0.72, 0.78]
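The steps above can be traced with a short script. This is a minimal sketch: the helper name `distribute` is hypothetical, and the input order is an assumption, since only the values themselves appear in the example.

```python
def distribute(values, num_buckets):
    """Place each value from [0, 1) into one of num_buckets equal-width buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Clamp so that a value of exactly 1.0 would still land in the last bucket.
        index = min(int(num_buckets * v), num_buckets - 1)
        buckets[index].append(v)
    return buckets


# Assumed input order; the values are the ones used in the example.
data = [0.78, 0.17, 0.39, 0.26, 0.72]

buckets = distribute(data, 5)
for i, bucket in enumerate(buckets):
    print(f"Bucket {i}: {bucket}")

# Sort each bucket, then concatenate the buckets in order.
result = [v for bucket in buckets for v in sorted(bucket)]
print(result)
```

Before sorting, buckets 1 and 3 hold their values in arrival order, so the per-bucket sort is what puts 0.26 ahead of 0.39 and 0.72 ahead of 0.78.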
Runtime Analysis
The time complexity of bucket sort varies depending on several factors:
- Average Case: $O(n + n^2/k + k)$
  - Distribution of input into buckets: $O(n)$
  - Sorting each bucket: when the input is uniformly distributed, each bucket contains approximately $n/k$ elements
  - With insertion sort for each bucket: $O((n/k)^2)$ per bucket, or $O(n^2/k)$ across all $k$ buckets
  - When $k \approx n$, this reduces to $O(n)$
- Worst Case: $O(n^2)$
  - Occurs when all elements are placed in a single bucket
  - The single bucket must then be sorted using insertion sort: $O(n^2)$
  - Additional overhead for bucket creation and concatenation: $O(k)$
- Best Case: $O(n + k)$
  - When elements are uniformly distributed across buckets
  - Each bucket contains approximately $n/k$ elements
  - Bucket creation and distribution: $O(n + k)$
  - Sorting small buckets: $O(n/k)$ per bucket if each is already in order (insertion sort's best case), which is $O(1)$ when $k \approx n$
  - Final concatenation: $O(n + k)$
- Space Complexity: $O(n + k)$
  - Storage for $n$ elements across buckets
  - Additional $O(k)$ space for the bucket array itself
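The gap between the average and worst cases can be observed by counting the work insertion sort does inside the buckets. The sketch below is illustrative only: it swaps in a hand-written insertion sort to match the analysis, and the names `insertion_sort` and `bucket_sort_counting`, the seed, and the input size are all assumptions chosen for the demonstration.

```python
import random


def insertion_sort(items):
    """Sort items in place and return the number of element shifts performed."""
    shifts = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
            shifts += 1
        items[j + 1] = key
    return shifts


def bucket_sort_counting(values, num_buckets):
    """Bucket sort for values in [0, 1); returns (sorted list, total shifts)."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[min(int(num_buckets * v), num_buckets - 1)].append(v)
    total_shifts = sum(insertion_sort(b) for b in buckets)
    return [v for b in buckets for v in b], total_shifts


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000

# Uniform input spreads over all k = n buckets.
uniform = [rng.random() for _ in range(n)]
# Clustered input lies in [0, 1/n), so nearly everything lands in bucket 0.
clustered = [rng.random() / n for _ in range(n)]

uniform_sorted, uniform_shifts = bucket_sort_counting(uniform, n)
clustered_sorted, clustered_shifts = bucket_sort_counting(clustered, n)

print("shifts with uniform input:  ", uniform_shifts)
print("shifts with clustered input:", clustered_shifts)
```

With $k = n$ the uniform input should need a number of shifts roughly proportional to $n$, while the clustered input degenerates to a single insertion sort over almost all elements and needs on the order of $n^2$ shifts.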
The efficiency of bucket sort heavily depends on:
- The uniformity of the input distribution
- The number of buckets chosen
- The algorithm used for sorting individual buckets
When the input is known to be uniformly distributed and the number of buckets is chosen appropriately (typically $k \approx n$), bucket sort can achieve linear average-case time complexity, making it more efficient than comparison-based sorting algorithms, which have a lower bound of $\Omega(n \log n)$.
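One way to apply the $k \approx n$ guideline is to size the bucket array from the input itself. The following is a sketch under assumptions, not a reference implementation: the function name `bucket_sort_auto` is hypothetical, and rescaling by the observed minimum and maximum is an added step so that the example works on values outside $[0, 1)$.

```python
def bucket_sort_auto(values):
    """Bucket sort using k = n buckets over the observed [min, max] range."""
    n = len(values)
    if n < 2:
        return list(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all values are equal, nothing to sort
    width = hi - lo
    buckets = [[] for _ in range(n)]
    for v in values:
        # Scale v into [0, 1], then clamp so that v == hi lands in the last bucket.
        index = min(int(n * (v - lo) / width), n - 1)
        buckets[index].append(v)
    result = []
    for bucket in buckets:
        bucket.sort()  # buckets are expected to be small when input is uniform
        result.extend(bucket)
    return result


print(bucket_sort_auto([42, 7, 19, 88, 3, 61]))  # [3, 7, 19, 42, 61, 88]
```

The linear-time behaviour still relies on the values being spread roughly evenly between the minimum and maximum; a single outlier stretches the range and pushes most elements into a few buckets.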