プロックスマップソートで配列を並び替える

プロックスマップソートを使用する

プロックスマップソートは、キーを近接写像 (proximity map) で決めた部分配列へ仕分け、各部分配列内を挿入しながら整列する分布ソートである。

ヒット数 H: 各キーに mapKey 関数を適用し、同じ部分配列へ入る要素数を数える。
近接写像 P: ヒット数の累積和から、各部分配列が出力配列 A2 のどこから始まるかを求める。部分配列のサイズはちょうどそのヒット数分確保される。
位置 L: 元配列 A の各要素について、L[i] = P[mapKey(A[i])] として配置開始位置を記録する。
配置: A を左から走査し、各要素を A2 の対応部分配列へ置く。衝突したら部分配列内で挿入ソートし、より大きいキーを右へ 1 セルずつずらして空きを作る。部分配列はヒット数分だけ確保されているため、隣の部分配列へはみ出さない。

procedure proxmap_sort(A)
  n = length(A)
  for each bucket b: H[b] = 0
  for each x in A:
    b = mapKey(x)
    H[b] = H[b] + 1
  running = 0
  for each bucket b:
    if H[b] > 0 then
      P[b] = running
      running = running + H[b]
  for i from 0 to n - 1:
    L[i] = P[mapKey(A[i])]
  A2 = array of n empty slots
  for i from 0 to n - 1:
    start = L[i]
    insert A[i] into A2 at start, shifting larger keys right within the subarray
  A = A2

整数キー 1..n を n 個の部分配列へ写す典型例では mapKey(x) = floor((x - min) / (max - min) * (n - 1)) のように値域を等分する。分布が一様なら O(n) に近づくが、同一部分配列に偏ると挿入のシフトが重なり O(n²) になる。

なお整列後に P と mapKey を保持しておけば、ProxmapSearch により平均 O(1) でキー検索できる。静的な大規模データセットで検索頻度が高い場合に有利だが、更新のたびに近接写像を組み直す必要がある。

類似アルゴリズムとの相違点

バケットソートは仕分け完了後にバケットごとに整列するが、プロックスマップは配置と挿入を同時に行い、整列後は ProxmapSearch で検索しやすい構造を残せる。

計算時間量および空間計算量を計測する

Size	Average time	Maximum time	Average memory	Maximum memory
256	0.000002	0.000048	1694	1700
512	0.000005	0.000031	1709	1716
1024	0.000009	0.000059	1746	1752
2048	0.000023	0.000070	1818	1824
4096	0.000042	0.000096	1878	1892
8192	0.000098	0.000290	2116	2148
16384	0.000199	0.000301	2754	2788
32768	0.000380	0.000545	3908	3940
65536	0.000785	0.007048	6212	6244
131072	0.001574	0.009554	10819	10852
262144	0.002844	0.022800	19982	20076

計測に使用したコードを表示する

set -euo pipefail

WORKDIR="$(mktemp -d)"
trap 'rm -rf "$WORKDIR"' EXIT

cat > "$WORKDIR/Dockerfile" <<'EOF'
FROM rust:1.95.0

WORKDIR /app

RUN mkdir -p src

RUN cat > Cargo.toml <<'CARGO'
[package]
name = "rust-benchmark"
version = "0.1.0"
edition = "2021"

[profile.release]
lto = true
codegen-units = 1
panic = "abort"
CARGO

RUN cat > src/main.rs <<'RUST'
use std::{
    env,
    process::Command,
    time::{Duration, Instant},
};
const MIN_POWER: u32 = 8;
const MAX_POWER: u32 = 18;
const RUNS: usize = 8192;


fn proxmap_sort(a: &mut [usize]) {
    if a.is_empty() {
        return;
    }

    let n = a.len();
    let min = *a.iter().min().unwrap();
    let max = *a.iter().max().unwrap();
    let bucket_count = n;

    let map_key = |x: usize| -> usize {
        if max == min {
            0
        } else {
            ((x - min) * (bucket_count - 1)) / (max - min)
        }
    };

    let mut hit_count = vec![0usize; bucket_count];
    let mut map_keys = Vec::with_capacity(n);

    for &x in a.iter() {
        let mk = map_key(x);
        map_keys.push(mk);
        hit_count[mk] += 1;
    }

    let mut prox_map = vec![None; bucket_count];
    let mut running_total = 0usize;

    for (i, &hits) in hit_count.iter().enumerate() {
        if hits > 0 {
            prox_map[i] = Some(running_total);
            running_total += hits;
        }
    }

    let location: Vec<usize> = map_keys
        .iter()
        .map(|&mk| prox_map[mk].expect("missing prox map entry"))
        .collect();

    let mut a2: Vec<Option<usize>> = vec![None; n];

    for i in 0..n {
        let key = a[i];
        let start = location[i];
        let mut insert_idx = start;

        loop {
            if a2[insert_idx].is_none() {
                a2[insert_idx] = Some(key);
                break;
            }

            let current = a2[insert_idx].unwrap();

            if key < current {
                let mut end = insert_idx + 1;
                while end < n && a2[end].is_some() {
                    end += 1;
                }

                for k in (insert_idx..end - 1).rev() {
                    a2[k + 1] = a2[k];
                }

                a2[insert_idx] = Some(key);
                break;
            }

            insert_idx += 1;
        }
    }

    for i in 0..n {
        a[i] = a2[i].unwrap();
    }
}


fn benchmark_sort(array: &mut [usize]) {

    proxmap_sort(array);

}

fn is_non_decreasing(a: &[usize]) -> bool {
    a.windows(2).all(|w| w[0] <= w[1])
}

fn same_multiset(a: &[usize], b: &[usize]) -> bool {
    if a.len() != b.len() {
        return false;
    }

    let mut left = a.to_vec();
    let mut right = b.to_vec();
    left.sort_unstable();
    right.sort_unstable();
    left == right
}

fn check_correctness_case(label: &str, mut input: Vec<usize>) {
    let original = input.clone();

    benchmark_sort(&mut input);

    if !is_non_decreasing(&input) {
        panic!("correctness case {}: output is not sorted", label);
    }

    if !same_multiset(&input, &original) {
        panic!("correctness case {}: elements were lost or added", label);
    }
}

fn few_unique_values(size: usize, unique: usize, seed: u64) -> Vec<usize> {
    let mut state = seed;

    (0..size)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state as usize % unique) + 1
        })
        .collect()
}

fn run_correctness_checks() {
    check_correctness_case("empty", vec![]);
    check_correctness_case("single", vec![42]);
    check_correctness_case("duplicates", vec![3, 1, 3, 2, 1, 2]);
    check_correctness_case("sorted", vec![1, 2, 3, 4, 5]);
    check_correctness_case("reverse", vec![5, 4, 3, 2, 1]);
    check_correctness_case("all_equal", vec![7, 7, 7, 7]);
    check_correctness_case("skewed_range", vec![1_000_000, 2, 1_000_001, 1, 999_999]);
    // Static-buffer Grail skips the in-buffer build when key collection is sparse
    // (ideal_buffer = false). Exercising that path catches regressions in buffer gating.
    check_correctness_case(
        "few_keys_len16",
        vec![2, 2, 2, 2, 2, 2, 2, 2, 4, 3, 1, 2, 3, 4, 1, 4],
    );
    for seed in 0..32 {
        check_correctness_case(
            &format!("few_keys_len32_seed_{seed}"),
            few_unique_values(32, 4, seed),
        );
    }
}


fn shuffled(size: usize, seed: u64) -> Vec<usize> {
    let mut v: Vec<usize> = (1..=size).collect();

    let mut state = seed;

    for i in (1..size).rev() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;

        let j = (state as usize) % (i + 1);

        v.swap(i, j);
    }

    v
}

fn memory_usage_kb() -> usize {
    // VmHWM (peak RSS, KiB). Reported memory subtracts a per-size baseline that only
    // holds the input array, so the table reflects auxiliary space during sorting.
    let contents = std::fs::read_to_string("/proc/self/status")
        .unwrap_or_default();

    for line in contents.lines() {
        if let Some(rest) = line.strip_prefix("VmHWM:") {
            let kb = rest
                .split_whitespace()
                .next()
                .unwrap_or("0")
                .parse::<usize>()
                .unwrap_or(0);

            return kb;
        }
    }

    0
}

fn micros(d: Duration) -> u128 {
    d.as_micros()
}

fn input_array(size: usize, seed: u64) -> Vec<usize> {
    shuffled(size, seed)
}

fn run_baseline(size: usize) -> usize {
    let _hold = input_array(size, 1);
    memory_usage_kb()
}

fn run_once(size: usize, seed: usize) -> (u128, usize) {
    let mut array = input_array(size, seed as u64);

    let start = Instant::now();

    benchmark_sort(&mut array);

    let elapsed = start.elapsed();
    let mem = memory_usage_kb();

    let expected: Vec<usize> = (1..=size).collect();
    if array != expected {
        panic!(
            "sort failed with seed {} for size {}",
            seed,
            size
        );
    }

    (micros(elapsed), mem)
}

fn run_baseline_child(args: &[String]) {
    let size = args[2].parse::<usize>().expect("invalid size");
    let mem = run_baseline(size);
    println!("{}", mem);
}

fn run_child(args: &[String]) {
    let size = args[2].parse::<usize>().expect("invalid size");
    let seed = args[3].parse::<usize>().expect("invalid seed");
    let (elapsed_us, mem) = run_once(size, seed);
    println!("{} {}", elapsed_us, mem);
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.get(1).is_some_and(|arg| arg == "--baseline-once") {
        run_baseline_child(&args);
        return;
    }
    if args.get(1).is_some_and(|arg| arg == "--run-once") {
        run_child(&args);
        return;
    }

    run_correctness_checks();

    println!(
        "| {:>10} | {:>15} | {:>15} | {:>15} | {:>15} |",
        "Size",
        "Average time",
        "Maximum time",
        "Average memory",
        "Maximum memory"
    );

    println!(
        "|{:-<11}:|{:-<16}:|{:-<16}:|{:-<16}:|{:-<16}:|",
        "",
        "",
        "",
        "",
        ""
    );

    for power in MIN_POWER..=MAX_POWER {
        let size = 1usize << power;

        let baseline_output = Command::new(env::current_exe().expect("failed to find current executable"))
            .arg("--baseline-once")
            .arg(size.to_string())
            .output()
            .expect("failed to run benchmark baseline process");

        if !baseline_output.status.success() {
            panic!(
                "benchmark baseline process failed: {}",
                String::from_utf8_lossy(&baseline_output.stderr)
            );
        }

        let baseline_stdout = String::from_utf8(baseline_output.stdout)
            .expect("baseline process returned non-UTF-8 output");
        let baseline_mem = baseline_stdout
            .split_whitespace()
            .next()
            .expect("missing baseline memory usage")
            .parse::<usize>()
            .expect("invalid baseline memory usage");

        let mut total_time: u128 = 0;
        let mut max_time: u128 = 0;

        let mut total_mem: usize = 0;
        let mut max_mem: usize = 0;

        for seed in 1..=RUNS {
            let output = Command::new(env::current_exe().expect("failed to find current executable"))
                .arg("--run-once")
                .arg(size.to_string())
                .arg(seed.to_string())
                .output()
                .expect("failed to run benchmark child process");

            if !output.status.success() {
                panic!(
                    "benchmark child process failed: {}",
                    String::from_utf8_lossy(&output.stderr)
                );
            }

            let stdout = String::from_utf8(output.stdout)
                .expect("child process returned non-UTF-8 output");
            let mut fields = stdout.split_whitespace();
            let elapsed_us = fields
                .next()
                .expect("missing elapsed time")
                .parse::<u128>()
                .expect("invalid elapsed time");
            let mem = fields
                .next()
                .expect("missing memory usage")
                .parse::<usize>()
                .expect("invalid memory usage");

            total_time += elapsed_us;

            if elapsed_us > max_time {
                max_time = elapsed_us;
            }

            let aux_mem = mem.saturating_sub(baseline_mem);

            total_mem += aux_mem;

            if aux_mem > max_mem {
                max_mem = aux_mem;
            }
        }

        let avg_time = total_time / RUNS as u128;
        let avg_mem = total_mem / RUNS;

        println!(
            "| {:>10} | {:>15} | {:>15} | {:>15} | {:>15} |",
            size,
            format!("{}.{:06}", avg_time / 1_000_000, avg_time % 1_000_000),
            format!("{}.{:06}", max_time / 1_000_000, max_time % 1_000_000),
            avg_mem,
            max_mem
        );
    }
}
RUST

RUN cargo build --release

CMD ["./target/release/rust-benchmark"]
EOF

docker build -t rust-benchmark "$WORKDIR"
docker run --rm --init rust-benchmark