Add performance numbers by zhiltsov-max · Pull Request #2 · zhiltsov-max/string-distance

zhiltsov-max · 2026-02-07T18:29:25Z

Added a performance test
Added performance numbers to the README

Copilot

Pull request overview

Adds a Criterion benchmark and publishes initial benchmark results in the README to document string_distance::levenshtein() performance.

Changes:

Added a Criterion benchmark (cargo bench) to measure levenshtein() over multiple string lengths/charsets.
Documented benchmark results and benchmarking instructions in README.md.
Added benchmark-related dev dependencies and bench configuration in Cargo.toml (and updated Cargo.lock).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

File	Description
benches/my_benchmark.rs	New Criterion benchmark for `levenshtein()` over ASCII and UTF-8 inputs.
README.md	Adds a Performance section with reported timings and a Benchmarking section.
Cargo.toml	Adds `criterion` + `rand` dev-dependencies and declares the benchmark target.
Cargo.lock	Locks new dev-dependency transitive graph for benchmarking.

Copilot · 2026-02-07T18:33:34Z

benches/my_benchmark.rs

+}
+
+impl rand::distr::Distribution<char> for AlphanumericChar {
+    fn sample<R: rand::Rng + ?std::marker::Sized>(&self, rng: &mut R) -> char {


The generic bound ?std::marker::Sized is valid Rust but needlessly verbose: Sized is in the prelude, so the idiomatic spelling is ?Sized. Keep the relaxed bound rather than dropping it, because rand's Distribution::sample is declared with R: Rng + ?Sized, and an impl with a stricter bound does not match the trait.

Suggested change

-    fn sample<R: rand::Rng + ?std::marker::Sized>(&self, rng: &mut R) -> char {
+    fn sample<R: rand::Rng + ?Sized>(&self, rng: &mut R) -> char {
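To illustrate what the relaxed bound changes, here is a minimal standalone sketch. The function names are hypothetical and not part of the PR; it only shows that `?Sized` lets a generic parameter be an unsized type such as `str` or a trait object, which a plain bound rejects.

```rust
use std::fmt::Debug;

/// Accepts only sized types: `T: Debug` carries an implicit `T: Sized` bound.
fn describe_sized<T: Debug>(value: &T) -> String {
    format!("{:?}", value)
}

/// `?Sized` relaxes the implicit bound, so unsized types such as `str`
/// and `dyn Debug` are accepted as well.
fn describe_maybe_unsized<T: Debug + ?Sized>(value: &T) -> String {
    format!("{:?}", value)
}

fn main() {
    let number = 7u32;
    println!("{}", describe_sized(&number));

    // `T` is inferred as `str`, which is unsized.
    let text: &str = "abc";
    println!("{}", describe_maybe_unsized(text));

    // `T` is inferred as `dyn Debug`, a trait object type.
    let object: &dyn Debug = &number;
    println!("{}", describe_maybe_unsized(object));

    // `describe_sized(text)` would be rejected: `str` is not `Sized`.
}
```

The same reasoning applies to a random-number-generator parameter: with `?Sized`, a caller can pass a `&mut dyn` generator in addition to a concrete one.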

Copilot

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Copilot · 2026-02-07T19:07:29Z

benches/my_benchmark.rs

+use rand;
+use std::hint::black_box;
+use string_distance;


`use rand;` and `use string_distance;` are redundant here: crate names are already in the extern prelude, and the code refers to them by path (rand::... / string_distance::...). These imports may trigger unused-import warnings when running cargo bench. Remove them, or import the specific items you need (e.g., rand::Rng).

Suggested change

-use rand;
 use std::hint::black_box;
-use string_distance;

Copilot · 2026-02-07T19:07:29Z

benches/my_benchmark.rs

+    let mut rng = rand::rng();
+
+    for length in lengths {
+        c.bench_function(
+            &format!(
+                "levenshtein for {charset} strings of length {length}",
+                charset = "ASCII",
+                length = length
+            ),
+            |b| {
+                b.iter_batched(
+                    || {
+                        return (
+                            generate_random_string(length, &mut rng, &ascii_chars),
+                            generate_random_string(length, &mut rng, &ascii_chars),
+                        );
+                    },
+                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
+                    BatchSize::SmallInput,
+                )
+            },
+        );
+    }
+
+    for length in lengths {
+        c.bench_function(
+            &format!(
+                "levenshtein for {charset} strings of length {length}",
+                charset = "UTF8",
+                length = length
+            ),
+            |b| {
+                b.iter_batched(
+                    || {
+                        return (
+                            generate_random_string(length, &mut rng, &utf8_chars),
+                            generate_random_string(length, &mut rng, &utf8_chars),
+                        );
+                    },
+                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
+                    BatchSize::SmallInput,
+                )


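This uses iter_batched with random string generation in the setup closure. Criterion does not time the setup closure, so string generation is excluded from the measurement. However, the routine takes the input pair by value and drops it inside the timed region, so the reported time includes deallocating both strings as well as levenshtein(). If the goal is to benchmark only the distance algorithm, use iter_batched_ref, or pre-generate inputs per length (or use a pool and clone/select) and benchmark only the levenshtein call.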

Suggested change

-    let mut rng = rand::rng();
-
-    for length in lengths {
-        c.bench_function(
-            &format!(
-                "levenshtein for {charset} strings of length {length}",
-                charset = "ASCII",
-                length = length
-            ),
-            |b| {
-                b.iter_batched(
-                    || {
-                        return (
-                            generate_random_string(length, &mut rng, &ascii_chars),
-                            generate_random_string(length, &mut rng, &ascii_chars),
-                        );
-                    },
-                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
-                    BatchSize::SmallInput,
-                )
-            },
-        );
-    }
-
-    for length in lengths {
-        c.bench_function(
-            &format!(
-                "levenshtein for {charset} strings of length {length}",
-                charset = "UTF8",
-                length = length
-            ),
-            |b| {
-                b.iter_batched(
-                    || {
-                        return (
-                            generate_random_string(length, &mut rng, &utf8_chars),
-                            generate_random_string(length, &mut rng, &utf8_chars),
-                        );
-                    },
-                    |v| string_distance::levenshtein(black_box(&v.0), black_box(&v.1)),
-                    BatchSize::SmallInput,
-                )
+    // Benchmark ASCII inputs
+    for length in lengths {
+        // Pre-generate a pool of input pairs so the benchmark measures only levenshtein()
+        let mut rng = rand::rng();
+        let sample_size = 1024usize;
+        let mut samples: Vec<(String, String)> = Vec::with_capacity(sample_size);
+        for _ in 0..sample_size {
+            samples.push((
+                generate_random_string(length, &mut rng, &ascii_chars),
+                generate_random_string(length, &mut rng, &ascii_chars),
+            ));
+        }
+        c.bench_function(
+            &format!(
+                "levenshtein for {charset} strings of length {length}",
+                charset = "ASCII",
+                length = length
+            ),
+            move |b| {
+                let mut idx = 0usize;
+                b.iter(|| {
+                    let pair = &samples[idx];
+                    idx += 1;
+                    if idx == samples.len() {
+                        idx = 0;
+                    }
+                    string_distance::levenshtein(black_box(&pair.0), black_box(&pair.1))
+                })
+            },
+        );
+    }
+
+    // Benchmark UTF-8 inputs
+    for length in lengths {
+        // Pre-generate a pool of input pairs so the benchmark measures only levenshtein()
+        let mut rng = rand::rng();
+        let sample_size = 1024usize;
+        let mut samples: Vec<(String, String)> = Vec::with_capacity(sample_size);
+        for _ in 0..sample_size {
+            samples.push((
+                generate_random_string(length, &mut rng, &utf8_chars),
+                generate_random_string(length, &mut rng, &utf8_chars),
+            ));
+        }
+        c.bench_function(
+            &format!(
+                "levenshtein for {charset} strings of length {length}",
+                charset = "UTF8",
+                length = length
+            ),
+            move |b| {
+                let mut idx = 0usize;
+                b.iter(|| {
+                    let pair = &samples[idx];
+                    idx += 1;
+                    if idx == samples.len() {
+                        idx = 0;
+                    }
+                    string_distance::levenshtein(black_box(&pair.0), black_box(&pair.1))
+                })
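The pre-generated pool idea can be sketched without Criterion, using only the standard library. Everything here is an assumption made for illustration: `levenshtein` is a stand-in two-row implementation over `char`s (not the crate's code), `pseudo_random_string` is a tiny LCG replacing `rand`, and `build_pool` / `time_over_pool` are hypothetical helper names. The point is only that the pool is built before the timer starts, so the timed loop contains nothing but the distance calls.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Stand-in two-row Levenshtein over `char`s; not the crate's implementation.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    let mut curr = vec![0; b_chars.len() + 1];
    for (i, ca) in a.chars().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b_chars.len()]
}

/// Tiny deterministic generator (an LCG) so the sketch needs no `rand` dependency.
fn pseudo_random_string(len: usize, state: &mut u64) -> String {
    (0..len)
        .map(|_| {
            *state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            (b'a' + ((*state >> 33) % 26) as u8) as char
        })
        .collect()
}

/// Builds the whole pool up front, outside any timed region.
fn build_pool(pool_size: usize, len: usize) -> Vec<(String, String)> {
    let mut state = 42u64;
    (0..pool_size)
        .map(|_| {
            let first = pseudo_random_string(len, &mut state);
            let second = pseudo_random_string(len, &mut state);
            (first, second)
        })
        .collect()
}

/// Times `iterations` calls while cycling through the pool by reference,
/// so no allocation or deallocation happens inside the timed loop body.
fn time_over_pool(pool: &[(String, String)], iterations: usize) -> Duration {
    let mut idx = 0;
    let start = Instant::now();
    for _ in 0..iterations {
        let (a, b) = &pool[idx];
        idx = (idx + 1) % pool.len();
        black_box(levenshtein(black_box(a.as_str()), black_box(b.as_str())));
    }
    start.elapsed()
}

fn main() {
    let pool = build_pool(64, 32);
    let elapsed = time_over_pool(&pool, 10_000);
    println!("10000 calls took {:?}", elapsed);
}
```

A hand-rolled timer like this lacks Criterion's warm-up and statistical analysis; it is meant to show the structure of the measurement, not to replace the benchmark.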

Copilot · 2026-02-07T19:07:29Z

benches/my_benchmark.rs

+    for length in lengths {
+        c.bench_function(
+            &format!(
+                "levenshtein for {charset} strings of length {length}",


For the "UTF8" benchmarks, length is the number of Unicode scalar values (char) generated, but the library's UTF path operates on Unicode grapheme clusters (via unicode-segmentation). The label "strings of length {length}" may therefore not reflect the number of units actually processed. Either constrain the generated characters to single-codepoint graphemes, or adjust the label and documentation to clarify what "length" means.
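A small standard-library sketch of why the unit matters. The helper name `scalar_count` is hypothetical. The standard library can count bytes and scalar values but not grapheme clusters, so the grapheme count is stated in a comment rather than computed: a base letter followed by a combining mark forms a single grapheme cluster.

```rust
/// Number of Unicode scalar values (`char`s) in `s`.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: one grapheme cluster,
    // two scalar values, three UTF-8 bytes.
    let combining = "e\u{301}";
    // Precomposed U+00E9: one grapheme cluster, one scalar value, two bytes.
    let precomposed = "\u{e9}";

    for s in [combining, precomposed] {
        println!(
            "{:?}: {} bytes, {} scalar values",
            s,
            s.len(),
            scalar_count(s)
        );
    }
}
```

A generator that samples arbitrary `char`s can produce combining sequences like the first string, in which case a grapheme-based algorithm processes fewer units than the label suggests.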

Suggested change

-                "levenshtein for {charset} strings of length {length}",
+                "levenshtein for {charset} strings with {length} Unicode scalar values",

zhiltsov-max added 2 commits February 7, 2026 20:22

Add benchmark

b54965b

Add performance metrics in readme

5bf8ecc

zhiltsov-max requested a review from Copilot February 7, 2026 18:29

Copilot started reviewing on behalf of zhiltsov-max February 7, 2026 18:30

Fix ci

6f5b0e9

Copilot AI reviewed Feb 7, 2026

zhiltsov-max added 3 commits February 7, 2026 20:42

Fix number

f06ae1c

Refactor benchmark

0d83c4c

Update numbers

f8cd440

zhiltsov-max requested a review from Copilot February 7, 2026 19:03

Copilot started reviewing on behalf of zhiltsov-max February 7, 2026 19:03

Copilot AI reviewed Feb 7, 2026

Update README.md

2373bc3

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

zhiltsov-max merged commit 2573801 into develop Feb 7, 2026
3 checks passed

zhiltsov-max deleted the zm/test-perf branch February 7, 2026 19:12

zhiltsov-max merged 7 commits into develop from zm/test-perf

2 participants