
Microbenchmarks improvements and bug fixes#799

Merged
ankitaluthra1 merged 9 commits into fsspec:main from ankitaluthra1:benchmark-changes
Apr 9, 2026

Conversation

@Mahalaxmibejugam
Collaborator

This PR includes the following changes to the microbenchmarks suite:

  • Fix chunking in test_info_multi_threaded: Corrected path handling in the multi-threaded info benchmark so that paths are properly distributed across threads, instead of a single tuple being passed to one worker.

  • Add more files in info benchmarks: Increased the number of files in info benchmarks to provide a more rigorous performance test.

  • Remove sleep from rename benchmarks: Removed a sleep call that was added to work around a Long Running Operation (LRO) issue that has since been fixed.

  • Add more folders in rm benchmarks: Increased the number of folders in rm benchmarks to better measure performance under scale.
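
The chunking fix described in the first bullet can be sketched as follows. This is a minimal illustration, not the actual benchmark code: the names `chunk_paths`, `info_worker`, and `run_info_multi_threaded` are hypothetical, and the real benchmark would call `fs.info` on a gcsfs filesystem rather than a generic `info_fn`.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_paths(paths, num_threads):
    """Split paths into num_threads roughly equal chunks.

    The bug being fixed: the whole collection of paths was handed to a
    single worker as one tuple; each thread must instead receive its
    own slice of the path list.
    """
    k, m = divmod(len(paths), num_threads)
    chunks = []
    start = 0
    for i in range(num_threads):
        end = start + k + (1 if i < m else 0)  # first m chunks get one extra
        chunks.append(paths[start:end])
        start = end
    return chunks


def info_worker(info_fn, chunk):
    # Each worker calls info on every path in its own chunk only.
    return [info_fn(path) for path in chunk]


def run_info_multi_threaded(info_fn, paths, num_threads=4):
    """Distribute info calls across threads, one chunk per thread."""
    chunks = chunk_paths(paths, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(info_worker, info_fn, c) for c in chunks]
        results = []
        for f in futures:  # collect in submission order to keep path order
            results.extend(f.result())
    return results
```

With 10 paths and 4 threads, `chunk_paths` yields chunks of sizes 3, 3, 2, 2, so every thread does real work rather than one thread receiving everything.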

@Mahalaxmibejugam Mahalaxmibejugam changed the title Update micro benchmarks Microbenchmarks improvements and bug fixes Apr 1, 2026
@codecov

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.44%. Comparing base (9d25f7c) to head (e2cdd1e).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
+ Coverage   76.14%   76.44%   +0.29%
==========================================
  Files          15       15
  Lines        2679     2679
==========================================
+ Hits         2040     2048       +8
+ Misses        639      631       -8
```

Quoted config context:

```yaml
- 131072
folders:
- 256
sample_size:
```
Collaborator

why are we sampling?

Collaborator Author

Previously, we created only 100 files and 100 folders in the bucket and called info on all 200 paths (files and folders). Now that I've modified the benchmark to include 65k and 130k files, calling info on all 65k paths would not yield significantly more data points than calling info on 100 sampled paths, and would unnecessarily increase the benchmark's runtime.
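
The sampling approach described here could look roughly like the sketch below. This is an assumption about the shape of the code, not the actual benchmark implementation; `sample_paths` is a hypothetical helper.

```python
import random


def sample_paths(paths, sample_size, seed=0):
    """Pick a fixed-size, reproducible sample of paths to call info on.

    Rationale from the discussion above: with 65k+ files in the bucket,
    calling info on every path inflates runtime without adding
    meaningfully more data points, so only sample_size paths are used.
    A seeded RNG keeps the sample stable across benchmark runs.
    """
    if sample_size >= len(paths):
        return list(paths)
    return random.Random(seed).sample(paths, sample_size)
```

A fixed seed means successive daily runs measure info latency over the same subset of paths, so run-to-run differences reflect the backend rather than the sample.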

Collaborator Author

Segregated the scenarios for files and folders, so sampling is no longer needed. For file scenarios, 10k files are created and info is called on each of them. For folder scenarios, 65k files and 256 folders are created, and info is called on all 256 folders.
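
The segregated layout described here might look roughly like this in configs.yaml. This is an illustrative sketch only: the scenario names and key names are assumptions, not the actual file contents.

```yaml
scenarios:
- name: "info_files_flat"       # hypothetical name: info called on every file
  files: [10000]
- name: "info_folders_flat"     # hypothetical name: info called on all folders
  files: [65536]                # files implicitly create the folder tree
  folders: [256]
```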

Quoted diff context:

```diff
 scenarios:
 - name: "delete_flat"
-  folders: [256]
+  folders: [1024, 2048, 4096]
```
Collaborator

Instead of updating the folders, I'd suggest creating a new scenario with these options. This will impact the daily runs, as it will take a long time to create these folders as part of setup. So if you really want a daily trigger that compares large numbers of folders, it's better to create separate scenarios and a trigger pointing to them.

Collaborator Author

Just increasing the number of folders won't actually increase the setup time: we are not making explicit mkdir calls to create folders; they are created implicitly during file creation.

But since we are increasing the scenarios from one (256) to three (1024, 2048, 4096), more scenarios will run, so the delete benchmarks will take more time. I still suggest keeping them, because the delete benchmark's latency is driven largely by the number of folders, and we only observe latency differences between HNS and standard buckets at 2k and 4k folders.

Collaborator

I would suggest keeping the 256 for a while so that we can monitor its trends and compare it with the newly added configurations, and then remove 256 later on.

@Mahalaxmibejugam Mahalaxmibejugam requested a review from jasha26 April 3, 2026 10:52
Quoted diff context:

```diff
 processes: [4, 8]

-- name: "info_multi_process_deep"
+- name: "info_multi_process_deep_folder"
```
Collaborator

Will changing the name break our existing metrics? Is it fine to break?

Collaborator Author

We don't have triggers created on these scenarios yet, so it won't break any existing metrics.

@Mahalaxmibejugam
Collaborator Author

> I would suggest keeping the 256 for a while so that we can monitor its trends and compare it with other newly added configurations, removing 256 later on.

We are not getting significant data from the 256-folder case, and the other folder scenarios give us similar data points.

We want to determine a few details from these benchmarks:

  1. Whether HNS bucket latency increases with the number of folders while Standard bucket latency remains unaffected.
  2. A comparison of HNS and Standard bucket latency at different folder counts.

We can achieve this with the current folder counts. IMO, adding a 256-folder case as well just increases the benchmark run time without giving us many more data points.

Comment thread on gcsfs/tests/perf/microbenchmarks/info/configs.yaml (outdated)
@ankitaluthra1
Collaborator

/gcbrun

@ankitaluthra1 ankitaluthra1 merged commit 64936ae into fsspec:main Apr 9, 2026
9 checks passed
@Mahalaxmibejugam Mahalaxmibejugam deleted the benchmark-changes branch April 9, 2026 10:12