
Microbenchmarks improvements and bug fixes#799

Merged
ankitaluthra1 merged 9 commits into fsspec:main from ankitaluthra1:benchmark-changes
Apr 9, 2026

Conversation

@Mahalaxmibejugam
Collaborator

This PR includes the following changes to the microbenchmarks suite:

  • Fix chunking in test_info_multi_threaded: Corrected path handling in the multi-threaded info benchmark so that paths are properly distributed across threads, instead of a single tuple being passed to one worker.

  • Add more files in info benchmarks: Increased the number of files in info benchmarks to provide a more rigorous performance test.

  • Remove sleep from rename benchmarks: Removed a sleep call that was added to work around a Long Running Operation (LRO) issue that has since been fixed.

  • Add more folders in rm benchmarks: Increased the number of folders in rm benchmarks to better measure performance under scale.
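
The chunking fix described in the first bullet can be sketched as follows. This is a minimal illustration, not the actual benchmark code: the names `chunk_paths`, `info_worker`, and `run_info_multi_threaded` are hypothetical, and the real benchmark would call `fs.info` on a gcsfs filesystem rather than a generic `info_fn`.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_paths(paths, num_threads):
    """Split paths into num_threads roughly equal chunks.

    The bug being fixed: the whole collection of paths was handed to a
    single worker as one tuple; each thread must instead receive its
    own slice of the path list.
    """
    k, m = divmod(len(paths), num_threads)
    chunks = []
    start = 0
    for i in range(num_threads):
        end = start + k + (1 if i < m else 0)  # first m chunks get one extra
        chunks.append(paths[start:end])
        start = end
    return chunks


def info_worker(info_fn, chunk):
    # Each worker calls info on every path in its own chunk only.
    return [info_fn(path) for path in chunk]


def run_info_multi_threaded(info_fn, paths, num_threads=4):
    """Distribute info calls across threads, one chunk per thread."""
    chunks = chunk_paths(paths, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(info_worker, info_fn, c) for c in chunks]
        results = []
        for f in futures:  # collect in submission order to keep path order
            results.extend(f.result())
    return results
```

With 10 paths and 4 threads, `chunk_paths` yields chunks of sizes 3, 3, 2, 2, so every thread does real work rather than one thread receiving everything.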

@Mahalaxmibejugam Mahalaxmibejugam changed the title Update micro benchmarks Microbenchmarks improvements and bug fixes Apr 1, 2026
@codecov

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.44%. Comparing base (9d25f7c) to head (e2cdd1e).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
+ Coverage   76.14%   76.44%   +0.29%
==========================================
  Files          15       15
  Lines        2679     2679
==========================================
+ Hits         2040     2048       +8
+ Misses        639      631       -8
```

Quoted config context:

```yaml
- 131072
folders:
- 256
sample_size:
```
Collaborator

why are we sampling?

Collaborator Author

Previously, we created only 100 files and 100 folders in the bucket and called info on all 200 paths (files and folders). Now that I've modified the benchmark to include 65k and 130k files, calling info on all 65k paths would not yield significantly more data points than calling info on 100 sampled paths, and would unnecessarily increase the benchmark's runtime.
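
The sampling approach described here could look roughly like the sketch below. This is an assumption about the shape of the code, not the actual benchmark implementation; `sample_paths` is a hypothetical helper.

```python
import random


def sample_paths(paths, sample_size, seed=0):
    """Pick a fixed-size, reproducible sample of paths to call info on.

    Rationale from the discussion above: with 65k+ files in the bucket,
    calling info on every path inflates runtime without adding
    meaningfully more data points, so only sample_size paths are used.
    A seeded RNG keeps the sample stable across benchmark runs.
    """
    if sample_size >= len(paths):
        return list(paths)
    return random.Random(seed).sample(paths, sample_size)
```

A fixed seed means successive daily runs measure info latency over the same subset of paths, so run-to-run differences reflect the backend rather than the sample.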

Collaborator Author

Segregated the scenarios for files and folders, so sampling is no longer needed. For file scenarios, 10k files are created and info is called on each of them. For folder scenarios, 65k files and 256 folders are created, and info is called on all 256 folders.
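
The segregated layout described here might look roughly like this in configs.yaml. This is an illustrative sketch only: the scenario names and key names are assumptions, not the actual file contents.

```yaml
scenarios:
- name: "info_files_flat"       # hypothetical name: info called on every file
  files: [10000]
- name: "info_folders_flat"     # hypothetical name: info called on all folders
  files: [65536]                # files implicitly create the folder tree
  folders: [256]
```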

Quoted diff context:

```diff
 scenarios:
 - name: "delete_flat"
-  folders: [256]
+  folders: [1024, 2048, 4096]
```
Collaborator

Instead of updating the folders, I'd suggest creating a new scenario with these options. This will impact the daily runs, as it will take a long time to create these folders as part of setup. So if you really want a daily trigger that compares large numbers of folders, it's better to create separate scenarios and a trigger pointing to them.

Collaborator Author

Just increasing the number of folders won't actually increase the setup time: we are not making explicit mkdir calls to create folders; they are created implicitly during file creation.

But since we are increasing the scenarios from one (256) to three (1024, 2048, 4096), more scenarios will run, so the delete benchmarks will take more time. I still suggest keeping them, because the delete benchmark's latency is driven largely by the number of folders, and we only observe latency differences between HNS and standard buckets at 2k and 4k folders.

Collaborator

I would suggest keeping the 256 for a while so that we can monitor its trends and compare it with the newly added configurations, and then remove 256 later on.

@Mahalaxmibejugam Mahalaxmibejugam requested a review from jasha26 April 3, 2026 10:52
Quoted diff context:

```diff
 processes: [4, 8]

-- name: "info_multi_process_deep"
+- name: "info_multi_process_deep_folder"
```
Collaborator

Will changing the name break our existing metrics? Is it fine to break?

Collaborator Author

We don't have triggers created on these scenarios yet, so it won't break any existing metrics.

@Mahalaxmibejugam
Collaborator Author

> I would suggest keeping the 256 for a while so that we can monitor its trends and compare it with other newly added configurations, removing 256 later on.

We are not getting significant data from the 256-folder case, and the other folder scenarios give us similar data points.

We want to determine a few details from these benchmarks:

  1. Whether HNS bucket latency increases with the number of folders while Standard bucket latency remains unaffected.
  2. A comparison of HNS and Standard bucket latency at different folder counts.

We can achieve this with the current folder counts. IMO, adding a 256-folder case as well just increases the benchmark run time without giving us many more data points.

Comment thread on gcsfs/tests/perf/microbenchmarks/info/configs.yaml (outdated)
@ankitaluthra1
Collaborator

/gcbrun

@ankitaluthra1 ankitaluthra1 merged commit 64936ae into fsspec:main Apr 9, 2026
9 checks passed
@Mahalaxmibejugam Mahalaxmibejugam deleted the benchmark-changes branch April 9, 2026 10:12