Skip to content

feat(clp-s::log_converter): Re-use the same log_surgeon::BufferParser for converting each file.#2284

Merged
gibber9809 merged 3 commits into
y-scope:mainfrom
gibber9809:reuse-log-parser
May 21, 2026
Merged

feat(clp-s::log_converter): Re-use the same log_surgeon::BufferParser for converting each file.#2284
gibber9809 merged 3 commits into
y-scope:mainfrom
gibber9809:reuse-log-parser

Conversation

@gibber9809
Copy link
Copy Markdown
Contributor

@gibber9809 gibber9809 commented May 15, 2026

Description

This PR modifies the LogConverter class to have it own an instance of log_surgeon::BufferParser that is created during initialization, and re-used to convert each input file. The previous implementation created a new log_surgeon::BufferParser on each invocation of convert_file, which adds quite a bit of overhead when converting many small files.

Conversion of the hive-24hr dataset goes from about 10 minutes down to about 2 minutes on an old xeon server I tested this change on.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Manually validated that conversion is significantly faster when ingesting many small files
  • Validated that integration tests still pass

Summary by CodeRabbit

Release Notes

  • Refactor
    • Improved internal log converter initialization by centralizing object construction and streamlining the parsing workflow.

Review Change Stack

@gibber9809 gibber9809 requested a review from a team as a code owner May 15, 2026 19:02
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

Walkthrough

LogConverter is refactored to centralize log_surgeon::Schema and BufferParser construction through a new factory method. The constructor now accepts a BufferParser parameter and stores it as a member, replacing local variable construction patterns in convert_file. All parser references are updated to use the persistent m_parser member.

Changes

LogConverter Parser Factory Refactoring

Layer / File(s) Summary
Header contract update for parser member
components/core/src/clp_s/log_converter/LogConverter.hpp
LogConverter header includes BufferParser.hpp, constructor signature changes to accept a BufferParser parameter, factory method documentation is expanded, and a new private m_parser member is introduced to hold the parser instance.
Factory implementation and convert_file refactoring
components/core/src/clp_s/log_converter/LogConverter.cpp
LogConverter::create factory method constructs a Schema with delimiters and timestamp variable, builds a BufferParser from the schema, and delegates to the updated constructor; convert_file reset logic uses m_parser member for parse_next_event and log event view access instead of local variables.
Call site update to use factory method
components/core/src/clp_s/log_converter/log_converter.cpp
convert_files function instantiates LogConverter through the new create factory method instead of direct construction.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: re-using a single BufferParser across file conversions rather than creating a new one for each file.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao requested a review from LinZhihao-723 May 15, 2026 19:23
Copy link
Copy Markdown
Member

@davidlion davidlion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm other than the optional nit.

Comment on lines +78 to 81
log_surgeon::BufferParser m_parser;
ystdlib::containers::Array<char> m_buffer;
size_t m_num_bytes_buffered{};
size_t m_parser_offset{};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think clp follows the alphabetical rule from https://docs.yscope.com/dev-guide/contrib-guides-general.html very strictly, but technically this does violate it.
I'll leave it up to you if you want to fix it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll leave this one as is since it leaves the members grouped together in a more logical way, but will try to keep this rule in mind.

@gibber9809 gibber9809 merged commit fee2b8d into y-scope:main May 21, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants