feat: increase Drain tree depth from 4 to 30 #24
Matches Loki's default. Higher depth means the prefix tree routes more precisely, reducing the number of candidate clusters that need similarity comparison at leaf nodes. This improves both accuracy and performance for pattern detection.
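To make the routing argument concrete, here is a minimal sketch of depth-limited prefix routing. It is a simplified model based on the original Drain algorithm (one tree level for the token count, then up to depth-2 levels keyed by leading tokens, with digit-bearing tokens routed through a wildcard). It is not the drain3 library's implementation; `routeKey` and `hasDigit` are hypothetical helpers written only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasDigit reports whether tok contains a digit. In this simplified model,
// such tokens are treated as likely parameters and routed via a wildcard.
func hasDigit(tok string) bool {
	for _, r := range tok {
		if unicode.IsDigit(r) {
			return true
		}
	}
	return false
}

// routeKey returns the path a message would follow through a toy
// Drain-style prefix tree: one level for the token count, then up to
// depth-2 levels keyed by the leading tokens. (Assumption: this mirrors
// the original Drain design, not necessarily drain3's internals.)
func routeKey(msg string, depth int) string {
	tokens := strings.Fields(msg)
	path := []string{fmt.Sprintf("len=%d", len(tokens))}
	maxTokens := depth - 2
	for i := 0; i < len(tokens) && i < maxTokens; i++ {
		if hasDigit(tokens[i]) {
			path = append(path, "<*>")
		} else {
			path = append(path, tokens[i])
		}
	}
	return strings.Join(path, "/")
}

func main() {
	a := "user login failed for alice"
	b := "user login succeeded for bob"
	for _, depth := range []int{4, 30} {
		fmt.Printf("depth=%d\n  %s\n  %s\n", depth, routeKey(a, depth), routeKey(b, depth))
	}
}
```

In this model, depth 4 sends both messages to the same leaf (`len=5/user/login`), so they must be separated by similarity comparison there, while depth 30 routes them to different leaves before any comparison happens.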
Summary of Changes
This pull request improves the precision of log clustering by adjusting the Drain tree depth to match Loki's default configuration. By increasing the tree depth, the prefix tree can route log messages more accurately, which in turn reduces the number of candidate clusters that require similarity comparisons at the leaf nodes, leading to more efficient and precise log pattern identification.
Code Review
This pull request increases the Drain tree depth from 4 to 30, aligning it with Loki's default value. This is a sensible change to improve log pattern clustering precision. My review includes a suggestion to refactor the hardcoded parameters in NewDrainParser into named constants. This will improve code readability and maintainability, and also resolve an existing issue of duplicated code for delimiters within the package.
  func NewDrainParser() (*DrainParser, error) {
  	d, err := drain3.NewDrain(
- 		drain3.WithDepth(4),
+ 		drain3.WithDepth(30),
While changing the depth, this is a good opportunity to address the magic numbers in this block of parameters. The values 30, 0.4, and the delimiter slice are hardcoded. Defining them as named, package-level constants/variables would improve readability and maintainability.
Additionally, the delimiter slice []string{"|", "=", ","} is duplicated in pkg/pattern/parser.go. This should be defined in one place to avoid inconsistencies.
I suggest defining these at the package level:
const (
	// defaultDrainDepth matches Loki's default LogClusterDepth.
	// Ref: https://github.com/grafana/loki/tree/main/pkg/pattern/drain
	defaultDrainDepth = 30
	defaultDrainSimTh = 0.4
)

// drainExtraDelimiters is also used in parser.go and should be the single source of truth.
var drainExtraDelimiters = []string{"|", "=", ","}

Then, the NewDrain call would be cleaner and safer:

d, err := drain3.NewDrain(
	drain3.WithDepth(defaultDrainDepth),
	drain3.WithSimTh(defaultDrainSimTh),
	drain3.WithExtraDelimiter(drainExtraDelimiters),
)

This would also require updating pkg/pattern/parser.go to use the new shared drainExtraDelimiters variable.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48b478c13d
  func NewDrainParser() (*DrainParser, error) {
  	d, err := drain3.NewDrain(
- 		drain3.WithDepth(4),
+ 		drain3.WithDepth(30),
Revert Drain depth increase that fragments clusters
Increasing WithDepth from 4 to 30 causes Drain to branch on many more leading tokens before similarity matching, so messages that differ in an early non-numeric field (for example token 3–10) are routed to different leaves and never compared as one template. In this repo that is a functional regression because runDrain keeps only clusters with Count > 1 (cmd/lapp/workspace.go:311-314), so the extra singleton clusters are dropped from semantic labeling and pushed into unmatched output, reducing pattern recall for real logs.
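The fragmentation concern can be illustrated with a toy model. The sketch below is not the repository's runDrain or the drain3 library: `leafKey` and `keptClusters` are hypothetical helpers, each leaf is treated as a single cluster (real Drain also runs a similarity check inside a leaf), and the `Count > 1` filter is modeled only from the description in this comment. Under those assumptions it shows how messages that differ in one early non-numeric token could collapse into one cluster at depth 4 but become singletons at depth 30.

```go
package main

import (
	"fmt"
	"strings"
)

// leafKey is a simplified stand-in for Drain routing: the token count plus
// up to depth-2 leading tokens, with digit-bearing tokens wildcarded.
func leafKey(msg string, depth int) string {
	tokens := strings.Fields(msg)
	n := depth - 2
	if n < 0 {
		n = 0
	}
	if n > len(tokens) {
		n = len(tokens)
	}
	prefix := make([]string, n)
	for i := 0; i < n; i++ {
		if strings.ContainsAny(tokens[i], "0123456789") {
			prefix[i] = "<*>"
		} else {
			prefix[i] = tokens[i]
		}
	}
	return fmt.Sprintf("%d|%s", len(tokens), strings.Join(prefix, " "))
}

// keptClusters groups messages by leaf (one cluster per leaf in this toy
// model) and applies a Count > 1 filter, returning how many clusters
// survive and how many singletons are dropped.
func keptClusters(msgs []string, depth int) (kept, dropped int) {
	counts := map[string]int{}
	for _, m := range msgs {
		counts[leafKey(m, depth)]++
	}
	for _, c := range counts {
		if c > 1 {
			kept++
		} else {
			dropped++
		}
	}
	return kept, dropped
}

func main() {
	// The messages differ only in the fourth token (a non-numeric name)
	// and in a numeric duration.
	msgs := []string{
		"job finished for alpha in 12ms",
		"job finished for beta in 15ms",
		"job finished for gamma in 9ms",
	}
	for _, depth := range []int{4, 30} {
		kept, dropped := keptClusters(msgs, depth)
		fmt.Printf("depth=%d kept=%d dropped=%d\n", depth, kept, dropped)
	}
	// Prints:
	// depth=4 kept=1 dropped=0
	// depth=30 kept=0 dropped=3
}
```

Whether this effect shows up on real logs depends on how often early tokens are non-numeric variables, so the sketch only demonstrates the mechanism, not its magnitude.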
Matches Loki's default (LogClusterDepth: 30). Higher depth means the prefix tree routes more precisely, reducing the number of candidate clusters that need similarity comparison at leaf nodes.
Ref: https://github.com/grafana/loki/tree/main/pkg/pattern/drain