Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pkg/pattern/drain.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ type DrainParser struct {
// NewDrainParser creates a DrainParser with default Drain parameters.
func NewDrainParser() (*DrainParser, error) {
d, err := drain3.NewDrain(
drain3.WithDepth(4),
drain3.WithDepth(30),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While changing the depth, this is a good opportunity to address the magic numbers in this block of parameters. The values 30, 0.4, and the delimiter slice are hardcoded. Defining them as named, package-level constants/variables would improve readability and maintainability.

Additionally, the delimiter slice []string{"|", "=", ","} is duplicated in pkg/pattern/parser.go. This should be defined in one place to avoid inconsistencies.

I suggest defining these at the package level:

const (
	// defaultDrainDepth matches Loki's default LogClusterDepth.
	// Ref: https://github.com/grafana/loki/tree/main/pkg/pattern/drain
	defaultDrainDepth = 30
	defaultDrainSimTh = 0.4
)

// drainExtraDelimiters is also used in parser.go and should be the single source of truth.
var drainExtraDelimiters = []string{"|", "=", ","}

Then, the NewDrain call would be cleaner and safer:

d, err := drain3.NewDrain(
	drain3.WithDepth(defaultDrainDepth),
	drain3.WithSimTh(defaultDrainSimTh),
	drain3.WithExtraDelimiter(drainExtraDelimiters),
)

This would also require updating pkg/pattern/parser.go to use the new shared drainExtraDelimiters variable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Revert Drain depth increase that fragments clusters

Increasing WithDepth from 4 to 30 causes Drain to branch on many more leading tokens before similarity matching, so messages that differ in an early non-numeric field (for example token 3–10) are routed to different leaves and never compared as one template. In this repo that is a functional regression because runDrain keeps only clusters with Count > 1 (cmd/lapp/workspace.go:311-314), so the extra singleton clusters are dropped from semantic labeling and pushed into unmatched output, reducing pattern recall for real logs.

Useful? React with 👍 / 👎.

drain3.WithSimTh(0.4),
drain3.WithExtraDelimiter([]string{"|", "=", ","}),
)
Expand Down