Skip to content

Commit 789f467

Browse files
committed
Ignore sub-fields with warninigs and keep only parent.
1 parent 65eb675 commit 789f467

File tree

3 files changed

+68
-9
lines changed

3 files changed

+68
-9
lines changed

docs/index.asciidoc

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -304,8 +304,12 @@ NOTE: If your index has a mapping with sub-objects where `status.code` and `stat
304304
===== Conflict on multi-fields
305305

306306
ES|QL query fetches all parent and sub-fields fields if your {es} index has https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/multi-fields[multi-fields] or https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/subobjects[subobjects].
307-
Since {ls} events cannot contain parent field's concrete value and sub-field values together, the plugin cannot map the result to {ls} event and produces `_elasticsearch_input_failure` tagged failed event.
308-
We recommend using the `RENAME` (or `DROP`) keyword in your ES|QL query explicitly rename the fields to overcome this issue.
307+
Since {ls} events cannot contain parent field's concrete value and sub-field values together, the plugin ignores sub-fields with warning and includes parent.
308+
We recommend using the `RENAME` (or `DROP` to avoid warnings) keyword in your ES|QL query explicitly rename the fields to include sub-fields into the event.
309+
310+
This a common occurrence if your template or mapping follows the pattern of always indexing strings as "text" (`field`) + " keyword" (`field.keyword`) multi-field.
311+
In this case it's recommended to do `KEEP field` if the string is identical and there is only one subfield as the engine will optimize and retrieve the keyword, otherwise you can do `KEEP field.keyword | RENAME field.keyword as field`.
312+
309313
To illustrate the situation with example, assuming your mapping has a time `time` field with `time.min` and `time.max` sub-fields as following:
310314
[source, ruby]
311315
"properties": {
@@ -315,8 +319,6 @@ To illustrate the situation with example, assuming your mapping has a time `time
315319
}
316320

317321
The ES|QL result will contain all three fields but the plugin cannot map them into {ls} event.
318-
319-
This a common occurence if your template or mapping follows the pattern of always indexing strings as "text" (`field`) + " keyword" (`field.keyword`) multi-field. In this case it's recommended to do `KEEP field` if the string is identical and there is only one subfield as the engine will optimize and retrieve the keyword, otherwise you can do `KEEP field.keyword | RENAME field.keyword as field` .
320322
To avoid this, you can use the `RENAME` keyword to rename the `time` parent field to get all three fields with unique fields.
321323
[source, ruby]
322324
...

lib/logstash/inputs/elasticsearch/esql.rb

Lines changed: 32 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -22,9 +22,9 @@ def initialize(client, plugin)
2222

2323
target_field = plugin.params["target"]
2424
if target_field
25-
def self.apply_target(path) = "[#{target_field}][#{path}]"
25+
def self.apply_target(path); "[#{target_field}][#{path}]"; end
2626
else
27-
def self.apply_target(path) = path
27+
def self.apply_target(path); path; end
2828
end
2929

3030
@query = plugin.params["query"]
@@ -77,11 +77,16 @@ def retryable(job_name, &block)
7777
# @param output_queue [Queue] The queue to push processed events to
7878
def process_response(columns, values, output_queue)
7979
column_specs = columns.map { |column| ColumnSpec.new(column) }
80+
sub_element_mark_map = mark_sub_elements(column_specs)
81+
multi_fields = sub_element_mark_map.filter_map { |key, val| key.name if val == true }
82+
logger.warn("Multi-fields found in ES|QL result and they will not be available in the event. Please use `RENAME` command if you want to include them.", { :detected_multi_fields => multi_fields }) if multi_fields.any?
83+
8084
values.each do |row|
8185
event = column_specs.zip(row).each_with_object(LogStash::Event.new) do |(column, value), event|
8286
# `unless value.nil?` is a part of `drop_null_columns` that if some of columns' values are not `nil`, `nil` values appear
8387
# we should continuously filter out them to achieve full `drop_null_columns` on each individual row (ideal `LIMIT 1` result)
84-
unless value.nil?
88+
# we also exclude sub-elements of main field
89+
if value && sub_element_mark_map[column] == false
8590
field_reference = apply_target(column.field_reference)
8691
event.set(field_reference, ESQL_PARSERS_BY_TYPE[column.type].call(value))
8792
end
@@ -95,6 +100,30 @@ def process_response(columns, values, output_queue)
95100
output_queue << failed_event
96101
end
97102
end
103+
104+
# Determines whether each column in a collection is a nested sub-element (example "user.age")
105+
# of another column in the same collection (example "user").
106+
#
107+
# @param columns [Array<ColumnSpec>] An array of objects with a `name` attribute representing field paths.
108+
# @return [Hash<ColumnSpec, Boolean>] A hash mapping each column to `true` if it is a sub-element of another field, `false` otherwise.
109+
# Time complexity: (O(NlogN+N*K)) where K is the number of conflict depth
110+
# without (`prefix_set`) memoization, it would be O(N^2)
111+
def mark_sub_elements(columns)
112+
# Sort columns by name length (ascending)
113+
sorted_columns = columns.sort_by { |c| c.name.length }
114+
prefix_set = Set.new # memoization set
115+
116+
sorted_columns.each_with_object({}) do |column, memo|
117+
# Split the column name into parts (e.g., "user.profile.age" → ["user", "profile", "age"])
118+
parts = column.name.split('.')
119+
120+
# Generate all possible parent prefixes (e.g., "user", "user.profile")
121+
# and check if any parent prefix exists in the set
122+
parent_prefixes = (0...parts.size - 1).map { |i| parts[0..i].join('.') }
123+
memo[column] = parent_prefixes.any? { |prefix| prefix_set.include?(prefix) }
124+
prefix_set.add(column.name)
125+
end
126+
end
98127
end
99128

100129
# Class representing a column specification in the ESQL response['columns']

spec/inputs/elasticsearch_esql_spec.rb

Lines changed: 30 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,6 @@
5454
before do
5555
allow(esql_executor).to receive(:retryable).and_yield
5656
allow(client).to receive_message_chain(:esql, :query).and_return(response)
57-
allow(plugin).to receive(:decorate_event)
5857
end
5958

6059
it "executes the ESQL query and processes the results" do
@@ -87,7 +86,6 @@
8786
before do
8887
allow(esql_executor).to receive(:retryable).and_yield
8988
allow(client).to receive_message_chain(:esql, :query).and_return(response)
90-
allow(plugin).to receive(:decorate_event)
9189
allow(response).to receive(:headers).and_return({})
9290
end
9391

@@ -125,6 +123,36 @@
125123
end
126124
end
127125
end
126+
127+
context "when sub-elements occur in the result" do
128+
let(:response) { {
129+
'values' => [[50, 1, 100], [50, 0, 1000], [50, 9, 99999]],
130+
'columns' =>
131+
[
132+
{ 'name' => 'time', 'type' => 'long' },
133+
{ 'name' => 'time.min', 'type' => 'long' },
134+
{ 'name' => 'time.max', 'type' => 'long' },
135+
]
136+
} }
137+
138+
before do
139+
allow(esql_executor).to receive(:retryable).and_yield
140+
allow(client).to receive_message_chain(:esql, :query).and_return(response)
141+
allow(response).to receive(:headers).and_return({})
142+
end
143+
144+
it "includes 1st depth elements into event" do
145+
esql_executor.do_run(output_queue, plugin_config["query"])
146+
147+
expect(output_queue.size).to eq(3)
148+
3.times do
149+
event = output_queue.pop
150+
expect(event.get('time')).to eq(50)
151+
expect(event.get('[time][min]')).to eq(nil)
152+
expect(event.get('[time][max]')).to eq(nil)
153+
end
154+
end
155+
end
128156
end
129157

130158
describe "#column spec" do

0 commit comments

Comments
 (0)