Commit a63b67c

Merge branch 'pr-3232' into 3.8-dev
2 parents b688a06 + 366f92d commit a63b67c

1 file changed, 163 additions and 0 deletions
@@ -0,0 +1,163 @@
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////

== *Specification for the declarative `match()` step*

This document specifies a new, declarative `match()` step for Gremlin.
The step is designed to replace the existing imperative `match()` step,
which has proven difficult for providers to optimize and for users to
use effectively. The new step builds on familiar declarative graph
pattern matching syntax to provide a more powerful, intuitive, and
optimizable query experience.

=== 1. Motivation

The current `match()` step in Gremlin is imperative, so graph providers
must translate complex traversal logic into an optimizable form, a
notoriously difficult task. Consequently, its adoption is low and its
performance is often suboptimal.

The proposed solution introduces a new `match()` step that accepts a
query string written in a standard declarative graph query language.
This allows the underlying database to use its native query planner and
optimizer to execute the pattern match efficiently, while Gremlin
retains its role in composing the declarative query with broader
imperative traversals.

'''''

=== 2. Core Concept

The `match()` step introduces a declarative pattern matching clause into
a Gremlin traversal. The variables bound within the pattern are not
returned directly but are added to the traversal’s path history, making
them accessible to subsequent steps like `select()`.

The `match()` step can be used both as a start step on a
`GraphTraversalSource` and as a mid-traversal step.

=== 3. Declarative Language

The `match()` step is language-agnostic by design, but it standardizes
on a default language to ensure portability.

* *Default Language:* A restricted, read-only subset of *GQL* will be
the default language. This subset will primarily support `MATCH` and
`WHERE` clauses. The `RETURN` clause will *not* be supported in the
default implementation (see Section 6). When a provider implements the
default GQL subset, the language does not need to be specified via a
modulator.
** Example:
`g.match("MATCH (p:Person WHERE p.name = 'Stephen')-[:knows]->(friend)")`
* *Provider-Specific Languages:* Providers may support other declarative
languages (e.g., Cypher, GSQL, SQL++) via the `queryLanguage` modulator.
** Example: `g.match("...").with("queryLanguage", "GSQL")`

To aid vendors, the TinkerPop project should consider providing a
reference ANTLR4 grammar for the default GQL dialect.
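
The language selection described in the two bullets above could be
modeled roughly as follows. This is a minimal Python sketch, not
TinkerPop code: the helper name `resolve_query_language`, the
`supported` argument, the `"GQL"` label for the default dialect, and the
error raised for an unsupported language are all assumptions made for
illustration.

[source,python]
----
DEFAULT_LANGUAGE = "GQL"  # assumed label for the default dialect


def resolve_query_language(modulators, supported=(DEFAULT_LANGUAGE,)):
    """Pick the language for one match() step from its with() modulators.

    Hypothetical helper: neither the name nor the error behavior comes
    from the specification.
    """
    # Without a `queryLanguage` modulator, the default dialect applies.
    language = modulators.get("queryLanguage", DEFAULT_LANGUAGE)
    if language not in supported:
        raise ValueError(f"query language {language!r} is not supported")
    return language


# A step with no modulators falls back to the default dialect.
print(resolve_query_language({}))
# A provider that also accepts GSQL honors the modulator.
print(resolve_query_language({"queryLanguage": "GSQL"},
                             supported=("GQL", "GSQL")))
----

Rejecting an unknown language outright is only one possible policy; the
specification leaves that behavior to the provider.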

'''''

=== 4. Parameterization

To prevent query injection and to improve performance through query
plan caching, the step supports parameterized queries. Parameters are
supplied using the existing `with()` modulator with a special key
convention.

* *Convention:* A key in a `with()` modulator that is prefixed with a
dollar sign (`$`) is treated as a query parameter for the `match()`
step. The prefix is removed to derive the parameter name.
* *Example:*
+
[source,groovy]
----
// Single-quoted so that Groovy does not interpolate $personName as a GString.
g.match('MATCH (p:Person WHERE p.name = $personName)')
  .with('$personName', 'Stephen')
----

This approach unifies parameter handling, allowing parameters to be
defined locally for a specific `match()` step or globally on the
`GraphTraversalSource`. If parameters are not explicitly provided via
`with()`, an implicit lookup on remote server bindings may be performed.
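
One way to picture the `$` key convention and the lookup order is the
following Python sketch. It is illustrative only: `resolve_parameters`
and its arguments are hypothetical names, the regular expression is a
naive stand-in for a real query parser, and the precedence it applies
(step-level `with()` over source-level `with()` over server bindings) is
an assumption, since the specification does not state how conflicts are
resolved.

[source,python]
----
import re

# Naive token pattern; a real parser would skip `$` inside string literals.
PARAM_TOKEN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def resolve_parameters(query, step_with, source_with=None, server_bindings=None):
    """Map each $name used in `query` to a value (hypothetical helper)."""
    explicit = {}
    # Source-level modulators first, so step-level ones override them
    # (assumed precedence).
    for scope in (source_with or {}, step_with or {}):
        for key, value in scope.items():
            if isinstance(key, str) and key.startswith("$"):
                explicit[key[1:]] = value  # strip the prefix to get the name

    resolved = {}
    for name in PARAM_TOKEN.findall(query):
        if name in explicit:
            resolved[name] = explicit[name]
        elif server_bindings and name in server_bindings:
            # Implicit fallback to remote server bindings.
            resolved[name] = server_bindings[name]
        else:
            raise KeyError(f"no value supplied for query parameter ${name}")
    return resolved


query = "MATCH (p:Person WHERE p.name = $personName)"
print(resolve_parameters(query, {"$personName": "Stephen"}))
----

Keys without the `$` prefix, such as `queryLanguage`, are left alone, so
ordinary modulators and query parameters can share one `with()` chain.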

'''''

=== 5. Execution Semantics

The `match()` step behaves similarly to the `V()` and `E()` steps.

* *Start Step:* When used as a start step (`g.match(...)`), it executes
the pattern match against the entire graph.
* *Mid-Traversal Step:* When used mid-traversal
(`g.V(1).out().match(...)`), it *does not* operate on the incoming
traversers. Instead, like `V()`, it "resets" the traversal and
executes its pattern match against the *entire graph for each input
parameter*. The incoming traversers are disregarded.

The step is restricted to executing *idempotent (read-only)* queries.
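
The difference between the two positions can be sketched in Python as
below. The sketch reads "for each input parameter" as "once per incoming
traverser", which is how a mid-traversal `V()` behaves; that reading is
an assumption, and `match_start`, `match_mid`, and `knows_pattern` are
hypothetical names standing in for a provider's pattern matcher.

[source,python]
----
def match_start(run_pattern):
    """Start step: run the pattern once against the whole graph."""
    yield from run_pattern()


def match_mid(incoming, run_pattern):
    """Mid-traversal step: one full-graph match per incoming traverser."""
    for _ignored in incoming:
        # The incoming value is never handed to the pattern.
        yield from run_pattern()


def knows_pattern():
    """Hypothetical matcher that always finds the same two bindings."""
    yield {"p": "Stephen", "friend": "Marko"}
    yield {"p": "Stephen", "friend": "Cole"}


start_results = list(match_start(knows_pattern))
mid_results = list(match_mid(["v2", "v3", "v4"], knows_pattern))
print(len(start_results), len(mid_results))
----

Under this reading, three incoming traversers triple the result count
without changing which bindings are found, because the traversers only
trigger the match and never constrain it.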

'''''

=== 6. Return Value and Data Access

A key aspect of this design is its integration with Gremlin’s existing
projection and path-management mechanisms.

* *`match()` Step Return Value:* The `match()` step itself does not emit
any graph elements; it produces `Optional.empty()`.
* *Accessing Matched Data:* All variables bound in the `MATCH` clause
(e.g., `(p)`, `(friend)` in the examples) are automatically bound to the
traversal’s path history. This data is then accessed using the
`select()` step. This design eliminates the need for a `RETURN` clause
and allows the user to continue chaining Gremlin steps naturally.

==== *Example Flow*

Consider a query to find a person and then, using Gremlin, calculate the
sum of the weights of their outgoing `:knows` edges.

*Query:*

[source,groovy]
----
g.match("MATCH (n:Person {name:'Cole'})-[e:knows]->(friend)")
  .select("e")
  .values("weight")
  .sum()
----

*Execution Breakdown:*

[arabic]
. `match("...")`: Finds all patterns matching a person named 'Cole' with
an outgoing `knows` edge. For each match, it binds the vertex `n`, the
edge `e`, and the vertex `friend` to the path history. It returns
`Optional.empty()`.
. `select("e")`: Projects the element associated with the label `e`.
The traversal stream now contains the `knows` edges.
. `values("weight").sum()`: Standard Gremlin steps that operate on the
stream of edges to calculate the sum of their `weight` properties.

This approach is highly composable and allows providers to optimize the
declarative query by analyzing subsequent `select()` steps to determine
which variables are actually needed.
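
The flow above, including the idea of materializing only the variables
that later steps select, can be imitated with plain Python. Everything
here is a toy model under stated assumptions: the `EDGES` data, the
`match_knows` and `select` helpers, and the `needed` argument are
hypothetical and do not reflect how a provider would implement the step.

[source,python]
----
# Hypothetical toy graph: vertices are represented by their names.
EDGES = [
    {"label": "knows", "out": "Cole", "in": "Stephen", "weight": 0.5},
    {"label": "knows", "out": "Cole", "in": "Marko", "weight": 1.0},
    {"label": "created", "out": "Cole", "in": "lop", "weight": 0.4},
    {"label": "knows", "out": "Stephen", "in": "Marko", "weight": 0.2},
]


def match_knows(name, needed=("n", "e", "friend")):
    """Yield one binding map per match of (n {name})-[e:knows]->(friend).

    `needed` models a provider that keeps only the variables that
    subsequent select() steps actually use.
    """
    for edge in EDGES:
        if edge["label"] == "knows" and edge["out"] == name:
            full = {"n": edge["out"], "e": edge, "friend": edge["in"]}
            yield {label: full[label] for label in needed}


def select(bindings_stream, label):
    """Project one labeled value out of each binding map."""
    for bindings in bindings_stream:
        yield bindings[label]


# Only `e` is selected later, so only `e` needs to be bound.
edges = select(match_knows("Cole", needed=("e",)), "e")
total = sum(edge["weight"] for edge in edges)
print(total)
----

Passing `needed=("e",)` changes only what each binding map carries, not
which matches are found, which is the property that would make such
pruning safe.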

=== 7. Deprecation of Existing `match()` Step

The existing, imperative `match()` step will be marked as *deprecated*
upon the introduction of this new specification and will be removed in a
future major release of TinkerPop.
