|
| 1 | +//// |
| 2 | +Licensed to the Apache Software Foundation (ASF) under one or more |
| 3 | +contributor license agreements. See the NOTICE file distributed with |
| 4 | +this work for additional information regarding copyright ownership. |
| 5 | +The ASF licenses this file to You under the Apache License, Version 2.0 |
| 6 | +(the "License"); you may not use this file except in compliance with |
| 7 | +the License. You may obtain a copy of the License at |
| 8 | + |
| 9 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 10 | + |
| 11 | +Unless required by applicable law or agreed to in writing, software |
| 12 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 13 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 14 | +See the License for the specific language governing permissions and |
| 15 | +limitations under the License. |
| 16 | +//// |
| 17 | +
|
| 18 | +== *Specification for the declarative `match()` step* |
| 19 | +
|
| 20 | +This document outlines the specification for a new, declarative |
| 21 | +`match()` step in Gremlin. This step is designed to replace the existing |
| 22 | +imperative `match()` step, which has proven difficult for providers to |
| 23 | +optimize and for users to utilize effectively. The new step leverages |
| 24 | +familiar declarative graph pattern matching syntax to provide a more |
| 25 | +powerful, intuitive, and optimizable query experience. |
| 26 | +
|
| 27 | +=== 1. Motivation |
| 28 | +
|
| 29 | +The current `match()` step in Gremlin is imperative, which requires |
| 30 | +graph providers to translate complex traversal logic into an optimizable |
| 31 | +form, a notoriously difficult task. Consequently, its adoption is low, |
| 32 | +and its performance is often suboptimal. |
| 33 | +
|
| 34 | +The proposed solution introduces a new `match()` step that accepts a |
| 35 | +query string from a standard declarative graph query language. This |
| 36 | +allows the underlying database to use its native query planner and |
| 37 | +optimizer to execute the pattern match efficiently, while Gremlin |
| 38 | +retains its role in composing the declarative query with broader |
| 39 | +imperative traversals. |
| 40 | +
|
| 41 | +''''' |
| 42 | +
|
| 43 | +=== 2. Core Concept |
| 44 | +
|
| 45 | +The `match()` step introduces a declarative pattern matching clause into |
| 46 | +a Gremlin traversal. The variables bound within the pattern are not |
| 47 | +returned directly but are added to the traversal’s path history, making |
| 48 | +them accessible to subsequent steps like `select()`. |
| 49 | +
|
| 50 | +The `match()` step can be used as both a start step on a |
| 51 | +`GraphTraversalSource` and as a mid-traversal step. |
| 52 | +
|
| 53 | +=== 3. Declarative Language |
| 54 | +
|
| 55 | +The `match()` step is language-agnostic by design, but it will |
| 56 | +standardize on a default language to ensure portability. |
| 57 | +
|
| 58 | +* *Default Language:* A restricted, read-only subset of *GQL* will be |
| 59 | +the default language. This subset will primarily support `MATCH` and |
| 60 | +`WHERE` clauses. The `RETURN` clause will *not* be supported in the |
| 61 | +default implementation (see Section 6). Queries written in the default |
| 62 | +GQL subset do not need a `queryLanguage` modulator. |
| 63 | +** Example: |
| 64 | +`g.match("MATCH (p:Person WHERE p.name = 'Stephen')-[:knows]->(friend)")` |
| 65 | +* *Provider-Specific Languages:* Providers may support other declarative |
| 66 | +languages (e.g., Cypher, GSQL, SQL++) via the `queryLanguage` modulator. |
| 67 | +** Example: `g.match("...").with("queryLanguage", "GSQL")` |
| 68 | +
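As a rough illustration of the language selection rule above, the following Java sketch resolves the language of a `match()` call from its `with()` options and falls back to the default when no `queryLanguage` key is present. The class `MatchLanguage`, the method `resolve`, and the identifier string `"GQL"` for the default are assumptions made for this example; the specification does not define them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the TinkerPop API.
final class MatchLanguage {
    static final String OPTION_KEY = "queryLanguage"; // modulator key named in the text
    static final String DEFAULT_LANGUAGE = "GQL";     // assumed identifier for the default

    // Returns the language requested via with("queryLanguage", ...), or the default.
    static String resolve(Map<String, Object> withOptions) {
        Object requested = withOptions.get(OPTION_KEY);
        return requested == null ? DEFAULT_LANGUAGE : requested.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> options = new HashMap<>();
        System.out.println(resolve(options)); // prints GQL
        options.put(OPTION_KEY, "GSQL");
        System.out.println(resolve(options)); // prints GSQL
    }
}
```

A provider that does not support the requested language would presumably reject the query at this point, but that behavior is not specified here.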
|
| 69 | +To aid vendors, the TinkerPop project should consider providing a |
| 70 | +reference ANTLR4 grammar for the default GQL dialect. |
| 71 | +
|
| 72 | +''''' |
| 73 | +
|
| 74 | +=== 4. Parameterization |
| 75 | +
|
| 76 | +To prevent query injection and improve performance by enabling query |
| 77 | +plan caching, parameterized queries are supported. Parameters are |
| 78 | +supplied using the existing `with()` modulator with a special key |
| 79 | +convention. |
| 80 | +
|
| 81 | +* *Convention:* A key in a `with()` modulator that is prefixed with a |
| 82 | +dollar sign (`$`) will be treated as a query parameter for the `match()` |
| 83 | +step. The prefix is removed to derive the parameter name. |
| 84 | +* *Example:* |
| 85 | ++ |
| 86 | +[source,groovy] |
| 87 | +---- |
| 88 | +g.match("MATCH (p:Person WHERE p.name = $personName)") |
| 89 | + .with("$personName", "Stephen") |
| 90 | +---- |
| 91 | +
|
| 92 | +This approach unifies parameter handling, allowing parameters to be |
| 93 | +defined locally for a specific `match()` step or globally on the |
| 94 | +`GraphTraversalSource`. If parameters are not explicitly provided via |
| 95 | +`with()`, an implicit lookup on remote server bindings may be performed. |
| 96 | +
|
| 97 | +''''' |
| 98 | +
|
| 99 | +=== 5. Execution Semantics |
| 100 | +
|
| 101 | +The `match()` step behaves similarly to the `V()` and `E()` steps. |
| 102 | +
|
| 103 | +* *Start Step:* When used as a start step (`g.match(...)`), it executes |
| 104 | +the pattern match against the entire graph. |
| 105 | +* *Mid-Traversal Step:* When used mid-traversal |
| 106 | +(`g.V(1).out().match(...)`), it *does not* operate on the incoming |
| 107 | +traversers. Instead, like `V()`, it ``resets'' the traversal and |
| 108 | +executes its pattern match against the *entire graph for each input |
| 109 | +parameter*. The incoming traversers are disregarded. |
| 110 | +
|
| 111 | +The step is restricted to executing *idempotent (read-only)* queries. |
| 112 | +
|
| 113 | +''''' |
| 114 | +
|
| 115 | +=== 6. Return Value and Data Access |
| 116 | +
|
| 117 | +A key aspect of this new design is its seamless integration with |
| 118 | +Gremlin’s existing projection and path-management mechanisms. |
| 119 | +
|
| 120 | +* *`match()` Step Return Value:* The `match()` step itself does not emit |
| 121 | +any elements. It produces an `Optional.empty()`. |
| 122 | +* *Accessing Matched Data:* All variables bound in the `MATCH` clause |
| 123 | +(e.g., `(p)`, `(friend)` in the examples) are automatically bound to the |
| 124 | +traversal’s path history. This data is then accessed using the |
| 125 | +`select()` step. This design eliminates the need for a `RETURN` clause |
| 126 | +and allows the user to continue chaining Gremlin steps naturally. |
| 127 | +
|
| 128 | +==== *Example Flow* |
| 129 | +
|
| 130 | +Consider a query to find a person and then, using Gremlin, calculate the |
| 131 | +sum of the weights of their outgoing `:knows` edges. |
| 132 | +
|
| 133 | +*Query:* |
| 134 | +
|
| 135 | +[source,groovy] |
| 136 | +---- |
| 137 | +g.match("MATCH (n:Person {name:'Cole'})-[e:knows]->(friend)") |
| 138 | + .select("e") |
| 139 | + .values("weight") |
| 140 | + .sum() |
| 141 | +---- |
| 142 | +
|
| 143 | +*Execution Breakdown:* |
| 144 | +
|
| 145 | +[arabic] |
| 146 | +. `match("...")`: Finds all patterns matching a person named 'Cole' with |
| 147 | +an outgoing `knows` edge. For each match, it binds the vertex `n`, the |
| 148 | +edge `e`, and the vertex `friend` to the path history. It returns |
| 149 | +`Optional.empty()`. |
| 150 | +. `select("e")`: Projects the element associated with the label `e`. |
| 151 | +The traversal stream now contains the `knows` edges. |
| 152 | +. `values("weight").sum()`: Standard Gremlin steps that operate on the |
| 153 | +stream of edges to calculate the sum of their `weight` properties. |
| 154 | +
|
| 155 | +This approach is elegant, highly composable, and allows providers to |
| 156 | +optimize the declarative query by analyzing subsequent `select()` steps |
| 157 | +to determine which variables are actually needed. |
| 158 | +
|
| 159 | +=== 7. Deprecation of Existing `match()` Step |
| 160 | +
|
| 161 | +The existing, imperative `match()` step will be marked as *deprecated* |
| 162 | +upon the introduction of this new specification and will be removed in a |
| 163 | +future major release of TinkerPop. |