A common query pattern in Cypher benchmarks is to count distinct nodes and return that count in the projection.
Issue
The following query works in other graph systems that support Cypher (Neo4j, Kuzu, Ladybug), but fails at the parsing stage in lance-graph.
MATCH (p:Person)-[:workAt]->(o:Organisation)
RETURN COUNT(DISTINCT p.id) AS num_e, o.id
ORDER BY num_e DESC
LIMIT 1
This fails:
Error: ValueError: Cypher parse error at position 74: Unexpected input after query: (DISTINCT p.id) AS num_e, o.id
ORDER BY num_e DESC
LIMIT 1
The usual workaround is to move the DISTINCT into a WITH clause, but that doesn't work in lance-graph either (until we have a new release) because of #102. It fails as shown below.
MATCH (p:Person)-[:workAt]->(o:Organisation)
WITH DISTINCT p.id AS pid, o.id AS oid
RETURN COUNT(pid) AS num_e, oid
ORDER BY num_e DESC
LIMIT 1
Returns:
Error: ValueError: Cypher parse error at position 0: Failed to parse Cypher query: Parsing Error: Error { input: "WITH DISTINCT p.id AS pid, o.id AS oid\n RETURN COUNT(pid) AS num_e, oid\n ORDER BY num_e DESC\n LIMIT 1\n ", code: Tag }
Script to repro
Here's a minimal script to repro:
from __future__ import annotations

import pyarrow as pa
from lance_graph import CypherQuery, GraphConfig


def main() -> None:
    # Minimal in-memory graph: Persons workAt Organisations.
    persons = pa.table({"id": [1, 2]})
    orgs = pa.table({"id": [10, 11], "type": ["company", "company"]})
    work_at = pa.table({"src": [1, 2, 1], "dst": [10, 10, 11]})

    cfg = (
        GraphConfig.builder()
        .with_node_label("Person", "id")
        .with_node_label("Organisation", "id")
        .with_relationship("workAt", "src", "dst")
        .build()
    )

    datasets = {
        "Person": persons,
        "Organisation": orgs,
        "workAt": work_at,
    }

    query = """
        MATCH (p:Person)-[:workAt]->(o:Organisation)
        RETURN COUNT(DISTINCT p.id) AS num_e, o.id
        ORDER BY num_e DESC
        LIMIT 1
    """
    print(query)

    try:
        result = CypherQuery(query).with_config(cfg).execute(datasets)
        print(result)
    except Exception as exc:
        print(f"Error: {type(exc).__name__}: {exc}")


if __name__ == "__main__":
    main()
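For reference, the result I'd expect the query to produce on this toy graph can be worked out without any graph engine. The sketch below is plain Python over the same `src`/`dst` edge lists as the repro script. The helper name `top_org_by_distinct_persons` and the `"o.id"` output key are made up for illustration and are not part of the lance-graph API.

```python
from collections import defaultdict

# Same toy edges as the repro script: (Person id, Organisation id) pairs.
src = [1, 2, 1]
dst = [10, 10, 11]


def top_org_by_distinct_persons(src, dst):
    """Hypothetical helper: emulate COUNT(DISTINCT p.id) grouped by o.id."""
    persons_per_org = defaultdict(set)
    for person_id, org_id in zip(src, dst):
        # A set keeps each person at most once per organisation (DISTINCT).
        persons_per_org[org_id].add(person_id)

    counts = [(len(people), org_id) for org_id, people in persons_per_org.items()]

    # ORDER BY num_e DESC LIMIT 1: keep the row with the largest count.
    num_e, org_id = max(counts, key=lambda row: row[0])
    return {"num_e": num_e, "o.id": org_id}


expected = top_org_by_distinct_persons(src, dst)
print(expected)
```

On this data, organisation 10 has two distinct persons and organisation 11 has one, so the single returned row should be the one for organisation 10.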
Expectation
Counting distinct nodes via the above pattern is essential for some upcoming LDBC benchmarks I plan to run in lance-graph. I think this would be a great addition to the query parser's repertoire, and I would really appreciate it if this particular issue could be prioritized, so that we can expand the benchmarks we test with lance-graph and draw more community members in. Thank you!
cc @ChunxuTang @beinan