From 45d81cff3c95481a722d9a66bf75e491019d96a1 Mon Sep 17 00:00:00 2001
From: Aitor Miralles i Ruano
Date: Thu, 14 May 2026 08:29:41 +0200
Subject: [PATCH] Add BigQuery Graph basics skill

---
 skills/cloud/bigquery-graph-basics/SKILL.md   | 60 +++++++++++++
 .../references/gql_syntax.md                  | 89 +++++++++++++++++++
 .../references/schema_design.md               | 45 ++++++++++
 .../references/visualization_guide.md         | 68 ++++++++++++++
 4 files changed, 262 insertions(+)
 create mode 100644 skills/cloud/bigquery-graph-basics/SKILL.md
 create mode 100644 skills/cloud/bigquery-graph-basics/references/gql_syntax.md
 create mode 100644 skills/cloud/bigquery-graph-basics/references/schema_design.md
 create mode 100644 skills/cloud/bigquery-graph-basics/references/visualization_guide.md

diff --git a/skills/cloud/bigquery-graph-basics/SKILL.md b/skills/cloud/bigquery-graph-basics/SKILL.md
new file mode 100644
index 0000000000..e78babf8f0
--- /dev/null
+++ b/skills/cloud/bigquery-graph-basics/SKILL.md
@@ -0,0 +1,60 @@
+---
+name: bigquery-graph-basics
+description: Use for creating and managing BigQuery graphs, writing Graph Query Language (GQL) queries, optimizing graph schemas, and visualizing graph results in BigQuery Studio or external tools.
+---
+
+# BigQuery Graph Basics
+
+BigQuery Graph lets you use the analytical power of BigQuery to perform graph analysis on a large scale. When you model your data as a graph with nodes and edges, you can use Graph Query Language (GQL) to find complex, hidden relationships between data points that would be challenging to find using SQL.
+
+You can create node and edge tables directly from tables or views that store entities and relationships between entities. You don't need to modify your existing workflows or replicate your data to use it in graph queries.
+
+BigQuery Graph supports a graph query interface compatible with the ISO GQL standard and the ISO Property Graph Queries (SQL/PGQ) standard.
This provides you with interoperability between relational and graph models by combining well-established SQL capabilities with the expressiveness of graph pattern matching.
+
+## Core Workflows
+
+### 1. Creating a Property Graph
+When asked to set up a graph, follow these steps:
+- Identify node and edge tables.
+- Define keys and relationships.
+- Use `CREATE PROPERTY GRAPH`.
+- **Reference**: See [schema_design.md](references/schema_design.md) for DDL patterns and best practices.
+
+### 2. Querying with GQL
+When asked to perform graph queries:
+- **Direct GQL Syntax (Preferred)**: Use the top-level `GRAPH` statement.
+- Formulate patterns using ASCII-art syntax `(n)-[e]->(m)`.
+- Use `MATCH`, `WHERE`, and `RETURN` clauses.
+- **Reference**: See [gql_syntax.md](references/gql_syntax.md) for detailed syntax and example queries.
+
+### 3. Optimization and Best Practices
+- Advise on clustering underlying tables by keys.
+- Recommend bounding variable-length paths (e.g., `{1,5}`) to avoid performance issues.
+- **Reference**: See [schema_design.md](references/schema_design.md) for performance tips.
+
+### 4. Visualizing Results
+- **BigQuery Studio**: Results MUST be returned using `TO_JSON` for the Graph tab to function correctly.
+- Provide Python snippets for `pyvis` or `networkx` for custom visualizations.
+- **Reference**: See [visualization_guide.md](references/visualization_guide.md) for tools and interactive examples.
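The querying and visualization workflows above could be sketched as a small helper that assembles a visualization-ready GQL string. This is a minimal sketch and not part of the skill itself: the function name `build_visualization_query` and its parameters are hypothetical, and the helper only builds query text (it does not call BigQuery).

```python
def build_visualization_query(graph_name: str, limit: int = 100) -> str:
    """Build a single-hop GQL query whose results are wrapped in TO_JSON.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    if "`" in graph_name:
        # The name is wrapped in backticks below, so reject embedded ones.
        raise ValueError("graph_name must not contain backticks")
    return "\n".join(
        [
            f"GRAPH `{graph_name}`",
            "MATCH (n)-[e]->(m)",
            # TO_JSON is what the skill requires for the Graph tab.
            "RETURN TO_JSON(n) AS source, TO_JSON(e) AS edge, TO_JSON(m) AS target",
            f"LIMIT {limit}",
        ]
    )


if __name__ == "__main__":
    print(build_visualization_query("my_dataset.social_graph", limit=50))
```

Keeping the `LIMIT` mandatory in the helper reflects the skill's advice to bound what is sent to the Graph tab; whether a given limit is appropriate depends on the graph.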
+
+## Quick Start Examples
+
+### Define a Social Graph
+```sql
+CREATE PROPERTY GRAPH `my_dataset.social_graph`
+NODE TABLES ( `my_dataset.users` KEY (uid) LABEL User )
+EDGE TABLES ( `my_dataset.follows` SOURCE KEY (follower) REFERENCES users (uid) DESTINATION KEY (followed) REFERENCES users (uid) LABEL Follows );
+```
+
+### Direct GQL Query for Visualization
+```sql
+GRAPH `my_dataset.social_graph`
+MATCH (n)-[e]->(m)
+RETURN TO_JSON(n) AS source, TO_JSON(e) AS edge, TO_JSON(m) AS target
+LIMIT 100
+```
+
+## Important Notes
+- Always remind the user that graphs and tables must be in the same region.
+- Property graphs are logical views; updates to tables are immediately visible.
+- Avoid `GRAPH_TABLE` unless specifically needing to JOIN graph results with standard SQL tables.
diff --git a/skills/cloud/bigquery-graph-basics/references/gql_syntax.md b/skills/cloud/bigquery-graph-basics/references/gql_syntax.md
new file mode 100644
index 0000000000..2c9f31d46e
--- /dev/null
+++ b/skills/cloud/bigquery-graph-basics/references/gql_syntax.md
@@ -0,0 +1,89 @@
+# BigQuery Graph GQL Syntax Reference
+
+This reference covers the Graph Query Language (GQL) supported by BigQuery.
+
+## Direct GQL Execution (Recommended)
+GQL queries should be executed directly as top-level statements in BigQuery. This is the modern and preferred way to interact with graphs.
+
+### Visualization Syntax
+To leverage the **Graph tab** in BigQuery Studio, you must return graph entities using `TO_JSON`.
+
+```sql
+GRAPH project.dataset.graph_name
+MATCH (n)-[e]->(m)
+RETURN TO_JSON(n) AS node_a, TO_JSON(e) AS edge, TO_JSON(m) AS node_b
+LIMIT 100
+```
+
+## GQL Core Clauses
+
+### MATCH
+Used to specify the graph pattern to search for.
+- **Node pattern**: `(variable:Label {property: value})`
+- **Edge pattern**: `-[variable:Label]->`, `<-[variable:Label]-`, `-[variable:Label]-`
+- **Relationship**: `(n1)-[e]->(n2)`
+
+### WHERE
+Filters nodes, edges, or paths.
+- `WHERE n.age > 21`
+- `WHERE e.weight >= 0.5`
+
+### RETURN
+Specifies the elements to return in the result set.
+- `RETURN n.name AS name, e.type AS type` (Standard tabular result)
+- `RETURN TO_JSON(n)` (Required for Graph visualization tab)
+
+### NEXT
+Used to chain multiple linear query statements; the results of one statement become the input to the next.
+
+## Advanced Patterns
+
+### Variable-Length Paths (Quantified Path Patterns)
+BigQuery GQL uses **Standard GQL Quantified Path Patterns**. Variable-length paths are defined by wrapping a pattern in parentheses followed by a quantifier like `{min, max}`.
+
+**Note**: In quantified path patterns, variables within the parentheses (like `e` and `n2` below) become **ARRAYS** of nodes/edges.
+
+- **Length 1 to 3**: `MATCH (n1) ( -[e]-> (n2) ){1,3}`
+- **Fixed length 3**: `MATCH (n1) ( -[e]-> (n2) ){3}`
+- **Unbounded (at least 1)**: `MATCH (n1) ( -[e]-> (n2) )+`
+
+### Filtering and Returning Path Arrays
+When using quantified paths, use array functions to filter or access specific hops:
+
+```sql
+GRAPH project.dataset.graph
+MATCH (src:Entity) ( -[e]-> (dest:Entity) ){1,5}
+WHERE src.name = 'START_NODE'
+  AND dest[OFFSET(ARRAY_LENGTH(dest)-1)].type = 'TABLE' -- Filter last node
+RETURN TO_JSON(src), TO_JSON(e), TO_JSON(dest)
+```
+
+### Multiple Matches
+```sql
+GRAPH project.dataset.graph
+MATCH (a)-[:Knows]->(b)
+MATCH (b)-[:Knows]->(c)
+RETURN a.name, c.name
+```
+
+## Example Queries
+
+### Finding Mutual Friends (Visualization Ready)
+```sql
+GRAPH `my_project.my_dataset.social_graph`
+MATCH (u1:User)-[e1:Friend]->(common:User)<-[e2:Friend]-(u2:User)
+WHERE u1.user_id = 1 AND u2.user_id = 2
+RETURN TO_JSON(u1), TO_JSON(e1), TO_JSON(common), TO_JSON(e2), TO_JSON(u2)
+```
+
+## Integration with SQL (GRAPH_TABLE)
+**Note**: Only use `GRAPH_TABLE` if you need to JOIN graph results with standard BigQuery tables. For exploration and visualization, use the `GRAPH` statement directly.
+
+```sql
+SELECT *
+FROM GRAPH_TABLE(
+  `project.dataset.graph_name`
+  MATCH (n)-[e]->(m)
+  RETURN n.name AS source, m.name AS target
+)
+```
diff --git a/skills/cloud/bigquery-graph-basics/references/schema_design.md b/skills/cloud/bigquery-graph-basics/references/schema_design.md
new file mode 100644
index 0000000000..f8fb8c76d1
--- /dev/null
+++ b/skills/cloud/bigquery-graph-basics/references/schema_design.md
@@ -0,0 +1,45 @@
+# BigQuery Graph Schema Design Guide
+
+This guide covers the creation and optimization of property graphs in BigQuery.
+
+## Creating a Property Graph (DDL)
+
+The `CREATE PROPERTY GRAPH` statement defines the logical graph over existing tables.
+
+```sql
+CREATE PROPERTY GRAPH `project.dataset.graph_name`
+NODE TABLES (
+  `project.dataset.nodes_table`
+    KEY (id_column)
+    LABEL MyLabel
+    PROPERTIES (col1, col2) -- Optional: list specific columns or use PROPERTIES ALL COLUMNS
+)
+EDGE TABLES (
+  `project.dataset.edges_table`
+    KEY (edge_id)
+    SOURCE KEY (from_id) REFERENCES nodes_table (id_column)
+    DESTINATION KEY (to_id) REFERENCES nodes_table (id_column)
+    LABEL MyRelationship
+    PROPERTIES ALL COLUMNS
+);
+```
+
+## Schema Best Practices
+
+### 1. Data Modeling
+- **Entities as Nodes**: Any object with a unique identity should be a node.
+- **Relationships as Edges**: Any interaction or connection between entities should be an edge.
+- **Properties vs. Edges**: Use properties for metadata (e.g., `user.signup_date`). Use edges for structural connections (e.g., `user -[Purchased]-> product`).
+
+### 2. Performance Optimization
+- **Clustering**: Cluster the underlying node and edge tables by their keys (IDs, Source IDs, Destination IDs). This significantly improves the performance of graph traversals.
+- **Partitioning**: If your data has a temporal component (e.g., transaction logs), partition underlying tables by date.
+- **Key Uniqueness**: Ensure keys are unique and non-null in the underlying tables.
BigQuery Graph assumes integrity; violations can lead to incorrect results or query failures.
+
+### 3. Logical Structure
+- **Labels**: Use descriptive labels (e.g., `Customer`, `Order`, `LineItem`). A table can be mapped to multiple labels if needed.
+- **Reusability**: You can define multiple graphs over the same set of underlying tables for different use cases.
+
+## Limitations
+- Graphs and underlying tables must reside in the same region.
+- Property graphs are logical views; they do not store a separate copy of the data.
diff --git a/skills/cloud/bigquery-graph-basics/references/visualization_guide.md b/skills/cloud/bigquery-graph-basics/references/visualization_guide.md
new file mode 100644
index 0000000000..a00dda54f1
--- /dev/null
+++ b/skills/cloud/bigquery-graph-basics/references/visualization_guide.md
@@ -0,0 +1,68 @@
+# BigQuery Graph Visualization Guide
+
+Visualizing graphs helps in identifying clusters, influencers, and hidden patterns.
+
+## Integrated Visualization: BigQuery Studio
+
+BigQuery Studio provides a built-in graph explorer. To leverage the interactive **Graph** tab, results must be returned as JSON objects using the `TO_JSON` function within a direct `GRAPH` statement.
+
+### Optimized Visualization Query
+To see nodes and edges correctly in the Graph tab, use the following syntax:
+
+```sql
+GRAPH `project.dataset.graph_name`
+MATCH (n)-[e]->(m)
+RETURN TO_JSON(n) AS node_a, TO_JSON(e) AS edge, TO_JSON(m) AS node_b
+LIMIT 100
+```
+
+1. **Run the query**: Execute the SQL above in BigQuery Studio.
+2. **Switch to Graph View**: In the results pane, click the **Graph** tab.
+3. **Explore**:
+   - **Nodes**: Hover to see properties (from the JSON metadata).
+   - **Edges**: Visualized as directed links.
+   - **Layout**: Use the UI controls to change the layout (Force-directed, Circular, etc.).
+
+## Why use TO_JSON?
+BigQuery's Graph tab expects the full graph element structure to enable features like:
+- **Property Inspection**: Seeing all metadata associated with a node/edge.
+- **Label Recognition**: Automatic coloring based on labels.
+- **Connectivity**: Using internal identifiers to maintain the graph structure in the canvas.
+
+## External Visualization Tools
+...
+
+
+### 1. Looker / Looker Studio
+- Use `GRAPH_TABLE` queries as data sources.
+- While Looker is primarily tabular, you can use custom visualizations (D3.js, Network charts) to render the graph data.
+
+### 2. Python Notebooks (Colab Enterprise / Vertex AI)
+Use Python libraries for interactive visualization:
+- **Pyvis**: Great for interactive, draggable graphs.
+- **NetworkX**: For graph analysis and static plotting (with Matplotlib).
+- **Graphistry**: For high-performance visualization of large graphs.
+
+**Example Python Snippet:**
+```python
+from google.cloud import bigquery
+import networkx as nx
+from pyvis.network import Network
+
+client = bigquery.Client()
+query = """
+SELECT source_node, target_node, weight
+FROM GRAPH_TABLE(...)
+"""
+df = client.query(query).to_dataframe()
+
+G = nx.from_pandas_edgelist(df, 'source_node', 'target_node', ['weight'])
+net = Network(notebook=True)
+net.from_nx(G)
+net.show("graph.html")
+```
+
+## Best Practices for Visualization
+- **Sub-sampling**: Avoid visualizing millions of nodes at once. Use GQL filters to isolate a specific neighborhood or community.
+- **Sizing/Coloring**: Use properties (e.g., `amount`, `weight`) to scale node size or color edges to make patterns obvious.
+- **Layouts**: Use force-directed layouts for general exploration and hierarchical layouts for tree-like structures (e.g., org charts).
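The sub-sampling and sizing practices above could be sketched client-side as follows. This is a minimal illustration under stated assumptions, not part of the skill: the helper names `neighborhood_edges` and `node_sizes` are hypothetical, edges are assumed to arrive as `(source, target, weight)` tuples (mirroring the `source_node`, `target_node`, `weight` columns in the snippet above), and hop distance is measured ignoring edge direction. Filtering in GQL before fetching results remains the first choice; this only trims what is already in memory.

```python
from collections import defaultdict, deque


def neighborhood_edges(edges, start, max_hops):
    """Keep edges whose endpoints are both within max_hops of start.

    Hypothetical helper; distance ignores edge direction.
    """
    adjacency = defaultdict(set)
    for source, target, _weight in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    # Breadth-first search that stops expanding at max_hops.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    return [e for e in edges if e[0] in distance and e[1] in distance]


def node_sizes(edges, min_size=10.0, max_size=40.0):
    """Scale each node's size linearly by the total weight of its edges."""
    totals = defaultdict(float)
    for source, target, weight in edges:
        totals[source] += weight
        totals[target] += weight
    if not totals:
        return {}
    low, high = min(totals.values()), max(totals.values())
    span = high - low
    return {
        node: min_size if span == 0 else min_size + (total - low) / span * (max_size - min_size)
        for node, total in totals.items()
    }


if __name__ == "__main__":
    sample = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 1.0), ("x", "y", 5.0)]
    nearby = neighborhood_edges(sample, "a", max_hops=2)
    print(nearby)
    print(node_sizes(nearby))
```

The resulting size mapping can then be handed to whichever renderer is in use; how sizes are attached to nodes depends on that library and is left out here.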