60 changes: 60 additions & 0 deletions skills/cloud/bigquery-graph-basics/SKILL.md
@@ -0,0 +1,60 @@
---
name: bigquery-graph-basics
description: Use for creating and managing BigQuery graphs, writing Graph Query Language (GQL) queries, optimizing graph schemas, and visualizing graph results in BigQuery Studio or external tools.
---

# BigQuery Graph Basics

BigQuery Graph lets you use the analytical power of BigQuery to perform graph analysis on a large scale. When you model your data as a graph with nodes and edges, you can use Graph Query Language (GQL) to find complex, hidden relationships between data points that would be challenging to find using SQL.

You can create node and edge tables directly from tables or views that store entities and relationships between entities. You don't need to modify your existing workflows or replicate your data to use it in graph queries.

BigQuery Graph supports a graph query interface compatible with the ISO GQL standard and the ISO Property Graph Queries (SQL/PGQ) standard. This provides you with interoperability between relational and graph models by combining well-established SQL capabilities with the expressiveness of graph pattern matching.

## Core Workflows

### 1. Creating a Property Graph
When asked to set up a graph, follow these steps:
- Identify node and edge tables.
- Define keys and relationships.
- Use `CREATE PROPERTY GRAPH`.
- **Reference**: See [schema_design.md](references/schema_design.md) for DDL patterns and best practices.
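
The steps above can be sketched as a small Python helper that turns a description of node and edge tables into a `CREATE PROPERTY GRAPH` statement. This is only an illustration: `NodeTable`, `EdgeTable`, `short_name`, and `render_graph_ddl` are hypothetical names, and the generated DDL simply mirrors the form used in this skill's quick start example, including the assumption that `REFERENCES` takes the unqualified table name.

```python
from dataclasses import dataclass


@dataclass
class NodeTable:
    table: str  # dataset-qualified table name, e.g. "my_dataset.users"
    key: str    # column that uniquely identifies a node
    label: str


@dataclass
class EdgeTable:
    table: str
    source_key: str    # edge column pointing at the source node key
    dest_key: str      # edge column pointing at the destination node key
    source: NodeTable
    dest: NodeTable
    label: str


def short_name(table: str) -> str:
    """Last path segment, used after REFERENCES (assumption, see lead-in)."""
    return table.split(".")[-1]


def render_graph_ddl(graph: str, nodes: list[NodeTable], edges: list[EdgeTable]) -> str:
    """Render a CREATE PROPERTY GRAPH statement as a string."""
    node_defs = ",\n".join(
        f"    `{n.table}` KEY ({n.key}) LABEL {n.label}" for n in nodes
    )
    edge_defs = ",\n".join(
        f"    `{e.table}`\n"
        f"      SOURCE KEY ({e.source_key}) "
        f"REFERENCES {short_name(e.source.table)} ({e.source.key})\n"
        f"      DESTINATION KEY ({e.dest_key}) "
        f"REFERENCES {short_name(e.dest.table)} ({e.dest.key})\n"
        f"      LABEL {e.label}"
        for e in edges
    )
    return (
        f"CREATE PROPERTY GRAPH `{graph}`\n"
        f"  NODE TABLES (\n{node_defs}\n  )\n"
        f"  EDGE TABLES (\n{edge_defs}\n  );"
    )


users = NodeTable("my_dataset.users", "uid", "User")
follows = EdgeTable("my_dataset.follows", "follower", "followed", users, users, "Follows")
ddl = render_graph_ddl("my_dataset.social_graph", [users], [follows])
print(ddl)
```

Review the printed statement before running it; the helper does no validation of table or column names.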

### 2. Querying with GQL
When asked to perform graph queries:
- **Direct GQL Syntax (Preferred)**: Use the top-level `GRAPH` statement.
- Formulate patterns using ASCII-art syntax `(n)-[e]->(m)`.
- Use `MATCH`, `WHERE`, and `RETURN` clauses.
- **Reference**: See [gql_syntax.md](references/gql_syntax.md) for detailed syntax and example queries.

### 3. Optimization and Best Practices
- Advise on clustering underlying tables by keys.
- Recommend bounding variable-length paths with an explicit quantifier (e.g., `{1,5}`) to avoid performance issues.
- **Reference**: See [schema_design.md](references/schema_design.md) for performance tips.

### 4. Visualizing Results
- **BigQuery Studio**: Results MUST be returned using `TO_JSON` for the Graph tab to function correctly.
- Provide Python snippets for `pyvis` or `networkx` for custom visualizations.
- **Reference**: See [visualization_guide.md](references/visualization_guide.md) for tools and interactive examples.

## Quick Start Examples

### Define a Social Graph
```sql
CREATE PROPERTY GRAPH `my_dataset.social_graph`
  NODE TABLES (
    `my_dataset.users` KEY (uid) LABEL User
  )
  EDGE TABLES (
    `my_dataset.follows`
      SOURCE KEY (follower) REFERENCES users (uid)
      DESTINATION KEY (followed) REFERENCES users (uid)
      LABEL Follows
  );
```

### Direct GQL Query for Visualization
```sql
GRAPH `my_dataset.social_graph`
MATCH (n)-[e]->(m)
RETURN TO_JSON(n) AS source, TO_JSON(e) AS edge, TO_JSON(m) AS target
LIMIT 100
```

## Important Notes
- Always remind the user that graphs and tables must be in the same region.
- Property graphs are logical views; updates to tables are immediately visible.
- Avoid `GRAPH_TABLE` unless you specifically need to join graph results with standard SQL tables.
89 changes: 89 additions & 0 deletions skills/cloud/bigquery-graph-basics/references/gql_syntax.md
@@ -0,0 +1,89 @@
# BigQuery Graph GQL Syntax Reference

This reference covers the Graph Query Language (GQL) supported by BigQuery.

## Direct GQL Execution (Recommended)
Run GQL queries directly as top-level `GRAPH` statements in BigQuery. This is the preferred way to interact with graphs; reserve `GRAPH_TABLE` for cases that need to combine graph results with SQL.

### Visualization Syntax
To leverage the **Graph tab** in BigQuery Studio, you must return graph entities using `TO_JSON`.

```sql
GRAPH project.dataset.graph_name
MATCH (n)-[e]->(m)
RETURN TO_JSON(n) AS node_a, TO_JSON(e) AS edge, TO_JSON(m) AS node_b
LIMIT 100
```

## GQL Core Clauses

### MATCH
Used to specify the graph pattern to search for.
- **Node pattern**: `(variable:Label {property: value})`
- **Edge pattern**: `-[variable:Label]->`, `<-[variable:Label]-`, `-[variable:Label]-`
- **Relationship**: `(n1)-[e]->(n2)`

### WHERE
Filters nodes, edges, or paths.
- `WHERE n.age > 21`
- `WHERE e.weight >= 0.5`

### RETURN
Specifies the elements to return in the result set.
- `RETURN n.name AS name, e.type AS type` (Standard tabular result)
- `RETURN TO_JSON(n)` (Required for Graph visualization tab)

### NEXT
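Chains linear query statements: the columns returned by one statement's `RETURN` become the input to the statement after `NEXT`, which typically begins with another `MATCH`.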

## Advanced Patterns

### Variable-Length Paths (Quantified Path Patterns)
BigQuery GQL uses **Standard GQL Quantified Path Patterns**. Variable-length paths are defined by wrapping a pattern in parentheses followed by a quantifier like `{min, max}`.

**Note**: In quantified path patterns, variables declared inside the parentheses (like `e` and `n2` below) become **ARRAYS** of nodes/edges, with one element per hop.

- **Length 1 to 3**: `MATCH (n1) ( -[e]-> (n2) ){1,3}`
- **Fixed length 3**: `MATCH (n1) ( -[e]-> (n2) ){3}`
- **Unbounded (at least 1)**: `MATCH (n1) ( -[e]-> (n2) )+`
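
When queries are assembled in Python, one way to enforce bounded paths is to generate the quantifier through a helper that refuses unbounded or very wide ranges. The sketch below is illustrative only: `quantifier`, `bounded_pattern`, and the `MAX_HOPS_LIMIT` value of 10 are hypothetical choices, and the pattern text copies the form used in the list above.

```python
MAX_HOPS_LIMIT = 10  # hypothetical safety cap, tune for your data


def quantifier(min_hops: int, max_hops: int) -> str:
    """Return a bounded quantifier such as '{1,3}' or '{3}'."""
    if min_hops < 0 or max_hops < min_hops:
        raise ValueError("need 0 <= min_hops <= max_hops")
    if max_hops > MAX_HOPS_LIMIT:
        raise ValueError(f"max_hops must not exceed {MAX_HOPS_LIMIT}")
    if min_hops == max_hops:
        return f"{{{min_hops}}}"
    return f"{{{min_hops},{max_hops}}}"


def bounded_pattern(min_hops: int, max_hops: int) -> str:
    """Build a MATCH clause with a bounded quantified path pattern."""
    return f"MATCH (n1) ( -[e]-> (n2) ){quantifier(min_hops, max_hops)}"


print(bounded_pattern(1, 3))
print(bounded_pattern(3, 3))
```

The helper deliberately offers no way to emit `+` or `*`, so every generated pattern carries an upper bound.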

### Filtering and Returning Path Arrays
When using quantified paths, use array functions to filter or access specific hops:

```sql
GRAPH project.dataset.graph
MATCH (src:Entity) ( -[e]-> (dest:Entity) ){1,5}
WHERE src.name = 'START_NODE'
AND dest[OFFSET(ARRAY_LENGTH(dest)-1)].type = 'TABLE' -- Filter last node
RETURN TO_JSON(src), TO_JSON(e), TO_JSON(dest)
```

### Multiple Matches
```gql
GRAPH project.dataset.graph
MATCH (a)-[:Knows]->(b)
MATCH (b)-[:Knows]->(c)
RETURN a.name, c.name
```

## Example Queries

### Finding Mutual Friends (Visualization Ready)
```sql
GRAPH `my_project.my_dataset.social_graph`
MATCH (u1:User)-[e1:Friend]->(common:User)<-[e2:Friend]-(u2:User)
WHERE u1.user_id = 1 AND u2.user_id = 2
RETURN TO_JSON(u1), TO_JSON(e1), TO_JSON(common), TO_JSON(e2), TO_JSON(u2)
```

## Integration with SQL (GRAPH_TABLE)
**Note**: Only use `GRAPH_TABLE` if you need to JOIN graph results with standard BigQuery tables. For exploration and visualization, use the `GRAPH` statement directly.

```sql
SELECT * FROM GRAPH_TABLE(
  `project.dataset.graph_name`
  MATCH (n)-[e]->(m)
  RETURN n.name AS source, m.name AS target
)
```
45 changes: 45 additions & 0 deletions skills/cloud/bigquery-graph-basics/references/schema_design.md
@@ -0,0 +1,45 @@
# BigQuery Graph Schema Design Guide

This guide covers the creation and optimization of property graphs in BigQuery.

## Creating a Property Graph (DDL)

The `CREATE PROPERTY GRAPH` statement defines the logical graph over existing tables.

```sql
CREATE PROPERTY GRAPH `project.dataset.graph_name`
  NODE TABLES (
    `project.dataset.nodes_table`
      KEY (id_column)
      LABEL MyLabel
      PROPERTIES (col1, col2) -- Optional: list specific columns or use PROPERTIES ALL COLUMNS
  )
  EDGE TABLES (
    `project.dataset.edges_table`
      KEY (edge_id)
      SOURCE KEY (from_id) REFERENCES nodes_table (id_column)
      DESTINATION KEY (to_id) REFERENCES nodes_table (id_column)
      LABEL MyRelationship
      PROPERTIES ALL COLUMNS
  );
```

## Schema Best Practices

### 1. Data Modeling
- **Entities as Nodes**: Any object with a unique identity should be a node.
- **Relationships as Edges**: Any interaction or connection between entities should be an edge.
- **Properties vs. Edges**: Use properties for metadata (e.g., `user.signup_date`). Use edges for structural connections (e.g., `user -[Purchased]-> product`).

### 2. Performance Optimization
- **Clustering**: Cluster the underlying node and edge tables by their keys (IDs, Source IDs, Destination IDs). Graph traversals join on these keys, so clustering can reduce the data scanned per hop, for both `GRAPH` statements and `GRAPH_TABLE` queries.
- **Partitioning**: If your data has a temporal component (e.g., transaction logs), partition underlying tables by date.
- **Key Uniqueness**: Ensure keys are unique and non-null in the underlying tables. BigQuery Graph assumes integrity; violations can lead to incorrect results or query failures.
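
As a rough way to act on the key-uniqueness advice, the sketch below builds a standard SQL check that lists key values which are duplicated or null in an underlying table. `key_integrity_sql` is a hypothetical helper, and the table and column names are placeholders taken from the examples in this skill; an empty result from the generated query suggests the key is safe to use.

```python
import re

# Conservative identifier check so arbitrary text is never spliced into SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TABLE_PATH = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_]+){1,2}$")


def key_integrity_sql(table: str, key_columns: list[str]) -> str:
    """Return SQL that finds duplicate or NULL keys in `table`."""
    if not _TABLE_PATH.match(table):
        raise ValueError(f"unexpected table path: {table!r}")
    if not key_columns or not all(_IDENTIFIER.match(c) for c in key_columns):
        raise ValueError(f"unexpected key columns: {key_columns!r}")
    cols = ", ".join(key_columns)
    null_check = " OR ".join(f"{c} IS NULL" for c in key_columns)
    return (
        f"SELECT {cols}, COUNT(*) AS row_count\n"
        f"FROM `{table}`\n"
        f"GROUP BY {cols}\n"
        f"HAVING COUNT(*) > 1 OR {null_check}"
    )


check = key_integrity_sql("my_dataset.users", ["uid"])
print(check)
```

Running the check before `CREATE PROPERTY GRAPH` is cheaper than debugging unexpected traversal results afterwards.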

### 3. Logical Structure
- **Labels**: Use descriptive labels (e.g., `Customer`, `Order`, `LineItem`). A table can be mapped to multiple labels if needed.
- **Reusability**: You can define multiple graphs over the same set of underlying tables for different use cases.

## Limitations
- Graphs and underlying tables must reside in the same region.
- Property graphs are logical views; they do not store a separate copy of the data.
@@ -0,0 +1,68 @@
# BigQuery Graph Visualization Guide

Visualizing graphs helps in identifying clusters, influencers, and hidden patterns.

## Integrated Visualization: BigQuery Studio

BigQuery Studio provides a built-in graph explorer. To leverage the interactive **Graph** tab, results must be returned as JSON objects using the `TO_JSON` function within a direct `GRAPH` statement.

### Optimized Visualization Query
To see nodes and edges correctly in the Graph tab, use the following syntax:

```sql
GRAPH `project.dataset.graph_name`
MATCH (n)-[e]->(m)
RETURN TO_JSON(n) AS node_a, TO_JSON(e) AS edge, TO_JSON(m) AS node_b
LIMIT 100
```

1. **Run the query**: Execute the query above in BigQuery Studio.
2. **Switch to Graph View**: In the results pane, click the **Graph** tab.
3. **Explore**:
   - **Nodes**: Hover to see properties (from the JSON metadata).
   - **Edges**: Visualized as directed links.
   - **Layout**: Use the UI controls to change the layout (Force-directed, Circular, etc.).

## Why use TO_JSON?
BigQuery's Graph tab expects the full graph element structure to enable features like:
- **Property Inspection**: Seeing all metadata associated with a node/edge.
- **Label Recognition**: Automatic coloring based on labels.
- **Connectivity**: Using internal identifiers to maintain the graph structure in the canvas.
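
To make the role of identifiers concrete, here is a small sketch of how a client could rebuild nodes and edges from `TO_JSON` result rows. The rows are hand-written samples, and the field names (`identifier`, `kind`, `labels`, `properties`, `source_node_identifier`, `destination_node_identifier`) are assumptions about the JSON shape; inspect a real result before relying on them. `collect_elements` is a hypothetical helper.

```python
import json


def node(ident, label, props):
    return json.dumps(
        {"identifier": ident, "kind": "node", "labels": [label], "properties": props}
    )


def edge(ident, label, src, dst):
    return json.dumps(
        {
            "identifier": ident,
            "kind": "edge",
            "labels": [label],
            "properties": {},
            "source_node_identifier": src,
            "destination_node_identifier": dst,
        }
    )


# Two sample result rows; node "n2" appears in both.
rows = [
    {"node_a": node("n1", "User", {"uid": 1}),
     "edge": edge("e1", "Follows", "n1", "n2"),
     "node_b": node("n2", "User", {"uid": 2})},
    {"node_a": node("n2", "User", {"uid": 2}),
     "edge": edge("e2", "Follows", "n2", "n3"),
     "node_b": node("n3", "User", {"uid": 3})},
]


def collect_elements(rows):
    """Deduplicate graph elements by identifier across all columns."""
    nodes, edges = {}, {}
    for row in rows:
        for value in row.values():
            element = json.loads(value) if isinstance(value, str) else value
            target = nodes if element.get("kind") == "node" else edges
            target[element["identifier"]] = element
    return nodes, edges


nodes, edges = collect_elements(rows)
print(sorted(nodes), sorted(edges))
```

Because each element carries its own identifier, a node returned in several rows collapses to a single entry, which is what lets a canvas draw one connected graph instead of disjoint row fragments.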

## External Visualization Tools

### 1. Looker / Looker Studio
- Use `GRAPH_TABLE` queries as data sources.
- While Looker is primarily tabular, you can use custom visualizations (D3.js, Network charts) to render the graph data.

### 2. Python Notebooks (Colab Enterprise / Vertex AI)
Use Python libraries for interactive visualization:
- **Pyvis**: Great for interactive, draggable graphs.
- **NetworkX**: For graph analysis and static plotting (with Matplotlib).
- **Graphistry**: For high-performance visualization of large graphs.

**Example Python Snippet:**
```python
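from google.cloud import bigquery
import networkx as nx
from pyvis.network import Network

# Uses application default credentials and the default project.
client = bigquery.Client()
query = """
SELECT source_node, target_node, weight
FROM GRAPH_TABLE(...)
"""
# to_dataframe() needs pandas installed alongside the BigQuery client.
df = client.query(query).to_dataframe()

# Build an undirected graph from the edge list, keeping weight as an edge attribute.
G = nx.from_pandas_edgelist(df, 'source_node', 'target_node', ['weight'])
# notebook=True targets rendering inside a notebook.
net = Network(notebook=True)
net.from_nx(G)
net.show("graph.html")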
```

## Best Practices for Visualization
- **Sub-sampling**: Avoid visualizing millions of nodes at once. Use GQL filters to isolate a specific neighborhood or community.
- **Sizing/Coloring**: Use properties (e.g., `amount`, `weight`) to scale node size or color edges to make patterns obvious.
- **Layouts**: Use force-directed layouts for general exploration and hierarchical layouts for tree-like structures (e.g., org charts).
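
Filtering in GQL is the preferred way to sub-sample, but when an edge list has already been fetched into a notebook, the same idea can be applied client-side. The sketch below is a plain breadth-first search that keeps only edges whose endpoints lie within a fixed number of hops of a start node; `k_hop_neighborhood` and the sample edges are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(edges, start, max_hops):
    """Return the edges between nodes reachable from `start` in <= max_hops hops."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    # hops[node] is the shortest hop count from start, following edge direction.
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(current, []):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)

    # Keep only edges whose endpoints are both inside the neighborhood.
    return [(s, d) for s, d in edges if s in hops and d in hops]


sample_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
subgraph = k_hop_neighborhood(sample_edges, "a", 2)
print(subgraph)
```

The resulting edge list can then be passed to a plotting library in place of the full graph.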