A Spark SQL extension that intercepts and logs SQL queries, identifying tables being accessed in Databricks/Spark environments.
This library provides a custom Spark SQL extension that:
- Intercepts every SQL query during the resolution phase
- Extracts and logs all tables being accessed
- Can be used for query auditing, monitoring, or custom query validation
- ✅ Intercepts all Spark SQL queries
- ✅ Extracts table names from various table relation types (UnresolvedRelation, LogicalRelation, HiveTableRelation, DataSourceV2Relation)
- ✅ Logs to both stdout and Log4j for visibility in Databricks driver logs
- ✅ Compatible with Spark 3.5.2
- ✅ Easy to integrate as a Spark extension
- Spark Version: 3.5.2
- Scala Version: 2.12.18
- Java Version: 8 or 11 (recommended for Spark 3.5.x)
- Build Tool: sbt 1.11.7
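The versions above could be pinned in the build roughly as follows. This is a minimal sketch, not the project's actual build file: the project name and version are inferred from the JAR file name, and marking Spark as a "provided" dependency is an assumption that fits clusters which already ship Spark.

```scala
// build.sbt (sketch; name and version inferred from the JAR file name)
ThisBuild / scalaVersion := "2.12.18"
ThisBuild / version      := "0.1.0-SNAPSHOT"

lazy val root = (project in file("."))
  .settings(
    name := "sparkInterceptor",
    // Assumption: Spark is supplied by the cluster at runtime,
    // so it is kept out of the packaged JAR.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.2" % "provided"
  )

// The sbt version itself is set in project/build.properties:
//   sbt.version=1.11.7
```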
To build the JAR, run:
    sbt package
The JAR file will be generated at:
    target/scala-2.12/sparkinterceptor_2.12-0.1.0-SNAPSHOT.jar
To rebuild from scratch, run:
    sbt clean package
In your Databricks notebook, run this before executing any queries:
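    spark.conf.set("spark.sql.extensions", "com.example.CustomExtension")
Note that spark.sql.extensions is a static configuration that Spark reads when the session is created, so on an already-running session this call may be rejected or have no effect. Setting it at the cluster level is the more reliable route:
- Go to your Databricks cluster configuration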
- Navigate to Advanced Options → Spark tab
- Add the following Spark configuration:
    spark.sql.extensions com.example.CustomExtension
- Restart the cluster
- Upload the JAR file to Databricks:
  - Workspace → Create → Library
  - Or use DBFS: /dbfs/FileStore/jars/sparkinterceptor_2.12-0.1.0-SNAPSHOT.jar
- Attach the library to your cluster
Add to your spark-submit command or SparkConf:
    spark-submit \
      --conf spark.sql.extensions=com.example.CustomExtension \
      --jars sparkinterceptor_2.12-0.1.0-SNAPSHOT.jar \
      your-application.jar
Or in code:
    import org.apache.spark.sql.SparkSession

    // Register the extension when the session is built
    val spark = SparkSession.builder()
      .appName("MyApp")
      .config("spark.sql.extensions", "com.example.CustomExtension")
      .getOrCreate()
The interceptor logs appear in the Driver Logs, not in notebook cell output:
- Go to your cluster page
- Click Driver Logs tab
- Search for === INTERCEPTION RULE TRIGGERED ===
Look in:
- stdout: for println statements
- Log4j output: for logger.warn statements
=== CustomExtension is being loaded! ===
=== Registering InterceptionRule ===
=== INTERCEPTION RULE TRIGGERED ===
--- Hello World from InterceptionRule! (Spark 3.5.2) ---
Intercepted Logical Plan: Project
Tables touched: my_database.my_table, another_database.another_table
sparkInterceptor/
├── build.sbt                                    # SBT build configuration
├── project/
│   └── build.properties                         # SBT version
├── src/
│   └── main/
│       └── scala/
│           └── com/
│               └── example/
│                   ├── CustomExtension.scala    # Extension entry point
│                   └── InterceptionRule.scala   # Query interception logic
└── README.md
- CustomExtension: Registers the InterceptionRule with Spark's SQL extension mechanism
- InterceptionRule: A Spark Rule[LogicalPlan] that:
  - Intercepts every logical plan during the resolution phase
  - Traverses the plan tree to extract table references
  - Logs the information using both println and Log4j
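The two components described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's actual source: the helper `tablesIn` and the exact name formatting are hypothetical, and the real files live under the `com.example` package. It assumes Spark 3.5.x on the classpath.

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

// Entry point: the class named in spark.sql.extensions.
class CustomExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    println("=== CustomExtension is being loaded! ===")
    // Resolution rules run inside the analyzer on each logical plan.
    extensions.injectResolutionRule(_ => new InterceptionRule)
  }
}

class InterceptionRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    val tables = InterceptionRule.tablesIn(plan)
    if (tables.nonEmpty) {
      val msg = s"Tables touched: ${tables.mkString(", ")}"
      println(msg)      // shows up in the driver's stdout
      logWarning(msg)   // shows up in the driver's Log4j output
    }
    plan // return the plan unchanged: this rule only observes
  }
}

object InterceptionRule {
  // Hypothetical helper: collect table names from the relation node types
  // listed in the feature list.
  def tablesIn(plan: LogicalPlan): Seq[String] =
    plan.collect {
      case r: UnresolvedRelation   => Seq(r.multipartIdentifier.mkString("."))
      case r: LogicalRelation      => r.catalogTable.map(_.identifier.unquotedString).toSeq
      case r: HiveTableRelation    => Seq(r.tableMeta.identifier.unquotedString)
      case r: DataSourceV2Relation => Seq(r.table.name())
    }.flatten.distinct
}
```

Because resolution rules are applied repeatedly until the plan stops changing, a rule like this can fire several times for a single query, so repeated log lines for one statement are expected.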
Development commands (compile, test, package):
    sbt compile
    sbt test
    sbt clean package
[Your License Here]
Contributions are welcome! Please feel free to submit a Pull Request.
- Verify spark.sql.extensions is set correctly
- Check that the JAR is attached to your cluster
- Restart the Databricks cluster after configuration changes
- Check Driver Logs (not notebook output)
- Ensure queries are actually running (not cached)
- Look for === CustomExtension is being loaded! === to confirm the extension loaded
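The first check in the list above can be done from a notebook cell. The snippet below is a minimal sketch: `extensionConfigured` is a hypothetical helper, not part of this library, and it assumes the cell runs against an already-started session (as in a Databricks notebook), where `getOrCreate()` returns the existing one.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: spark.sql.extensions may hold a comma-separated
// list of classes, so look for ours among the entries.
def extensionConfigured(value: Option[String], className: String): Boolean =
  value.exists(_.split(",").map(_.trim).contains(className))

// In a notebook this returns the session that is already running.
val spark = SparkSession.builder().getOrCreate()
val configured = spark.conf.getOption("spark.sql.extensions")

if (extensionConfigured(configured, "com.example.CustomExtension"))
  println(s"Extension is configured: ${configured.get}")
else
  println(s"Extension is NOT configured (current value: $configured)")
```

If the value is missing here, the cluster configuration was not applied or the cluster was not restarted afterwards.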
- Ensure Java 8 or 11 is installed
- Verify Scala version matches: 2.12.18
- Check Spark version compatibility