Skip to content

abhinigam/sparkInterceptor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spark Interceptor

A Spark SQL extension that intercepts and logs SQL queries, identifying tables being accessed in Databricks/Spark environments.

Overview

This library provides a custom Spark SQL extension that:

  • Intercepts every SQL query during the resolution phase
  • Extracts and logs all tables being accessed
  • Can be used for query auditing, monitoring, or custom query validation

Features

  • ✅ Intercepts all Spark SQL queries
  • ✅ Extracts table names from various table relation types (UnresolvedRelation, LogicalRelation, HiveTableRelation, DataSourceV2Relation)
  • ✅ Logs to both stdout and Log4j for visibility in Databricks driver logs
  • ✅ Compatible with Spark 3.5.2
  • ✅ Easy to integrate as a Spark extension

Requirements

  • Spark Version: 3.5.2
  • Scala Version: 2.12.18
  • Java Version: 8 or 11 (recommended for Spark 3.5.x)
  • Build Tool: sbt 1.11.7

Building the Project

Package the JAR

sbt package

The JAR file will be generated at:

target/scala-2.12/sparkinterceptor_2.12-0.1.0-SNAPSHOT.jar

Clean Build

sbt clean package

Usage

In Databricks

Option 1: Notebook Configuration (Per Session)

In your Databricks notebook, run this before executing any queries:

spark.conf.set("spark.sql.extensions", "com.example.CustomExtension")

Option 2: Cluster Configuration (Persistent)

  1. Go to your Databricks cluster configuration
  2. Navigate to Advanced OptionsSpark tab
  3. Add the following Spark configuration:
    spark.sql.extensions com.example.CustomExtension
    
  4. Restart the cluster

Upload the JAR

  1. Upload the JAR file to Databricks:
    • Workspace → Create → Library
    • Or use DBFS: /dbfs/FileStore/jars/sparkinterceptor_2.12-0.1.0-SNAPSHOT.jar
  2. Attach the library to your cluster

In Spark Applications

Add to your spark-submit command or SparkConf:

spark-submit \
  --conf spark.sql.extensions=com.example.CustomExtension \
  --jars sparkinterceptor_2.12-0.1.0-SNAPSHOT.jar \
  your-application.jar

Or in code:

val spark = SparkSession.builder()
  .appName("MyApp")
  .config("spark.sql.extensions", "com.example.CustomExtension")
  .getOrCreate()

Viewing Output

In Databricks

The interceptor logs appear in the Driver Logs, not in notebook cell output:

  1. Go to your cluster page
  2. Click Driver Logs tab
  3. Search for === INTERCEPTION RULE TRIGGERED ===

Look in:

  • stdout - for println statements
  • Log4j output - for logger.warn statements

Example Output

=== CustomExtension is being loaded! ===
=== Registering InterceptionRule ===
=== INTERCEPTION RULE TRIGGERED ===
--- Hello World from InterceptionRule! (Spark 3.5.2) ---
Intercepted Logical Plan: Project
Tables touched: my_database.my_table, another_database.another_table

Project Structure

sparkInterceptor/
├── build.sbt                          # SBT build configuration
├── project/
│   └── build.properties               # SBT version
├── src/
│   └── main/
│       └── scala/
│           └── com/
│               └── example/
│                   ├── CustomExtension.scala       # Extension entry point
│                   └── InterceptionRule.scala      # Query interception logic
└── README.md

How It Works

  1. CustomExtension: Registers the InterceptionRule with Spark's SQL extension mechanism
  2. InterceptionRule: A Spark Rule[LogicalPlan] that:
    • Intercepts every logical plan during the resolution phase
    • Traverses the plan tree to extract table references
    • Logs the information using both println and Log4j

Development

Compile Only

sbt compile

Run Tests (if added)

sbt test

Package for Distribution

sbt clean package

License

[Your License Here]

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Troubleshooting

Extension Not Loading

  • Verify spark.sql.extensions is set correctly
  • Check that the JAR is attached to your cluster
  • Restart the Databricks cluster after configuration changes

Not Seeing Logs

  • Check Driver Logs (not notebook output)
  • Ensure queries are actually running (not cached)
  • Look for === CustomExtension is being loaded! === to confirm extension loaded

Compilation Errors

  • Ensure Java 8 or 11 is installed
  • Verify Scala version matches: 2.12.18
  • Check Spark version compatibility

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages