The general flow is as follows:
- The Driver and Executor plug-ins are launched
- Each Executor plug-in spawns its own VE process
- SQL queries pass through `VeRewriteStrategy` to capture any plans that could be processed by the plug-in; expression evaluation happens here.
- `CombinedCompilationColumnarRule` compiles all the generated C++ sources using NCC into a `.so`, and rewrites all the plans to reference the new `.so`, which is located on the driver.
- Spark eventually calls a `VectorEngineToSparkPlan` to produce an `RDD` of data, which calls `SupportsVeColBatch#executeVeColumnar` of its child plan, and its child plan does the same in turn. During these calls, the executor's `VeProcess` is summoned. Data is freed at the earliest possible opportunity, or at Task completion.
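The flow above is driven entirely by Spark configuration. As a rough sketch, enabling the plug-in at launch might look like the following (the jar path and application name are placeholders; the plug-in class name is the one shipped with Spark Cyclone, but confirm it against your build):

```shell
# Sketch of a spark-submit invocation that enables the Spark Cyclone plug-in.
# The jar location and application jar are placeholders; adjust for your cluster.
spark-submit \
  --jars /opt/cyclone/spark-cyclone-sql-plugin.jar \
  --conf spark.plugins=com.nec.spark.AuroraSqlPlugin \
  your-application.jar
```

With `spark.plugins` set, Spark launches the Driver and Executor plug-ins at startup, which is what triggers the VE process spawning described above.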
The Spark Cyclone plug-in translates your Spark SQL queries into C++ kernels and executes them on the Vector Engine. Compilation can take anywhere from a few seconds to a couple of minutes. While this overhead is insignificant for queries that run for hours, you can avoid recompiling identical kernels on every run by specifying a directory in which to cache them, using the following config.
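A minimal sketch of such a setting is shown below; the config key is assumed from the plug-in's documentation and the cache path is a placeholder, so verify both against your installation's configuration reference:

```shell
# Cache compiled kernels across runs so that repeated queries skip NCC compilation.
# Config key assumed; /path/to/kernel/cache is a placeholder directory.
spark-submit \
  --conf spark.com.nec.spark.kernel.directory=/path/to/kernel/cache \
  your-application.jar
```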