SparkSQL+: Next-generation Query Planning over Spark.

Published in the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2023

Abstract

We will demonstrate SparkSQL+, a SQL processing engine built on top of Spark. Unlike vanilla SparkSQL, which uses classical query plans, SparkSQL+ adopts several recently developed query plans, including generalized hypertree decompositions (GHD), worst-case optimal join (WCOJ) algorithms, and conjunctive queries with comparisons (CQC). SparkSQL+ also provides a web-based interface through which users can explore different query plans for a given query and compare their performance with that of classical query plans on the same Spark core.

Citation

Binyang Dai, Qichen Wang, and Ke Yi. “SparkSQL+: Next-generation Query Planning over Spark.” ACM SIGMOD International Conference on Management of Data (SIGMOD), June 2023. System demonstration.

Supplemental Material

PDF

GitHub Repo