-
Query Optimization in the Wild: From Cross-platform Data Systems to Multi-Agent Systems
Query optimization is (or should be) at the core of any data management system. Without effective and efficient query optimization, even a sophisticated data system is bound to underperform. In this talk, I will first share our query optimization journey in Apache Wayang, an open-source framework designed to unify analytics across diverse data sources and data processing engines. I will begin with traditional cost-based optimization and show its limitations in cross-platform settings. Next, I will briefly discuss learning-based approaches and explain why, despite their promise, they still struggle when optimization depends on enumerating a huge search space of plans. Motivated by these limitations, I will introduce a different perspective on query optimization: a generate-and-explore approach that replaces the traditional enumerate-score-prune method with generative models and an exploration-driven feedback loop. I will conclude by motivating the need for query optimization in multi-agent systems, an emerging class of data systems in which multiple agents must coordinate, reason, and adapt to deliver meaningful insights.
-
Reproducible Query Optimization Research for Data Systems
Identifying reasonably good plans to execute complex queries in large data systems is a crucial ingredient for a robust data management platform. The traditional cost-based query optimizer enumerates different execution plans for each individual query, assesses each plan by its estimated cost, and selects the plan that promises the lowest execution cost. However, as we all know, the optimal execution plan is not always selected, opportunities are missed, and complex analytical queries might not even run successfully. Thus, query optimization for data systems is a highly active research area, with novel concepts being introduced continuously. Proposals range from novel cardinality estimation methods to alternative physical operator selection strategies. However, qualitatively and quantitatively assessing their individual strengths and weaknesses is almost impossible. We thus introduce PostBOUND, a novel optimizer development and benchmarking framework that enables rapid prototyping and common-ground comparisons, serving as a community base for reproducible optimizer research.
-
To Be Announced
-
To Be Announced, DEI keynote