源码仓库:https://github.com/nebulastream/nautilus

论文地址:https://dl.acm.org/doi/10.1145/3654968

This paper, published at SIGMOD 2024, proposes and implements Nautilus, a new query execution framework that tries to tame the engineering complexity of query compilation without sacrificing runtime performance

Background

Query compilation with a JIT is fundamentally a trade-off:

  1. Interpretation vs. compilation: on small data, interpreting directly is faster than compiling first and then executing
  2. Development complexity: developers must master both databases and compilers, and JIT-generated code is hard to debug directly
  3. Maintainability: chasing low-level design drives up engineering cost and hurts maintainability
  4. Adaptive compilation: different compilation strategies trade compilation speed against execution speed in different ways, and supporting multiple compilation backends makes this even more complicated

Only when all four of these concerns are handled well does JIT-based query compilation become sustainable

Trace-based JITs come from programming-language compilers: PyPy, for example, dynamically optimizes hot functions (frequently executed code) at runtime to improve performance

Implementation

The paper applies trace-based JIT compilation to query compilation: after pipelining, the input logical plan is either interpreted directly or traced via symbolic execution; the resulting trace is translated into Nautilus IR, and code is then generated for different compilation backends (LLVM, MLIR, Flounder, even C++) to match different latency/throughput requirements

image-20250222112219549
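The tracing path in the figure above can be pictured with a toy model: the same operator code either runs on real values (interpretation) or on proxy values that record every operation into a linear trace. All names here (`Trace`, `Val`, `pipeline_body`) are invented for illustration, not Nautilus's actual API.

```python
# Toy sketch of symbolic tracing: operator code runs once on proxy
# values, and every operation is recorded into a linear trace that a
# backend could later translate to IR.  All names here are invented.

class Trace:
    def __init__(self):
        self.ops = []          # recorded (op, lhs, rhs, result) tuples
        self.counter = 0

    def fresh(self):
        self.counter += 1
        return f"%{self.counter}"

class Val:
    """Proxy value: behaves like a number but records what happens to it."""
    def __init__(self, trace, name):
        self.trace, self.name = trace, name

    def _emit(self, op, other):
        res = self.trace.fresh()
        rhs = other.name if isinstance(other, Val) else repr(other)
        self.trace.ops.append((op, self.name, rhs, res))
        return Val(self.trace, res)

    def __add__(self, other):  return self._emit("add", other)
    def __mul__(self, other):  return self._emit("mul", other)

# A "pipeline body": the same code could run interpreted on real ints,
# or traced on Val proxies -- that is the core trick.
def pipeline_body(x, y):
    return x * y + 1

trace = Trace()
pipeline_body(Val(trace, "%x"), Val(trace, "%y"))
for op in trace.ops:
    print(op)
```

Running the body once on proxies yields a trace of two operations (a `mul` then an `add`), which is exactly the kind of straight-line record a backend can lower to IR.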

The query is likewise broken into pipelines

image-20250222115733161

UDFs (user-defined functions) can also be compiled into the pipeline

image-20250710110936593

Quick Walkthrough of the Paper

2. THE CURSE OF QUERY COMPILATION

“Curse” is a fitting word; there is no free lunch, after all, and query compilation is exactly that kind of trade-off

-> high system complexity

-> decreases engineering productivity

This is particularly problematic for academic projects like Mutable [32], NoisePage [51], or Peloton [50], that can’t find contributors, as many students struggle with the complexity of query compilation

image-20250222112118263

Judging from this figure, tracing only starts once the pipeline has been translated into Nautilus?

(Confirmed later in the paper: “Nautilus translates the trace to its intermediate representation, i.e., the Nautilus IR”.) Yes

image-20250222112219549

3. QUERY COMPILATION WITH NAUTILUS

Nautilus introduces a novel trace-based JIT compilation approach

For what a tracing JIT is, see this post: Musings on Tracing in PyPy

Personally, though, I have some reservations on this point

4. OPERATOR IMPLEMENTATION INTERFACE

Nautilus follows a push-based execution model

OK, it’s fine

Wherever there is a push model, there are pipelines

image-20250222115733161

The way this push model's code reads is actually quite nice

image-20250222120215267
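The push model above can be approximated with a toy sketch: each operator pushes tuples to the next one via `consume`, so an entire pipeline collapses into one tight loop in the scan. The interface here is invented for illustration, not the paper's actual C++ API.

```python
# Minimal push-based pipeline sketch (invented interface): each
# operator pushes tuples downstream, so a whole pipeline becomes a
# single tight loop driven by the scan.

class Emit:
    """Pipeline sink: collects the rows that survive."""
    def __init__(self):
        self.rows = []
    def consume(self, row):
        self.rows.append(row)

class Filter:
    def __init__(self, pred, next_op):
        self.pred, self.next_op = pred, next_op
    def consume(self, row):
        if self.pred(row):            # selection is just an `if`
            self.next_op.consume(row)

class Scan:
    def __init__(self, next_op):
        self.next_op = next_op
    def execute(self, table):
        for row in table:             # the tight loop of the pipeline
            self.next_op.consume(row)

sink = Emit()
pipeline = Scan(Filter(lambda r: r > 10, sink))
pipeline.execute([3, 25, 11, 7])
print(sink.rows)                      # only rows passing the filter
```

Because control flows top-down through plain calls, a compiler (or tracer) sees the fused loop directly, which is exactly what makes data-centric pipelines attractive.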

All Primitive value types directly map to an associated C++ type,

An unsurprising answer

The runtime component is pre-compiled, while the interface creates function calls to invoke specific functions on the data structure

Also not surprising, since there really is no other way to do it
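The pre-compiled runtime idea in the quote above might look roughly like this: complex structures such as hash tables are ordinary compiled code, and the trace only records an opaque call into them instead of tracing their internals. `HashTableRuntime` and `traced_insert` are invented names for this sketch.

```python
# Sketch of the "pre-compiled runtime" idea (invented names): the
# traced query records just an opaque function call to the runtime's
# hash table, rather than tracing into its implementation.

class HashTableRuntime:
    """Stands in for a pre-compiled C++ hash table."""
    def __init__(self):
        self.data = {}
    def insert(self, key, value):
        self.data[key] = value

calls = []  # the trace: just "call insert(k, v)" entries

def traced_insert(ht, key, value):
    # record the call in the trace...
    calls.append(("call", "HashTable::insert", key, value))
    # ...and execute it, without tracing its internals
    ht.insert(key, value)

ht = HashTableRuntime()
traced_insert(ht, 1, "a")
traced_insert(ht, 2, "b")
print(calls)
```

The generated code then only needs to emit the same function calls against the pre-compiled runtime, keeping the traced IR small.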

5. TRACE-BASED JUST-IN-TIME COMPILATION

The main idea of a tracing JIT compiler is to dynamically optimize hot code paths during the execution of a program

It seems that whenever a tracing JIT is mentioned, “optimizing hot code paths” is sure to follow

Nautilus IR follows static single-assignment (SSA) form and differentiates between functions,

IRs these days are basically all in SSA form
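For readers less familiar with SSA: it just means every value is defined exactly once. A toy renaming pass shows the idea for straight-line code (real SSA also needs phi nodes at control-flow joins; `to_ssa` is an invented helper, not part of Nautilus):

```python
# Toy SSA renaming: every assignment to a variable gets a fresh
# version number, so each name is defined exactly once.  Handles
# straight-line code only; phi nodes for control flow are omitted.

def to_ssa(stmts):
    version = {}   # variable -> latest version number
    out = []
    for target, expr in stmts:
        # rewrite uses to the latest version of each variable
        for var, ver in version.items():
            expr = expr.replace(var, f"{var}{ver}")
        version[target] = version.get(target, 0) + 1
        out.append(f"{target}{version[target]} = {expr}")
    return out

prog = [("x", "a + b"), ("x", "x * 2"), ("y", "x + 1")]
for line in to_ssa(prog):
    print(line)
```

The reassignment of `x` becomes `x2 = x1 * 2`, and the later use of `x` refers to `x2`; this single-definition property is what makes data-flow optimizations on the IR straightforward.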

Nautilus operates on operator pipelines, which always contain a tight loop over some data

“Operator pipelines”, hm?

As the shape of pipelines and the set of operators is restricted, we can eliminate the need for the initial interpretation to detect hot-code paths. Instead, Nautilus uses symbolic execution

Do they use KLEE for that?

Nautilus creates an executable query plan that fuses individual operators to data-centric pipelines

Wait, fusing individual operators? Is that really a good idea?

image-20250223001157502

Nautilus’s tracing algorithm executes pipelines multiple times using dummy data

Hmm, dummy data? If you say so 😅
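Roughly, that multi-pass execution on dummy data can be pictured like this: one run only sees one side of each data-dependent `if`, so the tracer re-runs the pipeline with inputs that force the other side. The input values and names below are invented for this sketch.

```python
# Toy sketch of why tracing runs the pipeline several times on dummy
# data: a single execution only observes one side of each branch, so
# the tracer re-executes with inputs that force the other side.

def body(x, taken_branches):
    if x > 10:                      # data-dependent control flow
        taken_branches.add("then")
    else:
        taken_branches.add("else")

covered = set()
for dummy in (0, 100):              # dummy inputs, one per branch side
    body(dummy, covered)

print(covered)                      # both paths observed
```

Once every side of every branch has been observed, the traces can be stitched into a complete control-flow graph, which is presumably what the block-jump figure below depicts.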

I did not really get the figure below at first, because it is neither a physical plan nor a logical plan; it is just a basic-block jump graph showing how their SSA form works

image-20250223094236168

To this end, they made specific trade-offs between the throughput of the generated code, compilation latency, and ease of use.

Query compilation almost always has to balance these three

Nautilus provides four backends with different low-latency characteristics: an operator interpreter, a byte code interpreter, Flounder [24], and MIR [49]

Hmm, is it really low latency? Judging from later sections, compared with LLVM it is

MIR is a general purpose JIT compiler similar to LLVM, focusing on low compilation times

That is, the MIR JIT compiles faster than LLVM's ORC JIT 🤐

LLVM provides various advanced compiler optimizations, e.g., auto-vectorization

About LLVM's auto-vectorization I really want to gripe: how am I supposed to know whether my code actually ended up using SIMD?

On the characteristics of the different backends:

image-20250223100557072

And the use cases for the different backends:

low-latency backends, which minimize compilation time for short-running workloads; high-performance backends, which maximize throughput for long-running workloads; and specialized backends, which accelerate specific workloads, e.g., the execution of UDFs

So UDFs get their own dedicated acceleration path

To this end (specialized backends), Nautilus provides a specialized compilation backend based on Babelfish

On the one hand, backends may provide powerful optimizations that lead to highly efficient code. This comes at the price of large dependencies (MLIR backend) or multi-second compilation times (CPP backend).

Heh, “price of large dependencies” and “multi-second compilation times”

hash join or aggregations, use function calls to call specific pre-compiled operator logic

Right, that is the only way to do it

6. EVALUATION

As usual, a few TPC-H queries serve as the benchmark; in execution time, some of them manage to beat Umbra

image-20250223154038721

The MLIR backend generates SIMD code for Q6 and SF10, resulting in more efficient code than Umbra

This part really caught my attention: it is most likely LLVM's auto-vectorization at work, and on x64 that does not necessarily go through AVX-512

Below are the full numbers for TPC-H Q1, Q3, and Q6

image-20250223155327884

As for the compile time below, I am a bit unhappy that the byte code backend was not included in the comparison (and judging from the figure, MLIR's compilation speed is also decent)

image-20250223160404310

7. COMPLEXITY ANALYSIS

Nautilus significantly reduces the complexity of compilation-based execution engines.

All right

image-20250223102214839

our results indicate that Nautilus is able to reach similar compilation times as Umbra.

All right

8. RELATED WORK

Furthermore, Nautilus’ Babelfish [30]-based UDF accelerator extends previous work like YeSQL [22] and Tuplex [72] and enables holistic optimization across relational operators and UDFs. Supporting these workloads underpins the flexibility of Nautilus’ compilation approach.

So it is indeed related to the YeSQL line of work

Experimental Results

As the results show, no single backend solves everything; each one trades off performance, latency, and implementation complexity to suit a different class of workloads

  • For short queries and debugging scenarios, MIR, as a lightweight JIT backend, strikes a balance between throughput and latency
  • High-performance backends suit long-running queries and batch workloads; the MLIR and C++ backends generate efficient machine code at the cost of higher compilation latency

image-20250223100557072

As for tracing, compilation time does not grow sharply as the number of selections in a query increases

image-20250223160404310

On the TPC-H queries, the MLIR backend beats DuckDB, but only rarely beats Umbra

image-20250223154038721

Closing Thoughts

What makes Nautilus (the mollusk) interesting is that it integrates several query compilation approaches in one place and benchmarks them, together with a trace-based JIT built on symbolic execution (how did that get tangled up with fuzzing?)

Still, query compilation remains as hard to debug as ever: the complexity does not really come down, and even emitting C++ code will not make debugging much better (it feels a lot like debugging a program in IDA)

Ask, and the stock answer is “a promising future: strong extensibility.” Whether that claim holds, only history can tell

As for “without regret”, all I can say is that the regret is still there 😅 As long as Umbra stays closed-source, people will still prefer DuckDB