Source repository: https://github.com/nebulastream/nautilus

Paper: https://dl.acm.org/doi/10.1145/3654968

2. THE CURSE OF QUERY COMPILATION

"Curse" is a very apt way to put it. There's no free lunch, after all: query compilation is a trade-off

-> high system complexity

-> decreases engineering productivity

This is particularly problematic for academic projects like Mutable [32], NoisePage [51], or Peloton [50], that can’t find contributors, as many students struggle with the complexity of query compilation

image-20250222112118263

So, looking at it this way, tracing only happens once a pipeline has been translated into Nautilus?

(The paper later says "Nautilus translates the trace to its intermediate representation, i.e., the Nautilus IR", so yes.)

image-20250222112219549

3. QUERY COMPILATION WITH NAUTILUS

Nautilus introduces a novel trace-based JIT compilation approach

These days, who isn't doing trace-based JIT compilation?

4. OPERATOR IMPLEMENTATION INTERFACE

Nautilus follows a push-based execution model

OK, it’s fine

Wherever there is a push model, there must be pipelines

image-20250222115733161

The way the code is written under this push model actually looks pretty good

image-20250222120215267
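As a concrete illustration of what such a push model looks like in C++, here is a minimal sketch; the names (`Operator`, `Filter`, `Collect`, `runPipeline`) are my own invention, not Nautilus's actual interface:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal push-based operator interface: each operator pushes a record
// to its downstream child instead of the child pulling from its parent.
struct Operator {
    Operator* next = nullptr;               // downstream operator in the pipeline
    virtual void consume(int64_t record) = 0;
    virtual ~Operator() = default;
};

// Filter: only pushes records matching the predicate downstream.
struct Filter : Operator {
    int64_t threshold;
    explicit Filter(int64_t t) : threshold(t) {}
    void consume(int64_t record) override {
        if (record > threshold) next->consume(record);
    }
};

// Sink: materializes everything it receives.
struct Collect : Operator {
    std::vector<int64_t> out;
    void consume(int64_t record) override { out.push_back(record); }
};

// The scan drives the pipeline: a tight loop pushing each record in.
std::vector<int64_t> runPipeline(const std::vector<int64_t>& input, int64_t threshold) {
    Collect sink;
    Filter filter(threshold);
    filter.next = &sink;
    for (int64_t r : input) filter.consume(r);  // scan pushes into the pipeline
    return sink.out;
}
```

The scan owning the loop and pushing tuples downward is exactly why every push model implies pipelines: each chain of `consume` calls between two materialization points is one pipeline.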

All Primitive value types directly map to an associated C++ type,

An unsurprising answer

The runtime component is pre-compiled, while the interface creates function calls to invoke specific functions on the data structure


Also unsurprising, because there really is no other way to do it
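My mental model of the two quotes above, as a sketch (all names such as `TracedInt` and `runtimeHash` are hypothetical, not Nautilus's API): a wrapper around a primitive C++ type records every operation it executes, while a call into a pre-compiled runtime component shows up in the trace only as an opaque function call:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Global trace of recorded operations (one entry per executed op).
static std::vector<std::string> trace;

// Wrapper around a primitive C++ type: using it records the operation
// in addition to computing it -- tracing by execution.
struct TracedInt {
    int value;
    TracedInt operator+(TracedInt rhs) const {
        trace.push_back("add");
        return TracedInt{value + rhs.value};
    }
    TracedInt operator*(TracedInt rhs) const {
        trace.push_back("mul");
        return TracedInt{value * rhs.value};
    }
};

// A pre-compiled runtime function: its body is never traced.
int runtimeHash(int x) { return static_cast<int>(x * 2654435761u % 1024); }

// The interface records only the call itself as one opaque trace entry.
TracedInt callRuntime(TracedInt x) {
    trace.push_back("call runtimeHash");
    return TracedInt{runtimeHash(x.value)};
}
```

So the generated code sees arithmetic on primitives op by op, but a hash table or aggregation state is just "call this pre-compiled function", which matches the quote.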

5. TRACE-BASED JUST-IN-TIME COMPILATION

The main idea of a tracing JIT compiler is to dynamically optimize hot code paths during the execution of a program


It seems that whenever a tracing JIT comes up, optimizing hot code paths is taken as a given

Nautilus IR follows static single-assignment (SSA) form and differentiates between functions,

Most IRs these days are in SSA form anyway

Nautilus operates on operator pipelines, which always contain a tight loop over some data

Operator pipelines, hmm?

As the shape of pipelines and the set of operators is restricted, we can eliminate the need for the initial interpretation to detect hot-code paths. Instead, Nautilus uses symbolic execution

Do they use KLEE?

Nautilus creates an executable query plan that fuses individual operators to data-centric pipelines

Wait, fusing individual operators? Is that really a good idea?

image-20250223001157502
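For reference, the "data-centric pipeline" that operator fusion targets is conceptually just one tight loop. A hand-written sketch of what a fused scan → filter → sum pipeline amounts to (my own illustration, not code generated by Nautilus):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fused scan -> filter -> sum pipeline: all three operators collapse
// into a single tight loop, keeping the current tuple in registers
// instead of paying a virtual call per operator per tuple.
int64_t fusedPipeline(const std::vector<int64_t>& column, int64_t threshold) {
    int64_t sum = 0;                 // aggregation state
    for (int64_t v : column) {       // scan
        if (v > threshold) {         // filter
            sum += v;                // aggregate
        }
    }
    return sum;
}
```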

Nautilus’s tracing algorithm executes pipelines multiple times using dummy data

Hmm, dummy data? Okay then 😅
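A toy sketch of the dummy-data idea (my own simplification, not Nautilus's actual algorithm): run the pipeline several times with different dummy inputs so that each side of every branch is observed at least once, then merge the per-run traces into the full control-flow picture:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Record which basic blocks a pipeline visits for one concrete input.
std::vector<std::string> tracePipeline(int dummy) {
    std::vector<std::string> blocks{"entry"};
    if (dummy > 0)
        blocks.push_back("then");   // taken side of the branch
    else
        blocks.push_back("else");   // fall-through side
    blocks.push_back("exit");
    return blocks;
}

// Execute with several dummy inputs and merge the traces, so every
// branch side ends up covered by at least one run.
std::set<std::string> exploreBlocks(const std::vector<int>& dummies) {
    std::set<std::string> covered;
    for (int d : dummies)
        for (const auto& b : tracePipeline(d))
            covered.insert(b);
    return covered;
}
```

Because the shape of pipelines is restricted, a small set of dummy inputs can already exercise all paths; that seems to be the point of the quote.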

I didn't really understand the figure below: it is neither a physical plan nor a logical plan, just a program-block jump graph showing how their SSA works

image-20250223094236168

To this end, they made specific trade-offs between the throughput of the generated code, compilation latency, and ease of use.


Every query compilation approach basically has to balance these three

Nautilus provides four backends with different low-latency characteristics: an operator interpreter, a byte code interpreter, Flounder [24], and MIR [49]

Hmm, is it really low latency? Judging from later sections, compared with LLVM it indeed is

MIR is a general purpose JIT compiler similar to LLVM, focusing on low compilation times

That is, the MIR JIT compiles faster than LLVM's ORC JIT 🤐

LLVM provides various advanced compiler optimizations, e.g., auto-vectorization

About LLVM's auto-vectorization, I have a gripe: how am I supposed to know whether my code actually ended up using SIMD?
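To answer my own gripe: Clang and GCC can report their vectorization decisions (e.g. `clang++ -O2 -Rpass=loop-vectorize` or `g++ -O2 -fopt-info-vec`), or you can inspect the generated assembly. A loop of the kind that typically auto-vectorizes:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A classic auto-vectorizable loop: no branches inside the body and no
// loop-carried dependency other than the sum reduction. Compiling with
//   clang++ -O2 -Rpass=loop-vectorize
// makes the compiler print whether (and how wide) it vectorized it.
int64_t sumColumn(const std::vector<int32_t>& col) {
    int64_t sum = 0;
    for (int32_t v : col) sum += v;
    return sum;
}
```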

On the characteristics of the different backends:

image-20250223100557072

And the use cases for the different backends:

low-latency backends, which minimize compilation time for short-running workloads; high-performance backends, which maximize throughput for long-running workloads; and specialized backends, which accelerate specific workloads, e.g., the execution of UDFs

A separate acceleration scheme set up just for UDFs

To this end (Specialized Backends), Nautilus provides a specialized compilation backend based on Babelfish

On the one hand, backends may provide powerful optimizations that lead to highly efficient code. This comes at the price of large dependencies (MLIR backend) or multi-second compilation times (CPP backend).

Tsk: "price of large dependencies", "multi-second compilation times"

hash join or aggregations, use function calls to call specific pre-compiled operator logic

Right, that's the only way to do it

6. EVALUATION

As usual, a few TPC-H queries serve as the experimental baseline; as we can see, in some cases the execution time beats Umbra

image-20250223154038721

The MLIR backend generates SIMD code for Q6 and SF10, resulting in more efficient code than Umbra

I care a lot about this part, because it most likely relies on LLVM's auto-vectorization, which on x64 does not necessarily use AVX-512

Below is the full data for TPC-H Q1, Q3, and Q6

image-20250223155327884

As for the compile time below, I'm a bit unhappy that the bytecode backend wasn't pulled into the comparison (judging from the figure, MLIR's compilation speed is decent too)

image-20250223160404310

7. COMPLEXITY ANALYSIS

Nautilus significantly reduces the complexity of compilation-based execution engines.

All right

image-20250223102214839

our results indicate that Nautilus is able to reach similar compilation times as Umbra.

All right

8. RELATED WORK

Furthermore, Nautilus’ Babelfish [30]-based UDF accelerator extends previous work like YeSQL [22] and Tuplex [72] and enables holistic optimization across relational operators and UDFs. Supporting these workloads underpins the flexibility of Nautilus’ compilation approach.

As expected, there is some connection to the YeSQL work

Comments

What makes the Nautilus work interesting is that it integrates different query compilation approaches in one place and benchmarks them, plus a trace-based JIT built on symbolic execution (how did this get tangled up with Fuss?)

Still, query compilation remains as hard to debug as ever: the complexity doesn't really come down, and even emitting C++ code doesn't make debugging much better (it feels a lot like debugging a program in IDA)

If you ask, the answer is "a promising future: very strong extensibility". Only history can prove that claim

As for "without regret", all I can say is that the regrets will still be regretted 😅 As long as Umbra stays closed-source, people will surely still prefer DuckDB