VLDB '24 Paper Reading: Cloud-Native Database Systems and Unikernels: Reimagining OS Abstractions for Modern Hardware

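This paper is about building database systems on top of unikernels. For background on unikernels, see my earlier post, "Paper Reading: Unikraft: Fast, Specialized Unikernels the Easy Way". The first author of this paper is Viktor Leis.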

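I came across this paper via Andy's NoisePage 😂. On top of that, Prisma launched a cloud service built on Unikernel + PG last year, so this looks like a hot direction in database research.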

Although the idea of custom, DBMS optimized OS kernels is old, it is largely unrealized due to the demands of hardware compatibility and the reluctance of users to install specialized operating systems.

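Fair point. Still, I have been wondering lately whether, with people like Andy leading the way, kernel-bypass work will become a common sight over the next five years 😂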

The paper is short: compared with the usual 12-page VLDB papers, this one is only 8 pages.

Introduction

Why Have Custom OS-Kernels Not Been Successful? In practice, DBMS-optimized OS kernels never gained widespread adoption for two main reasons. First, few users would buy a DBMS that requires installing a custom OS as a prerequisite.

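If you ask me, this is software engineering in a nutshell 😂. Given the choice between the speedup of a DBMS that ships as its own system and portability, people still prefer a portable database.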

Why This Time Is Different: Cloud. The transition to the cloud has made the use of custom OS kernels in database systems more realistic.

True enough. That is why we now see the PG + Unikernel combination, and Amazon gets to keep selling EC2 instances.

Operating Systems and databases

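This section mostly sets up the background and argues that approaches such as kernel bypass or unikernels, which bring the application and the kernel closer together, make sense for a DBMS.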

For kernel bypass there are two options to choose from: DPDK (networking) and SPDK (storage).

However, user-space I/O relies on polling rather than interrupts, for which Linux provides no means to forward device interrupt requests (IRQs) as asynchronous events to the user space.

Interesting: user-space I/O has to poll because Linux offers no way to forward device IRQs to user space as asynchronous events. Noting this down.
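
To make the polling point concrete, here is a minimal sketch of what a user-space polling loop looks like. Nothing here comes from the paper or from DPDK/SPDK: `SimulatedDevice`, `poll_completions`, and `run_polling_loop` are hypothetical names, and the "device" is just a timer-backed queue. It only illustrates that, without interrupts, the application has to keep asking the completion queue whether anything has finished.

```python
import collections
import time


class SimulatedDevice:
    """Hypothetical stand-in for a device with submission/completion queues."""

    def __init__(self, latency_s=0.0005):
        self.latency_s = latency_s
        self._in_flight = collections.deque()  # (ready_time, request_id)

    def submit(self, request_id):
        # Record when this request will be "done"; no interrupt is ever raised.
        self._in_flight.append((time.monotonic() + self.latency_s, request_id))

    def poll_completions(self):
        # Non-blocking: return whatever has finished by now, possibly nothing.
        done = []
        now = time.monotonic()
        while self._in_flight and self._in_flight[0][0] <= now:
            done.append(self._in_flight.popleft()[1])
        return done


def run_polling_loop(device, request_ids):
    """Submit all requests, then spin on the completion queue until all finish."""
    request_ids = list(request_ids)
    for rid in request_ids:
        device.submit(rid)
    completed, empty_polls = [], 0
    while len(completed) < len(request_ids):
        batch = device.poll_completions()
        if batch:
            completed.extend(batch)
        else:
            empty_polls += 1  # CPU time burned without doing useful work
    return completed, empty_polls


if __name__ == "__main__":
    completed, empty_polls = run_polling_loop(SimulatedDevice(), range(8))
    print("completed:", completed)
    print("empty polls while waiting:", empty_polls)
```

The `empty_polls` counter is the cost of this design: every empty poll is a core spinning instead of sleeping until an interrupt arrives.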

Unikernel for databases

This section covers the history and current state of unikernels.

However, because POSIX is large and includes many obscure features, all unikernels implement only a subset of POSIX.

Unikernels implement only a subset of POSIX, which also keeps their footprint small.

Running the DBMS in kernel mode offers direct access to the (virtualized) hardware and primitives that Linux, due to its shared-machine model, does not expose to the user space.

A unikernel gives the DBMS direct access to the (virtualized) hardware.

Without the need for isolation, unikernels can provide direct control over the virtual-memory hardware and enable new use cases.

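Since a unikernel does not need to isolate user space from kernel space, it can hand the DBMS direct control over the virtual-memory hardware (page tables, MMU, TLB) instead of hiding it behind the kernel.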

In unikernels, option A becomes more attractive as the huge VM address space (≥ 256 TiB) eases fragmentation and page-faulting is faster and potentially more scalable: a lock-free page-fault fast path that installs a preallocated frame takes between 0.5 us (1 thread) and 1.29 us (16 threads).

A unikernel can also ease memory fragmentation, thanks to the huge virtual address space.
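
A rough way to get a feel for the numbers in the quote: 256 TiB is exactly 2^48 bytes, and the page-fault cost is paid the first time a page is touched. The sketch below is my own illustration, not the paper's benchmark. It maps an anonymous region with Python's `mmap` and compares first-touch and second-touch time per page. The function names and the 64 MiB size are arbitrary choices, and interpreter overhead dominates, so the absolute numbers are not comparable to the 0.5 us figure above.

```python
import mmap
import time

# 256 TiB expressed in bytes; this equals 2**48.
ADDRESS_SPACE_BYTES = 256 * 2**40


def touch_pages(region, page_size):
    """Write one byte to each page; return average seconds per page."""
    n_pages = len(region) // page_size
    start = time.perf_counter()
    for i in range(n_pages):
        region[i * page_size] = 1
    return (time.perf_counter() - start) / n_pages


def measure_first_touch(size_bytes=64 * 1024 * 1024):
    """Return (first_touch, second_touch) average seconds per page."""
    page_size = mmap.PAGESIZE
    region = mmap.mmap(-1, size_bytes)  # anonymous mapping, not file-backed
    try:
        # First pass: the OS typically has to fault each page in.
        first = touch_pages(region, page_size)
        # Second pass: the pages are already mapped.
        second = touch_pages(region, page_size)
    finally:
        region.close()
    return first, second


if __name__ == "__main__":
    assert ADDRESS_SPACE_BYTES == 2**48
    first, second = measure_first_touch()
    print(f"first touch:  {first * 1e6:.2f} us/page")
    print(f"second touch: {second * 1e6:.2f} us/page")
```

On a typical Linux system the first pass should be noticeably slower than the second, since it includes the page faults, but I would not read more than that into the output.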

Basically, io_uring merely provides an additional queue on top of the NVMe queue that causes substantial CPU overhead[23].

Reference:

[23] Gabriel Haas and Viktor Leis. 2023. What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines. PVLDB 16, 9 (2023). https://doi.org/10.14778/3598581.3598584

io_uring adds extra overhead on top of NVMe? Interesting. I will read [23] when I have time.

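In short, setting aside the deployment hassle, the performance benefit of unikernels is clear: they can drop some of Linux's protection mechanisms, and they also provide capabilities that otherwise require kernel bypass.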

Evaluation: Virtual Memory

Benchmark Setup. We conduct our benchmarks within a virtual machine with 16 cores and 12 GiB of DRAM, which is sufficient physical memory for all experiments. We disabled memory ballooning to avoid unpredictable memory-access slowdowns caused by hypervisor-level fragmentation. Within the VM, we used Linux (v6.1.0, Debian Unstable) or OSv (8c792811d) as operating systems. We execute the VM on top of a physical machine with an AMD EPYC 9554P processor (64 cores, 128 HW threads, 384 GiB DRAM, 1 NUMA domain) and used QEMU (v8.0.2) with hardware-assisted virtualization (KVM). Our modified OSv version, the used benchmarks, and the resulting data is available

So the experiments run inside a KVM virtual machine.

The takeaway is that unikernels handle virtual-memory operations very well.

Related Work

More recent DBMS/OS co-design projects include DBOS [36, 53], MxKernel [43], and COD [20].

These are some OS/DBMS co-design projects.

Towards Unikernel-DBS In the cloud

This section is mainly about the concept of cloud-native databases.

It also notes that, where possible, FPGAs, TPUs, and DPUs could be integrated with unikernels as well.

Summary

I suppose this paper made it into VLDB because of its novelty? 😂 It feels light on length and experiments, and reads more like an "old new story".

If you are interested, give it a try, though it really does take a lot of work.
