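Paper reading notes: "Unikraft: Fast, Specialized Unikernels the Easy Way"
I first heard of Unikraft when Prisma launched its own PostgreSQL cloud service on top of it: Prisma Postgres: The Future of Serverless Databases
It packages the app, its dependency libraries, and a purpose-picked kernel into one image and exposes an HTTP or UDP interface, a virtualization approach that sits somewhere between Docker and KVM. Interesting.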
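Figure: unikernel diagram (screenshot from InfoQ)
The Unikraft repository is still being updated steadily, so I think the unikernel concept is still worth looking forward to.
Related Reddit discussion: Are unikernels dead?
GitHub repository: unikraft/unikraft
Project website: www.unikraft.org
Paper: https://dl.acm.org/doi/10.1145/3447786.3456248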
Introduction
for example, a web server aiming to service millions of requests per second can access a low-level, batch-based network API rather than the standard but slow socket API.
A small gripe: is the socket API really that slow?
Our evaluation using such applications on Unikraft results in a 1.7x-2.7x performance improvement compared to Linux guests.
Roughly 2x. Not bad at all.
In addition, Unikraft images for these apps are around 1MB, require less than 10MB of RAM to run, and boot in around 1ms on top of the VMM time (total boot time 2ms-40ms).
Fast boot. I suspect this is why Prisma went with this approach.
To support a wide range of applications, we port the musl libc library, and provide a syscall shim layer micro-library.
musl as the libc. Noted.
Design Principles and Solution Space
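Once this figure is shown, I think the chapter has pretty much made its point 😂 Unikraft can strip out unneeded dependencies wholesale, so a real performance gain is only to be expected.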
• Protection-domain switches between the application and the kernel might be redundant in a virtualization context because isolation is ensured by the hypervisor, and result in measurable performance degradation.
• Multiple address spaces may be useless in a single application domain, but removing such support in standard OSes requires a massive reimplementation effort.
• For RPC-style server applications, threading is not needed, with a single, run-to-completion event loop sufficing for high performance. This would remove the need for a scheduler within the VM and its associated overheads, as well as the mismatch between the guest and hypervisor schedulers [19].
• For performance-oriented UDP-based apps, much of the OS networking stack is useless: the app could simply use the driver API, much like DPDK-style applications already do. There is currently no way to easily remove just the network stack but not the entire network sub-system from standard OSes.
• Direct access to NVMe storage from apps removes the need for file descriptors, a VFS layer and a filesystem, but removing such support from existing OSes, built around layers of the storage API, is very difficult.
• Memory allocators have a large impact on application performance, and general purpose allocators have been shown to be suboptimal for many apps [66]. It would therefore be ideal if each app could choose its own allocator; this is however very difficult to do in today’s operating systems because the allocators that kernels use are baked in.
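A throwback of sorts? 😂 POSIX compatibility, no process isolation, no separate virtual address spaces, kernel-bypass storage and network access, and even no multithreading: everything runs on an event loop.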
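To make the "single, run-to-completion event loop" point concrete, here is a minimal sketch in Python. It is not Unikraft code; `run_to_completion` and `echo_upper` are hypothetical names, and the request queue stands in for whatever the network driver would deliver.

```python
from collections import deque


def run_to_completion(requests, handler):
    """Serve queued requests on one thread, finishing each before taking the next."""
    queue = deque(requests)
    responses = []
    while queue:
        request = queue.popleft()
        # The handler runs from start to finish and is never preempted,
        # so this loop needs neither a scheduler nor locks.
        responses.append(handler(request))
    return responses


def echo_upper(request):
    # Hypothetical RPC handler: one request in, one response out.
    return request.upper()


if __name__ == "__main__":
    print(run_to_completion(["get a", "get b"], echo_upper))
```

Because nothing runs concurrently, the handler can touch shared state freely. The trade-off is that one slow request delays everything queued behind it.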
Unikraft Architecture and APIs
Unikraft can improve the performance of applications in two ways:
- Unmodified applications, by eliminating syscall overheads, reducing image size and memory consumption, and by choosing efficient memory allocators.
- Specialization, by adapting applications to take advantage of lower level APIs wherever performance is critical (e.g., a database application seeking high disk I/O throughput).
So beyond the basics, lower syscall overhead and application-specific specialization can squeeze out even more performance.
Developers interested in fast boot times could further optimize the unikernel by providing their own boot code to comply with the ukboot API;
If you have the skills, you can even swap in your own boot code to boot faster 😊
For network-bound applications, the developers can use the standard socket interface or the lower level, higher performance uknetdev API in order to significantly improve throughput;
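Besides sockets, Unikraft offers uknetdev, an API that delivers higher throughput.
The next few subsections read like a manual to me 😅 The official documentation on the website is probably a better read.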
uknetdev API
we designed an API that allows applications to operate Unikraft drivers in polling, interrupt-driven, or mixed mode.
In other words, the application decides whether the driver is polled, interrupt-driven, or a mix of the two.
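The three modes are easier to picture with a toy model. The sketch below is plain Python and does not use the real uknetdev API; `ToyNetDev`, `rx_burst`, `deliver`, and `make_mixed_handler` are names I made up for illustration. Polling means the app calls `rx_burst` whenever it likes. Mixed mode means one interrupt wakes the app, which then polls until the queue is empty before re-arming the interrupt.

```python
from collections import deque


class ToyNetDev:
    """Hypothetical stand-in for a NIC receive queue (not the uknetdev API)."""

    def __init__(self):
        self.rx = deque()
        self.interrupts_enabled = False
        self.interrupts_fired = 0
        self.on_interrupt = None

    def deliver(self, packets):
        # Simulated hardware: one or more packets arrive together.
        self.rx.extend(packets)
        if self.interrupts_enabled and self.on_interrupt is not None:
            self.interrupts_enabled = False  # masked until the app re-arms
            self.interrupts_fired += 1
            self.on_interrupt()

    def rx_burst(self, max_packets):
        # Polling entry point: take up to max_packets without blocking.
        burst = []
        while self.rx and len(burst) < max_packets:
            burst.append(self.rx.popleft())
        return burst


def make_mixed_handler(dev, received, burst_size=4):
    """Build an interrupt handler that drains the queue by polling, then re-arms."""

    def handler():
        while True:
            burst = dev.rx_burst(burst_size)
            if not burst:
                break
            received.extend(burst)
        dev.interrupts_enabled = True  # queue is empty: back to interrupt mode

    return handler


if __name__ == "__main__":
    dev = ToyNetDev()
    received = []
    dev.on_interrupt = make_mixed_handler(dev, received)
    dev.interrupts_enabled = True
    dev.deliver(["p1", "p2", "p3", "p4", "p5"])
    print(received, dev.interrupts_fired)
```

In this model a batch of five packets costs a single interrupt, while an idle queue costs no polling at all, which is the appeal of the mixed mode.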
ukalloc API
Unikraft’s memory allocation subsystem is composed of three layers:
(1) a POSIX compliant external API,
(2) an internal allocation API called ukalloc,
and (3) one or more backend allocator implementations.
So allocation is layered: POSIX-compliant calls on the outside, ukalloc in the middle, and interchangeable backend allocators underneath.
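A rough way to picture the three layers, sketched in Python rather than Unikraft's C. All class and function names here are hypothetical and are not the real ukalloc symbols, and the two backends are deliberately simplistic toys, not the buddy or TLSF allocators.

```python
class BumpBackend:
    """Layer 3 (toy): hands out increasing offsets and never reuses memory."""

    def __init__(self, size):
        self.size = size
        self.next = 0

    def alloc(self, nbytes):
        if self.next + nbytes > self.size:
            return None
        addr = self.next
        self.next += nbytes
        return addr

    def free(self, addr):
        pass  # a bump allocator cannot reclaim individual blocks


class FreeListBackend:
    """Layer 3 (toy): fixed-size blocks recycled through a free list."""

    def __init__(self, size, block=64):
        self.block = block
        self.free_blocks = list(range(0, size - block + 1, block))

    def alloc(self, nbytes):
        if nbytes > self.block or not self.free_blocks:
            return None
        return self.free_blocks.pop()

    def free(self, addr):
        self.free_blocks.append(addr)


class Allocator:
    """Layer 2: the internal interface through which every backend is reached."""

    def __init__(self, backend):
        self.backend = backend

    def malloc(self, nbytes):
        return self.backend.alloc(nbytes)

    def free(self, addr):
        self.backend.free(addr)


_default = None


def set_default_allocator(allocator):
    global _default
    _default = allocator


# Layer 1: the external, libc-style calls the application sees.
def malloc(nbytes):
    return _default.malloc(nbytes)


def free(addr):
    _default.free(addr)


if __name__ == "__main__":
    set_default_allocator(Allocator(FreeListBackend(128, block=64)))
    a = malloc(10)
    free(a)
    print(malloc(10) == a)
```

The application only ever calls `malloc` and `free`. Swapping the backend is a one-line change at setup, which is the property that lets each app pick the allocator that suits it.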
Unikraft supports five allocation backends: a buddy system, the Two-Level Segregated Fits [53] (TLSF) real-time memory allocator, tinyalloc [67], Mimalloc [42] (version 1.6.1) and the Oscar [12] secure memory allocator
Five allocation backends are supported.
uksched and uklock APIs
The uklock library provides synchronization primitives such as mutexes and semaphores. In order to keep the code of other libraries portable, uklock selects a target implementation depending on how the unikernel is configured.
Synchronization primitives are supported.
If multi-core were enabled (we do not yet support this), some primitives would use spin-locks and RCUs,
If multi-core support arrives in the future, spin-locks will come with it.
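The idea of picking a lock implementation from the build configuration can be sketched as follows. This is a Python illustration under my own assumptions, not uklock itself: the `multithreading` config key, `NoopMutex`, and `make_mutex` are hypothetical, and Python's `threading.Lock` merely stands in for a real mutex.

```python
import threading


class NoopMutex:
    """Used when only one thread of control exists, so locking can cost nothing."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions


def make_mutex(config):
    """Pick a mutex implementation from a (hypothetical) configuration dict."""
    if config.get("multithreading", False):
        return threading.Lock()
    return NoopMutex()


def increment(counter, mutex, times):
    # Library code is written once against the mutex interface
    # and stays unchanged whichever implementation was selected.
    for _ in range(times):
        with mutex:
            counter["value"] += 1


if __name__ == "__main__":
    counter = {"value": 0}
    increment(counter, make_mutex({"multithreading": False}), 1000)
    print(counter["value"])
```

Callers never check the configuration themselves, which is what keeps the code of other libraries portable across differently configured images.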
Application Support and Porting
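The figure shows that the cycles Unikraft needs drop by an order of magnitude.
Back in 2021 it already supported many languages.
Software support is decent too.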
Base Evaluation
Unikraft also supports Xen and bare-metal targets (e.g., Raspberry Pi and Xilinx Ultra96-V2), but we leave their performance evaluation to future work
Fair enough.
Showing this figure pretty much sums up the chapter.
Compared to Lupine on QEMU/KVM, Unikraft is around 50% faster on both Redis and NGINX.
NGINX gets faster. OK.
Boot performance is similar for SQLite, with the buddy allocator being the worst and tinyalloc and tlsf among the best (results not shown for brevity). At runtime, though, the order depends on how many queries are run (see Figure 16): tinyalloc is fastest for less than 1000 queries by 3-30%, becoming suboptimal with more requests, as its memory compaction algorithms are slower; using mimalloc, instead, provides a 20% performance boost under high load.
The choice of memory allocator also affects performance.
Specializing Applications
For SQLite, the difference on the 60K-insert test doesn't look large to me.
Discussion
Do Unikernels trade-off security?
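The traditional security measures could arguably be dropped (each app already has its own kernel, so why pay the performance cost?), yet CFI, Address Sanitisation, and Intel MPK are still implemented.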
Debugging
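Debugging tools are lacking, since the large toolset of a full Linux system is absent, but the development team provides ukdebug, a debugging tool meant to mitigate the problem.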
Processes (or lack thereof) in Unikraft
Unikraft currently does not support processes and their related functions (e.g., fork() and exec()),
There are no process-related APIs.
Many modern applications however no longer depend on processes to function [5], and those that do often provide a configurable, thread-based alternative (e.g., nginx).
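Calling out nginx by name 😂 Anyway, this kind of software has its own queues, event loops, coroutines and so on, and doesn't depend heavily on those system APIs.
With tuning, a key-value store on Unikraft can reach DPDK-level performance.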
Related Work
runtime.js (JavaScript)
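This project died in 2020; otherwise I think it would have been very interesting.
The paper lists many more; I won't enumerate them here.
Thoughts
Isn't this just a minimal kernel subsystem? 😂 If it can be mass-produced and modularized, that wouldn't be bad.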
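With cloud-side scheduling, this feels like a return to the cloud operating system route; commercially, you could run everything on Proxmox instead of Docker and package it up to sell as a cloud service.
So could you still run Docker inside Unikraft? 🤣
And how does this setup compare with Cloudflare Workers: which is faster? 😀 I'm curious to find out.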