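Velox is Facebook's open-source database execution engine. I got curious about it a few days ago and wanted to try it out, but Chinese-language search engines turned up no tutorial on setting up the environment, so I wrote this post to record the pitfalls I ran into.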

Environment

WSL2 (Docker)

32 GB memory (24 GB allocated to Docker)

Ryzen 5 4600H (6 cores, 12 threads; 8 threads allocated to Docker)

Setup

Having already been through setting up LLVM and MLIR, I'll skip the chatter and go straight to Docker.

docker pull ghcr.io/facebookincubator/velox-dev:ubuntu-22.04

(How to speed up image pulls is out of scope for this post.)

Once the pull finishes, remember to start the container with -it or -itd:

docker run -itd --name Velox ghcr.io/facebookincubator/velox-dev:ubuntu-22.04 /bin/bash

Then attach to the container with the VS Code Dev Containers extension, as in the screenshot below.

[Screenshot: image-20240813174200292]

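Switch to the root directory, delete the /velox directory that ships with the image, and git clone a fresh copy (the Velox project is updated every day and changes a lot).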

cd /
rm -rf /velox
git clone https://github.com/facebookincubator/velox

Running make right away fails; pkg-config needs to be installed first (installing it after make has failed works just as well).

apt install pkg-config
cd /velox
make

There are a little over 1,200 targets to compile (memory usage peaked at 18 GB, and with 8 threads the build takes close to an hour).
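As a rough back-of-the-envelope check on those numbers, the sketch below estimates how many parallel compile jobs fit in a given amount of memory. It is plain C++ and has nothing to do with Velox's build system; `maxParallelJobs` is a hypothetical helper, and the per-job figure (18 GB peak / 8 threads = 2.25 GB) is an assumption derived from the peak above, not a measured per-job cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: caps the job count by both the available threads and
// the available memory, assuming every job needs `perJobGiB` at its peak.
int maxParallelJobs(double memoryGiB, double perJobGiB, int threads) {
  assert(memoryGiB > 0 && perJobGiB > 0 && threads > 0);
  const int byMemory = static_cast<int>(std::floor(memoryGiB / perJobGiB));
  // Always allow at least one job so the build can make progress.
  return std::max(1, std::min(threads, byMemory));
}
```

Under that assumption, 24 GB and 8 threads allow all 8 jobs, while giving Docker only 16 GB would drop the estimate to 7.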

The demo executable ends up at _build/release/velox/exec/tests/velox_in_10_min_demo

Velox In 10 minutes

https://facebookincubator.github.io/velox/velox-in-10-min.html

After adding or modifying .cpp files, simply run make again.

The demo code lives in VeloxIn10MinDemo::run() in velox/exec/tests/VeloxIn10MinDemo.cpp.

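Before the demo code runs, the VeloxIn10MinDemo class takes care of initialization; the keywords here are PrestoSQL, DuckDB, and TPC-H. The class also provides helper functions such as parseExpression, compileExpression, and makeTpchSplit: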

class VeloxIn10MinDemo : public VectorTestBase {
 public:
  const std::string kTpchConnectorId = "test-tpch";

  VeloxIn10MinDemo() {
    // Register Presto scalar functions.
    functions::prestosql::registerAllScalarFunctions();

    // Register Presto aggregate functions.
    aggregate::prestosql::registerAllAggregateFunctions();

    // Register type resolver with DuckDB SQL parser.
    parse::registerTypeResolver();

    // Register TPC-H connector.
    auto tpchConnector =
        connector::getConnectorFactory(
            connector::tpch::TpchConnectorFactory::kTpchConnectorName)
            ->newConnector(
                kTpchConnectorId, std::make_shared<core::MemConfig>());
    connector::registerConnector(tpchConnector);
  }

  ~VeloxIn10MinDemo() {
    connector::unregisterConnector(kTpchConnectorId);
  }

  // ... (rest of the class omitted)

The tutorial notes that although Velox does not provide a SQL parser, the test environment uses DuckDB's SQL parser as a reference.

Oddly, if I keep only the code from the vectors section, the build fails at link time with:

TypeResolver.cpp:(.text+0x4d): undefined reference to `facebook::velox::core::Expressions::resolverHook_'

Notes from running the code

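data->toString(1, 5) prints rows 1 through 4 (the end index is exclusive); called with no arguments, it prints the column attributes instead.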

std::cout << data->toString(1, 5) << std::endl;
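To make the half-open range concrete, here is a small standalone sketch in plain C++. It does not use Velox at all: `rowsToString` is a hypothetical helper, and its "index: value" output format is made up. It only mimics how a `(from, to)` pair selects rows `from` through `to - 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Velox): renders rows [from, to) of a
// column, one "index: value" line per row.
std::string rowsToString(
    const std::vector<int>& column, std::size_t from, std::size_t to) {
  assert(from <= to && to <= column.size());
  std::ostringstream out;
  for (std::size_t i = from; i < to; ++i) {
    out << i << ": " << column[i] << "\n";
  }
  return out.str();
}
```

Calling `rowsToString(column, 1, 5)` on a six-element column yields four lines, for indices 1, 2, 3, and 4; index 5 is excluded.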

The compileExpression function is shown below; it seems to depend on PrestoSQL?

std::unique_ptr<exec::ExprSet> compileExpression(
    const std::string& expr,
    const RowTypePtr& rowType) {
  std::vector<core::TypedExprPtr> expressions = {
      parseExpression(expr, rowType)};
  return std::make_unique<exec::ExprSet>(
      std::move(expressions), execCtx_.get());
}

auto exprSet = compileExpression("a + b", asRowType(data->type()));

compileExpression builds the expression tree (AST); the actual result is only produced once that tree goes through evaluate.

VectorPtr evaluate(exec::ExprSet& exprSet, const RowVectorPtr& input) {
  exec::EvalCtx context(execCtx_.get(), &exprSet, input.get());

  SelectivityVector rows(input->size());
  std::vector<VectorPtr> result(1);
  exprSet.eval(rows, context, result);
  return result[0];
}

auto c = evaluate(*exprSet, data);

auto abc = makeRowVector({"a", "b", "c"}, {a, b, c});

std::cout << std::endl << "> a, b, a + b: " << abc->toString() << std::endl;
std::cout << abc->toString(0, c->size()) << std::endl;
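The compile-then-evaluate split can be illustrated with a toy model in plain C++. This is only a sketch under my own assumptions and not how Velox implements it: `Batch`, `Expr`, `ColumnRef`, `Add`, `compileAdd`, and this `evaluate` are all hypothetical names, the tree is built by hand instead of being parsed from "a + b", and rows are processed one at a time for brevity. The `selected` mask loosely plays the role that SelectivityVector plays in the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy columnar batch: batch[c][r] is the value of column c at row r.
using Batch = std::vector<std::vector<long>>;

// Toy expression node, standing in for a compiled expression tree.
struct Expr {
  virtual ~Expr() = default;
  virtual long evalRow(const Batch& batch, std::size_t row) const = 0;
};

// Leaf node: reads one column.
struct ColumnRef : Expr {
  explicit ColumnRef(std::size_t i) : index(i) {}
  long evalRow(const Batch& batch, std::size_t row) const override {
    return batch[index][row];
  }
  std::size_t index;
};

// Inner node: adds the results of its two children.
struct Add : Expr {
  Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : left(std::move(l)), right(std::move(r)) {}
  long evalRow(const Batch& batch, std::size_t row) const override {
    return left->evalRow(batch, row) + right->evalRow(batch, row);
  }
  std::unique_ptr<Expr> left;
  std::unique_ptr<Expr> right;
};

// "Compile" step: only builds the tree for column lhs + column rhs.
// No data is touched here.
std::unique_ptr<Expr> compileAdd(std::size_t lhs, std::size_t rhs) {
  return std::make_unique<Add>(
      std::make_unique<ColumnRef>(lhs), std::make_unique<ColumnRef>(rhs));
}

// "Evaluate" step: runs the tree over the selected rows of a batch.
// Unselected rows keep the default value 0.
std::vector<long> evaluate(
    const Expr& expr, const Batch& batch, const std::vector<bool>& selected) {
  assert(!batch.empty() && selected.size() == batch[0].size());
  std::vector<long> result(selected.size(), 0);
  for (std::size_t row = 0; row < selected.size(); ++row) {
    if (selected[row]) {
      result[row] = expr.evalRow(batch, row);
    }
  }
  return result;
}
```

The point of the split is that the tree from `compileAdd(0, 1)` can be built once and then reused for every incoming batch, with a different row mask each time.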

With PlanBuilder() you can express operations such as Aggregations, Sorting, Filtering, and Joins, and it even works with the TPC-H connector ("TPC-H connector generates TPC-H tables on the fly").

plan = PlanBuilder()
           .tpchTableScan(
               tpch::Table::TBL_NATION,
               {"n_nationkey", "n_name"},
               1 /*scaleFactor*/)
           .planNode();

auto nations =
    AssertQueryBuilder(plan).split(makeTpchSplit()).copyResults(pool());

std::cout << std::endl
          << "> first 10 rows from TPC-H nation table: "
          << nations->toString() << std::endl;
std::cout << nations->toString(0, 10) << std::endl;
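The chained-call style of the plan above can be sketched with a toy fluent builder in plain C++. Everything here is hypothetical and for illustration only (`PlanNode`, `ToyPlanBuilder`, `describe`, and the three operator methods are my own names, not Velox's API): each call wraps the current root in a new node, and `planNode()` hands back the finished tree.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy plan node: a label plus the upstream node that feeds it.
struct PlanNode {
  std::string name;
  std::shared_ptr<const PlanNode> source;
};

// Hypothetical fluent builder: every call stacks a new node on the old root.
class ToyPlanBuilder {
 public:
  ToyPlanBuilder& tableScan(const std::string& table) {
    return add("TableScan(" + table + ")");
  }
  ToyPlanBuilder& filter(const std::string& predicate) {
    assert(root_ != nullptr);  // A filter needs an input.
    return add("Filter(" + predicate + ")");
  }
  ToyPlanBuilder& orderBy(const std::string& key) {
    assert(root_ != nullptr);  // A sort needs an input.
    return add("OrderBy(" + key + ")");
  }
  std::shared_ptr<const PlanNode> planNode() const {
    return root_;
  }

 private:
  ToyPlanBuilder& add(std::string name) {
    auto node = std::make_shared<PlanNode>();
    node->name = std::move(name);
    node->source = root_;
    root_ = node;
    return *this;
  }
  std::shared_ptr<const PlanNode> root_;
};

// Renders the plan from the root (last operator) down to the leaf scan.
std::string describe(const std::shared_ptr<const PlanNode>& node) {
  std::string out;
  for (auto n = node; n; n = n->source) {
    if (!out.empty()) {
      out += " <- ";
    }
    out += n->name;
  }
  return out;
}
```

For example, `ToyPlanBuilder().tableScan("nation").filter("n_nationkey < 10").orderBy("n_name").planNode()` describes itself as `OrderBy(n_name) <- Filter(n_nationkey < 10) <- TableScan(nation)`: the last call in the chain becomes the root of the plan.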

Closing thoughts

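My impression is that Velox in 10 minutes is meant more to spark interest in Velox than to show how its execution works internally (for that, you have to dig in with a debugger).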