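I first came across Apache Arrow Gandiva by accident while browsing the Arrow project. Drawn in by the words LLVM and JIT on the project page, I even installed and ran it on Ubuntu, but in the end I could not work out what scenario it would actually be useful in, so I dropped it 😂

A few days ago, after reading the NoisePage paper and part of its source code, I kept feeling that I had seen Arrow combined with LLVM somewhere before. It was Apache Arrow Gandiva. So this time I decided to read its source as well and figure out what this thing really is.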

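Related material

If you search Bing for "Apache Arrow Gandiva" today, the second result is a Zhihu post titled "Apache Arrow Gandiva:远大理想与尴尬现实" (roughly, "Apache Arrow Gandiva: grand ambitions and an awkward reality"), which was the main reason I gave up back then. What I want to ask today is: why process Arrow data from Java in the first place? 😂 If I work in Rust or C++, Gandiva is not awkward at all. On the contrary, it is quite interesting: Apache Arrow Gandiva has done a lot of work to connect the LLVM and Arrow ecosystems, which leaves researchers plenty of room to explore.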

Chinese-language documentation

Gandiva Expressions, Projectors, and Filters

Gandiva External Function Development Guide

Material from Dremio

Introducing the Gandiva Initiative for Apache Arrow

Adding a User Defined Function to Gandiva

Gandiva Initiative: Improving SQL Performance by 70x

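Blog posts

湖仓一体 - Apache Arrow的那些事 (roughly, "Lakehouse: all about Apache Arrow")

Project history and current status

Dremio donated the project to Apache Arrow in 2018, and it is now one of Apache Arrow's subprojects (source: Gandiva: A LLVM-based Analytical Expression Compiler for Apache Arrow). Digging a little further, quite a few Arrow contributors work at Dremio today, Dremio itself uses Apache Arrow, and Gandiva is described as part of Dremio's execution engine.

Gandiva's main highlight is that it relies on LLVM's auto-vectorization to process Arrow data in a vectorized way, and on the LLVM side it already implements Project and Filter. Add Join and Aggregation and most SQL operations would be covered; bring NoisePage into the picture as well and you could even build a complete, purely LLVM-based CRUD pipeline over Arrow.

There are rumors online that the project has been abandoned (which is simply not true 😅). In fact Gandiva has kept receiving maintenance commits, and after LLVM 20 came out this year it was quickly updated to support it.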


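Gandiva currently ships C and C++ libraries, but it does not seem to offer any support for the Rust implementation of Arrow: Interfaces for gandiva bindings.

Source code walkthrough

The code was downloaded on 2025-06-24. Apart from the precompiled and tests subdirectories, all files sit flat in a single directory.

Gandiva source location: https://github.com/apache/arrow/tree/main/cpp/src/gandiva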

|-- CMakeLists.txt
|-- GandivaConfig.cmake.in
|-- annotator.cc
|-- annotator.h
|-- annotator_test.cc
|-- arrow.h
|-- basic_decimal_scalar.h
|-- bitmap_accumulator.cc
|-- bitmap_accumulator.h
|-- bitmap_accumulator_test.cc
|-- cache.cc
|-- cache.h
|-- cache_test.cc
|-- cast_time.cc
|-- compiled_expr.h
|-- condition.h
|-- configuration.cc
|-- configuration.h
|-- context_helper.cc
|-- date_utils.cc
|-- date_utils.h
|-- decimal_ir.cc
|-- decimal_ir.h
|-- decimal_scalar.h
|-- decimal_type_util.cc
|-- decimal_type_util.h
|-- decimal_type_util_test.cc
|-- decimal_xlarge.cc
|-- decimal_xlarge.h
|-- dex.h
|-- dex_visitor.h
|-- encrypt_utils.cc
|-- encrypt_utils.h
|-- encrypt_utils_test.cc
|-- engine.cc
|-- engine.h
|-- engine_llvm_test.cc
|-- eval_batch.h
|-- execution_context.h
|-- exported_funcs.cc
|-- exported_funcs.h
|-- exported_funcs_registry.cc
|-- exported_funcs_registry.h
|-- exported_funcs_registry_test.cc
|-- expr_decomposer.cc
|-- expr_decomposer.h
|-- expr_decomposer_test.cc
|-- expr_validator.cc
|-- expr_validator.h
|-- expression.cc
|-- expression.h
|-- expression_cache_key.h
|-- expression_registry.cc
|-- expression_registry.h
|-- expression_registry_test.cc
|-- external_c_functions.cc
|-- field_descriptor.h
|-- filter.cc
|-- filter.h
|-- formatting_utils.h
|-- func_descriptor.h
|-- function_holder.h
|-- function_holder_maker_registry.cc
|-- function_holder_maker_registry.h
|-- function_ir_builder.cc
|-- function_ir_builder.h
|-- function_registry.cc
|-- function_registry.h
|-- function_registry_arithmetic.cc
|-- function_registry_arithmetic.h
|-- function_registry_common.h
|-- function_registry_datetime.cc
|-- function_registry_datetime.h
|-- function_registry_hash.cc
|-- function_registry_hash.h
|-- function_registry_math_ops.cc
|-- function_registry_math_ops.h
|-- function_registry_string.cc
|-- function_registry_string.h
|-- function_registry_test.cc
|-- function_registry_timestamp_arithmetic.cc
|-- function_registry_timestamp_arithmetic.h
|-- function_signature.cc
|-- function_signature.h
|-- function_signature_test.cc
|-- gandiva.pc.in
|-- gandiva_aliases.h
|-- gandiva_object_cache.cc
|-- gandiva_object_cache.h
|-- gdv_function_stubs.cc
|-- gdv_function_stubs.h
|-- gdv_function_stubs_test.cc
|-- gdv_hash_function_stubs.cc
|-- gdv_string_function_stubs.cc
|-- hash_utils.cc
|-- hash_utils.h
|-- hash_utils_test.cc
|-- in_holder.h
|-- interval_holder.cc
|-- interval_holder.h
|-- interval_holder_test.cc
|-- literal_holder.cc
|-- literal_holder.h
|-- llvm_generator.cc
|-- llvm_generator.h
|-- llvm_generator_test.cc
|-- llvm_includes.h
|-- llvm_types.cc
|-- llvm_types.h
|-- llvm_types_test.cc
|-- local_bitmaps_holder.h
|-- lru_cache.h
|-- lru_cache_test.cc
|-- lvalue.h
|-- make_precompiled_bitcode.py
|-- native_function.h
|-- node.h
|-- node_visitor.h
|-- precompiled
| |-- CMakeLists.txt
| |-- arithmetic_ops.cc
| |-- arithmetic_ops_test.cc
| |-- bitmap.cc
| |-- bitmap_test.cc
| |-- decimal_ops.cc
| |-- decimal_ops.h
| |-- decimal_ops_test.cc
| |-- decimal_wrapper.cc
| |-- epoch_time_point.h
| |-- epoch_time_point_test.cc
| |-- extended_math_ops.cc
| |-- extended_math_ops_test.cc
| |-- hash.cc
| |-- hash_test.cc
| |-- print.cc
| |-- string_ops.cc
| |-- string_ops_test.cc
| |-- testing.h
| |-- time.cc
| |-- time_constants.h
| |-- time_fields.h
| |-- time_test.cc
| |-- timestamp_arithmetic.cc
| `-- types.h
|-- precompiled_bitcode.cc.in
|-- projector.cc
|-- projector.h
|-- random_generator_holder.cc
|-- random_generator_holder.h
|-- random_generator_holder_test.cc
|-- regex_functions_holder.cc
|-- regex_functions_holder.h
|-- regex_functions_holder_test.cc
|-- regex_util.cc
|-- regex_util.h
|-- selection_vector.cc
|-- selection_vector.h
|-- selection_vector_impl.h
|-- selection_vector_test.cc
|-- simple_arena.h
|-- simple_arena_test.cc
|-- symbols.map
|-- tests
| |-- CMakeLists.txt
| |-- binary_test.cc
| |-- boolean_expr_test.cc
| |-- date_time_test.cc
| |-- decimal_single_test.cc
| |-- decimal_test.cc
| |-- external_functions
| | |-- CMakeLists.txt
| | |-- multiply_by_two.cc
| | `-- multiply_by_two.h
| |-- filter_project_test.cc
| |-- filter_test.cc
| |-- generate_data.h
| |-- hash_test.cc
| |-- huge_table_test.cc
| |-- if_expr_test.cc
| |-- in_expr_test.cc
| |-- literal_test.cc
| |-- micro_benchmarks.cc
| |-- null_validity_test.cc
| |-- projector_build_validation_test.cc
| |-- projector_test.cc
| |-- test_util.cc
| |-- test_util.h
| |-- timed_evaluate.h
| |-- to_string_test.cc
| `-- utf8_test.cc
|-- to_date_holder.cc
|-- to_date_holder.h
|-- to_date_holder_test.cc
|-- tree_expr_builder.cc
|-- tree_expr_builder.h
|-- tree_expr_test.cc
|-- value_validity_pair.h
`-- visibility.h

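Because the code base is very large, only selected parts are analyzed below.

node

Definitions related to the nodes of the expression tree. The excerpt below is the visitor interface over those node types (it appears to come from node_visitor.h).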

namespace gandiva {
class FieldNode;
class FunctionNode;
class IfNode;
class LiteralNode;
class BooleanNode;
template <typename Type>
class InExpressionNode;
/// \brief Visitor for nodes in the expression tree.
class GANDIVA_EXPORT NodeVisitor {
public:
virtual ~NodeVisitor() = default;
virtual Status Visit(const FieldNode& node) = 0;
virtual Status Visit(const FunctionNode& node) = 0;
virtual Status Visit(const IfNode& node) = 0;
virtual Status Visit(const LiteralNode& node) = 0;
virtual Status Visit(const BooleanNode& node) = 0;
virtual Status Visit(const InExpressionNode<int32_t>& node) = 0;
virtual Status Visit(const InExpressionNode<int64_t>& node) = 0;
virtual Status Visit(const InExpressionNode<float>& node) = 0;
virtual Status Visit(const InExpressionNode<double>& node) = 0;
virtual Status Visit(const InExpressionNode<gandiva::DecimalScalar128>& node) = 0;
virtual Status Visit(const InExpressionNode<std::string>& node) = 0;
};
} // namespace gandiva
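
To make the interface above less abstract, here is a minimal sketch of the same double-dispatch idea on a toy tree that evaluates 4 * 5 + 3. None of these classes are Gandiva's: LiteralNode, BinaryNode, EvalVisitor and EvaluateExample are hypothetical names of mine, and the real visitors return a Status rather than storing an int.

```cpp
#include <memory>
#include <utility>

// Hypothetical miniature of the pattern; these are not Gandiva's classes.
struct LiteralNode;
struct BinaryNode;

struct NodeVisitor {
  virtual ~NodeVisitor() = default;
  virtual void Visit(const LiteralNode& node) = 0;
  virtual void Visit(const BinaryNode& node) = 0;
};

struct Node {
  virtual ~Node() = default;
  // Each node type forwards to the Visit overload that matches itself.
  virtual void Accept(NodeVisitor& visitor) const = 0;
};

struct LiteralNode : Node {
  explicit LiteralNode(int v) : value(v) {}
  void Accept(NodeVisitor& visitor) const override { visitor.Visit(*this); }
  int value;
};

struct BinaryNode : Node {
  BinaryNode(char o, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
      : op(o), left(std::move(l)), right(std::move(r)) {}
  void Accept(NodeVisitor& visitor) const override { visitor.Visit(*this); }
  char op;  // '+' or '*'
  std::unique_ptr<Node> left;
  std::unique_ptr<Node> right;
};

// One concrete visitor: evaluates the tree bottom-up.
struct EvalVisitor : NodeVisitor {
  void Visit(const LiteralNode& node) override { result = node.value; }
  void Visit(const BinaryNode& node) override {
    node.left->Accept(*this);
    int lhs = result;
    node.right->Accept(*this);
    int rhs = result;
    result = (node.op == '+') ? lhs + rhs : lhs * rhs;
  }
  int result = 0;
};

// Builds the tree for 4 * 5 + 3 and evaluates it with the visitor.
int EvaluateExample() {
  auto product = std::make_unique<BinaryNode>('*', std::make_unique<LiteralNode>(4),
                                              std::make_unique<LiteralNode>(5));
  BinaryNode sum('+', std::move(product), std::make_unique<LiteralNode>(3));
  EvalVisitor visitor;
  sum.Accept(visitor);
  return visitor.result;
}
```

The point of the pattern is that a second visitor (say, one that prints the tree or emits IR) can be added without touching the node classes.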

tree_expr

tree_expr_test.cc
tree_expr_builder.cc
tree_expr_builder.h

These files build the expression tree for a computation such as 4*5+3; the tree is constructed through TreeExprBuilder.

TEST_F(TestExprTree, TestField) {
Annotator annotator;
auto n0 = TreeExprBuilder::MakeField(i0_);
EXPECT_EQ(n0->return_type(), int32());
auto n1 = TreeExprBuilder::MakeField(b0_);
EXPECT_EQ(n1->return_type(), boolean());
ExprDecomposer decomposer(*registry_, annotator);
ValueValidityPairPtr pair;
auto status = decomposer.Decompose(*n1, &pair);
DCHECK_EQ(status.ok(), true) << status.message();
auto value = pair->value_expr();
auto value_dex = std::dynamic_pointer_cast<VectorReadFixedLenValueDex>(value);
EXPECT_EQ(value_dex->FieldType(), boolean());
EXPECT_EQ(pair->validity_exprs().size(), 1);
auto validity = pair->validity_exprs().at(0);
auto validity_dex = std::dynamic_pointer_cast<VectorReadValidityDex>(validity);
EXPECT_NE(validity_dex->ValidityIdx(), value_dex->DataIdx());
}

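Tree construction relies on function overloading (one MakeLiteral per literal type), while traversal and transformation of the tree use the visitor pattern.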

class GANDIVA_EXPORT TreeExprBuilder {
public:
/// \brief create a node on a literal.
static NodePtr MakeLiteral(bool value);
static NodePtr MakeLiteral(uint8_t value);
static NodePtr MakeLiteral(uint16_t value);
static NodePtr MakeLiteral(uint32_t value);
static NodePtr MakeLiteral(uint64_t value);
static NodePtr MakeLiteral(int8_t value);
static NodePtr MakeLiteral(int16_t value);
static NodePtr MakeLiteral(int32_t value);
static NodePtr MakeLiteral(int64_t value);
static NodePtr MakeLiteral(float value);
static NodePtr MakeLiteral(double value);
static NodePtr MakeStringLiteral(const std::string& value);
static NodePtr MakeBinaryLiteral(const std::string& value);
static NodePtr MakeDecimalLiteral(const DecimalScalar128& value);

to_date_holder

Converts strings into timestamps (milliseconds since the epoch, judging by the test below).

TEST_F(TestToDateHolder, TestSimpleDateTime) {
EXPECT_OK_AND_ASSIGN(auto to_date_holder, ToDateHolder::Make("YYYY-MM-DD HH:MI:SS", 1));
auto& to_date = *to_date_holder;
bool out_valid;
std::string s("1986-12-01 01:01:01");
int64_t millis_since_epoch =
to_date(&execution_context_, s.data(), (int)s.length(), true, &out_valid);
EXPECT_EQ(millis_since_epoch, 533779200000);
s = std::string("1986-12-01 01:01:01.11");
millis_since_epoch =
to_date(&execution_context_, s.data(), (int)s.length(), true, &out_valid);
EXPECT_EQ(millis_since_epoch, 533779200000);
s = std::string("1986-12-01 01:01:01 +0800");
millis_since_epoch =
to_date(&execution_context_, s.data(), (int)s.length(), true, &out_valid);
EXPECT_EQ(millis_since_epoch, 533779200000);
#if 0
// TODO : this fails parsing with date::parse and strptime on linux
s = std::string("1886-12-01 00:00:00");
millis_since_epoch =
to_date(&execution_context_, s.data(), (int) s.length(), true, &out_valid);
EXPECT_EQ(out_valid, true);
EXPECT_EQ(millis_since_epoch, -2621894400000);
#endif
s = std::string("1886-12-01 01:01:01");
millis_since_epoch =
to_date(&execution_context_, s.data(), (int)s.length(), true, &out_valid);
EXPECT_EQ(millis_since_epoch, -2621894400000);
s = std::string("1986-12-11 01:30:00");
millis_since_epoch =
to_date(&execution_context_, s.data(), (int)s.length(), true, &out_valid);
EXPECT_EQ(millis_since_epoch, 534643200000);
}

simple_arena

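I did not fully understand this part. It appears to handle memory allocation, handing out memory from chunks (the test below uses a chunk size of 4096 bytes).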

TEST_F(TestSimpleArena, TestAlloc) {
int64_t chunk_size = 4096;
SimpleArena arena(arrow::default_memory_pool(), chunk_size);
// Small allocations should come from the same chunk.
int64_t small_size = 100;
for (int64_t i = 0; i < 20; ++i) {
auto p = arena.Allocate(small_size);
EXPECT_NE(p, nullptr);
EXPECT_EQ(arena.total_bytes(), chunk_size);
EXPECT_EQ(arena.avail_bytes(), chunk_size - (i + 1) * small_size);
}
// large allocations require separate chunks
int64_t large_size = 100 * chunk_size;
auto p = arena.Allocate(large_size);
EXPECT_NE(p, nullptr);
EXPECT_EQ(arena.total_bytes(), chunk_size + large_size);
EXPECT_EQ(arena.avail_bytes(), 0);
}
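
As far as I can tell from the test, the behavior boils down to bump allocation inside chunks. The sketch below is my own guess at that idea, not Gandiva's SimpleArena: ChunkArena is a hypothetical class, it ignores alignment and the Arrow memory pool, and it never reuses the tail of an old chunk.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical chunk arena; a sketch of the idea, not Gandiva's SimpleArena.
class ChunkArena {
 public:
  explicit ChunkArena(int64_t min_chunk_size) : min_chunk_size_(min_chunk_size) {}

  uint8_t* Allocate(int64_t size) {
    if (size > avail_bytes_) {
      // The current chunk cannot satisfy the request: start a new chunk of
      // min_chunk_size_ bytes, or exactly `size` bytes if that is larger.
      int64_t chunk_size = size > min_chunk_size_ ? size : min_chunk_size_;
      chunks_.push_back(std::make_unique<uint8_t[]>(chunk_size));
      total_bytes_ += chunk_size;
      avail_bytes_ = chunk_size;
      next_ = chunks_.back().get();
    }
    // Bump allocation: hand out the next `size` bytes of the current chunk.
    uint8_t* result = next_;
    next_ += size;
    avail_bytes_ -= size;
    return result;
  }

  int64_t total_bytes() const { return total_bytes_; }
  int64_t avail_bytes() const { return avail_bytes_; }

 private:
  int64_t min_chunk_size_;
  int64_t total_bytes_ = 0;
  int64_t avail_bytes_ = 0;
  uint8_t* next_ = nullptr;
  std::vector<std::unique_ptr<uint8_t[]>> chunks_;
};
```

With a 4096-byte chunk size, twenty 100-byte allocations stay inside the first chunk, and one allocation of 100 * 4096 bytes gets a chunk of its own, which matches the numbers asserted in TestAlloc.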

selection_vector

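Implements selection vectors stored in Arrow format.

Some background on selection vectors helps here.

A selection vector is a technique used in data processing systems to record which rows of a batch are selected (valid), so that no work is spent on irrelevant rows. It is common in columnar databases and vectorized execution engines (such as Apache Arrow, Dremio, and Gandiva), where it is used to improve performance.

A selection vector is essentially an index array that stores the positions of the selected rows within the original batch.

Avoids copying data: only the vector is manipulated, and the original data stays where it is.

Efficient filtering: rows that do not satisfy the condition can be skipped quickly.

Supports vectorized execution: combined with batch processing, it improves SIMD performance.

In concrete terms, the selection can be represented either as a bitmap or as a set of indices.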

TEST_F(TestSelectionVector, TestInt16Set) {
int max_slots = 10;
std::shared_ptr<SelectionVector> selection;
auto status = SelectionVector::MakeInt16(max_slots, pool_, &selection);
EXPECT_EQ(status.ok(), true) << status.message();
selection->SetIndex(0, 100);
EXPECT_EQ(selection->GetIndex(0), 100);
selection->SetIndex(1, 200);
EXPECT_EQ(selection->GetIndex(1), 200);
selection->SetNumSlots(2);
EXPECT_EQ(selection->GetNumSlots(), 2);
// TopArray() should return an array with 100,200
auto array_raw = selection->ToArray();
const auto& array = dynamic_cast<const arrow::UInt16Array&>(*array_raw);
EXPECT_EQ(array.length(), 2) << array_raw->ToString();
EXPECT_EQ(array.Value(0), 100) << array_raw->ToString();
EXPECT_EQ(array.Value(1), 200) << array_raw->ToString();
}

A selection vector can also be populated from a bitmap.

TEST_F(TestSelectionVector, TestInt64PopulateFromBitMap) {
int max_slots = 200;
std::shared_ptr<SelectionVector> selection;
auto status = SelectionVector::MakeInt64(max_slots, pool_, &selection);
EXPECT_EQ(status.ok(), true) << status.message();
int bitmap_size = RoundUpNumi64(max_slots) * 8;
std::vector<uint8_t> bitmap(bitmap_size);
arrow::bit_util::SetBit(&bitmap[0], 0);
arrow::bit_util::SetBit(&bitmap[0], 5);
arrow::bit_util::SetBit(&bitmap[0], 121);
arrow::bit_util::SetBit(&bitmap[0], 220);
status = selection->PopulateFromBitMap(&bitmap[0], bitmap_size, max_slots - 1);
EXPECT_EQ(status.ok(), true) << status.message();
EXPECT_EQ(selection->GetNumSlots(), 3);
EXPECT_EQ(selection->GetIndex(0), 0);
EXPECT_EQ(selection->GetIndex(1), 5);
EXPECT_EQ(selection->GetIndex(2), 121);
}
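
The bitmap-to-index conversion exercised by that test can be sketched in a few lines. This is my own illustration, not the body of PopulateFromBitMap: SetBit and IndicesFromBitmap are hypothetical helpers, and the sketch assumes the bitmap holds at least num_rows bits. The bit order (least significant bit first within each byte) is the one Arrow uses.

```cpp
#include <cstdint>
#include <vector>

// Sets bit `i`, counting from the least significant bit of each byte.
void SetBit(std::vector<uint8_t>& bitmap, int64_t i) {
  bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
}

// Returns the row indices whose bit is set, looking only at rows [0, num_rows).
// Assumes bitmap.size() * 8 >= num_rows.
std::vector<int64_t> IndicesFromBitmap(const std::vector<uint8_t>& bitmap,
                                       int64_t num_rows) {
  std::vector<int64_t> selected;
  for (int64_t row = 0; row < num_rows; ++row) {
    if ((bitmap[row / 8] >> (row % 8)) & 1u) {
      selected.push_back(row);
    }
  }
  return selected;
}
```

With bits 0, 5, 121 and 220 set and a row limit of 200, only 0, 5 and 121 end up in the index array, which is the same outcome the test expects (three slots).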

regex_functions/util

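Regular-expression support. It appears to deal with the special characters that matter when SQL patterns (such as LIKE patterns) are turned into regular expressions. This part uses Google's RE2 library and follows PCRE (Perl Compatible Regular Expressions) syntax conventions.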

const std::set<char> RegexUtil::pcre_regex_specials_ = {
'[', ']', '(', ')', '|', '^', '-', '+', '*', '?', '{', '}', '$', '\\', '.'};
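
My reading is that this set exists so that characters with a meaning in regex syntax can be escaped when a SQL LIKE pattern is translated. Below is a rough sketch of that translation under my own assumptions: LikeToRegex is a hypothetical helper (not Gandiva's RegexUtil), it reuses the character set shown above, and it deliberately ignores LIKE escape characters.

```cpp
#include <set>
#include <string>

// Hypothetical helper: turns a SQL LIKE pattern into a regex string.
// '%' matches any run of characters, '_' matches exactly one character,
// and regex metacharacters are escaped so they match literally.
// LIKE escape characters are not handled in this sketch.
std::string LikeToRegex(const std::string& like_pattern) {
  static const std::set<char> specials = {'[', ']', '(', ')', '|', '^', '-', '+',
                                          '*', '?', '{', '}', '$', '\\', '.'};
  std::string regex;
  for (char c : like_pattern) {
    if (c == '%') {
      regex += ".*";
    } else if (c == '_') {
      regex += '.';
    } else {
      if (specials.count(c) > 0) {
        regex += '\\';
      }
      regex += c;
    }
  }
  return regex;
}
```

For example, the LIKE pattern 1.5% becomes the regex 1\.5.* so that the dot only matches a literal dot.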

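The tests mostly revolve around simple strings.

You can even find tests involving Chinese characters, which is a rare sight. UTF-8 handling in C++ has always baffled me 😂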

input_string = "路%c$大";
extract_index = 2; // Retrieve all matched string
ret = extract_numbers(&execution_context_, input_string.c_str(),
static_cast<int32_t>(input_string.length()), extract_index,
&out_length);
ret_as_str = std::string(ret, out_length);
EXPECT_EQ(out_length, 1);
EXPECT_EQ(ret_as_str, "c");

random_generator

A random number generator, including the handling of the random seed.

namespace gandiva {
/// Function Holder for 'random'
class GANDIVA_EXPORT RandomGeneratorHolder : public FunctionHolder {
public:
~RandomGeneratorHolder() override = default;
static Result<std::shared_ptr<RandomGeneratorHolder>> Make(const FunctionNode& node);
double operator()() { return distribution_(generator_); }
private:
explicit RandomGeneratorHolder(int seed) : distribution_(0, 1) {
int64_t seed64 = static_cast<int64_t>(seed);
seed64 = (seed64 ^ 0x00000005DEECE66D) & 0x0000ffffffffffff;
generator_.seed(static_cast<uint64_t>(seed64));
}
RandomGeneratorHolder() : distribution_(0, 1) {
generator_.seed(::arrow::internal::GetRandomSeed());
}
std::mt19937_64 generator_;
std::uniform_real_distribution<> distribution_;
};
} // namespace gandiva
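
The seeded constructor scrambles the seed before handing it to the Mersenne Twister; the XOR constant and 48-bit mask look like the same scrambling java.util.Random applies to its seed, though that connection is my guess. The sketch below copies the arithmetic into two hypothetical free functions (ScrambleSeed and FirstRandom are my names) to show that a fixed seed yields a reproducible value.

```cpp
#include <cstdint>
#include <random>

// Same arithmetic as the seeded constructor above, pulled into a free function.
uint64_t ScrambleSeed(int32_t seed) {
  int64_t seed64 = static_cast<int64_t>(seed);
  seed64 = (seed64 ^ 0x00000005DEECE66D) & 0x0000ffffffffffff;
  return static_cast<uint64_t>(seed64);
}

// Draws the first value in [0, 1) for a given seed. Calling it twice with
// the same seed returns the same number.
double FirstRandom(int32_t seed) {
  std::mt19937_64 generator(ScrambleSeed(seed));
  std::uniform_real_distribution<> distribution(0, 1);
  return distribution(generator);
}
```

Seeding with 0, for instance, hands the generator the constant 0x5DEECE66D itself, since XOR with zero changes nothing.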

project

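This is the code that shows how Gandiva handles projection (Project) over Apache Arrow data.

/// \brief projection using expressions.
///
/// A projector is built for a specific schema and vector of expressions.
/// Once the projector is built, it can be used to evaluate many row batches.

The implementation holds the LLVM generator, the output fields, a flag recording whether it was built from an existing cache entry, and the configuration that controls code generation.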

std::unique_ptr<LLVMGenerator> llvm_generator_;
SchemaPtr schema_;
FieldVector output_fields_;
std::shared_ptr<Configuration> configuration_;
bool built_from_cache_;
};

It also contains the code that allocates the output data buffers.

Status Projector::AllocArrayData(const DataTypePtr& type, int64_t num_records,
arrow::MemoryPool* pool,
ArrayDataPtr* array_data) const {
arrow::Status astatus;
std::vector<std::shared_ptr<arrow::Buffer>> buffers;
// The output vector always has a null bitmap.
int64_t size = arrow::bit_util::BytesForBits(num_records);
ARROW_ASSIGN_OR_RAISE(auto bitmap_buffer, arrow::AllocateBuffer(size, pool));
buffers.push_back(std::move(bitmap_buffer));
// String/Binary vectors have an offsets array.
auto type_id = type->id();
if (arrow::is_binary_like(type_id)) {
auto offsets_len = arrow::bit_util::BytesForBits((num_records + 1) * 32);
ARROW_ASSIGN_OR_RAISE(auto offsets_buffer, arrow::AllocateBuffer(offsets_len, pool));
buffers.push_back(std::move(offsets_buffer));
}
// The output vector always has a data array.
int64_t data_len;
if (arrow::is_primitive(type_id) || type_id == arrow::Type::DECIMAL) {
const auto& fw_type = static_cast<const arrow::FixedWidthType&>(*type);
data_len = arrow::bit_util::BytesForBits(num_records * fw_type.bit_width());
} else if (arrow::is_binary_like(type_id)) {
// we don't know the expected size for varlen output vectors.
data_len = 0;
} else {
return Status::Invalid("Unsupported output data type " + type->ToString());
}
ARROW_ASSIGN_OR_RAISE(auto data_buffer, arrow::AllocateResizableBuffer(data_len, pool));
// This is not strictly required but valgrind gets confused and detects this
// as uninitialized memory access. See arrow::util::SetBitTo().
if (type->id() == arrow::Type::BOOL) {
memset(data_buffer->mutable_data(), 0, data_len);
}
buffers.push_back(std::move(data_buffer));
*array_data = arrow::ArrayData::Make(type, num_records, std::move(buffers));
return Status::OK();
}
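
To make the arithmetic in AllocArrayData concrete, here is a small sketch that recomputes the three buffer sizes with plain integers. BufferSizes, FixedWidthSizes and VarLenSizes are names I made up for illustration; only the ceil(bits / 8) rounding and the formulas are taken from the function above.

```cpp
#include <cstdint>

// Same rounding as the BytesForBits calls above: ceil(bits / 8).
constexpr int64_t BytesForBits(int64_t bits) { return (bits + 7) / 8; }

// Hypothetical summary of the sizes requested for one output column.
struct BufferSizes {
  int64_t validity;  // null bitmap, one bit per record
  int64_t offsets;   // only for string/binary columns
  int64_t data;      // value buffer
};

// Fixed-width column (e.g. int32 has bit_width 32, boolean has bit_width 1).
constexpr BufferSizes FixedWidthSizes(int64_t num_records, int64_t bit_width) {
  return {BytesForBits(num_records), 0, BytesForBits(num_records * bit_width)};
}

// Variable-length column with 32-bit offsets. The data buffer starts empty
// because the output size is not known up front.
constexpr BufferSizes VarLenSizes(int64_t num_records) {
  return {BytesForBits(num_records), BytesForBits((num_records + 1) * 32), 0};
}
```

For 10 int32 records this gives a 2-byte bitmap and a 40-byte data buffer; for 10 strings it gives a 2-byte bitmap, 44 bytes of offsets (11 offsets of 4 bytes each) and an initially empty data buffer.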

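Somewhat oddly, there is no projector_test.cc next to it in the top-level directory; according to the listing above, the tests live in tests/projector_test.cc instead.

lru_cache

An LRU cache adapted from Boost. Because the code is a template, you cannot tell from this file what it actually stores.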

// modified from boost LRU cache -> the boost cache supported only an
// ordered map.
namespace gandiva {
// a cache which evicts the least recently used item when it is full
template <class Key, class Value>
class LruCache {
public:
using key_type = Key;
using value_type = Value;
using list_type = std::list<key_type>;

The test code simply stores strings.

TEST_F(TestLruCache, TestLruBehavior) {
cache_.insert(TestCacheKey(1), "hello");
cache_.insert(TestCacheKey(2), "hello");
cache_.get(TestCacheKey(1));
cache_.insert(TestCacheKey(3), "hello");
// should have evicted key 2.
ASSERT_EQ(*cache_.get(TestCacheKey(1)), "hello");
}

llvm_types

llvm_types centrally manages the LLVM types and maps Arrow types onto them. Similar code can be found in NoisePage.

class GANDIVA_EXPORT LLVMTypes {
public:
explicit LLVMTypes(llvm::LLVMContext& context);
llvm::Type* void_type() { return llvm::Type::getVoidTy(context_); }
llvm::Type* i1_type() { return llvm::Type::getInt1Ty(context_); }
llvm::Type* i8_type() { return llvm::Type::getInt8Ty(context_); }
llvm::Type* i16_type() { return llvm::Type::getInt16Ty(context_); }
llvm::Type* i32_type() { return llvm::Type::getInt32Ty(context_); }
llvm::Type* i64_type() { return llvm::Type::getInt64Ty(context_); }
llvm::Type* i128_type() { return llvm::Type::getInt128Ty(context_); }
llvm::StructType* i128_split_type() {
// struct with high/low bits (see decimal_ops.cc:DecimalSplit)
return llvm::StructType::get(context_, {i64_type(), i64_type()}, false);
}

Plus some simple constant helpers:

llvm::Constant* i128_zero() { return i128_constant(0); }
llvm::Constant* i128_one() { return i128_constant(1); }

The related test code:

TEST_F(TestLLVMTypes, TestFound) {
EXPECT_EQ(types_->IRType(arrow::Type::BOOL), types_->i1_type());
EXPECT_EQ(types_->IRType(arrow::Type::INT32), types_->i32_type());
EXPECT_EQ(types_->IRType(arrow::Type::INT64), types_->i64_type());
EXPECT_EQ(types_->IRType(arrow::Type::FLOAT), types_->float_type());
EXPECT_EQ(types_->IRType(arrow::Type::DOUBLE), types_->double_type());
EXPECT_EQ(types_->IRType(arrow::Type::DATE64), types_->i64_type());
EXPECT_EQ(types_->IRType(arrow::Type::TIME64), types_->i64_type());
EXPECT_EQ(types_->IRType(arrow::Type::TIMESTAMP), types_->i64_type());
EXPECT_EQ(types_->DataVecType(arrow::boolean()), types_->i1_type());
EXPECT_EQ(types_->DataVecType(arrow::int32()), types_->i32_type());
EXPECT_EQ(types_->DataVecType(arrow::int64()), types_->i64_type());
EXPECT_EQ(types_->DataVecType(arrow::float32()), types_->float_type());
EXPECT_EQ(types_->DataVecType(arrow::float64()), types_->double_type());
EXPECT_EQ(types_->DataVecType(arrow::date64()), types_->i64_type());
EXPECT_EQ(types_->DataVecType(arrow::time64(arrow::TimeUnit::MICRO)),
types_->i64_type());
EXPECT_EQ(types_->DataVecType(arrow::timestamp(arrow::TimeUnit::MILLI)),
types_->i64_type());
}
TEST_F(TestLLVMTypes, TestNotFound) {
EXPECT_EQ(types_->IRType(arrow::Type::SPARSE_UNION), nullptr);
EXPECT_EQ(types_->IRType(arrow::Type::DENSE_UNION), nullptr);
EXPECT_EQ(types_->DataVecType(arrow::null()), nullptr);
}

llvm_includes

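The pragmas at the top that disable MSVC warnings are worth noting, since this is the first time I have come across them. They also show that Gandiva can be built on Windows.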

#if defined(_MSC_VER)
# pragma warning(push)
# pragma warning(disable : 4141)
# pragma warning(disable : 4146)
# pragma warning(disable : 4244)
# pragma warning(disable : 4267)
# pragma warning(disable : 4291)
# pragma warning(disable : 4624)
#endif

It even accounts for differences between LLVM versions.

#if LLVM_VERSION_MAJOR >= 10
# define LLVM_ALIGN(alignment) (llvm::Align((alignment)))
#else
# define LLVM_ALIGN(alignment) (alignment)
#endif

llvm_generator

The core of it all: LLVM code generation.

The generator appears to make good use of caching.

class GANDIVA_EXPORT LLVMGenerator {
public:
/// \brief Factory method to initialize the generator.
static Result<std::unique_ptr<LLVMGenerator>> Make(
const std::shared_ptr<Configuration>& config, bool cached,
std::optional<std::reference_wrapper<GandivaObjectCache>> object_cache =
std::nullopt);
/// \brief Get the cache to be used for LLVM ObjectCache.
static std::shared_ptr<Cache<ExpressionCacheKey, std::shared_ptr<llvm::MemoryBuffer>>>
GetCache();

It stores the SelectionVector::Mode in use:

SelectionVector::Mode selection_vector_mode() { return selection_vector_mode_; }

Build takes the expressions and generates code for them:

/// \brief Build the code for the expression trees for default mode with a LLVM
/// ObjectCache. Each element in the vector represents an expression tree
Status Build(const ExpressionVector& exprs, SelectionVector::Mode mode);
/// \brief Build the code for the expression trees for default mode. Each
/// element in the vector represents an expression tree
Status Build(const ExpressionVector& exprs);

Execute passes the Arrow data to the function generated from the LLVM IR:

/// \brief Execute the built expression against the provided arguments for
/// default mode.
Status Execute(const arrow::RecordBatch& record_batch,
const ArrayDataVector& output_vector) const;
/// \brief Execute the built expression against the provided arguments for
/// all modes. Only works on the records specified in the selection_vector.
Status Execute(const arrow::RecordBatch& record_batch,
const SelectionVector* selection_vector,
const ArrayDataVector& output_vector) const;

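The basic LLVMContext and IRBuilder are naturally present. What surprised me is that creating a global string here does not check for duplicates. I am not sure whether that is an oversight or whether the check happens earlier 😂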

llvm::LLVMContext* context() { return engine_->context(); }
llvm::IRBuilder<>* ir_builder() { return engine_->ir_builder(); }
llvm::Constant* CreateGlobalStringPtr(const std::string& string) {
return engine_->CreateGlobalStringPtr(string);
}

Then a visitor walks the tree once more, this time over the decomposed expressions (Dex):

class Visitor : public DexVisitor {
public:
Visitor(LLVMGenerator* generator, llvm::Function* function,
llvm::BasicBlock* entry_block, llvm::Value* arg_addrs,
llvm::Value* arg_local_bitmaps, llvm::Value* arg_holder_ptrs,
std::vector<llvm::Value*> slice_offsets, llvm::Value* arg_context_ptr,
llvm::Value* loop_var);
void Visit(const VectorReadValidityDex& dex) override;
void Visit(const VectorReadFixedLenValueDex& dex) override;
void Visit(const VectorReadVarLenValueDex& dex) override;
void Visit(const LocalBitMapValidityDex& dex) override;
void Visit(const TrueDex& dex) override;
void Visit(const FalseDex& dex) override;
void Visit(const LiteralDex& dex) override;
void Visit(const NonNullableFuncDex& dex) override;
void Visit(const NullableNeverFuncDex& dex) override;
void Visit(const NullableInternalFuncDex& dex) override;
void Visit(const IfDex& dex) override;
void Visit(const BooleanAndDex& dex) override;
void Visit(const BooleanOrDex& dex) override;
void Visit(const InExprDexBase<int32_t>& dex) override;
void Visit(const InExprDexBase<int64_t>& dex) override;
void Visit(const InExprDexBase<float>& dex) override;
void Visit(const InExprDexBase<double>& dex) override;
void Visit(const InExprDexBase<gandiva::DecimalScalar128>& dex) override;
void Visit(const InExprDexBase<std::string>& dex) override;
template <typename Type>
void VisitInExpression(const InExprDexBase<Type>& dex);
LValuePtr result() { return result_; }
bool has_arena_allocs() { return has_arena_allocs_; }

There are also dedicated functions for generating LLVM functions and function calls:

std::vector<llvm::Value*> BuildParams(int holder_idx,
const ValueValidityPairVector& args,
bool with_validity, bool with_context);
// Generate code to invoke a function call.
LValuePtr BuildFunctionCall(const NativeFunction* func, DataTypePtr arrow_return_type,
std::vector<llvm::Value*>* params);
// Generate code for an if-else condition.
LValuePtr BuildIfElse(llvm::Value* condition, std::function<LValuePtr()> then_func,
std::function<LValuePtr()> else_func,
DataTypePtr arrow_return_type);

Calls to precompiled LLVM IR functions are added through this interface:

/// Generate code to make a function call (to a pre-compiled IR function) which takes
/// 'args' and has a return type 'ret_type'.
llvm::Value* AddFunctionCall(const std::string& full_name, llvm::Type* ret_type,
const std::vector<llvm::Value*>& args);

The cache implementation in detail:

std::shared_ptr<Cache<ExpressionCacheKey, std::shared_ptr<llvm::MemoryBuffer>>>
LLVMGenerator::GetCache() {
static std::shared_ptr<Cache<ExpressionCacheKey, std::shared_ptr<llvm::MemoryBuffer>>>
shared_cache = std::make_shared<
Cache<ExpressionCacheKey, std::shared_ptr<llvm::MemoryBuffer>>>();
return shared_cache;
}
Status LLVMGenerator::SetLLVMObjectCache(GandivaObjectCache& object_cache) {
return engine_->SetLLVMObjectCache(object_cache);
}

Part of the implementation of Build:

Status LLVMGenerator::Build(const ExpressionVector& exprs, SelectionVector::Mode mode) {
selection_vector_mode_ = mode;
for (auto& expr : exprs) {
auto output = annotator_.AddOutputFieldDescriptor(expr->result());
ARROW_RETURN_NOT_OK(Add(expr, output));
}
// Compile and inject into the process' memory the generated function.
ARROW_RETURN_NOT_OK(engine_->FinalizeModule());
// setup the jit functions for each expression.
for (auto& compiled_expr : compiled_exprs_) {
auto fn_name = compiled_expr->GetFunctionName(mode);
ARROW_ASSIGN_OR_RAISE(auto fn_ptr, engine_->CompiledFunction(fn_name));
auto jit_fn = reinterpret_cast<EvalFunc>(fn_ptr);
compiled_expr->SetJITFunction(selection_vector_mode_, jit_fn);
}
return Status::OK();
}

This part deserves a closer read when time allows. As for tests, the sample given is a vector add that LLVM auto-vectorizes.

TEST_F(TestLLVMGenerator, TestAdd) {
// Setup LLVM generator to do an arithmetic add of two vectors
ASSERT_OK_AND_ASSIGN(auto generator,
LLVMGenerator::Make(TestConfigWithIrDumping(), false));
Annotator annotator;
auto field0 = std::make_shared<arrow::Field>("f0", arrow::int32());
auto desc0 = annotator.CheckAndAddInputFieldDescriptor(field0);
auto validity_dex0 = std::make_shared<VectorReadValidityDex>(desc0);
auto value_dex0 = std::make_shared<VectorReadFixedLenValueDex>(desc0);
auto pair0 = std::make_shared<ValueValidityPair>(validity_dex0, value_dex0);
auto field1 = std::make_shared<arrow::Field>("f1", arrow::int32());
auto desc1 = annotator.CheckAndAddInputFieldDescriptor(field1);
auto validity_dex1 = std::make_shared<VectorReadValidityDex>(desc1);
auto value_dex1 = std::make_shared<VectorReadFixedLenValueDex>(desc1);
auto pair1 = std::make_shared<ValueValidityPair>(validity_dex1, value_dex1);
DataTypeVector params{arrow::int32(), arrow::int32()};
auto func_desc = std::make_shared<FuncDescriptor>("add", params, arrow::int32());
FunctionSignature signature(func_desc->name(), func_desc->params(),
func_desc->return_type());
const NativeFunction* native_func =
generator->function_registry_->LookupSignature(signature);
std::vector<ValueValidityPairPtr> pairs{pair0, pair1};
auto func_dex = std::make_shared<NonNullableFuncDex>(
func_desc, native_func, FunctionHolderPtr(nullptr), -1, pairs);
auto field_sum = std::make_shared<arrow::Field>("out", arrow::int32());
auto desc_sum = annotator.CheckAndAddInputFieldDescriptor(field_sum);
// LLVM 10 doesn't like the expr function name to be the same as the module name when
// LLJIT is used
std::string fn_name = "llvm_gen_test_add_expr";
ASSERT_OK(generator->engine_->LoadFunctionIRs());
ASSERT_OK(generator->CodeGenExprValue(func_dex, 4, desc_sum, 0, fn_name,
SelectionVector::MODE_NONE));
ASSERT_OK(generator->engine_->FinalizeModule());
auto const& ir = generator->engine_->ir();
EXPECT_THAT(ir, testing::HasSubstr("vector.body"));
ASSERT_OK_AND_ASSIGN(auto fn_ptr, generator->engine_->CompiledFunction(fn_name));
ASSERT_TRUE(fn_ptr);
auto eval_func = reinterpret_cast<EvalFunc>(fn_ptr);
constexpr size_t kNumRecords = 4;
std::array<uint32_t, kNumRecords> a0{1, 2, 3, 4};
std::array<uint32_t, kNumRecords> a1{5, 6, 7, 8};
uint64_t in_bitmap = 0xffffffffffffffffull;
std::array<uint32_t, kNumRecords> out{0, 0, 0, 0};
uint64_t out_bitmap = 0;
std::array<uint8_t*, 6> addrs{
reinterpret_cast<uint8_t*>(a0.data()), reinterpret_cast<uint8_t*>(&in_bitmap),
reinterpret_cast<uint8_t*>(a1.data()), reinterpret_cast<uint8_t*>(&in_bitmap),
reinterpret_cast<uint8_t*>(out.data()), reinterpret_cast<uint8_t*>(&out_bitmap),
};
std::array<int64_t, 6> addr_offsets{0, 0, 0, 0, 0, 0};
eval_func(addrs.data(), addr_offsets.data(), nullptr, nullptr, nullptr,
0 /* dummy context ptr */, kNumRecords);
EXPECT_THAT(out, testing::ElementsAre(6, 8, 10, 12));
EXPECT_EQ(out_bitmap, 0ULL);
}

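For this part, GPT's explanation is worth a look:

The TestAdd test case exercises the complete flow of the LLVM code generator producing a vector addition. In detail:

Test goal

Verify that Gandiva can correctly generate and execute LLVM code that adds two int32 vectors.

Test flow

1. Initialize the LLVM generator

ASSERT_OK_AND_ASSIGN(auto generator, LLVMGenerator::Make(TestConfigWithIrDumping(), false));
  • Creates an LLVM code generator with IR dumping enabled

2. Set up the input vector descriptors

// First input vector f0 (int32)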
auto field0 = std::make_shared<arrow::Field>("f0", arrow::int32());
auto desc0 = annotator.CheckAndAddInputFieldDescriptor(field0);
auto validity_dex0 = std::make_shared<VectorReadValidityDex>(desc0); // validity bitmap
auto value_dex0 = std::make_shared<VectorReadFixedLenValueDex>(desc0); // data values
auto pair0 = std::make_shared<ValueValidityPair>(validity_dex0, value_dex0);
// Second input vector f1 (int32), set up the same way

3. Define the add function

DataTypeVector params{arrow::int32(), arrow::int32()};
auto func_desc = std::make_shared<FuncDescriptor>("add", params, arrow::int32());
  • Defines an add function that takes two int32 parameters and returns an int32

4. Create the function expression

auto func_dex = std::make_shared<NonNullableFuncDex>(
func_desc, native_func, FunctionHolderPtr(nullptr), -1, pairs);
  • Binds the input vector pairs to the add function

5. Generate the LLVM code

ASSERT_OK(generator->CodeGenExprValue(func_dex, 4, desc_sum, 0, fn_name,
SelectionVector::MODE_NONE));
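  • Generates the LLVM IR for the expression
  • The second argument, 4, appears to be a buffer count rather than a record count; the number of records (kNumRecords) is only passed when the compiled function is called

6. Compile and obtain the function pointer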

ASSERT_OK_AND_ASSIGN(auto fn_ptr, generator->engine_->CompiledFunction(fn_name));
auto eval_func = reinterpret_cast<EvalFunc>(fn_ptr);
  • Compiles the LLVM IR to machine code
  • Obtains an executable function pointer

7. Prepare the test data

std::array<uint32_t, kNumRecords> a0{1, 2, 3, 4}; // first vector
std::array<uint32_t, kNumRecords> a1{5, 6, 7, 8}; // second vector
uint64_t in_bitmap = 0xffffffffffffffffull; // all valid
std::array<uint32_t, kNumRecords> out{0, 0, 0, 0}; // output vector

8. Execute the generated function

eval_func(addrs.data(), addr_offsets.data(), nullptr, nullptr, nullptr,
0 /* dummy context ptr */, kNumRecords);
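  • Calls the compiled function to perform the vector addition

9. Verify the result

EXPECT_THAT(out, testing::ElementsAre(6, 8, 10, 12)); // 1+5, 2+6, 3+7, 4+8
EXPECT_EQ(out_bitmap, 0ULL); // the output bitmap is still 0: presumably CodeGenExprValue only emits the value computation and never writes validity bits (in Arrow a set bit means valid, so 0 does not mean "all valid")

Key technical points

1. Verifying the vectorization

EXPECT_THAT(ir, testing::HasSubstr("vector.body"));
  • Checks that the generated LLVM IR contains a vectorized loop body
  • Confirms that LLVM applied its vectorization optimization

2. Memory layout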

std::array<uint8_t*, 6> addrs{
reinterpret_cast<uint8_t*>(a0.data()), // first vector, data
reinterpret_cast<uint8_t*>(&in_bitmap), // first vector, validity bitmap
reinterpret_cast<uint8_t*>(a1.data()), // second vector, data
reinterpret_cast<uint8_t*>(&in_bitmap), // second vector, validity bitmap
reinterpret_cast<uint8_t*>(out.data()), // output vector, data
reinterpret_cast<uint8_t*>(&out_bitmap) // output vector, validity bitmap
};

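Why the test matters

This test verifies Gandiva's core functionality:

  1. Expression handling: turning a high-level expression into the internal representation
  2. Code generation: producing efficient LLVM IR
  3. Compilation and execution: compiling the IR to machine code and running it
  4. Vectorization: making sure the generated code takes advantage of SIMD instructions
  5. Correctness: checking that the computed results are right

It is an end-to-end integration test that makes sure the whole code generation and execution pipeline works.

The testing::HasSubstr used here is a GMock matcher.

This shows that a C function can be registered directly: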
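
To picture what the JIT-compiled function does with that address array, here is a hand-written scalar stand-in. It is only my approximation: AddInt32 is a hypothetical function, the real EvalFunc takes more arguments (offsets, local bitmaps, holder pointers, a context pointer), and this version reads only the data buffers at slots 0, 2 and 4 of the layout described above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the generated add function.
// Assumed buffer layout, mirroring the test:
//   [0] data0  [1] validity0  [2] data1  [3] validity1  [4] out data  [5] out validity
void AddInt32(uint8_t** addrs, int64_t num_records) {
  const int32_t* in0 = reinterpret_cast<const int32_t*>(addrs[0]);
  const int32_t* in1 = reinterpret_cast<const int32_t*>(addrs[2]);
  int32_t* out = reinterpret_cast<int32_t*>(addrs[4]);
  // A plain element-wise loop. The validity buffers are not touched here.
  for (int64_t i = 0; i < num_records; ++i) {
    out[i] = in0[i] + in1[i];
  }
}
```

A loop of this shape is the kind LLVM's loop vectorizer can turn into SIMD code, and vector.body is the block label the vectorizer emits, which is why the test searches the IR for that string.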

TEST_F(TestLLVMGenerator, VerifyExtendedCFunctions) {
VerifyFunctionMapping("multiply_by_three_int32", [](auto registry) {
return TestConfigWithCFunction(std::move(registry));
});
//test_util.cc
std::shared_ptr<Configuration> TestConfigWithCFunction(
std::shared_ptr<FunctionRegistry> registry) {
return BuildConfigurationWithRegistry(std::move(registry), [](auto reg) {
return reg->Register(GetTestExternalCFunction(),
reinterpret_cast<void*>(multiply_by_three));
});
}
static int64_t multiply_by_three(int32_t value) { return value * 3; }

literal_holder

Represents and handles constant values of every supported type in a uniform way within Gandiva.

namespace gandiva {
using LiteralHolder =
std::variant<bool, float, double, int8_t, int16_t, int32_t, int64_t, uint8_t,
uint16_t, uint32_t, uint64_t, std::string, DecimalScalar128>;
GANDIVA_EXPORT std::string ToString(const LiteralHolder& holder);
} // namespace gandiva

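std::variant is a type-safe union introduced in C++17. At runtime it holds a value of exactly one of a preset list of types, without the unsafety of a traditional union.

Rust's enum type is a more powerful version of std::variant.

interval_holder

Handles the various kinds of time intervals.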
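
For readers who have not used std::variant, here is a short sketch of how a value like LiteralHolder is typically consumed with std::visit. MiniLiteral and Describe are hypothetical names of mine, with a deliberately shorter type list than Gandiva's, and the formatting is not meant to match Gandiva's ToString.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical, reduced version of a literal holder.
using MiniLiteral = std::variant<bool, int32_t, double, std::string>;

// Formats whichever alternative the variant currently holds.
std::string Describe(const MiniLiteral& literal) {
  return std::visit(
      [](const auto& value) -> std::string {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, bool>) {
          return value ? "true" : "false";
        } else if constexpr (std::is_same_v<T, std::string>) {
          return "'" + value + "'";
        } else {
          return std::to_string(value);  // int32_t and double
        }
      },
      literal);
}
```

The compiler instantiates the lambda once per alternative, so forgetting to handle a type is a compile-time error rather than a runtime surprise, which is the main safety gain over a raw union.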

// Pass only years and days to cast
data = "P12Y15D";
response = cast_interval_day(&execution_context_, data.data(), 7, true, &out_valid);
qty_days_in_response = 15;
qty_millis_in_response = 0;
EXPECT_TRUE(out_valid);
EXPECT_FALSE(execution_context_.has_error());
EXPECT_EQ(response, (qty_millis_in_response << 32) | qty_days_in_response);
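The last assertion implies that the day-time interval is packed into one 64-bit integer, with the days in the low 32 bits and the milliseconds in the high 32 bits. A minimal sketch of that packing follows; PackDayTime, DaysOf and MillisOf are hypothetical helper names, not Gandiva functions.

```cpp
#include <cstdint>

// Pack days into the low 32 bits and milliseconds into the high 32 bits.
// The shift is done on unsigned values to avoid shifting a negative number.
inline int64_t PackDayTime(int32_t days, int32_t millis) {
  const uint64_t high = static_cast<uint64_t>(static_cast<uint32_t>(millis)) << 32;
  const uint64_t low = static_cast<uint32_t>(days);
  return static_cast<int64_t>(high | low);
}

// Recover the day count from the low 32 bits.
inline int32_t DaysOf(int64_t packed) {
  return static_cast<int32_t>(static_cast<uint64_t>(packed) & 0xFFFFFFFFULL);
}

// Recover the millisecond count from the high 32 bits.
inline int32_t MillisOf(int64_t packed) {
  return static_cast<int32_t>(static_cast<uint64_t>(packed) >> 32);
}
```

With 15 days and 0 milliseconds the packed value is simply 15, which is what the test expects for "P12Y15D".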

hash_utils

The hash component is built on OpenSSL and mainly covers the SHA family and MD5.

GANDIVA_EXPORT
const char* gdv_sha512_hash(int64_t context, const void* message, size_t message_length,
int32_t* out_length) {
constexpr int sha512_result_length = 128;
return gdv_hash_using_openssl(context, message, message_length, EVP_sha512(),
sha512_result_length, out_length);
}
/// Hashes a generic message using the SHA256 algorithm
GANDIVA_EXPORT
const char* gdv_sha256_hash(int64_t context, const void* message, size_t message_length,
int32_t* out_length) {
constexpr int sha256_result_length = 64;
return gdv_hash_using_openssl(context, message, message_length, EVP_sha256(),
sha256_result_length, out_length);
}
/// Hashes a generic message using the SHA1 algorithm
GANDIVA_EXPORT
const char* gdv_sha1_hash(int64_t context, const void* message, size_t message_length,
int32_t* out_length) {
constexpr int sha1_result_length = 40;
return gdv_hash_using_openssl(context, message, message_length, EVP_sha1(),
sha1_result_length, out_length);
}
GANDIVA_EXPORT
const char* gdv_md5_hash(int64_t context, const void* message, size_t message_length,
int32_t* out_length) {
constexpr int md5_result_length = 32;
return gdv_hash_using_openssl(context, message, message_length, EVP_md5(),
md5_result_length, out_length);
}
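The result lengths in these functions (128, 64, 40, 32) are exactly twice the raw digest sizes of SHA-512, SHA-256, SHA-1 and MD5 (64, 32, 20 and 16 bytes), which is consistent with the digest being returned as a hex string. The sketch below shows that encoding step only; ToHex is a hypothetical helper and not part of Gandiva.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hex-encode a raw digest: every byte becomes two lowercase hex characters.
inline std::string ToHex(const std::vector<uint8_t>& digest) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(digest.size() * 2);
  for (uint8_t byte : digest) {
    out.push_back(kDigits[byte >> 4]);    // high nibble
    out.push_back(kDigits[byte & 0x0F]);  // low nibble
  }
  return out;
}
```

A 64-byte SHA-512 digest therefore becomes a 128-character string, matching sha512_result_length above.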

gandiva_object_cache

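Caches the compiled object code for an expression such as result1 = evaluate("column1 + column2 * 3"); (that is, the output of compilation, not the evaluated values). The class derives from llvm::ObjectCache and keeps the object code in an llvm::MemoryBuffer.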

class GandivaObjectCache : public llvm::ObjectCache {
public:
explicit GandivaObjectCache(
std::shared_ptr<Cache<ExpressionCacheKey, std::shared_ptr<llvm::MemoryBuffer>>>&
cache,
ExpressionCacheKey key);
~GandivaObjectCache() {}
void notifyObjectCompiled(const llvm::Module* M, llvm::MemoryBufferRef Obj);
std::unique_ptr<llvm::MemoryBuffer> getObject(const llvm::Module* M);
private:
ExpressionCacheKey cache_key_;
std::shared_ptr<Cache<ExpressionCacheKey, std::shared_ptr<llvm::MemoryBuffer>>> cache_;
};
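The store-on-compile, look-up-before-compile pattern behind this class can be sketched without LLVM. Everything here is hypothetical: MiniObjectCache and ObjectBytes are made-up names, a plain string stands in for ExpressionCacheKey, and a byte vector stands in for llvm::MemoryBuffer.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for compiled object code.
using ObjectBytes = std::vector<uint8_t>;

class MiniObjectCache {
 public:
  // Called after a successful compile (the role of notifyObjectCompiled).
  void Store(const std::string& expr_key, ObjectBytes object) {
    cache_[expr_key] = std::make_shared<const ObjectBytes>(std::move(object));
  }

  // Returns nullptr on a miss (the role of getObject); the caller then has to
  // compile the expression from scratch.
  std::shared_ptr<const ObjectBytes> Lookup(const std::string& expr_key) const {
    auto it = cache_.find(expr_key);
    if (it == cache_.end()) return nullptr;
    return it->second;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<const ObjectBytes>> cache_;
};
```

On a hit the expensive IR generation and optimization can be skipped, since the object code only needs to be handed back to the JIT.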

function_signature

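Computes a hash for each function signature; my guess is that it serves as a lookup or cache key. As the test below shows, the function name is hashed case-insensitively (extractDay and extractday produce the same hash).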

EXPECT_EQ(FunctionSignature("extract_month", {arrow::date32()}, arrow::int64()),
FunctionSignature("extract_month", {local_date32_type_}, local_i64_type_));
TEST_F(TestFunctionSignature, TestHash) {
FunctionSignature f1("add", {arrow::int32(), arrow::int32()}, arrow::int64());
FunctionSignature f2("add", {local_i32_type_, local_i32_type_}, local_i64_type_);
EXPECT_EQ(f1.Hash(), f2.Hash());
FunctionSignature f3("extractDay", {arrow::int64()}, arrow::int64());
FunctionSignature f4("extractday", {arrow::int64()}, arrow::int64());
EXPECT_EQ(f3.Hash(), f4.Hash());
}
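A case-insensitive signature hash like the one the test exercises could look roughly as follows. This is a sketch under assumptions: MiniSignature, Lower and HashSignature are hypothetical, type names are plain strings here instead of arrow::DataType, and the combining formula is the common boost-style one rather than whatever Gandiva uses.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified signature.
struct MiniSignature {
  std::string name;
  std::vector<std::string> param_types;
  std::string return_type;
};

// Lowercase an ASCII string so that name comparison ignores case.
inline std::string Lower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Hash the lowercased name, then fold in each parameter type and the return type.
inline std::size_t HashSignature(const MiniSignature& sig) {
  std::size_t seed = std::hash<std::string>{}(Lower(sig.name));
  auto combine = [&seed](const std::string& part) {
    seed ^= std::hash<std::string>{}(part) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  };
  for (const auto& type_name : sig.param_types) combine(type_name);
  combine(sig.return_type);
  return seed;
}
```

With a hash like this (plus a matching equality function) signatures can be used as keys of an unordered map, which is presumably what a signature-to-function lookup needs.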

function_registry

class GANDIVA_EXPORT FunctionRegistry {
public:
using iterator = const NativeFunction*;
using FunctionHolderMaker =
std::function<arrow::Result<std::shared_ptr<FunctionHolder>>(
const FunctionNode& function_node)>;
FunctionRegistry();
FunctionRegistry(const FunctionRegistry&) = delete;
FunctionRegistry& operator=(const FunctionRegistry&) = delete;
/// Lookup a pre-compiled function by its signature.
const NativeFunction* LookupSignature(const FunctionSignature& signature) const;
/// \brief register a set of functions into the function registry from a given bitcode
/// file
arrow::Status Register(const std::vector<NativeFunction>& funcs,
const std::string& bitcode_path);
/// \brief register a set of functions into the function registry from a given bitcode
/// buffer
arrow::Status Register(const std::vector<NativeFunction>& funcs,
std::shared_ptr<arrow::Buffer> bitcode_buffer);
/// \brief register a C function into the function registry
/// @param func the registered function's metadata
/// @param c_function_ptr the function pointer to the
/// registered function's implementation
/// @param function_holder_maker this will be used as the function holder if the
/// function requires a function holder
arrow::Status Register(
NativeFunction func, void* c_function_ptr,
std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);
/// \brief get a list of bitcode memory buffers saved in the registry
const std::vector<std::shared_ptr<arrow::Buffer>>& GetBitcodeBuffers() const;
/// \brief get a list of C functions saved in the registry
const std::vector<std::pair<NativeFunction, void*>>& GetCFunctions() const;
const FunctionHolderMakerRegistry& GetFunctionHolderMakerRegistry() const;
iterator begin() const;
iterator end() const;
iterator back() const;
friend arrow::Result<std::shared_ptr<FunctionRegistry>> MakeDefaultFunctionRegistry();
private:
std::vector<NativeFunction> pc_registry_;
SignatureMap pc_registry_map_;
std::vector<std::shared_ptr<arrow::Buffer>> bitcode_memory_buffers_;
std::vector<std::pair<NativeFunction, void*>> c_functions_;
FunctionHolderMakerRegistry holder_maker_registry_;
Status Add(NativeFunction func);
};
/// \brief get the default function registry
GANDIVA_EXPORT std::shared_ptr<FunctionRegistry> default_function_registry();
} // namespace gandiva

function_ir_builder

A very general-purpose IR builder (how did I never think of this before?) that can even build the branching between blocks for an if-else.

class FunctionIRBuilder {
public:
explicit FunctionIRBuilder(Engine* engine) : engine_(engine) {}
virtual ~FunctionIRBuilder() = default;
protected:
LLVMTypes* types() { return engine_->types(); }
llvm::Module* module() { return engine_->module(); }
llvm::LLVMContext* context() { return engine_->context(); }
llvm::IRBuilder<>* ir_builder() { return engine_->ir_builder(); }
llvm::Constant* CreateGlobalStringPtr(const std::string& string) {
return engine_->CreateGlobalStringPtr(string);
}
/// Build an if-else block.
llvm::Value* BuildIfElse(llvm::Value* condition, llvm::Type* return_type,
std::function<llvm::Value*()> then_func,
std::function<llvm::Value*()> else_func);
struct NamedArg {
std::string name;
llvm::Type* type;
};
/// Build llvm fn.
llvm::Function* BuildFunction(const std::string& function_name, llvm::Type* return_type,
std::vector<NamedArg> in_args);
private:
Engine* engine_;
};

filter

This part is also implemented on top of LLVM and looks much like Project.

private:
std::unique_ptr<LLVMGenerator> llvm_generator_;
SchemaPtr schema_;
std::shared_ptr<Configuration> configuration_;
bool built_from_cache_;

To add a cache, simply call SetLLVMObjectCache:

Status Engine::SetLLVMObjectCache(GandivaObjectCache& object_cache) {
auto cached_buffer = object_cache.getObject(nullptr);
if (cached_buffer) {
auto error = lljit_->addObjectFile(std::move(cached_buffer));
if (error) {
return Status::CodeGenError("Failed to add cached object file to LLJIT: ",
llvm::toString(std::move(error)));
}
}
return Status::OK();
}

Optimization passes can be attached through the PassManager:

static void OptimizeModuleWithNewPassManager(llvm::Module& module,
llvm::TargetIRAnalysis target_analysis) {
// Setup an optimiser pipeline
llvm::PassBuilder pass_builder;
llvm::LoopAnalysisManager loop_am;
llvm::FunctionAnalysisManager function_am;
llvm::CGSCCAnalysisManager cgscc_am;
llvm::ModuleAnalysisManager module_am;
function_am.registerPass([&] { return target_analysis; });
// Register required analysis managers
pass_builder.registerModuleAnalyses(module_am);
pass_builder.registerCGSCCAnalyses(cgscc_am);
pass_builder.registerFunctionAnalyses(function_am);
pass_builder.registerLoopAnalyses(loop_am);
pass_builder.crossRegisterProxies(loop_am, function_am, cgscc_am, module_am);
pass_builder.registerPipelineStartEPCallback([&](llvm::ModulePassManager& module_pm,
llvm::OptimizationLevel Level) {
module_pm.addPass(llvm::ModuleInlinerPass());
llvm::FunctionPassManager function_pm;
function_pm.addPass(llvm::InstCombinePass());
function_pm.addPass(llvm::PromotePass());
function_pm.addPass(llvm::GVNPass());
function_pm.addPass(llvm::NewGVNPass());
function_pm.addPass(llvm::SimplifyCFGPass());
function_pm.addPass(llvm::LoopVectorizePass());
function_pm.addPass(llvm::SLPVectorizerPass());
module_pm.addPass(llvm::createModuleToFunctionPassAdaptor(std::move(function_pm)));
module_pm.addPass(llvm::GlobalOptPass());
});

engine

Almost all of the LLVM engine setup lives in engine.h, engine.cc and engine_llvm_test.cc. The engine can also load pre-compiled LLVM IR:

/// load pre-compiled IR modules from precompiled_bitcode.cc and merge them into
/// the main module.
Status LoadPreCompiledIR();
// load external pre-compiled bitcodes into module
Status LoadExternalPreCompiledIR();
// Create and add mappings for cpp functions that can be accessed from LLVM.
arrow::Status AddGlobalMappings();
// Remove unused functions to reduce compile time.
Status RemoveUnusedFunctions();
std::unique_ptr<llvm::LLVMContext> context_;
std::unique_ptr<llvm::orc::LLJIT> lljit_;
std::unique_ptr<llvm::IRBuilder<>> ir_builder_;
std::unique_ptr<llvm::Module> module_;
LLVMTypes types_;
std::vector<std::string> functions_to_compile_;
bool optimize_ = true;
bool module_finalized_ = false;
bool cached_;
bool functions_loaded_ = false;
std::shared_ptr<FunctionRegistry> function_registry_;
std::string module_ir_;
std::unique_ptr<llvm::TargetMachine> target_machine_;
const std::shared_ptr<Configuration> conf_;
};

encrypt

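Gandiva also ships encryption utilities (though I could not find any documentation on how to use them); the AES encryption it uses comes from OpenSSL as well.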

GANDIVA_EXPORT
int32_t aes_encrypt(const char* plaintext, int32_t plaintext_len, const char* key,
int32_t key_len, unsigned char* cipher);
/**
* Decrypt data using aes algorithm
**/
GANDIVA_EXPORT
int32_t aes_decrypt(const char* ciphertext, int32_t ciphertext_len, const char* key,
int32_t key_len, unsigned char* plaintext);

The corresponding test:

TEST(TestShaEncryptUtils, TestAesEncryptDecrypt) {
// 16 bytes key
auto* key = "12345678abcdefgh";
auto* to_encrypt = "some test string";
auto key_len = static_cast<int32_t>(strlen(reinterpret_cast<const char*>(key)));
auto to_encrypt_len =
static_cast<int32_t>(strlen(reinterpret_cast<const char*>(to_encrypt)));
unsigned char cipher_1[64];
int32_t cipher_1_len =
gandiva::aes_encrypt(to_encrypt, to_encrypt_len, key, key_len, cipher_1);
unsigned char decrypted_1[64];
int32_t decrypted_1_len = gandiva::aes_decrypt(reinterpret_cast<const char*>(cipher_1),
cipher_1_len, key, key_len, decrypted_1);
EXPECT_EQ(std::string(reinterpret_cast<const char*>(to_encrypt), to_encrypt_len),
std::string(reinterpret_cast<const char*>(decrypted_1), decrypted_1_len));

decimal_ir

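Code generation for decimals gets special treatment (these are 128-bit decimal values carried with precision and scale, not ordinary floating-point numbers); there seem to be quite a few pitfalls here 😂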

class DecimalIR : public FunctionIRBuilder {
public:
explicit DecimalIR(Engine* engine)
: FunctionIRBuilder(engine), enable_ir_traces_(false) {}
/// Build decimal IR functions and add them to the engine.
static Status AddFunctions(Engine* engine);
void EnableTraces() { enable_ir_traces_ = true; }
llvm::Value* CallDecimalFunction(const std::string& function_name,
llvm::Type* return_type,
const std::vector<llvm::Value*>& args);
private:
/// The intrinsic fn for divide with small divisors is about 10x slower, so not
/// using these.
static const bool kUseOverflowIntrinsics = false;
// Holder for an i128 value, along with its with scale and precision.
class ValueFull {
public:
ValueFull(llvm::Value* value, llvm::Value* precision, llvm::Value* scale)
: value_(value), precision_(precision), scale_(scale) {}
llvm::Value* value() const { return value_; }
llvm::Value* precision() const { return precision_; }
llvm::Value* scale() const { return scale_; }
private:
llvm::Value* value_;
llvm::Value* precision_;
llvm::Value* scale_;
};
// Holder for an i128 value, and a boolean indicating overflow.
class ValueWithOverflow {
public:
ValueWithOverflow(llvm::Value* value, llvm::Value* overflow)
: value_(value), overflow_(overflow) {}
// Make from IR struct
static ValueWithOverflow MakeFromStruct(DecimalIR* decimal_ir, llvm::Value* dstruct);
// Build a corresponding IR struct
llvm::Value* AsStruct(DecimalIR* decimal_ir) const;
llvm::Value* value() const { return value_; }
llvm::Value* overflow() const { return overflow_; }
private:
llvm::Value* value_;
llvm::Value* overflow_;
};
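ValueWithOverflow pairs a computed value with a flag that says whether the computation overflowed. DecimalIR does this at IR-construction time with llvm::Value* handles for i128 values; the sketch below shows the same idea at runtime with int64_t so that it stays portable. CheckedInt64 and CheckedAdd are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical runtime analogue of ValueWithOverflow, using int64_t instead of i128.
struct CheckedInt64 {
  int64_t value;
  bool overflow;
};

// Add two values and report overflow instead of invoking undefined behavior.
inline CheckedInt64 CheckedAdd(int64_t a, int64_t b) {
  const int64_t kMax = std::numeric_limits<int64_t>::max();
  const int64_t kMin = std::numeric_limits<int64_t>::min();
  // Check before adding: signed overflow itself is undefined in C++.
  if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b)) {
    return {0, true};
  }
  return {a + b, false};
}
```

Carrying the flag alongside the value lets the caller decide later how to react, for example by falling back to a slower wide-arithmetic path.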

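Appendix: mapping between Arrow types and LLVM types

| Gandiva type (arrow data type) | C function type |
| int8 | int8_t |
| int16 | int16_t |
| int32 | int32_t |
| int64 | int64_t |
| uint8 | uint8_t |
| uint16 | uint16_t |
| uint32 | uint32_t |
| uint64 | uint64_t |
| float32 | float |
| float64 | double |
| boolean | bool |
| date32 | int32_t |
| date64 | int64_t |
| timestamp | int64_t |
| time32 | int32_t |
| time64 | int64_t |
| interval_month | int32_t |
| interval_day_time | int64_t |
| utf8 (as parameter type) | const char*, uint32_t |
| utf8 (as return type) | int64_t context, const char*, uint32_t* |
| binary (as parameter type) | const char*, uint32_t |
| binary (as return type) | int64_t context, const char*, uint32_t* |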

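Summary

Setting aside the puzzling decision not to split the project files into folders, the code quality is high. The comments at key points and the test cases make it clear what Gandiva is doing, and it is a fun read.

There is a guide for Gandiva external functions, but to see how things are actually used you still need to read the test cases.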