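How to Define a Custom Type in MLIR and Run It to Produce Output

This post is a sequel to 淦！MLIR输出Hello World不应该这么难 ("Damn! Printing Hello World in MLIR shouldn't be this hard"). It is also my attempt to answer some questions I was left with after Chapter #7 of the official MLIR Toy Tutorial.

The code for this post is in the following GitHub repository: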

https://github.com/mocusez/mlir-hello-world

The GitHub CI script in the repository shows how to build and run it.

What confused me about the Toy Tutorial

The chapter defines a composite type named struct in the Toy dialect:

# A struct is defined by using the `struct` keyword followed by a name.
struct MyStruct {
  # Inside of the struct is a list of variable declarations without initializers
  # or shapes, which may also be other previously defined structs.
  var a;
  var b;
}

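The chapter only walks through the steps for defining a type:

Define the Type
Define the TypeStorage
Define the ODS (TableGen) entry
Define parsing and printing for the Type
Define the operations on the Type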

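But it leaves many details vague:

How should the Type and its TypeStorage be handled during lowering?
I am already using MLIR, so why do I have to handle parsing and printing for the Type? (And is that parsing and printing for the Toy language, or for the MLIR dialect?)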

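The second question is the more confusing of the two. The answer is that these hooks parse and print the textual form of MLIR (the .mlir source), which is entirely separate from parsing the Toy language.

With these two questions in mind, here is what this post builds: a simple hash table whose result we can print and see.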

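Implementation overview

Define a Dict type in the Hello dialect together with its operations (Put, Get, Delete), then lower everything to LLVM IR so it can be executed.

Adding a custom Type

First, declare the type in the dialect's TableGen file: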

def DictType :
    DialectType<Hello_Dialect, CPred<"::llvm::isa<DictType>($_self)">,
                "Hello dict type">;

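If you are not sure how to write the CPred, copy this one and change the type name.

In dialect.h, define the C++ class for the type, which the conversion code uses to create the type and read its parameters, and forward-declare the TypeStorage it uses: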

namespace hello {
  struct DictTypeStorage;
}

class DictType : public mlir::Type::TypeBase<DictType, mlir::Type,
                                             hello::DictTypeStorage> {
public:
  /// Inherit some necessary constructors from 'TypeBase'.
  using Base::Base;

  /// Create an instance of a `DictType` with the given key and value types.
  static DictType get(mlir::Type keyType, mlir::Type valueType);

  /// Returns the key type of this dict type.
  mlir::Type getKeyType();

  /// Returns the value type of this dict type.
  mlir::Type getValueType();

  /// The name of this dict type.
  static constexpr mlir::StringLiteral name = "hello.dict";
};

In dialect.cpp, fill in the method bodies. getImpl() below returns the TypeStorage instance, from which the stored MLIR types are read:

DictType DictType::get(mlir::Type keyType, mlir::Type valueType) {
  return Base::get(keyType.getContext(), keyType, valueType);
}

mlir::Type DictType::getKeyType() {
  return getImpl()->keyType;
}

/// Returns the value type of this dict type.
mlir::Type DictType::getValueType() {
  return getImpl()->valueType;
}

Then register the type in the dialect's initialize() method:

void HelloDialect::initialize() {
  addOperations<
#define GET_OP_LIST
#include "Hello/HelloOps.cpp.inc"
      >();
  addTypes<DictType>();
}

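Adding a custom TypeStorage

MLIR splits a type into Type and TypeStorage. TypeStorage can be understood as the place where the data and metadata of a Type are actually stored.

Since DictTypeStorage is already forward-declared in Dialect.h, Dialect.cpp only needs to define it as a subclass of mlir::TypeStorage.

operator== compares a storage instance against a key, which decides whether two types are equal.

hashKey hashes the key, so that a request for an already-created type finds the existing instance.

construct allocates a new storage instance through the allocator it is given. This allocation happens inside the compiler and does not show up in the lowered code.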

struct DictTypeStorage : public mlir::TypeStorage {
  /// The `KeyTy` defines what uniquely identifies this type.
  /// For dict type, we unique on the key type and value type pair.
  using KeyTy = std::pair<mlir::Type, mlir::Type>;

  /// Constructor for the type storage instance.
  DictTypeStorage(mlir::Type keyType, mlir::Type valueType)
      : keyType(keyType), valueType(valueType) {}

  /// Define the comparison function for the key type.
  bool operator==(const KeyTy &key) const {
    return key.first == keyType && key.second == valueType;
  }

  /// Define a hash function for the key type.
  static llvm::hash_code hashKey(const KeyTy &key) {
    return llvm::hash_combine(key.first, key.second);
  }

  /// Define a construction function for the key type.
  static KeyTy getKey(mlir::Type keyType, mlir::Type valueType) {
    return KeyTy(keyType, valueType);
  }

  /// Define a construction method for creating a new instance of this storage.
  static DictTypeStorage *construct(mlir::TypeStorageAllocator &allocator,
                                    const KeyTy &key) {
    // Allocate the storage instance and construct it.
    return new (allocator.allocate<DictTypeStorage>())
        DictTypeStorage(key.first, key.second);
  }

  /// The key and value types of the dict.
  mlir::Type keyType;
  mlir::Type valueType;
};
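
To see how hashKey, operator== and construct work together, here is a small stand-alone model in plain C. It is only a sketch of the uniquing idea, not MLIR's implementation. TypeId, DictKey, DictStorage, Context and dict_type_get are names invented for the illustration. It assumes the context keeps one storage instance per distinct (key type, value type) pair and hands the same instance back on every later request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for mlir::Type: an opaque id is enough for this sketch. */
typedef unsigned TypeId;

/* Mirrors KeyTy = std::pair<Type, Type>. */
typedef struct { TypeId key_type, value_type; } DictKey;

/* Mirrors DictTypeStorage: the data the type actually carries. */
typedef struct DictStorage {
  TypeId key_type, value_type;
  struct DictStorage *next; /* bucket chain inside the context */
} DictStorage;

/* Hypothetical stand-in for the uniquer owned by the context. */
#define CTX_BUCKETS 16
typedef struct { DictStorage *buckets[CTX_BUCKETS]; } Context;

/* Plays the role of hashKey(). */
static size_t hash_key(DictKey k) {
  return (size_t)k.key_type * 31u + (size_t)k.value_type;
}

/* Plays the role of operator==(const KeyTy &). */
static bool storage_equals(const DictStorage *s, DictKey k) {
  return s->key_type == k.key_type && s->value_type == k.value_type;
}

/* Plays the role of construct(): runs only when no equal instance exists. */
static DictStorage *construct(DictKey k) {
  DictStorage *s = malloc(sizeof *s);
  if (!s) abort();
  s->key_type = k.key_type;
  s->value_type = k.value_type;
  s->next = NULL;
  return s;
}

/* Plays the role of DictType::get(): hash, compare, construct on a miss. */
const DictStorage *dict_type_get(Context *ctx, TypeId key, TypeId value) {
  DictKey k = {key, value};
  size_t b = hash_key(k) % CTX_BUCKETS;
  for (DictStorage *s = ctx->buckets[b]; s; s = s->next)
    if (storage_equals(s, k))
      return s; /* already uniqued: reuse the existing instance */
  DictStorage *s = construct(k);
  s->next = ctx->buckets[b];
  ctx->buckets[b] = s;
  return s;
}
```

Calling dict_type_get twice with the same pair returns the same pointer, which is why two equal types can be compared by pointer. None of this memory is part of the program being compiled, so it never appears in the lowered IR.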

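Type conversion

The Dict type of the Hello dialect has to become a pointer. The conversion actually takes place while operations are lowered, by converting each op's operands and results.

One option is to write a class that inherits from LLVMTypeConverter and calls addConversion in its constructor. This should work in theory, but I have not tried it.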

class HelloTypeConverter : public mlir::LLVMTypeConverter {
public:
  HelloTypeConverter(mlir::MLIRContext *ctx) : mlir::LLVMTypeConverter(ctx) {
    addConversion([](hello::DictType type) {
      return mlir::LLVM::LLVMPointerType::get(type.getContext());
    });
  }
};

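The other option is to call addConversion directly on the existing type converter.

There are two ways to write it (both passed my tests):

Name the source type directly in the lambda parameter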

  typeConverter.addConversion([](DictType type) -> mlir::Type {
      return mlir::LLVM::LLVMPointerType::get(type.getContext());
  });

Take a generic mlir::Type and decide with mlir::dyn_cast

  typeConverter.addConversion([](mlir::Type type) -> std::optional<mlir::Type> {
    if (auto dictType = mlir::dyn_cast<DictType>(type))
      return mlir::LLVM::LLVMPointerType::get(type.getContext());
    return std::nullopt;
  });

Once the type converter has been passed to the pattern's constructor, the pattern can call getTypeConverter() to look up what a given type converts to

auto resultType = getTypeConverter()->convertType(op->getResult(0).getType());
if (!resultType) {
  return mlir::failure();
}

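So, for an op whose result is a DictType, this code is equivalent to

auto resultType = mlir::LLVM::LLVMPointerType::get(context);

In other words the converted type is fixed: resultType is always an LLVM pointer, whatever key and value types the operation's DictType carries.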
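
The behaviour of both addConversion forms can be illustrated with a small plain-C model of a callback chain. This is a sketch under assumptions, not MLIR code: Kind, Type, TypeConverter, add_conversion, convert_type, keep_builtin and dict_to_ptr are names invented here. It relies on two documented properties of mlir::TypeConverter: registered callbacks are tried newest-first, and a callback that returns std::nullopt means "not mine, try the next one".

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { K_I32, K_INDEX, K_DICT, K_LLVM_PTR } Kind;

typedef struct {
  Kind kind;
  Kind key, value; /* only meaningful when kind == K_DICT */
} Type;

/* A conversion callback. Returning false plays the role of std::nullopt:
   "I do not handle this type, ask the next callback". */
typedef bool (*ConversionFn)(Type in, Type *out);

#define MAX_CONVERSIONS 8
typedef struct {
  ConversionFn fns[MAX_CONVERSIONS];
  size_t count;
} TypeConverter;

/* Register a callback; returns false when the table is full. */
bool add_conversion(TypeConverter *tc, ConversionFn fn) {
  if (tc->count == MAX_CONVERSIONS) return false;
  tc->fns[tc->count++] = fn;
  return true;
}

/* Try callbacks in reverse registration order; fail if none accepts. */
bool convert_type(const TypeConverter *tc, Type in, Type *out) {
  for (size_t i = tc->count; i-- > 0;)
    if (tc->fns[i](in, out))
      return true;
  return false;
}

/* Stand-in for the conversions a base converter already provides:
   builtin types pass through unchanged. */
bool keep_builtin(Type in, Type *out) {
  if (in.kind == K_DICT) return false;
  *out = in;
  return true;
}

/* The dict rule: every dict becomes a pointer, key/value are ignored. */
bool dict_to_ptr(Type in, Type *out) {
  if (in.kind != K_DICT) return false;
  out->kind = K_LLVM_PTR;
  out->key = out->value = K_LLVM_PTR; /* unused for non-dict kinds */
  return true;
}
```

With keep_builtin registered first and dict_to_ptr second, a dict of any key and value type converts to the pointer kind while i32 passes through. With only dict_to_ptr registered, converting i32 fails, which corresponds to the null result that the lowering pattern checks before returning failure.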

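Lowering the operations on the type

A Dict (hash table) has put, get and delete operations, plus create and free to allocate and release its memory. Each of them needs a corresponding Op (Operation) defined in the dialect.

In the code below, assemblyFormat is defined so that each op and its types can be parsed correctly from a textual .mlir file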

class Hello_Op<string mnemonic, list<Trait> traits = []> :
        Op<Hello_Dialect, mnemonic, traits>;

def Dict_CreateOp : Hello_Op<"dict.create", [Pure]> {
  let summary = "Create a new dict<string,i32>";
  let results = (outs DictType:$dict);
  let assemblyFormat = "attr-dict `:` type($dict)";
}

def Dict_FreeOp : Hello_Op<"dict.free", []> {
  let summary = "Free the dict<string,i32> memory";
  let arguments = (ins DictType:$dict);
  let assemblyFormat = "$dict attr-dict `:` type($dict)";
}

def Dict_PutOp : Hello_Op<"dict.put", []> {
  let summary = "Insert string->i32";
  let arguments = (ins DictType:$dict, StrAttr:$key, I32Attr:$value);
  let results = (outs DictType:$out);
  let assemblyFormat = "$dict `,` $key `=` $value attr-dict `:` type($dict) `->` type($out)";
}

def Dict_GetOp : Hello_Op<"dict.get", []> {
  let summary = "Lookup string->i32, returns i32";
  let arguments = (ins DictType:$dict, StrAttr:$key);
  let results = (outs I32:$value);
  let assemblyFormat = "$dict `,` $key attr-dict `:` type($dict) `->` type($value)";
}

def Dict_DeleteOp : Hello_Op<"dict.delete", []> {
  let summary = "Delete key string";
  let arguments = (ins DictType:$dict, StrAttr:$key);
  let results = (outs DictType:$out);
  let assemblyFormat = "$dict `,` $key attr-dict `:` type($dict) `->` type($out)";
}

Each Op also needs a lowering. Here is the one for Dict_CreateOp as an example:

class DictCreateOpLowering : public mlir::ConversionPattern {
public:
  explicit DictCreateOpLowering(mlir::TypeConverter &typeConverter,
                                mlir::MLIRContext *context)
      : mlir::ConversionPattern(
            typeConverter, hello::CreateOp::getOperationName(), 1, context) {}

  mlir::LogicalResult
  matchAndRewrite(mlir::Operation *op, mlir::ArrayRef<mlir::Value> operands,
                  mlir::ConversionPatternRewriter &rewriter) const override {
    auto loc = op->getLoc();

    auto resultType =
        getTypeConverter()->convertType(op->getResult(0).getType());
    if (!resultType) {
      return mlir::failure();
    }

    mlir::ModuleOp module = op->getParentOfType<mlir::ModuleOp>();
    auto createMapRef = getOrInsertCreateMap(rewriter, module);

    auto callOp = rewriter.create<mlir::LLVM::CallOp>(
        loc, resultType, createMapRef, mlir::ValueRange{});

    rewriter.replaceOp(op, callOp.getResult());
    return mlir::success();
  }

private:
  mlir::FlatSymbolRefAttr getOrInsertCreateMap(mlir::PatternRewriter &rewriter,
                                               mlir::ModuleOp module) const {
    auto *context = module.getContext();
    if (module.lookupSymbol<mlir::LLVM::LLVMFuncOp>("create_map"))
      return mlir::SymbolRefAttr::get(context, "create_map");

    // Create function type: () -> !llvm.ptr
    auto resultType = mlir::LLVM::LLVMPointerType::get(context);
    auto fnType =
        mlir::LLVM::LLVMFunctionType::get(resultType, std::nullopt, false);

    // Insert function declaration
    mlir::PatternRewriter::InsertionGuard insertGuard(rewriter);
    rewriter.setInsertionPointToStart(module.getBody());
    rewriter.create<mlir::LLVM::LLVMFuncOp>(module.getLoc(), "create_map",
                                            fnType);

    return mlir::SymbolRefAttr::get(context, "create_map");
  }
};

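What this pattern does is turn Dict_CreateOp into a call to a C/LLVM IR function named create_map. Once that works, the output can be produced with C's printf().

Defining the parsing rules for textual MLIR

This part is essential: the dialect has to declare how the custom type is parsed and printed, that is, parseType and printType.

To do that, set useDefaultTypePrinterParser to 1 in the dialect definition, which makes ODS declare the parseType and printType hooks on the dialect class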

def Hello_Dialect : Dialect {
    let name = "hello";
    let summary = "A hello out-of-tree MLIR dialect.";
    let description = [{
        This dialect is a minimal example to implement hello-world kind of sample code
        for MLIR.
    }];
    let cppNamespace = "::hello";
    let useDefaultTypePrinterParser = 1;
}

Then define parseType and printType in the corresponding dialect.cpp

mlir::Type HelloDialect::parseType(mlir::DialectAsmParser &parser) const {
  llvm::StringRef typeTag;
  if (parser.parseKeyword(&typeTag))
    return mlir::Type();

  if (typeTag == "dict") {
    if (parser.parseLess())
      return mlir::Type();

    mlir::Type keyType;
    if (parser.parseType(keyType))
      return mlir::Type();

    if (parser.parseComma())
      return mlir::Type();

    mlir::Type valueType;
    if (parser.parseType(valueType))
      return mlir::Type();

    if (parser.parseGreater())
      return mlir::Type();

    return DictType::get(keyType, valueType);
  }

  parser.emitError(parser.getNameLoc(), "unknown hello type: ") << typeTag;
  return mlir::Type();
}

void HelloDialect::printType(mlir::Type type, mlir::DialectAsmPrinter &printer) const {
  if (auto dictType = mlir::dyn_cast<DictType>(type)) {
    printer << "dict<";
    printer.printType(dictType.getKeyType());
    printer << ", ";
    printer.printType(dictType.getValueType());
    printer << ">";
    return;
  }

  llvm_unreachable("unhandled hello type");
}
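
The grammar these two hooks agree on can be tried outside MLIR with a small plain-C sketch. It is a simplified model under assumptions: DictSpec, parse_dict and print_dict are invented names, the key and value types are restricted to simple identifiers such as index or i32 (no nested types), and the input is the text that follows the !hello. prefix, which is the part the dialect's parser gets to see.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing: the two type names as text. */
typedef struct { char key[16]; char value[16]; } DictSpec;

static void skip_ws(const char **p) {
  while (isspace((unsigned char)**p)) ++*p;
}

/* Consume one expected character (parseLess/parseComma/parseGreater). */
static bool expect(const char **p, char c) {
  skip_ws(p);
  if (**p != c) return false;
  ++*p;
  return true;
}

/* Read an identifier such as "dict", "index" or "i32" (parseKeyword). */
static bool ident(const char **p, char *out, size_t cap) {
  skip_ws(p);
  size_t n = 0;
  while (isalnum((unsigned char)**p) || **p == '_') {
    if (n + 1 >= cap) return false; /* identifier too long for the buffer */
    out[n++] = *(*p)++;
  }
  out[n] = '\0';
  return n > 0;
}

/* Parse "dict<K, V>" in the same order as the parseType hook. */
bool parse_dict(const char *text, DictSpec *out) {
  char tag[16];
  const char *p = text;
  if (!ident(&p, tag, sizeof tag) || strcmp(tag, "dict") != 0) return false;
  if (!expect(&p, '<')) return false;
  if (!ident(&p, out->key, sizeof out->key)) return false;
  if (!expect(&p, ',')) return false;
  if (!ident(&p, out->value, sizeof out->value)) return false;
  if (!expect(&p, '>')) return false;
  skip_ws(&p);
  return *p == '\0'; /* reject trailing text */
}

/* Print in exactly the form the parser accepts, so text round-trips. */
int print_dict(const DictSpec *d, char *buf, size_t cap) {
  return snprintf(buf, cap, "dict<%s, %s>", d->key, d->value);
}
```

The point of the pairing is the round trip: whatever printType emits must be accepted by parseType, otherwise a .mlir file written by one tool cannot be read back by the next.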

Run it!

Prepare a piece of MLIR code

module {
  func.func @test_dict_operations() -> i32 {
    %dict0 = hello.dict.create : !hello.dict<index, i32>

    %dict1 = hello.dict.put %dict0, "first1" = 100 : !hello.dict<index, i32> -> !hello.dict<index, i32>
    %dict2 = hello.dict.put %dict1, "second" = 200 : !hello.dict<index, i32> -> !hello.dict<index, i32>
    %dict3 = hello.dict.put %dict2, "third" = 300 : !hello.dict<index, i32> -> !hello.dict<index, i32>

    %val1 = hello.dict.get %dict3, "first1" : !hello.dict<index, i32> -> i32
    %val2 = hello.dict.get %dict3, "second" : !hello.dict<index, i32> -> i32
    %val3 = hello.dict.get %dict3, "third" : !hello.dict<index, i32> -> i32

    %dict4 = hello.dict.delete %dict3, "second" : !hello.dict<index, i32> -> !hello.dict<index, i32>

    hello.dict.free %dict4 : !hello.dict<index, i32>

    func.return %val2 : i32
  }
}

Lower the MLIR to LLVM IR and take a look at what gets printed:

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@key_third = internal constant [6 x i8] c"third\00"
@key_second = internal constant [7 x i8] c"second\00"
@key_first1 = internal constant [7 x i8] c"first1\00"

declare void @free_map(ptr) local_unnamed_addr

declare void @delete(ptr, ptr) local_unnamed_addr

declare ptr @get(ptr, ptr) local_unnamed_addr

declare void @put(ptr, ptr, i32) local_unnamed_addr

declare ptr @create_map() local_unnamed_addr

define i32 @test_dict_operations() local_unnamed_addr {
  %1 = tail call ptr @create_map()
  tail call void @put(ptr %1, ptr nonnull @key_first1, i32 100)
  tail call void @put(ptr %1, ptr nonnull @key_second, i32 200)
  tail call void @put(ptr %1, ptr nonnull @key_third, i32 300)
  %2 = tail call ptr @get(ptr %1, ptr nonnull @key_first1)
  %3 = tail call ptr @get(ptr %1, ptr nonnull @key_second)
  %4 = load i32, ptr %3, align 4
  %5 = tail call ptr @get(ptr %1, ptr nonnull @key_third)
  tail call void @delete(ptr %1, ptr nonnull @key_second)
  tail call void @free_map(ptr %1)
  ret i32 %4
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

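Most of it has become calls to C/LLVM IR functions, which turns this into a familiar problem, the same as the printf() case in the previous post.

Implement the interface functions in C/C++, then compile and run: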

../../build/bin/hello-opt dict.mlir -emit=llvm > dict.ll
clang-20 dict.c dict.ll -o dict

#include <stdio.h>  /* printf */

/* Defined in dict.ll, generated from dict.mlir. */
extern int test_dict_operations(void);

int main(void) {
    printf("Starting dictionary test...\n");
    int result = test_dict_operations();
    printf("Result from test_dict_operations: %d\n", result);

    return 0;
}
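
The runtime functions that the LLVM IR declares (create_map, put, get, delete, free_map) also have to be defined in dict.c. The following is one possible sketch, not necessarily what the repository does. The function names and signatures are taken from the declare lines of the IR above; everything else (the Map and Entry structs, the fixed bucket count, the FNV-1a hash) is an assumption made for illustration. Note that get returns a pointer to the stored i32, because the IR loads the result through it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 64

typedef struct Entry {
  char *key;
  int32_t value;
  struct Entry *next;
} Entry;

typedef struct Map { Entry *buckets[BUCKETS]; } Map;

/* FNV-1a string hash reduced to a bucket index. */
static size_t bucket_of(const char *key) {
  uint32_t h = 2166136261u;
  for (const unsigned char *p = (const unsigned char *)key; *p; ++p) {
    h ^= *p;
    h *= 16777619u;
  }
  return h % BUCKETS;
}

/* Allocate an empty map; all buckets start as NULL. */
void *create_map(void) { return calloc(1, sizeof(Map)); }

/* Return a pointer to the stored value, or NULL when the key is absent. */
int32_t *get(void *map, const char *key) {
  Map *m = map;
  for (Entry *e = m->buckets[bucket_of(key)]; e; e = e->next)
    if (strcmp(e->key, key) == 0)
      return &e->value;
  return NULL;
}

/* Insert a new key or overwrite the value of an existing one. */
void put(void *map, const char *key, int32_t value) {
  Map *m = map;
  int32_t *slot = get(map, key);
  if (slot) { *slot = value; return; }
  size_t len = strlen(key) + 1;
  Entry *e = malloc(sizeof *e);
  char *copy = malloc(len);
  if (!e || !copy) abort();
  memcpy(copy, key, len); /* own a copy of the key */
  e->key = copy;
  e->value = value;
  size_t b = bucket_of(key);
  e->next = m->buckets[b];
  m->buckets[b] = e;
}

/* Unlink and free the entry for key, if present ("delete" is legal in C). */
void delete(void *map, const char *key) {
  Map *m = map;
  Entry **link = &m->buckets[bucket_of(key)];
  while (*link) {
    Entry *e = *link;
    if (strcmp(e->key, key) == 0) {
      *link = e->next;
      free(e->key);
      free(e);
      return;
    }
    link = &e->next;
  }
}

/* Free every entry and then the map itself. */
void free_map(void *map) {
  Map *m = map;
  for (size_t i = 0; i < BUCKETS; ++i) {
    for (Entry *e = m->buckets[i]; e;) {
      Entry *next = e->next;
      free(e->key);
      free(e);
      e = next;
    }
  }
  free(m);
}
```

Since get returns NULL for a missing key and the generated IR loads through the pointer without a check, looking up a key that was never inserted would crash in this sketch. The file has to be compiled as C, because delete is a reserved word in C++.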

The expected output is

Starting dictionary test...
Result from test_dict_operations: 200

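Closing thoughts

This approach has an obvious limitation: it only fits a hash table of shape <string,i32> and does not work for other types. That could be solved by generating the LLVM IR dynamically, but how to do that is a separate topic 😂

I still recommend trying it yourself, since that is what really deepens the understanding.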