オリジナルLLVM Backendを追加しよう (21. 剰余演算の最適化と論理・比較命令の追加)

LLVMにはすでにRISC-Vのバックエンドサポートが追加されている。しかし、勉強のために独自のRISC-V実装をLLVMに追加している。

資料としては Tutorial: Creating an LLVM Backend for the Cpu0 Architecture を使用している。やっとChapter-4だ。

DIV命令は演算のコストが大きくなってしまうので、LLVMはsrem命令を乗算に置き換えてしまう。

int b = 11;
b = (b + 1) % 12

上記の演算は、以下のように変換される。正直この変換は意味が分からないが、等価なのであろう。。。

(b + 1) % 12 = 0xC * 0x2AAAAAAB = 0x2_00000004
A = 0x2 >> 1 = 1
B = 0x1 << 31 = 0
C = A + B = 1
D = C * 12 = 1 * 12 = 12
ANS = D - 12 = 0

以下のようなデータフローが生成される。

f:id:msyksphinz:20190217002141p:plain — 最適化されたデータフロー。mulhsが使用されている。

ちなみに最適化前。通常のsremが使用されている。

f:id:msyksphinz:20190217002302p:plain — 最適化される前のデータフロー。sremが使用されている。

一方でMipsの方式を使用すると、以下のようになるらしい。MIPSはHILOレジスタを使用するので、MUL命令とMFHIを使用すればよい。

        lui     $1, 20164
        addiu   $2, $4, 1
        ori     $1, $1, 60495
        mult    $2, $1
        mfhi    $1
        srl     $3, $1, 31
        sra     $1, $1, 2
        addu    $1, $1, $3
        sll     $3, $1, 1
        addu    $3, $3, $1
        sll     $1, $1, 4
        subu    $1, $3, $1
        jr      $ra
        addu    $2, $2, $1

一方で、volatileを使用すると最適化が抑制される。

int test_mod()
{
  int b = 11;
  volatile int a = 12;

  b = (b+1)%a;

  return b;
}

        addi    x10, x10, 1
        lw      x11, 0(x2)
        rem     x10, x10, x11
        sw      x10, 4(x2)
        lw      x10, 4(x2)
        addi    x2, x2, 8
        ret

ローテート命令の実装以下をコンパイルすると、エラーが出てしまう。RISC-Vにはローテート命令は実装されていないので、これはどうするか問題だ。

$ ./bin/llc -march=myriscvx32 -relocation-model=pic -filetype=asm ch4_1_rotate.bc -o ch4_1_rotate.myriscvx.s

LLVM ERROR: Cannot select: t18: i32 = rotl t6, Constant:i32<30>
  t6: i32,ch = load<(dereferenceable load 4 from %ir.a)> t5, FrameIndex:i32<0>, undef:i32
    t2: i32 = FrameIndex<0>
    t4: i32 = undef
  t7: i32 = Constant<30>
In function: _Z16test_rotate_leftv

本物のRISC-Vサポートはどのような実装になっているのか調べてみた。

lib/Target/Mips/MipsISelLowering.cpp

  setOperationAction(ISD::ROTL,              MVT::i32,   Expand);
  setOperationAction(ISD::ROTL,              MVT::i64,   Expand);
...
  if (!Subtarget.hasMips32r2())
    setOperationAction(ISD::ROTR, MVT::i32,   Expand);

  if (!Subtarget.hasMips64r2())
    setOperationAction(ISD::ROTR, MVT::i64,   Expand);

setOperationAction()というのはどういうことだろう？とりあえず自分の実装にも加えてみる。

次に論理命令を追加する。これがないと最終的に上記のch4_1_rotateは生成できない。

-def ORI  : ArithLogicI<0b0010011, 0b110, "ori", or, uimm12, immZExt12, GPR>;
+def XORI : ArithLogicI<0b0010011, 0b110, "xori", xor, uimm12, immZExt12, GPR>;
+def ORI  : ArithLogicI<0b0010011, 0b110, "ori",  or,  uimm12, immZExt12, GPR>;
+def ANDI : ArithLogicI<0b0010011, 0b111, "andi", and, uimm12, immZExt12, GPR>;
+
 def LUI  : ArithLogicU<0b0110111, "lui", simm20, immSExt12>;
 def ADD  : ArithLogicR<0b0110011, 0b000, "add", add, GPR>;
 def SUB  : ArithLogicR<0b0110011, 0b000, "sub", sub, GPR>;

-// sra is IR node for ashr llvm IR instruction of .bc
-def SRL  : shift_rotate_reg<0b0110011, 0b0000000, 0b101, 0x0, "srl", srl, GPR>;
 def SLL  : shift_rotate_reg<0b0110011, 0b0000000, 0b001, 0x0, "sll", shl, GPR>;
+def AND  : ArithLogicR<0b0110011, 0b111, "and", and, GPR>;
+def SRL  : shift_rotate_reg<0b0110011, 0b0000000, 0b101, 0x0, "srl", srl, GPR>;
 def SRA  : shift_rotate_reg<0b0110011, 0b0100000, 0b101, 0x0, "sra", sra, GPR>;
+def OR   : ArithLogicR<0b0110011, 0b110, "or",  or,  GPR>;
+def XOR  : ArithLogicR<0b0110011, 0b100, "xor", xor, GPR>;

とりあえずこれで命令がローテート操作が命令出力できるようになった。一安心。

次に、比較命令の実装を行っていく。比較命令のためには、Predicateのためのオペレーションとしてsltiとかsltuとかが使用できるが、通常のSDNodeではなく、PatFragという型で定義しなければならないらしい。以下のフォーマットを作成した。

lib/Target/Mips/MipsISelLowering.cpp

// SetCC
class SetCC_R<bits<7> opcode, bits<3> funct3,
              string instr_asm, PatFrag cond_op,
              RegisterClass RC> :
  FI<opcode, funct3, (outs GPR:$ra), (ins RC:$rb, RC:$rc),
  !strconcat(instr_asm, "\t$ra, $rb, $rc"),
  [(set GPR:$ra, (cond_op RC:$rb, RC:$rc))], IIAlu> {
    let isReMaterializable = 1;
}


class SetCC_I<bits<7> opcode, bits<3> funct3,
              string instr_asm, PatFrag cond_op,
              Operand imm, PatLeaf imm_type, RegisterClass RC> :
  FI<opcode, funct3, (outs GPR:$ra), (ins RC:$rb, imm:$imm12),
  !strconcat(instr_asm, "\t$ra, $rb, $imm12"),
  [(set GPR:$ra, (cond_op RC:$rb, imm_type:$imm12))], IIAlu> {
    let isReMaterializable = 1;
}

これに基づいて比較命令を実装した。これで一応LLVMがコンパイルできるようになった。比較処理はまだ命令生成できないけど。

def SLTI  : SetCC_I<0b0010011, 0b010, "slti",  setlt,  simm12, immSExt12, GPR>;
def SLTIU : SetCC_I<0b0010011, 0b011, "sltiu", setult, simm12, immSExt12, GPR>;
def SLT   : SetCC_R<0b0110011, 0b010, "slt",  setlt,  GPR>;
def SLTU  : SetCC_R<0b0110011, 0b011, "sltu", setult, GPR>;

FPGA開発日記

カテゴリ別記事インデックス https://msyksphinz.github.io/github_pages , English Version https://fpgadevdiary.hatenadiary.com/

オリジナルLLVM Backendを追加しよう (21. 剰余演算の最適化と論理・比較命令の追加)