FPGA Development Diary

Article index by category: https://msyksphinz.github.io/github_pages , English version: https://fpgadevdiary.hatenadiary.com/

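Implementing Bit-manipulation Instructions in My Own CPU

I implemented the bit-manipulation instructions in my home-made CPU.

Among the bit-manipulation extensions, the ratified instruction groups are Zba, Zbb, Zbc, and Zbs.

I implemented the following instructions in my CPU.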

  • Zba : address generation instructions
    • add.uw rd, rs1, rs2 (Add unsigned word)
    • sh1add rd, rs1, rs2 (Shift left by 1 and add)
    • sh1add.uw rd, rs1, rs2 (Shift unsigned word left by 1 and add)
    • sh2add rd, rs1, rs2 (Shift left by 2 and add)
    • sh2add.uw rd, rs1, rs2 (Shift unsigned word left by 2 and add)
    • sh3add rd, rs1, rs2 (Shift left by 3 and add)
    • sh3add.uw rd, rs1, rs2 (Shift unsigned word left by 3 and add)
    • slli.uw rd, rs1, imm (Shift-left unsigned word (Immediate))
  • Zbb : basic bit-manipulation
    • andn rd, rs1, rs2 (AND with inverted operand)
    • orn rd, rs1, rs2 (OR with inverted operand)
    • xnor rd, rs1, rs2 (Exclusive NOR)
    • clz rd, rs (Count leading zero bits)
    • clzw rd, rs (Count leading zero bits in word)
    • ctz rd, rs (Count trailing zero bits)
    • ctzw rd, rs (Count trailing zero bits in word)
    • cpop rd, rs (Count set bits)
    • cpopw rd, rs (Count set bits in word)
    • max rd, rs1, rs2 (Maximum)
    • maxu rd, rs1, rs2 (Unsigned maximum)
    • min rd, rs1, rs2 (Minimum)
    • minu rd, rs1, rs2 (Unsigned minimum)
    • sext.b rd, rs (Sign-extend byte)
    • sext.h rd, rs (Sign-extend halfword)
    • zext.h rd, rs (Zero-extend halfword)
    • rol rd, rs1, rs2 (Rotate left (Register))
    • rolw rd, rs1, rs2 (Rotate Left Word (Register))
    • ror rd, rs1, rs2 (Rotate right (Register))
    • rori rd, rs1, shamt (Rotate right (Immediate))
    • roriw rd, rs1, shamt (Rotate right Word (Immediate))
    • rorw rd, rs1, rs2 (Rotate right Word (Register))
    • orc.b rd, rs (Bitwise OR-Combine, byte granule)
    • rev8 rd, rs (Byte-reverse register)
  • Zbc : carry-less multiplication
    • clmul rd, rs1, rs2 (Carry-less multiply (low-part))
    • clmulh rd, rs1, rs2 (Carry-less multiply (high-part))
    • clmulr rd, rs1, rs2 (Carry-less multiply (reversed))
  • Zbs : single-bit instructions
    • bclr rd, rs1, rs2 (Single-Bit Clear (Register))
    • bclri rd, rs1, imm (Single-Bit Clear (Immediate))
    • bext rd, rs1, rs2 (Single-Bit Extract (Register))
    • bexti rd, rs1, imm (Single-Bit Extract (Immediate))
    • binv rd, rs1, rs2 (Single-Bit Invert (Register))
    • binvi rd, rs1, imm (Single-Bit Invert (Immediate))
    • bset rd, rs1, rs2 (Single-Bit Set (Register))
    • bseti rd, rs1, imm (Single-Bit Set (Immediate))

I simply wrote out the logic for each operation, one case after another.

  case (i_op)
    OP_UNSIGND_ADD_32         : o_out = {{(riscv_pkg::XLEN_W-32){1'b0}}, i_rs1[31: 0]} + i_rs2;
    OP_AND_INV                : o_out = i_rs1 & ~i_rs2;
    OP_CARRY_LESS_MUL         : o_out = w_clmul[riscv_pkg::XLEN_W-1: 0];
    OP_CARRY_LESS_MULH        : o_out = w_clmul[riscv_pkg::XLEN_W*2-1: riscv_pkg::XLEN_W];
    OP_CARRY_LESS_MULR        : o_out = w_clmul[riscv_pkg::XLEN_W*2-2: riscv_pkg::XLEN_W-1];
    OP_CLZ                    : o_out = w_leading_zero_count_xlen;
    OP_CLZW                   : o_out = w_leading_zero_count_32;
    OP_CPOP                   : o_out = w_bit_cnt_xlen;
    OP_CPOPW                  : o_out = w_bit_cnt_32;
    OP_CTZ                    : o_out = w_trailing_zero_count_xlen;
    OP_CTZW                   : o_out = w_trailing_zero_count_32;
    OP_SIGNED_MAX             : o_out = $signed(i_rs1) > $signed(i_rs2) ? i_rs1 : i_rs2;
    OP_UNSIGNED_MAX           : o_out =         i_rs1  >         i_rs2  ? i_rs1 : i_rs2;
    OP_SIGNED_MIN             : o_out = $signed(i_rs1) < $signed(i_rs2) ? i_rs1 : i_rs2;
    OP_UNSIGNED_MIN           : o_out =         i_rs1  <         i_rs2  ? i_rs1 : i_rs2;
    OP_BITWISE_OR             : o_out = w_bitwise_or;
    OP_INVERTED_OR            : o_out = i_rs1 | ~i_rs2;
    OP_BYTE_REVERSE           : o_out = w_byte_rev;
    OP_ROTATE_LEFT            : o_out = (i_rs1        << i_rs2[XLEN_W_W-1: 0]) | (i_rs1        >> (riscv_pkg::XLEN_W - i_rs2[XLEN_W_W-1: 0]));
    OP_ROTATE_LEFT_WORD       : begin
      w_shift_tmp_32 = (i_rs1[31: 0] << i_rs2[ 4: 0]) | (i_rs1[31: 0] >> (32 - i_rs2[ 4: 0]));
      o_out = {{(riscv_pkg::XLEN_W-32){w_shift_tmp_32[31]}}, w_shift_tmp_32};
    end
    OP_ROTATE_RIGHT           : o_out = (i_rs1        >> i_rs2[XLEN_W_W-1: 0]) | (i_rs1        << (riscv_pkg::XLEN_W - i_rs2[XLEN_W_W-1: 0]));
    OP_ROTATE_RIGHT_32        : begin
      w_shift_tmp_32 = (i_rs1[31: 0] >> i_rs2[4: 0]) | (i_rs1[31: 0] << (32 - i_rs2[4: 0]));
      o_out = {{(riscv_pkg::XLEN_W-32){w_shift_tmp_32[31]}}, w_shift_tmp_32};
    end
    OP_BIT_CLEAR              : o_out = i_rs1 & ~(1 << i_rs2[XLEN_W_W-1: 0]);
    OP_BIT_EXTRACT            : o_out = i_rs1[i_rs2[XLEN_W_W-1: 0]];
    OP_BIT_INVERT             : o_out = i_rs1 ^ (1 << i_rs2[XLEN_W_W-1: 0]);
    OP_BIT_SET                : o_out = i_rs1 | (1 << i_rs2[XLEN_W_W-1: 0]);
    OP_SIGN_EXTEND_8          : o_out = {{(riscv_pkg::XLEN_W- 8){i_rs1[ 7]}}, i_rs1[ 7: 0]};
    OP_SIGN_EXTEND_16         : o_out = {{(riscv_pkg::XLEN_W-16){i_rs1[15]}}, i_rs1[15: 0]};
    OP_SIGNED_SH1ADD          : o_out = i_rs2 + {i_rs1[riscv_pkg::XLEN_W-2: 0], 1'b0};
    // .uw variants: zero-extend rs1[31:0] and shift; the padding makes the operand exactly XLEN_W bits wide
    OP_UNSIGNED_SH1ADD_32     : o_out = i_rs2 + {{(riscv_pkg::XLEN_W-33){1'b0}}, i_rs1[31: 0], 1'b0};
    OP_SIGNED_SH2ADD          : o_out = i_rs2 + {i_rs1[riscv_pkg::XLEN_W-3: 0], 2'b00};
    OP_UNSIGNED_SH2ADD_32     : o_out = i_rs2 + {{(riscv_pkg::XLEN_W-34){1'b0}}, i_rs1[31: 0], 2'b00};
    OP_SIGNED_SH3ADD          : o_out = i_rs2 + {i_rs1[riscv_pkg::XLEN_W-4: 0], 3'b000};
    OP_UNSIGNED_SH3ADD_32     : o_out = i_rs2 + {{(riscv_pkg::XLEN_W-35){1'b0}}, i_rs1[31: 0], 3'b000};
    OP_UNSIGNED_SHIFT_LEFT_32 : o_out = {{(riscv_pkg::XLEN_W-32){1'b0}}, i_rs1[31: 0]} << i_rs2[ 5: 0];
    OP_XNOR                   : o_out = ~(i_rs1 ^ i_rs2);
    OP_ZERO_EXTEND_16         : o_out = {{(riscv_pkg::XLEN_W-16){1'b0}}, i_rs1[15: 0]};
  endcase
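
The case statement above reads helper signals such as w_clmul, w_byte_rev, and w_bitwise_or, whose logic is not shown in this post. The following is only a minimal sketch of how such values could be produced, not the actual implementation. The module name bitmanip_helpers, its ports, and the XLEN parameter are assumptions made for illustration.

```systemverilog
// Hypothetical helper module: names and ports are illustrative assumptions,
// not taken from the original design.
module bitmanip_helpers #(
  parameter int XLEN = 64
) (
  input  logic [XLEN-1:0]   i_rs1,
  input  logic [XLEN-1:0]   i_rs2,
  output logic [2*XLEN-1:0] o_clmul,     // full carry-less product (cf. w_clmul)
  output logic [XLEN-1:0]   o_byte_rev,  // rev8 result (cf. w_byte_rev)
  output logic [XLEN-1:0]   o_orc_b      // orc.b result (cf. w_bitwise_or)
);

  // Carry-less multiply: XOR one shifted copy of rs1 per set bit of rs2.
  always_comb begin
    o_clmul = '0;
    for (int i = 0; i < XLEN; i++) begin
      if (i_rs2[i]) begin
        o_clmul ^= {{XLEN{1'b0}}, i_rs1} << i;
      end
    end
  end

  // rev8 swaps the byte order; orc.b turns each byte into 0xff if any bit in it is set.
  always_comb begin
    for (int b = 0; b < XLEN/8; b++) begin
      o_byte_rev[b*8 +: 8] = i_rs1[(XLEN/8 - 1 - b)*8 +: 8];
      o_orc_b[b*8 +: 8]    = {8{|i_rs1[b*8 +: 8]}};
    end
  end

endmodule
```

With a 2*XLEN-bit product available, clmul, clmulh, and clmulr reduce to the three bit slices selected in the case statement: the low XLEN bits, the high XLEN bits, and the XLEN bits starting at bit XLEN-1.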

The count-leading-zeros family is a little tedious to implement, so I decided to simply write out every case.

module bit_clz_32
  (
   input logic [31: 0] i_in,
   output logic [5: 0] o_out
   );

always_comb begin
  /* verilator lint_off CASEX */
  casex (i_in)
    'b1xxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h00;
    'b01xx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h01;
    'b001x_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h02;
    'b0001_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h03;
    'b0000_1xxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h04;
    'b0000_01xx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h05;
    'b0000_001x_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h06;
    'b0000_0001_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h07;
    'b0000_0000_1xxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h08;
    'b0000_0000_01xx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h09;
    'b0000_0000_001x_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0a;
    'b0000_0000_0001_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0b;
    'b0000_0000_0000_1xxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0c;
    'b0000_0000_0000_01xx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0d;
    'b0000_0000_0000_001x_xxxx_xxxx_xxxx_xxxx : o_out = 'h0e;
    'b0000_0000_0000_0001_xxxx_xxxx_xxxx_xxxx : o_out = 'h0f;
    'b0000_0000_0000_0000_1xxx_xxxx_xxxx_xxxx : o_out = 'h10;
    'b0000_0000_0000_0000_01xx_xxxx_xxxx_xxxx : o_out = 'h11;
    'b0000_0000_0000_0000_001x_xxxx_xxxx_xxxx : o_out = 'h12;
    'b0000_0000_0000_0000_0001_xxxx_xxxx_xxxx : o_out = 'h13;
    'b0000_0000_0000_0000_0000_1xxx_xxxx_xxxx : o_out = 'h14;
    'b0000_0000_0000_0000_0000_01xx_xxxx_xxxx : o_out = 'h15;
    'b0000_0000_0000_0000_0000_001x_xxxx_xxxx : o_out = 'h16;
    'b0000_0000_0000_0000_0000_0001_xxxx_xxxx : o_out = 'h17;
    'b0000_0000_0000_0000_0000_0000_1xxx_xxxx : o_out = 'h18;
    'b0000_0000_0000_0000_0000_0000_01xx_xxxx : o_out = 'h19;
    'b0000_0000_0000_0000_0000_0000_001x_xxxx : o_out = 'h1a;
    'b0000_0000_0000_0000_0000_0000_0001_xxxx : o_out = 'h1b;
    'b0000_0000_0000_0000_0000_0000_0000_1xxx : o_out = 'h1c;
    'b0000_0000_0000_0000_0000_0000_0000_01xx : o_out = 'h1d;
    'b0000_0000_0000_0000_0000_0000_0000_001x : o_out = 'h1e;
    'b0000_0000_0000_0000_0000_0000_0000_0001 : o_out = 'h1f;
    'b0000_0000_0000_0000_0000_0000_0000_0000 : o_out = 'h20;
    default : o_out = 'h00;
  endcase // casex (i_in)
end // always_comb

endmodule // bit_clz_32
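
As an alternative to the hand-written table, the same count can be expressed with a loop. The sketch below is not the implementation used in this CPU. The module names bit_clz_loop and bit_ctz_loop and the WIDTH parameter are hypothetical, and it assumes the synthesis tool unrolls the loop into a priority encoder, which is typical but worth checking for the tool in use. The trailing-zero count is derived here by reversing the bits and reusing the leading-zero logic, which is one possible approach.

```systemverilog
// Hypothetical loop-based alternative to the casex table (illustrative names).
module bit_clz_loop #(
  parameter int WIDTH = 32
) (
  input  logic [WIDTH-1:0]       i_in,
  output logic [$clog2(WIDTH):0] o_out
);
  localparam int OUT_W = $clog2(WIDTH) + 1;

  always_comb begin
    o_out = OUT_W'(WIDTH);               // all-zero input: the count equals WIDTH
    for (int i = 0; i < WIDTH; i++) begin
      if (i_in[i]) begin
        o_out = OUT_W'(WIDTH - 1 - i);   // a higher set bit overwrites a lower one
      end
    end
  end
endmodule

// Count trailing zeros by reversing the bit order and reusing the CLZ logic.
module bit_ctz_loop #(
  parameter int WIDTH = 32
) (
  input  logic [WIDTH-1:0]       i_in,
  output logic [$clog2(WIDTH):0] o_out
);
  logic [WIDTH-1:0] w_reversed;

  always_comb begin
    for (int i = 0; i < WIDTH; i++) begin
      w_reversed[i] = i_in[WIDTH-1-i];
    end
  end

  bit_clz_loop #(.WIDTH(WIDTH)) u_clz (.i_in(w_reversed), .o_out(o_out));
endmodule
```

Instantiating bit_clz_loop with WIDTH set to 32 gives the same 6-bit output range (0 to 32) as bit_clz_32, and setting WIDTH to 64 would cover the XLEN-wide case without a second table.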

For testing, I used the B-extension tests from riscv-arch-test. I added them to the test pattern list for my CPU and ran them.

clz-01  : PASS
ctz-01  : PASS
cpopw-01        : PASS
cpop-01 : PASS
ctzw-01 : PASS
clzw-01 : PASS
orcb_64-01      : PASS
rev8-01 : PASS
binvi-01        : PASS
bclri-01        : PASS
roriw-01        : PASS
bseti-01        : PASS
bexti-01        : PASS
rori-01 : PASS
clmul-01        : PASS
clmulh-01       : PASS
clmulr-01       : PASS
bext-01 : PASS
bset-01 : PASS
binv-01 : PASS
bclr-01 : PASS
sext.b-01       : PASS
slli.uw-01      : PASS
rolw-01 : PASS
rol-01  : PASS
sext.h-01       : PASS
minu-01 : PASS
zext.h_64-01    : PASS
max-01  : PASS
add.uw-01       : PASS
min-01  : PASS
ror-01  : PASS
maxu-01 : PASS
rorw-01 : PASS
orn-01  : PASS
andn-01 : PASS
sh1add-01       : PASS
sh3add-01       : PASS
sh1add.uw-01    : PASS
xnor-01 : PASS
sh2add.uw-01    : PASS
sh2add-01       : PASS
sh3add.uw-01    : PASS

I confirmed that all of the tests pass!