I implemented the RISC-V bit-manipulation instructions in my self-made CPU.
The ratified bit-manipulation extensions are Zba, Zbb, Zbc, and Zbs.
I implemented the following instructions from them.
- Zba : address generation instructions
  - add.uw rd, rs1, rs2 (Add unsigned word)
  - sh1add rd, rs1, rs2 (Shift left by 1 and add)
  - sh1add.uw rd, rs1, rs2 (Shift unsigned word left by 1 and add)
  - sh2add rd, rs1, rs2 (Shift left by 2 and add)
  - sh2add.uw rd, rs1, rs2 (Shift unsigned word left by 2 and add)
  - sh3add rd, rs1, rs2 (Shift left by 3 and add)
  - sh3add.uw rd, rs1, rs2 (Shift unsigned word left by 3 and add)
  - slli.uw rd, rs1, imm (Shift-left unsigned word (Immediate))
- Zbb : basic bit-manipulation
  - andn rd, rs1, rs2 (AND with inverted operand)
  - orn rd, rs1, rs2 (OR with inverted operand)
  - xnor rd, rs1, rs2 (Exclusive NOR)
  - clz rd, rs (Count leading zero bits)
  - clzw rd, rs (Count leading zero bits in word)
  - ctz rd, rs (Count trailing zero bits)
  - ctzw rd, rs (Count trailing zero bits in word)
  - cpop rd, rs (Count set bits)
  - cpopw rd, rs (Count set bits in word)
  - max rd, rs1, rs2 (Maximum)
  - maxu rd, rs1, rs2 (Unsigned maximum)
  - min rd, rs1, rs2 (Minimum)
  - minu rd, rs1, rs2 (Unsigned minimum)
  - sext.b rd, rs (Sign-extend byte)
  - sext.h rd, rs (Sign-extend halfword)
  - zext.h rd, rs (Zero-extend halfword)
  - rol rd, rs1, rs2 (Rotate left (Register))
  - rolw rd, rs1, rs2 (Rotate left word (Register))
  - ror rd, rs1, rs2 (Rotate right (Register))
  - rori rd, rs1, shamt (Rotate right (Immediate))
  - roriw rd, rs1, shamt (Rotate right word (Immediate))
  - rorw rd, rs1, rs2 (Rotate right word (Register))
  - orc.b rd, rs (Bitwise OR-Combine, byte granule)
  - rev8 rd, rs (Byte-reverse register)
- Zbc : carry-less multiplication
  - clmul rd, rs1, rs2 (Carry-less multiply (low-part))
  - clmulh rd, rs1, rs2 (Carry-less multiply (high-part))
  - clmulr rd, rs1, rs2 (Carry-less multiply (reversed))
- Zbs : single-bit instructions
  - bclr rd, rs1, rs2 (Single-Bit Clear (Register))
  - bclri rd, rs1, imm (Single-Bit Clear (Immediate))
  - bext rd, rs1, rs2 (Single-Bit Extract (Register))
  - bexti rd, rs1, imm (Single-Bit Extract (Immediate))
  - binv rd, rs1, rs2 (Single-Bit Invert (Register))
  - binvi rd, rs1, imm (Single-Bit Invert (Immediate))
  - bset rd, rs1, rs2 (Single-Bit Set (Register))
  - bseti rd, rs1, imm (Single-Bit Set (Immediate))
The implementation is straightforward: I simply listed the logic for each operation, one case arm per operation.
```systemverilog
case (i_op)
  // Zba add.uw: zero-extend rs1[31:0] to XLEN before adding
  OP_UNSIGND_ADD_32         : o_out = {{(riscv_pkg::XLEN_W-32){1'b0}}, i_rs1[31: 0]} + i_rs2;
  OP_AND_INV                : o_out = i_rs1 & ~i_rs2;
  // Zbc: all three results are slices of the same 2*XLEN carry-less product
  OP_CARRY_LESS_MUL         : o_out = w_clmul[riscv_pkg::XLEN_W-1: 0];
  OP_CARRY_LESS_MULH        : o_out = w_clmul[riscv_pkg::XLEN_W*2-1: riscv_pkg::XLEN_W];
  OP_CARRY_LESS_MULR        : o_out = w_clmul[riscv_pkg::XLEN_W*2-2: riscv_pkg::XLEN_W-1];
  OP_CLZ                    : o_out = w_leading_zero_count_xlen;
  OP_CLZW                   : o_out = w_leading_zero_count_32;
  OP_CPOP                   : o_out = w_bit_cnt_xlen;
  OP_CPOPW                  : o_out = w_bit_cnt_32;
  OP_CTZ                    : o_out = w_trailing_zero_count_xlen;
  OP_CTZW                   : o_out = w_trailing_zero_count_32;
  OP_SIGNED_MAX             : o_out = $signed(i_rs1) > $signed(i_rs2) ? i_rs1 : i_rs2;
  OP_UNSIGNED_MAX           : o_out = i_rs1 > i_rs2 ? i_rs1 : i_rs2;
  OP_SIGNED_MIN             : o_out = $signed(i_rs1) < $signed(i_rs2) ? i_rs1 : i_rs2;
  OP_UNSIGNED_MIN           : o_out = i_rs1 < i_rs2 ? i_rs1 : i_rs2;
  OP_BITWISE_OR             : o_out = w_bitwise_or;   // orc.b
  OP_INVERTED_OR            : o_out = i_rs1 | ~i_rs2;
  OP_BYTE_REVERSE           : o_out = w_byte_rev;     // rev8
  OP_ROTATE_LEFT            : o_out = (i_rs1 << i_rs2[XLEN_W_W-1: 0]) | (i_rs1 >> (riscv_pkg::XLEN_W - i_rs2[XLEN_W_W-1: 0]));
  OP_ROTATE_LEFT_WORD       : begin
    // Rotate the low 32 bits, then sign-extend the 32-bit result to XLEN
    w_shift_tmp_32 = (i_rs1[31: 0] << i_rs2[ 4: 0]) | (i_rs1[31: 0] >> (32 - i_rs2[ 4: 0]));
    o_out = {{(riscv_pkg::XLEN_W-32){w_shift_tmp_32[31]}}, w_shift_tmp_32};
  end
  OP_ROTATE_RIGHT           : o_out = (i_rs1 >> i_rs2[XLEN_W_W-1: 0]) | (i_rs1 << (riscv_pkg::XLEN_W - i_rs2[XLEN_W_W-1: 0]));
  OP_ROTATE_RIGHT_32        : begin
    w_shift_tmp_32 = (i_rs1[31: 0] >> i_rs2[4: 0]) | (i_rs1[31: 0] << (32 - i_rs2[4: 0]));
    o_out = {{(riscv_pkg::XLEN_W-32){w_shift_tmp_32[31]}}, w_shift_tmp_32};
  end
  OP_BIT_CLEAR              : o_out = i_rs1 & ~(1 << i_rs2[XLEN_W_W-1: 0]);
  OP_BIT_EXTRACT            : o_out = i_rs1[i_rs2[XLEN_W_W-1: 0]];  // single bit, zero-extended by the assignment
  OP_BIT_INVERT             : o_out = i_rs1 ^ (1 << i_rs2[XLEN_W_W-1: 0]);
  OP_BIT_SET                : o_out = i_rs1 | (1 << i_rs2[XLEN_W_W-1: 0]);
  OP_SIGN_EXTEND_8          : o_out = {{(riscv_pkg::XLEN_W- 8){i_rs1[ 7]}}, i_rs1[ 7: 0]};
  OP_SIGN_EXTEND_16         : o_out = {{(riscv_pkg::XLEN_W-16){i_rs1[15]}}, i_rs1[15: 0]};
  // Zba shNadd / shNadd.uw: the padding widths below make each operand exactly XLEN_W bits
  OP_SIGNED_SH1ADD          : o_out = i_rs2 + {i_rs1[riscv_pkg::XLEN_W-2: 0], 1'b0};
  OP_UNSIGNED_SH1ADD_32     : o_out = i_rs2 + {{(riscv_pkg::XLEN_W-33){1'b0}}, i_rs1[31: 0], 1'b0};
  OP_SIGNED_SH2ADD          : o_out = i_rs2 + {i_rs1[riscv_pkg::XLEN_W-3: 0], 2'b00};
  OP_UNSIGNED_SH2ADD_32     : o_out = i_rs2 + {{(riscv_pkg::XLEN_W-34){1'b0}}, i_rs1[31: 0], 2'b00};
  OP_SIGNED_SH3ADD          : o_out = i_rs2 + {i_rs1[riscv_pkg::XLEN_W-4: 0], 3'b000};
  OP_UNSIGNED_SH3ADD_32     : o_out = i_rs2 + {{(riscv_pkg::XLEN_W-35){1'b0}}, i_rs1[31: 0], 3'b000};
  OP_UNSIGNED_SHIFT_LEFT_32 : o_out = {{(riscv_pkg::XLEN_W-32){1'b0}}, i_rs1[31: 0]} << i_rs2[ 5: 0];
  OP_XNOR                   : o_out = ~(i_rs1 ^ i_rs2);
  OP_ZERO_EXTEND_16         : o_out = {{(riscv_pkg::XLEN_W-16){1'b0}}, i_rs1[15: 0]};
  default                   : o_out = 'h0;  // assumed default arm; not shown in the original excerpt
endcase
```
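The case statement above reads `w_clmul`, `w_bitwise_or`, and `w_byte_rev`, but the logic that drives them is not shown. As a rough sketch only, they could be generated along the following lines. The module name `bitmanip_helpers_sketch`, its ports, and the `XLEN_W` parameter are all made up for illustration and are not taken from the actual design.

```systemverilog
// Hypothetical helper module: one possible way to produce the values that
// the case statement consumes as w_clmul, w_bitwise_or and w_byte_rev.
module bitmanip_helpers_sketch #(
  parameter int XLEN_W = 64   // assumed register width
) (
  input  logic [XLEN_W-1:0]   i_rs1,
  input  logic [XLEN_W-1:0]   i_rs2,
  output logic [XLEN_W*2-1:0] o_clmul,       // full 2*XLEN carry-less product
  output logic [XLEN_W-1:0]   o_bitwise_or,  // orc.b result
  output logic [XLEN_W-1:0]   o_byte_rev     // rev8 result
);

  localparam int BYTES = XLEN_W / 8;

  // Carry-less multiply: for every set bit i of rs2, XOR in (rs1 << i).
  // XOR replaces the addition of an ordinary multiplier, so no carries propagate.
  always_comb begin
    o_clmul = '0;
    for (int i = 0; i < XLEN_W; i++) begin
      if (i_rs2[i]) begin
        o_clmul = o_clmul ^ ({{XLEN_W{1'b0}}, i_rs1} << i);
      end
    end
  end

  // orc.b: each byte becomes 8'hff if any bit in it is set, otherwise 8'h00.
  // rev8 : byte b of the result is byte (BYTES-1-b) of the input.
  always_comb begin
    for (int b = 0; b < BYTES; b++) begin
      o_bitwise_or[b*8 +: 8] = {8{|i_rs1[b*8 +: 8]}};
      o_byte_rev  [b*8 +: 8] = i_rs1[(BYTES-1-b)*8 +: 8];
    end
  end

endmodule
```

With a full-width product like this, `clmul`, `clmulh`, and `clmulr` need no separate datapaths: they are just the three slices of the product that the case statement selects.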
The count-leading-zeros family is a bit tedious to implement, so I decided to simply write out every pattern by hand.
```systemverilog
module bit_clz_32
  (
   input logic [31: 0] i_in,
   output logic [5: 0] o_out
   );

always_comb begin
  /* verilator lint_off CASEX */
  casex (i_in)
    'b1xxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h00;
    'b01xx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h01;
    'b001x_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h02;
    'b0001_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h03;
    'b0000_1xxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h04;
    'b0000_01xx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h05;
    'b0000_001x_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h06;
    'b0000_0001_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h07;
    'b0000_0000_1xxx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h08;
    'b0000_0000_01xx_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h09;
    'b0000_0000_001x_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0a;
    'b0000_0000_0001_xxxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0b;
    'b0000_0000_0000_1xxx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0c;
    'b0000_0000_0000_01xx_xxxx_xxxx_xxxx_xxxx : o_out = 'h0d;
    'b0000_0000_0000_001x_xxxx_xxxx_xxxx_xxxx : o_out = 'h0e;
    'b0000_0000_0000_0001_xxxx_xxxx_xxxx_xxxx : o_out = 'h0f;
    'b0000_0000_0000_0000_1xxx_xxxx_xxxx_xxxx : o_out = 'h10;
    'b0000_0000_0000_0000_01xx_xxxx_xxxx_xxxx : o_out = 'h11;
    'b0000_0000_0000_0000_001x_xxxx_xxxx_xxxx : o_out = 'h12;
    'b0000_0000_0000_0000_0001_xxxx_xxxx_xxxx : o_out = 'h13;
    'b0000_0000_0000_0000_0000_1xxx_xxxx_xxxx : o_out = 'h14;
    'b0000_0000_0000_0000_0000_01xx_xxxx_xxxx : o_out = 'h15;
    'b0000_0000_0000_0000_0000_001x_xxxx_xxxx : o_out = 'h16;
    'b0000_0000_0000_0000_0000_0001_xxxx_xxxx : o_out = 'h17;
    'b0000_0000_0000_0000_0000_0000_1xxx_xxxx : o_out = 'h18;
    'b0000_0000_0000_0000_0000_0000_01xx_xxxx : o_out = 'h19;
    'b0000_0000_0000_0000_0000_0000_001x_xxxx : o_out = 'h1a;
    'b0000_0000_0000_0000_0000_0000_0001_xxxx : o_out = 'h1b;
    'b0000_0000_0000_0000_0000_0000_0000_1xxx : o_out = 'h1c;
    'b0000_0000_0000_0000_0000_0000_0000_01xx : o_out = 'h1d;
    'b0000_0000_0000_0000_0000_0000_0000_001x : o_out = 'h1e;
    'b0000_0000_0000_0000_0000_0000_0000_0001 : o_out = 'h1f;
    'b0000_0000_0000_0000_0000_0000_0000_0000 : o_out = 'h20;
    default                                   : o_out = 'h00;
  endcase // casex (i_in)
end // always_comb

endmodule // bit_clz_32
```
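For comparison, the same counts can also be described with loops instead of a hand-written table. The following is only a sketch of that alternative, not what I actually used: the module name `bit_count_sketch`, its ports, and the parameters `W` and `CNT_W` are hypothetical, and I have not compared the synthesis results against the `casex` version.

```systemverilog
// Hypothetical loop-based alternative to the casex table.
// The outputs need $clog2(W)+1 bits because the count ranges from 0 to W inclusive.
module bit_count_sketch #(
  parameter  int W     = 32,
  localparam int CNT_W = $clog2(W) + 1
) (
  input  logic [W-1:0]     i_in,
  output logic [CNT_W-1:0] o_clz,   // count leading zeros
  output logic [CNT_W-1:0] o_ctz,   // count trailing zeros
  output logic [CNT_W-1:0] o_cpop   // count set bits
);

  always_comb begin
    // clz: scan from LSB to MSB so that the highest set bit writes last and wins.
    // If no bit is set, the initial value W is kept.
    o_clz = CNT_W'(W);
    for (int i = 0; i < W; i++) begin
      if (i_in[i]) o_clz = CNT_W'(W - 1 - i);
    end

    // ctz: scan from MSB to LSB so that the lowest set bit writes last and wins.
    o_ctz = CNT_W'(W);
    for (int i = W - 1; i >= 0; i--) begin
      if (i_in[i]) o_ctz = CNT_W'(i);
    end

    // cpop: add up all the bits.
    o_cpop = '0;
    for (int i = 0; i < W; i++) begin
      o_cpop = o_cpop + CNT_W'(i_in[i]);
    end
  end

endmodule
```

Instantiating this with `W = 32` gives the same 6-bit result range as `bit_clz_32` (0x00 to 0x20), and a second instance with `W = 64` would cover the XLEN-wide variants.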
For testing, I used the B-extension tests from riscv-arch-test. I added them to the test pattern list for my CPU and ran them.
```
clz-01        : PASS
ctz-01        : PASS
cpopw-01      : PASS
cpop-01       : PASS
ctzw-01       : PASS
clzw-01       : PASS
orcb_64-01    : PASS
rev8-01       : PASS
binvi-01      : PASS
bclri-01      : PASS
roriw-01      : PASS
bseti-01      : PASS
bexti-01      : PASS
rori-01       : PASS
clmul-01      : PASS
clmulh-01     : PASS
clmulr-01     : PASS
bext-01       : PASS
bset-01       : PASS
binv-01       : PASS
bclr-01       : PASS
sext.b-01     : PASS
slli.uw-01    : PASS
rolw-01       : PASS
rol-01        : PASS
sext.h-01     : PASS
minu-01       : PASS
zext.h_64-01  : PASS
max-01        : PASS
add.uw-01     : PASS
min-01        : PASS
ror-01        : PASS
maxu-01       : PASS
rorw-01       : PASS
orn-01        : PASS
andn-01       : PASS
sh1add-01     : PASS
sh3add-01     : PASS
sh1add.uw-01  : PASS
xnor-01       : PASS
sh2add.uw-01  : PASS
sh2add-01     : PASS
sh3add.uw-01  : PASS
```
For now, I have confirmed that all of the tests pass!