Implementation Study: Adding Vector Instructions to a Homemade CPU (3. Implementing a Basic Datapath)

I am experimenting with adding a vector execution engine to my homemade CPU.

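I am now implementing the basic operations of the vector engine. This is the simplest form: the operation is switched according to the element data type. The datapath is 64 bits wide and is instantiated DLEN_W / 64 times, one instance per 64-bit slice of the vector register data.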

generate for (genvar d_idx = 0; d_idx < riscv_vec_conf_pkg::DLEN_W / 64; d_idx++) begin : datapath_loop
   vec_alu_datapath
   u_vec_alu_datapath
     (
      .i_op  (r_ex1_pipe_ctrl.op),
      .i_sew (r_ex1_issue.vlvtype.vtype.vsew),
      .i_vs1 (r_ex1_vpr_rs_data[0][d_idx*64 +: 64]),
      .i_vs2 (r_ex1_vpr_rs_data[1][d_idx*64 +: 64]),
      .i_rs1 (r_ex1_rs1_data),
      .i_v0  ('h0),
      .o_res (w_ex1_vec_result [d_idx*64 +: 64])
      );
end endgenerate // block: datapath_loop
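
To make the slicing pattern concrete, here is a minimal self-contained sketch of the same `generate` / `+:` structure. Everything in it is hypothetical: `lane64_sketch` is a stand-in for `vec_alu_datapath` that only adds, and `DLEN_W = 256` is an assumption taken from the 256-bit register values that appear in the log later in this post.

```systemverilog
// Hypothetical stand-in for vec_alu_datapath: one 64-bit lane that only adds.
module lane64_sketch (
  input  logic [63:0] i_a,
  input  logic [63:0] i_b,
  output logic [63:0] o_res
);
  assign o_res = i_a + i_b;
endmodule

module lane_slice_sketch #(
  parameter int DLEN_W = 256  // assumed width, not taken from the real config package
) (
  input  logic [DLEN_W-1:0] i_a,
  input  logic [DLEN_W-1:0] i_b,
  output logic [DLEN_W-1:0] o_res
);
  // One lane per 64-bit slice; "d_idx*64 +: 64" selects bits [d_idx*64+63 : d_idx*64].
  for (genvar d_idx = 0; d_idx < DLEN_W / 64; d_idx++) begin : lane_loop
    lane64_sketch u_lane (
      .i_a   (i_a  [d_idx*64 +: 64]),
      .i_b   (i_b  [d_idx*64 +: 64]),
      .o_res (o_res[d_idx*64 +: 64])
    );
  end
endmodule

// Self-checking testbench: each lane wraps around on its own, no carry crosses lanes.
module lane_slice_sketch_tb;
  logic [255:0] a, b, res;
  lane_slice_sketch #(.DLEN_W(256)) dut (.i_a(a), .i_b(b), .o_res(res));
  initial begin
    a = {4{64'hFFFF_FFFF_FFFF_FFFF}};
    b = {4{64'h1}};
    #1;
    assert (res == '0);
    $finish;
  end
endmodule
```

Because every lane sees only its own 64-bit slice, anything that has to cross a lane boundary would need separate logic outside this loop.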

The datapath itself is implemented with the following structure.

always_comb begin
  case (i_op)
    // vmv.v.x: replicate the scalar rs1 into every element of this 64-bit lane
    OP_MV_V_X : begin
      unique case (i_sew)
        scariv_vec_pkg::EW8 : for (int b = 0; b < 8; b++) w_res.w8 [b] = i_rs1[ 7: 0];
        scariv_vec_pkg::EW16: for (int b = 0; b < 4; b++) w_res.w16[b] = i_rs1[15: 0];
        scariv_vec_pkg::EW32: for (int b = 0; b < 2; b++) w_res.w32[b] = i_rs1[31: 0];
        scariv_vec_pkg::EW64: for (int b = 0; b < 1; b++) w_res.w64[b] = i_rs1[63: 0];
        default             :                             w_res = 'h0;
      endcase // unique case (i_sew)
    end
    // ... (other operations omitted)
  endcase // case (i_op)
end
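
The excerpt above does not show how `w_res` is declared. One way to get the `w8` / `w16` / `w32` / `w64` views is a packed union, and the sketch below assumes exactly that. The package `splat_sketch_pkg`, the type names, and the module are hypothetical; the `EW*` encodings follow the `vsew` field of the RISC-V vector specification and may differ from the real `scariv_vec_pkg`.

```systemverilog
package splat_sketch_pkg;
  // Assumed encoding, following vsew in the RISC-V vector spec.
  typedef enum logic [2:0] {
    EW8 = 3'b000, EW16 = 3'b001, EW32 = 3'b010, EW64 = 3'b011
  } ew_t;

  // Four views of the same 64 bits; every member of a packed union is 64 bits wide.
  typedef union packed {
    logic [7:0][ 7:0] w8;
    logic [3:0][15:0] w16;
    logic [1:0][31:0] w32;
    logic [0:0][63:0] w64;
  } lane_t;
endpackage

module splat_sketch
  import splat_sketch_pkg::*;
(
  input  ew_t         i_sew,
  input  logic [63:0] i_rs1,
  output logic [63:0] o_res
);
  lane_t w_res;

  always_comb begin
    w_res = '0;
    unique case (i_sew)
      EW8    : for (int b = 0; b < 8; b++) w_res.w8 [b] = i_rs1[ 7:0];
      EW16   : for (int b = 0; b < 4; b++) w_res.w16[b] = i_rs1[15:0];
      EW32   : for (int b = 0; b < 2; b++) w_res.w32[b] = i_rs1[31:0];
      EW64   : w_res.w64[0] = i_rs1;
      default: w_res = '0;
    endcase
  end

  assign o_res = w_res;  // a packed union converts to a plain vector
endmodule

// Self-checking testbench: the same scalar splatted at two element widths.
module splat_sketch_tb;
  import splat_sketch_pkg::*;
  ew_t         sew;
  logic [63:0] rs1, res;
  splat_sketch dut (.i_sew(sew), .i_rs1(rs1), .o_res(res));
  initial begin
    rs1 = 64'h8;
    sew = EW32; #1;
    assert (res == 64'h0000_0008_0000_0008);
    sew = EW64; #1;
    assert (res == 64'h0000_0000_0000_0008);
    $finish;
  end
endmodule
```

With a union, each `case` arm writes through the view that matches the element width, so no manual bit slicing of the result is needed.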

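For now, I can confirm the basic behavior. However, I still have to implement the feature that switches the data width according to the CSR. In the log below, the second vmv.v.x runs with e64, yet the RTL replicates a5 into every 32-bit field, so the result disagrees with the ISS.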

4687 : 237 : PC=[000000008000200e] (M,33,01) 05877757 vsetvli a4, a4, e64, m1, ta, mu
GPR[14](17) <= 0000000000000004
4691 : 238 : PC=[0000000080002012] (M,34,01) 5e07cc57 vmv.v.x v24, a5
VPR[24](36) <= 00000000_00000004_00000000_00000004_00000000_00000004_00000000_00000004_
4691 : 239 : PC=[0000000080002016] (M,34,02) 00009fb9 c.addw  a5, a4
GPR[15](46) <= 0000000000000008
4691 : 240 : PC=[0000000080002018] (M,34,04) fef659e3 bge     a2, a5, pc - 14
4697 : 241 : PC=[000000008000200a] (M,35,01) 40f5873b subw    a4, a1, a5
GPR[14](21) <= 000000000000005c
4701 : 242 : PC=[000000008000200e] (M,36,01) 05877757 vsetvli a4, a4, e64, m1, ta, mu
GPR[14](25) <= 0000000000000004
4705 : 243 : PC=[0000000080002012] (M,37,01) 5e07cc57 vmv.v.x v24, a5
==========================================
Wrong VPR[24](37):
ISS[24] = 00000000_00000008_00000000_00000008_00000000_00000008_00000000_00000008_
RTL[24] = 00000008_00000008_00000008_00000008_00000008_00000008_00000008_00000008_
                ~~                ~~                ~~                ~~
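
When reading a mismatch like this, it helps to count the differences per element rather than per bit. The following is a minimal sketch of such a check, not part of the actual test environment: the module, the function `count_mismatch`, and `VLEN_W = 256` (inferred from the width of the logged values) are all assumptions.

```systemverilog
module vpr_diff_sketch;
  localparam int VLEN_W = 256;  // assumed from the 256-bit values printed in the log

  // Count how many sew_bits-wide elements differ between the two register values.
  function automatic int count_mismatch(input logic [VLEN_W-1:0] iss,
                                        input logic [VLEN_W-1:0] rtl,
                                        input int                sew_bits);
    int n = 0;
    for (int e = 0; e < VLEN_W / sew_bits; e++) begin
      for (int b = 0; b < sew_bits; b++) begin
        if (iss[e*sew_bits + b] !== rtl[e*sew_bits + b]) begin
          n++;
          break;  // one differing bit is enough to flag this element
        end
      end
    end
    return n;
  endfunction

  initial begin
    logic [VLEN_W-1:0] iss, rtl;
    iss = {4{64'h0000_0000_0000_0008}};  // ISS[24] from the log
    rtl = {8{32'h0000_0008}};            // RTL[24] from the log
    // All four e64 elements are wrong: the upper 32 bits of each hold 8 instead of 0.
    assert (count_mismatch(iss, rtl, 64) == 4);
    $finish;
  end
endmodule
```

Here every e64 element is flagged, and in each case the difference is in the upper 32 bits, which is where the `~~` markers in the log point.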

