Comparing the PPA (Performance, Power, Area) of My Custom CPU Core and BOOMv3 (4. Area Analysis)

For now, my custom CPU core is simply too large. Let's dig deeper into the Vivado synthesis results.

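The most conspicuous item is the pipeline inside the LSU: the first LSU instance, lsu_loop[0].u_scariv_lsu, alone uses 42,912 LUTs. It isn't obvious why a pipeline should be this large, so let's look at what is happening.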

+-----------------------------------------------+------------------------------------------------+------------+------------+---------+------+--------+--------+--------+------+--------------+
|                    Instance                   |                     Module                     | Total LUTs | Logic LUTs | LUTRAMs | SRLs |   FFs  | RAMB36 | RAMB18 | URAM | DSP48 Blocks |
+-----------------------------------------------+------------------------------------------------+------------+------------+---------+------+--------+--------+--------+------+--------------+
|     u_lsu_top                                 |                                 scariv_lsu_top |     163127 |     162215 |     912 |    0 |  24696 |    128 |      0 |    0 |            0 |
|       (u_lsu_top)                             |                                 scariv_lsu_top |          0 |          0 |       0 |    0 |      5 |      0 |      0 |    0 |            0 |
|       lsu_loop[0].u_scariv_lsu                |                                     scariv_lsu |      42912 |      42816 |      96 |    0 |   4725 |      0 |      0 |    0 |            0 |
|       lsu_loop[1].u_scariv_lsu                |                     scariv_lsu__parameterized0 |      24523 |      24427 |      96 |    0 |   4732 |      0 |      0 |    0 |            0 |
|       u_l1d_mshr                              |                                scariv_l1d_mshr |      17814 |      17814 |       0 |    0 |   4679 |      0 |      0 |    0 |            0 |
|       u_ldq                                   |                                     scariv_ldq |       3073 |       3073 |       0 |    0 |   1467 |      0 |      0 |    0 |            0 |
|       u_lrsc                                  |                                scariv_lsu_lrsc |          0 |          0 |       0 |    0 |     57 |      0 |      0 |    0 |            0 |
|       u_scariv_dcache                         |                                  scariv_dcache |      38824 |      38104 |     720 |    0 |   2459 |    128 |      0 |    0 |            0 |
|       u_scariv_store_requester                |                         scariv_store_requestor |       1077 |       1077 |       0 |    0 |   1191 |      0 |      0 |    0 |            0 |
|       u_st_buffer                             |                               scariv_st_buffer |       9373 |       9373 |       0 |    0 |   1755 |      0 |      0 |    0 |            0 |
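|       u_stq                                   |                                     scariv_stq |      25531 |      25531 |       0 |    0 |   3626 |      0 |      0 |    0 |            0 |
+-----------------------------------------------+------------------------------------------------+------------+------------+---------+------+--------+--------+--------+------+--------------+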
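It helps to turn the table into per-submodule shares to see how the LUTs are distributed. The following is a minimal sketch. It assumes the report has been saved as plain text with the column layout shown above, so Total LUTs is the third column. The function name lut_share is hypothetical.

```shell
# lut_share (hypothetical helper): read a Vivado hierarchical utilization table
# (layout as above) on stdin and print each child row's Total LUTs as a
# percentage of the first data row. Splitting on '|' makes field 1 the empty
# text before the first '|', so Instance = $2 and Total LUTs = $4.
lut_share() {
  awk -F'|' '
    # Keep only data rows: Total LUTs must be an integer (skips header and borders).
    $4 ~ /^ *[0-9]+ *$/ {
      name = $2
      gsub(/^ +| +$/, "", name)                  # trim padding around the instance name
      luts = $4 + 0                              # numeric conversion ignores the spaces
      if (parent == "") { parent = luts; next }  # first data row is the parent total
      printf "%-30s %8d %6.1f%%\n", name, luts, 100 * luts / parent
    }'
}
```

Run it as lut_share < report.txt, where report.txt is a placeholder file name. For the table above, this puts lsu_loop[0].u_scariv_lsu at about 26.3% of u_lsu_top and u_scariv_dcache at about 23.8%.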

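First, let's check the synthesis results to see which nets make up the bulk of the netlist. The command below does the following:

- takes the nets of the first LSU (lsu_loop[0]) whose names contain "aligned";
- strips every digit, so that bit indices and replica numbers collapse into a single name;
- counts the identical names;
- keeps only those under lsu_pipe.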

grep "lsu_loop\[0\]" core_tile_wrapper_nets_list.txt | grep aligned | sed 's/[0-9]//g'  | sort -n | uniq -c | grep lsu_pipe | sort -n

      1 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/u_tlb/w_misaligned
      6 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_i_
     14 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i___[]
     16 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i____[]
     37 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_i__n_
     39 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/u_rs_data_select/r_ex_aligned_data_reg[]
     45 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i____
     52 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i___
     63 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_i__
     64 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_[]
     64 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_rep_n_
     64 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_rep_i__n_
     73 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i_
     84 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_i_[]
    101 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i__[]
    120 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i_[]
    256 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_rep___n_
    256 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_rep___i__n_
    306 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i____n_
    489 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i__
    966 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data[]_i__n_
    977 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_rep___
   2005 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_
   3259 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_rep_
  12288 u_tile/u_lsu_top/lsu_loop[].u_lsu/u_lsu_pipe/r_ex_aligned_data_reg[]_rep___[]
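The listing above still splits one signal across many suffix variants. As a rough sketch, those variants can be folded back into their RTL base name by also stripping the synthesis-generated suffixes. The suffix list (_reg, _rep, _i_, _n_) is my own assumption, based only on the names seen above. net_base_counts is a hypothetical helper, not part of any tool.

```shell
# net_base_counts (hypothetical helper): read net names, one per line, on stdin
# and count them per RTL base signal. Bit indices and digits are removed
# everywhere, as in the pipeline above. The suffix patterns (assumed from the
# observed names) only match inside the last path component, so instance
# names such as u_regfile/ are left alone.
net_base_counts() {
  sed -e 's/\[[0-9]*]//g' \
      -e 's/[0-9]//g' \
      -e 's/_reg[^/]*$//' \
      -e 's/_rep[^/]*$//' \
      -e 's/_i_[^/]*$//' \
      -e 's/_n_[^/]*$//' \
    | sort | uniq -c | sort -n
}
```

Usage would be along the lines of grep 'lsu_loop\[0\]' core_tile_wrapper_nets_list.txt | grep lsu_pipe | net_base_counts. Almost every line above should then fold into a single r_ex_aligned_data entry.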

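Hmm, there is an enormous number of nets derived from r_ex_aligned_data, the register holding the aligned load result. Most of them carry the _rep suffix that Vivado gives to replicated registers (17,164 of the 21,645 nets counted above). My guess is that the bypass network, which broadcasts the loaded data to other units, has grown large. In a typical configuration the LSU result is bypassed to nearly 10 modules, so the register is probably being replicated heavily to drive that fanout.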
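The share of replicated nets can be tallied directly from the uniq -c output of the pipeline above. Below is a minimal sketch. It assumes each input line is "count name" as uniq -c prints it, and that the substring _rep marks a replica. That is a heuristic: any name that happens to contain _rep would also be counted. rep_share is a hypothetical helper name.

```shell
# rep_share (hypothetical helper): read "count name" lines (uniq -c format)
# on stdin and report how many of the counted nets contain "_rep".
rep_share() {
  awk '
    { total += $1 }              # first field: the count printed by uniq -c
    $2 ~ /_rep/ { rep += $1 }    # second field: the (digit-stripped) net name
    END {
      if (total > 0)
        printf "%d / %d nets (%.1f%%) contain _rep\n", rep, total, 100 * rep / total
    }'
}
```

It can simply be appended to the original command as ... | sort -n | rep_share.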

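That said, I think there is plenty of room to improve this. For example, the bypass network that actually writes the physical register file could be separated from the one used for forwarding. Also, the signals that update the in-flight list inside the rename unit don't need the actual data at all.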

It also seems possible to switch on the type of the destination register (integer or floating point), so optimizing along these lines looks like the way to go.
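A rough sink-count estimate gives a feel for what separating the data from the tag-only consumers could save. This is a back-of-the-envelope model under my own assumptions:

- N destinations receive each LSU result.
- Only N_d of them need the data.
- A result consists of a W_t-bit tag (physical register index) plus W_d data bits.
- S counts the sink connections the result register has to drive.

$$
\begin{aligned}
S_{\mathrm{shared}} &= N\,(W_t + W_d) \\
S_{\mathrm{split}}  &= N_d\,(W_t + W_d) + (N - N_d)\,W_t \\
S_{\mathrm{shared}} - S_{\mathrm{split}} &= (N - N_d)\,W_d
\end{aligned}
$$

Take purely hypothetical numbers: N = 10 (the "nearly 10 modules" above), N_d = 6 and W_d = 64. The data sinks then drop by (10 - 6) × 64 = 256. The fanout of each data bit drops from 10 to 6, and register replication is meant to relieve exactly this per-bit fanout. Splitting by destination type (integer or FP) would not change the total S. It would, however, let each copy drive only one side's consumers.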

FPGA開発日記 (FPGA Development Diary)

Article index by category: https://msyksphinz.github.io/github_pages , English version: https://fpgadevdiary.hatenadiary.com/
