FPGA開発日記 (FPGA Development Diary)

Article index by category: https://msyksphinz.github.io/github_pages , English version: https://fpgadevdiary.hatenadiary.com/

Considering a VIPT cache policy for my self-made CPU (7. Checking the critical path in Vivado)

For the cache of my self-made CPU, I decided to adopt VIPT and have started implementing it.

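To pin down the critical path more precisely, I run Vivado in an environment independent of LiteX and apply retiming. The idea is to use that result to identify where the true critical path currently lies.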
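As a rough illustration of what such a standalone run could look like, the sketch below generates a small Vivado batch script that enables retiming during synthesis. The top module name, part number, and file names are hypothetical placeholders and are not taken from the actual project; only `read_verilog -sv`, `read_xdc`, `synth_design -retiming`, and `report_timing` are standard Vivado Tcl commands.

```python
def make_synth_script(top, part, sources, xdc, retiming=True):
    """Return the text of a Vivado Tcl script for a synthesis-only run."""
    lines = [f"read_verilog -sv {src}" for src in sources]
    lines.append(f"read_xdc {xdc}")
    synth = f"synth_design -top {top} -part {part}"
    if retiming:
        # Ask synthesis to move registers across logic to balance paths.
        synth += " -retiming"
    lines.append(synth)
    # Dump the worst paths so the critical path can be inspected.
    lines.append("report_timing -max_paths 10 -file timing_synth.rpt")
    return "\n".join(lines) + "\n"


# Hypothetical names, for illustration only.
script = make_synth_script(
    top="mycpu_tile",
    part="xc7a100tcsg324-1",
    sources=["mycpu_tile.sv"],
    xdc="clock.xdc",
)
print(script)
```

The generated text could be saved as, say, `synth.tcl` and run with `vivado -mode batch -source synth.tcl`. Keeping the flow outside LiteX this way makes it easier to toggle `-retiming` and compare the two reports.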

First, with the same configuration, synthesizing the CPU alone in Vivado gives the following critical path. It lies around the frontend, with a data path delay of 20.205 ns.

Slack (VIOLATED) :        -10.355ns  (required time - arrival time)
  Source:                 u_mycpu_tile/u_frontend/u_mycpu_inst_buffer/u_inst_queue/r_outptr_reg[0]_rep/C
                            (rising edge-triggered cell FDCE clocked by i_clk  {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            u_mycpu_tile/u_frontend/u_mycpu_inst_buffer/u_inst_queue/r_outptr_reg[0]/D
                            (rising edge-triggered cell FDCE clocked by i_clk  {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             i_clk
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            10.000ns  (i_clk rise@10.000ns - i_clk rise@0.000ns)
  Data Path Delay:        20.205ns  (logic 3.841ns (19.010%)  route 16.364ns (80.990%))
  Logic Levels:           35  (LUT1=1 LUT2=6 LUT3=2 LUT4=7 LUT5=4 LUT6=14 RAMD32=1)
  Clock Path Skew:        -0.145ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    1.693ns = ( 11.693 - 10.000 ) 
    Source Clock Delay      (SCD):    2.001ns
    Clock Pessimism Removal (CPR):    0.163ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns
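To compare reports like this one without reading them by eye, the key fields can be pulled out with a few regular expressions. This is a minimal sketch that assumes the report layout shown above; `parse_timing_path` is a name made up for this example, and the embedded text is an excerpt of the report.

```python
import re

# Excerpt of the report above (only the lines this sketch reads).
REPORT = """\
Slack (VIOLATED) :        -10.355ns  (required time - arrival time)
  Requirement:            10.000ns  (i_clk rise@10.000ns - i_clk rise@0.000ns)
  Data Path Delay:        20.205ns  (logic 3.841ns (19.010%)  route 16.364ns (80.990%))
  Logic Levels:           35  (LUT1=1 LUT2=6 LUT3=2 LUT4=7 LUT5=4 LUT6=14 RAMD32=1)
"""


def parse_timing_path(text):
    """Extract slack, delay breakdown, and logic levels from one path."""
    slack = re.search(r"Slack \(\w+\)\s*:\s*(-?[\d.]+)ns", text)
    delay = re.search(
        r"Data Path Delay:\s*([\d.]+)ns\s*\(logic ([\d.]+)ns.*?route ([\d.]+)ns",
        text,
    )
    levels = re.search(r"Logic Levels:\s*(\d+)\s*\((.*?)\)", text)
    total, logic, route = (float(g) for g in delay.groups())
    # "LUT1=1 LUT2=6 ..." becomes {"LUT1": 1, "LUT2": 6, ...}
    cells = dict(item.split("=") for item in levels.group(2).split())
    return {
        "slack_ns": float(slack.group(1)),
        "delay_ns": total,
        "logic_ns": logic,
        "route_ns": route,
        "route_fraction": route / total,
        "levels": int(levels.group(1)),
        "cells": {name: int(n) for name, n in cells.items()},
    }


path = parse_timing_path(REPORT)
print(f"slack {path['slack_ns']} ns, {path['levels']} levels, "
      f"{path['route_fraction']:.1%} of the delay is routing")
```

Note that in a synthesis-only report the routing portion is an estimate, so the roughly 81% routing share here is only a hint of what place and route will produce.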

On the other hand, when built through LiteX, the critical path after synthesis is as follows. This time it appears in the FPU, with a data path delay of 24.517 ns.

Slack (VIOLATED) :        -4.707ns  (required time - arrival time)
  Source:                 scariv_subsystem_axi_wrapper/u_scariv_subsystem/u_tile/fpu.fpu_loop[0].u_fpu/u_fpu/u_scariv_fpnew_wrapper/u_fpnew_top/gen_operation_groups[0].i_opgroup_block/gen_merged_slice.i_multifmt_slice/gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/gen_inside_pipeline[1].mid_pipe_sum_q_reg[2][48]/C
                            (rising edge-triggered cell FDCE clocked by main_crg_clkout0  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            scariv_subsystem_axi_wrapper/u_scariv_subsystem/u_tile/fpu.fpu_loop[0].u_fpu/u_fpu/u_scariv_fpnew_wrapper/u_fpnew_top/gen_operation_groups[0].i_opgroup_block/gen_merged_slice.i_multifmt_slice/gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/gen_output_pipeline[0].out_pipe_status_q_reg[1][UF]_bret/D
                            (rising edge-triggered cell FDCE clocked by main_crg_clkout0  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             main_crg_clkout0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            20.000ns  (main_crg_clkout0 rise@20.000ns - main_crg_clkout0 rise@0.000ns)
  Data Path Delay:        24.517ns  (logic 9.040ns (36.872%)  route 15.477ns (63.128%))
  Logic Levels:           44  (CARRY4=17 LUT2=4 LUT3=2 LUT4=2 LUT5=5 LUT6=12 MUXF7=2)
  Clock Path Skew:        -0.145ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    4.822ns = ( 24.822 - 20.000 ) 
    Source Clock Delay      (SCD):    5.446ns
    Clock Pessimism Removal (CPR):    0.479ns
  Clock Uncertainty:      0.074ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.130ns
    Phase Error              (PE):    0.000ns
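The two runs use different clock constraints (10 ns for the CPU alone, 20 ns under LiteX), so the slack values are not directly comparable. One common rule of thumb is to estimate the shortest achievable period as the requirement minus the slack. The sketch below applies that approximation to the numbers from the two reports; it is an estimate, not something Vivado reports directly.

```python
def min_period_ns(requirement_ns, slack_ns):
    """Approximate shortest period the worst path could meet."""
    return requirement_ns - slack_ns


def fmax_mhz(requirement_ns, slack_ns):
    """Convert the estimated minimum period to a frequency in MHz."""
    return 1000.0 / min_period_ns(requirement_ns, slack_ns)


# (requirement, slack) in ns, taken from the two reports above.
runs = {
    "CPU alone": (10.000, -10.355),
    "LiteX": (20.000, -4.707),
}

for name, (req, slack) in runs.items():
    print(f"{name}: min period {min_period_ns(req, slack):.3f} ns, "
          f"about {fmax_mhz(req, slack):.1f} MHz")
```

By this estimate the CPU-alone run would need a period of about 20.355 ns and the LiteX run about 24.707 ns, so the standalone result is the faster of the two even though its slack looks far worse.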

Looking at both logs, it seems that retiming is not succeeding around cvfpu.

  • CPU alone: 100 retimings in total. Is some limit being applied here?
INFO: [Synth 8-5816] Retiming module `pma_map__1`                        
INFO: [Synth 8-5816] Retiming module `pma_map__1' done               
INFO: [Synth 8-5816] Retiming module `bit_oh_or__parameterized10__1`      
INFO: [Synth 8-5816] Retiming module `bit_oh_or__parameterized10__1' done
INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized1__1`
INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized1__1' done
INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized1__1`  
INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized1__1' done
INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized0__6`        
INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized0__6' done
INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized0__5`  
INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized0__5' done
INFO: [Synth 8-5816] Retiming module `tlb`                                    
INFO: [Synth 8-5816] Retiming module `tlb' done                                 
INFO: [Synth 8-5816] Retiming module `scariv_frontend__GB1`                          
INFO: [Synth 8-5816] Retiming module `scariv_frontend__GB1' done      
...
  • With LiteX: this also comes to 100 retimings in total.
INFO: [Synth 8-5816] Retiming module `scariv_frontend`          
INFO: [Synth 8-5816] Retiming module `scariv_frontend' done
INFO: [Synth 8-5816] Retiming module `scariv_stq__GB1`          
INFO: [Synth 8-5816] Retiming module `scariv_stq__GB1' done
INFO: [Synth 8-5816] Retiming module `scariv_stq__GB2`          
INFO: [Synth 8-5816] Retiming module `scariv_stq__GB2' done
INFO: [Synth 8-5816] Retiming module `scariv_stq`               
INFO: [Synth 8-5816] Retiming module `scariv_stq' done     
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB0`
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB0' done
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB1`
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB1' done
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_dcache__GC0`
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_dcache__GC0' done
RETIMING: forward move fails for register u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_op_q_reg[1][3] 
along load instance u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
i_2744

RETIMING: forward move fails for register u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_op_q_reg[1][2] along load instance u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
i_2744

RETIMING: forward move fails for register u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_src_fmt_q_reg[1][0] along load instance u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
i_2744

RETIMING: forward move fails for register u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_src_fmt_q_reg[1][0] along load instance u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
i_927

RETIMING: forward move fails for register u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_op_q_reg[1][0] along load instance u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
i_2744

RETIMING: forward move fails for register u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_op_q_reg[1][0] along load instance u_mycpu_tile/
fpu.fpu_loop[0].u_fpu/
u_fpu/
u_mycpu_fpnew_wrapper/
u_fpnew_top/
i_0/
gen_operation_groups[0].i_opgroup_block/
gen_merged_slice.i_multifmt_slice/
gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/
i_471
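To check counts such as the "100 in total" figure and to see which registers keep failing, the synthesis log can be summarized with a short script. This is a sketch under stated assumptions: it relies on the message formats shown above, re-joins hierarchical paths that are wrapped after a `/`, and uses a sample whose hierarchy has been shortened for the example. `summarize_retiming` is a name invented here.

```python
import re
from collections import Counter

DONE_RE = re.compile(r"Retiming module `([^`']+)' done")
FAIL_RE = re.compile(
    r"RETIMING: forward move fails for register\s+(\S+)"
    r"\s+along load instance\s+(\S+)"
)


def summarize_retiming(log_text):
    """Return (retimed module names, failure count per register leaf)."""
    # Re-join hierarchical paths that were wrapped after a '/'.
    joined = log_text.replace("/\n", "/")
    modules = DONE_RE.findall(joined)
    failures = Counter(
        reg.rsplit("/", 1)[-1] for reg, _load in FAIL_RE.findall(joined)
    )
    return modules, failures


# Shortened sample: the hierarchy between the tile and the FMA is omitted.
SAMPLE = """\
INFO: [Synth 8-5816] Retiming module `scariv_stq`
INFO: [Synth 8-5816] Retiming module `scariv_stq' done
RETIMING: forward move fails for register u_mycpu_tile/
i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_op_q_reg[1][0] along load instance u_mycpu_tile/
i_fpnew_fma_multi/
i_2744

RETIMING: forward move fails for register u_mycpu_tile/
i_fpnew_fma_multi/
gen_input_pipeline[0].inp_pipe_op_q_reg[1][0] along load instance u_mycpu_tile/
i_fpnew_fma_multi/
i_471
"""

modules, failures = summarize_retiming(SAMPLE)
print(len(modules), "modules retimed:", modules)
for reg, n in failures.most_common():
    print(f"{n:3d} failed forward moves: {reg}")
```

Grouping by the register's leaf name makes it easier to see that the failures in the excerpt above are concentrated in the `gen_input_pipeline[0]` registers of `i_fpnew_fma_multi`.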