For the cache of my self-made CPU, I have introduced VIPT and started implementing it.
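To pin down the critical path more precisely, I run Vivado in an environment independent of LiteX and try applying retiming. The idea is to use that result to identify the current true critical path.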
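The standalone run described above could be scripted roughly as follows. This is only a sketch: it generates a Vivado batch-mode Tcl script that uses `synth_design` with its `-retiming` option. The top module name, part name, and file names are placeholders that do not come from the post, and the real project certainly has more source files.

```python
def make_synth_tcl(top, part, sources, xdc, retiming=True):
    """Return the text of a Tcl script for a synthesis-only Vivado run."""
    lines = [f"read_verilog -sv {src}" for src in sources]
    lines.append(f"read_xdc {xdc}")
    synth = f"synth_design -top {top} -part {part}"
    if retiming:
        synth += " -retiming"  # ask synthesis to move registers across logic
    lines.append(synth)
    # Dump the single worst setup path after synthesis.
    lines.append("report_timing -max_paths 1 -file timing_synth.rpt")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All four arguments below are hypothetical placeholders.
    print(make_synth_tcl(
        top="mycpu_tile_wrapper",
        part="xc7a100tcsg324-1",
        sources=["mycpu_tile.sv"],
        xdc="clock.xdc",
    ))
```

The generated text can be saved to a file and passed to `vivado -mode batch -source <file>`. The constraint file is assumed to define the 10 ns `i_clk` clock seen in the CPU-alone report.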
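First, with the same configuration, the critical path from Vivado synthesis of the CPU alone is shown below. It lies around the frontend, with a data path delay of 20.205 ns against a 10 ns requirement (slack -10.355 ns).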
    Slack (VIOLATED) :        -10.355ns  (required time - arrival time)
      Source:                 u_mycpu_tile/u_frontend/u_mycpu_inst_buffer/u_inst_queue/r_outptr_reg[0]_rep/C
                                (rising edge-triggered cell FDCE clocked by i_clk  {rise@0.000ns fall@5.000ns period=10.000ns})
      Destination:            u_mycpu_tile/u_frontend/u_mycpu_inst_buffer/u_inst_queue/r_outptr_reg[0]/D
                                (rising edge-triggered cell FDCE clocked by i_clk  {rise@0.000ns fall@5.000ns period=10.000ns})
      Path Group:             i_clk
      Path Type:              Setup (Max at Slow Process Corner)
      Requirement:            10.000ns  (i_clk rise@10.000ns - i_clk rise@0.000ns)
      Data Path Delay:        20.205ns  (logic 3.841ns (19.010%)  route 16.364ns (80.990%))
      Logic Levels:           35  (LUT1=1 LUT2=6 LUT3=2 LUT4=7 LUT5=4 LUT6=14 RAMD32=1)
      Clock Path Skew:        -0.145ns (DCD - SCD + CPR)
        Destination Clock Delay (DCD):    1.693ns = ( 11.693 - 10.000 )
        Source Clock Delay      (SCD):    2.001ns
        Clock Pessimism Removal (CPR):    0.163ns
      Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
        Total System Jitter     (TSJ):    0.071ns
        Total Input Jitter      (TIJ):    0.000ns
        Discrete Jitter          (DJ):    0.000ns
        Phase Error              (PE):    0.000ns
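On the other hand, when the design is built on LiteX, the critical path from synthesis is shown below. It now appears in the FPU, with a data path delay of 24.517 ns against a 20 ns requirement (slack -4.707 ns).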
    Slack (VIOLATED) :        -4.707ns  (required time - arrival time)
      Source:                 scariv_subsystem_axi_wrapper/u_scariv_subsystem/u_tile/fpu.fpu_loop[0].u_fpu/u_fpu/u_scariv_fpnew_wrapper/u_fpnew_top/gen_operation_groups[0].i_opgroup_block/gen_merged_slice.i_multifmt_slice/gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/gen_inside_pipeline[1].mid_pipe_sum_q_reg[2][48]/C
                                (rising edge-triggered cell FDCE clocked by main_crg_clkout0  {rise@0.000ns fall@10.000ns period=20.000ns})
      Destination:            scariv_subsystem_axi_wrapper/u_scariv_subsystem/u_tile/fpu.fpu_loop[0].u_fpu/u_fpu/u_scariv_fpnew_wrapper/u_fpnew_top/gen_operation_groups[0].i_opgroup_block/gen_merged_slice.i_multifmt_slice/gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/gen_output_pipeline[0].out_pipe_status_q_reg[1][UF]_bret/D
                                (rising edge-triggered cell FDCE clocked by main_crg_clkout0  {rise@0.000ns fall@10.000ns period=20.000ns})
      Path Group:             main_crg_clkout0
      Path Type:              Setup (Max at Slow Process Corner)
      Requirement:            20.000ns  (main_crg_clkout0 rise@20.000ns - main_crg_clkout0 rise@0.000ns)
      Data Path Delay:        24.517ns  (logic 9.040ns (36.872%)  route 15.477ns (63.128%))
      Logic Levels:           44  (CARRY4=17 LUT2=4 LUT3=2 LUT4=2 LUT5=5 LUT6=12 MUXF7=2)
      Clock Path Skew:        -0.145ns (DCD - SCD + CPR)
        Destination Clock Delay (DCD):    4.822ns = ( 24.822 - 20.000 )
        Source Clock Delay      (SCD):    5.446ns
        Clock Pessimism Removal (CPR):    0.479ns
      Clock Uncertainty:      0.074ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
        Total System Jitter     (TSJ):    0.071ns
        Discrete Jitter          (DJ):    0.130ns
        Phase Error              (PE):    0.000ns
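As a cross-check of the two path headers above, the following sketch pulls the numeric fields out of each header and recomputes a few derived figures: the clock path skew from DCD - SCD + CPR, the routing share of the data path, the average delay per logic level, and a rough minimum period taken as requirement minus slack. The header text is abridged from the reports above, and the function names are my own. Treating requirement minus slack as the achievable period is only an approximation, since these are post-synthesis estimates.

```python
import math
import re

CPU_ALONE = """\
Slack (VIOLATED) : -10.355ns (required time - arrival time)
Requirement: 10.000ns (i_clk rise@10.000ns - i_clk rise@0.000ns)
Data Path Delay: 20.205ns (logic 3.841ns (19.010%) route 16.364ns (80.990%))
Logic Levels: 35 (LUT1=1 LUT2=6 LUT3=2 LUT4=7 LUT5=4 LUT6=14 RAMD32=1)
Clock Path Skew: -0.145ns (DCD - SCD + CPR)
Destination Clock Delay (DCD): 1.693ns = ( 11.693 - 10.000 )
Source Clock Delay (SCD): 2.001ns
Clock Pessimism Removal (CPR): 0.163ns
"""

LITEX = """\
Slack (VIOLATED) : -4.707ns (required time - arrival time)
Requirement: 20.000ns (main_crg_clkout0 rise@20.000ns - main_crg_clkout0 rise@0.000ns)
Data Path Delay: 24.517ns (logic 9.040ns (36.872%) route 15.477ns (63.128%))
Logic Levels: 44 (CARRY4=17 LUT2=4 LUT3=2 LUT4=2 LUT5=5 LUT6=12 MUXF7=2)
Clock Path Skew: -0.145ns (DCD - SCD + CPR)
Destination Clock Delay (DCD): 4.822ns = ( 24.822 - 20.000 )
Source Clock Delay (SCD): 5.446ns
Clock Pessimism Removal (CPR): 0.479ns
"""

# One regex per header field; each captures a single number.
FIELDS = {
    "slack": r"\((?:VIOLATED|MET)\)\s*:\s*(-?[\d.]+)ns",
    "requirement": r"Requirement:\s*([\d.]+)ns",
    "delay": r"Data Path Delay:\s*([\d.]+)ns",
    "logic": r"\(logic\s+([\d.]+)ns",
    "route": r"route\s+([\d.]+)ns",
    "levels": r"Logic Levels:\s*(\d+)",
    "skew": r"Clock Path Skew:\s*(-?[\d.]+)ns",
    "dcd": r"\(DCD\):\s*([\d.]+)ns",
    "scd": r"\(SCD\):\s*([\d.]+)ns",
    "cpr": r"\(CPR\):\s*([\d.]+)ns",
}


def parse_header(text):
    """Return the header fields of one timing path as floats."""
    return {k: float(re.search(p, text).group(1)) for k, p in FIELDS.items()}


def summarize(text):
    """Recompute skew and derive a few comparison figures for one path."""
    h = parse_header(text)
    return {
        "skew_ok": math.isclose(h["dcd"] - h["scd"] + h["cpr"], h["skew"], abs_tol=5e-4),
        "route_share": h["route"] / h["delay"],
        "ns_per_level": h["delay"] / h["levels"],
        "min_period": h["requirement"] - h["slack"],  # rough estimate only
    }


if __name__ == "__main__":
    for name, text in (("CPU alone", CPU_ALONE), ("LiteX", LITEX)):
        print(name, summarize(text))
```

With the numbers above, routing accounts for about 81% of the CPU-alone path and about 63% of the LiteX path, and the rough minimum periods come out at 20.355 ns and 24.707 ns respectively.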
Checking both logs, it looks as if retiming is not taking effect around cvfpu.
- CPU alone: 100 retimings in total. Is there some cap on this?
    INFO: [Synth 8-5816] Retiming module `pma_map__1`
    INFO: [Synth 8-5816] Retiming module `pma_map__1' done
    INFO: [Synth 8-5816] Retiming module `bit_oh_or__parameterized10__1`
    INFO: [Synth 8-5816] Retiming module `bit_oh_or__parameterized10__1' done
    INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized1__1`
    INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized1__1' done
    INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized1__1`
    INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized1__1' done
    INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized0__6`
    INFO: [Synth 8-5816] Retiming module `bit_tree_lsb__parameterized0__6' done
    INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized0__5`
    INFO: [Synth 8-5816] Retiming module `bit_extract_lsb__parameterized0__5' done
    INFO: [Synth 8-5816] Retiming module `tlb`
    INFO: [Synth 8-5816] Retiming module `tlb' done
    INFO: [Synth 8-5816] Retiming module `scariv_frontend__GB1`
    INFO: [Synth 8-5816] Retiming module `scariv_frontend__GB1' done
    ...
- With LiteX: here too, there are 100 retimings in total.
    INFO: [Synth 8-5816] Retiming module `scariv_frontend`
    INFO: [Synth 8-5816] Retiming module `scariv_frontend' done
    INFO: [Synth 8-5816] Retiming module `scariv_stq__GB1`
    INFO: [Synth 8-5816] Retiming module `scariv_stq__GB1' done
    INFO: [Synth 8-5816] Retiming module `scariv_stq__GB2`
    INFO: [Synth 8-5816] Retiming module `scariv_stq__GB2' done
    INFO: [Synth 8-5816] Retiming module `scariv_stq`
    INFO: [Synth 8-5816] Retiming module `scariv_stq' done
    INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB0`
    INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB0' done
    INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB1`
    INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_st_buffer__GB1' done
    INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_dcache__GC0`
    INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_dcache__GC0' done
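Counting these messages by hand gets tedious, so here is a small sketch of how the retimed modules could be counted and checked for FPU-related names. It assumes the log format shown above, where each completed module produces a line ending in `' done`. The sample text is a subset of the LiteX log above, and the keyword list (`fpu`, `fpnew`) is my own guess at what an FPU module name would contain.

```python
import re

# Matches only the completion lines, e.g.: Retiming module `tlb' done
RETIMED = re.compile(r"Retiming module `([^`']+)' done")

SAMPLE_LOG = """\
INFO: [Synth 8-5816] Retiming module `scariv_frontend`
INFO: [Synth 8-5816] Retiming module `scariv_frontend' done
INFO: [Synth 8-5816] Retiming module `scariv_stq__GB1`
INFO: [Synth 8-5816] Retiming module `scariv_stq__GB1' done
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_dcache__GC0`
INFO: [Synth 8-5816] Retiming module `scariv_lsu_vipt_dcache__GC0' done
"""


def retimed_modules(log_text):
    """Return the names of all modules whose retiming pass completed."""
    return RETIMED.findall(log_text)


def modules_matching(modules, keywords):
    """Return the modules whose name contains any of the given keywords."""
    return [m for m in modules if any(k in m for k in keywords)]


if __name__ == "__main__":
    mods = retimed_modules(SAMPLE_LOG)
    print(len(mods), "modules retimed")
    print("FPU-related:", modules_matching(mods, ("fpu", "fpnew")))
```

Run against the full synthesis log, the count can be compared with the 100 observed in both builds, and an empty FPU-related list would support the suspicion that cvfpu is not being retimed.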
    RETIMING: forward move fails for register u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ gen_input_pipeline[0].inp_pipe_op_q_reg[1][3] along load instance u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ i_2744
    RETIMING: forward move fails for register u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ gen_input_pipeline[0].inp_pipe_op_q_reg[1][2] along load instance u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ i_2744
    RETIMING: forward move fails for register u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ gen_input_pipeline[0].inp_pipe_src_fmt_q_reg[1][0] along load instance u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ i_2744
    RETIMING: forward move fails for register u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ gen_input_pipeline[0].inp_pipe_src_fmt_q_reg[1][0] along load instance u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ i_927
    RETIMING: forward move fails for register u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ gen_input_pipeline[0].inp_pipe_op_q_reg[1][0] along load instance u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ i_2744
    RETIMING: forward move fails for register u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ gen_input_pipeline[0].inp_pipe_op_q_reg[1][0] along load instance u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ gen_merged_slice.i_multifmt_slice/ gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ i_471
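The failure messages above are hard to read because of the long hierarchy paths, so the following sketch groups them by register and lists which load instances blocked each forward move. It assumes one message per line in the real log and strips the whitespace that appears after each `/` in the excerpt. The sample is rebuilt from the register and instance names in the messages above, and the helper names are my own.

```python
import re
from collections import defaultdict

FAIL = re.compile(
    r"RETIMING: forward move fails for register (.+) along load instance (.+)"
)

# Common hierarchy prefix copied from the messages above.
PREFIX = (
    "u_mycpu_tile/ fpu.fpu_loop[0].u_fpu/ u_fpu/ u_mycpu_fpnew_wrapper/ "
    "u_fpnew_top/ i_0/ gen_operation_groups[0].i_opgroup_block/ "
    "gen_merged_slice.i_multifmt_slice/ "
    "gen_num_lanes[0].active_lane.lane_instance.i_fpnew_fma_multi/ "
)

# (register leaf, load instance leaf) pairs from the six messages above.
PAIRS = [
    ("gen_input_pipeline[0].inp_pipe_op_q_reg[1][3]", "i_2744"),
    ("gen_input_pipeline[0].inp_pipe_op_q_reg[1][2]", "i_2744"),
    ("gen_input_pipeline[0].inp_pipe_src_fmt_q_reg[1][0]", "i_2744"),
    ("gen_input_pipeline[0].inp_pipe_src_fmt_q_reg[1][0]", "i_927"),
    ("gen_input_pipeline[0].inp_pipe_op_q_reg[1][0]", "i_2744"),
    ("gen_input_pipeline[0].inp_pipe_op_q_reg[1][0]", "i_471"),
]

SAMPLE_LOG = "\n".join(
    f"RETIMING: forward move fails for register {PREFIX}{reg} "
    f"along load instance {PREFIX}{load}"
    for reg, load in PAIRS
)


def leaf(path):
    """Drop whitespace and return the last hierarchy component of a path."""
    return re.sub(r"\s+", "", path).rsplit("/", 1)[-1]


def failed_moves(log_text):
    """Map each register leaf name to the set of load instances that blocked it."""
    blocked = defaultdict(set)
    for line in log_text.splitlines():
        m = FAIL.search(line)
        if m:
            blocked[leaf(m.group(1))].add(leaf(m.group(2)))
    return dict(blocked)


if __name__ == "__main__":
    for reg, loads in sorted(failed_moves(SAMPLE_LOG).items()):
        print(reg, "<-", sorted(loads))
```

For the six messages above this yields four distinct registers, all in the input pipeline stage of `i_fpnew_fma_multi`, with `i_2744` appearing as a blocking load for every one of them.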