FPGA Development Diary

Article index by category: https://msyksphinz.github.io/github_pages , English version: https://fpgadevdiary.hatenadiary.com/

A look at the C-core series of Alibaba's XuanTie RISC-V processors

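While putting together an overview of the XuanTie RISC-V core series from Alibaba (T-Head), I found there were so many of them that I started to get confused.

  • E series (E902 / E906) : embedded
  • C series (C906 / C908 / C910 / C920) : high performance
  • R series (R910) : real-time processing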

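Of these, I'll start by checking the C series. Looking first at the microarchitecture:

  • C906 : 5-stage, single-issue, in-order issue
  • C908 : dual-issue, in-order issue
  • C910 : 9 to 12 stages, out-of-order issue
  • C920 : 12-stage integer pipeline, out-of-order issue

So the instruction issue structure differs fundamentally between the C906 / C908 on one side and the C910 / C920 on the other.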

The same split shows up in performance (Dhrystone in DMIPS/MHz, CoreMark in CoreMark/MHz):

  • C906 : Dhrystone 2.4, CoreMark 3.8
  • C908 : Dhrystone 3.89, CoreMark 5.71
  • C910 : Dhrystone 5.8, CoreMark 7.1
  • C920 : Dhrystone 5.8, CoreMark 7.0

The differences are clearly large.
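
To put numbers on the gap, the per-MHz scores listed above can be normalized to the C906. The following is only a minimal Python sketch: the `SCORES` layout and the `relative_to` helper are names made up for this illustration, and the values are simply the ones from the list.

```python
# Per-MHz scores copied from the list above: (Dhrystone DMIPS/MHz, CoreMark/MHz).
# SCORES and relative_to are illustrative names, not part of any XuanTie tooling.
SCORES = {
    "C906": (2.4, 3.8),
    "C908": (3.89, 5.71),
    "C910": (5.8, 7.1),
    "C920": (5.8, 7.0),
}


def relative_to(baseline: str, scores: dict) -> dict:
    """Return each core's scores divided by the baseline core's scores."""
    base_dhry, base_cm = scores[baseline]
    return {
        core: (round(dhry / base_dhry, 2), round(cm / base_cm, 2))
        for core, (dhry, cm) in scores.items()
    }


if __name__ == "__main__":
    for core, (dhry, cm) in relative_to("C906", SCORES).items():
        print(f"{core}: Dhrystone x{dhry}, CoreMark x{cm}")
```

On these figures the out-of-order C910 comes out at roughly 2.4 times the C906 per MHz on Dhrystone and roughly 1.9 times on CoreMark.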

Next is support for the vector extension, which is another major point of difference:

  • C906 : RISC-V V Extension 0.7.1, VLEN=128
  • C908 : RISC-V V Extension 1.0, VLEN=128 / 256
  • C910 : none
  • C920 : RISC-V V Extension 0.7.1, VLEN=128

Here too the cores differ quite a bit.
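
One practical consequence of VLEN is how many elements fit in a vector register group. The RISC-V V specification defines this as VLMAX = LMUL × VLEN / SEW. Below is a minimal sketch of that formula; `vlmax` is a helper name invented for this example, and it assumes integer LMUL only (the fractional LMUL settings of V 1.0 are left out).

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Maximum number of elements per vector register group: LMUL * VLEN / SEW.

    vlmax is an illustrative helper; it handles integer LMUL only.
    """
    if sew_bits not in (8, 16, 32, 64):
        raise ValueError("SEW must be 8, 16, 32 or 64 bits")
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch handles integer LMUL of 1, 2, 4 or 8 only")
    return lmul * vlen_bits // sew_bits


if __name__ == "__main__":
    # VLEN values taken from the list above (128 for C906/C920, 128 or 256 for C908).
    for vlen in (128, 256):
        for sew in (8, 16, 32, 64):
            print(f"VLEN={vlen} SEW={sew}: {vlmax(vlen, sew)} elements")
```

For example, with VLEN=128 a single register holds four 32-bit elements, while a C908 configured with VLEN=256 holds eight.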

| Item | E902 | E906 |
|---|---|---|
| Architecture | RV32E[M]C | RV32IMA[F][D]C[P] |
| Pipeline | 2-stage | 5-stage (integer) |
| Bus Interface | AMBA3 AHB-Lite 32-bit Master | AMBA3 AHB-Lite 32-bit Master |
| FPU | - | Energy-efficient floating-point computing performance |
| DSP Enhanced | - | Deeply optimized DSP unit with CSI-DSP lib compliant to v0.9.2 P-extension specs |
| Instruction Cache | Up to 8KB (optional) | Up to 32KB (optional) |
| Data Cache | - | Up to 32KB (optional) |
| Interrupts | Up to 240 interrupts + Non-maskable interrupt (NMI) | Up to 240 interrupts + Non-maskable interrupt (NMI) |
| Memory Management Unit | - | - |
| Dhrystone (DMIPS/MHz) | 1.55 | 1.55 |
| CoreMark (CoreMark/MHz) | 2.69 | 2.69 |

Other entries, not broken out per core:

  • T-Head Extension : T-Head MCU enhanced extensions, including interrupt acceleration and enhanced ISA
  • XuanTie Extensions : XuanTie MCU enhanced extensions, including interrupt acceleration and enhanced ISA
  • Hardware Performance Monitor (HPM) : RISC-V standard HPM (optional)
  • Sleep modes : Sleep and deep sleep mode
  • Debug : 2-wire/JTAG debug port

| Item | C906 | C908 | C910 | C920 |
|---|---|---|---|---|
| Architecture | RV64GCV | RV64GC[V] | RV64GC | RV64GCV |
| SMP | - | Up to 4 cores per cluster | Up to 4 cores per cluster | Up to 4 cores per cluster |
| Pipeline | 5-stage | 9-stage (integer) | 12-stage (integer) | 12-stage (integer) |
| XuanTie Extensions (XIE = XuanTie Instruction Extension, XMAF = XuanTie Memory Attribute Extension) | XIE, XMAF | XIE, XMAF | XIE, XMAF | XIE, XMAF |
| Bus Interface | AXI4-128 master | AXI4 or ACE 128-bit master | AXI4-128 master or ACE-128 master (optional) | AXI4-128 master |
| FPU | RISC-V Half, Single instruction extension; IEEE 754-2008 standard | RISC-V Half, Single instruction extension; IEEE 754-2008 standard | RISC-V F/D instruction extension; IEEE 754-2008 standard | RISC-V F/D instruction extension; IEEE 754-2008 standard |
| DSP Enhanced | - | - | - | - |
| Device Coherence Port (DCP) | - | AXI4 128-bit slave (optional) | AXI4-128 slave | AXI4-128 slave (optional) |
| Instruction Cache | Up to 64KB (configurable) | Up to 64KB with optional parity | Up to 64KB with parity check (optional) | Up to 64KB with optional parity |
| Data Cache | Up to 64KB (configurable) | Up to 64KB with optional ECC | Up to 64KB with ECC (optional) | Up to 64KB with optional ECC |
| L2 Cache | - | Up to 4MB with optional ECC, parallel access with multi-bank | Up to 8MB with ECC (optional), parallel access with multi-bank | Up to 8MB with ECC (optional), parallel access with multi-bank |
| Interrupts | Flexibly configurable Platform Level Interrupt Controller (PLIC) | Configurable Platform-Level Interrupt Controller (PLIC) for supporting wide range of system event scenarios | Flexibly configurable Platform Level Interrupt Controller (PLIC) | Configurable Platform-Level Interrupt Controller (PLIC) for supporting wide range of system event scenarios |
| Memory Management Unit | Sv39 virtual memory translation | Sv39/Sv48 virtual memory translation with Svnapot and Svpbmt | Sv39 virtual memory translation, up to 2048-entry TLB | Sv39 virtual memory translation, up to 2048-entry TLB |
| PMP | Up to 16 regions | Up to 64 regions, ePMP | Up to 16 regions | Up to 16 regions |
| Dhrystone (DMIPS/MHz) | 2.4 | 3.89 | 5.8 | 5.8 |
| CoreMark (CoreMark/MHz) | 3.8 | 5.71 | 7.1 | 7.0 |

Other entries, not broken out per core:

  • Micro-architecture : Out of order, 3 decode, 4 rename/dispatch, 8 issue/execute, dual load/store
  • XuanTie Extensions : T-HEAD TEE Extension (in addition to XIE and XMAF)
  • Vector Unit : Support RISC-V V instruction extension (configurable), vector register width = 128-bit, element size support 8/16/32/64-bit; support INT8/INT16/INT64/BFP16/FP16/FP32; RISC-V V Extension Version 1.0; support RISC-V V instruction extension, vector register width = 128-bit, element size support FP16/FP32/INT8/INT16/INT32/INT64
  • Low Latency Port (LLP) : AXI4 128-bit master (optional)
  • Debug : RISC-V Debug; RISC-V Debug (multi-core debug supported)
  • Performance Monitor Unit (PMU) : RISC-V PMU
  • Various Branch Prediction : Branch History Table (BHT), Branch Target Buffer (BTB), Return Address Stack (RAS)

| Item | R910 |
|---|---|
| Architecture | RV64GC |
| SMP | Up to 4 cores per cluster |
| Micro-architecture | - |
| Pipeline | 9 to 12 stages |
| XuanTie Extensions | XuanTie Instruction Extension (XIE), XuanTie Memory Attribute Extension (XMAF) |
| Bus Interface | AXI4 main master to initiate access requests; AXI4 LLP as master to quickly initiate requests to access peripheral devices; APB4 FPP as master to quickly initiate requests to access peripheral devices; AXI4 TMS as slave to receive external requests to access the TCM; AXI4 slave to provide data coherence between devices/accelerators and cores |
| FPU | RISC-V Single instruction extension; IEEE 754-2008 standard |
| TCM | Instruction TCM: configurable size, 16KB to 64KB; Data TCM: configurable size, 16KB to 64KB; optional ECC support |
| Instruction Cache | Up to 64KB (optional ECC support) |
| Data Cache | Up to 64KB (optional ECC support) |
| L2 Cache | Up to 8MB (optional ECC support) |
| Memory Management Unit | Sv39 virtual memory translation |
| Dhrystone (DMIPS/MHz) | 5.8 |
| CoreMark (CoreMark/MHz) | 7 |