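While putting together an overview of the XuanTie RISC-V cores from Alibaba (T-Head), I got confused because there are so many of them.
- E series (E902 / E906): embedded
- C series (C906 / C908 / C910 / C920): high performance
- R series (R910): real-time processing
Of these, let's start with the C series. First, the microarchitecture:
- C906: 5-stage, single-issue, in-order
- C908: 9-stage (integer), dual-issue, in-order
- C910: 9- to 12-stage, out-of-order
- C920: 12-stage (integer), out-of-order
So instruction issue is organized quite differently: the C906 and C908 issue in order, while the C910 and C920 execute out of order.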
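Performance per MHz (Dhrystone in DMIPS/MHz, CoreMark in CoreMark/MHz) differs a lot as well:
- C906: Dhrystone 2.4, CoreMark 3.8
- C908: Dhrystone 3.89, CoreMark 5.71
- C910: Dhrystone 5.8, CoreMark 7.1
- C920: Dhrystone 5.8, CoreMark 7.0
The out-of-order C910/C920 deliver roughly 1.8× (CoreMark) to 2.4× (Dhrystone) the per-MHz score of the C906, with the dual-issue C908 in between.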
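To make those ratios concrete, the per-MHz scores can be normalized against the C906. This is a minimal Python sketch: `SCORES` copies the figures listed above, while `relative_to` and `absolute_score` are hypothetical helper names introduced only for illustration. `absolute_score` assumes the score scales linearly with clock frequency, which real systems only approximate, since memory latency does not scale with the core clock.

```python
# Per-MHz scores copied from the list above:
# (Dhrystone in DMIPS/MHz, CoreMark in CoreMark/MHz)
SCORES = {
    "C906": (2.4, 3.8),
    "C908": (3.89, 5.71),
    "C910": (5.8, 7.1),
    "C920": (5.8, 7.0),
}


def relative_to(base: str) -> dict[str, tuple[float, float]]:
    """Return each core's (Dhrystone, CoreMark) score as a ratio of the `base` core's score."""
    base_dmips, base_cm = SCORES[base]
    return {
        core: (round(dmips / base_dmips, 2), round(cm / base_cm, 2))
        for core, (dmips, cm) in SCORES.items()
    }


def absolute_score(core: str, mhz: float) -> tuple[float, float]:
    """Scale per-MHz scores by a clock frequency (assumes perfectly linear scaling)."""
    dmips, cm = SCORES[core]
    return dmips * mhz, cm * mhz


if __name__ == "__main__":
    for core, (d, c) in relative_to("C906").items():
        print(f"{core}: Dhrystone x{d}, CoreMark x{c}")
    # 1000 MHz is an arbitrary example clock, not a spec value for any of these cores.
    print(absolute_score("C910", 1000))
```

Normalizing per MHz, as the vendor figures already do, separates microarchitectural efficiency from whatever clock a particular SoC happens to run at.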
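Next is vector extension support, another major point of difference:
- C906: RISC-V V extension 0.7.1, VLEN=128
- C908: RISC-V V extension 1.0, VLEN=128 / 256
- C910: none
- C920: RISC-V V extension 0.7.1, VLEN=128
Support varies considerably. V 0.7.1 is a pre-ratification draft whose encoding is not compatible with the ratified V 1.0, so among these cores only the C908 implements the ratified version.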
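VLEN matters because it sets how many elements one vector instruction can cover. Per the RISC-V V specification, the maximum vector length is VLMAX = LMUL × VLEN / SEW, where SEW is the element width in bits and LMUL is the register-grouping factor. The sketch below evaluates that formula; the `vlmax` name is illustrative, the check for unsupported combinations is simplified, and fractional LMUL values such as 1/2 exist in V 1.0 but not in V 0.7.1, which only allows LMUL of 1, 2, 4, or 8.

```python
from fractions import Fraction

# LMUL values defined by RVV 1.0; V 0.7.1 only has the integer ones (1, 2, 4, 8).
VALID_LMUL = {Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
              Fraction(1), Fraction(2), Fraction(4), Fraction(8)}


def vlmax(vlen: int, sew: int, lmul=Fraction(1)) -> int:
    """Return VLMAX = LMUL * VLEN / SEW (elements per vector register group)."""
    lmul = Fraction(lmul)  # accept plain ints such as 2 as well
    if sew not in (8, 16, 32, 64):
        raise ValueError(f"unsupported SEW: {sew}")
    if lmul not in VALID_LMUL:
        raise ValueError(f"unsupported LMUL: {lmul}")
    n = lmul * vlen / sew  # exact rational arithmetic, no float rounding
    if n < 1 or n.denominator != 1:
        # Simplification: treat combinations yielding less than one element as invalid.
        raise ValueError("LMUL too small for this VLEN/SEW")
    return int(n)


if __name__ == "__main__":
    for vlen in (128, 256):  # C906/C920 use 128; C908 is configurable as 128 or 256
        print(vlen, [vlmax(vlen, sew) for sew in (8, 16, 32, 64)])
```

With LMUL=1, a 128-bit register holds 16, 8, 4, or 2 elements of 8, 16, 32, or 64 bits, and VLEN=256 doubles each of those counts.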
Core | E902 | E906 |
---|---|---|
Architecture | RV32E[M]C | RV32IMA[F][D]C[P] |
Pipeline | 2-stage | 5-stage (integer) |
XuanTie Extensions | XuanTie MCU enhanced extensions, including interrupt acceleration and enhanced ISA | |
Bus Interface | AMBA3 AHB-Lite 32-bit Master | AMBA3 AHB-Lite 32-bit Master |
FPU | | Energy-efficient floating-point computing performance |
DSP Enhanced | | Deeply optimized DSP unit with the CSI-DSP library, compliant with the P-extension spec v0.9.2 |
Instruction Cache | Up to 8kB (optional) | Up to 32kB (optional) |
Data Cache | Up to 32kB (optional) | |
Interrupts | Up to 240 interrupts + Non-maskable interrupt (NMI) | Up to 240 interrupts + Non-maskable interrupt (NMI) |
Memory Management Unit | ||
Hardware Performance Monitor (HPM) | RISC-V standard HPM (optional) | |
Sleep modes | Sleep and deep sleep modes | |
Debug | 2-wire/JTAG debug port | |
Dhrystone (DMIPS/MHz) | 1.55 | 1.55 |
CoreMark (CoreMark/MHz) | 2.69 | 2.69 |
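
The architecture strings in these tables pack a lot into a few letters. I read the square brackets as marking optional (configurable) extensions; that reading is my assumption rather than something the table states. Under that assumption, here is a minimal Python parser (`parse_isa` is a hypothetical helper) that also expands the `G` shorthand used in the C-series table into `IMAFD`. The ratified spec additionally folds Zicsr and Zifencei into `G`; this sketch handles single-letter extensions only.

```python
import re

# Single-letter expansion of the "G" shorthand (Zicsr/Zifencei omitted for brevity).
G_EXPANSION = "IMAFD"


def parse_isa(isa: str) -> tuple[int, list[str], list[str]]:
    """Split an ISA string such as 'RV32IMA[F][D]C[P]' into
    (XLEN, mandatory extensions, optional extensions).

    Assumption: a letter in square brackets is an optional extension.
    """
    m = re.fullmatch(r"RV(32|64)([A-Z\[\]]+)", isa.upper())
    if not m:
        raise ValueError(f"not a simple ISA string: {isa}")
    xlen, rest = int(m.group(1)), m.group(2)
    mandatory: list[str] = []
    optional: list[str] = []
    # Each match is ('[', letter) for a bracketed letter, or ('', letter) otherwise.
    for bracket, letter in re.findall(r"(\[)?([A-Z])\]?", rest):
        letters = G_EXPANSION if letter == "G" else letter
        (optional if bracket else mandatory).extend(letters)
    return xlen, mandatory, optional


if __name__ == "__main__":
    for isa in ("RV32E[M]C", "RV32IMA[F][D]C[P]", "RV64GC[V]"):
        print(isa, parse_isa(isa))
```

Read this way, the E906's FPU and DSP rows correspond to its optional F/D and P letters, which the E902's RV32E[M]C string lacks.
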
Core | C906 | C908 | C910 | C920 |
---|---|---|---|---|
Architecture | RV64GCV | RV64GC[V] | RV64GC | RV64GCV |
SMP | Up to 4 cores per cluster | Up to 4 cores per cluster | Up to 4 cores per cluster | |
Micro-architecture | | | Out-of-order, 3 decode, 4 rename/dispatch, 8 issue/execute, dual load/store | |
Pipeline | 5-stage | 9-stage (integer) | 12-stage (integer) | 12-stage (integer) |
XuanTie Extensions | XuanTie Instruction Extension (XIE), XuanTie Memory Attribute Extension (XMAF) | XuanTie Instruction Extension (XIE), XuanTie Memory Attribute Extension (XMAF), T-HEAD TEE Extension | XuanTie Instruction Extension (XIE), XuanTie Memory Attribute Extension (XMAF) | XuanTie Instruction Extension (XIE), XuanTie Memory Attribute Extension (XMAF) |
Bus Interface | AXI4-128 master | AXI4 or ACE 128-bit master | AXI4-128 master or ACE-128 master (optional) | AXI4-128 master |
FPU | RISC-V half- and single-precision extensions; IEEE 754-2008 compliant | RISC-V half- and single-precision extensions; IEEE 754-2008 compliant | RISC-V F/D extensions; IEEE 754-2008 compliant | RISC-V F/D extensions; IEEE 754-2008 compliant |
Vector Unit | RISC-V V extension (configurable); vector register width 128-bit; element sizes 8/16/32/64-bit; INT8/INT16/INT64/BFP16/FP16/FP32 | RISC-V V extension 1.0 | | RISC-V V extension; vector register width 128-bit; FP16/FP32/INT8/INT16/INT32/INT64 |
DSP Enhanced | ||||
Device Coherence Port (DCP) | AXI4 128-bit slave (optional) | AXI4-128 slave | AXI4-128 slave (Optional) | |
Low Latency Port (LLP) | AXI4 128-bit master (optional) | |||
Instruction Cache | Up to 64kB (configurable) | Up to 64kB with optional parity | Up to 64kB with parity check (optional) | Up to 64kB with optional parity |
Data Cache | Up to 64kB (configurable) | Up to 64kB with optional ECC | Up to 64kB with ECC (optional) | Up to 64kB with optional ECC |
L2 Cache | Up to 4MB with optional ECC; multi-bank parallel access | Up to 8MB with optional ECC; multi-bank parallel access | Up to 8MB with optional ECC; multi-bank parallel access | |
Interrupts | Flexibly configurable Platform-Level Interrupt Controller (PLIC) | Configurable Platform-Level Interrupt Controller (PLIC) for a wide range of system event scenarios | Flexibly configurable Platform-Level Interrupt Controller (PLIC) | Configurable Platform-Level Interrupt Controller (PLIC) for a wide range of system event scenarios |
Memory Management Unit | Sv39 virtual memory translation | Sv39/Sv48 virtual memory translation with Svnapot and Svpbmt | Sv39 virtual memory translation; up to 2048-entry TLB | Sv39 virtual memory translation; up to 2048-entry TLB |
PMP | Up to 16 regions | Up to 64 regions, ePMP | Up to 16 regions | Up to 16 regions |
Debug | RISC-V Debug | RISC-V Debug (multi-core debug supported) | ||
Performance Monitor Unit (PMU) | RISC-V PMU | |||
Branch Prediction | Branch History Table (BHT), Branch Target Buffer (BTB), Return Address Stack (RAS) | |||
Dhrystone (DMIPS/MHz) | 2.4 | 3.89 | 5.8 | 5.8 |
CoreMark (CoreMark/MHz) | 3.8 | 5.71 | 7.1 | 7 |
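
The MMU row differs in paging mode: the C906, C910 and C920 list Sv39, while the C908 also supports Sv48. Per the RISC-V privileged specification, both modes use 4 KiB pages and 9-bit virtual page number (VPN) fields; Sv39 walks a 3-level page table over a 39-bit address and Sv48 a 4-level table over a 48-bit address, and in both the unused upper bits of the 64-bit address must copy the top valid bit. A minimal Python sketch of that address split follows (`split_va` is an illustrative name; it does not model the page-table walk or the TLB).

```python
PAGE_OFFSET_BITS = 12  # 4 KiB pages
VPN_BITS = 9           # each page-table level indexes 512 entries


def split_va(va: int, levels: int) -> tuple[list[int], int]:
    """Split a 64-bit virtual address into VPN fields (top level first) and the page offset.

    levels=3 corresponds to Sv39, levels=4 to Sv48.
    """
    if not 0 <= va < 1 << 64:
        raise ValueError("va must be an unsigned 64-bit value")
    va_bits = PAGE_OFFSET_BITS + VPN_BITS * levels  # 39 or 48
    # Bits 63..(va_bits-1) must be all 0 or all 1 (sign extension of the top valid bit).
    upper = va >> (va_bits - 1)
    if upper not in (0, (1 << (64 - va_bits + 1)) - 1):
        raise ValueError("address is not canonical for this paging mode")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpns = [(va >> (PAGE_OFFSET_BITS + VPN_BITS * i)) & ((1 << VPN_BITS) - 1)
            for i in reversed(range(levels))]
    return vpns, offset


if __name__ == "__main__":
    va = (1 << 30) | (2 << 21) | (3 << 12) | 0xABC
    print(split_va(va, 3))  # Sv39: three VPN fields
    print(split_va(va, 4))  # Sv48: four VPN fields
```

The extra level is why Sv48 costs one more memory access on a TLB miss; a TLB such as the up-to-2048-entry one listed for the C910/C920 exists to avoid that walk on most accesses.
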
Core | R910 |
---|---|
Architecture | RV64GC |
SMP | Up to 4 cores per cluster |
Micro-architecture | |
Pipeline | 9 to 12 stages |
XuanTie Extensions | XuanTie Instruction Extension (XIE), XuanTie Memory Attribute Extension (XMAF) |
Bus Interface | AXI4 main master to initiate access requests; AXI4 LLP master to quickly initiate requests to peripheral devices; APB4 FPP master to quickly initiate requests to peripheral devices; AXI4 TMS slave to receive external requests to access the TCM; AXI4 slave providing data coherence between devices/accelerators and cores |
FPU | RISC-V single-precision floating-point extension; IEEE 754-2008 compliant |
TCM | Instruction TCM: configurable from 16kB to 64kB; data TCM: configurable from 16kB to 64kB; optional ECC |
Instruction Cache | Up to 64kB (Optional ECC support) |
Data Cache | Up to 64kB (Optional ECC support) |
L2 Cache | Up to 8MB (Optional ECC support) |
Memory Management Unit | Sv39 Virtual memory translation |
Dhrystone (DMIPS/MHz) | 5.8 |
CoreMark (CoreMark/MHz) | 7 |