New specifications added in RISC-V Vector Extension v1.0

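RISC-V Vector Extension v0.9 has already reached Stable status, and v1.0 is now moving toward Ready. The documentation on GitHub is already being revised for the v1.0 release, and several of the newly added items have already been written up.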

The differences between v1.0 and v0.9 are already summarized in the following document.

github.com

Several parts of the specification have changed, but most of the changes are minor and amount to nitpicks. I list them here for reference.

SLEN=VLEN layout mandatory
- The group has decided to make the SLEN=VLEN layout mandatory. In-register layout of the bytes of a vector matches in-memory layout of bytes in a vector. Many of the optimizations possible with the earlier SLEN<VLEN layouts can be achieved with microarchitectural techniques on wide datapath machines, and SLEN=VLEN provides a much simpler specification and interface to software.
Support ELEN > VLEN for LMUL > 1
- Specification was loosened to allow elements wider than a single vector register to be supported using a vector register group, but profiles can still mandate a minimum ELEN when LMUL = 1.
Defined vector FP exception behavior
Defined interaction of misa.v and mstatus.vs
Defined integer narrowing pseudo-instruction vncvt.x.x.v vd,vs,vm
Added reciprocal and reciprocal square-root estimate instructions
Added EEW encoding to whole register moves and load/stores to support microarchitectures with internal data rearrangement.
Added vrgatherei16 instruction
Rearranged bits in vtype to make vlmul bits into a contiguous field
Moved EDIV to appendix and removed instruction encoding for dot instructions to make clear not part of v1.0
Moved quad-widening mulacc to appendix and removed instruction encodings to make clear not part of v1.0

One of the important updates is the change to the bit fields of the vtype system register. The intent seems to be to tidy up the vlmul bit field ahead of future extensions.

Rearranged bits in vtype to make vlmul bits into a contiguous field

[Figure: vtype bit-field layout]
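
To illustrate what a contiguous vlmul field means for software, the sketch below decodes the low bits of vtype under both layouts. The bit positions are assumptions, as understood from the v0.9 and v1.0 draft documents, and should be checked against the specification: in v0.9, vlmul[1:0] sits in bits 1:0, vsew in bits 4:2, and vlmul[2] in bit 5; after the rearrangement, vlmul sits in bits 2:0 and vsew in bits 5:3; vta and vma are bits 6 and 7 in both. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded low bits of vtype (the vill bit at XLEN-1 is ignored here). */
struct vtype_fields {
    unsigned vlmul; /* 3-bit LMUL encoding, including fractional LMUL */
    unsigned vsew;  /* 3-bit SEW encoding */
    unsigned vta;   /* tail agnostic */
    unsigned vma;   /* mask agnostic */
};

/* Assumed v0.9 layout: vlmul is split across bit 5 and bits 1:0. */
static struct vtype_fields decode_vtype_v09(uint64_t vtype)
{
    struct vtype_fields f;
    f.vlmul = (unsigned)((((vtype >> 5) & 0x1) << 2) | (vtype & 0x3));
    f.vsew  = (unsigned)((vtype >> 2) & 0x7);
    f.vta   = (unsigned)((vtype >> 6) & 0x1);
    f.vma   = (unsigned)((vtype >> 7) & 0x1);
    return f;
}

/* Assumed rearranged layout: vlmul is one contiguous field in bits 2:0. */
static struct vtype_fields decode_vtype_v10(uint64_t vtype)
{
    struct vtype_fields f;
    f.vlmul = (unsigned)(vtype & 0x7);
    f.vsew  = (unsigned)((vtype >> 3) & 0x7);
    f.vta   = (unsigned)((vtype >> 6) & 0x1);
    f.vma   = (unsigned)((vtype >> 7) & 0x1);
    return f;
}

/* Build a vtype value in the assumed rearranged layout. */
static uint64_t encode_vtype_v10(unsigned vlmul, unsigned vsew,
                                 unsigned vta, unsigned vma)
{
    assert(vlmul < 8 && vsew < 8 && vta < 2 && vma < 2);
    return ((uint64_t)vma << 7) | ((uint64_t)vta << 6) |
           ((uint64_t)vsew << 3) | (uint64_t)vlmul;
}
```

Under these assumptions, the same setting (vsew = 2, vlmul = 0b111) is encoded as 0x2b in the v0.9 layout and as 0x17 in the rearranged layout.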

Incidentally, support for this new bit-field layout already appears to be implemented in GNU GCC and the Spike simulator, and anyone can download them.

In addition, two approximate-computation instructions have been added.

Added reciprocal and reciprocal square-root estimate instructions

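Reciprocal / Reciprocal Square Root are simply the instructions that compute $\dfrac{1}{x}$ and $\dfrac{1}{\sqrt{x}}$ for an input $x$. Normally these values are computed by iterating a recurrence many times, but the new instructions are defined to give up precision in exchange for short latency.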

    # Floating-point reciprocal square-root estimate to 7 bits.
    vfrsqrte7.v vd, vs2, vm

    # Floating-point reciprocal estimate to 7 bits.
    vfrece7.v vd, vs2, vm
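
The iterative computation mentioned above is often a Newton-Raphson recurrence, and a low-precision estimate is a natural starting point for it. The following C sketch illustrates the idea under stated assumptions: `refine_reciprocal` is a hypothetical helper name, and the recurrence $y_{n+1} = y_n(2 - x y_n)$ is the textbook Newton-Raphson step for $\dfrac{1}{x}$, not something taken from the specification.

```c
#include <assert.h>

/* Hypothetical helper: refine an initial estimate y0 of 1/x with
 * `steps` Newton-Raphson iterations, y <- y * (2 - x * y).
 * Each step roughly doubles the number of correct bits. */
static float refine_reciprocal(float x, float y0, int steps)
{
    assert(steps >= 0);
    float y = y0;
    for (int i = 0; i < steps; i++) {
        y = y * (2.0f - x * y);
    }
    return y;
}
```

Starting from a 7-bit estimate, two steps give roughly 14 and then 28 correct bits, which is about what single precision needs.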

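These instructions do not perform an exact computation; they obtain a low-precision approximation using nothing but a table lookup. I have no idea in which fields they will be used (perhaps AI workloads do not need precise floating-point results?), but setting the details aside, the operation can be outlined very roughly as follows.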

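For vfrece7.v:
- For the exponent field e of the floating-point value, compute $2*B-1-e$ (B is the exponent bias).
- Convert the upper 7 bits of the significand, excluding the hidden bit, according to the table below. The remaining significand bits are filled with 0.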

The code below is borrowed from the Spike implementation. The approximation is obtained by replacing the 7 significand bits with the corresponding table entry.

     static const uint8_t table[] = {
         127, 125, 123, 121, 119, 117, 116, 114,
         112, 110, 109, 107, 105, 104, 102, 100,
         99, 97, 96, 94, 93, 91, 90, 88,
         87, 85, 84, 83, 81, 80, 79, 77,
         76, 75, 74, 72, 71, 70, 69, 68,
         66, 65, 64, 63, 62, 61, 60, 59,
         58, 57, 56, 55, 54, 53, 52, 51,
         50, 49, 48, 47, 46, 45, 44, 43,
         42, 41, 40, 40, 39, 38, 37, 36,
         35, 35, 34, 33, 32, 31, 31, 30,
         29, 28, 28, 27, 26, 25, 25, 24,
         23, 23, 22, 21, 21, 20, 19, 19,
         18, 17, 17, 16, 15, 15, 14, 14,
         13, 12, 12, 11, 11, 10, 9, 9,
         8, 8, 7, 7, 6, 5, 5, 4,
         4, 3, 3, 2, 2, 1, 1, 0};
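
Putting the two steps together, the following is a minimal C sketch for single precision, assuming a normal, finite input whose result also stays in the normal range. Zeros, subnormals, infinities, NaNs, and rounding-mode details are deliberately left out, so this is not the Spike implementation. The names `recip7_table` and `recip_estimate7` are hypothetical; the table values are copied from the table above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy of the 128-entry table shown above. */
static const uint8_t recip7_table[128] = {
    127, 125, 123, 121, 119, 117, 116, 114,
    112, 110, 109, 107, 105, 104, 102, 100,
    99, 97, 96, 94, 93, 91, 90, 88,
    87, 85, 84, 83, 81, 80, 79, 77,
    76, 75, 74, 72, 71, 70, 69, 68,
    66, 65, 64, 63, 62, 61, 60, 59,
    58, 57, 56, 55, 54, 53, 52, 51,
    50, 49, 48, 47, 46, 45, 44, 43,
    42, 41, 40, 40, 39, 38, 37, 36,
    35, 35, 34, 33, 32, 31, 31, 30,
    29, 28, 28, 27, 26, 25, 25, 24,
    23, 23, 22, 21, 21, 20, 19, 19,
    18, 17, 17, 16, 15, 15, 14, 14,
    13, 12, 12, 11, 11, 10, 9, 9,
    8, 8, 7, 7, 6, 5, 5, 4,
    4, 3, 3, 2, 2, 1, 1, 0};

/* Hypothetical helper: 7-bit estimate of 1/x for a normal binary32 x. */
static float recip_estimate7(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret the float's bits */

    const uint32_t bias = 127;             /* B for single precision */
    uint32_t sign = bits & 0x80000000u;
    uint32_t e    = (bits >> 23) & 0xffu;  /* biased exponent field */
    uint32_t idx  = (bits >> 16) & 0x7fu;  /* top 7 fraction bits */

    /* Only inputs whose result is again a normal number are supported. */
    assert(e >= 1 && e <= 2 * bias - 2);

    uint32_t out_e = 2 * bias - 1 - e;     /* step 1: new exponent */
    uint32_t out_f = (uint32_t)recip7_table[idx] << 16; /* step 2: rest is 0 */

    uint32_t out = sign | (out_e << 23) | out_f;
    float y;
    memcpy(&y, &out, sizeof y);
    return y;
}
```

For example, `recip_estimate7(1.5f)` selects table entry 64 (value 42) and returns 0.6640625, against the exact value 0.666..., which is within the 7-bit target.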

Come to think of it, this approximation is not surprising. A floating-point value is represented as $(-1)^{S}\times 2^{(E-\text{bias})}\times 1.F$, so it is easy to see that its reciprocal is $\dfrac{1}{(-1)^{S}\times 2^{(E-\text{bias})}\times 1.F} = (-1)^{S}\times 2^{(-E+\text{bias})}\times \dfrac{1}{1.F}$. All that remains is to look up the significand part $\dfrac{1}{1.F}$ in a table.
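
As a supplementary check, the exponent term $2*B-1-e$ from the outline of vfrece7.v can be recovered from this identity. The following is a sketch assuming a normal input with $S=0$; the extra $-1$ comes from renormalizing $\dfrac{1}{1.F}$, which lies in $(\tfrac{1}{2}, 1]$ and therefore has no leading 1 of its own.

$$
\begin{aligned}
\frac{1}{2^{(E-\text{bias})}\times 1.F} &= 2^{(-E+\text{bias})}\times\frac{1}{1.F} \\
&= 2^{(-E+\text{bias})-1}\times\frac{2}{1.F}, \qquad 1 < \frac{2}{1.F} \le 2 \\
\text{biased exponent of the result} &= (-E+\text{bias})-1+\text{bias} = 2\,\text{bias}-1-E
\end{aligned}
$$

With $e=E$ and $B=\text{bias}$ this is $2B-1-e$. For $F=0$ the factor $\dfrac{2}{1.F}$ equals exactly 2 and does not fit in a normalized significand; the first table entry (127) encodes $1.1111111_2$ instead, so the estimate for $x=1$ comes out as $\tfrac{255}{256}$ rather than exactly 1.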

Likewise, the reciprocal square root of a floating-point value is $\dfrac{1}{\sqrt{(-1)^{S}\times 2^{(E-\text{bias})}\times 1.F}}$; assuming $S=0$, this becomes $2^{\frac{(-E+\text{bias})}{2}}\times \dfrac{1}{\sqrt{1.F}}$, so again only the $\dfrac{1}{\sqrt{1.F}}$ part needs to come from a table (it has been a long time since I did any math, so I hope this is right).
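
The half-integer exponent in this expression hides one more case split. As a sketch that is not taken from the text above, write the unbiased exponent $E-\text{bias}$ as either $2k$ or $2k+1$:

$$
\frac{1}{\sqrt{2^{2k}\times 1.F}} = 2^{-k}\times\frac{1}{\sqrt{1.F}}, \qquad
\frac{1}{\sqrt{2^{2k+1}\times 1.F}} = 2^{-k}\times\frac{1}{\sqrt{2\times 1.F}}
$$

So a table for the square-root case has to cover $\dfrac{1}{\sqrt{m}}$ for $m \in [1, 4)$, which means its index needs the parity of the exponent in addition to the significand bits. The specification appears to handle this by including the least significant bit of the exponent in the table index, but the exact indexing should be checked against the specification.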

