VFMAB, VFMAT (BFloat16, by scalar) -- AArch32

The BFloat16 floating-point widening multiply-add long instruction widens the even-numbered (bottom) or odd-numbered (top) 16-bit elements in the first source vector, and an indexed element in the second source vector, from BFloat16 to single-precision format. The instruction then multiplies each widened element from the first source vector by the widened indexed element and adds the product to the overlapping single-precision element of the destination vector.
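A minimal Python sketch of the widening and lane-selection step, for illustration only. It assumes that widening a BFloat16 value places its 16 bits in the upper half of a binary32 pattern with the lower 16 bits zero, which is what the ": Zeros(16)" concatenation in the Operation pseudocode does. The helper names bf16_to_f32 and select_lanes and the register contents are hypothetical.

```python
import struct


def bf16_to_f32(h: int) -> float:
    """Widen a 16-bit BFloat16 pattern to single precision.

    BFloat16 is the upper half of a binary32 value, so widening appends
    16 zero bits (the ": Zeros(16)" concatenation in the pseudocode).
    """
    bits32 = (h & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", bits32))[0]


def select_lanes(halfwords: list[int], top: bool) -> list[int]:
    """Pick the even-numbered (bottom) or odd-numbered (top) 16-bit lanes."""
    sel = 1 if top else 0
    return [halfwords[2 * e + sel] for e in range(len(halfwords) // 2)]


# Eight BF16 lanes of a hypothetical 128-bit first operand, lane 0 first.
qn = [0x3F80, 0x4000, 0x4040, 0x4080, 0xBF80, 0xC000, 0x3F00, 0x0000]
print([bf16_to_f32(h) for h in select_lanes(qn, top=False)])  # [1.0, 3.0, -1.0, 0.5]
print([bf16_to_f32(h) for h in select_lanes(qn, top=True)])   # [2.0, 4.0, -2.0, 0.0]
```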

Unlike other BFloat16 multiplication instructions, this instruction performs a fused multiply-add with no intermediate rounding. The result is rounded using the Round to Nearest rounding mode, and the instruction can generate floating-point exceptions that set the cumulative exception bits in the FPSCR.
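The effect of a single rounding can be modelled in Python by computing addend + element1 * element2 exactly with rational arithmetic and then rounding once to single precision with round-to-nearest, ties-to-even. This is a simplified sketch, not the architectural FPMulAdd: it assumes finite, normal results and does not model subnormals, NaNs, infinities, flush-to-zero, or the FPSCR exception bits. The names round_to_f32 and fused_mul_add are hypothetical.

```python
import math
from fractions import Fraction


def round_to_f32(x: Fraction) -> float:
    """Round an exact value once to binary32, ties to even.

    Simplification: assumes the result is a normal number; subnormals,
    overflow, and exception flags are not modelled.
    """
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale so the 24-bit significand is the integer part.
    scaled = x / Fraction(2) ** (e - 23)
    q, r = divmod(scaled.numerator, scaled.denominator)
    # Round to nearest; on an exact tie, round to the even significand.
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q & 1):
        q += 1
    return sign * math.ldexp(q, e - 23)


def fused_mul_add(addend: float, a: float, b: float) -> float:
    """Return addend + a * b with a single rounding to binary32 (hypothetical model)."""
    exact = Fraction(addend) + Fraction(a) * Fraction(b)  # no intermediate rounding
    return round_to_f32(exact)


print(fused_mul_add(1.0, 3 * 2.0**-12, 2.0**-12) == 1.0 + 2.0**-22)  # True
```

Here the exact value 1 + 3 * 2^-24 lies halfway between two adjacent single-precision values, and ties-to-even selects 1 + 2^-22.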

It has encodings from the following instruction sets: A32 (A1) and T32 (T1).

A1
(FEAT_AA32BF16)

if !IsFeatureImplemented(FEAT_AA32BF16) then UNDEFINED;
if Vd<0> == '1' || Vn<0> == '1' then UNDEFINED;
constant integer d = UInt(D:Vd);
constant integer n = UInt(N:Vn);
constant integer m = UInt(Vm<2:0>);
constant integer i = UInt(M:Vm<3>);
constant integer elements = 128 DIV 32;
constant integer sel = UInt(Q);
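The A1 decode pseudocode can be mirrored in Python to show how the register numbers and the index are assembled from split fields. This is a sketch: the opcode mask and match values are derived from the A1 encoding diagram for this instruction, UNDEFINED is modelled as a ValueError, and the FEAT_AA32BF16 check is not modelled. decode_a1 is a hypothetical helper.

```python
def decode_a1(word: int) -> dict:
    """Extract VFMAB/VFMAT (BFloat16, by scalar) fields from an A1 word."""
    # Fixed bits from the diagram: 1111 1110 0 D 11 Vn Vd 1000 N Q M 1 Vm.
    if (word & 0xFFB00F10) != 0xFE300810:
        raise ValueError("not a VFMA<bt> (BFloat16, by scalar) A1 encoding")
    D = (word >> 22) & 1
    Vn = (word >> 16) & 0xF
    Vd = (word >> 12) & 0xF
    N = (word >> 7) & 1
    Q = (word >> 6) & 1
    M = (word >> 5) & 1
    Vm = word & 0xF
    if Vd & 1 or Vn & 1:
        raise ValueError("UNDEFINED: Vd<0> or Vn<0> is 1")
    return {
        "d": (D << 4) | Vd,          # UInt(D:Vd)
        "n": (N << 4) | Vn,          # UInt(N:Vn)
        "m": Vm & 0b111,             # UInt(Vm<2:0>)
        "i": (M << 1) | (Vm >> 3),   # UInt(M:Vm<3>)
        "elements": 128 // 32,
        "sel": Q,                    # UInt(Q)
    }


print(decode_a1(0xFE34285B))
```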

T1
(FEAT_AA32BF16)

if InITBlock() then UNPREDICTABLE;
if !IsFeatureImplemented(FEAT_AA32BF16) then UNDEFINED;
if Vd<0> == '1' || Vn<0> == '1' then UNDEFINED;
constant integer d = UInt(D:Vd);
constant integer n = UInt(N:Vn);
constant integer m = UInt(Vm<2:0>);
constant integer i = UInt(M:Vm<3>);
constant integer elements = 128 DIV 32;
constant integer sel = UInt(Q);
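For T1, the same fields are spread across two halfwords. A sketch, assuming little-endian instruction memory with the first halfword (the one containing Vn) at the lower address. The in_it_block argument is a hypothetical stand-in for InITBlock(), UNPREDICTABLE and UNDEFINED are both reported as ValueError for simplicity, and t1_halfwords and decode_t1 are hypothetical helpers.

```python
import struct


def t1_halfwords(code: bytes, offset: int = 0) -> tuple[int, int]:
    """Read the two little-endian halfwords of a 32-bit T32 instruction."""
    hw1, hw2 = struct.unpack_from("<HH", code, offset)
    return hw1, hw2


def decode_t1(hw1: int, hw2: int, in_it_block: bool = False) -> dict:
    """Mirror the T1 decode pseudocode for VFMAB/VFMAT (BFloat16, by scalar)."""
    # Fixed bits from the T1 diagram: hw1 = 1111 1110 0 D 11 Vn, hw2 = Vd 1000 N Q M 1 Vm.
    if (hw1 & 0xFFB0) != 0xFE30 or (hw2 & 0x0F10) != 0x0810:
        raise ValueError("not a VFMA<bt> (BFloat16, by scalar) T1 encoding")
    if in_it_block:
        raise ValueError("UNPREDICTABLE: inside an IT block")
    D, Vn = (hw1 >> 6) & 1, hw1 & 0xF
    Vd = (hw2 >> 12) & 0xF
    N, Q, M = (hw2 >> 7) & 1, (hw2 >> 6) & 1, (hw2 >> 5) & 1
    Vm = hw2 & 0xF
    if Vd & 1 or Vn & 1:
        raise ValueError("UNDEFINED: Vd<0> or Vn<0> is 1")
    return {"d": (D << 4) | Vd, "n": (N << 4) | Vn,
            "m": Vm & 0b111, "i": (M << 1) | (Vm >> 3), "sel": Q}


hw = t1_halfwords(bytes([0x34, 0xFE, 0x5B, 0x28]))
print(decode_t1(*hw))
```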

Assembler Symbols

<bt>

Is the bottom or top element specifier, encoded in Q:

Q	<bt>
0	B
1	T

<q>	See Standard assembler syntax fields.

<Qd>	Is the 128-bit name of the SIMD&FP destination register, encoded in the "D:Vd" field as <Qd>*2.

<Qn>	Is the 128-bit name of the first SIMD&FP source register, encoded in the "N:Vn" field as <Qn>*2.

<Dm>	Is the 64-bit name of the second SIMD&FP source register, encoded in the "Vm<2:0>" field.

<index>

Is the element index in the range 0 to 3, encoded in the "M:Vm<3>" field.
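Going the other way, the assembler symbol encodings can be combined into an A1 instruction word. This sketch assumes the operand order <Qd>, <Qn>, <Dm>[<index>] implied by the symbol list above and omits the <q> qualifier. encode_a1 is a hypothetical helper, and the fixed bits 0xFE300810 are read off the A1 encoding diagram.

```python
def encode_a1(qd: int, qn: int, dm: int, index: int, bt: str) -> int:
    """Encode VFMA<bt>.BF16 with Qd, Qn, Dm[index] as an A1 word (hypothetical helper)."""
    if not (0 <= qd <= 15 and 0 <= qn <= 15):
        raise ValueError("Qd and Qn must be Q0-Q15")
    if not 0 <= dm <= 7:
        raise ValueError("Dm must be D0-D7 (encoded in Vm<2:0>)")
    if not 0 <= index <= 3:
        raise ValueError("index must be 0-3")
    q = {"B": 0, "T": 1}[bt]           # <bt> encoded in Q
    dvd, nvn = qd * 2, qn * 2          # D:Vd = <Qd>*2, N:Vn = <Qn>*2
    D, Vd = dvd >> 4, dvd & 0xF
    N, Vn = nvn >> 4, nvn & 0xF
    M, vm3 = index >> 1, index & 1     # M:Vm<3> = <index>
    Vm = (vm3 << 3) | dm               # Vm<2:0> = <Dm>
    return (0xFE300810 | D << 22 | Vn << 16 | Vd << 12
            | N << 7 | q << 6 | M << 5 | Vm)


print(hex(encode_a1(1, 2, 3, 1, "T")))  # 0xfe34285b
```

For example, encode_a1(1, 2, 3, 1, "T") encodes VFMAT with Qd = Q1, Qn = Q2, Dm = D3 and index 1. Because <Qd> and <Qn> are encoded as twice the register number, Vd<0> and Vn<0> are always 0 in a valid encoding, which is why the decode pseudocode treats a 1 in either bit as UNDEFINED. Dm is limited to D0-D7 because only Vm<2:0> encodes it.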

Operation

CheckAdvSIMDEnabled();
constant FPCR_Type fpcr = StandardFPCR();
constant bits(128) operand1 = Q[n>>1];
constant bits(64) operand2 = D[m];
constant bits(128) operand3 = Q[d>>1];
bits(128) result;
constant bits(32) element2 = Elem[operand2, i, 16] : Zeros(16);
for e = 0 to elements-1
    constant bits(32) element1 = Elem[operand1, 2 * e + sel, 16] : Zeros(16);
    constant bits(32) addend = Elem[operand3, e, 32];
    Elem[result, e, 32] = FPMulAdd(addend, element1, element2, fpcr);
Q[d>>1] = result;
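The Operation pseudocode can be modelled end to end on raw register bit patterns. This is a sketch under simplifying assumptions: registers are Python integers with element 0 in the least significant bits (matching Elem), FPMulAdd is approximated by exact rational arithmetic followed by one round-to-nearest-even to single precision, and subnormals, NaNs, infinities, flush-to-zero and FPSCR exception flags are not modelled. All function names and register values are hypothetical.

```python
import math
import struct
from fractions import Fraction


def f32_from_bits(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def f32_to_bits(x: float) -> int:
    """Return the 32-bit pattern of x (x must be exactly representable)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def round_to_f32(x: Fraction) -> float:
    """Round an exact value once to binary32, ties to even (normal range only)."""
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1  # now 2**e <= x < 2**(e + 1)
    scaled = x / Fraction(2) ** (e - 23)  # 24-bit significand in the integer part
    q, r = divmod(scaled.numerator, scaled.denominator)
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q & 1):
        q += 1
    return sign * math.ldexp(q, e - 23)


def elem(value: int, e: int, size: int) -> int:
    """Elem[value, e, size]: element e of width size, element 0 in the low bits."""
    return (value >> (e * size)) & ((1 << size) - 1)


def pack_lanes(lanes: list[int], size: int) -> int:
    """Build a register value from element bit patterns, element 0 first."""
    return sum((v & ((1 << size) - 1)) << (size * k) for k, v in enumerate(lanes))


def vfma_bf16_by_scalar(qd: int, qn: int, dm: int, i: int, sel: int) -> int:
    """Model of the Operation loop: returns the new 128-bit value of Qd."""
    element2 = elem(dm, i, 16) << 16  # Elem[operand2, i, 16] : Zeros(16)
    result = 0
    for e in range(128 // 32):
        element1 = elem(qn, 2 * e + sel, 16) << 16
        addend = elem(qd, e, 32)
        exact = Fraction(f32_from_bits(addend)) + (
            Fraction(f32_from_bits(element1)) * Fraction(f32_from_bits(element2))
        )
        result |= f32_to_bits(round_to_f32(exact)) << (32 * e)
    return result


# Hypothetical register contents.
qn = pack_lanes([0x3F80, 0x4000, 0x4040, 0x4080, 0xBF80, 0xC000, 0x3F00, 0x0000], 16)
dm = pack_lanes([0x4000, 0x4040, 0x0000, 0x0000], 16)  # BF16 2.0, 3.0, 0, 0
qd = pack_lanes([0x3F800000] * 4, 32)                  # four FP32 1.0 values
out = vfma_bf16_by_scalar(qd, qn, dm, i=1, sel=0)      # bottom lanes times Dm[1]
print([f32_from_bits(elem(out, e, 32)) for e in range(4)])  # [4.0, 10.0, -2.0, 2.5]
```

With sel = 0 the bottom BF16 lanes 1.0, 3.0, -1.0 and 0.5 are each multiplied by the indexed scalar 3.0 and added to 1.0; setting sel = 1 selects the top lanes instead.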

Internal version only: isa v01_31, pseudocode v2024-03_rel; Build timestamp: 2024-03-25T10:05

A1 encoding (A32):
31-24	23	22	21-20	19-16	15-12	11-8	7	6	5	4	3-0
11111110	0	D	11	Vn	Vd	1000	N	Q	M	1	Vm

T1 encoding (T32), first halfword:
15-8	7	6	5-4	3-0
11111110	0	D	11	Vn

T1 encoding (T32), second halfword:
15-12	11-8	7	6	5	4	3-0
Vd	1000	N	Q	M	1	Vm