BFMMLA

This BFloat16 floating-point (BF16) matrix multiply-accumulate instruction multiplies the 2×4 matrix of BF16 values held in each 128-bit segment of the first source vector by the 4×2 BF16 matrix in the corresponding segment of the second source vector. The resulting 2×2 single-precision (FP32) matrix product is then destructively added to the FP32 matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent to performing a 4-way dot product per destination element.

All floating-point calculations performed by this instruction are performed with the following behaviors, irrespective of the value in FPCR:

* Uses the non-IEEE 754 Round-to-Odd mode, which forces bit 0 of an inexact result to 1, and rounds an overflow to an appropriately signed Infinity.

* The cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC and IOC) are not modified.

* Trapped floating-point exceptions are disabled, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE and IOE) are all zero.

SVE
(FEAT_BF16)

Assembler Symbols

<Zda>

Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<Zn>	Is the name of the first source scalable vector register, encoded in the "Zn" field.

<Zm>	Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckNonStreamingSVEEnabled(); integer segments = VL DIV 128; bits(VL) operand1 = Z[n]; bits(VL) operand2 = Z[m]; bits(VL) operand3 = Z[da]; bits(VL) result; bits(128) op1, op2; bits(128) res, addend; for s = 0 to segments-1 op1 = Elem[operand1, s, 128]; op2 = Elem[operand2, s, 128]; addend = Elem[operand3, s, 128]; res = BFMatMulAdd(addend, op1, op2); Elem[result, s, 128] = res; Z[da] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is unpredictable:

The MOVPRFX instruction must be unpredicated.
The MOVPRFX instruction must specify the same destination register as this instruction.
The destination register must not refer to architectural register state referenced by any other source operand register of this instruction.

Internal version only: isa v33.11seprel, AdvSIMD v29.05, pseudocode v2021-09_rel, sve v2021-09_rc3d ; Build timestamp: 2021-10-06T11:41

31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
0	1	1	0	0	1	0	0	0	1	1	Zm					1	1	1	0	0	1	Zn					Zda

31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
0	1	1	0	0	1	0	0	0	1	1	Zm					1	1	1	0	0	1	Zn					Zda

31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
0	1	1	0	0	1	0	0	0	1	1	Zm					1	1	1	0	0	1	Zn					Zda

BFMMLA

SVE(FEAT_BF16)

BFMMLA <Zda>.S, <Zn>.H, <Zm>.H

Assembler Symbols

Operation

Operational information

SVE
(FEAT_BF16)

31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
0	1	1	0	0	1	0	0	0	1	1	Zm					1	1	1	0	0	1	Zn					Zda