Does torch.nn.MultiheadAttention contain a normalization layer and a feed-forward layer?

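I tried to find the source code for multihead attention but could not find any implementation details. Does this module contain only the attention part, rather than a whole transformer block? In other words, does it leave out the normalization layers, the residual connections, and the additional feed-forward network?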

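According to the source code, the answer is no: it contains neither a normalization layer nor a feed-forward layer. As expected, MultiheadAttention implements only the attention function.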
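One way to check this without reading the source is to list the module's submodules. The snippet below is a minimal sketch, not part of the original answer. It compares `nn.MultiheadAttention` with `nn.TransformerEncoderLayer`, which does bundle attention with layer normalization and a feed-forward network. The sizes (`embed_dim=16`, `num_heads=4`, `dim_feedforward=32`) and the helper `has_layer_norm` are arbitrary choices made for illustration.

```python
import torch
from torch import nn

# Attention only (sizes are arbitrary illustration values).
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)

# A full encoder block, for comparison.
block = nn.TransformerEncoderLayer(
    d_model=16, nhead=4, dim_feedforward=32, batch_first=True
)


def has_layer_norm(module: nn.Module) -> bool:
    """Hypothetical helper: True if any submodule is an nn.LayerNorm."""
    return any(isinstance(m, nn.LayerNorm) for m in module.modules())


# Names of the direct submodules registered on each module.
mha_children = [name for name, _ in mha.named_children()]
block_children = [name for name, _ in block.named_children()]

print("MultiheadAttention children:", mha_children)
print("TransformerEncoderLayer children:", block_children)
print("LayerNorm in MultiheadAttention:", has_layer_norm(mha))
print("LayerNorm in TransformerEncoderLayer:", has_layer_norm(block))

# Self-attention forward pass: the output keeps the input shape, and no
# residual connection or normalization is applied to it.
x = torch.randn(2, 5, 16)  # (batch, sequence, embedding)
attn_out, attn_weights = mha(x, x, x)
print(attn_out.shape, attn_weights.shape)
```

If you need the residual connections, normalization, and feed-forward network as well, you either add them around `nn.MultiheadAttention` yourself or use `nn.TransformerEncoderLayer`.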