Reformer local and LSH attention in HuggingFace implementation

pytorch
huggingface-transformers

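The recent implementation of the Reformer in HuggingFace has both what is called LSH Self Attention and Local Self Attention, but the difference is not very clear to me after reading the documentation. Both use bucketing to avoid the quadratic memory requirement of vanilla transformers, but it is not clear how they differ.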

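Is it that Local Self Attention only allows queries to attend to keys that are sequentially near them (i.e., inside a given window in the sentence), as opposed to the proper LSH hashing that LSH Self Attention does? Or is it something else?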

After closely examining the source code, I found that Local Self Attention indeed attends to the sequentially near tokens.
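To make the contrast concrete, here is a minimal pure-Python sketch of the two grouping strategies. It is not the HuggingFace code: the function names and parameters (`local_attention_positions`, `lsh_buckets`, `chunks_before`, `chunks_after`) are hypothetical and only loosely mirror the chunking options the library exposes. The sketch clips at the sequence boundaries, ignores causal masking, and assumes an even `num_buckets`.

```python
import random


def local_attention_positions(query_pos, seq_len, chunk_length,
                              chunks_before=1, chunks_after=0):
    """Key positions a query can see under chunked local attention.

    Grouping depends only on position: the query's own chunk plus a
    fixed number of neighbouring chunks before and after it.
    """
    chunk = query_pos // chunk_length
    first = max(chunk - chunks_before, 0) * chunk_length
    last = min((chunk + chunks_after + 1) * chunk_length, seq_len)
    return list(range(first, last))


def lsh_buckets(vectors, num_buckets, seed=0):
    """Assign each vector to a bucket with random projections (angular LSH).

    Grouping depends only on content: vectors pointing in similar
    directions tend to land in the same bucket, wherever they sit
    in the sequence.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # num_buckets // 2 random directions; bucket = argmax over [xR, -xR]
    projections = [[rng.gauss(0, 1) for _ in range(dim)]
                   for _ in range(num_buckets // 2)]
    buckets = []
    for v in vectors:
        scores = [sum(a * b for a, b in zip(v, p)) for p in projections]
        scores = scores + [-s for s in scores]
        buckets.append(max(range(num_buckets), key=scores.__getitem__))
    return buckets


# Local: the query at position 5 (chunk 1 of length 4) sees chunks 0 and 1.
print(local_attention_positions(5, seq_len=12, chunk_length=4))

# LSH: identical vectors share a bucket even if far apart in the sequence.
print(lsh_buckets([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]], num_buckets=4))
```

In the local case the set of visible keys is fixed by the query's index alone, while in the LSH case it is determined by which vectors hash together, so two tokens at opposite ends of the sequence can still attend to each other.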
