Can atomic instructions straddle cache lines?

Can x86 instructions like LOCK DEC straddle multiple cache lines, or will they segfault?

I'm not asking whether they should, only whether it is allowed.

(I know that some SSE instructions must be aligned on cache boundaries.)

Yes, it is allowed. You can also simply try it, or read the instruction set reference:

The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields.

But see also:

Exceptions

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Note that alignment checking is usually not enabled.

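It is allowed, but you may see a large performance drop, because the lock may not be maintainable within the cache and may degrade to a full bus lock (effectively a full system stall).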

See for example https://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures :

In the days of Intel 486 processors, the lock prefix used to assert a lock on the bus along with a large hit in performance. Starting with the Intel Pentium Pro architecture, the bus lock is transformed into a cache lock. A lock will still be asserted on the bus in the most modern architectures if the lock resides in uncacheable memory or if the lock extends beyond a cache line boundary splitting cache lines. Both of these scenarios are unlikely, so most lock prefixes will be transformed into a cache lock which is much less expensive.
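To stay on the cheap cache-lock path described in the quote, the locked operand has to sit entirely inside one cache line. A naturally aligned int already satisfies this, so splits usually come from packed structs or pointer casts. The following is only a minimal C11 sketch that makes the guarantee explicit. It assumes a 64-byte line, which is typical for x86 but not guaranteed, and `aligned_counter` and `counter_decrement` are hypothetical names chosen for illustration:

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas, alignof */
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Check your target CPU. */
#define CACHE_LINE_SIZE 64

/* The counter starts on a line boundary, so a 4-byte locked
 * read-modify-write on it can never span two lines. */
struct aligned_counter {
    alignas(CACHE_LINE_SIZE) atomic_int value;
};

static_assert(alignof(struct aligned_counter) == CACHE_LINE_SIZE,
              "counter must be cache-line aligned");

/* Atomically decrements the counter and returns the new value.
 * On x86 compilers typically emit a LOCK-prefixed instruction here. */
static int counter_decrement(struct aligned_counter *c)
{
    return atomic_fetch_sub(&c->value, 1) - 1;
}
```

A caller would declare a `struct aligned_counter`, initialise it with `atomic_init(&c.value, n)`, and call `counter_decrement(&c)`.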

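This may vary by processor, but note another consideration: crossing a line boundary may also mean crossing a page boundary, which is even harder to maintain (and therefore more likely to degrade).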
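Whether a given operand crosses a line or a page boundary is plain address arithmetic. Here is a rough sketch in C, assuming 64-byte cache lines and 4 KiB pages (both common on x86, but neither is guaranteed). The helper `crosses_boundary` and the two constants are hypothetical names for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: typical x86 values. Query the real ones on your system. */
#define CACHE_LINE_SIZE 64u
#define PAGE_SIZE_BYTES 4096u

/* Returns true if the byte range [addr, addr + size) touches more than
 * one block of `boundary` bytes, i.e. its first and last byte fall in
 * different blocks. `boundary` must be nonzero. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    if (size == 0) {
        return false;
    }
    return (addr / boundary) != ((addr + size - 1) / boundary);
}
```

For example, a 4-byte operand at offset 62 crosses a line boundary but not a page boundary, while one at offset 4094 crosses both, since every page boundary is also a line boundary.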