Can atomic instructions straddle cache lines?
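Can x86 instructions such as LOCK DEC straddle multiple cache lines, or will they segfault?
I'm not asking whether they should, only whether it is allowed.
(I know that certain SSE instructions must be aligned on cache boundaries.)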
Yes, it is allowed. You can simply try it, or read the instruction set reference:
The integrity of the LOCK prefix is not affected by the alignment of
the memory field. Memory locking is observed for arbitrarily
misaligned fields.
But see also:
Exceptions
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Note that alignment checking is usually not enabled.
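For anyone who wants to try it, here is a minimal sketch of a LOCK DEC on a 32-bit field placed so that it straddles two cache lines. It assumes GCC or Clang inline assembly and 64-byte cache lines; the helper names `locked_dec32` and `dec_across_line` are made up for illustration. On non-x86 targets it falls back to a plain, non-atomic decrement just so it compiles. Some systems enable split-lock detection, which may trap or throttle this kind of access, so behavior can differ there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Decrement the 32-bit value at p, which may be arbitrarily misaligned. */
static void locked_dec32(void *p)
{
#if defined(__x86_64__) || defined(__i386__)
    /* The address is passed in a register, so no aligned pointer type is needed. */
    __asm__ __volatile__("lock decl (%0)" : : "r"(p) : "cc", "memory");
#else
    /* Non-atomic fallback so the sketch still builds on other targets. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    v--;
    memcpy(p, &v, sizeof v);
#endif
}

/* Put a 32-bit counter at bytes 62..65 of a line-aligned buffer,
 * decrement it once, and return the new value. */
static uint32_t dec_across_line(uint32_t start)
{
    _Alignas(LINE_SIZE) unsigned char buf[2 * LINE_SIZE];
    unsigned char *field = buf + LINE_SIZE - 2;
    uint32_t out;

    /* Sanity check: the field really does span two lines. */
    assert(((uintptr_t)field % LINE_SIZE) + sizeof(uint32_t) > LINE_SIZE);

    memcpy(field, &start, sizeof start);
    locked_dec32(field);
    memcpy(&out, field, sizeof out);
    return out;
}
```

Calling `dec_across_line(10)` should simply return 9 rather than fault, which matches the manual's statement that locking is observed for arbitrarily misaligned fields.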
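It is allowed, but you may see a large performance drop, because the lock may not be maintainable within the cache and may degrade to a full bus lock (effectively a full system stall).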
In the days of Intel 486 processors, the lock prefix used to assert a
lock on the bus along with a large hit in performance. Starting with
the Intel Pentium Pro architecture, the bus lock is transformed into a
cache lock. A lock will still be asserted on the bus in the most
modern architectures if the lock resides in uncacheable memory or if
the lock extends beyond a cache line boundary splitting cache lines.
Both of these scenarios are unlikely, so most lock prefixes will be
transformed into a cache lock which is much less expensive.
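The details may vary by processor, but note one more consideration: crossing a cache line boundary can also mean crossing a page boundary, which is harder to maintain (and therefore more likely to degrade).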
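To check whether a given operand would hit either case, a small address calculation is enough. This is only a sketch: the 64-byte line size and 4096-byte page size are assumptions (both vary by platform), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LINE_BYTES = 64, PAGE_BYTES = 4096 }; /* assumed sizes */

/* True if the byte range [addr, addr + size) touches more than one
 * block of `block` bytes. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t block)
{
    assert(block > 0 && size > 0);
    return addr / block != (addr + size - 1) / block;
}

static bool splits_cache_line(uintptr_t addr, size_t size)
{
    return crosses_boundary(addr, size, LINE_BYTES);
}

static bool splits_page(uintptr_t addr, size_t size)
{
    return crosses_boundary(addr, size, PAGE_BYTES);
}
```

Under these assumptions, a 4-byte operand at offset 62 splits a cache line but not a page, one at offset 4094 splits both, and a naturally aligned 4-byte operand never splits either.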