Which xarch for SHA extensions on Solaris?

Oracle recently released Sun Studio 12.6. We have intrinsic-based SHA-1 and SHA-256 implementations (for ARM and Intel), and we want to enable the extensions on Solaris i86 machines.

The 12.6 manual lists the -xarch options in A.2.115.3 -xarch Flags for x86, but it does not discuss SHA.

Which -xarch option do we use for SHA?

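If Studio 12.6 doesn't support the SHA instruction set (and I strongly suspect it doesn't, as I find no reference to it in the What's New in the Oracle Developer Studio 12.6 Release documentation), you're out of luck.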

Almost.

You can create your own inline assembly functions. See man inline:

inline(4)

Name

inline, filename.il - Assembly language inline template files

Description

Assembly language call instructions are replaced by a copy of their corresponding function body obtained from the inline template (*.il) file.

Inline template files have a suffix of .il, for example:

% CC foo.il hello.c

Inlining is done by the compiler's code generator.

...

Examples

Please review libm.il or vis.il for examples. You can find a version of these libraries that is specific to each supported architecture under the compiler's lib/ directory.

...

An example can be found here (emphasis mine):

Performance Tuning With Sun Studio Compilers and Inline Assembly Code

...

This paper provides a demonstration of how to measure the performance of a critical piece of code. An example using a compiler flag and another example using inline assembly code are provided. The results are compared to show the benefits and differences of each approach.

...

Example 8: Inline Assembly Code for the Iterative Mandelbrot Calculation

Knowing all these facts, the inline code can be written, as shown in Example 8.

.inline mandel_il,0
// x is stored in %xmm0
// y is stored in %xmm1
// 4.0 is stored in %xmm2
// max_int is stored in %rdi

// set registers to zero
  xorps %xmm3, %xmm3
  xorps %xmm4, %xmm4
  xorps %xmm5, %xmm5
  xorps %xmm6, %xmm6
  xorps %xmm7, %xmm7
  xorq %rax, %rax

.loop:
// check to see if u2 + v2 >= 4.0
  movss %xmm5, %xmm7
  addss %xmm6, %xmm7
  ucomiss %xmm2, %xmm7
  jp     .exit
  jae    .exit

// v = 2 * v * u + y
  mulss %xmm3, %xmm4
  addss %xmm4, %xmm4
  addss %xmm1, %xmm4
// u = u2 - v2 + x
  movss %xmm5, %xmm3
  subss %xmm6, %xmm3
  addss %xmm0, %xmm3
// u2 = u * u
  movss %xmm3, %xmm5
  mulss %xmm3, %xmm5
// v2 = v * v
  movss %xmm4, %xmm6
  mulss %xmm4, %xmm6

  incl %eax
  cmpl %edi, %eax
  jl .loop

.exit:
// end of mandel_il
.end
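
To make the template above easier to follow, here is a rough C rendering of the same loop. It is only a sketch: the name mandel_ref and the prototype are assumptions inferred from the register comments in the template (three floats in %xmm0 to %xmm2 and an int in %rdi, which is how the System V AMD64 calling convention would pass them), not something taken from the paper.

```c
/* Portable C reference for the quoted mandel_il template.
 * Hypothetical helper: the name and prototype are assumptions, inferred
 * from the template's register comments (x, y, 4.0 in %xmm0..%xmm2,
 * max_int in %rdi, result in %eax). */
int mandel_ref(float x, float y, float four, int max_int)
{
    float u = 0.0f, v = 0.0f;    /* %xmm3, %xmm4 */
    float u2 = 0.0f, v2 = 0.0f;  /* %xmm5, %xmm6 */
    int count = 0;               /* %eax */

    do {
        /* ucomiss followed by jp/jae: leave the loop when
         * u2 + v2 >= four, or when the compare is unordered (NaN) */
        if (!(u2 + v2 < four))
            break;

        v = 2.0f * (v * u) + y;  /* uses the old u, as the template does */
        u = u2 - v2 + x;
        u2 = u * u;
        v2 = v * v;

        count++;
    } while (count < max_int);   /* incl / cmpl / jl .loop */

    return count;
}
```

With the real template, the C side would presumably declare something like int mandel_il(float, float, float, int); and pass the .il file to the compiler alongside the source, in the same way as the % CC foo.il hello.c line in the man page. A SHA wrapper would follow the same pattern, provided the assembler accepts the instructions.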

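And it's not hard at all. Back in the Solaris 8 days I had to write a lot of SPARC inline assembler functions for a customer I was consulting for, some of them really basic: effectively one line of code wrapping a single instruction. I swear some of them wound up in later versions of the Studio compiler suite (since we were subcontracted by Sun itself, that's not surprising, never mind the fact that some of them were obvious - floor() and ceil(), IIRC, were two of them - and should have been there in the first place...)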