Which xarch for SHA extensions on Solaris?
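Oracle recently released Sun Studio 12.6. We have SHA-1 and SHA-256 intrinsic-based implementations (for ARM and Intel), and we would like to enable the extensions on Solaris i86 machines.
The 12.6 manual covers the -xarch options in A.2.115.3 -xarch Flags for x86, but it does not discuss SHA.
Which -xarch option do we use for SHA?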
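If Studio 12.6 doesn't support the SHA instruction set (and I strongly suspect it doesn't, since I find no mention of SHA in the What's New in the Oracle Developer Studio 12.6 Release document), you're out of luck.
Almost.
You can create your own inline assembly functions. See man inline: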
inline(4)
Name
inline, filename.il - Assembly language inline template files
Description
Assembly language call instructions are replaced by a copy of their
corresponding function body obtained from the inline template (*.il)
file.
Inline template files have a suffix of .il, for example:
% CC foo.il hello.c
Inlining is done by the compiler's code generator.
...
Examples
Please review libm.il or vis.il for examples. You can find a version of these libraries that is specific to each supported architecture under the compiler's lib/ directory.
...
An example can be found here (emphasis mine):
Performance Tuning With Sun Studio Compilers and Inline Assembly Code
...
This paper provides a demonstration of how to measure the performance
of a critical piece of code. An example using a compiler flag and
another example using inline assembly code are provided. The results
are compared to show the benefits and differences of each approach.
...
Example 8: Inline Assembly Code for the Iterative Mandelbrot Calculation
Knowing all these facts, the inline code can be written, as shown in
Example 8.
.inline mandel_il,0
// x is stored in %xmm0
// y is stored in %xmm1
// 4.0 is stored in %xmm2
// max_int is stored in %rdi
// set registers to zero
xorps %xmm3, %xmm3
xorps %xmm4, %xmm4
xorps %xmm5, %xmm5
xorps %xmm6, %xmm6
xorps %xmm7, %xmm7
xorq %rax, %rax
.loop:
// check to see if u2 + v2 >= 4.0 (the code below adds u2 and v2)
movss %xmm5, %xmm7
addss %xmm6, %xmm7
ucomiss %xmm2, %xmm7
jp .exit
jae .exit
// v = 2 * v * u + y
mulss %xmm3, %xmm4
addss %xmm4, %xmm4
addss %xmm1, %xmm4
// u = u2 - v2 + x
movss %xmm5, %xmm3
subss %xmm6, %xmm3
addss %xmm0, %xmm3
// u2 = u * u
movss %xmm3, %xmm5
mulss %xmm3, %xmm5
// v2 = v * v
movss %xmm4, %xmm6
mulss %xmm4, %xmm6
incl %eax
cmpl %edi, %eax
jl .loop
.exit:
// end of mandel_il
.end
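It's not hard at all. Back in the Solaris 8 days I had to write a lot of SPARC inline assembler functions for a customer I was consulting for, some of them pretty basic: effectively a one-line wrapper around a single instruction. I'd swear some of them made their way into later versions of the Studio compiler suite. Since we were subcontracted by Sun itself, that wouldn't be surprising, never mind that some of them were obvious (floor() and ceil(), IIRC, were two of them) and should have been there in the first place.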
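For anyone who has not used a .il template before, here is a rough sketch of what the C side of the quoted mandel_il example might look like. The prototype is inferred from the register comments in the template (x, y and 4.0 in %xmm0 to %xmm2, max_int in %rdi, count returned in %eax), which assumes the usual System V AMD64 calling convention. The names mandel_ref, mandel_iterations and the USE_INLINE_TEMPLATE macro are made up for illustration and do not come from the paper or the man page. mandel_ref is a plain-C mirror of the template, so the file still builds and can be checked when no template is supplied.

```c
/* Plain-C reference that follows the quoted template step by step.
 * Hypothetical helper, useful for checking the template's results. */
int mandel_ref(float x, float y, float four, int max_int)
{
    float u = 0.0f, v = 0.0f, u2 = 0.0f, v2 = 0.0f;
    int i = 0;

    do {
        /* Same exits as the jp/jae pair: u2 + v2 >= four, or unordered. */
        if (!(u2 + v2 < four))
            break;
        v = 2.0f * u * v + y;   /* uses the old u, as the template does */
        u = u2 - v2 + x;
        u2 = u * u;
        v2 = v * v;
        i++;
    } while (i < max_int);

    return i;
}

#ifdef USE_INLINE_TEMPLATE
/* Assumed prototype for the template: three floats arrive in
 * %xmm0..%xmm2, the int in %edi, and the result comes back in %eax. */
int mandel_il(float x, float y, float four, int max_int);
#define MANDEL_IMPL mandel_il
#else
#define MANDEL_IMPL mandel_ref
#endif

/* Hypothetical convenience wrapper that supplies the 4.0 constant. */
int mandel_iterations(float x, float y, int max_int)
{
    return MANDEL_IMPL(x, y, 4.0f, max_int);
}
```

By analogy with the "% CC foo.il hello.c" line in the man page, the template file would be named on the compile line ahead of the C source, with USE_INLINE_TEMPLATE defined and a 64-bit target selected, since the template uses %rax and %rdi. A SHA wrapper would follow the same pattern, though whether the Studio assembler accepts the SHA mnemonics is not something this thread establishes.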