Improving bigint write-to-disk performance
I'm working with very large bigint numbers, and I need to write them to disk and read them back later, because they can't all fit in memory at once.
The current Chapel implementation first converts the bigint to a string, then writes that string to disk [1]. For large integers this takes a very long time.
var outputFile = open("outputPath", iomode.cwr);
var writer = outputFile.writer();
writer.write(reallyLargeBigint);
writer.close();
outputFile.close();
Is there any way to use GMP's mpz_out_raw()/mpz_inp_raw() [2] or mpz_export()/mpz_import() [3], or some similar approach, to dump the bytes of a bigint directly to disk without first converting it to a string, and then read those bytes back into a bigint object?
Would that also work for arrays of bigint?
If this isn't possible in Chapel's current state, how could such functionality be added to the standard library?
[1] https://github.com/chapel-lang/chapel/blob/master/modules/standard/BigInteger.chpl#L346
Preface: the size of the data is a static property, yet the data flow is, and will forever remain, our greatest enemy.
"Could such functionality be added to Chapel's standard library?"
Given the current prices for adding units, tens, or even hundreds of [TB] of RAM capacity, the problem is IMHO never going to be solved by any extension of the language in the direction sketched above.
Why never? Due to the escalating costs:
If one spends just a little time on the facts, the latency map below practically draws itself on a blank sheet of paper. While the respective numbers may differ slightly, the message is in the orders of magnitude and in the dependency chain of the thought process:
________________________________________________________________________________________
/ /
/ ________________________________________________________ /
/ / / /
/ / xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx / /
/ / / / / / /
/ / SOMEWHAT / PRETTY / PROHIBITIVELY / / /
/ CHEAPEST / CHEAP / EXPENSIVE / EXPENSIVE / / /
/ EVER / ZONE / ZONE / ZONE / / /
/___________________/. . . . . / _ _ _ _ _ _ _ _/ ! ! ! ! ! ! ! !/ / /_______________________
/ / / / / / / / /
in-CACHE / in-RAM / CONVERT / STORE / RE-READ / CONVERT / in-RAM / in-CACHE / in-CPU-uop /
~ + 5 [ns] | | | | | | | |
+ 5 [ns] | | | | | | | |
| | | | | | | | |
| ~ +300 [ns/kB] | | | | | | |
| +300 [ns/kB] | | | | | | |
| | | | | | | | |
| |+VOLUME [ GB] | | | | | |
| | x 100.000[ns/GB] | | | | | |
| | | | | | | | |
| | |+1 | | | | | |
| | | x 15.000.000[ns] | | | | |
| | |+VOLUME [ GB] | | | | |
| | | x 3.580.000.000[ns/GB] | | | | |
| | | | | | | | |
| | | |+1 FIND | | | | |
| | | | x 15.000.000[ns] | | | |
| | | |+1 DATA | | | | |
| | | | x 15.000.000[ns] | | | |
| | | |+VOLUME [ GB] | | | |
| | | | x 3.580.000.000[ns/GB] | | | |
| | | | | | | | |
| | | | |+VOLUME [ GB] | | |
| | | | | x 100.000[ns/GB] | | |
| | | | | | | | |
| | | | | | ~ +300 [ns/kB] | |
| | | | | | +300 [ns/kB] | |
| | | | | | | | |
| | | | | | | ~ + 5 [ns] |
| | | | | | | + 5 [ns] |
| | | | | | | | |
| | | | | | | | ~ + 0.3 [ns/uop]
| | | | | | | | + 2.0 [ns/uop]
Last but not least, let's calculate the effect of such a step on the resulting << 1.0 speedup.
Given an original processing time of XYZ [ns], the "modified" processing takes:
XYZ [ns] : the PURPOSE
+ ( VOL [GB] * 300.000.000 [ns/GB] ) : + MEM/CONVERT
+ ( VOL [GB] * 100.000 [ns/GB] ) : + CPU/CONVERT
+ 15.000.000 [ns] : + fileIO SEEK
+ ( VOL [GB] * 3.580.000.000 [ns/GB] ) : + fileIO STORE
+ 15.000.000 [ns] : + fileIO SEEK / FAT
+ 15.000.000 [ns] : + fileIO SEEK / DATA
+ ( VOL [GB] * 3.580.000.000 [ns/GB] ) : + fileIO RE-READ
+ ( VOL [GB] * 100.000 [ns/GB] ) : + CPU/CONVERT
+ ( VOL [GB] * 300.000.000 [ns/GB] ) : + MEM/CONVERT
_______________________________________________
45.000.XYZ [ns]
+ 7.660.200.000 [ns/GB] * VOL [GB]
So the resulting performance will get damaged by this adverse effect (as Amdahl's Law shows):
1
S = ------------------------------------------------------------ << 1.00
1 + ( 45.000.XYZ [ns] + 7.660.200.000 [ns/GB] * VOL[GB] )
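To make the impact concrete, here is a minimal sketch in Chapel that plugs an assumed volume into the cost model above; the constants are the rough [ns] estimates from the latency map, not measurements:
// A minimal sketch: evaluate the added overhead of the store/re-read route.
// All constants are the estimated figures from the latency map above.
const volGB   = 1.0;                // assumed data volume in [GB]
const seeksNs = 3 * 15_000_000.0;   // the three fileIO SEEK terms
const perGBNs = 7_660_200_000.0;    // summed CONVERT + STORE + RE-READ [ns/GB]
const addedNs = seeksNs + perGBNs * volGB;
writeln("added overhead ~ ", addedNs / 1e9, " [s] per round-trip");
// ~7.7 [s] of pure overhead for each 1 GB that takes this route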
The functionality you mention isn't directly available in any Chapel module, but you can write extern procs and extern types to access the GMP functions directly.
First we need to be able to work with C files, so let's declare some procs and types for them:
use CTypes; // provides c_ptr and c_string (built in on older Chapel versions)

extern type FILE;                  // opaque C FILE type
extern type FILEptr = c_ptr(FILE); // a FILE* as seen from Chapel
extern proc fopen(filename: c_string, mode: c_string): FILEptr;
extern proc fclose(fp: FILEptr);
Then we can declare the GMP functions we need:
extern proc mpz_out_raw(stream: FILEptr, const op: mpz_t): size_t;
extern proc mpz_inp_raw(ref rop: mpz_t, stream: FILEptr): size_t;
Now we can use them to write a single bigint value:
use BigInteger;
var res: bigint;
res.fac(100); // Compute 100!
writeln("Writing the number: ", res);
var f = fopen("gmp_outfile", "w");
mpz_out_raw(f, res.mpz);
fclose(f);
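As a hedged side note: per the GMP manual, mpz_out_raw() returns the number of bytes written and 0 on error, so the call above can be checked rather than fired blindly:
// Hedged sketch: check mpz_out_raw()'s return value (0 signals an error).
const bytesWritten = mpz_out_raw(f, res.mpz);
if bytesWritten == 0 then
  writeln("mpz_out_raw() failed to write the value");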
Then read it back in from the file:
var readIt: bigint;
f = fopen("gmp_outfile", "r");
mpz_inp_raw(readIt.mpz, f);
fclose(f);
writeln("Read the number:", readIt);
For an array of bigint values, simply loop over it to write or read the elements:
// initialize the array
var A: [1..10] bigint;
for i in 1..10 do
A[i].fac(i);
// write the array to a file
f = fopen("gmp_outfile", "w");
for i in 1..10 do
mpz_out_raw(f, A[i].mpz);
fclose(f);
// read the array back in from the file
var B: [1..10] bigint;
f = fopen("gmp_outfile", "r");
for i in 1..10 do
mpz_inp_raw(B[i].mpz, f);
fclose(f);
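The question also mentions mpz_export()/mpz_import() [3]. Those can be reached the same way; here is a hedged, untested sketch of the extern declarations, following the C prototypes in the GMP manual:
// Hedged sketch: extern declarations for GMP's buffer-based (de)serialization.
// C: void *mpz_export(void *rop, size_t *countp, int order, size_t size,
//                     int endian, size_t nails, const mpz_t op);
extern proc mpz_export(rop: c_void_ptr, ref countp: size_t, order: c_int,
                       size: size_t, endian: c_int, nails: size_t,
                       const op: mpz_t): c_void_ptr;
// C: void mpz_import(mpz_t rop, size_t count, int order, size_t size,
//                    int endian, size_t nails, const void *op);
extern proc mpz_import(ref rop: mpz_t, count: size_t, order: c_int,
                       size: size_t, endian: c_int, nails: size_t,
                       op: c_void_ptr);
These give you raw word buffers in memory, which you could then write with Chapel's own I/O instead of C's FILE, at the cost of managing the buffers and the per-value word counts yourself.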