用小数秒创建时间戳

Question

awk可以用strftime函数生成时间戳，例如

$ awk 'BEGIN {print strftime("%Y/%m/%d %H:%M:%S")}'
2019/03/26 08:50:42

但我需要一个小数秒的时间戳，最好精确到纳秒。 gnu date 可以用 %N 元素做到这一点：

$ date "+%Y/%m/%d %H:%M:%S.%N"
2019/03/26 08:52:32.753019800

但是与调用 strftime 相比，从 awk 中调用 date 的效率相对较低，而且我需要高性能，因为我正在使用 [=13] 处理许多大文件=] 并且需要在处理文件时生成许多时间戳。有没有一种方法 awk 可以有效地生成包含小数秒的时间戳（理想情况下是纳秒，但毫秒也是可以接受的）？

添加我正在尝试执行的示例：

awk -v logFile="$logFile" -v outputFile="$outputFile" '
BEGIN {
   print "[" strftime("%Y%m%d %H%M%S") "] Starting to process " FILENAME "." >> logFile
}
{
    data[] += 
}
END {
    print "[" strftime("%Y%m%d %H%M%S") "] Processed " NR " records." >> logFile
    for (id in data) {
        print id ": " data[id] >> outputFile
    }
}
' oneOfManyLargeFiles

Answer 1

如果你在 Linux，你可以使用 /proc/uptime:

$ cat /proc/uptime 
123970.49 354146.84

获取一些厘秒（第一个值是正常运行时间）并计算开始和任何事情发生之间的时间差：

$ while true ; do echo ping ; sleep 0.989 ; done |        # yes | awk got confusing
awk '
function get_uptime(   a, line) {
    if((getline line < "/proc/uptime") > 0)
        split(line,a," ")
    close("/proc/uptime")
    return a[1]
}
BEGIN {
    basetime=get_uptime()                    
}
{
    if(!wut)                                 # define here the cause
        print get_uptime()-basetime          # calculate time difference
}'

输出：

Answer 2

如果你真的需要亚秒级的计时，那么任何调用外部命令如date或读取外部系统文件如/proc/uptime或/proc/rct都会打败亚秒级精度的目的。这两种情况都需要很多资源来检索请求的信息（即时间）

由于 OP 已经使用 GNU awk，您可以使用动态扩展。动态扩展是一种通过实现用 C 或 C++ 编写的新函数并用 gawk 动态加载它们来向 awk 添加新功能的方法。 GNU awk manual.

中详细记录了如何编写这些函数

幸运的是，GNU awk 4.2.1 自带一组默认的动态库，可以随意加载。这些库之一是具有两个简单函数的 time 库：

the_time = gettimeofday() Return the time in seconds that has elapsed since 1970-01-01 UTC as a floating-point value. If the time is unavailable on this platform, return -1 and set ERRNO. The returned time should have sub-second precision, but the actual precision may vary based on the platform. If the standard C gettimeofday() system call is available on this platform, then it simply returns the value. Otherwise, if on MS-Windows, it tries to use GetSystemTimeAsFileTime().

result = sleep(seconds) Attempt to sleep for seconds seconds. If seconds is negative, or the attempt to sleep fails, return -1 and set ERRNO. Otherwise, return zero after sleeping for the indicated amount of time. Note that seconds may be a floating-point (nonintegral) value. Implementation details: depending on platform availability, this function tries to use nanosleep() or select() to implement the delay.

_{source: GNU awk manual}

现在可以以相当直接的方式调用此函数：

awk '@load "time"; BEGIN{printf "%.6f", gettimeofday()}'
1553637193.575861

为了证明此方法比更经典的实现更快，我使用 gettimeofday():

对所有 3 个实现进行了计时

awk '@load "time"
     function get_uptime(   a) {
        if((getline line < "/proc/uptime") > 0)
        split(line,a," ")
        close("/proc/uptime")
        return a[1]
     }
     function curtime(    cmd, line, time) {
        cmd = "date 7+%Y/%m/%d %H:%M:%S.%N7"
        if ( (cmd | getline line) > 0 ) {
           time = line
        }
        else {
           print "Error: " cmd " failed" | "cat>&2"
        }
        close(cmd)
        return time
      }
      BEGIN{
        t1=getimeofday(); curtime(); t2=gettimeofday();
        print "curtime()",t2-t1
        t1=getimeofday(); get_uptime(); t2=gettimeofday();
        print "get_uptime()",t2-t1
        t1=getimeofday(); gettimeofday(); t2=gettimeofday();
        print "gettimeofday()",t2-t1
      }'

输出：

curtime() 0.00519109
get_uptime() 7.98702e-05
gettimeofday() 9.53674e-07

虽然很明显 curtime() 在加载外部二进制文件时是最慢的，但令人吃惊的是 awk 在处理额外的外部 /proc/ 文件时速度快得惊人。

用小数秒创建时间戳

Create timestamp with fractional seconds

awk

timestamp

strftime