Is there a ready-to-use library or tool for unsupervised pattern discovery across multiple strings?
strace is a command that traces system calls and signals. Example output:
poll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=30, events=POLLIN}, {fd=31, events=POLLIN}, {fd=104, events=POLLIN}], 5, 11) = 0 (Timeout)
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f946e0c56e8, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f946e0c5698, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=30, events=POLLIN}, {fd=31, events=POLLIN}, {fd=104, events=POLLIN}], 5, 0) = 0 (Timeout)
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f946e0c56e8, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f946e0c5698, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(30, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
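The output is highly patterned. Is there any ready-made software that can read the input above and identify the patterns without supervision, for example: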
====================================================
Patterns
====================================================
Pattern 1 (P1): {fd=, events=}
P2: P1
where =POLLIN
P3: [P2a, P2b, P2c, P2d, P2e]
where a.=, b.=, c.=, d.=, e.=
P4: poll(P3, , ) = 0 (Timeout)
where P3.= P3.= P3.= P3.= P3.=
P6: recvmsg(, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
P7: futex(, FUTEX_WAKE_PRIVATE, ) = 1
P8:
P4a
P6b
P7c
P7d
P6e
P6f
where a.= a.= ...
====================================================
Output
====================================================
P8 where =11, =12, ...
P8 where =11, =12, ...
Is there an existing, ready-to-use implementation of this kind of unsupervised pattern discovery?
Compression algorithms are a good example of this. Look into the theory and implementations of deflate, gzip, xz, and lzma.
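As a rough illustration of why compressors are relevant here, the sketch below rebuilds the twelve strace lines from the question and compresses them with Python's standard zlib (deflate) and lzma modules. The helper names block and ratios are made up for this example. A small ratio only shows that the compressor found a lot of repetition to exploit; it does not print named templates like P1 to P8, so treat this as a starting point rather than a finished pattern miner.

```python
import lzma
import zlib

# Line templates copied from the strace sample in the question.
POLL = ("poll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, "
        "{fd=30, events=POLLIN}, {fd=31, events=POLLIN}, "
        "{fd=104, events=POLLIN}], 5, %d) = 0 (Timeout)")
RECV = ("recvmsg(%d, {msg_namelen=0}, 0) = -1 EAGAIN "
        "(Resource temporarily unavailable)")
FUTEX = "futex(%s, FUTEX_WAKE_PRIVATE, %d) = 1"


def block(timeout):
    """Hypothetical helper: one six-line group as it appears in the sample."""
    return [
        POLL % timeout,
        RECV % 30,
        FUTEX % ("0x7f946e0c56e8", 2147483647),
        FUTEX % ("0x7f946e0c5698", 1),
        RECV % 30,
        RECV % 31,
    ]


# The two groups in the question differ only in the poll timeout (11 vs 0).
sample = "\n".join(block(11) + block(0)).encode("ascii")


def ratios(data):
    """Hypothetical helper: compressed size divided by original size."""
    return {
        "zlib": len(zlib.compress(data, 9)) / len(data),
        "lzma": len(lzma.compress(data)) / len(data),
    }


if __name__ == "__main__":
    for name, ratio in ratios(sample).items():
        print(f"{name}: {ratio:.2f} of original size")
```

Deflate and LZMA both work by replacing repeated byte sequences with back-references to earlier occurrences, which is the same kind of redundancy the P1 to P8 patterns describe. Reading how those back-references are found is where the theory mentioned above becomes useful.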