Efficient shift invariant feature transform for text

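There are three continuous bit streams. You start reading all of them at the same time; after a while you stop, and you now have three very long strings of equal length.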

Somewhere in the middle, each of these three strings should contain the transmitted message. Apart from the message, the bits are random.

The objective now is to figure out how to overlay the three strings so that further error correction can be performed.

hfkasjkfhjs<<this is a string><hjaksdf
jkdf::this is b strimg>>iowefjlasfjoie
jfaskflsjdflf<<this is a  tring>>oweio

This is a simple example. What I want now is this:

<<this is a string><
::this is b string>>
<<this is a  tring>>

Now I can simply take a majority vote and obtain the correct sequence:

<<this is a string>>
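The majority vote itself is the easy part once the strings are aligned. A minimal Python sketch (not from the original post), assuming the inputs are already aligned column by column; the names `majority_vote` and `fill` are illustrative:

```python
from collections import Counter


def majority_vote(*aligned: str, fill: str = " ") -> str:
    """Per-column majority vote over already-aligned strings.

    A column yields its most common character only if that character
    occurs in more than half of the strings; otherwise it yields `fill`.
    zip() stops at the shortest string.
    """
    result = []
    for column in zip(*aligned):
        ch, count = Counter(column).most_common(1)[0]
        result.append(ch if 2 * count > len(column) else fill)
    return "".join(result)


print(majority_vote("<<this is a string><",
                    "::this is b string>>",
                    "<<this is a  tring>>"))  # <<this is a string>>
```

With three strings, "more than half" means at least two agree, which is the same rule the `cond` in the answer below applies.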

How can I achieve this efficiently?

Exploratory programming in TXR Lisp:

Contents of fuzz-extract.tl:

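; Widen every X by one position on each side: an output character is X
; if the 3-character window around it (padded with spaces at the ends)
; contains an X. Gaps of one or two mismatches between matches are bridged.
(defun fuzz (str)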
  (window-map 1 "  "
              (do if (memql #\X @rest)
                #\X #\space)
              str))

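; Try every cyclic rotation of str2 against str1, marking agreeing
; positions with X. At the first rotation whose fuzzed marks contain a
; run of at least THRESH X's, return that run's span of str1, widened by
; two characters on each side. Returns nil if no rotation qualifies.
(defun correlate (str1 str2 thresh)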
  (let ((len (length str1))
        (pat (mkstring thresh #\X)))
    (each ((offs (range* 0 len)))
      (let* ((str2-shf `@[str2 offs..:]@[str2 0..offs]`)
             (str2-dshf `@{str2-shf}@{str2-shf}`)
             (raw-diff (mapcar [iff eql (ret #\X) (ret #\space)]
                               str1 str2-dshf))
             (diff (fuzz raw-diff))
             (pos (search-str diff pat)))
        (if pos
          (let ((rng (+ (r^ #/X+/ pos diff) #R(-2 2))))
            (if (< (from rng) 0)
              (set rng 0..(to rng)))
            (return-from correlate [str1 rng])))))))

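; Count the positions where LIT-S agrees with BIG-S when LIT-S is laid
; over BIG-S starting at offset OFFS.
(defun count-same (big-s lit-s offs)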
  (countq t [mapcar eql [big-s offs..:] lit-s]))

; Return the offset at which LIT-S agrees with BIG-S in the most positions.
(defun find-off (big-s lit-s)
  (let ((idx-count-pairs (collect-each ((i (range 0 (- (length big-s)
                                                       (length lit-s)))))
                           (list i (count-same big-s lit-s i)))))
    (first [find-max idx-count-pairs : second])))

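; Cut out the approximate message region of each string by pairwise
; correlation, align the second and third regions against a space-padded
; copy of the first, then take a per-character majority vote (a space
; where all three characters differ).
(defun extract-from-three (str1 str2 str3 : (thresh 10))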
  (let* ((ss1 (correlate str1 str2 thresh))
         (ss2 (correlate str2 str3 thresh))
         (ss3 (correlate str3 str1 thresh))
         (maxlen [[mapf max length length length] ss1 ss2 ss3])
         (pad (mkstring (trunc maxlen 2) #\space))
         (buf1 `@pad@ss1@pad`)
         (off1 (find-off buf1 ss2))
         (buf2 `@{"" off1}@ss2`)
         (off2 (find-off buf1 ss3))
         (buf3 `@{"" off2}@ss3`))
    (mapcar (do cond
              ((eql @1 @2) @1)
              ((eql @2 @3) @2)
              ((eql @3 @1) @3)
              (t #\space))
            buf1 buf2 buf3)))
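For readers who do not use TXR, the rotation search done by `fuzz` and `correlate` above might be sketched in Python roughly as follows. This is a hand port for illustration, assuming (as the question states) that both strings have the same length; it is not part of the original answer. The test strings are made up so that the correct alignment is the only one with any matches.

```python
def fuzz(marks: str) -> str:
    """Dilate X marks by one position on each side (window radius 1).

    Gaps of one or two non-X characters between X's are filled in,
    so a few isolated mismatches do not break a run of matches.
    """
    padded = " " + marks + " "
    return "".join("X" if "X" in padded[i:i + 3] else " "
                   for i in range(len(marks)))


def correlate(s1: str, s2: str, thresh: int = 10):
    """Rotate s2 through every offset against s1 (equal lengths assumed).

    At the first offset whose fuzzed agreement marks contain a run of at
    least `thresh` X's, return the slice of s1 covering that run, widened
    by two characters on each side. Returns None if no offset qualifies.
    """
    for offs in range(len(s1)):
        rotated = s2[offs:] + s2[:offs]          # cyclic shift of s2
        raw = "".join("X" if a == b else " " for a, b in zip(s1, rotated))
        diff = fuzz(raw)
        start = diff.find("X" * thresh)
        if start >= 0:
            end = start
            while end < len(diff) and diff[end] == "X":
                end += 1                         # extend to the end of the run
            return s1[max(start - 2, 0):end + 2]
    return None


s1 = "1111" + "ABCDEFGHIJKLMN" + "11"
s2 = "22" + "ABCDEFGHIJKLMN" + "2222"
print(correlate(s1, s2))  # 111ABCDEFGHIJKLMN11
```

Each rotation costs O(n), so the search is O(n²) in the string length; it stops at the first rotation that produces a long enough run rather than scoring every rotation.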

Interactive session:

$ txr -i fuzz-extract.tl
1> (extract-from-three
     "hfkasjkfhjs<<this is a string><hjaksdf"
     "jkdf::this is b strimg>>iowefjlasfjoie"
     "jfaskflsjdflf<<this is a  tring>>oweio")
"             f<<this is a string>>  "
2> (trim-str *1)
"f<<this is a string>>"
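The alignment step (`count-same` plus `find-off`) is a brute-force search over all offsets that keeps the one with the most agreeing characters. A rough Python equivalent, for illustration only; `align_to` is a hypothetical helper intended to mirror the space padding that `extract-from-three` builds for `buf2` and `buf3`, and the sketch assumes the padded buffer is at least as long as the string being placed:

```python
def count_same(big: str, lit: str, offs: int) -> int:
    """Number of positions where `lit`, laid over `big` at `offs`, agrees."""
    return sum(a == b for a, b in zip(big[offs:], lit))


def find_off(big: str, lit: str) -> int:
    """Offset in 0..len(big)-len(lit) (inclusive) maximising count_same.

    Assumes len(big) >= len(lit); max() keeps the first (smallest)
    offset when several offsets tie.
    """
    offsets = range(len(big) - len(lit) + 1)
    return max(offsets, key=lambda i: count_same(big, lit, i))


def align_to(big: str, lit: str) -> str:
    """Left-pad `lit` with spaces so it lines up with its best match in `big`."""
    return " " * find_off(big, lit) + lit


big = "  hello world  "
print(find_off(big, "lo wxr"))        # 5
print(repr(align_to(big, "lo wxr")))  # '     lo wxr'
```

Scoring every offset costs O(len(big) · len(lit)); that is acceptable here because `correlate` has already cut each string down to a short region around the message.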