elisp implementation of the "uniq -c" Unix command to count unique lines

If the region contains this data:

flower
park
flower
stone
flower
stone
stone
flower

M-x some-command should give me, in a different buffer:

4 flower
3 stone
1 park

This data could then be sorted by frequency or by item.

It is similar to uniq -c in bash.

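So why not just use uniq -c?

With the region highlighted, M-| "sort | uniq -c" runs that command on the current region. The result is shown in the echo area and is also logged in the *Messages* buffer. Adding a prefix arg inserts the result into the current buffer, replacing the region.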
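If you want the result in its own buffer without typing the pipeline each time, the same idea can be wrapped in a command. This is only a rough sketch: the name my-uniq-c-shell and the buffer name *uniq-c-shell* are made up for illustration, and it assumes sort and uniq are available on your PATH. The trailing "sort -rn" is an addition that orders the counts by frequency.

```elisp
(defun my-uniq-c-shell (beg end)
  "Run `sort | uniq -c' on the region and show the counts in another buffer.
Hypothetical helper; assumes `sort' and `uniq' are on the PATH."
  (interactive "r")
  (let ((out (get-buffer-create "*uniq-c-shell*")))
    ;; The fourth argument names the buffer that receives the output,
    ;; so the region itself is left untouched.
    (shell-command-on-region beg end "sort | uniq -c | sort -rn" out)
    (display-buffer out)))
```

Drop the final "| sort -rn" to get the counts ordered by item instead of by frequency.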

I suppose a common approach is to hash the strings and then print the contents of the table. This is easy to do in Emacs.

;; See the emacs manual for creating a hash table test
;; https://www.gnu.org/software/emacs/manual/html_node/elisp/Defining-Hash.html
(defun case-fold-string= (a b)
  (eq t (compare-strings a nil nil b nil nil t)))
(defun case-fold-string-hash (a)
  (sxhash (upcase a)))

(define-hash-table-test 'case-fold
  'case-fold-string= 'case-fold-string-hash)

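(defun uniq (beg end)
  "Print counts of the lines in the region, ignoring case.
With a prefix argument, insert the counts at point instead."
  (interactive "r")
  (let ((h (make-hash-table :test 'case-fold))
        ;; Split the region into lines, dropping empty lines and
        ;; trimming a leading or trailing space from each one.
        (lst (split-string (buffer-substring-no-properties beg end) "\n"
                           'omit-nulls " "))
        (output-func (if current-prefix-arg 'insert 'princ)))
    ;; Count each line; a key that is not in the table yet starts at 0.
    (dolist (str lst)
      (puthash str (1+ (gethash str h 0)) h))
    (maphash (lambda (key val)
               (funcall output-func (format "%d: %s\n" val key)))
             h)))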

Output when the text from the question is selected:

4: flower
1: park
3: stone
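The question also asks for the result in a different buffer, sorted by frequency or by item. A minimal sketch of that, assuming case-sensitive comparison with a plain `equal' hash table rather than the case-folding test above; my-line-counts, my-uniq-sorted and the *uniq-counts* buffer name are hypothetical names chosen for this example:

```elisp
(defun my-line-counts (text)
  "Return an alist of (LINE . COUNT) for the lines of TEXT, most frequent first.
Hypothetical helper; lines are compared case-sensitively and empty
lines are ignored."
  (let ((table (make-hash-table :test 'equal))
        (result nil))
    (dolist (line (split-string text "\n" t))
      (puthash line (1+ (gethash line table 0)) table))
    ;; Collect the table into an alist so that it can be sorted.
    (maphash (lambda (line count) (push (cons line count) result)) table)
    ;; Sort by descending count.
    (sort result (lambda (a b) (> (cdr a) (cdr b))))))

(defun my-uniq-sorted (beg end)
  "Show line counts for the region in the *uniq-counts* buffer."
  (interactive "r")
  (let ((counts (my-line-counts (buffer-substring-no-properties beg end))))
    (with-current-buffer (get-buffer-create "*uniq-counts*")
      (erase-buffer)
      (dolist (entry counts)
        (insert (format "%d %s\n" (cdr entry) (car entry))))
      (pop-to-buffer (current-buffer)))))
```

To sort by item rather than by frequency, the predicate could compare the lines instead, for example (string< (car a) (car b)).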

I suppose there are several ways you could approach this. Here is a fairly simple one:

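(defun uniq-c (beginning end)
  "Count occurrences of each line in the region, like M-| uniq -c.
The result is shown in a new *uniq-c* buffer.  Unlike uniq, duplicate
lines do not have to be adjacent."
  (interactive "r")
  (let ((source (current-buffer))
        (dest (generate-new-buffer "*uniq-c*"))
        (case-fold-search nil))         ; match lines case-sensitively
    ;; Work on a copy of the region so the source buffer is untouched.
    (set-buffer dest)
    (insert-buffer-substring source beginning end)
    (goto-char (point-min))
    (while (let* ((line (buffer-substring (line-beginning-position)
                                          (line-end-position)))
                  (pattern (concat "^" (regexp-quote line) "$"))
                  ;; Count this line and every later copy of it.
                  (count (count-matches pattern (point) (point-max))))
             (insert (format "%d " count))
             (forward-line 1)
             ;; Delete the later copies, which are already counted.
             (flush-lines pattern)
             (not (eobp))))
    (pop-to-buffer dest)))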