Detecting encoding before opening a file

Question

I have been given a file whose character encoding is unknown. Running file -bi test.trace returns text/plain; charset=us-ascii, but using

(with-open-file (stream "/home/*/test.trace" :external-format :us-ascii)
 (code-to-work-with-file))

gives me an exception:

:ASCII stream decoding error on
#<SB-SYS:FD-STREAM for "file /home/*/test.trace" {10208D2723}>:

  the octet sequence #(194) cannot be decoded.
   [Condition of type SB-INT:STREAM-DECODING-ERROR]

How can I detect a file's encoding before opening it?

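I can open the file with emacs, less and nano, so this seems to be either a wrong detection of the encoding or a difference between what file and sbcl think the encoding should be.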

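I currently avoid the problem by forcing every file to UTF-8 with vim +set nobomb | set fenc=utf8| x file-path. But even after that, file still thinks it is us-ascii. Besides, this is not a viable permanent solution, just a dirty hack to make it work.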
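One way to approach the question is to look at the raw bytes before picking an :external-format. The following is a minimal heuristic sketch, not a real charset detector: the function names (read-file-octets, valid-utf-8-p, first-non-ascii-position, guess-external-format) are made up for illustration, the whole file is read into memory, and the keywords returned are assumed to be SBCL external-format names. It reports :us-ascii only if every byte is below 128, :utf-8 if the bytes are well-formed UTF-8, and otherwise falls back to :latin-1.

```lisp
;;; Sketch: guess an external format by inspecting a file's raw octets.
;;; Heuristic only: many byte sequences are valid in several encodings.

(defun read-file-octets (path)
  "Return the whole contents of PATH as a vector of (unsigned-byte 8)."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      octets)))

(defun valid-utf-8-p (octets)
  "True if OCTETS has well-formed UTF-8 structure (a lead byte followed by
the right number of 10xxxxxx continuation bytes).  Overlong three/four-byte
forms and encoded surrogates are not rejected."
  (let ((i 0)
        (n (length octets)))
    (loop while (< i n)
          do (let* ((b (aref octets i))
                    (extra (cond ((< b #x80) 0)          ; ASCII
                                 ((<= #xC2 b #xDF) 1)    ; 2-byte sequence
                                 ((<= #xE0 b #xEF) 2)    ; 3-byte sequence
                                 ((<= #xF0 b #xF4) 3)    ; 4-byte sequence
                                 (t (return-from valid-utf-8-p nil)))))
               ;; The sequence must not run past the end of the data.
               (when (>= (+ i extra) n)
                 (return-from valid-utf-8-p nil))
               ;; Each continuation byte must look like 10xxxxxx.
               (loop for j from (1+ i) to (+ i extra)
                     unless (= (logand (aref octets j) #xC0) #x80)
                       do (return-from valid-utf-8-p nil))
               (incf i (1+ extra))))
    t))

(defun first-non-ascii-position (octets)
  "Index of the first octet >= 128, or NIL if the data is pure ASCII."
  (position-if (lambda (b) (>= b #x80)) octets))

(defun guess-external-format (path)
  "Return :US-ASCII, :UTF-8 or, as a catch-all, :LATIN-1."
  (let ((octets (read-file-octets path)))
    (cond ((null (first-non-ascii-position octets)) :us-ascii)
          ((valid-utf-8-p octets) :utf-8)
          ;; Latin-1 maps every byte to a character, so it never fails
          ;; to decode, but the characters it produces may be wrong.
          (t :latin-1))))
```

Usage would be along the lines of (with-open-file (stream path :external-format (guess-external-format path)) ...), and first-non-ascii-position on the octets shows where an offending byte such as 194 sits in the file.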

Answer 1

As the Programmers Stack Exchange answer here points out:

Files generally indicate their encoding with a file header. There are many examples here. However, even reading the header you can never be sure what encoding a file is really using.
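Following the quoted advice about headers, here is a minimal sketch that reads the first four bytes and looks for a Unicode byte order mark (BOM). The function name detect-bom is illustrative, and the returned keywords are assumed to be SBCL-style external-format names (:utf-8, :utf-16le, :utf-32be, ...). A file without a BOM returns NIL, which is exactly the case where the header tells you nothing.

```lisp
;;; Sketch: identify a Unicode byte order mark (BOM) at the start of a file.
;;; The keyword names are assumed to match SBCL's external-format names.

(defun detect-bom (path)
  "Return two values, an external format and the BOM length in octets,
or NIL if PATH does not start with a recognised BOM."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let* ((header (make-array 4 :element-type '(unsigned-byte 8)
                                 :initial-element 0))
           (count (read-sequence header in)))
      (flet ((starts-with-p (&rest bytes)
               ;; EVERY stops at the shorter sequence, so only the
               ;; first (LENGTH BYTES) octets of HEADER are compared.
               (and (>= count (length bytes))
                    (every #'= bytes header))))
        ;; UTF-32LE must be tested before UTF-16LE: both begin with FF FE.
        (cond ((starts-with-p #xEF #xBB #xBF) (values :utf-8 3))
              ((starts-with-p #xFF #xFE 0 0) (values :utf-32le 4))
              ((starts-with-p 0 0 #xFE #xFF) (values :utf-32be 4))
              ((starts-with-p #xFF #xFE) (values :utf-16le 2))
              ((starts-with-p #xFE #xFF) (values :utf-16be 2))
              (t nil))))))
```

A BOM is optional and UTF-8 files commonly have none, so a NIL result says nothing about the encoding. When a BOM is found, the caller may still need to skip it after opening the file as text.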

I looked for trace files on my system and found this one, but there is nothing interesting in it:

2016-06-22 13:10:07 ☆ |ruby-2.2.3@laguna| Antonios-MacBook-Pro in ~/learn/lisp/Whosebug/scripts
○ → file -I resources/hello.trace
resources/hello.trace: text/plain; charset=us-ascii

2016-06-22 13:11:50 ☆ |ruby-2.2.3@laguna| Antonios-MacBook-Pro in ~/learn/lisp/Whosebug/scripts
○ → cat resources/hello.trace
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }

So with this code I can read it:

CL-USER> (with-open-file (in "/Users/toni/learn/lisp/Whosebug/scripts/resources/hello.trace" :external-format :us-ascii)
           (when in
             (loop for line = (read-line in nil)
                   while line do (format t "~a~%" line))))
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
NIL

The file could even contain Chinese or anything else. We can look at which character a code such as 194 stands for like this:

CL-USER> (format nil "~{~C~}" (mapcar #'code-char '(194)))
"Â"

or any other strange character. So it looks like it could be an accented character, and I added one to the file:

println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
Â
patatopita

I get the same error:

:ASCII stream decoding error on
#<SB-SYS:FD-STREAM for "file /Users/toni/learn/lisp/Whosebug/scripts/resources/hello.trace" {1003994043}>:

  the octet sequence #(195) cannot be decoded.
   [Condition of type SB-INT:STREAM-DECODING-ERROR]
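To see where octets like 195 (and the 194 in the question) come from, one can ask SBCL which octets a character turns into under different encodings. This sketch relies on the SBCL-specific functions sb-ext:string-to-octets and sb-ext:octets-to-string; the helper names octets-of and decode-octets are illustrative. Characters are built with code-char so the example does not depend on the source file's own encoding.

```lisp
;;; SBCL-specific: sb-ext:string-to-octets / sb-ext:octets-to-string
;;; convert between strings and raw octets in a given external format.

(defun octets-of (char external-format)
  "Return the octets that CHAR is encoded as in EXTERNAL-FORMAT, as a list."
  (coerce (sb-ext:string-to-octets (string char)
                                   :external-format external-format)
          'list))

(defun decode-octets (octets external-format)
  "Decode the list OCTETS into a string using EXTERNAL-FORMAT."
  (sb-ext:octets-to-string
   (make-array (length octets) :element-type '(unsigned-byte 8)
                               :initial-contents octets)
   :external-format external-format))

;; U+00C2 (A with circumflex) is two octets in UTF-8, and the first
;; of them is 195.
(octets-of (code-char #xC2) :utf-8)     ; => (195 130)
;; In Latin-1 the same character is the single octet 194.
(octets-of (code-char #xC2) :latin-1)   ; => (194)
;; Every character from U+0080 to U+00BF starts with octet 194 in UTF-8,
;; e.g. U+00AC NOT SIGN:
(octets-of (code-char #xAC) :utf-8)     ; => (194 172)
;; Reading those two UTF-8 octets as Latin-1 gives two characters,
;; U+00C2 followed by U+00AC.
(decode-octets '(194 172) :latin-1)
```

So 194 and 195 are typical UTF-8 lead bytes, which suggests the file is UTF-8 rather than ASCII, and it shows one way a single character can come out as something like Â¬ when UTF-8 data is read as latin-1.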

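At this point you can use conditions and restarts to replace the character. I'm not an expert in this kind of code, but these are the restarts on offer: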

Restarts:
 0: [ATTEMPT-RESYNC] Attempt to resync the stream at a character boundary and continue.
 1: [FORCE-END-OF-FILE] Force an end of file.
 2: [INPUT-REPLACEMENT] Use string as replacement input, attempt to resync at a character boundary and continue.
 3: [*ABORT] Return to SLIME's top level.
 4: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {10050E0003}>)

The one to use is INPUT-REPLACEMENT. If that is not enough, try a European character set such as latin-1 or ISO ...:

CL-USER> (with-open-file (in "/Users/toni/learn/lisp/Whosebug/scripts/resources/hello.trace" :external-format :latin-1)
           (when in
             (loop for line = (read-line in nil)
                   while line do (format t "~a~%" line))))
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
Â¬
patatopita
NIL
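Instead of choosing INPUT-REPLACEMENT by hand in the debugger, the restart can be invoked from code with handler-bind. This is a sketch for SBCL only, and it assumes the restart takes the replacement string as its single argument, as its description "Use string as replacement input" suggests. The restart is looked up by its name string so the code does not depend on which SBCL package interns the restart symbol; read-lines-with-replacement and find-restart-by-name are illustrative names.

```lisp
;;; Sketch (SBCL-specific): read a file line by line, answering each
;;; decoding error with the INPUT-REPLACEMENT restart instead of
;;; stopping in the debugger.

(defun find-restart-by-name (name condition)
  "Return the first active restart for CONDITION whose symbol name is NAME."
  (find-if (lambda (restart)
             (let ((sym (restart-name restart)))
               (and sym (string= (symbol-name sym) name))))
           (compute-restarts condition)))

(defun read-lines-with-replacement (path &key (external-format :us-ascii)
                                               (replacement "?"))
  "Return the lines of PATH as a list, substituting REPLACEMENT for
undecodable input."
  (with-open-file (in path :external-format external-format)
    (handler-bind ((sb-int:stream-decoding-error
                     (lambda (condition)
                       (let ((restart (find-restart-by-name
                                       "INPUT-REPLACEMENT" condition)))
                         ;; If the restart is missing, decline, so the
                         ;; error reaches the debugger as before.
                         (when restart
                           (invoke-restart restart replacement))))))
      (loop for line = (read-line in nil)
            while line
            collect line))))
```

Depending on how the stream resyncs, a multi-byte character may turn into more than one replacement string. Recent SBCL versions also accept an external format with a replacement character, such as '(:utf-8 :replacement #\?), which avoids the handler altogether; the External Formats section of the SBCL manual for your version is the place to check.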

That should work. Good luck!

So let's read it with a European character set.
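Reading with latin-1 always succeeds because latin-1 maps every byte to some character. A sketch that automates the trial and error: try UTF-8 first and fall back to latin-1 only if SBCL signals the decoding error seen in the question. read-file-with-fallback is an illustrative name, and the default order (:utf-8, then :latin-1) is an assumption that suits files which are mostly UTF-8.

```lisp
;;; Sketch (SBCL-specific condition type): try each external format in
;;; turn and keep the first one that decodes the whole file.

(defun read-file-with-fallback (path &optional
                                       (external-formats '(:utf-8 :latin-1)))
  "Return two values: the file contents as a string and the format used."
  (dolist (ef external-formats
              (error "None of ~S could decode ~A" external-formats path))
    (handler-case
        (with-open-file (in path :external-format ef)
          (return
            (values (with-output-to-string (out)
                      ;; Copy character by character so the text, including
                      ;; a missing final newline, is preserved exactly.
                      (loop for char = (read-char in nil)
                            while char
                            do (write-char char out)))
                    ef)))
      ;; Undecodable octets: give up on this format and try the next one.
      (sb-int:stream-decoding-error () nil))))
```

Calling (read-file-with-fallback "resources/hello.trace") would return the text plus the format that worked; if the second value is :latin-1, the characters deserve some suspicion, since latin-1 decodes anything.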


sbcl

common-lisp

character-encoding

file-handling