How does javac process Unicode glyphs?

I tried System.out.println("ñ"); and it printed ñ. Why does javac run without throwing an error?

javac can be configured with a source file encoding. That lets you use non-ASCII characters in character and string literals (and in symbol names!).

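If the configured encoding matches the file's actual encoding, everything works.

If it does not, you may get a compile error, but more likely you will just end up with some corrupted strings.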
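The corruption described above can be reproduced without touching any compiler settings. The following is only an illustrative sketch (the class and method names are made up for this example): it takes the UTF-8 bytes of ñ, as an editor would save them, and decodes them as ISO-8859-1, which is roughly what happens when the compiler is told the wrong source encoding (for example through javac's -encoding option).

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Decodes the UTF-8 bytes of the text as ISO-8859-1,
    // mimicking a source encoding setting that does not match the file.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00F1 written as an escape so this file itself stays pure ASCII.
        String original = "\u00F1";
        String garbled = misread(original);

        System.out.println(garbled.length()); // 2: one character became two
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+00C3, then U+00B1
        }
    }
}
```

No error is raised anywhere: the two bytes are both valid ISO-8859-1 characters, so the string is silently wrong rather than rejected.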

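To print the text back out, the program also needs to know which encoding to use for output. All of this has to be configured correctly (the defaults in Java are not portable), otherwise you will get all kinds of broken text output.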
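One way to avoid depending on the default output encoding is to wrap the stream in a PrintStream with an explicit charset. The sketch below assumes UTF-8 is the encoding the consumer of the output expects; the class and helper names are hypothetical, and whether a terminal then renders the character correctly still depends on the terminal's own settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitOutputEncodingDemo {
    // Prints the text through a PrintStream that uses the given charset
    // and returns the bytes that were actually written.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for a Charset object that already exists.
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Same idea applied to standard output: the encoding is stated explicitly.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("\u00F1");

        System.out.println(encode("\u00F1", StandardCharsets.UTF_8).length);      // 2
        System.out.println(encode("\u00F1", StandardCharsets.ISO_8859_1).length); // 1
    }
}
```

The same character becomes two bytes (C3 B1) in UTF-8 and one byte (F1) in ISO-8859-1, which is why writer and reader have to agree on the encoding.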

Java's char and String are natively UTF-16, so they can handle 'ñ' and "ñ".
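A small sketch of what "natively UTF-16" means in practice (class and method names are invented for illustration): ñ is U+00F1 and fits in a single 16-bit char, while a character outside the Basic Multilingual Plane, such as U+1D11E, occupies two chars (a surrogate pair) in a String.

```java
class Utf16Demo {
    // Number of 16-bit UTF-16 code units in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String enye = "\u00F1";       // U+00F1, one code unit
        String clef = "\uD834\uDD1E"; // U+1D11E, stored as a surrogate pair

        System.out.println(codeUnits(enye) + " " + codePoints(enye)); // 1 1
        System.out.println(codeUnits(clef) + " " + codePoints(clef)); // 2 1
        System.out.println((int) enye.charAt(0));                     // 241
    }
}
```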

JLS-3.1. Unicode says (in part),

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.

JLS-3.2. Lexical Translations expands on this,

A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:

  1. A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the UTF-16 code unit whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.

  2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (§3.4).

  3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3).
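Because step 1 runs before tokenization, a Unicode escape can appear anywhere in the source, not only inside string literals. The following sketch (names chosen purely for illustration) declares a field through an escape and then refers to it by its plain ASCII name; the file contains only ASCII characters, so it compiles the same way regardless of the configured source encoding.

```java
class UnicodeEscapeDemo {
    // The escape in the next line is translated to the letter a before
    // the compiler forms tokens, so this declares a field named a.
    static int \u0061 = 42;

    static int readField() {
        return a; // the same field, referenced by its plain name
    }

    public static void main(String[] args) {
        System.out.println(readField());              // 42
        System.out.println("\u00F1".length());        // 1
        System.out.println((int) "\u00F1".charAt(0)); // 241
    }
}
```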