An obscure one: Documented VT100 'soft-wrap' escape sequence?

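When connected to a remote BASH session over SSH (with the terminal type set to vt100), the console command line soft-wraps when the cursor reaches column 80.

What I am trying to find out is whether the <space><carriage return> sequence that is sent at this point is documented anywhere.

For example, sending the following string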

    std::string str = "0123456789"   // 1
                      "0123456789"
                      "0123456789"   // 3
                      "0123456789"
                      "0123456789"   // 5
                      "012345678 9"
                      "0123456789_"  // 7
                      "0123456789"
                      "0";

results in the following response from the host (Linux Mint, as it happens)

01234567890123456789012345678901234567890123456789012345678<WS><WS><CR>90123456789_01234567890

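The behaviour you observe is not really part of bash; rather, it is part of the behaviour of the readline library. It does not happen if you simply use echo (which is a bash builtin) to output enough text to force an automatic line wrap, nor does it happen if bash produces an error message that is wider than the console. (Try, for example, the command . with an argument of more than 80 characters that does not correspond to any existing file.)

So it is not an official "soft-wrap sequence", nor is it part of any standard. Rather, it is a pragmatic solution to one of the many irritating problems related to console display management.

There is an ambiguity in how terminals implement line wrapping:

  1. The terminal wraps immediately after a character is inserted at the rightmost position.

  2. The terminal wraps just before the next character is sent.

As a result, it is not possible to reliably send a newline after the last column position. If the terminal has already wrapped (option 1 above), the newline creates an extra blank line. Otherwise (option 2), the following newline is "eaten".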
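
The difference between the two options can be illustrated with a toy model. This is only a sketch: `ToyTerminal`, `WrapStyle` and their members are hypothetical names invented for this illustration, and the model tracks nothing but the cursor position.

```cpp
#include <cassert>
#include <string>

// Toy model only: tracks where the cursor ends up, not what is displayed.
enum class WrapStyle {
    Immediate,  // option 1: wrap as soon as the last column is filled
    Deferred    // option 2: wrap only when the next printable character arrives
};

struct ToyTerminal {
    int width;
    WrapStyle style;
    int row = 0;
    int col = 0;                // 0-based cursor column
    bool wrap_pending = false;  // Deferred only: last column filled, wrap not done yet

    ToyTerminal(int w, WrapStyle s) : width(w), style(s) { assert(w > 0); }

    void put(char c) {
        if (c == '\r') { col = 0; wrap_pending = false; return; }
        if (c == '\n') { ++row; wrap_pending = false; return; }
        if (wrap_pending) {     // perform the deferred wrap before printing
            ++row;
            col = 0;
            wrap_pending = false;
        }
        if (col < width - 1) { ++col; return; }
        // The character landed in the last column.
        if (style == WrapStyle::Immediate) {
            ++row;
            col = 0;
        } else {
            wrap_pending = true;
        }
    }

    void write(const std::string& s) {
        for (char c : s) put(c);
    }
};
```

Writing one full row followed by CR LF leaves the `Immediate` model on row 2, with a blank row in between, and the `Deferred` model on row 1. The same bytes produce two different layouts, which is why the sender cannot rely on an explicit newline at the margin.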

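These days, almost all terminals follow some variant of option 2, which was the behaviour of the DEC VT-100 terminal. In the vocabulary of the terminfo terminal description database, this is called xenl: the "eat-newline-glitch".

There are actually two possible subvariants of option 2. In the one actually implemented by the VT-100 (and xterm), the cursor ends up in an anomalous state at the end of the line; effectively, it is one character position off the screen, so you can still backspace the cursor within the same line. Other historic terminals "ate" the newline, but positioned the cursor at the beginning of the next line anyway, so that a backspace would not be possible. (Unless the terminal has the bw capability.)

This creates a problem for programs that need to track the cursor position accurately, even for apparently simple applications such as echoing input. (Obviously, the easiest way to echo input is to let the terminal do it itself, but that precludes implementing extra control characters such as tab completion.) Suppose the user has entered text right up to the right margin, and then types the backspace character to delete the last character typed. Normally, you could implement a backspace-delete by outputting the cub1 (move left 1) code followed by el (clear to end of line). (It is more complicated if the deletion is in the middle of a line, but the principle is the same.)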
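
As a minimal sketch of that backspace-delete, assuming the usual vt100 values for the two capabilities (cub1 as a backspace character and el as ESC [ K, with padding omitted) rather than a terminfo lookup; `erase_last_char` is a hypothetical helper, not part of readline:

```cpp
#include <string>

// Assumed vt100 values; a real program would look these up via terminfo.
const std::string kCub1 = "\b";      // cub1: move the cursor left one column
const std::string kEl = "\x1b[K";    // el: clear from the cursor to end of line

// Hypothetical helper: removes the last character from the edit buffer and
// returns the bytes to send to the terminal. This is only valid while the
// cursor is known to sit just after the last character on the same screen line.
std::string erase_last_char(std::string& line) {
    if (line.empty()) return "";
    line.pop_back();
    return kCub1 + kEl;
}
```

The precondition in the comment is the whole difficulty: at the right margin the program cannot be sure it holds.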

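However, if the cursor could possibly be at the beginning of the next line, this won't work. If you knew that the cursor was at the beginning of the next line, you could move up and then to the right before doing the el, but that wouldn't work if the cursor was still on the same line.

Historically, what was considered "correct" was to force the cursor to the next line with a hard return. (The following quote is taken from the file terminfo.src in the ncurses distribution. I don't know who wrote it or when):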

# Note that the <xenl> glitch in vt100 is not quite the same as on the Concept,
# since the cursor is left in a different position while in the
# weird state (concept at beginning of next line, vt100 at end
# of this line) so all versions of vi before 3.7 don't handle
# <xenl> right on vt100. The correct way to handle <xenl> is when
# you output the char in column 80, immediately output CR LF
# and then assume you are in column 1 of the next line. If <xenl>
# is on, am should be on too.
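
The advice in that comment can be sketched as follows. This is an illustration under stated assumptions, not code from vi or ncurses: `emit_with_hard_wrap`, `width` and `start_col` (the number of columns already used on the line, for example by a prompt) are hypothetical names.

```cpp
#include <cassert>
#include <string>

// After the character that fills the last column, send CR LF and treat the
// cursor as being in column 1 of the next line.
std::string emit_with_hard_wrap(const std::string& text, int width, int start_col = 0) {
    assert(width > 0 && start_col >= 0 && start_col < width);
    std::string out;
    int col = start_col;       // 0-based column the next character will occupy
    for (char c : text) {
        out += c;
        if (++col == width) {  // that character filled the last column
            out += "\r\n";
            col = 0;
        }
    }
    return out;
}
```

For example, `emit_with_hard_wrap("abcdef", 4)` yields `abcd`, CR, LF, `ef`.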

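But there is another way to handle the issue, one that does not even require you to know whether the terminal has the xenl "glitch": output a space character, after which the terminal will definitely have wrapped, and then return to the leftmost column with a carriage return.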
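
A minimal sketch of that space-then-return approach, assuming a fixed terminal width; it is not readline's actual code, and `emit_with_soft_wrap`, `width` and `start_col` (columns already used on the line, for example by a prompt) are hypothetical names:

```cpp
#include <cassert>
#include <string>

// After the character that fills the last column, send a space (which forces
// the terminal to wrap if it has not done so already) and then a carriage
// return, so the cursor is known to be in column 1 of the next line.
std::string emit_with_soft_wrap(const std::string& text, int width, int start_col = 0) {
    assert(width > 0 && start_col >= 0 && start_col < width);
    std::string out;
    int col = start_col;       // 0-based column the next character will occupy
    for (char c : text) {
        out += c;
        if (++col == width) {  // that character filled the last column
            out += " \r";      // SP forces the wrap, CR returns to column 1
            col = 0;
        }
    }
    return out;
}
```

Under the assumption of a 20-column prompt on an 80-column terminal, the space in the sixth chunk of the question's string lands in column 80, so this sketch emits that space followed by SP CR. That has the same shape as the `<WS><WS><CR>` in the observed response, although the real prompt length is not stated in the question.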

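As it turns out, this trick has another benefit if the terminal emulator is xterm (and probably other such emulators), which lets you select a "word" by double-clicking on it. If the automatic line wrap happens in the middle of a word, it would be ideal if you could still select the entire word even though it is split over two lines. If you follow the suggestion in the terminfo file above, xterm will (quite reasonably) treat the split word as two words, because there is an explicit newline between them. But if you let the terminal wrap automatically, xterm treats the result as a single word. (It does this despite the output of the space character, presumably because the space character is overwritten.)

In short, the SP CR sequence is not in any way a standardized feature of the VT100 terminal. Rather, it is a pragmatic response to a specific feature of terminal descriptions combined with the observed behaviour of a specific (and common) terminal emulator. Variants of this code can be found in a variety of codebases, and although as far as I know it is not part of any textbook or formal documentation, it is certainly part of terminal-handling folkcraft [note 2].

In the case of readline, you will find a comment in the code that is much more telegraphic than this answer: [note 1]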

  /* If we're at the right edge of a terminal that supports xn, we're
     ready to wrap around, so do so.  This fixes problems with knowing
     the exact cursor position and cut-and-paste with certain terminal
     emulators.  In this calculation, TEMP is the physical screen
     position of the cursor. */

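(xn is the short form of xenl.)


Notes

  1. As I type this answer, the comment is at line 1326 of display.c in the current view of the git repository. In future versions it may be at a different line number, in which case the link provided will not work. If you notice that it has changed, please feel free to correct the link.

  2. In the original version of this answer, I described this procedure as "part of terminal-handling folklore", using the word "folklore" to describe knowledge passed from programmer to programmer rather than being part of the canon of academic texts and international standards. While "folklore" often carries a negative connotation, I used it without such prejudice. "Lore" (according to wiktionary) refers to "all the facts and traditions about a particular subject that have been accumulated over time through education or experience", and is derived from an Old Germanic word meaning "teach". Folklore is therefore the accumulated education and experience of the "folk", as opposed to the establishment: in Eric S. Raymond's analogy of the Cathedral and the Bazaar, folklore is the knowledge base of the Bazaar.

    This usage caught the attention of at least one highly skilled practitioner, who suggested using the word "esoteric" to describe this bit of information about terminal handling. "Esoteric" (again according to wiktionary) applies to information "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest, or an enlightened inner circle", and is derived from the Greek ἐσωτερικός, "inner circle". (In other words, the knowledge of the Cathedral.)

    While the semantic discussion is at least amusing, I have changed the text to use the hopefully less emotionally charged word "folkcraft".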

There is more than one reason for treating line wrapping as a special case ("folklore" seems to be an inappropriate term):

  • The xterm FAQ entry "That description of wrapping is odd, say more?" is one of many places where vt100 line wrapping is discussed.
  • vim and screen both take care not to use cursor addressing to avoid the wrapping, since that would interfere with selecting a wrapped line in xterm. Instead (and the sample seems to show bash doing this too) they send a series of printable characters which step across the margin before sending other control sequences which would prevent the line-wrapping flag from being set in xterm. This is noted in xterm's manual page:

    Logical words and lines selected by double- or triple-clicking may wrap across more than one screen line if lines were wrapped by xterm itself rather than by the application running in the window.

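  • As for "comments in code": there certainly are some, to explain to maintainers what should not be changed. This one, from Sven Mascheck's XTerm resource file, gives a good explanation: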

    ! Wether this works also with _wrapped_ selections, depends on
    ! - the terminal emulator: Neither MIT X11R5/6 nor Suns openwin xterm
    ! know about that. Use the 'xfree xterm' or 'rxvt'. Both compile on
    ! all major platforms.
    ! - It only works if xterm is wrapping the line itself
    ! (not always really obvious for the user, though).
    ! - Among the different vi's, vim actually supports this with a
    ! clever and little hackish trick (see screen.c):
    !
    ! But before: vim inspects the _name_ of the value of TERM.
    ! This must be similar to "xterm" (like "xterm-xfree86", which is
    ! better than "xterm-color", btw, see his FAQ).
    ! The terminfo entry _itself_ doesn't matter here
    ! (e.g.: 'xterm' and 'vs100' are the same entry, but with
    ! the latter it doesn't work).
    !
    ! If vim has to wrap a word, it appends a space at the first part,
    ! this space will be wrapped by xterm. Going on with writing, vim
    ! in turn then positions the cursor again at the _beginning_ of this
    ! next line. Thus, the space is not visible. But xterm now believes
    ! that the two lines are actually a single one--as xterm _has_ done
    ! some wrapping also...

The comment quoted by @rici comes from a terminfo file that Eric Raymond incorporated from SCO in 1995. The history section of the terminfo source refers to this. Some of the material in it is based on the BSD termcap sources, but differs, as one would notice when comparing the BSD termcap in this section with ncurses. The four paragraphs beginning with "not quite" are the same in the SCO file (apart from line wrapping). Here is a cut/paste from that file:

#
# --------------------------------
#
# dec: DEC (DIGITAL EQUIPMENT CORPORATION)
#
# Manufacturer: DEC (DIGITAL EQUIPTMENT CORP.)
# Class:    II
# 
# Info:
#   Note that xenl glitch in vt100 is not quite the same as concept,
#   since the cursor is left in a different position while in the
#   weird state (concept at beginning of next line, vt100 at end
#   of this line) so all versions of vi before 3.7 don't handle
#   xenl right on vt100. The correct way to handle xenl is when
#   you output the char in column 80, immediately output CR LF
#   and then assume you are in column 1 of the next line. If xenl
#   is on, am should be on too.
#   
#   I assume you have smooth scroll off or are at a slow enough baud
#   rate that it doesn't matter (1200? or less). Also this assumes
#   that you set auto-nl to "on", if you set it off use vt100-nam 
#   below.
#   
#   The padding requirements listed here are guesses. It is strongly
#   recommended that xon/xoff be enabled, as this is assumed here.
#   
#   The vt100 uses rs2 and rf rather than is2/tbc/hts because the 
#   tab settings are in non-volatile memory and don't need to be 
#   reset upon login. Also setting the number of columns glitches 
#   the screen annoyingly. You can type "reset" to get them set.
#
# smkx and rmkx, given below, were removed. 
# smkx=\E[?1h\E=, rmkx=\E[?1l\E>,
# Somtimes smkx and rmkx are included.  This will put the auxilliary keypad in
# dec application mode, which is not appropriate for SCO applications.
vt100|vt100-am|dec vt100 (w/advanced video),

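If you compare the two, the ncurses version adds angle brackets around the terminfo capability names and makes a minor grammatical change in the first sentence. But the author of the comment clearly was not Raymond.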