Encoding strings in Python

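I am trying to encode a piece of text taken from an Excel document. It contains all sorts of odd characters such as quotation marks, backslashes, and parentheses. What is the correct way to turn it into a Python-compatible string so that I can process it and store it in a variable?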

ExampleText = "MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws  @ 6" o.c."

I tried str(ExampleText), but obviously that failed.

Thanks for your help!

P.S. This is the error I get: UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')

P.P.S. I'm on IronPython 2.7. I know, a bummer :-(

From our conversation in the comments:

# -*- coding: utf-8 -*-

ExampleText = '"MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws  @ 6" o.c."'

print(ExampleText)

The encoding header line is required because the string contains non-ASCII characters.
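To see which characters make that header necessary, one can list the non-ASCII characters in the text. This is only an illustrative sketch: the variable names are made up, the sample is a shortened piece of the text above, and the curly quotes are written as \u escapes so that the snippet itself stays pure ASCII.

```python
# Shortened sample; \u2019 is the curly apostrophe, \u201d the curly double quote.
sample = u'Infinity Shear Panels (ISP\u2019S) w/ 0.145" x 1 1/2\u201d fasteners on 4\u201d centers'

# Collect (position, character) pairs for everything outside the ASCII range.
non_ascii = [(index, ch) for index, ch in enumerate(sample) if ord(ch) > 127]

# Encoding to ASCII fails on these characters, while UTF-8 can represent them.
try:
    sample.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = sample.encode('utf-8')
```

The straight double quotes are plain ASCII; only the curly quotes and the curly apostrophe are not.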

You can also wrap the text in ''' or """:

x = '''some string'''
x = """some string"""

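Note that a better solution would probably be to use a package such as csv to read the string directly from the data instead of copying and pasting it into your code.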
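As a rough sketch of that idea, assuming the sheet has been exported to CSV: the csv module takes care of the quoting, so the text never has to be escaped by hand. The cell text and column label here are hypothetical, and an in-memory buffer stands in for the exported file. The snippet is written for Python 3; on IronPython 2.7 the file and Unicode handling would differ.

```python
import csv
import io

# Hypothetical cell text with a straight double quote, a comma and a curly quote.
cell_text = u'TRACK FASTENING SHALL BE 0.145" DIAMETER, 1 1/2\u201d POWDER ACTUATED'

# Simulate a CSV export in memory; a real script would open the exported file instead.
buffer = io.StringIO()
csv.writer(buffer).writerow([u'note', cell_text])

# Read the row back; the csv module undoes its own quoting.
buffer.seek(0)
row = next(csv.reader(buffer))
recovered = row[1]
```

The recovered value is identical to the original cell text, quotes included.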

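If the code you posted matches exactly what you have, it's no wonder it causes problems. You enclosed the text in double quotes, but the text itself contains double quotes. Left as it is, the string ends as soon as the interpreter sees the next double quote, then comes a bunch of terms it doesn't recognize (such as DIAMETER, POWDER), then eventually another string starts, and so on.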

You need to either escape the double quotes inside the string with backslashes, or enclose the string in triple quotes on both sides.

ExampleText = "MINIMUM TRACK FASTENING SHALL BE 0.145\" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8\" CENTERS FOR BEARING WALLS, AND AT 12\" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2\" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145\" x 1 1/2\" powder actuated fasteners spaced on 4\" centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8\" DIA. 2205 expansion anchors w/ 2 1/2\" min. embedment - OR-Simpson \"Titen\" screws  @ 6\" o.c."

ExampleText = """MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws  @ 6" o.c."""

SO's built-in syntax highlighting shows that your sample is made up of several strings, whereas mine is one continuous string.
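A shortened illustration of the same point: the three spellings below are different ways of writing one and the same string. The variable names and the abbreviated text are only for demonstration.

```python
# Backslash-escaped double quotes inside a double-quoted literal.
escaped = "BE 0.145\" DIAMETER, SPACED ON 8\" CENTERS"

# Triple-quoted literal: inner double quotes need no escaping.
triple = """BE 0.145" DIAMETER, SPACED ON 8" CENTERS"""

# Single-quoted literal: also fine, as long as the text has no straight single quote.
single = 'BE 0.145" DIAMETER, SPACED ON 8" CENTERS'

all_equal = (escaped == triple == single)
```

One caveat with the triple-quoted form: if the text itself ends in a double quote, that last quote runs into the closing delimiter, so escape it or use ''' instead.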

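Also, the string only contains forward slashes, not backslashes, so there is no problem there. If it did contain backslashes and you wanted to keep them literal, you could prefix the string with r to mark it as a raw string: r'hello\nworld' prints as hello\nworld. The one thing a raw string cannot handle is a backslash as the last character of the string. Work around that by appending it separately: r'C:\Users\jsmith' + '\\' or r'C:\Users\jsmith' '\\' (the + is not strictly necessary when concatenating string literals).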
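A small sketch of the raw-string behavior described above, using the same example values. The variable names are made up for the illustration.

```python
# In a normal literal, \n is a single newline character.
plain = 'hello\nworld'

# In a raw literal, the backslash and the n stay as two separate characters.
raw = r'hello\nworld'

# A raw literal cannot end in a backslash, so the trailing one is added separately.
path_plus = r'C:\Users\jsmith' + '\\'
path_adjacent = r'C:\Users\jsmith' '\\'   # adjacent literals are joined automatically
```

Both path variants produce the same string, ending in a single backslash.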

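This is only necessary when you write the string into source code. Strings that come from external sources such as input() or a file are handled automatically.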
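To illustrate, here is a minimal sketch that writes some awkward text to a temporary file and reads it back. Once the text is outside the source code, no escaping is involved at all. The file is a throwaway created just for the demonstration, and UTF-8 is an assumed encoding; a real export might use a different one.

```python
import io
import os
import tempfile

# Text with straight quotes, curly quotes and a literal backslash.
text = u'0.145" x 1 1/2\u201d fasteners (ISP\u2019S) w/ a back\\slash'

# Create a throwaway file to stand in for the external source.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)
    # Reading returns the characters exactly as stored; nothing is interpreted.
    with io.open(path, 'r', encoding='utf-8') as handle:
        loaded = handle.read()
finally:
    os.remove(path)
```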

You can use the escape() function from the re package:

>>> import re
>>> re.escape(ExampleText)
    '\"MINIMUM\ TRACK\ FASTENING\ SHALL\ BE\ 0.145\"\ DIAMETER ...'
>>> ExampleText = ExampleText.decode('string_escape')
    '"MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER ...'

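The escape() function puts a backslash in front of every non-alphanumeric character (shown as a double backslash in the repr). This should work fine for your input string.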
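For context, re.escape() is documented as a helper for matching arbitrary literal text inside a regular expression, so the sketch below shows it in that role rather than as a general string encoder. The sample text and variable names are illustrative only.

```python
import re

text = 'FASTENERS (P.A.F.S) SPACED ON 8" CENTERS'
needle = '(P.A.F.S)'

# Escaped: the parentheses and dots are matched literally.
literal_match = re.search(re.escape(needle), text)

# Unescaped: the parentheses form a group and each dot matches any character.
pattern_match = re.search(needle, text)
```

With escaping, the match includes the parentheses. Without it, the match is only the text between them.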