Encoding strings in Python
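I'm trying to encode a piece of text taken from an Excel document. It contains all sorts of odd characters such as quotes, backslashes, and brackets. What is the right way to turn it into a Python-compatible string so that I can process it and store it in a variable?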
ExampleText = "MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS. At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws @ 6" o.c."
I tried: str(ExampleText)
but that obviously failed.
Thanks for your help!
P.S. This is the error I get: UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')
P.P.S. I'm on IronPython 2.7. I know, a bummer :-(
From our conversation in the comments:
# -*- coding: utf-8 -*-
ExampleText = '"MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS. At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws @ 6" o.c."'
print(ExampleText)
The encoding header line is required because the string contains non-ASCII characters (the curly quotes ’ and ”).
You can also wrap the text in ''' or """:
x = '''some string'''
x = """some string"""
Note that a better solution may be to use a package such as csv to read the strings directly from the data, rather than copying and pasting them into your code.
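As a minimal sketch of that suggestion, the snippet below parses a CSV field with the standard csv module. The column names and the shortened sample text are made up for illustration, and it assumes the sheet was exported to CSV and that Python 3 is available; in a real script io.StringIO would be replaced by an opened file.

```python
import csv
import io

# Hypothetical CSV content, roughly as a spreadsheet export would write it:
# the field is wrapped in quotes and each embedded double quote is doubled.
csv_data = 'id,note\n1,"TRACK FASTENING SHALL BE 0.145"" DIAMETER, SPACED ON 8"" CENTERS"\n'

# io.StringIO stands in for a real file object here.
reader = csv.reader(io.StringIO(csv_data))
header = next(reader)
row = next(reader)

# The csv module undoes the quoting, so no manual escaping is needed.
note = row[1]
print(note)
```

The quotes and the comma inside the field arrive intact in note without any escaping in the source code.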
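If the code you posted matches exactly what you have, it's no wonder it fails. You enclosed the text in double quotes, but the text itself contains double quotes. As written, the string ends as soon as the interpreter sees the next double quote; it is followed by a run of tokens the interpreter doesn't recognize (such as DIAMETER and POWDER), then another string starts, and so on.
You need to either escape the double quotes inside the string with backslashes, or enclose the string in triple quotes on both sides.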
ExampleText = "MINIMUM TRACK FASTENING SHALL BE 0.145\" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8\" CENTERS FOR BEARING WALLS, AND AT 12\" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2\" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS. At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145\" x 1 1/2\" powder actuated fasteners spaced on 4\" centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8\" DIA. 2205 expansion anchors w/ 2 1/2\" min. embedment - OR-Simpson \"Titen\" screws @ 6\" o.c."
or
ExampleText = """MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS. At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws @ 6" o.c."""
SO's built-in syntax highlighting shows that your sample is made up of several strings, whereas mine is one continuous string.
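To make the equivalence concrete, here is a small sketch using a shortened, illustrative fragment of the text. It shows that the quoting style only affects how the literal is written in the source, not the resulting string.

```python
# Three ways of writing the same text as a literal.
escaped = "FASTENERS SPACED ON 8\" CENTERS, AND AT 12\" O.C."
triple = """FASTENERS SPACED ON 8" CENTERS, AND AT 12" O.C."""
single = 'FASTENERS SPACED ON 8" CENTERS, AND AT 12" O.C.'

# All three literals produce identical string objects.
print(escaped == triple == single)

# If the text itself ends with a double quote, delimiting it with '''
# avoids the final quote running into the closing delimiter.
ends_with_quote = '''SCREWS @ 6" O.C."'''
print(ends_with_quote)
```

Single quotes are often the least noisy choice when the text contains only double quotes, as it does here.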
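Also, the string contains only forward slashes, no backslashes, so there is no problem there. If it did contain backslashes and you wanted to deal with them, you could prefix the string with r to make it a raw string: r'hello\nworld' prints as hello\nworld. The only thing a raw string cannot handle is a backslash as its last character. Work around this by appending it separately: r'C:\Users\jsmith' + '\\' or r'C:\Users\jsmith' '\\' (the + is not strictly necessary when concatenating string literals).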
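A short sketch of the raw-string behaviour described above, reusing the same illustrative values:

```python
# In a normal literal, \n is a single newline character.
plain = 'hello\nworld'
# In a raw literal, the backslash and the n are kept as two characters.
raw = r'hello\nworld'
print(len(plain), len(raw))

# A raw literal cannot end in a single backslash, so append it separately.
path = r'C:\Users\jsmith' + '\\'
# Adjacent string literals are concatenated by the parser, so + is optional.
same_path = r'C:\Users\jsmith' '\\'
print(path)
```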
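All of this is only needed when you write the string into source code. Strings that come from external sources such as input() or a file are handled automatically.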
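The following sketch illustrates that point by writing a sample text to a temporary file and reading it back. The sample text and the use of a temporary file are assumptions for the demo; the only thing that has to be specified when reading is the file's encoding (UTF-8 here).

```python
import io
import os
import tempfile

# Illustrative text with straight quotes, curly quotes, and backslashes.
text = u'At ISP\u2019S attach w/ 0.145" x 1 1/2\u201d fasteners; see C:\\temp\\notes'

# Create a temporary file to stand in for data exported from elsewhere.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Reading the file back needs no escaping: the characters are data,
    # not source code, so quotes and backslashes are taken literally.
    with io.open(path, "r", encoding="utf-8") as f:
        loaded = f.read()
finally:
    os.remove(path)

print(loaded == text)
```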
You can use the escape() function from the re package:
>>> import re
>>> re.escape(ExampleText)
'\"MINIMUM\ TRACK\ FASTENING\ SHALL\ BE\ 0.145\"\ DIAMETER ...'
>>> ExampleText = ExampleText.decode('string_escape')
'"MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER ...'
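The escape() function puts a backslash in front of every non-alphanumeric character. This should handle your input string well. Note that re.escape() is intended for building regular-expression patterns, and that str.decode('string_escape') exists only in Python 2.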
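For context, here is a minimal sketch of what re.escape() is designed for: producing a pattern that matches a piece of text literally. The fragment and the haystack string are illustrative, and the snippet assumes Python 3.

```python
import re

# Illustrative fragment containing regex metacharacters: . ( )
fragment = '0.145" DIAMETER (P.A.F.S)'
haystack = 'USE 0.145" DIAMETER (P.A.F.S) HERE'

# Escaped, the fragment is matched character for character.
literal_match = re.search(re.escape(fragment), haystack)

# Unescaped, the parentheses are read as a group rather than as literal
# characters, so the same text fails to match itself.
unescaped_match = re.search(fragment, haystack)

print(literal_match is not None, unescaped_match is None)
```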