How to convert string elements with single quotes to double quotes in a Python list

Question

I am preprocessing data for an NLP task and need to structure it as follows:

[tokenized_sentence] tab [tags_corresponding_to_tokens]

I have a text file containing thousands of lines in this format, with the two lists separated by a tab. Here is an example:

['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']    ['I-ORG', 'O', 'I-MISC', 'O', 'O', 'O', 'I-MISC', 'O', 'O']

The code I used to produce it is:

with open('data.txt', 'w') as foo:
    for i,j in zip(range(len(text)),range(len(tags))):
        foo.write(str([item for item in text[i].split()]) + '\t' + str([tag for tag in tags[j]]) + '\n')

where text is a list of sentences (each sentence is a string) and tags is a list of tag lists (one list per sentence, holding the tag for each word/token in that sentence).

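I need the string elements in the lists to have double quotes instead of single quotes, while keeping this structure. The expected output should look like this: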

["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]    ["I-ORG",  "O", "I-MISC", "O", "O", "O", "I-MISC", "O", "O"]

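I have tried json.dump() and json.dumps() from Python's json module, but I did not get the expected output. Instead, I got the two lists as strings. My best effort was to add the double quotes manually, like this (for the tags):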

for i in range(len(tags)):
    for token in tags[i]:
        tkn = "\"%s\"" %token
        print(tkn)

which gives the output

"I-ORG"
"O"
"I-MISC"
"O"
"O"
"O"
"I-MISC"
"O"
"O"
"I-PER"
"I-PER"
.
.
.

However, this seems too inefficient. I have looked at this related question:

Convert single-quoted string to double-quoted string

but it does not address this problem directly.

I am using Python 3.8.

Answer 1

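I'm fairly sure there is no way to force Python to write the strings of a list with double quotes when the list is converted with str(); single quotes are the default. As @deadshot commented, you can either replace ' with " after writing the whole string to the file, or add the double quotes manually as you write each word. The answers to this post show many different approaches, the simplest being f'"{your_string_here}"'. You would need to write each string separately, though, because writing the list itself automatically puts ' around each item, which would get very complicated.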
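For what it's worth, here is a minimal sketch of the "add the quotes yourself" route using the f-string trick mentioned above. The helper name quote_list and the sample text and tags values are made up for illustration, and the sketch assumes no token contains a double quote or a backslash, since nothing is escaped.

```python
def quote_list(items):
    """Render a list of strings as ["a", "b"] with double quotes.

    Assumption: items contain no double quotes or backslashes (no escaping is done).
    """
    return "[" + ", ".join(f'"{item}"' for item in items) + "]"


# Sample data in the shape described in the question (illustrative only).
text = ["EU rejects German call to boycott British lamb ."]
tags = [["I-ORG", "O", "I-MISC", "O", "O", "O", "I-MISC", "O", "O"]]

# One output line per sentence: tokens, a tab, then the tags.
lines = [
    quote_list(sentence.split()) + "\t" + quote_list(tag_list)
    for sentence, tag_list in zip(text, tags)
]

for line in lines:
    print(line)
```

Each element of lines can then be written to the file followed by '\n', in place of the two str(...) calls in the question's loop.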

After writing the strings to the file, just do a find and replace of ' with ".

You can even do it in Python:

# after the string is written in 'data.txt'
with open('data.txt', "r") as f:
    text = f.read()

text = text.replace("'", '"')

with open('data.txt', "w") as f:
    f.write(text)
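One caveat with a blanket replacement like this: if a token itself contains an apostrophe, Python already prints that token with double quotes, and replacing every ' then breaks it. A small sketch with made-up tokens (not taken from the OP's data) that shows the effect:

```python
import json

# Hypothetical tokens; the second one contains an apostrophe.
tokens = ["do", "n't", "panic"]

# str() quotes "n't" with double quotes because it contains a single quote.
line = str(tokens)

# Blanket replacement also rewrites the apostrophe inside the token.
replaced = line.replace("'", '"')

# Check whether the result still parses as a list of strings.
try:
    json.loads(replaced)
    is_valid = True
except json.JSONDecodeError:
    is_valid = False

print(line)      # ['do', "n't", 'panic']
print(replaced)  # ["do", "n"t", "panic"]
print(is_valid)  # False
```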

Edit based on the OP's comment below

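Use this instead of the above; it should fix most of the problems, because it searches for the string ', ', which hopefully only occurs between the end of one string and the start of the next.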

with open('data.txt', "r") as f:
    text = f.read()

# replace ' at the start of the list
text = text.replace("['", '["')

# replace ' at the end of the list
text = text.replace("']", '"]')

# replace ' at the item changes inside the list
text = text.replace("', '", '", "')

with open('data.txt', "w") as f:
    f.write(text)

(Edited by the OP) New edit based on my latest comment

Running this solved the problem I described in my comment and returns the expected result.

with open('data.txt', "r") as f:
    text = f.read()

# replace ' at the start of the list
text = text.replace("['", '["')

# replace ' at the end of the list
text = text.replace("']", '"]')

# replace ' at the item changes inside the list
text = text.replace("', '", '", "')

text = text.replace("', ", '", ')

text = text.replace(", '", ', "')

with open('data.txt', "w") as f:
    f.write(text)
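As an alternative to patching the file afterwards, and not what the edits above do, it may be simpler to serialize each list separately with json.dumps() at write time, since JSON strings always use double quotes. This is only a sketch: format_line and write_dataset are hypothetical helper names, the sample data is illustrative, and text and tags are assumed to have the shapes described in the question.

```python
import json


def format_line(sentence, tag_list):
    """Return '<tokens as JSON list>' + tab + '<tags as JSON list>' for one sentence."""
    tokens = sentence.split()
    # ensure_ascii=False keeps non-ASCII tokens readable instead of \uXXXX escapes.
    return (
        json.dumps(tokens, ensure_ascii=False)
        + "\t"
        + json.dumps(tag_list, ensure_ascii=False)
    )


def write_dataset(path, text, tags):
    """Write one formatted line per (sentence, tag list) pair."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence, tag_list in zip(text, tags):
            f.write(format_line(sentence, tag_list) + "\n")


# Sample data in the shape described in the question (illustrative only).
text = ["EU rejects German call to boycott British lamb ."]
tags = [["I-ORG", "O", "I-MISC", "O", "O", "O", "I-MISC", "O", "O"]]

print(format_line(text[0], tags[0]))
```

Calling write_dataset('data.txt', text, tags) would then produce the file directly. Tokens that contain an apostrophe come out intact, and each half of a line can be read back with json.loads().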
