Remove quotes holding 2 words and remove comma between them

Question

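Follow-up to: Python to replace a symbol between 2 words in a quote

Extended input and expected output:

I am trying to replace the comma between the two words Durango and PC in the second line with & and then also remove the quotes ("). The same goes for Orbis and PC in the third line. In the fourth line there are two quoted word combinations that I would like to handle: "AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC"

I would like to keep the rest of each line as it is, using Python.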

Input

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened
3,SIN-Audio,AAA - Audio,"Orbis, PC",13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,"AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC",29,Waiting For
...
...
...

My sample can contain 100 lines like these. The expected output is:

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened
3,SIN-Audio,AAA - Audio, Orbis & PC,13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango, Orbis & PC,29,Waiting For
...
...
...

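So far, my idea is to read the file line by line and, if a line contains quotes, replace them with nothing. Replacing the symbol inside the quotes is where I am stuck.

This is what I have right now:

for line in lines:
    expr2 = re.findall('"(.*?)"', line)
    if len(expr2) != 0:
        expr3 = re.split('"', line)
        expr4 = expr3[0] + expr3[1].replace(",", " &") + expr3[2]
        print >>k, expr4
    else:
        print >>k, line

But this does not handle the case in line 4. There can also be more than 3 combinations, for example: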

3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open

and I would like to turn it into this:

3,SIN-Audio,AAA - Audio & xxxx & yyyy, Orbis & PC, 13 & 22,Open

How can I achieve this? Any suggestions? I am learning Python.

Answer 1

By treating the input file as a .csv, we can easily turn the lines into something that is simple to work with.

For example,

2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened

is read as:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango, PC', '55', 'Reopened']

Then, by replacing every instance of , with " &" (a space followed by &), we get the following list:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango & PC', '55', 'Reopened']

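This also replaces multiple instances of , within a single line, and when the line is finally written out the original double quotes are gone.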
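The parsing and replacement steps above can be tried on a single line before touching any files. This is only a minimal sketch: it relies on `csv.reader` accepting any iterable of strings, and the sample line is taken from the question's input.

```python
import csv

# Sample line taken from the question's input.
line = '2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened'

# csv.reader accepts any iterable of strings, so a one-element list works.
fields = next(csv.reader([line]))
print(fields)
# ['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango, PC', '55', 'Reopened']

# Any comma still inside a field came from a quoted value; swap it for " &".
joined = [field.replace(',', ' &') for field in fields]
print(','.join(joined))
# 2,Kenny Chong,Core Tech - Rendering,Durango & PC,55,Reopened
```

Note that the result has no space after the comma before Durango, unlike the expected output shown in the question.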

Here is the code. It assumes in.txt is your input file and writes the result to out.txt.

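import csv

with open('in.txt') as infile:
    # csv.reader strips the double quotes and keeps quoted commas inside one field
    reader = csv.reader(infile)

    with open('out.txt', 'w') as outfile:
        for line in reader:
            # any comma still inside a field came from a quoted value
            line = [field.replace(',', ' &') for field in line]
            outfile.write(','.join(line) + '\n')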

The fourth line of the output is:

LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango & Orbis & PC,29,Waiting For

Answer 2

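Please check this. I could not find a single expression that does it, so this works in a slightly roundabout way. I will update the answer if I find a better approach (Python 3).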

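import re
st = "3,SIN-Audio,\"AAA - Audio, xxxx, yyyy\",\"Orbis, PC\",\"13, 22\",Open"
# greedy match from the first quote to the last, then split on the "," between quoted fields
found = re.findall(r'\"(.*)\"',st)[0].split("\",\"")
final = ""
for word in found:
    # swap each comma inside a field for " &"
    final = final + (" &").join(word.split(","))+","
# put the rebuilt fields back, dropping the trailing comma
result = re.sub(r'\"(.*)\"',final[:-1],st)
print(result)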
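Because the greedy pattern above spans from the first quote to the last, it relies on all quoted fields sitting next to each other. As a hedged alternative sketch (not part of the original answer), `re.sub` can be given a function so that each quoted field is rewritten on its own. The helper name `strip_quoted_commas` is made up for illustration, and the sketch assumes fields never contain escaped quotes.

```python
import re

def strip_quoted_commas(line):
    """Drop the quotes around each quoted field and turn its commas into ' &'.

    Hypothetical helper; assumes no escaped quotes ("") inside a field.
    """
    def rewrite(match):
        # group(1) is the text between one pair of double quotes
        return match.group(1).replace(',', ' &')
    # [^"]* keeps every match inside a single pair of quotes
    return re.sub(r'"([^"]*)"', rewrite, line)

# Follow-up example from the question: adjacent quoted fields.
adjacent = '3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open'
print(strip_quoted_commas(adjacent))
# 3,SIN-Audio,AAA - Audio & xxxx & yyyy,Orbis & PC,13 & 22,Open

# Made-up line with an unquoted field between two quoted ones.
separated = '1,"a, b",mid,"c, d",end'
print(strip_quoted_commas(separated))
# 1,a & b,mid,c & d,end
```

Lines without any quotes pass through unchanged, so the function can be applied to every line of the file.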

python

text-processing

process

data-processing

delimiter