Remove commas and following text on a huge .txt file

Question

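I need some help with this problem. I have a huge word list stored as a .txt file, almost 100,000 lines long.

The problem is that some lines have a comma followed by extra text, like this: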

hi, ho
i
am, em
yellow

On every line that contains a comma, I need to remove the comma and all the text after it, to get this format:

hi
i
am
yellow

Answer 1

Try this:

import fileinput

filename = 'abc.txt'

# inplace=True redirects print() into the file while it is being read;
# the untouched original is kept as abc.txt.bak
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        # Keep only the text before the first comma.
        # Lines without a comma are written back unchanged.
        print(line.rstrip('\n').split(',', 1)[0])
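If you would rather not depend on fileinput redirecting stdout, here is a minimal sketch of the same in-place rewrite using a temporary file. The function name strip_after_comma_inplace and the UTF-8 default are my own assumptions, not something from the question, and unlike the version above it does not leave a .bak copy behind.

```python
import os
import tempfile


def strip_after_comma_inplace(path, encoding="utf-8"):
    """Rewrite `path` so each line keeps only the text before its first comma."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w", encoding=encoding) as dst, \
                open(path, encoding=encoding) as src:
            for line in src:  # streams line by line, so the file size does not matter
                dst.write(line.rstrip("\n").split(",", 1)[0] + "\n")
        # Swap the cleaned file into place only after it was written completely
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Call it as strip_after_comma_inplace('abc.txt'). The original is only replaced once the cleaned copy has been fully written, so an error halfway through leaves your word list untouched.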

Answer 2

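This should do the trick:

with open("file.txt", encoding="utf-8") as f, \
        open("out.txt", "w", encoding="utf-8") as out:
    for line in f:
        line = line.rstrip("\n")
        idx = line.find(",")
        # find() returns -1 when the line has no comma; keep the whole line then
        end = idx if idx != -1 else len(line)
        out.write(line[:end] + "\n")

It reads lines from a file named file.txt and saves the formatted version to a file named out.txt.

Input: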

hi, ho
i
am, em
yellow
,
hey, ge
gibberish, he
años, luz detrás

Output:

hi
i
am
yellow

hey
gibberish
años
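For what it's worth, the same loop can also be sketched with str.partition, which splits at the first comma and returns the whole line when there is none, so the -1 check is not needed. The names strip_after_comma and clean_file are hypothetical helpers I made up for illustration, and UTF-8 is assumed as in the code above.

```python
def strip_after_comma(lines):
    """Yield each line cut at its first comma, without the trailing newline."""
    for line in lines:
        # partition() returns (head, sep, tail); head is the whole line if no comma is found
        head, _sep, _tail = line.rstrip("\n").partition(",")
        yield head


def clean_file(src_path, dst_path, encoding="utf-8"):
    """Stream src_path line by line and write the cleaned words to dst_path."""
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for word in strip_after_comma(src):
            dst.write(word + "\n")
```

Usage would be clean_file("file.txt", "out.txt"). Because the file object is iterated lazily, only one line is held in memory at a time, which should be comfortable for a list of around 100,000 lines.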

Answer 3

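As long as this isn't part of a larger Python project, I'd do this kind of thing quickly in PowerShell.

I just tested a quick script that I wrote based on the example you provided.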

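# Read the whole file into an array of lines
$txtfile = Get-Content C:\YourPath\YourFile.txt
# Print the first line as a quick sanity check
$txtfile[0]
$myarray = @()
foreach ($line in $txtfile){
    $newline = [string]$line.Trim()
    # Split at the first comma only and keep the part before it
    $final = $newline -split ",", 2 | Select-Object -First 1
    $myarray += $final
}

# Write the cleaned lines to a new file
$myarray | Out-File C:\YourPath\OutFile.txt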
