Python: in a file with comma-separated values, how do I check for duplicated values between lines and delete the duplicated lines?
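I have a txt file in this format:
- 01,Spain
- 02,USA
- 03,India
- 01,Italy
- 01,Portugal
- 04,Brazil
I need to check whether a number is repeated. In this example, the number "01" appears with Spain, Italy and Portugal. If two or more lines have the same number, I need to keep only the first occurrence of that number and get rid of the other occurrences. The file should then look like this:
- 01,Spain
- 02,USA
- 03,India
- 04,Brazil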
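# set is a built-in type, so the old "sets" module is not needed.
seen = set()
# Each open() call needs its own "as" target.
with open('in.txt', 'r') as fr, open('out.txt', 'w') as fw:
    for line in fr:
        # The number is everything before the first comma.
        number = line.split(',', 1)[0]
        if number not in seen:
            fw.write(line)
            seen.add(number)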
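import os

numbers = set()
# Write the kept lines to a temporary file, then swap it in for the original.
with open("file.txt", "r") as infile, open("_file.txt", "w") as outfile:
    for line in infile:
        # int() treats "01" and "1" as the same number and raises
        # ValueError if the first field is not numeric.
        number = int(line.split(',', 1)[0])
        if number not in numbers:
            numbers.add(number)
            outfile.write(line)
# os.replace overwrites file.txt in a single step (Python 3.3+).
os.replace("_file.txt", "file.txt")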
# Read your entire file into memory.
my_file = 'my_file.txt'
with open(my_file) as f_in:
    content = f_in.readlines()

# Keep track of the numbers that have already appeared
# while rewriting the content back to your file.
numbers = set()
with open(my_file, 'w') as f_out:
    for line in content:
        # Split only on the first comma, so a comma in the country name is harmless.
        number = line.split(',', 1)[0]
        if number not in numbers:
            f_out.write(line)
            numbers.add(number)
I hope this is the easiest one to understand.
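All of the answers above rely on the same idea: remember the numbers already seen, and keep a line only the first time its number appears. As a minimal sketch, that logic can be pulled out into a small generator so it can be tried without touching any files. The names `dedupe_by_first_field` and `sep` are made up for this illustration, and the sample lines are the ones from the question.

```python
def dedupe_by_first_field(lines, sep=","):
    """Yield each line whose first field has not appeared before."""
    seen = set()
    for line in lines:
        # The key is everything before the first separator.
        key = line.split(sep, 1)[0].strip()
        if key not in seen:
            seen.add(key)
            yield line


# Sample data from the question (hypothetical in-memory stand-in for the file).
sample = [
    "01,Spain\n",
    "02,USA\n",
    "03,India\n",
    "01,Italy\n",
    "01,Portugal\n",
    "04,Brazil\n",
]
result = list(dedupe_by_first_field(sample))
print("".join(result), end="")
```

Because an open file object is also an iterable of lines, the same function should work when passed a file handle instead of a list, for example `fw.writelines(dedupe_by_first_field(fr))`.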