Remove certain links from a text file by reading another text file
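I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as the links that are in whitelist.txt.
I'm trying to open and read whitelist.txt, then open and read scrapedlist.txt, and write a new file, updatedlist2.txt, that contains everything in scrapedlist.txt minus whatever is in whitelist.txt.
I'm new to Python, so I'm still learning. I searched for answers, and this is what I came up with: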
def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()
    unique2 = set()
    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()
    unique3 = set()
    with open("updatedlist2.txt", "w") as whitelist_write2:
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)
I get this error, and I'm also not sure whether my approach is right:
if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set
What should I do to achieve what I described above, and is my code correct?
EDIT:
whitelist.txt:
KUWAIT
ISRAEL
FRANCE
scrapedlist.txt:
USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE
updatedlist2.txt (what it should look like):
USA
CANADA
GERMANY
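Based on your description, I made a few changes to your code.
- The readlines() method is replaced with read().splitlines(). Both read the whole file and turn each line into a list item. The difference is that readlines() keeps the trailing \n at the end of each item.
- unique2 and unique3 are removed. I couldn't see what they were needed for.
- After the first two blocks, whitelist_lines and scrapedlist_lines are two lists containing the links. Per your description, we want the lines of scrapedlist_lines that are not in the whitelist_lines list, so the condition if unique2 not in line and line not in unique3: is changed to if line not in whitelist_lines:.
- If you are using Python 2.5 or later, the with statement calls close() for you automatically, so the explicit close() calls are not needed. (In 2.5 itself, with requires from __future__ import with_statement; from 2.6 on it is always available.)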
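To make the first and third points concrete, here is a small sketch that can be run without any files. It uses io.StringIO as an in-memory stand-in for whitelist.txt (that substitution, and the variable names, are mine for illustration). It shows what each reading method returns and why the original condition raised the TypeError.

```python
import io

# In-memory stand-in for whitelist.txt (contents taken from the question).
sample = io.StringIO("KUWAIT\nISRAEL\nFRANCE\n")
with_newlines = sample.readlines()
sample.seek(0)
without_newlines = sample.read().splitlines()

print(with_newlines)     # ['KUWAIT\n', 'ISRAEL\n', 'FRANCE\n']
print(without_newlines)  # ['KUWAIT', 'ISRAEL', 'FRANCE']

# The original condition had a set on the left and a string on the right.
# "x in some_string" is a substring test, so x has to be a string.
try:
    set() not in "KUWAIT"
    operand_error = None
except TypeError as exc:
    operand_error = exc

# The intended test puts the line on the left and the collection on the right.
is_whitelisted = "KUWAIT" in without_newlines
```

Here operand_error ends up holding the same kind of TypeError as in the question, while is_whitelisted is a plain True/False membership test.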
The final code is:
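# splitlines() drops the trailing newline from each line and, unlike
# split("\n"), leaves no empty item at the end of the list.
with open("whitelist.txt", "r") as whitelist_read:
    whitelist_lines = whitelist_read.read().splitlines()
with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()
# Copy every scraped line that is not whitelisted into the new file.
with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        if line not in whitelist_lines:
            whitelist_write2.write(line + "\n")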
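If the lists grow large, or if you want to keep the "write each line only once" behaviour that unique3 appeared to be aiming for, a set-based variant is one option. The following is only a sketch: the function name remove_whitelisted and its parameters are hypothetical, and it assumes one link per line, with surrounding whitespace and blank lines being irrelevant.

```python
def remove_whitelisted(whitelist_path, scraped_path, output_path):
    """Write the lines of scraped_path that are not in whitelist_path."""
    with open(whitelist_path, "r") as whitelist_file:
        # A set gives fast membership tests; blank lines are skipped.
        whitelist = {line.strip() for line in whitelist_file if line.strip()}

    kept = []
    seen = set()
    with open(scraped_path, "r") as scraped_file:
        for raw_line in scraped_file:
            line = raw_line.strip()
            # Skip blanks, whitelisted entries, and repeats of earlier lines.
            if not line or line in whitelist or line in seen:
                continue
            seen.add(line)
            kept.append(line)

    with open(output_path, "w") as output_file:
        for line in kept:
            output_file.write(line + "\n")
    return kept
```

Calling remove_whitelisted("whitelist.txt", "scrapedlist.txt", "updatedlist2.txt") with the sample files from the question should leave USA, CANADA and GERMANY in updatedlist2.txt, in their original order.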