Remove certain links from a text file by reading another text file

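So I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as the links that are in whitelist.txt.

I am trying to open and read whitelist.txt, then open and read scrapedlist.txt, and write a new file, updatedlist2.txt, that contains everything in scrapedlist.txt minus the links in whitelist.txt.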

I am new to Python, so I am still learning. I searched for answers, and this is what I came up with:

def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()

    unique2 = set()

    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()

    unique3 = set()

    with open("updatedlist2.txt", "w") as whitelist_write2:
   
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)

I get this error, and I am also not sure whether I am going about this the right way:

if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set

What should I do to achieve what I described above, and is my code correct?

Edit:

whitelist.txt:

KUWAIT
ISRAEL
FRANCE

scrapedlist.txt:

USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE

updatedlist2.txt (should look like this):

USA
CANADA
GERMANY

Based on your description, I made a few changes to your code.

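  1. The readlines() method was replaced with read().splitlines(). Both read the entire file and turn each line into a list item. The difference is that readlines() keeps the trailing \n on each item.
  2. unique2 and unique3 were removed. I could not find any use for them. unique2 is also the cause of the TypeError: in unique2 not in line, the left operand is a set and the right operand is a string, and the in operator on a string only accepts a string on the left.
  3. The first two blocks produce whitelist_lines and scrapedlist_lines, two lists containing the links. According to your description, we need the lines of scrapedlist_lines that are not in whitelist_lines, so the condition if unique2 not in line and line not in unique3: was changed to if line not in whitelist_lines:.
  4. If you are using Python 2.5 or later, the with statement calls close() for you automatically, so the explicit close() calls after each with block are unnecessary.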
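
To make point 1 concrete, here is a small sketch comparing the three ways of turning file contents into a list of lines. It uses an in-memory io.StringIO in place of a real file, and the sample string is only an assumption modelled on the whitelist.txt shown in the question.

```python
import io

# Stand-in for the contents of whitelist.txt (assumed to end with a newline).
sample = "KUWAIT\nISRAEL\nFRANCE\n"

# readlines() keeps the trailing newline on every item.
with_newlines = io.StringIO(sample).readlines()

# read().splitlines() drops the newlines and adds no empty item at the end.
without_newlines = io.StringIO(sample).read().splitlines()

# read().split("\n") also drops the newlines, but leaves a trailing
# empty string when the text ends with a newline.
split_on_newline = io.StringIO(sample).read().split("\n")

print(with_newlines)     # ['KUWAIT\n', 'ISRAEL\n', 'FRANCE\n']
print(without_newlines)  # ['KUWAIT', 'ISRAEL', 'FRANCE']
print(split_on_newline)  # ['KUWAIT', 'ISRAEL', 'FRANCE', '']
```

The trailing empty string from split("\n") is why splitlines() is usually the safer choice when the lines are compared against another list.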

The final code is:

# Read both files into lists of lines without trailing newlines.
with open("whitelist.txt", "r") as whitelist_read:
    whitelist_lines = whitelist_read.read().splitlines()

with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()

# Write every scraped line that does not appear in the whitelist.
with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        if line not in whitelist_lines:
            whitelist_write2.write(line + "\n")
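
As a possible variation, the same filtering can be done with sets, which makes each membership check fast when the files are large. This is only a sketch: the function name remove_whitelisted and its parameters are hypothetical, and it assumes that blank lines should be skipped and that repeated links in scrapedlist.txt should be written once (which is what unique3 in the question appeared to be for).

```python
def remove_whitelisted(scraped_path, whitelist_path, output_path):
    """Copy links from scraped_path to output_path, skipping whitelisted ones.

    Hypothetical helper: blank lines are ignored and duplicates are
    written only once, preserving the order of first appearance.
    """
    # Load the whitelist into a set for fast membership checks.
    with open(whitelist_path, "r") as whitelist_file:
        whitelist = {line.strip() for line in whitelist_file if line.strip()}

    seen = set()  # links already written to the output file
    with open(scraped_path, "r") as scraped_file, \
            open(output_path, "w") as output_file:
        for line in scraped_file:
            link = line.strip()
            # Skip blank lines, whitelisted links, and repeats.
            if not link or link in whitelist or link in seen:
                continue
            output_file.write(link + "\n")
            seen.add(link)
```

Calling remove_whitelisted("scrapedlist.txt", "whitelist.txt", "updatedlist2.txt") with the sample files from the question should leave USA, CANADA and GERMANY in updatedlist2.txt.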