Save body text on csv file | Python 3

Question

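I am trying to build a database of articles for text mining. I extract the body text by web scraping and then save the body of each article to a csv file. However, I cannot get all of the body texts saved. The code I came up with saves only the text of the last URL (article), whereas if I print what I am scraping (and what I should be saving), I get the body of every article.

I have included just a few of the URLs from the list (which contains many more) to give you an idea: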

import requests
from bs4 import BeautifulSoup
import csv

r=["http://www.nytimes.com/2016/10/12/world/europe/germany-arrest-syrian-refugee.html",
"http://www.nytimes.com/2013/06/16/magazine/the-effort-to-stop-the-    attack.html",
"http://www.nytimes.com/2016/10/06/world/europe/police-brussels-knife-terrorism.html",
"http://www.nytimes.com/2016/08/23/world/europe/france-terrorist-attacks.html",
"http://www.nytimes.com/interactive/2016/09/09/us/document-Review-of-the-San-Bernardino-Terrorist-Shooting.html",
]

for url in r:
    t= requests.get(url)
    t.encoding = "ISO-8859-1"
    soup = BeautifulSoup(t.content, 'lxml')
    text = soup.find_all(("p",{"class": "story-body-text story-content"}))
    print(text)
with open('newdb30.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(text)

Answer 1

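The file is written after the for loop has finished, so by then text holds only the result for the last article. Try declaring a variable before the for loop, for example all_text = "", and at the end of each loop iteration append text to it with all_text += text + "\n" (the \n starts a new line). Note that find_all returns a list of tags rather than a string, so text has to be converted to a string first, for example with str(text).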

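Then, in the last line, write all_text instead of text. Pass it wrapped in a list, as in spamwriter.writerow([all_text]), because writerow treats a bare string as a sequence of one-character fields.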
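
A minimal sketch of the same idea, with one variation: instead of concatenating everything into a single string, it collects one (url, body) row per article and writes all rows after the loop. The function names fetch_body_text, save_articles and scrape_to_csv, the header row, and the use of the built-in "html.parser" are assumptions made for illustration and are not part of the original code; the class name is copied from the question and may not match the site's current markup.

```python
import csv


def fetch_body_text(url):
    """Download one article and return its paragraph text as a single string."""
    # Imported here so the CSV helpers below work without these packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    # Tag name and attribute filter are separate arguments (not one tuple).
    paragraphs = soup.find_all("p", {"class": "story-body-text story-content"})
    return "\n".join(p.get_text(strip=True) for p in paragraphs)


def save_articles(rows, path):
    """Write a header plus one (url, body) row per article."""
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["url", "body"])
        writer.writerows(rows)


def scrape_to_csv(urls, path, fetch=fetch_body_text):
    """Collect the text of every URL first, then write the file once."""
    rows = []
    for url in urls:
        # Append inside the loop so no article is overwritten by the next one.
        rows.append((url, fetch(url)))
    save_articles(rows, path)
    return rows
```

With the list r from the question, this would be called as scrape_to_csv(r, "newdb30.csv"). The fetch parameter exists only so that the download step can be swapped out, for instance when trying the CSV part without network access.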
