FeedParser, Removing Special Characters and Writing to CSV

I'm learning Python. I set myself a small goal: building an RSS scraper. I'm trying to collect the author, link, and title of each post, and then write them to a CSV.

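I've run into a few problems. I've been searching for answers since last night but can't find a solution. I suspect I'm missing some knowledge between what feedparser parses and getting that into a CSV, but I don't yet have the vocabulary to know what to Google.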

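  1. How do I remove special characters such as "[" and "'"?
  2. How do I write the author, link, and title on a new line for each post when creating a new file?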

1) Special characters

import feedparser

rssurls = 'http://feeds.feedburner.com/TechCrunch/'

techart = feedparser.parse(rssurls)
# feeds = []

# for url in rssurls:
#     feedparser.parse(url)
# for feed in feeds:
#     for post in feed.entries:
#         print(post.title)

# print(feed.entires)

techdeets = [post.author + " , " + post.title + " , " + post.link  for post in techart.entries]
techdeets = [y.strip() for y in techdeets]
techdeets

Output: I get the information I need, but .strip() does not remove the special characters.

['Darrell Etherington , Spin launches first city-sanctioned dockless bike sharing in Bay Area , http://feedproxy.google.com/~r/Techcrunch/~3/BF74UZWBinI/', 'Ryan Lawler , With .3 million in funding, CarDash wants to change how you get your car serviced , http://feedproxy.google.com/~r/Techcrunch/~3/pkamfdPAhhY/', 'Ron Miller , AlienVault plug-in searches for stolen passwords on Dark Web , http://feedproxy.google.com/~r/Techcrunch/~3/VbmdS0ODoSo/', 'Lucas Matney , Firefox for Windows gets native WebVR support, performance bumps in latest update , http://feedproxy.google.com/~r/Techcrunch/~3/j91jQJm-f2E/',...]

2) Writing to CSV

import csv

savedfile = open('/test1.txt', 'w')
savedfile.write(str(techdeets) + "/n")
savedfile.close()

import pandas as pd
df = pd.read_csv('/test1.txt', encoding='cp1252')
df

Output: a DataFrame with only 1 row and many columns.

You're almost there :-)

How about using pandas to create a DataFrame first and then saving it, like this (continuing from your code):

df = pd.DataFrame(columns=['author', 'title', 'link'])
for i, post in enumerate(techart.entries):
    df.loc[i] = post.author, post.title, post.link

Then you can save it:

df.to_csv('myfilename.csv', index=False)
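As for the "[" and "'" characters in the question: they most likely are not in the strings at all. They come from calling str() on the whole list, which produces the list's repr, so .strip() on each element has nothing to remove. A minimal sketch of a pandas-free alternative using the standard csv module is below. The rows are stand-in data copied from the output in the question (with a real feed they would be built from techart.entries), and the file name feed_rows.csv and the helper write_rows are made up for illustration.

```python
import csv

# Stand-in rows taken from the question's output; with a real feed, build
# them as (post.author, post.title, post.link) for each post in techart.entries.
rows = [
    ("Ron Miller",
     "AlienVault plug-in searches for stolen passwords on Dark Web",
     "http://feedproxy.google.com/~r/Techcrunch/~3/VbmdS0ODoSo/"),
    ("Lucas Matney",
     "Firefox for Windows gets native WebVR support, performance bumps in latest update",
     "http://feedproxy.google.com/~r/Techcrunch/~3/j91jQJm-f2E/"),
]

# str() of a list is its repr, which is where the brackets and quotes come from.
as_text = str(["a", "b"])


def write_rows(path, rows):
    """Write a header line plus one CSV line per (author, title, link) tuple."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["author", "title", "link"])
        writer.writerows(rows)


# Hypothetical output file name, chosen for this example only.
write_rows("feed_rows.csv", rows)
```

Because csv.writer quotes any field that contains a comma, a title such as the second one above still ends up in a single column, which joining the fields with " , " by hand does not guarantee.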

You can also fill the DataFrame directly from the feedparser entries:

>>> import feedparser
>>> import pandas as pd
>>>
>>> rssurls = 'http://feeds.feedburner.com/TechCrunch/'
>>> techart = feedparser.parse(rssurls)
>>>
>>> df = pd.DataFrame()
>>>
>>> df['author'] = [post.author for post in techart.entries]
>>> df['title'] = [post.title for post in techart.entries]
>>> df['link'] = [post.link for post in techart.entries]
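One thing to watch for with either approach: not every feed entry is guaranteed to carry an author, and post.author may fail on an entry that lacks one. feedparser entries behave like dictionaries, so .get() with a default should work on them; that is an assumption worth checking against your feedparser version. The sketch below uses plain dicts as stand-ins for entries, and entry_to_row is a hypothetical helper name, not part of feedparser.

```python
def entry_to_row(entry):
    """Return (author, title, link), using "" for any field that is missing."""
    return tuple(entry.get(key, "") for key in ("author", "title", "link"))


# Plain dicts standing in for feedparser entries (assumed to be dict-like).
sample_entries = [
    {"author": "Ron Miller",
     "title": "AlienVault plug-in searches for stolen passwords on Dark Web",
     "link": "http://feedproxy.google.com/~r/Techcrunch/~3/VbmdS0ODoSo/"},
    # Made-up entry with no author field, to show the fallback.
    {"title": "Post without an author", "link": "http://example.com/post"},
]

rows = [entry_to_row(entry) for entry in sample_entries]
```

The resulting list of tuples can be passed to pd.DataFrame(rows, columns=['author', 'title', 'link']) and saved with to_csv as above.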