Extracting data from an iterator of Docs
I am using yanytapi (https://pypi.org/project/yanytapi/), a Python wrapper for the New York Times API. I managed to run my search and get the data in JSON format by running the following code:
obama = api.search("Obama",
                   fq={"headline": "Obama",
                       "source": ["Reuters",
                                  "AP",
                                  "The New York Times"]},
                   begin_date="20190701",  # this can also be an int
                   facet_field=["source", "day_of_week"],
                   facet_filter=True)
for item in obama:
    print(item)
The output looks like this:
{"_id": "nyt://article/2c48c662-6053-562e-8187-88c954f5983f", "blog":
{}, "byline": {"original": "By Arit John", "person": [{"firstname":
"Arit", "middlename": null, "lastname": "John", "qualifier": null,
"title": null, "role": "reported", "organization": "", "rank": 1}],
"organization": null}, "document_type": "article", "headline":
{"main": "Obama Shares His Summer Reading List", "kicker": null,
"content_kicker": null, "print_headline": "Barack Obama Shares His
Reading List", "name": null, "seo": null, "sub": null}, "keywords":
[{"name": "subject", "value": "Writing and Writers", "rank": 1,
"major": "N"}, {"name": "subject", "value": "Books and Literature",
"rank": 2, "major": "N"}, {"name": "persons", "value": "Obama,
Barack", "rank": 3, "major": "N"}]....
I tried to extract the data and put it into a DataFrame by running the following:
users_locs = [[article['_id'], article["document_type"]] for article in obama]
df = pd.DataFrame(data=users_locs, columns=['ID', 'type'])
df
But my DataFrame is empty. Why? How can I extract the data?
According to the documentation, the articles are Doc objects. To access a field, use the .<field_name> syntax, for example:
obama = api.search("Obama",
                   fq={"headline": "Obama",
                       "source": ["Reuters",
                                  "AP",
                                  "The New York Times"]},
                   begin_date="20190821",  # this can also be an int
                   facet_field=["source", "day_of_week"],
                   facet_filter=True)
users_locs = [[article._id, article.document_type] for article in obama]
df = pd.DataFrame(data=users_locs, columns=['ID', 'type'])
df
This is my result:
ID type
0 nyt://article/5722feb7-c751-50dd-ac84-85526e11... article
1 nyt://article/3577d507-ba57-5b9c-bcee-b1542650... article
2 nyt://article/9c2f0502-8264-5645-af44-d8656d5d... article
3 nyt://article/b55ca58d-dc0f-5f5f-a01c-178d2fc7... article
4 nyt://article/f3596774-562f-5c74-b62f-2c60f2d2... article
5 nyt://article/d783f1e3-26b3-561d-9455-5f2e035b... article
6 nyt://article/aa503b22-66ab-5796-a923-e3c99c79... article
7 nyt://article/41e68733-a47e-58bc-bbc8-f93397f2... article
8 nyt://article/98bc5831-3639-5abc-a339-3e1d74fc... article
9 nyt://article/ff30c8ef-bf58-5ce8-9d92-4b25a464... article
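
The attribute-based access above can be sketched without an API key. The snippet below is only an illustration under stated assumptions: the Doc class and fake_search function are hypothetical stand-ins, not yanytapi's real objects, and docs_to_rows is a helper name invented here. It also shows a general Python point that may be relevant: if the search result is a one-shot iterator (an assumption, not something confirmed for yanytapi), looping over it once, for example to print each item, leaves nothing for a later list comprehension.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a search result; not yanytapi's real class."""
    _id: str
    document_type: str


def fake_search() -> Iterator[Doc]:
    """Hypothetical stand-in for api.search(...): yields results one at a time."""
    yield Doc("nyt://article/a", "article")
    yield Doc("nyt://article/b", "article")


def docs_to_rows(docs: Iterable[Any], fields: Sequence[str]) -> List[List[Any]]:
    """Read each named field by attribute; a missing field becomes None."""
    return [[getattr(doc, name, None) for name in fields] for doc in docs]


results = fake_search()

# First pass: attribute access collects one row per document.
rows = docs_to_rows(results, ["_id", "document_type"])

# Second pass over the same generator: it is already exhausted, so no rows.
rows_again = docs_to_rows(results, ["_id", "document_type"])
```

The rows list has the same shape as users_locs in the answer, so it could be passed to pd.DataFrame(data=rows, columns=['ID', 'type']) in the same way. If the results need to be walked more than once, storing them first with list(...) is one option.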