Pandas DataFrame
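I want to represent the data as a pandas DataFrame, with a column named Product Title populated with the values of t.
For example:
Product Title
Marvel: Movie Collection
Marvel
Disney movies, etc.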
import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd

r = requests.get("http://www.walmart.com/search/?query=marvel&cat_id=4096_530598")
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly

# Containers for the product tiles, prices and star ratings
g_data = soup.find_all("div", {"class" : "tile-conent"})
g_price = soup.find_all("div", {"class" : "item-price-container"})
g_star = soup.find_all("div", {"class" : "stars stars-small tile-row"})

# Print the text of every title link inside each product tile
for product_title in g_data:
    a_product_title = product_title.find_all("a", "js-product-title")
    for text_product_title in a_product_title:
        t = text_product_title.text
        print(t)
Desired output:
Product Title:
Marvel Heroes: Collection
Marvel: Guardians Of The Galaxy (Widescreen)
Marvel Complete Giftset (Widescreen)
Marvel's The Avengers (Widescreen)
Marvel Knights: Wolverine Versus Sabretooth - Reborn (Widescreen)
Superheroes Collection: The Incredible Hulk Returns / The Trial Of The Incredible Hulk / How To Draw Comics The Marvel Way (Widescreen)
Marvel: Iron Man & Hulk - Heroes United (Widescreen)
Marvel's The Avengers (DVD + Blu-ray) (Widescreen)
Captain America: The Winter Soldier (Widescreen)
Iron Man 3 (DVD + Digital Copy) (Widescreen)
Thor: The Dark World (Widescreen)
Spider-Man (2-Disc) (Special Edition) (Widescreen)
Elektra / Fantastic Four / Daredevil (Director's Cut) / Fantastic Four 2: Rise Of The Silver Surfer
Spider-Man / Spider-Man 2 / Spider-Man 3 (Widescreen)
Spider-Man 2 (Widescreen)
The Punisher (Extended Cut) (Widescreen)
DC Showcase: Superman / Shazam!: The Return Of The Black Adam
Ultimate Avengers: The Movie (Widescreen)
The Next Avengers: Heroes Of Tomorrow (Widescreen)
Ultimate Avengers 1 & 2 (Blu-ray) (Widescreen)
I tried the append function and join, but it didn't work. Is there a specific function for this in pandas DataFrame?
The desired output should be produced as a pandas DataFrame.
Well, this will get you started: it extracts all the titles into a dict (I use a defaultdict for convenience):
In [163]:
from collections import defaultdict

# Collect every title into a list stored under a single column name
data = defaultdict(list)
for product_title in g_data:
    a_product_title = product_title.find_all("a", "js-product-title")
    for text_title in a_product_title:
        data['Product title'].append(text_title.text)
df = pd.DataFrame(data)
df
Out[163]:
Product title
0 Marvel Heroes: Collection
1 Marvel: Guardians Of The Galaxy (Widescreen)
2 Marvel Complete Giftset (Widescreen)
3 Marvel's The Avengers (Widescreen)
4 Marvel Knights: Wolverine Versus Sabretooth - ...
5 Superheroes Collection: The Incredible Hulk Re...
6 Marvel: Iron Man & Hulk - Heroes United (Wides...
7 Marvel's The Avengers (DVD + Blu-ray) (Widescr...
8 Captain America: The Winter Soldier (Widescreen)
9 Iron Man 3 (DVD + Digital Copy) (Widescreen)
10 Thor: The Dark World (Widescreen)
11 Spider-Man (2-Disc) (Special Edition) (Widescr...
12 Elektra / Fantastic Four / Daredevil (Director...
13 Spider-Man / Spider-Man 2 / Spider-Man 3 (Wide...
14 Spider-Man 2 (Widescreen)
15 The Punisher (Extended Cut) (Widescreen)
16 DC Showcase: Superman / Shazam!: The Return Of...
17 Ultimate Avengers: The Movie (Widescreen)
18 The Next Avengers: Heroes Of Tomorrow (Widescr...
19 Ultimate Avengers 1 & 2 (Blu-ray) (Widescreen)
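A side note on the output above: the trailing "..." on the longer titles is only how pandas abbreviates wide cells when it displays them; the stored strings should still be complete. Here is a minimal sketch of printing them in full, assuming pandas 1.0 or later (where display.max_colwidth accepts None). The two-row frame is just a stand-in for the df built above.

```python
import pandas as pd

# Stand-in for the df built above, holding one of the long titles.
long_title = ("Superheroes Collection: The Incredible Hulk Returns / "
              "The Trial Of The Incredible Hulk / "
              "How To Draw Comics The Marvel Way (Widescreen)")
df = pd.DataFrame({"Product title": ["Marvel Heroes: Collection", long_title]})

# Lift the column-width limit only inside this block (needs pandas >= 1.0).
with pd.option_context("display.max_colwidth", None):
    full_text = df.to_string()

print(full_text)
```

Using option_context rather than set_option keeps the change local, so the rest of the session keeps the default abbreviated display.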
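So you can modify this script to add the price and star ratings as keys to the data dict, then construct the df from the resulting dict. That will be better than appending one row at a time.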
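Here is a rough sketch of what that multi-column version might look like. To keep it runnable without hitting the site, the tiles list is a hypothetical stand-in for the parsed product tiles, and its prices and ratings are invented placeholders rather than scraped values. The column names Price and Stars are also my own choice.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for the parsed product tiles: (title, price, stars).
# The prices and ratings are invented placeholders, not scraped values;
# None marks a field that a tile did not provide.
tiles = [
    ("Marvel Heroes: Collection", "$9.99", 4.5),
    ("Thor: The Dark World (Widescreen)", None, 4.0),
    ("Spider-Man 2 (Widescreen)", "$5.00", None),
]

data = defaultdict(list)
for title, price, stars in tiles:
    # Append to every key once per tile so all lists stay the same length;
    # pd.DataFrame raises ValueError if the columns differ in length.
    data["Product title"].append(title)
    data["Price"].append(price)
    data["Stars"].append(stars)

# Build the frame once from the finished dict instead of growing it row by row.
df = pd.DataFrame(data)
print(df)
```

The one thing to watch when adapting this to the real page is a tile with no price or rating: append None for it rather than skipping it, otherwise the lists drift out of step and the rows no longer line up.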