Extract text from HTML using Python
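I hope someone can help me. I'm new to Python, and I want to scrape data from a website that, unfortunately, requires an account. However, I can't extract the date (i.e. 2017-06-01).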
<li class="latest-value-item">
<div class="latest-value-label">Date</div>
<div class="latest-value">2017-06-01</div>
</li>
<li class="latest-value-item">
<div class="latest-value-label">Index</div>
<div class="latest-value">1430</div>
</li>
Here is my code:
import urllib3
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd
import requests
import csv
from datetime import datetime
url = 'https://www.quandl.com/data/LLOYDS/BCI-Baltic-Capesize-Index'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
Baltic_Indices = []
New_Value = []
#new = soup.find_all('div', attrs={'class':'latest-value'}).get_text()
date = soup.find_all(class_="latest value")
text1 = date.text
print(text1)
date = soup.find_all(class_="latest value")
You used the wrong CSS class name ('latest value' != 'latest-value'):
print(soup.find_all(attrs={'class': 'latest-value'}))
# [<div class="latest-value">2017-06-01</div>, <div class="latest-value">1430</div>]
for element in soup.find_all(attrs={'class': 'latest-value'}):
    print(element.text)
# 2017-06-01
# 1430
I prefer using the attrs kwarg, but your approach works just as well (given the correct CSS class name):
for element in soup.find_all(class_='latest-value'):
    print(element.text)
# 2017-06-01
# 1430
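As a rough sketch of how the two values could be tied to their labels, the snippet below parses the HTML fragment from the question directly, so no login or network request is involved. The function name extract_latest_values and the HTML constant are made up for illustration, and the built-in 'html.parser' is used in place of 'lxml' only to avoid an extra dependency. It also shows that find_all returns a list-like ResultSet, which is why .text has to be read from each element rather than from the result as a whole.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question, inlined so the example runs offline.
HTML = """
<li class="latest-value-item">
<div class="latest-value-label">Date</div>
<div class="latest-value">2017-06-01</div>
</li>
<li class="latest-value-item">
<div class="latest-value-label">Index</div>
<div class="latest-value">1430</div>
</li>
"""


def extract_latest_values(html):
    """Return a {label: value} dict, one entry per latest-value-item (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    values = {}
    for item in soup.find_all("li", class_="latest-value-item"):
        # class_ matches whole class tokens, so "latest-value" does not
        # match the sibling div whose class is "latest-value-label".
        label = item.find("div", class_="latest-value-label")
        value = item.find("div", class_="latest-value")
        if label is not None and value is not None:
            values[label.get_text(strip=True)] = value.get_text(strip=True)
    return values


soup = BeautifulSoup(HTML, "html.parser")

# The misspelled class name from the question matches no element,
# so find_all returns an empty ResultSet.
missing = soup.find_all(class_="latest value")
print(missing)
# []

latest = extract_latest_values(HTML)
print(latest)
# {'Date': '2017-06-01', 'Index': '1430'}
```

Keying the values by label means the date can be read as latest['Date'] without relying on the order of the div elements on the page.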