How to get the author from Wikimedia with BS4
Hello, I've been struggling to get the author from a Wikimedia photo page.
bs4's find always returns None and I'm stuck. Could anyone share some code that might work?
Example Wikimedia page: https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg
My aim is to get the author's name and the corresponding link.
Current code:
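import requests
from bs4 import BeautifulSoup
url = "https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")
# This returns None, though
table = soup.find("table", {'class': "fileinfotpl-type-information toccolours vevent mw-content-ltr"})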
from bs4 import BeautifulSoup
import requests
res = requests.get("https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg")
soup = BeautifulSoup(res.text, "html.parser")
author_td = soup.find("table", class_="fileinfotpl-type-information toccolours vevent mw-content-ltr").find("tbody").find_all("tr")[-1]
print(author_td.find_all("td")[-1].get_text(strip=True))
Output:
Dirk Vorderstraße
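One possible reason for find returning None (not confirmed for this particular page) is that a multi-class string passed as class_ is compared against the exact value of the class attribute, so a different class order or an extra class breaks the match. A minimal offline sketch of that behaviour, using a made-up HTML snippet rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as in the question, but in a different order.
html = (
    '<table class="toccolours fileinfotpl-type-information vevent">'
    "<tr><td>Author</td><td>Example Author</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

# A multi-class string only matches the exact attribute value, order included.
exact = soup.find("table", class_="fileinfotpl-type-information toccolours vevent")

# A single class name matches regardless of order or additional classes.
single = soup.find("table", class_="fileinfotpl-type-information")

# The CSS selector form behaves the same way as the single-class lookup.
css = soup.select_one("table.fileinfotpl-type-information")

print(exact is None, single is not None, css is not None)
```

Matching on just one distinctive class tends to survive small changes in the page's markup better than the full class string.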
import requests
from bs4 import BeautifulSoup
url = 'https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Recent soupsieve versions deprecate :contains in favour of :-soup-contains
print(soup.select_one('td:-soup-contains("Author")').find_next('td').get_text(strip=True))
Prints:
Dirk Vorderstraße
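The question also asks for the author's link, which the snippets above do not print. Here is a rough sketch of one way to get both, assuming the author cell sits right after a td whose text is "Author" and may contain an a tag. The extract_author helper, BASE_URL and SAMPLE_HTML are illustrative names of mine, and the sample markup is invented so the example runs without a network call:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://commons.wikimedia.org"  # used to resolve relative hrefs


def extract_author(html, base_url=BASE_URL):
    """Return (name, link) from the Author row, or (None, None) if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the label cell whose text is exactly "Author".
    label = soup.find(
        lambda tag: tag.name == "td" and tag.get_text(strip=True) == "Author"
    )
    if label is None:
        return None, None
    # The value cell is assumed to be the next td in the same row.
    value = label.find_next_sibling("td")
    if value is None:
        return None, None
    name = value.get_text(strip=True)
    anchor = value.find("a", href=True)
    link = urljoin(base_url, anchor["href"]) if anchor else None
    return name, link


# Hypothetical markup standing in for the real file page.
SAMPLE_HTML = """
<table class="fileinfotpl-type-information">
<tr><td>Source</td><td>Flickr</td></tr>
<tr><td>Author</td><td><a href="/wiki/User:Example">Example Author</a></td></tr>
</table>
"""

print(extract_author(SAMPLE_HTML))
```

For the real page you would pass requests.get(url).text instead of SAMPLE_HTML. Returning (None, None) instead of chaining find calls avoids an AttributeError when the row is not there.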