Scraping the name of a dataset on Kaggle using Python

Hello, how can I get the name of a dataset on Kaggle using Beautiful Soup, Selenium, or Scrapy? I tested this code, but it doesn't return anything:

from bs4 import BeautifulSoup
import requests

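url = 'https://www.kaggle.com/heptapod/titanic'
res = requests.get(url)  # fetch the raw HTML; requests does not execute JavaScript
html_page = res.content

soup = BeautifulSoup(html_page, 'html.parser')
# look for an <h5> whose class attribute matches this exact class string
datasetName = soup.find('h5', {'class': 'sc-dIvrsQ sc-hHEiqL sc-kaPsuu kSVYRu ccTnQh ffXPrd'})

print(datasetName)  # prints None when no matching <h5> is found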

See the screenshot: inspect element from Kaggle
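A likely reason for the empty result is that this part of the Kaggle page is built by JavaScript in the browser, so the HTML that `requests` downloads may not contain that `<h5>` at all (the markup Kaggle actually serves is not shown here). One quick check is to list every `<h5>` in the raw HTML. The sketch below does this with the standard library's `html.parser`; the helper `h5_texts` and the two sample strings are hypothetical.

```python
from html.parser import HTMLParser


class H5Collector(HTMLParser):
    """Collect the text content of every <h5> element in an HTML string."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while the parser is inside an <h5>
        self.texts = []  # one entry per <h5> encountered

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self._depth += 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "h5" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.texts[-1] += data


def h5_texts(html):
    """Return the stripped text of each <h5> in `html` (empty list if none)."""
    parser = H5Collector()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.texts]


# Hypothetical inputs: a JavaScript-only shell vs. markup that is already rendered.
shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
rendered = '<html><body><h5 class="x">example.csv</h5></body></html>'
print(h5_texts(shell))     # → []
print(h5_texts(rendered))  # → ['example.csv']
```

If `h5_texts(res.text)` on the real response comes back empty, the element is not in the static HTML, and a tool that drives a real browser (such as Selenium) is needed to see it.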

Using Selenium:

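import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# run Chrome without opening a browser window
opt = Options()
opt.add_argument('--headless')
# Selenium 4 takes the driver path via a Service object (executable_path is deprecated/removed)
driver = webdriver.Chrome(service=Service('yourdriverpath'), options=opt)

driver.get("https://www.kaggle.com/heptapod/titanic")
time.sleep(5)  # give the page's JavaScript time to render the content
datasetname = driver.find_element(By.XPATH, "//div[@role='button']//div//div").text
print(datasetname)
driver.quit()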
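The fixed `time.sleep(5)` always waits five seconds, even when the page is ready sooner, and still fails if rendering takes longer. Selenium's explicit waits (`WebDriverWait` together with `expected_conditions`) instead poll a condition until it holds or a timeout expires. The plain-Python sketch below shows only that polling idea, so it runs without a browser; `wait_until` and `fake_lookup` are hypothetical helpers, not Selenium's API.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    This mirrors the idea behind Selenium's WebDriverWait, not its API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Simulated page: the element "appears" on the third check.
checks = {"n": 0}


def fake_lookup():
    checks["n"] += 1
    return "example.csv" if checks["n"] >= 3 else None


print(wait_until(fake_lookup, timeout=5, interval=0.01))  # → example.csv
```

In the Selenium script, the condition could be `lambda: driver.find_elements(By.XPATH, "//div[@role='button']//div//div")`: `find_elements` returns an empty (falsy) list until something matches, so the loop keeps polling and then hands back the list of elements.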

Output:

train_and_test2.csv
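The printed value is the text of the first element matching `//div[@role='button']//div//div`, that is, a `div` nested at least two `div` levels below a `div` with `role="button"`. Python's `xml.etree.ElementTree` supports a small XPath subset that is enough to see what this expression selects on toy markup. The sample markup below is hypothetical, not Kaggle's real DOM, and ElementTree needs well-formed XML, so it cannot parse arbitrary HTML pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup shaped like the structure the XPath expects.
SAMPLE = """
<div>
  <div role="button">
    <div>
      <div>example.csv</div>
    </div>
  </div>
  <div>
    <div><div>not under a button</div></div>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())
# ElementTree paths are relative, so Selenium's "//div[...]" becomes ".//div[...]".
matches = root.findall(".//div[@role='button']//div//div")
print([m.text for m in matches])  # → ['example.csv']
# Like driver.find_element, find() returns only the first match.
print(root.find(".//div[@role='button']//div//div").text)  # → example.csv
```

Because the expression depends only on the `role` attribute and nesting depth, it avoids the generated `sc-...` class names used in the BeautifulSoup attempt, but it will still pick up whatever happens to be the first such `div` on the page.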

Screenshot: dataset snapshot