Scraping the name of a dataset on Kaggle using Python
Hello,
How can I get the name of a dataset on Kaggle using Beautiful Soup, Selenium, or Scrapy?
I tested this code, but it returns nothing:
from bs4 import BeautifulSoup
import requests
url = 'https://www.kaggle.com/heptapod/titanic'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')
datasetName = soup.find('h5',{'class':'sc-dIvrsQ sc-hHEiqL sc-kaPsuu kSVYRu ccTnQh ffXPrd'})
print(datasetName)
See the image:
inspect element from kaggle
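A likely reason the lookup above finds nothing: requests only downloads the initial HTML and does not run JavaScript, and class names such as sc-dIvrsQ look auto-generated, so they may be missing from the response or change between builds. Here is a minimal sketch of a less fragile check. It assumes (not verified for Kaggle) that the page carries its name in the <title> tag or in an og:title meta tag, with a " | Kaggle" suffix. TitleParser, extract_page_name, and the sample HTML are hypothetical names made up for illustration; only the standard library is used.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the <title> text and the og:title meta tag, if present."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.og_title = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:title":
            self.og_title = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that sits between <title> and </title> is kept.
        if self._in_title:
            self.title += data


def extract_page_name(html, suffix=" | Kaggle"):
    """Return og:title if present, else <title>, minus the assumed site suffix."""
    parser = TitleParser()
    parser.feed(html)
    name = parser.og_title or parser.title.strip()
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name.strip() or None


# Hypothetical sample; the real page's markup may differ.
sample = "<html><head><title>Titanic | Kaggle</title></head><body></body></html>"
print(extract_page_name(sample))
```

With the code from the question this would be called as extract_page_name(res.text). If it returns None as well, the name is probably only present after JavaScript has run, and a browser-driven tool is needed.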
Using Selenium:
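import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
opt = Options()
opt.add_argument('--headless')  # run Chrome without a visible window
# Selenium 4 takes the driver path via Service; executable_path is deprecated
driver = webdriver.Chrome(service=Service('yourdriverpath'), options=opt)
driver.get("https://www.kaggle.com/heptapod/titanic")
time.sleep(5)  # wait for the JavaScript-rendered content to load
datasetname = driver.find_element(By.XPATH, "//div[@role='button']//div//div").text
print(datasetname)
driver.quit()  # close the browser when done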
Output:
train_and_test2.csv
Process finished with exit code 0
dataset snapshot
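If the identifier that appears in the URL is enough, it can be read without loading the page at all. This is a minimal sketch, assuming dataset URLs follow the /<owner>/<dataset> shape of the one used above. Newer Kaggle URLs may carry a leading /datasets/ segment, which the sketch skips. dataset_slug is a hypothetical helper, and the slug is not necessarily the same as the display title shown on the page.

```python
from urllib.parse import urlparse


def dataset_slug(url):
    """Split a dataset URL path into (owner, dataset) parts."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Skip an optional leading "datasets" segment (assumed URL variant).
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"not a dataset URL: {url}")
    return parts[0], parts[1]


owner, name = dataset_slug("https://www.kaggle.com/heptapod/titanic")
print(owner, name)  # heptapod titanic
```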