Scraping data from Tableau chart with python

Question

I want to scrape the ownership data from the following website:

https://www.usnewsdeserts.com/states/california/#1536357227283-a4a9d6e4-ccf9

The code I'm using is as follows:

import requests
from bs4 import BeautifulSoup
import json
import re
import random

# Placeholder: the original post did not show how user_agent_list was defined.
user_agent_list = ["Mozilla/5.0"]

url = "https://public.tableau.com/vizql/w/TopOwnersCalifornia/v/Owners/bootstrapSession/sessions/5E565C4C5F7D462BBE8DFEE9246F846E-0:0"
# Pick a random User-Agent string for the request headers
header = random.choice(user_agent_list)
HEADERS = {"User-Agent": header}
params = {"stickySessionKey": {"dataserverPermissions":"44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a"}}
r = requests.post(url, params=params, headers=HEADERS)
soup = BeautifulSoup(r.text, "html.parser")
print(soup)

I get:

<br/>
2020-12-12 12:41:46.829
(X9S6ik90vQizHF9Qa-S@CwAAAUk,0:0)

How can I get this data?

Answer 1

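I've made a tableau scraper library to extract data from Tableau worksheets. You just need to find the Tableau URL in the network tab of your browser's developer tools, which in this case is: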

GET https://public.tableau.com/views/NewspapersByCountyCalifornia/Newspaperbycounty
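If what you captured in the network tab is a `vizql` bootstrapSession URL like the one in the question, the view URL can probably be derived from it. This is a minimal sketch, assuming Tableau Public maps `/vizql/w/<workbook>/v/<view>/...` to `/views/<workbook>/<view>`; `view_url_from_vizql` is a hypothetical helper name, not part of any library.

```python
import re
from urllib.parse import urlsplit


# Hypothetical helper: not part of tableauscraper or of Tableau's API.
def view_url_from_vizql(vizql_url):
    """Derive a /views/<workbook>/<view> URL from a /vizql/w/.../v/... URL."""
    parts = urlsplit(vizql_url)
    # Assumption: the workbook and view names are the path segments
    # that follow "/w/" and "/v/" respectively.
    match = re.match(r"^/vizql/w/([^/]+)/v/([^/]+)", parts.path)
    if match is None:
        raise ValueError(f"not a vizql session URL: {vizql_url}")
    workbook, view = match.groups()
    return f"{parts.scheme}://{parts.netloc}/views/{workbook}/{view}"


# The session URL from the question
vizql_url = (
    "https://public.tableau.com/vizql/w/TopOwnersCalifornia/v/Owners"
    "/bootstrapSession/sessions/5E565C4C5F7D462BBE8DFEE9246F846E-0:0"
)
print(view_url_from_vizql(vizql_url))
# prints https://public.tableau.com/views/TopOwnersCalifornia/Owners
```

The derived URL could then be tried in place of the one below; whether that particular workbook loads has not been checked here.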

You can extract the data with the following code:

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/NewspapersByCountyCalifornia/Newspaperbycounty"

ts = TS()
ts.loads(url)
dashboard = ts.getDashboard()

for t in dashboard.worksheets:
    #show worksheet name
    print(f"WORKSHEET NAME : {t.name}")
    #show dataframe for this worksheet
    print(t.data)
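To keep the data rather than just print it, each worksheet could be written to a CSV file. The sketch below assumes every worksheet object exposes a `.name` string and a `.data` pandas DataFrame, as the loop above suggests; `safe_filename` and `save_worksheets` are hypothetical helper names introduced only for illustration.

```python
import re
from pathlib import Path


def safe_filename(name):
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^\w\-]+", "_", name).strip("_") or "worksheet"


def save_worksheets(worksheets, out_dir="."):
    """Write each worksheet's DataFrame to <out_dir>/<name>.csv.

    Assumes every item has a .name string and a .data pandas DataFrame.
    Returns the list of paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for ws in worksheets:
        target = out_path / f"{safe_filename(ws.name)}.csv"
        ws.data.to_csv(target, index=False)  # pandas DataFrame.to_csv
        written.append(target)
    return written


# Usage with the objects from the code above (needs network access):
# save_worksheets(dashboard.worksheets, out_dir="tableau_export")
```

Worksheet names are sanitized because they may contain spaces or slashes that are not safe in file names.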


Tags: python, beautifulsoup, web-scraping, tableau-api