How to read a link from a cell in a Google Spreadsheet if it's inside an href tag (gspread)
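I'm new to Stack Overflow, so apologies in advance if I'm doing something wrong.
I have a spreadsheet in Google Sheets, for example this one,
and one of its cells contains a link inside an href tag. I want to get both the link and the text of the cell using the Google Sheets API or gspread.
I have already tried this solution, but the access token I get is 'None'.
I also tried web scraping with BeautifulSoup, but that didn't work well.
As for the bs4 solution, I tried this code, which I found here: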
from bs4 import BeautifulSoup
import requests

html = requests.get('https://docs.google.com/spreadsheets/d/1v8vM7yQ-27SFemt8_3IRiZr-ZauE29edin-azKpigws/edit#gid=0').text
soup = BeautifulSoup(html, "lxml")
tables = soup.find_all("table")
content = []
for table in tables:
    content.append([[td.text for td in row.find_all("td")] for row in table.find_all("tr")])
print(content)
I figured it out. In case anyone needs it, here is the full code:
import requests
import gspread
import urllib.parse
import pickle

spreadsheetId = "###"  # Please set the Spreadsheet ID.
# Please set the range in A1 notation. The code below reads the hyperlink
# of the first cell in this range.
cellRange = "Yoursheetname!A1:A100"

with open('token_sheets_v4.pickle', 'rb') as token:
    # get this file here
    # https://developers.google.com/identity/sign-in/web/sign-in
    credentials = pickle.load(token)

client = gspread.authorize(credentials)

# 1. Retrieve the access token.
access_token = client.auth.token

# 2. Call the spreadsheets.get method of the Sheets API using the `requests` module.
fields = "sheets(data(rowData(values(hyperlink))))"
url = "https://sheets.googleapis.com/v4/spreadsheets/" + spreadsheetId + "?ranges=" + urllib.parse.quote(cellRange) + "&fields=" + urllib.parse.quote(fields)
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res)

# 3. Retrieve the hyperlink.
obj = res.json()
print(obj)
link = obj["sheets"][0]['data'][0]['rowData'][0]['values'][0]['hyperlink']
print(link)
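The question asked for both the link and the text of each cell, while the code above reads only the first cell's link. As far as I can tell, the API leaves out fields that are empty, so the hard-coded index chain may raise a KeyError for a cell without a link. Below is a minimal sketch of a more tolerant reader. It assumes the request was made with fields = "sheets(data(rowData(values(formattedValue,hyperlink))))". The helper name cell_texts_and_links and the sample_obj data are made up for illustration and are not real API output.

```python
def cell_texts_and_links(obj):
    """Return (text, link) pairs for every cell in a spreadsheets.get response."""
    pairs = []
    for sheet in obj.get("sheets", []):
        for grid in sheet.get("data", []):
            for row in grid.get("rowData", []):
                for cell in row.get("values", []):
                    # .get() yields None when a cell has no text or no link.
                    pairs.append((cell.get("formattedValue"), cell.get("hyperlink")))
    return pairs


# Hand-written sample shaped like a response (an assumption, not real output).
sample_obj = {
    "sheets": [{"data": [{"rowData": [
        {"values": [{"formattedValue": "Docs", "hyperlink": "https://example.com/docs"}]},
        {"values": [{"formattedValue": "plain text"}]},
    ]}]}]
}
print(cell_texts_and_links(sample_obj))
```

With a real response you would pass obj (the result of res.json()) instead of sample_obj.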
Update!!
A more elegant solution is this one. Create the service:
import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

CLIENT_SECRET_FILE = 'secret/secret.json'
API_SERVICE_NAME = 'sheets'
API_VERSION = 'v4'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']


def Create_Service():
    cred = None
    pickle_file = f'secret/token_{API_SERVICE_NAME}_{API_VERSION}.pickle'

    # Reuse cached credentials if they were saved by a previous run.
    if os.path.exists(pickle_file):
        with open(pickle_file, 'rb') as token:
            cred = pickle.load(token)

    # Refresh expired credentials, or run the OAuth flow if there are none.
    if not cred or not cred.valid:
        if cred and cred.expired and cred.refresh_token:
            cred.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
            cred = flow.run_local_server()

        with open(pickle_file, 'wb') as token:
            pickle.dump(cred, token)

    try:
        service = build(API_SERVICE_NAME, API_VERSION, credentials=cred)
        print(API_SERVICE_NAME, 'service created successfully')
        return service
    except Exception as e:
        print('Unable to connect.')
        print(e)
        return None


service = Create_Service()
Then extract the links from every sheet in the spreadsheet as a convenient dictionary:
fields = "sheets(properties(title),data(startColumn,rowData(values(hyperlink))))"
# spreadsheetId is the Spreadsheet ID set in the earlier snippet.
print(service.spreadsheets().get(spreadsheetId=spreadsheetId,
                                 fields=fields).execute())
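The response is a nested dictionary, so the links still have to be picked out of it. The sketch below shows one possible way to flatten it into a dictionary keyed by sheet title. The function name links_by_sheet and the sample data are hypothetical. It assumes the fields mask above was used, and that startColumn is simply missing when it is 0.

```python
def links_by_sheet(response):
    """Map each sheet title to a list of (row, column, url) tuples.

    Rows are counted from the start of the returned grid data and columns
    are offset by startColumn; both are zero-based.
    """
    result = {}
    for sheet in response.get("sheets", []):
        title = sheet.get("properties", {}).get("title", "")
        found = []
        for grid in sheet.get("data", []):
            start_col = grid.get("startColumn", 0)
            for r, row in enumerate(grid.get("rowData", [])):
                for c, cell in enumerate(row.get("values", [])):
                    url = cell.get("hyperlink")
                    if url:  # skip cells that carry no link
                        found.append((r, start_col + c, url))
        result[title] = found
    return result


# Hand-written sample shaped like a response (an assumption, not real output).
sample = {
    "sheets": [{
        "properties": {"title": "Sheet1"},
        "data": [{"rowData": [
            {"values": [{"hyperlink": "https://example.com/a"}, {}]},
            {},
            {"values": [{}, {"hyperlink": "https://example.com/b"}]},
        ]}],
    }]
}
print(links_by_sheet(sample))
```

With the real service you would pass the result of the .execute() call instead of sample.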
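So, how do fields work? Go to the Spreadsheet object description and look at its JSON representation. If we want to return, for example, the sheets object from that JSON representation, we just use fields = "sheets", because "sheets" is a field of the Spreadsheet in its JSON representation.
OK, cool. We got the sheets object. How do we access the fields of a sheet object? Just click on it in the documentation and look up its fields.
So how do we combine fields? It's simple. For example, to return the fields "properties" and "data" from the sheets object, I write the fields string like this: fields = "sheets(properties,data)". So we just list them like arguments in an ordinary function call, but without spaces.
The same applies to the objects returned in the data field, and so on.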
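To make the nesting rule concrete, here is a small sketch that builds a fields string from a nested dictionary. The helper build_fields is hypothetical and not part of any Google library. It only illustrates the "parent(child1,child2)" pattern described above.

```python
def build_fields(tree):
    """Render a nested dict as a fields string.

    Keys are field names. A value of None marks a leaf, and a nested dict
    lists the sub-fields to request from that object.
    """
    parts = []
    for name, children in tree.items():
        if children:
            parts.append(f"{name}({build_fields(children)})")
        else:
            parts.append(name)
    # Sub-fields are comma-separated, with no spaces.
    return ",".join(parts)


mask = build_fields({
    "sheets": {
        "properties": {"title": None},
        "data": {
            "startColumn": None,
            "rowData": {"values": {"hyperlink": None}},
        },
    }
})
print(mask)
# sheets(properties(title),data(startColumn,rowData(values(hyperlink))))
```

The printed string is the same mask used in the request above, so writing it by hand works just as well. The helper only helps once the mask gets long.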