How to read a link from a cell in a Google Spreadsheet if it's inside an href tag (gspread)

I'm new to Stack Overflow, so apologies in advance if I'm doing something wrong.

I have a spreadsheet in Google Sheets, for example this one

One of its cells contains a link inside an href tag. I want to get both the link and the cell's text using the Google Sheets API or gspread.

I have already tried this solution, but the access token I get is 'None'.

I also tried web scraping with BeautifulSoup, but it did not work well.

As for the bs4 approach, I tried this code, which I found here:

from bs4 import BeautifulSoup
import requests

html = requests.get('https://docs.google.com/spreadsheets/d/1v8vM7yQ-27SFemt8_3IRiZr-ZauE29edin-azKpigws/edit#gid=0').text
soup = BeautifulSoup(html, "lxml")
tables = soup.find_all("table")

content = []

for table in tables:
    content.append([[td.text for td in row.find_all("td")] for row in table.find_all("tr")])

print(content)

I figured it out. In case anyone needs it, here is the complete code:

import requests
import gspread
import urllib.parse
import pickle



spreadsheetId = "###"  # Please set the Spreadsheet ID.
cellRange = "Yoursheetname!A1:A100"  # Please set the range in A1 notation. Here, the hyperlinks of cells A1:A100 of the sheet "Yoursheetname" are requested.


with open('token_sheets_v4.pickle', 'rb') as token:
    # get this file here
    # https://developers.google.com/identity/sign-in/web/sign-in
    credentials = pickle.load(token)

client = gspread.authorize(credentials)

# 1. Retrieve the access token.
access_token = client.auth.token

# 2. Request to the method of spreadsheets.get in Sheets API using `requests` module.
fields = "sheets(data(rowData(values(hyperlink))))"
url = "https://sheets.googleapis.com/v4/spreadsheets/" + spreadsheetId + "?ranges=" + urllib.parse.quote(cellRange) + "&fields=" + urllib.parse.quote(fields)
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res)

# 3. Retrieve the hyperlink.
obj = res.json()
print(obj)
link = obj["sheets"][0]['data'][0]['rowData'][0]['values'][0]['hyperlink']
print(link)
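
The last step above reads only the first cell of the range and raises a KeyError if that cell has no hyperlink. Below is a minimal sketch for collecting every cell instead. It assumes the request was made with fields = "sheets(data(rowData(values(formattedValue,hyperlink))))", so that the displayed text of each cell is returned along with the link. The helper name extract_text_and_links is hypothetical, and the sample dict is made up to mimic the response shape.

```python
def extract_text_and_links(obj):
    """Return (text, link) pairs for every returned cell of the first sheet's first range.

    link is None for cells that carry no hyperlink.
    """
    pairs = []
    sheets = obj.get("sheets") or [{}]
    data = sheets[0].get("data") or [{}]
    for row in data[0].get("rowData", []):
        # Rows or cells without data may come back as empty dicts.
        for cell in row.get("values", []):
            pairs.append((cell.get("formattedValue"), cell.get("hyperlink")))
    return pairs


# Made-up response in the shape returned by spreadsheets.get (an assumption for illustration).
sample = {
    "sheets": [{
        "data": [{
            "rowData": [
                {"values": [{"formattedValue": "Example", "hyperlink": "https://example.com"}]},
                {},  # a row with no data
                {"values": [{"formattedValue": "plain text"}]},
            ]
        }]
    }]
}

for text, link in extract_text_and_links(sample):
    print(text, link)
```

Using .get() throughout keeps the loop from failing on empty rows or on cells that contain text but no link.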

Update!!

A more elegant solution is the following. Create the service:

import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

CLIENT_SECRET_FILE = 'secret/secret.json'
API_SERVICE_NAME = 'sheets'
API_VERSION = 'v4'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']


def Create_Service():
    cred = None

    # Reuse credentials cached by a previous run, if any.
    pickle_file = f'secret/token_{API_SERVICE_NAME}_{API_VERSION}.pickle'
    if os.path.exists(pickle_file):
        with open(pickle_file, 'rb') as token:
            cred = pickle.load(token)

    # Refresh expired credentials, or run the OAuth flow if there are none.
    if not cred or not cred.valid:
        if cred and cred.expired and cred.refresh_token:
            cred.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
            cred = flow.run_local_server()

        # Cache the credentials for the next run.
        with open(pickle_file, 'wb') as token:
            pickle.dump(cred, token)

    try:
        service = build(API_SERVICE_NAME, API_VERSION, credentials=cred)
        print(API_SERVICE_NAME, 'service created successfully')
        return service
    except Exception as e:
        print('Unable to connect.')
        print(e)
        return None

service = Create_Service()

Then extract the links from every sheet in the spreadsheet, in the form of a convenient dictionary:

    fields = "sheets(properties(title),data(startColumn,rowData(values(hyperlink))))"
    
    print(service.spreadsheets().get(spreadsheetId=self.__spreadsheet_id,
                                     fields=fields).execute())
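
The response is a nested dict, so it still has to be walked to get at the links. Here is a rough sketch of one way to flatten it into a dictionary keyed by sheet title. The function name links_by_sheet and the sample response are hypothetical, and the code assumes that startRow and startColumn may be absent from the response, in which case they are treated as 0.

```python
def links_by_sheet(response):
    """Map each sheet title to a list of (row, column, url) tuples, 0-based."""
    result = {}
    for sheet in response.get("sheets", []):
        title = sheet.get("properties", {}).get("title", "")
        links = []
        for grid in sheet.get("data", []):
            # Offsets of this block of cells; treated as 0 when absent (assumption).
            first_row = grid.get("startRow", 0)
            first_col = grid.get("startColumn", 0)
            for r, row in enumerate(grid.get("rowData", [])):
                for c, cell in enumerate(row.get("values", [])):
                    url = cell.get("hyperlink")
                    if url:
                        links.append((first_row + r, first_col + c, url))
        result[title] = links
    return result


# Made-up response for illustration only.
sample_response = {
    "sheets": [
        {
            "properties": {"title": "Sheet1"},
            "data": [{
                "startColumn": 1,
                "rowData": [
                    {"values": [{"hyperlink": "https://example.com/a"}, {}]},
                    {},
                    {"values": [{}, {"hyperlink": "https://example.com/b"}]},
                ],
            }],
        },
        {"properties": {"title": "Empty"}, "data": [{}]},
    ]
}

print(links_by_sheet(sample_response))
```

With the real service, the dict returned by .execute() would be passed in place of sample_response.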

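So, how do fields work? Go to the Spreadsheet object description and look for its JSON representation. If we want to return, for example, the sheets object from that JSON representation, we just use fields = "sheets", because "sheets" is one of the fields of Spreadsheet in its JSON representation.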

OK, cool. We got the sheets object. How do we access the fields of a sheet object? Just click on that type in the documentation and look up its fields.

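So, how do we combine fields? It's simple. For example, to return the "properties" and "data" fields of the sheets object, I write the fields string like this: fields = "sheets(properties,data)". So we just list them like arguments in an ordinary function call, but without spaces.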

The same applies to returning the objects nested inside the data field, and so on.
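
To make the nesting rule concrete, here is a small sketch that builds such a fields string from a nested dict. The helper build_fields is hypothetical (it is not part of gspread or the Google API client); it only illustrates how the parentheses and commas combine.

```python
def build_fields(spec):
    """Turn a nested dict into a fields string.

    A value of None marks a leaf field; a dict lists the sub-fields to keep.
    """
    parts = []
    for name, sub in spec.items():
        if sub:
            # Sub-fields go inside parentheses, separated by commas, with no spaces.
            parts.append(f"{name}({build_fields(sub)})")
        else:
            parts.append(name)
    return ",".join(parts)


spec = {
    "sheets": {
        "properties": {"title": None},
        "data": {
            "startColumn": None,
            "rowData": {"values": {"hyperlink": None}},
        },
    }
}

print(build_fields(spec))
# prints: sheets(properties(title),data(startColumn,rowData(values(hyperlink))))
```

The printed string is the same one used in the spreadsheets().get call earlier in this answer.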