Is there a way to create a DataFrame from specific colour-coded rows within a Google Sheet? (gspread and pandas)
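I have a Google Sheet with a large number of colour-coded rows, and I want to create a new DataFrame from the coloured rows. Is it possible to select rows based on their colour? Or, alternatively, is there a way to select only the rows that have no colour coding?
I couldn't find anything on this, so I'm not sure whether it is possible.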
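I don't know how to do this with gspread (most likely it isn't possible), but it is easy with google-api-python-client (which is a dependency of gspread).
You need to pass the includeGridData parameter to the spreadsheets().get() method. Here is a slightly modified example from the docs: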
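import pandas as pd
from googleapiclient.discovery import build

# Assumes `creds` holds valid Google credentials and that
# `spreadsheet_id` and `ranges` (A1-notation strings) are already defined
service = build("sheets", "v4", credentials=creds)

data = (
    service.spreadsheets()
    .get(
        spreadsheetId=spreadsheet_id,
        ranges=ranges,
        includeGridData=True,  # important: return cell data, not only sheet metadata
        fields=",".join([  # request only the required fields to reduce response size
            "sheets.data.rowData.values.formattedValue",
            "sheets.data.rowData.values.effectiveFormat.backgroundColor",
        ]),
    )
    .execute()
)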
# Now parse the returned JSON according to your needs, e.g.:
def parse(data):  # data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet
    white_color = {"red": 1, "green": 1, "blue": 1}
    for grid_data in data["sheets"][0]["data"]:
        # grid_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#GridData
        for row_data in grid_data.get("rowData", []):
            # row_data["values"] contains a list of cells (CellData), one per column
            # cell_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#CellData
            row = []
            for cell_data in row_data.get("values", []):
                # Empty or unformatted cells may omit these keys, hence .get()
                # (userEnteredValue is also available, but it is an ExtendedValue
                # dict and must be added to `fields` as well)
                value = cell_data.get("formattedValue")
                color = cell_data.get("effectiveFormat", {}).get("backgroundColor", white_color)
                # Keep the value of coloured cells, blank out white ones
                if color != white_color:
                    row.append(value)
                else:
                    row.append(None)
            yield row

df = pd.DataFrame(list(parse(data)))
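The parser above blanks out individual cells. To select whole rows by colour, as the question asks, one option is to key each row on the background colour of its first cell and build one DataFrame per colour. This is a sketch under two assumptions: the first cell's colour represents the whole row, and a cell with no backgroundColor counts as white. The helpers to_hex, frames_by_colour and cell, and the sample response, are hypothetical and only mimic the shape of the Sheets API Spreadsheet JSON.

```python
from collections import defaultdict

import pandas as pd

WHITE = {"red": 1, "green": 1, "blue": 1}


def to_hex(color):
    """Convert a Sheets API Color dict (floats in 0..1) to '#rrggbb'.

    Zero-valued channels may be omitted from the JSON, so they default to 0.
    """
    return "#" + "".join(
        f"{round(color.get(ch, 0) * 255):02x}" for ch in ("red", "green", "blue")
    )


def frames_by_colour(data):
    """Group whole rows by the background colour of their first cell."""
    groups = defaultdict(list)
    for grid_data in data["sheets"][0]["data"]:
        for row_data in grid_data.get("rowData", []):
            cells = row_data.get("values", [])
            if not cells:
                continue  # skip rows with no cells at all
            # Assumption: a cell without backgroundColor counts as white
            fmt = cells[0].get("effectiveFormat", {})
            colour = to_hex(fmt.get("backgroundColor", WHITE))
            groups[colour].append([c.get("formattedValue") for c in cells])
    return {colour: pd.DataFrame(rows) for colour, rows in groups.items()}


def cell(value, **rgb):
    """Build one hypothetical CellData dict for the sample below."""
    return {"formattedValue": value, "effectiveFormat": {"backgroundColor": rgb}}


# Hypothetical response, trimmed by the same fields mask as above
sample = {"sheets": [{"data": [{"rowData": [
    {"values": [cell("a", red=1, green=1, blue=1), cell("1", red=1, green=1, blue=1)]},
    {"values": [cell("b", red=1, green=1), cell("2", red=1, green=1)]},  # yellow row
]}]}]}

frames = frames_by_colour(sample)
yellow_rows = frames["#ffff00"]  # DataFrame of the rows whose first cell is yellow
```

Comparing hex strings is an exact match, so if the sheet uses several slightly different shades, rounding each channel or allowing a small tolerance may be needed.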
Update: read multiple columns
I had to modify your answer slightly to get past NameError: name 'row' is not defined:
data = (
    service.spreadsheets()
    .get(
        spreadsheetId=sheet_id,
        ranges=ranges,
        includeGridData=True,  # important
        fields=",".join([  # specify only required fields to reduce response size
            "sheets.data.rowData.values.formattedValue",
            # "sheets.data.rowData.values.effectiveFormat.textFormat.strikethrough",
            "sheets.data.rowData.values.effectiveFormat.backgroundColor",
        ])
    )
    .execute()
)
def parse(data):
    white_color = {"red": 1, "green": 1, "blue": 1}
    for grid_data in data["sheets"][0]["data"]:
        for row_data in grid_data["rowData"]:
            cell_data = row_data["values"][0]
            value = cell_data["formattedValue"]
            color = cell_data["effectiveFormat"]["backgroundColor"]
            if color == white_color:
                yield value

print(pd.DataFrame({"column": list(parse(data))}))
This is almost what I need. This solution only returns the first column; how do I iterate over the remaining columns?
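One way to get every column while keeping the asker's test for uncoloured rows is to apply the white check to the row's first cell and then collect formattedValue from all of that row's cells, as the first snippet's inner loop already does. This is a sketch assuming the first cell decides whether a row is uncoloured and that a missing backgroundColor means white; parse_rows, to_frame and the sample response are hypothetical. Rows in the response may differ in length (for example when trailing cells are empty), so to_frame pads them explicitly before building the DataFrame.

```python
import pandas as pd

WHITE = {"red": 1, "green": 1, "blue": 1}


def parse_rows(data):
    """Yield all columns of each row whose first cell has a white background."""
    for grid_data in data["sheets"][0]["data"]:
        for row_data in grid_data.get("rowData", []):
            cells = row_data.get("values", [])
            if not cells:
                continue  # skip rows with no cells at all
            # Assumption: a missing backgroundColor is treated as white
            colour = cells[0].get("effectiveFormat", {}).get("backgroundColor", WHITE)
            if colour == WHITE:
                # Collect every cell instead of only row_data["values"][0]
                yield [cell.get("formattedValue") for cell in cells]


def to_frame(rows):
    """Pad shorter rows with None so that every row has the same width."""
    width = max((len(r) for r in rows), default=0)
    return pd.DataFrame([r + [None] * (width - len(r)) for r in rows])


# Hypothetical response: two white rows of different lengths and one red row
white_fmt = {"backgroundColor": WHITE}
sample = {"sheets": [{"data": [{"rowData": [
    {"values": [{"formattedValue": "a", "effectiveFormat": white_fmt},
                {"formattedValue": "1", "effectiveFormat": white_fmt}]},
    {"values": [{"formattedValue": "b", "effectiveFormat": {"backgroundColor": {"red": 1}}}]},
    {"values": [{"formattedValue": "c", "effectiveFormat": white_fmt}]},
]}]}]}

df = to_frame(list(parse_rows(sample)))
```

Flipping the test to colour != WHITE would instead return every column of the coloured rows.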