Is there a way to create a DataFrame from specific colour-coded rows in a Google Sheet? (gspread and pandas)

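I have a Google Sheet with a large number of colour-coded rows, and I would like to build a new DataFrame from the coloured rows only. Is it possible to select rows by colour? Alternatively, is there a way to select only the rows that are not colour-coded?

I couldn't find anything on this, so I'm not sure whether it is even feasible.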

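I don't know how to do this with gspread (it may well be impossible), but it is easy with google-api-python-client (a separate package, installable with pip install google-api-python-client, that accepts the same google-auth credentials gspread uses).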
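
The snippets in this answer assume a `service` object already exists. Here is a minimal sketch of creating one, assuming you authenticate with a service-account key file. The `service_account.json` path and the `build_sheets_service` helper name are placeholders of mine, not something from the docs example.

```python
# Read-only access is enough for fetching values and formatting.
SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]


def build_sheets_service(key_path="service_account.json"):
    """Return a Sheets API v4 client authorised with a service-account key.

    key_path is a placeholder; point it at your own key file.
    """
    # Imported inside the function so that this module can be loaded
    # even before the Google client libraries are installed.
    from google.oauth2.service_account import Credentials
    from googleapiclient.discovery import build

    creds = Credentials.from_service_account_file(key_path, scopes=SCOPES)
    return build("sheets", "v4", credentials=creds)
```

With that in place, `service = build_sheets_service()`, `spreadsheet_id` is the long ID from the sheet's URL, and `ranges` is a list of A1-notation ranges such as `["Sheet1!A1:D100"]`. The sheet has to be shared with the service account's email address.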

You need to pass the includeGridData parameter to the spreadsheets().get() method. Here is a slightly modified example from the docs:
data = (
    service.spreadsheets()
    .get(
        spreadsheetId=spreadsheet_id,
        ranges=ranges,
        includeGridData=True,  # important
        fields=",".join([  # specify only required fields to reduce response size
            "sheets.data.rowData.values.formattedValue",
            "sheets.data.rowData.values.effectiveFormat.backgroundColor",
        ])
    )
    .execute()
)

# now parse the returned JSON according to your needs, e.g.:

import pandas as pd

def parse(data):  # data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet
    white_color = {"red": 1, "green": 1, "blue": 1}

    for grid_data in data["sheets"][0]["data"]:
        # grid_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#GridData
        for row_data in grid_data.get("rowData", []):
            # row_data["values"] contains a list of cells (CellData), one per column;
            # the key may be absent for completely empty rows
            # cell_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#CellData
            row = []
            for cell_data in row_data.get("values", []):
                # empty cells may have no "formattedValue" key, so use .get() to avoid a KeyError
                value = cell_data.get("formattedValue")  # or cell_data["userEnteredValue"]
                color = cell_data.get("effectiveFormat", {}).get("backgroundColor", white_color)
                # keep the value only if the cell has a non-white background
                if color != white_color:
                    row.append(value)
                else:
                    row.append(None)
            yield row

pd.DataFrame(list(parse(data)))
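
One caveat when comparing colours: as far as I can tell, the API leaves zero-valued channels out of the JSON, so a pure red background arrives as `{"red": 1}` rather than with explicit green and blue keys. Below is a sketch that normalises a colour dict into an (r, g, b) tuple before comparing. The helper names and the tolerance are my own choices, not part of the API, and the alpha channel is ignored.

```python
import math


def to_rgb(color):
    """Convert a Sheets API Color dict into an (r, g, b) tuple of floats.

    Missing channels are treated as 0.0.
    """
    color = color or {}
    return tuple(float(color.get(ch, 0.0)) for ch in ("red", "green", "blue"))


def same_color(color, target, tol=0.01):
    """True if two Color dicts match within tol on every channel."""
    return all(
        math.isclose(a, b, abs_tol=tol)
        for a, b in zip(to_rgb(color), to_rgb(target))
    )


WHITE = {"red": 1, "green": 1, "blue": 1}
YELLOW = {"red": 1, "green": 1}  # blue is omitted because it is 0

print(same_color({"red": 1, "green": 1, "blue": 1}, WHITE))  # True
print(same_color({"red": 1, "green": 1}, YELLOW))            # True
print(same_color({"red": 1}, WHITE))                         # False
```

In a parser like the one above, the `color != white_color` test could then become `not same_color(color, WHITE)`, and selecting one specific highlight colour is just `same_color(color, YELLOW)`.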

Update: reading multiple columns

I had to modify your answer slightly to get past NameError: name 'row' is not defined

data = (
    service.spreadsheets()
    .get(
        spreadsheetId=sheet_id,
        ranges=ranges,
        includeGridData=True,  # important
        fields=",".join([  # specify only required fields to reduce response size
            "sheets.data.rowData.values.formattedValue",
            #"sheets.data.rowData.values.effectiveFormat.textFormat.strikethrough",
            "sheets.data.rowData.values.effectiveFormat.backgroundColor",
        ])
    )
    .execute()
)

def parse(data):
    white_color = {"red": 1, "green": 1, "blue": 1}
    for grid_data in data["sheets"][0]["data"]:
        for row_data in grid_data["rowData"]:
            cell_data = row_data["values"][0]
            value = cell_data["formattedValue"]
            color = cell_data["effectiveFormat"]["backgroundColor"]
            if color == white_color:
                yield value

print(pd.DataFrame({"column": list(parse(data))}))

This is almost what I need. This solution only returns the first column; how do you iterate over the remaining columns?
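
To return every column rather than just the first, one option is to decide per row (here, by the first cell's background) and then collect the values of all cells in that row. This is a sketch run against a small hand-written stand-in for the API response, so it works without credentials. `parse_rows`, `cell` and `sample` are illustrative names, and judging a row by its first cell is an assumption about how the sheet is coloured.

```python
WHITE = {"red": 1, "green": 1, "blue": 1}


def parse_rows(data, keep_white=True):
    """Yield one list of cell values per row whose first cell matches the filter.

    keep_white=True keeps uncoloured rows; keep_white=False keeps coloured ones.
    """
    for grid_data in data["sheets"][0]["data"]:
        for row_data in grid_data.get("rowData", []):
            cells = row_data.get("values", [])
            if not cells:
                continue  # skip completely empty rows
            first = cells[0]
            color = first.get("effectiveFormat", {}).get("backgroundColor", WHITE)
            if (color == WHITE) == keep_white:
                # iterate over every column, not only the first
                yield [c.get("formattedValue") for c in cells]


def cell(value, color=WHITE):
    """Build a minimal CellData-like dict for the sample below."""
    return {"formattedValue": value, "effectiveFormat": {"backgroundColor": color}}


yellow = {"red": 1, "green": 1}
sample = {"sheets": [{"data": [{"rowData": [
    {"values": [cell("a"), cell("1")]},
    {"values": [cell("b", yellow), cell("2", yellow)]},
    {"values": [cell("c"), cell("3")]},
]}]}]}

rows = list(parse_rows(sample))
print(rows)  # [['a', '1'], ['c', '3']]
```

With a real response, `pd.DataFrame(list(parse_rows(data)))` gives one DataFrame column per sheet column, and `keep_white=False` flips the filter to keep only the coloured rows.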