Are there any other ways to connect to Google Sheets in Airflow?

I'm trying to connect to a Google Sheets worksheet from Airflow using a PythonOperator, like this:

import pandas as pd
import pygsheets
from google.oauth2 import service_account
from airflow.operators.python import PythonOperator

def establish_conn_to_gs():

    creds = service_account.Credentials.from_service_account_file(
        'service_account_json_file',
        scopes=('google_api_spreadsheets_auth_link', 'google_api_gdrive_auth_link'),
        subject='client_mail'
    )

    pg = pygsheets.authorize(custom_credentials=creds)
    return pg

def get_data_from_spreadsheet(spreadsheet_link, worksheet_title):

    pg = establish_conn_to_gs()
    # use the function arguments, not string literals with the same names
    doc = pg.open_by_url(spreadsheet_link)
    data = doc.worksheet_by_title(worksheet_title).get_all_values(include_tailing_empty_rows=False)
    return data

get_data_from_gs = PythonOperator(
    task_id = 'get_data_from_gs',
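    # pass the function itself; calling it here would run it when the DAG file is parsed
    python_callable = get_data_from_spreadsheet,
    op_kwargs = {'spreadsheet_link': link, 'worksheet_title': title}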
)

This works fine, but are there any alternative ways to do it? I found the Google Sheets operator, but its current documentation isn't very good :(

Thanks for any help!

Airflow has GSheetsHook, which interacts with Google Sheets via a Google Cloud connection (if you don't have a connection defined yet, follow the Airflow documentation for the Google Cloud connection).

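To get data from a Google Sheet, just use the hook. There is no need to implement it yourself; if the hook doesn't do exactly what you need, you can subclass it and extend it.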

To read values, you can use:

get_values: gets the values of a Google Sheet from a single range (API)

batch_get_values: gets the values of a Google Sheet from a list of ranges (API)
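Both methods take ranges in A1 notation, such as 'Sheet1'!A1:D10. If your ranges come from row and column numbers, a small helper can build them. The function names below (column_letter, a1_range) are hypothetical and not part of the Google provider; this is a sketch that always quotes the sheet title, which A1 notation allows.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to A1 letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Shift to 0-based so that 26 maps to 'Z' rather than rolling over
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(sheet_title: str, first_row: int, first_col: int,
             last_row: int, last_col: int) -> str:
    """Build a quoted A1 range such as 'Sheet1'!A1:D10 (hypothetical helper)."""
    # Single quotes inside a sheet title are escaped by doubling them
    escaped = sheet_title.replace("'", "''")
    start = f"{column_letter(first_col)}{first_row}"
    end = f"{column_letter(last_col)}{last_row}"
    return f"'{escaped}'!{start}:{end}"
```

The result can then be passed as the range_ argument of get_values, or as one element of the ranges list for batch_get_values.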

Example:

from airflow.providers.google.suite.hooks.sheets import GSheetsHook
from airflow.operators.python import PythonOperator

def get_data_from_spreadsheet():
    hook = GSheetsHook(
        gcp_conn_id="google_conn_id",
    )
    # get_values takes the spreadsheet ID (from the sheet's URL) and an A1 range
    values = hook.get_values(spreadsheet_id="your-spreadsheet-id", range_="Sheet1!A1:D10")
    # values is a list of rows; each row is a list of cell values.
    # Add the rest of your code here.


get_data_from_gs = PythonOperator(
    task_id="get_data_from_gs",
    # pass the function itself, not the result of calling it
    python_callable=get_data_from_spreadsheet,
)
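get_values returns a list of rows, and the Sheets API leaves out trailing empty cells, so some rows can be shorter than the header row. If the first row holds column names, a helper like the hypothetical rows_to_records below (a sketch, not a GSheetsHook method) turns the result into dictionaries.

```python
from typing import Any, Dict, List, Sequence


def rows_to_records(rows: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn a header row plus data rows into a list of dicts (hypothetical helper).

    Short rows are padded with empty strings; cells beyond the header are dropped.
    """
    if not rows:
        return []
    header = [str(name) for name in rows[0]]
    records = []
    for row in rows[1:]:
        # Pad rows whose trailing empty cells were omitted by the API
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records
```

If you want to keep using pandas as in the question, pd.DataFrame(rows_to_records(values)) builds a DataFrame from the result.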