BigQuery: Get schema for query without actually running it

Is there a way to get the schema of a BigQuery query without actually running it? (I tried a dry run, but it returns only statistics, not the actual schema.)

Assuming you can use the API, you will need to call the Tables: get method to retrieve a table's schema.

For the natality table in the samples dataset of the publicdata project, the request is

   GET https://www.googleapis.com/bigquery/v2/projects/publicdata/datasets/samples/tables/natality?key={YOUR_API_KEY}

The corresponding response will be

{ 
 "kind": "bigquery#table",
 "etag": "\"nwg3tKAm7RiC5vqWthFIuCNSGxs/MTQ0MDYyNTMzMDYwNA\"",
 "id": "publicdata:samples.natality",
 "selfLink": "https://www.googleapis.com/bigquery/v2/projects/publicdata/datasets/samples/tables/natality",
 "tableReference": {
  "projectId": "publicdata",
  "datasetId": "samples",
  "tableId": "natality"
 },
 "description": "This table describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008. The Centers for Disease Control (CDC) and Prevention's National Center for Health Statistics (NCHS) receives this data as electronic files, prepared from individual records processed by each registration area, through the Vital Statistics Cooperative Program. \n\nYou can access the CDC's data at: http://www.cdc.gov/nchs/data_access/Vitalstatsonline.htm",
 "schema": {
  "fields": [
   {
    "name": "source_year",
    "type": "INTEGER",
    "mode": "REQUIRED",
    "description": "Four-digit year of the birth. Example: 1975."
   },
   {
    "name": "year",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Four-digit year of the birth. Example: 1975."
   },
   {
    "name": "month",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Month index of the date of birth, where 1=January."
   },
   {
    "name": "day",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Day of birth, starting from 1."
   },
   {
    "name": "wday",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Day of the week, where 1 is Sunday and 7 is Saturday."
   },
   {
    "name": "state",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "The two character postal code for the state. Entries after 2004 do not include this value."
   },
   {
    "name": "is_male",
    "type": "BOOLEAN",
    "mode": "REQUIRED",
    "description": "TRUE if the child is male, FALSE if female."
   },
   {
    "name": "child_race",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "The race of the child. One of the following numbers:\n\n1 - White\n2 - Black\n3 - American Indian\n4 - Chinese\n5 - Japanese\n6 - Hawaiian\n7 - Filipino\n9 - Unknown/Other\n18 - Asian Indian\n28 - Korean\n39 - Samoan\n48 - Vietnamese"
   },
   {
    "name": "weight_pounds",
    "type": "FLOAT",
    "mode": "NULLABLE",
    "description": "Weight of the child, in pounds."
   },
   {
    "name": "plurality",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "How many children were born as a result of this pregnancy. twins=2, triplets=3, and so on."
   },
   {
    "name": "apgar_1min",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 1 minute. Available from 1978-2002."
   },
   {
    "name": "apgar_5min",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 5 minutes. Available from 1978-2002."
   },
   {
    "name": "mother_residence_state",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "The two-letter postal code of the mother's state of residence when the child was born."
   },
   {
    "name": "mother_race",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Race of the mother. Same values as child_race."
   },
   {
    "name": "mother_age",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Reported age of the mother when giving birth."
   },
   {
    "name": "gestation_weeks",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "The number of weeks of the pregnancy."
   },
   {
    "name": "lmp",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "Date of the last menstrual period in the format MMDDYYYY. Unknown values are recorded as \"99\" or \"9999\"."
   },
   {
    "name": "mother_married",
    "type": "BOOLEAN",
    "mode": "NULLABLE",
    "description": "True if the mother was married when she gave birth."
   },
   {
    "name": "mother_birth_state",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "The two-letter postal code of the mother's birth state."
   },
   {
    "name": "cigarette_use",
    "type": "BOOLEAN",
    "mode": "NULLABLE",
    "description": "True if the mother smoked cigarettes. Available starting 2003."
   },
   {
    "name": "cigarettes_per_day",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of cigarettes smoked by the mother per day. Available starting 2003."
   },
   {
    "name": "alcohol_use",
    "type": "BOOLEAN",
    "mode": "NULLABLE",
    "description": "True if the mother used alcohol. Available starting 1989."
   },
   {
    "name": "drinks_per_week",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of drinks per week consumed by the mother. Available starting 1989."
   },
   {
    "name": "weight_gain_pounds",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of pounds gained by the mother during pregnancy."
   },
   {
    "name": "born_alive_alive",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of children previously born to the mother who are now living."
   },
   {
    "name": "born_alive_dead",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of children previously born to the mother who are now dead."
   },
   {
    "name": "born_dead",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Number of children who were born dead (i.e. miscarriages)"
   },
   {
    "name": "ever_born",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Total number of children to whom the woman has ever given birth (includes the current birth)."
   },
   {
    "name": "father_race",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Race of the father. Same values as child_race."
   },
   {
    "name": "father_age",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Age of the father when the child was born."
   },
   {
    "name": "record_weight",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "1 or 2, where 1 is a row from a full-reporting area, and 2 is a row from a 50% sample area."
   }
  ]
 },
 "numBytes": "23562717384",
 "numRows": "137826763",
 "creationTime": "1335916045005",
 "lastModifiedTime": "1440625330604",
 "type": "TABLE",
 "location": "US"
}
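
As a minimal sketch of how the schema could be pulled out of a response like the one above, the helper below walks `schema.fields` and returns one tuple per column. The function name `schema_fields` and the trimmed response body are illustrative, not part of the API; nested RECORD fields are not handled.

```python
import json


def schema_fields(table_resource):
    """Return (name, type, mode) tuples from a Tables: get response body."""
    fields = table_resource.get("schema", {}).get("fields", [])
    # "mode" is optional in the API and defaults to NULLABLE when absent.
    return [(f["name"], f["type"], f.get("mode", "NULLABLE")) for f in fields]


# Trimmed copy of the response above, keeping only the first two fields.
response_body = """
{
  "kind": "bigquery#table",
  "id": "publicdata:samples.natality",
  "schema": {
    "fields": [
      {"name": "source_year", "type": "INTEGER", "mode": "REQUIRED"},
      {"name": "year", "type": "INTEGER", "mode": "NULLABLE"}
    ]
  }
}
"""

for name, field_type, mode in schema_fields(json.loads(response_body)):
    print(f"{name}: {field_type} ({mode})")
```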

If the command line is more convenient, you can run the bq command with the following arguments to get the table's schema:

bq show publicdata:samples.natality

The output looks like this:

Table publicdata:samples.natality

   Last modified                  Schema                 Total Rows   Total Bytes   Expiration
 ----------------- ------------------------------------ ------------ ------------- ------------
  27 Aug 00:42:10   |- source_year: integer (required)   137826763    23562717384
                    |- year: integer
                    |- month: integer
                    |- day: integer
                    |- wday: integer
                    |- state: string
                    |- is_male: boolean (required)
                    |- child_race: integer
                    |- weight_pounds: float
                    |- plurality: integer
                    |- apgar_1min: integer
                    |- apgar_5min: integer
                    |- mother_residence_state: string
                    |- mother_race: integer
                    |- mother_age: integer
                    |- gestation_weeks: integer
                    |- lmp: string
                    |- mother_married: boolean
                    |- mother_birth_state: string
                    |- cigarette_use: boolean
                    |- cigarettes_per_day: integer
                    |- alcohol_use: boolean
                    |- drinks_per_week: integer
                    |- weight_gain_pounds: integer
                    |- born_alive_alive: integer
                    |- born_alive_dead: integer
                    |- born_dead: integer
                    |- ever_born: integer
                    |- father_race: integer
                    |- father_age: integer
                    |- record_weight: integer

There is no great way to get the schema without running the query. However, here is a hacky workaround.

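You can create a view from the query you want to check. The view will then have the schema that running the query would produce. You can delete the view once you are done.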
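
The view workaround could be sketched roughly as follows. This is one possible way to do it, not the answer's own code: it creates the view with a `CREATE VIEW` statement, and it assumes `client` behaves like a recent `google.cloud.bigquery.Client` whose `get_table` and `delete_table` accept a `"project.dataset.view_name"` string. The function name and `view_id` are hypothetical.

```python
def schema_via_temporary_view(client, view_id, sql):
    """Create a view over `sql`, read its schema, then delete the view.

    `client` is expected to behave like google.cloud.bigquery.Client;
    `view_id` is a hypothetical "project.dataset.view_name" string.
    """
    # Creating the view stores its definition; per the answer above,
    # the underlying query is not executed.
    client.query(f"CREATE VIEW `{view_id}` AS {sql}").result()
    try:
        table = client.get_table(view_id)
        # Each schema entry is expected to expose name, field_type and mode.
        return [(f.name, f.field_type, f.mode) for f in table.schema]
    finally:
        # Always clean up the view, even if reading the schema failed.
        client.delete_table(view_id)
```

With real credentials this would be called as `schema_via_temporary_view(bigquery.Client(), "my-project.my_dataset.tmp_schema_view", sql)`, where the view ID is a placeholder for a dataset you can write to.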

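In both normal runs and dry runs, the schema is included in the query response body when the request succeeds [ref], which is probably how views get their schema without running the query.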
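
To make the shape of that response concrete, here is a small sketch at the REST level. It builds a request body for the jobs.insert method with `configuration.dryRun` set, and reads `statistics.query.schema` from a job resource. The helper names and the sample job fragment are hand-written for illustration and are not real API output.

```python
def dry_run_job_body(sql):
    """Build a jobs.insert request body that asks for a dry run of `sql`."""
    return {
        "configuration": {
            "dryRun": True,
            "query": {"query": sql, "useLegacySql": False},
        }
    }


def schema_from_job(job_resource):
    """Return the result schema fields from a job resource, or None if absent."""
    query_stats = job_resource.get("statistics", {}).get("query", {})
    schema = query_stats.get("schema")
    return schema["fields"] if schema else None


# Hand-written, illustrative response fragment (not real API output).
example_job = {
    "statistics": {
        "query": {
            "schema": {
                "fields": [{"name": "name", "type": "STRING", "mode": "NULLABLE"}]
            }
        }
    }
}

print(dry_run_job_body("SELECT 1"))
print(schema_from_job(example_job))
```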

However, if you want to use BigQuery's Python library, you have to reach into the QueryJob class's "internal" attributes and methods to retrieve it, because no "public" accessor is provided...

from google.cloud import bigquery
# bigquery.__version__ == '1.9.0'

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True)

query_job = client.query(
    query="SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013`",
    job_config=job_config,
)

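# Solution 1: read the schema from the private _properties dict
schema = query_job._properties['statistics']['query']['schema']

# Solution 2: go through the private _job_statistics() helper
job_stats = query_job._job_statistics()
schema = job_stats['schema']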
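
Because both solutions rely on private members that may change between releases, a defensive accessor might look like the sketch below. It assumes, without the answer confirming it, that newer releases of the library may expose a public `schema` property on the job. If that property is missing, it falls back to the same private path as Solution 1. The function name `dry_run_schema` is hypothetical.

```python
def dry_run_schema(query_job):
    """Return the dry-run result schema from a QueryJob-like object, or None."""
    # Assumption: newer library releases may expose a public `schema` property.
    public = getattr(query_job, "schema", None)
    if public is not None:
        return public
    # Fallback: the same private path as Solution 1 (returns the raw dict).
    props = getattr(query_job, "_properties", {})
    return props.get("statistics", {}).get("query", {}).get("schema")
```

Note that the two branches may return different types (library objects from the public property, a plain dict from the private path), so callers should normalize the result before using it.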

It took me a while to figure this out. Hope this helps!