How to convert pandas float64 type to BigQuery NUMERIC type?
I have a pandas DataFrame df:
<bound method NDFrame.head of DAT_RUN DAT_FORECAST LIB_SOURCE MES_LONGITUDE MES_LATITUDE MES_TEMPERATURE MES_HUMIDITE MES_PLUIE MES_VITESSE_VENT MES_U_WIND MES_V_WIND
0 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 3.75 11.994824 72.0 0.0 2.653137 -2.402910 -1.124792
1 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 4.00 13.094824 74.3 0.0 2.976434 -2.972910 -0.144792
2 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 4.25 12.594824 75.3 0.0 3.128418 -2.702910 1.575208
3 2022-03-29T00:00:00Z 2022-03-29T01:00:00Z gfs_025 43.50 4.50 12.094824 75.5 0.0 3.183418 -2.342910 2.155208
I convert the DAT_RUN and DAT_FORECAST columns to datetime format:
df["DAT_RUN"] = pd.to_datetime(df['DAT_RUN'], format="%Y-%m-%dT%H:%M:%SZ") # previously "%Y-%m-%d %H:%M:%S"
df["DAT_FORECAST"] = pd.to_datetime(df['DAT_FORECAST'], format="%Y-%m-%dT%H:%M:%SZ")
df.dtypes:
DAT_RUN datetime64[ns]
DAT_FORECAST datetime64[ns]
LIB_SOURCE object
MES_LONGITUDE float64
MES_LATITUDE float64
MES_TEMPERATURE float64
MES_HUMIDITE float64
MES_PLUIE float64
MES_VITESSE_VENT float64
MES_U_WIND float64
MES_V_WIND float64
I use bigquery.Client().load_table_from_dataframe() to insert the data into a BigQuery table whose numeric columns have the NUMERIC type.
It returns this error:
pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)
I tried to fix it with:
df["MES_LONGITUDE"] = df["MES_LONGITUDE"].astype(str).map(decimal.Decimal)
But that did not help.
Thanks.
I managed to solve this with decimal.Context; I hope it helps:
import decimal
import numpy as np
import pandas as pd
from google.cloud import bigquery
df = pd.DataFrame(
data={
"MES_HUMIDITE": np.array([2.653137, 2.976434, 3.128418, 3.183418]),
"MES_PLUIE": np.array([-2.402910, -2.972910, -2.702910, -2.342910]),
},
dtype="float",
)
Let's check the data types:
df.dtypes
# MES_HUMIDITE float64
# MES_PLUIE float64
# dtype: object
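Initialize the Context with a precision of 7, since that is the number of significant digits in those columns; if you need a different precision per column, you can create multiple Context objects: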
context = decimal.Context(prec=7)
df["MES_HUMIDITE"] = df["MES_HUMIDITE"].apply(context.create_decimal_from_float)
df["MES_PLUIE"] = df["MES_PLUIE"].apply(context.create_decimal_from_float)
Now every item is a Decimal object:
df["MES_HUMIDITE"][0]
# Decimal('2.653137')
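The choice of prec matters because it counts significant digits, not decimal places. The sketch below uses only the standard library and one MES_TEMPERATURE value from the question to illustrate this; the variable names are just for the example:

```python
import decimal

value = 11.994824  # a MES_TEMPERATURE reading from the question

# Decimal(float) keeps the exact binary expansion of the float, which has far
# more digits than the number that was typed in.
exact = decimal.Decimal(value)

# prec counts significant digits, so 7 drops the last digit of this value...
seven = decimal.Context(prec=7).create_decimal_from_float(value)
# ...while 8 keeps every digit shown in the data frame.
eight = decimal.Context(prec=8).create_decimal_from_float(value)

print(seven)  # 11.99482
print(eight)  # 11.994824
```

So a column such as MES_TEMPERATURE would probably need its own Context with a larger prec than the two columns converted above.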
The dtype has changed: pandas stores the Decimal values as object, I guess because Decimal is not a native pandas data type:
df.dtypes
# MES_HUMIDITE object
# MES_PLUIE object
# dtype: object
table_id = "test_dataset.test"
job_config = bigquery.LoadJobConfig(
schema=[
bigquery.SchemaField("MES_HUMIDITE", "NUMERIC"),
bigquery.SchemaField("MES_PLUIE", "NUMERIC"),
],
write_disposition="WRITE_TRUNCATE",
)
client = bigquery.Client.from_service_account_json("/path_to_key.json")
job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
job.result()
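The question's frame has eight float64 columns, so converting them one by one gets repetitive. Here is a minimal sketch of a helper that converts every float64 column in one pass, with one Context per column. The function name floats_to_decimal, its parameters, and the default precision of 9 are assumptions made for this example, not part of pandas or the BigQuery client, and it assumes the columns contain no NaN values:

```python
import decimal

import pandas as pd


def floats_to_decimal(df, prec_by_column, default_prec=9):
    """Return a copy of df with every float64 column converted to Decimal.

    prec_by_column maps a column name to its number of significant digits;
    columns that are not listed fall back to default_prec (hypothetical default).
    """
    out = df.copy()
    for column in out.select_dtypes(include="float64").columns:
        # One Context per column, so each column can keep its own precision.
        context = decimal.Context(prec=prec_by_column.get(column, default_prec))
        out[column] = out[column].apply(context.create_decimal_from_float)
    return out


sample = pd.DataFrame(
    {
        "LIB_SOURCE": ["gfs_025", "gfs_025"],
        "MES_TEMPERATURE": [11.994824, 13.094824],
        "MES_U_WIND": [-2.402910, -2.972910],
    }
)
converted = floats_to_decimal(sample, {"MES_TEMPERATURE": 8, "MES_U_WIND": 7})
print(converted.dtypes)
```

Non-float columns such as LIB_SOURCE are left untouched, and the input frame is not modified, so the result can be passed to load_table_from_dataframe in the same way as df above.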
However, decimal types are generally recommended for financial calculations and, although I do not know your exact case and usage, you are probably safe using FLOAT64, at least for latitude and longitude.