How to read and output numeric values properly in BigQuery?

I am trying to read the following rows from a CSV file stored in GCS:

headers: "A","B","C","D"

Row 1: "4000,0000000000000","15400000,000","12311918,400000","3088081,600"

Row 2: "5000,0000000000000","19250000,000","15389898,000000","3860102,000"

The problem is how BigQuery actually interprets and outputs these numbers:

Results query number 1

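It interprets A as FLOAT64 and B, C, and D as INT64, which is fine, since I chose to use schema auto-detection. But when I try to cast the columns to different types, it still outputs the numbers incorrectly.

This is the query: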

SELECT
  CAST(quantity AS INT64) AS A,
  CAST(expenses_2 AS FLOAT64) AS B,
  CAST(cexpenses_3 AS FLOAT64) AS C,
  CAST(expenses_4 AS FLOAT64) AS D
FROM
  `wide-gecko-289100.bqtest.expenses`

And these are the results of the query:

Result query number 2

Either way, it misreads the numbers. They should be as follows:

Row 1: [4000] [15400000] [12311918,4] [3088081,6]

Row 2: [5000] [19250000] [15389898] [3860102]

Is there a way to fix this?

I think this is an issue with how BigQuery interprets the comma. It seems to detect it as a thousands separator rather than a decimal point.

https://issuetracker.google.com/issues/129992574

Is it possible to use "." as the decimal separator instead?

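This happens because BigQuery does not understand the localized format you are using for numeric values. It expects the period (.) character as the decimal separator.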

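If you can't handle this earlier, in the process that produces the CSV files, another strategy is to use a string type for the columns instead and then do some manipulation in BigQuery.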
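As a rough sketch of the "load as strings" step, one option is a LOAD DATA statement with an explicit all-STRING schema instead of schema auto-detection. The table name `bqtest.expenses_raw` and the bucket path below are placeholders I made up for illustration, not names from the question:

```sql
-- Hypothetical target table and GCS path; replace with your own.
LOAD DATA INTO bqtest.expenses_raw (
  A STRING,
  B STRING,
  C STRING,
  D STRING
)
FROM FILES (
  format = 'CSV',
  uris = ['gs://your-bucket/path/expenses.csv'],
  skip_leading_rows = 1  -- skip the "A","B","C","D" header row
);
```

With every column declared as STRING, the values arrive untouched (for example "12311918,400000"), so the decimal comma can be handled explicitly in a query afterwards.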

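Here is a simple conversion example that shows some string manipulation and casting to get the desired types. If your localized format uses both commas and periods, you will need more complex string manipulation.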

WITH
sample_row AS (
  SELECT "4000,0000000000000" as A, "15400000,000" as B,"12311918,400000" as C,"3088081,600" as D
)

SELECT
  A,
  CAST(REPLACE(A,",",".") AS FLOAT64) as A_as_float64,
  CAST(CAST(REPLACE(A,",",".") AS FLOAT64) AS INT64) as A_as_int64
FROM
  sample_row
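For the case mentioned above where the localized format uses both characters, a minimal sketch could look like the following. It assumes the period is the thousands separator and the comma is the decimal separator; the sample value is invented for illustration and does not come from the question's file:

```sql
WITH
sample_row AS (
  -- Hypothetical value: "." as thousands separator, "," as decimal separator
  SELECT "12.311.918,40" AS C
)

SELECT
  C,
  -- Drop the thousands separators first, then turn the decimal comma into a period
  CAST(REPLACE(REPLACE(C, ".", ""), ",", ".") AS FLOAT64) AS C_as_float64
FROM
  sample_row
```

The order of the two REPLACE calls matters: swapping the comma for a period first would make the decimal point indistinguishable from the thousands separators.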

You can also generalize this into a user-defined function (temporary or persistent) to make it easier to reuse:

CREATE TEMPORARY FUNCTION parseAsFloat(instr STRING) AS (CAST(REPLACE(instr,",",".") AS FLOAT64));

WITH
sample_row AS (
  SELECT "4000,0000000000000" as A, "15400000,000" as B,"12311918,400000" as C,"3088081,600" as D
)

SELECT
  CAST(parseAsFloat(A) AS INT64) as A,
  parseAsFloat(B) as B,
  parseAsFloat(C) as C,
  parseAsFloat(D) as D,
FROM
  sample_row
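If you would rather have the persistent variant, a sketch along these lines should work. Here `mydataset` is a placeholder dataset name that I am assuming exists; the function body is the same as in the temporary version:

```sql
-- `mydataset` is a placeholder; the function is stored in that dataset
-- and can be called from later queries without being redefined.
CREATE OR REPLACE FUNCTION mydataset.parseAsFloat(instr STRING) AS (
  CAST(REPLACE(instr, ",", ".") AS FLOAT64)
);

SELECT
  mydataset.parseAsFloat("3088081,600") AS D
```

If some rows might contain values that are not numeric at all, swapping CAST for SAFE_CAST inside the function would return NULL for those rows instead of failing the whole query.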