Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 4 in Azure Synapse

I have a Spotify CSV file in my Azure Data Lake. I am trying to create an external table over it in an Azure Synapse serverless SQL pool.

I get the following error message:

Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 4 (Track_popularity) in data file https://test.dfs.core.windows.net/data/folder/updated.csv.

I am using the script below:


IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseDelimitedTextFormat') 
    CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat] 
    WITH ( FORMAT_TYPE = DELIMITEDTEXT ,
           FORMAT_OPTIONS (
             FIELD_TERMINATOR = ',',
             USE_TYPE_DEFAULT = FALSE
            ))
GO

IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'data_test_dfs_core_windows_net') 
    CREATE EXTERNAL DATA SOURCE [data_test_dfs_core_windows_net] 
    WITH (
        LOCATION = 'abfss://data@test.dfs.core.windows.net' 
    )
GO

CREATE EXTERNAL TABLE updated (
    [Artist] nvarchar(4000),
    [Track] nvarchar(4000),
    [Track_id] nvarchar(4000),
    [Track_popularity] bigint,
    [Artist_id] nvarchar(4000),
    [Artist_Popularity] bigint,
    [Genres] nvarchar(4000),
    [Followers] bigint,
    [danceability] float,
    [energy] float,
    [key] bigint,
    [loudness] float,
    [mode] bigint,
    [speechiness] float,
    [acousticness] float,
    [instrumentalness] float,
    [liveness] float,
    [valence] float,
    [tempo] float,
    [duration_ms] bigint,
    [time_signature] bigint
    )
    WITH (
    LOCATION = 'data/updated.csv',
    DATA_SOURCE = [data_test_dfs_core_windows_net],
    FILE_FORMAT = [SynapseDelimitedTextFormat]
    )
GO


SELECT TOP 100 * FROM dbo.updated
GO

Here is a sample of the data:

My CSV is UTF-8 encoded, so I am not sure what the problem is. The error points to column 4 (Track_popularity). Please advise.

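My guess is that the file has a header row that should be skipped. With the current format, row 1 is read as data, so the header text in column 4 cannot be converted to bigint. Drop the external table, then drop and re-create the external file format as follows: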


    CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat] 
    WITH ( FORMAT_TYPE = DELIMITEDTEXT ,
           FORMAT_OPTIONS (
             FIELD_TERMINATOR = ',',
             USE_TYPE_DEFAULT = FALSE,
             FIRST_ROW = 2
            ))
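
The drop steps mentioned above could look like the sketch below. It assumes the table lives in the dbo schema (as the SELECT in the question suggests) and that no other external table still uses this file format. The table is dropped first because a file format that is still referenced by an external table generally cannot be dropped.

```sql
-- Assumption: the external table is dbo.updated, as queried in the question.
-- Drop the table first, since it references the file format.
IF EXISTS (SELECT * FROM sys.external_tables WHERE name = 'updated')
    DROP EXTERNAL TABLE dbo.updated;
GO

-- Then drop the old file format so it can be re-created with FIRST_ROW = 2.
IF EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseDelimitedTextFormat')
    DROP EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat];
GO
```

After these two statements, run the CREATE EXTERNAL FILE FORMAT statement with FIRST_ROW = 2, then re-run the CREATE EXTERNAL TABLE script from the question unchanged. Dropping an external table only removes its metadata; the CSV file in the data lake is not touched.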