Python in SQL Server: NULL values in INT columns get mapped to -2147483648 rather than None

Question

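tl;dr

I am using Python in SQL Server 2017. The Python code is wrapped in a stored procedure to which I pass a query. The query gets evaluated and the data is passed to Python. If a string column (char, nchar, varchar, nvarchar) in the query contains NULL, it is mapped to None in Python. But if an int column contains NULL, it is mapped to -2147483648 (the minimum integer value, I guess).

My question is: how can I get NULL values from an int column to arrive as None in Python instead of -2147483648? The column needs to remain an int.

Reproducible example

The test data I am using: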

CREATE TABLE [dbo].[test_table](
    [a-string] [nvarchar](50) NULL,
    [a-date] [date] NULL,
    [a-int] [int] NULL,
    [a-null-int] [int] NULL,
    [a-null-str] [nvarchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[test_table] ([a-string], [a-date], [a-int], [a-null-int], [a-null-str]) VALUES (N'asdf', CAST(N'2018-04-11' AS Date), 1, NULL, NULL)
INSERT [dbo].[test_table] ([a-string], [a-date], [a-int], [a-null-int], [a-null-str]) VALUES (N'fdsa', CAST(N'2008-04-11' AS Date), 2, NULL, NULL)
INSERT [dbo].[test_table] ([a-string], [a-date], [a-int], [a-null-int], [a-null-str]) VALUES (N'Bob "Bla" Bob', CAST(N'2028-04-11' AS Date), 3, NULL, NULL)
INSERT [dbo].[test_table] ([a-string], [a-date], [a-int], [a-null-int], [a-null-str]) VALUES (N'Bob, Bob', CAST(N'2038-04-11' AS Date), 4, NULL, NULL)
INSERT [dbo].[test_table] ([a-string], [a-date], [a-int], [a-null-int], [a-null-str]) VALUES (N'Bob bob', CAST(N'1998-04-11' AS Date), 5, 1, NULL)

The last two columns contain some NULL values. The first is of type int, the second is of type nvarchar.

The code of the stored procedure:

CREATE PROCEDURE [dbo].[usp_test]
    @query NVARCHAR(max)
AS
BEGIN
EXEC sp_execute_external_script 
@language = N'Python', 
@script = N'
print(InputDataSet)
',
@input_data_1 = @query
END;

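The stored procedure takes one parameter, a query, and passes the query result to the Python code. In the Python code I simply print the data.

How I execute the stored procedure: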

EXEC [dbo].[usp_test] N'SELECT [a-string],CAST([a-date] as nvarchar(20)) as [a-date],[a-int],[a-null-int],[a-null-str] FROM [dbo].[test_table]'

The result is:

        a-string      a-date  a-int  a-null-int a-null-str
0           asdf  2018-04-11      1 -2147483648       None
1           fdsa  2008-04-11      2 -2147483648       None
2  Bob "Bla" Bob  2028-04-11      3 -2147483648       None
3       Bob, Bob  2038-04-11      4 -2147483648       None
4        Bob bob  1998-04-11      5           1       None

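The unexpected behavior is in the a-null-int column. How can I make it None rather than -2147483648 while keeping it an int?

This problem is closely tied to SQL Server. According to this documentation from Microsoft, either BxlServer or SQL Satellite (I am not sure which) handles the data transfer between SQL Server and Python. I expect the problem lies in one of these services, but I have no idea how to circumvent it.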

Research done:

Microsoft tutorial for Python in SQL Server: docs.microsoft.com
Architecture of SQL Server 2017 running Python: https://docs.microsoft.com/en-us/sql/advanced-analytics/python/new-components-in-sql-server-to-support-python-integration?view=sql-server-2017
Pandas support for integer NA: http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

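Edit 1: Is this question a duplicate of the question "How to store empty value as an Integerfield?"

IMO, no. The problem there seems to be a mismatch between data types (str vs. int). That is not the case here. If I check the data type, I get: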

print(type(InputDataSet.ix[0,"a-null-int"]))
>>> <class 'numpy.int32'>

This is correct. I am passing an int column and it gets mapped to a Python int. But what I need is None.
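To see why the NULL has to turn into some integer, it may help to look at the raw 32-bit representation. The following is only a sketch using the standard library (not numpy or revoscalepy), and it assumes the column arrives as a signed 32-bit integer, as the numpy.int32 output above suggests. Every 32-bit pattern already denotes a valid integer, so there is no spare pattern left to mean "missing", and the smallest value appears to be used as a stand-in.

```python
import struct

# Smallest value a signed 32-bit integer can hold.
INT32_MIN = -(2 ** 31)
print(INT32_MIN)  # -2147483648

# Its little-endian byte pattern: only the sign bit is set.
print(struct.pack("<i", INT32_MIN).hex())  # 00000080

# One step lower no longer fits into 32 bits.
try:
    struct.pack("<i", INT32_MIN - 1)
except struct.error as exc:
    print("out of range:", exc)
```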

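Edit 2: In reply to @arun-gurunathan's answer:

Before I start, I need to state that the [a-null-int] column needs to remain of integer type. For context, I need to export the data to CSV. To illustrate my issue, I changed the value in the last row of the [a-null-int] column from NULL to 1 and updated the beginning of the question accordingly.

Using RxMissingValues.int32() I get the value that is used to replace NULL values, which is -2147483648. I can replace these values with numpy.NaN. This is not a bulletproof fix: what happens if the column in SQL Server legitimately contains this value? Nevertheless, I continued down this path...

I put the code below in the stored procedure shown above: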

import numpy
from revoscalepy import RxMissingValues
InputDataSet.loc[InputDataSet["a-null-int"] == RxMissingValues.int32(), ("a-null-int")] = numpy.NaN
print(InputDataSet)

This is what I get (abbreviated):

   a-null-int
0         NaN
1         NaN
2         NaN
3         NaN
4         1.0

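The [a-null-int] column gets converted to float. This behavior is documented in the pandas doc and has been discussed on Stack Overflow.

Because of NumPy's limitations in handling NA values, I expect that my problem cannot be solved. I will wait a while longer to see whether more answers turn up on how to keep the type of column a-null-int as int, or some workaround. Otherwise I will accept @arun-gurunathan's answer.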
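Since the stated goal is a CSV export, one possible workaround is to sidestep the column dtype and handle the sentinel at write time. The sketch below is an assumption about how that could look, not something taken from the answers: it uses the standard csv module, a hypothetical helper named write_csv_with_nulls, and a hard-coded sentinel (inside SQL Server, RxMissingValues.int32() would supply it). Integers are written without a decimal point and the sentinel becomes an empty field. It shares the weakness already noted: a genuine -2147483648 would be blanked as well.

```python
import csv
import io

# Assumption: the value SQL Server substitutes for NULL in int columns.
INT32_SENTINEL = -2147483648


def write_csv_with_nulls(rows, header, int_columns, out):
    """Write rows to `out`, turning the sentinel in `int_columns` into empty fields."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    positions = [header.index(name) for name in int_columns]
    for row in rows:
        fields = list(row)
        for pos in positions:
            if fields[pos] == INT32_SENTINEL:
                fields[pos] = ""  # empty field instead of NaN or 1.0
            else:
                fields[pos] = int(fields[pos])  # also normalizes numpy integers
        writer.writerow(fields)


# Stand-in for the last two rows of the query result.
rows = [("Bob, Bob", 4, INT32_SENTINEL), ("Bob bob", 5, 1)]
buffer = io.StringIO()
write_csv_with_nulls(rows, ["a-string", "a-int", "a-null-int"], ["a-null-int"], buffer)
print(buffer.getvalue())
```

Inside the stored procedure, the rows could presumably come from InputDataSet.itertuples(index=False) and the header from list(InputDataSet.columns).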

Answer 1

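The RxMissingValues document describes the pandas/numpy limitation regarding storing None values in integer columns. You can handle these by checking for the missing value (RxMissingValues.int32()) as described in the document.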
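A check against the sentinel alone cannot tell a real NULL from a stored -2147483648, which is the concern raised in Edit 2 of the question. One way around that, assuming the query can be changed, is to let SQL Server report the NULLs in a separate flag column (for example CASE WHEN [a-null-int] IS NULL THEN 1 ELSE 0 END AS [a-null-int-isnull]; the column name is made up for this sketch) and to trust the flag rather than the value. The helper apply_null_flags below is hypothetical and uses plain Python lists so that it does not depend on pandas.

```python
def apply_null_flags(values, is_null_flags):
    """Return a list where flagged entries are None and the rest are plain ints."""
    return [None if flag else int(value)
            for value, flag in zip(values, is_null_flags)]


# Stand-in data: the second entry is a genuine -2147483648, not a NULL.
values = [-2147483648, -2147483648, 1]
flags = [1, 0, 0]

print(apply_null_flags(values, flags))  # [None, -2147483648, 1]
```

The resulting list can be put back into the data frame as an object column, for example pandas.Series(result, dtype=object), which keeps the integers as ints next to None at the cost of giving up the numeric dtype.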
