Pandas: load a DataFrame from MSSQL
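I am trying to load data into a DataFrame so that I can use it later for record linkage, but I get the following output:
Empty DataFrame
Columns: [FirstName, LastName, CompanyName]
Index: []
I am not sure what I am doing wrong.
Code: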
import pymssql
import time
import recordlinkage
import pandas.io.sql as psql
#SQL connection
conn = pymssql.connect(host='server', user='xx', password='xx', database='Test')
cursor = conn.cursor()
print(time.ctime())
sql = "select FirstName, LastName, CompanyName, ID from [Test].[dbo].[Person]with(nolock) where ID < 100"
dfA = psql.read_sql(sql, conn, index_col='ID')
print(dfA)
# Indexation step
pcl = recordlinkage.index.Block(on='FirstName')
pairs = pcl.index(dfA)
# Comparison step
compare_cl = recordlinkage.Compare()
compare_cl.exact('FirstName', 'FirstName', label='FirstName')
compare_cl.string('LastName', 'LastName', method='jarowinkler', threshold=0.85, label='LastName')
compare_cl.string('CompanyName', 'CompanyName', threshold=0.85, label='CompanyName')
features = compare_cl.compute(pairs, dfA)
# Classification step
matches = features[features.sum(axis=1) > 3]
print(len(matches))
print(matches)
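The empty DataFrame comes from this line:
matches = features[features.sum(axis=1) > 3]
Only three features are compared (FirstName, LastName, CompanyName), and each one scores 0 or 1, so the row sum can never be greater than 3. Change > 3 to > 2 or > 1.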
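To see the effect of the threshold without a database, here is a minimal sketch that uses only pandas. The feature matrix is made-up illustration data shaped like the result of the comparison step (one row per candidate pair, one 0/1 column per comparison); the pair IDs and scores are assumptions, not values from the real Person table.

```python
import pandas as pd

# Hypothetical feature matrix: one row per candidate pair of record IDs,
# one 0/1 column per comparison (values invented for illustration).
index = pd.MultiIndex.from_tuples(
    [(1, 2), (1, 3), (2, 3)], names=["ID_1", "ID_2"]
)
features = pd.DataFrame(
    {
        "FirstName": [1, 1, 1],
        "LastName": [1.0, 0.0, 1.0],
        "CompanyName": [1.0, 0.0, 0.0],
    },
    index=index,
)

# Number of agreeing features per candidate pair.
scores = features.sum(axis=1)

# Three 0/1 features give a maximum score of 3, so "> 3" selects nothing.
too_strict = features[scores > 3]

# "> 2" keeps pairs where all three features agree.
all_three = features[scores > 2]

# "> 1" keeps pairs where at least two features agree.
at_least_two = features[scores > 1]

print(len(too_strict), len(all_three), len(at_least_two))
```

Which threshold is appropriate depends on how strict the matching should be: > 2 requires every compared field to agree, while > 1 tolerates one disagreeing field.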