Python SQL query from associative entity result to 2D pandas DataFrame

Question

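I want to populate a pandas DataFrame from a SQL table that is an associative entity, so that the DataFrame's index comes from one of the entities and its column headers come from the second entity.

For example, I have the following SQL tables:

Entity 1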

code	name
A	Type A
B	Type B

Entity 2

code	name
W	Type W
X	Type X
Y	Type Y
Z	Type Z

Associative entity

Entity 1_code	Entity 2_code	value
A	W	1
A	Y	7
A	Z	3
B	X	88
B	Y	5

I would like my DataFrame to have the following structure:

	W	X	Y	Z
A	1	NaN	7	3
B	NaN	88	5	NaN

Semantically, I can do this by loading an empty frame with the following pseudocode:

import pandas as pd
import psycopg2

connection = psycopg2.connect( ... )

# create an empty df whose index is the Entity 1 codes
df = pd.read_sql('SELECT code FROM entity_1', connection, index_col='code')

cur = connection.cursor()
cur.execute('SELECT code FROM entity_2')

# build the list of column names from the Entity 2 codes
entity_2_codes = [r[0] for r in cur.fetchall()]
# add one empty column per Entity 2 code
df = df.reindex(columns=entity_2_codes)

# now loop through each associative entity entry and insert its value into the dataframe

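Is there a clever way to populate the table more efficiently? Perhaps by adding a column or a row at a time? Note that the data is sparse, so not every Entity 1 x Entity 2 combination will have a value.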

Answer 1

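You can use the pandas pivot() or pivot_table() methods. pivot is for the case where no aggregation is needed (each combination of Entity 1_code and Entity 2_code has exactly one value). pivot_table can aggregate (sum, count, max, and so on) when a combination has multiple values, and it also lets you specify how to fill NA values, among other options.

If you can load the associative entity table into a DataFrame df, this would be: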

df.pivot(index='Entity 1_code', columns='Entity 2_code', values='value')

Or, using pivot_table:

df.pivot_table(index='Entity 1_code', columns='Entity 2_code', values='value', aggfunc='mean')

If each combination has only one value, pivot_table with aggfunc set to 'mean' mimics pivot, because the mean of a single value is just that value.
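To make the difference between the two methods concrete, here is a minimal sketch using the rows from the question. The extra (A, W) row and the variable names are made up for illustration; they are not part of the original data.

```python
import pandas as pd

# Sample rows copied from the associative entity table in the question.
assoc = pd.DataFrame({
    'Entity 1_code': ['A', 'A', 'A', 'B', 'B'],
    'Entity 2_code': ['W', 'Y', 'Z', 'X', 'Y'],
    'value': [1, 7, 3, 88, 5],
})

# One value per (Entity 1, Entity 2) pair, so pivot() is enough.
# Combinations that do not occur become NaN.
wide = assoc.pivot(index='Entity 1_code', columns='Entity 2_code', values='value')

# Hypothetical duplicate: a second value for the (A, W) pair.
extra = pd.DataFrame({'Entity 1_code': ['A'], 'Entity 2_code': ['W'], 'value': [3]})
dup = pd.concat([assoc, extra], ignore_index=True)

# pivot() refuses to reshape when an index/column pair repeats ...
try:
    dup.pivot(index='Entity 1_code', columns='Entity 2_code', values='value')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# ... while pivot_table() aggregates the duplicates (here: their sum).
wide_sum = dup.pivot_table(index='Entity 1_code', columns='Entity 2_code',
                           values='value', aggfunc='sum')
```

So pivot is the stricter choice: it fails loudly if the association table unexpectedly contains duplicate pairs, whereas pivot_table silently combines them.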

Answer 2

In SQL, you can do this with conditional aggregation, but this requires you to know the list of codes in advance:

select entity_1_code,
    max(case when entity_2_code = 'W' then value end) as w,
    max(case when entity_2_code = 'X' then value end) as x,
    max(case when entity_2_code = 'Y' then value end) as y,
    max(case when entity_2_code = 'Z' then value end) as z
from associative_entity
group by entity_1_code

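Note that you don't need the other two tables to get the result you showed. But if you want information from those tables, you can always join them in the query above.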
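If the list of codes is not known when you write the query, one option is to read it from the entity_2 table first and generate the CASE columns from it. The following is only a sketch under some assumptions: an in-memory SQLite database stands in for the real one, the table and column names follow the query above, and quote_ident is a hypothetical helper, not a library function.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the real database (an assumption for this sketch).
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE entity_2 (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE associative_entity (entity_1_code TEXT, entity_2_code TEXT, value INTEGER);
    INSERT INTO entity_2 VALUES
        ('W', 'Type W'), ('X', 'Type X'), ('Y', 'Type Y'), ('Z', 'Type Z');
    INSERT INTO associative_entity VALUES
        ('A', 'W', 1), ('A', 'Y', 7), ('A', 'Z', 3), ('B', 'X', 88), ('B', 'Y', 5);
""")

# Fetch the codes first, since the pivot query needs one column per code.
codes = [row[0] for row in conn.execute('SELECT code FROM entity_2 ORDER BY code')]


def quote_ident(name):
    # Hypothetical helper: quote a column alias, doubling embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


# One MAX(CASE ...) column per code; the code values are bound as parameters.
case_columns = ',\n    '.join(
    f'MAX(CASE WHEN entity_2_code = ? THEN value END) AS {quote_ident(code)}'
    for code in codes
)
query = (
    f'SELECT entity_1_code,\n    {case_columns}\n'
    'FROM associative_entity\n'
    'GROUP BY entity_1_code\n'
    'ORDER BY entity_1_code'
)

pivoted = pd.read_sql_query(query, conn, params=codes, index_col='entity_1_code')
conn.close()
```

The placeholder style (`?`) is the one sqlite3 uses; psycopg2 uses `%s` instead, so the generated query would need adjusting for PostgreSQL.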

Answer 3

You can query just the associative table, and then apply a simple .pivot() call:

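import pandas as pd
df = pd.DataFrame({'Entity 1_code': ['A', 'A', 'A', 'B', 'B'], 'Entity 2_code': ['W', 'Y', 'Z', 'X', 'Y'], 'code_value': [1, 7, 3, 88, 5]})
# rows are Entity 1 codes, columns are Entity 2 codes; missing combinations become NaN
df.pivot(index='Entity 1_code', columns='Entity 2_code', values='code_value')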
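For completeness, here is a rough end-to-end sketch of the same idea starting from the database. It assumes an in-memory SQLite database in place of the real connection, snake_case table and column names, and a made-up Entity 1 code 'C' that has no associations. The final reindex step is optional: it is only needed if codes without any association should still show up as all-NaN rows or columns.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the real connection (an assumption for this sketch).
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE entity_1 (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE entity_2 (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE associative_entity (entity_1_code TEXT, entity_2_code TEXT, value INTEGER);
    -- 'C' is a hypothetical code with no associations.
    INSERT INTO entity_1 VALUES ('A', 'Type A'), ('B', 'Type B'), ('C', 'Type C');
    INSERT INTO entity_2 VALUES
        ('W', 'Type W'), ('X', 'Type X'), ('Y', 'Type Y'), ('Z', 'Type Z');
    INSERT INTO associative_entity VALUES
        ('A', 'W', 1), ('A', 'Y', 7), ('A', 'Z', 3), ('B', 'X', 88), ('B', 'Y', 5);
""")

# Load the associative table in long form with a single query.
long_df = pd.read_sql_query(
    'SELECT entity_1_code, entity_2_code, value FROM associative_entity', conn
)

# Reshape to wide form; only codes that appear in the associations are present.
wide = long_df.pivot(index='entity_1_code', columns='entity_2_code', values='value')

# Optional: align with the full code lists so unused codes appear as NaN.
entity_1_codes = pd.read_sql_query('SELECT code FROM entity_1 ORDER BY code', conn)['code'].tolist()
entity_2_codes = pd.read_sql_query('SELECT code FROM entity_2 ORDER BY code', conn)['code'].tolist()
full = wide.reindex(index=entity_1_codes, columns=entity_2_codes)
conn.close()
```

This replaces the row-by-row loop from the question with one query plus one reshape, which is usually the simpler route for sparse data.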

