Python Pandas non-equi join
I have a table:
import pandas as pd
import numpy as np
list_1=[['Steven',np.nan,'C1'],
['Michael',np.nan,'C2'],
['Robert',np.nan,'C3'],
['Buchanan',np.nan,'C1'],
['Suyama',np.nan,'C2'],
['King',np.nan,'C3']]
labels=['first_name','last_name','class']
df=pd.DataFrame(list_1,columns=labels)
df
Output:
  first_name  last_name class
0     Steven        NaN    C1
1    Michael        NaN    C2
2     Robert        NaN    C3
3   Buchanan        NaN    C1
4     Suyama        NaN    C2
5       King        NaN    C3
What I need:
first_name  last_name
Steven      Buchanan
Michael     Suyama
Robert      King
So I need to perform a non-equi self join.
The equivalent SQL query:
;with cte as
(
SELECT first_name,
class,
ROW_NUMBER() OVER (partition by class ORDER BY first_name) as rn
FROM students
)
select c_fn.first_name,
c_ln.first_name
from cte c_fn join cte c_ln on c_fn.class=c_ln.class and c_ln.rn< c_fn.rn
Or as this SQL query:
;with cte as
(
SELECT first_name,
last_name,
ROW_NUMBER() OVER ( ORDER BY (select null)) as rn
FROM students
)
select fn.first_name,
ln.first_name as last_name
from cte fn join cte ln on ln.rn=fn.rn+3
The problem in pandas is that a non-equi self join cannot be done with merge, and I can't find any other way to do it.
We can solve this in pandas in a smarter way: use groupby and agg to join the names within each class into one string, then split that string into columns:
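# Select first_name before aggregating: last_name holds only NaN, which ' '.join cannot handle
dfn = df.groupby('class')['first_name'].agg(' '.join).str.split(' ', expand=True)
# Reuse the first two column labels of df: first_name, last_name
dfn.columns = df.columns[:2]
dfn = dfn.reset_index(drop=True)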
  first_name last_name
0     Steven  Buchanan
1    Michael    Suyama
2     Robert      King
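Joining and re-splitting on a space breaks if a name itself contains a space. As a hedged alternative to the approach above (not the answerer's method), the sketch below numbers the rows within each class with cumcount and pivots on that number. It assumes every class has exactly two rows and that the first one holds the first name; the names pos, wide and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['Steven', np.nan, 'C1'],
     ['Michael', np.nan, 'C2'],
     ['Robert', np.nan, 'C3'],
     ['Buchanan', np.nan, 'C1'],
     ['Suyama', np.nan, 'C2'],
     ['King', np.nan, 'C3']],
    columns=['first_name', 'last_name', 'class'],
)

# Number the rows within each class in their original order: 0, 1, ...
pos = df.groupby('class').cumcount()

# Spread each class's names across columns keyed by that position.
wide = (
    df.assign(pos=pos)
      .pivot(index='class', columns='pos', values='first_name')
)

# Assumption: exactly two rows per class, first name first, last name second.
wide.columns = ['first_name', 'last_name']
result = wide.reset_index(drop=True)
print(result)
```

Because no string is joined or split, names containing spaces would survive intact under the same two-rows-per-class assumption.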
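You can set the index to 'class' and select the individual names:
df = df.set_index('class')
first_name = df.loc["C1", "first_name"].values[0]
# The last name is stored in the first_name column of the second C1 row
last_name = df.loc["C1", "first_name"].values[1]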
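For completeness, the non-equi self join from the question can also be emulated in pandas as an ordinary equi-join followed by a filter on the inequality. The sketch below is one possible way, not the asker's or answerer's code. It assumes the row number should follow the original row order rather than ORDER BY first_name as in the SQL, so the comparison keeps pairs where the first-name row comes earlier; the names numbered, pairs and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['Steven', np.nan, 'C1'],
     ['Michael', np.nan, 'C2'],
     ['Robert', np.nan, 'C3'],
     ['Buchanan', np.nan, 'C1'],
     ['Suyama', np.nan, 'C2'],
     ['King', np.nan, 'C3']],
    columns=['first_name', 'last_name', 'class'],
)

# Row number within each class, in original row order (a stand-in for ROW_NUMBER()).
numbered = df[['first_name', 'class']].assign(rn=df.groupby('class').cumcount())

# Equi-join on class: every pairing of rows that share a class.
pairs = numbered.merge(numbered, on='class', suffixes=('_fn', '_ln'))

# Apply the inequality as a filter: keep pairs where the left row comes first.
pairs = pairs[pairs['rn_fn'] < pairs['rn_ln']].sort_values('class')

result = (
    pairs[['first_name_fn', 'first_name_ln']]
    .rename(columns={'first_name_fn': 'first_name',
                     'first_name_ln': 'last_name'})
    .reset_index(drop=True)
)
print(result)
```

The intermediate merge materializes every pair within a class, so this is only practical when the groups are small.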