Cross join (Cartesian product) of a list with a dataframe
I have a list and a dataframe.
import pandas as pd
work_station = ['A','B','C']
name = ['Mike','Tom','Scott','Tracy']
salary = ['60000','50000','100000','90000']
df = pd.DataFrame({'name':name,'salary':salary})
I want to cross join work_station with df so that the output looks like this:
station Name salary
A Mike 60000
A Tom 50000
A Scott 100000
A Tracy 90000
B Mike 60000
B Tom 50000
B Scott 100000
B Tracy 90000
C Mike 60000
C Tom 50000
C Scott 100000
C Tracy 90000
I tried using the * operator:
df1 = work_station * salary
But it doesn't work, because two lists cannot be multiplied together:
TypeError: can't multiply sequence by non-int of type 'list'
Any suggestions? Thanks!
Simple enough: use concat with the keys parameter, which labels each copy of df with its station:
(pd.concat([df] * len(work_station), keys=work_station)
.reset_index(level=1, drop=True)
.rename_axis('station')
.reset_index()
)
station name salary
0 A Mike 60000
1 A Tom 50000
2 A Scott 100000
3 A Tracy 90000
4 B Mike 60000
5 B Tom 50000
6 B Scott 100000
7 B Tracy 90000
8 C Mike 60000
9 C Tom 50000
10 C Scott 100000
11 C Tracy 90000
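To see why keys does the job, it can help to look at the intermediate result before the index is reset. This is only a small sketch using the data from the question; the variable name stacked is mine, not part of the answer above.

```python
import pandas as pd

work_station = ['A', 'B', 'C']
df = pd.DataFrame({'name': ['Mike', 'Tom', 'Scott', 'Tracy'],
                   'salary': ['60000', '50000', '100000', '90000']})

# One copy of df per station; keys labels each copy in a new outer index level.
stacked = pd.concat([df] * len(work_station), keys=work_station)

# The outer level holds the station, the inner level holds df's original index.
print(stacked.index.nlevels)
print(stacked.index.get_level_values(0).tolist())

# Selecting one key returns that station's copy of df.
print(stacked.loc['B'])
```

The rest of the chain in the answer only drops the inner level and turns the outer level into a regular station column.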
You can also take the merge route and build the Cartesian product by joining on a dummy key column:
(pd.DataFrame(work_station, columns=['station'])
.assign(foo=1)
.merge(df.assign(foo=1))
.drop(columns='foo')
)
station name salary
0 A Mike 60000
1 A Tom 50000
2 A Scott 100000
3 A Tracy 90000
4 B Mike 60000
5 B Tom 50000
6 B Scott 100000
7 B Tracy 90000
8 C Mike 60000
9 C Tom 50000
10 C Scott 100000
11 C Tracy 90000
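If your pandas is version 1.2 or newer, merge also accepts how='cross', which should make the dummy foo column unnecessary. A minimal sketch with the question's data, assuming that version; the names stations and result are just for illustration.

```python
import pandas as pd

work_station = ['A', 'B', 'C']
df = pd.DataFrame({'name': ['Mike', 'Tom', 'Scott', 'Tracy'],
                   'salary': ['60000', '50000', '100000', '90000']})

# Wrap the list in a one-column frame so it can take part in a merge.
stations = pd.DataFrame({'station': work_station})

# how='cross' pairs every row of stations with every row of df (pandas >= 1.2).
result = stations.merge(df, how='cross')

print(result)
```

Note that no on= argument is passed here; as far as I know pandas rejects key arguments when how='cross' is used.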