Rearrange a pandas DataFrame using pandas.melt to take multiple columns at once?
I have a pandas DataFrame like the one below. There are actually 192 X,Y,Z triplet columns; these are just the first three.
shot V0e V0n V0d S0_Pe S0_Pn S0_Pd S0_Se S0_Sn S0_Sd
0 1001 457950.4 7331695.2 2.5 458004.5 7331794.1 2.2 457950.4 7331695.2 2.1
1 1002 457948.0 7331689.4 2.3 457999.5 7331782.5 2.3 457993.6 7331792.8 2.3
2 1003 457945.6 7331683.5 2.4 457999.5 7331782.5 2.4 457945.6 7331683.5 2.6
3 1004 457943.3 7331677.8 2.3 457995.4 7331770.8 2.3 457988.8 7331781.2 2.5
4 1005 457940.9 7331672.1 2.2 457995.4 7331770.8 2.6 457948.0 7331689.4 2.4
What I want to do is rearrange them as shown below so that I can animate them in a Plotly animation, which requires the data in long format.
Shot Easting Northing Depth
1001 457950.4 7331695.2 2.5 #V0e,V0n,V0d
1001 458004.5 7331794.1 2.2 #S0_Pe,S0_Pn,S0_Pd
1001 457950.4 7331695.2 2.1 #S0_Se,S0_Sn,S0_Sd
1002 457948.0 7331689.4 2.3 #V0e,V0n,V0d
1002 457999.5 7331782.5 2.3 #S0_Pe,S0_Pn,S0_Pd
1002 457993.6 7331792.8 2.3 #S0_Se,S0_Sn,S0_Sd
1003 457945.6 7331683.5 2.4 #V0e,V0n,V0d
1003 457999.5 7331782.5 2.4 #S0_Pe,S0_Pn,S0_Pd
1003 457945.6 7331683.5 2.6 #S0_Se,S0_Sn,S0_Sd
1004 457943.3 7331677.8 2.3 #V0e,V0n,V0d
1004 457995.4 7331770.8 2.3 #S0_Pe,S0_Pn,S0_Pd
1004 457988.8 7331781.2 2.5 #S0_Se,S0_Sn,S0_Sd
1005 457940.9 7331672.1 2.2 #V0e,V0n,V0d
1005 457995.4 7331770.8 2.6 #S0_Pe,S0_Pn,S0_Pd
1005 457948.0 7331689.4 2.4 #S0_Se,S0_Sn,S0_Sd
This layout would also be acceptable:
Shot Easting Northing Depth
1001 457950.4 7331695.2 2.5 #V0e,V0n,V0d
1002 457948.0 7331689.4 2.3 #V0e,V0n,V0d
1003 457945.6 7331683.5 2.4 #V0e,V0n,V0d
1004 457943.3 7331677.8 2.3 #V0e,V0n,V0d
1005 457940.9 7331672.1 2.2 #V0e,V0n,V0d
1001 458004.5 7331794.1 2.2 #S0_Pe,S0_Pn,S0_Pd
1002 457999.5 7331782.5 2.3 #S0_Pe,S0_Pn,S0_Pd
1003 457999.5 7331782.5 2.4 #S0_Pe,S0_Pn,S0_Pd
1004 457995.4 7331770.8 2.3 #S0_Pe,S0_Pn,S0_Pd
1005 457995.4 7331770.8 2.6 #S0_Pe,S0_Pn,S0_Pd
1001 457950.4 7331695.2 2.1 #S0_Se,S0_Sn,S0_Sd
1002 457993.6 7331792.8 2.3 #S0_Se,S0_Sn,S0_Sd
1003 457945.6 7331683.5 2.6 #S0_Se,S0_Sn,S0_Sd
1004 457988.8 7331781.2 2.5 #S0_Se,S0_Sn,S0_Sd
1005 457948.0 7331689.4 2.4 #S0_Se,S0_Sn,S0_Sd
I have looked at pandas.melt:
dfMELT=pd.melt(df,id_vars=['shot'],value_vars=["V0e","V0n","V0d","S0_Pe","S0_Pn","S0_Pd","S0_Se","S0_Sn","S0_Sd"])
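But it doesn't rearrange the data as above. It stacks one column at a time into a single value column, so all my X values come out first, then my Y values, then my Z values. That obviously doesn't work for plotting the coordinates in a scatter plot: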
shot variable value
0 1001 V0e 457950.4
1 1002 V0e 457948.0
2 1003 V0e 457945.6
3 1004 V0e 457943.3
....
shot variable value
779 1780 V0e 456009.1
780 1001 V0n 7331695.2
781 1002 V0n 7331689.4
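For comparison, melt can still produce the long layout if the variable column is split afterwards and the coordinate letters are pivoted back out into columns. This is a minimal pure-pandas sketch, assuming every column name is a triplet prefix followed by exactly one coordinate letter (e, n or d); the helper column names point and coord are made up for the example, and the data is a two-shot, two-triplet subset of the question's table.

```python
import pandas as pd

# Two shots and two of the triplets, values copied from the question.
df = pd.DataFrame({
    "shot": [1001, 1002],
    "V0e": [457950.4, 457948.0],
    "V0n": [7331695.2, 7331689.4],
    "V0d": [2.5, 2.3],
    "S0_Pe": [458004.5, 457999.5],
    "S0_Pn": [7331794.1, 7331782.5],
    "S0_Pd": [2.2, 2.3],
})

# 1. Melt everything except 'shot' into one (shot, variable, value) frame.
melted = df.melt(id_vars="shot")

# 2. Split each original column name into triplet prefix and coordinate letter.
melted["point"] = melted["variable"].str[:-1]  # e.g. "S0_P"
melted["coord"] = melted["variable"].str[-1]   # "e", "n" or "d"

# 3. Pivot the letters back into columns: one row per (shot, point).
long_df = (
    melted.pivot(index=["shot", "point"], columns="coord", values="value")
    .rename(columns={"e": "Easting", "n": "Northing", "d": "Depth"})
    .reset_index()[["shot", "point", "Easting", "Northing", "Depth"]]
)
print(long_df)
```

Note that pivot sorts the (shot, point) index, so within each shot the points come out in alphabetical rather than original column order.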
You can abstract this process with pivot_longer from pyjanitor. Your columns follow a pattern (some end in e, some in n, some in d), so pass one regular expression per output column to capture it:
# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index='shot',
names_to = ("Easting", "Northing", "Depth"),
names_pattern = (r".+e$", r".+n$", r".+d$"))
shot Easting Northing Depth
0 1001 457950.4 7331695.2 2.5
1 1002 457948.0 7331689.4 2.3
2 1003 457945.6 7331683.5 2.4
3 1004 457943.3 7331677.8 2.3
4 1005 457940.9 7331672.1 2.2
5 1001 458004.5 7331794.1 2.2
6 1002 457999.5 7331782.5 2.3
7 1003 457999.5 7331782.5 2.4
8 1004 457995.4 7331770.8 2.3
9 1005 457995.4 7331770.8 2.6
10 1001 457950.4 7331695.2 2.1
11 1002 457993.6 7331792.8 2.3
12 1003 457945.6 7331683.5 2.6
13 1004 457988.8 7331781.2 2.5
14 1005 457948.0 7331689.4 2.4
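To see which source columns each of the three patterns picks up, here is a plain-Python sketch using only the standard re module. It does not call pyjanitor and is not its internal implementation; the dictionary name patterns is just for illustration. Each output column collects three source columns, one per triplet, in their original left-to-right order.

```python
import re

# Column names from the question (first three triplets only).
columns = ["V0e", "V0n", "V0d", "S0_Pe", "S0_Pn", "S0_Pd", "S0_Se", "S0_Sn", "S0_Sd"]

# One regex per output column, in the same order as names_to.
patterns = {"Easting": r".+e$", "Northing": r".+n$", "Depth": r".+d$"}

# For each output column, list the source columns its regex matches.
groups = {
    name: [col for col in columns if re.match(pat, col)]
    for name, pat in patterns.items()
}
print(groups)
```

Position i in each list belongs to the same triplet (for example V0e, V0n, V0d at position 0), which is what lets the three matched groups line up row by row.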
If you want the rows in their order of appearance:
df.pivot_longer(index='shot',
names_to = ("Easting", "Northing", "Depth"),
names_pattern = (r".+e$", r".+n$", r".+d$"),
sort_by_appearance = True)
shot Easting Northing Depth
0 1001 457950.4 7331695.2 2.5
1 1001 458004.5 7331794.1 2.2
2 1001 457950.4 7331695.2 2.1
3 1002 457948.0 7331689.4 2.3
4 1002 457999.5 7331782.5 2.3
5 1002 457993.6 7331792.8 2.3
6 1003 457945.6 7331683.5 2.4
7 1003 457999.5 7331782.5 2.4
8 1003 457945.6 7331683.5 2.6
9 1004 457943.3 7331677.8 2.3
10 1004 457995.4 7331770.8 2.3
11 1004 457988.8 7331781.2 2.5
12 1005 457940.9 7331672.1 2.2
13 1005 457995.4 7331770.8 2.6
14 1005 457948.0 7331689.4 2.4
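Without pyjanitor, one way to turn the grouped layout into the order-of-appearance layout is a stable sort on shot. This is a sketch, assuming the long frame is already in the grouped order of the first output (all V0 rows, then all S0_P rows); it is not how sort_by_appearance works internally. The frame below is a two-shot excerpt of that output.

```python
import pandas as pd

# Grouped order: all V0 rows first, then all S0_P rows.
grouped = pd.DataFrame({
    "shot": [1001, 1002, 1001, 1002],
    "Easting": [457950.4, 457948.0, 458004.5, 457999.5],
    "Northing": [7331695.2, 7331689.4, 7331794.1, 7331782.5],
    "Depth": [2.5, 2.3, 2.2, 2.3],
})

# A stable sort keeps rows with the same shot in their existing relative
# order (V0 before S0_P), giving one block of triplets per shot.
by_appearance = grouped.sort_values("shot", kind="stable").reset_index(drop=True)
print(by_appearance)
```

A non-stable sort could shuffle the triplets within a shot, which is why the kind argument matters here.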
You can also use the .value placeholder: whatever part of the column name is captured for .value stays as a column header:
(df.pivot_longer(index = 'shot',
names_to = ".value",
names_pattern = r".+(.)$")
.rename(columns = {"e" : "Easting",
"n" : "Northing",
"d" : "Depth"}
)
)
shot Easting Northing Depth
0 1001 457950.4 7331695.2 2.5
1 1002 457948.0 7331689.4 2.3
2 1003 457945.6 7331683.5 2.4
3 1004 457943.3 7331677.8 2.3
4 1005 457940.9 7331672.1 2.2
5 1001 458004.5 7331794.1 2.2
6 1002 457999.5 7331782.5 2.3
7 1003 457999.5 7331782.5 2.4
8 1004 457995.4 7331770.8 2.3
9 1005 457995.4 7331770.8 2.6
10 1001 457950.4 7331695.2 2.1
11 1002 457993.6 7331792.8 2.3
12 1003 457945.6 7331683.5 2.6
13 1004 457988.8 7331781.2 2.5
14 1005 457948.0 7331689.4 2.4
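The same .value idea can be sketched in plain pandas by splitting each column name into a two-level (prefix, last letter) column MultiIndex and stacking the prefix level into the rows, so the letters stay behind as headers. This is an illustrative alternative, not what pivot_longer does internally; the level names point and coord are made up, and the data is a two-shot subset of the question's table. Row order, and whether pandas emits a FutureWarning about stack, can differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "shot": [1001, 1002],
    "V0e": [457950.4, 457948.0],
    "V0n": [7331695.2, 7331689.4],
    "V0d": [2.5, 2.3],
    "S0_Pe": [458004.5, 457999.5],
    "S0_Pn": [7331794.1, 7331782.5],
    "S0_Pd": [2.2, 2.3],
})

wide = df.set_index("shot")
# The last letter plays the role of the ".value" capture group in r".+(.)$".
wide.columns = pd.MultiIndex.from_tuples(
    [(col[:-1], col[-1]) for col in wide.columns], names=["point", "coord"]
)

# Move the prefix level into the rows; the letters remain as column headers.
long_df = (
    wide.stack(level="point")
    .rename(columns={"e": "Easting", "n": "Northing", "d": "Depth"})
    .reset_index()
)
print(long_df)
```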
pivot_longer is meant to make reshaping easier; however, you may not want to import/install another library. Here is one way to solve this with pandas' wide_to_long:
First, let's rename the columns:
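res = df.copy()
# Keep 'shot' out of the rename by moving it into the index.
res = res.set_index('shot')
# Rename to stub_suffix form, e.g. "S0_Pe" -> "Easting_S0_P",
# so wide_to_long can split on the first "_" after the stub.
res = res.rename(columns = lambda col: f"Easting_{col[:-1]}"
                 if col.endswith("e")
                 else f"Northing_{col[:-1]}"
                 if col.endswith("n")
                 else f"Depth_{col[:-1]}")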
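Applying the same renaming rule to a plain list of names shows what wide_to_long will receive. The function name rename_column is hypothetical; it simply restates the lambda above as a regular function.

```python
def rename_column(col):
    """Same rule as the lambda above: stub name, '_', then the triplet prefix."""
    if col.endswith("e"):
        return f"Easting_{col[:-1]}"
    if col.endswith("n"):
        return f"Northing_{col[:-1]}"
    return f"Depth_{col[:-1]}"

columns = ["V0e", "V0n", "V0d", "S0_Pe", "S0_Pn", "S0_Pd"]
print([rename_column(c) for c in columns])
# ['Easting_V0', 'Northing_V0', 'Depth_V0', 'Easting_S0_P', 'Northing_S0_P', 'Depth_S0_P']

# Any name not ending in e or n falls through to the Depth branch,
# which is why 'shot' is moved into the index before renaming.
print(rename_column("shot"))
# Depth_sho
```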
Now we can reshape:
(pd.wide_to_long(res.reset_index(),
i = 'shot',
stubnames = ['Easting', 'Northing', 'Depth'],
j = 'wateva',
sep = "_",
suffix = ".+")
.droplevel('wateva')
.reset_index()
)
shot Easting Northing Depth
0 1001 457950.4 7331695.2 2.5
1 1002 457948.0 7331689.4 2.3
2 1003 457945.6 7331683.5 2.4
3 1004 457943.3 7331677.8 2.3
4 1005 457940.9 7331672.1 2.2
5 1001 458004.5 7331794.1 2.2
6 1002 457999.5 7331782.5 2.3
7 1003 457999.5 7331782.5 2.4
8 1004 457995.4 7331770.8 2.3
9 1005 457995.4 7331770.8 2.6
10 1001 457950.4 7331695.2 2.1
11 1002 457993.6 7331792.8 2.3
12 1003 457945.6 7331683.5 2.6
13 1004 457988.8 7331781.2 2.5
14 1005 457948.0 7331689.4 2.4
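If the plot also needs to know which triplet each row came from (for example to colour or label the points), one option is to keep the j level as a column instead of dropping it. A sketch on a two-shot subset; calling j "point" and using a prefix dictionary instead of the lambda are assumptions of this example. Unlike the lambda, the dictionary raises a KeyError for any column whose last letter is not e, n or d.

```python
import pandas as pd

df = pd.DataFrame({
    "shot": [1001, 1002],
    "V0e": [457950.4, 457948.0],
    "V0n": [7331695.2, 7331689.4],
    "V0d": [2.5, 2.3],
    "S0_Pe": [458004.5, 457999.5],
    "S0_Pn": [7331794.1, 7331782.5],
    "S0_Pd": [2.2, 2.3],
})

# Map the trailing coordinate letter to its stub name.
prefix = {"e": "Easting", "n": "Northing", "d": "Depth"}
res = df.set_index("shot")
res.columns = [f"{prefix[c[-1]]}_{c[:-1]}" for c in res.columns]

# Keep j (here "point") instead of dropping it, so each row stays labelled
# with its triplet prefix such as "V0" or "S0_P".
long_df = (
    pd.wide_to_long(res.reset_index(), stubnames=["Easting", "Northing", "Depth"],
                    i="shot", j="point", sep="_", suffix=".+")
    .reset_index()
)
print(long_df)
```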