Create MultiIndex from pattern in column names

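I'm trying to write a script that takes a DataFrame with an arbitrary number of experimental conditions (e.g., 3 different concentrations of a drug) and an arbitrary number of replicates per condition (i.e., trials 1-3), which looks like this: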

      100_uM_Drug_Trial_1  100_uM_Drug_Trial_2  10_uM_Drug_Trial_1  \
0             459.924747          635.685284         518.163653   
1             459.458934          636.249568         518.445279   
2             460.006374          636.435523         518.743388   
3             460.002453          636.794022         518.895792   
4             460.598404          636.103206         518.836557   
5             460.309564          637.187444         518.976234   
6             460.609499          636.335023         519.005662   
7             460.843505          637.123839         519.041012   
8             460.969187          637.047453         518.880728   
9             460.832477          637.231533         519.108122   
10            461.255201          638.176752         518.979086   
11            461.310764          636.924448         518.979923   
12            461.507783          637.824450         519.117064   
13            461.116555          637.145600         519.106675   
14            461.891845          638.136241         519.531348   
15            461.746859          637.819223         519.161308   
16            461.840650          637.977134         519.203945   
17            462.028374          638.474671         519.184845   
18            461.726244          638.039615         519.225926   
19            462.128634          638.624309         519.177030   
20            461.242868          637.636891         519.460114   
21            462.201164          638.493620         519.469176   
22            464.078771          637.749872         519.505141   
23            464.605662          639.119425         519.654590   
24            464.352002          638.789306         519.947157   
25            464.485028          638.656634         519.822459   
26            464.506035          639.428889         519.906759   
27            464.834154          638.481042         520.143631   
28            464.886412          639.267176         520.218972   
29            465.414446          638.661687         520.384017 

...and gives its columns a MultiIndex by condition and trial, so that it looks like this:

Condition     100_uM_Drug                            10_uM_Drug
Trial         1                   2                  1
0             459.924747          635.685284         518.163653   
1             459.458934          636.249568         518.445279   
2             460.006374          636.435523         518.743388   
3             460.002453          636.794022         518.895792   
4             460.598404          636.103206         518.836557   
5             460.309564          637.187444         518.976234   
6             460.609499          636.335023         519.005662   
7             460.843505          637.123839         519.041012   
8             460.969187          637.047453         518.880728   
9             460.832477          637.231533         519.108122   
10            461.255201          638.176752         518.979086   
11            461.310764          636.924448         518.979923   
12            461.507783          637.824450         519.117064   
13            461.116555          637.145600         519.106675   
14            461.891845          638.136241         519.531348   
15            461.746859          637.819223         519.161308   
16            461.840650          637.977134         519.203945   
17            462.028374          638.474671         519.184845   
18            461.726244          638.039615         519.225926   
19            462.128634          638.624309         519.177030   
20            461.242868          637.636891         519.460114   
21            462.201164          638.493620         519.469176   
22            464.078771          637.749872         519.505141   
23            464.605662          639.119425         519.654590   
24            464.352002          638.789306         519.947157   
25            464.485028          638.656634         519.822459   
26            464.506035          639.428889         519.906759   
27            464.834154          638.481042         520.143631   
28            464.886412          639.267176         520.218972   
29            465.414446          638.661687         520.384017 

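I've tried a few approaches, including filtering the column names with regular expressions, but nothing has worked. Is there a quick and easy way that I've missed?

Thanks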

You can use MultiIndex.from_tuples() (see docs) while splitting the column names:

df.columns = pd.MultiIndex.from_tuples([('_'.join(col.split('_')[:3]), col.split('_')[-1]) for col in df.columns], names=['Drug', 'Trial'])

which yields:

Drug  100_uM_Drug              10_uM_Drug
Trial           1           2           1
0      459.924747  635.685284  518.163653
1      459.458934  636.249568  518.445279
2      460.006374  636.435523  518.743388
3      460.002453  636.794022  518.895792
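
The one-liner above takes the first three underscore-separated tokens as the condition, so it relies on every condition name having exactly three parts. If that may not hold, a slightly more general sketch is to split each name once from the right on the "_Trial_" marker. This assumes every column ends in "_Trial_<number>", as in the question. The small sample frame and the level name "Condition" are only for illustration.

```python
import pandas as pd

# Small illustrative frame using the question's column naming (first two rows only).
df = pd.DataFrame(
    {
        "100_uM_Drug_Trial_1": [459.924747, 459.458934],
        "100_uM_Drug_Trial_2": [635.685284, 636.249568],
        "10_uM_Drug_Trial_1": [518.163653, 518.445279],
    }
)

# Split each column name once, from the right, on the "_Trial_" marker.
# Assumption: every column name ends in "_Trial_<number>".
parts = [col.rsplit("_Trial_", 1) for col in df.columns]

# Build the two-level column index; trial numbers are converted to int.
df.columns = pd.MultiIndex.from_tuples(
    [(condition, int(trial)) for condition, trial in parts],
    names=["Condition", "Trial"],
)

print(df)
```

After this, df["100_uM_Drug"] selects all trials of one condition, and df.xs(1, axis=1, level="Trial") selects trial 1 across all conditions.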