Convert/unpack a pandas DataFrame of tuples into a list to use as column headers without the ( ,) syntax

I trimmed the strings in a column to isolate keywords and create a DataFrame (totalarea_cols), which I can then use to label the headers of a second DataFrame (totalarea_p).

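However, the keywords seem to be created as tuples, and when they are used to label the columns of the second DataFrame, the tuple syntax is included (see the totalarea_p.head() example below).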

Here is the code sample:

totalarea_code = df_meta_p2.loc[df_meta_p2['Label English'].str.contains('Total area under dry season '), 'Code']
totalarea_cols = df_meta_p2['Label English'].str.extractall('Total area under dry season (.*)').reset_index(drop=True)
totalarea_p = df_data_p2.loc[:, totalarea_code]
totalarea_p.columns = totalarea_cols

Sample of the metadata strings from which I want to extract the keywords:

In[33]: df_meta_p2['Label English']
Out[33]: 
0                                          District code
1                                          Province code
2                               Province name in English
3                               District name in English
4                                   Province name in Lao
5         Total area under dry season groundnut (peanut)
6      Total number of households growing dry season ...
7      Total number of households growing dry season ...
8      Total number of households growing dry season ...
9      Total number of households growing dry season ...
10     Total number of households growing dry season ...
11     Total number of households growing dry season yam
12     Total number of households growing dry season ...
13     Total number of households growing dry season ...
14     Total number of households growing dry season ...
15     Total number of households growing dry season ...
16     Total number of households growing dry season ...
17     Total number of households growing dry season ...
18     Total number of households growing dry season ...
19     Total number of households growing dry season ...

Name: Label English, dtype: object

Sample of the DataFrame output from str.extractall:

In [34]: totalarea_cols
Out[34]: 
                                                   0
0                                 groundnut (peanut)
1                       lowland rice/irrigation rice
2                                        upland rice
3                                             potato
4                                       sweet potato
5                                            cassava
6                                                yam
7                                               taro
8                   other tuber, root and bulk crops
9                                          mungbeans
10                                            cowpea
11                                        sugar cane
12                                           soybean
13                                            sesame
14                                            cotton
15                                           tobacco
16                           vegetable not specified
17                                           cabbage

Sample of the column headers once they are assigned to the second DataFrame, totalarea_p:

In [36]:  totalarea_p.head()
Out[36]: 
   (groundnut (peanut),)  (lowland rice/irrigation rice,)  (upland rice,)  \
0                    0.0                             0.00               0   
1                    0.0                             0.00               0   
2                    0.0                             0.00               0   
3                    0.0                             0.30               0   
4                    0.0                             1.01               0   

   (potato,)  (sweet potato,)  (cassava,)  (yam,)  (taro,)  \
0        0.0             0.00         0.0     0.0        0   
1        0.0             0.00         0.0     0.0        0   
2        0.0             0.52         0.0     0.0        0   
3        0.0             0.01         0.0     0.0        0   
4        0.0             0.00         0.0     0.0        0   

I have spent most of the day searching for an answer, but apart from one post I found, I have come up empty. Any ideas?

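str.extractall returns a DataFrame (one column per capture group), so you need to select column 0 to get a Series. Change the code to: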

totalarea_p.columns = totalarea_cols[0]

Or select by position with iloc:

totalarea_p.columns = totalarea_cols.iloc[:, 0]
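
For reference, here is a minimal self-contained sketch of the same fix. The frames `meta` and `data`, their values, and the codes `c1` to `c3` are made up for illustration and only stand in for df_meta_p2 and df_data_p2. It also converts the selected column to a plain list with `.tolist()`, which is optional and is my own addition to the fix above.

```python
import pandas as pd

# Hypothetical stand-ins for df_meta_p2 and df_data_p2 from the question.
meta = pd.DataFrame({
    "Code": ["c1", "c2", "c3"],
    "Label English": [
        "District code",
        "Total area under dry season potato",
        "Total area under dry season yam",
    ],
})
data = pd.DataFrame({"c1": [101, 102], "c2": [0.0, 0.3], "c3": [0.5, 0.0]})

prefix = "Total area under dry season "

# Codes of the columns whose label starts with the prefix.
mask = meta["Label English"].str.contains(prefix, regex=False)
codes = meta.loc[mask, "Code"]

# extractall returns a DataFrame with one column per capture group,
# even when the pattern has a single group.
extracted = (
    meta["Label English"]
    .str.extractall(prefix + "(.*)")
    .reset_index(drop=True)
)

# Select the single column (a Series) and turn it into a list of strings.
labels = extracted[0].tolist()

# Copy the selection so renaming does not touch the original frame.
area = data.loc[:, codes].copy()
area.columns = labels
print(area.head())
```

Passing a list of strings rather than the whole DataFrame means each header is a plain string such as `potato` instead of a one-element tuple.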