Convert/unpack pandas dataframe of tuples into a list to use as column headers without ( ,) syntax
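I trimmed the strings in a column to isolate keywords and created a DataFrame (totalarea_cols), which I then want to use to label the headers of a second DataFrame (totalarea_p).
However, the keywords appear to be created as tuples, and when they are used to label the columns of the second DataFrame, the tuple syntax is included (see the totalarea_p.head() example below).
Here is the code: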
totalarea_code = df_meta_p2.loc[df_meta_p2['Label English'].str.contains('Total area under dry season '), 'Code']
totalarea_cols = df_meta_p2['Label English'].str.extractall('Total area under dry season (.*)').reset_index(drop=True)
totalarea_p = df_data_p2.loc[:, totalarea_code]
totalarea_p.columns = totalarea_cols
Sample of the metadata from which I want to extract the keywords:
In[33]: df_meta_p2['Label English']
Out[33]:
0 District code
1 Province code
2 Province name in English
3 District name in English
4 Province name in Lao
5 Total area under dry season groundnut (peanut)
6 Total number of households growing dry season ...
7 Total number of households growing dry season ...
8 Total number of households growing dry season ...
9 Total number of households growing dry season ...
10 Total number of households growing dry season ...
11 Total number of households growing dry season yam
12 Total number of households growing dry season ...
13 Total number of households growing dry season ...
14 Total number of households growing dry season ...
15 Total number of households growing dry season ...
16 Total number of households growing dry season ...
17 Total number of households growing dry season ...
18 Total number of households growing dry season ...
19 Total number of households growing dry season ...
Name: Label English, dtype: object
Sample of the DataFrame output from str.extractall:
In [34]: totalarea_cols
Out[34]:
0
0 groundnut (peanut)
1 lowland rice/irrigation rice
2 upland rice
3 potato
4 sweet potato
5 cassava
6 yam
7 taro
8 other tuber, root and bulk crops
9 mungbeans
10 cowpea
11 sugar cane
12 soybean
13 sesame
14 cotton
15 tobacco
16 vegetable not specified
17 cabbage
Sample of the column headers after they are assigned to the second DataFrame, totalarea_p:
In [36]: totalarea_p.head()
Out[36]:
(groundnut (peanut),) (lowland rice/irrigation rice,) (upland rice,) \
0 0.0 0.00 0
1 0.0 0.00 0
2 0.0 0.00 0
3 0.0 0.30 0
4 0.0 1.01 0
(potato,) (sweet potato,) (cassava,) (yam,) (taro,) \
0 0.0 0.00 0.0 0.0 0
1 0.0 0.00 0.0 0.0 0
2 0.0 0.52 0.0 0.0 0
3 0.0 0.01 0.0 0.0 0
4 0.0 0.00 0.0 0.0 0
I have spent most of the day searching for an answer, but apart from a single post I found, I have come up empty. Any ideas?
The column labels need to be a Series rather than a DataFrame, so select column 0 and change the code to:
totalarea_p.columns = totalarea_cols[0]
Or select the column by position with iloc:
totalarea_p.columns = totalarea_cols.iloc[:, 0]
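To make the fix easier to try out, here is a minimal, self-contained sketch. The frames df_meta and df_data are hypothetical stand-ins for df_meta_p2 and df_data_p2 (their codes and values are invented for illustration), and the .tolist() call is an optional extra that is not part of the answer above; it only keeps the Series name 0 from becoming the name of the column index.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df_meta_p2 and df_data_p2.
df_meta = pd.DataFrame({
    "Code": ["c1", "c2", "c3", "c4"],
    "Label English": [
        "District code",
        "Total area under dry season groundnut (peanut)",
        "Total area under dry season upland rice",
        "Total number of households growing dry season yam",
    ],
})
df_data = pd.DataFrame({
    "c1": [101, 102],
    "c2": [0.0, 0.3],
    "c3": [0.0, 1.01],
    "c4": [5, 7],
})

prefix = "Total area under dry season "

# Codes of the columns whose label starts with the prefix.
mask = df_meta["Label English"].str.contains(prefix, regex=False)
codes = df_meta.loc[mask, "Code"]

# extractall returns a DataFrame with one column per capture group;
# with a single unnamed group that column is labelled 0.
extracted = (
    df_meta["Label English"]
    .str.extractall(prefix + "(.*)")
    .reset_index(drop=True)
)

# Selecting column 0 gives a Series of plain strings.
labels = extracted[0]

# Copy the selection so renaming does not touch df_data.
area = df_data.loc[:, codes].copy()
area.columns = labels.tolist()

print(area.columns.tolist())
```

The point of the sketch is that extracted is a DataFrame, while extracted[0] (or extracted.iloc[:, 0]) is a one-dimensional Series of strings, which is the shape a column index expects. As an alternative, str.extract with a single capture group and expand=False also returns a Series, but it keeps one row per input label with NaN for non-matching rows, so those would have to be dropped first.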