How to modify level 0 of a pandas DataFrame MultiIndex?
I have the following pandas DataFrame:
                                                   0   value
name                     ga:browserVersion
Chrome 44.18             43.0.2357.130      0.139987   14.0%
                         43.0.2357.124      0.113107  11.31%
                         43.0.2357.134      0.103564  10.36%
                         44.0.2403.155      0.093181   9.32%
                         43.0.2357.81       0.092643   9.26%
                         44.0.2403.157      0.082780   8.28%
                         44.0.2403.125      0.070978    7.1%
                         44.0.2403.130      0.066152   6.62%
                         43.0.2357.132      0.064872   6.49%
                         44.0.2403.107      0.039940   3.99%
Internet Explorer 32.12  11.0               0.769828  76.98%
                         9.0                0.101842  10.18%
                         10.0               0.063672   6.37%
                         8.0                0.057929   5.79%
                         7.0                0.006320   0.63%
                         6.0                0.000353   0.04%
                         7.0b               0.000024    0.0%
                         999.1              0.000024    0.0%
                         10.6               0.000003    0.0%
                         5.5                0.000003    0.0%
Firefox 12.76            39.0               0.404164  40.42%
                         38.0               0.340139  34.01%
                         40.0               0.139032   13.9%
                         31.0               0.043926   4.39%
                         37.0               0.012160   1.22%
                         36.0               0.006963    0.7%
                         34.0               0.005601   0.56%
                         35.0               0.005495   0.55%
                         21.0               0.003508   0.35%
                         33.0               0.003209   0.32%
Safari 9.37              8.0.6              0.174829  17.48%
                         8.0.7              0.172087  17.21%
                         7.1.6              0.077686   7.77%
                         5.1.9              0.072729   7.27%
                         6.1.6              0.067831   6.78%
                         7.1.7              0.053092   5.31%
                         8.0.5              0.052637   5.26%
                         8.0.3              0.035921   3.59%
                         8.0.8              0.030222   3.02%
                         8.0.4              0.027923   2.79%
Opera 0.56               30.0.1835.125      0.220076  22.01%
                         30.0.1835.88       0.163912  16.39%
                         30.0.1835.59       0.123083  12.31%
                         31.0.1889.99       0.114718  11.47%
                         31.0.1889.174      0.111532  11.15%
                         29.0.1795.60       0.072296   7.23%
                         12.17              0.063334   6.33%
                         12.16              0.019319   1.93%
                         30.0.1835.52       0.009162   0.92%
                         29.0.1795.54600    0.008763   0.88%
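It has a MultiIndex with 2 levels, name and ga:browserVersion.
What I want to do is append '%' to the level 0 labels so that they look like:
Chrome 44.18%
and so on.
I created a list containing the values I want to use to replace the current index: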
new_index = []
for i in df.index.get_level_values(0):
    i = i + '%'
    new_index.append(i)
Then I tried to replace the old index:
df.index.get_level_values(0) = new_index
But I get:
SyntaxError: can't assign to function call
I know that replacing the labels works for a 'normal' index with only one level. Is there a way to do this with a MultiIndex?
You can use rename and pass a dict that maps the old index labels to the new ones, for example:
In [38]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8), 'D' : np.random.randn(8)})
df
Out[38]:
     A      B         C         D
0  foo    one -0.510057  0.063085
1  bar    one -0.570631 -0.648810
2  foo    two -1.360048  1.609831
3  bar  three  0.628927 -0.379887
4  foo    two -0.415176 -1.798492
5  bar    two -0.147208 -0.366342
6  foo    one -0.333823  1.136703
7  foo  three  1.054773 -0.781997
In [36]:
mi = df.set_index(['A','B'])
mi
Out[36]:
                  C         D
A   B
foo one    0.172031  0.371076
bar one    1.007468  0.993607
foo two    0.552025 -0.478913
bar three  0.128154 -0.709580
foo two   -0.211721  0.569326
bar two   -0.713624  0.745678
foo one   -0.109175  0.448490
    three -0.388360  0.762513
In [39]:
mi.rename(index={'foo':'yes','bar':'no'})
Out[39]:
                  C         D
A   B
yes one    0.172031  0.371076
no  one    1.007468  0.993607
yes two    0.552025 -0.478913
no  three  0.128154 -0.709580
yes two   -0.211721  0.569326
no  two   -0.713624  0.745678
yes one   -0.109175  0.448490
    three -0.388360  0.762513
So in your case I would build the dict with zip:
df.rename(index=dict(zip(df.index.get_level_values(0), new_index)))
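To make the rename approach easy to try, here is a minimal self-contained sketch. It assumes a three-row stand-in for the frame in the question (the rows are copied from it, the variable names `mapping` and `renamed` are only for illustration). It also passes `level=0`, which the answer above does not use, so that only the first index level is touched even if a version string happened to match a key.

```python
import pandas as pd

# Small stand-in for the question's frame (three rows copied from it).
index = pd.MultiIndex.from_tuples(
    [
        ("Chrome 44.18", "43.0.2357.130"),
        ("Chrome 44.18", "43.0.2357.124"),
        ("Firefox 12.76", "39.0"),
    ],
    names=["name", "ga:browserVersion"],
)
df = pd.DataFrame({"value": ["14.0%", "11.31%", "40.42%"]}, index=index)

# Map each distinct level-0 label to the same label with '%' appended.
old_labels = df.index.get_level_values(0).unique()
mapping = {label: label + "%" for label in old_labels}

# rename returns a new DataFrame; level=0 restricts the mapping to the first level.
renamed = df.rename(index=mapping, level=0)

print(renamed.index.get_level_values(0).tolist())
# ['Chrome 44.18%', 'Chrome 44.18%', 'Firefox 12.76%']
```

Because rename does not modify the frame in place by default, the result has to be assigned to a name (or back to `df`) to keep it.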
Alternatively, reset the index, modify the column, and set the index again:
# first reset the index so that the index levels become ordinary columns
df_reset = df.reset_index()
# then append '%' to the 'name' column (this column was index level 0)
df_reset['name'] = df_reset['name'] + '%'
# finally set the index again; set_index returns a new DataFrame, so keep the result
df = df_reset.set_index(['name', 'ga:browserVersion'])
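The reset-and-set round trip can be checked end to end with a short sketch. It assumes a small stand-in frame built from three rows of the question's data; the name `result` is only for illustration.

```python
import pandas as pd

# Small stand-in for the question's frame (three rows copied from it).
index = pd.MultiIndex.from_tuples(
    [
        ("Chrome 44.18", "43.0.2357.130"),
        ("Chrome 44.18", "43.0.2357.124"),
        ("Opera 0.56", "12.17"),
    ],
    names=["name", "ga:browserVersion"],
)
df = pd.DataFrame({"value": ["14.0%", "11.31%", "6.33%"]}, index=index)

# Move both index levels into ordinary columns.
df_reset = df.reset_index()

# Append '%' to what used to be index level 0.
df_reset["name"] = df_reset["name"] + "%"

# Rebuild the two-level index; set_index returns a new DataFrame.
result = df_reset.set_index(["name", "ga:browserVersion"])

print(result)
```

The data column comes through unchanged, and only the labels of the first level differ from the original frame.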