Add an index column and reindex a DataFrame by matching partial index labels
I have a multi-indexed Series s:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
# repr of index in an older pandas (`labels` is called `codes` since pandas 0.24):
# pd.MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
#               labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
#               names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)
s
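I want to add a new index level "zero" with the values x, y, z to s, matching on the existing index levels "first" and "second". In other words, I want to repeat s three times, with an additional index level holding x, y, z. I tried reindexing (see below), but why does it give me all NaN?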
mux=pd.MultiIndex.from_product([["x","y","z"],
s.index.get_level_values(0),
s.index.get_level_values(1)],
names=["zero","first", "second"])
t=s.reindex(mux)
t
I also tried specifying "first" and "second" as the levels to match on, but level seems to accept only a single integer?
IIUC, you want pd.concat?
s = pd.concat([s] * 3, axis=0, keys=['x', 'y', 'z'])
If needed, rename the axis:
s = s.rename_axis(['zero', 'first', 'second'])
s
zero  first  second
x     bar    one       0.510567
             two       0.066620
      baz    one       0.667948
             two      -1.471894
      foo    one       1.881198
             two       0.143628
      qux    one       1.108174
             two      -0.978112
y     bar    one       0.510567
             two       0.066620
      baz    one       0.667948
             two      -1.471894
      foo    one       1.881198
             two       0.143628
      qux    one       1.108174
             two      -0.978112
z     bar    one       0.510567
             two       0.066620
      baz    one       0.667948
             two      -1.471894
      foo    one       1.881198
             two       0.143628
      qux    one       1.108174
             two      -0.978112
dtype: float64
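To check that the pd.concat approach really repeats s under each key, here is a minimal, self-contained sketch. The seed, the variable names keys and out, and the use of from_product to rebuild the example index are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example Series from the question (seed chosen arbitrarily).
np.random.seed(0)
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
s = pd.Series(np.random.randn(8), index=index)

# Repeat s once per key; each key becomes the outermost index level.
keys = ['x', 'y', 'z']
out = pd.concat([s] * len(keys), keys=keys).rename_axis(['zero', 'first', 'second'])

# Each block under a key should hold the original values in the original order.
for key in keys:
    block = out.xs(key, level='zero')
    assert np.array_equal(block.to_numpy(), s.to_numpy())

print(out.shape)              # (24,)
print(list(out.index.names))  # ['zero', 'first', 'second']
```

Because keys are prepended as the outermost level, no reordering or sorting is needed afterwards.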
You can use reindex, but you need to create the MultiIndex from the existing levels. This appends the new level after the existing ones, so if necessary add reorder_levels and sort_index:
import numpy as np
import pandas as pd

np.random.seed(123)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

# Build the target index from the existing levels, with x/y/z appended as the last level
mux = pd.MultiIndex.from_product([s.index.levels[0], s.index.levels[1], ["x", "y", "z"]])
# Reindex with forward fill, then move the new level to the front and sort
t = s.reindex(mux, method='ffill').reorder_levels([2, 0, 1]).sort_index()
print(t)
x  bar  one   -1.085631
        two    0.997345
   baz  one    0.282978
        two   -1.506295
   foo  one   -0.578600
        two    1.651437
   qux  one   -2.426679
        two   -0.428913
y  bar  one   -1.085631
        two    0.997345
   baz  one    0.282978
        two   -1.506295
   foo  one   -0.578600
        two    1.651437
   qux  one   -2.426679
        two   -0.428913
z  bar  one   -1.085631
        two    0.997345
   baz  one    0.282978
        two   -1.506295
   foo  one   -0.578600
        two    1.651437
   qux  one   -2.426679
        two   -0.428913
dtype: float64
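Taking the product of the levels assumes every (first, second) combination is present in s. Note also that the mux in the question takes the product of get_level_values(0) and get_level_values(1), which are both 8 elements long, so it holds 3 * 8 * 8 = 192 labels rather than the intended 24. As a hedged alternative that is not part of the answers above, the sketch below builds the new index directly from the existing labels, so it should also work when the index is not a full product. The example index with a missing combination and the names keys, new_index and t are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
# An index that is NOT a full product: ('baz', 'two') is missing on purpose.
index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('foo', 'two')],
    names=['first', 'second'],
)
s = pd.Series(np.random.randn(len(index)), index=index)

keys = ['x', 'y', 'z']

# Prefix every existing (first, second) label with each key.
new_index = pd.MultiIndex.from_tuples(
    [(key, *label) for key in keys for label in s.index],
    names=['zero', *s.index.names],
)

# Tile the values so they line up with the repeated labels.
t = pd.Series(np.tile(s.to_numpy(), len(keys)), index=new_index)
print(t)
```

This avoids relying on how reindex aligns indexes with different numbers of levels, at the cost of constructing the labels by hand.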