How do I convert a MultiIndex to type string?
Consider the MultiIndex idx
idx = pd.MultiIndex.from_product([range(2013, 2016), range(1, 5)])
When I do
idx.to_series().str.join(' ')
I get
2013  1   NaN
      2   NaN
      3   NaN
      4   NaN
2014  1   NaN
      2   NaN
      3   NaN
      4   NaN
2015  1   NaN
      2   NaN
      3   NaN
      4   NaN
dtype: float64
This happens because the dtypes of the different levels are int and not str, and join expects str. How do I convert the whole idx to str?
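To illustrate the diagnosis above, here is a minimal sketch showing that .str.join yields NaN when the tuples hold ints and works once every item is a str. The variable names (as_ints, as_strs, and so on) are made up for this illustration and are not part of the original question.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([range(2013, 2016), range(1, 5)])

# Each value of this Series is a tuple of ints, e.g. (2013, 1).
as_ints = idx.to_series()

# str.join cannot join non-string items, so every row comes back as NaN.
joined_ints = as_ints.str.join(' ')

# Convert every item of every tuple to str first; then str.join works.
as_strs = as_ints.map(lambda tup: tuple(str(item) for item in tup))
joined_strs = as_strs.str.join(' ')

print(joined_ints.isna().all())
print(joined_strs.tolist()[:3])
```

This is only meant to confirm the cause; the approaches further down avoid the intermediate Series of string tuples.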
What I've done so far:
join = lambda x, delim=' ': delim.join([str(y) for y in x])
idx.to_series().apply(join, delim=' ')
2013  1    2013 1
      2    2013 2
      3    2013 3
      4    2013 4
2014  1    2014 1
      2    2014 2
      3    2014 3
      4    2014 4
2015  1    2015 1
      2    2015 2
      3    2015 3
      4    2015 4
dtype: object
I'm hoping there is a simpler way that I'm overlooking.
I'm not sure this is the most elegant way, but it should work:
idx.get_level_values(0).astype(str).values + ' ' + idx.get_level_values(1).astype(str).values
Something like this?
idx.to_series().apply(lambda x: '{0}-{1}'.format(*x))
A generic solution using starmap from itertools
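from itertools import starmap

def flat(midx, sep=''):
    # Build a format string with one '{}' placeholder per level, e.g. '{}_{}'
    fstr = sep.join(['{}'] * midx.nlevels)
    # Iterating midx yields one tuple per row; starmap unpacks it into fstr.format
    return pd.Index(starmap(fstr.format, midx))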
Demonstration:
midx = pd.MultiIndex.from_product([[1, 2], [3, 4]])
flat(midx)
Index([u'13', u'14', u'23', u'24'], dtype='object')
flat(midx, '_')
Index([u'1_3', u'1_4', u'2_3', u'2_4'], dtype='object')
The fastest is a list comprehension:
print (['{} {}'.format(i[1], i[0]) for i in idx])
print ([' '.join((str(i[0]), str(i[1]))) for i in idx])
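The two comprehensions above hard-code two levels. As a sketch only, the same idea can be written so it works for any number of levels; join_levels is a hypothetical helper name, not something from pandas or from the answer above.

```python
import pandas as pd

def join_levels(midx, sep=' '):
    """Hypothetical helper: join each MultiIndex tuple into one string."""
    # Iterating a MultiIndex yields one tuple per row, whatever the number
    # of levels, so map(str, tup) converts every level value before joining.
    return [sep.join(map(str, tup)) for tup in midx]

idx = pd.MultiIndex.from_product([range(2013, 2016), range(1, 5)])
labels = join_levels(idx)

# Wrap the list back into a flat Index if an Index is needed.
flat_idx = pd.Index(labels)

# The same helper on a three-level index, with a different separator.
idx3 = pd.MultiIndex.from_product([['a', 'b'], [1, 2], [True]])
labels3 = join_levels(idx3, sep='_')

print(labels[:3])
print(labels3)
```

Unlike the first comprehension above, this keeps the levels in their original order (year first).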
Timings:
In [21]: %timeit (['{} {}'.format(i[1], i[0]) for i in idx])
The slowest run took 4.68 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.51 µs per loop
In [22]: %timeit ([' '.join((str(i[0]), str(i[1]))) for i in idx])
The slowest run took 6.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.62 µs per loop
In [23]: %timeit (idx.get_level_values(0).astype(str).values + ' ' + idx.get_level_values(1).astype(str).values)
The slowest run took 5.91 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 215 µs per loop
In [24]: %timeit idx.to_series().apply(lambda x: '{0}-{1}'.format(*x))
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 369 µs per loop
In [25]: %timeit idx.to_series().str.join(' ')
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 394 µs per loop
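The timings above were taken on the 12-row idx, so they may not carry over to larger indexes. A rough sketch for re-running the comparison at another size with the standard timeit module follows; the 100 x 100 index, the candidates dict, and the best-of-3 scheme are assumptions chosen for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit
import pandas as pd

# Assumed test size: 10,000 rows instead of the 12 rows used above.
big = pd.MultiIndex.from_product([range(100), range(100)])

# Three of the approaches from this thread, each wrapped in a callable.
candidates = {
    'list comprehension': lambda: ['{} {}'.format(a, b) for a, b in big],
    'level concatenation': lambda: (
        big.get_level_values(0).astype(str).values
        + ' '
        + big.get_level_values(1).astype(str).values
    ),
    'to_series().apply': lambda: big.to_series().apply(
        lambda x: '{0} {1}'.format(*x)
    ),
}

results = {}
for name, func in candidates.items():
    # Best of 3 single runs, in seconds.
    results[name] = min(timeit.repeat(func, number=1, repeat=3))
    print('{}: {:.4f} s'.format(name, results[name]))
```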