pandas: loop over tables in an Excel sheet
I am trying to loop over a set of tables in a specific way, but I am stuck.
My tables are multiindex and look like this:
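import pandas as pd

# Read the Excel sheet: the first two rows become the column MultiIndex,
# the first two columns become the row MultiIndex.
df = pd.read_excel(data_file,
                   header=[0, 1],
                   index_col=[0, 1])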
                          T       Gender          Age
                          Total   Male    Female  16-24   25-34   35-44   45-54   55-75
Q1. Are you?  Yes         17.5    26.8    23.4    13.7    20.7    100     -       17.6
              No          17.5    26.8    23.4    13.7    20.7    100     11.5    22.6
              Don’t know  17.5    26.8    23.4    13.7    20.7    100     -       -
Q2. Are you?  Yes         18.5    26.8    23.4    13.7    20.7    100     -       17.6
              No          17.5    22.8    23.4    13.7    20.7    100     11.5    22.6
              Don’t know  17.5    26.8    23.4    13.7    20.7    100     -       -
I would like to loop over these indexes and columns and print:
                          T
                          Total
Q1. Are you?  Yes         17.5
              No          17.5
              Don’t know  17.5
                          Gender
                          Male    Female
Q1. Are you?  Yes         26.8    23.4
              No          26.8    23.4
              Don’t know  26.8    23.4
and so on...
So far my code groups by the outer index, which lets me loop downwards, but I can't work out how to go across horizontally.
for outerside_grp, innerside_grp in df.groupby(level=0):
    print(innerside_grp)
Update
The code below does roughly what I want (thanks to Joshua Baboo), but now I wonder whether this is the most efficient way to do it?
for row in df.index.levels[0]:
    for col in df.columns.levels[0]:
        print(df.loc[row:row, col])
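As you said:
'My tables are multiindex'
Assuming groupby(level=0) is not required, because the original DataFrame already has a 2-level MultiIndex on both the row and column axes, see whether the following example serves your purpose: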
import numpy as np
import pandas as pd

print('pandas-version:', pd.__version__)

# Labels for a 2-level row index and a 2-level column index
l1 = ['r0_1', 'r0_2']
l2 = sorted(['r1_1', 'r1_2', 'r1_3'])
c1 = ['c0_1', 'c0_2', 'c0_3']
c2 = ['c1_1', 'c1_2', 'c1_3']
nrows = len(l1) * len(l2)
ncols = len(c1) * len(c2)

df = pd.DataFrame(np.random.random(nrows * ncols).reshape(nrows, ncols),
                  index=pd.MultiIndex.from_product([l1, l2],
                                                   names=['one', 'two']),
                  columns=pd.MultiIndex.from_product([c1, c2]))
l_all = slice(None)  # select-all slice (not used in the loop below)

# updated: loop only over columns.levels[0]
# to get all rows for each column group
for col0 in df.columns.levels[0]:
    print(df.loc(axis=1)[col0, :])
Output
pandas-version: 0.15.2
               c0_1
               c1_1      c1_2      c1_3
one  two
r0_1 r1_1  0.177051  0.159676  0.677900
     r1_2  0.980404  0.441649  0.763252
     r1_3  0.631876  0.724937  0.158891
r0_2 r1_1  0.856933  0.432360  0.690534
     r1_2  0.568308  0.381117  0.430071
     r1_3  0.680781  0.795433  0.378414
               c0_2
               c1_1      c1_2      c1_3
one  two
r0_1 r1_1  0.275005  0.266315  0.326656
     r1_2  0.841370  0.197737  0.215751
     r1_3  0.511860  0.007003  0.509688
r0_2 r1_1  0.170542  0.577844  0.616402
     r1_2  0.440131  0.497631  0.628281
     r1_3  0.061970  0.192166  0.687346
               c0_3
               c1_1      c1_2      c1_3
one  two
r0_1 r1_1  0.308490  0.372552  0.275818
     r1_2  0.718901  0.784083  0.839253
     r1_3  0.357739  0.821503  0.336578
r0_2 r1_1  0.758157  0.248164  0.983741
     r1_2  0.498885  0.972781  0.922519
     r1_3  0.107162  0.364109  0.591648
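If you also want one block per question and per column group, as in the printout the question asks for, here is a rough sketch of one way to do it. It is not necessarily the most efficient approach. The frame `survey`, its placeholder values, and the helper `iter_blocks` are made up for illustration; only the labels are taken from the question. The sketch takes the group keys from `get_level_values(0).unique()`, which keeps the labels in the order they appear, whereas `levels[0]` is typically stored sorted and so may not follow the sheet's T, Gender, Age order.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the survey table in the question
# (placeholder values, only a subset of the Age columns).
columns = pd.MultiIndex.from_tuples(
    [("T", "Total"), ("Gender", "Male"), ("Gender", "Female"),
     ("Age", "16-24"), ("Age", "25-34")]
)
index = pd.MultiIndex.from_product(
    [["Q1. Are you?", "Q2. Are you?"], ["Yes", "No", "Don't know"]]
)
values = np.arange(len(index) * len(columns), dtype=float)
survey = pd.DataFrame(values.reshape(len(index), len(columns)),
                      index=index, columns=columns)


def iter_blocks(df):
    """Yield (row_key, col_key, block) for every outer row/column pair."""
    row_labels = df.index.get_level_values(0)
    col_labels = df.columns.get_level_values(0)
    # unique() keeps first-appearance order, so blocks follow the sheet layout
    for row_key in row_labels.unique():
        for col_key in col_labels.unique():
            # Boolean masks keep both index levels on each axis
            yield row_key, col_key, df.loc[row_labels == row_key,
                                           col_labels == col_key]


blocks = list(iter_blocks(survey))
for row_key, col_key, block in blocks:
    print(block, end="\n\n")
```

Boolean masks are used here instead of label slices so that the selection does not depend on the index being sorted.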