Rearranging dataframe dimensions
I have a df with a lot of columns:
date risk lev chemical weight date risk lev chemical weight
15-5-16 5 Potasium 5mg 15-5-16 3 Sodium 7 mg
14-5-16 6 Potasium 10mg 14-5-16 2 Sodium 2 mg
I would like to rearrange it so that every 4 columns are split off into a new row, so that the df looks like this:
date risk lev chemical weight
15-5-16 5 Potasium 5mg
15-5-16 3 Sodium 7mg
14-5-16 6 Potasium 10mg
14-5-16 2 Sodium 2mg
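Sorry for not including my attempt, but this is the first time I have tackled this kind of problem and I don't know how to proceed.
First, take care of the duplicates in the column names: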
In [248]: df
Out[248]:
date risk lev chemical weight date.1 risk lev.1 chemical.1 weight.1
0 15-5-16 5 Potasium 5mg 15-5-16 3 Sodium 7 mg
1 14-5-16 6 Potasium 10mg 14-5-16 2 Sodium 2 mg
Now we can use pd.lreshape:
In [249]: d = {
...: 'chemical': ['chemical','chemical.1'],
...: 'weight':['weight','weight.1'],
...: 'date':['date','date.1'],
...: 'risk lev': ['risk lev','risk lev.1']
...: }
In [250]: pd.lreshape(df, d)
Out[250]:
chemical weight date risk lev
0 Potasium 5mg 15-5-16 5
1 Potasium 10mg 14-5-16 6
2 Sodium 7 mg 15-5-16 3
3 Sodium 2 mg 14-5-16 2
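The answer above starts from a frame whose repeated headers already carry a ".1" suffix. One way to get there, assuming the data is read from CSV, is to let pd.read_csv rename the repeated headers on its own. The sketch below uses a made-up in-memory CSV (the raw string and the variable names are illustrative assumptions, not part of the original post) and then applies the same pd.lreshape call:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the asker's data.
raw = """date,risk lev,chemical,weight,date,risk lev,chemical,weight
15-5-16,5,Potasium,5mg,15-5-16,3,Sodium,7mg
14-5-16,6,Potasium,10mg,14-5-16,2,Sodium,2mg
"""

# read_csv renames repeated headers to 'date.1', 'risk lev.1', and so on.
df = pd.read_csv(io.StringIO(raw))

# Map each target column to the wide columns that should be stacked into it.
groups = {
    'date': ['date', 'date.1'],
    'risk lev': ['risk lev', 'risk lev.1'],
    'chemical': ['chemical', 'chemical.1'],
    'weight': ['weight', 'weight.1'],
}

long_df = pd.lreshape(df, groups)
print(long_df)
```

The rows come out block by block (both Potasium rows first, then both Sodium rows), as in the output above.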
First de-duplicate the column names, then use pd.lreshape with a dict created by a dict comprehension:
s = df.columns.to_series()
df.columns = s.add(s.groupby(s).cumcount().astype(str))
print (df)
date0 risk lev0 chemical0 weight0 date1 risk lev1 chemical1 weight1
0 15-5-16 5 Potasium 5mg 15-5-16 3 Sodium 7mg
1 14-5-16 6 Potasium 10mg 14-5-16 2 Sodium 2mg
cols = ['date','risk lev','chemical','weight']
d = {x:df.columns[df.columns.str.startswith(x)].tolist() for x in cols}
print (d)
{'date': ['date0', 'date1'],
'weight': ['weight0', 'weight1'],
'risk lev': ['risk lev0', 'risk lev1'],
'chemical': ['chemical0', 'chemical1']}
df = pd.lreshape(df, d)
print (df)
date weight risk lev chemical
0 15-5-16 5mg 5 Potasium
1 14-5-16 10mg 6 Potasium
2 15-5-16 7mg 3 Sodium
3 14-5-16 2mg 2 Sodium
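The same steps can be wrapped into a small reusable function. This is only a sketch: stack_repeated_columns is a hypothetical helper name, and it assumes every repeated column name occurs the same number of times. It numbers the duplicates with a plain counter and records the groups while renaming, then hands them to pd.lreshape:

```python
from collections import defaultdict

import pandas as pd


def stack_repeated_columns(df):
    """Stack repeated column blocks into rows (hypothetical helper)."""
    seen = defaultdict(int)     # how often each original name has appeared
    groups = defaultdict(list)  # original name -> list of unique names
    new_names = []
    for name in df.columns:
        unique = f"{name}{seen[name]}"  # e.g. 'date0', 'date1'
        seen[name] += 1
        groups[name].append(unique)
        new_names.append(unique)

    renamed = df.copy()
    renamed.columns = new_names
    return pd.lreshape(renamed, dict(groups))


# Sample frame with duplicated column names, as in the question.
wide = pd.DataFrame(
    [['15-5-16', 5, 'Potasium', '5mg', '15-5-16', 3, 'Sodium', '7mg'],
     ['14-5-16', 6, 'Potasium', '10mg', '14-5-16', 2, 'Sodium', '2mg']],
    columns=['date', 'risk lev', 'chemical', 'weight'] * 2,
)

out = stack_repeated_columns(wide)
print(out)
```

Because the groups are recorded during renaming, no prefix matching with str.startswith is needed, which could otherwise mis-group names such as 'date' and 'date_added'.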
pd.wide_to_long solves this problem more directly. You first have to put a number at the end of each column name.
df.columns = [col + str(i//4 + 1) for i, col in enumerate(df.columns)]
pd.wide_to_long(df.reset_index(),
stubnames=['date', 'risk lev', 'chemical', 'weight'],
i='index',
j='dropme').reset_index(drop=True)
date risk lev chemical weight
0 15-5-16 5 Potasium 5mg
1 14-5-16 6 Potasium 10mg
2 15-5-16 3 Sodium 7mg
3 14-5-16 2 Sodium 2mg
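All of the outputs above list the rows block by block, while the question asks for the rows that came from the same original row to stay next to each other. A minimal sketch of one way to get that order, assuming the sample data from the question: keep the (index, block) MultiIndex that pd.wide_to_long returns and sort on it before dropping it. The names wide, long_df, ordered and the j label 'block' are illustrative choices.

```python
import pandas as pd

# Sample frame with duplicated column names, as in the question.
wide = pd.DataFrame(
    [['15-5-16', 5, 'Potasium', '5mg', '15-5-16', 3, 'Sodium', '7mg'],
     ['14-5-16', 6, 'Potasium', '10mg', '14-5-16', 2, 'Sodium', '2mg']],
    columns=['date', 'risk lev', 'chemical', 'weight'] * 2,
)

# Number each block of four columns: 'date1', ..., 'weight1', 'date2', ...
wide.columns = [f"{col}{i // 4 + 1}" for i, col in enumerate(wide.columns)]

long_df = pd.wide_to_long(
    wide.reset_index(),          # adds an 'index' column used as the row id
    stubnames=['date', 'risk lev', 'chemical', 'weight'],
    i='index',
    j='block',
)

# Sort by (original row, block) so rows from the same source row are adjacent.
ordered = long_df.sort_index().reset_index(drop=True)
print(ordered)
```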