Rearranging dataframe dimensions

I have a df with many columns:

   date    risk lev  chemical  weight   date    risk lev  chemical  weight
 15-5-16      5      Potasium   5mg    15-5-16     3      Sodium     7 mg
 14-5-16      6      Potasium   10mg   14-5-16     2      Sodium     2 mg

I would like to rearrange it so that every group of 4 columns is moved into new rows, so that the df looks like this:

   date    risk lev  chemical  weight
 15-5-16      5      Potasium   5mg
 15-5-16      3      Sodium     7mg
 14-5-16      6      Potasium   10mg
 14-5-16      2      Sodium     2mg

Sorry for not including my own attempt, but this is the first time I have run into this problem and I don't know how to proceed.

First, get rid of the duplicates in the column names:

In [248]: df
Out[248]:
      date  risk lev  chemical weight   date.1  risk lev.1 chemical.1 weight.1
0  15-5-16         5  Potasium    5mg  15-5-16           3     Sodium     7 mg
1  14-5-16         6  Potasium   10mg  14-5-16           2     Sodium     2 mg
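If the data comes from a CSV file, the `.1` suffixes shown above are what `pd.read_csv` produces by default for repeated header names. A minimal sketch, assuming the data is available as CSV text (the `raw` string below is made up from the sample in the question):

```python
import io

import pandas as pd

# Hypothetical CSV text built from the sample rows in the question.
raw = """date,risk lev,chemical,weight,date,risk lev,chemical,weight
15-5-16,5,Potasium,5mg,15-5-16,3,Sodium,7 mg
14-5-16,6,Potasium,10mg,14-5-16,2,Sodium,2 mg
"""

# read_csv renames repeated header names by appending .1, .2, ...
df = pd.read_csv(io.StringIO(raw))
print(df.columns.tolist())
```

If the frame was built some other way and still has literal duplicate names, they need to be renamed by hand before reshaping.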

Now we can use pd.lreshape:

In [249]: d = {
     ...:     'chemical': ['chemical','chemical.1'],
     ...:     'weight':['weight','weight.1'],
     ...:     'date':['date','date.1'],
     ...:     'risk lev': ['risk lev','risk lev.1']
     ...: }

In [250]: pd.lreshape(df, d)
Out[250]:
   chemical weight     date  risk lev
0  Potasium    5mg  15-5-16         5
1  Potasium   10mg  14-5-16         6
2    Sodium   7 mg  15-5-16         3
3    Sodium   2 mg  14-5-16         2
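For anyone who wants to run this outside an IPython session, here is the same idea as a standalone sketch. The frame is rebuilt by hand from the sample data, and the names `groups` and `long_df` are my own:

```python
import pandas as pd

# Sample frame with the de-duplicated column names shown above.
df = pd.DataFrame({
    'date': ['15-5-16', '14-5-16'],
    'risk lev': [5, 6],
    'chemical': ['Potasium', 'Potasium'],
    'weight': ['5mg', '10mg'],
    'date.1': ['15-5-16', '14-5-16'],
    'risk lev.1': [3, 2],
    'chemical.1': ['Sodium', 'Sodium'],
    'weight.1': ['7 mg', '2 mg'],
})

# Each key is a target column; each list names the wide columns stacked under it.
groups = {
    'date': ['date', 'date.1'],
    'risk lev': ['risk lev', 'risk lev.1'],
    'chemical': ['chemical', 'chemical.1'],
    'weight': ['weight', 'weight.1'],
}

long_df = pd.lreshape(df, groups)
print(long_df)
```

Every list in the dict has to have the same length, since the columns are stacked position by position.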

First de-duplicate the column names, then use pd.lreshape with a dict built by a dict comprehension:

import pandas as pd

# Append a running count to each repeated name: date0, ..., date1, ...
s = df.columns.to_series()
df.columns = s.add(s.groupby(s).cumcount().astype(str))
print(df)
     date0  risk lev0 chemical0 weight0    date1  risk lev1 chemical1 weight1
0  15-5-16          5  Potasium     5mg  15-5-16          3    Sodium     7mg
1  14-5-16          6  Potasium    10mg  14-5-16          2    Sodium     2mg


# Map each base name to the list of numbered columns that start with it
cols = ['date','risk lev','chemical','weight']
d = {x:df.columns[df.columns.str.startswith(x)].tolist() for x in cols}
print(d)
{'date': ['date0', 'date1'], 
 'weight': ['weight0', 'weight1'], 
 'risk lev': ['risk lev0', 'risk lev1'], 
 'chemical': ['chemical0', 'chemical1']}

df = pd.lreshape(df, d)
print(df)
      date weight  risk lev  chemical
0  15-5-16    5mg         5  Potasium
1  14-5-16   10mg         6  Potasium
2  15-5-16    7mg         3    Sodium
3  14-5-16    2mg         2    Sodium
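One caveat with `str.startswith`: if one base name happens to be a prefix of another (say a hypothetical `weight` next to `weight unit`), the lists would pick up the wrong columns. A sketch that matches the base name plus a numeric suffix exactly; the sample frame is rebuilt so it runs on its own, and the regex and variable names are my own choices:

```python
import re

import pandas as pd

names = ['date', 'risk lev', 'chemical', 'weight']
rows = [
    ['15-5-16', 5, 'Potasium', '5mg', '15-5-16', 3, 'Sodium', '7mg'],
    ['14-5-16', 6, 'Potasium', '10mg', '14-5-16', 2, 'Sodium', '2mg'],
]
df = pd.DataFrame(rows, columns=names * 2)  # duplicate names, as in the question

# Number repeated names (date0, ..., date1, ...). A plain Series with a
# default index is used here instead of df.columns.to_series().
labels = pd.Series(df.columns)
df.columns = labels + labels.groupby(labels).cumcount().astype(str)

# Keep only columns that are exactly "<name><digits>".
groups = {
    name: [c for c in df.columns if re.fullmatch(re.escape(name) + r"\d+", c)]
    for name in names
}

long_df = pd.lreshape(df, groups)
print(long_df)
```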

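pd.wide_to_long handles this problem more directly. You first have to put a number at the end of each column name (the line below assumes the original frame, where each block of 4 columns repeats the same names).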

# Columns 0-3 get the suffix 1, columns 4-7 get the suffix 2, and so on
df.columns = [col + str(i//4 + 1) for i, col in enumerate(df.columns)]

pd.wide_to_long(df.reset_index(), 
                stubnames=['date', 'risk lev', 'chemical', 'weight'], 
                i='index', 
                j='dropme').reset_index(drop=True)

      date  risk lev  chemical weight
0  15-5-16         5  Potasium    5mg
1  14-5-16         6  Potasium   10mg
2  15-5-16         3    Sodium    7mg
3  14-5-16         2    Sodium    2mg
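A different route, not taken from the answers above: if every block of 4 columns has the same names in the same order, the frame can be sliced by position and the slices stacked with `pd.concat`, with no renaming at all. A rough sketch under that assumption (the names `width`, `blocks` and `stacked` are mine):

```python
import pandas as pd

names = ['date', 'risk lev', 'chemical', 'weight']
rows = [
    ['15-5-16', 5, 'Potasium', '5mg', '15-5-16', 3, 'Sodium', '7mg'],
    ['14-5-16', 6, 'Potasium', '10mg', '14-5-16', 2, 'Sodium', '2mg'],
]
df = pd.DataFrame(rows, columns=names * 2)  # duplicate names, as in the question

# Slice the frame into consecutive blocks of 4 columns by position.
width = len(names)
blocks = [df.iloc[:, start:start + width] for start in range(0, df.shape[1], width)]

# Stack the blocks on top of each other and renumber the rows.
stacked = pd.concat(blocks, ignore_index=True)
print(stacked)
```

This relies purely on column position, so it breaks if a block is missing a column or the order differs between blocks; the name-based approaches above are safer in that case.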