How to pivot a DataFrame based on a substring of the column names

I have a DataFrame:

df =
time  id  ser1  ser2  ...  ser20  N0ch0  N1ch0  N2ch0  N0ch1  N1ch1  N2ch1  N0ch2  N1ch2  N2ch2  N0ch3  N1ch3  N2ch3
   1   2     4     5           3      8      7      8      5      1      4      6      2      7      9      8      6

I want to pivot it on the channel (the 'ch' substring) so that the channel becomes a single column. The new DataFrame would be:

time id channel ser1 ser2 ... ser20 N0 N1 N2
  1   2   0      4    5         3   8  7  8
  1   2   1      4    5         3   5  1  4
  1   2   2      4    5         3   6  2  7
  1   2   3      4    5         3   9  8  6

What is the best way to do this?

We can use set_index to save any columns that should be left unmodified. Then str.split the remaining columns on 'ch', which appears to be the delimiter between the new column name and the channel number. Then stack and reset_index to go from MultiIndex columns to long form. Finally, use astype to convert the new channel column from string to int (if needed).

# columns to save
idx_cols = ['time', 'id', 'ser1', 'ser2']
res = df.set_index(idx_cols)
# Separate N value from channel number
res.columns = res.columns.str.split('ch', expand=True).rename([None, 'channel'])
# Go to long form
res = res.stack().reset_index()
# Convert to number from string
res['channel'] = res['channel'].astype(int)

res:

   time  id  ser1  ser2  channel  N0  N1  N2
0     1   2     4     5        0   8   7   8
1     1   2     4     5        1   5   1   4
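The str.split step is the least obvious part of this approach. Here is a minimal sketch of the intermediate result it produces, using the N/channel column names from the sample setup (the variable names `n_channel_cols` and `mi` are just for illustration):

```python
import pandas as pd

# N/channel column labels from the sample setup: N value + 'ch' + channel number
n_channel_cols = pd.Index(['N0ch0', 'N1ch0', 'N2ch0', 'N0ch1', 'N1ch1', 'N2ch1'])

# On an Index, expand=True turns the split parts into a two-level MultiIndex;
# rename names the levels: first level unnamed, second level 'channel'
mi = n_channel_cols.str.split('ch', expand=True).rename([None, 'channel'])

print(mi.tolist())
# [('N0', '0'), ('N1', '0'), ('N2', '0'), ('N0', '1'), ('N1', '1'), ('N2', '1')]
print(list(mi.names))
# [None, 'channel']
```

Stacking the 'channel' level then moves the second element of each tuple into the row index, leaving N0/N1/N2 as the columns. Note the channel values are still strings at this point, which is why the astype(int) step follows.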

Alternatively, wide_to_long can be used, which abstracts away some of the reshaping but requires a follow-up str.extract to get the channel number, as well as manually specifying all of the "stubnames":

# columns to save
idx_cols = ['time', 'id', 'ser1', 'ser2']
res = (
    pd.wide_to_long(
        df,
        i=idx_cols,
        j='channel',
        stubnames=['N0', 'N1', 'N2'],  # all stub names (add more if needed)
        suffix=r'ch\d+'  # suffix: 'ch' followed by the channel number
    ).reset_index()
)
# Get only the channel numbers and convert to int
res['channel'] = res['channel'].str.extract(r'(\d+$)').astype(int)

res

   time  id  ser1  ser2  channel  N0  N1  N2
0     1   2     4     5        0   8   7   8
1     1   2     4     5        1   5   1   4
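A possible variant (a sketch, not part of the original answer): wide_to_long also has a `sep` parameter, so passing `sep='ch'` with the numeric suffix `r'\d+'` strips the 'ch' as part of the reshape. wide_to_long then tries to convert the remaining suffix to a number, which should make the str.extract step unnecessary. This assumes the same sample setup:

```python
import pandas as pd

# Same sample setup as in this answer
df = pd.DataFrame({
    'time': [1], 'id': [2], 'ser1': [4], 'ser2': [5],
    'N0ch0': [8], 'N1ch0': [7], 'N2ch0': [8],
    'N0ch1': [5], 'N1ch1': [1], 'N2ch1': [4]
})

idx_cols = ['time', 'id', 'ser1', 'ser2']
res = (
    pd.wide_to_long(
        df,
        stubnames=['N0', 'N1', 'N2'],
        i=idx_cols,
        j='channel',
        sep='ch',       # 'ch' sits between the stub name and the channel number
        suffix=r'\d+',  # the channel number itself; converted to int automatically
    )
    .reset_index()
    .sort_values('channel', ignore_index=True)
)
print(res)
```

The trade-off is that `sep` must be the exact literal text between the stub and the number, whereas the original `suffix=r'ch\d+'` plus str.extract tolerates a more varied suffix.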

Note that for either option, idx_cols can be created dynamically instead of by hand.

By slicing the first n columns (4 in this example):

idx_cols = df.columns[:4]

Or by filtering the DataFrame columns on a condition (such as str.startswith):

idx_cols = ['time', 'id', *df.columns[df.columns.str.startswith('ser')]]
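Another option along the same lines, sketched under the assumption that no column you want to keep has a name ending in 'ch' followed by digits: keep every column whose name does not match that pattern.

```python
import pandas as pd

# Sample setup from this answer
df = pd.DataFrame({
    'time': [1], 'id': [2], 'ser1': [4], 'ser2': [5],
    'N0ch0': [8], 'N1ch0': [7], 'N2ch0': [8],
    'N0ch1': [5], 'N1ch1': [1], 'N2ch1': [4]
})

# Assumption: only the N/channel columns end in 'ch<digits>';
# every other column is treated as an id column to keep
idx_cols = df.columns[~df.columns.str.contains(r'ch\d+$')].tolist()
print(idx_cols)  # ['time', 'id', 'ser1', 'ser2']
```

This way ser3 ... ser20 (or any other non-channel column) are picked up without listing them.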

Sample setup:

import pandas as pd

df = pd.DataFrame({
    'time': [1], 'id': [2], 'ser1': [4], 'ser2': [5],
    'N0ch0': [8], 'N1ch0': [7], 'N2ch0': [8],
    'N0ch1': [5], 'N1ch1': [1], 'N2ch1': [4]
})

You can start with melt, setting the id_vars parameter to your 'ser' columns plus 'time' and 'id'.

Then split the 'variable' column into 2 columns: one (channel) is used as an index column in pivot_table, and the other (N) becomes the columns:

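# Columns to be used as index in melt & pivot
id_cols = ['time', 'id'] + list(df.filter(like='ser'))

# Melt, then split 'variable' (e.g. 'N0ch1') into the N name and the channel number
# (n must be passed by keyword in pandas 2.x)
m = df.melt(id_vars=id_cols)
m[['N', 'channel']] = m['variable'].str.split('ch', n=1, expand=True)

# Pivot the melted DataFrame, placing 'channel' right after 'time' and 'id',
# and drop the 'N' name from the column axis
out = (
    m.pivot_table(index=['time', 'id', 'channel'] + id_cols[2:], columns='N', values='value')
     .reset_index()
     .rename_axis(columns=None)
)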

Prints:

   time  id channel  ser1  ser2  ser20  N0  N1  N2
0     1   2       0     4     5      3   8   7   8
1     1   2       1     4     5      3   5   1   4
2     1   2       2     4     5      3   6   2   7
3     1   2       3     4     5      3   9   8   6
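A possible variant (a sketch, not from the original answer): since every (id columns, channel, N) combination occurs exactly once, DataFrame.pivot can replace pivot_table. pivot_table aggregates with mean by default, which would silently average any duplicates, while pivot raises an error on them and keeps the integer dtype as-is. The frame below uses only the question's visible columns; ser3 ... ser19 are elided in the question and omitted here.

```python
import pandas as pd

# Question's visible columns only (ser3 ... ser19 omitted)
df = pd.DataFrame({
    'time': [1], 'id': [2], 'ser1': [4], 'ser2': [5], 'ser20': [3],
    'N0ch0': [8], 'N1ch0': [7], 'N2ch0': [8],
    'N0ch1': [5], 'N1ch1': [1], 'N2ch1': [4],
    'N0ch2': [6], 'N1ch2': [2], 'N2ch2': [7],
    'N0ch3': [9], 'N1ch3': [8], 'N2ch3': [6],
})

id_cols = ['time', 'id'] + list(df.filter(like='ser'))

# Long form: one row per original N/channel column
m = df.melt(id_vars=id_cols)
m[['N', 'channel']] = m['variable'].str.split('ch', n=1, expand=True)
m['channel'] = m['channel'].astype(int)

# pivot does not aggregate: duplicated (index, column) pairs raise instead
out = (
    m.pivot(index=['time', 'id', 'channel'] + id_cols[2:], columns='N', values='value')
     .reset_index()
     .rename_axis(columns=None)
)
print(out)
```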