Diagonally copying values across rows and columns in pandas?
Here is a sample data frame:
                     datetime  temp  T1  T2  T3  T4  T5
115 2020-01-04 02:53:00+00:00    58   0   0   0   0   0
116 2020-01-04 03:53:00+00:00    51   0   0   0   0   0
117 2020-01-04 04:53:00+00:00    49   0   0   0   0   0
118 2020-01-04 05:53:00+00:00    48   0   0   0   0   0
119 2020-01-04 06:00:00+00:00    48   0   0   0   0   0
120 2020-01-04 06:53:00+00:00    47   0   0   0   0   0
Here is the output I want:
                     datetime  temp  T1  T2  T3  T4  T5
115 2020-01-04 02:53:00+00:00    58   0   0   0   0   0
116 2020-01-04 03:53:00+00:00    51  58   0   0   0   0
117 2020-01-04 04:53:00+00:00    49  51  58   0   0   0
118 2020-01-04 05:53:00+00:00    48  49  51  58   0   0
119 2020-01-04 06:00:00+00:00    48  48  49  51  58   0
120 2020-01-04 06:53:00+00:00    47  48  48  49  51  58
for col in df.columns[df.columns.str.contains('T')]:
    # Tk = temp shifted down by k rows; the first k rows have no earlier value, so they get 0
    df[col] = df['temp'].shift(int(col[1:]), fill_value=0)
print(df)
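A self-contained version of the loop above, rebuilding the sample frame from the question so it can be run directly. It uses str.startswith('T') instead of str.contains('T'), assuming the lag columns are exactly the ones whose names begin with a capital T.

```python
import pandas as pd

# Sample frame from the question: index 115-120, T1..T5 initially zero
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime([
            "2020-01-04 02:53:00+00:00", "2020-01-04 03:53:00+00:00",
            "2020-01-04 04:53:00+00:00", "2020-01-04 05:53:00+00:00",
            "2020-01-04 06:00:00+00:00", "2020-01-04 06:53:00+00:00",
        ]),
        "temp": [58, 51, 49, 48, 48, 47],
        **{f"T{k}": 0 for k in range(1, 6)},
    },
    index=range(115, 121),
)

# Tk is temp shifted down by k rows; rows with no earlier reading get 0
for col in df.columns[df.columns.str.startswith("T")]:
    df[col] = df["temp"].shift(int(col[1:]), fill_value=0)

print(df)
```

Because fill_value=0 is used, the T columns keep the integer dtype of temp instead of becoming float.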
We can also use pd.Index.difference to select every column except datetime and temp:
for col in df.columns.difference(['datetime', 'temp']):
    df[col] = df['temp'].shift(int(col[1:]), fill_value=0)
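Both loops assign one column at a time. Index.difference returns the labels sorted, which is harmless here because each column's shift comes from its own name. If the T columns do not exist yet, a possible variant creates them all in a single assign call from a dict comprehension. The sketch below assumes five lags (n_lags = 5, matching T1..T5) and uses a minimal frame with only the temp column.

```python
import pandas as pd

# Minimal frame with just the temp column from the question
df = pd.DataFrame({"temp": [58, 51, 49, 48, 48, 47]}, index=range(115, 121))
n_lags = 5  # assumption: lag columns T1..T5

# Build every lag column in one assign call instead of a loop
df = df.assign(**{f"T{k}": df["temp"].shift(k, fill_value=0)
                  for k in range(1, n_lags + 1)})
print(df)
```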
Output:
                     datetime  temp  T1  T2  T3  T4  T5
115 2020-01-04 02:53:00+00:00    58   0   0   0   0   0
116 2020-01-04 03:53:00+00:00    51  58   0   0   0   0
117 2020-01-04 04:53:00+00:00    49  51  58   0   0   0
118 2020-01-04 05:53:00+00:00    48  49  51  58   0   0
119 2020-01-04 06:00:00+00:00    48  48  49  51  58   0
120 2020-01-04 06:53:00+00:00    47  48  48  49  51  58
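For a NumPy solution, I didn't find a built-in function that shifts each column by a different amount taken from a list, but we can reuse the excellent answer by @Divakar (its strided_indexing_roll function, which rolls each row of a 2-D array by its own amount) and build the array it needs from our data frame: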
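import numpy as np

cols = df.columns[2:]                      # T1 ... T5
# one copy of temp per target column; convert to a NumPy array first,
# because indexing a Series with [:, None] no longer works in recent pandas
mat = np.ones((len(df), len(cols))) * df['temp'].to_numpy()[:, None]
r = np.arange(1, len(cols) + 1)            # column Tk is shifted by k rows
df[cols] = strided_indexing_roll(mat.T, r).T
print(df)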
                   datetime  temp    T1    T2    T3    T4    T5
0 2020-01-04 02:53:00+00:00    58   0.0   0.0   0.0   0.0   0.0
1 2020-01-04 03:53:00+00:00    51  58.0   0.0   0.0   0.0   0.0
2 2020-01-04 04:53:00+00:00    49  51.0  58.0   0.0   0.0   0.0
3 2020-01-04 05:53:00+00:00    48  49.0  51.0  58.0   0.0   0.0
4 2020-01-04 06:00:00+00:00    48  48.0  49.0  51.0  58.0   0.0
5 2020-01-04 06:53:00+00:00    47  48.0  48.0  49.0  51.0  58.0
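The T columns above come out as floats because mat is built from np.ones. As an alternative that needs no helper function and keeps the integer dtype, the lag matrix can be built with plain NumPy broadcasting: for each cell, compute the source row (row position minus lag) and use 0 wherever that source row would fall before the first row. This is a different technique from the strided roll; n_lags = 5 is an assumption matching T1..T5, and the frame below contains only the temp column.

```python
import numpy as np
import pandas as pd

# Minimal frame with just the temp column from the question
df = pd.DataFrame({"temp": [58, 51, 49, 48, 48, 47]}, index=range(115, 121))
n_lags = 5  # assumption: lag columns T1..T5

temp = df["temp"].to_numpy()
rows = np.arange(len(temp))[:, None]       # column vector of row positions
lags = np.arange(1, n_lags + 1)[None, :]   # row vector: lag k for column Tk
src = rows - lags                          # row each cell should copy from
# copy temp[src] where src is a valid row, otherwise 0 (result stays integer)
lagged = np.where(src >= 0, temp[np.clip(src, 0, None)], 0)

cols = [f"T{k}" for k in range(1, n_lags + 1)]
df = df.join(pd.DataFrame(lagged, index=df.index, columns=cols))
print(df)
```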
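Note: in @Divakar's strided_indexing_roll function, change the line p = np.full((a.shape[0],a.shape[1]-1),np.nan) to p = np.full((a.shape[0],a.shape[1]-1),0), so the positions vacated by the shift are filled with 0 instead of NaN, as in the desired output.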
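For reference, a stand-in with the same calling convention (roll row i of a 2-D array right by r[i]) is sketched below, with the fill value as a parameter so no line needs editing. It is not @Divakar's code: the name shift_rows is hypothetical, it relies on numpy.lib.stride_tricks.sliding_window_view (available from NumPy 1.20), and it assumes every shift r[i] is between 0 and the row length minus 1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def shift_rows(a, r, fill=0):
    """Hypothetical helper: shift row i of 2-D array `a` right by r[i], padding with `fill`.

    Assumes 0 <= r[i] < a.shape[1] and that `fill` fits a's dtype.
    """
    n = a.shape[1]
    # pad n-1 fill values on the left so every shift has a full-width window
    pad = np.full((a.shape[0], n - 1), fill, dtype=a.dtype)
    ext = np.concatenate((pad, a), axis=1)          # shape (m, 2n-1)
    windows = sliding_window_view(ext, n, axis=1)   # shape (m, n, n)
    # the window starting at column (n-1) - r[i] is row i shifted right by r[i]
    return windows[np.arange(a.shape[0]), (n - 1) - np.asarray(r)]


# Usage on the temp values from the question: row k-1 of mat becomes column Tk
temp = np.array([58, 51, 49, 48, 48, 47])
mat = np.tile(temp, (5, 1))
lagged = shift_rows(mat, np.arange(1, 6)).T
print(lagged)
```

In the answer's code, shift_rows(mat.T, r) could take the place of strided_indexing_roll(mat.T, r); since mat is float there, the padding is 0.0.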