Pandas/xarray - Shift values horizontally and dynamically based on another dataframe
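I want to shift each row of the dataframe test_1 horizontally, according to the value in the corresponding row of another dataframe, df_x. The value in each row of df_x defines the number of steps to shift to the left.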
test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])
df_x = pd.DataFrame([[1],[3],[2]])
My expected output is:
Out[157]:
0 1 2 3
0 2 3 4 NA
1 14 NA NA NA
2 23 24 NA NA
I tried to adapt the answer to a similar question and used this:
test_1.apply(lambda x: x.shift(periods = -df_x.loc[x,:]), axis = 1)
However, I keep getting the following error:
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([3, 4], dtype='int64'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
PS: If anyone also knows how to do this without converting from xarray to pandas, that would be even better.
Starting from your DataFrame:
>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])
>>> test_1
0 1 2 3
0 1 2 3 4
1 10 12 13 14
2 20 22 23 24
We can transpose the DataFrame like this:
>>> test_1 = test_1.T
>>> test_1
0 1 2
0 1 10 20
1 2 12 22
2 3 13 23
3 4 14 24
Then, using df_x, we can loop over its values with a for loop, select the matching column, shift it, and transpose test_1 back again to get the expected result:
>>> df_x = pd.DataFrame([[0],[1],[2]])
>>> line=0
>>> for val in list(df_x.values.flatten()):
... test_1[line] = test_1[line].shift(periods=-val)
... line+=1
>>> test_1 = test_1.T
>>> test_1
0 1 2 3
0 1.0 2.0 3.0 4.0
1 12.0 13.0 14.0 NaN
2 23.0 24.0 NaN NaN
For df_x = pd.DataFrame([[0],[2],[2]]):
>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])
>>> test_1 = test_1.T
>>> df_x = pd.DataFrame([[0],[2],[2]])
>>> line=0
>>> for val in list(df_x.values.flatten()):
... test_1[line] = test_1[line].shift(periods=-val)
... line+=1
>>> test_1 = test_1.T
>>> test_1
0 1 2 3
0 1.0 2.0 3.0 4.0
1 13.0 14.0 NaN NaN
2 23.0 24.0 NaN NaN
For df_x = pd.DataFrame([[2],[2],[0]]):
>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])
>>> test_1 = test_1.T
>>> df_x = pd.DataFrame([[2],[2],[0]])
>>> line=0
>>> for val in list(df_x.values.flatten()):
... test_1[line] = test_1[line].shift(periods=-val)
... line+=1
>>> test_1 = test_1.T
>>> test_1
0 1 2 3
0 3.0 4.0 NaN NaN
1 13.0 14.0 NaN NaN
2 20.0 22.0 23.0 24.0
For df_x = pd.DataFrame([[1],[3],[2]]):
>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])
>>> test_1 = test_1.T
>>> df_x = pd.DataFrame([[1],[3],[2]])
>>> line=0
>>> for val in list(df_x.values.flatten()):
... test_1[line] = test_1[line].shift(periods=-val)
... line+=1
>>> test_1 = test_1.T
>>> test_1
0 1 2 3
0 2.0 3.0 4.0 NaN
1 14.0 NaN NaN NaN
2 23.0 24.0 NaN NaN
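As a side note on the error in the question: inside `apply(..., axis=1)` the lambda receives the whole row as a Series, so `df_x.loc[x, :]` tries to look up the row's values (1, 2, 3, 4) as labels in df_x, whose index only contains 0, 1 and 2. That is why labels 3 and 4 are reported as missing. A minimal sketch of a fix, assuming df_x shares test_1's index and keeps its steps in column 0, is to look the shift up by the row's label (`row.name`) instead:

```python
import pandas as pd

test_1 = pd.DataFrame([[1, 2, 3, 4], [10, 12, 13, 14], [20, 22, 23, 24]])
df_x = pd.DataFrame([[1], [3], [2]])

# row.name is the index label of the current row, so it selects the
# matching shift in df_x; int() turns the NumPy integer into a plain int.
result = test_1.apply(
    lambda row: row.shift(-int(df_x.loc[row.name, 0])),
    axis=1,
)
print(result)
```

This avoids the transpose and the explicit loop, at the cost of still calling Python code once per row.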
I see @tlentali has already answered your question. As an alternative, you can also use this:
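# Number of steps to shift each row, as a flat array.
df_x_seq = np.squeeze(df_x.values)
# Number of values that remain in each row after shifting.
Lens = np.squeeze((test_1.shape[1] - df_x).values)
Id = -1  # running row counter, advanced on every call to shift()
def shift(row):
    global Id
    Id += 1
    array = np.empty(row.shape[0])
    array[:] = np.nan
    if Lens[Id] != row.shape[0]:
        # Copy the surviving tail of the row to the front; the rest stays NaN.
        array[:Lens[Id]] = row.values[df_x_seq[Id]:]
        return array
    else:
        return row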
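The function above depends on a global counter, and the answer does not show how it is called (presumably through `test_1.apply(..., axis=1)`). As a rough sketch of the same slicing idea without global state, the helper below loops over the rows itself. The name `shift_rows_left` is hypothetical, and it assumes the steps are non-negative integers, one per row:

```python
import numpy as np
import pandas as pd

def shift_rows_left(frame, steps):
    """Shift each row of `frame` left by the matching entry of `steps`, padding with NaN."""
    values = frame.to_numpy(dtype=float)
    out = np.full_like(values, np.nan)
    n_cols = values.shape[1]
    for i, k in enumerate(np.asarray(steps).ravel()):
        k = int(k)
        if k < 0:
            raise ValueError("steps must be non-negative")
        keep = max(n_cols - k, 0)  # how many values survive the shift
        out[i, :keep] = values[i, k:k + keep]
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)

test_1 = pd.DataFrame([[1, 2, 3, 4], [10, 12, 13, 14], [20, 22, 23, 24]])
df_x = pd.DataFrame([[1], [3], [2]])
result = shift_rows_left(test_1, df_x)
print(result)
```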
There was also a third answer that was very good, but for some reason it was deleted.
Solution:
test_2 = (df_x.rename(columns={0: 's'})
          .join(test_1)
          .apply(lambda x: x.shift(-(x['s'])), axis=1)
          .drop(columns=['s']))
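For larger frames, a loop-free variant may be worth considering. The sketch below is not taken from any of the answers above: it builds, for every output cell, the column index to read from and gathers the values with `np.take_along_axis`. It assumes non-negative steps, one per row. Since it only works on the underlying NumPy array, the same indexing might carry over to an xarray DataArray through its `.values`, but that is not verified here.

```python
import numpy as np
import pandas as pd

test_1 = pd.DataFrame([[1, 2, 3, 4], [10, 12, 13, 14], [20, 22, 23, 24]])
df_x = pd.DataFrame([[1], [3], [2]])

values = test_1.to_numpy(dtype=float)
steps = df_x.to_numpy().reshape(-1, 1)   # shape (n_rows, 1)
n_cols = values.shape[1]

# Source column for every output cell: output column j reads column j + step.
src = np.arange(n_cols) + steps          # broadcasts to (n_rows, n_cols)
valid = src < n_cols                     # cells whose source column exists

# Out-of-range sources are redirected to column 0 and masked out afterwards.
gathered = np.take_along_axis(values, np.where(valid, src, 0), axis=1)
result = pd.DataFrame(
    np.where(valid, gathered, np.nan),
    index=test_1.index,
    columns=test_1.columns,
)
print(result)
```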