Pandas/xarray - Shift values horizontally and dynamically based on another DataFrame

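I want to shift each row of the DataFrame test_1 horizontally, according to the value in the corresponding row of another DataFrame, df_x. The value in each row of df_x defines the number of steps to shift to the left.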

test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])

df_x = pd.DataFrame([[1],[3],[2]])

My expected output is:

Out[157]: 
    0   1   2   3
0   2   3   4  NA
1  14  NA  NA  NA
2  23  24  NA  NA

I tried adapting the answer to a similar question and used this: test_1.apply(lambda x: x.shift(periods = -df_x.loc[x,:]), axis = 1).

However, I keep getting the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([3, 4], dtype='int64'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

PS: If anyone also knows how to do this without converting from xarray to pandas, that would be even better.

Starting from your DataFrame:

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])
>>> test_1
    0   1   2   3
0   1   2   3   4
1   10  12  13  14
2   20  22  23  24

We can transpose the DataFrame like this, so that each original row becomes a column:

>>> test_1 = test_1.T
>>> test_1
    0   1   2
0   1   10  20
1   2   12  22
2   3   13  23
3   4   14  24

Then, using df_x, we can loop over the shifts, select the matching column, shift it, and transpose test_1 back to get the expected result:

>>> df_x = pd.DataFrame([[0],[1],[2]])
>>> line = 0
>>> for val in df_x.values.flatten():
...     # column `line` of the transposed frame is row `line` of the original
...     test_1[line] = test_1[line].shift(periods=-val)
...     line += 1
...
>>> test_1 = test_1.T
>>> test_1
    0       1       2       3
0   1.0     2.0     3.0     4.0
1   12.0    13.0    14.0    NaN
2   23.0    24.0    NaN     NaN

For df_x = pd.DataFrame([[0],[2],[2]]):

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]]) 
>>> test_1 = test_1.T 
>>> df_x = pd.DataFrame([[0],[2],[2]]) 

>>> line=0
>>> for val in list(df_x.values.flatten()): 
...     test_1[line] = test_1[line].shift(periods=-val)
...     line+=1  
>>> test_1 = test_1.T         
>>> test_1
    0       1       2   3
0   1.0     2.0     3.0 4.0
1   13.0    14.0    NaN NaN
2   23.0    24.0    NaN NaN

For df_x = pd.DataFrame([[2],[2],[0]]):

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]]) 
>>> test_1 = test_1.T 
>>> df_x = pd.DataFrame([[2],[2],[0]]) 

>>> line=0
>>> for val in list(df_x.values.flatten()): 
...     test_1[line] = test_1[line].shift(periods=-val)
...     line+=1  
>>> test_1 = test_1.T         
>>> test_1
    0       1       2       3
0   3.0     4.0     NaN     NaN
1   13.0    14.0    NaN     NaN
2   20.0    22.0    23.0    24.0

For df_x = pd.DataFrame([[1],[3],[2]]), the values from the question:

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]]) 
>>> test_1 = test_1.T 
>>> df_x = pd.DataFrame([[1],[3],[2]]) 

>>> line=0
>>> for val in list(df_x.values.flatten()): 
...     test_1[line] = test_1[line].shift(periods=-val)
...     line+=1  
>>> test_1 = test_1.T         
>>> test_1
    0       1       2     3
0   2.0     3.0     4.0   NaN
1   14.0    NaN     NaN   NaN
2   23.0    24.0    NaN   NaN
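
As a side note on the error in the question: inside apply(..., axis=1) the lambda receives the row itself as x, so df_x.loc[x, :] looks up the row's values (1, 2, 3, 4) as index labels of df_x, and labels 3 and 4 do not exist there. A minimal sketch of the same one-liner, assuming both frames share the same index and the shift sits in column 0 of df_x, is to look the shift up by the row label x.name instead:

```python
import pandas as pd

test_1 = pd.DataFrame([[1, 2, 3, 4], [10, 12, 13, 14], [20, 22, 23, 24]])
df_x = pd.DataFrame([[1], [3], [2]])

# x.name is the index label of the current row; use it to fetch that row's shift.
# int(...) turns the NumPy integer into a plain Python int before calling shift().
result = test_1.apply(lambda x: x.shift(-int(df_x.loc[x.name, 0])), axis=1)
print(result)
```

This avoids the transpose, at the cost of one Python-level call per row.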

I see @tlentali has already answered your question. As an alternative, you can also use this:

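import numpy as np
import pandas as pd

df_x_seq = np.squeeze(df_x.values)                  # shift for each row
Lens = np.squeeze((test_1.shape[1] - df_x).values)  # number of values each row keeps
Id = -1                                             # row counter; reset to -1 before every apply

def shift(row):
    global Id
    Id += 1
    array = np.full(row.shape[0], np.nan)           # start from an all-NaN row
    if Lens[Id] != row.shape[0]:
        # copy the surviving values to the left edge of the row
        array[:Lens[Id]] = row.values[df_x_seq[Id]:]
        return pd.Series(array, index=row.index)    # a Series so that apply rebuilds a DataFrame
    else:
        return row                                  # shift of 0: row is unchanged

# presumably applied row by row to the original (untransposed) test_1, e.g.:
test_2 = test_1.apply(shift, axis=1)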
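
The global counter in the function above is easy to get out of sync. As a rough alternative sketch (not from any of the answers here), the same shift can be done on the raw NumPy array with index arithmetic. The helper name shift_rows_left is made up for illustration. Since it only touches plain arrays, something similar could presumably be applied to the .values of an xarray DataArray, but I have not tried that:

```python
import numpy as np
import pandas as pd

def shift_rows_left(values, shifts):
    """Shift each row of a 2-D array left by its own offset, padding with NaN.

    `shift_rows_left` is a hypothetical helper, not a pandas or NumPy function.
    """
    values = np.asarray(values, dtype=float)          # float so NaN can be stored
    shifts = np.asarray(shifts).reshape(-1, 1)        # one shift per row, as a column
    n_rows, n_cols = values.shape
    src = np.arange(n_cols) + shifts                  # source column for every output cell
    valid = (src >= 0) & (src < n_cols)               # cells whose source is inside the row
    rows = np.broadcast_to(np.arange(n_rows)[:, None], src.shape)
    out = np.full_like(values, np.nan)
    out[valid] = values[rows[valid], src[valid]]      # copy only the in-range cells
    return out

test_1 = pd.DataFrame([[1, 2, 3, 4], [10, 12, 13, 14], [20, 22, 23, 24]])
df_x = pd.DataFrame([[1], [3], [2]])

shifted = pd.DataFrame(
    shift_rows_left(test_1.to_numpy(), df_x.to_numpy()),
    index=test_1.index,
    columns=test_1.columns,
)
print(shifted)
```

There is no Python loop over rows here, which should matter mostly for large frames.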
 

There was also a third answer that was quite good, but for some reason it was deleted.

Solution:

test_2=(df_x.rename(columns={0:'s'})
            .join(test_1)
            .apply(lambda x:x.shift(-(x['s'])),axis=1)
            .drop(columns=['s']))
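
As far as I can tell, this works because the join puts the helper column s in front of the data, so every row is one element longer and s sits at position 0. Shifting the whole row left and then dropping s leaves the data columns shifted by exactly s. A small trace on a single row, with values taken from row 1 of test_1 and an assumed shift of 1:

```python
import pandas as pd

# One joined row: the shift 's' first, then the four data columns 0..3.
row = pd.Series([1, 10, 12, 13, 14], index=['s', 0, 1, 2, 3])

# Shift the whole row (including 's') left by the value stored in 's'.
shifted_row = row.shift(-int(row['s']))

# Dropping 's' leaves only the data columns, now shifted left by one.
result_row = shifted_row.drop('s')
print(result_row.tolist())
```

Note that this relies on s being the first column; if it were joined on the right, the shifted values would not line up the same way.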