Dask, add new column based on next row

Question

I have this Dask DataFrame; the last column is the one that matters for this question:

Dask DataFrame Structure:
              asks[0].amount asks[1].amount asks[2].amount asks[3].amount asks[4].amount asks[5].amount asks[6].amount asks[7].amount asks[8].amount asks[9].amount asks[10].amount asks[11].amount asks[12].amount asks[13].amount asks[14].amount asks[15].amount asks[16].amount asks[17].amount asks[18].amount asks[19].amount asks[20].amount asks[21].amount asks[22].amount asks[23].amount asks[24].amount bids[0].amount bids[1].amount bids[2].amount bids[3].amount bids[4].amount bids[5].amount bids[6].amount bids[7].amount bids[8].amount bids[9].amount bids[10].amount bids[11].amount bids[12].amount bids[13].amount bids[14].amount bids[15].amount bids[16].amount bids[17].amount bids[18].amount bids[19].amount bids[20].amount bids[21].amount bids[22].amount bids[23].amount bids[24].amount currentPrice
npartitions=1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                     float64        float64        float64        float64        float64        float64        float64        float64        float64        float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64        float64        float64        float64        float64        float64        float64        float64        float64        float64        float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64         float64      float64
                         ...            ...            ...            ...            ...            ...            ...            ...            ...            ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...             ...          ...

Now I need to add a new column (named succPrice) that holds the 'currentPrice' value of the next row. For example:

row1: ask......, bids....., currentPrice(11), succPrice(12)
row2: ask......, bids....., currentPrice(12), succPrice(17)
row3: ask......, bids....., currentPrice(17), succPrice(.....)

How can I get this result? The DataFrame is very large, so I need to use Dask.

Answer 1

Use `shift(-1)`

Dask's `shift` works the same way as pandas' `shift`. That is, to take the value from the next row, use `shift(-1)`.

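Keep in mind that the last value of the new column will be NaN, because the last row has no next row to take a value from.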
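A minimal sketch of the direction of the shift, using the prices from the question's example (11, 12, 17). It uses plain pandas only because the answer states that Dask's `shift` behaves like pandas' `shift`; the three-row frame is illustrative, not the real order-book data.

```python
import pandas as pd

# Toy prices taken from the question's example rows (11 -> 12 -> 17)
df = pd.DataFrame({"currentPrice": [11.0, 12.0, 17.0]})

# shift(-1) moves every value up by one row, so row i receives row i+1's price
df["succPrice"] = df["currentPrice"].shift(-1)

print(df)
```

Here `succPrice` becomes 12.0, 17.0 and NaN, matching the question's row1/row2 pattern; `shift(1)` would instead give each row the previous row's price.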

Code example

```python
import dask

# Create example data: a random timeseries whose `x` column stands in for currentPrice
df = (dask.datasets.timeseries()
      .drop(columns=['id', 'name', 'y'])
      .rename(columns={'x': 'currentPrice'}))

# Assign `succPrice` equal to the next row's `currentPrice`
df = df.assign(succPrice=df['currentPrice'].shift(-1))

df.tail()
```

| timestamp           |   currentPrice |   succPrice |
|:--------------------|---------------:|------------:|
| 2000-01-30 23:59:55 |      -0.241575 |    0.65083  |
| 2000-01-30 23:59:56 |       0.65083  |    0.742577 |
| 2000-01-30 23:59:57 |       0.742577 |    0.313805 |
| 2000-01-30 23:59:58 |       0.313805 |    0.556262 |
| 2000-01-30 23:59:59 |       0.556262 |  nan        |
