Dask, add new column based on next row
I have this Dask DataFrame; the last column is the one that matters for this question:
Dask DataFrame Structure:
asks[0].amount asks[1].amount asks[2].amount asks[3].amount asks[4].amount asks[5].amount asks[6].amount asks[7].amount asks[8].amount asks[9].amount asks[10].amount asks[11].amount asks[12].amount asks[13].amount asks[14].amount asks[15].amount asks[16].amount asks[17].amount asks[18].amount asks[19].amount asks[20].amount asks[21].amount asks[22].amount asks[23].amount asks[24].amount bids[0].amount bids[1].amount bids[2].amount bids[3].amount bids[4].amount bids[5].amount bids[6].amount bids[7].amount bids[8].amount bids[9].amount bids[10].amount bids[11].amount bids[12].amount bids[13].amount bids[14].amount bids[15].amount bids[16].amount bids[17].amount bids[18].amount bids[19].amount bids[20].amount bids[21].amount bids[22].amount bids[23].amount bids[24].amount currentPrice
npartitions=1
float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Now I need to add a new column (named succPrice) based on the 'currentPrice' of the next row. For example:
row1: ask......, bids....., currentPrice(11), succPrice(12)
row2: ask......, bids....., currentPrice(12), succPrice(17)
row3: ask......, bids....., currentPrice(17), succPrice(.....)
How can I get this result? The DataFrame is very large, so I need to use Dask.
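Use shift(-1)
Dask's shift works the same way as Pandas' shift. That is, to use the value from the next row, you must use shift(-1).
Keep in mind that the last value of the new column will be NaN, because the last row has no next row.
Code example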
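import dask.datasets

# Create sample data: a time series with a single `currentPrice` column
df = (
    dask.datasets.timeseries()
    .drop(columns=['id', 'name', 'y'])
    .rename(columns={'x': 'currentPrice'})
)

# Assign `succPrice` equal to the next row's `currentPrice`
df = df.assign(succPrice=df['currentPrice'].shift(-1))

# Show the last five rows; the final `succPrice` is NaN
df.tail()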
| timestamp | currentPrice | succPrice |
|:--------------------|---------------:|------------:|
| 2000-01-30 23:59:55 | -0.241575 | 0.65083 |
| 2000-01-30 23:59:56 | 0.65083 | 0.742577 |
| 2000-01-30 23:59:57 | 0.742577 | 0.313805 |
| 2000-01-30 23:59:58 | 0.313805 | 0.556262 |
| 2000-01-30 23:59:59 | 0.556262 | nan |