Dataframe conditional column subtract until zero

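This differs from the usual 'subtract until 0' questions here because the subtraction is conditional on another column. The question is about creating that conditional column.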

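The dataframe consists of three columns.

Column 'quantity' tells you how much to add or subtract.

Column 'in' tells you when to add (1) and when subtraction is triggered (-1).

Column 'cumulative_in' tells you how much you have.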

+----------+----+---------------+
| quantity | in | cumulative_in |
+----------+----+---------------+
|        5 |  0 |               |
|        1 |  0 |               |
|        3 |  1 |             3 |
|        4 |  1 |             7 |
|        2 |  1 |             9 |
|        1 |  0 |               |
|        1 |  0 |               |
|        3 |  0 |               |
|        1 | -1 |               |
|        2 |  0 |               |
|        1 |  0 |               |
|        2 |  0 |               |
|        3 |  0 |               |
|        3 |  0 |               |
|        1 |  0 |               |
|        3 |  0 |               |
+----------+----+---------------+
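For reference, here is a minimal sketch of how this frame could be built in pandas. It assumes 'cumulative_in' is simply the running sum of 'quantity' over the rows where 'in' is 1, which matches the values in the table above; the variable name `df` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "quantity": [5, 1, 3, 4, 2, 1, 1, 3, 1, 2, 1, 2, 3, 3, 1, 3],
        "in": [0, 0, 1, 1, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
    }
)

# Assumption: 'cumulative_in' is the running sum of 'quantity' on 'in' == 1 rows.
# where() blanks the other rows to NaN, and cumsum() skips NaN by default,
# so rows with 'in' != 1 stay empty.
df["cumulative_in"] = df["quantity"].where(df["in"].eq(1)).cumsum()

print(df)
```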

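Whenever column 'in' equals -1, I want to create, starting from the next row, a column 'out' (0/1) that says to keep subtracting until 'cumulative_in' reaches 0. Doing it by hand:

Column 'out' tells you when to keep subtracting.

Column 'cumulative_subtracted' tells you how much has already been subtracted.

I subtract 'cumulative_subtracted' from 'cumulative_in' until it reaches 0, and the output looks like this: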

+----------+----+---------------+-----+-----------------------+
| quantity | in | cumulative_in | out | cumulative_subtracted |
+----------+----+---------------+-----+-----------------------+
|        5 |  0 |               |     |                       |
|        1 |  0 |               |     |                       |
|        3 |  1 |             3 |     |                       |
|        4 |  1 |             7 |     |                       |
|        2 |  1 |             9 |     |                       |
|        1 |  0 |               |     |                       |
|        1 |  0 |               |     |                       |
|        3 |  0 |               |     |                       |
|        1 | -1 |               |     |                       |
|        2 |  0 |             7 |   1 |                     2 |
|        1 |  0 |             6 |   1 |                     3 |
|        2 |  0 |             4 |   1 |                     5 |
|        3 |  0 |             1 |   1 |                     8 |
|        3 |  0 |             0 |   1 |                     9 |
|        1 |  0 |               |     |                       |
|        3 |  0 |               |     |                       |
+----------+----+---------------+-----+-----------------------+

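It is not clear to me what should happen when the amount left to subtract has not yet reached zero and another '1' appears in the 'in' column.

However, here is a rough solution for the simple case: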

import pandas as pd
import numpy as np

size = 20

df = pd.DataFrame(
    {
        "quantity": np.random.randint(1, 6, size),
        "in": np.full(size, np.nan),
    }
)

# These are just to place a random 1 and -1 into 'in', not important
df.loc[np.random.choice(df.iloc[:size//3, :].index, 1), 'in'] = 1
df.loc[np.random.choice(df.iloc[size//3:size//2, :].index, 1), 'in'] = -1
df.loc[np.random.choice(df.iloc[size//2:, :].index, 1), 'in'] = 1

# Forward-fill each 1/-1 entry into the missing values that follow it,
# up to the next 1/-1 entry.
df['in'] = df['in'].ffill()

# Calculates the cumulative sum with a negative value for subtractions
df["cum_in"] = (df["quantity"] * df['in']).cumsum()

# Subtraction indicator and cumulative column
df['out'] = (df['in'] == -1).astype(int)
df["cumulative_subtracted"] = df.loc[df['in'] == -1, 'quantity'].cumsum()

# Remove values once 'cum_in' turns negative
df.loc[
    df["cum_in"] < 0, ["in", "cum_in", "out", "cumulative_subtracted"]
] = np.nan


print(df)

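I could not find a vectorized solution to this problem. I would love to see one. However, the problem is not hard when iterating row by row. Hopefully your dataframe is not too big!

First set up the data.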

import numpy as np
import pandas as pd

data = {
    "quantity": [5, 1, 3, 4, 2, 1, 1, 3, 1, 2, 1, 2, 3, 3, 1, 3],
    "in": [0, 0, 1, 1, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
    "cumulative_in": [np.nan, np.nan, 3, 7, 9] + [np.nan] * 11,
}

Then set up the dataframe and the extra columns. I used np.nan for 'out', but 0 was easier for 'cumulative_subtracted'.

df = pd.DataFrame(data)
df['out'] = np.nan
df['cumulative_subtracted'] = 0

Set the initial variables.

last_in = 0.
reduce = False

Then, unfortunately, go through the dataframe row by row.

# Note: df.at[i - 1, ...] below relies on the default integer index.
for i in df.index:
    # Outside a reduction, track the latest 'cumulative_in' value.
    if not np.isnan(df.at[i, "cumulative_in"]) and not reduce:
        last_in = df.at[i, "cumulative_in"]
    # A -1 in 'in' switches the reduction on for the following rows.
    elif df.at[i, "in"] == -1:
        reduce = True
    # While reducing, subtract this row's quantity from what is left.
    elif reduce:
        df.at[i, "out"] = 1
        if df.at[i, "quantity"] <= last_in:
            last_in -= df.at[i, "quantity"]
            df.at[i, "cumulative_in"] = last_in
            df.at[i, "cumulative_subtracted"] = (
                df.at[i - 1, "cumulative_subtracted"] + df.at[i, "quantity"]
            )
        else:
            # Not enough left: subtract only the remainder, then stop.
            df.at[i, "cumulative_in"] = 0
            df.at[i, "cumulative_subtracted"] = (
                df.at[i - 1, "cumulative_subtracted"] + last_in
            )
            last_in = 0
            reduce = False

This works for the given data and will hopefully work for all your datasets.

print(df)

    quantity  in  cumulative_in  out  cumulative_subtracted
0          5   0            NaN  NaN                      0
1          1   0            NaN  NaN                      0
2          3   1            3.0  NaN                      0
3          4   1            7.0  NaN                      0
4          2   1            9.0  NaN                      0
5          1   0            NaN  NaN                      0
6          1   0            NaN  NaN                      0
7          3   0            NaN  NaN                      0
8          1  -1            NaN  NaN                      0
9          2   0            7.0  1.0                      2
10         1   0            6.0  1.0                      3
11         2   0            4.0  1.0                      5
12         3   0            1.0  1.0                      8
13         3   0            0.0  1.0                      9
14         1   0            NaN  NaN                      0
15         3   0            NaN  NaN                      0
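
As for a vectorized version, here is a minimal sketch for the simple case only. It assumes a single -1 marker, with all the 'in' == 1 rows before it, and it has not been checked against the case where new '1' rows appear during the subtraction. The idea is to take a running sum of the quantities after the marker and cap it at the total that was put in. The names `total_in`, `after_marker`, `raw`, `subtracted` and `out` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "quantity": [5, 1, 3, 4, 2, 1, 1, 3, 1, 2, 1, 2, 3, 3, 1, 3],
        "in": [0, 0, 1, 1, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
    }
)

# Total accumulated by the 'in' == 1 rows (9 for this data).
total_in = df.loc[df["in"].eq(1), "quantity"].sum()

# True for the rows strictly after the -1 marker.
after_marker = df["in"].eq(-1).cumsum().shift(fill_value=0).gt(0)

# Running sum of the quantities after the marker, capped at what is available.
raw = df["quantity"].where(after_marker, 0).cumsum()
subtracted = raw.clip(upper=total_in)

# A row is 'out' if, before that row, the total had not yet been used up.
out = after_marker & raw.shift(fill_value=0).lt(total_in)

df["cumulative_in"] = df["quantity"].where(df["in"].eq(1)).cumsum()
df.loc[out, "cumulative_in"] = total_in - subtracted[out]
df["out"] = out.astype(int).where(out)
df["cumulative_subtracted"] = subtracted.where(out, 0)

print(df)
```

On the sample data this gives the same 'cumulative_in', 'out' and 'cumulative_subtracted' values as the loop. It may differ from the loop in edge cases, for example when a quantity brings the remainder to exactly zero.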