Dataframe conditional column subtract until zero
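This differs from the usual 'subtract until 0' questions here because it is conditional on another column. The question is about creating a conditional column.
The dataframe consists of three columns.
Column 'quantity' tells you how much to add/subtract.
Column 'in' tells you when to add (1) and when to start subtracting (-1).
Column 'cumulative_in' tells you how much you have.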
+----------+----+---------------+
| quantity | in | cumulative_in |
+----------+----+---------------+
| 5 | 0 | |
| 1 | 0 | |
| 3 | 1 | 3 |
| 4 | 1 | 7 |
| 2 | 1 | 9 |
| 1 | 0 | |
| 1 | 0 | |
| 3 | 0 | |
| 1 | -1 | |
| 2 | 0 | |
| 1 | 0 | |
| 2 | 0 | |
| 3 | 0 | |
| 3 | 0 | |
| 1 | 0 | |
| 3 | 0 | |
+----------+----+---------------+
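Whenever column 'in' equals -1, I want to create, starting from the next row, a column 'out' (0/1) that says to keep subtracting until 'cumulative_in' reaches 0. Doing it by hand:
Column 'out' tells you when to keep subtracting.
Column 'cumulative_subtracted' tells you how much has already been subtracted.
I subtract 'cumulative_subtracted' from column 'cumulative_in' until it reaches 0, and the output looks something like this: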
+----------+----+---------------+-----+-----------------------+
| quantity | in | cumulative_in | out | cumulative_subtracted |
+----------+----+---------------+-----+-----------------------+
| 5 | 0 | | | |
| 1 | 0 | | | |
| 3 | 1 | 3 | | |
| 4 | 1 | 7 | | |
| 2 | 1 | 9 | | |
| 1 | 0 | | | |
| 1 | 0 | | | |
| 3 | 0 | | | |
| 1 | -1 | | | |
| 2 | 0 | 7 | 1 | 2 |
| 1 | 0 | 6 | 1 | 3 |
| 2 | 0 | 4 | 1 | 5 |
| 3 | 0 | 1 | 1 | 8 |
| 3 | 0 | 0 | 1 | 9 |
| 1 | 0 | | | |
| 3 | 0 | | | |
+----------+----+---------------+-----+-----------------------+
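It isn't clear to me what should happen when the amount being subtracted has not yet reached zero and another "1" appears in the 'in' column.
Still, here is a rough solution for the simple case: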
import pandas as pd
import numpy as np

size = 20
df = pd.DataFrame(
    {
        "quantity": np.random.randint(1, 6, size),
        "in": np.full(size, np.nan),
    }
)

# These are just to place a random 1 and -1 into 'in', not important
df.loc[np.random.choice(df.iloc[:size//3, :].index, 1), 'in'] = 1
df.loc[np.random.choice(df.iloc[size//3:size//2, :].index, 1), 'in'] = -1
df.loc[np.random.choice(df.iloc[size//2:, :].index, 1), 'in'] = 1

# Forward-fill the missing values after each 1/-1 entry, up to the
# next 1/-1 entry (fillna(method='ffill') is deprecated in recent pandas).
df['in'] = df['in'].ffill()

# Cumulative sum, with subtractions counted as negative values
df["cum_in"] = (df["quantity"] * df['in']).cumsum()

# Subtraction indicator and cumulative column
df['out'] = (df['in'] == -1).astype(int)
df["cumulative_subtracted"] = df.loc[df['in'] == -1, 'quantity'].cumsum()

# Blank out the rows where 'cum_in' has gone negative
# (np.nan rather than np.NaN, which NumPy 2.0 removed)
df.loc[
    df["cum_in"] < 0, ["in", "cum_in", "out", "cumulative_subtracted"]
] = np.nan
print(df)
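I couldn't find a vectorized solution to this problem. I'd love to see one. However, the problem is not hard when you step through it row by row. Hopefully your dataframe isn't too big!!
First set up the data.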
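import numpy as np
import pandas as pd

data = {
    "quantity": [
        5, 1, 3, 4, 2, 1, 1, 3, 1, 2, 1, 2, 3, 3, 1, 3
    ],
    "in": [
        0, 0, 1, 1, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0
    ],
    # np.nan rather than np.NaN, which NumPy 2.0 removed
    "cumulative_in": [
        np.nan, np.nan, 3, 7, 9, np.nan, np.nan, np.nan,
        np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan
    ]
}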
Then set up the dataframe and the extra columns. I used np.nan for 'out', but 0 was easier for 'cumulative_subtracted'.
df = pd.DataFrame(data)
df['out'] = np.nan
df['cumulative_subtracted'] = 0
Set the initial variables.
last_in = 0.
reduce = False
Unfortunately, go through the dataframe row by row.
for i in df.index:
    # Track the latest known stock level until a reduction starts.
    if not np.isnan(df.at[i, "cumulative_in"]) and not reduce:
        last_in = df.at[i, "cumulative_in"]
    # A -1 in 'in' switches the reduction on, starting from the next row.
    elif df.at[i, "in"] == -1:
        reduce = True
    # While reducing, subtract this row's quantity from the stock.
    elif reduce:
        df.at[i, "out"] = 1
        if df.at[i, "quantity"] <= last_in:
            last_in -= df.at[i, "quantity"]
            df.at[i, "cumulative_in"] = last_in
            # i - 1 relies on the default 0..n-1 integer index
            df.at[i, "cumulative_subtracted"] = (
                df.at[i - 1, "cumulative_subtracted"] + df.at[i, "quantity"]
            )
        else:
            # Not enough stock left: subtract only what remains, then stop.
            df.at[i, "cumulative_in"] = 0
            df.at[i, "cumulative_subtracted"] = (
                df.at[i - 1, "cumulative_subtracted"] + last_in
            )
            last_in = 0
            reduce = False
This works for the given data, and hopefully works for all of your datasets.
print(df)
quantity in cumulative_in out cumulative_subtracted
0 5 0 NaN NaN 0
1 1 0 NaN NaN 0
2 3 1 3.0 NaN 0
3 4 1 7.0 NaN 0
4 2 1 9.0 NaN 0
5 1 0 NaN NaN 0
6 1 0 NaN NaN 0
7 3 0 NaN NaN 0
8 1 -1 NaN NaN 0
9 2 0 7.0 1.0 2
10 1 0 6.0 1.0 3
11 2 0 4.0 1.0 5
12 3 0 1.0 1.0 8
13 3 0 0.0 1.0 9
14 1 0 NaN NaN 0
15 3 0 NaN NaN 0
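On the wish for a vectorized version: for the simple case in the question (a single -1 trigger, with all the 'in' == 1 rows before it), the loop can probably be replaced by a capped cumulative sum. The following is only a sketch under those assumptions, not a general solution. The names stock, after, requested and the 'remaining' column are made up for illustration, and 'remaining' is kept separate instead of overwriting 'cumulative_in'.

```python
import pandas as pd

df = pd.DataFrame({
    "quantity": [5, 1, 3, 4, 2, 1, 1, 3, 1, 2, 1, 2, 3, 3, 1, 3],
    "in": [0, 0, 1, 1, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
})

# Total stock built up by the rows flagged with in == 1 (assumption:
# they all come before the single -1 trigger).
stock = df.loc[df["in"].eq(1), "quantity"].sum()

# True for the rows strictly after the -1 trigger.
after = df["in"].eq(-1).cumsum().shift(fill_value=0).gt(0)

# Running total of the quantities requested after the trigger.
requested = df["quantity"].where(after, 0).cumsum()

# A row still subtracts if the stock was not yet exhausted before it.
out = after & requested.shift(fill_value=0).lt(stock)

# Cap the running total at the available stock.
subtracted = requested.clip(upper=stock)

df["out"] = out.astype(int)
df["cumulative_subtracted"] = subtracted.where(out)
df["remaining"] = (stock - subtracted).where(out)
print(df)
```

On the question's data this flags rows 9 to 13, the same rows as the loop. If another 1 or -1 shows up while the subtraction is still running, the sketch does not apply as written; grouping by trigger would be needed, and the question leaves the intended behaviour for that case open.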