根据条件为完整组添加一个具有虚拟变量的新列
Add a new column having a dummy variable for complete group based on a condition
我有以下数据框:
df = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'],
"date": [2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018],
"col_1": [1,1,1,0,1,0,1,1,0,1,1,1]})
如果“col_1”有三个连续的 1,我想添加一个新的 col,它应该是一个组的虚拟 1。生成的数据框应为:
df_new = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C' ],
"date": [2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018],
"col_1": [1,1,1,0,1,0,1,1,0,1,1,1],
"col_2": [1,1,1,1,0,0,0,0,1,1,1,1]})
试试这个
df = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'],
"date": [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
"col_1": [1,1,1,0,1,0,1,1,0,1,1,1]})
# identify 3 consecutive 1s
v = df.groupby('id')['col_1'].rolling(3).sum().eq(3)
# identify which ids contain consecutive 1s
ids = next(zip(*v[v].index))
# assign new column
df['col_2'] = df['id'].isin(ids).astype(int)
print(df)
id date col_1 col_2
0 A 2015 1 1
1 A 2016 1 1
2 A 2017 1 1
3 A 2018 0 1
4 B 2015 1 0
5 B 2016 0 0
6 B 2017 1 0
7 B 2018 1 0
8 C 2015 0 1
9 C 2016 1 1
10 C 2017 1 1
11 C 2018 1 1
你也可以这样做。
df['col_2'] = (df.groupby('id')['col_1']
.transform(lambda x: x.rolling(3).sum().eq(3).any())
.astype(int))
df
输出:
id date col_1 col_2
0 A 2015 1 1
1 A 2016 1 1
2 A 2017 1 1
3 A 2018 0 1
4 B 2015 1 0
5 B 2016 0 0
6 B 2017 1 0
7 B 2018 1 0
8 C 2015 0 1
9 C 2016 1 1
10 C 2017 1 1
11 C 2018 1 1
我有以下数据框:
df = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'],
"date": [2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018],
"col_1": [1,1,1,0,1,0,1,1,0,1,1,1]})
如果“col_1”有三个连续的 1,我想添加一个新的 col,它应该是一个组的虚拟 1。生成的数据框应为:
df_new = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C' ],
"date": [2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018,
2015, 2016, 2017, 2018],
"col_1": [1,1,1,0,1,0,1,1,0,1,1,1],
"col_2": [1,1,1,1,0,0,0,0,1,1,1,1]})
试试这个
df = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'],
"date": [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
"col_1": [1,1,1,0,1,0,1,1,0,1,1,1]})
# identify 3 consecutive 1s
v = df.groupby('id')['col_1'].rolling(3).sum().eq(3)
# identify which ids contain consecutive 1s
ids = next(zip(*v[v].index))
# assign new column
df['col_2'] = df['id'].isin(ids).astype(int)
print(df)
id date col_1 col_2
0 A 2015 1 1
1 A 2016 1 1
2 A 2017 1 1
3 A 2018 0 1
4 B 2015 1 0
5 B 2016 0 0
6 B 2017 1 0
7 B 2018 1 0
8 C 2015 0 1
9 C 2016 1 1
10 C 2017 1 1
11 C 2018 1 1
你也可以这样做。
df['col_2'] = (df.groupby('id')['col_1']
.transform(lambda x: x.rolling(3).sum().eq(3).any())
.astype(int))
df
输出:
id date col_1 col_2
0 A 2015 1 1
1 A 2016 1 1
2 A 2017 1 1
3 A 2018 0 1
4 B 2015 1 0
5 B 2016 0 0
6 B 2017 1 0
7 B 2018 1 0
8 C 2015 0 1
9 C 2016 1 1
10 C 2017 1 1
11 C 2018 1 1